Abstract: Event cameras are sensors of great interest for many applications that run in low-resource and challenging environments. They log sparse illumination changes with high temporal resolution and high dynamic range, while consuming minimal power. However, top-performing methods often ignore specific event-data properties, leading to generic but computationally expensive algorithms. Efforts toward efficient solutions usually do not achieve top accuracy on complex tasks. This work proposes a novel framework, Event Transformer (EvT), that effectively exploits event-data properties to be highly efficient and accurate. We introduce a new patch-based event representation and a compact transformer-like architecture to process it. EvT is evaluated on different event-based benchmarks for action and gesture recognition. Evaluation results show better or comparable accuracy to the state of the art while requiring significantly fewer computation resources, enabling EvT to run with minimal latency on both GPU and CPU.
Language: English
DOI: 10.1109/CVPRW56347.2022.00301
Year: 2022
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2022 (2022), 2676-2685
ISSN: 2160-7508
Funding: info:eu-repo/grantAgreement/ES/DGA-FSE/T45-17R
Funding: info:eu-repo/grantAgreement/ES/MICIU-AEI-FEDER/PGC2018-098817-A-I00
Type and form: Article (PostPrint)
Area (Department): Systems Engineering and Automation (Dept. of Computer Science and Systems Engineering)
Area (Department): Computer Languages and Systems (Dept. of Computer Science and Systems Engineering)
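The patch-based event representation mentioned in the abstract can be illustrated with a minimal generic sketch: sparse events, each an (x, y, timestamp, polarity) tuple, are bucketed by the spatial patch they fall into, and only patches that received events are kept for further processing. All names and the patch size here are illustrative assumptions, not EvT's actual implementation:

```python
# Hypothetical sketch of a patch-based grouping of sparse camera events.
# Function name, event layout, and patch_size are assumptions for illustration.
from collections import defaultdict

def events_to_patches(events, patch_size=8):
    """Group events by the (patch_row, patch_col) cell they fall into,
    keeping only patches that received at least one event."""
    patches = defaultdict(list)
    for x, y, t, polarity in events:
        key = (y // patch_size, x // patch_size)  # spatial patch index
        patches[key].append((x, y, t, polarity))
    return dict(patches)

# Three events: two land in the same patch, one in a distant patch.
events = [(3, 5, 0.001, 1), (4, 6, 0.002, -1), (120, 40, 0.003, 1)]
active = events_to_patches(events, patch_size=8)
# Only the two touched patches survive; empty regions cost nothing.
```

Keeping only active patches is what makes such a representation attractive for sparse event streams: computation scales with the number of illumination changes, not with the full sensor resolution.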