121235 20260112133149.0 doi 10.1109/TASLP.2022.3224282 sideral 131864 ART-2023-131864 eng Diaz-Guerra, David (orcid)0000-0002-1041-0498 Direction of arrival estimation of sound sources using icosahedral CNNs 2023 Access copy available to the general public Unrestricted In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most of the spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time T60 of 1.5 s.
info:eu-repo/grantAgreement/ES/DGA-FEDER/2014-2020 info:eu-repo/semantics/closedAccess All rights reserved http://www.europeana.eu/rights/rr-f/ 4.1 2023 ACOUSTICS 4 / 40 = 0.1 2023 Q1 T1 ENGINEERING, ELECTRICAL & ELECTRONIC 94 / 353 = 0.266 2023 Q2 T1 1.542 2023 Acoustics and Ultrasonics 2023 Q1 Computational Mathematics 2023 Q1 Computer Science (miscellaneous) 2023 Q1 Speech and Hearing 2023 Q1 Instrumentation 2023 Q1 Media Technology 2023 Q1 Signal Processing 2023 Q1 Electrical and Electronic Engineering 2023 Q1 11.3 2023 info:eu-repo/semantics/article info:eu-repo/semantics/acceptedVersion Miguel, Antonio Universidad de Zaragoza (orcid)0000-0001-5803-4316 Beltran, Jose R. Universidad de Zaragoza (orcid)0000-0002-7500-4650 5008 785 Universidad de Zaragoza Dpto. Ingeniería Electrón.Com. Área Tecnología Electrónica 5008 800 Universidad de Zaragoza Dpto. Ingeniería Electrón.Com. Área Teoría Señal y Comunicac. 31 (2023), 313-321 IEEE/ACM trans. audio speech lang. process. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2329-9290 1944152 http://zaguan.unizar.es/record/121235/files/texto_completo.pdf Postprint 3423313 http://zaguan.unizar.es/record/121235/files/texto_completo.jpg?subformat=icon icon Postprint oai:zaguan.unizar.es:121235 articulos driver 2026-01-12-12:37:14 ARTICLE