000121235 001__ 121235
000121235 005__ 20231023123346.0
000121235 0247_ $$2doi$$a10.1109/TASLP.2022.3224282
000121235 0248_ $$2sideral$$a131864
000121235 037__ $$aART-2023-131864
000121235 041__ $$aeng
000121235 100__ $$0(orcid)0000-0002-1041-0498$$aDiaz-Guerra, David
000121235 245__ $$aDirection of arrival estimation of sound sources using icosahedral CNNs
000121235 260__ $$c2023
000121235 5060_ $$aAccess copy available to the general public$$fUnrestricted
000121235 5203_ $$aIn this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem, interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time T60 of 1.5 s.
000121235 536__ $$9info:eu-repo/grantAgreement/ES/DGA-FEDER/2014-2020
000121235 540__ $$9info:eu-repo/semantics/openAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000121235 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000121235 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, Antonio$$uUniversidad de Zaragoza
000121235 700__ $$0(orcid)0000-0002-7500-4650$$aBeltran, Jose R.$$uUniversidad de Zaragoza
000121235 7102_ $$15008$$2785$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Tecnología Electrónica
000121235 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000121235 773__ $$g31 (2023), 313-321$$pIEEE/ACM trans. audio speech lang. process.$$tIEEE/ACM Transactions on Audio, Speech, and Language Processing$$x2329-9290
000121235 8564_ $$s746438$$uhttps://zaguan.unizar.es/record/121235/files/texto_completo.pdf$$yPostprint
000121235 8564_ $$s3423409$$uhttps://zaguan.unizar.es/record/121235/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000121235 909CO $$ooai:zaguan.unizar.es:121235$$particulos$$pdriver
000121235 951__ $$a2023-10-23-12:22:59
000121235 980__ $$aARTICLE