000121235 001__ 121235
000121235 005__ 20231023123346.0
000121235 0247_ $$2doi$$a10.1109/TASLP.2022.3224282
000121235 0248_ $$2sideral$$a131864
000121235 037__ $$aART-2023-131864
000121235 041__ $$aeng
000121235 100__ $$0(orcid)0000-0002-1041-0498$$aDiaz-Guerra, David
000121235 245__ $$aDirection of arrival estimation of sound sources using icosahedral CNNs
000121235 260__ $$c2023
000121235 5060_ $$aAccess copy available to the general public$$fUnrestricted
000121235 5203_ $$aIn this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem, interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time T60 of 1.5 s.
000121235 536__ $$9info:eu-repo/grantAgreement/ES/DGA-FEDER/2014-2020
000121235 540__ $$9info:eu-repo/semantics/openAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000121235 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000121235 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, Antonio$$uUniversidad de Zaragoza
000121235 700__ $$0(orcid)0000-0002-7500-4650$$aBeltran, Jose R.$$uUniversidad de Zaragoza
000121235 7102_ $$15008$$2785$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Tecnología Electrónica
000121235 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000121235 773__ $$g31 (2023), 313-321$$pIEEE/ACM trans. audio speech lang. process.$$tIEEE/ACM Transactions on Audio, Speech, and Language Processing$$x2329-9290
000121235 8564_ $$s746438$$uhttps://zaguan.unizar.es/record/121235/files/texto_completo.pdf$$yPostprint
000121235 8564_ $$s3423409$$uhttps://zaguan.unizar.es/record/121235/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000121235 909CO $$ooai:zaguan.unizar.es:121235$$particulos$$pdriver
000121235 951__ $$a2023-10-23-12:22:59
000121235 980__ $$aARTICLE