000165793 001__ 165793
000165793 005__ 20260114135812.0
000165793 0247_ $$2doi$$a10.1109/TASLP.2020.3040031
000165793 0248_ $$2sideral$$a121597
000165793 037__ $$aART-2021-121597
000165793 041__ $$aeng
000165793 100__ $$0(orcid)0000-0002-1041-0498$$aDiaz-Guerra, D.$$uUniversidad de Zaragoza
000165793 245__ $$aRobust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks
000165793 260__ $$c2021
000165793 5203_ $$aIn this article, we present a new single-source DOA estimation and tracking system based on the well-known SRP-PHAT algorithm and a three-dimensional Convolutional Neural Network. It uses SRP-PHAT power maps as the input features of a fully convolutional causal architecture whose 3D convolutional layers accurately track a sound source even in highly reverberant scenarios where most state-of-the-art techniques fail. Unlike previous methods, our system uses no bidirectional recurrent layers and all of its convolutional layers are causal in the time dimension, so it is feasible for real-time applications and provides a new DOA estimate for each new SRP-PHAT map. To train the model, we introduce a new procedure that simulates random trajectories as they are needed during training, which is equivalent to an infinite-size dataset with high flexibility to modify its acoustic conditions, such as the reverberation time. We use both acoustic simulations over a wide range of reverberation times and the actual recordings of the LOCATA dataset to demonstrate the robustness of our system and its good performance even with low-resolution SRP-PHAT maps.
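The abstract describes a fully convolutional architecture whose layers are causal in the time dimension, yielding one DOA estimate per SRP-PHAT map. Below is a minimal sketch of what such a time-causal 3D convolution could look like in PyTorch; the class name CausalConv3d, the kernel size, and the 16x32 map resolution are illustrative assumptions, not taken from the paper.

# Minimal sketch (assumption, not the authors' code): a time-causal 3D
# convolution applied to a sequence of SRP-PHAT power maps shaped
# (batch, channels, time, elevation, azimuth).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv3d(nn.Module):
    """3D convolution padded only toward the past on the time axis,
    so the output at frame t never depends on frames later than t."""
    def __init__(self, in_ch, out_ch, kernel=(5, 3, 3)):
        super().__init__()
        self.kt, self.kh, self.kw = kernel
        self.conv = nn.Conv3d(in_ch, out_ch, kernel)

    def forward(self, x):
        # F.pad order for a 5D input: (w_left, w_right, h_left, h_right, t_left, t_right).
        # Spatial axes are padded symmetrically; all time padding goes on the past side.
        x = F.pad(x, (self.kw // 2, self.kw // 2,
                      self.kh // 2, self.kh // 2,
                      self.kt - 1, 0))
        return torch.relu(self.conv(x))

# One output frame per input map: 100 frames of hypothetical 16x32 SRP-PHAT maps.
maps = torch.randn(1, 1, 100, 16, 32)
out = CausalConv3d(1, 32)(maps)
print(out.shape)  # torch.Size([1, 32, 100, 16, 32]) -> time length preserved

Because the padding is one-sided in time, the block can run frame by frame on a stream, which is what makes the architecture feasible for the real-time use the abstract claims.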
000165793 540__ $$9info:eu-repo/semantics/closedAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000165793 590__ $$a4.364$$b2021
000165793 591__ $$aACOUSTICS$$b5 / 32 = 0.156$$c2021$$dQ1$$eT1
000165793 591__ $$aENGINEERING, ELECTRICAL & ELECTRONIC$$b82 / 274 = 0.299$$c2021$$dQ2$$eT1
000165793 592__ $$a1.591$$b2021
000165793 593__ $$aAcoustics and Ultrasonics$$c2021$$dQ1
000165793 593__ $$aComputational Mathematics$$c2021$$dQ1
000165793 593__ $$aSpeech and Hearing$$c2021$$dQ1
000165793 593__ $$aInstrumentation$$c2021$$dQ1
000165793 593__ $$aMedia Technology$$c2021$$dQ1
000165793 593__ $$aElectrical and Electronic Engineering$$c2021$$dQ1
000165793 594__ $$a9.4$$b2021
000165793 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000165793 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, A.$$uUniversidad de Zaragoza
000165793 700__ $$0(orcid)0000-0002-7500-4650$$aBeltran, J.R.$$uUniversidad de Zaragoza
000165793 7102_ $$15008$$2785$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Tecnología Electrónica
000165793 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000165793 773__ $$g29 (2021), 300-311$$pIEEE/ACM trans. audio speech lang. process.$$tIEEE/ACM Transactions on Audio, Speech, and Language Processing$$x2329-9290
000165793 8564_ $$s2112487$$uhttps://zaguan.unizar.es/record/165793/files/texto_completo.pdf$$yPublished version
000165793 8564_ $$s3483261$$uhttps://zaguan.unizar.es/record/165793/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000165793 909CO $$ooai:zaguan.unizar.es:165793$$particulos$$pdriver
000165793 951__ $$a2026-01-14-12:45:51
000165793 980__ $$aARTICLE