Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs
Abstract: In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most of the spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10° even in scenarios with a reverberation time T60 of 1.5 s.
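Note: the soft-argmax idea mentioned in the abstract can be illustrated with a generic sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: the function name `soft_argmax_direction`, the sharpness parameter `beta`, the six-direction toy grid `axes`, and the choice to average unit vectors and renormalize. The sketch turns a map of scores into a probability distribution with a softmax and returns the expected direction, which stays differentiable where a hard argmax would not.

```python
import numpy as np

def soft_argmax_direction(scores, directions, beta=1.0):
    """Generic soft-argmax over candidate directions (illustrative only).

    scores:     (N,) raw map values, one per candidate direction
    directions: (N, 3) unit vectors for those candidates
    beta:       hypothetical sharpness parameter, not taken from the paper
    """
    z = beta * (scores - np.max(scores))  # shift by the max for numerical stability
    p = np.exp(z)
    p /= p.sum()                          # softmax: scores -> probabilities
    mean = p @ directions                 # expected direction under p
    # Renormalize to a unit vector; undefined if the mean is exactly zero,
    # e.g. for a perfectly symmetric distribution.
    return mean / np.linalg.norm(mean), p

# Toy grid (assumption): the six axis directions +x, -x, +y, -y, +z, -z.
axes = np.array([[1, 0, 0], [-1, 0, 0],
                 [0, 1, 0], [0, -1, 0],
                 [0, 0, 1], [0, 0, -1]], dtype=float)

# Two equally strong peaks at +x and +y.
estimate, probs = soft_argmax_direction(
    np.array([1.0, 0.0, 1.0, 0.0, 0.0, 0.0]), axes, beta=50.0)
print(estimate)
```

With two equal peaks, the estimate falls between the two grid directions instead of snapping to one of them, which is what allows DOA estimation to be posed as regression over a discrete map.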
Language: English
DOI: 10.1109/TASLP.2022.3224282
Year: 2023
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing 31 (2023), 313-321
ISSN: 2329-9290

Funding: info:eu-repo/grantAgreement/ES/DGA-FEDER/2014-2020
Type and form: Article (PostPrint)
Area (Department): Área Tecnología Electrónica (Dpto. Ingeniería Electrón.Com.)
Area (Department): Área Teoría Señal y Comunicac. (Dpto. Ingeniería Electrón.Com.)

Rights: All rights reserved by journal editor

Exported from SIDERAL (2023-01-11-08:50:00)

 Record created 2023-01-11, last modified 2023-01-11