Unsupervised adaptation of deep speech activity detection models to unseen domains

Gimeno, P. (Universidad de Zaragoza) ; Ribas, D. ; Ortega, A. (Universidad de Zaragoza) ; Miguel, A. (Universidad de Zaragoza) ; Lleida, E. (Universidad de Zaragoza)
Funding: H2020 Funds
Abstract: Speech Activity Detection (SAD) aims to accurately classify audio fragments containing human speech. Current state-of-the-art systems for the SAD task are mainly based on deep learning solutions. These systems usually show a significant drop in performance when test data differ from training data because of domain shift. Furthermore, machine learning algorithms require large amounts of labelled data, which may be hard to obtain in real applications. With both issues in mind, in this paper we evaluate three unsupervised domain adaptation techniques applied to the SAD task. A baseline system is trained on a combination of data from different domains and then adapted to a new unseen domain, namely, data from Apollo space missions coming from the Fearless Steps Challenge. Experimental results demonstrate that domain adaptation techniques seeking to minimise the statistical distribution shift provide the most promising results. In particular, the Deep CORAL method reports a 13% relative improvement in the original evaluation metric when compared to the unadapted baseline model. Further experiments show that the cascaded application of Deep CORAL and pseudo-labelling techniques can improve the results even further, yielding a significant 24% relative improvement in the evaluation metric when compared to the baseline system.
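The abstract refers to Deep CORAL as a technique that minimises the statistical distribution shift between domains. As a rough illustration only, and not the authors' implementation, the CORAL loss is commonly written as the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1/(4d^2). The sketch below assumes features are given as NumPy arrays of shape (samples, d) with d of at least 2; the function name `coral_loss` and the toy data are hypothetical.

```python
import numpy as np


def coral_loss(source_feats: np.ndarray, target_feats: np.ndarray) -> float:
    """CORAL loss between two feature batches of shape (n_samples, d).

    Illustrative sketch: squared Frobenius norm of the difference between
    the two feature covariance matrices, divided by 4 * d^2.
    """
    d = source_feats.shape[1]
    # rowvar=False treats each column as one feature dimension,
    # so each covariance matrix has shape (d, d).
    cov_source = np.cov(source_feats, rowvar=False)
    cov_target = np.cov(target_feats, rowvar=False)
    diff = cov_source - cov_target
    return float(np.sum(diff ** 2) / (4.0 * d * d))


# Toy usage with made-up data: the target batch has a larger variance,
# so its second-order statistics differ from those of the source batch.
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 4))
target = 3.0 * rng.normal(size=(200, 4))
loss = coral_loss(source, target)
```

In a deep model this term would typically be added to the classification loss computed on labelled source data, so that the network learns features whose covariance matches across domains without needing target labels.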
Language: English
DOI: 10.3390/app12041832
Year: 2022
Published in: Applied Sciences (Switzerland) 12, 4 (2022), 1832 [23 pp.]
ISSN: 2076-3417

JCR impact factor: 2.7 (2022)
JCR category: PHYSICS, APPLIED rank: 78 / 160 = 0.488 (2022) - Q2 - T2
JCR category: ENGINEERING, MULTIDISCIPLINARY rank: 42 / 90 = 0.467 (2022) - Q2 - T2
JCR category: CHEMISTRY, MULTIDISCIPLINARY rank: 100 / 178 = 0.562 (2022) - Q3 - T2
JCR category: MATERIALS SCIENCE, MULTIDISCIPLINARY rank: 208 / 343 = 0.606 (2022) - Q3 - T2

CiteScore impact factor: 4.5 - Engineering (Q2) - Materials Science (Q2) - Chemical Engineering (Q2) - Computer Science (Q2) - Physics and Astronomy (Q2)

SCImago impact factor: 0.492 - Fluid Flow and Transfer Processes (Q2) - Materials Science (miscellaneous) (Q2) - Engineering (miscellaneous) (Q2) - Instrumentation (Q2) - Process Chemistry and Technology (Q3) - Computer Science Applications (Q3)

Funding: info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41
Funding: info:eu-repo/grantAgreement/ES/DGA/T36-20R
Funding: info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
Funding: info:eu-repo/grantAgreement/ES/MCIN/AEI/10.13039/501100011033
Type and form: Article (final version)
Area (Department): Signal Theory and Communications Area (Dept. of Electronic Engineering and Communications)

Creative Commons: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.


Exported from SIDERAL (2024-03-18-12:52:19)



This article is in the following collections:
Articles



Record created on 2022-03-01, last modified on 2024-03-19

