Unsupervised adaptation of deep speech activity detection models to unseen domains

Gimeno, P. (Universidad de Zaragoza) ; Ribas, D. ; Ortega, A. (Universidad de Zaragoza) ; Miguel, A. (Universidad de Zaragoza) ; Lleida, E. (Universidad de Zaragoza)
H2020 Funds
Abstract: Speech Activity Detection (SAD) aims to accurately classify audio fragments containing human speech. Current state-of-the-art systems for the SAD task are mainly based on deep learning. These systems usually show a significant drop in performance when test data differ from training data, owing to the domain shift between the two. Furthermore, machine learning algorithms require large amounts of labelled data, which may be hard to obtain in real applications. Considering both issues, in this paper we evaluate three unsupervised domain adaptation techniques applied to the SAD task. A baseline system is trained on a combination of data from different domains and then adapted to a new unseen domain, namely, data from the Apollo space missions provided by the Fearless Steps Challenge. Experimental results demonstrate that domain adaptation techniques seeking to minimise the statistical distribution shift provide the most promising results. In particular, the Deep CORAL method yields a 13% relative improvement in the original evaluation metric compared with the unadapted baseline model. Further experiments show that the cascaded application of Deep CORAL and pseudo-labelling techniques improves the results further, yielding a significant 24% relative improvement in the evaluation metric compared with the baseline system.
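The abstract describes Deep CORAL as a technique that minimises the statistical distribution shift between domains. As a rough illustration only, the sketch below computes the CORAL loss as it is commonly defined: the squared Frobenius distance between the source and target feature covariance matrices, scaled by 1/(4d²). The function names and the toy feature batches are assumptions made for this example; the record does not describe the authors' implementation, which would apply such a term to network activations during training.

```python
def covariance(features):
    """Unbiased covariance matrix of a batch of feature vectors (rows = examples)."""
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in features) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def coral_loss(source, target):
    """CORAL loss: squared Frobenius distance between covariances, divided by 4*d^2."""
    d = len(source[0])
    cov_s = covariance(source)
    cov_t = covariance(target)
    sq_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return sq_frobenius / (4 * d * d)


# Hypothetical 2-dimensional feature batches standing in for source and target domains.
source_batch = [[0.0, 0.0], [2.0, 0.0]]
target_batch = [[0.0, 0.0], [0.0, 2.0]]
print(coral_loss(source_batch, target_batch))
```

A loss of zero means the two batches have identical second-order statistics; in an adaptation setting this term would typically be added to the supervised loss computed on labelled source data.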
Language: English
DOI: 10.3390/app12041832
Year: 2022
Published in: Applied Sciences (Switzerland) 12, 4 (2022), 1832 [23 pp.]
ISSN: 2076-3417

JCR impact factor: 2.7 (2022)
JCR category: PHYSICS, APPLIED rank: 77 / 159 = 0.484 (2022) - Q2 - T2
JCR category: ENGINEERING, MULTIDISCIPLINARY rank: 42 / 90 = 0.467 (2022) - Q2 - T2
JCR category: CHEMISTRY, MULTIDISCIPLINARY rank: 100 / 178 = 0.562 (2022) - Q3 - T2
JCR category: MATERIALS SCIENCE, MULTIDISCIPLINARY rank: 207 / 341 = 0.607 (2022) - Q3 - T2

CiteScore impact factor: 4.5 - Engineering (Q2) - Materials Science (Q2) - Chemical Engineering (Q2) - Computer Science (Q2) - Physics and Astronomy (Q2)

SCImago impact factor: 0.492 - Fluid Flow and Transfer Processes (Q2) - Materials Science (miscellaneous) (Q2) - Engineering (miscellaneous) (Q2) - Instrumentation (Q2) - Process Chemistry and Technology (Q3) - Computer Science Applications (Q3)

Funding: info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41
Funding: info:eu-repo/grantAgreement/ES/DGA/T36-20R
Funding: info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
Funding: info:eu-repo/grantAgreement/ES/MCIN/AEI/10.13039/501100011033
Type and form: Article (Published version)
Area (Department): Signal Theory and Communications Area (Dept. of Electronic Engineering and Communications)

Creative Commons You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Exported from SIDERAL (2023-09-13-11:13:39)

 Record created 2022-03-01, last modified 2023-09-14
