The Domain Mismatch Problem in the Broadcast Speaker Attribution Task

Viñals, Ignacio (Universidad de Zaragoza) ; Ortega, Alfonso (Universidad de Zaragoza) ; Miguel, Antonio (Universidad de Zaragoza) ; Lleida, Eduardo (Universidad de Zaragoza)
H2020 Funds
Abstract: The demand for high-quality metadata for the available multimedia content requires the development of new techniques able to correctly identify more and more information, including speaker information. The task known as speaker attribution aims at identifying all or part of the speakers in the audio under analysis. In this work, we carry out a study of the speaker attribution problem in the broadcast domain. Through our experiments, we illustrate the positive impact of diarization on the final performance. Additionally, we show the influence of the variability present in broadcast data, depicting the broadcast domain as a collection of subdomains with particular characteristics. Taking these two factors into account, we also propose alternative approaches that are robust against domain mismatch. These approaches include a semisupervised alternative as well as a new, totally unsupervised hybrid solution fusing diarization and speaker assignment. Thanks to these two approaches, performance improves by around 50% in relative terms. The analysis has been carried out using the corpus for the Albayzín 2020 challenge, a diarization and speaker attribution evaluation working with broadcast data. These data, provided by Radio Televisión Española (RTVE), the Spanish public radio and TV corporation, include multiple shows and genres to analyze the impact of new speech technologies in real-world scenarios.
Language: English
DOI: 10.3390/app11188521
Year: 2021
Published in: Applied Sciences (Switzerland) 11, 18 (2021), 8521 [19 p.]
ISSN: 2076-3417

JCR impact factor: 2.838 (2021)
JCR category: ENGINEERING, MULTIDISCIPLINARY rank: 39 / 92 = 0.424 (2021) - Q2 - T2
JCR category: PHYSICS, APPLIED rank: 76 / 161 = 0.472 (2021) - Q2 - T2
JCR category: MATERIALS SCIENCE, MULTIDISCIPLINARY rank: 218 / 345 = 0.632 (2021) - Q3 - T2
JCR category: CHEMISTRY, MULTIDISCIPLINARY rank: 100 / 180 = 0.556 (2021) - Q3 - T2

CiteScore impact factor: 3.7 - Engineering (Q2) - Materials Science (Q2) - Chemical Engineering (Q2) - Computer Science (Q2) - Physics and Astronomy (Q2)

SCImago impact factor: 0.507 - Engineering (miscellaneous) (Q2) - Computer Science Applications (Q2) - Process Chemistry and Technology (Q2) - Materials Science (miscellaneous) (Q2) - Fluid Flow and Transfer Processes (Q2)

Funding: info:eu-repo/grantAgreement/ES/DGA/T36-20R
Funding: info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
Funding: info:eu-repo/grantAgreement/ES/MINECO/TIN2017-85854-C4-1-R
Type and form: Article (Published version)
Area (Department): Signal Theory and Communications Area (Dept. of Electronic Engineering and Communications)

Creative Commons You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.


Exported from SIDERAL (2023-05-18-15:01:16)



This article belongs to the following collections:
Articles > Articles by area > Signal Theory and Communications



 Record created 2021-11-15, last modified 2023-05-19

