Multimodal Diarization Systems by Training Enrollment Models as Identity Representations

Mingote, Victoria (Universidad de Zaragoza) ; Viñals, Ignacio (Universidad de Zaragoza) ; Gimeno, Pablo (Universidad de Zaragoza) ; Miguel, Antonio (Universidad de Zaragoza) ; Ortega, Alfonso (Universidad de Zaragoza) ; Lleida, Eduardo (Universidad de Zaragoza)
Funding: H2020 Funds
Abstract: This paper describes a post-evaluation analysis of the system developed by the ViVoLAB research group for the IberSPEECH-RTVE 2020 Multimodal Diarization (MD) Challenge. The challenge focuses on multimodal systems that diarize audiovisual files and assign an identity to each segment in which a person is detected. In this work, we implemented two subsystems that address the task using the audio and the video of each file separately. Both subsystems rely on state-of-the-art speaker and face verification embeddings extracted from publicly available deep neural networks (DNN). Different clustering techniques were also employed in combination with the tracking and identity assignment process. Furthermore, the face verification subsystem includes a novel back-end approach that trains an enrollment model for each identity, which we have previously shown to improve results compared with averaging the enrollment data. With this approach, we trained a learnable vector to represent each enrollment character. The loss function used to train this vector was an approximated version of the detection cost function (aDCF), inspired by the DCF, a metric widely used to measure performance in verification tasks. In this paper, we also explore and analyze the effect of training this vector with several configurations of this loss function. This analysis allows us to assess how the configuration parameters of the loss affect the number and type of errors produced by the system.
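The abstract does not spell out the aDCF formula, so the following is only a minimal sketch of one way such a loss could look, assuming the hard miss and false-alarm counts of the DCF are replaced by a sigmoid so that the cost becomes differentiable. The names `gamma`, `beta`, `alpha`, and `threshold`, their default values, and the use of cosine similarity as the score are illustrative assumptions, not details taken from this record.

```python
import math


def smooth_step(x, alpha=10.0):
    """Sigmoid of alpha * x; approaches a hard 0/1 count as alpha grows."""
    z = alpha * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative inputs
    return e / (1.0 + e)


def cosine(u, v):
    """Cosine similarity between an enrollment vector and a test embedding."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def approx_dcf(scores, is_target, threshold=0.5, gamma=0.5, beta=0.5, alpha=10.0):
    """Weighted sum of smoothed miss and false-alarm rates.

    gamma weights misses and beta weights false alarms (hypothetical names).
    Both the target and the non-target set must be non-empty.
    """
    target = [s for s, t in zip(scores, is_target) if t]
    nontarget = [s for s, t in zip(scores, is_target) if not t]
    # A target scoring below the threshold contributes close to 1 (a miss).
    p_miss = sum(smooth_step(threshold - s, alpha) for s in target) / len(target)
    # A non-target scoring above the threshold contributes close to 1 (a false alarm).
    p_fa = sum(smooth_step(s - threshold, alpha) for s in nontarget) / len(nontarget)
    return gamma * p_miss + beta * p_fa


# Toy usage: one enrollment vector scored against four made-up face embeddings.
enrollment = [1.0, 0.0]
embeddings = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.0, 1.0]]
labels = [True, True, False, False]
scores = [cosine(enrollment, e) for e in embeddings]
loss = approx_dcf(scores, labels)
```

In a sketch of this kind, raising `gamma` relative to `beta` penalizes misses more heavily than false alarms, which is one way the configuration of the loss could shift the type of errors the trained enrollment vector produces.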
Language: English
DOI: 10.3390/app12031141
Year: 2022
Published in: Applied Sciences (Switzerland) 12, 3 (2022), 1141 [15 pp]
ISSN: 2076-3417

JCR impact factor: 2.7 (2022)
JCR category: PHYSICS, APPLIED rank: 78 / 160 = 0.488 (2022) - Q2 - T2
JCR category: ENGINEERING, MULTIDISCIPLINARY rank: 42 / 90 = 0.467 (2022) - Q2 - T2
JCR category: CHEMISTRY, MULTIDISCIPLINARY rank: 100 / 178 = 0.562 (2022) - Q3 - T2
JCR category: MATERIALS SCIENCE, MULTIDISCIPLINARY rank: 208 / 343 = 0.606 (2022) - Q3 - T2

CiteScore impact factor: 4.5 - Engineering (Q2) - Materials Science (Q2) - Chemical Engineering (Q2) - Computer Science (Q2) - Physics and Astronomy (Q2)

SCImago impact factor: 0.492 - Fluid Flow and Transfer Processes (Q2) - Materials Science (miscellaneous) (Q2) - Engineering (miscellaneous) (Q2) - Instrumentation (Q2) - Process Chemistry and Technology (Q3) - Computer Science Applications (Q3)

Funding: info:eu-repo/grantAgreement/ES/DGA/T36-20R
Funding: info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
Funding: info:eu-repo/grantAgreement/ES/MCIN/AEI/10.13039/501100011033
Funding: info:eu-repo/grantAgreement/ES/MINECO/PRE2018-083312
Type and form: Article (final version)
Area (Department): Área Teoría Señal y Comunicac. (Dpto. Ingeniería Electrón.Com.)

Creative Commons: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.


Exported from SIDERAL (2024-03-18-12:51:50)



This article belongs to the following collections:
Articles



 Record created on 2022-03-01, last modified on 2024-03-19

