000110867 001__ 110867
000110867 005__ 20240319080949.0
000110867 0247_ $$2doi$$a10.3390/app12031141
000110867 0248_ $$2sideral$$a127694
000110867 037__ $$aART-2022-127694
000110867 041__ $$aeng
000110867 100__ $$0(orcid)0000-0002-3505-0249$$aMingote, Victoria$$uUniversidad de Zaragoza
000110867 245__ $$aMultimodal Diarization Systems by Training Enrollment Models as Identity Representations
000110867 260__ $$c2022
000110867 5060_ $$aAccess copy available to the general public$$fUnrestricted
000110867 5203_ $$aThis paper describes a post-evaluation analysis of the system developed by the ViVoLAB research group for the IberSPEECH-RTVE 2020 Multimodal Diarization (MD) Challenge. This challenge focuses on the study of multimodal systems for the diarization of audiovisual files and the assignment of an identity to each segment where a person is detected. In this work, we implemented two different subsystems to address this task using the audio and the video from audiovisual files separately. To develop our subsystems, we used state-of-the-art speaker and face verification embeddings extracted from publicly available deep neural networks (DNNs). Different clustering techniques were also employed in combination with the tracking and identity assignment process. Furthermore, we included a novel back-end approach in the face verification subsystem to train an enrollment model for each identity, which we have previously shown to improve the results compared to averaging the enrollment data. Using this approach, we trained a learnable vector to represent each enrollment character. The loss function employed to train this vector was an approximated version of the detection cost function (aDCF), which is inspired by the DCF, a metric widely used to measure performance in verification tasks. In this paper, we also focused on exploring and analyzing the effect of training this vector with several configurations of this objective loss function. This analysis allows us to assess the impact of the configuration parameters of the loss on the number and type of errors produced by the system.
000110867 536__ $$9info:eu-repo/grantAgreement/ES/DGA/T36-20R$$9info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 101007666-ESPERANTO$$9info:eu-repo/grantAgreement/ES/MCIN/AEI/10.13039/501100011033$$9info:eu-repo/grantAgreement/ES/MINECO/PRE2018-083312
000110867 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttp://creativecommons.org/licenses/by/3.0/es/
000110867 590__ $$a2.7$$b2022
000110867 592__ $$a0.492$$b2022
000110867 591__ $$aPHYSICS, APPLIED$$b78 / 160 = 0.488$$c2022$$dQ2$$eT2
000110867 591__ $$aENGINEERING, MULTIDISCIPLINARY$$b42 / 90 = 0.467$$c2022$$dQ2$$eT2
000110867 591__ $$aCHEMISTRY, MULTIDISCIPLINARY$$b100 / 178 = 0.562$$c2022$$dQ3$$eT2
000110867 591__ $$aMATERIALS SCIENCE, MULTIDISCIPLINARY$$b208 / 343 = 0.606$$c2022$$dQ3$$eT2
000110867 593__ $$aFluid Flow and Transfer Processes$$c2022$$dQ2
000110867 593__ $$aMaterials Science (miscellaneous)$$c2022$$dQ2
000110867 593__ $$aEngineering (miscellaneous)$$c2022$$dQ2
000110867 593__ $$aInstrumentation$$c2022$$dQ2
000110867 593__ $$aProcess Chemistry and Technology$$c2022$$dQ3
000110867 593__ $$aComputer Science Applications$$c2022$$dQ3
000110867 594__ $$a4.5$$b2022
000110867 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000110867 700__ $$0(orcid)0000-0001-9137-4013$$aViñals, Ignacio$$uUniversidad de Zaragoza
000110867 700__ $$0(orcid)0000-0002-3142-0708$$aGimeno, Pablo$$uUniversidad de Zaragoza
000110867 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, Antonio$$uUniversidad de Zaragoza
000110867 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, Alfonso$$uUniversidad de Zaragoza
000110867 700__ $$0(orcid)0000-0003-1772-0605$$aLleida, Eduardo$$uUniversidad de Zaragoza
000110867 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000110867 773__ $$g12, 3 (2022), 1141 [15 pp]$$pAppl. sci.$$tApplied Sciences (Switzerland)$$x2076-3417
000110867 8564_ $$s572169$$uhttps://zaguan.unizar.es/record/110867/files/texto_completo.pdf$$yPublished version
000110867 8564_ $$s2799278$$uhttps://zaguan.unizar.es/record/110867/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000110867 909CO $$ooai:zaguan.unizar.es:110867$$particulos$$pdriver
000110867 951__ $$a2024-03-18-12:51:50
000110867 980__ $$aARTICLE