aDCF loss function for deep metric learning in end-to-end text-dependent speaker verification systems

Mingote, V.; Ribas, D.; Ortega, A.; Miguel, A.; Lleida, E.
doi:10.1109/TASLP.2022.3145307
000110866 001__ 110866
000110866 005__ 20240319080948.0
000110866 0247_ $$2doi$$a10.1109/TASLP.2022.3145307
000110866 0248_ $$2sideral$$a127692
000110866 037__ $$aART-2022-127692
000110866 041__ $$aeng
000110866 100__ $$0(orcid)0000-0002-3505-0249$$aMingote, V.$$uUniversidad de Zaragoza
000110866 245__ $$aaDCF loss function for deep metric learning in end-to-end text-dependent speaker verification systems
000110866 260__ $$c2022
000110866 5060_ $$aAccess copy available to the general public$$fUnrestricted
000110866 5203_ $$aMetric learning approaches have been widely extended to the training of Speaker Verification (SV) systems based on Deep Neural Networks (DNNs), by using a loss function more consistent with the evaluation process than the traditional identification losses. However, these methods do not consider the performance measure and can involve a high computational cost, for example because of the need for careful pair or triplet data selection. This paper proposes the approximated Detection Cost Function (aDCF) loss, which is a loss function based on the measure of the decision errors in SV systems, namely the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). With aDCF loss as the training objective function, the end-to-end system learns how to minimize decision errors. Furthermore, we replace the typical linear layer used as the last layer of the DNN with a cosine distance layer, which reduces the difference between the metric in the training process and the metric during evaluation. The aDCF loss function was evaluated on the RSR2015-Part I and RSR2015-Part II datasets for text-dependent speaker verification. The system trained with aDCF loss outperforms all the state-of-the-art loss functions employed in this paper in both parts of the database.
000110866 536__ $$9info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41$$9info:eu-repo/grantAgreement/ES/DGA/T36-20R$$9info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 101007666-ESPERANTO$$9info:eu-repo/grantAgreement/ES/MCIN/AEI/10.13039/501100011033$$9info:eu-repo/grantAgreement/ES/MINECO/PRE2018-083312
000110866 540__ $$9info:eu-repo/semantics/openAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000110866 590__ $$a5.4$$b2022
000110866 592__ $$a1.348$$b2022
000110866 591__ $$aENGINEERING, ELECTRICAL & ELECTRONIC$$b61 / 274 = 0.223$$c2022$$dQ1$$eT1
000110866 591__ $$aACOUSTICS$$b3 / 31 = 0.097$$c2022$$dQ1$$eT1
000110866 593__ $$aAcoustics and Ultrasonics$$c2022$$dQ1
000110866 593__ $$aComputational Mathematics$$c2022$$dQ1
000110866 593__ $$aComputer Science (miscellaneous)$$c2022$$dQ1
000110866 593__ $$aSpeech and Hearing$$c2022$$dQ1
000110866 593__ $$aInstrumentation$$c2022$$dQ1
000110866 593__ $$aMedia Technology$$c2022$$dQ1
000110866 593__ $$aSignal Processing$$c2022$$dQ1
000110866 593__ $$aElectrical and Electronic Engineering$$c2022$$dQ1
000110866 594__ $$a10.1$$b2022
000110866 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000110866 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, A.$$uUniversidad de Zaragoza
000110866 700__ $$0(orcid)0000-0003-3813-4998$$aRibas, D.
000110866 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, A.$$uUniversidad de Zaragoza
000110866 700__ $$0(orcid)0000-0001-9137-4013$$aLleida, E.$$uUniversidad de Zaragoza
000110866 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000110866 773__ $$g30 (2022), 772-784$$pIEEE/ACM trans. audio speech lang. process.$$tIEEE/ACM Transactions on Audio, Speech, and Language Processing$$x2329-9290
000110866 8564_ $$s3005730$$uhttps://zaguan.unizar.es/record/110866/files/texto_completo.pdf$$yPostprint
000110866 8564_ $$s2662685$$uhttps://zaguan.unizar.es/record/110866/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000110866 909CO $$ooai:zaguan.unizar.es:110866$$particulos$$pdriver
000110866 951__ $$a2024-03-18-12:48:13
000110866 980__ $$aARTICLE