110866 20240319080948.0
doi 10.1109/TASLP.2022.3145307
sideral 127692 ART-2022-127692
eng
Mingote, V. Universidad de Zaragoza (orcid)0000-0002-3505-0249
aDCF loss function for deep metric learning in end-to-end text-dependent speaker verification systems
2022
Access copy available to the general public Unrestricted
Metric learning approaches have been widely adopted for training Speaker Verification (SV) systems based on Deep Neural Networks (DNNs), because they use a loss function that is more consistent with the evaluation process than the traditional identification losses. However, these methods do not take the performance measure into account and can incur a high computational cost, for example through the need for careful selection of pairs or triplets. This paper proposes the approximated Detection Cost Function (aDCF) loss, a loss function based on the measures of decision errors in SV systems, namely the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). With the aDCF loss as the training objective, the end-to-end system learns to minimize decision errors. Furthermore, we replace the typical linear layer at the end of the DNN with a cosine distance layer, which reduces the difference between the metric used in training and the metric used during evaluation. The aDCF loss function was evaluated on the RSR2015-Part I and RSR2015-Part II datasets for text-dependent speaker verification. The system trained with the aDCF loss outperforms all the state-of-the-art loss functions compared in this paper on both parts of the database.
info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41
info:eu-repo/grantAgreement/ES/DGA/T36-20R
info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 101007666-ESPERANTO
info:eu-repo/grantAgreement/ES/MCIN/AEI/10.13039/501100011033
info:eu-repo/grantAgreement/ES/MINECO/PRE2018-083312
info:eu-repo/semantics/openAccess All rights reserved http://www.europeana.eu/rights/rr-f/
5.4 2022
ENGINEERING, ELECTRICAL & ELECTRONIC 61 / 274 = 0.223 2022 Q1 T1
ACOUSTICS 3 / 31 = 0.097 2022 Q1 T1
1.348 2022
Acoustics and Ultrasonics 2022 Q1
Computational Mathematics 2022 Q1
Computer Science (miscellaneous) 2022 Q1
Speech and Hearing 2022 Q1
Instrumentation 2022 Q1
Media Technology 2022 Q1
Signal Processing 2022 Q1
Electrical and Electronic Engineering 2022 Q1
10.1 2022
info:eu-repo/semantics/article info:eu-repo/semantics/acceptedVersion
Miguel, A. Universidad de Zaragoza (orcid)0000-0001-5803-4316
Ribas, D. (orcid)0000-0003-3813-4998
Ortega, A. Universidad de Zaragoza (orcid)0000-0002-3886-7748
Lleida, E. Universidad de Zaragoza (orcid)0000-0001-9137-4013
5008 800 Universidad de Zaragoza Dpto. Ingeniería Electrón.Com. Área Teoría Señal y Comunicac.
30 (2022), 772-784 IEEE/ACM trans. audio speech lang. process. IEEE/ACM Transactions on Audio, Speech, and Language Processing 2329-9290
3005730 http://zaguan.unizar.es/record/110866/files/texto_completo.pdf Postprint
2662685 http://zaguan.unizar.es/record/110866/files/texto_completo.jpg?subformat=icon icon Postprint
oai:zaguan.unizar.es:110866 articulos driver
2024-03-18-12:48:13
ARTICLE