aDCF loss function for deep metric learning in end-to-end text-dependent speaker verification systems

Mingote, V.; Ribas, D.; Ortega, A.; Miguel, A.; Lleida, E.

doi:10.1109/TASLP.2022.3145307

aDCF loss function for deep metric learning in end-to-end text-dependent speaker verification systems

Mingote, V. (Universidad de Zaragoza) ; Miguel, A. (Universidad de Zaragoza) ; Ribas, D. ; Ortega, A. (Universidad de Zaragoza) ; Lleida, E. (Universidad de Zaragoza)

Resumen: Metric learning approaches have widely expanded to the training of Speaker Verification (SV) systems based on Deep Neural Networks (DNNs), by using a loss function more consistent with the evaluation process than the traditional identification losses. However, these methods do not consider the performance measure and can involve high computational cost, for example, the need for a careful pair or triplet data selection. This paper proposes the approximated Detection Cost Function (aDCF) loss, which is a loss function based on the measure of the decision errors in SV systems, namely the False Rejection Rate (FRR) and the False Acceptance Rate (FAR). With aDCF loss as the training objective function, the end-to-end system learns how to minimize decision errors. Furthermore, we replace the typical linear layer as the last layer of DNN by a cosine distance layer, which reduces the difference between the metric in the training process and the metric during evaluation. aDCF loss function was evaluated in RSR2015-Part I and RSR2015-Part II datasets for text-dependent speaker verification. The system trained with aDCF loss outperforms all the state-of-the-art functions employed in this paper in both parts of the database.
Idioma: Inglés
DOI: 10.1109/TASLP.2022.3145307
Año: 2022
Publicado en: IEEE/ACM Transactions on Audio, Speech, and Language Processing 30 (2022), 772-784
ISSN: 2329-9290
Factor impacto JCR: 5.4 (2022)
Categ. JCR: ENGINEERING, ELECTRICAL & ELECTRONIC rank: 61 / 274 = 0.223 (2022) - Q1 - T1
Categ. JCR: ACOUSTICS rank: 3 / 31 = 0.097 (2022) - Q1 - T1
Factor impacto CITESCORE: 10.1 - Engineering (Q1) - Mathematics (Q1) - Computer Science (Q1) - Physics and Astronomy (Q1)

Factor impacto SCIMAGO: 1.348 - Acoustics and Ultrasonics (Q1) - Computational Mathematics (Q1) - Computer Science (miscellaneous) (Q1) - Speech and Hearing (Q1) - Instrumentation (Q1) - Media Technology (Q1) - Signal Processing (Q1) - Electrical and Electronic Engineering (Q1)

Financiación: info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41
Financiación: info:eu-repo/grantAgreement/ES/DGA/T36-20R
Financiación: info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
Financiación: info:eu-repo/grantAgreement/ES/MCIN/AEI/10.13039/501100011033
Financiación: info:eu-repo/grantAgreement/ES/MINECO/PRE2018-083312
Tipo y forma: Article (PostPrint)
Área (Departamento): Área Teoría Señal y Comunicac. (Dpto. Ingeniería Electrón.Com.)

Permalink:

Visitas y descargas

Este artículo se encuentra en las siguientes colecciones:
Articles > Artículos por área > Teoría de la Señal y Comunicaciones

Back to search

Record created 2022-03-01, last modified 2024-03-19

Postprint:
PDF

Rate this document:

(Not yet reviewed)

Add to personal basket
Export as BibTeX, MARC, MARCXML, DC, EndNote, NLM, RefWorks

Universidad de Zaragoza Repository

aDCF loss function for deep metric learning in end-to-end text-dependent speaker verification systems