000148265 001__ 148265
000148265 005__ 20250115160155.0
000148265 0247_ $$2doi$$a10.1016/j.csl.2020.101078
000148265 0248_ $$2sideral$$a117139
000148265 037__ $$aART-2020-117139
000148265 041__ $$aeng
000148265 100__ $$0(orcid)0000-0002-3505-0249$$aMingote, V.$$uUniversidad de Zaragoza
000148265 245__ $$aOptimization of the area under the ROC curve using neural network supervectors for text-dependent speaker verification
000148265 260__ $$c2020
000148265 5060_ $$aAccess copy available to the general public$$fUnrestricted
000148265 5203_ $$aThis paper explores two techniques to improve the performance of text-dependent speaker verification systems based on deep neural networks. First, we propose a general alignment mechanism that keeps the temporal structure of each phrase and obtains a supervector with the speaker and phrase information, since both are relevant for text-dependent verification. As we show, different alignment techniques can replace the global average pooling, providing significant gains in performance. Moreover, we also present a novel Back-end approach that trains a neural network for detection tasks by optimizing the Area Under the Curve (AUC) as an alternative to the usual triplet loss function, so the system is end-to-end, with a cost function close to our desired measure of performance. As the experimental section shows, this approach improves system performance, since our triplet neural network based on an approximation of the AUC (aAUC) learns to discriminate between pairs of examples from the same identity and pairs of different identities. The different alignment techniques to produce supervectors, in addition to the new Back-end approach, were tested on the RSR2015-Part I and RSR2015-Part II databases for text-dependent speaker verification, providing competitive results compared to similar-size networks using global average pooling to extract supervectors and a simple Back-end or triplet loss training.
000148265 536__ $$9info:eu-repo/grantAgreement/ES/DGA-FEDER/T36-17R$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2017-85854-C4-1-R
000148265 540__ $$9info:eu-repo/semantics/openAccess$$aby-nc-nd$$uhttp://creativecommons.org/licenses/by-nc-nd/3.0/es/
000148265 590__ $$a1.899$$b2020
000148265 591__ $$aCOMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE$$b95 / 139 = 0.683$$c2020$$dQ3$$eT3
000148265 592__ $$a0.452$$b2020
000148265 593__ $$aSoftware$$c2020$$dQ2
000148265 593__ $$aHuman-Computer Interaction$$c2020$$dQ2
000148265 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000148265 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, A.$$uUniversidad de Zaragoza
000148265 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, A.$$uUniversidad de Zaragoza
000148265 700__ $$0(orcid)0000-0001-9137-4013$$aLleida, E.$$uUniversidad de Zaragoza
000148265 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000148265 773__ $$g63 (2020), 101078 [16 pp.]$$pComput. speech lang.$$tCOMPUTER SPEECH AND LANGUAGE$$x0885-2308
000148265 8564_ $$s558778$$uhttps://zaguan.unizar.es/record/148265/files/texto_completo.pdf$$yPostprint
000148265 8564_ $$s2455349$$uhttps://zaguan.unizar.es/record/148265/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000148265 909CO $$ooai:zaguan.unizar.es:148265$$particulos$$pdriver
000148265 951__ $$a2025-01-15-15:06:03
000148265 980__ $$aARTICLE