000121000 001__ 121000
000121000 005__ 20241125101127.0
000121000 0247_ $$2doi$$a10.1016/j.dsp.2022.103859
000121000 0248_ $$2sideral$$a131066
000121000 037__ $$aART-2023-131066
000121000 041__ $$aeng
000121000 100__ $$0(orcid)0000-0002-3505-0249$$aMingote, Victoria$$uUniversidad de Zaragoza
000121000 245__ $$aClass token and knowledge distillation for multi-head self-attention speaker verification systems
000121000 260__ $$c2023
000121000 5060_ $$aAccess copy available to the general public$$fUnrestricted
000121000 5203_ $$aThis paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. First, we propose using a learnable vector, called the class token, to replace the global average pooling mechanism for extracting the embeddings. Unlike global average pooling, our proposal takes into account the temporal structure of the input, which is relevant for the text-dependent SV task. The class token is concatenated to the input before the first MSA layer, and its state at the output is used to predict the classes. To gain additional robustness, we introduce two further approaches. First, we have developed a new sampling estimation of the class token, in which the class token is obtained by sampling from a list of several trainable vectors. This strategy introduces uncertainty that helps the model generalize better than a single initialization, as shown in the experiments. Second, we have added a distilled representation token for training a teacher-student pair of networks using the Knowledge Distillation (KD) philosophy, which is combined with the class token. This distillation token is trained to mimic the predictions from the teacher network, while the class token replicates the true label. All the strategies have been tested on the RSR2015-Part II and DeepMine-Part 1 databases for text-dependent SV, providing competitive results compared to the same architecture using the average pooling mechanism to extract average embeddings.
000121000 536__ $$9info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41$$9info:eu-repo/grantAgreement/ES/DGA/T36-20R$$9info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 101007666-ESPERANTO$$9info:eu-repo/grantAgreement/ES/MINECO/PRE2018-083312
000121000 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttp://creativecommons.org/licenses/by/3.0/es/
000121000 590__ $$a2.9$$b2023
000121000 592__ $$a0.799$$b2023
000121000 591__ $$aENGINEERING, ELECTRICAL & ELECTRONIC$$b143 / 353 = 0.405$$c2023$$dQ2$$eT2
000121000 593__ $$aApplied Mathematics$$c2023$$dQ2
000121000 593__ $$aArtificial Intelligence$$c2023$$dQ2
000121000 593__ $$aElectrical and Electronic Engineering$$c2023$$dQ2
000121000 593__ $$aStatistics, Probability and Uncertainty$$c2023$$dQ2
000121000 593__ $$aComputer Vision and Pattern Recognition$$c2023$$dQ2
000121000 593__ $$aSignal Processing$$c2023$$dQ2
000121000 593__ $$aComputational Theory and Mathematics$$c2023$$dQ2
000121000 594__ $$a5.3$$b2023
000121000 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000121000 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, Antonio$$uUniversidad de Zaragoza
000121000 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, Alfonso$$uUniversidad de Zaragoza
000121000 700__ $$0(orcid)0000-0001-9137-4013$$aLleida, Eduardo$$uUniversidad de Zaragoza
000121000 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000121000 773__ $$g133 (2023), 103859 [10 pp.]$$pDigit. signal process.$$tDIGITAL SIGNAL PROCESSING$$x1051-2004
000121000 8564_ $$s1800661$$uhttps://zaguan.unizar.es/record/121000/files/texto_completo.pdf$$yPublished version
000121000 8564_ $$s2732520$$uhttps://zaguan.unizar.es/record/121000/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000121000 909CO $$ooai:zaguan.unizar.es:121000$$particulos$$pdriver
000121000 951__ $$a2024-11-22-11:58:05
000121000 980__ $$aARTICLE