000162445 001__ 162445
000162445 005__ 20251017144634.0
000162445 0247_ $$2doi$$a10.1109/ICASSP39728.2021.9414859
000162445 0248_ $$2sideral$$a129899
000162445 037__ $$aART-2021-129899
000162445 041__ $$aeng
000162445 100__ $$0(orcid)0000-0002-3505-0249$$aMingote, Victoria$$uUniversidad de Zaragoza
000162445 245__ $$aMemory Layers with Multi-Head Attention Mechanisms for Text-Dependent Speaker Verification
000162445 260__ $$c2021
000162445 5060_ $$aAccess copy available to the general public$$fUnrestricted
000162445 5203_ $$aIn this paper, we explore an approach based on memory layers and multi-head attention mechanisms to efficiently improve the performance of text-dependent speaker verification (SV) systems. The most widespread SV systems based on Deep Neural Networks (DNN) extract the utterance embedding by average pooling over the temporal dimension after processing. Unlike previous works, we can exploit the phonetic knowledge needed for text-dependent SV systems by combining the temporal attention of multiple parallel heads with the phonetic embeddings extracted from a phonetic classification network; these phonetic embeddings help guide the attention mechanism by playing the role of the positional embedding. The addition of a memory layer to a text-dependent SV system was tested on the RSR2015-part II and DeepMine-part I databases, where in both cases it outperformed the baseline result and the reference system based on the same transformer network without the memory layer.
000162445 536__ $$9info:eu-repo/grantAgreement/ES/DGA/T36-20R$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2017-85854-C4-1-R
000162445 540__ $$9info:eu-repo/semantics/openAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000162445 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000162445 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, Antonio$$uUniversidad de Zaragoza
000162445 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, Alfonso$$uUniversidad de Zaragoza
000162445 700__ $$0(orcid)0000-0001-9137-4013$$aLleida, Eduardo$$uUniversidad de Zaragoza
000162445 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000162445 773__ $$g2021 (2021), 6154-6158$$tProceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing$$x0736-7791
000162445 8564_ $$s241617$$uhttps://zaguan.unizar.es/record/162445/files/texto_completo.pdf$$yPostprint
000162445 8564_ $$s2968788$$uhttps://zaguan.unizar.es/record/162445/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000162445 909CO $$ooai:zaguan.unizar.es:162445$$particulos$$pdriver
000162445 951__ $$a2025-10-17-14:28:13
000162445 980__ $$aARTICLE