Shouted and whispered speech compensation for speaker verification systems
H2020 Funds
Abstract: Nowadays, speaker verification systems perform very well under normal speech conditions thanks to the plethora of neutrally-phonated speech data available for training such systems. Nevertheless, the use of vocal effort modes other than normal severely degrades performance because of vocal effort mismatch. In this paper, in which we consider whispered, normal and shouted speech production modes, we first study how vocal effort mismatch negatively affects speaker verification performance. Then, in order to mitigate this issue, we describe a series of techniques for score calibration and speaker embedding compensation relying on logistic regression-based vocal effort mode detection. To test the validity of these methodologies, speaker verification experiments using a modern x-vector-based speaker verification system are carried out. Experimental results show that, when combining score calibration and embedding compensation relying upon vocal effort mode detection, we can achieve up to 19% and 52% equal error rate (EER) relative improvements under the shouted-normal and whispered-normal scenarios, respectively, in comparison with a system applying neither calibration nor compensation. Compared to our previous work [1], we obtain a 7.3% relative improvement in terms of EER when adding score calibration in the shouted-normal All vs. All condition. © 2022 Elsevier Inc.
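The figures quoted in the abstract are relative EER improvements. As a minimal sketch of how such a figure is usually obtained (not the authors' evaluation code), the Python snippet below approximates the EER by sweeping a decision threshold over the trial scores and then compares two EER values. The function names `equal_error_rate` and `relative_improvement` and all score values are hypothetical and chosen only for illustration; none of the numbers come from the paper.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumption: higher scores mean "more likely the same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection rate: fraction of target trials scored below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of non-target trials at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # The EER is taken where the two error rates are closest to each other.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def relative_improvement(baseline_eer, new_eer):
    """Relative EER reduction, in percent, with respect to the baseline."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Toy scores (made up for illustration only).
toy_eer = equal_error_rate([2.0, 3.0, 4.0, 0.5], [0.0, 1.0, -1.0, 2.5])
# Hypothetical baseline and compensated EERs, in percent.
toy_gain = relative_improvement(10.0, 8.1)
```

With this definition, a drop from a hypothetical baseline EER of 10.0% to 8.1% corresponds to a relative improvement of about 19%, which is the sense in which the abstract's percentages should be read.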
Language: English
DOI: 10.1016/j.dsp.2022.103536
Year: 2022
Published in: DIGITAL SIGNAL PROCESSING 127 (2022), 103536 [13 pp.]
ISSN: 1051-2004

JCR impact factor: 2.9 (2022)
JCR category: ENGINEERING, ELECTRICAL & ELECTRONIC, rank: 131 / 274 = 0.478 (2022) - Q2 - T2
CiteScore impact factor: 4.5 - Engineering (Q2) - Mathematics (Q1) - Decision Sciences (Q2) - Computer Science (Q2)

SCImago impact factor: 0.776 - Applied Mathematics (Q2) - Artificial Intelligence (Q2) - Computational Theory and Mathematics (Q2) - Statistics, Probability and Uncertainty (Q2) - Electrical and Electronic Engineering (Q2) - Signal Processing (Q2) - Computer Vision and Pattern Recognition (Q2)

Funding: info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41
Funding: info:eu-repo/grantAgreement/ES/DGA/T36-20R
Funding: info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
Funding: info:eu-repo/grantAgreement/ES/MICINN-AEI/10.13039/501100011033
Type and form: Article (PostPrint)
Area (Department): Área Teoría Señal y Comunicac. (Dpto. Ingeniería Electrón.Com.)

Creative Commons You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes. If you remix, transform, or build upon the material, you may not distribute the modified material.


Exported from SIDERAL (2025-01-15-15:06:16)



This article belongs to the following collections:
Articles > Articles by area > Signal Theory and Communications



 Record created 2025-01-15, last modified 2025-01-15

