Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems

Estevez, Mariel; Ferrer, Luciana; Ortega, Alfonso; Bonomi, Cyntia; Ribas, Dayana

doi:10.1016/j.jvoice.2025.11.037


Estevez, Mariel (Universidad de Zaragoza); Bonomi, Cyntia; Ribas, Dayana (Universidad de Zaragoza); Ortega, Alfonso (Universidad de Zaragoza); Ferrer, Luciana

Abstract: We investigate and quantify demographic-dependent biases in automatic voice disorder detection (AVDD) systems by analyzing performance disparities across speaker groups, and we evaluate group-specific calibration strategies for improving reliability. We conducted a comprehensive analysis of an AVDD system using existing voice disorder datasets with available demographic metadata, examining system performance across demographic groups with a particular focus on gender- and age-based cohorts. Performance was evaluated with multiple metrics, including normalized costs and cross-entropy. To address group-dependent miscalibration, we trained calibration models separately on predefined demographic groups. The analysis revealed significant performance disparities across demographic groups despite strong global metrics. The system showed systematic biases, misclassifying healthy speakers over 55 as having a voice disorder and speakers with disorders aged 14–30 as healthy. Group-specific calibration improved the quality of the posterior probabilities, reducing overconfidence. For young disordered speakers, low severity scores were identified as a contributor to poor system performance. For older speakers, age-related voice characteristics and potential limitations of the pretrained HuBERT model used as a feature extractor likely affected the results. The study demonstrates that global performance metrics are insufficient for evaluating AVDD systems. Group-specific analysis can unmask fairness problems in system performance that global metrics hide. Furthermore, group-dependent calibration strategies help mitigate biases, yielding a more reliable indication of system confidence.
These findings emphasize the need for demographic-specific evaluation and calibration in voice disorder detection systems while providing a methodological framework applicable to broader biomedical classification tasks where demographic metadata is available.
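The abstract combines two technical steps: evaluating each demographic group separately with normalized metrics, and training calibration separately per group. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes the detector emits one logit-like score per recording, that calibration is an affine logistic-regression map of that score (a common choice, used here for illustration), and that normalized cross-entropy and normalized cost are the cross-entropy and the unit-cost error rate divided by those of a system that only knows the class priors. The helper names (`fit_affine_calibration`, `calibrate_per_group`, `group_report`) and the two-group synthetic data are hypothetical.

```python
import math
import random
from collections import defaultdict

def softplus(z):
    """log(1 + exp(z)), computed stably."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    """Logistic function, written as exp(-softplus(-z)) for numerical stability."""
    return math.exp(-softplus(-z))

def cross_entropy(logits, labels):
    """Mean cross-entropy (nats) of posteriors sigmoid(logit); -log P(1) = softplus(-s)."""
    return sum(softplus(-s) if y else softplus(s) for s, y in zip(logits, labels)) / len(labels)

def normalized_cross_entropy(logits, labels):
    """Assumed definition: cross-entropy divided by that of a prior-only system."""
    p1 = sum(labels) / len(labels)
    prior_ce = -(p1 * math.log(p1) + (1 - p1) * math.log(1 - p1))
    return cross_entropy(logits, labels) / prior_ce

def normalized_cost(logits, labels):
    """Assumed definition: unit-cost error rate at logit 0 over the best prior-only cost."""
    p1 = sum(labels) / len(labels)
    errors = sum((s > 0) != bool(y) for s, y in zip(logits, labels))
    return (errors / len(labels)) / min(p1, 1 - p1)

def fit_affine_calibration(scores, labels, l2=1e-3, iters=50):
    """Logistic regression logit = a*s + b via Newton's method with step halving."""
    def objective(a, b):
        ce = sum(softplus(-(a * s + b)) if y else softplus(a * s + b)
                 for s, y in zip(scores, labels))
        return ce + 0.5 * l2 * (a * a + b * b)

    a, b = 1.0, 0.0  # start from the identity map, i.e. the raw system
    obj = objective(a, b)
    for _ in range(iters):
        g_a, g_b = l2 * a, l2 * b       # gradient of the L2 penalty
        h_aa, h_ab, h_bb = l2, 0.0, l2  # Hessian of the L2 penalty
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            g_a += (p - y) * s
            g_b += p - y
            w = p * (1.0 - p)
            h_aa += w * s * s
            h_ab += w * s
            h_bb += w
        det = h_aa * h_bb - h_ab * h_ab
        d_a = (h_bb * g_a - h_ab * g_b) / det  # Newton direction H^-1 g
        d_b = (h_aa * g_b - h_ab * g_a) / det
        step = 1.0
        while step > 1e-8:  # halve the step until the objective does not increase
            new_obj = objective(a - step * d_a, b - step * d_b)
            if new_obj <= obj:
                break
            step *= 0.5
        else:
            break  # no non-increasing step found: treat as converged
        a, b, obj = a - step * d_a, b - step * d_b, new_obj
    return a, b

def calibrate_per_group(scores, labels, groups):
    """Train one affine calibrator per demographic group (hypothetical helper)."""
    by_group = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {g: fit_affine_calibration(s, y) for g, (s, y) in by_group.items()}

def group_report(scores, labels, groups, calibrators):
    """Raw vs. calibrated metrics for each group (hypothetical helper)."""
    report = {}
    for g, (a, b) in calibrators.items():
        raw = [s for s, gg in zip(scores, groups) if gg == g]
        ys = [y for y, gg in zip(labels, groups) if gg == g]
        cal = [a * s + b for s in raw]
        report[g] = {"raw_ce": cross_entropy(raw, ys), "cal_ce": cross_entropy(cal, ys),
                     "raw_nce": normalized_cross_entropy(raw, ys), "cal_nce": normalized_cross_entropy(cal, ys),
                     "raw_nc": normalized_cost(raw, ys), "cal_nc": normalized_cost(cal, ys)}
    return report

# Synthetic data: group "B" is shifted toward "disordered" (a group-dependent bias).
rng = random.Random(0)
scores, labels, groups = [], [], []
for group, shift in (("A", 0.0), ("B", 2.0)):
    for _ in range(200):
        y = rng.random() < 0.5
        scores.append(rng.gauss(1.5 if y else -1.5, 1.5) + shift)
        labels.append(int(y))
        groups.append(group)
# In practice, fit calibrators on held-out data; here fit and evaluation share a toy set.
calibrators = calibrate_per_group(scores, labels, groups)
report = group_report(scores, labels, groups, calibrators)
for g, m in sorted(report.items()):
    print(g, ", ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

Because the Newton iterations start from the identity map (the raw scores) and never accept a step that increases the penalized objective, the calibrated cross-entropy on the fitting data cannot exceed the raw one by more than the small penalty term. In a real evaluation the per-group calibrators would be trained on held-out data, for example by cross-validation, so that the reported per-group gains are not optimistic.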
Language: English
DOI: 10.1016/j.jvoice.2025.11.037
Year: 2025
Published in: Journal of Voice (2025), [13 pp.]
ISSN: 0892-1997
Funding: info:eu-repo/grantAgreement/ES/AEI/PID2021-126061OB-C44
Funding: info:eu-repo/grantAgreement/ES/DGA/T36-23R
Funding: info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO
Funding: info:eu-repo/grantAgreement/EC/H2020/101206575/EU/Mental Illness Detection and Clinical Assessment with Reliable Interpretability/MIND-CLARITY
Type and form: Article (Postprint)
Area (Department): Signal Theory and Communications Area (Department of Electronic Engineering and Communications)

Rights reserved by the journal publisher.

Embargo date: 2026-12-19
Exported from SIDERAL (2026-02-13-18:30:21)

Record created on 2026-02-09, last modified on 2026-02-13.