Unsupervised adaptation of PLDA models for broadcast diarization

Viñals, Ignacio; Lleida, Eduardo; Ortega, Alfonso; Miguel, Antonio; Villalba, Jesús

doi:10.1186/s13636-019-0167-7

Viñals, Ignacio (Universidad de Zaragoza) ; Ortega, Alfonso (Universidad de Zaragoza) ; Villalba, Jesús ; Miguel, Antonio (Universidad de Zaragoza) ; Lleida, Eduardo (Universidad de Zaragoza)

Abstract: We present a novel model adaptation approach to deal with data variability for speaker diarization in a broadcast environment. Expensive human-annotated data can be used to mitigate the domain mismatch by means of supervised model adaptation approaches. By contrast, we propose an unsupervised adaptation method that needs no in-domain labeled data, only the recording being diarized. We rely on an inner adaptation block which combines Agglomerative Hierarchical Clustering (AHC) and Mean-Shift (MS) clustering techniques with a Fully Bayesian Probabilistic Linear Discriminant Analysis (PLDA) to produce pseudo-speaker labels suitable for model adaptation. We propose multiple adaptation approaches based on this basic block, both unsupervised and semi-supervised. Our proposed solutions, evaluated on the Multi-Genre Broadcast 2015 (MGB) dataset, show significant improvements over the baseline (16% relative improvement), also outperforming a supervised adaptation proposal with low resources (9% relative improvement). Furthermore, our proposed unsupervised adaptation is fully compatible with a supervised one. The joint use of both adaptation techniques (supervised and unsupervised) shows a 13% relative improvement over the supervised adaptation alone.
Language: English
DOI: 10.1186/s13636-019-0167-7
Year: 2019
Published in: EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING 2019, 24 (2019), [13 pp.]
ISSN: 1687-4714
JCR impact factor: 1.289 (2019)
JCR category: ENGINEERING, ELECTRICAL & ELECTRONIC rank: 201 / 266 = 0.756 (2019) - Q4 - T3
JCR category: ACOUSTICS rank: 21 / 32 = 0.656 (2019) - Q3 - T2
SCIMAGO impact factor: 0.289 - Electrical and Electronic Engineering (Q3) - Acoustics and Ultrasonics (Q3)

Funding: info:eu-repo/grantAgreement/ES/DGA-FEDER/T36-17R
Funding: info:eu-repo/grantAgreement/ES/MINECO/TIN2017-85854-C4-1-R
Type and form: Article (final version)
Area (Department): Signal Theory and Communications Area (Dept. of Electronic Engineering and Communications)

You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Exported from SIDERAL (2020-07-29-20:20:55)

Record created on 2020-02-04, last modified on 2020-07-29
