000087597 001__ 87597
000087597 005__ 20200729203743.0
000087597 0247_ $$2doi$$a10.1186/s13636-019-0167-7
000087597 0248_ $$2sideral$$a115809
000087597 037__ $$aART-2019-115809
000087597 041__ $$aeng
000087597 100__ $$0(orcid)0000-0003-1772-0605$$aViñals, Ignacio$$uUniversidad de Zaragoza
000087597 245__ $$aUnsupervised adaptation of PLDA models for broadcast diarization
000087597 260__ $$c2019
000087597 5060_ $$aAccess copy available to the general public$$fUnrestricted
000087597 5203_ $$aWe present a novel model adaptation approach to deal with data variability for speaker diarization in a broadcast environment. Expensive human-annotated data can be used to mitigate the domain mismatch by means of supervised model adaptation approaches. By contrast, we propose an unsupervised adaptation method which does not need in-domain labeled data, only the recording being diarized. We rely on an inner adaptation block which combines Agglomerative Hierarchical Clustering (AHC) and Mean-Shift (MS) clustering techniques with a Fully Bayesian Probabilistic Linear Discriminant Analysis (PLDA) to produce pseudo-speaker labels suitable for model adaptation. We propose multiple adaptation approaches based on this basic block, including unsupervised and semi-supervised ones. Our proposed solutions, evaluated on the Multi-Genre Broadcast 2015 (MGB) dataset, report significant improvements with respect to the baseline (16% relative improvement), also outperforming a low-resource supervised adaptation proposal (9% relative improvement). Furthermore, our proposed unsupervised adaptation is fully compatible with a supervised one. The joint use of both adaptation techniques (supervised and unsupervised) shows a 13% relative improvement with respect to the supervised adaptation alone.
000087597 536__ $$9info:eu-repo/grantAgreement/ES/DGA-FEDER/T36-17R$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2017-85854-C4-1-R
000087597 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttp://creativecommons.org/licenses/by/3.0/es/
000087597 590__ $$a1.289$$b2019
000087597 591__ $$aENGINEERING, ELECTRICAL & ELECTRONIC$$b201 / 266 = 0.756$$c2019$$dQ4$$eT3
000087597 591__ $$aACOUSTICS$$b21 / 32 = 0.656$$c2019$$dQ3$$eT2
000087597 592__ $$a0.289$$b2019
000087597 593__ $$aElectrical and Electronic Engineering$$c2019$$dQ3
000087597 593__ $$aAcoustics and Ultrasonics$$c2019$$dQ3
000087597 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000087597 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, Alfonso$$uUniversidad de Zaragoza
000087597 700__ $$aVillalba, Jesús
000087597 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, Antonio$$uUniversidad de Zaragoza
000087597 700__ $$0(orcid)0000-0001-9137-4013$$aLleida, Eduardo$$uUniversidad de Zaragoza
000087597 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000087597 773__ $$g2019, 24 (2019), [13 pp.]$$pEURASIP j. audio, speech music. process.$$tEURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING$$x1687-4714
000087597 8564_ $$s1579181$$uhttps://zaguan.unizar.es/record/87597/files/texto_completo.pdf$$yPublished version
000087597 8564_ $$s12113$$uhttps://zaguan.unizar.es/record/87597/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000087597 909CO $$ooai:zaguan.unizar.es:87597$$particulos$$pdriver
000087597 951__ $$a2020-07-29-20:20:55
000087597 980__ $$aARTICLE