Prosodic features and formant modeling for an ivector-based language recognition system

Martinez, D.; Ortega, A.; Miguel, A.; Lleida, E.
doi:10.1109/ICASSP.2013.6638988
000162481 001__ 162481
000162481 005__ 20251017144546.0
000162481 0247_ $$2doi$$a10.1109/ICASSP.2013.6638988
000162481 0248_ $$2sideral$$a84598
000162481 037__ $$aART-2013-84598
000162481 041__ $$aeng
000162481 100__ $$0(orcid)0000-0001-7593-1377$$aMartinez, D.$$uUniversidad de Zaragoza
000162481 245__ $$aProsodic features and formant modeling for an ivector-based language recognition system
000162481 260__ $$c2013
000162481 5203_ $$aThe prosody of a language is encoded in syllable length, loudness and pitch. These attributes make humans perceive rhythm, stress and intonation in speech. These speech properties vary from language to language, which makes language classification possible. Formants, on the other hand, are the resonance frequencies of the vocal tract; they depend heavily on the position adopted by the articulatory organs and are especially useful to disambiguate vowel sounds. In this paper, prosodic and formant information are combined to build a generative language identification system based on Gaussian models fed with iVectors. The system is evaluated on the NIST LRE09 database, and the inclusion of formant information gives a relative improvement of about 50% for the 30 s task over a prosodic system without it. The fusion with a state-of-the-art acoustic system based on shifted delta cepstral coefficients (SDC) shows the complementarity of both approaches.
000162481 536__ $$9info:eu-repo/grantAgreement/ES/MINECO/TIN2011-28169-C05-02
000162481 540__ $$9info:eu-repo/semantics/closedAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000162481 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000162481 700__ $$0(orcid)0000-0001-9137-4013$$aLleida, E.$$uUniversidad de Zaragoza
000162481 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, A.$$uUniversidad de Zaragoza
000162481 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, A.$$uUniversidad de Zaragoza
000162481 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000162481 773__ $$g2013 (2013), 6847-6851$$pProc. IEEE Int. Conf. Acoust. Speech Signal Process.$$tProceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing$$x1520-6149
000162481 8564_ $$s287373$$uhttps://zaguan.unizar.es/record/162481/files/texto_completo.pdf$$yPublished version
000162481 8564_ $$s3118135$$uhttps://zaguan.unizar.es/record/162481/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000162481 909CO $$ooai:zaguan.unizar.es:162481$$particulos$$pdriver
000162481 951__ $$a2025-10-17-14:09:58
000162481 980__ $$aARTICLE
Universidad de Zaragoza Repository