On the Use of Deep Feedforward Neural Networks for Automatic Language Identification

Lopez-Moreno, Ignacio; Gonzalez-Rodriguez, Joaquin; Martínez González, David; Plchot, Oldrich; Gonzalez-Dominguez, Javier; Moreno, Pedro

doi:10.1016/j.csl.2016.03.001

Abstract: In this work, we present a comprehensive study on the use of deep neural networks (DNNs) for automatic language identification (LID). Motivated by the recent success of DNNs in acoustic modeling for speech recognition, we adapt DNNs to the problem of identifying the language of a given utterance from its short-term acoustic features. We propose two different DNN-based approaches. In the first, the DNN acts as an end-to-end LID classifier, receiving the speech features as input and providing the estimated probabilities of the target languages as output. In the second, the DNN is used to extract bottleneck features that are then used as inputs to a state-of-the-art i-vector system. Experiments are conducted in two scenarios: the complete NIST Language Recognition Evaluation 2009 dataset (LRE’09) and a subset of the Voice of America (VOA) data from LRE’09 in which all languages have the same amount of training data. Results on both datasets demonstrate that the DNN-based systems significantly outperform a state-of-the-art i-vector system on short-duration utterances. Furthermore, combining the DNN-based and the classical i-vector systems leads to additional performance improvements (up to 45% relative improvement in both EER and Cavg on the 3 s and 10 s conditions, respectively).
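The first approach, in which the DNN maps speech features to language probabilities, can be illustrated with a minimal sketch. It assumes the network emits one posterior vector per frame and that an utterance-level decision is obtained by averaging the log posteriors over frames; the abstract does not state the combination rule, so this is an assumption. The function names, language labels, and toy numbers are hypothetical and are not taken from the paper.

```python
import math


def utterance_scores(frame_posteriors):
    """Average per-frame log posteriors into one score per language.

    frame_posteriors: list of frames, each a list of strictly positive
    probabilities (one per target language). Assumed input format.
    """
    n_frames = len(frame_posteriors)
    n_langs = len(frame_posteriors[0])
    return [
        sum(math.log(frame[k]) for frame in frame_posteriors) / n_frames
        for k in range(n_langs)
    ]


def identify_language(frame_posteriors, languages):
    """Return the language with the highest averaged log posterior."""
    scores = utterance_scores(frame_posteriors)
    best = max(range(len(languages)), key=lambda k: scores[k])
    return languages[best], scores


# Toy frame-level posteriors (made-up numbers, for illustration only).
LANGUAGES = ["en", "es", "fr"]
frames = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.4, 0.1],
]

label, scores = identify_language(frames, LANGUAGES)
print(label, [round(s, 3) for s in scores])
```

In this toy input, "en" has the highest posterior in two of the three frames, yet the averaged log posterior selects "es", because one confident frame outweighs two marginal ones. A majority vote over frames would decide differently, which is why the choice of combination rule matters for short utterances.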
Language: English
DOI: 10.1016/j.csl.2016.03.001
Year: 2016
Published in: COMPUTER SPEECH AND LANGUAGE 40 (2016), 46-59
ISSN: 0885-2308
JCR impact factor: 1.9 (2016)
JCR category: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE rank: 64 / 133 = 0.481 (2016) - Q2 - T2
SCImago impact factor: 0.474 - Human-Computer Interaction (Q2) - Software (Q2) - Theoretical Computer Science (Q3)

Funding: info:eu-repo/grantAgreement/ES/MINECO/TIN2011-28169-C05-02
Type and form: Article (Final version)

You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Exported from SIDERAL (2020-10-09-17:46:28)

Record created on 2016-08-17, last modified on 2020-10-09
