The GENIE System: classifying documents by combining mixed-techniques

Garrido, Ángel Luis; Escudero, Sandra; Peiró, Álvaro; Mena, Eduardo; Granados-Buey, María; Ilarri, Sergio

doi:10.1007/978-3-319-27030-2_15

Garrido, Ángel Luis; Granados-Buey, María; Escudero, Sandra; Peiró, Álvaro; Ilarri, Sergio (Universidad de Zaragoza); Mena, Eduardo (Universidad de Zaragoza)

Abstract: Automatic text classification remains an open problem, and deploying it in companies and organizations that hold large volumes of text data is not trivial. Many parameters influence the quality of the results, such as the language, the context, the level of knowledge of the issues discussed, the format of the documents, and the type of language used in the documents to be classified. In this paper we describe GENIE, a multi-language rule-based pipeline system for automatic document categorization. We used several business corpora to test the real capabilities of our proposal, and we studied the results of applying different stages of the pipeline to the same data in order to measure the influence of each step on the categorization process. The results obtained are very promising, and the GENIE system is already being used in real production environments with very good results.
Language: English
DOI: 10.1007/978-3-319-27030-2_15
Year: 2015
Published in: Lecture Notes in Business Information Processing 226 (2015), 231-246
ISSN: 1865-1348
SCImago impact factor: 0.284 - Business and International Management (Q2) - Modeling and Simulation (Q3) - Information Systems and Management (Q3) - Management Information Systems (Q3) - Control and Systems Engineering (Q3) - Information Systems (Q3)

Funding: info:eu-repo/grantAgreement/ES/MINECO/TIN2013-46238-C4-4-R
Type and form: Article (postprint)
Area (Department): Lenguajes y Sistemas Informáticos (Dpto. de Informática e Ingeniería de Sistemas)

Rights reserved by the journal's publisher

Exported from SIDERAL (2021-01-21-11:04:25)

Record created 2019-02-05, last modified 2021-01-21
