A framework for the automated thematic annotation of open government data
Abstract: Governmental policies for transparency and reuse of public sector information have encouraged the launch of open government data portals around the world. Many of these portals are based on pyramidal structures: national open data portals aggregate the contents harvested from open data portals maintained by governments in charge of administrative areas with a narrower scope. Within this hierarchical organization, these portals lack consistent and scalable mechanisms for thematic annotation, which limits dataset discoverability. This work proposes a framework for the automated thematic classification of open government data. The framework integrates (i) thematic annotation quality assessment, (ii) supervised machine learning models trained on annotated metadata corpora, and (iii) embedding-based semantic similarity methods for theme assignment in the absence of reliable annotations. The framework is evaluated using 29,793 datasets from data.europa.eu, the European open data portal. Experimental results show that supervised models achieve high classification performance, with Support Vector Machines reaching an accuracy of 93.65%, while unsupervised embedding-based approaches using transformer-based representations achieve substantial semantic agreement (74.56%) with portal-assigned themes. These results demonstrate that the proposed framework enables scalable, consistent, and interoperable thematic annotation, offering both theoretical contributions to automated metadata enrichment and practical value for integration into large-scale open data portal infrastructures.
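Illustrative sketch (not taken from the article): component (iii) of the abstract, theme assignment by semantic similarity, can be pictured as comparing a vector for each dataset's metadata with a vector for each theme and choosing the closest theme. The minimal Python example below assumes a simple term-frequency vector as a stand-in for the transformer-based embeddings the article reports using. The theme names, theme descriptions, and the helpers `embed`, `cosine`, and `assign_theme` are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical theme descriptions (assumption): a real portal would draw
# these from its own controlled vocabulary of themes.
THEMES = {
    "Environment": "environment climate pollution air water quality emissions waste",
    "Transport": "transport traffic road railway bus mobility vehicles",
    "Health": "health hospital disease patients healthcare medicine",
}


def embed(text):
    """Stand-in 'embedding' (assumption): a sparse term-frequency vector.

    A transformer-based sentence encoder would take this function's place
    in a realistic setting.
    """
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_theme(metadata_text, themes=THEMES):
    """Return the (theme, score) pair with the highest cosine similarity."""
    vec = embed(metadata_text)
    scores = {name: cosine(vec, embed(desc)) for name, desc in themes.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    theme, score = assign_theme("Hourly air quality and emissions measurements")
    print(theme, round(score, 3))
```

Swapping `embed` for a dense sentence encoder would leave the rest of the sketch unchanged, since the assignment step only depends on the similarity function.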
Language: English
DOI: 10.2298/CSIS251029022A
Year: 2026
Published in: COMPUTER SCIENCE AND INFORMATION SYSTEMS 23, 2 (2026), 917-946
ISSN: 1820-0214

Type and form: Article (Final version)
Area (Department): Área Lenguajes y Sistemas Inf. (Dpto. Informát.Ingenie.Sistms.)
Associated dataset: A Framework for the Automated Thematic Annotation of Open Government Data (10.5281/zenodo.18317554)

Creative Commons: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes. If you remix, transform, or build upon the material, you may not distribute the modified material.


Exported from SIDERAL (2026-05-05-13:36:08)



This article is in the following collections:
Articles > Articles by area > Lenguajes y Sistemas Informáticos



Record created 2026-05-05, last modified 2026-05-05

