000171032 001__ 171032
000171032 005__ 20260505142649.0
000171032 0247_ $$2doi$$a10.2298/CSIS251029022A
000171032 0248_ $$2sideral$$a149127
000171032 037__ $$aART-2026-149127
000171032 041__ $$aeng
000171032 100__ $$0(orcid)0000-0003-3615-4573$$aAziz, Abdul
000171032 245__ $$aA framework for the automated thematic annotation of open government data
000171032 260__ $$c2026
000171032 5060_ $$aAccess copy available to the general public$$fUnrestricted
000171032 5203_ $$aGovernmental policies for transparency and reuse of public sector information have encouraged the launch of open government data portals around the world. Many of these portals are based on pyramidal structures: national open data portals aggregate the contents harvested from open data portals maintained by governments in charge of administrative areas with a narrower scope. Given this hierarchical organization, these portals lack consistent and scalable mechanisms for thematic annotation, which limits dataset discoverability. This work proposes a framework for the automated thematic classification of open government data. The framework integrates (i) thematic annotation quality assessment, (ii) supervised machine learning models trained on annotated metadata corpora, and (iii) embedding-based semantic similarity methods for theme assignment in the absence of reliable annotations. The framework is evaluated using 29,793 datasets from data.europa.eu, the European open data portal. Experimental results show that supervised models achieve high classification performance, with Support Vector Machines reaching an accuracy of 93.65%, while unsupervised embedding-based approaches using transformer-based representations achieve substantial semantic agreement with portal-assigned themes (74.56%). These results demonstrate that the proposed framework enables scalable, consistent, and interoperable thematic annotation, offering both theoretical contributions to automated metadata enrichment and practical value for integration into large-scale open data portal infrastructures.
000171032 540__ $$9info:eu-repo/semantics/openAccess$$aby-nc-nd$$uhttps://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
000171032 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000171032 700__ $$aAli, Mohsan
000171032 700__ $$aHerrera-Murillo, Dagoberto José
000171032 700__ $$aMaratsi, Maria Ioanna
000171032 700__ $$0(orcid)0000-0001-6491-7430$$aLopez-Pellicer, Francisco$$uUniversidad de Zaragoza
000171032 700__ $$0(orcid)0000-0002-1279-0367$$aNogueras-Iso, Javier$$uUniversidad de Zaragoza
000171032 7102_ $$15007$$2570$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Lenguajes y Sistemas Inf.
000171032 773__ $$g23, 2 (2026), 917-946$$pCOMPUTER SCIENCE AND INFORMATION SYSTEMS$$tCOMPUTER SCIENCE AND INFORMATION SYSTEMS$$x1820-0214
000171032 787__ $$tA Framework for the Automated Thematic Annotation of Open Government Data$$w10.5281/zenodo.18317554
000171032 8564_ $$s1229389$$uhttps://zaguan.unizar.es/record/171032/files/texto_completo.pdf$$yPublished version
000171032 8564_ $$s1713815$$uhttps://zaguan.unizar.es/record/171032/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000171032 909CO $$ooai:zaguan.unizar.es:171032$$particulos$$pdriver
000171032 951__ $$a2026-05-05-13:36:08
000171032 980__ $$aARTICLE