000119865 001__ 119865
000119865 005__ 20230111103847.0
000119865 0247_ $$2doi$$a10.3233/SW-222899
000119865 0248_ $$2sideral$$a130783
000119865 037__ $$aART-2022-130783
000119865 041__ $$aeng
000119865 100__ $$aGoel, S.
000119865 245__ $$aBilingual dictionary generation and enrichment via graph exploration
000119865 260__ $$c2022
000119865 5060_ $$aAccess copy available to the general public$$fUnrestricted
000119865 5203_ $$aIn recent years, we have witnessed a steady growth of linguistic information represented and exposed as linked data on the Web. Such linguistic linked data have stimulated the development and use of openly available linguistic knowledge graphs, as is the case with the Apertium RDF, a collection of interconnected bilingual dictionaries represented and accessible through Semantic Web standards. In this work, we explore techniques that exploit the graph nature of bilingual dictionaries to automatically infer new links (translations). We build upon a cycle-density-based method: partitioning the graph into biconnected components for a speed-up, and simplifying the pipeline through a careful structural analysis that reduces hyperparameter tuning requirements. We also analyse the shortcomings of traditional evaluation metrics used for translation inference and propose to complement them with new ones, both-word precision (BWP) and both-word recall (BWR), aimed at being more informative of algorithmic improvements. Over twenty-seven language pairs, our algorithm produces dictionaries about 70% of the size of existing Apertium RDF dictionaries at a high BWP of 85% from scratch within a minute. Human evaluation shows that 78% of the additional translations generated for dictionary enrichment are correct as well. We further describe an interesting use-case: inferring synonyms within a single language, on which our initial human-based evaluation shows an average accuracy of 84%. We release our tool as free/open-source software, which not only can be applied to RDF data and Apertium dictionaries but is also easily usable for other formats and communities.
000119865 536__ $$9info:eu-repo/grantAgreement/ES/AEI/PID2020-113903RB-I00$$9info:eu-repo/grantAgreement/EC/H2020/825182/EU/Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors/Pret-a-LLOD$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 825182-Pret-a-LLOD$$9info:eu-repo/grantAgreement/ES/MINECO/RYC2019-028112-I$$9info:eu-repo/grantAgreement/ES/MINECO/TIN2016-78011-C4-3-R
000119865 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttp://creativecommons.org/licenses/by/3.0/es/
000119865 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000119865 700__ $$0(orcid)0000-0001-6452-7627$$aGracia, J.$$uUniversidad de Zaragoza
000119865 700__ $$aForcada, M. L.
000119865 7102_ $$15007$$2570$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Lenguajes y Sistemas Inf.
000119865 773__ $$g13, 6 (2022), 1103-1132$$tSemantic Web$$x2210-4968
000119865 8564_ $$s708303$$uhttps://zaguan.unizar.es/record/119865/files/texto_completo.pdf$$yPublished version
000119865 8564_ $$s1891562$$uhttps://zaguan.unizar.es/record/119865/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000119865 909CO $$ooai:zaguan.unizar.es:119865$$particulos$$pdriver
000119865 951__ $$a2023-01-11-10:11:13
000119865 980__ $$aARTICLE