109500 20240319080946.0 doi 10.1007/s12559-020-09800-x sideral 123375 ART-2022-123375 eng
Tessore, J.P.
Distant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish
2022
Access copy available to the general public Unrestricted
Tagged language resources are an essential requirement for developing machine-learning text-based classifiers. However, manual tagging is extremely time-consuming, and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, so redundant classification is required to validate the assigned tag. Even though the number of emotion-tagged text datasets in Spanish has grown in recent years, they cannot be compared in number, size, or quality with the datasets available in English. Quality is a particular concern, as few datasets in Spanish included a validation step in their construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss Kappa interrater agreement measure. Error analysis is performed using the Sentic Computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tag is moderate, similar to that of other manually tagged datasets, with the advantages that the presented dataset contains several hundred thousand tagged comments and does not require extensive manual tagging. The agreement measured between human raters is very similar to the agreement between human raters and the original tag. Every measure presented falls in the moderate agreement zone, which makes the dataset suitable for training classification algorithms in the sentiment analysis field.
info:eu-repo/grantAgreement/ES/DGA-FEDER/T60-20R-AFFECTIVE LAB
info:eu-repo/grantAgreement/ES/MCIU-AEI-FEDER/RTI2018-096986-B-C31
info:eu-repo/semantics/openAccess All rights reserved http://www.europeana.eu/rights/rr-f/
5.4 2022
NEUROSCIENCES 61 / 272 = 0.224 2022 Q1 T1
COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE 47 / 145 = 0.324 2022 Q2 T1
1.037 2022
Computer Science Applications 2022 Q1
Computer Vision and Pattern Recognition 2022 Q1
Cognitive Neuroscience 2022 Q2
7.7 2022
info:eu-repo/semantics/article info:eu-repo/semantics/acceptedVersion
Esnaola, L.M.
Lanzarini, L.
Baldassarri, S. Universidad de Zaragoza (orcid)0000-0002-9315-6391
5007 570 Universidad de Zaragoza Dpto. Informát.Ingenie.Sistms. Área Lenguajes y Sistemas Inf.
14 (2022), 407–424 COGNITIVE COMPUTATION COGNITIVE COMPUTATION 1866-9956
1026787 http://zaguan.unizar.es/record/109500/files/texto_completo.pdf Postprint
2510656 http://zaguan.unizar.es/record/109500/files/texto_completo.jpg?subformat=icon icon Postprint
oai:zaguan.unizar.es:109500 articulos driver
2024-03-18-12:37:00 ARTICLE
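
The abstract names the Fleiss Kappa interrater agreement measure as the validation step but, being an abstract, does not show the computation. The following is a minimal sketch of the standard Fleiss' kappa formula, not the authors' code. The function name `fleiss_kappa`, the `toy_counts` table, and its three unnamed emotion categories are hypothetical and chosen only for illustration. The input is assumed to be a table in which each row is one comment and each column holds the number of raters who chose that category.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Return Fleiss' kappa for an items-by-categories table of rating counts.

    counts[i][j] is the number of raters who put item i into category j.
    Every item must have been rated by the same number of raters (at least 2).
    """
    if not counts:
        raise ValueError("need at least one rated item")
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")

    n_categories = len(counts[0])
    # Share of all ratings that fell into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Per-item agreement: fraction of rater pairs that agree on that item.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_items) / n_items
    # Agreement expected if raters chose categories at the overall rates.
    p_expected = sum(p * p for p in p_cat)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy table: 4 comments, 3 raters, 3 emotion categories.
toy_counts = [
    [3, 0, 0],
    [0, 3, 0],
    [1, 2, 0],
    [0, 1, 2],
]
print(round(fleiss_kappa(toy_counts), 3))
```

The abstract does not state which interpretation scale was applied. On the widely cited Landis and Koch scale, values from 0.41 to 0.60 are read as moderate agreement, which would be consistent with the "moderate agreement zone" wording.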