000109500 001__ 109500
000109500 005__ 20240319080946.0
000109500 0247_ $$2doi$$a10.1007/s12559-020-09800-x
000109500 0248_ $$2sideral$$a123375
000109500 037__ $$aART-2022-123375
000109500 041__ $$aeng
000109500 100__ $$aTessore, J.P.
000109500 245__ $$aDistant Supervised Construction and Evaluation of a Novel Dataset of Emotion-Tagged Social Media Comments in Spanish
000109500 260__ $$c2022
000109500 5060_ $$aAccess copy available to the general public$$fUnrestricted
000109500 5203_ $$aTagged language resources are an essential requirement for developing machine-learning text-based classifiers. However, manual tagging is extremely time-consuming, and the resulting datasets are rather small, containing only a few thousand samples. Basic emotion datasets are particularly difficult to classify manually because categorization is prone to subjectivity, and thus redundant classification is required to validate the assigned tag. Even though the number of emotion-tagged text datasets in Spanish has been growing in recent years, they cannot be compared with the datasets in English in number, size, or quality. Quality is a particularly concerning issue, as not many datasets in Spanish include a validation step in the construction process. In this article, a dataset of social media comments in Spanish is compiled, selected, filtered, and presented. A sample of the dataset is reclassified by a group of psychologists and validated using the Fleiss' kappa inter-rater agreement measure. Error analysis is performed using the Sentic Computing tool BabelSenticNet. Results indicate that the agreement between the human raters and the automatically acquired tag is moderate, similar to other manually tagged datasets, with the advantages that the presented dataset contains several hundred thousand tagged comments and does not require extensive manual tagging. The agreement measured between human raters is very similar to the one between human raters and the original tag. Every measure presented is in the moderate agreement zone, and, as such, the dataset is suitable for training classification algorithms in the sentiment analysis field.
000109500 536__ $$9info:eu-repo/grantAgreement/ES/DGA-FEDER/T60-20R-AFFECTIVE LAB$$9info:eu-repo/grantAgreement/ES/MCIU-AEI-FEDER/RTI2018-096986-B-C31
000109500 540__ $$9info:eu-repo/semantics/openAccess$$aAll rights reserved$$uhttp://www.europeana.eu/rights/rr-f/
000109500 590__ $$a5.4$$b2022
000109500 592__ $$a1.037$$b2022
000109500 591__ $$aNEUROSCIENCES$$b61 / 272 = 0.224$$c2022$$dQ1$$eT1
000109500 593__ $$aComputer Science Applications$$c2022$$dQ1
000109500 591__ $$aCOMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE$$b47 / 145 = 0.324$$c2022$$dQ2$$eT1
000109500 593__ $$aComputer Vision and Pattern Recognition$$c2022$$dQ1
000109500 593__ $$aCognitive Neuroscience$$c2022$$dQ2
000109500 594__ $$a7.7$$b2022
000109500 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/acceptedVersion
000109500 700__ $$aEsnaola, L.M.
000109500 700__ $$aLanzarini, L.
000109500 700__ $$0(orcid)0000-0002-9315-6391$$aBaldassarri, S.$$uUniversidad de Zaragoza
000109500 7102_ $$15007$$2570$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Lenguajes y Sistemas Inf.
000109500 773__ $$g14 (2022), 407–424$$pCOGNITIVE COMPUTATION$$tCOGNITIVE COMPUTATION$$x1866-9956
000109500 8564_ $$s1026787$$uhttps://zaguan.unizar.es/record/109500/files/texto_completo.pdf$$yPostprint
000109500 8564_ $$s2510656$$uhttps://zaguan.unizar.es/record/109500/files/texto_completo.jpg?subformat=icon$$xicon$$yPostprint
000109500 909CO $$ooai:zaguan.unizar.es:109500$$particulos$$pdriver
000109500 951__ $$a2024-03-18-12:37:00
000109500 980__ $$aARTICLE