000125757 001__ 125757
000125757 005__ 20241125101132.0
000125757 0247_ $$2doi$$a10.1109/ACCESS.2023.3243986
000125757 0248_ $$2sideral$$a133350
000125757 037__ $$aART-2023-133350
000125757 041__ $$aeng
000125757 100__ $$0(orcid)0000-0003-3813-4998$$aRibas, Dayana
000125757 245__ $$aAutomatic voice disorder detection using self-supervised representations
000125757 260__ $$c2023
000125757 5060_ $$aAccess copy available to the general public$$fUnrestricted
000125757 5203_ $$aMany speech features and models, including Deep Neural Networks (DNN), are used for classification tasks between healthy and pathological speech with the Saarbruecken Voice Database (SVD). However, accuracy values of 80.71% for phrases or 82.8% for vowels /aiu/ are the highest reported for audio samples in SVD when the evaluation includes the wide range of pathologies in the database rather than a selected subset. This paper targets this top performance in state-of-the-art Automatic Voice Disorder Detection (AVDD) systems. In the framework of a DNN-based AVDD system, we study the capability of Self-Supervised (SS) representation learning for describing discriminative cues between healthy and pathological speech. The system processes the SS temporal sequence of features with a single feed-forward layer and a Class-Token (CT) Transformer to obtain the classification between healthy and pathological speech. Furthermore, a suitable extension of the training set with out-of-domain data is also evaluated to deal with the low availability of data for using DNN-based models in voice pathology detection. Experimental results using audio samples corresponding to phrases in the SVD dataset, including all pathologies available, show classification accuracy values up to 93.36%.
This means that the proposed AVDD system achieved accuracy improvements of 4.1% without the training data extension, and 15.62% after the training data extension, compared to the baseline system. Beyond the novelty of using SS representations for AVDD, obtaining accuracies over 90% under these conditions, using the whole set of pathologies in the SVD, is a milestone for voice disorder-related research. Furthermore, the study of how the amount of in-domain data in the training set relates to system performance provides guidance for the data preparation stage. Lessons learned in this work suggest guidelines for taking advantage of DNNs to boost performance when developing automatic systems for diagnosis, treatment, and monitoring of voice pathologies.
000125757 536__ $$9info:eu-repo/grantAgreement/ES/AEI/PDC2021-120846-C41$$9info:eu-repo/grantAgreement/ES/AEI/PID2021-126061OB-C44$$9info:eu-repo/grantAgreement/ES/DGA/T36-20R$$9info:eu-repo/grantAgreement/EC/H2020/101007666/EU/Exchanges for SPEech ReseArch aNd TechnOlogies/ESPERANTO$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 101007666-ESPERANTO
000125757 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttp://creativecommons.org/licenses/by/3.0/es/
000125757 590__ $$a3.4$$b2023
000125757 592__ $$a0.96$$b2023
000125757 591__ $$aCOMPUTER SCIENCE, INFORMATION SYSTEMS$$b87 / 250 = 0.348$$c2023$$dQ2$$eT2
000125757 591__ $$aTELECOMMUNICATIONS$$b47 / 119 = 0.395$$c2023$$dQ2$$eT2
000125757 591__ $$aENGINEERING, ELECTRICAL & ELECTRONIC$$b122 / 353 = 0.346$$c2023$$dQ2$$eT2
000125757 593__ $$aEngineering (miscellaneous)$$c2023$$dQ1
000125757 593__ $$aMaterials Science (miscellaneous)$$c2023$$dQ1
000125757 593__ $$aComputer Science (miscellaneous)$$c2023$$dQ1
000125757 594__ $$a9.8$$b2023
000125757 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000125757 700__ $$aPastor, Miguel A.
000125757 700__ $$0(orcid)0000-0001-5803-4316$$aMiguel, Antonio$$uUniversidad de Zaragoza
000125757 700__ $$aMartinez, David
000125757 700__ $$0(orcid)0000-0002-3886-7748$$aOrtega, Alfonso$$uUniversidad de Zaragoza
000125757 700__ $$0(orcid)0000-0001-9137-4013$$aLleida, Eduardo$$uUniversidad de Zaragoza
000125757 7102_ $$15008$$2800$$aUniversidad de Zaragoza$$bDpto. Ingeniería Electrón.Com.$$cÁrea Teoría Señal y Comunicac.
000125757 773__ $$g11 (2023), 14915-14927$$pIEEE Access$$tIEEE Access$$x2169-3536
000125757 8564_ $$s1572941$$uhttps://zaguan.unizar.es/record/125757/files/texto_completo.pdf$$yPublished version
000125757 8564_ $$s2557966$$uhttps://zaguan.unizar.es/record/125757/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000125757 909CO $$ooai:zaguan.unizar.es:125757$$particulos$$pdriver
000125757 951__ $$a2024-11-22-11:59:11
000125757 980__ $$aARTICLE