000112768 001__ 112768
000112768 005__ 20240319080953.0
000112768 0247_ $$2doi$$a10.2196/30345
000112768 0248_ $$2sideral$$a128188
000112768 037__ $$aART-2022-128188
000112768 041__ $$aeng
000112768 100__ $$aMontoto, C.
000112768 245__ $$aEvaluation of Natural Language Processing for the Identification of Crohn Disease-Related Variables in Spanish Electronic Health Records: A Validation Study for the PREMONITION-CD Project
000112768 260__ $$c2022
000112768 5060_ $$aAccess copy available to the general public$$fUnrestricted
000112768 5203_ $$aBackground: The exploration of clinically relevant information in the free text of electronic health records (EHRs) holds the potential to positively impact clinical practice as well as knowledge regarding Crohn disease (CD), an inflammatory bowel disease that may affect any segment of the gastrointestinal tract. The EHRead technology, a clinical natural language processing (cNLP) system, was designed to detect and extract clinical information from narratives in the clinical notes contained in EHRs. Objective: The aim of this study is to validate the performance of the EHRead technology in identifying information of patients with CD. Methods: We used the EHRead technology to explore and extract CD-related clinical information from EHRs. To validate this tool, we compared the output of the EHRead technology with a manually curated gold standard to assess the quality of our cNLP system in detecting records containing any reference to CD and its related variables. Results: The validation metrics for the main variable (CD) were a precision of 0.88, a recall of 0.98, and an F1 score of 0.93. Regarding the secondary variables, we obtained a precision of 0.91, a recall of 0.71, and an F1 score of 0.80 for CD flare, while for the variable vedolizumab (treatment), a precision, recall, and F1 score of 0.86, 0.94, and 0.90 were obtained, respectively. Conclusions: This evaluation demonstrates the ability of the EHRead technology to identify patients with CD and their related variables from the free text of EHRs. To the best of our knowledge, this study is the first to use a cNLP system for the identification of CD in EHRs written in Spanish. © 2022 JMIR Medical Informatics. All rights reserved.
000112768 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttp://creativecommons.org/licenses/by/3.0/es/
000112768 590__ $$a3.2$$b2022
000112768 592__ $$a0.963$$b2022
000112768 591__ $$aMEDICAL INFORMATICS$$b18 / 31 = 0.581$$c2022$$dQ3$$eT2
000112768 593__ $$aHealth Information Management$$c2022$$dQ2
000112768 593__ $$aHealth Informatics$$c2022$$dQ2
000112768 594__ $$a5.6$$b2022
000112768 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000112768 700__ $$aGisbert, J.P.
000112768 700__ $$aGuerra, I.
000112768 700__ $$aPlaza, R.
000112768 700__ $$aPajares Villarroya, R.
000112768 700__ $$aMoreno Almazán, L.
000112768 700__ $$aLópez Martín, M. Del Carmen
000112768 700__ $$aDomínguez Antonaya, M.
000112768 700__ $$aVera Mendoza, I.
000112768 700__ $$aAparicio, J.
000112768 700__ $$aMartínez, V.
000112768 700__ $$aTagarro, I.
000112768 700__ $$aFernandez-Nistal, A.
000112768 700__ $$aCanales, L.
000112768 700__ $$aMenke, S.
000112768 700__ $$0(orcid)0000-0003-0076-3529$$aGomollón, F.$$uUniversidad de Zaragoza
000112768 7102_ $$11007$$2610$$aUniversidad de Zaragoza$$bDpto. Medicina, Psiqu. y Derm.$$cArea Medicina
000112768 773__ $$g10, 2 (2022), e30345 [9 pp.]$$pJMIR med. inform.$$tJMIR medical informatics$$x2291-9694
000112768 8564_ $$s234927$$uhttps://zaguan.unizar.es/record/112768/files/texto_completo.pdf$$yVersión publicada
000112768 8564_ $$s1956879$$uhttps://zaguan.unizar.es/record/112768/files/texto_completo.jpg?subformat=icon$$xicon$$yVersión publicada
000112768 909CO $$ooai:zaguan.unizar.es:112768$$particulos$$pdriver
000112768 951__ $$a2024-03-18-13:16:12
000112768 980__ $$aARTICLE