Evaluating large language models effectiveness for flow-based intrusion detection: a comparative study with ML and DL baselines

Mehavilla, Lorena; García, José; Alesanco, Álvaro; Rodríguez, María

doi:10.1007/s10462-025-11432-2

Mehavilla, Lorena (Universidad de Zaragoza) ; Rodríguez, María (Universidad de Zaragoza) ; García, José (Universidad de Zaragoza) ; Alesanco, Álvaro (Universidad de Zaragoza)

Abstract: This paper presents the first systematic benchmark evaluating Large Language Models (LLMs), specifically GPT-2, GPT-Neo-125M, and LLaMA-3.2-1B, as standalone classifiers for intrusion detection, covering both binary and multiclass classification tasks, using structured Zeek logs derived from the CIC IoT 2023 dataset. We compare their performance against established and widely used Machine Learning (XGBoost, Random Forest, Decision Tree) and Deep Learning models (MLP, GRU, LeNet-5) across key evaluation metrics: detection effectiveness (precision, recall, and F1-score), inference speed, and resource consumption. All models are consistently trained and rigorously evaluated on the CIC IoT 2023 dataset, ensuring fair, reproducible, and transparent comparisons. Our findings indicate that while LLMs achieve strong F1-scores exceeding 95% and do not fully utilize available GPU resources, they still do not outperform top-performing ML models. Notably, XGBoost achieves a higher F1-score of 96.96% while using only 4% of the available CPU. These results emphasize the practical trade-offs between detection capability, inference efficiency, and hardware requirements when applying LLMs in flow-based IDS contexts, particularly in resource-constrained environments such as IoT or edge deployments.
Language: English
DOI: 10.1007/s10462-025-11432-2
Year: 2026
Published in: ARTIFICIAL INTELLIGENCE REVIEW 59, 2 (2026), [38 pp.]
ISSN: 0269-2821
Funding: info:eu-repo/grantAgreement/ES/DGA/T31-20R
Funding: info:eu-repo/grantAgreement/ES/MCINN/PID2022-136476OB-I00
Type and form: Article (Final version)
Area (Department): Área Ingeniería Telemática (Dpto. Ingeniería Electrón.Com.)

You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Exported from SIDERAL (2026-01-26-14:50:32)

Record created on 2026-01-26, last modified on 2026-01-26
