LLMs for industrial databases: an agro-food production plant use case

Dranca, Lacramioara ; Donate, Pablo (Universidad de Zaragoza) ; Sanguesa, Julio A. (Universidad de Zaragoza) ; Garrido, Piedad ; Torres-Sanz, Vicente (Universidad de Zaragoza) ; Martinez, Francisco J. (Universidad de Zaragoza)
Abstract: Introduction:
Extracting value from industrial time-series databases such as InfluxDB 2.0 requires expertise in specialized query languages (InfluxQL, Flux) that domain experts typically lack, and no labeled corpora exist for translating natural language into them.

Methods:
We present a modular system that translates questions written in Spanish into executable InfluxQL and Flux queries over an InfluxDB 2.0 instance deployed in a real agro-food production plant operating under a self-consumption energy scheme. It comprises four components: a semantic layer implemented as a knowledge graph encoding the InfluxDB schema and its correspondence with the plant's physical components; a hierarchical entity-linking module; a lightweight language model fine-tuned for NL-to-query generation; and a query validation and sanitization module. To obtain training data, we develop a fully automated synthetic dataset distillation pipeline that uses a large teacher model with contextual retrieval from the official InfluxDB 2.0 documentation; each candidate query undergoes parsing, AST extraction, and semantic checking against the knowledge graph, and only validated samples are retained. The resulting corpus is used to fine-tune compact Small Language Models through domain-level conditioning followed by task-specific instruction tuning.
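
As a rough illustration of the "retain only validated samples" step, the sketch below checks a candidate InfluxQL query against a schema before keeping it. Everything in it is an assumption made for illustration: the `SCHEMA` dictionary is a hypothetical stand-in for the knowledge graph, the measurement and field names are invented, and a simplified regular expression replaces the parsing and AST extraction described above.

```python
import re

# Hypothetical stand-in for the knowledge graph: measurement -> known fields.
# These names are invented for illustration and do not come from the plant schema.
SCHEMA = {
    "energy_meter": {"active_power", "voltage"},
    "cold_room": {"temperature"},
}

# Simplified pattern for single-measurement InfluxQL SELECT statements.
# A real validator would use a proper parser and inspect the AST instead.
SELECT_RE = re.compile(
    r'^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+"?(?P<measurement>\w+)"?(?:\s+.*)?\s*$',
    re.IGNORECASE | re.DOTALL,
)

# Accepts a bare or quoted field name, optionally wrapped in one function call.
FIELD_RE = re.compile(r'^(?:\w+\(\s*)?"?(?P<name>\w+)"?(?:\s*\))?$')


def validate(query: str) -> tuple[bool, str]:
    """Return (is_valid, reason) for one candidate query."""
    m = SELECT_RE.match(query)
    if m is None:
        return False, "unparseable"
    measurement = m.group("measurement")
    if measurement not in SCHEMA:
        return False, f"unknown measurement: {measurement}"
    for raw in m.group("fields").split(","):
        fm = FIELD_RE.match(raw.strip())
        if fm is None:
            return False, f"unsupported field expression: {raw.strip()}"
        if fm.group("name") not in SCHEMA[measurement]:
            return False, f"unknown field: {fm.group('name')}"
    return True, "ok"


def retain_valid(candidates: list[str]) -> list[str]:
    """Keep only the candidates that pass both the syntactic and schema checks."""
    return [q for q in candidates if validate(q)[0]]
```

Returning a reason alongside the verdict is a design choice of this sketch, intended to make rejected samples easy to audit; the source does not say how rejections are logged.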

Results:
Performance is evaluated on a manually curated suite of 56 Spanish queries, evenly distributed across seven operationally relevant query families. Compact models reliably generate syntactically valid queries, but functional evaluation against the live instance reveals a three-layer pattern (high parser validity, moderate execution success, and lower result correctness) that locates the residual gap at the semantic layer.
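
The three-layer pattern can be made concrete with a small sketch that computes one rate per layer from per-query outcomes. The `Outcome` record, the metric names, and the nesting rule (a query counts as executed only if it parsed, and as correct only if it executed) are assumptions of this sketch; the abstract does not give the exact metric definitions, and the toy data in the usage note is not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Hypothetical per-query evaluation record."""
    parsed: bool    # the generated query passed the parser
    executed: bool  # the query ran against the database without error
    correct: bool   # the returned result matched the expected one


def layer_rates(outcomes: list[Outcome]) -> dict[str, float]:
    """Compute the fraction of queries that survive each evaluation layer."""
    n = len(outcomes)
    if n == 0:
        raise ValueError("no outcomes to evaluate")
    parsed = sum(o.parsed for o in outcomes)
    # Layers are nested: each one only counts queries that passed the previous layer.
    executed = sum(o.parsed and o.executed for o in outcomes)
    correct = sum(o.parsed and o.executed and o.correct for o in outcomes)
    return {
        "parser_validity": parsed / n,
        "execution_success": executed / n,
        "result_correctness": correct / n,
    }
```

With four invented outcomes, `layer_rates([Outcome(True, True, True), Outcome(True, True, False), Outcome(True, False, False), Outcome(False, False, False)])` yields rates of 0.75, 0.5, and 0.25. Because the layers are nested, the rates can only decrease from one layer to the next, which is the shape the results describe.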

Discussion:
The contributions are a complete NL-to-InfluxDB pipeline grounded in an explicit semantic representation of a real industrial schema; a documentation-driven synthetic data generation process with automatic syntactic and semantic verification; and a parameter-efficient fine-tuning strategy enabling query generation with lightweight models suitable for resource-constrained environments.

Language: English
DOI: 10.3389/frai.2026.1764367
Year: 2026
Published in: Frontiers in Artificial Intelligence 9 (2026), [30 pp.]
ISSN: 2624-8212

Funding: info:eu-repo/grantAgreement/ES/DGA-FSE/T40-23D
Type and form: Article (Final version)
Area (Department): Área Arquit.Tecnología Comput. (Dpto. Informát.Ingenie.Sistms.)
Area (Department): Área Lenguajes y Sistemas Inf. (Dpto. Informát.Ingenie.Sistms.)


Creative Commons: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.


Exported from SIDERAL (2026-06-03-11:04:50)





Record created 2026-06-03, last modified 2026-06-03

