Abstract: Introduction:
Extracting value from industrial time-series databases such as InfluxDB 2.0 requires expertise in specialized query languages (InfluxQL, Flux) that domain experts typically lack, and no labeled corpora exist for translating natural language into them.
Methods:
We present a modular system that translates questions written in Spanish into executable InfluxQL and Flux queries over an InfluxDB 2.0 instance deployed in a real agro-food production plant operating under a self-consumption energy scheme. The system comprises four components: a semantic layer implemented as a knowledge graph that encodes the InfluxDB schema and its correspondence with the plant's physical components; a hierarchical entity-linking module; a lightweight language model fine-tuned for NL-to-query generation; and a query validation and sanitization module. To obtain training data, we develop a fully automated pipeline that distills a synthetic dataset from a large teacher model, which is given contextual retrieval over the official InfluxDB 2.0 documentation; each candidate query undergoes parsing, abstract syntax tree (AST) extraction, and semantic checking against the knowledge graph, and only validated samples are retained. The resulting corpus is used to fine-tune compact Small Language Models (SLMs) in two stages: domain-level conditioning followed by task-specific instruction tuning.
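As an illustration of the filtering step described above, the following minimal sketch keeps a candidate query only if it parses and references schema items that are known. It is not the system's implementation: the `SCHEMA` dictionary is a hypothetical stand-in for the knowledge graph, the measurement and field names are invented, and a narrow regular expression replaces real parsing and AST extraction.

```python
import re

# Hypothetical schema fragment standing in for the knowledge graph:
# measurement name -> set of field keys. All names are illustrative only.
SCHEMA = {
    "inverter": {"active_power", "voltage"},
    "grid_meter": {"import_energy", "export_energy"},
}

# Deliberately narrow toy grammar: SELECT <fields> FROM <measurement> [...]
# (an assumption; a real pipeline would use a proper InfluxQL/Flux parser).
_SELECT = re.compile(
    r'^\s*SELECT\s+(?P<fields>.+?)\s+FROM\s+"?(?P<meas>\w+)"?(?:\s|$)',
    re.IGNORECASE | re.DOTALL,
)
_QUOTED_IDENT = re.compile(r'"(\w+)"')


def check_candidate(query: str, schema: dict) -> bool:
    """Return True only if the query parses and uses known schema items."""
    match = _SELECT.match(query)
    if match is None:  # stage 1: rejected by the toy parser
        return False
    measurement = match.group("meas")
    if measurement not in schema:  # stage 2: unknown measurement
        return False
    fields = set(_QUOTED_IDENT.findall(match.group("fields")))
    return fields <= schema[measurement]  # stage 3: every field must exist


def filter_candidates(candidates, schema=SCHEMA):
    """Retain only the candidates that pass every check."""
    return [q for q in candidates if check_candidate(q, schema)]
```

Under these assumptions, a query over an unknown measurement or field is discarded before it can enter the training corpus, which is the property the validation step is meant to guarantee.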
Results:
Performance is evaluated on a manually curated suite of 56 Spanish queries, evenly distributed across seven operationally relevant query families (eight per family). Compact models reliably generate syntactically valid queries, but functional evaluation against the live instance reveals a three-layer pattern (high parser validity, moderate execution success, and lower result correctness) that locates the residual gap at the semantic layer.
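The three layers can be read as nested success rates over the same test suite. The sketch below shows one way to compute them, assuming each generated query has been labeled with three booleans; the `Outcome` record and the `layer_rates` function are hypothetical names, and any inputs passed to them here would be toy values, not results from the evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-query evaluation record."""
    parses: bool    # accepted by the query parser
    executes: bool  # runs against the database without error
    correct: bool   # returns the expected result


def layer_rates(outcomes):
    """Compute nested rates: a query counts at a layer only if it passed all earlier ones."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("at least one outcome is required")
    parsed = sum(o.parses for o in outcomes)
    executed = sum(o.parses and o.executes for o in outcomes)
    correct = sum(o.parses and o.executes and o.correct for o in outcomes)
    return {
        "parser_validity": parsed / total,
        "execution_success": executed / total,
        "result_correctness": correct / total,
    }
```

Because each layer is conditioned on the previous one, the three rates can only decrease from left to right, and the drop between adjacent layers indicates where failures concentrate.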