Abstract: This letter presents an open-vocabulary online 3D semantic mapping pipeline, which we denote by the acronym OVO. Given a sequence of posed RGB-D frames, we detect and track 3D segments, which we describe using CLIP vectors. These descriptors are computed by a novel CLIP merging method from the viewpoints in which each segment is observed. Notably, OVO has a significantly lower computational and memory footprint than offline baselines, while also achieving better segmentation metrics than both offline and online ones. Along with superior segmentation performance, we also show experimental results of our mapping contributions integrated with two different full SLAM backbones (Gaussian-SLAM and ORB-SLAM2), making this the first work to use a neural network to merge CLIP descriptors and to demonstrate end-to-end open-vocabulary online 3D mapping with loop closure.
Language: English
DOI: 10.1109/LRA.2025.3617736
Year: 2025
Published in: IEEE Robotics and Automation Letters 10, 11 (2025), 11745-11752
ISSN: 2377-3766
Funding: info:eu-repo/grantAgreement/ES/AEI/PID2024-155886NB-I00
Funding: info:eu-repo/grantAgreement/ES/DGA/T45-23R
Funding: info:eu-repo/grantAgreement/ES/MICINN/PID2021-127685NB-I00
Type and form: Article (Final version)
Area (Department): Área Ingen.Sistemas y Automát. (Dpto. Informát.Ingenie.Sistms.)