Abstract: To perform complex tasks in realistic human environments, robots need to be able to learn new concepts in the wild, incrementally, and through their interactions with humans. This article presents an end-to-end pipeline that learns object models incrementally during human-robot interaction (HRI). The proposed pipeline consists of three parts: 1) recognizing the interaction type; 2) detecting the object that the interaction targets; and 3) incrementally learning the object models from data recorded by the robot's sensors. Our main contributions lie in the target object detection, guided by the recognized interaction, and in the incremental object learning. The novelty of our approach is its focus on natural, heterogeneous, and multimodal HRIs to incrementally learn new object models. Throughout the article, we highlight the main challenges associated with this problem, such as a high degree of occlusion and clutter, domain change, low-resolution data, and interaction ambiguity. This article shows the benefits of using multiview approaches and of combining visual and language features, and in our experiments the approach outperforms standard baselines.

Language: English
DOI: 10.1109/TASE.2020.2980246
Year: 2020
Published in: IEEE Transactions on Automation Science and Engineering 17, 4 (2020), 1883-1900
ISSN: 1545-5955
JCR impact factor: 5.083 (2020)
JCR category: AUTOMATION & CONTROL SYSTEMS, rank: 16 / 63 = 0.254 (2020) - Q2 - T1
SCImago impact factor: 1.314 - Electrical and Electronic Engineering (Q1) - Control and Systems Engineering (Q1)