Abstract: Traditional place recognition systems for robots struggle in unstructured, visually aliased environments such as planetary surfaces. Effective place recognition is essential for robust localization and mapping, which in turn significantly affects the performance of Simultaneous Localization and Mapping (SLAM) systems. This research aims to improve place recognition in such extreme environments by combining LiDAR and visual information. LiDAR is crucial here because it provides geometric data that complements visual appearance, yielding more expressive and robust 3D-grounded global features. We evaluated our methods on the Mt. Etna dataset and on a synthetic dataset generated with the OAISYS tool. Building on a comprehensive review of state-of-the-art place recognition systems, we developed UMF (Unifying Local and Global Multimodal Features with Transformers), a novel model designed specifically for place recognition under extreme aliasing. UMF integrates elements of the most advanced existing methods and improves performance in challenging environments by capturing the relationships between local and global context in both LiDAR and visual data. We explored two variants of UMF, which offer alternative ways of processing and exploiting fine local features. UMF outperforms other state-of-the-art methods on the evaluated place recognition tasks. These improved place recognition capabilities can contribute to more accurate and robust SLAM systems, enabling robots to better navigate and explore unstructured and aliased environments. This research highlights the importance of multimodal fusion, particularly the integration of LiDAR and visual data, in addressing place recognition in aliased and low-texture environments.
It also opens a promising line of research on unified multimodal fusion approaches for robotics, computer vision, and machine learning, with direct impact on SLAM and related fields.