TAZ-TFM-2025-016


Leveraging foundation models to improve weakly supervised segmentation models in wildlife monitoring applications

Borja Moreno, César
Murillo Arnal, Ana Cristina (dir.) ; Plou Izquierdo, Carlos (dir.)

Universidad de Zaragoza, EINA, 2025
Departamento de Informática e Ingeniería de Sistemas, Área de Ingeniería de Sistemas y Automática

Máster Universitario en Robótica, Gráficos y Visión por Computador

Abstract: Semantic segmentation is a widely studied visual recognition task that focuses on assigning a semantic label to each pixel in an image, offering a detailed understanding of the scene. However, training semantic segmentation models typically requires large amounts of high-quality pixel-level annotations. Such annotations are often scarce in many specific fields, since producing them demands significant human effort. Wildlife monitoring, and especially underwater imagery, is a clear example of a highly relevant domain where detailed annotations are limited. This lack of pixel-level annotations, and the large human effort required to produce them, motivates the development of automatic tools that ease the labelling process needed to train a semantic segmentation model for such a specific domain. In this work, we propose to leverage powerful foundation models to develop weak supervision strategies that generate dense, detailed labels from limited annotations. This approach could significantly reduce the time spent on manual labelling, making ecological research more efficient and helping researchers gain insight into the health and dynamics of wildlife environments. Specifically, we explore label augmentation for the following challenge: generating a "dense" semantic segmentation of an underwater image from a set of sparse point-level labels provided by an expert. Our approach builds on the segmentation capabilities of SAM2 and the feature extraction capabilities of DINOv2. It starts by propagating all sparse point-labels across the image, and then refines the propagated segmentation by predicting labels for the remaining unlabeled pixels. As a result, we generate a dense semantic segmentation from minimal annotations. Our experiments demonstrate that this approach outperforms the current state-of-the-art superpixel-based method in terms of label augmentation quality.
This improvement is particularly pronounced when starting from an extremely low number of point-labels (~0.01% of the image pixels) and when the resulting mask shapes are compared qualitatively. Furthermore, we validate our approach by training a semantic segmentation model, SegFormer, using only our augmented labels as supervision. The results show that this training strategy achieves performance competitive with training on dense ground-truth labels.
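The refinement step described above, assigning a label to every remaining unlabeled pixel based on dense per-pixel features, can be sketched as follows. This is a minimal illustration, not the thesis implementation: it assumes per-pixel feature vectors (e.g. extracted with DINOv2 and upsampled to image resolution) are already available as a NumPy array, and it uses a simple nearest-labeled-pixel rule in feature space; the function name and its cosine-similarity criterion are illustrative choices.

```python
import numpy as np

def refine_labels(features: np.ndarray, sparse_labels: np.ndarray) -> np.ndarray:
    """Fill in unlabeled pixels (-1) with the label of their most
    similar labeled pixel in feature space (cosine similarity).

    features:      (H, W, D) per-pixel feature vectors
    sparse_labels: (H, W) integer labels, -1 for unlabeled pixels
    returns:       (H, W) dense label map with no -1 entries
    """
    h, w, d = features.shape
    feats = features.reshape(-1, d).astype(np.float64)
    # L2-normalize so that a dot product equals cosine similarity
    feats /= np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8

    labels = sparse_labels.reshape(-1).copy()
    labeled = labels != -1

    # Similarity of every unlabeled pixel to every labeled pixel
    sims = feats[~labeled] @ feats[labeled].T          # (n_unlabeled, n_labeled)
    nearest = np.argmax(sims, axis=1)                  # index into labeled pixels
    labels[~labeled] = labels[labeled][nearest]
    return labels.reshape(h, w)
```

In practice the labeled set here would be the pixels already covered by the SAM2-propagated masks rather than the raw expert points, and the brute-force similarity matrix would be replaced by a batched or approximate nearest-neighbour search for full-resolution images.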

Type of academic work: Master's thesis (Trabajo Fin de Máster)

Creative Commons License





