000152275 001__ 152275
000152275 005__ 20250401114420.0
000152275 037__ $$aTAZ-TFM-2025-016
000152275 041__ $$aeng
000152275 1001_ $$aBorja Moreno, César
000152275 24200 $$aLeveraging foundation models to improve weakly supervised segmentation models in wildlife monitoring applications
000152275 24500 $$aLeveraging foundation models to improve weakly supervised segmentation models in wildlife monitoring applications
000152275 260__ $$aZaragoza$$bUniversidad de Zaragoza$$c2025
000152275 506__ $$aby-nc-sa$$bCreative Commons$$c3.0$$uhttp://creativecommons.org/licenses/by-nc-sa/3.0/
000152275 520__ $$aSemantic segmentation is a widely studied visual recognition task that focuses on assigning a semantic label to each pixel in an image, offering a detailed understanding of the scene. However, training semantic segmentation models typically requires large amounts of high-quality pixel-level annotations. These annotations are often limited in many specific fields, as they require significant human effort. Wildlife monitoring, and especially underwater imagery, is a clear example of a highly relevant domain where such detailed annotations are scarce. This lack of pixel-level annotations and the huge human effort required to produce them motivate the need for automatic tools that ease the labelling process required to train a semantic segmentation model for such a specific domain. In this work, we propose to leverage powerful foundation models to develop weak supervision strategies that generate dense and detailed labels from limited annotations. This approach could significantly reduce the time spent on manual labelling, making ecological research more efficient and helping researchers analyze the health and dynamics of wildlife environments. Specifically, we explore label augmentation, focusing on the following challenge: generating a ``dense'' semantic segmentation of an underwater image from a set of sparse point-level labels provided by an expert. Our approach builds upon the segmentation capabilities of SAM2 and the feature extraction capabilities of DINOv2. It starts by propagating all sparse point-labels across the image, followed by a refinement of the propagated segmentation that predicts labels for the remaining unlabeled pixels. As a result, we generate a dense semantic segmentation from minimal annotations. The experiments demonstrate that our approach outperforms the current state-of-the-art superpixel-based method in terms of label augmentation quality. This improvement is particularly notable when starting from an extremely low number of point-labels ($\sim$0.01\% of image pixels) and when qualitatively comparing the mask shapes. Furthermore, we validate our approach by training a semantic segmentation model, SegFormer, using only our augmented labels as supervision. The results show that our SegFormer training strategy achieves performance competitive with training on dense ground-truth labels.
000152275 521__ $$aMáster Universitario en Robótica, Gráficos y Visión por Computador
000152275 540__ $$aRights regulated by Creative Commons license
000152275 691__ $$a14
000152275 692__ $$aThis work contributes to improving scene understanding in underwater environments, providing a useful tool for experts to efficiently process images of these environments.
000152275 700__ $$aMurillo Arnal, Ana Cristina$$edir.
000152275 700__ $$aPlou Izquierdo, Carlos$$edir.
000152275 7102_ $$aUniversidad de Zaragoza$$bInformática e Ingeniería de Sistemas$$cIngeniería de Sistemas y Automática
000152275 8560_ $$f800675@unizar.es
000152275 8564_ $$s84978726$$uhttps://zaguan.unizar.es/record/152275/files/TAZ-TFM-2025-016.pdf$$yMemoria (eng)
000152275 909CO $$ooai:zaguan.unizar.es:152275$$pdriver$$ptrabajos-fin-master
000152275 950__ $$a
000152275 951__ $$adeposita:2025-04-01
000152275 980__ $$aTAZ$$bTFM$$cEINA
000152275 999__ $$a20250124191440.CREATION_DATE