000112212 001__ 112212
000112212 005__ 20220510091957.0
000112212 037__ $$aTAZ-TFM-2022-016
000112212 041__ $$aeng
000112212 1001_ $$aBernal Berdún, Edurne
000112212 24200 $$aModeling human visual behavior in dynamic 360º environments.
000112212 24500 $$aModeling human visual behavior in dynamic 360º environments.
000112212 260__ $$aZaragoza$$bUniversidad de Zaragoza$$c2022
000112212 506__ $$aby-nc-sa$$bCreative Commons$$c3.0$$uhttp://creativecommons.org/licenses/by-nc-sa/3.0/
000112212 520__ $$aVirtual reality (VR) is growing rapidly: advances in hardware, together with today's high computational power, are driving this technology, which has the potential to change the way people consume content and has been predicted to become the next big computing paradigm. However, although it has become accessible at a consumer level, much remains unknown about the grammar and visual language of this medium. Understanding and predicting how humans behave in virtual environments remains an open problem, since the visual behavior known for traditional screen-based content does not hold for immersive VR environments: in VR, the user has total control of the camera, so content creators cannot ensure where viewers' attention will be directed. This understanding of visual behavior, however, can be crucial in many applications, such as novel compression and rendering techniques, content design, or virtual tourism, among others.
Some works have been devoted to analyzing and modeling human visual behavior. Most of them have focused on identifying the regions of the content that attract observers' visual attention, resorting to saliency as a topological measure of which parts of a virtual scene might be of more interest. When consuming virtual reality content, which can be either static (i.e., 360° images) or dynamic (i.e., 360° videos), many factors affect human visual behavior; these are mainly associated with the scene shown in the VR video or image (e.g., colors, shapes, movements, etc.), but also depend on the subjects observing it (their mood and background, the task being performed, previous knowledge, etc.). All these variables affecting saliency make its prediction a challenging task.
This master's thesis presents a novel saliency prediction model for VR videos based on a deep learning (DL) approach. DL networks have shown outstanding results in image processing tasks, automatically inferring the most relevant information from images. The proposed model is the first to exploit the joint potential of convolutional (CNN) and recurrent (RNN) neural networks to extract and model the inherent spatio-temporal features of videos, employing RNNs to account for temporal information at the time of feature extraction, rather than to post-process spatial features as in previous works. It is also tailored to the particularities of dynamic VR videos, using spherical convolutions and a novel spherical loss function for saliency prediction that operate in 3D space rather than in traditional image space. To facilitate spatio-temporal learning, this work is also the first to include the optical flow between 360° frames for saliency prediction, since movement is known to be a highly salient feature in dynamic content. The proposed model was evaluated qualitatively and quantitatively, and shown to outperform state-of-the-art works.
Moreover, an exhaustive ablation study demonstrates the effectiveness of the different design decisions made throughout the development of the model.
000112212 521__ $$aMáster Universitario en Robótica, Gráficos y Visión por Computador
000112212 540__ $$aDerechos regulados por licencia Creative Commons
000112212 700__ $$aMartín Serrano, Daniel$$edir.
000112212 700__ $$aMasiá Corcoy, Belén$$edir.
000112212 7102_ $$aUniversidad de Zaragoza$$bInformática e Ingeniería de Sistemas$$cLenguajes y Sistemas Informáticos
000112212 8560_ $$f740233@unizar.es
000112212 8564_ $$s31723024$$uhttps://zaguan.unizar.es/record/112212/files/TAZ-TFM-2022-016.pdf$$yMemoria (eng)
000112212 909CO $$ooai:zaguan.unizar.es:112212$$pdriver$$ptrabajos-fin-master
000112212 950__ $$a
000112212 951__ $$adeposita:2022-05-10
000112212 980__ $$aTAZ$$bTFM$$cEINA
000112212 999__ $$a20220128104825.CREATION_DATE
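
The central architectural idea in the abstract above, recurrence applied while features are being extracted, with optical flow as an extra input channel, can be illustrated with a short sketch. The following Python code uses PyTorch (an assumption; the record does not name a framework), and every name in it (ConvLSTMCell, SaliencyNet, the channel sizes) is hypothetical rather than taken from the thesis. A faithful reimplementation would also replace the planar nn.Conv2d layers with the spherical convolutions the abstract describes.

import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Convolutional LSTM cell: the recurrence operates on feature maps,
    so temporal information is modeled during feature extraction rather
    than by post-processing spatial features."""
    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution computes all four gates at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)   # update cell memory
        h = o * torch.tanh(c)           # new hidden feature map
        return h, c

class SaliencyNet(nn.Module):
    """Toy encoder-recurrence-decoder: RGB frames plus 2-channel optical
    flow in, one saliency map per frame out."""
    def __init__(self, hid=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(5, hid, 3, stride=2, padding=1), nn.ReLU())
        self.rnn = ConvLSTMCell(hid, hid)
        self.dec = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(hid, 1, 3, padding=1))

    def forward(self, frames, flows):
        # frames: (B, T, 3, H, W) RGB; flows: (B, T, 2, H, W) optical flow.
        b, t, _, hgt, wdt = frames.shape
        h = frames.new_zeros(b, self.rnn.hid_ch, hgt // 2, wdt // 2)
        c = torch.zeros_like(h)
        outs = []
        for step in range(t):
            x = self.enc(torch.cat([frames[:, step], flows[:, step]], dim=1))
            h, c = self.rnn(x, (h, c))
            outs.append(torch.sigmoid(self.dec(h)))
        return torch.stack(outs, dim=1)  # (B, T, 1, H, W) per-frame saliency

For example, SaliencyNet()(torch.rand(1, 4, 3, 64, 128), torch.rand(1, 4, 2, 64, 128)) returns a (1, 4, 1, 64, 128) tensor of per-frame saliency maps.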
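The "novel spherical loss function" is only named in the abstract, so the sketch below shows one plausible ingredient rather than the actual loss: on an equirectangular grid, each pixel covers a solid angle proportional to sin(theta), with theta the polar angle, and a distribution-based saliency loss such as a KL divergence can weight pixels accordingly so that polar regions are not over-counted. Everything here, including the function name spherical_kl_loss, is an illustrative assumption.

import math
import torch

def spherical_kl_loss(pred, gt, eps=1e-8):
    """KL divergence between predicted and ground-truth saliency maps on an
    equirectangular grid, weighting each pixel by its solid angle.

    pred, gt: (B, H, W) non-negative saliency maps.
    """
    b, h, w = pred.shape
    # Polar angle of each pixel row, sampled at row centers in (0, pi).
    theta = (torch.arange(h, dtype=pred.dtype, device=pred.device) + 0.5) / h * math.pi
    # Solid angle per pixel is proportional to sin(theta) on this grid.
    weight = torch.sin(theta).view(1, h, 1)

    # Normalize both maps to probability distributions over the sphere.
    p = pred * weight
    p = p / (p.sum(dim=(1, 2), keepdim=True) + eps)
    q = gt * weight
    q = q / (q.sum(dim=(1, 2), keepdim=True) + eps)

    # KL(q || p): penalize ground-truth mass the prediction misses.
    return (q * torch.log((q + eps) / (p + eps))).sum(dim=(1, 2)).mean()

Calling spherical_kl_loss(torch.rand(2, 64, 128), torch.rand(2, 64, 128)) returns a scalar loss; the sine weighting is what makes it "spherical" compared with an ordinary per-pixel KL divergence in image space.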