000171046 001__ 171046
000171046 005__ 20260505142650.0
000171046 0247_ $$2doi$$a10.1109/TVCG.2026.3679918
000171046 0248_ $$2sideral$$a149157
000171046 037__ $$aART-2026-149157
000171046 041__ $$aeng
000171046 100__ $$aJiménez-Navarro, Daniel
000171046 245__ $$aOverdriving Visual Depth Perception via Sound Modulation in VR
000171046 260__ $$c2026
000171046 5060_ $$aAccess copy available to the general public$$fUnrestricted
000171046 5203_ $$aOur ability to perceive and navigate the spatial world is a cornerstone of human experience, relying on the integration of visual and auditory cues to form a coherent sense of depth and distance. In stereoscopic 3D vision, depth perception requires fixation of both eyes on a target object, which is achieved through vergence movements, with convergence for near objects and divergence for distant ones. In contrast, auditory cues provide complementary depth information through variations in loudness, interaural differences (IAD), and the frequency spectrum. We investigate the interaction between visual and auditory cues and examine how contradictory auditory information can overdrive visual depth perception in virtual reality (VR). When a new visual target appears, we introduce a spatial discrepancy between the visual and auditory cues: the visual target is shifted closer to the previously fixated object, while the corresponding sound localization is displaced in the opposite direction. By integrating these conflicting cues through multimodal processing, the resulting percept is biased toward the intended depth location. This audiovisual fusion counteracts depth compression, thus reducing the required vergence magnitude and enabling faster gaze retargeting. Such audio-driven depth enhancement may further help mitigate the vergence-accommodation conflict (VAC) in scenarios where physical depth must be compressed. In a series of psychophysical studies, we first assess the efficiency of depth overdriving for various VR-relevant combinations of initial fixations and shifted target locations, considering different scenarios of audio displacements and their loudness and frequency parameters. Next, we quantify the resulting speedup in gaze retargeting for target shifts that can be successfully overdriven by sound manipulations. Finally, we apply our method in a naturalistic VR scenario where user interface interactions with the scene show an extended perceptual depth.
000171046 536__ $$9info:eu-repo/grantAgreement/EC/H2020/101220555/EU/Predictive computational models for Adaptive Extended reality/PROXIE$$9This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No H2020 101220555-PROXIE$$9info:eu-repo/grantAgreement/ES/MICIU/PID2022-141766OB-I00
000171046 540__ $$9info:eu-repo/semantics/openAccess$$aby$$uhttps://creativecommons.org/licenses/by/4.0/deed.es
000171046 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000171046 700__ $$aGroth, Colin
000171046 700__ $$aPeng, Xi
000171046 700__ $$0(orcid)0009-0001-6833-6147$$aPina, Jorge$$uUniversidad de Zaragoza
000171046 700__ $$aSun, Qi
000171046 700__ $$aChakravarthula, Praneeth
000171046 700__ $$aMyszkowski, Karol
000171046 700__ $$aSeidel, Hans-Peter
000171046 700__ $$0(orcid)0000-0002-7796-3177$$aSerrano, Ana$$uUniversidad de Zaragoza
000171046 7102_ $$15007$$2570$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Lenguajes y Sistemas Inf.
000171046 773__ $$g(2026), [11 pp.]$$pIEEE trans. vis. comput. graph.$$tIEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS$$x1077-2626
000171046 8564_ $$s7580682$$uhttps://zaguan.unizar.es/record/171046/files/texto_completo.pdf$$yVersión publicada
000171046 8564_ $$s3616743$$uhttps://zaguan.unizar.es/record/171046/files/texto_completo.jpg?subformat=icon$$xicon$$yVersión publicada
000171046 909CO $$ooai:zaguan.unizar.es:171046$$particulos$$pdriver
000171046 951__ $$a2026-05-05-13:36:30
000171046 980__ $$aARTICLE