000165093 001__ 165093
000165093 005__ 20251212165957.0
000165093 0247_ $$2doi$$a10.1016/j.fsidi.2025.301930
000165093 0248_ $$2sideral$$a146651
000165093 037__ $$aART-2025-146651
000165093 041__ $$aeng
000165093 100__ $$aHuici, Daniel$$uUniversidad de Zaragoza
000165093 245__ $$aAn extensible and scalable system for hash lookup and approximate similarity search with similarity digest algorithms
000165093 260__ $$c2025
000165093 5060_ $$aAccess copy available to the general public$$fUnrestricted
000165093 5203_ $$aEfficient management and analysis of large volumes of digital data has emerged as a major challenge in the field of digital forensics. To quickly identify and analyze relevant artifacts within large datasets, we introduce APOTHEOSIS, an approximate similarity search system designed for scalability and efficiency. Our system integrates approximate search techniques (which allow searching for a match on a close value) with Similarity Digest Algorithms (SDA; which capture common features between similar elements), using a space-saving radix tree and a graph-based hierarchical navigable small world structure to perform fast approximate nearest neighbor searches. We demonstrate the effectiveness and versatility of our system through two key case studies: first, in plagiarism detection, demonstrating the effectiveness of our system in identifying similar or duplicate documents within a large source code dataset; then, in memory artifact detection, showing its scalability and performance in processing large-scale forensic data collected from various versions of Microsoft Windows. Our comprehensive evaluation shows that APOTHEOSIS not only efficiently handles large datasets, but also provides a way to evaluate the performance of various SDA and their approximate similarity search in different forensic scenarios.
000165093 536__ $$9info:eu-repo/grantAgreement/ES/AEI/PID2020-113903RB-I00$$9info:eu-repo/grantAgreement/ES/DGA/T21-23R$$9info:eu-repo/grantAgreement/ES/DGA/T42-23R$$9info:eu-repo/grantAgreement/ES/MCIU/PID2023-151467OA-I00$$9info:eu-repo/grantAgreement/EUR/MICINN/TED2021-131115A-I00
000165093 540__ $$9info:eu-repo/semantics/openAccess$$aby-nc-nd$$uhttps://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
000165093 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000165093 700__ $$0(orcid)0000-0001-7982-0359$$aRodríguez, Ricardo J.$$uUniversidad de Zaragoza
000165093 700__ $$0(orcid)0000-0002-7462-0080$$aMena, Eduardo$$uUniversidad de Zaragoza
000165093 7102_ $$15007$$2570$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Lenguajes y Sistemas Inf.
000165093 773__ $$g53 (2025), 301930 [9 pp.]$$pForensic sci. int. digital invest.$$tForensic science international. Digital investigation$$x2666-2825
000165093 8564_ $$s3594962$$uhttps://zaguan.unizar.es/record/165093/files/texto_completo.pdf$$yPublished version
000165093 8564_ $$s2658292$$uhttps://zaguan.unizar.es/record/165093/files/texto_completo.jpg?subformat=icon$$xicon$$yPublished version
000165093 909CO $$ooai:zaguan.unizar.es:165093$$particulos$$pdriver
000165093 951__ $$a2025-12-12-14:42:32
000165093 980__ $$aARTICLE