<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
<record>
  <controlfield tag="001">151313</controlfield>
  <controlfield tag="005">20251017144618.0</controlfield>
  <datafield tag="024" ind1="7" ind2=" ">
    <subfield code="2">doi</subfield>
    <subfield code="a">10.1016/j.fsidi.2021.301120</subfield>
  </datafield>
  <datafield tag="024" ind1="8" ind2=" ">
    <subfield code="2">sideral</subfield>
    <subfield code="a">124080</subfield>
  </datafield>
  <datafield tag="037" ind1=" " ind2=" ">
    <subfield code="a">ART-2021-124080</subfield>
  </datafield>
  <datafield tag="041" ind1=" " ind2=" ">
    <subfield code="a">eng</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Martín-Pérez, M.</subfield>
    <subfield code="u">Universidad de Zaragoza</subfield>
    <subfield code="0">(orcid)0000-0003-1666-5325</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">Bringing order to approximate matching: Classification and attacks on similarity digest algorithms</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021</subfield>
  </datafield>
  <datafield tag="506" ind1="0" ind2=" ">
    <subfield code="a">Access copy available to the general public</subfield>
    <subfield code="f">Unrestricted</subfield>
  </datafield>
  <datafield tag="520" ind1="3" ind2=" ">
    <subfield code="a">Fuzzy hashing or similarity hashing (a.k.a. bytewise approximate matching) converts digital artifacts into an intermediate representation to allow an efficient (fast) identification of similar objects, e.g., for blacklisting. They gained a lot of popularity over the past decade with new algorithms being developed and released to the digital forensics community. When releasing algorithms (e.g., as part of a scientific article), they are frequently compared with other algorithms to outline the benefits and sometimes also the weaknesses of the proposed approach. However, given the wide variety of algorithms and approaches, it is impossible to provide direct comparisons with all existing algorithms. In this paper, we present the first classification of approximate matching algorithms which allows an easier description and comparisons. Therefore, we first reviewed existing literature to understand the techniques various algorithms use and to familiarize ourselves with the common terminology. Our findings allowed us to develop a categorization relying heavily on the terminology proposed by NIST SP 800-168. In addition to the categorization, this article presents an abstract set of attacks against algorithms and why they are feasible. Lastly, we detail the characteristics needed to build robust algorithms to prevent attacks. We believe that this article helps newcomers, practitioners, and experts alike to better compare algorithms, understand their potential, as well as characteristics and implications they may have on forensic investigations.</subfield>
  </datafield>
  <datafield tag="536" ind1=" " ind2=" ">
    <subfield code="9">info:eu-repo/grantAgreement/ES/DGA/T21-20R-DISCO</subfield>
    <subfield code="9">info:eu-repo/grantAgreement/ES/MICIU/Medrese-RTI2018-098543-B-I00</subfield>
    <subfield code="9">info:eu-repo/grantAgreement/ES/MINECO-INCIBE/INCIBEC-2015-02486</subfield>
    <subfield code="9">info:eu-repo/grantAgreement/ES/MINECO-INCIBE/INCIBEI-2015-27300</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="9">info:eu-repo/semantics/openAccess</subfield>
    <subfield code="a">by-nc-nd</subfield>
    <subfield code="u">https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es</subfield>
  </datafield>
  <datafield tag="590" ind1=" " ind2=" ">
    <subfield code="a">1.805</subfield>
    <subfield code="b">2021</subfield>
  </datafield>
  <datafield tag="591" ind1=" " ind2=" ">
    <subfield code="a">COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS</subfield>
    <subfield code="b">94 / 112 = 0.839</subfield>
    <subfield code="c">2021</subfield>
    <subfield code="d">Q4</subfield>
    <subfield code="e">T3</subfield>
  </datafield>
  <datafield tag="591" ind1=" " ind2=" ">
    <subfield code="a">COMPUTER SCIENCE, INFORMATION SYSTEMS</subfield>
    <subfield code="b">131 / 163 = 0.804</subfield>
    <subfield code="c">2021</subfield>
    <subfield code="d">Q4</subfield>
    <subfield code="e">T3</subfield>
  </datafield>
  <datafield tag="592" ind1=" " ind2=" ">
    <subfield code="a">1.23</subfield>
    <subfield code="b">2021</subfield>
  </datafield>
  <datafield tag="593" ind1=" " ind2=" ">
    <subfield code="a">Computer Science Applications</subfield>
    <subfield code="c">2021</subfield>
    <subfield code="d">Q1</subfield>
  </datafield>
  <datafield tag="593" ind1=" " ind2=" ">
    <subfield code="a">Pathology and Forensic Medicine</subfield>
    <subfield code="c">2021</subfield>
    <subfield code="d">Q1</subfield>
  </datafield>
  <datafield tag="593" ind1=" " ind2=" ">
    <subfield code="a">Medical Laboratory Technology</subfield>
    <subfield code="c">2021</subfield>
    <subfield code="d">Q1</subfield>
  </datafield>
  <datafield tag="593" ind1=" " ind2=" ">
    <subfield code="a">Information Systems</subfield>
    <subfield code="c">2021</subfield>
    <subfield code="d">Q1</subfield>
  </datafield>
  <datafield tag="594" ind1=" " ind2=" ">
    <subfield code="a">5.0</subfield>
    <subfield code="b">2021</subfield>
  </datafield>
  <datafield tag="655" ind1=" " ind2="4">
    <subfield code="a">info:eu-repo/semantics/article</subfield>
    <subfield code="v">info:eu-repo/semantics/publishedVersion</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Rodríguez, R.J.</subfield>
    <subfield code="u">Universidad de Zaragoza</subfield>
    <subfield code="0">(orcid)0000-0001-7982-0359</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Breitinger, F.</subfield>
  </datafield>
  <datafield tag="710" ind1="2" ind2=" ">
    <subfield code="1">5007</subfield>
    <subfield code="2">570</subfield>
    <subfield code="a">Universidad de Zaragoza</subfield>
    <subfield code="b">Dpto. Informát.Ingenie.Sistms.</subfield>
    <subfield code="c">Área Lenguajes y Sistemas Inf.</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="g">36, Suplem. (2021), 301120 [9 pp.]</subfield>
    <subfield code="p">Forensic sci. int. digital invest.</subfield>
    <subfield code="t">Forensic science international. Digital investigation</subfield>
    <subfield code="x">2666-2825</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">334204</subfield>
    <subfield code="u">http://zaguan.unizar.es/record/151313/files/texto_completo.pdf</subfield>
    <subfield code="y">Versión publicada</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2676072</subfield>
    <subfield code="u">http://zaguan.unizar.es/record/151313/files/texto_completo.jpg?subformat=icon</subfield>
    <subfield code="x">icon</subfield>
    <subfield code="y">Versión publicada</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="o">oai:zaguan.unizar.es:151313</subfield>
    <subfield code="p">articulos</subfield>
    <subfield code="p">driver</subfield>
  </datafield>
  <datafield tag="951" ind1=" " ind2=" ">
    <subfield code="a">2025-10-17-14:20:38</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">ARTICLE</subfield>
  </datafield>
</record>
</collection>