Detection of algorithmically generated malicious domain names using masked N-grams

Selvi, J.; Rodríguez, R.J.; Soria-Olivas, E.

doi:10.1016/j.eswa.2019.01.050

Selvi, J. ; Rodríguez, R.J. (Universidad de Zaragoza) ; Soria-Olivas, E.

Abstract: Malware detection is a challenge that has grown in complexity over the last few years. A widely adopted strategy is to detect malware by analyzing network traffic and capturing its communications with command and control (C&C) servers. However, because anti-malware companies maintain blacklists of known malicious locations, some malware families have shifted to a stealthier communication strategy. Instead of using static IP addresses or domain names, they algorithmically generate domain names that may host their C&C servers. Blacklist approaches therefore become ineffective, since the number of domain names to block is large and changes over time. In this paper, we introduce a machine learning approach using Random Forest that relies on purely lexical features of the domain names to detect algorithmically generated domains. In particular, we propose using masked N-grams, together with other statistics obtained from the domain name. Furthermore, we provide a dataset built for experimentation that contains regular and algorithmically generated domain names coming from different malware families. We also classify these families according to their type of domain generation algorithm. Our findings show that masked N-grams provide detection accuracy comparable to that of other existing techniques, but with much better performance.
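The abstract names masked N-grams but does not define them. The following is a minimal illustrative sketch, not the authors' implementation. It assumes that "masking" means replacing every character of a domain label with a symbol for its character class (here `c` for consonant, `v` for vowel, `n` for digit, `s` for anything else) and then counting N-grams over the masked string. The function names `mask_domain` and `masked_ngrams`, the symbol set, and the default `n = 2` are all hypothetical choices made for this example.

```python
from collections import Counter

VOWELS = set("aeiou")
DIGITS = set("0123456789")


def mask_domain(label: str) -> str:
    """Map each character to a class symbol (assumed masking scheme).

    c = consonant, v = vowel, n = digit, s = any other character.
    """
    masked = []
    for ch in label.lower():
        if ch in VOWELS:
            masked.append("v")
        elif ch.isascii() and ch.isalpha():
            masked.append("c")
        elif ch in DIGITS:
            masked.append("n")
        else:
            masked.append("s")
    return "".join(masked)


def masked_ngrams(label: str, n: int = 2) -> Counter:
    """Count the N-grams of the masked form of a domain label."""
    masked = mask_domain(label)
    return Counter(masked[i:i + n] for i in range(len(masked) - n + 1))


if __name__ == "__main__":
    # Compare a dictionary-like label with a random-looking one.
    for label in ("google", "x3k9qzt1"):
        print(label, mask_domain(label), dict(masked_ngrams(label)))
```

Under this assumption the feature space is small: four symbols give at most 4^n distinct N-grams, so the counts could be fed to a classifier such as a Random Forest as a short fixed-length vector. Whether this matches the paper's exact feature set cannot be determined from the abstract alone.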
Language: English
DOI: 10.1016/j.eswa.2019.01.050
Year: 2019
Published in: Expert Systems with Applications 124 (2019), 156-163
ISSN: 0957-4174
JCR impact factor: 5.452 (2019)
JCR category: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE rank: 21 / 136 = 0.154 (2019) - Q1 - T1
JCR category: OPERATIONS RESEARCH & MANAGEMENT SCIENCE rank: 2 / 83 = 0.024 (2019) - Q1 - T1
JCR category: ENGINEERING, ELECTRICAL & ELECTRONIC rank: 32 / 266 = 0.12 (2019) - Q1 - T1
SCImago impact factor: 1.494 - Artificial Intelligence (Q1) - Engineering (miscellaneous) (Q1) - Computer Science Applications (Q1)

Funding: info:eu-repo/grantAgreement/ES/DGA/T21-17R-DISCO
Type and form: Article (PostPrint)
Area (Department): Área Lenguajes y Sistemas Inf. (Dpto. Informát.Ingenie.Sistms.)
Exported from SIDERAL (2020-07-16-09:17:48)

Record created 2020-02-04, last modified 2020-07-16


