Toward Optimal LSTM Neural Networks for Detecting Algorithmically Generated Domain Names

Selvi J.; Rodríguez Fernández R.J.; Soria-Olivas E.
doi:10.1109/ACCESS.2021.3111307
000151353 001__ 151353
000151353 005__ 20250307114715.0
000151353 0247_ $$2doi$$a10.1109/ACCESS.2021.3111307
000151353 0248_ $$2sideral$$a127150
000151353 037__ $$aART-2021-127150
000151353 041__ $$aeng
000151353 100__ $$aSelvi J.
000151353 245__ $$aToward Optimal LSTM Neural Networks for Detecting Algorithmically Generated Domain Names
000151353 260__ $$c2021
000151353 5060_ $$aAccess copy available to the general public$$fUnrestricted
000151353 5203_ $$aMalware detection has become particularly challenging over the last decade. A common strategy for detecting malware is to scan network traffic for malicious connections between infected devices and their command and control (CC) servers. However, malware developers are aware of this detection method and have begun to incorporate new strategies to go unnoticed. In particular, instead of using static Internet Protocol addresses or ordinary domain names pointing to their CC servers, they generate domain names algorithmically. A domain generation algorithm reduces the effectiveness of domain blacklisting, because the large number of generated domain names that must be blocked greatly increases the size of the blacklist. In this paper, we study different Long Short-Term Memory (LSTM) neural network hyperparameters to find the best network configuration for detecting algorithmically generated domain names. In particular, we aim to determine whether the (complex) feature engineering required by other machine learning techniques, such as Random Forest, can be avoided. To this end, we conducted a comparative analysis of how different network sizes and configurations affect network performance metrics. Our results show an accuracy of 97.62% and an area under the receiver operating characteristic curve of 0.9956 on the test dataset, indicating that good classification results can be obtained while avoiding the feature engineering process and additional readjustments required by other machine learning techniques.
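The abstract's central idea is that an LSTM can classify raw domain-name strings directly, without hand-crafted features. The following is a minimal, untrained sketch in plain Python of the forward computation such a model performs; it is not the authors' implementation. The character vocabulary, the left-padding scheme, the embedding and hidden sizes, and every name in the code (`encode_domain`, `TinyLSTMClassifier`, `predict_proba`) are illustrative assumptions, and because the weights are random the output score carries no meaning until the model is trained on labeled benign and generated domains.

```python
import math
import random

# Assumed character vocabulary for domain names (illustrative, not from the paper).
# Id 0 is reserved for padding and for characters outside the vocabulary.
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789-."
CHAR_TO_ID = {ch: i + 1 for i, ch in enumerate(VOCAB)}


def encode_domain(domain: str, max_len: int = 64) -> list:
    """Hypothetical helper: map a raw domain name to a fixed-length id sequence.

    No hand-crafted features (entropy, n-gram counts, etc.) are computed;
    the characters themselves are the model input.
    """
    ids = [CHAR_TO_ID.get(ch, 0) for ch in domain.lower()[:max_len]]
    return [0] * (max_len - len(ids)) + ids  # left-pad with zeros


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM over character embeddings with a logistic output.

    Weights are random (untrained); the class only illustrates the forward pass.
    """

    def __init__(self, vocab_size: int, embed_dim: int = 8, hidden_dim: int = 16, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.embed = mat(vocab_size, embed_dim)   # one embedding row per character id
        self.W = mat(4 * hidden_dim, embed_dim)   # input-to-gate weights (i, f, g, o stacked)
        self.U = mat(4 * hidden_dim, hidden_dim)  # hidden-to-gate weights
        self.b = [0.0] * (4 * hidden_dim)         # gate biases
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def predict_proba(self, ids: list) -> float:
        """Return a score in (0, 1), read as P(domain is algorithmically generated)."""
        H = self.hidden_dim
        h = [0.0] * H  # hidden state
        c = [0.0] * H  # cell state
        for t in ids:
            if t == 0:
                continue  # skip padding so it does not alter the state (simple masking)
            x = self.embed[t]
            z = [
                self.b[k]
                + sum(w * xi for w, xi in zip(self.W[k], x))
                + sum(u * hj for u, hj in zip(self.U[k], h))
                for k in range(4 * H)
            ]
            i = [sigmoid(v) for v in z[0:H]]            # input gate
            f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
            g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell values
            o = [sigmoid(v) for v in z[3 * H:]]         # output gate
            c = [f[j] * c[j] + i[j] * g[j] for j in range(H)]
            h = [o[j] * math.tanh(c[j]) for j in range(H)]
        logit = self.b_out + sum(w * hj for w, hj in zip(self.w_out, h))
        return sigmoid(logit)


model = TinyLSTMClassifier(vocab_size=len(VOCAB) + 1)
for name in ("example.com", "xjq7kd0pzt.net"):
    # Untrained weights: both scores should sit close to 0.5 and carry no meaning yet.
    print(name, round(model.predict_proba(encode_domain(name)), 3))
```

In a real system a deep learning framework would learn the embedding, LSTM, and output weights; the network sizes and configurations the abstract says were compared would plausibly map to choices such as `embed_dim`, `hidden_dim`, and the number of stacked LSTM layers in a sketch like this one.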
000151353 536__ $$9info:eu-repo/grantAgreement/ES/DGA/T21-20R-DISCO$$9info:eu-repo/grantAgreement/ES/MICIU/Medrese-RTI2018-098543-B-I00$$9info:eu-repo/grantAgreement/ES/UZ/JIUZ-2020-TIC-08
000151353 540__ $$9info:eu-repo/semantics/openAccess$$aby-nc-nd$$uhttp://creativecommons.org/licenses/by-nc-nd/3.0/es/
000151353 590__ $$a3.476$$b2021
000151353 591__ $$aCOMPUTER SCIENCE, INFORMATION SYSTEMS$$b79 / 163 = 0.485$$c2021$$dQ2$$eT2
000151353 591__ $$aTELECOMMUNICATIONS$$b43 / 92 = 0.467$$c2021$$dQ2$$eT2
000151353 591__ $$aENGINEERING, ELECTRICAL & ELECTRONIC$$b105 / 274 = 0.383$$c2021$$dQ2$$eT2
000151353 592__ $$a0.927$$b2021
000151353 593__ $$aEngineering (miscellaneous)$$c2021$$dQ1
000151353 593__ $$aComputer Science (miscellaneous)$$c2021$$dQ1
000151353 594__ $$a6.7$$b2021
000151353 655_4 $$ainfo:eu-repo/semantics/article$$vinfo:eu-repo/semantics/publishedVersion
000151353 700__ $$0(orcid)0000-0001-7982-0359$$aRodríguez Fernández R.J.$$uUniversidad de Zaragoza
000151353 700__ $$aSoria-Olivas E.
000151353 7102_ $$15007$$2570$$aUniversidad de Zaragoza$$bDpto. Informát.Ingenie.Sistms.$$cÁrea Lenguajes y Sistemas Inf.
000151353 773__ $$g9 (2021), 126446-126456$$pIEEE Access$$tIEEE Access$$x2169-3536
000151353 8564_ $$s705699$$uhttps://zaguan.unizar.es/record/151353/files/texto_completo.pdf$$yPublished version
000151353 909CO $$ooai:zaguan.unizar.es:151353$$particulos$$pdriver
000151353 951__ $$a2025-03-07-09:32:53
000151353 980__ $$aARTICLE