TY - GEN
T1 - Malware Classification by Deep Learning Using Characteristics of Hash Functions
AU - Baba, Takahiro
AU - Baba, Kensuke
AU - Yamauchi, Toshihiro
N1 - Funding Information:
Acknowledgments. A part of this research is supported by JST, PRESTO Grant Number JPMJPR1938 and JSPS Grants-in-Aid for Scientific Research JP19H05579.
Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - As the Internet develops, the number of Internet of Things (IoT) devices increases. Simultaneously, the risk of IoT devices being infected with malware also increases. Thus, malware detection has become an important issue. Dynamic analysis logs are effective at detecting malware, but it takes time to collect a large amount of data because the malware must be executed at least once before the logs can be collected. Moreover, dynamic analysis logs are affected by external factors such as the execution environment. A malware detection method that uses a static property analysis log could solve these problems. In this study, deep learning (DL) was used as a machine learning method because DL is effective for large-scale data and can automatically extract features. Research has been conducted on malware detection using static properties of portable executable (PE) files, establishing that such detection is possible. However, research on malware detection using hash functions such as Fuzzy hash and peHash is lacking. Therefore, we investigated the characteristics of hash values in malware classification. Moreover, when the surface analysis log is viewed in chronological order, the data are considered to have concept drift characteristics. Therefore, we compared malware detection performance using data with the concept drift property. We found that the hash function could be used to prevent performance degradation even with concept drift data. In an experiment combining PE surface information and hash values, the highest performance was achieved on certain concept drift data.
AB - As the Internet develops, the number of Internet of Things (IoT) devices increases. Simultaneously, the risk of IoT devices being infected with malware also increases. Thus, malware detection has become an important issue. Dynamic analysis logs are effective at detecting malware, but it takes time to collect a large amount of data because the malware must be executed at least once before the logs can be collected. Moreover, dynamic analysis logs are affected by external factors such as the execution environment. A malware detection method that uses a static property analysis log could solve these problems. In this study, deep learning (DL) was used as a machine learning method because DL is effective for large-scale data and can automatically extract features. Research has been conducted on malware detection using static properties of portable executable (PE) files, establishing that such detection is possible. However, research on malware detection using hash functions such as Fuzzy hash and peHash is lacking. Therefore, we investigated the characteristics of hash values in malware classification. Moreover, when the surface analysis log is viewed in chronological order, the data are considered to have concept drift characteristics. Therefore, we compared malware detection performance using data with the concept drift property. We found that the hash function could be used to prevent performance degradation even with concept drift data. In an experiment combining PE surface information and hash values, the highest performance was achieved on certain concept drift data.
KW - Deep learning
KW - Fuzzy hash
KW - Malware detection
KW - PE file
KW - peHash
UR - http://www.scopus.com/inward/record.url?scp=85128704778&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85128704778&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-99587-4_40
DO - 10.1007/978-3-030-99587-4_40
M3 - Conference contribution
AN - SCOPUS:85128704778
SN - 9783030995867
T3 - Lecture Notes in Networks and Systems
SP - 480
EP - 491
BT - Advanced Information Networking and Applications - Proceedings of the 36th International Conference on Advanced Information Networking and Applications AINA-2022
A2 - Barolli, Leonard
A2 - Hussain, Farookh
A2 - Enokido, Tomoya
PB - Springer Science and Business Media Deutschland GmbH
T2 - 36th International Conference on Advanced Information Networking and Applications, AINA 2022
Y2 - 13 April 2022 through 15 April 2022
ER -