Abstract
Classifying malware families is challenging because malware changes over time, a phenomenon known as concept drift. In this research, we classify Windows PE malware families using static analysis of raw opcode sequences. By leveraging Convolutional Neural Networks (CNNs) to extract discriminative features from these sequences, our approach achieves classification accuracies of 98.20% and 89.55% on the Microsoft Malware Classification Challenge and BODMAS datasets, respectively. We also conduct a temporal analysis on BODMAS over a 13-month period to observe how malware families evolve and to identify periods in which our model’s accuracy decreases. We implement a retraining strategy and observe how retraining the model with new data helps it adapt to new malware patterns. We further examine the impact of packed malware and of different types of packers on the model’s performance. Our findings indicate that packed malware significantly reduces the model’s accuracy, with some packers having a more pronounced impact than others. These results underscore the importance of regular model updates and specialized handling of packed malware to maintain robust detection capabilities.
Original language | English |
---|---|
Title of host publication | Proceedings of the Conference on Applied Machine Learning for Information Security (CAMLIS 2024) |
Publisher | IEEE Xplore |
Publication status | Accepted - 04 Aug 2024 |
Event | Conference on Applied Machine Learning for Information Security, Arlington, United States. Duration: 24 Oct 2024 → 25 Oct 2024. Conference number: 2024. https://www.camlis.org/ |
Conference
Conference | Conference on Applied Machine Learning for Information Security |
---|---|
Abbreviated title | CAMLIS |
Country/Territory | United States |
City | Arlington |
Period | 24/10/2024 → 25/10/2024 |
Internet address | https://www.camlis.org/ |