Abstract
The continuing fight against intentionally malicious software has, to date, favoured the proliferators of malware. Signature detection methods are increasingly impotent against rapidly evolving obfuscation techniques. Research has recently focussed on the low-level opcode analysis of disassembled executable programs, both statically and dynamically. While able to detect malware, static analysis often still cannot unravel obfuscated code; dynamic approaches allow investigators to reveal the run-time code. Old and inadequately sampled datasets have limited how far much of the existing body of research can be generalised. This work presents a dynamic opcode analysis approach to malware detection, applying machine learning techniques to the largest dataset of its kind, both in terms of breadth (610-100k features) and depth (48k samples). N-gram analysis of opcode sequences from n=1..3 was applied as a means of enhancing the feature set. Feature selection was then investigated to tackle the resulting feature explosion, which produced more than 100,000 features in some cases. As the earliest detection of malware is the most favourable, run-length, i.e. the number of recorded opcodes in a trace, was examined to find the optimal capture size. This research found that dynamic opcode analysis can distinguish malware from benignware with a 99.01% accuracy rate, using a sequence of only 32k opcodes and 50 features. This demonstrates that a dynamic opcode analysis approach can compare with static analysis in terms of speed. Further, it has a very real potential application to the unending fight against malware, in which defenders are, by definition, continuously on the back foot.
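The n-gram and run-length ideas described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `opcode_ngrams`, its parameters, and the toy trace are all hypothetical, chosen only to show how counting opcode n-grams for n=1..3 over the first `run_length` opcodes of a trace might look.

```python
from collections import Counter
from typing import List, Optional


def opcode_ngrams(trace: List[str], max_n: int = 3,
                  run_length: Optional[int] = None) -> Counter:
    """Count opcode n-grams (n = 1..max_n) in a run-time trace.

    Hypothetical helper: `run_length` truncates the trace to its first
    N recorded opcodes, mirroring the capture-size idea in the abstract.
    """
    window = trace if run_length is None else trace[:run_length]
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n over the (possibly truncated) trace.
        for i in range(len(window) - n + 1):
            counts[tuple(window[i:i + n])] += 1
    return counts


# Toy trace (made up for illustration, not taken from the dataset).
trace = ["push", "mov", "call", "mov", "call", "ret"]
features = opcode_ngrams(trace, max_n=3, run_length=5)
```

Each distinct n-gram becomes one candidate feature, which is why the feature count grows so quickly with n and why a feature selection step would follow before any classifier is trained.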
Original language | English |
---|---|
Title of host publication | Data Analytics and Decision Support for Cybersecurity - Trends, Methodologies and Applications |
Publisher | Springer |
Publication status | Published - 02 Aug 2017 |
Publication series
Name | Data Analytics |
---|
Fingerprint
Dive into the research topics of 'Dynamic Analysis of Malware using Run Time Opcodes'. Together they form a unique fingerprint.
Student theses
- Dynamic analyses of malware
  Carlin, D. (Author), Sezer, S. (Supervisor) & O'Kane, P. (Supervisor), Dec 2018
  Student thesis: Doctoral Thesis › Doctor of Philosophy