The Effects of Traditional Anti-Virus Labels on Malware Detection using Dynamic Runtime Opcodes

Domhnall Carlin, Alexandra Cowan, Philip O'Kane, Sakir Sezer

Research output: Contribution to journal › Article › peer-review

28 Citations (Scopus)
530 Downloads (Pure)


The arms race between the distributors of malware and those seeking to provide defenses has so far favored the former. Signature detection methods have been unable to cope with the onslaught of new binaries aided by rapidly developing obfuscation techniques. Recent research has focused on the analysis of low-level opcodes, both static and dynamic, as a way to detect malware. Although sometimes successful at detecting malware, static analysis still fails to unravel obfuscated code, whereas dynamic analysis allows researchers to investigate the code as it is revealed at runtime. Research in the field has been limited by the underpinning data sets: old and inadequately sampled malware reduces how far results from such data sets can be generalized. The main contribution of this paper is the creation of a new parsed runtime trace data set of over 100 000 labeled samples, which addresses these shortcomings; we offer the data set itself for use by the wider research community. This data set underpins the examination of the run traces using classifiers on count-based and sequence-based data. We find that malware detection rates are lower when samples are labeled with traditional anti-virus (AV) labels. Neither count-based nor sequence-based algorithms can sufficiently distinguish between AV label classes. Detection increases when malware is re-classed with labels yielded from unsupervised learning. With sequence-based learning, detection exceeds that achieved when samples are labeled simply as “malware”. This approach may inform future work on more effective triaging of malware.
Original language: English
Pages (from-to): 17742-17752
Journal: IEEE Access
Issue number: 1
Publication status: Published - 27 Sept 2017


  • Malware
  • Machine learning, Decision tree, Concept drift, Ensemble learning, Classification, Random forest
  • Cyber Security
  • network security

