N-gram Opcode Analysis for Android Malware Detection

BooJoong Kang, Suleiman Y. Yerima, Sakir Sezer, Kieran McLaughlin

Research output: Contribution to journal › Article › peer-review


Android malware has been on the rise in recent years due to the increasing popularity of Android and the proliferation of third-party application markets. Emerging Android malware families are increasingly adopting sophisticated detection-avoidance techniques, which calls for more effective approaches to Android malware detection. In this paper we present and evaluate an approach based on n-gram opcode features that uses machine learning to identify and categorize Android malware. This approach enables automated feature discovery without relying on prior expert or domain knowledge to pre-determine features. Furthermore, by using a data segmentation technique for feature selection, our analysis is able to scale up to 10-gram opcodes. Our experiments on a dataset of 2520 samples achieved an f-measure of 98% using the n-gram opcode based approach. We also provide empirical findings that illustrate factors likely to affect the overall performance trends of n-gram opcodes.
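As a rough illustration of what an n-gram opcode feature is, the sketch below counts contiguous runs of n opcodes in a sequence. It is a minimal example under stated assumptions, not the authors' pipeline: the function name `opcode_ngrams` and the sample opcode list are hypothetical, and disassembling an app into Dalvik opcodes, the data segmentation step, and the classifier are all out of scope here.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count contiguous n-grams (as tuples) in a sequence of opcode names.

    Hypothetical helper for illustration; returns an empty Counter when the
    sequence is shorter than n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )


# Made-up opcode sequence for one method (assumption, not from the dataset).
sample = [
    "const/4",
    "invoke-virtual",
    "move-result",
    "invoke-virtual",
    "move-result",
    "return-void",
]

bigrams = opcode_ngrams(sample, 2)
for gram, count in bigrams.most_common():
    print(" ".join(gram), count)
```

Each distinct n-gram can then serve as one candidate feature, with its count (or mere presence) as the value. The number of distinct n-grams grows quickly with n, which is why a feature selection step is needed before larger values such as n = 10 become practical.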
Original language: English
Pages (from-to): 231-254
Number of pages: 24
Journal: International Journal on Cyber Situational Awareness
Issue number: 1
Publication status: Published - 30 Nov 2016


  • Android malware
  • malware detection
  • malware categorization
  • Dalvik bytecode
  • n-gram
  • opcode
  • feature selection
  • machine learning

