Machine learning-based dynamic analysis of Android apps with improved code coverage

Suleiman Yerima, Mohammed Alzaylaee, Sakir Sezer

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

This paper investigates the impact of code coverage on machine learning-based dynamic analysis of Android malware. To maximize code coverage, dynamic analysis on Android typically requires generating events that trigger the user interface and expose as many run-time behavioral features as possible. Most existing Android dynamic analysis systems generate events with the random-based approach implemented by the Monkey tool that comes with the Android SDK. Monkey is used in popular dynamic analysis platforms such as AASandbox, vetDroid, MobileSandbox, TraceDroid, Andrubis, ANANAS, DynaLog, and HADM. In this paper, we propose and investigate approaches based on stateful event generation and compare their code coverage capabilities with the state-of-the-practice random-based Monkey approach. The two proposed approaches are the state-based method (implemented with DroidBot) and a hybrid approach that combines the state-based and random-based methods. We compare the three input generation methods on real devices in terms of their ability to log dynamic behavior features and their impact on various machine learning algorithms that use the behavioral features for malware detection. Experiments performed using 17,444 applications show that, overall, the proposed methods provide much better code coverage, which in turn leads to more accurate machine learning-based malware detection compared to the state-of-the-art approach.
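
The three input generation methods named in the abstract can be pictured as command plans for a connected device. The following is only a minimal sketch, not the authors' tooling: the function names, the default event count, the seed, and the ordering inside the hybrid plan (DroidBot first, then Monkey) are assumptions made for illustration. It relies only on the documented `adb shell monkey -p <package> -s <seed> -v <count>` invocation and DroidBot's `-a <apk>` and `-o <output_dir>` options, and it builds the commands without running them.

```python
from typing import List

DEFAULT_EVENTS = 2000  # hypothetical event budget, not taken from the paper


def monkey_cmd(package: str, events: int, seed: int = 0) -> List[str]:
    """Random-based generation: Monkey sends `events` pseudo-random UI events."""
    return ["adb", "shell", "monkey", "-p", package,
            "-s", str(seed), "-v", str(events)]


def droidbot_cmd(apk_path: str, out_dir: str) -> List[str]:
    """State-based generation: DroidBot explores the app's UI states."""
    return ["droidbot", "-a", apk_path, "-o", out_dir]


def plan(strategy: str, apk_path: str, package: str, out_dir: str,
         events: int = DEFAULT_EVENTS) -> List[List[str]]:
    """Return the commands to run, in order, for one input generation method."""
    if strategy == "random":
        return [monkey_cmd(package, events)]
    if strategy == "state":
        return [droidbot_cmd(apk_path, out_dir)]
    if strategy == "hybrid":
        # Assumed ordering: state-based exploration followed by random events.
        return [droidbot_cmd(apk_path, out_dir), monkey_cmd(package, events)]
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for name in ("random", "state", "hybrid"):
        for cmd in plan(name, "app.apk", "com.example.app", "out", events=500):
            print(name, "->", " ".join(cmd))
```

Each returned list could be handed to `subprocess.run` on a machine with the Android SDK and DroidBot installed; keeping plan construction separate from execution makes the three methods easy to compare side by side.
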
Language: English
Pages: 1-24
Journal: EURASIP Journal on Information Security
Volume: 4
DOI: 10.1186/s13635-019-0087-1
Publication status: Published - 29 Apr 2019

Fingerprint

Application programs
Dynamic analysis
Learning systems
Learning algorithms
User interfaces
Android (operating system)
Malware
Experiments

Keywords

  • Android malware detection
  • code coverage
  • Monkey
  • DroidBot
  • Dynamic analysis
  • Machine learning
  • Event generation
  • State-based input generation
  • Model-based input generation
  • Random input generation

Cite this

@article{7ac1b87119cf481e8c3b8dc7d52184e3,
title = "Machine learning-based dynamic analysis of Android apps with improved code coverage",
abstract = "This paper investigates the impact of code coverage on machine learning-based dynamic analysis of Android malware. In order to maximize the code coverage, dynamic analysis on Android typically requires the generation of events to trigger the user interface and maximize the discovery of the run-time behavioral features. The commonly used event generation approach in most existing Android dynamic analysis systems is the random-based approach implemented with the Monkey tool that comes with the Android SDK. Monkey is utilized in popular dynamic analysis platforms like AASandbox, vetDroid, MobileSandbox, TraceDroid, Andrubis, ANANAS, DynaLog, and HADM. In this paper, we propose and investigate approaches based on stateful event generation and compare their code coverage capabilities with the state-of-the-practice random-based Monkey approach. The two proposed approaches are the state-based method (implemented with DroidBot) and a hybrid approach that combines the state-based and random-based methods. We compare the three different input generation methods on real devices, in terms of their ability to log dynamic behavior features and the impact on various machine learning algorithms that utilize the behavioral features for malware detection. Experiments performed using 17,444 applications show that overall, the proposed methods provide much better code coverage which in turn leads to more accurate machine learning-based malware detection compared to the state-of-the-art approach.",
keywords = "Android malware detection, code coverage, Monkey, DroidBot, Dynamic analysis, Machine learning, Event generation, State-based input generation, Model-based input generation, Random input generation",
author = "Suleiman Yerima and Mohammed Alzaylaee and Sakir Sezer",
year = "2019",
month = "4",
day = "29",
doi = "10.1186/s13635-019-0087-1",
language = "English",
volume = "4",
pages = "1--24",
journal = "EURASIP Journal on Information Security",
issn = "2510-523X",
publisher = "Springer",

}

Machine learning-based dynamic analysis of Android apps with improved code coverage. / Yerima, Suleiman; Alzaylaee, Mohammed; Sezer, Sakir.

In: EURASIP Journal on Information Security, Vol. 4, 29.04.2019, p. 1-24.

TY  - JOUR
T1  - Machine learning-based dynamic analysis of Android apps with improved code coverage
AU  - Yerima, Suleiman
AU  - Alzaylaee, Mohammed
AU  - Sezer, Sakir
PY  - 2019/4/29
Y1  - 2019/4/29
AB  - This paper investigates the impact of code coverage on machine learning-based dynamic analysis of Android malware. In order to maximize the code coverage, dynamic analysis on Android typically requires the generation of events to trigger the user interface and maximize the discovery of the run-time behavioral features. The commonly used event generation approach in most existing Android dynamic analysis systems is the random-based approach implemented with the Monkey tool that comes with the Android SDK. Monkey is utilized in popular dynamic analysis platforms like AASandbox, vetDroid, MobileSandbox, TraceDroid, Andrubis, ANANAS, DynaLog, and HADM. In this paper, we propose and investigate approaches based on stateful event generation and compare their code coverage capabilities with the state-of-the-practice random-based Monkey approach. The two proposed approaches are the state-based method (implemented with DroidBot) and a hybrid approach that combines the state-based and random-based methods. We compare the three different input generation methods on real devices, in terms of their ability to log dynamic behavior features and the impact on various machine learning algorithms that utilize the behavioral features for malware detection. Experiments performed using 17,444 applications show that overall, the proposed methods provide much better code coverage which in turn leads to more accurate machine learning-based malware detection compared to the state-of-the-art approach.
KW  - Android malware detection
KW  - code coverage
KW  - Monkey
KW  - DroidBot
KW  - Dynamic analysis
KW  - Machine learning
KW  - Event generation
KW  - State-based input generation
KW  - Model-based input generation
KW  - Random input generation
U2  - 10.1186/s13635-019-0087-1
DO  - 10.1186/s13635-019-0087-1
M3  - Article
VL  - 4
SP  - 1
EP  - 24
JO  - EURASIP Journal on Information Security
T2  - EURASIP Journal on Information Security
JF  - EURASIP Journal on Information Security
SN  - 2510-523X
ER  -