Investigation of browser and web-based threats

  • Jonah Burgess

Student thesis: Doctoral Thesis, Doctor of Philosophy

Abstract

This thesis explores the evolution of browser and web-based threats, focusing on the emerging CryptoJacking threat and the well-established Exploit Kit (EK) industry. Two ground-truth datasets are compiled for the experiments. The first is a set of 887 CryptoJacking samples comprising 11 miner families, extracted from the Alexa top 1 million websites using a lightweight, static crawler and subjected to manual analysis. The second is a set of 1279 EK samples compiled from network traffic obtained from reputable sources, representing various public and private networks, and encompassing the entire history of EK families.

Subsequently, knowledge gained from a large-scale analysis of EK samples is used to develop REdiREKT, a system that utilises the open-source Zeek Intrusion Detection System (IDS) to map HTTP redirection chains and extract distinguishing features for machine learning (ML). By processing a unique combination of 9 redirection techniques, REdiREKT correctly extracted 96.52% of malicious domains from 1279 EK samples, spanning 28 families and 8 campaigns, and failed to extract only 0.7% of malicious chains.
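The idea of mapping a redirection chain can be illustrated with a minimal sketch. It is not REdiREKT's implementation: it follows only two redirect signals (the 3xx Location header and the Referer header) rather than the nine techniques mentioned above, and the record layout, function names, and example hosts are hypothetical rather than Zeek's actual log schema.

```python
from urllib.parse import urlsplit

def host_of(url):
    """Return the lowercase host part of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()

def build_chain(records, landing_url):
    """Walk backwards from landing_url and return the chain as a list of hosts.

    Each record is a dict with 'url', 'status', 'location' and 'referer'
    (a hypothetical, simplified layout chosen for this example).
    """
    parent = {}  # destination URL -> URL that led to it
    for rec in records:
        # Signal 1: an HTTP 3xx response names its target in Location.
        if 300 <= rec["status"] < 400 and rec.get("location"):
            parent[rec["location"]] = rec["url"]
        # Signal 2: the Referer header names the page that issued the request.
        if rec.get("referer"):
            parent.setdefault(rec["url"], rec["referer"])

    chain, seen, url = [], set(), landing_url
    while url and url not in seen:  # 'seen' guards against redirect loops
        seen.add(url)
        chain.append(url)
        url = parent.get(url)
    chain.reverse()

    # Collapse consecutive URLs on the same host into a single node (domain).
    domains = []
    for u in chain:
        h = host_of(u)
        if h and (not domains or domains[-1] != h):
            domains.append(h)
    return domains

# Hypothetical three-hop example: compromised site -> gate -> landing page.
records = [
    {"url": "http://compromised.example/index.html", "status": 200,
     "location": None, "referer": None},
    {"url": "http://gate.example/go", "status": 302,
     "location": "http://landing.example/ek",
     "referer": "http://compromised.example/index.html"},
    {"url": "http://landing.example/ek", "status": 200,
     "location": None, "referer": "http://gate.example/go"},
]
chain = build_chain(records, "http://landing.example/ek")
```

Walking backwards from the final request keeps the sketch short; a real system would also need to handle script-based and content-based redirects, which leave no trace in these two headers.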

REdiREKT extracted 12,783 domains from 5910 redirection chains when applied to the benign dataset. A range of 48 HTTP, URL, redirect, and content-based features are subsequently extracted for each node (domain) in each redirection chain and stored so that the intrinsic sequential structure of the chain is preserved. The malicious and benign features are assessed to identify common trends, as is the evolution of EK families.
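One common way to keep a chain's sequential structure while producing fixed-size input for ML is to store each chain as an ordered list of per-node feature vectors and pad shorter chains. The sketch below assumes post-padding with zeros and a caller-chosen maximum length; the abstract does not state how the thesis stores or pads chains, so both choices, and the helper name `pad_chains`, are illustrative assumptions.

```python
N_FEATURES = 48  # features per node, as stated in the abstract

def pad_chains(chains, max_len, pad_value=0.0):
    """Pad or truncate each chain to max_len timesteps of N_FEATURES values.

    chains: list of chains; each chain is an ordered list of feature vectors,
    one per node (domain), in redirect order.
    """
    padded = []
    for chain in chains:
        for node in chain:
            if len(node) != N_FEATURES:
                raise ValueError("each node needs exactly N_FEATURES values")
        # Keep the first max_len nodes in their original order.
        steps = [list(node) for node in chain[:max_len]]
        # Append all-padding timesteps until the chain reaches max_len.
        steps += [[pad_value] * N_FEATURES for _ in range(max_len - len(steps))]
        padded.append(steps)
    return padded

# Two toy chains with placeholder feature values (not real data).
chains = [
    [[1.0] * N_FEATURES, [2.0] * N_FEATURES],  # two-node chain
    [[3.0] * N_FEATURES],                      # single-node chain
]
batch = pad_chains(chains, max_len=3)  # shape: 2 chains x 3 timesteps x 48 features
```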

Finally, the first known application of a Long Short-Term Memory (LSTM) network to detect EK traffic is presented. Samples are processed as sequences, where each timestep represents a redirect and contains a unique combination of 48 features. Hyper-parameters are tuned via 5-fold cross-validation (CV), with the optimal configuration achieving an F1 score of 0.9878 on the unseen test set. Furthermore, isolated feature categories are contrasted to assess their importance.
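For readers unfamiliar with the evaluation terms, the sketch below shows how an F1 score is computed from binary labels and how sample indices can be split into five folds. It is a generic illustration under simple assumptions (contiguous folds without shuffling or stratification, label 1 meaning malicious); it is not the thesis's evaluation code, and the function names are hypothetical.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, unshuffled folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

# Toy labels: one true positive, one false positive, one false negative.
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
folds = list(k_fold_indices(10, k=5))
```

In a tuning loop, each candidate hyper-parameter configuration would be trained on `train_idx` and scored on `val_idx` for every fold, with the mean score used to pick the configuration that is then evaluated once on the held-out test set.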

Date of Award: Jul 2023
Original language: English
Awarding Institution
  • Queen's University Belfast
Supervisors: Kieran McLaughlin & Sakir Sezer

Keywords

  • Malware detection
  • malware analysis
  • cyber-security
  • cybercrime
  • machine learning
  • information and technology
