Designing and optimizing alignment datasets for IoT security: a synergistic approach with static analysis insights

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Vulnerabilities without domain-focused datasets. In this paper, we present a structured methodology to design and optimize IoT-specific alignment datasets informed by static analysis insights, thereby bridging the gap between generic language models and specialized IoT security requirements. Our approach integrates findings from IoT firmware analysis tools (e.g., FACT and Binwalk) with authoritative vulnerability repositories (MITRE CVE, CWE, CAPEC) to construct three key dataset types: (1) Base Datasets, capturing essential IoT vulnerabilities and configurations, (2) Classification Datasets, discerning IoT from non-IoT prompts, and (3) Alignment Datasets, employing Contrastive Preference Optimization (CPO), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO) for IoT-specific fine-tuning. We further incorporate secure-by-design principles and bias mitigation strategies, ranging from device-type diversity to synthetic data augmentation, to ensure fair, high-fidelity representations of IoT security scenarios. Experimental results demonstrate that our alignment datasets improve LLM responsiveness and correctness for vulnerabilities discovered via offline static analysis, including outdated libraries, hard-coded credentials, and insecure default services. Notably, Kahneman-Tversky Optimization achieves 97% alignment accuracy, reflecting the impact of clear binary classifications in high-stakes security tasks. This work underscores the significance of dual-system integration (static analysis plus LLM alignment) for proactive IoT defense. By foregrounding domain-specific vulnerabilities in carefully curated datasets, we enable LLMs to generate more actionable, context-aware security recommendations, thus advancing state-of-the-art IoT protections in both research and industry deployments.

Original language: English
Title of host publication: 21st International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2025): Proceedings
Publisher: ACM New York, NY
Publication status: Accepted - 25 Mar 2025
Event: PROMISE 2025: The 21st International Conference on Predictive Models and Data Analytics in Software Engineering - Trondheim, Norway
Duration: 23 Jun 2025 to 27 Jun 2025

Publication series

Name
ISSN (Electronic): 1111-1111

Conference

Conference: PROMISE 2025: The 21st International Conference on Predictive Models and Data Analytics in Software Engineering
Country/Territory: Norway
City: Trondheim
Period: 23/06/2025 to 27/06/2025

Keywords

  • alignment datasets
  • IoT security
  • static analysis insights
