Abstract
Vulnerabilities without domain-focused datasets. In this paper, we present a structured methodology to design and optimize IoT-specific alignment datasets informed by static analysis insights, thereby bridging the gap between generic language models and specialized IoT security requirements. Our approach integrates findings from IoT firmware analysis tools (e.g., FACT and Binwalk) with authoritative vulnerability repositories (MITRE CVE, CWE, CAPEC) to construct three key dataset types: (1) Base Datasets, capturing essential IoT vulnerabilities and configurations; (2) Classification Datasets, discerning IoT from non-IoT prompts; and (3) Alignment Datasets, employing Contrastive Preference Optimization (CPO), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO) for IoT-specific fine-tuning. We further incorporate secure-by-design principles and bias mitigation strategies, ranging from device-type diversity to synthetic data augmentation, to ensure fair, high-fidelity representations of IoT security scenarios. Experimental results demonstrate that our alignment datasets improve LLM responsiveness and correctness for vulnerabilities discovered via offline static analysis, including outdated libraries, hard-coded credentials, and insecure default services. Notably, Kahneman-Tversky Optimization achieves 97% alignment accuracy, reflecting the impact of clear binary classifications in high-stakes security tasks. This work underscores the significance of dual-system integration (static analysis plus LLM alignment) for proactive IoT defense. By foregrounding domain-specific vulnerabilities in carefully curated datasets, we enable LLMs to generate more actionable, context-aware security recommendations, thus advancing state-of-the-art IoT protections in both research and industry deployments.
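To make the distinction between pairwise (DPO/CPO) and binary (KTO) alignment data concrete, the following is a minimal sketch of how one static-analysis finding might be turned into both record shapes. It is an illustration under stated assumptions, not the paper's pipeline: the `Finding` schema, the helper names, the prompt wording, and the example values are all hypothetical, and the field names (`prompt`/`chosen`/`rejected`, `prompt`/`completion`/`label`) simply follow a convention common in preference-tuning tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One static-analysis finding (hypothetical schema, not the paper's)."""
    device_type: str
    issue: str
    cwe_id: str
    evidence: str


def build_prompt(f: Finding) -> str:
    """Render a finding as a security question for the model."""
    return (
        f"Firmware for a {f.device_type} was analyzed offline. "
        f"Finding: {f.issue} ({f.cwe_id}). Evidence: {f.evidence}. "
        "What should the vendor do?"
    )


def to_preference_record(f: Finding, good: str, bad: str) -> dict:
    """Pairwise record: one prompt with a preferred and a rejected answer."""
    return {"prompt": build_prompt(f), "chosen": good, "rejected": bad}


def to_binary_records(f: Finding, good: str, bad: str) -> list:
    """Binary records: each answer carries its own desirable/undesirable label."""
    prompt = build_prompt(f)
    return [
        {"prompt": prompt, "completion": good, "label": True},
        {"prompt": prompt, "completion": bad, "label": False},
    ]


# Illustrative values only; not taken from the paper's datasets.
finding = Finding(
    device_type="IP camera",
    issue="hard-coded credentials",
    cwe_id="CWE-798",
    evidence="a fixed root password hash ships in the firmware image",
)
good = "Remove the embedded credential and require a unique password at first boot."
bad = "Default passwords are fine as long as the device sits behind a router."

pair = to_preference_record(finding, good, bad)
binary = to_binary_records(finding, good, bad)
```

The pairwise form needs a matched good and bad answer for every prompt, whereas the binary form lets each answer be labeled on its own, which is the "clear binary classification" property the abstract associates with KTO.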
Original language | English |
---|---|
Title of host publication | 21st International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE 2025): Proceedings |
Publisher | ACM New York, NY |
Publication status | Accepted - 25 Mar 2025 |
Event | PROMISE 2025: The 21st International Conference on Predictive Models and Data Analytics in Software Engineering - Trondheim, Norway. Duration: 23 Jun 2025 → 27 Jun 2025 |
Publication series
Name | |
---|---|
ISSN (Electronic) | 1111-1111 |
Conference
Conference | PROMISE 2025: The 21st International Conference on Predictive Models and Data Analytics in Software Engineering |
---|---|
Country/Territory | Norway |
City | Trondheim |
Period | 23/06/2025 → 27/06/2025 |
Keywords
- alignment datasets
- IoT security
- static analysis insights
Fingerprint
Dive into the research topics of 'Designing and optimizing alignment datasets for IoT security: a synergistic approach with static analysis insights'. Together they form a unique fingerprint.
Student theses
- Secure internet of things using deep learning: applying transformer-based NLP techniques for enhanced IoT security
  Al-Zuraiqi, A. M. (Author), McMullan, P. (Supervisor) & Greer, D. (Supervisor), Jul 2025
  Student thesis: Doctoral Thesis › Doctor of Philosophy