Abstract
Anomaly detection algorithms identify unusual events and outliers in large datasets where manual approaches are impractical. Most prior anomaly detection methods assume simple unimodal Gaussian data distributions and therefore produce suboptimal results on complex multimodal distributions. To address this problem, we propose DIP-ECOD, a novel unsupervised anomaly detection algorithm that generalises to both multimodal and unimodal distributions. DIP-ECOD integrates a dip test within the ECOD framework, using SkinnyDip to split a probability distribution into separate modes, after which ECOD is applied to each mode. In this way, difficult-to-find outliers that lie between modes or are hidden in the distribution tails of each mode are also detected. Experiments using nine benchmark datasets across a range of domains such as healthcare and imagery demonstrate DIP-ECOD’s improved performance over ECOD in detecting outliers in both multimodal and unimodal distributions, with DIP-ECOD achieving an average AUC score of 0.791 compared to ECOD’s 0.761. Further, using a proprietary enterprise dataset, we show that DIP-ECOD effectively identifies anomalous GitHub commits, indicating its applicability to information security and software vulnerability, where multimodal distributions are expected.
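The split-then-score idea described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: a naive gap-based split stands in for SkinnyDip's dip-test-based mode extraction, the score is a simplified ECOD-style tail probability, and all function names, parameters (`gap`, `min_size`) and the toy data are assumptions made for illustration.

```python
import math

def ecdf_tail_score(x, sample):
    """ECOD-style 1-D score: negative log of the smaller empirical tail
    probability of x within sample (x must be a member of sample)."""
    n = len(sample)
    left = sum(1 for u in sample if u <= x) / n    # P(X <= x)
    right = sum(1 for u in sample if u >= x) / n   # P(X >= x)
    return -math.log(min(left, right))

def split_by_gap(values, gap):
    """Hypothetical stand-in for SkinnyDip: start a new group whenever two
    consecutive sorted values are more than `gap` apart."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > gap:
            groups.append([])
        groups[-1].append(cur)
    return groups

def per_mode_scores(values, gap=1.0, min_size=3):
    """Score each point against its nearest mode rather than the full sample."""
    groups = split_by_gap(values, gap)
    # Groups smaller than min_size are treated as noise, not as modes.
    modes = [g for g in groups if len(g) >= min_size] or [sorted(values)]
    scores = []
    for x in values:
        nearest = min(modes, key=lambda m: min(abs(x - u) for u in m))
        sample = nearest if x in nearest else nearest + [x]
        scores.append(ecdf_tail_score(x, sample))
    return scores

# Toy bimodal data (invented for illustration) with one point between the modes.
values = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 10.0, 10.1, 10.2, 10.3, 10.4]
global_scores = [ecdf_tail_score(x, values) for x in values]
mode_scores = per_mode_scores(values)

for x, g, m in zip(values, global_scores, mode_scores):
    print(f"x={x:5.1f}  global={g:.3f}  per-mode={m:.3f}")
```

On this toy data, the point at 5.0 sits near the median of the pooled sample, so a single global tail score ranks it as the least anomalous point. Scoring against the nearest mode instead places it in that mode's extreme tail and ranks it highest.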
Original language | English |
---|---|
Title of host publication | Proceedings of the Conference on Applied Machine Learning in Information Security (CAMLIS 2024) |
Publisher | IEEE Xplore |
Publication status | Accepted - 04 Aug 2024 |
Event | Conference on Applied Machine Learning in Information Security (CAMLIS) - Arlington, VA, Washington, United States; Duration: 24 Oct 2024 → 25 Oct 2024 |
Conference
Conference | Conference on Applied Machine Learning in Information Security (CAMLIS) |
---|---|
Country/Territory | United States |
City | Washington |
Period | 24/10/2024 → 25/10/2024 |