Abstract
The density-based clustering algorithm DBSCAN is a stateof-the-art data clustering technique with numerous applications in many fields. However, its O(n2) time complexity still remains a severe weakness. In this paper, we propose a novel anytime approach to cope with this problem by reducing both the range query and the label propagation time of DBSCAN. Our algorithm, called AnyDBC, compresses the data into smaller density-connected subsets called primitive
clusters and labels objects based on connected components of these primitive clusters for reducing the label propagation time. Moreover, instead of passively performing the range query for all objects like existing techniques, AnyDBC iteratively and actively learns the current cluster structure of the data and selects a few most promising objects for refining clusters at each iteration. Thus, in the end, it performs substantially fewer range queries compared to DBSCAN while
still guaranteeing the exact final result of DBSCAN. Experiments show speedup factors of orders of magnitude compared to DBSCAN and its fastest variants on very large real and synthetic complex datasets.
clusters and labels objects based on connected components of these primitive clusters for reducing the label propagation time. Moreover, instead of passively performing the range query for all objects like existing techniques, AnyDBC iteratively and actively learns the current cluster structure of the data and selects a few most promising objects for refining clusters at each iteration. Thus, in the end, it performs substantially fewer range queries compared to DBSCAN while
still guaranteeing the exact final result of DBSCAN. Experiments show speedup factors of orders of magnitude compared to DBSCAN and its fastest variants on very large real and synthetic complex datasets.
Original language | English |
---|---|
Title of host publication | ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD) |
Pages | 1025-1034 |
DOIs | |
Publication status | Published - 2016 |
Externally published | Yes |
Keywords
- Density-based clustering
- anytime clustering
- active learning
Fingerprint
Dive into the research topics of 'AnyDBC: An Efficient Anytime Density-based Clustering Algorithm for Very Large Complex Datasets'. Together they form a unique fingerprint.Profiles
-
Thai Son Mai
- School of Electronics, Electrical Engineering and Computer Science - Senior Lecturer
Person: Academic