Anytime parallel density-based clustering

Son T. Mai, Ira Assent, Jon Jacobsen, Martin Storgaard Dieu

Research output: Contribution to journalArticlepeer-review

5 Citations (Scopus)

Abstract

The density-based clustering algorithm DBSCAN is a state-of-the-art data
clustering technique with numerous applications in many fields. However, DBSCAN requires neighborhood queries for all objects and propagation of labels from object to object. This scheme is time consuming and thus limits its applicability for large datasets. In this paper, we propose a novel anytime approach to cope with this problem by reducing both the range query and the label propagation time of DBSCAN. Our algorithm, called AnyDBC, compresses the data into smaller density-connected subsets called primitive clusters and labels objects based on connected components of these primitive clusters to reduce the label propagation time. Moreover, instead of passively performing range queries for all objects as in existing techniques, AnyDBC iteratively and actively learns the current cluster structure of the data and selects a few most promising objects for refining clusters at each iteration. Thus, in the end,
it performs substantially fewer range queries compared to DBSCAN while still satisfying the cluster definition of DBSCAN. Moreover, by processing queries in block and merging the results into the current cluster structure, AnyDBC can be efficiently parallelized on shared memory architectures to further accelerate the performance, uniquely making it a parallel and anytime technique at the same time. Experiments show speedup factors of orders of magnitude compared to DBSCAN and its fastest variants as well as a high parallel scalability on multicore processors for very large real and synthetic complex datasets.
Original languageEnglish
Pages (from-to)1121--1176
Number of pages56
JournalData Mining and Knowledge Discovery
Volume32
Issue number4
Early online date10 Apr 2018
DOIs
Publication statusPublished - Jul 2018

Keywords

  • Density-based clustering
  • Anytime clustering
  • Active clustering
  • Parallel processing
  • Multicore processors

Fingerprint Dive into the research topics of 'Anytime parallel density-based clustering'. Together they form a unique fingerprint.

Cite this