Anytime density-based clustering of complex data

Thai Son Mai, Xiao He, Jing Feng, Claudia Plant, Christian Boehm

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)

Abstract

Many clustering algorithms suffer from scalability problems on massive datasets and do not support any user interaction during runtime. To tackle these problems, anytime clustering algorithms have been proposed. They produce a fast approximate result which is continuously refined as the run continues. They can also be stopped or suspended at any time to provide an intermediate answer. In this paper, we propose a novel anytime clustering algorithm modelled on the density-based clustering paradigm. Our algorithm, called A-DBSCAN, is applicable to many kinds of complex data, such as trajectories and medical data. The general idea of our algorithm is to use a sequence of lower-bounding functions (LBs) of the true distance function to produce multiple approximate results of the true density-based clusters. A-DBSCAN operates on multiple levels w.r.t. the LBs and is mainly based on two algorithmic schemes: (1) an efficient distance upgrade scheme which restricts distance calculations to core objects at each level of the LBs; (2) a local re-clustering scheme which restricts update operations to the relevant objects only. To further improve the performance, we propose a significantly extended version of A-DBSCAN, called A-DBSCAN-XS, which is built upon the anytime scheme of A-DBSCAN and the µ-range query scheme of a data structure called extended Xseedlist. A-DBSCAN-XS requires fewer distance calculations at each level than A-DBSCAN and is thus more efficient. Extensive experiments demonstrate that A-DBSCAN and A-DBSCAN-XS obtain very good clustering results at very early stages of execution and thus save a large amount of computational time. Even if they run to the end, A-DBSCAN and A-DBSCAN-XS are still orders of magnitude faster than the original DBSCAN algorithm and its variants. We also introduce a novel application of our algorithms to the segmentation of white matter fiber tracts in the human brain, which is an important tool for studying brain structure and various diseases such as Alzheimer's disease.
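
The lower-bounding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' A-DBSCAN: it only shows that range queries under progressively tighter lower bounds return shrinking supersets of the true neighborhood. The choice of lower bound (Euclidean distance over the first k coordinates), the function names, and the sample points are all assumptions made for illustration.

```python
import math

def lb_distance(p, q, k):
    """Hypothetical lower bound: Euclidean distance over the first k coordinates.

    Dropping non-negative squared terms can only shrink the sum, so this
    value never exceeds the full Euclidean distance.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p[:k], q[:k])))

def candidate_neighbors(points, i, eps, k):
    """Indices whose level-k lower bound to points[i] is at most eps.

    Any point outside this set is guaranteed to be farther than eps under
    the true distance, so it can be pruned without a full computation.
    """
    return {j for j, q in enumerate(points)
            if lb_distance(points[i], q, k) <= eps}

def neighborhoods_by_level(points, i, eps, levels):
    """Candidate neighborhoods of points[i] for each lower-bound level."""
    return [candidate_neighbors(points, i, eps, k) for k in levels]

# Illustrative 4-dimensional data (made up for this sketch).
points = [
    (0.0, 0.0, 0.0, 0.0),
    (0.5, 0.1, 0.0, 0.2),
    (0.4, 3.0, 0.0, 0.0),
    (0.3, 0.2, 2.5, 0.0),
    (5.0, 0.0, 0.0, 0.0),
]
levels = [1, 2, 4]  # level 4 uses all coordinates, i.e. the true distance
hoods = neighborhoods_by_level(points, 0, eps=1.0, levels=levels)

for k, hood in zip(levels, hoods):
    print(f"level k={k}: candidates {sorted(hood)}")
```

Each level's candidate set contains the next one, so an approximate neighborhood is available early and can only lose members as the bound tightens. How the paper turns this property into its distance upgrade and local re-clustering schemes is not described in the abstract and is not modelled here.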
Original language: English
Pages (from-to): 319-355
Journal: Knowledge and Information Systems
Volume: 45
Issue number: 2
Early online date: 22 Oct 2014
Publication status: Published - Nov 2015
Externally published: Yes

Keywords

  • Anytime Clustering
  • Density-based Clustering
  • Lower-bounding Distance
  • Fiber Segmentation
  • Fiber Clustering
  • Diffusion Tensor Imaging
