Efficient Anytime Density-based Clustering

Thai Son Mai, Jing Feng, Xiao He, Christian Boehm

Research output: Chapter in Book/Report/Conference proceedingConference contribution

19 Citations (Scopus)

Abstract

Many clustering algorithms suffer from scalability problems
on massive datasets and do not support any user interaction
during runtime. To tackle these problems, anytime clustering algorithms are proposed. They produce a fast approximate result which is continuously refined during the further
run. Also, they can be stopped or suspended anytime and
provide an answer. In this paper, we propose a novel anytime clustering algorithm based on the density-based clustering paradigm. Our algorithm called A-DBSCAN is applicable to very high dimensional databases such as time
series, trajectory, medical data, etc. The general idea of our
algorithm is to use a sequence of lower-bounding functions
(LBs) of the true similarity measure to produce multiple
approximate results of the true density-based clusters. ADBSCAN operates in multiple levels w.r.t. the LBs and is
mainly based on two algorithmic schemes: (1) an efficient
distance upgrade scheme which restricts distance calculations to core-objects at each level of the LBs; (2) a local reclustering scheme which restricts update operations to the
relevant objects only. Extensive experiments demonstrate
that A-DBSCAN acquires very good clustering results at
very early stages of execution thus saves a large amount of
computational time. Even if it runs to the end, A-DBSCAN
is still orders of magnitude faster than DBSCAN.
Original languageEnglish
Title of host publicationSIAM International Conference on Data Mining (SDM)
Pages112--120
DOIs
Publication statusPublished - 2013
Externally publishedYes

Keywords

  • Anytime Clustering
  • Density-based Clustering
  • Lower-bounding distance

Fingerprint Dive into the research topics of 'Efficient Anytime Density-based Clustering'. Together they form a unique fingerprint.

Cite this