Abstract
Many clustering algorithms suffer from scalability problems
on massive datasets and do not support any user interaction
during runtime. To tackle these problems, anytime clustering algorithms are proposed. They produce a fast approximate result which is continuously refined during the further
run. Also, they can be stopped or suspended anytime and
provide an answer. In this paper, we propose a novel anytime clustering algorithm based on the density-based clustering paradigm. Our algorithm called A-DBSCAN is applicable to very high dimensional databases such as time
series, trajectory, medical data, etc. The general idea of our
algorithm is to use a sequence of lower-bounding functions
(LBs) of the true similarity measure to produce multiple
approximate results of the true density-based clusters. ADBSCAN operates in multiple levels w.r.t. the LBs and is
mainly based on two algorithmic schemes: (1) an efficient
distance upgrade scheme which restricts distance calculations to core-objects at each level of the LBs; (2) a local reclustering scheme which restricts update operations to the
relevant objects only. Extensive experiments demonstrate
that A-DBSCAN acquires very good clustering results at
very early stages of execution thus saves a large amount of
computational time. Even if it runs to the end, A-DBSCAN
is still orders of magnitude faster than DBSCAN.
on massive datasets and do not support any user interaction
during runtime. To tackle these problems, anytime clustering algorithms are proposed. They produce a fast approximate result which is continuously refined during the further
run. Also, they can be stopped or suspended anytime and
provide an answer. In this paper, we propose a novel anytime clustering algorithm based on the density-based clustering paradigm. Our algorithm called A-DBSCAN is applicable to very high dimensional databases such as time
series, trajectory, medical data, etc. The general idea of our
algorithm is to use a sequence of lower-bounding functions
(LBs) of the true similarity measure to produce multiple
approximate results of the true density-based clusters. ADBSCAN operates in multiple levels w.r.t. the LBs and is
mainly based on two algorithmic schemes: (1) an efficient
distance upgrade scheme which restricts distance calculations to core-objects at each level of the LBs; (2) a local reclustering scheme which restricts update operations to the
relevant objects only. Extensive experiments demonstrate
that A-DBSCAN acquires very good clustering results at
very early stages of execution thus saves a large amount of
computational time. Even if it runs to the end, A-DBSCAN
is still orders of magnitude faster than DBSCAN.
Original language | English |
---|---|
Title of host publication | SIAM International Conference on Data Mining (SDM) |
Pages | 112--120 |
DOIs | |
Publication status | Published - 2013 |
Externally published | Yes |
Keywords
- Anytime Clustering
- Density-based Clustering
- Lower-bounding distance