OPTICS is a fundamental data clustering technique that has been widely applied in many fields. However, it suffers from performance degradation when faced with large datasets and expensive distance measures because of its quadratic complexity in terms of both time and distance function calls. In this paper, we introduce a novel anytime approach to tackle the above problems. The general idea is to use a sequence of lower-bounding (LB) distances of the true distance measure to produce multiple approximations of the true reachability plot of OPTICS. The algorithm quickly produces an approximation result using the first LB distance. It then continuously refines the results with subsequent LB distances and the results from the previous computations. At any time, users can suspend and resume the algorithm to examine the results, enabling them to stop the algorithm whenever they are satisfied with the obtained results, thereby saving computational cost. Our proposed algorithms, called Any-OPTICS and Any-OPTICS-XS, are built upon this anytime scheme and can be applied for many complex datasets. Our experiments show that Any-OPTICS obtains very good clustering results at early stages of execution, leading to orders of magnitudes speed up. Even when run to the final distance measure, the cumulative runtime of Any-OPTICS is faster than OPTICS and its extensions.
|Title of host publication
|International Conference on Database Systems for Advanced Applications (DASFAA)
|Published - 25 Mar 2016
|Lecture Notes in Computer Science