Abstract
We introduce a novel interactive framework to handle both instance-level and temporal smoothness constraints for clustering large longitudinal
data and for tracking the cluster evolutions over time. It consists of a constrained clustering algorithm, called CVQE+, which optimizes the clustering
quality, constraint violation and the historical cost between consecutive data
snapshots. At the center of our framework is a simple yet effective active learning technique, named Border, for iteratively selecting the most informative
pairs of objects to query users about, and updating the clustering with new
constraints. Those constraints are then propagated inside each data snapshot
and between snapshots via two schemes, called constraint inheritance and constraint propagation, to further enhance the results. Moreover, a historical constraint is enforced between consecutive snapshots to ensure the consistency of results among them. Experiments show clustering results better than or comparable to those of state-of-the-art techniques, as well as high scalability on large datasets. Finally, we apply our algorithm to cluster phenotypes in patients with Obstructive Sleep Apnea (OSA) and to track how these clusters evolve over time.
Original language | English |
---|---|
Pages (from-to) | 359--378 |
Journal | Data Science and Engineering |
Volume | 3 |
Issue number | 4 |
Early online date | 07 Nov 2018 |
Publication status | Published - 01 Dec 2018 |