Abstract
A clustering may be considered fair with respect to pre-specified sensitive attributes if the proportions of sensitive attribute groups
in each cluster reflect those in the dataset. In this paper, we consider the task of fair clustering in scenarios involving multiple
multi-valued or numeric sensitive attributes. We propose a fair
clustering method, FairKM (Fair K-Means), that is inspired by
the popular K-Means clustering formulation. We outline a computational notion of fairness which is used, along with a cluster
coherence objective, to yield the FairKM clustering method. We
empirically evaluate our approach over real-world datasets, quantifying both
the quality and the fairness of the resulting clusters. Our
experimental evaluation illustrates that the clusters generated by
FairKM fare significantly better, on both clustering quality and
fair representation of sensitive attribute groups, than the
clusters from a state-of-the-art baseline fair clustering method.
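The fairness notion above (each cluster's group proportions should mirror the dataset's) can be made concrete with a small sketch. The Python below is a hypothetical illustration, not FairKM as specified in the paper: it assumes categorical sensitive attributes only, measures unfairness as the squared gap between each cluster's group proportions and the dataset's, weighted by cluster size (`fairness_deviation`), and adds this to the K-Means sum of squared errors through an assumed trade-off weight `lam`, optimized by greedy single-point reassignment (`fair_kmeans_sketch`). All function and parameter names are invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: NOT the paper's exact FairKM objective or update rules.

def _group(labels):
    """Map each cluster id to the indices of the points assigned to it."""
    groups = defaultdict(list)
    for i, c in enumerate(labels):
        groups[c].append(i)
    return groups

def sse(points, labels):
    """K-Means coherence term: squared distance of each point to its cluster mean."""
    dim = len(points[0])
    total = 0.0
    for members in _group(labels).values():
        centroid = [sum(points[i][d] for i in members) / len(members) for d in range(dim)]
        total += sum((points[i][d] - centroid[d]) ** 2 for i in members for d in range(dim))
    return total

def fairness_deviation(labels, sensitive):
    """Assumed deviation measure: for every (categorical) sensitive attribute and
    cluster, the squared gap between the cluster's and the dataset's group
    proportions, weighted by the cluster's share of points. 0 means every
    cluster mirrors the dataset exactly."""
    n = len(labels)
    groups = _group(labels)
    total = 0.0
    for values in sensitive.values():
        global_counts = Counter(values)
        for members in groups.values():
            local_counts = Counter(values[i] for i in members)  # missing keys count as 0
            weight = len(members) / n
            total += weight * sum(
                (local_counts[v] / len(members) - c / n) ** 2
                for v, c in global_counts.items())
    return total

def fair_kmeans_sketch(points, sensitive, k, lam=1.0, max_iter=20):
    """Greedy local search on SSE + lam * fairness_deviation (hypothetical)."""
    labels = [i % k for i in range(len(points))]  # deterministic round-robin start

    def objective(lbls):
        return sse(points, lbls) + lam * fairness_deviation(lbls, sensitive)

    current = objective(labels)
    for _ in range(max_iter):
        changed = False
        for i in range(len(points)):
            best_c, best_obj = labels[i], current
            for c in range(k):
                if c == labels[i]:
                    continue
                trial = labels[:]
                trial[i] = c
                obj = objective(trial)
                if obj < best_obj - 1e-12:  # accept strict improvements only
                    best_c, best_obj = c, obj
            if best_c != labels[i]:
                labels[i] = best_c
                current = best_obj
                changed = True
        if not changed:  # no single-point move improves the objective
            break
    return labels

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    sens = {"group": ["a", "a", "b", "b"]}
    for lam in (0.0, 1000.0):
        labels = fair_kmeans_sketch(pts, sens, k=2, lam=lam)
        print(lam, labels, fairness_deviation(labels, sens))
```

With `lam=0` the search reduces to a greedy K-Means-style local search; raising `lam` trades cluster coherence for proportional representation of the sensitive groups. Re-evaluating the full objective for every candidate move keeps the code short but costs roughly O(n²k) per pass, so it only suits toy data.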
Original language | English |
---|---|
Title of host publication | International Conference on Extending Database Technology: Proceedings |
Pages | 287-298 |
Number of pages | 12 |
ISBN (Electronic) | 978-3-89318-083-7 |
Publication status | Published - 31 Mar 2020 |
Event | EDBT/ICDT 2020 Joint Conference, Copenhagen, Denmark (30 Mar 2020 → 02 Apr 2020), https://diku-dk.github.io/edbticdt2020/ |
Publication series
Name | Advances in Database Technology: Proceedings |
---|---|
ISSN (Electronic) | 2367-2005 |
Conference
Conference | EDBT/ICDT 2020 Joint Conference |
---|---|
Country/Territory | Denmark |
City | Copenhagen |
Period | 30/03/2020 → 02/04/2020 |
Internet address | https://diku-dk.github.io/edbticdt2020/ |