Abstract
Record linkage, also referred to as entity resolution, is the process of identifying pairs of records that represent the same real-world entity (for example, a person) within a dataset or across multiple datasets. This allows multi-source data to be integrated, which in turn supports better knowledge discovery. To reduce the number of record comparisons, record linkage frameworks first perform a process commonly referred to as blocking, which separates records into blocks using a partition (or blocking) scheme. During the linkage process, comparisons are then restricted to records that belong to the same block. Existing blocking techniques often require some form of manual fine-tuning of parameter values for optimal performance. Optimal parameter values may be selected manually by a domain expert or learned automatically from labelled data. However, in many real-world situations no such labelled dataset is available. In this paper we propose a novel unsupervised blocking technique for structured datasets that requires neither labelled data nor manual fine-tuning of parameters. Experimental evaluations across a large number of datasets demonstrate that this approach often outperforms both supervised and unsupervised baseline techniques, often in less time.
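As an illustration of the general idea of blocking described in the abstract, the following is a minimal Python sketch of plain key-based blocking. It is not the unsupervised technique proposed in the paper. The function names (`block_records`, `candidate_pairs`, `surname_prefix_key`), the blocking key (first three letters of the surname), and the toy records are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def block_records(records, blocking_key):
    """Group record ids by the value of a blocking key (hypothetical helper)."""
    blocks = defaultdict(list)
    for record_id, record in records.items():
        blocks[blocking_key(record)].append(record_id)
    return blocks


def candidate_pairs(blocks):
    """Return only the record pairs that share a block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def surname_prefix_key(record):
    """Assumed blocking scheme: first three letters of the surname, lowercased."""
    return record["surname"][:3].lower()


# Toy data, invented for illustration only.
records = {
    1: {"first_name": "John", "surname": "Smith"},
    2: {"first_name": "Jon", "surname": "Smith"},
    3: {"first_name": "Mary", "surname": "Jones"},
    4: {"first_name": "Maria", "surname": "Jonas"},
    5: {"first_name": "Peter", "surname": "Brown"},
}

blocks = block_records(records, surname_prefix_key)
pairs = candidate_pairs(blocks)

# Without blocking, every record would be compared with every other record.
total_pairs = len(records) * (len(records) - 1) // 2

print(f"{len(pairs)} candidate pairs instead of {total_pairs}")
```

In this toy case the blocking key reduces the comparison space from 10 pairs to 2. A hand-picked key like this is exactly the kind of manually chosen parameter the abstract says the proposed technique aims to avoid.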
Original language | English |
---|---|
Pages (from-to) | 181-195 |
Number of pages | 15 |
Journal | Data & Knowledge Engineering |
Volume | 122 |
DOIs | |
Publication status | Published - 08 Jul 2019 |
Fingerprint
Dive into the research topics of 'An unsupervised blocking technique for more efficient record linkage'. Together they form a unique fingerprint.
Student theses
- Design and development of blocking approaches for record linkage
  O'Hare, K. (Author), Jurek-Loughrey, A. (Supervisor) & de Campos, C. (Supervisor), Jul 2021
  Student thesis: Doctoral Thesis › Doctor of Philosophy