Abstract
The overall aim of this PhD project was to contribute to the field of record linkage by proposing innovative solutions that address existing issues among blocking approaches used for record linkage.
A thorough explanation of blocking and record linkage is provided so as to better introduce the reader to these concepts. An extensive literature study is also provided to detail many existing blocking approaches, their advantages and disadvantages in comparison to one another, and how existing blocking methodologies could be improved in general.
In particular, the thesis addresses four issues: (1) criticisms of commonly applied blocking evaluation metrics; (2) how to appropriately select a blocking approach for any new (i.e., never before evaluated) unlabelled dataset; (3) the requirement of labelled data or manual fine-tuning of parameters for many existing blocking methods to achieve optimal performance; and (4) the restriction of many existing blocking methods to structured datasets.
The technical contributions propose potential solutions to each of these issues, along with an explanation of their benefits and underlying logic. Algorithms and figures are used where possible to convey the key points of each proposed solution. Each proposed solution is empirically evaluated using a large selection of existing blocking approaches and datasets, and is accompanied by a thorough discussion of results that demonstrates the performance of the proposed techniques in comparison to existing approaches.
The conclusion summarises the contributions of this thesis and outlines how the work could be developed further.
|Date of Award
|Administrative Data Research Centre Northern Ireland (ADRC-NI) & Northern Ireland Department for the Economy
|Anna Jurek-Loughrey (Supervisor) & Cassio de Campos (Supervisor)