Abstract
Data integration is one of the key problems in the era of Big Data analytics. Its central challenge is identifying records that represent the same entity (e.g. a person), a task referred to as Record Linkage. Different data sources rarely share a unique identifier, so records must be matched by comparing their corresponding values. Most existing methods assume that records across different sources are structured and represented by the same set of attributes (e.g. name, date of birth). However, the majority of data now comes without structure (e.g. from social media sites). We propose a new approach to Record Linkage based on a Siamese Neural Network. The model can be applied to structured, semi-structured and unstructured records, and it does not assume a common format across different data sources. We demonstrate that the model performs on par with other approaches that make constraining assumptions about the data.
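As a rough illustration of the Siamese idea mentioned in the abstract, the sketch below passes two free-text records through the same encoder with shared weights and compares the resulting embeddings. It is not the authors' architecture: the character-trigram hashing, the single untrained random layer, the cosine score, and all names (`featurize`, `encode`, `similarity`, `DIM_IN`, `DIM_OUT`) are assumptions made for this example only.

```python
import math
import random
import zlib

DIM_IN = 64    # size of the hashed character-trigram input vector (assumption)
DIM_OUT = 16   # size of the record embedding (assumption)

_rng = random.Random(0)
# One weight matrix used by both branches; sharing it is what makes the
# comparison "Siamese". The weights here are random and untrained.
WEIGHTS = [[_rng.uniform(-1.0, 1.0) for _ in range(DIM_IN)]
           for _ in range(DIM_OUT)]


def featurize(record):
    """Hash character trigrams of a free-text record into a count vector."""
    text = "  " + record.lower() + "  "
    vec = [0.0] * DIM_IN
    for i in range(len(text) - 2):
        trigram = text[i:i + 3].encode("utf-8")
        vec[zlib.crc32(trigram) % DIM_IN] += 1.0
    return vec


def encode(record):
    """Shared encoder: one linear layer followed by tanh."""
    x = featurize(record)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in WEIGHTS]


def similarity(record_a, record_b):
    """Cosine similarity between the embeddings of two records."""
    u, v = encode(record_a), encode(record_b)
    dot = sum(p * q for p, q in zip(u, v))
    norm = math.sqrt(sum(p * p for p in u)) * math.sqrt(sum(q * q for q in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # The two records deliberately do not share a format.
    print(similarity("Jon Smith 1980", "name: John Smith; dob: 1980-01-02"))
```

Because both records are serialized to plain strings before encoding, the sketch needs no common attribute schema. In a trained model the shared weights would be learned from labelled matching and non-matching pairs, and a threshold on the score would decide whether two records are linked.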
Original language | English |
---|---|
Title of host publication | Proceedings of the 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS2020) |
Editors | Maria Indrawan-Santiago, Eric Pardede, Ivan Luiz Salvadori, Matthias Steinbauer, Ismail Khalil, Gabriele Kotsis |
Publisher | Association for Computing Machinery |
Pages | 417-425 |
ISBN (Print) | 9781450389242 |
Publication status | Published - 27 Jan 2021 |
Event | 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS 2020), virtual, online. Duration: 30 Nov 2020 → 02 Dec 2020. https://www.iiwas.org/conferences/iiwas2020/ |
Conference
Conference | 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS 2020) |
---|---|
City | virtual, online |
Period | 30/11/2020 → 02/12/2020 |