Siamese Neural Network for unstructured data linkage

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Data integration is one of the key problems in the era of Big Data analytics. Its central challenge is identifying records that represent the same entity (e.g. a person), a task referred to as Record Linkage. Different data sources rarely share a unique identifier, so records must be matched by comparing their corresponding values. Most existing methods assume that records across different sources are structured and represented by the same set of attributes (e.g. name, date of birth). However, the majority of data today comes without structure (e.g. from social media sites). We propose a new approach to Record Linkage based on a Siamese Neural Network. The model can be applied to structured, semi-structured and unstructured records, and it does not assume a common format across different data sources. We demonstrate that the model performs on par with other approaches that make constraining assumptions about the data.
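
The abstract does not describe the architecture, so the following is only a minimal sketch of the general Siamese idea it names: one encoder with shared weights is applied to both records, and the two embeddings are compared. Everything specific here is an assumption for illustration and is not taken from the paper: records are treated as raw strings, `featurize` hashes character trigrams, `SharedEncoder` is a single untrained random layer, and cosine similarity is the comparison function.

```python
import math
import random
import zlib
from typing import List

DIM_IN = 64   # length of the hashed n-gram vector (assumed value)
DIM_OUT = 16  # embedding size (assumed value)


def featurize(record: str, n: int = 3) -> List[float]:
    """Hash character n-grams of a raw record string into a fixed-length count vector.

    No attribute schema is needed: the whole record is treated as one string.
    """
    text = " ".join(record.lower().split())
    vec = [0.0] * DIM_IN
    for i in range(max(len(text) - n + 1, 1)):
        gram = text[i:i + n]
        vec[zlib.crc32(gram.encode("utf-8")) % DIM_IN] += 1.0
    return vec


class SharedEncoder:
    """Hypothetical one-layer encoder; the SAME weights encode both inputs."""

    def __init__(self, seed: int = 0) -> None:
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(DIM_IN)
        # Untrained random weights: a real model would learn these from labelled pairs.
        self.weights = [
            [rng.gauss(0.0, scale) for _ in range(DIM_IN)] for _ in range(DIM_OUT)
        ]

    def encode(self, record: str) -> List[float]:
        x = featurize(record)
        return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in self.weights]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors, 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity(encoder: SharedEncoder, left: str, right: str) -> float:
    """Encode both records with the shared encoder and compare the embeddings."""
    return cosine(encoder.encode(left), encoder.encode(right))


encoder = SharedEncoder(seed=0)
structured = "name: Jane Doe; date of birth: 1990-01-01"
unstructured = "Happy birthday to Jane Doe, born on the 1st of January 1990!"
score = similarity(encoder, structured, unstructured)
```

Because the weights are untrained, `score` carries no meaning beyond showing the data flow. In a trained model the shared encoder would be fitted on pairs of matching and non-matching records, and a threshold on the similarity would decide whether two records are linked.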

Original language: English
Title of host publication: Proceedings of the 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS2020)
Editors: Maria Indrawan-Santiago, Eric Pardede, Ivan Luiz Salvadori, Matthias Steinbauer, Ismail Khalil, Gabriele Kotsis
Publisher: Association for Computing Machinery
Pages: 417-425
ISBN (Print): 9781450389242
Publication status: Published - 27 Jan 2021
Event: 22nd International Conference on Information Integration and Web-based Applications and Services (iiWAS 2020) - virtual, online
Duration: 30 Nov 2020 to 02 Dec 2020
https://www.iiwas.org/conferences/iiwas2020/
