Exploiting evidence from unstructured data to enhance master data management

    Research output: Contribution to journal (Article)

    Published
    • Karin Murthy
    • Prasad Deshpande
    • Atreyee Dey
    • Ramanujam Halasipuram
    • Mukesh Mohania
    • Deepak Padmanabhan
    • Jennifer Reed
    • Scott Schumacher

    Master data management (MDM) integrates data from multiple
    structured data sources and builds a consolidated 360-degree
    view of business entities such as customers and products.
    Today’s MDM systems are not prepared to integrate
    information from unstructured data sources, such as news
    reports, emails, call-center transcripts, and chat logs. However,
    those unstructured data sources may contain valuable
    information about the same entities known to MDM from
    the structured data sources. Integrating information from
    unstructured data into MDM is challenging for two reasons:
    textual references to existing MDM entities are often
    incomplete and imprecise, and the additional entity
    information extracted from text should not undermine the
    trustworthiness of MDM data.
    In this paper, we present an architecture for making MDM
    text-aware and showcase its implementation as IBM InfoSphere
    MDM Extension for Unstructured Text Correlation,
    an add-on to IBM InfoSphere Master Data Management
    Standard Edition. We highlight how MDM benefits from
    additional evidence found in documents when doing entity
    resolution and relationship discovery. We experimentally
    demonstrate the feasibility of integrating information from
    unstructured data sources into MDM.
    Original language: English
    Number of pages: 12
    Pages (from-to): 1862-1873
    Journal: Proceedings of the VLDB Endowment
    Journal publication date: 2012
    Issue number: 12
    Volume: 5
    Publication status: Published - 2012

    ID: 17791929