MixKMeans: Clustering Question-Answer Archives

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Published

    Community-driven Question Answering (CQA) systems crowdsource experiential information in the form of questions and answers and have accumulated valuable reusable knowledge. Clustering QA datasets from CQA systems provides a means of organizing the content to ease tasks such as manual curation and tagging. In this paper, we present a clustering method that exploits the two-part question-answer structure of QA datasets to improve clustering quality. Our method, MixKMeans, composes question-space and answer-space similarities so that the space in which the match is higher is allowed to dominate. This construction is motivated by our observation that the semantic similarity between question-answer pairs (QAs) can be localized in either space. We empirically evaluate our method on a variety of real-world labeled datasets. Our results indicate that our method significantly outperforms state-of-the-art clustering methods for the task of clustering question-answer archives.
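
To illustrate the idea of letting the better-matching space dominate, the following is a minimal sketch only. It assumes a weighted power mean as the composition rule, which is an illustrative stand-in and not necessarily the formula used in the paper. The names `cosine`, `composed_similarity`, `weight_q`, and `exponent` are hypothetical, and similarities are assumed to be non-negative (as with cosine similarity over term-frequency vectors).

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def composed_similarity(sim_q, sim_a, weight_q=0.5, exponent=4.0):
    """Weighted power mean of question-space and answer-space similarities.

    Assumption (not taken from the paper): with exponent == 1 this is a plain
    weighted average; as the exponent grows, the result moves toward
    max(sim_q, sim_a), so the space with the stronger match dominates.
    Both similarities are assumed to be non-negative.
    """
    mixed = weight_q * sim_q ** exponent + (1.0 - weight_q) * sim_a ** exponent
    return mixed ** (1.0 / exponent)


if __name__ == "__main__":
    # Two hypothetical QA pairs whose questions match strongly but whose
    # answers match weakly.
    sim_q, sim_a = 0.9, 0.1
    print("plain average :", composed_similarity(sim_q, sim_a, exponent=1.0))
    print("power mean    :", composed_similarity(sim_q, sim_a, exponent=4.0))
```

With a strong question match and a weak answer match, the plain average is pulled to the midpoint, while the higher exponent keeps the composed score closer to the stronger of the two similarities. A k-means style procedure could then use such a composed score when assigning QAs to clusters.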

    Documents

    • MixKMeans: Clustering Question-Answer Archives

      Rights statement: Copyright 2016 ACL. This work is made available online in accordance with the publisher’s policies. Please refer to any applicable terms of use of the publisher.

      Accepted author manuscript, 288 KB, PDF-document

    Original language: English
    Title of host publication: Proceedings of the Conference on Empirical Methods in Natural Language Processing 2016
    Publisher: Association for Computing Machinery (ACM)
    Publication status: Published - 06 Nov 2016
    Event: Conference on Empirical Methods in Natural Language Processing - Austin, Texas, United States
    Duration: 02 Nov 2016 to 06 Nov 2016
    http://www.emnlp2016.net/

    Conference

    Conference: Conference on Empirical Methods in Natural Language Processing
    Abbreviated title: EMNLP 2016
    Country: United States
    City: Austin
    Period: 02/11/2016 to 06/11/2016

    ID: 73541843