Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition

Alona Fyshe, Partha Talukdar, Brian Murphy, Tom Mitchell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution



In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., the documents in which a word appears) or from syntactic/semantic types of words (e.g., a word's dependency parse links in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built from a combination of topic- and type-based statistics. We also introduce a new evaluation task in which we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
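The abstract's core idea can be sketched with toy data: concatenate a word's topic-based (word-by-document) vector with its type-based (word-by-dependency-link) vector, then compose adjective and noun vectors. The vocabulary, dimensions, and composition functions below are illustrative assumptions, not the paper's actual corpus statistics or models.

```python
import numpy as np

# Hypothetical toy vectors; the paper builds its VSMs from a
# 16-billion-token parsed corpus, not reproduced here.
topic_vsm = {                      # topic-based: word-by-document counts
    "red":   np.array([1.0, 0.0, 2.0]),
    "house": np.array([0.0, 3.0, 1.0]),
}
type_vsm = {                       # type-based: word-by-dependency-link counts
    "red":   np.array([2.0, 1.0]),
    "house": np.array([1.0, 4.0]),
}

def combined_vector(word):
    """Concatenate topic- and type-based vectors into one representation."""
    return np.concatenate([topic_vsm[word], type_vsm[word]])

def compose_add(adj, noun):
    """Additive composition of an adjective-noun phrase."""
    return combined_vector(adj) + combined_vector(noun)

def compose_mult(adj, noun):
    """Element-wise multiplicative composition of an adjective-noun phrase."""
    return combined_vector(adj) * combined_vector(noun)

red_house_add = compose_add("red", "house")    # [1, 3, 3, 3, 5]
red_house_mult = compose_mult("red", "house")  # [0, 0, 2, 2, 4]
```

Additive and element-wise multiplicative composition are standard baselines in this literature; the combined vector simply stacks both statistic types so a composition model can draw on either.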
Original language: English
Title of host publication: 17th Conference on Computational Natural Language Learning (CoNLL 2013)
Publisher: Association for Computational Linguistics
Number of pages: 10
ISBN (Print): 9781629930077
Publication status: Published - 2013

