Abstract
In most previous research on distributional
semantics, Vector Space Models
(VSMs) of words are built either from
topical information (e.g., documents in
which a word is present), or from syntactic/semantic
types of words (e.g., dependency
parse links of a word in sentences),
but not both. In this paper, we explore the
utility of combining these two representations
to build VSMs for the task of semantic
composition of adjective-noun phrases.
Through extensive experiments on benchmark
datasets, we find that even though
a type-based VSM is effective for semantic
composition, it is often outperformed
by a VSM built using a combination of
topic- and type-based statistics. We also
introduce a new evaluation task wherein
we predict the composed vector representation
of a phrase from the brain activity of
a human subject reading that phrase. We
exploit a large syntactically parsed corpus
of 16 billion tokens to build our VSMs,
with vectors for both phrases and words,
and make them publicly available.
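To make the setup concrete, here is a minimal sketch of the general idea: build a topic-based and a type-based vector for each word, combine the two views, and compose an adjective-noun phrase. The toy count generator, the concatenation step, and the additive composition function are illustrative assumptions for this sketch, not the paper's specific method.

```python
import numpy as np

DIM = 50  # toy dimensionality; the paper's VSMs come from a 16B-token parsed corpus

def _toy_counts(word, kind, dim=DIM):
    # Deterministic stand-in for corpus-derived count statistics.
    rng = np.random.default_rng(abs(hash((word, kind))) % (2**32))
    return rng.random(dim)

def topic_vector(word):
    """Topic-based statistics, e.g., which documents a word appears in."""
    return _toy_counts(word, "topic")

def type_vector(word):
    """Type-based statistics, e.g., dependency-parse links of the word."""
    return _toy_counts(word, "type")

def combined_vector(word):
    """Combine the two views; concatenation is one simple option."""
    return np.concatenate([topic_vector(word), type_vector(word)])

def compose(adjective, noun):
    """Compose an adjective-noun phrase; element-wise addition is a
    standard baseline composition function."""
    return combined_vector(adjective) + combined_vector(noun)

phrase = compose("red", "car")
print(phrase.shape)  # (2 * DIM,) -> (100,)
```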
| Original language | English |
| --- | --- |
| Title of host publication | 17th Conference on Computational Natural Language Learning (CoNLL 2013) |
| Publisher | Association for Computational Linguistics |
| Pages | 84-93 |
| Number of pages | 10 |
| ISBN (Print) | 9781629930077 |
| Publication status | Published - 2013 |