Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition

Alona Fyshe, Partha Talukdar, Brian Murphy, Tom Mitchell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.
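The idea in the abstract can be pictured with a toy example. The Python sketch below is only illustrative and uses invented data and function names: it concatenates a hypothetical topic-based vector (document co-occurrence counts) with a hypothetical type-based vector (dependency-link counts) for each word, then composes an adjective-noun phrase by simple vector addition, one common composition baseline. It is not the paper's actual corpus, feature set, or composition model.

```python
# Illustrative sketch only: a toy combined topic + type VSM and a simple
# additive composition of an adjective-noun phrase. The feature counts and
# function names here are hypothetical.
import numpy as np

# Topic-based features: counts of the word's occurrences in documents doc0..doc2.
topic_features = {
    "small": np.array([2.0, 0.0, 1.0]),
    "dog":   np.array([1.0, 3.0, 0.0]),
}

# Type-based features: counts of dependency links the word participates in
# (e.g., amod-head, nsubj, dobj).
type_features = {
    "small": np.array([4.0, 0.0, 0.0]),
    "dog":   np.array([1.0, 2.0, 2.0]),
}

def combined_vector(word):
    """Concatenate topic- and type-based statistics into one word vector."""
    return np.concatenate([topic_features[word], type_features[word]])

def compose_additive(adj, noun):
    """Compose an adjective-noun phrase by vector addition (a common baseline)."""
    return combined_vector(adj) + combined_vector(noun)

if __name__ == "__main__":
    phrase_vec = compose_additive("small", "dog")
    print("small dog ->", phrase_vec)
```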
Original language: English
Title of host publication: 17th Conference on Computational Natural Language Learning (CoNLL 2013)
Publisher: Association for Computational Linguistics
Pages: 84-93
Number of pages: 10
ISBN (Print): 9781629930077
Publication status: Published - 2013

Cite this

Fyshe, A., Talukdar, P., Murphy, B., & Mitchell, T. (2013). Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition. In 17th Conference on Computational Natural Language Learning (CoNLL 2013) (pp. 84-93). Association for Computational Linguistics.
@inproceedings{df7a2d5f3fae42bbb0f01a34e97a2585,
title = "Documents and Dependencies: an Exploration of Vector Space Models for Semantic Composition",
abstract = "In most previous research on distributional semantics, Vector Space Models (VSMs) of words are built either from topical information (e.g., documents in which a word is present), or from syntactic/semantic types of words (e.g., dependency parse links of a word in sentences), but not both. In this paper, we explore the utility of combining these two representations to build VSMs for the task of semantic composition of adjective-noun phrases. Through extensive experiments on benchmark datasets, we find that even though a type-based VSM is effective for semantic composition, it is often outperformed by a VSM built using a combination of topic- and type-based statistics. We also introduce a new evaluation task wherein we predict the composed vector representation of a phrase from the brain activity of a human subject reading that phrase. We exploit a large syntactically parsed corpus of 16 billion tokens to build our VSMs, with vectors for both phrases and words, and make them publicly available.",
author = "Alona Fyshe and Partha Talukdar and Brian Murphy and Tom Mitchell",
year = "2013",
language = "English",
isbn = "9781629930077",
pages = "84--93",
booktitle = "17th Conference on Computational Natural Language Learning (CoNLL 2013)",
publisher = "Association for Computational Linguistics",
}
