Automatic Extraction of Property Norm-Like Data From Large Text Corpora

Colin Kelly*, Barry Devereux, Anna Korhonen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car-petrol). We propose a system for the challenging task of automatic, large-scale acquisition of unconstrained, human-like property norms from large text corpora, and discuss the theoretical implications of such a system. We employ syntactic, semantic, and encyclopedic information to guide our extraction, yielding concept-relation-feature triples (e.g., car be fast, car require petrol, car cause pollution), which approximate property-based conceptual representations. Our novel method extracts candidate triples from parsed corpora (Wikipedia and the British National Corpus) using syntactically and grammatically motivated rules, then reweights triples with a linear combination of their frequency and four statistical metrics. We assess our system output in three ways: lexical comparison with norms derived from human-generated property norm data, direct evaluation by four human judges, and a semantic distance comparison with both WordNet similarity data and human-judged concept similarity ratings. Our system offers a viable and performant method of plausible triple extraction: Our lexical comparison shows comparable performance to the current state-of-the-art, while subsequent evaluations exhibit the human-like character of our generated properties.
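The reweighting step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the four statistical metrics or give the weights of the linear combination; the keywords suggest pointwise mutual information, log-likelihood, and entropy are among the measures used. The example below is therefore an assumption-laden illustration, not the authors' implementation: it combines relative frequency with a single metric (PMI between concept and feature), and the triple counts, the function names `pmi_scores` and `reweight`, and the weights are all invented for illustration.

```python
import math
from collections import Counter

# Toy candidate triples (concept, relation, feature).
# The counts are invented for illustration, not taken from the paper.
triples = Counter({
    ("car", "require", "petrol"): 8,
    ("car", "be", "fast"): 5,
    ("car", "be", "thing"): 6,
    ("dog", "be", "thing"): 6,
    ("dog", "chase", "cat"): 4,
})


def pmi_scores(counts):
    """Pointwise mutual information between the concept and feature of each triple."""
    total = sum(counts.values())
    concept_freq = Counter()
    feature_freq = Counter()
    pair_freq = Counter()
    for (concept, _relation, feature), n in counts.items():
        concept_freq[concept] += n
        feature_freq[feature] += n
        pair_freq[(concept, feature)] += n
    return {
        (c, r, f): math.log2(
            pair_freq[(c, f)] * total / (concept_freq[c] * feature_freq[f])
        )
        for (c, r, f) in counts
    }


def reweight(counts, weights=(0.5, 0.5)):
    """Score each triple as w_freq * relative frequency + w_pmi * PMI.

    The weights are hypothetical; the paper combines frequency with four
    metrics, whereas this sketch uses only one.
    """
    total = sum(counts.values())
    pmi = pmi_scores(counts)
    w_freq, w_pmi = weights
    return {t: w_freq * counts[t] / total + w_pmi * pmi[t] for t in counts}


scores = reweight(triples)
ranked = sorted(scores, key=scores.get, reverse=True)
for triple in ranked:
    print(triple, round(scores[triple], 3))
```

In this toy data the feature "thing" co-occurs with every concept, so its PMI with "car" is negative and the triple (car, be, thing) falls to the bottom of the ranking even though its raw frequency is higher than that of (car, be, fast). This is the kind of effect an association metric adds on top of raw frequency.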

Original language: English
Pages (from-to): 638-682
Number of pages: 45
Journal: Cognitive Science
Volume: 38
Early online date: 06 Nov 2013
Publication status: Published - 06 Nov 2014
Externally published: Yes

Keywords

  • Entropy
  • Human evaluation
  • Log-likelihood
  • Natural language processing
  • Pointwise mutual information
  • Property norm
  • Wikipedia
  • WordNet

ASJC Scopus subject areas

  • Language and Linguistics
  • Experimental and Cognitive Psychology
  • Cognitive Neuroscience
  • Artificial Intelligence
