Statistically-constrained shallow text marking: Techniques, evaluation paradigm and results

B. Murphy, C. Vogel

Research output: Chapter in Book/Report/Conference proceeding › Chapter

14 Citations (Scopus)


We present three natural language marking strategies based on fast and reliable shallow parsing techniques and on widely available lexical resources: lexical substitution, adjective conjunction swaps, and relativiser switching. We test these techniques on a random sample of the British National Corpus. Individual candidate marks are checked for goodness of structural and semantic fit, using both lexical resources and the web as a corpus. A representative sample of marks is given to 25 human judges to evaluate for acceptability and preservation of meaning. This establishes a correlation between corpus-based felicity measures and perceived quality, and makes qualified predictions. Grammatical acceptability correlates strongly with our automatic measure (Pearson's r = 0.795, p = 0.001), allowing us to account for about two thirds of the variability in human judgements. A moderate but statistically non-significant correlation (Pearson's r = 0.422, p = 0.356) is found with judgements of meaning preservation, indicating that the contextual window of five content words used for our automatic measure may need to be extended.
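The "about two thirds of variability" figure follows from squaring the reported correlation coefficient (the coefficient of determination, r²). The sketch below is only an illustration of that arithmetic, not the authors' analysis code: the function names `pearson_r` and `variance_explained` are hypothetical, and the only values taken from the abstract are the two reported coefficients, 0.795 and 0.422.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def variance_explained(r):
    """Share of variance accounted for by a linear relationship with correlation r."""
    return r * r


# Coefficients as reported in the abstract.
grammaticality_r2 = variance_explained(0.795)  # roughly 0.63, i.e. about two thirds
meaning_r2 = variance_explained(0.422)         # roughly 0.18

print(f"grammatical acceptability: r^2 = {grammaticality_r2:.3f}")
print(f"meaning preservation:      r^2 = {meaning_r2:.3f}")
```

Given paired lists of automatic scores and mean judge ratings (not available on this page), `pearson_r` would yield the coefficient that `variance_explained` then squares.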
Original language: English
Title of host publication: Proceedings of SPIE - The International Society for Optical Engineering
Publication status: Published - 01 Jan 2007

