Pattern-based information extraction from pathology reports for cancer registration

Giulio Napolitano, Colin Fox, Richard Middleton, David Connolly

Research output: Contribution to journal › Article › peer-review

25 Citations (Scopus)
408 Downloads (Pure)

Abstract

Objective
To evaluate precision and recall rates for the automatic extraction of information from free-text pathology reports. To assess the impact that implementation of pattern-based methods would have on cancer registration completeness.

Method
Over 300,000 electronic pathology reports were scanned by a set of Perl routines, refined iteratively by trial and error, to extract Gleason score, Clark level and Breslow depth. An additional test set of 915 reports potentially containing a Gleason score was used for evaluation.
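To make the idea of pattern-based extraction concrete, here is a minimal Python sketch. It is not the authors' Perl code. The regular expressions, the function names (`extract_gleason`, `extract_clark`, `extract_breslow`) and the phrasings they cover are hypothetical assumptions chosen for illustration. Each pattern anchors on a keyword ("Gleason", "Clark", "Breslow") and then captures the value that follows it.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for illustration only; these are NOT the authors' Perl
# routines, which the abstract says were refined by trial and error.
GLEASON_RE = re.compile(
    # e.g. "Gleason score 3+4=7", "GLEASON GRADE 4 + 3 = 7"; stated total optional
    r"gleason\b[^0-9]{0,40}?([1-5])\s*\+\s*([1-5])(?:\s*=\s*(\d{1,2}))?",
    re.IGNORECASE,
)
CLARK_RE = re.compile(
    # e.g. "Clark level IV", "Clark's level: 2"; the trailing \b stops words
    # such as "is" or "in" from being read as the Roman numeral I
    r"clark(?:'s)?\s+level\s*(?:of\s+)?[:\-]?\s*(IV|V|I{1,3}|[1-5])\b",
    re.IGNORECASE,
)
BRESLOW_RE = re.compile(
    # e.g. "Breslow thickness 1.2 mm", "Breslow depth: 0.8mm"
    r"breslow\b[^0-9]{0,40}?(\d+(?:\.\d+)?)\s*mm\b",
    re.IGNORECASE,
)
ROMAN = {"I": 1, "II": 2, "III": 3, "IV": 4, "V": 5}


def extract_gleason(text: str) -> Optional[Tuple[int, int, int]]:
    """Return (primary, secondary, total) for the first Gleason pattern found."""
    m = GLEASON_RE.search(text)
    if m is None:
        return None
    primary, secondary = int(m.group(1)), int(m.group(2))
    total = primary + secondary
    # If the report states a total that disagrees with the sum, return None
    # so the report can be reviewed manually instead of guessing.
    if m.group(3) is not None and int(m.group(3)) != total:
        return None
    return primary, secondary, total


def extract_clark(text: str) -> Optional[int]:
    """Return the Clark level (1-5), written as a Roman or Arabic numeral."""
    m = CLARK_RE.search(text)
    if m is None:
        return None
    token = m.group(1).upper()
    return int(token) if token.isdigit() else ROMAN[token]


def extract_breslow(text: str) -> Optional[float]:
    """Return the Breslow depth in millimetres."""
    m = BRESLOW_RE.search(text)
    return float(m.group(1)) if m else None
```

A phrasing the patterns do not anticipate, such as "Gleason 7 (3+4)", returns `None` here. Covering such variants with additional patterns is the kind of progressive, trial-and-error refinement the Method describes.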

Results
Recall and precision values of over 98% and 99%, respectively, were readily achieved. A potential increase in cancer staging completeness of up to 32% was demonstrated.
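Recall and precision follow the standard definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The Python sketch below scores extracted values against manually verified ("gold") values, as one might do on a test set like the 915 reports mentioned above. The function names are hypothetical. The rule that a wrong value counts as both a false positive and a false negative is an assumption, since the abstract does not say how mismatched values were scored.

```python
from typing import Any, Dict, Iterable, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 if undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


def score_extractions(pairs: Iterable[Tuple[Any, Any]]) -> Dict[str, float]:
    """Score (extracted, gold) pairs, where None means 'no value present'."""
    tp = fp = fn = 0
    for extracted, gold in pairs:
        if extracted is not None and extracted == gold:
            tp += 1
            continue
        # Assumed convention: a wrong value is both a false positive
        # (an incorrect value was extracted) and a false negative
        # (the correct value was missed).
        if extracted is not None:
            fp += 1
        if gold is not None:
            fn += 1
    return precision_recall(tp, fp, fn)
```

Reports where neither the extractor nor the gold standard has a value, `(None, None)`, do not affect either measure.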

Conclusions
In cancer registration, simple pattern matching applied to free-text documents can be effectively used to improve completeness and accuracy of pathology information.
Original language: English
Pages (from-to): 1887-1894
Number of pages: 8
Journal: Cancer Causes & Control: an international journal of studies of cancer in human populations
Volume: 21
Issue number: 11
Early online date: 23 Jul 2010
Publication status: Published - Nov 2010

ASJC Scopus subject areas

  • Cancer Research
  • Oncology
