Evaluation of information extraction techniques to label extracted data from e-commerce web page

Neil Anderson, Jun Hong

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)


Automatically determining and assigning shared and meaningful text labels to data extracted from an e-Commerce web page is a challenging problem. An e-Commerce web page can display a list of data records, each of which can contain a combination of data items (e.g. product name and price) and explicit labels, which describe some of these data items. Recent advances in extraction techniques have made it much easier to precisely extract individual data items and labels from a web page; however, two problems remain open: (1) assigning an explicit label to a data item, and (2) determining labels for the remaining data items. Furthermore, improvements in the availability and coverage of vocabularies, especially in the context of e-Commerce web sites, mean that we now have access to a bank of relevant, meaningful and shared labels which can be assigned to extracted data items. However, there is a need for a technique which takes as input a set of extracted data items and automatically assigns to them the most relevant and meaningful labels from a shared vocabulary. We observe that the Information Extraction (IE) community has developed a great number of techniques which solve problems similar to our own. In this work-in-progress paper we propose to theoretically and experimentally evaluate different IE techniques to ascertain which is most suitable for solving this problem.
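To make the two open problems concrete, the following is a minimal illustrative sketch and not the authors' method. It assumes that extraction has already tagged each token of a record as an explicit label or a data item, pairs an explicit label with the data item that immediately follows it (an adjacency heuristic chosen only for illustration), and tries to label the remaining items by matching them against a tiny made-up vocabulary. The names `VOCABULARY` and `label_record`, the patterns, and the sample record are all hypothetical.

```python
import re

# Hypothetical shared vocabulary: label name -> pattern its values are assumed to match.
VOCABULARY = {
    "price": re.compile(r"^[£$€]\d+(\.\d{2})?$"),
    "ratingValue": re.compile(r"^\d(\.\d)?/5$"),
}


def label_record(tokens):
    """Label the data items of one extracted record.

    tokens: list of (kind, text) pairs in page order, where kind is
    'label' (explicit label) or 'item' (data item).
    Returns a list of (label_or_None, item_text) pairs.
    """
    labelled = []
    pending = None  # explicit label waiting for the next data item
    for kind, text in tokens:
        if kind == "label":
            pending = text.rstrip(":").strip()
            continue
        if pending is not None:
            # Problem 1 (assumed heuristic): attach the explicit label
            # to the data item that directly follows it.
            labelled.append((pending, text))
            pending = None
            continue
        # Problem 2: no explicit label, so look for a vocabulary label
        # whose pattern matches the item; None if nothing matches.
        match = next(
            (name for name, pattern in VOCABULARY.items() if pattern.match(text)),
            None,
        )
        labelled.append((match, text))
    return labelled


# Made-up record for illustration only.
record = [
    ("item", "Acme Kettle"),
    ("item", "£24.99"),
    ("label", "Colour:"),
    ("item", "Red"),
]
result = label_record(record)
```

In this toy record the price is labelled by pattern and the colour by its explicit label, while the product name stays unlabelled. That gap is the kind of case that simple pattern matching cannot cover and that a comparison of IE techniques would need to address.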
Original language: English
Title of host publication: WWW 2014 Companion
Publisher: Association for Computing Machinery
Number of pages: 4
ISBN (Print): 9781450327459
Publication status: Published - Apr 2014
Event: International World Wide Web Conference - Seoul, Korea, Republic of
Duration: 07 Apr 2014 to 11 Apr 2014


Conference: International World Wide Web Conference
Country/Territory: Korea, Republic of

