Abstract
Automatically determining and assigning shared and meaningful
text labels to data extracted from an e-Commerce web
page is a challenging problem. An e-Commerce web page
can display a list of data records, each of which can contain
a combination of data items (e.g. product name and price)
and explicit labels, which describe some of these data items.
Recent advances in extraction techniques have made it
much easier to precisely extract individual data items and
labels from a web page; however, two problems remain open:
(1) assigning an explicit label to a data item, and
(2) determining labels for the remaining data items. Furthermore,
improvements in the availability and coverage of
vocabularies, especially in the context of e-Commerce web
sites, mean that we now have access to a bank of relevant,
meaningful and shared labels which can be assigned to extracted
data items.
However, there is a need for a technique which will take
as input a set of extracted data items and automatically assign
to them the most relevant and meaningful labels from
a shared vocabulary. We observe that the Information Extraction
(IE) community has developed a great number of
techniques which solve problems similar to our own. In this
work-in-progress paper we propose to theoretically
and experimentally evaluate different IE techniques to
ascertain which is most suitable for solving this problem.
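
To make the task concrete, the input and output described above can be sketched as follows. This is only an illustration under stated assumptions, not the technique the paper evaluates: the vocabulary, its patterns, the `label_record` function and the sample record are all hypothetical, and the pairing of each explicit label with its data item is assumed to be given rather than inferred.

```python
import re

# Hypothetical shared vocabulary: each label maps to a pattern that its
# values tend to match. A real vocabulary would be far richer.
VOCABULARY = {
    "price": re.compile(r"^[$£€]\s?\d+(\.\d{2})?$"),
    "sku": re.compile(r"^[A-Z]{2,4}-\d{3,}$"),
}


def label_record(data_items, explicit_labels):
    """Return (label, item) pairs for one extracted data record.

    explicit_labels maps an on-page label (e.g. "Availability:") to the data
    item it is assumed to describe; all other items fall back to the
    vocabulary patterns, or to "unknown" when nothing matches.
    """
    labelled = {}
    # Problem 1: attach each explicit label to its data item.
    for page_label, item in explicit_labels.items():
        labelled[item] = page_label.strip(": ").lower()
    # Problem 2: determine labels for the remaining data items.
    for item in data_items:
        if item in labelled:
            continue
        labelled[item] = next(
            (label for label, pattern in VOCABULARY.items() if pattern.match(item)),
            "unknown",
        )
    return [(labelled[item], item) for item in data_items]


record = ["Acme Kettle", "$24.99", "AK-1042", "In stock"]
print(label_record(record, {"Availability:": "In stock"}))
```

The product name comes back as "unknown" here, which is the kind of gap that motivates comparing more capable IE techniques against such a naive pattern-matching baseline.
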
Original language | English |
---|---|
Title of host publication | WWW 2014 Companion |
Publisher | Association for Computing Machinery |
Pages | 1275-1278 |
Number of pages | 4 |
ISBN (Print) | 9781450327459 |
Publication status | Published - Apr 2014 |
Event | International World Wide Web Conference - Seoul, Korea, Republic of; Duration: 07 Apr 2014 → 11 Apr 2014 |
Conference
Conference | International World Wide Web Conference |
---|---|
Country/Territory | Korea, Republic of |
City | Seoul |
Period | 07/04/2014 → 11/04/2014 |