Extracting Query Interfaces Based on Form Structures and Semantic Similarity

Jun Hong, Zhongtian He, David A. Bell

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

Web databases are now pervasive. Such a database can be accessed only via its query interface (usually an HTML query form). Extracting Web query interfaces is a critical step in data integration across multiple Web databases: it creates a formal representation of a query form by extracting a set of query conditions in it. This paper presents a novel approach to extracting Web query interfaces. In this approach, a generic set of query condition rules is created to define query conditions that are semantically equivalent to SQL search conditions. Query condition rules represent the semantic roles that labels and form elements play in query conditions, and how they are hierarchically grouped into constructs of query conditions. To group labels and form elements in a query form, we explore both their structural proximity in the hierarchy of structures in the query form, which is captured by a tree of nested tags in the HTML code of the form, and their semantic similarity, which is captured by various short texts used in labels, form elements and their properties. We have implemented the proposed approach and our experimental results show that the approach is highly effective.
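The grouping idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `FormTreeParser`, `tree_distance`, `token_similarity`, `group_labels`, the `weight` parameter and the sample form are all hypothetical, and the scoring (inverse tree distance blended with Jaccard token overlap) is a stand-in chosen for illustration. It assumes well-formed HTML and pairs each form element with the label that scores best on structural proximity in the tag tree plus similarity between the label text and the element's `name` attribute.

```python
from html.parser import HTMLParser
import re

FORM_TAGS = {"input", "select", "textarea"}
VOID_TAGS = {"input", "br", "img", "hr", "meta", "link"}


class FormTreeParser(HTMLParser):
    """Records root-to-node paths (tuples of child indices) in the tag tree."""

    def __init__(self):
        super().__init__()
        self.path = ()             # path of the currently open tag
        self.child_counts = [0]    # children seen so far at each open level
        self.skip_text = 0         # > 0 while inside <select> or <textarea>
        self.labels = []           # (path, text) for each text node
        self.elements = []         # (path, name attribute) for each form element

    def _enter(self):
        index = self.child_counts[-1]
        self.child_counts[-1] += 1
        return self.path + (index,)

    def handle_starttag(self, tag, attrs):
        node_path = self._enter()
        if tag in FORM_TAGS:
            self.elements.append((node_path, dict(attrs).get("name") or ""))
        if tag in ("select", "textarea"):
            self.skip_text += 1    # option text and default values are not labels
        if tag not in VOID_TAGS:
            self.path = node_path
            self.child_counts.append(0)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.path:
            return
        if tag in ("select", "textarea"):
            self.skip_text -= 1
        self.path = self.path[:-1]
        self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_text:
            self.labels.append((self._enter(), text))


def tree_distance(a, b):
    """Number of tree edges between two nodes, given their root-to-node paths."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def tokens(text):
    """Lowercase word tokens; splits camelCase, underscores and punctuation."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return set(re.findall(r"[a-z]+", spaced.lower()))


def token_similarity(a, b):
    """Jaccard overlap between the token sets of two short texts."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def group_labels(html, weight=0.5):
    """Map each form element name to its best-scoring label text."""
    parser = FormTreeParser()
    parser.feed(html)
    parser.close()
    pairs = {}
    for e_path, name in parser.elements:
        def score(label):
            l_path, text = label
            proximity = 1.0 / (1.0 + tree_distance(e_path, l_path))
            return weight * proximity + (1 - weight) * token_similarity(text, name)
        if parser.labels:
            pairs[name] = max(parser.labels, key=score)[1]
    return pairs


# Hypothetical two-row query form used only to exercise the sketch.
SAMPLE_FORM = """
<form><table>
  <tr><td>Title</td><td><input name="book_title"></td></tr>
  <tr><td>Author</td><td><input name="author_name"></td></tr>
</table></form>
"""

print(group_labels(SAMPLE_FORM))
```

On the sample form, each input is paired with the label in its own table row, because that label is both closer in the tag tree and shares a token with the element name. The abstract's query condition rules, which assign semantic roles and build hierarchical constructs, are not modelled here.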
Original language: English
Title of host publication: 2009 IEEE 25th International Conference on Data Engineering
Subtitle of host publication: (ICDE 2009)
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1259-1262
Number of pages: 4
ISBN (Electronic): 978-0-7695-3545-6
ISBN (Print): 978-1-4244-3422-0
DOIs: https://doi.org/10.1109/ICDE.2009.215
Publication status: Published - Apr 2009
Event: ICDE 2009 25th International Conference on Data Engineering - Shanghai, China
Duration: 29 Mar 2009 to 02 Apr 2009

Conference

Conference: ICDE 2009 25th International Conference on Data Engineering
Country: China
City: Shanghai
Period: 29/03/2009 to 02/04/2009

Bibliographical note

ISBN: 978-0-7695-3545-6

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

  • Cite this

    Hong, J., He, Z., & Bell, D. A. (2009). Extracting Query Interfaces Based on Form Structures and Semantic Similarity. In 2009 IEEE 25th International Conference on Data Engineering: (ICDE 2009) (pp. 1259-1262). Institute of Electrical and Electronics Engineers (IEEE). https://doi.org/10.1109/ICDE.2009.215