## Abstract

Recently, spatial keyword queries become a hot topic in the literature. One example of these queries is the collective spatial keyword query (CoSKQ) which is to find a set of objects in the database such that it covers a set of given keywords collectively and has the smallest cost. Unfortunately, existing exact algorithms have severe scalability problems and existing approximate algorithms, though scalable, cannot guarantee near-to-optimal solutions. In this paper, we study the CoSKQ problem and address the above issues.

Firstly, we consider the CoSKQ problem using an existing cost measurement called the maximum sum cost. This problem is called MaxSum-CoSKQ and is known to be NP-hard. We observe that the maximum sum cost of a set of objects is dominated by at most three objects which we call the distance owners of the set. Motivated by this, we propose a distance owner-driven approach which involves two algorithms: one is an exact algorithm which runs faster than the best-known existing algorithm by several orders of magnitude and the other is an approximate algorithm which improves the best-known constant approximation factor from 2 to 1.375.

Secondly, we propose a new cost measurement called diameter cost and CoSKQ with this measurement is called Dia-CoSKQ. We prove that Dia-CoSKQ is NP-hard. With the same distance owner-driven approach, we design two algorithms for Dia-CoSKQ: one is an exact algorithm which is efficient and scalable and the other is an approximate algorithm which gives a √3-factor approximation.

We conducted extensive experiments on real datasets which verified that the proposed exact algorithms are scalable and the proposed approximate algorithms return near-to-optimal solutions.

Firstly, we consider the CoSKQ problem using an existing cost measurement called the maximum sum cost. This problem is called MaxSum-CoSKQ and is known to be NP-hard. We observe that the maximum sum cost of a set of objects is dominated by at most three objects which we call the distance owners of the set. Motivated by this, we propose a distance owner-driven approach which involves two algorithms: one is an exact algorithm which runs faster than the best-known existing algorithm by several orders of magnitude and the other is an approximate algorithm which improves the best-known constant approximation factor from 2 to 1.375.

Secondly, we propose a new cost measurement called diameter cost and CoSKQ with this measurement is called Dia-CoSKQ. We prove that Dia-CoSKQ is NP-hard. With the same distance owner-driven approach, we design two algorithms for Dia-CoSKQ: one is an exact algorithm which is efficient and scalable and the other is an approximate algorithm which gives a √3-factor approximation.

We conducted extensive experiments on real datasets which verified that the proposed exact algorithms are scalable and the proposed approximate algorithms return near-to-optimal solutions.

Original language | English |
---|---|

Title of host publication | Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data |

Place of Publication | New York, USA |

Publisher | Association for Computing Machinery (ACM) |

Pages | 689-700 |

Number of pages | 12 |

ISBN (Electronic) | 978-1-4503-2037-5 |

Publication status | Published - 2013 |

Externally published | Yes |