Efficient and effective multi-modal queries through heterogeneous network embedding

Chi Thang Duong, Thanh Tam Nguyen*, Hongzhi Yin, Matthias Weidlich, Thai Son Mai, Karl Aberer, Quoc Viet Hung Nguyen

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review



The heterogeneity of today’s Web sources requires information retrieval (IR) systems to handle multi-modal queries. Such queries define a user’s information needs through different data modalities, such as keywords, hashtags, user profiles, and other media. Recent IR systems answer a multi-modal query by treating it as a set of separate uni-modal queries. However, depending on the chosen operationalisation, such an approach is inefficient or ineffective: it either requires multiple passes over the data or leads to inaccuracies, since the relations between data modalities are neglected in the relevance assessment. To mitigate these challenges, we present an IR system that has been designed to answer genuine multi-modal queries. It relies on a heterogeneous network embedding, so that features from diverse modalities can be incorporated when representing both a query and the data over which it is evaluated. By embedding a query and the data in the same vector space, the relations across modalities are made explicit and exploited for more accurate query evaluation. At the same time, multi-modal queries are answered with a single pass over the data. An experimental evaluation using diverse real-world and synthetic datasets illustrates that our approach returns twice as much relevant information as baseline techniques, while scaling to large multi-modal databases.
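The idea of evaluating a query and the data in one shared vector space can be illustrated with a minimal sketch. It is not the system described in the article: the vectors below are hand-made toy values rather than a learned heterogeneous network embedding, and the names `embed_query`, `answer`, `EMBEDDING`, and `POSTS`, as well as the choice of averaging and cosine similarity, are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def embed_query(parts, embedding):
    """Assumption: represent a multi-modal query as the mean of its parts' vectors."""
    vectors = [embedding[p] for p in parts]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def answer(parts, embedding, items, k=2):
    """Rank items against the combined query vector in a single pass over the data."""
    query_vec = embed_query(parts, embedding)
    scored = [(cosine(query_vec, embedding[item]), item) for item in items]
    scored.sort(reverse=True)
    return [item for _, item in scored[:k]]

# Toy vectors (hypothetical): keywords, hashtags, users, and posts share one space.
EMBEDDING = {
    "kw:football": [1.0, 0.0],
    "tag:#worldcup": [0.9, 0.1],
    "user:chef_anna": [0.0, 1.0],
    "post:1": [0.95, 0.05],
    "post:2": [0.1, 0.9],
    "post:3": [0.6, 0.4],
}
POSTS = ["post:1", "post:2", "post:3"]

print(answer(["kw:football", "tag:#worldcup"], EMBEDDING, POSTS))
print(answer(["kw:football", "user:chef_anna"], EMBEDDING, POSTS, k=1))
```

Because all modalities live in the same space, a query mixing a keyword with a user profile is scored against each post once, instead of running one uni-modal query per modality and merging the results afterwards.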

Original language: English
Pages (from-to): 5307-5320
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 11
Early online date: 19 Jan 2021
Publication status: Published - 01 Nov 2022

