BIND: An indexing strategy for big data processing

Adib Habbal, Fatima Binta Adamu, Suhaidi Hassan, R. Les Cottrell, Bebo White, Mustafa Kaiiali, Ahmad Samer Wazan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

With the huge amount of data continuously accumulated and shared by individuals and organizations, it has become necessary to meet the emerging processing and information retrieval requirements associated with these large volumes of data. This can be achieved by indexing the data sets and reducing the heavy computational overhead that most current indexing strategies incur when processing very large data sets. This study proposes a novel indexing strategy called Big Data INDexing Strategy (BIND), based on concepts from high-performance parallel computing. BIND supports parallel distribution of data and performs processing in a MapReduce fashion. To develop the BIND strategy, Ian Foster's task-scheduling concept for parallel processing is applied. The proposed indexing strategy was first tested on a 2-node cluster, using datasets of varying sizes to observe whether performance improves or declines as the size of the data increases. Subsequently, it was tested on a 3-node cluster to observe the performance when the number of computation resources is increased. The results demonstrate that BIND reduces processing and query time compared with the current strategy. The findings have significant implications for efficiently managing Big Data and for facilitating data processing and information retrieval for users and organizations that manage Big Data.
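As a rough illustration of what indexing "in a MapReduce fashion" can look like, the following minimal sketch builds an inverted index by partitioning records, indexing each partition in parallel, and merging the partial results. It is not the BIND implementation: the function names (`partition`, `map_phase`, `reduce_phase`, `build_index`), the sample records, and the use of local threads in place of cluster nodes are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records round-robin into n_parts chunks (one per hypothetical node)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_phase(chunk):
    """Build a partial index for one chunk: term -> set of record ids."""
    partial = defaultdict(set)
    for record_id, text in chunk:
        for term in text.lower().split():
            partial[term].add(record_id)
    return partial


def reduce_phase(partials):
    """Merge partial indexes into one index with sorted posting lists."""
    merged = defaultdict(set)
    for partial in partials:
        for term, ids in partial.items():
            merged[term] |= ids
    return {term: sorted(ids) for term, ids in merged.items()}


def build_index(records, n_workers=2):
    """Partition, index each chunk in parallel, then merge (threads stand in for nodes)."""
    chunks = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_phase, chunks))
    return reduce_phase(partials)


# Hypothetical sample data: (record id, text) pairs.
records = [
    (1, "big data indexing"),
    (2, "parallel data processing"),
    (3, "indexing strategy for big data"),
]
index = build_index(records, n_workers=2)
```

Here a query such as `index["indexing"]` becomes a dictionary lookup rather than a scan over all records, and changing `n_workers` changes only how the work is split, not the resulting index.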
Original language: English
Title of host publication: TENCON 2017 - 2017 IEEE Region 10 Conference
Pages: 645-650
DOIs
Publication status: Published - 2017
