Efficient LiDAR point cloud data encoding for scalable data management within the Hadoop eco-system

Anh Vu Vo, Chamin Nalinda Lokugam Hewage, Gianmarco Russo, Neel Chauhan, Debra F. Laefer, Michela Bertolotto, Nhien-An Le-Khac, Ulrich Ofterdinger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper introduces a novel LiDAR point cloud data encoding solution that is compact, flexible, and fully supports distributed data storage within the Hadoop distributed computing environment. The proposed solution is built on Sequence File and Google Protocol Buffers. Sequence File is a generic, splittable binary file format built into the Hadoop framework for storing arbitrary binary data. The key challenge in adopting the Sequence File format for LiDAR data lies in encoding the data as binary sequences so that it is represented compactly while still allowing necessary mutation. For that purpose, a data encoding solution based on Google Protocol Buffers (a language-neutral, cross-platform, extensible data serialisation framework) was developed and evaluated. Since neither of the underlying technologies alone is sufficient to completely and efficiently represent all necessary point formats for distributed computing, an innovative fusion of the two was required to provide a viable data storage solution. This paper presents the details of the data encoding implementation and rigorously evaluates its efficiency. Benchmarking was done against a straightforward, naive text encoding implementation using a high-density aerial LiDAR scan of a portion of Dublin, Ireland. The results demonstrated a 6-fold reduction in data volume, a 4-fold reduction in database ingestion time, and up to a 5-fold reduction in querying time.
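The contrast the abstract draws between a naive text encoding and a compact binary record encoding can be illustrated with a minimal sketch. This is not the authors' implementation and uses neither Protocol Buffers nor Hadoop's Sequence File; it substitutes Python's standard struct module and a simple length-prefixed record stream as a stand-in. The point layout (x, y, z plus a 16-bit intensity), the sample coordinates, the 1 mm scale factor, and all function names are assumptions made for illustration only.

```python
import io
import struct

# Hypothetical sample points: (x, y, z) in metres plus a 16-bit intensity.
POINTS = [
    (316000.123, 234000.456, 12.345, 187),
    (316000.678, 234001.012, 12.401, 203),
    (316001.234, 234001.567, 13.002, 95),
]

SCALE = 0.001  # assumed coordinate resolution of 1 mm
RECORD = struct.Struct("<iiiH")  # three scaled int32 coordinates + uint16 intensity


def encode_text(points):
    """Naive baseline: one comma-separated ASCII line per point."""
    lines = (f"{x:.3f},{y:.3f},{z:.3f},{i}\n" for x, y, z, i in points)
    return "".join(lines).encode("ascii")


def encode_binary(points):
    """Length-prefixed binary records with coordinates stored as scaled integers."""
    buf = io.BytesIO()
    for x, y, z, i in points:
        payload = RECORD.pack(
            round(x / SCALE), round(y / SCALE), round(z / SCALE), i
        )
        buf.write(struct.pack("<H", len(payload)))  # 2-byte record length
        buf.write(payload)
    return buf.getvalue()


def decode_binary(data):
    """Walk the record stream and restore floating-point coordinates."""
    points = []
    offset = 0
    while offset < len(data):
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        xi, yi, zi, i = RECORD.unpack_from(data, offset)
        offset += length
        points.append((xi * SCALE, yi * SCALE, zi * SCALE, i))
    return points


if __name__ == "__main__":
    text_bytes = encode_text(POINTS)
    binary_bytes = encode_binary(POINTS)
    print("text bytes:  ", len(text_bytes))
    print("binary bytes:", len(binary_bytes))
    print("decoded:     ", decode_binary(binary_bytes))
```

The size difference in this toy case comes only from storing scaled integers instead of decimal strings, and it says nothing about the ratios reported in the paper. The length prefix stands in for the record framing that a container format would provide, which is what allows a reader to skip from record to record without parsing the payload.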
Original language: English
Title of host publication: IEEE BigData 2019 Los Angeles, CA, USA
Publisher: IEEE
Pages: 5644-5653
Number of pages: 10
DOI: https://doi.org/10.1109/BigData47090.2019.9006044
Publication status: Published - 24 Feb 2020
Event: IEEE Big Data 2019: 2019 IEEE International Conference on Big Data - The Westin Bonaventure Hotel & Suites, Los Angeles, United States
Duration: 09 Dec 2019 to 12 Dec 2019
http://bigdataieee.org/BigData2019/index.html

Conference

Conference: IEEE Big Data 2019
Country: United States
City: Los Angeles
Period: 09/12/2019 to 12/12/2019

Cite this

Vo, A. V., Hewage, C. N. L., Russo, G., Chauhan, N., Laefer, D. F., Bertolotto, M., ... Ofterdinger, U. (2020). Efficient LiDAR point cloud data encoding for scalable data management within the Hadoop eco-system. In IEEE BigData 2019 Los Angeles, CA, USA (pp. 5644-5653). IEEE. https://doi.org/10.1109/BigData47090.2019.9006044