HPTA: High-Performance Text Analytics

Hans Vandierendonck, Karen Murphy, Mahwish Arif, Dimitrios S. Nikolopoulos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)
494 Downloads (Pure)

Abstract

One of the main targets of data analytics is unstructured data, which primarily involves textual data. High-performance processing of textual data is non-trivial. We present the HPTA library for high-performance text analytics. The library helps programmers to map textual data to a dense numeric representation, which can be handled more efficiently. HPTA encapsulates three performance optimizations: (i) efficient memory management for textual data, (ii) parallel computation on associative data structures that map text to values and (iii) optimization of the type of associative data structure depending on the program context. We demonstrate that HPTA outperforms popular frameworks for text analytics such as scikit-learn and Spark. 
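The idea of mapping text to a dense numeric representation through an associative data structure can be illustrated with a minimal sketch. This is not the HPTA API: the function names `build_vocabulary` and `to_dense_counts`, the whitespace tokenization, and the sample documents are all assumptions made for illustration, and the sketch is sequential, so it shows none of the memory-management or parallel optimizations the abstract describes.

```python
from collections import Counter

# Hypothetical helper names; this is not the HPTA interface.
def build_vocabulary(documents):
    """Assign each distinct term a dense integer id in order of first appearance."""
    vocab = {}  # associative structure: term -> dense id
    for doc in documents:
        for term in doc.lower().split():
            if term not in vocab:
                vocab[term] = len(vocab)
    return vocab

def to_dense_counts(doc, vocab):
    """Return a term-count vector indexed by the ids stored in vocab."""
    counts = [0] * len(vocab)
    for term, n in Counter(doc.lower().split()).items():
        idx = vocab.get(term)
        if idx is not None:  # terms outside the vocabulary are skipped
            counts[idx] = n
    return counts

# Illustrative sample input (assumed, not from the paper).
docs = ["big data needs fast text analytics", "text analytics on big text"]
vocab = build_vocabulary(docs)
vectors = [to_dense_counts(d, vocab) for d in docs]
```

Once every term has an integer id, later stages can work on arrays of numbers rather than strings, which is the kind of representation the abstract says can be handled more efficiently.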
Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Big Data
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 416-423
Number of pages: 8
DOIs
Publication status: Published - 06 Feb 2017
Event: 2016 IEEE International Conference on Big Data - Washington, DC, United States
Duration: 05 Dec 2016 to 08 Dec 2016

Conference

Conference: 2016 IEEE International Conference on Big Data
Country/Territory: United States
City: Washington
Period: 05/12/2016 to 08/12/2016

Keywords

  • data analytics
  • performance optimization
  • text analytics
