HPTA: High-Performance Text Analytics

Hans Vandierendonck, Karen Murphy, Mahwish Arif, Dimitrios S. Nikolopoulos

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution



One of the main targets of data analytics is unstructured data, which primarily involves textual data. High-performance processing of textual data is non-trivial. We present the HPTA library for high-performance text analytics. The library helps programmers to map textual data to a dense numeric representation, which can be handled more efficiently. HPTA encapsulates three performance optimizations: (i) efficient memory management for textual data, (ii) parallel computation on associative data structures that map text to values and (iii) optimization of the type of associative data structure depending on the program context. We demonstrate that HPTA outperforms popular frameworks for text analytics such as scikit-learn and Spark. 
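
To make the idea of mapping text to a dense numeric representation concrete, here is a minimal sketch in Python. It is not the HPTA API and does not reproduce the library's optimizations. The function names `build_vocabulary` and `to_count_vector` and the whitespace tokenization are assumptions made only for illustration. An associative structure (a plain dictionary) assigns each distinct term a consecutive integer id, and each document then becomes a fixed-length vector of term counts.

```python
def build_vocabulary(documents):
    """Assign each distinct term a dense integer id, in order of first appearance."""
    vocabulary = {}  # associative structure: term -> integer id
    for text in documents:
        for term in text.lower().split():  # naive whitespace tokenization (assumption)
            if term not in vocabulary:
                vocabulary[term] = len(vocabulary)
    return vocabulary


def to_count_vector(text, vocabulary):
    """Return a dense list of term counts, indexed by term id."""
    counts = [0] * len(vocabulary)
    for term in text.lower().split():
        term_id = vocabulary.get(term)
        if term_id is not None:  # terms outside the vocabulary are skipped
            counts[term_id] += 1
    return counts


documents = [
    "big data needs fast text analytics",
    "text analytics maps text to numbers",
]
vocabulary = build_vocabulary(documents)
vectors = [to_count_vector(text, vocabulary) for text in documents]
```

Once terms are replaced by integer ids, later stages can work on arrays of numbers rather than on strings, which is the kind of representation the abstract describes as more efficient to handle.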
Original language: English
Title of host publication: Proceedings of the IEEE International Conference on Big Data
Publisher: IEEE
Number of pages: 8
Publication status: Published - 06 Feb 2017
Event: 2016 IEEE International Conference on Big Data - Washington, DC, United States
Duration: 05 Dec 2016 to 08 Dec 2016


Conference: 2016 IEEE International Conference on Big Data
Country/Territory: United States


  • data analytics
  • performance optimization
  • text analytics


