Abstract
One of the main targets of data analytics is unstructured data, which primarily involves
textual data. High-performance processing of textual data is non-trivial. We present the
HPTA library for high-performance text analytics. The library helps programmers to map
textual data to a dense numeric representation, which can be handled more efficiently.
HPTA encapsulates three performance optimizations: (i) efficient memory management
for textual data, (ii) parallel computation on associative data structures that map text to
values and (iii) optimization of the type of associative data structure depending on the
program context. We demonstrate that HPTA outperforms popular frameworks for text
analytics such as scikit-learn and Spark.
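The core idea of mapping text to a dense numeric representation through an associative data structure can be illustrated with a minimal sketch. This is not HPTA's API or implementation: the function names `build_vocabulary` and `to_term_counts` are hypothetical, tokenization is assumed to be simple whitespace splitting, and a plain Python dictionary stands in for the associative structure.

```python
def build_vocabulary(documents):
    """Assign each distinct token a dense integer id, in first-seen order."""
    vocab = {}  # associative structure: token -> integer id
    for doc in documents:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_term_counts(doc, vocab):
    """Return a dense list of term counts indexed by token id."""
    counts = [0] * len(vocab)
    for token in doc.lower().split():
        token_id = vocab.get(token)
        if token_id is not None:  # tokens outside the vocabulary are skipped
            counts[token_id] += 1
    return counts


docs = ["big data text analytics", "text analytics on big text"]
vocab = build_vocabulary(docs)
vectors = [to_term_counts(d, vocab) for d in docs]
```

Once every token has an integer id, later stages can work on arrays of numbers rather than strings, which is the kind of representation the abstract describes as more efficient to handle.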
Original language | English |
---|---|
Title of host publication | Proceedings of the IEEE International Conference on Big Data |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 416-423 |
Number of pages | 8 |
Publication status | Published - 06 Feb 2017 |
Event | 2016 IEEE International Conference on Big Data - Washington, DC, United States. Duration: 05 Dec 2016 → 08 Dec 2016 |
Conference
Conference | 2016 IEEE International Conference on Big Data |
---|---|
Country/Territory | United States |
City | Washington |
Period | 05/12/2016 → 08/12/2016 |
Keywords
- data analytics
- performance optimization
- text analytics
Fingerprint
Dive into the research topics of 'HPTA: High-Performance Text Analytics'. Together they form a unique fingerprint.
Projects
- 2 Finished
- R1451CSC: Hybrid Static/Dynamic Scheduling for Task Dataflow Parallel Programs
  Vandierendonck, H. (PI)
  28/07/2014 → 02/03/2017
  Project: Research
- R6438CSC: An Adaptive, highly Scalable Analytics Platform
  Vandierendonck, H. (PI), Nikolopoulos, D. (CoI) & Robinson, P. (CoI)
  21/03/2014 → 28/02/2017
  Project: Research