A dynamic resource controller for resolving quality of service issues in modern streaming processing engines

  • M. Reza HoseinyFarahabady
  • Javid Taheri
  • Albert Y. Zomaya
  • Zahir Tari

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

4 Citations (Scopus)

Abstract

Designing an elastic resource allocation controller for data analytics applications in virtualized data centers has received great attention recently, mainly because even a slight performance improvement can translate into large monetary savings in practical large-scale deployments. Apache Flink is among the modern stream processing runtimes that provide both low-latency and high-throughput computation for executing processing pipelines over high-volume, high-velocity data items under tight latency constraints. However, an open challenge in a large-scale platform with tens of worker nodes is how to resolve run-time violations of the quality of service (QoS) level in a multi-tenant data streaming platform, particularly when the workload generated by different users fluctuates. Studies have shown that the static resource allocation algorithm (round-robin) used by default in Apache Flink lacks responsiveness to sudden traffic surges that occur unpredictably at run-time. In this paper, we address the problem of resource management in a Flink platform to enforce different QoS levels on shared computing resources. The proposed solution applies principles borrowed from closed-loop control theory to design a CPU and memory adjustment mechanism whose primary goal is to fulfill the different QoS levels requested by submitted applications, while treating resource interference as the critical performance-limiting factor. The performance evaluation compares the proposed resource allocation mechanism with two static heuristics (round-robin and class-based weighted fair queuing) on an 80-core cluster under multiple traffic patterns that resemble sudden changes in the incoming workloads of low-priority streaming applications.
The experimental results confirm the stability of the proposed controller in regulating the underlying platform resources to smoothly follow the target values (QoS violation rates). In particular, the proposed solution achieves higher efficiency than the other heuristics, reducing the response time of high-priority applications by 53% while maintaining the enforced QoS levels during burst traffic periods.
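
The closed-loop idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the structure of the controller, so everything below is an assumption made for illustration only: a generic proportional-integral (PI) loop, the class name `PIController`, the gain values, and the toy `violation_rate` model are all hypothetical and are not taken from the paper or from Apache Flink.

```python
class PIController:
    """Hypothetical PI controller that maps a measured QoS violation rate
    to a CPU share in [lo, hi]. Illustrative only, not the paper's design."""

    def __init__(self, target, base_share=0.5, kp=1.0, ki=0.2, lo=0.1, hi=1.0):
        self.target = target          # desired QoS violation rate
        self.base_share = base_share  # share used when the error is zero
        self.kp = kp                  # proportional gain (assumed value)
        self.ki = ki                  # integral gain (assumed value)
        self.lo = lo                  # minimum CPU share
        self.hi = hi                  # maximum CPU share
        self.integral = 0.0           # accumulated error

    def update(self, measured):
        """Return the next CPU share for the measured violation rate."""
        error = measured - self.target  # positive when QoS is violated too often
        self.integral += error
        share = self.base_share + self.kp * error + self.ki * self.integral
        return min(self.hi, max(self.lo, share))  # clamp to the allowed range


def violation_rate(load, share, capacity=100.0):
    """Toy plant model (assumption): fraction of the load that exceeds
    the capacity granted by the current CPU share."""
    if load <= 0:
        return 0.0
    return max(0.0, load - capacity * share) / load


def simulate(steps=20, surge_at=10):
    """Run the loop against a load that jumps at `surge_at`."""
    controller = PIController(target=0.05)
    share = controller.base_share
    history = []
    for step in range(steps):
        load = 40.0 if step < surge_at else 90.0  # sudden traffic surge
        measured = violation_rate(load, share)
        share = controller.update(measured)
        history.append((step, load, measured, share))
    return history


if __name__ == "__main__":
    for step, load, measured, share in simulate():
        print(f"step={step:2d} load={load:5.1f} violations={measured:.3f} share={share:.3f}")
```

In this sketch the controller raises the CPU share when the measured violation rate exceeds the target and lowers it otherwise, which is the feedback behavior a static round-robin assignment cannot provide. A real deployment would also need the memory dimension and the interference model that the abstract mentions.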

Original language: English
Title of host publication: 2020 IEEE 19th International Symposium on Network Computing and Applications (NCA 2020): Proceedings
Editors: Aris Gkoulalas-Divanis, Mirco Marchetti, Dimiter R. Avresky
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 8
ISBN (Electronic): 9781728183268
DOIs
Publication status: Published - 05 Jan 2021
Externally published: Yes
Event: 19th IEEE International Symposium on Network Computing and Applications, NCA 2020 - Cambridge, United States
Duration: 24 Nov 2020 to 27 Nov 2020

Publication series

Name: IEEE International Symposium on Network Computing and Applications: Proceedings
Publisher: IEEE
ISSN (Print): 2643-7910
ISSN (Electronic): 2643-7929

Conference

Conference: 19th IEEE International Symposium on Network Computing and Applications, NCA 2020
Country/Territory: United States
City: Cambridge
Period: 24/11/2020 to 27/11/2020

Keywords

  • Apache Flink Streaming Platform
  • Elastic Auto-Tuning
  • Performance Modeling of Computer System
  • Quality of Services (QoS) Issues

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems and Management
