Dynamic resource provisioning for sustainable cloud computing systems in the presence of correlated failures

Yogesh Sharma*, Javid Taheri, Weisheng Si, Daniel Sun, Bahman Javadi

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

6 Citations (Scopus)


The dependence of computing resources on one another in cloud computing systems (CCS) makes them prone to fail in a correlated manner, which significantly impacts their service reliability and energy efficiency. Improving these two metrics of CCS while accounting for correlated failures has remained an open question, and it is the focus of this work. This paper proposes mechanisms for jointly improving reliability and energy efficiency under correlated failures in CCS. To model failure correlation, statistical cluster analysis techniques are applied to real failure traces. Mathematical models are then built to calculate the reliability and energy consumption of failure-prone CCS. These models are used to design fault-tolerant and energy-aware resource provisioning policies. To further reduce energy consumption, a correlated failure-aware VM consolidation policy is also proposed. A simulation-based study of the proposed resource management policies and fault tolerance mechanisms is conducted using real failure traces and a Bag-of-Tasks workload. The results demonstrate that, by exploiting failure correlation, the proposed resource management policies reduce the occurrence of failures on tasks by approximately 34 percent and increase the energy efficiency of the system by approximately 20 percent, in comparison to environments where failures are handled independently.

Original language: English
Pages (from-to): 641-654
Number of pages: 14
Journal: IEEE Transactions on Sustainable Computing
Issue number: 4
Early online date: 18 Sept 2020
Publication status: Published - 01 Oct 2021
Externally published: Yes

Bibliographical note

Publisher Copyright: © 2020 IEEE.


Keywords

  • Bag of tasks
  • Checkpointing
  • Cloud computing
  • Cluster analysis
  • Correlated failures
  • Energy efficiency
  • Reliability
  • VM consolidation
  • VM migration

ASJC Scopus subject areas

  • Software
  • Renewable Energy, Sustainability and the Environment
  • Hardware and Architecture
  • Control and Optimization
  • Computational Theory and Mathematics

