Projects per year
Cloud data centres are implemented as large-scale clusters with demanding requirements for service performance, availability and cost of operation. As a result of scale and complexity, data centres typically exhibit large numbers of system anomalies resulting from operator error, resource over/under provisioning, hardware or software failures and security issus anomalies are inherently difficult to identify and resolve promptly via human inspection. Therefore, it is vital in a cloud system to have automatic system monitoring that detects potential anomalies and identifies their source. In this paper we present a lightweight anomaly detection tool for Cloud data centres which combines extended log analysis and rigorous correlation of system metrics, implemented by an efficient correlation algorithm which does not require training or complex infrastructure set up. The LADT algorithm is based on the premise that there is a strong correlation between node level and VM level metrics in a cloud system. This correlation will drop significantly in the event of any performance anomaly at the node-level and a continuous drop in the correlation can indicate the presence of a true anomaly in the node. The log analysis of LADT assists in determining whether the correlation drop could be caused by naturally occurring cloud management activity such as VM migration, creation, suspension, termination or resizing. In this way, any potential anomaly alerts are reasoned about to prevent false positives that could be caused by the cloud operator’s activity. We demonstrate LADT with log analysis in a Cloud environment to show how the log analysis is combined with the correlation of systems metrics to achieve accurate anomaly detection.
|Title of host publication||Communications in Computer and Information Science|
|Subtitle of host publication||Cloud Computing and Services Science|
|Publisher||Springer International Publishing Switzerland|
|Publication status||Accepted - 03 Feb 2016|