Risk Scores

Overview

You can use risk scores to assess risk across your estate. A risk score is the probability of a capacity issue occurring if demand increases. Each metric for each entity in the model is given a risk score, so you can check whether a given metric on a particular server carries a high level of risk. Unlike the watch list, which shows events that have already happened, risk shows the probability of things going wrong.

In the Risk panel, you can find out which VMs are behaving unusually without necessarily exhibiting any definite growth trend. You can also turn off any metrics you are not interested in.

Risk is a score between 0 and 1, where 1 indicates that the machine is at its capacity for significant periods of time and 0 means that there is no risk.

Higher risk scores are given to machines that are at their capacity, or that spend a lot of time at or close to their capacity with enough variance in the time series to suggest that a small change to that variance will result in threshold breaches. Time series that are not trending upwards, but that have a lot of variance with values close to capacity, often have higher risk scores.
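The idea of "probability of a capacity issue if demand increases" can be illustrated with a toy calculation. The sketch below is not the algorithm the product uses, which this section does not describe. It assumes a hypothetical function, toy_risk_score, that simply measures the fraction of samples that would reach capacity if demand grew by an assumed uplift (10% here). The sample values are made up.

```python
from typing import Sequence


def toy_risk_score(
    utilisation: Sequence[float],
    capacity: float,
    demand_uplift: float = 0.10,  # assumed demand growth; hypothetical parameter
) -> float:
    """Fraction of samples that would reach capacity if demand grew by demand_uplift.

    Illustrative heuristic only; it is NOT the product's scoring method.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if not utilisation:
        return 0.0
    breaches = sum(1 for u in utilisation if u * (1.0 + demand_uplift) >= capacity)
    return breaches / len(utilisation)


# Made-up CPU utilisation samples (percent), with capacity at 100%.
flat_low = [40, 42, 41, 39, 40, 41]      # steady and far from capacity
noisy_high = [80, 95, 70, 99, 85, 93]    # no upward trend, but variable and close to capacity
pinned = [100, 100, 100, 100, 100, 100]  # at capacity all the time

for name, series in [("flat_low", flat_low), ("noisy_high", noisy_high), ("pinned", pinned)]:
    print(name, toy_risk_score(series, capacity=100.0))
```

In this toy model, the steady series scores 0, the series pinned at capacity scores 1, and the variable series near capacity lands in between, even though it shows no growth trend.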

Check the risk score

Risk scores are combined through the hierarchy of the sunburst.

You can check the risk score of an individual segment as follows:

  • Hover over a segment in the sunburst to see additional information about that segment, including the risk score. The score shown is the combined risk score for all entities from that point in the sunburst.

You can check the risk scores of all entities visible in the sunburst. To do this, follow these steps:

  1. Open the Risk panel and observe the risk scores in the list.
  2. Leaving the Risk panel open, zoom into a segment to see how the risk scores change accordingly.
    • You can expand each node of the tree in the Risk panel to drill down to the leaf entity and its corresponding metric. At each level, the ordering is based on risk score.
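The per-level ordering described above can be pictured with a small sketch. It assumes a hypothetical RiskNode structure in which every node already carries its risk score. How the product combines scores up the hierarchy is not stated here, so the aggregated values are treated as given inputs. All names and values are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RiskNode:
    """Hypothetical tree node: an entity or metric with an already-known risk score."""
    name: str
    risk: float
    children: List["RiskNode"] = field(default_factory=list)


def ordered_lines(node: RiskNode, depth: int = 0) -> List[str]:
    """Return the tree as indented lines, with siblings sorted by risk, highest first."""
    lines = [f"{'  ' * depth}{node.name}: {node.risk:.3f}"]
    for child in sorted(node.children, key=lambda c: c.risk, reverse=True):
        lines.extend(ordered_lines(child, depth + 1))
    return lines


# Made-up hierarchy: cluster -> VM -> metric.
tree = RiskNode("cluster-a", 0.80, [
    RiskNode("vm-1", 0.30, [RiskNode("CPU", 0.30), RiskNode("Memory", 0.10)]),
    RiskNode("vm-2", 0.80, [RiskNode("Memory", 0.20), RiskNode("CPU", 0.80)]),
])

print("\n".join(ordered_lines(tree)))
```

Following the first child at each level leads to the riskiest leaf metric under that node, which is the drill-down path the steps describe.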

Understanding the risk score

This section gives examples of how to use risk scores and drill down to more information, so that you can better understand how these risk scores affect your estate.

Example 1

In this example, you want to determine the machine and metric with the highest risk value in cluster C47.

To do this, follow these steps:

  1. In the Risk panel, use the hierarchical view and open each level of the hierarchy. You can see that CPU on the virtual machine VM0510 has a risk score of 0.867. An asterisk beside a value indicates that this is an aggregated score, determined by combining the risk scores of all entities under that point in the hierarchy.
  2. Hover the mouse over the metric and select Options. Open the time series chart for this metric and observe its CPU utilisation. A couple of factors contribute to the high risk score:
    • Every time the capacity of the machine changes, the change is very significant.
    • The capacity of the machine changes frequently.
    The machine regularly spends a considerable amount of time at 100% CPU. As capacity decreases, CPU may stay at 100% for much longer.

Example 2

In this example, you want to find the machine and metric with the highest risk value in one of the clusters.

Another way of viewing risk is to view all entities from a point in the hierarchy, ordered by risk. In this example, you can see that CPU on VM1891 has the highest risk score:

  1. In the sunburst, zoom into the segment that represents that cluster.
  2. In the Risk panel, use the flat view. All entities from that point in the hierarchy are displayed and ordered by risk. You can see that CPU on the virtual machine VM1891 has a risk score of 1, which is the highest possible risk score.
  3. Hover the mouse over the metric and select Options. Open the time series chart for this metric and observe its CPU utilisation. You can observe the following:
    • The capacity of the machine changes frequently.
    • The CPU behavioural pattern has changed significantly and is now flattening out very close to capacity.
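The flat view used in step 2 can be thought of as flattening every entity and metric pair under the selected point into one list and sorting it by risk. The sketch below assumes a hypothetical input shape (a mapping from entity name to its per-metric risk scores) and made-up names and values. It is an illustration of the ordering, not the product's implementation.

```python
from typing import Dict, List, Tuple


def flat_view(scores: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, float]]:
    """Flatten {entity: {metric: risk}} into (entity, metric, risk) rows, highest risk first."""
    rows = [
        (entity, metric, risk)
        for entity, metrics in scores.items()
        for metric, risk in metrics.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up scores for the entities under one cluster.
sample = {
    "vm-a": {"CPU": 0.42, "Memory": 0.10},
    "vm-b": {"CPU": 1.0, "Memory": 0.55},
}

for entity, metric, risk in flat_view(sample):
    print(f"{entity} {metric}: {risk}")
```

The first row of the result is the entity and metric with the highest risk, which corresponds to the item at the top of the flat view.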