Capacity Events

Overview

A capacity event is a prediction that, on current trends, a metric is likely to breach a threshold in the near future. This allows you to take proactive steps before the breach occurs. Capacity events differ from events that have already happened, for example a resource that has already hit a threshold. For events of that kind, see Watch List.

Capacity Planner uses multi-linear regression to predict when events will occur. It uses historical data to predict events up to two-thirds of the duration of that data into the future.

For example, if Capacity Planner has 15 days of CPU demand data, it can make predictions of events up to 10 days into the future. Capacity Planner can trend any metric, and potentially raise capacity events on it, provided the metric has thresholds to trend towards. These thresholds are configured when the project is first set up.
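The horizon rule and the idea of trending towards a threshold can be illustrated with a short sketch. It is not Capacity Planner's implementation: it fits a simple least-squares line over time rather than the multi-linear regression the product uses, and the function names (`fit_line`, `days_until_breach`) and the `ratio` parameter are hypothetical. The ratio defaults to the one implied by the example above (10 days of forecast from 15 days of data).

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def days_until_breach(
    days: Sequence[float],
    values: Sequence[float],
    threshold: float,
    ratio: float = 10 / 15,  # assumption: taken from the 15-day / 10-day example
) -> Optional[float]:
    """Return days from the last sample to the predicted breach, or None.

    None means no event is raised: the trend is flat or falling, the
    threshold is already breached on the trend line, or the crossing lies
    beyond the prediction horizon.
    """
    slope, intercept = fit_line(days, values)
    if slope <= 0:
        return None  # not trending towards the threshold
    crossing_day = (threshold - intercept) / slope
    lead = crossing_day - days[-1]
    if lead <= 0:
        return None  # already breached: not a prediction
    horizon = (days[-1] - days[0]) * ratio
    return lead if lead <= horizon else None


# 15 days of history (day 0 to day 15), demand rising 2 points per day from 50%.
history_days = list(range(16))
cpu_demand = [50 + 2 * d for d in history_days]

near = days_until_breach(history_days, cpu_demand, threshold=90)
far = days_until_breach(history_days, cpu_demand, threshold=120)
```

In this example the 90% threshold is crossed about 5 days after the last sample, which is inside the 10-day horizon, so `near` holds a value. The 120% threshold would be crossed about 20 days out, beyond the horizon, so `far` is `None`.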

You can see two types of events in the Events configuration panel. The main differences between them are:

  • Operational events — events are predicted for individual entity metric combinations, for example allocation on a datastore or memory utilisation on a server. They cover a shorter period of time because they help you prevent immediate outages. Typical look ahead for these types of events is 3 weeks.
  • Long term cluster trends — trends are predicted for combined hosts on a cluster. They cover a longer period of time because they help you predict your cluster running out of capacity or falling below acceptable levels of server redundancy.

You can drill down into an event to view the historical data used to make the prediction, the predicted trend, and the confidence interval for the prediction.

Each event is given an R-squared value. This is a number between 0 and 1 that indicates how closely the historical data fits the predicted trend. Data that follows a perfectly linear trend has a value of 1. Typically, any R-squared value greater than 0.7 is considered a good fit.

The confidence interval is the grey zone drawn around the predicted trend. It is a 95% confidence interval: if the trend continues as observed, Capacity Planner is 95% confident that the values will fall within this zone. The wider the zone, the lower the R-squared value.

The event list shows the date and time at which each event is predicted to occur, based on current trends.
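To make the R-squared value concrete, the following sketch computes it for a straight-line fit using the standard definition (one minus the ratio of residual to total sum of squares). This is an illustration only, not Capacity Planner's code. The names `r_squared` and `is_good_fit` are hypothetical, and the 0.7 cut-off is the rule of thumb mentioned above.

```python
from typing import Sequence


def r_squared(xs: Sequence[float], ys: Sequence[float]) -> float:
    """R-squared of an ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares: how far the data sits from the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: how far the data sits from its own mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined for constant data")
    return 1 - ss_res / ss_tot


def is_good_fit(r2: float, minimum: float = 0.7) -> bool:
    """Apply the rule of thumb that values above 0.7 indicate a good fit."""
    return r2 > minimum


steady = r_squared(range(10), [3 * x + 1 for x in range(10)])  # perfectly linear
noisy = r_squared([0, 1, 2, 3], [0, 3, 1, 4])                  # scattered data
```

Here `steady` is 1 because every point lies on the line, while `noisy` is 0.5 and would not count as a good fit under the 0.7 rule of thumb.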

The total number of events is shown on the Events panel tab. This number is the sum of Operational events and Long term cluster trends.

The Events panel always shows all known events, regardless of the selected metric view. To narrow the list, use the filtering options available in the panel.

View events

To view Operational events or Long term cluster trends, follow these steps:

  1. On the configuration panel on the right, select Events.
  2. In the dialog window that opens, select the tab with the type of events that you want to see.
    The list of events opens.
  3. To see more details of an event, click Options next to the event and select what you want to do next.

Operational events

Operational events are predictions of when resource utilisation is likely to breach a threshold based on trends. An operational event can come from a number of different sources, such as a workload or a server. Use the Events panel to view the predicted events, and see the data used for the analysis of each event.

In the Operational events tab, the predicted events are shown with priority indicators and information about the event. The list is ordered chronologically, with the date indicating when the event is predicted to take place. The events are classified with the following default degrees of severity:

  • Critical — over 100% capacity, or over 200% capacity for storage allocation events.
  • Major — over 90% capacity, or over 190% capacity for storage allocation events.
  • Warning — over 70% capacity, or over 170% capacity for storage allocation events.
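The default severity bands above can be expressed as a small lookup. This is a sketch of the classification rule as listed, not the product's code. The names `DEFAULT_LIMITS`, `STORAGE_ALLOCATION_LIMITS`, and `classify` are hypothetical, and "over" is read as strictly greater than.

```python
from typing import List, Optional, Tuple

# (severity, limit in percent of capacity), ordered from most to least severe.
DEFAULT_LIMITS: List[Tuple[str, float]] = [
    ("Critical", 100.0),
    ("Major", 90.0),
    ("Warning", 70.0),
]
STORAGE_ALLOCATION_LIMITS: List[Tuple[str, float]] = [
    ("Critical", 200.0),
    ("Major", 190.0),
    ("Warning", 170.0),
]


def classify(percent_of_capacity: float, storage_allocation: bool = False) -> Optional[str]:
    """Return the severity for a predicted value, or None if below all limits."""
    limits = STORAGE_ALLOCATION_LIMITS if storage_allocation else DEFAULT_LIMITS
    for severity, limit in limits:
        if percent_of_capacity > limit:  # "over" is read as strictly greater
            return severity
    return None


memory_event = classify(95.0)                              # server memory at 95%
datastore_event = classify(195.0, storage_allocation=True)  # allocation at 195%
```

Both calls fall in the Major band: 95% is over 90% but not over 100%, and 195% is over 190% but not over 200% for a storage allocation event.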

Long term cluster trends

Long term cluster trends provide forecasts of when a cluster will fall below certain levels of hardware redundancy before it reaches saturation point. Long term cluster trends can forecast 12 months into the future. For each cluster, trends are analysed for all its hosts.

In the Long term cluster trends tab, the predicted events are shown with priority indicators and information about the event. The list is ordered chronologically, with the date indicating when the event is predicted to take place. The events are classified with the following degrees of severity:

  • Saturation — indicates that at this point the demand in the cluster will reach the current operational capacity of the cluster. Operational capacity is full capacity minus reserve headroom (normally this is 20% for CPU).
  • < N+1 — indicates that at this point a cluster will have less than 1 redundant server.
  • < N+2 — indicates that at this point a cluster will have less than 2 redundant servers.
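One plausible reading of these three levels is sketched below. The source does not define how redundancy is calculated, so the following are assumptions: a host counts as redundant if the cluster could lose it (largest hosts first) and still hold the demand within operational capacity, and the reserve headroom is a fraction of full capacity (0.20, matching the CPU figure above). All function names are hypothetical.

```python
from typing import Optional, Sequence


def operational_capacity(full_capacity: float, reserve_headroom: float = 0.20) -> float:
    """Full capacity minus reserve headroom (headroom given as a fraction)."""
    return full_capacity * (1 - reserve_headroom)


def redundant_hosts(host_capacities: Sequence[float], demand: float,
                    reserve_headroom: float = 0.20) -> int:
    """Count hosts that could be lost while demand still fits (assumed rule)."""
    remaining = sorted(host_capacities)  # ascending, so pop() drops the largest
    spare = 0
    while remaining:
        remaining.pop()
        if demand <= operational_capacity(sum(remaining), reserve_headroom):
            spare += 1
        else:
            break
    return spare


def cluster_level(host_capacities: Sequence[float], demand: float,
                  reserve_headroom: float = 0.20) -> Optional[str]:
    """Map forecast demand to Saturation, < N+1, < N+2, or None."""
    total = operational_capacity(sum(host_capacities), reserve_headroom)
    if demand >= total:
        return "Saturation"
    spare = redundant_hosts(host_capacities, demand, reserve_headroom)
    if spare < 1:
        return "< N+1"
    if spare < 2:
        return "< N+2"
    return None


hosts = [100.0, 100.0, 100.0, 100.0]  # four identical hosts, arbitrary units
levels = [cluster_level(hosts, d) for d in (100.0, 200.0, 300.0, 330.0)]
```

With four hosts of 100 units each and 20% headroom, operational capacity is 320 units. Under the assumptions above, a demand of 100 leaves two redundant hosts, 200 leaves one, 300 leaves none, and 330 exceeds operational capacity.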

Analyse time series chart for an event

Each event has options to show data from the source that the given event relates to, and to present the data used in the analysis of that event as a time series chart. A time series chart allows you to see the current trend and a visual representation of the events listed in the Events panel. It shows the current demand and the forecast trend to the point where the threshold is crossed.

To see the time series chart for any event, follow these steps:

  1. Open the Events configuration panel and select the Operational events tab or Long term cluster trends tab, depending on which events you want to see the details of.
  2. Locate the event you are interested in.
  3. Click Options next to the chosen event.
  4. Click Show time series chart to view the time series for that event.

The time series chart is shown with the underlying trend data, additional information about the event, and a legend indicating the predicted data and confidence interval. It also charts the utilisation trends leading to and following the event.

The time series chart contains the following important information:

  • Actual — this line represents the observed resource utilisation.
  • Prediction training period — the highlighted region is the training period for the prediction algorithm. This is the range of data that was used to generate the predicted capacity event.
  • Confidence interval — this is the 95% confidence interval for the prediction. Capacity Planner is 95% confident that the metric will fall within this range.
  • Priority — priority indicates the degree of severity with which the event is classified.
  • R-squared — the R-squared value indicates how well the trend fits the data. It is a number between 0 and 1, with a value of 1 meaning that the model fits all data points perfectly.