State Tracker Plug-In - User Guide

Introduction

The Geneos State Tracker allows you to monitor any process, system or application that writes to a log file and think in terms of "current state" instead of keyword matching.

To determine the current state you will need to define events that can be identified through the log that indicate a change of state. Events can be a keyword, a record, a piece of text, a collection of statistics or anything that can be extracted from the log file.

The State Tracker is not a state machine so you do not have to define a graph of interconnecting states. State Tracker instead looks at the defined events and determines the state based on a matching event. This is more like a mapping of states to events.

You describe events through keys associated with each state. Keys are simply regular expressions that you define to find the event. You can use Perl compatible regular expressions, which give you a powerful and expressive pattern language.

As well as identifying the state you may wish to extract some of the information from the found event and display this in the data view. These extracted values are called event parameters and we'll describe how to capture these later.

Terminology summary:

  • State — a logical state for the application, process, whatever we're tracking.
  • Event — a detectable item from the log being monitored that triggers a change in state.
  • Event Parameters — useful information from the event that is extracted for the view.

Views

View

Table Legend

Name            Description
tracker         The name of the tracker.
previousState   The previous state (not shown above).
state           The last logical state detected.
time            The time the state change was detected, using a time extracted from the log file if possible.
Parameter1..n   A state change triggered by a regular expression can extract parameters from the event. These are explained in the technical reference.

Configuring State Tracker

For more information on how to configure State Tracker, see State Tracker Plug-In - Technical Reference.

Below is a snapshot of the State Tracker Plug-in configuration using the Gateway Setup Editor. The first thing you see is a Tracker.

Trackers are defined in groups. Each group of trackers corresponds with a dataview. This can be useful to group similar events from different sources into the same dataview.

Within this group are defined individual trackers.

A tracker enables you to collect together logical states for whatever you may be tracking. You may want to track multiple things in a single file or various things across several files and present them in the same view. The tracker should therefore have a unique name within the group to distinguish what is being tracked.

You could use individual trackers to track the status of a scheduled backup, error messages for a particular component, current state of a message moving through the system, etc.

The snapshot also shows some of the other features of a Tracker which we'll look at shortly.

Basic Trackers

Let's start with a simple example. We'll define a tracker to report the latest error message in a file.

Let's assume the file has a very simple format:

<msg type>: <message text>

We're interested in messages of type "Error".

e.g.

Error: LineMonitor(line1): no data at all during last 65000 ms (at 05:10:44)

We can define a tracker, give it a name, the full path to the log file and uncheck the rewind option as we're only interested in messages from this point onwards. We can ignore everything else until we define States.

We'll begin with a default state. This is the state the tracker automatically reports. At this point we would not have seen an error message so we can call it "No Errors".

This state doesn't have to have any keys to match events because we won't be looking for anything to bring us back to this state.

Below this we define another state called "Error" with a single key defined as:

^Error:\s

This is enough to detect that an error occurred. The regular expression defining the key looks for a line starting with the text "Error" followed by a colon and a whitespace character. The caret "^" means at the start of the line and "\s" means a space or other whitespace character.

Sending this to the Gateway displays the following dataview:

We can now state that we haven't seen any error messages since the time stated in the view.

Once a process writes our earlier error message to the file, the view updates on the next sample.

So now we know that an error occurred and when it occurred. The tracker detected the event by matching the key against the record written to the log file. The key belongs to the Error state so the tracker's state is changed and the time of the change is recorded.
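The behaviour described above can be approximated in a few lines of Python. This is only an illustrative sketch, not the plug-in's implementation: the SimpleTracker class and its fields are hypothetical names, and Python's re module stands in for the Perl compatible regular expressions that the plug-in uses.

```python
import re


class SimpleTracker:
    """Hypothetical model of a tracker: a default state plus states with keys."""

    def __init__(self, default_state, keyed_states):
        # keyed_states maps a state name to a list of regular expression keys.
        self.state = default_state
        self.previous_state = None
        self.keys = {
            name: [re.compile(pattern) for pattern in patterns]
            for name, patterns in keyed_states.items()
        }

    def feed(self, line):
        """Change state if a key of any state matches the log line."""
        for name, patterns in self.keys.items():
            if any(p.search(line) for p in patterns):
                self.previous_state, self.state = self.state, name
                return True
        return False


# The default state has no keys; the "Error" state has a single key.
tracker = SimpleTracker("No Errors", {"Error": [r"^Error:\s"]})

tracker.feed("Info: started")
state_after_info = tracker.state

tracker.feed("Error: LineMonitor(line1): no data at all during last 65000 ms (at 05:10:44)")
state_after_error = tracker.state
```

The first line matches no key, so the tracker stays in its default state. The second line matches the key of the Error state, so the tracker changes state.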

Basic Trackers with Event Parameters

It would be more useful however if we actually knew what the error was. The text of the error can be extracted as an event parameter.

Event parameters are the "capturing groups" within a regular expression. In a regular expression you can group parts of the expression by surrounding them with parentheses.

e.g.

^Error:\s+(.*)

Now we're looking for Error at the start of a line followed by one or more spaces then any character repeated zero or more times. The parentheses group anything we capture after the spaces. The State Tracker uses this to identify that you are interested in this information and extracts it as the first event parameter.

With this regular expression the resulting view looks like this:

Now it's possible to see when an error occurred and what the error was.

Notice the plug-in added a new column. There will be a new column for each expected event parameter.
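The capture described in this section can be tried outside the plug-in. The sketch below uses Python's re module as a stand-in for the plug-in's regular expression engine; the variable names are illustrative only.

```python
import re

key = re.compile(r"^Error:\s+(.*)")
line = "Error: LineMonitor(line1): no data at all during last 65000 ms (at 05:10:44)"

match = key.search(line)
# groups() returns the text of each capturing group, in order.
# Each entry corresponds to one event parameter column.
event_parameters = list(match.groups()) if match else []
print(event_parameters)
```

With one capturing group in the key there is one event parameter, which holds everything after the "Error:" prefix and the spaces that follow it.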

Column Headings

If your trackers are all tracking the same kind of thing, it would probably be useful to give column headings for your event parameters. It is possible to do this in your tracker group or on the advanced tab of the plug-in. The optional setting on the advanced tab is used if no headings are defined for a group.

By defining column headings, the view becomes more readable to someone who has not configured the plug-in and may not know what parameter1 and parameter2 contain.

Removing Uninteresting Messages with Filters

Suppose you were interested in error messages but not interested in LineMonitor error messages.

Each Tracker has a place where you can define filters. Filters are themselves regular expressions that, if matched, exclude the message from being matched by the tracker.

We can add a filter with the following regular expression to remove LineMonitor errors.

LineMonitor\([\w\d]+\)

For the purpose of this example we'll assume that a line is identified by any number of word characters (alphanumeric and underscore) and digits [1]. The "\" before the opening and closing parenthesis is necessary to show that it isn't marking a group and we're actually looking for this character.

If we had sent this setup to the probe our Tracker would not report the LineMonitor error from our previous example.
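The effect of a filter can be sketched as follows. This assumes that filters are tested before the keys, which is how the text above reads; the match_event function is a hypothetical helper and not part of the plug-in.

```python
import re

error_key = re.compile(r"^Error:\s+(.*)")
filters = [re.compile(r"LineMonitor\([\w\d]+\)")]


def match_event(line):
    """Return the event parameters, or None if the line is filtered or unmatched."""
    if any(f.search(line) for f in filters):
        return None  # excluded before any key is tried
    m = error_key.search(line)
    return list(m.groups()) if m else None


filtered = match_event(
    "Error: LineMonitor(line1): no data at all during last 65000 ms (at 05:10:44)"
)
kept = match_event("Error: disk full")
```

The LineMonitor error is dropped by the filter, while an unrelated error (the "disk full" message is made up for this example) still reaches the key.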

Timeouts

It's possible to change state automatically after a configurable period of time or at a set time. This is known as a timeout. You can define timeouts for any state.

Consider this example. A process writes a heartbeat message to a log file every 10 minutes. It would be useful to define a tracker to look for heartbeats and flag when a heartbeat is missed.

We can start as before with a default state of "Awaiting Heartbeat" and a "Heartbeat Received" state. However, if we're in either of these states for longer than 10 minutes we should time out and change to a "Missed A Beat" state.

The configuration would look like this:

There's a lot of information there but you can see the three states defined. Both the default state and Heartbeat Received have a timeout configured at ten minutes, after which the state will change to Missed A Beat.

The only State with a key and thus looking for events is Heartbeat Received. For this example it doesn't matter what the key is as long as we can detect the heartbeat message.

As expected before we receive any heartbeats, the dataview will be similar to this one:

After a timeout, the state will change:

On receipt of a heartbeat, the state will change to Heartbeat Received.
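The heartbeat example can be modelled with a small simulation. This is a sketch under stated assumptions: times are plain seconds, the heartbeat key is invented (the guide leaves it open), and re-entering Heartbeat Received is assumed to restart the timeout. The HeartbeatTracker class is hypothetical.

```python
import re

TIMEOUT_SECONDS = 10 * 60
HEARTBEAT_KEY = re.compile(r"heartbeat")  # hypothetical key
TIMED_STATES = ("Awaiting Heartbeat", "Heartbeat Received")


class HeartbeatTracker:
    def __init__(self, now):
        self.state = "Awaiting Heartbeat"  # default state
        self.entered_at = now

    def _enter(self, state, now):
        self.state = state
        self.entered_at = now

    def on_line(self, line, now):
        # Only Heartbeat Received has a key, so it is the only state
        # that can be reached through an event.
        if HEARTBEAT_KEY.search(line):
            self._enter("Heartbeat Received", now)

    def on_sample(self, now):
        # Both timed states share the same ten minute timeout.
        if self.state in TIMED_STATES and now - self.entered_at > TIMEOUT_SECONDS:
            self._enter("Missed A Beat", now)


t = HeartbeatTracker(now=0)
t.on_sample(now=300)
s1 = t.state  # still waiting
t.on_sample(now=601)
s2 = t.state  # timed out
t.on_line("heartbeat ok", now=660)
s3 = t.state  # heartbeat seen
t.on_sample(now=1300)
s4 = t.state  # next heartbeat missed
```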

Because both the default state and the Heartbeat Received state share a common timeout, it makes sense to define that state in its own right and refer to it from each dependent state. It is also possible to define states inline so you can define your timeout while defining the state.

There are three types of timeout: relative, atTimeOfDay and absolute.

AtTimeOfDay timeouts are triggered at a fixed point in time each day, such as 00:00, 09:00 or 17:30:01. An atTimeOfDay timeout will only be triggered if the tracker is in the state for which it is defined at the time specified.

Absolute timeouts are also triggered at a fixed point in time. An absolute timeout will be triggered if the tracker is in the state for which it is defined at or after the specified time, including if it returns to that state at any time between the specified time and midnight.
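The difference between the two fixed-time timeouts can be expressed as two predicates. This is one reading of the descriptions above, not a statement of how the plug-in evaluates them: both functions are hypothetical, and all times are assumed to fall within the same day.

```python
from datetime import time


def at_time_of_day_due(entered_state_at, timeout_at, now):
    """Fires only if the tracker was already in the state when timeout_at passed."""
    return entered_state_at <= timeout_at <= now


def absolute_due(timeout_at, now):
    """Fires whenever the tracker is in the state at or after timeout_at."""
    return now >= timeout_at


timeout_at = time(17, 30)

# Tracker entered the state in the morning and is checked just after 17:30.
early_entry_at_time_of_day = at_time_of_day_due(time(9, 0), timeout_at, time(17, 31))
early_entry_absolute = absolute_due(timeout_at, time(17, 31))

# Tracker entered the state at 18:00, after the configured time.
late_entry_at_time_of_day = at_time_of_day_due(time(18, 0), timeout_at, time(18, 5))
late_entry_absolute = absolute_due(timeout_at, time(18, 5))
```

Under this reading, a tracker that enters the state after 17:30 is timed out by an absolute timeout but not by an atTimeOfDay timeout.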

Timestamps

So far the Tracker has been reporting the time of the state change based on the system time the probe retrieves when the event occurs. It's common however for log file messages to include a timestamp. The tracker can be configured to extract and interpret the timestamp and report that time instead.

This may be useful if your probe is running on a different machine or even a different time zone to the process writing to the log.

Configuring a timestamp is done in three steps.

  1. Define a regular expression to capture the timestamp
  2. Specify which capturing group contains the timestamp
  3. Define a date mask to interpret the captured timestamp

Let's suppose our log files contain messages like this:

[yyyy-mm-dd-HH-MM-SS] <Message text>

Our timestamp description would look like:

The pattern or regular expression is quite simple. We use a capturing group because we're not interested in the square brackets surrounding the date-time, so our date pattern group is one. If our expression had no capturing groups we would use zero (which is the default), meaning that the whole matching value is captured.

The date format is a mask that tells the plug-in how to interpret the captured value. There is a detailed explanation of how to configure this in the technical reference.

Let's suppose our configuration is tracking the state of a process. When the process goes down we extract the date-time from the log file and reflect this in the view.

[2008-09-15-10-15-30] important_process Exiting

Note the time in the view above. It doesn't matter when the State Tracker finds this text; it's the timestamp in the file that is reported.
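The three configuration steps can be mirrored in Python to check a pattern before using it. The regular expression below is an assumption that fits the example format, and the date format uses Python's strptime codes, which are not the plug-in's date mask syntax (see the technical reference for that). The extract_time function is a hypothetical helper.

```python
import re
from datetime import datetime

# Step 1: a regular expression that captures the timestamp without the brackets.
TIMESTAMP_PATTERN = re.compile(r"\[(\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2})\]")
# Step 2: the capturing group holding the timestamp (0 would mean the whole match).
DATE_PATTERN_GROUP = 1
# Step 3: a mask describing the captured text, in Python strptime notation.
DATE_FORMAT = "%Y-%m-%d-%H-%M-%S"


def extract_time(line):
    """Return the timestamp found in the line, or None if there is none."""
    m = TIMESTAMP_PATTERN.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(DATE_PATTERN_GROUP), DATE_FORMAT)


event_time = extract_time("[2008-09-15-10-15-30] important_process Exiting")
no_time = extract_time("important_process Exiting")
```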

Timeouts and Timestamps

It's important to note how timeouts work with timestamps. Whenever it sees a new event, State Tracker uses the time from the timestamp to determine whether the current state has timed out. Periodically, however, the State Tracker also checks the timeout against the current system time and the time the event was seen (probe time). This is to ensure that states time out when no events are received and is fine for the majority of cases.

You should be aware of this, however, because an absolute timeout of, for example, 19:00 may time out at 19:00 probe time regardless of the timestamps in the file. If a message with a timestamp of 19:00 arrives first, however, that message will trigger the timeout.

Using intermediate states to determine reason for arriving at current state

Sometimes you may have a state that can be arrived at through different events. It can be useful to add an intermediate state to see how we got there.

For example, consider the following:


Here we want to report disconnection, but we can get to that state as the result of either an error or a legitimate disconnect. With State Tracker you can define keys for the error message that transition you to the Connection Error state. The Connection Error state can then have an immediate timeout so that it transitions to Disconnected.

In the state tracker view the previousState will be "Connection Error" and the current state will be "Disconnected". With the usual scenario the previousState would be "Connected".
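The effect on previousState can be seen in a small model. The two keys and the ConnectionTracker class below are invented for illustration; only the state names come from the example above. The immediate timeout is modelled as a second state change made straight after entering Connection Error.

```python
import re

ERROR_KEY = re.compile(r"^Error:\s+connection")          # hypothetical key
DISCONNECT_KEY = re.compile(r"^Disconnect requested")    # hypothetical key


class ConnectionTracker:
    def __init__(self):
        self.state = "Connected"
        self.previous_state = None

    def _enter(self, state):
        self.previous_state, self.state = self.state, state

    def on_line(self, line):
        if ERROR_KEY.search(line):
            self._enter("Connection Error")
            # Immediate timeout: leave the intermediate state straight away.
            self._enter("Disconnected")
        elif DISCONNECT_KEY.search(line):
            self._enter("Disconnected")


after_error = ConnectionTracker()
after_error.on_line("Error: connection reset")

after_disconnect = ConnectionTracker()
after_disconnect.on_line("Disconnect requested by user")
```

Both trackers end in Disconnected, but previous_state shows which route was taken.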

Dynamic Tracking with Templates

Back in section Configuring State Tracker we started with a Tracker to monitor errors in a log file. Let's suppose that we wanted to monitor the latest errors on any line.

The errors look like this:

Error: LineMonitor(line3): no data at all during last 65000 ms (at 05:10:44)

To achieve this you need to first mark the Tracker as a template.

Doing this means that the Tracker will use the keys to match as before but you also need to provide a little extra information so that the Tracker can find an identifier that makes the match unique.

In our message above the identifier would be line3. So when an error like the one above is found we would like a Tracker created to look for errors on line3. If another error occurred on line3 that tracker would capture it. However if an error occurred on another line a new Tracker would be created from the template.

When we come to define the state, our key would look like this.

^Error:\s+LineMonitor\((?<id>[\w\d]+)\):\s+(.*)

What's new here is that our first capturing group has a new construct. The construct "(?<id>…)" is a named group. When a tracker is created as a template, the State Tracker needs to know how to construct a unique identifier for the tracker that will be created once a match has been made. For this reason, each key must contain a named group to define the identifier.

If there is a named group called "id", then the value of this group is used as the identifier. A compound identifier can be configured by specifying multiple capture groups with the names "id1", "id2", … up to "idN" for some integer N. The tracker identifier will be constructed by joining the text matched by each group with underscore characters (e.g. "id1_id2_…_idN").

That's all that's necessary from a configuration point of view. The view will now only contain the latest errors where they exist for any line.
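The way a template produces one tracker per identifier can be sketched in Python. Note that Python's re module spells named groups as (?P<id>…), whereas the plug-in's Perl compatible syntax shown above uses (?<id>…). The tracker_identifier and feed functions and the trackers dictionary are hypothetical, and the compound pattern at the end is invented to show the underscore join.

```python
import re

# Python spelling of the template key shown above.
TEMPLATE_KEY = re.compile(r"^Error:\s+LineMonitor\((?P<id>[\w\d]+)\):\s+(.*)")


def tracker_identifier(match):
    """Build the identifier from "id", or from "id1", "id2", ... joined by "_"."""
    named = match.groupdict()
    if "id" in named:
        return named["id"]
    parts = []
    n = 1
    while f"id{n}" in named:
        parts.append(named[f"id{n}"])
        n += 1
    return "_".join(parts)


trackers = {}  # one entry per dynamically created tracker


def feed(line):
    m = TEMPLATE_KEY.search(line)
    if m:
        # Creates the tracker on first sight, updates it afterwards.
        trackers[tracker_identifier(m)] = {
            "state": "Error",
            "parameters": list(m.groups()),
        }


feed("Error: LineMonitor(line3): no data at all during last 65000 ms (at 05:10:44)")
feed("Error: LineMonitor(line7): no data at all during last 65000 ms (at 05:11:02)")
feed("Error: LineMonitor(line3): no data at all during last 65000 ms (at 05:12:44)")

# Compound identifier built from two named groups.
compound = re.compile(r"(?P<id1>\w+)/(?P<id2>\w+)").search("hostA/line3")
compound_id = tracker_identifier(compound)
```

In this sketch the identifier also appears as the first event parameter, because groups() returns named groups too.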

Note: You can further improve the layout of dynamic trackers by renaming the tracker column and removing the ID from the reported event parameters.

See the below settings in the configuration:

Removing Dynamic Trackers

It is possible that the view could become cluttered with trackers created on the fly by templates. A template option exists that allows dynamically created trackers to be removed by a state change.

This enables trackers to disappear on a timeout state or if an event is found in the log leading to a transition into a removal state.

Each state has optional template options. At the moment there is only one of these, "Remove tracker". Selecting this option removes the tracker created from the template.

A transition into another state that doesn't trigger a removal will bring the tracker back.

Restricting State Transitions - a more state machine like behaviour

The default behaviour of State Tracker is to allow transitions from one state to any other state as long as the matching event occurs.

Sometimes it is useful to restrict state transitions to ensure that events occur in a particular order, e.g. A leads to B, which leads to C. Alternatively, if an event triggers a particular state you may want to ignore all other events until an event occurs that logically transitions from that state.

It is possible to achieve this by defining a set of allowed transitions from any state that you define.

In the example above the default state for what's being monitored is OK. There are states for machine error, machine shop and fixed, although there could be many more.

When the state is machine error we want to ignore all other events until we move to the machine shop state.

This functionality is useful when you want to ensure that states occur in sequence.
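A restricted transition can be modelled as a lookup of allowed target states. In this sketch only the restriction on Machine Error comes from the example; leaving the other states unrestricted is an assumption, and the next_state function is a hypothetical helper.

```python
# A state that is missing from this mapping may transition anywhere (the default).
ALLOWED_TRANSITIONS = {
    "Machine Error": {"Machine Shop"},
}


def next_state(current, target):
    """Return the state after an event for `target` is seen while in `current`."""
    allowed = ALLOWED_TRANSITIONS.get(current)
    if allowed is None or target in allowed:
        return target
    return current  # the event is ignored


ignored = next_state("Machine Error", "Fixed")
accepted = next_state("Machine Error", "Machine Shop")
unrestricted = next_state("OK", "Fixed")
```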

Taking Exception To Disallowed States

With the above setup you can be sure that to have arrived at a particular state you would have gotten there through the allowed state transitions.

There may be a scenario where, if events arrive out of sequence, you want to transition to a state that indicates this. The "onException" setting allows a state to be referenced or defined that will be transitioned to if an event matches a state transition other than those allowed.

In the following example we have a server that is created, registered with a number of users and then waits for users to connect.

We want to ensure that it registers with at least one user before waiting.

The events look like this:

Server created
Registering to <user name>
Registering to ...
Waiting for users

The configuration would look like this:

The Server Created state can only go to Registering to User. If a "Waiting for users" message were found instead, the state would become Out of Sequence Error.

The Registering to User state allows transitions to itself and the Waiting for User state.
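The server example can be modelled by adding an exception target to the allowed transitions. This is a sketch: the STATES layout, the on_exception key and the next_state function are hypothetical, and the assumption that Registering to User has no onException state is not taken from the configuration.

```python
STATES = {
    "Server Created": {
        "allowed": {"Registering to User"},
        "on_exception": "Out of Sequence Error",
    },
    "Registering to User": {
        "allowed": {"Registering to User", "Waiting for User"},
        "on_exception": None,  # assumption: disallowed events are ignored here
    },
}


def next_state(current, target):
    """Return the state after an event for `target` is seen while in `current`."""
    rules = STATES.get(current)
    if rules is None or target in rules["allowed"]:
        return target
    if rules["on_exception"] is not None:
        return rules["on_exception"]
    return current


in_order = next_state("Server Created", "Registering to User")
out_of_order = next_state("Server Created", "Waiting for User")
then_waiting = next_state("Registering to User", "Waiting for User")
```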

Care should be taken when configuring the onException state. Remember that by default you can transition from it into any other state, so the tracker may move on before you see it in the view. If you write a rule to detect it, however, rule evaluation will see it.