Message Tracker

Introduction

The Message Tracker plug-in allows users to measure the latency of message propagation (e.g. orders and acknowledgements) through their systems and to monitor messages that fail to arrive at specified destinations. The monitored data can be viewed in real time or it can be stored to a database for offline report generation.

To monitor latency and missed messages, you must first define the points on the message path at which messages are monitored. These points are referred to as Checkpoints. Each checkpoint is monitored by a Message Tracker plug-in. Multiple checkpoints can be monitored by the same Message Tracker plug-in.

A Message Tracker plug-in is composed of two parts: the plug-in itself and one or more Adapters.

Adapters can be internal or external. Internal Adapters are packaged as part of the Netprobe executable and do not run as a separate process. The external Adapters are independent of the Netprobe and run as a separate process.

There is an API for third parties to write their own external adapters. In addition, ITRS provides a custom adapter development service.

For more information about Message Tracker plugin configuration and settings, see the following documentation:

Plug-In

The plug-in accepts data from a set of Adapters, which it then uses to perform real time analysis of the messages and/or log the message data to a database.

If the plug-in is configured to perform real time analysis, it measures the time taken for each message to traverse the link from a previous checkpoint. To perform real time monitoring, the plug-in is configured with a part of the message path. It knows which checkpoints are immediately before the checkpoint that it is monitoring; these are referred to as Parent Checkpoints. It also knows which checkpoints are immediately after the checkpoint that it is monitoring; these are referred to as Child Checkpoints.

Once a message has been detected both by the plug-in and at one of its Parent Checkpoints, the time taken for the message to pass between these two Checkpoints can be calculated, and the latency of the Link is displayed in the Latency View.
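The calculation described above can be sketched as follows. This is only an illustration of the arithmetic, not the plug-in's implementation: the message IDs, the timestamps and the function name link_latencies are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical observations: message ID -> time the message was seen.
parent_seen = {
    "ORD-1": datetime(2024, 1, 1, 9, 30, 0, 120000),
    "ORD-2": datetime(2024, 1, 1, 9, 30, 1, 0),
}
current_seen = {
    "ORD-1": datetime(2024, 1, 1, 9, 30, 0, 165000),
}


def link_latencies(parent, current):
    """Return the latency in milliseconds for every message seen at both ends."""
    one_ms = timedelta(milliseconds=1)
    return {
        msg_id: (current[msg_id] - parent[msg_id]) / one_ms
        for msg_id in parent.keys() & current.keys()
    }


latencies = link_latencies(parent_seen, current_seen)
print(latencies)
```

Only ORD-1 has been seen at both checkpoints, so it is the only message for which a link latency can be reported; ORD-2 is still in flight.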

Messages that do not get delivered to children can also be recorded. Any message that has been seen at a Parent Checkpoint and at the Current Checkpoint, but is not seen at a Child Checkpoint is considered a lost message. The user can specify how long the system should wait before it considers a message lost. The plug-in displays the ID and description of the message so that users are notified and can carry out remedial action.
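The lost-message rule above can be sketched as a simple timeout check. This is a minimal illustration under stated assumptions: times are plain seconds, and find_lost and its parameter names are hypothetical rather than plug-in settings.

```python
def find_lost(seen_at_current, seen_at_children, now, timeout_s):
    """Return IDs seen at the current checkpoint but at no child within the timeout.

    seen_at_current:  dict of message ID -> time seen (seconds)
    seen_at_children: set of message IDs already seen at any child checkpoint
    """
    return sorted(
        msg_id
        for msg_id, seen in seen_at_current.items()
        if msg_id not in seen_at_children and now - seen > timeout_s
    )


current = {"A": 100.0, "B": 101.0, "C": 109.5}
children = {"A"}

lost = find_lost(current, children, now=110.0, timeout_s=5.0)
print(lost)
```

Message A reached a child, B has waited longer than the timeout and is reported, and C is still within the waiting period.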

The plug-in can also record all messages seen at a checkpoint to a database for later processing.

Adapter

The Adapter passively listens to the message flow through a defined checkpoint and extracts the Message Identifiers, Message Attributes and the timestamp at which the message was observed. It then passes this information to the plug-in for real-time analysis and database storage.

Internal Adapter

The plug-in is supplied with a set of internal adapters to read data from the client machines. The adapters perform the following operations on the data: data extraction, message filtering, data normalisation and message processing.

Data Extraction

In order to extract data from a file or data stream, the source of that data needs to be supplied along with the format of the data. The complexity of each of the format types differs considerably. FIX requires a few simple settings, while Regex requires a regular expression to be defined for each piece of data that the user wishes to extract from the data source. A manual is available to describe the setup and scope of each internal adapter. These manuals concentrate on the data extraction, as this is what differentiates one adapter from another.
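As an illustration of the Regex style of extraction, the sketch below applies one regular expression per piece of data to a log line. The log line layout, the tag names orderId and client, and the function extract_tags are assumptions made for the example; the actual adapter settings are described in the adapter manuals.

```python
import re

# Hypothetical mapping: tag name -> regular expression with one capture group.
extractors = {
    "orderId": re.compile(r"order=(\w+)"),
    "client": re.compile(r"client=(\w+)"),
}


def extract_tags(line):
    """Return the tags whose expression matched the line."""
    tags = {}
    for name, rx in extractors.items():
        match = rx.search(line)
        if match:
            tags[name] = match.group(1)
    return tags


tags = extract_tags("2024-01-01 09:30:00 order=A123 client=ITRS qty=100")
print(tags)
```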

Message Filtering

Not all messages that are visible at a checkpoint should be monitored. Filtering allows undesired messages to be ignored. The user can specify a single Message Regex that is applied to the message before it is processed in any way. Messages that do not match the provided expression are ignored. The user can also specify any number of regular expressions which are applied to tagged data extracted from the message. Again, if the tag does not match the provided regular expression, the message is ignored.

The adapters allow inverted regular expressions, which match when the non-inverted regular expression would not match and vice versa. The inversion applies to the message as a whole and not to the tag. For example, if the tag filter ITRS applied to tag 11 is inverted, then messages whose tag 11 does not include ITRS match the inverted filter, and so do messages that lack tag 11 entirely. Only messages whose tag 11 includes ITRS are ignored.
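The inversion semantics can be sketched as follows, assuming messages have already been extracted into a dictionary of tags. The function tag_filter_matches and the sample tag values are hypothetical; the point is that a missing tag counts as "no match" before the inversion is applied.

```python
import re


def tag_filter_matches(tags, tag, pattern, inverted=False):
    """Apply a regex filter to one tag; inversion applies to the whole message."""
    value = tags.get(tag)
    matched = value is not None and re.search(pattern, value) is not None
    return not matched if inverted else matched


with_itrs = {"11": "ITRS-0001"}
without_itrs = {"11": "ACME-0001"}
no_tag_11 = {"35": "D"}

for message in (with_itrs, without_itrs, no_tag_11):
    kept = tag_filter_matches(message, "11", "ITRS", inverted=True)
    print(message, "kept" if kept else "ignored")
```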

Data Normalisation

Once the data has been extracted and filtered, it is presented to the adapter's normalisation system as a set of named tags. These names may be defined by the data extractor, as is the case with the FIX adapter, where the tag names are the FIX tag numbers. Alternatively, they may be defined by the user, as is the case with the REGEX adapter, where the user defines a tag name for each regular expression to apply to the message.

The data normalisation allows the user to take one or more tags and map these into IDs and Attributes. For example, when monitoring FIX messages from multiple clients, the ClientOrderID (Tag 11) alone may not be enough to identify a message. The user may need a Message Identifier that is composed of the ClientID (Tag 109) and the ClientOrderID. The data normalisation is used to set up such mappings.
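A composite identifier of this kind can be sketched as below. The parsing relies on the standard FIX tag=value encoding with the SOH (0x01) delimiter; the helper names parse_fix and message_id, the ":" separator and the sample field values are assumptions made for the example and are not adapter settings.

```python
SOH = "\x01"  # standard FIX field delimiter


def parse_fix(raw):
    """Split a FIX tag=value message into a dict keyed by tag number."""
    return dict(field.split("=", 1) for field in raw.split(SOH) if field)


def message_id(tags, parts=("109", "11"), sep=":"):
    """Build a composite identifier; return None if any part is missing."""
    if any(part not in tags for part in parts):
        return None
    return sep.join(tags[part] for part in parts)


raw = SOH.join(["8=FIX.4.2", "35=D", "109=CLIENT7", "11=ORD42"]) + SOH
tags = parse_fix(raw)
composite = message_id(tags)
print(composite)
```

Two clients that both send ClientOrderID ORD42 then produce different identifiers, because the ClientID forms part of the key.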

Both Message IDs and Attributes are set up this way. A Named ID must be unique across all the messages sent. It does not have to be present on the message at each checkpoint, but it cannot be present on two different messages. For example, consider the following information:

Seen at Checkpoint 1:
        Message (ClientID=1111, InternalID=734234)
        Message (ClientID=1112, InternalID=838432)
Seen at Checkpoint 2:
        Message (ClientID=1111, InternalID=98343)
        Message (ClientID=1113, InternalID=1112)
        Message (ClientID=9283, InternalID=838432)

The ClientID on the 1st message at Checkpoint 1 matches the ClientID on the 1st message at Checkpoint 2, so these are both part of the same message. The InternalID on the 2nd message at Checkpoint 1 matches the InternalID on the 3rd message at Checkpoint 2, so these are part of the same message. While the ClientID on the 2nd message at Checkpoint 1 matches the InternalID on the 2nd message at Checkpoint 2, these are not part of the same message because Message Identifiers are only compared if they have the same name.
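The matching rule in this example can be sketched as follows, using the same values as above. The function same_message is hypothetical and is only meant to show that identifiers are compared by name, so a ClientID is never compared with an InternalID.

```python
checkpoint1 = [
    {"ClientID": "1111", "InternalID": "734234"},
    {"ClientID": "1112", "InternalID": "838432"},
]
checkpoint2 = [
    {"ClientID": "1111", "InternalID": "98343"},
    {"ClientID": "1113", "InternalID": "1112"},
    {"ClientID": "9283", "InternalID": "838432"},
]


def same_message(a, b):
    """Two observations match if any identifier with the same name is equal."""
    return any(name in b and b[name] == value for name, value in a.items())


# Pairs of (position at Checkpoint 1, position at Checkpoint 2), counted from 1.
matches = [
    (i, j)
    for i, a in enumerate(checkpoint1, start=1)
    for j, b in enumerate(checkpoint2, start=1)
    if same_message(a, b)
]
print(matches)
```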

Attributes have no unique requirements and can be used to record various message data such as “Buy Price” and “Client Name”.

Message Processing

Once the message has been normalised and filtered, it is then passed to the plug-in for database insertion and real time tracking. The Real Time Tracking does not pass all the information about the message around. The user can select an attribute (or ID) to use as a description for the message (used when displaying lost messages) and a category (used when displaying latency measurements).

External Adapters

The user can provide an external adapter that communicates directly with the Real Time Tracking. The adapter uses an XMLRPC API that is described in the MESSAGE TRACKER PLUGIN API. It is the Adapter's job to perform the data extraction, the data normalisation and the message filtering. Because the adapter communicates directly with the Real Time Checkpoint Tracking, only two message attributes can be supplied (Description and Category). It is not currently possible to use an external adapter to submit data to the database via the Message Tracker plug-in.

Example

In this example, orders arrive from clients at the FIX engine. The FIX engine sends the orders to the Order Management System, which in turn sends the orders to the appropriate exchange. The confirmations that are received from the exchange are sent back to the clients.

Four checkpoints are defined in this example:

Message Flow

There are two approaches to monitoring the latency in this example: Link Latency Monitoring and Turnaround Latency Monitoring.

The difference is in the way the monitoring path is configured. Message Tracker supports both approaches.

Link Latency Monitoring allows monitoring of the time taken between Checkpoint 1 and Checkpoint 2, between Checkpoint 2 and Checkpoint 3, and finally between Checkpoint 3 and Checkpoint 4. This monitoring approach requires that the clocks on all the machines involved are in sync. If millisecond accuracy is required, these clocks must be synchronised to within a millisecond. If the clocks are not in sync, use the Turnaround Latency Monitoring approach described below.

Setup with synchronised Clocks

Example Setup for Link Latency Monitoring

Below is an example setup for real time tracking of messages using link latency monitoring. To keep the example simple, it is data agnostic, and the settings for filters, source and tag mapping have been omitted.

Plug-in-1

message-tracker
        adapter
                name: FIX-OUT
                checkpoint: CHECKPOINT1
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        adapter
                name: FIX-RETURN
                checkpoint: CHECKPOINT4
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        rtt
                name: CHECKPOINT1
                child
                        name: CHECKPOINT2
                        host: order-management-system:12121
        rtt
                name: CHECKPOINT4
                parent
                        name: CHECKPOINT3

Plug-in-2

message-tracker
        adapter
                name: OMS-OUT
                checkpoint: CHECKPOINT2
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        adapter
                name: OMS-RETURN
                checkpoint: CHECKPOINT3
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        rtt
                name: CHECKPOINT2
                parent
                        name: CHECKPOINT1
                child
                        name: CHECKPOINT3
        rtt
                name: CHECKPOINT3
                parent
                        name: CHECKPOINT2
                child
                        name: CHECKPOINT4
                        host: fix-system:12121

Turnaround Latency Monitoring

Turnaround Latency Monitoring allows monitoring of the time taken between Checkpoint 1 and Checkpoint 4, and between Checkpoint 2 and Checkpoint 3. This provides round-trip latency monitoring of the orders at the FIX Server and the Order Management System.

This monitoring approach does not require the clocks on all the machines to be in sync, because the latency is measured between checkpoints that are on the same physical box.
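The effect of unsynchronised clocks can be illustrated with a small worked example. All numbers are hypothetical: the Order Management System clock is assumed to run 250 ms ahead of the FIX host clock, and the event times are invented for illustration.

```python
def observed(true_time_ms, clock_offset_ms):
    """Timestamp recorded by a host whose clock is offset from true time."""
    return true_time_ms + clock_offset_ms


fix_offset, oms_offset = 0.0, 250.0  # hypothetical clock offsets in ms

# True event times (ms) for one order passing Checkpoints 1, 2, 3 and 4.
t1, t2, t3, t4 = 0.0, 5.0, 20.0, 26.0

cp1 = observed(t1, fix_offset)  # Checkpoint 1 is on the FIX host
cp2 = observed(t2, oms_offset)  # Checkpoint 2 is on the OMS host
cp3 = observed(t3, oms_offset)  # Checkpoint 3 is on the OMS host
cp4 = observed(t4, fix_offset)  # Checkpoint 4 is on the FIX host

link_1_2 = cp2 - cp1        # crosses hosts, so the offset distorts the result
turnaround_fix = cp4 - cp1  # same host, so the offset cancels out
turnaround_oms = cp3 - cp2  # same host, so the offset cancels out

print(link_1_2, turnaround_fix, turnaround_oms)
```

The link measurement between Checkpoint 1 and Checkpoint 2 absorbs the full 250 ms offset, whereas both turnaround measurements equal the true elapsed time.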

Setup with unsynchronised Clocks

Example Setup for Turnaround Latency Monitoring

Below is an example setup for real time tracking of messages using turnaround latency monitoring. To keep the example simple, it is data agnostic, and the settings for filters, source and tag mapping have been omitted.

Plug-in-1

message-tracker
        adapter
                name: FIX-OUT
                checkpoint: CHECKPOINT1
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        adapter
                name: FIX-RETURN
                checkpoint: CHECKPOINT4
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        rtt
                name: CHECKPOINT1
                child
                        name: CHECKPOINT4
        rtt
                name: CHECKPOINT4
                parent
                        name: CHECKPOINT1

Plug-in-2

message-tracker
        adapter
                name: OMS-OUT
                checkpoint: CHECKPOINT2
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        adapter
                name: OMS-RETURN
                checkpoint: CHECKPOINT3
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        rtt
                name: CHECKPOINT2
                child
                        name: CHECKPOINT3
        rtt
                name: CHECKPOINT3
                parent
                        name: CHECKPOINT2

Views

This plug-in creates five views, examples of which are shown below.

Latency View

This view is used to show the latency of all messages arriving at monitored checkpoints from a configured parent checkpoint.

Note

Unsolicited messages are ignored unless the monitored checkpoint has no parents.

The view has one row per connection pair. Sub-rows are created using the category specified in the message. If the category is not specified in the plug-in for that checkpoint but was specified at a previous checkpoint, the category from the previous checkpoint is used. If no category is defined at any checkpoint, no sub-rows are created.

FIX Latency View

OMS Latency View

Latency Table Legend

Name              Description
link              Message Set identifier composed of the start and end checkpoints in the link together with an optional category.
parentCheckpoint  Checkpoint name of the first checkpoint in the link.
childCheckpoint   Checkpoint name of the last (second) checkpoint in the link.
category          Value of the Category Attribute in the message.
linkStatus        Status of the monitoring link between the two checkpoints.
maxLatency        Maximum latency of the messages monitored over a user-defined period (see average > interval). This will be blank if no messages have been seen.
averageLatency    Average latency of the messages monitored over a user-defined period (see average > interval). This will be blank if no messages have been seen.
stdDevLatency     The standard deviation of the latency of the messages monitored over a user-defined period (see average > interval). This will be blank if no messages have been seen.
numMessages       Number of messages seen over a user-defined period (see average > interval).
lastMessageTime   Time the last message was detected.
totalMessages     Total number of messages seen today.
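The maxLatency, averageLatency, stdDevLatency and numMessages columns can be thought of as summary statistics over the latencies collected during the averaging interval. The sketch below is an illustration only: the function latency_stats is hypothetical, and the use of the population standard deviation is an assumption, since the documentation does not state which form the plug-in uses.

```python
from statistics import mean, pstdev


def latency_stats(samples_ms):
    """Summarise the latencies (ms) collected during one averaging interval."""
    if not samples_ms:
        # No messages seen: the latency cells are blank.
        return {
            "maxLatency": None,
            "averageLatency": None,
            "stdDevLatency": None,
            "numMessages": 0,
        }
    return {
        "maxLatency": max(samples_ms),
        "averageLatency": mean(samples_ms),
        "stdDevLatency": pstdev(samples_ms),  # assumption: population form
        "numMessages": len(samples_ms),
    }


stats = latency_stats([10.0, 20.0, 30.0])
print(stats)
```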

Lost, Unacknowledged and Slow Message Views

The lost, unacknowledged and slow message views are used to indicate messages which may need further investigation. These views show the following conditions:

The lost message view is always enabled and is controlled by the lost message timeout. Unacknowledged and slow message views are enabled by configuring the unacknowledged timeout and slow message timeouts in the Real Time Tracking section.

All views have a configurable maximum number of rows (see the reportingListSize setting). Messages in these views can be controlled by using the plug-in commands context-menu.

Lost Messages View

Headline Legend

Name                            Description
lostMessages                    The number of all lost messages seen. The count may be greater than the number of messages displayed, if the display is limited by the reporting list size. The count can also be altered as a result of executing plug-in commands. Lost Messages view only.
cyclicMessages                  The number of cyclic messages seen (a message is cyclic if seen multiple times at the same checkpoint). Lost Messages view only.
unacknowledgedMessages          The number of unacknowledged messages seen. Unacknowledged Messages view only.
maxShownUnacknowledgedPerQueue  The configured maximum number of unacknowledged messages to display per checkpoint queue. See the maxShownUnacknowledgedPerQueue setting. Unacknowledged Messages view only.
slowMessages                    The number of slow messages seen. Slow Messages view only.
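The definition of a cyclic message given for cyclicMessages (seen multiple times at the same checkpoint) can be sketched as a simple count. The function cyclic_ids and the sample observations are hypothetical and do not reflect how the plug-in stores its data.

```python
from collections import Counter


def cyclic_ids(observations):
    """Return IDs seen more than once at the same checkpoint.

    observations: iterable of (checkpoint, message_id) pairs.
    """
    counts = Counter(observations)
    return sorted({msg_id for (_, msg_id), n in counts.items() if n > 1})


observations = [
    ("CHECKPOINT1", "A"),
    ("CHECKPOINT2", "A"),  # same message at a different checkpoint: not cyclic
    ("CHECKPOINT1", "B"),
    ("CHECKPOINT1", "B"),  # same message twice at one checkpoint: cyclic
]

print(cyclic_ids(observations))
```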

Table Legend

Name                   Description
messageID              Message Identifier.
failureType            One of: LOST or CYCLIC (Lost Messages view), UNACK (Unacknowledged Messages view), SLOW (Slow Messages view).
lastCheckpointVisited  Name of the last Checkpoint to see the message.
seenAt                 List of the Checkpoints the message successfully passed through.
expectedAt             List of child Checkpoint names. The message was expected to be seen by at least one of these checkpoints.
problemAt              Checkpoint name reporting a slow message. This is the name of the parent checkpoint which received a slow response from a child checkpoint. Slow Messages view only.
description            Description of the message as provided by the source adapter.
lastSeenTime           Time the failed message was seen at this Checkpoint.
Latency                The message latency. Slow Messages view only.

Admin Adapter View

The view has one row per adapter. It is used to check that the adapter is running and to display the number of messages that each adapter has detected. The view can be turned off using the hideAdminViews setting, as long as there is at least one real time checkpoint configured. If the plug-in is not supporting any real time checkpoints, this view is always present, as it is the only view published.

Admin Adapter View

Links Headline Legend

Name               Description
databaseQueueSize  The number of messages waiting to be submitted to the database.

Links Table Legend

Name                Description
name                Adapter name.
source              Name of the data source: a file name or a stream description.
checkpoint          Name of the checkpoint that the adapter is reading messages for.
messagesLastSample  Number of messages seen by the adapter in the last sample.
messagesPerDay      Number of messages seen by the adapter in the current day.
realTimeTracking    ON if the adapter is submitting the messages to a real time checkpoint for real time monitoring.
DBlogging           ON if the adapter is placing the message on the database queue for submission to the database.

Admin Checkpoint View

The view has one row per Real Time Checkpoint the plug-in is supporting. This is used to check the checkpoints can communicate with one another. The view can be turned off using the hideAdminViews setting.

Admin Checkpoint View

Links Table Legend

Name              Description
checkpoint        Checkpoint name.
totalParents      Number of parents (active and inactive).
activeParents     Number of parent checkpoints with which the named real time checkpoint can communicate.
inactiveParents   Number of parent checkpoints with which the named real time checkpoint cannot communicate.
totalChildren     Number of children (active and inactive).
activeChildren    Number of child checkpoints with which the named real time checkpoint can communicate.
inactiveChildren  Number of child checkpoints with which the named real time checkpoint cannot communicate.


Admin Link View

Links Table Legend

Name               Description
linkName           Link name (defined by the checkpoints at either end of the link).
lastHeartbeatTime  Timestamp of the last heartbeat.
status             Status of the link (ACTIVE or INACTIVE).

Viewing Multiple Checkpoints

To visualise the latency of the overall system, the latency of multiple checkpoints from multiple samplers can be combined into a single view. This uses the Active Console Custom Metrics View. In Active Console, it is possible to specify a set of paths to different data views for a single Metric View.

An example of a Custom Metrics View is shown below, where the latency between Checkpoints is shown in a single view.

Custom Metrics in Active Console

Plug-In Commands

Each message view (Lost Messages, Slow Messages and Unacknowledged Messages) provides three plug-in commands to clear messages. These are presented in a context menu that is available when right-clicking on a message in the view.

Lost Messages menu showing 2 of the 3 Probe commands

These commands are described in more detail below.

Note

For unacknowledged message views, clearing a message entry will only prevent that message from being displayed. If the message is detected as either lost or slow, it will subsequently be displayed in one of these views even if it was previously cleared from the Unacknowledged Messages view.

Clear All

This command will remove all messages from the view. It is available on the dataview and every cell and headline inside it. Clearing the view will also reset the count headline for the view to 0.

Clear this Checkpoint

This command will remove all messages that were seen at a specific checkpoint. It is only available by right-clicking on the cells in the lastCheckpointVisited column for Lost and Unacknowledged messages, or the problemAt column for Slow messages.

All messages that share the same checkpoint with the cell the command was triggered from will be removed from the dataview. This includes the message that the command was triggered on.

Clear Message

This command will remove a single message. It is available on all the cells in the dataview. The message represented by the row the command was triggered upon will be removed.
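The effect of the three commands on a view can be sketched with a list of rows. This models only the behaviour described above for a Lost Messages view; the row contents and the function names are hypothetical, and the real commands are run from the context menu rather than from code.

```python
rows = [
    {"messageID": "A", "lastCheckpointVisited": "CHECKPOINT1"},
    {"messageID": "B", "lastCheckpointVisited": "CHECKPOINT1"},
    {"messageID": "C", "lastCheckpointVisited": "CHECKPOINT2"},
]


def clear_all(view):
    """Clear All: remove every message from the view."""
    return []


def clear_checkpoint(view, checkpoint):
    """Clear this Checkpoint: remove every message last seen at the checkpoint."""
    return [r for r in view if r["lastCheckpointVisited"] != checkpoint]


def clear_message(view, message_id):
    """Clear Message: remove only the selected message."""
    return [r for r in view if r["messageID"] != message_id]


print([r["messageID"] for r in clear_checkpoint(rows, "CHECKPOINT1")])
print([r["messageID"] for r in clear_message(rows, "B")])
print(clear_all(rows))
```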

Further reading

The following related documents are available:
