Message Tracker Plug-in User Guide

Introduction

The Message Tracker plug-in allows users to measure the latency of message propagation (e.g. orders and acknowledgements) through their systems and to monitor messages that fail to arrive at specified destinations. The monitored data can be viewed in real time or it can be stored to a database for offline report generation.

To monitor latency and missed messages, the points on the message path at which messages are to be monitored must first be defined. These points are referred to as Checkpoints. Each checkpoint is monitored by a Message Tracker plug-in. Multiple checkpoints can be monitored by the same Message Tracker plug-in.

A Message Tracker plug-in is composed of two parts:

  • Plug-In - this is the generic part of the product that deals with abstract messages. It routes information to other Message Tracker plug-ins, and stores data in the database.
  • Adapter - this decodes and parses the messages into a normalised form for processing by the plug-in.

Adapters can be internal or external. Internal Adapters are packaged as part of the Netprobe executable and do not run as a separate process. The external Adapters are independent of the Netprobe and run as a separate process.

There is an API for third parties to write their own external adapters. In addition ITRS provides a custom adapter development service.

Plug-In

The plug-in accepts data from a set of Adapters, which it then uses to perform real time analysis of the messages and/or log the message data to a database.

If the plug-in is configured to perform real time analysis, it measures the time taken for each message to traverse the link from a previous checkpoint. In order to perform real time monitoring, the plug-in is configured with a part of the message path: it knows which checkpoints are immediately before the checkpoint that it is monitoring; these are referred to as Parent Checkpoints. It also knows which checkpoints are immediately after the checkpoint that it is monitoring; these are referred to as Child Checkpoints.

Once a message has been detected by the plug-in and by one of its Parent Checkpoints, the time taken for the message to pass between these two Checkpoints can be calculated and the latency of the Link is displayed in the Latency View.
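The calculation can be sketched in a few lines of Python. This is an illustration only, not the plug-in's implementation: the function name link_latency, the dictionary layout and the timestamps are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical observations: checkpoint name -> {message ID -> time seen}.
seen = {
    "CHECKPOINT1": {"ORD-1": datetime(2024, 1, 1, 9, 0, 0, 0)},
    "CHECKPOINT2": {"ORD-1": datetime(2024, 1, 1, 9, 0, 0, 250000)},
}

def link_latency(message_id, parent, current):
    """Return the latency in seconds of the parent -> current link,
    or None if the message has not been seen at both checkpoints."""
    t_parent = seen[parent].get(message_id)
    t_current = seen[current].get(message_id)
    if t_parent is None or t_current is None:
        return None
    return (t_current - t_parent).total_seconds()

latency = link_latency("ORD-1", "CHECKPOINT1", "CHECKPOINT2")
```

A latency is only available once both checkpoints have reported the message; until then the sketch returns None.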

Messages that do not get delivered to children can also be recorded. Any message that has been seen at a Parent Checkpoint and at the Current Checkpoint, but is not seen at a Child Checkpoint is considered a lost message. The user can specify how long the system should wait before it considers a message lost. The plug-in displays the ID and description of the message so that users are notified and can carry out remedial action.
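The lost message rule can be illustrated with the following Python sketch. It is a simplification under stated assumptions: the function name find_lost and the data are hypothetical, times are plain seconds, and every message listed as seen at the current checkpoint is assumed to have been seen at a parent already.

```python
def find_lost(seen_at_current, seen_at_children, now, timeout):
    """Return IDs seen at the current checkpoint more than `timeout`
    seconds ago that no child checkpoint has reported yet."""
    return sorted(
        msg_id
        for msg_id, seen_time in seen_at_current.items()
        if msg_id not in seen_at_children and now - seen_time > timeout
    )

# Hypothetical data: times are seconds since an arbitrary start.
seen_at_current = {"ORD-1": 0.0, "ORD-2": 1.0, "ORD-3": 9.0}
seen_at_children = {"ORD-1"}
lost = find_lost(seen_at_current, seen_at_children, now=10.0, timeout=5.0)
```

Here ORD-1 has reached a child, ORD-3 is still within the timeout, and only ORD-2 is reported as lost.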

The plug-in can also record all messages seen at a checkpoint to a database for later processing.

Adapter

The Adapter passively listens to the message flow through a defined checkpoint and extracts the Message Identifiers, Message Attributes and the timestamp at which the message was observed. It then passes this information to the plug-in for real-time analysis and database storage.

Internal Adapter

The plug-in is supplied with a set of internal adapters to read data from the client machines. The adapters perform the following operations on the data:

Data Extraction

In order to extract data from a file or data stream, the source of that data needs to be supplied along with the format of the data. The complexity of each of the format types differs considerably. FIX requires a few simple settings, while Regex requires a regular expression to be defined for each piece of data that the user wishes to extract from the data source. A manual is available to describe the setup and scope of each internal adapter. These manuals concentrate on the data extraction, as this is what differentiates one adapter from another.

Message Filtering

Not all messages that are visible at a checkpoint are messages that should be monitored. Filtering allows undesired messages to be ignored. The user can specify a single Message Regex that is applied to the message before it is processed in any way. If a message does not match the provided expression, it is ignored. The user can also specify any number of regular expressions which are applied to tagged data extracted from the message. Again, if the tag does not match the provided regular expression, the message is ignored.

The adapters allow inverted regular expressions, which match when the non-inverted regular expression would not match, and vice versa. The inversion applies to the message as a whole and not to the tag. For example, if the tag filter ITRS applied to tag 11 were inverted, messages whose tag 11 did not include ITRS would match the inverted filter, and so would messages that lacked tag 11 entirely. Only messages with tag 11 = ITRS would be ignored.
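The effect of an inverted tag filter can be shown with a short Python sketch. The function tag_filter_matches and the sample messages are hypothetical, and the use of a substring search rather than a whole-value match is an assumption based on the wording "include ITRS".

```python
import re

def tag_filter_matches(tags, tag, pattern, inverted=False):
    """Apply a regular expression to one tag of a message.

    A missing tag counts as a non-match, so an inverted filter
    matches messages that lack the tag entirely."""
    value = tags.get(tag)
    matched = value is not None and re.search(pattern, value) is not None
    return not matched if inverted else matched

# Hypothetical messages, represented as tag name -> value.
with_itrs = {"11": "ITRS-0001"}
without_itrs = {"11": "ACME-0001"}
no_tag = {"35": "D"}

kept = [
    name
    for name, msg in [("with_itrs", with_itrs),
                      ("without_itrs", without_itrs),
                      ("no_tag", no_tag)]
    if tag_filter_matches(msg, "11", "ITRS", inverted=True)
]
```

With the inverted filter, the message containing ITRS in tag 11 is the only one that is ignored.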

Data Normalisation

Once the data has been extracted and filtered, it is presented to the adapter's normalisation system as a set of named tags. These names may be defined by the data extractor, as is the case with the FIX adapter, where the tag names are the FIX tag numbers, or they may be defined by the user, as is the case with the REGEX adapter, where the user defines a tag name for each regular expression to apply to the message.

The data normalisation allows the user to take one or more tags and map these into IDs and Attributes. For example, when monitoring FIX messages from multiple clients, the system may need to monitor more than just the ClientOrderID (Tag 11). It may need to use a Message Identifier that is composed of the ClientID (Tag 109) and the ClientOrderID. The data normalisation is used to set up such mappings.
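A mapping of this kind can be sketched as follows. This is an illustration under assumptions, not the adapter's actual configuration or code: the function normalise, the ID name ClientOrder, the ":" separator and the sample tag values are all hypothetical.

```python
def normalise(tags, id_mappings, attribute_mappings):
    """Map extracted tags to named IDs and attributes.

    Each mapping is name -> list of tag names; the tag values are
    joined with ':' (the separator is an assumption of this sketch).
    A name is skipped if any of its tags is missing from the message."""
    def build(mappings):
        return {
            name: ":".join(tags[t] for t in tag_names)
            for name, tag_names in mappings.items()
            if all(t in tags for t in tag_names)
        }
    return build(id_mappings), build(attribute_mappings)

# Hypothetical FIX message: tag 109 = ClientID, tag 11 = ClientOrderID,
# tag 44 = price.
fix_tags = {"109": "CLIENT-A", "11": "0001", "44": "101.5"}
ids, attributes = normalise(
    fix_tags,
    id_mappings={"ClientOrder": ["109", "11"]},
    attribute_mappings={"Price": ["44"]},
)
```

Combining the two tags gives an identifier that stays unique even if two clients happen to reuse the same ClientOrderID.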

Both Message IDs and Attributes are set up this way. A Named ID must be unique across all the messages sent. It does not have to be present on the message at each checkpoint, but it cannot be present on two different messages. For example, consider the following information:

Seen at Checkpoint 1:
        Message (ClientID=1111, InternalID=734234)
        Message (ClientID=1112, InternalID=838432)
Seen at Checkpoint 2:
        Message (ClientID=1111, InternalID=98343)
        Message (ClientID=1113, InternalID=1112)
        Message (ClientID=9283, InternalID=838432)

The ClientID on the 1st message at Checkpoint 1 matches the ClientID on the 1st message at Checkpoint 2, so these are both part of the same message. The InternalID on the 2nd message at Checkpoint 1 matches the InternalID on the 3rd message at Checkpoint 2, so these are part of the same message. While the ClientID on the 2nd message at Checkpoint 1 matches the InternalID on the 2nd message at Checkpoint 2, these are not part of the same message because Message Identifiers are only compared if they have the same name.
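The matching rule can be checked against the example data with a small Python sketch. The function same_message is hypothetical and only illustrates the rule that identifiers are compared when they share a name.

```python
checkpoint1 = [
    {"ClientID": "1111", "InternalID": "734234"},
    {"ClientID": "1112", "InternalID": "838432"},
]
checkpoint2 = [
    {"ClientID": "1111", "InternalID": "98343"},
    {"ClientID": "1113", "InternalID": "1112"},
    {"ClientID": "9283", "InternalID": "838432"},
]

def same_message(a, b):
    """Two observations belong to the same message if any ID with
    the same name has the same value on both."""
    return any(name in b and b[name] == value for name, value in a.items())

# Pairs of (position at Checkpoint 1, position at Checkpoint 2), 1-based.
matches = [
    (i, j)
    for i, a in enumerate(checkpoint1, start=1)
    for j, b in enumerate(checkpoint2, start=1)
    if same_message(a, b)
]
```

The value 1112 appears as a ClientID at Checkpoint 1 and as an InternalID at Checkpoint 2, but because the names differ no match is produced for that pair.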

Attributes have no unique requirements and can be used to record various message data such as "Buy Price" and "Client Name".

Message Processing

Once the message has been normalised and filtered, it is then passed to the plug-in for database insertion and real time tracking. The Real Time Tracking does not pass all the information about the message around. The user can select an attribute (or ID) to use as a description for the message (used when displaying lost messages) and a category (used when displaying latency measurements).

External Adapters

The user can provide an external adapter that communicates directly with the Real Time Tracking. The adapter uses an XMLRPC API that is described in the MESSAGE TRACKER PLUGIN API. It is the adapter's job to perform the data extraction, the data normalisation and the message filtering. Because the adapter communicates directly with the Real Time Checkpoint Tracking, only two message attributes can be supplied (Description and Category). It is not currently possible to use an external adapter to submit data to the database via the Message Tracker plug-in.

Example

In this example, orders arrive from clients at the FIX engine. The FIX engine sends the orders to the Order Management System, which in turn sends the orders to the appropriate exchange. The confirmations that are received from the exchange are sent back to the clients.

Four checkpoints are defined in this example:

  • Checkpoint1 is where an order arrives into the bank system from the client via FIX.
  • Checkpoint2 is where the order leaves the bank system and is delivered to the exchange.
  • Checkpoint3 is where an order confirmation is returned to the bank system from the exchange.
  • Checkpoint4 is where the order confirmation leaves the bank system and returns to the client via FIX.

Figure 1 Message Flow

There are two approaches to monitoring the latency in this example:

  • Link Latency Monitoring
  • Turnaround Latency Monitoring

The difference is in the way the monitoring path is configured. Message Tracker supports both approaches.

Turnaround Latency Monitoring

Turnaround Latency Monitoring allows monitoring of the time taken between Checkpoint1 and Checkpoint4, and between Checkpoint2 and Checkpoint3. This provides round-trip latency monitoring of the orders at the FIX server and the Order Management System.

This monitoring approach does not require the clocks on all the machines to be in sync, because the latency is measured between checkpoints that are on the same physical box.

Figure 3 Setup with unsynchronised Clocks
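The reason clock synchronisation is not needed can be seen in a short worked example. The Python sketch below is illustrative only: the function turnaround, the timestamps and the 3 second clock offset are assumptions.

```python
def turnaround(out_time, return_time):
    """Latency between two checkpoints stamped by the same clock."""
    return return_time - out_time

# Hypothetical timestamps in seconds. The clock on the FIX server is
# assumed to run 3.0 seconds ahead of the true time.
true_out, true_return = 100.0, 100.5
offset = 3.0

# Both checkpoints are on the same box, so both carry the same offset.
measured = turnaround(true_out + offset, true_return + offset)
actual = turnaround(true_out, true_return)
```

Because the same offset is added to both timestamps, it cancels out in the subtraction and the measured turnaround equals the actual one.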

Example Setup for Turnaround Latency Monitoring

Below is an example setup for real time tracking of messages using turnaround latency monitoring. To keep the example simple, it is data agnostic: the settings for filters, source and tag mapping have been omitted.

Plug-in-1

message-tracker
        adapter
                name: FIX-OUT
                checkpoint: CHECKPOINT1
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        adapter
                name: FIX-RETURN
                checkpoint: CHECKPOINT4
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        rtt
                name: CHECKPOINT1
                child
                        name: CHECKPOINT4
        rtt
                name: CHECKPOINT4
                parent
                        name: CHECKPOINT1

Plug-in-2

message-tracker
        adapter
                name: OMS-OUT
                checkpoint: CHECKPOINT2
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        adapter
                name: OMS-RETURN
                checkpoint: CHECKPOINT3
                source: …
                filters: …
                format type: …
                tag mapping: …
                real time tracking:
                record: …
        rtt
                name: CHECKPOINT2
                child
                        name: CHECKPOINT3
        rtt
                name: CHECKPOINT3
                parent
                        name: CHECKPOINT2
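
In the rtt sections above, each link is declared on both sides: the first checkpoint names its child and the second names its parent. A quick way to sanity-check such a setup is sketched below in Python. The dictionary is a hypothetical in-memory form of the rtt sections, and the assumption that both sides should always agree is drawn only from this example.

```python
# Hypothetical in-memory form of the rtt sections shown above.
rtt = {
    "CHECKPOINT1": {"parents": [], "children": ["CHECKPOINT4"]},
    "CHECKPOINT4": {"parents": ["CHECKPOINT1"], "children": []},
    "CHECKPOINT2": {"parents": [], "children": ["CHECKPOINT3"]},
    "CHECKPOINT3": {"parents": ["CHECKPOINT2"], "children": []},
}

def inconsistent_links(rtt):
    """Return (parent, child) pairs that are declared on one side only."""
    by_parent = {(p, c) for p, cfg in rtt.items() for c in cfg["children"]}
    by_child = {(p, c) for c, cfg in rtt.items() for p in cfg["parents"]}
    return sorted(by_parent ^ by_child)

problems = inconsistent_links(rtt)
```

An empty result means every child declaration has a matching parent declaration.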

Views

This plug-in creates five views, examples of which are shown below.

Latency View

This view is used to show the latency of all messages arriving at monitored checkpoints from a configured parent checkpoint.

Note: Unsolicited messages are ignored unless the monitored checkpoint has no parents.

The view has one row per connection pair. Sub-rows are created using the category specified in the message. If the category is not specified in the plug-in for that checkpoint but was specified at a previous checkpoint, the category from the previous checkpoint is used. If no category is defined at any checkpoint, no sub-rows are created.
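The category fallback can be sketched as follows. This is only an illustration: the function row_name is hypothetical, and the "-" and "#" separators used to build the row name are assumptions, not the format used by the plug-in.

```python
def row_name(parent, child, categories):
    """Build an illustrative Latency View row name for a link.

    `categories` lists the category seen at each checkpoint along the
    path, oldest first, with None where no category was configured.
    The most recent category that is set is the one used."""
    category = next((c for c in reversed(categories) if c is not None), None)
    link = f"{parent}-{child}"
    return link if category is None else f"{link}#{category}"

# Category set at the parent only: it is carried forward to the child.
with_category = row_name("CHECKPOINT1", "CHECKPOINT4", ["Equities", None])
# No category anywhere: a single row with no sub-row.
without_category = row_name("CHECKPOINT1", "CHECKPOINT4", [None, None])
```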

Figure 4 FIX Latency View

Figure 5 OMS Latency View

Latency Table Legend

Name                Description
link                Message Set identifier composed of the start and end checkpoints in the link together with an optional category.
parentCheckpoint    Checkpoint name of the first checkpoint in the link.
childCheckpoint     Checkpoint name of the last (second) checkpoint in the link.
category            Value of the Category Attribute in the message.
linkStatus          Status of the monitoring link between the two checkpoints.
maxLatency          Maximum latency of the messages monitored over a user defined period (see average > interval). This will be blank if no messages have been seen.
averageLatency      Average latency of the messages monitored over a user defined period (see average > interval). This will be blank if no messages have been seen.
stdDevLatency       The standard deviation of the latency of the messages monitored over a user defined period (see average > interval). This will be blank if no messages have been seen.
numMessages         Number of messages seen over a user defined period (see average > interval).
lastMessageTime     Time the last message was detected.
totalMessages       Total number of messages seen today.
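
The interval-based columns can be illustrated with a short Python sketch. The function latency_summary and the sample data are hypothetical, and the use of the population standard deviation is an assumption; the legend does not state which form the plug-in uses.

```python
import statistics

def latency_summary(samples, now, interval):
    """Summarise (time_seen, latency) samples from the last `interval` seconds."""
    recent = [lat for t, lat in samples if now - t <= interval]
    if not recent:
        # Mirrors the legend: latency columns are blank with no messages.
        return {"numMessages": 0, "maxLatency": None,
                "averageLatency": None, "stdDevLatency": None}
    return {
        "numMessages": len(recent),
        "maxLatency": max(recent),
        "averageLatency": statistics.fmean(recent),
        # Population standard deviation (an assumption of this sketch).
        "stdDevLatency": statistics.pstdev(recent),
    }

# Hypothetical samples: (time seen in seconds, latency in seconds).
samples = [(0.0, 9.0), (50.0, 1.0), (55.0, 3.0)]
summary = latency_summary(samples, now=60.0, interval=20.0)
```

Only the two samples inside the 20 second interval contribute, so the old 9.0 second latency does not affect the maximum or the average.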

Lost, Unacknowledged and Slow Message Views

The lost, unacknowledged and slow message views are used to indicate messages which may need further investigation. These views show the following conditions:

  • Lost - messages which have failed to propagate correctly through the system.
  • Unacknowledged - messages older than a threshold still awaiting acknowledgement.
  • Slow - acknowledged messages with a latency above the configured threshold.

The lost message view is always enabled and is controlled by the lost message timeout. Unacknowledged and slow message views are enabled by configuring the unacknowledged timeout and slow message timeouts in the Real Time Tracking section.
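
How the three conditions relate can be sketched in Python. This is a simplified model under assumptions: the function classify and its parameters are hypothetical, the order in which the checks are applied is a guess, and a timeout of None stands for a view that has not been enabled.

```python
def classify(age, latency, lost_timeout, unack_timeout=None, slow_timeout=None):
    """Return 'LOST', 'UNACK', 'SLOW' or None for one message.

    `age` is the time since the message was first seen;
    `latency` is None until a child checkpoint reports the message."""
    if latency is None:
        if age > lost_timeout:
            return "LOST"
        if unack_timeout is not None and age > unack_timeout:
            return "UNACK"
        return None
    if slow_timeout is not None and latency > slow_timeout:
        return "SLOW"
    return None

# Hypothetical timeouts in seconds.
states = [
    classify(12, None, lost_timeout=10, unack_timeout=5, slow_timeout=2),
    classify(7, None, lost_timeout=10, unack_timeout=5, slow_timeout=2),
    classify(20, 3, lost_timeout=10, unack_timeout=5, slow_timeout=2),
    classify(7, None, lost_timeout=10),
]
```

The last call shows that, with only the lost message timeout configured, a message that is merely overdue is not reported in any view.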

All views have a configurable maximum number of rows (see the reportingListSize setting). Messages in these views can be controlled by using the context-menu plug-in commands.

Figure 6 Lost Messages View

Headline Legend

Name                              Description
lostMessages                      The number of all lost messages seen. The count may be greater than the number of messages displayed, if the display is limited by the reporting list size. The count can also be altered as a result of executing plug-in commands. Lost Messages view only.
cyclicMessages                    The number of cyclic messages seen (a message is cyclic if seen multiple times at the same checkpoint). Lost Messages view only.
unacknowledgedMessages            The number of unacknowledged messages seen. Unacknowledged Messages view only.
maxShownUnacknowledgedPerQueue    The configured maximum number of unacknowledged messages to display per checkpoint queue. See the maxShownUnacknowledgedPerQueue setting. Unacknowledged Messages view only.
slowMessages                      The number of slow messages seen. Slow Messages view only.

Table Legend

Name                     Description
messageID                Message Identifier.
failureType              One of: LOST or CYCLIC (Lost Messages view), UNACK (Unacknowledged Messages view), SLOW (Slow Messages view).
lastCheckpointVisited    Name of the last Checkpoint to see the message.
seenAt                   List of the Checkpoints the message successfully passed through.
expectedAt               List of child Checkpoint names. The message was expected to be seen by at least one of these checkpoints.
problemAt                Checkpoint name reporting a slow message. This is the name of the parent checkpoint, which received a slow response from a child checkpoint. Slow Messages view only.
description              Description of the message as provided by the source adapter.
lastSeenTime             Time the failed message was seen at this Checkpoint.
Latency                  The message latency. Slow Messages view only.

Admin Adapter View

The view has one row per adapter. It is used to check that the adapter is running and to display the number of messages that each adapter has detected. It can be turned off using the hideAdminViews setting, so long as there is at least one real time checkpoint configured. If the plug-in is not supporting any real time checkpoints, this view is always present, as it is the only view published.

Figure 7 Admin Adapter View

Links Headline Legend

Name                 Description
databaseQueueSize    The number of messages waiting to be submitted to the database.

Links Table Legend

Name                  Description
name                  Adapter name.
source                Name of the data source: a file name or a stream description.
checkpoint            Name of the checkpoint that the adapter is reading messages for.
messagesLastSample    Number of messages seen by the adapter in the last sample.
messagesPerDay        Number of messages seen by the adapter in the current day.
realTimeTracking      ON if the adapter is submitting the messages to a real time checkpoint for real time monitoring.
DBlogging             ON if the adapter is placing the message on the database queue for submission to the database.

Admin Checkpoint View

The view has one row per Real Time Checkpoint the plug-in is supporting. This is used to check the checkpoints can communicate with one another. The view can be turned off using the hideAdminViews setting.

Figure 8 Admin Checkpoint View

Links Table Legend

Name                Description
checkpoint          Checkpoint name.
totalParents        Number of parents (active and inactive).
activeParents       Number of parent checkpoints with which the named real time checkpoint can communicate.
inactiveParents     Number of parent checkpoints with which the named real time checkpoint cannot communicate.
totalChildren       Number of children (active and inactive).
activeChildren      Number of child checkpoints with which the named real time checkpoint can communicate.
inactiveChildren    Number of child checkpoints with which the named real time checkpoint cannot communicate.

Viewing Multiple Checkpoints

To visualise the latency of the overall system, the latency of multiple checkpoints from multiple samplers can be combined into a single view. This uses the Active Console Custom Metrics View. In Active Console, it is possible to specify a set of paths to different data views for a single Metric View.

An example of a Custom Metrics View is shown below, where the latency between Checkpoints is shown in a single view.

Figure 10 Custom Metrics in Active Console

Plug-In Commands

Each message view (Lost Messages, Slow Messages and Unacknowledged Messages) provides three plug-in commands to clear messages. These are presented in a context menu, available by right-clicking on a message in the view.

Figure 11 Lost Messages menu showing 2 of the 3 Probe commands

These commands are described in more detail below.

Note: For unacknowledged message views, clearing a message entry will only prevent that message from being displayed. If the message is detected as either lost or slow, it will subsequently be displayed in one of these views even if it was previously cleared from the Unacknowledged Messages view.

Clear All

This command will remove all messages from the view. It is available on the dataview and every cell and headline inside it. Clearing the view will also reset the count headline for the view to 0.

Clear this Checkpoint

This command will remove all messages that were seen at a specific checkpoint. It is only available by right-clicking on the cells in the lastCheckpointVisited column for Lost and Unacknowledged messages, or the problemAt column for Slow messages.

All messages that share the same checkpoint with the cell the command was triggered from will be removed from the dataview. This includes the message that the command was triggered on.

Clear Message

This command will remove a single message. It is available on all the cells in the dataview. The message represented by the row the command was triggered upon will be removed.
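
The effect of the three commands on a view can be modelled with a small Python class. MessageView is a hypothetical stand-in for a Lost Messages view, not part of the product. How the Clear this Checkpoint and Clear Message commands affect the count headline is not specified here, so the sketch leaves the count unchanged for those two.

```python
class MessageView:
    """A minimal stand-in for a Lost Messages view."""

    def __init__(self, rows):
        # Each row is a dict with 'messageID' and 'lastCheckpointVisited'.
        self.rows = list(rows)
        self.count = len(self.rows)  # stands in for the count headline

    def clear_all(self):
        """Remove every message and reset the count headline to 0."""
        self.rows = []
        self.count = 0

    def clear_checkpoint(self, checkpoint):
        """Remove all messages last seen at the given checkpoint."""
        self.rows = [r for r in self.rows
                     if r["lastCheckpointVisited"] != checkpoint]

    def clear_message(self, message_id):
        """Remove the single message with the given ID."""
        self.rows = [r for r in self.rows if r["messageID"] != message_id]

    def ids(self):
        return [r["messageID"] for r in self.rows]

view = MessageView([
    {"messageID": "A", "lastCheckpointVisited": "CHECKPOINT1"},
    {"messageID": "B", "lastCheckpointVisited": "CHECKPOINT1"},
    {"messageID": "C", "lastCheckpointVisited": "CHECKPOINT2"},
])
view.clear_message("A")
after_clear_message = view.ids()
view.clear_checkpoint("CHECKPOINT1")
after_clear_checkpoint = view.ids()
view.clear_all()
```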

References

The following related documents are available:

  1. MESSAGE TRACKER - FIX ADAPTER REFERENCE
  2. MESSAGE TRACKER - REGEX ADAPTER REFERENCE
  3. MESSAGE TRACKER PLUGIN API