Geneos

The end of life (EOL) date for this module is 31 January 2020.

Cluster Failure Cases

Open Access Client Disconnects

These can occur when there are too many subscriptions or too many connected clients. The underlying framework automatically recovers from such errors, provided that resource usage on the cluster node drops far enough to allow it.

To monitor for these disconnects, enable the clients view in Self-Monitoring and check that it contains an entry for each client you expect.
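
As a rough illustration of that check, the sketch below compares a list of expected client names against rows taken from the clients view. How the rows are obtained is not covered here. The row structure, the "name" column, and the function name missing_clients are assumptions made for this example and are not part of any Geneos API.

```python
from typing import Iterable, Mapping, Set


def missing_clients(expected: Iterable[str],
                    rows: Iterable[Mapping[str, str]],
                    name_column: str = "name") -> Set[str]:
    """Return the expected client names that have no row in the clients view.

    `rows` is a hypothetical export of the view, one mapping per row.
    The column name is an assumption; adjust it to match the real view.
    """
    present = {row[name_column] for row in rows if name_column in row}
    return set(expected) - present


if __name__ == "__main__":
    # Example data only: two clients are present, one is missing.
    view_rows = [{"name": "dashboard-1"}, {"name": "reporting"}]
    expected_clients = ["dashboard-1", "reporting", "alerts"]
    print(sorted(missing_clients(expected_clients, view_rows)))
```

Any name returned by the function is a client that was expected but has no entry in the view, which is worth investigating as a possible disconnect.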

Gateway Disconnects

The latest release of the Open Access nodes upgraded the EMF2 layer to include the same Data Quality algorithms as those implemented in the gateway. As a result, if the node cannot keep up with the data sent from the gateway, the gateway is temporarily disconnected to allow the system to recover. This only happens in extreme situations, for example when the oldest message still awaiting processing has been pending for over a minute.
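
The idea of disconnecting when the oldest pending message is too old can be sketched as follows. This is not the actual Data Quality algorithm, whose details are not given here. The class name PendingQueue, its methods, and the 60-second threshold (taken from the "over a minute" example above) are all assumptions for illustration.

```python
from collections import deque
from typing import Deque, Optional

# Assumption: threshold derived from the "over a minute" example in the text.
MAX_PENDING_SECONDS = 60.0


class PendingQueue:
    """Tracks arrival times of messages that have not been processed yet."""

    def __init__(self) -> None:
        self._arrivals: Deque[float] = deque()

    def enqueue(self, arrival_time: float) -> None:
        """Record a newly received message by its arrival time in seconds."""
        self._arrivals.append(arrival_time)

    def mark_processed(self) -> None:
        """Remove the oldest message once it has been processed."""
        if self._arrivals:
            self._arrivals.popleft()

    def oldest_age(self, now: float) -> Optional[float]:
        """Age in seconds of the oldest pending message, or None if empty."""
        if not self._arrivals:
            return None
        return now - self._arrivals[0]

    def should_disconnect(self, now: float) -> bool:
        """True when the oldest pending message exceeds the threshold."""
        age = self.oldest_age(now)
        return age is not None and age > MAX_PENDING_SECONDS
```

A consumer that falls behind sees the age of the head of the queue grow. Once it passes the threshold, the sketch signals that the sender should be disconnected so that the backlog can drain.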

When such an event occurs, the status of the relevant row in the gateways view in Self-Monitoring will change to Error.

Cluster Split

If the internal communication between nodes in the cluster cannot be maintained, the cluster will split. In this situation, the nodes disagree about which other nodes are part of the cluster.

Each part of the split cluster remains connected to all the gateways, so the problem may be difficult to detect from an Open Access client alone. Instead, look at each cluster-nodes view in Self-Monitoring and check that they all contain the same number of rows.
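
The row-count comparison can be sketched as below, assuming the number of rows in each cluster-nodes view has already been collected by some means. The node names, the input mapping, and both function names are hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import Dict, List, Mapping


def group_by_row_count(row_counts: Mapping[str, int]) -> Dict[int, List[str]]:
    """Group node names by the number of rows in their cluster-nodes view."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for node, count in row_counts.items():
        groups[count].append(node)
    return {count: sorted(nodes) for count, nodes in groups.items()}


def split_suspected(row_counts: Mapping[str, int]) -> bool:
    """True when the views do not all report the same number of rows."""
    return len(set(row_counts.values())) > 1


if __name__ == "__main__":
    # Example data only: node-c sees fewer cluster members than the others.
    counts = {"node-a": 3, "node-b": 3, "node-c": 2}
    if split_suspected(counts):
        print("Possible cluster split:", group_by_row_count(counts))
```

Grouping the nodes by row count shows at a glance which nodes agree with each other, which helps to identify the side of the split that needs attention.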

Once a node has been removed from the cluster, it cannot rejoin without being restarted.

Client Connection Points

Each client is started with a list of nodes to connect to. All submitted queries are shared across every cluster node, but the initial node carries an inherent overhead, since messages from the other nodes must be funnelled back through it.

If possible, consider spreading your clients across multiple nodes in the cluster.
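
One simple way to spread clients is to give each one the same node list rotated by a different offset, as in the sketch below. It assumes that a client treats the first entry in its list as its initial node, which is not confirmed here. The function name and the node names are hypothetical.

```python
from typing import List, Sequence


def node_list_for_client(nodes: Sequence[str], client_index: int) -> List[str]:
    """Rotate the node list so that different clients start at different nodes.

    Assumption: the client connects to the first entry in its list first.
    """
    if not nodes:
        raise ValueError("at least one cluster node is required")
    offset = client_index % len(nodes)
    return list(nodes[offset:]) + list(nodes[:offset])


if __name__ == "__main__":
    cluster = ["node-a:2551", "node-b:2551", "node-c:2551"]  # example values
    for index in range(4):
        print(index, node_list_for_client(cluster, index))
```

Every client still receives the full list, so it keeps its fallback options, while the initial-node overhead is shared round-robin across the cluster.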