Load-Balanced Monitoring

Introduction

The OP5 Monitor back-end can be used as a load-balanced monitoring solution. The load-balanced model looks like this.

A load-balanced solution has two or more peers in the same environment sharing the same tasks (the hosts to monitor). Any configuration change made on one peer is distributed to the other peers. The peers divide the load between them automatically and keep track of each other's state: when one peer goes down, the remaining peer(s) take over its share of the work.

Before you start

Prerequisites

There are a few things you need to take care of before you start setting up load-balanced monitoring. You need at least two servers of the same architecture (32/64 bit), all running the same version of OP5 Monitor.

More specifically, make sure that:

  • OP5 Monitor version 5.2 or later is installed and running on all servers.
  • The peers connect to each other on the following TCP ports, which must be open between them:
    • 22 (SSH), used for distributing configuration between peers.
    • 15551 (merlin), used for state communication, such as check results.
  • All server names are resolvable via DNS or via entries in /etc/hosts.
  • All servers' system clocks are synchronised, preferably by NTP.
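The name-resolution and port requirements can be checked from each server towards every other server before you start. The following is a minimal sketch, not an OP5 tool: the function names are hypothetical, and it assumes getent, timeout and bash are installed (the port probe relies on bash's /dev/tcp redirection). Port 15551 only answers while merlin is running on the remote server, so a failure there may also simply mean the service is stopped.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not part of OP5 Monitor).
# Assumes getent, timeout and bash are installed.

# Succeeds if the name resolves; getent goes through NSS, which
# consults both /etc/hosts and DNS on a typical Linux system.
check_resolves() {
    getent hosts "$1" >/dev/null
}

# Succeeds if a TCP connection to host $1, port $2 opens within 3 seconds.
# Uses bash's /dev/tcp redirection, hence the explicit bash -c.
check_port() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Checks one peer: name resolution first, then the SSH (22) and
# merlin (15551) ports. Returns non-zero if anything failed.
check_peer() {
    peer=$1
    if ! check_resolves "$peer"; then
        echo "FAIL: $peer does not resolve"
        return 1
    fi
    rc=0
    for port in 22 15551; do
        if check_port "$peer" "$port"; then
            echo "ok:   $peer:$port reachable"
        else
            echo "FAIL: $peer:$port not reachable"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, on peer-green you would run check_peer peer-blue, and the reverse on peer-blue. Clock synchronisation is not covered by this sketch; check it with your NTP tooling.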

Cluster state information

OP5 Monitor includes a command-line tool called mon, which you can use when logged on via SSH. To view the current cluster state, run:

mon node status

All known nodes (the local node, its peers and any pollers) should be listed along with their current state. In a properly synchronised, online cluster, every node is shown as ACTIVE. Pay close attention to any text shown in red.

More information about mon can be found in the chapter The Mon Command.

The configuration

Caution: Duplicate host objects spell trouble. If you are "merging" two previously independent Monitor nodes as peers, each of them will by default list itself as a host object called "monitor". Before starting this configuration, give each of these "monitor" objects a more descriptive, unique name. Otherwise SNMPv3 monitoring will break, because one of the key pairs disappears when the identical objects are merged. If you already have this problem, resolve it by creating a new SNMP user as described here.

Setting up the load balanced solution

This load balanced configuration will be set up with two peered nodes ("peers"):

  • peer-blue
  • peer-green

To set up a load balanced monitoring solution

  1. Log on to peer-green via SSH, as root.
  2. Add peer-blue to peer-green's configuration:
    mon node add peer-blue type=peer
  3. Set up SSH connectivity towards all of peer-green's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  4. Log on to peer-blue via SSH, as root.
  5. Add peer-green to peer-blue's configuration:
    mon node add peer-green type=peer
  6. Set up SSH connectivity towards all of peer-blue's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  7. Push peer-blue's configuration to peer-green:
    asmonitor mon oconf push peer-green
  8. Restart OP5 Monitor on all nodes:
    mon node ctrl --self -- mon restart
  9. After a minute or two, use mon node status to make sure that the peers are fully connected and synchronised.
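The per-host steps above can be wrapped in small shell functions so that the same commands are run on each peer. This is a sketch, not an OP5-provided tool: the names run, add_peers_locally and push_config_and_restart and the DRY_RUN switch are hypothetical, and the functions only call the mon and asmonitor commands already shown in this procedure. With DRY_RUN=1 they print the commands instead of running them, which is a cheap way to review them first.

```shell
#!/bin/sh
# Hypothetical helper: print commands when DRY_RUN=1, execute them otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Steps 2-3 (on peer-green) and 5-6 (on peer-blue): register each given
# peer locally, then push SSH keys for both root and the monitor user.
add_peers_locally() {
    for peer in "$@"; do
        run mon node add "$peer" type=peer || return 1
    done
    run mon sshkey push --type=peer || return 1
    run asmonitor mon sshkey push --type=peer
}

# Steps 7-8: run ONLY on the node whose object configuration must be kept
# (peer-blue here), since its configuration overwrites the target's.
push_config_and_restart() {
    run asmonitor mon oconf push "$1" || return 1
    run mon node ctrl --self -- mon restart
}
```

For example, run add_peers_locally peer-blue on peer-green; then, on peer-blue, run add_peers_locally peer-green followed by push_config_and_restart peer-green.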

If you have already been running OP5 Monitor for a while and are now converting your standalone server to a load-balanced setup, treat peer-blue as your current OP5 Monitor server and peer-green as the new peer. This is important to get right: otherwise you may push the new peer's empty host/service object configuration to the current server, overwriting your actual configuration. If in doubt, please consult your technical contact at op5.

In addition, you will want to copy /opt/monitor/var/status.sav from your original server (peer-blue) to the new peer, so that both agree on host/service comments, acknowledgements, scheduled downtimes etc. issued on the original server before the new peer was added. To do this, stop the Monitor service on both nodes, copy the file, and then start the Monitor service again on both nodes:

On both peers: mon stop

On peer-blue: scp /opt/monitor/var/status.sav peer-green:/opt/monitor/var/status.sav

On both peers: mon start
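The stop, copy and start sequence can also be driven entirely from peer-blue. This is a sketch under stated assumptions: the function names and the DRY_RUN switch are hypothetical, root on peer-blue is assumed to be able to SSH to peer-green as root (which the mon sshkey push steps above are meant to set up), and mon is assumed to be on root's PATH in non-interactive SSH sessions.

```shell
#!/bin/sh
# Hypothetical helper: print commands when DRY_RUN=1, execute them otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

STATUS_SAV=/opt/monitor/var/status.sav

# Run on peer-blue (the original server); $1 is the new peer.
# ASSUMPTION: root SSH access to the new peer, mon on its non-interactive PATH.
copy_status_sav() {
    new_peer=$1
    # Stop both peers first: the local status.sav is then final, and the
    # new peer cannot overwrite the copied file when it shuts down.
    run mon stop || return 1
    run ssh "root@$new_peer" mon stop || return 1
    # Note "host:path" with no space after the colon.
    run scp "$STATUS_SAV" "$new_peer:$STATUS_SAV" || return 1
    # Start Monitor again on both peers.
    run ssh "root@$new_peer" mon start || return 1
    run mon start
}
```

For example, DRY_RUN=1 copy_status_sav peer-green lists the five commands; drop DRY_RUN to execute them. If a step fails, the function stops there, so check which peers are still stopped before retrying.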

Adding another peer

This example uses the following hosts:

  • peer-green
  • peer-blue
  • peer-red (This is the new one.)

To add a new peer

  1. Log on to peer-red via SSH, as root.
  2. Add all the previously existing peers to peer-red:
    mon node add peer-green type=peer
    mon node add peer-blue type=peer
  3. Set up SSH connectivity towards all of peer-red's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  4. Add peer-red to all other nodes:
    mon node ctrl --type=peer -- mon node add peer-red type=peer
  5. Log on to all previously existing peers via SSH, as root, and set up SSH connectivity towards all their configured peers (including the new peer-red):
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  6. On any one of the previously existing peers (green or blue in this case), push its configuration to the new peer:
    asmonitor mon oconf push peer-red
  7. Finally, on any of the peers (old or new), trigger a full restart of OP5 Monitor on all nodes:
    mon node ctrl --self -- mon restart

  8. After a minute or two, use mon node status to make sure that the peers are fully connected and synchronised.
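The steps above split into work on the new peer (steps 2-4), work on every existing peer (step 5) and work on one existing peer (steps 6-7). The sketch below groups them that way; the function names and the DRY_RUN switch are hypothetical, and only the mon and asmonitor commands from this procedure are used.

```shell
#!/bin/sh
# Hypothetical helper: print commands when DRY_RUN=1, execute them otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Steps 2-4, run on the NEW peer: $1 is its own name, the remaining
# arguments are all previously existing peers.
join_cluster() {
    new_peer=$1
    shift
    for peer in "$@"; do
        run mon node add "$peer" type=peer || return 1
    done
    run mon sshkey push --type=peer || return 1
    run asmonitor mon sshkey push --type=peer || return 1
    run mon node ctrl --type=peer -- mon node add "$new_peer" type=peer
}

# Step 5, run on EVERY previously existing peer.
refresh_peer_keys() {
    run mon sshkey push --type=peer || return 1
    run asmonitor mon sshkey push --type=peer
}

# Steps 6-7, run on ONE previously existing peer; $1 is the new peer.
push_to_new_peer() {
    run asmonitor mon oconf push "$1" || return 1
    run mon node ctrl --self -- mon restart
}
```

For example: join_cluster peer-red peer-green peer-blue on peer-red, refresh_peer_keys on peer-green and peer-blue, then push_to_new_peer peer-red on either of them.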

Removing a peer

This example removes the following peer:

  • peer-red

The peer will be removed from all other peers' configuration.

To remove a peer

  1. Log on to peer-red via SSH, as root.
  2. Remove peer-red from all other peers' configuration and restart them:
    mon node ctrl --type=peer -- mon node remove peer-red\; mon restart
    The backslash (\) in front of the semicolon (;) is important to get right in this command: it stops the local shell from splitting the command line, so the semicolon is passed on and both commands run on the remote peers.
  3. Remove all local peer configuration:
    mon node remove $(mon node list --type=peer)
  4. Restart OP5 Monitor:
    mon restart

Unless peer-red is powered off, it will keep running with the same configuration as its previous peers, but as a standalone server, performing all host and service checks on its own.
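The removal steps can be sketched as one shell function run on the departing peer. The function name and the DRY_RUN switch are hypothetical. In the script, the quoted ';' argument does the same job as \; on the command line. If no peer names are given, the function asks mon node list for them, as the manual command does; that lookup runs even in dry-run mode, but it only reads the configuration.

```shell
#!/bin/sh
# Hypothetical helper: print commands when DRY_RUN=1, execute them otherwise.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run on the peer that is leaving: $1 is its own name, optional further
# arguments are the peers to forget (default: all locally configured peers).
leave_cluster() {
    self=$1
    shift
    # Step 2: ';' is passed through to the remote peers, so both
    # "mon node remove" and "mon restart" run there.
    run mon node ctrl --type=peer -- mon node remove "$self" ';' mon restart || return 1
    # Step 3: default to the configured peer list (intentionally word-split).
    if [ "$#" -eq 0 ]; then
        set -- $(mon node list --type=peer)
    fi
    [ "$#" -gt 0 ] || { echo "no peers configured" >&2; return 1; }
    run mon node remove "$@" || return 1
    # Step 4: restart as a standalone server.
    run mon restart
}
```

For example, DRY_RUN=1 leave_cluster peer-red peer-green peer-blue prints the three commands without touching any configuration.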

File and directory synchronisation

Information regarding how to synchronise files and/or directories between peers can be found in the File Synchronisation chapter.