Scale up your monitoring environment

Overview

The procedures below explain how to scale up your monitoring environment by adding new servers to create a load-balanced environment, a distributed environment, or a combination of both. The server types you can add are:

  • Peers, for load-balancing or redundancy.
  • Pollers, for distributed monitoring; you can add both ordinary pollers and Slim Pollers.

For an explanation of the types of monitoring environment you can create, see Scalable monitoring.

Check cluster state information

When you add new servers to scale up an existing environment, it is important to check that the environment is stable. You can check the state of the current cluster by using OP5 Monitor's integrated command-line back-end tool mon, as follows:

mon node status

OP5 Monitor displays all known nodes, including the local node, peers, and pollers, with their current state. A properly synchronised and online cluster displays all nodes as active. It displays any problems in red.

For more information on the mon tool, see Mon command reference.

Prerequisites

In any distributed or load-balanced monitoring environment, all servers must have:

  • The same operating system version, with the same 32-bit or 64-bit architecture.
  • The same version of OP5 Monitor installed.

Before you begin

Before you add a new poller or peer to your environment, you must ensure that you:

  • Have suitable servers installed with the same architecture and OP5 Monitor version as the other servers in the environment, with the following configuration:
    • The following TCP ports open on the poller nodes, to allow master nodes to successfully communicate with poller nodes:
      • 22 (SSH) — used for distributing configuration from master to poller nodes.
      • 15551 (Merlin) — used for state communication, such as check results.
    • All server names resolvable by DNS, or manually using /etc/hosts.
    • All server system clocks synchronised, preferably by NTP. For more information, see Install NTP and synchronise servers.
    • All other mandatory and recommended configuration completed on the hosts. For more information, see Additional server and software setup.
  • On the master, create all host groups for which each poller and Slim Poller will be responsible; each host group must contain at least one host, with at least one contact and one service. For more information, see Manage hosts and services.

Tip: A simple way to set up servers with the same configuration as the other servers in the environment is to clone an existing server and then update the host details.
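As an illustration of the manual name-resolution option, entries in /etc/hosts on each server might look like the following; the host names and addresses here are hypothetical, so substitute your own:

    192.0.2.10    peer01.example.com      peer01
    192.0.2.20    peer02.example.com      peer02
    192.0.2.50    poller01.example.com    poller01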

Add a new peer

You can set up peering between two nodes for load-balancing or redundancy. Note that ITRS only supports peering between nodes in the same location, and preferably on the same rack.

Caution: Do not duplicate host names. If you are merging two previously independent OP5 Monitor nodes as peers, they will by default have themselves listed as an object called monitor. Before starting this configuration, give both of these monitor objects more descriptive names. If you fail to do this, SNMPv3 monitoring will break, as one key pair will disappear when merging these identical objects. If you encounter this issue, you must resolve it by creating a new SNMP user. For guidance, see Configure an SNMPv3 user.

Add a new peer to another OP5 Monitor server

In this procedure we will set up a load-balanced monitoring environment with two peered nodes.

Note that the command line examples below cover the scenario where you are converting an already running standalone server to a load-balanced setup. In these examples:

  • peer01 is your existing OP5 Monitor server.
  • peer02 is the new peer.

Caution: It is essential to get this right to avoid pushing the new peer's empty host and service object configuration to the existing server and overwriting your configuration. If in doubt, please contact ITRS Support.

Configure the new peer

  1. Log on to the new peer as root, using SSH.
  2. Add the existing peer to the new peer's configuration:
    mon node add peer01 type=peer
  3. Set up SSH connectivity towards all the new peer's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer

Configure the existing peer

  1. Log on to the existing peer as root, using SSH.
  2. Add the new peer to the existing peer's configuration:
    mon node add peer02 type=peer
  3. Set up SSH connectivity towards all the existing peer's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer

Push the existing peer's configuration to the new peer

  1. Push the existing peer's configuration to the new peer:
    asmonitor mon oconf push peer02
  2. Restart OP5 Monitor on all nodes:
    mon node ctrl --self -- mon restart
  3. After a few minutes, check that the peers are fully connected and synchronised. For guidance, see Check cluster state information.

Copy file status.sav from the existing peer to the new peer

You only need to perform this procedure if the existing peer was already running. It ensures that the servers are coordinated on details such as host and service comments, acknowledgements, and scheduled downtimes issued on the existing peer before the new peer was added.

  1. Stop the monitor service on both peers:
    mon stop
  2. Log on to the existing peer and copy file status.sav to the new peer:
    scp /opt/monitor/var/status.sav peer02:/opt/monitor/var/status.sav
  3. Start OP5 Monitor on both peers:
    mon start

Add a new peer to an existing load-balanced setup

In the command line examples below:

  • peer01 and peer02 are your existing peers.
  • peer03 is the new peer.

  1. Log on to the new peer as root, using SSH.
  2. Add the previously existing peers to the new peer:
    mon node add peer01 type=peer
    mon node add peer02 type=peer
  3. Set up SSH connectivity towards all the new peer's configured peers:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  4. Add the new peer to all other nodes:
    mon node ctrl --type=peer mon node add peer03 type=peer
  5. Log on to the first existing peer as root, using SSH, and set up the SSH connectivity towards all its configured peers, including the new peer:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  6. Log on to the second existing peer as root, using SSH, and set up the SSH connectivity towards all its configured peers, including the new peer:
    mon sshkey push --type=peer
    asmonitor mon sshkey push --type=peer
  7. On either existing peer, push the server configuration to the new peer:
    asmonitor mon oconf push peer03
  8. On any of the three peers, trigger a full OP5 Monitor restart on all nodes:
    mon node ctrl --self -- mon restart
  9. After a few minutes, check that the peers are fully connected and synchronised. For guidance, see Check cluster state information.

Remove a peer

In this procedure, we will remove a peer from an existing load-balanced setup.

Note that this procedure only removes the peer from the configuration of all other peers; it does not remove the peer's monitoring configuration. The removed peer will continue to monitor the same hosts and services as its former peers, but running in standalone mode.

In the command line examples below, peer01 is the peer you are removing.

  1. Log on to the peer you want to remove as root, using SSH.
  2. Remove the peer from all other peers:
    mon node ctrl --type=peer mon node remove peer01\; mon restart
  3. Remove all local configuration:
    mon node remove $(mon node list --type=peer) 
  4. Restart OP5 Monitor:
    mon restart

Add a new poller

The following procedures explain how to add a poller to an existing master in a distributed setup.

In the command line examples below:

  • master01 is your existing master.
  • poller01 is the new poller.
  • se-gbg is the host group poller01 will monitor.

The following steps explain how to add the poller to the master.

Before you begin

Before you run the master server configuration commands, ensure that any peers are fully connected and synchronised. For guidance, see Check cluster state information.

Configure the master

In a load-balanced environment with peered masters, you must perform these steps on the master and all of its peers.

  1. Log on as root, using SSH.
  2. Verify that the host group exists and print its current host members:

    mon query ls hostgroups -c members name=se-gbg

    Caution: Assigning a non-existent host group to a poller will cause OP5 Monitor to crash.

  3. Add the new poller to the configuration:

    mon node add poller01 type=poller hostgroup=se-gbg takeover=no

    Note: Before you continue, ensure that the master can resolve the poller name by DNS or manually using /etc/hosts.

  4. Set up SSH connectivity between the master and the poller:
    1. Confirm that OpenSSH's client and server parts are installed:
      rpm -qa | grep openssh
    2. If they are not installed, install them:
      yum install openssh
      
    3. Edit file /etc/ssh/sshd_config with a text editor, and check that PasswordAuthentication is set to yes:
      edit /etc/ssh/sshd_config
    4. After you have saved your changes, push the SSH configuration to the poller:
      mon sshkey push poller01 
      asmonitor mon sshkey push poller01
  5. (Optional) For increased security, ITRS recommends creating a public SSH key for the OP5 Monitor user and placing it in the authorized_keys file. This enables OP5 Monitor to securely exchange configuration data without first requiring root keys to be exchanged. To do this, log on to both the master and the poller as root, using SSH, and run the following commands.

    On the master:

    mkdir -p /opt/monitor/.ssh
    chown monitor:root /opt/monitor/.ssh
    su monitor
    ssh-keygen -t rsa
    scp ~/.ssh/id_rsa.pub root@poller.company.com:

    On the poller:

    mkdir -p /opt/monitor/.ssh
    chown monitor:root /opt/monitor/.ssh
    mv ~/id_rsa.pub /opt/monitor/.ssh
    su monitor
    touch ~/.ssh/authorized_keys
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    more ~/.ssh/authorized_keys
    rm ~/.ssh/id_rsa.pub
  6. On the master, add the master to the poller's configuration:

    mon node ctrl poller01 mon node add master01 type=master connect=no
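As a quick check of the PasswordAuthentication setting mentioned in step 4, you can search the SSH daemon configuration directly instead of opening an editor; for example:

    grep -i PasswordAuthentication /etc/ssh/sshd_config

Lines starting with # are commented out, in which case the SSH daemon's default value applies.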

Push the configuration

In a load-balanced environment with peered masters, perform these steps on the master server only.

  1. Restart Naemon on the master:

    systemctl restart naemon

  2. Push the configuration from the master to the new poller:

    asmonitor mon oconf push poller01

  3. Restart OP5 Monitor on the new poller:

    mon node ctrl poller01 mon restart

  4. Restart OP5 Monitor on the master and all its peers:

    mon node ctrl --self --type=peer mon restart

Add a new Slim Poller

The Slim Poller is a scaled-down version of the poller. For more information on its contents and limitations, see Slim Poller.

To deploy a Slim Poller you need to have:

  • A good knowledge of Docker and the Docker ecosystem.
  • A working Docker environment or container orchestration tool.

Note: ITRS does not support Docker or any of its services, such as swarm clusters and container orchestration. For more information on how to build and manage Docker services, refer to the Docker and Kubernetes documentation.

Before you begin

Before you add a Slim Poller to a master server, ensure the following:

  • Any peers are fully connected and synchronised. For guidance, see Check cluster state information.
  • The host groups which you want the Slim Poller to monitor exist on the master.
  • Folder /opt/monitor/.ssh exists on the master; if not, create it with the following commands:
    mkdir -p /opt/monitor/.ssh
    chown monitor /opt/monitor/.ssh
    chmod 700 /opt/monitor/.ssh				

Set up volumes and containers

Container overview

The Slim Poller consists of two containers, which include the following components:

  • slim-poller_naemon-core — contains Naemon, the Merlin Naemon Eventbroker Module and check plugins; it is responsible for executing all checks.
  • slim-poller_naemon-merlin — contains the Merlin daemon; the daemon communicates with the Eventbroker Module in the slim-poller_naemon-core container, and is responsible for sending all check results back to the master server.

The two containers have a one-to-one relationship. You cannot set up multiple slim-poller_naemon-core containers with one slim-poller_naemon-merlin container.

Volume overview

In order to create stateless containers, you need to have a few volumes set up, as summarised below. These volumes are also included in the deployment examples.

The volumes are as follows:

  • ipc — mounted in slim-poller_naemon-core and slim-poller_naemon-merlin at /var/run/naemon/. Persistence: not required. Required: yes. Needed for sharing a Linux socket between the naemon-core and naemon-merlin containers.
  • merlin-conf — mounted in slim-poller_naemon-core and slim-poller_naemon-merlin at /opt/monitor/op5/merlin/. Persistence: required. Required: yes. The Merlin configuration files.
  • naemon-conf — mounted in slim-poller_naemon-core at /opt/monitor/etc/. Persistence: recommended. Required: no. The Naemon configuration files. If no changes to the default configuration are required, you can omit this volume; Merlin will fetch the required object configuration from the master at startup.
  • ssh-conf — mounted in slim-poller_naemon-core at /opt/monitor/.ssh/. Persistence: required. Required: yes. Holds the SSH keys required to connect to the master server.
  • status — mounted in slim-poller_naemon-core at /opt/monitor/var/status/. Persistence: required. Required: yes. Saves the Naemon status files, which contain state information such as comments, downtimes, and acknowledgements.

Docker Compose quick start

To get started with Docker Compose:

  1. Save the Docker Compose YAML configuration below into a file called docker-compose.yml in a new directory on your system.
  2. Run the following command to start the Slim Poller containers:
    docker-compose up

Deployment examples

Note that the following deployment examples have a specific version tag set on the Slim Poller container images. Ensure that you set the same version tag as the version of OP5 Monitor running on your master.

The contents of each YAML file are shown below.

docker-compose.yml

version: "3.1"
services:
  naemon-core:
    depends_on:
      - naemon-merlin
    volumes: 
      - ipc:/var/run/naemon
      - merlin-conf:/opt/monitor/op5/merlin
      - naemon-conf:/opt/monitor/etc
      - ssh-conf:/opt/monitor/.ssh
      - status:/opt/monitor/var/status/
    image: op5com/slim-poller_naemon-core:8.1.0
  naemon-merlin:
    volumes:
      - ipc:/var/run/naemon
      - merlin-conf:/opt/monitor/op5/merlin
    image: op5com/slim-poller_naemon-merlin:8.1.0
volumes:
  ipc:					      
  merlin-conf:
  naemon-conf:
  ssh-conf:
  status:					
Kubernetes/OpenShift

slim-poller.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: op5-slim-poller
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: op5-slim-poller
  template:
    metadata:
      labels:
        app: op5-slim-poller
    spec:
      volumes:
        - name: ipc
          emptyDir: {}
        - name: merlin-conf
          persistentVolumeClaim:
            claimName: merlin-conf-pvc
        - name: naemon-conf
          persistentVolumeClaim:
            claimName: naemon-conf-pvc
        - name: ssh-conf
          persistentVolumeClaim:
            claimName: ssh-conf-pvc
        - name: status
          persistentVolumeClaim:
            claimName: status-pvc
      containers:
        - image: op5com/slim-poller_naemon-core:8.1.0
          name: naemon-core
          resources: {}
          volumeMounts:
            - name: ipc
              mountPath: /var/run/naemon
            - name: merlin-conf
              mountPath: /opt/monitor/op5/merlin
            - name: naemon-conf
              mountPath: /opt/monitor/etc
            - name: ssh-conf
              mountPath: /opt/monitor/.ssh
            - name: status
              mountPath: /opt/monitor/var/status
        - image: op5com/slim-poller_naemon-merlin:8.1.0
          name: naemon-merlin
          resources: {}
          volumeMounts:
            - name: ipc
              mountPath: /var/run/naemon
            - name: merlin-conf
              mountPath: /opt/monitor/op5/merlin
      restartPolicy: Always
status: {}

merlin-conf-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: merlin-conf-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Mi

naemon-conf-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: naemon-conf-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

ssh-conf-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ssh-conf-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Mi

status-pvc.yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: status-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 150Mi
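Assuming you have saved the example manifests above under the file names shown, one possible way to create the persistent volume claims and the deployment is with kubectl; the exact procedure depends on your cluster setup:

    kubectl apply -f merlin-conf-pvc.yaml
    kubectl apply -f naemon-conf-pvc.yaml
    kubectl apply -f ssh-conf-pvc.yaml
    kubectl apply -f status-pvc.yaml
    kubectl apply -f slim-poller.yaml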

Configure the master

In a load-balanced environment with peered masters, you must perform these steps on the master and all of its peers.

  1. Add the Slim Poller to the master using the following command, replacing the variables between angled brackets (<>) with your own values:
    mon node add <poller_name> type=poller hostgroup=<selected_hostgroups> connect=no notifies=no takeover=no address=<poller_IP>

    Note: If the master is on the same network as the Slim Poller and can monitor the same components as the Slim Poller, then you do not need to specify takeover=no.

    For example:

    mon node add poller type=poller hostgroup=pollergroup connect=no notifies=no takeover=no address=192.168.1.2
  2. Restart the master:
    mon restart
  3. If your monitor user needs a password to sync SSH keys from the Slim Poller, create it with the following command:
    passwd monitor

    After the SSH keys are synced, you can delete the password with the following command:

    passwd --delete monitor

Success: You can now see file /var/cache/merlin/config/<POLLER_NAME>.cfg on your master server. If the file does not exist, the master cannot correctly generate the poller configuration.
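One way to verify this is to list the generated poller configuration files on the master:

    ls -l /var/cache/merlin/config/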

Configure the Slim Poller

To configure the Slim Poller:

  1. Open a shell on the slim-poller_naemon-core container.
  2. Sync the SSH keys with the master server with the following command, replacing the variable between angled brackets (<>) with your own values:
    mon sshkey push <master_IP>	
  3. Use the following convenience script to set up the master server on the Slim Poller, replacing the variables between angled brackets (<>) with your own values:
    setup.sh --master-name <master_name> --master-address <master_IP> --poller-name <poller_name>	

    If you prefer to set up the master server manually using mon commands, see Poller configuration.
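For example, with hypothetical values for the master name, master IP address, and poller name:

    setup.sh --master-name master01 --master-address 192.0.2.1 --poller-name poller01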

Plugin state retention

Some plugins save state between invocations; for example, check_by_snmp_cpu saves retention data at location /var/check_by_snmp_cpu. Other plugins might require a login file saved on the system. For details on individual plugins, see Plugin reference.

For cases such as these, we recommend that you define a persistent storage volume in your deployment files. As a starting point for creating new persistent storage, you can use the status volume shown in Deployment examples.

Add a new host group to a poller

A poller can monitor several host groups. This procedure is the simplest way of increasing a poller's scope.

Before you begin

Before you run the master server configuration commands, ensure that:

  • All peers (if applicable) are fully connected and synchronised. For more information, see Check cluster state information.
  • The host group you are adding to the poller already exists.

Configure the master server

In a load-balanced environment with peered masters, you must perform these steps on the master and all of its peers.

  1. Log on as root, using SSH.
  2. Edit file /opt/monitor/op5/merlin/merlin.conf using a text editor. For example:

    edit /opt/monitor/op5/merlin/merlin.conf

  3. Find the configuration block related to the poller and append the new host group to the hostgroup setting value, prefixed by a comma. For example, to add a host group called op5-hg2, change the line from:
    hostgroup = op5-hg1

    to:

    hostgroup = op5-hg1,op5-hg2

    Caution: Host groups must be comma-separated only, without any white space; otherwise, the error Incompatible object config (sync triggered) may occur during Naemon restart. Remember that adding a non-existent host group will also cause OP5 Monitor to crash.

  4. After you have saved your changes, restart OP5 Monitor:
    mon restart

Remove a poller

In this procedure we will remove a poller called poller01.

The poller will be removed from the master's configuration, and then all distributed configuration on the poller will be removed.

In a load-balanced environment with peered masters, perform these steps on the master server only.

  1. Log on to the master as root, using SSH.
  2. Remove the poller from the configuration on all masters:

    mon node ctrl --self --type=peer mon node remove poller01

  3. Restart OP5 Monitor on all masters:

    mon node ctrl --self --type=peer mon restart

  4. Restart OP5 Monitor on the poller:

    mon node ctrl poller01 mon restart

Configure poller notifications through the master

Sending notifications directly from the poller is not always possible, for example, if the SMS or SMTP gateway does not exist or is inaccessible to the poller. In such scenarios it is possible to send notifications through the master instead.

Configure the master

In a load-balanced environment with peered masters, perform these steps on the master and all of its peers.

  1. Log on to the master as root, using SSH.
  2. Edit file /opt/monitor/op5/merlin/merlin.conf using a text editor. For example:

    edit /opt/monitor/op5/merlin/merlin.conf

  3. Find the configuration block related to the poller and insert the option notifies = no at the end of the block:

    poller poller01 {
        address = 192.0.2.50
        port = 15551
        takeover = no
        notifies = no
    }

  4. After you have saved your changes, restart OP5 Monitor:

    mon restart

Configure the poller

  1. Log on to the poller as root, using SSH.
  2. Edit the file /opt/monitor/op5/merlin/merlin.conf using a text editor. For example:

    edit /opt/monitor/op5/merlin/merlin.conf

  3. Find the module configuration block and insert the option notifies = no at the end of the block:

    module {
        log_file = /var/log/op5/merlin/neb.log;
        notifies = no
    }

  4. After you have saved your changes, restart OP5 Monitor:

    mon restart

Set up file and directory synchronisation

OP5 Monitor has limited support for synchronising files between peers and between masters and pollers.

For example, when you add a new user to OP5 Monitor on one of your masters, you can use this feature to automatically synchronise the user database files on all other peers and pollers.

Synchronisation types

There are two different types of synchronisation:

  • Peered masters synchronising files with each other (two-way).
  • Masters synchronising files to pollers (one-way).

You configure both types in the same way. The example and the procedure described below apply to both of these cases.

Permissions limitations

Files are synchronised using the monitor system user, not root. This means that:

  • Files and directories set up for synchronisation must be readable and owned by the monitor user. For instance, root-only readable files cannot be synchronised.
  • All file paths and their corresponding directories must be writable by the monitor user on the destination node.
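For example, before adding a file to the synchronisation configuration, you can inspect its ownership and permissions, and assign it to the monitor user if necessary (the file path here is one of the files synchronised later in this section):

    ls -l /etc/op5/auth_users.yml
    chown monitor /etc/op5/auth_users.yml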

How synchronisation is triggered

File and directory synchronisation occurs during a configuration push, which is triggered when a new configuration is saved in the user interface. For example, when you add a new host in OP5 Monitor, saving the new configuration triggers synchronisation.

OP5 Monitor only triggers a configuration push to pollers if the new configuration affects objects on the poller. You can trigger a manual configuration push using the command:

asmonitor mon oconf push

Configure synchronisation

For file synchronisation between peers, you need to repeat the procedure on all peers.

In this example:

  • The master will synchronise files to its poller, poller01.
  • The following files will be synchronised:
    • /etc/op5/auth_users.yml
    • /etc/op5/auth_groups.yml
  • The contents of the following directory will also be synchronised:
    • /opt/plugins/custom/

To configure synchronisation:

  1. Log on to the source node (master) as root, using SSH.
  2. Edit file /opt/monitor/op5/merlin/merlin.conf using a text editor:
    edit /opt/monitor/op5/merlin/merlin.conf 
  3. Find the configuration block related to the destination node, in this case poller01. Within this block, insert a new sync sub-block, saving your changes when you are done.

    poller poller01 {
        hostgroup = se-gbg
        address = 192.0.2.50
        port = 15551
        takeover = no
        sync {
            /etc/op5/auth_users.yml
            /etc/op5/auth_groups.yml
            /opt/plugins/custom/
        }
    }

    Note: The trailing slash at the end of /opt/plugins/custom/ in the example above indicates that the contents of the directory must be synchronised, rather than the directory itself. This is the recommended way of synchronising directories.