Distributed Monitoring

Overview

Distributed Monitoring is a feature of Opsview Monitor that allows checks and notifications to be executed from remote servers. This lets you scale your Opsview Monitor system by spreading the load and reducing latency.

Opsview Monitor uses Collectors to handle the execution and collection of results.

For additional failover and load balancing capabilities, Collectors may be grouped together to form a Cluster.

Note

There should always be an odd number of nodes within a Collector Cluster, such as 1, 3, or 5. This helps with resiliency and avoids split-brain issues.
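The reasoning behind the odd-number recommendation can be sketched with generic majority arithmetic. This is a general illustration of why even-sized groups are prone to split-brain, not a description of how Opsview decides internally; the function names are hypothetical.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def can_split_evenly(nodes: int) -> bool:
    """True if a network partition can leave two halves of equal size,
    so that neither side holds a strict majority."""
    return nodes % 2 == 0


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"nodes={n} majority={majority(n)} even_split_possible={can_split_evenly(n)}")
```

With an odd number of nodes, any two-way partition leaves exactly one side holding a majority, so the two sides cannot both believe they are in charge.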

Opsview Monitor system context

Each Host in Opsview is assigned to a Cluster. The Host will be actively checked by any Collector in that Cluster.

Note

The first Cluster with a Collector that is registered into Opsview Monitor in a new system will assume the role of ‘Master Monitoring Server’. Using the Advanced Automated Installation, the ‘Master Monitoring Server’ forms part of the ‘Opsview Monitor Orchestrator Server’.

To set up additional Collectors, you need to:

  1. Install the software.
  2. Register it in Opsview Monitor.
  3. Assign it to a Cluster.

These steps are detailed in Managing Collector Servers.

Opsview Monitor Orchestrator

The host where Opsview is first installed is called the Opsview Monitor Orchestrator Server. This host has all the necessary software packages installed so that it can function as a single Opsview system, but you can separate out the functional components onto other hosts to spread the load and decrease latency.

The Opsview Monitor Orchestrator Server will have Host Templates assigned to it for self-monitoring, during initial deployment and after upgrades.

This host is automatically assigned to the Master Monitoring Server, and will normally monitor itself.

To add, remove, or register Clusters and Collectors, see Managing Collector Servers.

Troubleshooting

The most common problem is misconfiguration of the Components that require access to the Master MessageQueue Server: the Scheduler and the Results-Sender. Check /var/log/opsview/opsview.log for detailed errors.
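As a starting point, the log can be filtered for lines that mention those two Components together with an error indicator. This is a minimal sketch: the keyword patterns are assumptions, since the exact log format and the way component names appear in it are not specified here, so adjust them to what you see in your own log.

```python
import re
from typing import Iterable, List

# Assumed patterns; the real log format and component naming may differ.
COMPONENT_PATTERN = re.compile(r"scheduler|results-sender", re.IGNORECASE)
ERROR_PATTERN = re.compile(r"error|critical|refused|denied", re.IGNORECASE)


def find_component_errors(lines: Iterable[str]) -> List[str]:
    """Return lines that mention a Component of interest and look like errors."""
    return [
        line.rstrip("\n")
        for line in lines
        if COMPONENT_PATTERN.search(line) and ERROR_PATTERN.search(line)
    ]


def scan_log(path: str = "/var/log/opsview/opsview.log") -> List[str]:
    """Scan the log file named in the text above (reading it may require root)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_component_errors(handle)
```

Calling `scan_log()` on the Orchestrator or a Collector returns the matching lines, which can then be printed or inspected.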

Architecture

Opsview Scheduler is the main component of a Collector. It receives commands and configuration from the Orchestrator and schedules execution of monitoring plugins, event handlers and notification scripts.

Plugins are executed by Opsview Executor, whose only job is to run the commands requested by a Scheduler. Results are then sent back to the Opsview Scheduler that requested those commands.

This approach allows multiple Opsview Executors to be shared among all Collectors of a given Cluster: point all Components to the same Cluster MessageQueue, and automatic load-balancing will be available.

Opsview Scheduler sends the results to Opsview Results-Sender, which will forward them to the Results Processors. In the case of a network outage, the Results-Sender will hold the results for a configurable amount of time.
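The hold behaviour during an outage can be pictured as a time-limited buffer. The sketch below is a conceptual model only, not Opsview's implementation: the `ResultBuffer` class, its `hold_seconds` parameter, and the choice to discard results once the hold time has passed are all assumptions made for illustration.

```python
import time
from collections import deque
from typing import Any, Callable, Deque, Tuple


class ResultBuffer:
    """Hold results for up to hold_seconds while forwarding is unavailable."""

    def __init__(self, hold_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_seconds = hold_seconds
        self._clock = clock
        self._pending: Deque[Tuple[float, Any]] = deque()

    def add(self, result: Any) -> None:
        """Queue a result together with the time it was received."""
        self._pending.append((self._clock(), result))

    def _drop_expired(self) -> None:
        # Assumption: results older than the hold time are discarded.
        cutoff = self._clock() - self.hold_seconds
        while self._pending and self._pending[0][0] < cutoff:
            self._pending.popleft()

    def flush(self, send: Callable[[Any], bool]) -> int:
        """Forward held results in order; stop at the first failed send."""
        self._drop_expired()
        sent = 0
        while self._pending:
            _, result = self._pending[0]
            if not send(result):
                break  # still unreachable, keep the rest for a later attempt
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

Here `send` stands in for whatever forwards a result to the Results Processors; it returns `False` while the network is down, so results stay queued until a later `flush` succeeds or they age out.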

Scalability

For high availability, we recommend having a single monitoring Cluster per monitored location (e.g. datacenter) with as many Collector nodes as required. All Collectors should point to a single Cluster MessageQueue Server. For more information and assistance, contact our Customer Success Team.

Security

To secure communication over the network, please refer to the Securing Network Connection documentation.

Failure scenarios

Opsview 6 can handle n-1 Collector failures within a Monitoring Cluster, where n is the number of Collectors in that Cluster. Because there is no upper limit on the number of Collectors in a Cluster, we recommend at least three nodes per Cluster.

If a Collector fails, the Orchestrator will detect this within 60 seconds and automatically re-assign the hosts monitored by the failed Collector to the remaining Collectors of the Cluster. The re-assignment uses the current known state of the objects and the configuration from the last time you performed an Apply Change from the Configuration menu. Re-assigned hosts and their services are instantly re-checked.

When the Collector recovers, the Orchestrator will automatically re-assign the hosts back to it.
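The failover and recovery behaviour described above can be modelled in a few lines. This is a simplified sketch, not the Orchestrator's actual algorithm: the `effective_assignment` function and the round-robin distribution of orphaned hosts are assumptions chosen for illustration.

```python
from typing import Dict, List, Set


def effective_assignment(home: Dict[str, List[str]],
                         alive: Set[str]) -> Dict[str, List[str]]:
    """Compute which Collector checks which hosts, given the live Collectors.

    home  -- configured assignment (Collector name -> hosts) from the last apply
    alive -- names of the Collectors currently responding
    """
    live = [collector for collector in home if collector in alive]
    if not live:
        raise ValueError("no Collector left in the Cluster")

    # Live Collectors keep their own hosts.
    result = {collector: list(home[collector]) for collector in live}

    # Hosts of failed Collectors are spread round-robin over the live ones
    # (assumed policy; the real distribution strategy is not documented here).
    orphaned = [host for collector in home if collector not in alive
                for host in home[collector]]
    for index, host in enumerate(orphaned):
        result[live[index % len(live)]].append(host)
    return result
```

Because the result is always derived from the configured assignment, calling the function again once every Collector is alive returns the original layout, which mirrors hosts being handed back after recovery. It also keeps working with a single surviving Collector, matching the n-1 failure tolerance.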
