Active Checks and Monitoring plugins

Overview

Active Checks are performed by Monitoring Plugins and are the most common and popular way of monitoring hosts.

Monitoring plugins can be written in any language, from bash and C to Perl and Python. For a detailed look at writing Monitoring Plugins, see Monitoring Plugins Development Guidelines.

In essence, a monitoring plugin is a translator that sits between Opsview Monitor and the item we wish to monitor. The plugin speaks both languages: it knows how to speak to Opsview in Opsview Monitor’s language, and it knows how to talk to the Host in the Host’s language:

Opsview and host diagram

For example, if Opsview Monitor wants to talk to a Windows Host, it will need to know how to ‘talk Windows’. This is where a plugin comes in. Opsview Monitor simply asks the question, ‘Hey, go and find out how full the C: drive is’. The plugin goes to the Windows Host, asks the question, gets the answer, and converts it into a format that Opsview Monitor understands and can process for alerts, graphs and more.

Most monitoring plugins require input in order to run. For example, the Windows C: drive Service Check above will likely require a username and password to authenticate, as well as the name of the drive to be monitored. These pieces of information are known as arguments. Arguments provide the plugin with the information required to run correctly. Each plugin generally comes with a help file, visible within Opsview Monitor via ‘Show plugin help’, which explains what options are needed, what options are available, and how to set them.
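As a rough illustration of how a plugin might accept its arguments, the sketch below defines a command-line parser with Python's standard argparse module. The plugin name check_example and the long option names are hypothetical; the short flags -H, -w and -c simply mirror the flags used in the example commands on this page.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser for a hypothetical check_example plugin."""
    parser = argparse.ArgumentParser(
        prog="check_example",  # hypothetical plugin name
        description="Example plugin showing typical arguments.",
    )
    # argparse adds -h/--help automatically, so help output comes for free.
    parser.add_argument("-H", "--hostname", required=True,
                        help="address of the Host to check")
    parser.add_argument("-w", "--warning", type=float, default=80.0,
                        help="warning threshold (assumed to be a percentage)")
    parser.add_argument("-c", "--critical", type=float, default=90.0,
                        help="critical threshold (assumed to be a percentage)")
    return parser
```

Calling build_parser().parse_args(["-H", "192.168.0.1", "-w", "70", "-c", "85"]) yields an object whose hostname, warning and critical attributes hold the supplied values.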

Common options, also known as ‘flags’, are:

A plugin-based service check comprises two component parts:

<plugin> <arguments>

For example, the Service Check below returns the number of users connected to an Oracle database:

check_oracle_health --connect=$HOSTADDRESS$ --user=%ORACREDENTIALS:1% --password=%ORACREDENTIALS:2% --name=system --mode=connected-users

check_oracle_health is the plugin, and the remainder of the line is the set of arguments the plugin needs in order to log in and retrieve the number of users (text wrapped in $ symbols is known as a macro, and text wrapped in % symbols is known as a Variable).
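To make the plugin/arguments split concrete, here is a minimal sketch that breaks a Service Check command into its plugin name and arguments, and lists the macros and Variables it references. The function name and the regular expressions are illustrative assumptions, not Opsview Monitor's own parser.

```python
import re
import shlex

# Assumed patterns: macros look like $NAME$, Variables like %NAME% or %NAME:1%.
MACRO_RE = re.compile(r"\$([A-Z0-9_]+)\$")
VARIABLE_RE = re.compile(r"%([A-Z0-9_]+(?::\d+)?)%")


def split_service_check(line: str) -> dict:
    """Split a '<plugin> <arguments>' line and list its macros and Variables."""
    parts = shlex.split(line)
    plugin, arguments = parts[0], parts[1:]
    joined = " ".join(arguments)
    return {
        "plugin": plugin,
        "arguments": arguments,
        "macros": MACRO_RE.findall(joined),
        "variables": VARIABLE_RE.findall(joined),
    }
```

Applied to the check_oracle_health example above, this reports check_oracle_health as the plugin, five arguments, one macro (HOSTADDRESS) and two Variable references (ORACREDENTIALS:1 and ORACREDENTIALS:2).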

Plugins directory

Plugins live on the master server (or clusters which are managed by the master). They live within the /opt/opsview/monitoringscripts/plugins/ folder and must be executable by the Opsview user.

To add new plugins to this directory:

  1. Upload them via the UI or API.
  2. Run the sync_monitoringscripts playbook to synchronise them to all Collector systems ready for use. This will set them up with the correct permissions.

If the plugin does not have the correct permissions or ownership, it may not work at all or may return errors. The specific errors will depend on the plugin.
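If you want to confirm that a plugin file carries an execute permission before troubleshooting further, a small check along these lines can help. This is only a sketch: it tests whether any execute bit is set on a regular file, and does not verify ownership or that the Opsview user specifically can run it.

```python
import os
import stat


def is_executable_file(path: str) -> bool:
    """Return True if path is a regular file with at least one execute bit set."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing file or no permission to inspect it.
        return False
    any_execute_bit = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return stat.S_ISREG(mode) and bool(mode & any_execute_bit)
```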

Status Codes

All plugins return a status code. This status code is what Opsview Monitor uses to determine the state of the service check.

Status code ‘0’ means that the Service Check is running successfully and without errors, thus ‘OK’:

$ check_icmp -H 192.168.0.1 -w 100.0,20% -c 500.0,60%
Output: OK - 192.168.0.1 : rta 0.287ms, lost 0%|rta=0.287ms;100.000;500.000;0; pl=0%;20;60;; rtmax=0.453ms;;;; rtmin=0.242ms;;;;
Errors:
Return code: 0

Status code ‘1’ means that the Service Check is in a warning state, as shown below:

$ check_http -H 192.168.0.1 -w 5 -c 10
Output: HTTP WARNING: HTTP/1.1 401 Unauthorized - 192 bytes in 0.017 second response time |time=0.017094s;5.000000;10.000000;0.000000 size=192B;;;0
Errors:
Return code: 1

Status code ‘2’ means that the Service Check is in a critical state, as shown below:

$ check_nrpe -H 192.168.0.1 -c check_load -a '-w 5,5,5 -c 9,9,9'
Output: Connection refused by host
Errors:
Return code: 2

Status code ‘3’ means that the Service Check is in an ‘UNKNOWN’ state. This may indicate that the Service Check is misconfigured or that there is an issue with the monitored Host:

$ check_apache_performance -H 192.168.0.1 -m bytes_per_request -t 60
Output: APACHE STATUS UNKNOWN - 404 Not Found
Errors:
Return code: 3
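The four status codes can be sketched from the plugin's side as follows. The threshold logic here (higher values are worse, a missing value is UNKNOWN) and the function names are assumptions for illustration; real plugins decide their own rules. A real plugin would print the output line and then exit with the returned code, for example via sys.exit(code).

```python
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value: Optional[float], warning: float, critical: float) -> int:
    """Map a measured value to a status code (assumes higher values are worse)."""
    if value is None:
        return UNKNOWN  # nothing could be measured
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


def format_result(label: str, value: Optional[float],
                  warning: float, critical: float) -> Tuple[int, str]:
    """Return the status code and a one-line output message."""
    code = evaluate(value, warning, critical)
    shown = "no data" if value is None else str(value)
    return code, f"{STATE_NAMES[code]} - {label} is {shown}"
```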

Performance Data

Most plugins return what is known as performance data. This data is listed after the pipe symbol, |, and is picked up by Opsview Monitor and stored for graphing purposes.

$ check_icmp -H 192.168.0.1 -w 100.0,20% -c 500.0,60%
Output: OK - 192.168.0.1 : rta 0.287ms, lost 0% | rta=0.287ms;100.000;500.000;0; pl=0%;20;60;; rtmax=0.453ms;;;; rtmin=0.242ms;;;;
Errors:
Return code: 0
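The structure of the performance data in the output above can be illustrated with a small parser. This is a simplified sketch, not Opsview Monitor's own implementation: it assumes each metric has the form label=value[unit];warn;crit;min;max, and it does not handle quoted labels that contain spaces.

```python
import re

# A number followed by an optional unit, e.g. "0.287ms" or "0%".
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def parse_perfdata(output: str) -> dict:
    """Extract the metrics that follow the pipe symbol in plugin output."""
    _, _, perf = output.partition("|")
    metrics = {}
    for token in perf.split():
        label, sep, rest = token.partition("=")
        if not sep:
            continue  # not a label=value token
        # Pad so that warn, crit, min and max are always present.
        fields = (rest.split(";") + [""] * 5)[:5]
        match = VALUE_RE.match(fields[0])
        if match is None:
            continue  # value is not numeric
        metrics[label] = {
            "value": float(match.group(1)),
            "unit": match.group(2),
            "warn": fields[1],
            "crit": fields[2],
            "min": fields[3],
            "max": fields[4],
        }
    return metrics
```

For the check_icmp output above, this yields four metrics (rta, pl, rtmax and rtmin), with rta carrying a value of 0.287, a unit of ms, and warning and critical thresholds of 100.000 and 500.000.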

The performance data returned is extracted by Opsview Monitor and can be found in the Performance Data field within the ‘investigate mode’ for the Service Check (covered later in this section):

Investigate mode

If there is no performance data present, the ‘Graph’ tab will be hidden (as there is no data to plot on the graph).

Plugin Extra Files

If there are extra files needed for a plugin to work, there are different locations to use based on the type of data:

Administering Plugins in Opsview Monitor

Warning

Log4j vulnerabilities

If importing custom Java-based plugins, we recommend ensuring that any version of Log4j used is >= 2.17.1, to mitigate vulnerabilities:

Importing a plugin

Warning

Only import plugins from trusted sources. Importing untrusted and unverified scripts could compromise your system’s integrity.

Monitoring plugins are stored in the /opt/opsview/monitoringscripts/plugins/ directory on the Opsview Collector servers. Opsview Monitor provides three options for adding a new plugin: the UI, the REST API, or as part of an Opspack.

Warning

Importing monitoring plugins using the command line is no longer supported.

Note

To import a plugin successfully, its filename must start with check_. Additionally, all plugins should support the -h option for printing help output.

If a new plugin execution fails, please read the error message for any potential dependencies that need to be installed, for example, SDKs or pre-requisite packages.
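Before uploading, the filename rule from the note above can be checked with a short helper. This is a hypothetical convenience function; the extra requirement that something must follow the check_ prefix is an assumption, not a documented rule.

```python
import os

PLUGIN_PREFIX = "check_"


def has_importable_name(path: str) -> bool:
    """Return True if the file's name starts with check_ and has a suffix after it."""
    name = os.path.basename(path)
    # Assumption: a bare "check_" with nothing after it is not a useful name.
    return name.startswith(PLUGIN_PREFIX) and len(name) > len(PLUGIN_PREFIX)
```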

User interface

A monitoring plugin can be imported from the Monitoring Plugins page within the Opsview Monitor user interface, which can be found via ‘Configuration > Monitoring Plugins’.

Once loaded, you will be presented with a view similar to the one below, listing all of your currently installed Monitoring Plugins. To import a Monitoring Plugin, click on the Import button:

Import a monitoring plugin

At this point you may get an error if imports have not been enabled by your Opsview Administrator:

Import error message

Note

Import Monitoring Plugin functionality has been disabled by default on new installations.

You can still import Opspacks through the command line, but importing Monitoring Plugins using the command line is no longer supported. Please contact your System Administrator for assistance.

To enable this feature, set the following variable in /opt/opsview/deploy/etc/user_vars.yml:

opsview_allow_plugin_upload: True

This will then be kept upon upgrade.

To push out this change to your system, you must run the orchestrator-install.yml playbook. However, if you do not want to run this playbook at this time, you can add the change to the /opt/opsview/webapp/opsview_web_local.yml file:

Controller::Settings:
  allow_plugin_upload: True

Then restart the opsview-web component:

/opt/opsview/watchdog/bin/opsview-monit restart opsview-web

If you only make this change and not the user_vars.yml edit, the change will be lost when you upgrade. Note that you may already have a Controller::Settings section in this file, in which case you can simply add to that section.

The import window will look like this:

Import Monitoring Plugins

To import a file, click Browse, select the file you want to import (for example, check_acme_server), and then click Upload.

Note

If plugin help is disabled on your system, you will see the message: Plugin execution not allowed. Contact Customer Support to add plugin help text.

Please contact ITRS Support for assistance.

Import changes

To import a plugin, click Import. Your new plugin will appear in the Monitoring Plugins list page.

If the plugin you are uploading already exists and is not one that is automatically included with Opsview, you can click Overwrite to use the newer plugin.

Overwrite changes

If you try to upload a plugin that already exists as an official Opsview plugin, you will get the error message:

Upload failed for `<PLUGIN_NAME>': Plugin already exists in builtin directory

This means that you cannot upload the plugin. You can work around this by renaming the plugin and then importing it again.

Clashing plugin

REST API

Plugins can also be imported via the REST API following the instructions at Config Plugins for uploading and then importing.

Opspacks

You can import plugins by adding them in new Opspacks. For more information, see Importing an Opspack.

Removing a plugin

This page will show you how to remove a Plugin you no longer need. This can be done on the Monitoring Plugins page.

How to remove

On the Plugin you wish to remove, select the contextual menu icon.

Delete a plugin

Click the Delete option.

Accept the confirmation.

The Plugin will then be removed from the system.

Configuring a new plugin-based Service Check

To configure a new plugin-based Service Check, go to the Configuration > Service Checks menu.

Once within the Service Checks window, click on the ‘Add New’ button in the top level and then click on Plugin Check.

Add new Plugin Check

This will cause the following window to load:

Plugin Check

The window is split into two tabs:

Details Tab: Advanced

Hashtags

The Hashtags to which this Service Check will belong, when applied to one or more Hosts.

Globally applied hashtags

If the Service Check has been added to a Hashtag via the ‘Settings > Hashtags’ section instead of the selection box above, then the hashtags will be listed here. To remove the Service Check from the Hashtag listed here, you should edit the hashtag within ‘Settings > Hashtags’.

Dependencies

Dependencies allow you to set a parent/child relationship for the Service Check, e.g. for a check that runs via the Infrastructure Agent, we may choose to have a dependency of a check which simply checks the agent is available. This will mean we only get one alert if the agent on a Host isn’t running, not one alert for each check that runs via that agent.

You can set more than one dependency per Service Check. In this case, if any one of the dependencies is in an alert state, the child Service Check will trigger the “Dependency failure: xxx” status.

Max check attempts

This field determines the number of times a Service Check has to fail before it changes into a ‘hard state’. Opsview Monitor has the concept of ‘soft’ and ‘hard’ states. When a Service Check first fails, the failure is initially considered ‘soft’. Once the Service Check has failed the number of times specified in this field, the failure is considered ‘hard’, i.e. not a temporary ‘blip’. You can use hard states so that you are only notified when a Service Check is truly failing.

Retry interval

Separate from the ‘Check interval’, the ‘Retry interval’ is only used when a Service Check goes into a ‘CRITICAL’, ‘WARNING’ or ‘UNKNOWN’ state. For a Service Check to go from a ‘soft’ state to a ‘hard’ state, it must fail X times, where X is the value set in the ‘Max check attempts’ field; the Retry interval sets how long to wait between those attempts. For example, if the Retry interval is 1 (minute) and Max check attempts is set to 3, the Service Check will run once a minute for three minutes, after which, if it is still ‘CRITICAL’, it will change from a ‘soft’ to a ‘hard’ CRITICAL state.
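The interaction between Max check attempts and soft/hard states can be sketched as a simple counter of consecutive failures. This is a simplified model for illustration only: it treats every OK result as a hard state and is not Opsview Monitor's actual scheduling logic.

```python
from typing import List


def classify_results(codes: List[int], max_check_attempts: int) -> List[str]:
    """Label each result 'soft' or 'hard' from the run of consecutive failures."""
    states = []
    failures = 0
    for code in codes:
        if code == 0:
            # Simplifying assumption: an OK result resets the counter
            # and is treated as a hard state.
            failures = 0
            states.append("hard")
        else:
            failures += 1
            states.append("hard" if failures >= max_check_attempts else "soft")
    return states
```

With Max check attempts set to 3, three consecutive CRITICAL results (status code 2) are labelled soft, soft, hard, which matches the one-minute retry example above.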

Notify for service on

This section determines which states of a Service Check should trigger notifications. For example, you can configure notifications to be sent only when the Service Check is in the CRITICAL or UNKNOWN state.

Note

If a Host does not notify on any states, then the Service Checks on that Host will also not send any Notifications.

Notification period

This field uses the Time Periods already defined within the Opsview Monitor system, and determines when Notifications are allowed to be sent to users.

Create Multiple Services

If a Variable is selected in this drop-down, a new Service Check is created for each Variable of the selected type that is added, with the Variable’s value appended to the Service Check name. For example, if the ‘Disk Capacity’ Service Check has ‘%DISK%’ selected in the ‘Create Multiple Services’ drop-down and four such Variables are added via the ‘Variables’ tab, four Service Checks will be added: ‘Disk Capacity: Value1’, ‘Disk Capacity: Value2’, and so forth.
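The naming pattern described above can be sketched in a couple of lines. The function name is hypothetical; the 'name: value' format follows the ‘Disk Capacity: Value1’ example.

```python
from typing import List


def expand_service_names(base_name: str, variable_values: List[str]) -> List[str]:
    """Return one Service Check name per Variable value, as 'base: value'."""
    return [f"{base_name}: {value}" for value in variable_values]
```

For instance, expand_service_names("Disk Capacity", ["Value1", "Value2"]) returns the two names ‘Disk Capacity: Value1’ and ‘Disk Capacity: Value2’.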

Flap Detection

A service is considered flapping if its state changes too frequently. If this option is set, the service is checked for this flapping condition; while it is flapping, an icon appears against the service and notifications are temporarily disabled until the service comes out of the flapping state. We recommend that flap detection is enabled for active checks. However, if you find a service is flapping frequently, there is probably another issue that needs investigating. We recommend that flap detection is disabled for passive checks.
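One way to picture 'changes too frequently' is as the percentage of state transitions within a recent window of results. The sketch below is purely illustrative: the unweighted calculation and the 50% threshold are assumptions, not the algorithm or defaults that Opsview Monitor uses.

```python
from typing import List


def percent_state_change(states: List[int]) -> float:
    """Percentage of consecutive result pairs in which the state changed."""
    if len(states) < 2:
        return 0.0
    changes = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return 100.0 * changes / (len(states) - 1)


def is_flapping(states: List[int], threshold_percent: float = 50.0) -> bool:
    """Treat the service as flapping at or above the (assumed) threshold."""
    return percent_state_change(states) >= threshold_percent
```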

Sensitive arguments

If the Service Check is plugin-based, the Sensitive Arguments checkbox allows you to determine whether the arguments for the Service Check are displayed within the ‘Test Service Check’ tab in investigate mode. If the box is checked, the arguments will be hidden. If unchecked, the arguments will be shown. If you have TESTCHANGE set within your Role, you will be able to modify the arguments before testing the Service Check.

Record Output Changes

Normally, the output of a Service Check is only recorded when the state of that service changes. For example, assuming a new check has been set up:

State    | Output               | Output Recorded
OK       | Service OK: 10%      | Yes
OK       | Service OK: 15%      | No
OK       | Service OK: 15%      | No
OK       | Service OK: 20%      | No
CRITICAL | Service warning: 80% | Yes
CRITICAL | Service warning: 75% | No
WARNING  | Service warning: 70% | Yes
WARNING  | Service warning: 40% | No
WARNING  | Service warning: 40% | No
OK       | Service OK: 20%      | Yes
OK       | Service OK: 18%      | No

With this option, every change of output is recorded for the selected states, even when the state itself has not changed. For example, for the same sequence above with OK and WARNING selected:

State    | Output               | Output Recorded
OK       | Service OK: 10%      | Yes
OK       | Service OK: 15%      | Yes
OK       | Service OK: 15%      | No
OK       | Service OK: 20%      | Yes
CRITICAL | Service warning: 80% | Yes
CRITICAL | Service warning: 75% | No (CRITICAL option was not selected)
WARNING  | Service warning: 70% | Yes
WARNING  | Service warning: 40% | Yes
WARNING  | Service warning: 40% | No
OK       | Service OK: 20%      | Yes
OK       | Service OK: 18%      | Yes
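The rule behind both tables can be expressed as a short function: a result is recorded when the state changes, or when the output changes while the Service Check is in one of the selected states. This is a sketch inferred from the tables above, with a hypothetical function name, not Opsview Monitor's implementation.

```python
from typing import Iterable, List, Tuple


def recorded_flags(results: List[Tuple[str, str]],
                   selected_states: Iterable[str] = ()) -> List[bool]:
    """For each (state, output) pair, decide whether the output is recorded."""
    selected = set(selected_states)
    flags = []
    previous = None
    for state, output in results:
        if previous is None or state != previous[0]:
            flags.append(True)   # first result or a change of state
        elif state in selected and output != previous[1]:
            flags.append(True)   # output changed within a selected state
        else:
            flags.append(False)
        previous = (state, output)
    return flags
```

Called with no selected states, this reproduces the ‘Output Recorded’ column of the first table; called with OK and WARNING selected, it reproduces the second.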

Alert every failure

This option forces a Notification to be sent on every check in a non-OK state. This is useful if you have a passive Service Check which receives results.

There are three states for this option:

Event handler

Covered in greater detail in the Event Handlers section of the User Guide, Event Handlers are scripts that can be triggered when a Service Check goes into or returns from a problem state, such as CRITICAL or WARNING. The script can do anything you like, but a common usage includes restarting a service or server (virtual machine) via an API.

Always execute

If the Always execute check box is ticked, then every result received for this service will cause the event handler to be executed. This is useful for passive checks if you need to process all results that arrive, such as matching 3 CRITICAL results with 3 subsequent OK results. If this is unticked, the event handler will be executed only when a state change occurs.

Check Freshness

The Check Freshness period is added to the Check Interval period. If the Check Interval is set to 5 minutes and the Check Freshness to 1 minute, the Service Check will go Stale at the 6th minute if a result wasn’t received at the 5-minute mark.
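The arithmetic can be sketched as below. The function names are hypothetical, and treating the boundary as inclusive (stale exactly at the combined period) is an assumption.

```python
def stale_after_seconds(check_interval_minutes: float, freshness_minutes: float) -> float:
    """Time without a result, in seconds, after which the check is Stale."""
    return (check_interval_minutes + freshness_minutes) * 60


def is_stale(seconds_since_last_result: float,
             check_interval_minutes: float, freshness_minutes: float) -> bool:
    """Assumption: the check is Stale once the combined period has elapsed."""
    limit = stale_after_seconds(check_interval_minutes, freshness_minutes)
    return seconds_since_last_result >= limit
```

With a 5-minute Check Interval and a 1-minute Check Freshness, the threshold is 360 seconds: a check last heard from 300 seconds ago is not yet Stale, while one last heard from 360 seconds ago is.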

Plugin and Arguments Tab

Once you have configured the relevant options within the Details tab, you can click on the Plugin and Arguments tab. The two Help buttons will open extra windows containing either the plugin help or macro help details:

Arguments tab

The main user steps for a plugin-based Service Check are:

Invert Plugin Results is a checkbox that when checked will invert certain result codes from a plugin, i.e. a critical result can be inverted to OK and vice versa.
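As an illustration of inversion, the sketch below swaps OK and CRITICAL and leaves other codes unchanged. Exactly which codes Opsview Monitor inverts is not stated here, so the mapping is an assumption based only on the critical/OK example above.

```python
# Assumed mapping: only OK (0) and CRITICAL (2) are swapped.
INVERTED = {0: 2, 2: 0}


def invert_result(code: int) -> int:
    """Return the inverted status code, leaving unmapped codes as they are."""
    return INVERTED.get(code, code)
```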

Note

If the plugin is not listed on the Plugin drop-down, ensure the plugin starts with check_ and is added correctly.

Once the Service Check and its options have been configured, it can be applied to one or more Hosts. Please refer to the Service Checks Tab section for guides on how to add the newly-created Service Check to a Host.
