Hosts
Overview
In Opsview Monitor, a Host is an autonomous computing device, such as a server, virtual server, collector server, database server, workstation, PC, network device, storage device, sensor, tablet, or mobile device.
Hosts are effectively logical end-points, meaning if you wish to monitor an Oracle database on a Host, you add the Host. Similarly, if you wish to monitor a VMware vSphere server running 64 guests, you can add the server as a single Host, or you can add each guest individually as a Host, which allows monitoring and alerting on per-guest metrics such as CPU usage.
The creation, modification and deletion of Hosts is done via the ‘Hosts’ page from the Configuration menu.
You can also choose to add Hosts via an Autodiscovery automated scan (see Autodiscovery) or an AutoMonitor scan (see AutoMonitor Scanner).
The Host settings section comprises a sortable/filterable grid view containing all of the Hosts within the Opsview Monitor system. Each column header can be filtered on relevant information, e.g. a User can filter the list to show only Hosts that have a given Host Template applied, or show only Hosts that are members of a given Host Group. When a filter is in place, the column header changes color.
In the top left, there are six buttons:
- Clear filters: This button will clear all filters applied via the column headers. Note that this will only be seen when a filter is applied.
- Add new: This button loads a modal window which allows you to add a new Host.
- Edit or Bulk Edit: This button loads a modal window which allows you to change the same settings for one or more Hosts, depending on how many are selected with check boxes.
- Delete: This button allows you to delete one or more Hosts selected with check boxes.
- Export: This button allows you to download the list of Hosts and their relevant data (Host Group, Host template(s), etc) in the chosen format.
- Refresh icon: Reload the data within this page.
In the bottom left, a set of controls enables you to move quickly between the paginated list when a large number of Hosts are in the system.
A drop-down menu allows you to configure the number of Hosts visible on the page. There are options for ‘5, 10, 25, 50, 100 and 250’ Hosts per page. There is also a string of text highlighting the limit of Hosts that can be added to this system, based on your licensing plan.
The Host list page by default will list all Hosts within Opsview Monitor that your role can see.
However, when one or more Hosts have been modified and are in a ‘pending’ state, i.e. when modifications have been made but the Apply Changes hasn’t been performed yet, the Host list page will have a new section added at the top which will display all modified Hosts.
Specific hosts may be marked as special by the addition of an icon, which means some actions may be limited on these hosts. These icons are as follows:
Icon | Server Type |
---|---|
(icon) | Orchestrator |
(icon) | Collector |
(icon) | Flow Source |
The Apply Changes button will open the Apply Changes modal window where you can submit your modifications and effectively put them into production.
Working with Hosts
Single Hosts
You can edit a single Host by either double clicking on the row or clicking on the contextual menu button in the row and selecting the appropriate action.
Multiple Hosts
You can also select a number of hosts to work with at the same time via the checkboxes; this is known as Bulk Edit. The following behavior is possible:
- Click on a checkbox to select.
- Click on a selected checkbox to deselect.
- Click on an unselected checkbox to add to selection.
- Shift-click to select a range from the last focused row to the current row, to add to selection.
- Control-click to clear selection and choose this row only.
To use the keyboard, use the mouse to select a row in the grid then:
- Press up/down to highlight rows.
- Press space to select a new row.
- Press space to deselect a row if it is already selected.
- Press shift-up or shift-down to select a range of rows to add to selection.
- Press control-space to clear selection and choose this row only.
Note
The All Hosts grid (bottom half of the page) and the Modified Hosts grid (top half of the page) will keep in sync with the selection of hosts.
As selection changes, the total number of hosts will be displayed in the grid header:
The buttons in the top right will update based on the selection:
No selection:
Single selection:
Multiple selection:
The Select All checkbox in the header can be used to select or deselect all hosts matching the current filtering. There will be a slight delay for the checkbox header to update as a backend call will be needed to get a list of all appropriate hosts.
Adding or Editing a Single Host
When a Host is edited or the ‘Add New’ button is clicked, a modal window will appear. A similar pre-filled window will appear if you click on the host contextual menu and select Edit.
The modal window is split into five tabs (the additional NetAudit tab is displayed if you have a subscription for the Network Analyzer):
- Host
- Notifications
- Service Checks
- SNMP
- Variables
- NetAudit
Host tab
The Host tab is the main configuration window when adding or configuring a Host. It is split into two sections, ‘Basic’ and ‘Advanced’. The Basic section contains the main settings a Host needs to have configured. Items denoted with a red star (*) are mandatory fields.
In the Basic section there are four options that need to be configured:
Primary Hostname/IP
This field holds the network address of the Host: either an IP address or a hostname resolvable by Opsview Monitor. The network address entered here is used by the system macro ‘$HOSTADDRESS$’, which is used throughout the entire Opsview Monitor system.
Host Name
This is the user-friendly name of the Host and is displayed in Opsview Monitor. If your network address is 192.168.123.123, you may want to give the Host an alternative name, for example Router. This field must be unique in the system.
Host Group
A Host Group is a container for one or more Hosts. In this drop-down list, all available Host Groups will be listed. Host Groups containing other Host Groups won’t be listed: a Host Group can contain either Hosts or other Host Groups, but not a mixture of both. See Host Group documentation for more details.
Monitored By
This option is only visible when you have Collector Clusters set up for Distributed Monitoring and defines which Cluster will do the actual monitoring of the Host. This provides the ability to distribute the monitoring load across more servers.
The list of monitoring clusters will only show visible clusters (see Role Configuration) and is listed in the order of:
- Master Monitoring Server (the central orchestrator)
- The remaining clusters, in alphabetical order
This field may be entirely hidden if:
- this host is the central orchestrator host.
- this host is used as a collector.
- this host is used as a flow collector.
- there are no other monitoring clusters.
- the user does not have permission to see this monitoring cluster.
Host Templates
A Host Template is a group of Service Checks that can be applied to a given Host. Host Templates provide the ability to monitor certain technologies; for example, if the Host you are adding is an Oracle database, apply the ‘Database - Oracle RDBMS’ Host Template by selecting it in the left-hand column and clicking the ‘right arrow’.
In the Advanced section there is a range of optional settings that can be configured for the Host:
Other Hostnames/IPs
This is a comma-separated list of other network addresses relating to the Host. For example, if the Host has two IP addresses, you may enter the first IP address in the ‘Basic > Primary Hostname/IP’ field, and the second IP address in this field. The primary Hostname/IP is addressed using the $HOSTADDRESS$ macro, whereas the comma-separated values entered in this field are addressed as $ADDRESS1$, $ADDRESS2$ and so forth. To use these values in a Service Check instead of the Primary Hostname/IP, simply replace $HOSTADDRESS$ with $ADDRESS1$, for example. If other addresses are not specified in this field yet the $ADDRESSx$ macro is used, Opsview Monitor will default the value to the Primary Hostname/IP instead. This field is also used for relating these IP addresses to this Host for the purpose of SNMP trap processing.
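The fallback described above can be illustrated with a minimal Python sketch. It is not Opsview Monitor's implementation; the function name `expand_address_macros` and its parameters are hypothetical, and only the macro names and the fallback rule come from the text.

```python
import re


def expand_address_macros(args, primary, other_addresses=""):
    """Replace $HOSTADDRESS$ and $ADDRESSn$ in a Service Check argument string.

    Hypothetical helper for illustration only.
    """
    # 'Other Hostnames/IPs' is a comma-separated list; $ADDRESS1$ is its first entry.
    others = [a.strip() for a in other_addresses.split(",") if a.strip()]

    def lookup(match):
        index = int(match.group(1))
        if 1 <= index <= len(others):
            return others[index - 1]
        # No such entry: default to the Primary Hostname/IP.
        return primary

    expanded = args.replace("$HOSTADDRESS$", primary)
    return re.sub(r"\$ADDRESS(\d+)\$", lookup, expanded)


print(expand_address_macros("-H $ADDRESS1$", "192.168.123.123", "10.0.0.5, 10.0.0.6"))
print(expand_address_macros("-H $ADDRESS1$", "192.168.123.123"))
```

The first call uses the first entry of the other-addresses list; the second has no such entry and so falls back to the primary address.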
Description
A free-text field used purely for describing the Host; it is not used elsewhere within Opsview Monitor.
Host Check Command
The Host Check Command is used to determine the Host status, which can be one of three values: ‘UP’ - responding to the Host Check Command, ‘DOWN’ - no response, or ‘UNREACHABLE’ - when the Host has a parent relationship configured and the parent is in a DOWN state.
By default, the Host Check Command is ping, therefore if ICMP traffic is blocked between the Host and Opsview Monitor you should change the Host Check Command to one that is allowed to traverse the network, inbound to the Host.
Icon
The icon is used within the ‘Navigator’ in the ‘Monitoring’ menu to identify the Host, along with being visible in the Host list page. Opsview Monitor ships with a series of default icons that can be chosen via the drop-down box.
To upload your own icon, use the hosticon_admin script via the command line. This script is located within /opt/opsview/coreutils/bin/. As the root user, run the command:
hosticon_admin add "LOGO - Hosticon" /path/to/Hosticon.png
where "LOGO - Hosticon" is the name you wish the icon to show as within the drop-down menu, and /path/to/Hosticon.png is the location of the image you wish to convert into an icon. To delete a Host icon, run the following command as the opsview user:
hosticon_admin remove "LOGO - Hosticon"
To list all of the icons within Opsview Monitor, run the command:
hosticon_admin list
You may need to install the package imagemagick (Debian/Ubuntu) or ImageMagick (RHEL/OL) to use this functionality.
Check Period
The check period is selected from the list of Time Periods available within Opsview Monitor. Time Periods are essentially weekly schedules, which allow a user to create a time period called ‘working hours’, for example, that covers Monday to Friday, 9:00 am to 5:00 pm.
When this time period is applied to a Host, this Host is only monitored during the specified times of the Time Period.
Check Interval
Working in combination with the check period and the Host Check Command, the Check Interval is how regularly the Host is checked using the Host Check Command during the specified time period. If set to ‘5m’ (default) and all settings are left to default, the Host will be pinged once every five minutes when the time period is valid (i.e. the Host is being monitored).
This field allows for hours (h), minutes (m) or seconds (s), e.g. ‘24h’ means once a day and ‘30s’ means every 30 seconds. The field can also be set to ‘0’, which means the Host is always considered ‘UP’ unless a check has been manually requested (i.e. ‘Recheck’ is run against the Host via its contextual menu). This field must be greater than or equal to 0.
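As a rough illustration of the interval notation, the following Python sketch converts values such as ‘5m’, ‘24h’, ‘30s’ and ‘0’ into seconds. The function name `interval_to_seconds` is hypothetical, and the exact set of inputs Opsview Monitor accepts may differ from what this sketch assumes (a whole number followed by a single unit letter).

```python
def interval_to_seconds(value):
    """Convert an interval such as '5m', '24h', '30s' or '0' to seconds.

    Hypothetical helper; assumes a whole number followed by h, m or s.
    """
    units = {"h": 3600, "m": 60, "s": 1}
    text = value.strip()
    if not text:
        raise ValueError("interval must not be empty")
    if text == "0":
        # '0' disables scheduled checks: the Host is treated as UP until rechecked.
        return 0
    number, unit = text[:-1], text[-1]
    if unit not in units or not number.isdigit():
        raise ValueError("unrecognised interval: " + value)
    return int(number) * units[unit]


for example in ("5m", "24h", "30s", "0"):
    print(example, "=", interval_to_seconds(example), "seconds")
```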
Max Check Attempts
This field determines the number of times a Host Check Command has to fail for the Host to change into a ‘hard’ state. In Opsview Monitor there is the concept of ‘soft’ and ‘hard’ states. When a Host check fails and the Host changes into the ‘DOWN’ state, it is considered a ‘soft’ state. Once the Host Check Command has failed the number of times specified in this field, the Host is considered to be in a ‘hard’ state, i.e. the failure is not a temporary blip. You can use hard states so that users are only notified when a Host is truly down. The interval used between these attempts is not the ‘Check Interval’ but the ‘Retry Interval’. This field must be greater than or equal to 1.
Retry Interval
Separate from the ‘Check Interval’, the ‘Retry Interval’ is only used when a Host goes into the ‘DOWN’ / ‘UNREACHABLE’ state. For a Host to go from a ‘soft’ state to a ‘hard’ state, the Host Check Command must fail $X number of times, where $X is the value set in the Max Check Attempts field. For example, if the Retry Interval is 1m and the Max Check Attempts is set to 3, the Host Check Command will run once a minute for two further minutes (the first failure is what triggers the retries), after which, if the Host is still ‘DOWN’, it will change from a ‘soft DOWN’ to a ‘hard DOWN’. This field must be greater than 0 and must be less than Check Interval.
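The arithmetic in this example can be written out as a short Python sketch. The function name `seconds_until_hard_state` is hypothetical; the calculation simply assumes that the first failed check counts as attempt one and that each remaining attempt is spaced by the Retry Interval.

```python
def seconds_until_hard_state(max_check_attempts, retry_interval_seconds):
    """Time between the first failed check and the hard state, in seconds.

    Hypothetical helper: the first failure is attempt 1, so only the
    remaining (max_check_attempts - 1) attempts wait for the Retry Interval.
    """
    if max_check_attempts < 1:
        raise ValueError("Max Check Attempts must be at least 1")
    if retry_interval_seconds <= 0:
        raise ValueError("Retry Interval must be greater than 0")
    return (max_check_attempts - 1) * retry_interval_seconds


# Retry Interval of 1m with Max Check Attempts of 3: two further minutes.
print(seconds_until_hard_state(3, 60))
```

With Max Check Attempts set to 1, the result is 0: the first failure is immediately a hard state.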
Hashtags
Covered in greater detail within the Hashtags section, this drop-down is a list of all Hashtags within Opsview Monitor. By selecting one or more Hashtags from this drop-down menu you are ‘tagging’ the Host with the Hashtag. This means when you tag a Host with ‘linux-systems’, anyone whose role allows them to view Hosts tagged with ‘linux-systems’ will be able to view this Host. Similar logic applies for Notifications.
Globally Applied Hashtags
When a hashtag is applied from the Configuration > Hashtags menu and not via the Host, it will appear in this list. To remove the hashtag from the Host simply edit the relevant Hashtag via Configuration > Hashtags and click on the hashtag in question.
Event Handler
Covered in greater detail in the Event Handler section of the User Guide, Event Handlers are scripts that can be triggered when a Host goes into a ‘DOWN’ or ‘UNREACHABLE’ state (soft/hard, depending on the event handler script). The script can do anything you like, but a common use is restarting a service or server (a virtual machine, for example) via an API.
Always execute
If this is ticked, then every result received for this Host check will cause the event handler to be executed. If this is unticked, the event handler will be executed only when a state change occurs.
Parents
This relationship is used to calculate if a Host is DOWN or UNREACHABLE, i.e., if the dependencies for the Host mean the Host is really down or if something in the middle is hiding the true state of the Host. Use this relationship to minimize Notifications, as you can disable Notifications for UNREACHABLE Hosts.
For example, if you have a switch as the parent of 10 Hosts and the switch is marked as DOWN, then when the 10 Hosts are checked and considered DOWN, they will be marked as UNREACHABLE instead and you will only get one Notification for the switch instead of 10 Host Notifications. There may be a delay in this eventual condition as results will be coming in at different times. You can select multiple parents, if you have a failover capability.
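The switch example can be sketched in Python as follows. This is an illustration only: the function name `classify_failed_host` is hypothetical, and the handling of multiple parents (UNREACHABLE only when no parent is UP) is an assumption based on the failover remark above rather than a documented rule.

```python
def classify_failed_host(parent_states):
    """Classify a Host whose own check has just failed.

    parent_states holds the current state of each parent ('UP', 'DOWN'
    or 'UNREACHABLE'). Assumption: with several parents, the Host is
    only UNREACHABLE when none of them is UP.
    """
    if parent_states and all(state != "UP" for state in parent_states):
        return "UNREACHABLE"
    return "DOWN"


print(classify_failed_host(["DOWN"]))        # the switch is down
print(classify_failed_host(["DOWN", "UP"]))  # a failover parent is still up
print(classify_failed_host([]))              # no parents configured
```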
Note
Despite the UI/API allowing it, you should not set parent or child relationships between the collectors themselves in any monitoring cluster, as collectors do not have a dependency between each other and are considered equals.
Notifications tab
The Notifications tab contains various settings relating to when and why Notifications are sent for this particular Host.
Notify On
This section determines which states the Host should notify on, e.g. only on ‘DOWN’ or ‘UNREACHABLE’. If a Host does not notify on any states, then the services on that Host will also not send any notifications.
Notification Period
This field uses the ‘Time Periods’ already defined within Opsview Monitor, and determines when notifications are allowed to be sent to users.
Flap Detection
This checkbox toggles flap detection on and off. Flap Detection is used in notifications and other areas of Opsview Monitor, e.g. to avoid sending an alert while the Host is flapping. A Host is considered ‘flapping’ if it changes state between OK and non-OK more than seven times in the last 20 checks.
Notifications for Flapping starting and stopping will not be sent when notifications are suppressed (such as when acknowledged or in downtime) or when flapping started during downtime but continues after downtime ends.
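The flapping rule stated above (more than seven changes between OK and non-OK in the last 20 checks) can be expressed as a small Python sketch. The function name `is_flapping` and its parameters are hypothetical, and the real detection logic inside Opsview Monitor may weigh results differently.

```python
def is_flapping(states, window=20, max_changes=7):
    """Return True if the recent check history counts as flapping.

    states is a chronological list of check results; anything other
    than 'OK' is treated as non-OK. Hypothetical helper for illustration.
    """
    recent = states[-window:]
    changes = sum(
        1
        for previous, current in zip(recent, recent[1:])
        if (previous == "OK") != (current == "OK")
    )
    return changes > max_changes


steady = ["OK"] * 20
alternating = ["OK", "DOWN"] * 10
print(is_flapping(steady), is_flapping(alternating))
```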
Service Checks tab
The Service Checks tab is designed to give you the ability to:
- Add Service Checks to a Host.
- Modify Service Checks on a Host basis, i.e. use different arguments just for this Host.
- Omit Service Checks that have been inherited via a Host Template; i.e. ‘we don’t want this service check on this Host but we want the rest from the Host Template’.
- Test Service Checks against a Host before submitting the change and Apply Changes.
The left-hand section of the Service Checks tab displays the Service Check ‘tree’. Service Checks reside within Service Groups, e.g. checks such as ‘CPU statistics’ live within the service group ‘OS - Base Unix Agent’. The algorithm behind the tree structure creation uses the hyphens as the separator, therefore ‘OS - Base Unix Agent’ becomes ‘OS’ at the top level, and ‘Base Unix Agent’ at the second level down.
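A minimal Python sketch of this hyphen-based grouping is shown below. The function names are hypothetical, the group ‘OS - Windows Base’ is a made-up name used only for illustration, and splitting on a hyphen surrounded by spaces is an assumption about the exact separator.

```python
def service_group_path(name):
    """Split a Service Group name into tree levels.

    Assumption: the separator is a hyphen surrounded by spaces.
    """
    return [part.strip() for part in name.split(" - ")]


def build_tree(group_names):
    """Build a nested dict representing the Service Check tree."""
    tree = {}
    for name in group_names:
        node = tree
        for part in service_group_path(name):
            node = node.setdefault(part, {})
    return tree


print(service_group_path("OS - Base Unix Agent"))
# 'OS - Windows Base' is a hypothetical group name.
print(build_tree(["OS - Base Unix Agent", "OS - Windows Base"]))
```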
On the far right of each Service Check row in the tree (the items with the check boxes), one of two icons may be displayed. These icons depict whether this Service Check is inherited from a Host Template, or whether it was originally inherited from a Host Template and has since been ‘omitted’, i.e. ‘don’t apply this Service Check to this Host’. This ‘omit’ option is toggled by using the ‘Remove Service Check from Host Template’ option within the Service Check, and only becomes visible when the Service Check is checked in the left-hand section.
If a Service Check is inherited from a Host Template yet isn’t ‘checked’ in the left-hand section, the ‘Exceptions’ section will not be editable and the ‘Remove Service Check from Host Template’ toggle button will not be visible. To edit these items, check the box next to the Service Check; this tells Opsview Monitor to look at this section for information on this Service Check instead of the Host Template.
The right-hand section of the Service Checks tab is populated with information and options relevant to the selected Service Check and is commonly referred to as the ‘Service Check information panel’.
When no Service Check is selected, this section will contain a message informing you to select a Service Check first.
When a Service Check is selected and checked in the left-hand tree panel, the Service Check information panel will show:
- Service Check name.
- Service Check description.
The ‘Service Check information panel’ also contains:
- Plugin and Macro Help buttons.
- Test Service Check drawer.
- Variables drawer.
- Exceptions drawer.
- Timed Exceptions drawer.
- Event handler drawer.
The ‘Test Service Check’ drawer is designed to provide the ability to test that a Service Check will perform as expected against the relevant Host. This saves time by reducing the cycle of submitting then applying the changes to Opsview Monitor to check the result.
The Plugin Help button will load a new modal window displaying the ‘Help file’ for the plugin:
The Macro Help button will load a new modal window displaying all of the host specific macros:
The ‘Test Service Check’ accordion allows you to test that the Service Check definitions would run correctly on this host:
Note
You cannot change the arguments used, so it will be testing the arguments defined for this active Service Check.
Host variables
The ‘Variables’ drawer contains all variables that the Service Check may be using. Variables act like standard computer science ‘Variables’, in that you can configure ‘-p %PORT%’ instead of ‘-p 9200’ for the Service Checks’ argument. The benefit of this is that by using a Variable instead of hard coding the port, you can apply the Service Check to hundreds of Hosts and simply add the ‘%PORT%’ Variable to the Host’s variables to switch the port.
If a Service Check requires a Variable in order to successfully work, then the Variable will be listed within the ‘Variables drawer’. In the example Service Check below we are applying a Service Check to monitor the number of Bytes received for a MySQL database. The syntax for this Service Check is:
-H $HOSTADDRESS$ -u %MYSQLCREDENTIALS:1% -p %MYSQLCREDENTIALS:2% --metricname=Bytes_received
This means that the username field (-u) and the password field (-p) are taken from the %MYSQLCREDENTIALS% Variable. By default, the Variables drawer will be empty. It will only be populated with the Variables required once you have pressed the ‘Test’ button.
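To make the substitution concrete, here is a minimal Python sketch of how %NAME% and %NAME:n% placeholders could be expanded. It is not Opsview Monitor's code: the function name, the dictionary layout and the credential values are all hypothetical, and only the placeholder syntax comes from the text above.

```python
import re

# %NAME% refers to the Variable's value; %NAME:n% refers to its n-th argument.
VARIABLE_PATTERN = re.compile(r"%([A-Z0-9_]+)(?::(\d+))?%")


def expand_variables(args, variables):
    """Expand Variable placeholders in a Service Check argument string.

    variables maps a name to {'value': ..., 'args': [...]} (hypothetical layout).
    A KeyError or IndexError signals a Variable that has not been defined.
    """
    def lookup(match):
        name, index = match.group(1), match.group(2)
        entry = variables[name]
        if index is None:
            return entry["value"]
        return entry["args"][int(index) - 1]

    return VARIABLE_PATTERN.sub(lookup, args)


host_variables = {
    "PORT": {"value": "9200", "args": []},
    # Placeholder credentials, for illustration only.
    "MYSQLCREDENTIALS": {"value": "unused", "args": ["monitor", "s3cret"]},
}
print(expand_variables("-p %PORT%", host_variables))
print(expand_variables("-u %MYSQLCREDENTIALS:1% -p %MYSQLCREDENTIALS:2%", host_variables))
```

Macros such as $HOSTADDRESS$ are left untouched by this sketch, since they use a different delimiter.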
If these Variables are not populated with global defaults (Configuration > Variables), then the Service Check will fail as there is no means to log in to the MySQL database in order to monitor it. If that is the case then you will need to click the ‘Add’ button next to the Variable, which will navigate to the ‘Variables’ tab and add a new Variable as below:
Here you will need to enter a value (not relevant to this Service Check so enter anything) and click ‘Save’. Once saved the ‘Host variable details’ panel will populate. Here you can now check both ‘Override username’ and ‘Override password’ and enter the correct login information for this database.
Once the correct information is added, navigate back to the Service Checks tab and click ‘Test’ again and if you have added the correct credentials, the Service Check should now successfully work:
You can now Submit Changes and then use Apply Changes, knowing that the Service Check will work when applied to production.
If you wish to change the actual plugin arguments themselves (i.e. add a warning/critical level (-w/-c) to the Service Check), then you can do so via the Exceptions drawer.
As covered earlier in this section, the Exceptions drawer will not be visible until the Service Check is checked in the left-hand tree pane. Once checked and the Exceptions drawer is opened, this will be displayed:
Tick the checkbox to confirm you want to amend the default arguments.
For the ‘MySQL Aborted Connections’ Service Check you may wish to amend the ‘-c’ option from 30 to 35 - use the ‘Plugin Help’ modal for direction on how to modify these arguments. Press Test to then run the amended command:
The ‘Timed Exception’ option works exactly the same, however, the defined arguments will not be ‘injected’ into the Service Check until the relevant time period begins.
The Event Handler accordion allows you to have a script execute when state changes occur for this Service Check on this Host. See the Event Handler documentation for more details.
SNMP Tab
The SNMP tab is where you can configure SNMP credentials for a Host. For example, if you wish to use plugins or Service Checks which rely on SNMP then the relevant SNMP credentials will first need to be configured and tested within this section.
The tab is split into two sections:
- Enable SNMP
- Credentials
The Credentials section is visible only when Enable SNMP is enabled. Otherwise, this section remains hidden. Additionally, enabling SNMP makes the Interfaces tab visible, allowing you to query and configure the host interfaces for monitoring.
Credentials
The Credentials section is where you should select the version of SNMP used, along with the relevant authentication information. For SNMP v1 or v2c, only the port and community string need to be specified. For SNMP v3, a port, username, authentication protocol, authentication password, privacy protocol and privacy password are all required.
On first entry to the Credentials section, the SNMP community string for v2c (for example) will say ‘SNMP community encrypted - click to reset’. This message is displayed because a secure, encrypted placeholder is used until a valid community string is set. Click on the button as directed and enter the community string, then click ‘Test SNMP Connection’ to ensure the credentials have been entered correctly. Once authentication data has been entered into the UI it cannot be retrieved (for security reasons); however, it can be reset.
For a breakdown of the relevant credential information see the tables below:
SNMP v1 and v2c
Field | Details |
---|---|
SNMP Port | This defines the port number to connect to the SNMP device. Default is 161. |
SNMP Community | This defines the community string to connect to the SNMP device. This value will be encrypted in the Opsview Monitor database. After this value has been saved, it cannot be retrieved back in the user interface. If you want to change the value, click the Reset button to change it. |
SNMP v3
Field | Details |
---|---|
SNMPv3 Username | This defines the SNMPv3 username to connect to the SNMP device. |
SNMPv3 Authentication Protocol | This defines the SNMPv3 protocol to connect to the SNMP device to authenticate the User. Valid values are: md5, sha (SHA-1). |
SNMPv3 Authentication Password | This defines the SNMPv3 password to connect to the SNMP device to authenticate the User. This value will be encrypted in the Opsview Monitor database. After this value has been saved, it cannot be retrieved back in the user interface. If you want to change the value, click the Reset button to change it. |
SNMPv3 Privacy Protocol | This defines the SNMPv3 protocol to encrypt traffic between Opsview Monitor and the SNMP device. Valid values are: des, aes, aes128, aes256, aes256c. Note that the ‘aes256’ and ‘aes256c’ options are only fully supported on some operating systems (see SNMP Privacy Protocol Support). |
SNMPv3 Privacy Password | This defines the SNMPv3 password to encrypt traffic between Opsview Monitor and the SNMP device. If this is not set, then no attempt to encrypt traffic will take place. For devices using Net-SNMP, an empty privacy password will still allow connection to the device even if a privacy password is defined for a user. This value will be encrypted in the Opsview database. After this value has been saved, it cannot be retrieved back in the user interface. If you want to change the value, click the Reset button to change it. |
Note
For backwards compatibility with argument strings, any occurrences of $$ will be replaced with $ when processed by the system to run checks. This includes text within SNMP community strings (v2c or v3). However, this behavior will be removed in a future version. We recommend using a single $, which will be processed as it is. This ensures future compatibility and avoids potential issues.
Interfaces tab
Enabling SNMP in the SNMP tab makes the Interfaces tab visible. This tab lists all available interfaces on the host. This list is gathered via SNMP, so correct credentials and a properly configured SNMP daemon are prerequisites.
Note
In order to monitor the interfaces of a Host, you must apply the ‘SNMP - MIB II’ Host Template before performing an Apply Changes. This template comprises the Service Checks ‘Interface Poller’, ‘Interface’, ‘Discards’ and ‘Errors’.
When a host has a large number of interfaces (1000+), it may take a long time to fetch the interface data from the host. By default this service check is given 120 seconds to execute.
This time limit can be modified in the Executor configuration, see Advanced Automated Installation and Executor for details (service_check_slow).
To view the interfaces of a Host click on the ‘Query Host’ button which will populate the table with the available interfaces. There are a few options you may wish to modify before running the query:
Extended Throughput Data
If this option is enabled then the Interface Service Check will also return unicast, multicast and broadcast performance data. This will be in the form of bits per second based on the interface speed.
SNMP Message Size
Some SNMP devices can return a significant amount of data which fills the standard SNMP buffer size of around 500 octets. Many devices cannot cope with setting the maximum buffer size so this option allows the size to be tailored to each device. The units are in Kio which are multiples of 1024.
Use SNMP GetNext
Recent SNMP devices use SNMP GetBulk to obtain information, which older devices do not support. Enabling this option forces the use of the older GetNext request instead.
Use SNMP ifName
Older SNMP devices only provide interface IDs through ifDescr, but these can often be duplicated. More recent devices allow the use of ifName instead. Note: if this is changed, the service check names will be different and history and graphs will be lost.
Modify ifDescr Level
Some SNMP devices can have very long descriptions (ifDescr) for each interface on a device, mostly made up from common words. There is a limit in Opsview Monitor that this description shouldn’t exceed 52 characters otherwise monitoring the interface will not work as expected (a ‘duplicate interface’ error may be shown at the bottom of the screen). Setting this option can remove common words to reduce the length of each interface ifDescr and help to avoid duplicate interfaces.
The settings are as follows:
Setting | Words Removed |
---|---|
Off (default) | None |
Level 1 | ‘Nortel Ethernet’, ‘Nortel’, ‘Routing’, ‘Module’ |
Level 2 | Trailing spaces removed |
Level 3 | ‘PCI Express’, ‘Quad Port’, ‘Gigabit’, ‘Server’ |
Level 4 | ‘Corrigent systems’, ‘, , ' |
Level 5 | ‘Ethernet’, ‘Frontpanel’, ‘RJ45’, ‘1000BASE-T’, ‘- no sfp inserted’ |
Level 6 | ‘Avaya’, ‘Virtual’, ‘Services’, ‘Platform’ |
Levels are cumulative. Further levels may be added in the future. The level should not be changed once monitoring is working to prevent loss of historical data.
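The cumulative nature of the levels can be sketched in Python as below. This is only an illustration of the table: the function name is hypothetical, case-sensitive matching and the handling of leftover whitespace are assumptions, and the Level 4 list contains only the one entry that is legible in the table above.

```python
# Words removed at each level, taken from the table above.
LEVEL_WORDS = {
    1: ["Nortel Ethernet", "Nortel", "Routing", "Module"],
    3: ["PCI Express", "Quad Port", "Gigabit", "Server"],
    4: ["Corrigent systems"],  # other Level 4 entries are not legible in the table
    5: ["Ethernet", "Frontpanel", "RJ45", "1000BASE-T", "- no sfp inserted"],
    6: ["Avaya", "Virtual", "Services", "Platform"],
}


def shorten_ifdescr(descr, level):
    """Apply levels 1..level in order, because levels are cumulative.

    Hypothetical helper; matching is case-sensitive by assumption.
    """
    for current in range(1, level + 1):
        if current == 2:
            # Level 2 removes trailing spaces rather than words.
            descr = descr.rstrip()
            continue
        for word in LEVEL_WORDS.get(current, []):
            descr = descr.replace(word, "")
    return descr


original = "Nortel Ethernet Routing Switch Module"
print(repr(shorten_ifdescr(original, 0)))
print(repr(shorten_ifdescr(original, 2)))
```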
The table section of the ‘Interfaces tab’ has the following columns:
- Selection box: Check box; check this to monitor the interface. If you select an interface using the check box beside the name, Opsview Monitor will create a service for each interface after the Apply Changes is performed. This will monitor throughput, errors and discards. Use the checkbox in the column header to toggle all interface checkboxes.
- Interfaces to poll: The description of the interface.
- Alert Type: see Alert Type.
- Throughput: see Discards, Errors and Throughput Thresholds.
- Errors: see Discards, Errors and Throughput Thresholds.
- Discards: see Discards, Errors and Throughput Thresholds.
Discards, Errors and Throughput Thresholds
For the discards, errors and throughput fields a threshold can be set. For any selected interface, if the cell is empty, the threshold value will be taken from the default line. If a cell is set to -, then no threshold will be set. This is equivalent to saying ‘I do not want to set a warning threshold’.
Throughput is monitored from the ‘multiple’ Service Check called Interface. This calculates the rate of throughput between checks and returns the input and output information. If the rate is above the threshold value, then an alert will be raised at the appropriate level.
Performance data will be returned based on the input and output rate in octets per second. If the threshold is specified as a percentage value, the performance data returned will be a percentage value instead.
If a percentage threshold is specified and it is not possible to work out the interface speed (e.g. VLANs), then the plugin will return a WARNING with the message:
INTERFACENAME throughput (in/out) X bps/Y bps but has an interface speed of 0, so cannot check a percentage threshold
You should set the threshold to be based on bits per second for this interface, rather than using a percentage threshold.
It is possible to use advanced syntax for more complicated threshold checking. For example:
- IN 10:50% - alert if input throughput is below 10% or above 50%
- OUT 30000:50000 - alert if output throughput is below 30,000 bits/sec or above 50,000 bits/sec
- IN 10:50% and OUT 30:55% - alert if both input throughput is below 10% or above 50% and output throughput is below 30% or above 55%
- IN 10:50% or OUT 30:55% - alert if either input throughput is below 10% or above 50% or output throughput is below 30% or above 55%
- 40:60% - this is the same as IN 40:60% or OUT 40:60%
- 75% - this is the same as 0:75% which was the old behavior.
Most whitespace is ignored. Note that you cannot mix percentage and bits per second values in the same threshold.
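To make the advanced syntax concrete, the following is a minimal Python sketch of how such expressions could be evaluated. It is not the plugin's actual parser: the function names (parse_range, evaluate) are hypothetical, the caller is assumed to supply the input and output values in the same unit as the threshold (percent or bits per second), and the rule against mixing units is not enforced.

```python
import re

# Matches "HIGH" or "LOW:HIGH", optionally followed by "%".
_RANGE = re.compile(r"^(?:(\d+(?:\.\d+)?):)?(\d+(?:\.\d+)?)(%?)$")


def parse_range(text):
    """Return (low, high, is_percent); a bare 'HIGH' means '0:HIGH'."""
    match = _RANGE.match(text.replace(" ", ""))
    if match is None:
        raise ValueError(f"unrecognised threshold: {text!r}")
    low = float(match.group(1)) if match.group(1) else 0.0
    return low, float(match.group(2)), match.group(3) == "%"


def _breaches(value, range_text):
    """True if value is below the low bound or above the high bound."""
    low, high, _ = parse_range(range_text)
    return value < low or value > high


def _term_breaches(term, in_value, out_value):
    """Evaluate one term: 'IN r', 'OUT r', or a bare range 'r'."""
    direction, _, range_text = term.strip().partition(" ")
    if direction.upper() == "IN":
        return _breaches(in_value, range_text)
    if direction.upper() == "OUT":
        return _breaches(out_value, range_text)
    # A bare range is treated as "IN r or OUT r".
    return _breaches(in_value, term) or _breaches(out_value, term)


def evaluate(expression, in_value, out_value):
    """Return True if the expression says an alert should be raised."""
    parts = re.split(r"\s+(and|or)\s+", expression.strip(), flags=re.IGNORECASE)
    results = [_term_breaches(p, in_value, out_value) for p in parts[0::2]]
    if len(parts) == 1:
        return results[0]
    return all(results) if parts[1].lower() == "and" else any(results)
```

For example, evaluate("IN 10:50% and OUT 30:55%", 5, 40) is False because only the input side is out of range, whereas the same values with "or" would raise an alert.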
Errors are monitored from the ‘multiple’ Service Check called Errors. This calculates the average number of errors per minute between checks, and returns the input and output error per minute information. If the rate is above the threshold, then an alert will be raised at the appropriate level. Performance data will be returned based on the input and output errors per minute.
Discards are monitored from the ‘multiple’ Service Check called Discards. This calculates the average number of discards per minute between checks, and returns the input and output discards per minute information. If the rate is above the threshold, then an alert will be raised at the appropriate level. Performance data will be returned based on the input and output discards per minute.
Note
If there is a Host you do not want to monitor throughput, errors or discards on, you can simply remove the service check ‘Interface’, ‘Errors’ or ‘Discards’ from the Host.
Alert Type
The Alert Type drop-down menu offers two options: Normal or Dormant. By default, this field is set to Normal, indicating that the admin status and link status of the interface are expected to be up. Conversely, if the admin status and link status are expected to be down, the field should be set to Dormant.
When the ‘Interface’ Service Check runs for an interface with the Normal Alert Type set, it will report:
- CRITICAL status if admin status is up but link status is down.
- WARNING status if admin status is down but link status is up.
- OK status if admin status and link status are both down.
- A status according to the configured thresholds if admin status and link status are both up.
When the ‘Interface’ Service Check runs for an interface with the Dormant Alert Type set, it will report:
- OK status if admin status is up but link status is down.
- OK status if admin status is down but link status is up.
- OK status if admin status and link status are both down.
- CRITICAL status if admin status and link status are both up.
For any Alert Type, the ‘Errors’ and ‘Discards’ Service Checks will report OK if an interface is down.
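The two status tables above can be summarised as a small decision function. This is an illustrative Python sketch only; interface_status and its parameters are hypothetical names, and threshold_status stands in for whatever result the configured thresholds would produce when the interface is fully up.

```python
def interface_status(alert_type, admin_up, link_up, threshold_status="OK"):
    """Map admin/link state to the status reported by the 'Interface' check."""
    if alert_type == "Dormant":
        # A dormant interface is only a problem when it is fully up.
        return "CRITICAL" if (admin_up and link_up) else "OK"
    if alert_type != "Normal":
        raise ValueError(f"unknown Alert Type: {alert_type!r}")
    if admin_up and link_up:
        return threshold_status  # decided by the configured thresholds
    if admin_up:
        return "CRITICAL"  # admin up, link down
    if link_up:
        return "WARNING"  # admin down, link up
    return "OK"  # both down
```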
SNMP limitations
You need to use SNMPv2c if you are monitoring an interface of 100 Mbps or over. This is because SNMPv2c supports 64-bit counters, but SNMPv1 doesn’t. If you use SNMPv1, your graphs are likely to have gaps in them.
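Some quick arithmetic shows why counter width matters at these speeds. The sketch below assumes the counter counts octets (as the IF-MIB ifInOctets and ifOutOctets counters do) and that the link is running at full line rate; seconds_to_wrap is a hypothetical helper, not part of Opsview Monitor.

```python
def seconds_to_wrap(counter_bits, link_bps):
    """Seconds for an octet counter of the given width to wrap at line rate."""
    octets_before_wrap = 2 ** counter_bits
    return octets_before_wrap * 8 / link_bps


# A 32-bit counter on a saturated 100 Mbps link wraps in under six minutes.
wrap_32 = seconds_to_wrap(32, 100_000_000)
# A 64-bit counter on the same link takes many thousands of years.
wrap_64 = seconds_to_wrap(64, 100_000_000)
```

If a 32-bit counter can wrap between two consecutive checks, the rate between those checks cannot be calculated reliably, which is consistent with the gaps described above.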
Interfaces are monitored by name, so if the SNMP index position changes (which could happen on a router reboot), a rescan of the device will occur to check. Opsview Monitor treats the SNMP index as an internal number which a system does not need to know about. By working with names only, Opsview Monitor can automatically follow any changes to the SNMP index position without human intervention.
If there are multiple interfaces with the same name, the ifIndex will also be passed to the plugin to check. If the ifName does not match the expected interface name for this ifIndex, an alert will be raised which says:
WARNING - Interface name $user_specified_ifname expected at index $user_specified_index, but got $name!
You will need to run Query Host to list the interfaces to check again.
Note
If the index moves to a position with the same interface name, then Opsview Monitor will not see a change and will continue monitoring this interface as usual even though it could be a different interface. If you have a Cisco router, please check this Cisco support article regarding ifIndex persistence.
Getting a ‘Cannot connect with’ error when running ‘Test SNMP Connection’
Aside from invalid credentials, Test SNMP Connection may fail when the host’s IPv6 address takes precedence over its IPv4 address in /etc/hosts. The connection is attempted over IPv6 but fails because the SNMP daemon is only listening on IPv4 by default.
To fix this, you can configure the SNMP daemon to listen on IPv6. You can do this by specifying the agentAddress directive in /etc/snmp/snmpd.conf as:
agentAddress udp:<host_primary_ipv4_address>:161,udp6:[<host_primary_ipv6_address>]:161
In RPM-based Linux distributions, the SNMP daemon additionally requires an IPv6 mapping of SNMP community strings to security names. In such a system, the com2sec6 directive should be specified in /etc/snmp/snmpd.conf:
com2sec6 <security_name> <source> <community>
Getting a ‘Cannot query host’ error when running ‘Query Host’
The Query Host command should run properly if the appropriate Object Identifiers (OIDs) are indicated in the view directive of /etc/snmp/snmpd.conf. Ensure that you have the correct OIDs corresponding to the MIB subtrees. For more information about MIBs, see MIBs for SNMP Traps and Gets.
view <view_name> <type> <OID>
Note
Restart snmpd. After making any changes to your SNMP configuration or after adding new MIBs, you need to restart the snmpd service for the changes to take effect. Use the following command: systemctl restart snmpd. For more information about SNMP configuration, see Manpage of SNMPD.conf.
Why aren’t my interfaces being monitored?
The services are only created if the Host has the ‘SNMP - MIB-II’ Host template applied, or has the Interface Poller, Interface, Discards and Errors Service Checks associated with the Host directly via the ‘Service Checks’ tab.
I’m getting thresholds that are over 100%
For each interface, Opsview Monitor works out the utilization based on the number of bytes transferred as reported by SNMP, divided by the time difference between the two readings, as a percentage of the interface speed as reported by SNMP’s ifSpeed counter.
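The calculation described above can be sketched in Python as follows. This is an illustration rather than the plugin's code: utilization_percent and its parameter names are hypothetical, counter wrap between readings is not handled, and ifSpeed is assumed to be in bits per second.

```python
def utilization_percent(prev_octets, curr_octets, elapsed_seconds, if_speed_bps):
    """Utilization between two counter readings as a percentage of ifSpeed."""
    if elapsed_seconds <= 0:
        raise ValueError("readings must be taken at different times")
    if if_speed_bps <= 0:
        raise ValueError("interface speed is 0, cannot compute a percentage")
    bits_per_second = (curr_octets - prev_octets) * 8 / elapsed_seconds
    return 100.0 * bits_per_second / if_speed_bps


# 375,000,000 octets in 300 seconds is 10 Mbps: 10% of a 100 Mbps link.
normal = utilization_percent(0, 375_000_000, 300, 100_000_000)
# The same traffic looks like 200% if the device reports ifSpeed as 5 Mbps.
inflated = utilization_percent(0, 375_000_000, 300, 5_000_000)
```

The second call shows how an under-reported ifSpeed alone is enough to push the result over 100%.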
There are several possible reasons why you can get over 100% utilization:
- The wrong ifSpeed is reported by the device. This can sometimes occur with Net-SNMP, but it is possible to set the speed correctly in the configuration file.
- Some speeds are not the maximum possible throughput. ifSpeed is defined as ‘An estimate of the interface’s current bandwidth in bits per second’.
- Full duplex may skew the results as you may be able to get more transfer in one direction than in another.
- Some devices only update the SNMP counters at certain intervals. This means you could see sudden spikes in utilization if Opsview gathers data at different intervals.
If you have interfaces that are consistently reporting more than 100% utilization, please contact Opsview Monitor Customer Success who can assist.
Plugin raises a warning about an interface with 0 speed
You may see a warning like:
INTERFACENAME throughput (in/out) 0 bps/0 bps but has an interface speed of 0, so cannot check a percentage threshold
This happens because, when a threshold is specified as a percentage value, Opsview Monitor works out the percent utilization based on the interface speed. If the speed is zero, this is not possible.
Possible resolutions:
- The device is reporting the incorrect speed - contact the device manufacturer. If the device is a Unix server running net-snmp, you can force net-snmp to set a specific speed per interface.
- The interface is not valid for monitoring - uncheck the interface from being monitored.
- You still want to monitor the interface status - set the threshold to a dash (which means that no threshold check will be required) or set an absolute threshold rather than a percentage, so the speed check is ignored.
There are duplicate names in the interface SNMP table which has some limitations
Interfaces are tracked by their name rather than their ID as provided by the device being monitored. This is because some devices reallocate IDs on a reboot.
Opsview tracks these interfaces by fetching each interface’s ‘ifDescr’, shortening it to 52 characters, and storing it as the ‘short interface name’. This limit is the standard length of interface description supported by the majority of devices. However, this can appear to cause duplicate interface names if the ifDescr contains unnecessary duplicate text, i.e.
Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 1
Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 2
Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port 3
would all be shortened to
Nortel Ethernet Routing Switch 5510-48T Module - Un
You can either reconfigure all the interface ifDescr values on the device to contain only short unique names such as
5510-48T Unit 1 Port 13
and then re-run ‘Query Host’ on the Host configuration SNMP page, or set the ‘Modify ifDescr Level’ option, which attempts to remove certain ‘common’ words. See the section ‘INTERFACES’ above for more details.
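The collision can be reproduced with a few lines of Python. This is only a sketch of the truncation idea, assuming a plain cut at 52 characters; group_by_short_name is a hypothetical helper and the exact cut-off point used by Opsview Monitor may differ slightly.

```python
from collections import defaultdict

MAX_SHORT_NAME_LENGTH = 52  # limit quoted in the text above


def group_by_short_name(descriptions, max_len=MAX_SHORT_NAME_LENGTH):
    """Group full interface descriptions by their truncated short name."""
    groups = defaultdict(list)
    for description in descriptions:
        groups[description[:max_len]].append(description)
    return dict(groups)


long_names = [
    f"Nortel Ethernet Routing Switch 5510-48T Module - Unit 1 Port {n}"
    for n in (1, 2, 3)
]
short_unique_names = [f"5510-48T Unit 1 Port {n}" for n in (1, 2, 3)]

# All three long names share one short name; the short names stay distinct.
collisions = group_by_short_name(long_names)
distinct = group_by_short_name(short_unique_names)
```

Any short name that maps to more than one description is a duplicate that would need either renaming on the device or the ‘Modify ifDescr Level’ option.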
Variables tab
As mentioned in the ‘Service Checks’ section, ‘Variables’ are covered in great detail within their relevant User Guide section. However, in essence, they act like standard computer science variables, in that you can configure ‘-p %PORT%’ instead of ‘-p 9200’ for an Elasticsearch Service Check’s arguments. The benefit of this is that by using a Variable instead of hard coding the port, you can apply the Service Check to hundreds of Hosts and simply add the ‘%PORT%’ variable to the Hosts that don’t have Elasticsearch on port 9200.
In our example above, we have added the ‘Database - MySQL’ Host template, which requires the %MYSQLCREDENTIALS% variable to be populated with relevant username/password data.
This can be configured at a global level via ‘Configuration > Variables > %MYSQLCREDENTIALS%’, which means any Host that has Service Checks/Host templates using the %MYSQLCREDENTIALS% variable will use the values set there (the global ‘defaults’). However, if a Host has a different set of credentials, you can choose to add %MYSQLCREDENTIALS% locally via the ‘Variables’ tab. If the Variable is added to the Host locally, the values set on the Host are used first.
For the Host in the screen above, you can choose to override the username/password with the custom, Host-specific ones by checking the ‘Override username’ and ‘Override password’ fields, respectively. The ‘Password’ field has been set to an ‘encrypted’ one at the Variable level, which means once the value is overridden and the ‘Submit changes’ button has been pressed, the value entered cannot be retrieved, only overwritten.
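The precedence described in this section (a Host-level value first, then the global default) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: expand_arguments is a hypothetical function, each variable is treated as a single string value, and encrypted fields are not modelled.

```python
import re

_VARIABLE = re.compile(r"%([A-Z0-9_]+)%")


def expand_arguments(arguments, global_vars, host_vars):
    """Replace %NAME% placeholders, preferring Host values over global ones."""

    def lookup(match):
        name = match.group(1)
        if name in host_vars:
            return host_vars[name]
        if name in global_vars:
            return global_vars[name]
        raise KeyError(f"variable %{name}% is not defined")

    return _VARIABLE.sub(lookup, arguments)


global_defaults = {"PORT": "9200"}

# A Host with no local variable falls back to the global default.
default_host = expand_arguments("-p %PORT%", global_defaults, {})
# A Host with its own %PORT% uses the local value instead.
custom_host = expand_arguments("-p %PORT%", global_defaults, {"PORT": "9300"})
```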
NetAudit tab
The NetAudit tab is an optional tab present only for Users who have purchased the Network Analyzer module for Opsview Monitor.
In this tab, Users can configure the settings needed in order to allow Opsview Monitor to log in to the Host and back up the network device’s configuration.
For more information, see NetAudit.
Amending Multiple Hosts (Bulk Edit)
You can open the bulk edit window by pressing the edit button when more than one Host has been selected via the checkboxes:
A subset of the host fields will be available to be changed. Changes will be applied to all the hosts that were selected in the grid.
The fields will be pre-populated with values only if they are exactly the same for all the selected hosts.
When you submit, hosts will be updated in batches of 50 at a time - a progress bar will appear as changes are made.
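As a rough illustration of the batching behaviour, the Python sketch below splits a selection into groups of 50. The batches function is hypothetical and says nothing about how Opsview Monitor applies the changes internally; it only shows how many batches a given selection would produce.

```python
def batches(items, size=50):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


selected_hosts = [f"host-{n}" for n in range(120)]
# 120 selected hosts are processed as batches of 50, 50 and 20.
batch_sizes = [len(batch) for batch in batches(selected_hosts)]
```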
The following fields can be modified:
Host Group
Changes the Host Group of the selected hosts.
Monitored By
Changes the monitoring cluster these hosts will be monitored by.
Note
This field will be hidden if this is a single server instance.
This field will be disabled if any of the selected hosts are of the following types:
- The main orchestrator host.
- A host used as a flow source.
Host Templates
Changes the Host Templates for the selected hosts. See Multi Options.
Hashtags
Changes the Hashtags for the selected hosts. See Multi Options.
Host Icon
Changes the Host Icon for the selected hosts.
Host Description
Changes the Host Description for the selected hosts.
Parents
Changes the Parents for the selected hosts. See Multi Options.
Note
To avoid circular parent host definitions, only hosts that are not in the currently selected list of hosts can be chosen.
Host Check Command
Changes the Host Check Command for the selected hosts.
Check Period
Changes the Check Period for the selected hosts.
Check Interval
Changes the Check Interval for the selected hosts.
Retry Interval
Changes the Host check Retry Interval for the selected hosts.
Note
This value must be lower than the Check Interval for each host. If you set this too high, you may get a “Rollback: Error trying to synchronise object X” message.
Max Check Attempts
Changes the Host check Max Check Attempts for the selected hosts.
Deleting a Host
You can delete a host you no longer wish to monitor by clicking the host contextual menu and selecting the Delete option.
You will then be required to confirm the host deletion.
Deleting Multiple Hosts (Bulk Delete)
The Delete button will be enabled when hosts that can be deleted have been selected using the checkboxes:
Some special hosts are not deletable, such as the Orchestrator, Collectors or Flow Sources. If any of these are selected, they will be ignored.
After confirmation, all selected hosts (minus any special hosts) will be deleted.
Multi Options
All fields, bar Host Templates and Hashtags, are simply replacement actions, i.e. you can choose to enter a new Host Check Command for 10 Hosts, and on ‘submit’ the Host Check Command for each of those Hosts is changed to the one specified.
For Host Templates and Hashtags, the actions are a little more powerful, as a Host can have no Host Templates/Hashtags or multiple Host Templates/Hashtags. Therefore, in the Bulk Edit mechanism for these two fields there are the following options:
- Add to existing: Choose this option to append a new Hashtag/Host Template to the Hosts.
- Clear field: Choose this option to remove all Hashtags/Host Templates from the Hosts.
- Replace with: Choose this option to remove all Hashtags/Host Templates from the Hosts and add the selected Hashtag/Host templates instead. Essentially, this action clears the fields and then adds the selected items.
- Find and remove: Choose this option to remove the selected Hashtags/Host templates from the selected Hosts. This option is, in essence, a ‘selective delete’.
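The four options above behave much like simple list operations. The Python sketch below illustrates the intended effect on one Host's list of Hashtags or Host Templates; apply_multi_option and the short option keys ("add", "clear", "replace", "remove") are hypothetical names chosen for this example.

```python
def apply_multi_option(current, option, selected=()):
    """Return the new list of items for one Host after a bulk edit."""
    current = list(current)
    selected = list(selected)
    if option == "add":  # Add to existing
        return current + [item for item in selected if item not in current]
    if option == "clear":  # Clear field
        return []
    if option == "replace":  # Replace with
        return selected
    if option == "remove":  # Find and remove
        return [item for item in current if item not in selected]
    raise ValueError(f"unknown option: {option!r}")


hashtags = ["linux", "production"]
added = apply_multi_option(hashtags, "add", ["database"])
removed = apply_multi_option(hashtags, "remove", ["production"])
```

Note that "replace" gives the same result as "clear" followed by "add", which matches the description of ‘Replace with’ above.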