Opsview 6.x Known Issues
Overview
This page lists known bugs and issues in Opsview Monitor that may affect the performance of your applications and components.
Note
These known issues apply to both Opsview Monitor 6.x and Opsview Cloud, except for the Ubuntu 18 issue, which applies only to Opsview Monitor 6.x.
OS Specific
Ubuntu 22
This is a known issue affecting Ubuntu 22 repositories when using apt. When deploying or upgrading Opsview, the deployment process might stall on the Update apt-get task within the setup-hosts.yml playbook. This occurs because of a bug within apt that can cause some collectors to stop responding, such as coll-4 in the following example.
TASK [Update apt-get] **********************************************************
Tuesday 13 August 2024 03:11:49 +0000 (0:00:00.471) 0:00:12.826 ********
changed: [lrl-u22-orch]
changed: [lrl-u22-coll-5]
changed: [lrl-u22-coll-1]
changed: [lrl-u22-coll-7]
changed: [lrl-u22-coll-9]
changed: [lrl-u22-coll-3]
changed: [lrl-u22-coll-6]
changed: [lrl-u22-coll-8]
changed: [lrl-u22-coll-2]
Further investigation of the affected collectors should show that the apt-get update processes are hanging:
# ps -ef | grep apt
root 3589 3588 0 03:11 ? 00:00:00 sudo apt-get -y --force-yes update
root 3590 3589 0 03:11 ? 00:00:01 apt-get -y --force-yes update
_apt 3599 3590 0 03:11 ? 00:00:00 /usr/lib/apt/methods/http
_apt 3601 3590 0 03:11 ? 00:00:00 /usr/lib/apt/methods/gpgv
As a workaround, perform the following steps.
- Terminate the deployment process on the orchestrator:
kill -9 <PID>
- Then, on each affected collector, kill all apt processes:
kill -9 <PID>
- Rerun the opsview-deploy command that failed, as per the normal documentation.
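To find the PIDs for the collector step without reading the ps listing by hand, the bash sketch below filters ps -ef output for apt, apt-get (including when run through sudo) and the /usr/lib/apt/methods helpers seen in the listing above. apt_pids is a hypothetical helper name, and its match rules are an assumption based on that listing, so review the PIDs it prints before killing anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# every apt-related process (apt, apt-get, and /usr/lib/apt/methods/*).
# Match rules are an assumption based on the hanging processes shown above.
apt_pids() {
  awk '{
    # ps -ef columns: UID PID PPID C STIME TTY TIME CMD...
    exe = $8
    if (exe == "sudo") exe = $9   # look through "sudo apt-get ..."
    if (exe ~ /(^|\/)apt(-get)?$/ || exe ~ /\/apt\/methods\//) print $2
  }'
}
```

On an affected collector, as root, ps -ef | apt_pids | xargs -r kill -9 would kill the matching processes. Matching on the executable name rather than the whole line avoids catching the grep or awk process itself, or unrelated processes run by the _apt user.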
Ubuntu 20
Re-enabling TLS 1.0 and TLS 1.1
Ubuntu 20 ships with TLS 1.0 and TLS 1.1 disabled by default. As a result, connections made through OpenSSL libraries to external services that do not support TLS 1.2 may fail with errors.
Ideally, the external service should be upgraded to support TLS 1.2, but if that is not possible, you can re-enable TLS 1.0 and TLS 1.1.
Warning
Re-enabling these protocols reduces security.
To test the external service:
openssl s_client -connect SERVER:443 -tls1_2
This will fail if the external service does not support TLS 1.2.
To allow Ubuntu 20 to use TLS 1.0 and TLS 1.1, edit /etc/ssl/openssl.cnf and add this line at the top:
openssl_conf = openssl_configuration
Then add this at the bottom:
[openssl_configuration]
ssl_conf = ssl_configuration
[ssl_configuration]
system_default = tls_system_default
[tls_system_default]
MinProtocol = TLSv1
CipherString = DEFAULT:@SECLEVEL=1
Now, check that connections will work.
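The two edits above can be applied with a short bash sketch. enable_legacy_tls is a hypothetical helper: it prepends the openssl_conf line, appends the three sections exactly as shown, keeps a .bak copy, and refuses to run if the file already contains an openssl_conf line, since two such lines would conflict. Review the result before relying on it.

```bash
#!/usr/bin/env bash
# Hypothetical helper that applies the openssl.cnf change described above:
# one line at the top and the three sections at the bottom.
# Keeps a .bak copy; refuses to run if the file already sets openssl_conf.
enable_legacy_tls() {
  local cnf=${1:-/etc/ssl/openssl.cnf} tmp
  if grep -q '^\[openssl_configuration\]' "$cnf"; then
    echo "already configured: $cnf" >&2
    return 0
  fi
  if grep -q '^[[:space:]]*openssl_conf[[:space:]]*=' "$cnf"; then
    # Two openssl_conf lines would conflict; edit the existing one by hand.
    echo "openssl_conf already set in $cnf; edit it manually" >&2
    return 1
  fi
  cp -p "$cnf" "$cnf.bak"
  tmp=$(mktemp)
  {
    echo 'openssl_conf = openssl_configuration'
    cat "$cnf"
    cat <<'EOF'

[openssl_configuration]
ssl_conf = ssl_configuration
[ssl_configuration]
system_default = tls_system_default
[tls_system_default]
MinProtocol = TLSv1
CipherString = DEFAULT:@SECLEVEL=1
EOF
  } > "$tmp"
  # Write back through cat so the file keeps its owner and permissions.
  cat "$tmp" > "$cnf"
  rm -f "$tmp"
}
```

Run it as root with enable_legacy_tls /etc/ssl/openssl.cnf, then test the connection to the external service again. Copying openssl.cnf.bak back over the file undoes the change.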
Ubuntu 18
This known issue is only applicable to Opsview Monitor 6.x.
Email notifications stop working and need a new package installed
In Ubuntu 18, the notify_by_email Notification Method fails because /usr/bin/mail has been deprecated and replaced with /usr/bin/s-nail. Installing bsd-mailx resolves the issue:
apt install bsd-mailx
Non-RHEL 8
SNMP Polling
SNMP Polling Checks do not support the aes256 and aes256c SNMPv3 privacy protocols when run on non-RHEL 8 collectors. If this is attempted, you may see an UNKNOWN state and an error message starting with one of the following:
External command error: Invalid privacy protocol specified after -x flag: aes256
or
External command error: Invalid privacy protocol specified after -x flag: aes256c
See SNMP Privacy Protocol Support for further details.
SNMP Traps
SNMP traps sent using the aes256 or aes256c SNMPv3 privacy protocols do not appear if they are received by non-RHEL 8 collectors.
Older operating systems
On CentOS 7, OEL 7, and RHEL 7
- Email notifications: Local mail subject lines will not display 4-byte UTF-8 characters correctly on CentOS 7, OEL 7, and RHEL 7.
- Plugins, event handlers, and notification scripts that contain 4-byte UTF-8 characters do not display correctly in the filesystem, but work properly in Opsview.
OS Generic
UI
- Once a User Role has been authorised for a Host Group, removing the authorisation later does not prevent them from viewing Hosts within the group in the Navigator or Network Topology maps.
- The Network Topology maps do not allow a User Role with the VIEWALL permission to view all Hosts unless the Role is additionally enabled for All host groups and All Service Groups on the Status Objects permissions tab.
Upgrade and installation
- After an upgrade, Cluster health monitoring may report a degradation of service. This can be caused by duplicate orchestrator processes that have become stuck. To fix this, stop the orchestrator component via watchdog, then run pkill -f "orchestratorlauncher$" to clean up any hanging processes, and then restart the orchestrator via watchdog. If the pkill does not stop the orchestratorlauncher processes, run pkill -9 -f "orchestratorlauncher$" instead.
- The opsview-deploy package needs to be upgraded before running opsview-deploy to upgrade an Opsview Monitor system.
- Changing the flow collectors configuration in Opsview Monitor currently requires a manual restart of the flow-collector component for it to start working again.
- At upgrade, the following are not preserved:
- Downtime — we recommend that you cancel any downtime (either active or scheduled) before you upgrade or migrate. Scheduling new downtime will work fine.
- Flapping status — the state from pre-upgrade or migration is not retained but if the host or service is still flapping, the next checks will set the status to a flapping status again.
- Acknowledgements — at the end of an upgrade or migration, the first reload removes the acknowledgement state from hosts and services. Any further acknowledgement will work as usual.
- If you use an HTTP proxy in your environment, the TimeSeries daemons may not be able to communicate. You can work around this by adding the following environment variable (in upper case, not lower case) to the Opsview user's .bashrc file:
export NO_PROXY=localhost,127.0.0.1
- Hosts and services in downtime will appear to stay in downtime even after it is cancelled. You can work around this by creating a new downtime, waiting until it starts and then cancelling it, or by adding a downtime that lasts only 5 minutes and letting it expire naturally.
- The opsview-messagequeue may occasionally fail to upgrade correctly when running opsview-deploy. See MessageQueue Troubleshooting for steps to resolve the issue.
- The sync_monitoringscripts.yml playbook fails to execute when the SSH connection between the host running opsview-deploy and the other instances uses a user other than root, and the private SSH key is defined only through the ansible_ssh_private_key_file property in opsview_deploy.yml. This happens because the underlying rsync command is not passed the private SSH key, and so fails to connect to the instances. As a workaround, add the key to root's SSH configuration, as in the following example:
# If you use ansible_ssh_private_key_file on the opsview_deploy.yml file
(...)
collector_clusters:
  cluster-A:
    collector_hosts:
      ip-172-31-9-216:
        ip: 172.31.9.216
        user: ec2-user
        vars:
          ansible_ssh_private_key_file: /home/ec2-user/.ssh/ec2_key
      ip-172-31-5-98:
        ip: 172.31.5.98
        user: ec2-user
        vars:
          ansible_ssh_private_key_file: /home/ec2-user/.ssh/ec2_key
(...)
# You need to add the following entries to /root/.ssh/config
Host ip-172-31-9-216 172.31.9.216
  User ec2-user
  IdentityFile /home/ec2-user/.ssh/ec2_key
Host ip-172-31-5-98 172.31.5.98
  User ec2-user
  IdentityFile /home/ec2-user/.ssh/ec2_key
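With many collectors, writing these /root/.ssh/config entries by hand is error-prone. The bash sketch below prints one Host stanza in the same shape as the example above; ssh_config_stanza is a hypothetical helper, and its arguments mirror the host name, ip, user and ansible_ssh_private_key_file values from opsview_deploy.yml.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a /root/.ssh/config stanza that lets root's rsync
# reach a collector with the same user and key that opsview_deploy.yml uses.
# Arguments: <host name> <ip> <ssh user> <private key file>
ssh_config_stanza() {
  local name=$1 ip=$2 user=$3 key=$4
  printf 'Host %s %s\n  User %s\n  IdentityFile %s\n' "$name" "$ip" "$user" "$key"
}
```

For example, ssh_config_stanza ip-172-31-9-216 172.31.9.216 ec2-user /home/ec2-user/.ssh/ec2_key >> /root/.ssh/config appends the first entry shown above. OpenSSH rejects a config file that is writable by other users, so keep it owned by root with mode 600.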
Plugins
- There is no automated mechanism in this release to synchronize scripts between the Opsview Monitor Orchestrator and Collector Clusters. A sync_monitoringscripts.yml deploy playbook is provided for this purpose, but it must be run manually or from cron on a regular basis.
- check_wmi_plus.pl may report errors relating to files in your /tmp directory, because the ownership of these files needs to be updated to the Opsview user. This is seen when upgrading from an earlier version of Opsview, as the nagios user previously ran this plugin.
- Opsview Golang plugins no longer support legacy certificates. If you encounter the following error:
x509: certificate relies on legacy Common Name field
recreate the certificates used on your monitored services or devices so that they use SANs (the subjectAltName x509 extension) instead.
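To check whether a certificate already carries a SAN, or to create a test certificate that does, the bash sketch below wraps standard openssl commands. make_san_cert and has_san are hypothetical helper names; the self-signed certificate, 2048-bit RSA key and 365-day validity are illustrative choices rather than Opsview requirements. In production, issue the certificate from your own CA with the same subjectAltName.

```bash
#!/usr/bin/env bash
# Sketch: create a self-signed certificate whose host name is carried in a
# subjectAltName (SAN) extension, which Go's x509 verification requires.
# For testing only; production certificates should come from your own CA.
# -addext needs OpenSSL 1.1.1 or later.
make_san_cert() {
  local host=$1 outdir=${2:-.}
  openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -keyout "$outdir/$host.key" -out "$outdir/$host.crt" \
    -subj "/CN=$host" -addext "subjectAltName=DNS:$host" 2>/dev/null
}

# Succeed only if the certificate file contains a SAN extension.
has_san() {
  openssl x509 -in "$1" -noout -text | grep -q 'X509v3 Subject Alternative Name'
}
```

To inspect a certificate already served by a device, you can pipe openssl s_client -connect HOST:PORT </dev/null into openssl x509 -noout -text and look for the same extension.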
Modules support
SMS Gateway is not available in this release. If you rely on this method, please contact ITRS Support.
- Collectors and clusters:
- Despite the UI or API currently allowing it, you should not set parent or child relationships between the collectors themselves in any monitoring cluster; collectors have no dependency on each other and are considered equals.
- When investigating a host, an Opsview Web Exception error with the message Caught exception in Opsview may indicate that the Cluster monitoring for that host has failed and needs your attention.
Databases
- All database users created by Opsview use the mysql_native_password authentication plugin (for MySQL 8, the default is usually caching_sha2_password).
- The nightly backups of the Opsview and runtime databases are now based on the MySQL server's preferred format, rather than a mysql40 compatible mode.
- When using utf8mb4, the collation difference from latin1 means some rows may come back in a slightly different order (for example, with latin1, check_snmp_weblogic_jmsqueuelength sorts before check_snmp_weblogic_jsm_dests, but with utf8mb4 the order is reversed).
- Running mysqldump against an external database without --set-gtid-purged=OFF can lead to dump failures with the error:
Couldn't execute 'FLUSH TABLES': Access denied; you need (at least one of) the RELOAD or FLUSH_TABLES privilege(s) for this operation
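One way to make sure the option is never forgotten is a small wrapper such as the bash sketch below. dump_external_db is a hypothetical name; everything after it, such as host, credentials and database names, is passed to mysqldump unchanged.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: always add --set-gtid-purged=OFF when dumping from an
# external MySQL server, avoiding the FLUSH TABLES failure described above.
# All other arguments are passed straight through to mysqldump.
dump_external_db() {
  mysqldump --set-gtid-purged=OFF "$@"
}
```

For example, dump_external_db -h db.example.com -u backup -p opsview > opsview.sql, where the host, user and database are placeholders for your own values.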
MySQL RPM Repository Key
The MySQL RPM Repository Key stored within the product has expired. This has been fixed in a later version of Opsview Monitor, but it can be amended locally without upgrading.
For APT-based systems, edit /opt/opsview/deploy/lib/roles/opsview_database/vars/apt.yml on the Orchestrator, search for the repo_key_id line and amend it as follows:
mysql:
  ...
  repo_key_id: 3A79BD29
For RPM-based systems, edit /opt/opsview/deploy/lib/roles/opsview_database/vars/yum.yml on the Orchestrator, search for the gpgkey line and amend it as follows:
mysql:
  ...
  gpgkey: http://repo.mysql.com/RPM-GPG-KEY-mysql-2022
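These edits can also be scripted. The bash sketch below rewrites a single key: value line in place while keeping its indentation and a .bak copy of the original. set_yaml_value is a hypothetical helper; it assumes the file contains exactly one line for the key being changed (it would rewrite every matching line otherwise) and relies on GNU sed's -i option.

```bash
#!/usr/bin/env bash
# Hypothetical helper: replace the value of "key: value" lines in a YAML vars
# file, keeping indentation. Assumes the key occurs once and the new value
# contains no "|" or "&" (both are special in the sed replacement below).
set_yaml_value() {
  local file=$1 key=$2 value=$3
  cp -p "$file" "$file.bak"
  sed -i -E "s|^([[:space:]]*${key}:).*|\1 ${value}|" "$file"
}
```

For example, on the Orchestrator: set_yaml_value /opt/opsview/deploy/lib/roles/opsview_database/vars/apt.yml repo_key_id 3A79BD29 on APT-based systems, or set_yaml_value /opt/opsview/deploy/lib/roles/opsview_database/vars/yum.yml gpgkey http://repo.mysql.com/RPM-GPG-KEY-mysql-2022 on RPM-based systems.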
REST API
REST API config/OBJECT list calls: the ordering of results when using MySQL 8 is not necessarily deterministic, so REST API calls may need to specify a subsort field. For example, for hosts, order=hostgroup.name is not sufficiently deterministic and will need to be order=hostgroup.name,id so that the results come back in a fixed order.
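As an illustration of the subsort, the bash sketch below builds a config/host list URL ordered by hostgroup.name,id and fetches it with curl. OPSVIEW_URL, OPSVIEW_USER and OPSVIEW_TOKEN are placeholder variables you would set yourself, the function names are hypothetical, and the X-Opsview-Username and X-Opsview-Token headers are assumed to match your REST API authentication, so check them against the REST API documentation before use.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a deterministic REST API config list call.
# OPSVIEW_URL, OPSVIEW_USER and OPSVIEW_TOKEN are placeholders you must set.

# Build e.g. https://opsview.example.com/rest/config/host?order=hostgroup.name,id
config_list_url() {
  local base=${1%/} object=$2 order=$3
  printf '%s/rest/config/%s?order=%s\n' "$base" "$object" "$order"
}

# Fetch hosts sorted by host group name, then by id as a tie-breaker,
# so that results come back in a fixed order on MySQL 8.
list_hosts_sorted() {
  curl -fsS \
    -H "X-Opsview-Username: ${OPSVIEW_USER:?set OPSVIEW_USER}" \
    -H "X-Opsview-Token: ${OPSVIEW_TOKEN:?set OPSVIEW_TOKEN}" \
    "$(config_list_url "${OPSVIEW_URL:?set OPSVIEW_URL}" host 'hostgroup.name,id')"
}
```

The id field works as a tie-breaker because it is unique per object, so two hosts in the same host group can no longer swap places between calls.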
Other issues
- If restoring from the Audit Log incorrectly returns the message Restore failure: A restore is already in progress, a prior restore attempt may have failed to complete. Issues found are logged in /opt/opsview/coreutils/var/log/db_restore.log. To force the system to allow a new restore attempt, delete the file /opt/opsview/coreutils/var/restore_in_progress.lock.
- There is no option to set a new home page via the UI yet. For new installations, the home page is set in the Configuration > Navigator page.
- Start and End Notifications for flapping states are not implemented in this release (when a Host or Service are flapping all notifications will be suppressed).
- Deploy cannot be used to update the database root password. Change the root user password manually and update the /opt/opsview/deploy/etc/user_secrets.yml file with the correct password.
- When a Host has been configured with two or more parents and all of them are DOWN, the Status of the Service Checks on the host is set to CRITICAL instead of UNKNOWN. Consequently, the Status Information is not accurate either.
- If an Opsview Monitor system is configured to have UDP logging enabled in rsyslog, RabbitMQ will log INFO level messages to opsview.log and syslog at a high frequency (approximately one message every 20 seconds).
- Some components, such as opsview-web and opsview-executor, can log credential information when in Debug mode.
- When running an Autodiscovery Scan via a cluster for the first time, at least one host must already be monitored by that cluster. Otherwise, the scan may fail with this message:
Cannot start scan because monitoring server is deactivated
- You may get occasional errors appearing in syslog, such as:
Nov 28 16:31:50 production.opsview.com opsview-datastore[<0.6301.0>] req_err(2525593956) unknown_error : normal#012 [<<"chttpd:catch_error/3 L353">>,<<"chttpd:handle_req_after_auth/2 L319">>,<<"chttpd:process_request/1 L300">>, <<"chttpd:handle_request_int/1 L240">>,<<"mochiweb_http:headers/6 L124">>,<<"proc_lib:init_p_do_apply/3 L247">>]
You can ignore these errors, as they have no operational impact.
- To get SNMP Traps working in a hardened environment, change the following settings:
# Add the following lines to /etc/hosts.allow
snmpd:ALL
snmptrapd:ALL
# Add the following lines to /etc/hosts.deny
snmpd: ALL: allow
snmptrapd: ALL: allow
- Using Delete All on the SNMP Traps Exceptions page may sometimes hide new exceptions as they come in. They can be viewed again by changing the page size at the bottom of the window to a different number.
- CPU utilization is sometimes high due to the datastore.
Autodiscovery
- If an Infrastructure Agent is detected by Autodiscovery and imported but has TLS disabled, Service Checks will fail to run against the Agent, because the check_nrpe command needs the -n flag to run in non-TLS mode. To fix this, add the -n flag to the NRPE_EXTRA_FLAGS variable on the affected hosts.
- Autodiscovery cannot detect Infrastructure Agents that use custom certificates, so you cannot use Autodiscovery to set up monitoring for these agents automatically. Instead, set up monitoring manually: copy the custom certificates to the monitoring servers and set the NRPE_CERTIFICATES variable on the affected hosts to the paths of the correct certificates.
AutoMonitor
- When an AutoMonitor Windows Express Scan is set with a wrong, but still reachable, Active Directory Server IP or FQDN, the scan could remain in a pending state until it times out (the default timeout is 1 hour). During that time, no other scans can run on the same cluster. This is due to PowerShell not timing out correctly.
- Automonitor automatically creates the Host Groups used for the scan: Opsview > Automonitor > Windows Express Scan > Domain. If any of these Host Groups already exist elsewhere in Opsview Monitor, the scan will fail. If one of these Host Groups is moved, rename it to avoid this problem.
- If you have renamed your top-level Opsview Host Group, the Automonitor scan will currently fail. You will need to rename it back, or create a new Opsview Host Group, for the scan to succeed.
- The Automonitor application clears local storage on logout. If a user logs out while a scan is in progress, they will not see that scan's progress when they log back in, even if it is still running in the background.
- Any services already in dependency failure before upgrading to this release will not return to their previous state when leaving dependency failure, since that state will not have been saved. They will remain down until the next check occurs, as per the existing behaviour. However, any services that go into dependency failure after the upgrade has completed will follow the new recovery behaviour, as documented in Important Concepts.
Opspacks
- Due to changes made to the Windows Active Directory Opspack, Windows hosts must now have PowerShell version 5.0 or higher.
- Due to the same Active Directory Opspack changes, setup-opsview.yml must be re-run to import the new Opspack plugin changes.
- A reload must also be carried out afterwards to propagate the argument changes through to the collection plan for the Schedulers.
- Windows Active Directory Opspack checks may increase CPU usage on the target Windows servers.
- Windows WMI - Base Agentless - LAN Status Servicecheck: the byte send and byte receive rates reported for network adapters are around 8 times lower than expected. As a workaround, adjust the warning and critical thresholds accordingly.
- Cloud - AWS related Opspacks: the directory /opt/opsview/monitoringscripts/etc/plugins/cloud-aws, which is the default location for the aws_credentials.cfg file, is not created automatically by Opsview, so it must be created manually.
- If opsview_tls_enabled is set to false, the Cache Manager component used by the Application - Kubernetes and OS - VMware vSphere Opspacks will not work correctly in distributed environments.
- Hardware - Cisco UCS: if this Opspack is migrated from an Opsview 5.x system, it may produce the error Error while trying to read configuration file or File "./check_cisco_ucs_nagios", line 25, in <module> from UcsSdk import * ImportError: No module named UcsSdk.
- If this is seen, place the config file cisco_ucs_nagios.cfg into the plugins path /opt/opsview/monitoringscripts/plugins/ and run the following to resolve the issue:
# as root
wget https://community.cisco.com/kxiwq67737/attachments/kxiwq67737/4354j-docs-cisco-dev-ucs-integ/862/1/UcsSdk-0.8.3.tar.gz
tar zxfv UcsSdk-0.8.3.tar.gz
cd UcsSdk-0.8.3
sudo python setup.py install
- Opsview - Login is critical on a rehomed system. Resolve this by adding an exception to the Servicecheck on the Host, specifying /opsview/login as the destination rather than /login.
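The AWS directory mentioned in the list above can be created with a few commands. In the bash sketch below, make_aws_cfg_dir is a hypothetical helper; the 0750 mode and the opsview:opsview owner are assumptions about how plugins run on your systems, and ownership is only changed when the script runs as root on a system that has an opsview user.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the cloud-aws plugin config directory, which
# Opsview does not create automatically. Mode 0750 and the opsview:opsview
# owner are assumptions; adjust them to match your installation.
make_aws_cfg_dir() {
  local dir=${1:-/opt/opsview/monitoringscripts/etc/plugins/cloud-aws}
  mkdir -p "$dir"
  chmod 0750 "$dir"
  # Only change ownership when running as root and the opsview user exists.
  # If ownership is left unchanged, check that the plugin user can read it.
  if [ "$(id -u)" -eq 0 ] && id opsview >/dev/null 2>&1; then
    chown opsview:opsview "$dir"
  fi
}
```

Run it as root on each system that executes the AWS service checks (an assumption about where the checks run in your deployment), then place aws_credentials.cfg in the new directory.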
Character support
- Some characters may not display correctly. For more information about the current limitations, see Supported Unicode Characters.
- While 4-byte UTF-8 characters (outside the Basic Multilingual Plane) are correctly backed up during daily backups or after an Apply Changes, they may be corrupted when a database backup is restored. Although they will appear as ????, these corrupted characters can be fixed by manually updating the database. If necessary, an unzipped backup file can be used to obtain the original characters from before the restore. See Recovering from a Database Backup for further details.
- Opsview Reporting Module: Report Chart legends will not display 4-byte UTF-8 characters correctly, and 4-byte UTF-8 characters are not supported in report names.
- Email notifications: emails sent by Opsview that contain non-ASCII UTF-8 characters may be blocked by mail relays that do not advertise UTF-8 support, with an error similar to status=bounced (SMTPUTF8 is required, but was not offered by host mail.example.com[10.10.10.10]). Additionally, local mail subject lines will not display 4-byte UTF-8 characters correctly on CentOS 7, OEL 7, and RHEL 7.
- Netflow Dashlets and Autodiscovery: if international domain names are picked up and resolved, they may appear corrupted. This means that names in the Autodiscovery sandbox could contain corrupted characters. However, these can be updated manually before importing into Opsview.
- Plugins, event handlers, and notification scripts that contain 4-byte UTF-8 characters do not display correctly in the filesystem but work properly in Opsview.
- Syslog messages may display some Unicode characters as UTF-8 bytes or Unicode code points (for example, \x{1F649}).
- Some Unicode characters that are particularly tall may be cut off at the top and bottom in some fields.
- Results Exporter: Regex filtering does not support Unicode categories (but code points still work), and file outputs do not export Unicode characters correctly.
- If you have multi-service checks where the generated service check name (with the host variable) has a URL encoded filename exceeding 255 bytes, the performance data stored in RRD from this service check will not be recognized or visible in graphs.
- Exporting as CSV (for example, Events Viewer) will include UTF-8 characters, but this may not import into Excel correctly.
- When defining collector names in opsview_deploy.yml, only ASCII characters can be used. However, collectors can be renamed within the Opsview UI when registering them to clusters.
- When Hosts are created with names containing 4-byte UTF-8 characters, the Host name uniqueness checks may not work correctly and could consider different names as duplicates.
- When Hosts are created with names containing 4-byte UTF-8 characters, the Navigator and Host Group configuration pages may not display correctly if viewed in the Firefox browser.
SNMP Traps
SNMPTraps daemons are started on all nodes within a cluster. At start-up, a master SNMP trap node is selected, and it is the only node in the cluster that receives and processes traps. Other nodes silently drop traps.
Most SNMP trap-sending devices can send traps to at most two different destinations.
The current fix is to manually pick two nodes in a given cluster to act as the master and standby SNMP trap nodes, then mark all other nodes within the cluster to not have the trap daemons installed, for example:
collector_clusters:
  Trap Cluster:
    collector_hosts:
      traptest-col01: { ip: 192.168.18.53, ssh_user: centos }
      traptest-col02: { ip: 192.168.18.157, ssh_user: centos }
      traptest-col03: { ip: 192.168.18.155, ssh_user: centos, vars: { opsview_collector_enable_snmp: False } }
      traptest-col04: { ip: 192.168.18.61, ssh_user: centos, vars: { opsview_collector_enable_snmp: False } }
      traptest-col05:
        ip: 192.168.18.61
        ssh_user: centos
        vars:
          opsview_collector_enable_snmp: False
On a fresh installation, the trap daemons will not be installed on the marked nodes.
On an existing installation, the trap packages must be removed from the inactive nodes, and the trap daemons on the two active nodes restarted to re-elect the master trap node:
# INACTIVE NODES:
CentOS/RHEL: yum remove opsview-snmptraps-base opsview-snmptraps-collector
Ubuntu/Debian: apt-get remove opsview-snmptraps-base opsview-snmptraps-collector
# ACTIVE NODES:
/opt/opsview/watchdog/bin/opsview-monit restart opsview-snmptrapscollector
/opt/opsview/watchdog/bin/opsview-monit restart opsview-snmptraps
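Where nodes run different distributions, the removal and restart steps above can be wrapped in a small bash sketch. The function names are hypothetical, the mapping from the /etc/os-release ID to yum or apt-get is an assumption that covers only the distributions this page mentions, and -y is added so that the removal runs non-interactively.

```bash
#!/usr/bin/env bash
# Package names and watchdog commands are taken from the steps above.
TRAP_PKGS=(opsview-snmptraps-base opsview-snmptraps-collector)
MONIT=/opt/opsview/watchdog/bin/opsview-monit

# Map an /etc/os-release ID to its package manager (assumed mapping).
pkg_manager_for() {
  case "$1" in
    centos|rhel|ol) echo yum ;;
    ubuntu|debian) echo apt-get ;;
    *) return 1 ;;
  esac
}

# INACTIVE nodes: remove the trap packages (run as root).
remove_trap_packages() {
  local id mgr
  id=$(. /etc/os-release && echo "$ID")
  mgr=$(pkg_manager_for "$id") || { echo "unsupported OS: $id" >&2; return 1; }
  "$mgr" remove -y "${TRAP_PKGS[@]}"
}

# ACTIVE nodes: restart the trap daemons to re-elect the master trap node.
restart_trap_daemons() {
  "$MONIT" restart opsview-snmptrapscollector &&
    "$MONIT" restart opsview-snmptraps
}
```

Run remove_trap_packages on every node marked with opsview_collector_enable_snmp: False, then restart_trap_daemons on the two active nodes.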
Opsview Reporting Module
On upgrade to the latest version of the Reporting Module, email settings will need to be reapplied.
Email configuration can be found in the file /opt/opsview/jasper/apache-tomcat/webapps/jasperserver/WEB-INF/js.quartz.properties.
To configure email, edit the following lines in the configuration to match your requirements.
Example configuration for internal email:
report.scheduler.mail.sender.host=localhost
report.scheduler.mail.sender.username=admin
report.scheduler.mail.sender.password=password
report.scheduler.mail.sender.from=admin@localhost
report.scheduler.mail.sender.protocol=smtp
report.scheduler.mail.sender.port=25
Example configuration for an SMTP relay:
report.scheduler.mail.sender.host=mail.example.com
To apply changes, you will need to restart opsview-reportingmodule:
/opt/opsview/watchdog/bin/opsview-monit restart opsview-reportingmodule
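Because these settings must be reapplied after each Reporting Module upgrade, you may want to script them. The bash sketch below sets a key=value property in js.quartz.properties, replacing the existing line or appending one if it is missing. set_prop is a hypothetical helper; it assumes the file uses key=value lines without spaces around the equals sign, as in the examples above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a Java .properties file, replacing an
# existing "key=" line or appending one. Assumes no spaces around "=".
# Note: awk -v interprets backslash escapes in the value.
set_prop() {
  local file=$1 key=$2 value=$3 tmp
  tmp=$(mktemp)
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" &&
    cat "$tmp" > "$file"   # write back in place to keep owner and mode
  rm -f "$tmp"
}
```

For example, set_prop /opt/opsview/jasper/apache-tomcat/webapps/jasperserver/WEB-INF/js.quartz.properties report.scheduler.mail.sender.host mail.example.com, followed by the opsview-monit restart of opsview-reportingmodule.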
- When accessing any URL under /jasperserver on an Opsview system without a valid session, a 401 response is returned rather than a redirect to the login page. In this case, users must navigate to the login page manually and log in again.
- Running reports within the Jaspersoft Studio IDE when connected to the Opsview Reporting Module currently results in a 401 error. See Running Reports in Jaspersoft Studio for alternatives.
- Using Jaspersoft Studio when connected to the Opsview Reporting Module can quickly use up available sessions. See Session Manager Config for Jaspersoft Studio for mitigation details.
Apache Log4Shell Vulnerabilities
All listed Log4j vulnerabilities are resolved in Opsview.