Planning your system
In this section, we discuss several design considerations to aid in planning your Opsview Monitor system, such as scalability, resilience, disk partitioning, and security.
Later in this section, we discuss how Opsview Monitor uses databases.
Achieving scalability
When deploying your Opsview Monitor server, bear in mind the variables that affect your system and how many Hosts it can actually monitor. In particular, be mindful of the following factors:
- Number of Service Checks per Host.
- Median interval for Service Checks.
- Type of checks being executed, that is, quality of plugin code, local execution vs. agent queries and so on.
- Network latency.
We typically recommend 300 Hosts as a comfortably manageable limit for a single Opsview Monitor server. This recommendation rests on the following assumptions, which together result in approximately ten Service Checks per second being executed by the monitoring server:
- 10 Service Checks per Host (average).
- Five minute interval per Service Check (average).
- The majority of Service Checks are made against a remote agent; for example, Nagios® Remote Plugin Executor (NRPE) or Simple Network Management Protocol (SNMP).
- The majority of monitored Hosts are on the same Local Area Network (LAN).
- The server is a modest physical or virtual machine with 4-8 CPU cores and 16 GB RAM.
With appropriate tuning and use of better hardware, however, a single server can typically be made to scale well beyond 300 hosts.
Opsview Monitor Distributed Monitoring architecture can also be used to monitor a much larger number of Hosts.
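The relationship between the assumptions above and the resulting check rate is simple arithmetic. The following is a minimal Python sketch, not part of Opsview Monitor; the function name `checks_per_second` is our own and is used only for illustration.

```python
def checks_per_second(hosts: int, checks_per_host: float, interval_secs: float) -> float:
    """Average number of Service Checks the server must execute each second."""
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    return hosts * checks_per_host / interval_secs


# 300 Hosts, 10 Service Checks each, five-minute (300 second) interval.
print(checks_per_second(300, 10, 300))  # 10.0
```

Changing any one input (more checks per Host, a shorter interval) shows how quickly the rate moves away from the ten checks per second assumed above.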
Service checks
When designing a system, one of the most important metrics to consider is ‘service checks per second’, which is a function of both the total number of checks configured and the interval between those checks. Generally, we recommend no more than around 20 service checks per second on a single machine. For example, around 2000 hosts with ten checks per host on a five-minute interval clearly exceeds our recommended checks per second.
An example configuration, which exceeds our recommended checks per second:
2000 (hosts) * 10 (service checks) / 300 (seconds) = 66 (service checks per second)
To bring the rate down to within our recommended guidelines, we would need to attach three collectors to the master server, which gives a rate of 22 checks per second per collector. Moreover, if each CPU core of the collector systems handles a separate worker thread, we can further divide the checks per second (66) by the number of cores the collector servers possess. For example, with 2 x dual-core CPUs in our collector servers, this reduces the number of checks per second for each core to 16.5, as we show below:
Utilising CPU cores can further reduce checks per second:
66 (service checks per second) / 4 (number of cores) = 16.5
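The divisions above can be reproduced with a short Python sketch. The helper name `split_rate` is hypothetical; the figures (2000 hosts, three collectors, four cores) come from the example in this section. Because the sketch uses exact division rather than the rounded-down value of 66, its results are slightly higher than the figures in the text.

```python
def split_rate(total_rate: float, parts: int) -> float:
    """Divide a check rate evenly across collectors or CPU cores."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return total_rate / parts


total = 2000 * 10 / 300               # total service checks per second
per_collector = split_rate(total, 3)  # three collectors
per_core = split_rate(total, 4)       # 2 x dual-core CPUs = four cores

print(f"{total:.1f} {per_collector:.1f} {per_core:.1f}")  # 66.7 22.2 16.7
```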
Achieving resilience
The Opsview Monitor distributed architecture combines scalability and resilience; resilience can be enhanced further by ‘doubling’ the components that comprise your system, as in the following list:
- Master server (active)
- Master server (standby)
- Database server #1: ‘opsview’ and ‘runtime’, replica of ‘odw’
- Database server #2: ‘odw’, replica of ‘opsview’ and ‘runtime’
- Collector clusters:
  - Collector cluster #1
  - Collector cluster #2
  - Collector cluster #3
  - Collector cluster #4
  - Collector cluster #5
When assessing a collector cluster, you should allow for the possibility of at least one node failure. If a collector cluster is nearing capacity, then the failure of one node may cause the other nodes to exceed capacity. Also, there should always be an odd number of nodes within a collector cluster (1, 3, 5, and so on). This helps with resiliency and avoids split-brain issues when clustering the components on the servers.
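The two rules above (headroom for one node failure, and an odd node count) can be expressed as a quick sanity check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the load is assumed to spread evenly across surviving nodes, and the default limit of 20 checks per second per node is borrowed from the single-machine guideline earlier in this section rather than from any collector-specific figure.

```python
def survives_node_failure(cluster_rate: float, nodes: int, per_node_limit: float = 20.0) -> bool:
    """Return True if the remaining nodes stay within the limit after one node fails."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    remaining = nodes - 1
    if remaining == 0:
        return False  # a single-node cluster has no spare capacity
    return cluster_rate / remaining <= per_node_limit


def has_odd_node_count(nodes: int) -> bool:
    """Odd node counts help avoid split-brain situations."""
    return nodes % 2 == 1


print(survives_node_failure(50, 3))  # False: 25 checks/s on each of 2 remaining nodes
print(survives_node_failure(50, 5))  # True: 12.5 checks/s on each of 4 remaining nodes
```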
Note
It is not possible to run two Opsview Monitor master servers in active/active configuration, only active/passive. Running a second master node in either High Availability or Disaster Recovery configuration requires an HA or DR subscription from Opsview.
Disk partitioning
In this section, we detail the sizes of the disk partitions that are needed to operate the Opsview Monitor software and its software dependencies.
Opsview system
In general, one large root partition is sufficient, although we recommend the root and /var directories have at least 1 GB of disk space available. In the following list, we provide further information about other areas and their recommended sizes.
root
: We recommend at least 2 GB is set aside for the operating system, allowing for any upgrades and so on.
/boot
: We recommend a separate boot partition of at least 256 MB.
/opt/opsview
: All Opsview Monitor software and runtime data are stored in this location. We recommend a minimum of 10 GB.
Temporary directory
Opsview Monitor uses a temporary directory (/tmp by default) when running opsview-web and other related applications. You can set a system level environment variable, namely TMPDIR=/, if you wish to use an alternate area.
Database system
The database can be located either on the master or on a separate server. In both cases, the Opsview Monitor database and backups are located in the /var directory; our recommended size is given below.
/var
: We recommend that this directory has more than 100 GB available when used in conjunction with the ODW; if ODW is not used, then 50 GB is sufficient for small and medium-sized systems.
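The sizing recommendations above can be checked on an existing server with the Python standard library. This is a rough sketch, not an Opsview tool: the function name `undersized_paths` is hypothetical, and it treats the recommended figures as free-space thresholds, which is an assumption on our part. The demonstration uses stubbed values so that it runs anywhere.

```python
import shutil

GIB = 1024 ** 3

# Recommended minimums from this section, in bytes.
# The /var figure assumes ODW is not in use; use 100 GiB with ODW.
RECOMMENDED_FREE = {
    "/opt/opsview": 10 * GIB,
    "/var": 50 * GIB,
}


def undersized_paths(recommended, free_bytes=lambda path: shutil.disk_usage(path).free):
    """Return the paths whose free space is below the recommended minimum."""
    return [path for path, minimum in recommended.items() if free_bytes(path) < minimum]


# Stubbed free-space figures for illustration only.
fake_free = {"/opt/opsview": 4 * GIB, "/var": 120 * GIB}.get
print(undersized_paths(RECOMMENDED_FREE, fake_free))  # ['/opt/opsview']
```

Calling `undersized_paths(RECOMMENDED_FREE)` without the second argument queries the real filesystem, so the listed paths must exist.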
Backups
Opsview Monitor uses cron to run a configuration backup job at 23:00 every night. The purpose of these backups is to help you restore all configuration on a system after the required packages have been installed.
You will need to design your own backup strategies for long-term archival of data.
Invoking a backup
You can invoke an ad hoc backup by running the following as the opsview user:
/opt/opsview/bin/backup_configs
This backs up all files within the /opt/opsview/*/etc/ directories.
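For long-term archival, one possible approach is to bundle the same configuration directories into a dated archive that you store elsewhere. The following Python sketch is independent of the built-in backup job and does not replace it; the function name `archive_config_dirs` and the destination directory in the usage comment are hypothetical.

```python
import glob
import tarfile
import time
from pathlib import Path


def archive_config_dirs(pattern: str, dest_dir: str) -> Path:
    """Bundle every directory matching `pattern` into a dated .tar.gz under `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / f"opsview-etc-{time.strftime('%Y%m%d-%H%M%S')}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        for directory in sorted(glob.glob(pattern)):
            tar.add(directory)  # adds the directory and its contents recursively
    return archive


# Hypothetical usage; /srv/archive is an assumed destination:
# archive_config_dirs("/opt/opsview/*/etc/", "/srv/archive")
```

The archive should be written to a location outside /opt/opsview so that it survives the loss of the monitored filesystem.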
Full Opsview offline backup
You can do a full offline backup by taking the following steps:
- Shut down Opsview Monitor on the Orchestrator.
- Shut down Opsview Monitor on all Collectors.
- Back up the /opt/opsview filesystem on all Opsview Monitor servers.
- Stop MySQL on the database server.
- Back up the MySQL data files.
This backs up all Opsview Monitor data so that you can restore to this point in time.
Security
In this section, we highlight several security aspects of the Opsview Monitor system.
Web authentication
Opsview Monitor’s web authentication uses the Opsview Session Manager to authenticate and manage sessions.
You can configure the amount of time that a web session can run before it expires. This is controlled by these two variables in user_vars.yml:
opsview_web_session_timeout_secs: 86400 # Timeout for the web front end, default 24 hours
opsview_rest_api_session_timeout_secs: 3600 # Timeout for the REST API token, default 1 hour
Then run as root:
/opt/opsview/deploy/bin/opsview-deploy /opt/opsview/deploy/lib/playbooks/setup-everything.yml
For a more secure environment, set:
opsview_web_session_timeout_secs: 1800 # Web browser with 30 minute timeout
opsview_rest_api_session_timeout_secs: 1800 # REST API should have the same value
Web exceptions
We recommend you leave the include_error_detail value at the default, 0, as enabling this option can cause environmental information to be sent to users when a web application exception occurs.
Network
Your Opsview Monitor server should be placed in a secure location. If your server is accessible through a public network, we recommend using a firewall to restrict access to various ports; see Ports.
Whitelisting
You may need to whitelist the following URLs for Opsview purposes:
https://downloads.opsview.com/
https://deploy.opsview.com/
https://opsview-repository.s3-eu-west-1.amazonaws.com
Proxy
Ensure you do not have any proxies set up in your user environments that could affect the opsview user. Proxies should be configured just within your package management software (such as /etc/yum.conf on RHEL/OL or /etc/apt/apt.conf.d/proxy on Debian/Ubuntu).
Agents
Infrastructure Agents are applications that run on a host to be monitored and return status or performance metrics when requested. Agents can be contacted by the master or collector system (or clients) using certificates and ciphers to encrypt communication. Infrastructure Agents only accept strong ciphers, such as ADH-128 and ADH-256 (where the OS provides a suitable version of OpenSSL).
For additional security, we recommend using firewall rules to restrict which servers can connect to the agent. You can also use the allowed_hosts variable in the agent configuration to limit connections to only the monitoring servers.
Infrastructure Agents also support Secure Sockets Layer (SSL) certificates. See Agent Security for more details.
Security Wallet
Opsview Monitor has a feature called Security Wallet, which allows you to avoid storing clear text passwords for external systems in the filesystem and database. The user interface will not display any stored passwords and it will not be possible to retrieve any passwords once they have been set.
The master key file is stored in /opt/opsview/coreutils/etc/sw.key and is randomly generated on installation.
If this key file is lost, then all passwords will need to be re-entered.
Where possible, plugins have been updated to hide any sensitive arguments in their command line.
Note
If you have any passwords stored in the Audit Log or any log files before an upgrade, these will not be altered. However, no passwords will be added in any new audit log entries.
Variables
Opsview Monitor allows specific arguments to be marked as encrypted. When this has been chosen, a message will appear to confirm that this is what you want to do.
When you save, the default arg value will be encrypted for the Variable object and all related Host attributes will be encrypted as well. If you decide to mark the arg as unencrypted, then the argument will be cleared from this attribute and all related Host attributes. There is no way in the UI to recover these arguments.
On a new install of Opsview Monitor, we will set the Password argument of the following attributes to be encrypted:
- MSSQLCREDENTIALS
- MYSQLCREDENTIALS
- ORACREDENTIALS
- VMWAREGUESTCREDENTIALS
- VMWAREHOSTCREDENTIALS
- WINCREDENTIALS
On an upgrade, none of the existing Variable configuration will be encrypted. We recommend you manually convert the above Variable args to be encrypted.
Importing new Monitoring plugins or Opspacks
For maximum security, we recommend you disable the uploading of Monitoring Plugins and Opspacks (the default for new installations) - see Advanced Automated Installation.
You can still import Opspacks through the command line, but importing Monitoring Plugins using the command line is no longer supported.
System setting: Time zone
You can set the Linux/Unix server to any time zone; however, we recommend you set the time zone to be UTC. Opsview Monitor will show the time in the browser based on the browser’s time zone.
All data that is stored in files and databases will be time stamped in UTC format for consistency.
If you have changed the time zone of your Linux/Unix server, you will need to restart your system so that all services are aware of the update.
Users and Groups
The Opsview Monitor installation will create the opsview system user and group if it does not already exist.
If you use an external provider for authentication, the opsview user and group should be configured as a local user to remove the dependency on your external authentication provider.
The Opsview Monitor installation will update the .profile (or .bash_profile) in the user’s home directory to source /opt/opsview/coreutils/bin/profile to set several Opsview Monitor environment variables.
Databases
Opsview Monitor uses four databases, as described in the table below:
Database | Description |
---|---|
opsview | Monitoring configuration and access control. |
runtime | Status data and short-term history. |
odw | Long-term retention of data. |
dashboard | State information about the Dashboard application. |
Both the opsview and runtime databases must be on the same server, whereas the odw and dashboard databases can be located on separate servers. In fact, some performance improvements can be achieved by locating the odw and dashboard databases on separate servers. For more information see section Databases on a Different Server.
New installations of Opsview Monitor use randomly generated passwords to connect to the databases. These passwords are stored encrypted, so you cannot recover them to connect to the databases yourself.
If you need to connect to the databases to run your own queries, we recommend that you create your own accounts for this purpose. Also, you can restrict the amount of information that would be available to this account, by limiting the tables that can be queried.
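As an illustration of limiting an account to specific tables, the sketch below generates MySQL `GRANT SELECT` statements for review before you run them. It is a minimal example under stated assumptions: the function name, the account name `reporting`, the host address, and the table name `example_table` are all hypothetical, and the account is assumed to exist already (for example, created with `CREATE USER`).

```python
import re
from typing import List, Sequence

_NAME = re.compile(r"[A-Za-z0-9_]+")
_HOST = re.compile(r"[A-Za-z0-9_.%-]+")


def read_only_grants(user: str, host: str, database: str, tables: Sequence[str]) -> List[str]:
    """Build GRANT SELECT statements limited to the listed tables."""
    for name in (user, database, *tables):
        if not _NAME.fullmatch(name):
            raise ValueError(f"unexpected characters in identifier: {name!r}")
    if not _HOST.fullmatch(host):
        raise ValueError(f"unexpected characters in host: {host!r}")
    return [
        f"GRANT SELECT ON `{database}`.`{table}` TO '{user}'@'{host}';"
        for table in tables
    ]


# Hypothetical account, host and table names.
for statement in read_only_grants("reporting", "10.0.0.5", "odw", ["example_table"]):
    print(statement)
```

Validating the names before building the statements keeps stray quotes or backticks out of the generated SQL.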
Database storage
Supported databases
For a list of supported databases in Opsview, see Supported databases.
Performance tuning
We recommend that you install MySQL before installing the Opsview Monitor software, since you can then ‘tune’ your MySQL database server before Opsview Monitor creates its necessary databases.
Note
Make a note of your MySQL root password, as you will be prompted for this during the installation process.
Add these entries in the mysqld section of the my.cnf file:
innodb_file_per_table=1
innodb_flush_log_at_trx_commit=2
Of these options, innodb_flush_log_at_trx_commit=2 causes transactional changes to be flushed to disk approximately once per second, as opposed to on a ‘per transaction’ basis, while innodb_file_per_table=1 stores each table in its own data file; see InnoDB Startup Options and System Variables for more information. Finally, on a dedicated MySQL server, you should set innodb_buffer_pool_size as large as possible, leaving approximately 15% free memory for the operating system; see below:
innodb_buffer_pool_size=1G
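To turn the 15% guideline into a concrete setting for a dedicated database server, the arithmetic can be sketched as follows. The function name `buffer_pool_size_mb` and the 16 GB (16384 MB) example server are assumptions for illustration, not Opsview recommendations.

```python
def buffer_pool_size_mb(total_ram_mb: int, os_reserve_fraction: float = 0.15) -> int:
    """Suggest innodb_buffer_pool_size in MB, leaving a share of RAM for the OS."""
    if not 0 <= os_reserve_fraction < 1:
        raise ValueError("os_reserve_fraction must be in [0, 1)")
    return int(total_ram_mb * (1 - os_reserve_fraction))


# A dedicated database server with 16 GB (16384 MB) of RAM.
print(f"innodb_buffer_pool_size={buffer_pool_size_mb(16384)}M")  # innodb_buffer_pool_size=13926M
```

On a server that also runs Opsview Monitor or other services, a much smaller fraction of RAM is appropriate.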
MySQLReport is a useful tool to evaluate the performance of MySQL. To tune MySQL, edit these values in the mysqld section of /etc/mysql/my.cnf (OS dependent) and restart mysqld.
Good starting values for a small database server with 2-4 GB of memory are:
- table_cache = 768 (check tables opened/sec in mysqlreport)
- query_cache_size = 16M (this should not be any higher due to limitations in MySQL - see this post)
- key_buffer = 256M
- innodb_buffer_pool_size = 1024M
- innodb_file_per_table = 1
- innodb_flush_log_at_trx_commit = 2
- innodb_autoinc_lock_mode=1 # Required for replication with MySQL 5.1 or later
- max_allowed_packet = 16M
- binlog_format = 'MIXED' # when using binary logs in replication
- max_connections = 150
- tmp_table_size = 64M # To allow each connection to sort tables in memory. Maximum possible is max_connections x tmp_table_size
- max_heap_table_size = 64M # Set to the same as tmp_table_size
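The note on tmp_table_size in the list above implies a worst-case memory figure that is worth calculating before you raise either value. The following sketch simply applies that rule of thumb; the function name is hypothetical.

```python
def max_tmp_table_memory_mb(max_connections: int, tmp_table_size_mb: int) -> int:
    """Upper bound on memory used for in-memory temporary tables, in MB."""
    return max_connections * tmp_table_size_mb


# Starting values from the list above: 150 connections, 64 MB each.
print(max_tmp_table_memory_mb(150, 64))  # 9600
```

With the starting values this ceiling is larger than the 2-4 GB of memory on the example server. It would only be approached if every connection sorted a full-size temporary table at the same time, but it shows why these two settings should be raised with care.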
You can see the current values with mysqladmin variables. You may want to consider starting mysqld without name resolution (the skip-name-resolve option).
You can use the /opt/opsview/coreutils/installer/opsview_preupgrade_check mysql_variables script to see if there are any variables that need changing. The right values depend on the resources available on your server, so this acts only as a rough guideline.
Note
The recommendations for MySQL server variables depend on your system and what other services run on it, so you have to exercise judgement when changing your system. Make sure that you do not over-commit resources to MySQL because if it causes the server to go into swap space, this will reduce the performance of MySQL.
The crashed tables check may take a while to run. More information about the InnoDB parameters is available on the MySQL documentation site.
General hints for performance tuning
Check iostat -x 5. This gives I/O statistics per disk. You could have a low overall I/O wait time, but this could mask a single disk being used 100% of the time.
- For maximum I/O, you should stripe the disks so that all disks are being utilized.
- You should use separate disks for data files and index files - this improves read and write times.
- You should use a fast disk for redo logs.
Databases on a different server
In this section, we describe how to set up Opsview Monitor to use databases on a different server rather than running them exclusively on the Opsview Monitor master. This process involves an outage of the Opsview Monitor server during backup and restore. We recommend that you run a test restore beforehand to understand how long the process will take before committing to it.
Set up MySQL on a database server
The database server only requires MySQL to be installed, along with its dependencies (the infrastructure-agent may be installed if you are not using SNMP to monitor the server). You should ensure that MySQL is listening on the appropriate port and that other Opsview-specific configurations have been applied.
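One way to confirm from the Opsview Monitor master that MySQL is reachable on the expected port is a simple TCP connection test. This sketch uses only the Python standard library; the function name `port_is_open` and the host name in the usage comment are placeholders. A successful connection shows only that something is listening on the port, not that MySQL access controls are correct.

```python
import socket


def port_is_open(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# "db.example.com" is a placeholder for your database server.
# print(port_is_open("db.example.com"))
```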
Stop Opsview
You will need to stop Opsview Monitor to achieve a consistent snapshot of the database. The example below shows how to stop Opsview Monitor.
sudo /opt/opsview/watchdog/bin/opsview-monit stop all
Note
All commands should be run as the opsview user unless otherwise stated.
Backup your databases
Here, we assume that you take a full database export with mysqldump; however, you may, of course, use any other application to back up your MySQL databases. Be aware that if the new database server has a different architecture (that is, 32-bit versus 64-bit), then you will need to export your databases in this way.
Run the command below as root, making sure to include any extra databases you may have (for example, include jasperserver if it exists). This will create a full database export.
mysqldump -u root -p --default-character-set=utf8mb4 --add-drop-database --opt --databases opsview runtime odw dashboard | sed 's/character_set_client = utf8 /character_set_client = utf8mb4 /' | gzip -c > databases.sql.gz
Restore your databases
On your new server, restore the databases, as shown in the example below. You should also verify that the character set of the new databases is the same as that of the previous databases, since a mismatch may cause issues when upgrading.
gunzip -c databases.sql.gz | mysql -u root -p
Set up access controls
Locate the /opt/opsview/coreutils/etc/opsview.conf file on the Opsview Monitor master server and update the file with the contents, as shown in the example below:
#
# This file overrides variables from opsview.defaults
# This file will not be overwritten on upgrades
#
$dbhost = "localhost";
$dbport = "3306";
$dbpasswd_encrypted = "redacted";
$odw_dbhost = "localhost";
$odw_dbport = "3306";
$odw_dbpasswd_encrypted = "redacted";
$runtime_dbhost = "localhost";
$runtime_dbport = "3306";
$runtime_dbpasswd_encrypted = "redacted";
$dashboard_dbhost = "localhost";
$dashboard_dbport = "3306";
$dashboard_dbpasswd_encrypted = "redacted";
$authtkt_shared_secret_encrypted = "redacted";
1;
The values here are encrypted by default on new installations of Opsview Monitor.
You will need to replace all the redacted entries with encrypted strings; to generate the encrypted string for a plaintext password, you can use the following command:
$ /opt/opsview/coreutils/bin/opsview_crypt
Enter text to encrypt: ********
Encrypted value:
fe4940b982ee95eb881c8269fa1b227c08b84dd71e286c6050682715ea11d818
Copy this value and paste it within the quotation marks, for example:
$runtime_dbpasswd_encrypted = 'fe4940b982ee95eb881c8269fa1b227c08b84dd71e286c6050682715ea11d818';
Note
Both runtime ($runtime_dbhost) and opsview ($dbhost) need to be on the same host since this enables queries (joins) to be made across both databases.
Access control
You should also set up access controls on your new database server so that the Opsview Monitor master is allowed to connect. From the Opsview Monitor master, run the command shown below:
/opt/opsview/coreutils/bin/db_mysql -t > opsview_access.sql
The file opsview_access.sql now contains all the necessary access credentials, which should be transferred to the new database server and imported as follows:
mysql -u root -p < opsview_access.sql
For additional security, you may want to restrict access to the Opsview Monitor master only.
Restart Opsview Monitor
From the Opsview Monitor Primary Server, restart opsview-web and regenerate the configuration for Opsview Monitor; this updates third-party software applications with the new connection information:
/opt/opsview/coreutils/bin/rc.opsview gen_config
/opt/opsview/watchdog/bin/opsview-monit start opsview-web