Orchestrator
The Orchestrator creates and distributes collection plans to Collectors within Opsview (actually to a Scheduler component on the Collector). The plans contain configuration and state information which allows the Collectors to schedule and monitor host/service checks, execute event handlers and send out notifications.
- After running Apply Changes (previously called a Reload), individual collection plans are built and sent to each configured Collector within a Cluster. If a Cluster has no active Collectors, the Apply Changes will fail. Also included in these initial plans are configuration files for other components (e.g. SNMP traps, SNMP interfaces, Notification methods and Netflow); which files are sent depends on whether the feature is available within that Cluster.
- If a Collector restarts, the Orchestrator will re-send the collection plan to that Collector (with updated state information).
- If a Collector drops out from a Cluster, the Orchestrator will redistribute the hosts monitored by that Collector to the remaining Collectors within the Cluster.
- If a dropped Collector comes back online, the Orchestrator will redistribute the hosts again, while attempting to ensure that the hosts are back on their original Collectors.
The Orchestrator also supports an HTTP API to allow for various real-time actions initiated from the UI (e.g. recheck host/service or set state).
Dependencies
The Orchestrator requires access to the Database, MessageQueue, DataStore and LicenseManager. Please make sure these are installed, configured and running before you attempt to run the Orchestrator.
You will also need to ensure the `mysql` client binary is installed.
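A quick way to confirm the client is available is a generic shell check (not an Opsview-specific command):

# Prints the path to the mysql client, or a warning if it is missing.
command -v mysql || echo "mysql client not found - install it before running the Orchestrator"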
Installation
This component is deployed on the Master server when installing Opsview Monitor and must not be moved to another server.
Configuration
The user configuration options should be set in `/opt/opsview/orchestrator/etc/orchestrator.yaml`. Default values are shown in `/opt/opsview/orchestrator/etc/orchestrator.defaults.yaml`, but changes should not be made here since the file will get overwritten on package update.
The list below shows the options that can be set; an illustrative example follows the list. Note that any changes made to the component configuration file will be overwritten when opsview-deploy is next run.
- `master_database_connection`: The connection to the master database server.
- `opsview_database_name`: The name of the Opsview configuration database.
- `runtime_database_name`: The name of the Opsview runtime database.
- `bsm_queue`: The message-queue configuration to send BSM recalculation messages.
- `collector_queue`: The message-queue configuration to send collection plans and commands.
- `downtime_queue`: The message-queue configuration to send downtime requests.
- `resultsdispatcher_queue`: The message-queue configuration to send acknowledgements and set-state messages.
- `flow_request_queue`: The message-queue configuration to send flow request messages.
- `flow_response_queue`: The message-queue configuration to receive flow response messages.
- `orchestrator_queue`: The message-queue configuration to receive command messages.
- `snmp_trap_trace_queue`: The message-queue configuration to send SNMP trap messages.
- `command_queue`: The message-queue configuration to send command messages.
- `license_manager`: The connection to the license manager.
- `orchestrator_store`: The data-store configuration for configuration.
- `notifications_logs_store`: The data-store configuration for notification logs.
- `http_server`: The endpoint for the orchestrator API.
- `registry`: Connection configuration for the Registry.
- `snmp_mib_dirs`: A list of directories to search for MIBs.
- `logging`: Component logging configuration.
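As a minimal sketch, an override in `/opt/opsview/orchestrator/etc/orchestrator.yaml` might look like the following. The values shown are illustrative assumptions, not shipped defaults; real connection and queue settings come from your deployment, and the nesting follows the shape of the `service_check_defaults` example below:

orchestrator:
  # Illustrative values only - actual names, paths and connection
  # details depend on your deployment.
  opsview_database_name: opsview
  runtime_database_name: runtime
  snmp_mib_dirs:
    - /usr/share/snmp/mibs
  logging:
    level: INFO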
Additionally, the `service_check_defaults` block controls the initial state of a Service Check when first added into the configuration (the state before the first check is performed). The `service_check_defaults` configuration default is defined as follows:
orchestrator:
service_check_defaults:
state: 0
output: 'Service assumed OK - no results received'
Allowed states are:
State | Meaning |
---|---|
0 | OK |
1 | WARNING |
2 | CRITICAL |
3 | UNKNOWN |
Changing the `service_check_defaults` has the following limitations (an example override is shown after the list):
- This initial state is not shown in the history of this service as we only record from the first received result onwards.
- If you set a non-OK state, ODW statistics will still assume an initial OK state, so service availability statistics will not be accurate until the first result is received.
- If a service is deleted and then added back, the initial state will be the last known state of the service.
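For example, to make newly added Service Checks start as UNKNOWN instead of the assumed-OK default, you could override the block like this (a sketch using the keys shown above; the output text is an arbitrary illustration):

orchestrator:
  service_check_defaults:
    state: 3
    output: 'Service state unknown - no results received yet'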
Finally, for use with the Network Topology feature if included in your subscription, the following options are also available within the `network_topology` block:
- `rpc_timeout`: Timeout for Network Topology RPC requests, i.e. resolving host IP addresses or searching for neighbours.
- `verbose_rpc_timeout`: Timeout for long-running verbose Network Topology RPC requests.
- `cluster_batch_size`: The number of clusters to run Network Topology detection on concurrently.
- `requests_per_collector`: The number of Network Topology RPC requests to run on collectors within clusters concurrently.
- `tolerated_collector_failures`: The number of Network Topology RPC request failures to tolerate per collector before considering that collector unavailable.
- `host_request_retries`: The number of retries to attempt for failed Network Topology RPC requests.
- `cluster_retry_delay`: The delay to wait between retrying failed Network Topology detections on a particular cluster.
- `cluster_retries`: The number of times to retry failed Network Topology detections on a particular cluster.
For example:
network_topology:
rpc_timeout: 60
cluster_retries: 2
API
The API operates over HTTP and supports the following commands (an illustrative request sketch follows the table):
Command | Description |
---|---|
generate | Generate collection plans (for example, for Apply Changes). |
downtime | Schedule or cancel downtime. |
recalculate_bsm_statuses | Recalculate BSM objects. |
acknowledgement | Acknowledgements, by name or object-id. |
flow_query | Execute a flow query using the flow collector. |
process_result | Process a manual result (set status). |
get_collectors_for_hosts | Return a map of collectors for hosts. |
get_machine_node_data | Return machine node data (from opsview-watchdog). |
stats | Receive general stats. |
recheck | Perform a recheck. |
set_actions | Sets actions such as enabling/disabling active checks. |
send_snmptrap_trace | Send a set-trace message to the SNMP traps collector. |
get_notification_logs | Returns notification logs from the Datastore back to the UI. |
execute_remote_command | Executes a remote command on the specified collector, or looks up the appropriate collector. |
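The endpoint path, port and payload schema are not documented in this section, so the following is purely a hypothetical sketch of how a command such as recheck might be invoked over HTTP (the URL and JSON body shape are assumptions for illustration only):

# Hypothetical request - the real path, port and payload will differ.
curl -X POST http://127.0.0.1:8000/command \
  -H 'Content-Type: application/json' \
  -d '{"command": "recheck", "params": {"host": "webserver01"}}'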
Management
Configuration
DPKGs
Watchdog service files are now managed by the package. Removing the package will leave the watchdog service file behind with a .save extension; purging the package will remove it. The package-managed config file is:
/opt/opsview/watchdog/etc/services/opsview-orchestrator.conf
RPMs
Watchdog service files are now managed by the package. Any modifications will be preserved during upgrade and removal with the .rpmnew and .rpmsave extensions respectively. The package-managed config file is:
/opt/opsview/watchdog/etc/services/opsview-orchestrator.conf
Service Administration
As root, start, stop and restart the service using:
/opt/opsview/watchdog/bin/opsview-monit <start|stop|restart> opsview-orchestrator
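For example, to restart the component and then confirm it is running (assuming the watchdog supports the standard monit `summary` subcommand):

# Restart the orchestrator, then list service states to confirm it is up.
/opt/opsview/watchdog/bin/opsview-monit restart opsview-orchestrator
/opt/opsview/watchdog/bin/opsview-monit summary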