The mon command is a powerful administrative tool. It is primarily used to manually stop and start the monitor system processes, and to set up a distributed or load-balanced environment.
Handle this command with care! It has the power to both create and destroy your whole OP5 installation.
Do not use this command unless specifically instructed by OP5 Support or the Documentation itself.
Running the command without any arguments prints a short syntax help text and a list of available sub commands. Most sub commands are categorized, meaning that mon has to be run with at least two arguments to trigger the sub command. A few sub commands are non-categorized and require only a single argument to be passed to mon.
mon ecmd search <regex>
Prints 'templates' for all available commands matching <regex>.
The search is case insensitive.
mon ecmd submit [options] command <parameters>
Submits a command to the monitoring engine using the supplied values.
An example command to add a new service comment for the service PING on the host foo would look something like this:
mon ecmd submit add_svc_comment service='foo;PING' persistent=1 author='John Doe' comment='the comment'
Note how services are written: as 'host;service', inside single quotes so that the shell does not interpret the semicolon. You can also use positional arguments, in which case the arguments have to be in the correct order for the command's syntactic template. The above example would then look like this:
mon ecmd submit add_svc_comment 'foo;PING' 1 'John Doe' 'the comment'
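When the values come from shell variables, the keyword form can be assembled as in the following sketch. The helper name svc_comment_cmd is hypothetical and not part of mon. It only prints the command line so that it can be reviewed before being run, and it assumes that none of the values contain a single quote.

```shell
#!/bin/sh
# Hypothetical helper (not part of mon): print the keyword form of
# add_svc_comment for a given host, service, author and comment.
# Assumption: no argument contains a single quote.
svc_comment_cmd() {
    host=$1
    service=$2
    author=$3
    comment=$4
    printf "mon ecmd submit add_svc_comment service='%s;%s' persistent=1 author='%s' comment='%s'\n" \
        "$host" "$service" "$author" "$comment"
}

# Reproduces the example command shown above.
svc_comment_cmd foo PING 'John Doe' 'the comment'
```

Printing rather than executing keeps the sketch harmless on a production system.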
# mon log show
Runs the showlog helper program. Arguments passed to this command will be sent to the showlog helper.
For more information, a help text can be found by running the command like this:
# mon log show --help
# mon node add <name> --type=[peer|poller|master] [var1=value] [varN=value]
Adds a node with the designated type and variables.
# mon node ctrl <name1> <name2> [--self] [--all|--type=<peer|poller|master>] -- <command>
Execute <command> on the remote node(s) named.
|--self||Run the command on the local system also.|
|--all||Run the command on all configured nodes.|
|--type||Run the command on configured nodes of the given type(s).|
|--||Stop argument scanning. Everything beyond will be treated as the command to run.|
The first unrecognised argument marks the start of the command to be executed, but using double dashes is recommended. Use single-quotes to execute commands with shell variables, output redirection or scriptlets, like so:
# mon node ctrl -- '(for x in 1 2 3; do echo $x; done) > /tmp/foo'
# mon node ctrl -- cat /tmp/foo
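The effect of the single quotes can be tried out without touching any node. In this sketch, show_command is a hypothetical stand-in that prints the text a command such as mon node ctrl would receive after the local shell has processed the line. The variable x and its value are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for "mon node ctrl --": print the command text
# exactly as it arrives after local shell processing.
show_command() {
    printf '%s\n' "$*"
}

x=local-value

# Single quotes: $x is passed on untouched, to be expanded remotely.
show_command 'echo $x'    # prints: echo $x

# Double quotes: the local shell expands $x before the command is sent.
show_command "echo $x"    # prints: echo local-value
```

The same reasoning applies to output redirection: unquoted, the redirect is performed by the local shell instead of on the remote node.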
# mon node remove <name1> [name2] [nameN]
Removes one or more nodes from the merlin configuration.
# mon node show [--type=poller,peer,master]
Displays all variables for all nodes, or for a single node, in a format suitable for use as eval $(mon node show nodename) in shell scripts and scriptlets.
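The eval pattern can be sketched as follows. Because the exact output of mon node show depends on the Merlin configuration, the sketch uses a stand-in function, fake_node_show, and the variable names ADDRESS and TYPE as well as their values are assumptions made for illustration only.

```shell
#!/bin/sh
# Stand-in for "mon node show poller01". The real variable names and
# values come from the Merlin configuration; ADDRESS and TYPE are assumed.
fake_node_show() {
    printf 'ADDRESS=192.0.2.10\n'
    printf 'TYPE=poller\n'
}

# Quoting the substitution keeps each assignment on its own line for eval.
eval "$(fake_node_show)"

echo "node type: $TYPE, address: $ADDRESS"
```

In a real script, fake_node_show would be replaced by mon node show with the name of a configured node.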
# mon node status
Show status of all nodes configured in the running Merlin daemon.
Red text points to problem areas, such as high latency or the node being inactive, not handling any checks, or not sending regular enough program_status updates.
# mon oconf nodesplit
Same as 'split', but uses Merlin's configuration to split the object configuration into files suitable for poller consumption.
# mon oconf push
Splits the configuration based on Merlin's peer and poller configuration and sends the object configuration to all peers and pollers, restarting those that receive a configuration update. SSH keys need to be set up for this to be usable without admin supervision.
This command uses 'nodesplit' as its backend.
# mon oconf split <outfile:hostgroup1,hostgroup2,hostgroupN>
Writes the configuration for hostgroup1, hostgroup2 and hostgroupN to outfile.
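The outfile:hostgroup list argument can be built from separate values as in this sketch. The helper name split_arg, the output path and the hostgroup names are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the "<outfile:hostgroup1,hostgroup2,...>"
# argument expected by "mon oconf split".
split_arg() {
    outfile=$1
    shift
    # In a subshell, set IFS to a comma so "$*" joins the names with commas.
    ( IFS=,; printf '%s:%s\n' "$outfile" "$*" )
}

split_arg /tmp/poller1.cfg linux-servers web-servers
# prints: /tmp/poller1.cfg:linux-servers,web-servers
```

The result could then be passed along as mon oconf split "$(split_arg ...)".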
# mon sshkey fetch
Fetches all the SSH keys from peers and pollers.
The fetch command is not recommended - run the push command instead.
# mon sysconf ramdisk
A ramdisk can be enabled for storing spools for performance data and checkresults. To enable the ramdisk setup:
# mon sysconf ramdisk enable
Note: As of Monitor 6, enabling the ramdisk is no longer recommended.
$ mon check spool [--maxage=<seconds>] [--warning=X] [--critical=X] <path> [--delete]
Checks a certain spool directory for files (and files only) that are older than 'maxage'. It's intended to prevent buildup of checkresult files and unprocessed performance-data files in the various spool directories used by op5 Monitor.
|--delete||Remove files that are too old.|
|--maxage||Is given in seconds and defaults to 300 (5 minutes).|
|<path>||May be 'perfdata' or 'checks', in which case directory names will be taken from op5 defaults|
|--warning and --critical||Have no effect if '--delete' is given and will otherwise specify threshold values.|
Only one directory at a time may be checked.
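To make the age test concrete, the following sketch counts regular files older than a given number of seconds in one directory. It is an illustration of the idea only, not how mon check spool is implemented. The function name count_stale_files is hypothetical, and the sketch assumes GNU stat and GNU touch.

```shell
#!/bin/sh
# Illustration only: count regular files in a directory whose modification
# time is more than MAXAGE seconds in the past (default 300).
# Assumes GNU stat ("stat -c %Y" prints the mtime as epoch seconds).
count_stale_files() {
    dir=$1
    maxage=${2:-300}
    now=$(date +%s)
    count=0
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # files only, skip directories
        mtime=$(stat -c %Y "$f")
        if [ $(( now - mtime )) -gt "$maxage" ]; then
            count=$(( count + 1 ))
        fi
    done
    echo "$count"
}

# Demo on a throwaway directory with one fresh and one stale file.
demo=$(mktemp -d)
touch "$demo/fresh"
touch -d '10 minutes ago' "$demo/stale"   # GNU touch
count_stale_files "$demo" 300             # prints: 1
rm -rf "$demo"
```

Hidden files are not matched by the glob in this sketch.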
$ mon check cores --warning=X --critical=X [--dir=<path>] [--delete]
Checks for memory dumps resulting from segmentation violation from core parts of OP5 Monitor. Detected core-files are moved to /tmp/mon-cores in order to keep working directories clean.
|--warning||Default is 0|
|--critical||Default is 1 (any corefile results in a critical alert)|
|--dir||Lets you specify more paths to search for corefiles. This option can be given multiple times.|
|--delete||Deletes corefiles not coming from 'merlind' or 'monitor'.|
$ mon check distribution [--no-perfdata]
Checks that distribution works correctly.
Note that it is not expected to work properly for the first couple of minutes after a new machine has been brought online or taken offline.
$ mon check exectime [host|service] --warning=<min,max,avg> --critical=<min,max,avg>
Checks execution time of active checks.
|[host|service]||Select host or service execution time.|
|--warning||Set the warning threshold for min,max and average execution time, in seconds|
|--critical||Set the critical threshold for min, max and average execution time, in seconds|
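Both thresholds take three comma-separated values in the order min,max,avg. The sketch below checks that shape and prints a complete command line. The helper names is_triple and exectime_cmd are hypothetical, and the threshold values are arbitrary examples, not recommendations.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument has exactly three
# comma-separated fields. This checks the shape, not the values.
is_triple() {
    case $1 in
        *,*,*,*) return 1 ;;   # more than three fields
        *,*,*)   return 0 ;;   # exactly three fields
        *)       return 1 ;;   # fewer than three fields
    esac
}

# Hypothetical helper: print a "mon check exectime" command line.
# Arguments: host|service, warning triple, critical triple.
exectime_cmd() {
    kind=$1
    warn=$2
    crit=$3
    is_triple "$warn" && is_triple "$crit" || return 1
    printf 'mon check exectime %s --warning=%s --critical=%s\n' "$kind" "$warn" "$crit"
}

exectime_cmd service 1,30,5 2,60,10
# prints: mon check exectime service --warning=1,30,5 --critical=2,60,10
```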
$ mon check latency [host|service] --warning=<min,max,avg> --critical=<min,max,avg>
Checks the latency of active checks.
|[host|service]||Select host or service latency time.|
|--warning||Set the warning threshold for min, max and average latency, in seconds|
|--critical||Set the critical threshold for min, max and average latency, in seconds|