MONIT(1) User Commands MONIT(1)
NAME
Monit - utility for monitoring services on a Unix system
SYNOPSIS
monit [options] {arguments}
DESCRIPTION
Monit is a utility for managing and monitoring processes, programs, files, directories and filesystems on a Unix system. Monit conducts
automatic maintenance and repair and can execute meaningful causal actions in error situations. For example, Monit can start a process if it is
not running, restart a process if it does not respond and stop a process if it uses too many resources. You can use Monit to monitor files,
directories and filesystems for changes, such as timestamp changes, checksum changes or size changes.
Monit is controlled via an easy to configure control file based on a free-format, token-oriented syntax. Monit logs to syslog or to its own
log file and notifies you about error conditions via customizable alert messages. Monit can perform various TCP/IP network checks, protocol
checks and can use SSL for such checks. Monit provides an HTTP(S) interface and you may use a browser to access the Monit program.
GENERAL OPERATION
The behavior of Monit is controlled by command-line options and a run control file, monitrc, the syntax of which we describe in a later
section. Command-line options override .monitrc declarations.
The default location for monitrc is ~/.monitrc. If this file does not exist, Monit will try /etc/monitrc and a few other places. See FILES
for details. You can also specify the control file directly by using the -c command-line switch to monit. For instance,
$ monit -c /var/monit/monitrc
Before Monit is started the first time, you can test the control file for syntax errors:
$ monit -t
Control file syntax OK
If there is an error, Monit will print an error message to the console, including the line number in the control file where the error
was found.
Once you have a working Monit control file you can start Monit from the console, like so:
$ monit
You can change some configuration directives via command-line switches, but for simplicity it is recommended that you put these in the
control file.
If all goes well, Monit will now detach from the terminal and run as a background process, i.e. as a daemon process. As a daemon, Monit
runs in cycles: it monitors services, goes to sleep for a configured period, then wakes up and starts monitoring again in an endless
loop.
Options
The following options are recognized by Monit. However, it is recommended that you set options (when applicable) directly in the .monitrc
control file.
-c file
Use this control file
-d n
Run Monit as a daemon once per n seconds. Or use "set
daemon" in monitrc.
-g name
Set group name for start, stop, restart, monitor and
unmonitor action.
-l logfile
Print log information to this file. Or use "set logfile"
in monitrc.
-p pidfile
Use this lock file in daemon mode. Or use "set pidfile"
in monitrc.
-s statefile
Write state information to this file. Or use "set
statefile" in monitrc.
-I
Do not run in background (needed when run from init)
-t
Run syntax check for the control file
-v
Verbose mode, work noisy (diagnostic output)
-vv
Very verbose mode, same as -v plus log stack-trace on error
-H [filename]
Print MD5 and SHA1 hashes of the file or of stdin if the
filename is omitted; Monit will exit afterwards
-V
Print version number and patch level
-h
Print a help text
Arguments
Once you have Monit running as a daemon process, you can call Monit with one of the following arguments. Monit will then connect to the
Monit daemon (on TCP port 127.0.0.1:2812 by default) and ask the Monit daemon to perform the requested action. In other words, calling
monit without arguments starts the Monit daemon, and calling monit with arguments enables you to communicate with the Monit daemon process.
start all
Start all services listed in the control file and enable monitoring for them. If the group option is set (-g), only start and enable
monitoring of services in the named group ("all" is not required in this case).
start name
Start the named service and enable monitoring for it. The name is a service entry name from the monitrc file.
stop all
Stop all services listed in the control file and disable their monitoring. If the group option is set, only stop and disable monitoring
of the services in the named group ("all" is not required in this case).
stop name
Stop the named service and disable its monitoring. The name is a service entry name from the monitrc file.
restart all
Stop and start all services. If the group option is set, only restart the services in the named group ("all" is not required in this
case).
restart name
Restart the named service. The name is a service entry name from the monitrc file.
monitor all
Enable monitoring of all services listed in the control file. If the group option is set, only start monitoring of services in the
named group ("all" is not required in this case).
monitor name
Enable monitoring of the named service. The name is a service entry name from the monitrc file. Monit will also enable monitoring of
all services this service depends on.
unmonitor all
Disable monitoring of all services listed in the control file. If the group option is set, only disable monitoring of services in the
named group ("all" is not required in this case).
unmonitor name
Disable monitoring of the named service. The name is a service entry name from the monitrc file. Monit will also disable monitoring of
all services that depend on this service.
status
Print status information of each service.
summary
Print a short status summary.
reload
Reinitialize a running Monit daemon. The daemon will reread its configuration and close and reopen its log files.
quit
Kill the Monit daemon process
validate
Check all services listed in the control file. This action is also the default behavior when Monit runs in daemon mode.
procmatch regex
Allows for easy testing of a pattern for a process match check. The command takes a regular expression as an argument and displays all
running processes matching the pattern.
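The arguments above are delivered to the running daemon over its HTTP interface. The sketch below assumes, as is the case in Monit 5.x releases, that the daemon only accepts such requests when that interface is enabled in the control file. The loopback address and the allow rule are illustrative choices, not requirements stated in this section.

```monitrc
# Enable the built-in HTTP interface on the default port
set httpd port 2812 and
    use address localhost   # listen on the loopback interface only
    allow localhost         # let local clients, including the monit command, connect
```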
WHAT TO MONITOR?
You can use Monit to monitor daemon processes or similar programs running on localhost. Monit is particularly useful for monitoring daemon
processes, such as those started at system boot time from /etc/init.d/, for instance sendmail, sshd, apache and mysql. In contrast to many
other monitoring systems, Monit can act if an error situation should occur. For example, if sendmail is not running, Monit can start sendmail
again automatically, or if apache is using too many resources (e.g. if a DoS attack is in progress) Monit can stop or restart apache and
send you an alert message. Monit can also monitor process characteristics, such as how much memory or how many CPU cycles a process is using.
You can also use Monit to monitor files, directories and filesystems on localhost. Monit can monitor these items for changes, such as
timestamp changes, checksum changes or size changes. This is also useful for security reasons: you can monitor the MD5 or SHA1 checksum
of files that should not change and get an alert or perform an action if they should change.
Monit can monitor network connections to various servers, either on localhost or on remote hosts. TCP, UDP and Unix Domain Sockets are
supported. Network tests can be performed on a protocol level; Monit has built-in tests for the main Internet protocols, such as HTTP and
SMTP. Even if a protocol is not supported you can still test the server because you can configure Monit to send any data and test the
response from the server.
Monit can be used to test programs or scripts at certain times, much like cron, but in addition, you can test the exit value of a program
and perform an action or send an alert if the exit value indicates an error. This means that you can use Monit to perform any type of check
you can write a script for.
Finally, Monit can be used to monitor general system resources on localhost such as overall CPU usage, Memory and Load Average.
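To make the categories above concrete, here is a hedged sketch with one service entry per kind of check. All names, paths, addresses and thresholds are placeholders chosen for illustration; the individual tests are described in the sections on service tests.

```monitrc
# A daemon process, restarted if its port stops answering
check process sshd with pidfile /var/run/sshd.pid
    start program = "/etc/init.d/sshd start"
    stop program  = "/etc/init.d/sshd stop"
    if failed port 22 protocol ssh then restart

# A file whose content is watched for changes
check file passwd with path /etc/passwd
    if changed checksum then alert

# A filesystem
check filesystem rootfs with path /dev/hda1
    if space usage > 80% then alert

# A remote server, tested at protocol level
check host mailhost with address mail.example.org
    if failed port 25 protocol smtp then alert

# A script whose exit value signals an error
check program backupcheck with path "/usr/local/bin/backup-check.sh"
    if status != 0 then alert

# General system resources
check system localhost
    if loadavg (1min) > 4 then alert
    if memory usage > 75% then alert
```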
THE MONIT CONTROL FILE
Monit is configured and controlled via a control file called monitrc. The default location for this file is ~/.monitrc. If this file does
not exist, Monit will try /etc/monitrc, then @sysconfdir@/monitrc and finally ./monitrc. The value of @sysconfdir@ is given at configure
time as ./configure --sysconfdir. For instance, using ./configure --sysconfdir /var/monit/etc will make Monit search for monitrc in
/var/monit/etc.
Monit uses its own Domain Specific Language (DSL). The control file consists of a series of service entries and global option statements in
a free-format, token-oriented syntax.
Comments begin with a # and extend through the end of the line. There are three kinds of tokens in the control file: keywords, numbers and
strings. On a semantic level, the control file consists of only three types of entries:
1. Global set-statements
A global set-statement starts with the keyword set and the item to configure.
2. Global include-statement
The include statement consists of the keyword include and a glob string.
3. One or more service entry statements.
A service entry starts with the keyword check followed by the service type.
The meaning of the various statements will be explained in the following sections.
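Putting the three entry types together, a small control file might look like the sketch below. The poll interval, paths and service name are assumptions made for the example.

```monitrc
# 1. Global set-statements
set daemon 120                      # poll every 120 seconds
set logfile /var/log/monit.log

# 2. Global include-statement
include /etc/monit.d/*.cfg

# 3. A service entry
check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program  = "/etc/init.d/httpd stop"
    if failed port 80 then restart
```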
LOGGING
Monit will log status and error messages to a log file. Use the set logfile statement in the monitrc control file. To set up Monit to log to
its own logfile, use e.g. set logfile /var/log/monit.log. If syslog is given as a value for the -l command-line switch (or the statement set
logfile syslog is found in the control file), Monit will use the syslog system daemon to log messages with a priority assigned to each
message based on the context. To turn off logging, simply do not set the logfile in the control file (and of course, do not use the -l
switch).
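The two logging targets described above can be written as follows. The log file path is the one used in the text; only one of the two statements would normally be active.

```monitrc
# Log to Monit's own file ...
set logfile /var/log/monit.log

# ... or, instead, hand messages to the system logger
# set logfile syslog
```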
DAEMON MODE
Use
set daemon n (where n is a number in seconds)
to specify Monit's poll cycle length and run Monit in daemon mode. You must specify a numeric argument which is a polling interval in
seconds. In daemon mode, Monit detaches from the console, puts itself in the background and runs continuously: it monitors each specified
service, goes to sleep for the given poll interval, then wakes up and starts monitoring again in an endless cycle.
Alternatively, you can use the -d command line switch to set the poll interval, but it is strongly recommended to set the poll interval in
your ~/.monitrc file, by using set daemon.
Monit will then always start in daemon mode. If you do not use this statement and do not start monit with the -d option, Monit will just
run through the service checks once and then exit. This may be useful in some situations, but Monit is primarily designed to run as a
daemon process.
Calling monit with a Monit daemon running in the background sends a wake-up signal to the daemon, forcing it to check services immediately.
Calling monit with the quit argument will kill a running Monit daemon process instead of waking it up.
INIT SUPPORT
The set init statement prevents Monit from transforming itself into a daemon process. Instead Monit will run as a foreground process. (You
should still use set daemon to specify the poll cycle).
This is required to run Monit from init. Using init to start Monit is probably the best way to run Monit if you want to be certain that you
always have a running Monit daemon on your system. Another option is to run Monit from crontab. In any case, you should make sure that the
control file does not have any syntax errors before you start Monit from init or crontab.
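A minimal sketch of the two statements working together, assuming a 60-second poll cycle (the interval is an arbitrary example value):

```monitrc
set daemon 60   # poll cycle length in seconds
set init        # stay in the foreground so that init can supervise the process
```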
To set up Monit to run from init, you can either use the set init statement in Monit's control file or use the -I option from the command
line. Here is what you must add to /etc/inittab:
# Run Monit in standard run-levels
mo:2345:respawn:/usr/local/bin/monit -Ic /etc/monitrc
After you have modified init's configuration file, you can run the following command to re-examine /etc/inittab and start Monit:
telinit q
For systems without telinit:
kill -1 1
If Monit is used to monitor services that are also started at boot time (e.g. services started via SYSV init rc scripts or via inittab)
then, in some cases, a race condition could occur. That is, if a service is slow to start, Monit can assume that the service is not running
and possibly try to start it and raise an alert while, in fact, the service is already about to start or already in its startup sequence.
Please see the FAQ for a solution to this problem.
INCLUDE FILES
The Monit control file, monitrc, can include additional configuration files. This feature helps one to maintain a certain structure or to
place repeating settings into one file. Include statements can be placed at virtually any spot. The syntax is the following:
include globstring
The globstring is any kind of string as defined in glob(7). Thus, you can refer to a single file or you can load several files at once. If
you want to use whitespace in your string, the globstring needs to be enclosed in single quotes (') or double quotes ("). If the globstring
matches a directory instead of a file, it is silently ignored.
Any include statements in included files are parsed as in the main control file.
If the globstring matches several results, the files are included in no particular order. If you need to rely on a certain order, use a
separate include statement for each file.
An example:
include /etc/monit.d/*.cfg
This will load any file matching the globstring, that is, all files in /etc/monit.d whose names end with the suffix .cfg.
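The quoting and ordering rules above can be sketched as follows. The directory and file names are hypothetical.

```monitrc
# A globstring containing whitespace must be quoted
include "/etc/monit.d/web servers/*.cfg"

# When order matters, name the files one by one instead of using a glob
include /etc/monit.d/base.cfg
include /etc/monit.d/overrides.cfg
```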
GROUP SUPPORT
Service entries in the control file, monitrc, can be grouped together by the group statement. The syntax is simply (keyword in capital):
GROUP groupname
With this statement it is possible to group similar service entries together and manage them as a whole. Monit provides functions to start,
stop, restart, monitor and unmonitor a group of services, like so:
To start a group of services from the console:
monit -g <groupname> start
To stop a group of services:
monit -g <groupname> stop
To restart a group of services:
monit -g <groupname> restart
Note: the status and summary commands don't support the -g option and will print the state of all services.
A service can be added to multiple groups by using the group statement multiple times:
group www
group filesystem
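In context, group statements sit inside service entries. The following sketch uses made-up service names and pidfile paths:

```monitrc
check process apache with pidfile /var/run/httpd.pid
    group www

check process mysql with pidfile /var/run/mysqld.pid
    group www
    group database
```

With these entries, monit -g www restart would act on both services, while monit -g database restart would act on mysql only.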
MONITORING MODE
Monit supports three monitoring modes per service: active, passive and manual. See also the example section below for usage of the mode
statement.
In active mode, Monit will monitor a service and in case of problems Monit will act and raise alerts, start, stop or restart the service.
Active mode is the default mode.
In passive mode, Monit will passively monitor a service and specifically not try to fix a problem, but it will still raise alerts in case
of a problem.
For use in clustered environments there is also a manual mode. In this mode, Monit will enter active mode only if a service was brought
under Monit's control, for example by executing the following command in the console:
monit start sybase
(Monit will call sybase's start method and enable monitoring)
If a service was not started by Monit or was stopped or disabled for example by:
monit stop sybase
(Monit will call sybase's stop method and disable monitoring)
Monit will then not monitor the service. This allows you to have services configured in monitrc and start them with Monit only if they
should run. This feature can be used to build a simple failsafe cluster.
A service's monitoring state is persistent across Monit restarts. This means that you probably would like to make certain that services in
manual mode are stopped or in unmonitored mode at server shutdown. Do for instance the following in a server shutdown script:
monit stop sybase
or
monit unmonitor sybase
If you use Monit in an HA cluster you should place the state file in a temporary filesystem so that, if the machine should crash and the
stand-by machine takes over services, any manual monitoring mode services that were started on the crashed machine won't be started on
reboot. Use for example:
set statefile /tmp/monit.state
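A service entry in manual mode could look like the sketch below. The pidfile path and the init script are assumptions for illustration; only the service name sybase and the state file location come from the text above.

```monitrc
set statefile /tmp/monit.state

check process sybase with pidfile /var/run/sybase.pid
    start program = "/etc/init.d/sybase start"
    stop program  = "/etc/init.d/sybase stop"
    mode manual     # monitored only after "monit start sybase" has been run
```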
ALERT MESSAGES
Monit will raise an email alert in the following situations:
o A service timed out
o A service does not exist
o A service related data access problem
o A service related program execution problem
o A service is of invalid object type
o A program status failed
o An ICMP problem
o A port connection problem
o A resource statement match
o A file checksum problem
o A file size problem
o A file/directory timestamp problem
o A file/directory/filesystem permission problem
o A file/directory/filesystem uid problem
o A file/directory/filesystem gid problem
o An action is done per administrator's request
Monit will send an alert each time a monitored object changes. This involves:
o Monit started, stopped or reloaded
o A file checksum changed
o A file size changed
o A file content match
o A file/directory timestamp changed
o A filesystem mount flags changed
o A process PID changed
o A process PPID changed
You use the alert statement to notify Monit that you want alert messages sent to an email address. If you do not specify an alert
statement, Monit will not send alert messages.
There are two forms of alert statement:
o Global - common for all services
o Local - per service
In both cases you can use more than one alert statement. In other words, you can send many different emails to many different addresses.
Recipients in the global and in the local lists are alerted when a service failed, recovered or changed. If the same email address is in
the global and in the local list, Monit will only send one alert. Local (per service) defined alert email addresses override global
addresses in case of a conflict. Finally, you may choose to only use a global alert list (recommended), a local per service list or both.
It is also possible to disable the global alerts locally for particular service(s) and recipients.
Setting a global alert statement
If a change occurs on a monitored service, Monit will send an alert to all recipients in the global list who have registered interest in
the event type. Here is the syntax for the global alert statement:
SET ALERT mail-address [ [NOT] {events}] [MAIL-FORMAT {mail-format}] [REMINDER number]
Simply using the following in the global section of monitrc:
set alert foo@bar
will send a default email to the address foo@bar whenever an event occurs on any service. Such an event may be that a service timed out,
a service doesn't exist and so on. If you want to send alert messages to more email addresses, add a set alert 'email' statement for each
address.
For explanations of the events, MAIL-FORMAT and REMINDER keywords above, please see below.
You can also use the NOT option ahead of the events list which will reverse the meaning of the list. That is, only send alerts for events
not in the list. This can save you some configuration bytes if you are interested in most events except a few.
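As an illustration, several global alert statements with different filters can be combined. The addresses below are placeholders.

```monitrc
# Every event goes to the administrator
set alert sysadm@example.org

# Only integrity-related events go to the security contact
set alert security@example.org on { checksum, permission, uid, gid }

# Every event except Monit instance events goes to the manager
set alert manager@example.org not on { instance }
```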
Setting a local alert statement
Each service can also have its own recipient list.
ALERT mail-address [ [NOT] {events}] [MAIL-FORMAT {mail-format}] [REMINDER number]
or
NOALERT mail-address
If you only want an alert message sent for certain events and for certain service(s), for example only for timeout events or only if a
service died, then postfix the alert-statement with a filter block:
check process myproc with pidfile /var/run/my.pid
alert foo@bar only on { timeout, nonexist }
...
(only and on are noise keywords, ignored by Monit. As a side note, noise keywords are used in the control file grammar to make an entry
resemble English and thus make it easier to read (or, so goes the philosophy). The full set of available noise keywords is listed below in
the Control File section).
You can also set up alerts for all events except some by putting the word not ahead of the list. For example, if you want to receive
alerts for all events except Monit instance events, you can write (note that the noise words 'but' and 'on' are optional):
check system myserver
alert foo@bar but not on { instance }
...
instead of:
alert foo@bar on { action
checksum
connection
content
data
exec
fsflags
gid
icmp
invalid
nonexist
permission
pid
ppid
resource
size
status
timeout
timestamp
uid
uptime }
This will send alerts for all events to foo@bar, except Monit instance events. An instance event is an event fired whenever the Monit
program starts or stops.
Event filtering can be used to send an email to different email addresses depending on the events that occurred. For instance:
alert foo@bar { nonexist, timeout, resource, icmp, connection }
alert security@bar on { checksum, permission, uid, gid }
alert manager@bar
This will send an alert message to foo@bar whenever a nonexist, timeout, resource, icmp or connection problem occurs and a message to
security@bar if a checksum, permission, uid or gid problem occurs. And finally, a message to manager@bar whenever any error event occurs.
Here is the list of events you can use in a mail-filter: action, checksum, connection, content, data, exec, fsflags, gid, icmp, instance,
invalid, nonexist, permission, pid, ppid, resource, size, status, timeout, timestamp, uid, uptime
You can also disable the alerts locally using the NOALERT statement. This is useful if you have lots of services monitored and are using
the global alert statement, but don't want to receive alerts for some minor subset of services:
noalert appadmin@bar
For example, if you stick the noalert statement in a 'check system' entry, you won't receive system related alerts (such as Monit instance
started/stopped/reloaded alert, system overloaded alert, etc.) but will receive alerts for all other monitored services.
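A sketch of that arrangement, reusing the appadmin@bar recipient from above (the service names and pidfile path are illustrative):

```monitrc
set alert appadmin@bar

check system myserver
    noalert appadmin@bar    # this recipient gets no system-related alerts

check process myproc with pidfile /var/run/my.pid
    # no local alert statement, so the global one applies and
    # appadmin@bar is still notified about this service
```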
The following example will alert foo@bar on all events on all services by default, except the service mybar which will send an alert only
on timeout. The trick is based on the fact that local definition of the same recipient overrides the global setting (including registered
events and mail format):
set alert foo@bar
check process myfoo with pidfile /var/run/myfoo.pid
...
check process mybar with pidfile /var/run/mybar.pid
alert foo@bar only on { timeout }
Alert message layout
Monit provides a default mail message layout that is short and to the point. Here's an example of a standard alert mail sent by monit:
From: monit@tildeslash.com
Subject: Monit alert -- Does not exist apache
To: hauk@tildeslash.com
Date: Thu, 04 Sep 2003 02:33:03 +0200
Does not exist Service apache
Date: Thu, 04 Sep 2003 02:33:03 +0200
Action: restart
Host: www.tildeslash.com
Your faithful employee,
monit
If you want to, you can change the format of this message with the optional mail-format statement. The syntax for this statement is as
follows:
mail-format {
from: monit@localhost
reply-to: support@domain.com
subject: $SERVICE $EVENT at $DATE
message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
Yours sincerely,
monit
}
Where the keyword from: is the email address Monit should pretend it is sending from. It does not have to be a real mail address, but it
must be a properly formatted mail address, in the form: name@domain. The reply-to: keyword can be used to set the reply-to mail header. The
keyword subject: is for the email subject line. The subject must be on only one line. The message: keyword denotes the mail body. If used,
this keyword should always be the last in a mail-format statement. The mail body can be as long as you want, but must not contain the '}'
character.
All of these format keywords are optional, but a mail-format statement must contain at least one of them. Thus if you only want to change
the from address Monit is using you can do:
set alert foo@bar with mail-format { from: bofh@bar.baz }
From the previous example you will notice that some special $XXX variables were used. If used, they will be substituted and expanded into
the text with these values:
o $EVENT
A string describing the event that occurred. The values are
fixed and are:
Event:     | Failure state:             | Success state:
-------------------------------------------------------------------
ACTION     | "Action done"              | "Action done"
CHECKSUM   | "Checksum failed"          | "Checksum succeeded"
CONNECTION | "Connection failed"        | "Connection succeeded"
CONTENT    | "Content failed"           | "Content succeeded"
DATA       | "Data access error"        | "Data access succeeded"
EXEC       | "Execution failed"         | "Execution succeeded"
FSFLAG     | "Filesystem flags failed"  | "Filesystem flags succeeded"
GID        | "GID failed"               | "GID succeeded"
ICMP       | "ICMP failed"              | "ICMP succeeded"
INSTANCE   | "Monit instance changed"   | "Monit instance changed not"
INVALID    | "Invalid type"             | "Type succeeded"
NONEXIST   | "Does not exist"           | "Exists"
PERMISSION | "Permission failed"        | "Permission succeeded"
PID        | "PID failed"               | "PID succeeded"
PPID       | "PPID failed"              | "PPID succeeded"
RESOURCE   | "Resource limit matched"   | "Resource limit succeeded"
SIZE       | "Size failed"              | "Size succeeded"
STATUS     | "Status failed"            | "Status succeeded"
TIMEOUT    | "Timeout"                  | "Timeout recovery"
TIMESTAMP  | "Timestamp failed"         | "Timestamp succeeded"
UID        | "UID failed"               | "UID succeeded"
UPTIME     | "Uptime failed"            | "Uptime succeeded"
o $SERVICE
The service entry name in monitrc
o $DATE
The current time and date (RFC 822 date style).
o $HOST
The name of the host Monit is running on
o $ACTION
The name of the action which was done. Action names are fixed
and are:
Action:   | Name:
--------------------
ALERT     | "alert"
EXEC      | "exec"
RESTART   | "restart"
START     | "start"
STOP      | "stop"
UNMONITOR | "unmonitor"
o $DESCRIPTION
The description of the error condition
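The variables can also be used in a mail-format attached to a local alert statement. The following is a sketch only; the service name, pidfile path and addresses are examples.

```monitrc
# Per-service alert with its own message layout
check process apache with pidfile /var/run/httpd.pid
    alert foo@bar on { nonexist, timeout }
        with mail-format {
            from:    monit@localhost
            subject: $SERVICE $EVENT at $DATE
            message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
        }
```

For a nonexist failure, $EVENT would expand to "Does not exist" according to the table above, so the subject would begin with "apache Does not exist at" followed by the date.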
Setting a global mail format
It is possible to set a standard mail format with the following global set-statement (keywords are in capital):
SET MAIL-FORMAT {mail-format}
Format set with this statement will apply to every alert statement that does not have its own specified mail-format. This statement is
most useful for setting a default from address for messages sent by monit, like so:
set mail-format { from: monit@foo.bar.no }
Setting an error reminder
Monit by default sends just one error notification if a service failed and another when it recovered. If you want to be notified more than
once if a service remains in a failed state, you can use the reminder option to the alert statement (keywords are in capital):
ALERT ... [WITH] REMINDER [ON] number [CYCLES]
For example if you want to be notified each tenth cycle if a service remains in a failed state, you can use:
alert foo@bar with reminder on 10 cycles
Likewise if you want to be notified on each failed cycle, you can use:
alert foo@bar with reminder on 1 cycle
Setting a mail server for alert messages
The mail server Monit should use to send alert messages is defined with a global set statement (keywords are in capital and optional
statements in [brackets]):
SET MAILSERVER {hostname|ip-address [PORT port]
[USERNAME username] [PASSWORD password]
[using SSLV2|SSLV3|TLSV1] [CERTMD5 checksum]}+
[with TIMEOUT X SECONDS]
[using HOSTNAME hostname]
The port statement allows one to use SMTP servers other than those listening on port 25. If omitted, port 25 is used unless SSL or TLS is
used, in which case port 465 is used by default.
Monit supports plain SMTP authentication: you can set a username and a password using the USERNAME and PASSWORD options.
To use secure communication, use the SSLV2, SSLV3 or TLSV1 options. You can also specify the server certificate checksum using the CERTMD5
option.
As you can see, it is possible to set several SMTP servers. If Monit cannot connect to the first server in the list it will try the second
server and so on. Monit has a default connection timeout of 5 seconds and if the SMTP server is slow, Monit could time out when connecting
or reading from the server. If this is the case, you can use the optional timeout statement to explicitly set the timeout to a higher
value. Here is an example for setting several mail servers:
set mailserver mail.tildeslash.com, mail.foo.bar port 10025
username "Rabbi" password "Loew" using tlsv1, localhost
with timeout 15 seconds
Here Monit will first try to connect to the server "mail.tildeslash.com". If this server is down Monit will try "mail.foo.bar" on port
10025 using the given credentials via TLS and finally "localhost". We also set an explicit connect and read timeout: if Monit cannot
connect to the first SMTP server in the list within 15 seconds it will try the next server and so on. The set mailserver statement is
optional and if not defined Monit will not send email alerts. Not setting a mail server is recommended only if alert notification is
delegated to M/Monit.
Monit, by default, uses the local host name in SMTP HELO/EHLO and in the Message-ID header. Some mail servers check this information against
DNS for spam protection and can reject the email if the DNS and the hostname used in the transaction do not match. If this is the case,
you can override the default local host name by using the HOSTNAME option:
set mailserver mail.tildeslash.com using hostname "myhost.example.org"
Event queue
If the MTA (mail server) for sending alerts is not available, Monit can queue events on the local filesystem until the MTA recovers. Monit
will then post queued events in order with their original timestamp so the events are not lost. This feature is most useful if Monit is
used together with M/Monit and when event history is important.
The event queue is persistent across Monit restarts and provided that the back-end filesystem is persistent too, across system restart as
well.
By default, the queue is disabled and if the alert handler fails, Monit will simply drop the alert message. To enable the event queue, add
the following statement to the Monit control file:
SET EVENTQUEUE BASEDIR <path> [SLOTS <number>]
The <path> is the path to the directory where events will be stored. Optionally if you want to limit the queue size, use the slots option
to only store up to number event messages. If the slots option is not used, Monit will store as many events as the backend filesystem
allows.
Example:
set eventqueue
basedir /var/monit
slots 5000
Events are stored in a binary format, with one file per event. The file size is ca. 130 bytes or a bit more (depending on the message
length). The file name is composed of the unix timestamp, underscore and the service name, for example:
/var/monit/1131269471_apache
If you are running more than one Monit instance on the same machine, you must use separate event queue directories to avoid sending
alerts to the wrong addresses.
If you want to purge the queue by hand, that is, remove queued event-files, Monit should be stopped before the removal.
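The mail-related global statements from this section can be combined in one place. The following sketch assumes a primary mail server with a local fallback; host names, addresses and the slot count are placeholders.

```monitrc
# Try the primary mail server first, then the local MTA
set mailserver mail.example.org, localhost
    with timeout 15 seconds

# Keep up to 100 undelivered events on disk until a server is reachable
set eventqueue
    basedir /var/monit
    slots 100

# Default sender address and a global recipient with reminders
set mail-format { from: monit@example.org }
set alert sysadm@example.org with reminder on 10 cycles
```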
SERVICE TIMEOUT
Monit provides a service timeout mechanism for situations where a service simply refuses to start or respond over a longer period.
The timeout mechanism is based on the number of service restarts and the number of poll-cycles. For example, if a service had x restarts
within y poll-cycles (where x <= y) then Monit will perform an action (for example unmonitor the service). If a timeout occurs, Monit will
send an alert message if you have registered interest in this event.
The syntax for the timeout statement is as follows (keywords are in capital):
IF <number> RESTART <number> CYCLE(S) THEN <action>
Here is an example where Monit will unmonitor the service if it was restarted 2 times within 3 cycles:
if 2 restarts within 3 cycles then unmonitor
To have Monit check the service again after a monitoring was disabled, run 'monit monitor <servicename>' from the command line.
Example for setting custom exec on timeout:
if 5 restarts within 5 cycles then exec "/foo/bar"
Example for stopping the service:
if 7 restarts within 10 cycles then stop
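In a complete service entry, the timeout statement sits next to the test that triggers the restarts. A sketch, with an assumed init script and pidfile path:

```monitrc
check process apache with pidfile /var/run/httpd.pid
    start program = "/etc/init.d/httpd start"
    stop program  = "/etc/init.d/httpd stop"
    if failed port 80 then restart                  # each failure causes a restart
    if 3 restarts within 5 cycles then unmonitor    # give up when restarts do not help
    alert foo@bar on { timeout }                    # be told when Monit gives up
```

Once the underlying problem is fixed, monit monitor apache puts the service back under Monit's control.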
SERVICE TESTS
Monit provides several tests you can use in a 'check service' entry to test a service. There are two classes of tests: variable and
constant tests. That is, the condition we test is either constant e.g. a number or it can vary.
A constant test has this general format:
IF <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION [ELSE IF SUCCEEDED [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION]
If the <TEST> condition evaluates to true, the selected action is executed each cycle for as long as the test condition remains true. The
comparison value is constant. The recovery action is evaluated only once (on a failed to succeeded state change only). The 'ELSE IF SUCCEEDED'
part is optional; if omitted, Monit will still send an alert on recovery. The alert is sent only once for each state change unless
overridden by the 'reminder' alert option.
A variable test has this general format:
IF CHANGED <TEST> [[<X>] [TIMES WITHIN] <Y> CYCLES] THEN ACTION
If the <TEST> evaluates to true, the selected action is executed once. The comparison value is a variable where the last result
becomes the new value and is used for comparisons in future cycles. An alert is delivered each time the condition becomes true.
You can use this test for alerts or for some automatic action, for example to reload a monitored process after its configuration file was
changed. Variable tests are supported for 'checksum', 'size', 'pid', 'ppid' and 'timestamp' tests only.
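The two classes can be mixed within one service entry. The following sketch uses a hypothetical file and reload script to contrast them:
check file sshd_config with path /etc/ssh/sshd_config
# constant test: compared against the same fixed value every cycle
if failed uid root then alert
# variable test: compared against the value recorded in the previous cycle
if changed timestamp then exec "/etc/init.d/sshd reload"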
... [[<X>] [TIMES WITHIN] <Y> CYCLES] ...
If a test matches, its action is executed at once. This behavior can optionally be changed: by using the statement above you can, for
instance, require that a test must match over several poll cycles before the action is executed. You can use this in several ways. For example:
if failed port 80 for 3 times within 5 cycles then alert
or
if failed port 80 for 10 cycles then unmonitor
If you don't specify <X> times, it equals <Y> by default; thus the test matches if it evaluates to true for <Y> consecutive cycles.
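To illustrate this default, the two statements below should behave the same under the rule just described (the port and the action are
only illustrative choices):
if failed port 80 for 3 cycles then restart
if failed port 80 for 3 times within 3 cycles then restart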
It is possible to use this option to tune and prevent a rush of notifications. You can use this option for failed, succeeded, recovered or
changed rules. Here is a more complex example:
check filesystem rootfs with path /dev/hda1
if space usage > 80% for 5 times within 15 cycles
then alert else if succeeded for 10 cycles then alert
if space usage > 90% for 5 cycles then
exec '/try/to/free/the/space'
In each test you must select the action to be executed from this list:
o ALERT sends the user an alert event on each state change (for constant tests) or on each change (for variable tests).
o RESTART restarts the service and sends an alert. Restart is conducted by first calling the service's registered stop method and then
the service's start method.
o START starts the service by calling the service's registered start method and sends an alert.
o STOP stops the service by calling the service's registered stop method and sends an alert. If Monit stops a service it will not be
checked by Monit anymore nor restarted again later. To reactivate monitoring of the service again you must explicitly enable
monitoring from the web interface or from the console, e.g. 'monit monitor apache'.
o EXEC can be used to execute an arbitrary program and send an alert. If you choose this action you must state the program to be executed
and if the program requires arguments you must enclose the program and its arguments in a quoted string. You may optionally specify the
uid and gid the executed program should switch to upon start. For instance:
exec "/usr/local/tomcat/bin/startup.sh"
as uid nobody and gid nobody
The uid and gid switch can be useful if the program to be started cannot change to a lesser privileged user and group. This is
typically needed for Java Servers. Remember, if Monit is run by the superuser, then all programs executed by Monit will be started with
superuser privileges unless the uid and gid extension was used.
o UNMONITOR will disable monitoring of the service and send an alert. The service will not be checked by Monit anymore nor restarted
again later. To reactivate monitoring of the service you must explicitly enable monitoring from monit's web interface or from the
console using the monitor argument.
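The following sketch shows several of these actions side by side in one service entry. The service name, paths, thresholds and the
helper script are hypothetical:
check process myapp with pidfile /var/run/myapp.pid
start program = "/etc/init.d/myapp start"
stop program = "/etc/init.d/myapp stop"
# only notify on moderate load
if cpu > 60% for 2 cycles then alert
# stop and start the service on sustained high load
if cpu > 90% for 5 cycles then restart
# stop the service and leave it stopped
if totalmemory > 500 MB for 5 cycles then stop
# run a helper program with reduced privileges
if changed pid then exec "/usr/local/bin/log-pid-change"
as uid nobody and gid nobody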
EXISTENCE TESTING
Monit's default action when a service does not exist (for example a process is not running, a file doesn't exist, etc.) is to perform
the service restart action.
The default action can be overridden with the following statement:
IF [DOES] NOT EXIST [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
Example:
check file with path /cifs/mydata
if does not exist for 5 cycles then exec "/usr/bin/mount_cifs.sh"
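As a further sketch, the default restart can be replaced by a plain alert for a process that is managed elsewhere. The pidfile path is an
assumption for illustration:
check process sshd with pidfile /var/run/sshd.pid
if does not exist for 3 cycles then alert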
RESOURCE TESTING
Monit can examine how much system resources a service is using. This test can only be used within a system or process service entry in the
Monit control file.
Depending on system or process characteristics, services can be stopped or restarted and alerts can be generated. Thus it is possible to
utilize systems which are idle and to spare systems under high load.
The full syntax for a resource-statement used for resource testing is as follows (keywords are in capital and optional statements in
[brackets]),
IF resource operator value [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
resource is a choice of "CPU", "TOTALCPU", "CPU([user|system|wait])", "MEMORY", "SWAP", "CHILDREN", "TOTALMEMORY",
"LOADAVG([1min|5min|15min])". Some resource tests can be used inside a check system entry, some in a check process entry and some in both:
System only resource tests:
CPU([user|system|wait]) is the percentage of time the system spends in user or system/kernel space. Some systems, such as Linux 2.6, support a
'wait' indicator as well.
SWAP is the swap usage of the system in either percent (of the system's total) or as an amount (Byte, kB, MB, GB).
Process only resource tests:
CPU is the CPU usage of the process itself (percent).
TOTALCPU is the total CPU usage of the process and its children (percent). You will typically want to use TOTALCPU for services like
the Apache web server, where one master process forks the child processes as workers.
CHILDREN is the number of child processes of the process.
TOTALMEMORY is the memory usage of the process and its child processes in either percent or as an amount (Byte, kB, MB, GB).
System and process resource tests:
MEMORY is the memory usage of the system or of a process (without children) in either percent (of the system's total) or as an amount (Byte,
kB, MB, GB).
LOADAVG([1min|5min|15min]) refers to the system's load average. The load average is the number of processes in the system run queue,
averaged over the specified time period.
operator is a choice of "<", ">", "!=", "==" in C notation, "gt", "lt", "eq", "ne" in shell sh notation and "greater", "less", "equal",
"notequal" in human readable form (if not specified, default is EQUAL).
value is either an integer or a real number (except for CHILDREN). For CPU, TOTALCPU, MEMORY and TOTALMEMORY you need to specify a unit.
This could be "%" or if applicable "B" (Byte), "kB" (1024 Byte), "MB" (1024 KiloByte) or "GB" (1024 MegaByte).
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
To calculate the cycles, a counter is raised whenever the expression above is true and it is lowered whenever it is false (but not below
0). All counters are reset in case of a restart.
The following is an example to check that the CPU usage of a service is not going beyond 50% during five poll cycles. If it does, Monit
will restart the service:
if cpu is greater than 50% for 5 cycles then restart
See also the example section below.
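The system-level resources can be tested in the same way. The sketch below assumes a system entry named myhost.example.com; the name and
all thresholds are hypothetical and should be adapted to the machine:
check system myhost.example.com
if loadavg(1min) > 4 then alert
if loadavg(5min) > 2 for 3 cycles then alert
if memory > 75% then alert
if swap > 25% then alert
if cpu(user) > 70% for 5 cycles then alert
if cpu(wait) > 20% for 5 cycles then alert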
FILE CHECKSUM TESTING
The checksum statement may only be used in a file service entry. If specified in the control file, Monit will compute an MD5 or SHA1
checksum for a file.
The checksum test in constant form is used to verify that a file does not change. Syntax (keywords are in capital):
IF FAILED [MD5|SHA1] CHECKSUM [EXPECT checksum] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
The checksum test in variable form is used to watch for file changes. Syntax (keywords are in capital):
IF CHANGED [MD5|SHA1] CHECKSUM [[<X>] <Y> CYCLES] THEN action
The choice of MD5 or SHA1 is optional. MD5 produces a 128-bit checksum (32 hexadecimal characters) and SHA1 a 160-bit checksum (40
hexadecimal characters). If this option is omitted Monit tries to guess the method from the EXPECT string or uses MD5 as default.
expect is optional and if used it specifies an MD5 or SHA1 string Monit should expect when testing a file's checksum. If expect is used,
Monit will not compute an initial checksum for the file, but instead use the string you submit. For example:
if failed checksum and
expect the sum 8f7f419955cefa0b33a2ba316cba3659
then alert
You can, for example, use the GNU utility md5sum(1) or sha1sum(1) to create a checksum string for a file and use this string in the
expect-statement.
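The same form works with an explicit SHA1 checksum. In the sketch below the file name is hypothetical, and the expected value is simply
the SHA1 digest of empty input, used here only as a recognizable 40-character placeholder:
check file empty.lock with path /var/run/empty.lock
if failed sha1 checksum and
expect the sum da39a3ee5e6b4b0d3255bfef95601890afd80709
then alert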
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
The checksum statement in variable form may be used to check a file for changes and, if it has changed, perform a specified action, for
instance to reload a server if its configuration file was changed. The following illustrates this for the Apache web server:
check file httpd.conf path /usr/local/apache/conf/httpd.conf
if changed sha1 checksum
then exec "/usr/local/apache/bin/apachectl graceful"
If you plan to use the checksum statement for security reasons (a very good idea, by the way) and to monitor a file or files which should
not change, then please use the constant form and also read the DEPENDENCY TREE section below to see a detailed example on how to do this
properly.
Monit can also test the checksum for files on a remote host via the HTTP protocol. See the CONNECTION TESTING section below.
TIMESTAMP TESTING
The timestamp statement may only be used in a file, fifo or directory service entry.
The timestamp test in constant form is used to verify various timestamp conditions. Syntax (keywords are in capital):
IF TIMESTAMP [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
The timestamp statement in variable form is simply to test an existing file or directory for timestamp changes and if changed, execute an
action. Syntax (keywords are in capital):
IF CHANGED TIMESTAMP [[<X>] <Y> CYCLES] THEN action
operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT", "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL",
"NOTEQUAL" in human readable form (if not specified, default is EQUAL).
value is a time watermark.
unit is either "SECOND", "MINUTE", "HOUR" or "DAY" (it is also possible to use "SECONDS", "MINUTES", "HOURS", or "DAYS").
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
The variable timestamp statement is useful for checking a file for changes and then executing an action. This version was written
particularly with configuration files in mind. For instance, if you monitor the Apache web server you can use this statement to reload
Apache if httpd.conf (Apache's configuration file) was changed. Like so:
check file httpd.conf with path /usr/local/apache/conf/httpd.conf
if changed timestamp
then exec "/usr/local/apache/bin/apachectl graceful"
The constant timestamp version is useful for monitoring systems able to report their state by changing the timestamp of certain state files.
For instance, the stored process of the iPlanet Messaging Server updates the timestamp of the following files:
o stored.ckp
o stored.lcu
o stored.per
If a task should fail, the system keeps the timestamp. To report stored problems you can use the following statements:
check file stored.ckp with path /msg-foo/config/stored.ckp
if timestamp > 1 minute then alert
check file stored.lcu with path /msg-foo/config/stored.lcu
if timestamp > 5 minutes then alert
check file stored.per with path /msg-foo/config/stored.per
if timestamp > 1 hour then alert
As mentioned above, you can also use the timestamp statement for monitoring directories for changes. If files are added or removed from a
directory, its timestamp is changed:
check directory mydir path /foo/directory
if timestamp > 1 hour then alert
or
check directory myotherdir path /foo/secure/directory
if timestamp < 1 hour then alert
The following example is a hack for restarting a process after a certain time. Sometimes this is a necessary workaround for some
third-party applications, until the vendor fixes a problem:
check file server.pid path /var/run/server.pid
if timestamp > 7 days
then exec "/usr/local/server/restart-server"
FILE SIZE TESTING
The size statement may only be used in a check file service entry. If specified in the control file, Monit will compute a size for a file.
The size test in constant form is used to verify various size conditions. Syntax (keywords are in capital):
IF SIZE [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
The size statement in variable form is simply to test an existing file for size changes and if changed, execute an action. Syntax (keywords
are in capital):
IF CHANGED SIZE [[<X>] <Y> CYCLES] THEN action
operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT", "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL",
"NOTEQUAL" in human readable form (if not specified, default is EQUAL).
value is a size watermark.
unit is a choice of "B","KB","MB","GB" or long alternatives "byte", "kilobyte", "megabyte", "gigabyte". If it is not specified, "byte" unit
is assumed by default.
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
The variable size test form is useful for checking a file for changes and sending an alert or executing an action. Monit will register the
size of the file at startup and monitor the file for changes. As soon as the value changes, Monit will perform the specified action, reset the
registered value to the new value and continue monitoring and test if the size changes again.
One example of use for this statement is to conduct security checks, for instance:
check file su with path /bin/su
if changed size then exec "/sbin/ifconfig eth0 down"
which will "cut the cable" and stop a possible intruder from compromising the system further. This test is just one of many you may use to
increase the security awareness on a system. If you plan to use Monit for security reasons we recommend that you use this test in
combination with other supported tests like checksum, timestamp, and so on.
The constant form of this test can be useful in similar or different contexts. It can, for instance, be used to test if a certain file size
was exceeded and then alert you or Monit may execute a certain action specified by you. An example is to use this statement to rotate log
files after they have reached a certain size or to check that a database file does not grow beyond a specified threshold.
To rotate a log file:
check file myapp.log with path /var/log/myapp.log
if size > 50 MB then
exec "/usr/local/bin/rotate /var/log/myapp.log myapp"
where /usr/local/bin/rotate may be a simple script, such as:
#!/bin/bash
# $1 = log file to rotate, $2 = name of the process to signal
/bin/mv "$1" "$1.$(date +%y-%m-%d)"
/usr/bin/pkill -HUP "$2"
Or you may use this statement to trigger the logrotate(8) program, to do an "emergency" rotate. Or to send an alert if a file becomes a
known bottleneck when it grows beyond a certain size because of limits in a database engine:
check file mydb with path /data/mydatabase.db
if size > 1 GB then alert
This is a more restrictive form of the first example where the size is explicitly defined (note that the real su size is system dependent):
check file su with path /bin/su
if size != 95564 then exec "/sbin/ifconfig eth0 down"
FILE CONTENT TESTING
The match statement allows you to test the content of a text file by using regular expressions. This is a great feature if you need to
periodically test files, such as log files, for certain patterns. If a pattern matches, Monit raises an alert by default; other actions are
also possible.
The syntax (keywords in capital) for using this test is:
IF [NOT] MATCH {regex|path} [[<X>] <Y> CYCLES] THEN action
regex is a string containing the extended regular expression. See also regex(7).
path is an absolute path to a file containing one extended regular expression per line. See also regex(7).
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
You can use the NOT statement to invert a match.
The content is only checked once every cycle. If content is added and then removed between two checks, it goes unnoticed.
On startup the read position is set to the end of the file and Monit continues to scan to the end of file on each cycle. But if the file
size should decrease or the inode change, the read position is set to the start of the file.
Only lines ending with a newline character are inspected. Thus, lines are being ignored until they have been completed with this character.
Also note that only the first 511 characters of a line are inspected.
IGNORE [NOT] MATCH {regex|path}
Lines matching an IGNORE are not inspected during later evaluations. IGNORE MATCH always takes precedence over IF MATCH.
All IGNORE MATCH statements are evaluated first, in the order of their appearance. Thereafter, all the IF MATCH statements are evaluated.
A real life example might look like this:
check file syslog with path /var/log/syslog
ignore match
"^\w{3} [ :0-9]{11} [._[:alnum:]-]+ monit\[[0-9]+\]:"
ignore match /etc/monit/ignore.regex
if match
"^\w{3} [ :0-9]{11} [._[:alnum:]-]+ mrcoffee\[[0-9]+\]:"
if match /etc/monit/active.regex then alert
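A simpler sketch of the same idea, using a hypothetical application log and hypothetical patterns: lines containing DEBUG are skipped
first, and any remaining line containing ERROR or FATAL raises an alert.
check file applog with path /var/log/myapp.log
ignore match "DEBUG"
if match "ERROR|FATAL" then alert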
FILESYSTEM FLAGS TESTING
Monit can test the flags of a filesystem for changes. This test is implicit and Monit will send an alert in case of failure by default.
This test is useful for detecting changes of the filesystem flags, such as when the filesystem became read-only because of disk errors or the
mount flags were changed (such as nosuid). Each platform provides a different set of flags. POSIX defines the RDONLY and NOSUID flags, which
should work on all platforms. Some platforms (such as FreeBSD) have additional flags.
The syntax for the fsflags statement is:
IF CHANGED FSFLAGS [[<X>] <Y> CYCLES] THEN action
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
Example:
check filesystem rootfs with path /
if changed fsflags then exec "/my/script"
alert root@localhost
SPACE TESTING
Monit can test file systems for space usage. This test may only be used within a check filesystem service entry in the Monit control file.
Monit will check a filesystem's total space usage. If you only want to check available space for non-superusers, you must set the watermark
appropriately (i.e. total space minus reserved blocks for the superuser).
You can obtain (and set) the superuser's reserved blocks size, for example by using the tune2fs utility on Linux. On Linux 5% of available
blocks are reserved for the superuser by default. On Solaris 10% of the blocks are reserved. You can also use tunefs on Solaris to change
values on a live filesystem.
The full syntax for the space statement is:
IF SPACE operator value unit [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
operator is a choice of "<",">","!=","==" in C notation, "gt", "lt", "eq", "ne" in shell sh notation and "greater", "less", "equal",
"notequal" in human readable form (if not specified, default is EQUAL).
unit is a choice of "B","KB","MB","GB", "%" or long alternatives "byte", "kilobyte", "megabyte", "gigabyte", "percent".
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
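A sketch of space tests in a filesystem entry, mixing a percentage and an absolute watermark. The device, the thresholds and the cleanup
script are hypothetical:
check filesystem datafs with path /dev/sdb1
if space usage > 80% for 5 times within 15 cycles then alert
if space usage > 200 GB then alert
if space usage > 95% then exec "/usr/local/bin/cleanup-tmp"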
INODE TESTING
If supported by the file-system, you can use Monit to test for inode usage. This test may only be used within a check filesystem service
entry in the Monit control file.
If the filesystem becomes unavailable, Monit will call the service's registered start method, if it is defined and if Monit is running in
active mode. If Monit runs in passive mode or the start method is not defined, Monit will just send an error alert.
The syntax for the inode statement is:
IF INODE(S) operator value [unit] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
operator is a choice of "<",">","!=","==" in C notation, "gt", "lt", "eq", "ne" in shell sh notation and "greater", "less", "equal",
"notequal" in human readable form (if not specified, default is EQUAL).
unit is optional. If not specified, the value is an absolute count of inodes. You can use the "%" character or the longer alternative
"percent" as a unit.
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
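A sketch showing both unit forms, with a hypothetical device and hypothetical limits. The first test uses a percentage, the second an
absolute inode count:
check filesystem datafs with path /dev/sdb1
if inodes > 90% then alert
if inodes > 500000 then alert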
PERMISSION TESTING
Monit can monitor the permission of file objects. This test may only be used within a file, fifo, directory or filesystem service entry in
the Monit control file.
The syntax for the permission statement is:
IF FAILED PERM(ISSION) octalnumber [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
octalnumber defines permissions for a file, a directory or a filesystem as four octal digits (0-7). Valid range: 0000 - 7777 (you can omit
the leading zeros; Monit will pad with zeros on the left, so for example "640" is a valid value and matches "0640").
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
The web interface will show a permission warning if the test failed.
We recommend that you use the UNMONITOR action in a permission statement. The rationale is security: Monit should not start a possibly
cracked program or script. Example:
check file monit.bin with path "/usr/local/bin/monit"
if failed permission 0555 then unmonitor
If the test fails, Monit will simply send an alert and stop monitoring the file and propagate an unmonitor action upward in a depend tree.
UID TESTING
Monit can monitor the owner user id (uid) of a file object. This test may only be used within a file, fifo, directory or
filesystem service entry in the Monit control file.
The syntax for the uid statement is:
IF FAILED UID user [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
user defines a user id either in numeric or in string form.
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
The web interface will show a uid warning if the test should fail.
We recommend that you use the UNMONITOR action in a uid statement. The rationale is security: Monit should not start a possibly
cracked program or script. Example:
check file passwd with path /etc/passwd
if failed uid root then unmonitor
If the test fails, Monit will simply send an alert and stop monitoring the file and propagate an unmonitor action upward in a depend tree.
GID TESTING
Monit can monitor the owner group id (gid) of file objects. This test may only be used within a file, fifo, directory or filesystem service
entry in the Monit control file.
The syntax for the gid statement is:
IF FAILED GID user [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
user defines a group id either in numeric or in string form.
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
The web interface will show a gid warning if the test should fail.
We recommend that you use the UNMONITOR action in a gid statement. The rationale is security: Monit should not start a possibly
cracked program or script. Example:
check file shadow with path /etc/shadow
if failed gid root then unmonitor
If the test fails, Monit will simply send an alert and stop monitoring the file and propagate an unmonitor action upward in a depend tree.
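The permission, uid and gid tests are often used together on the same object. A sketch with a hypothetical target file, also showing the
numeric form of a group id:
check file sudoers with path /etc/sudoers
if failed permission 0440 then unmonitor
if failed uid root then unmonitor
if failed gid 0 then unmonitor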
PID TESTING
Monit can test the process identification number (pid) of a process for changes. This test is implicit and Monit will send an alert in the
case of failure by default.
The syntax for the pid statement is:
IF CHANGED PID [[<X>] <Y> CYCLES] THEN action
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
This test is useful to detect possible process restarts which have occurred in the timeframe between two Monit testing cycles. In the case
that the restart was fast and the process provides the expected service (i.e. all tests succeeded) you will be notified that the process was
replaced.
For example, the sshd daemon can restart very quickly; thus if someone changes its configuration and restarts sshd outside of Monit's control
you will be notified that the process was replaced by a new instance (or you can optionally take some other action, such as preventively
stopping sshd).
Another example is a MySQL Cluster which has its own watchdog with process restart ability. You can use Monit for redundant monitoring.
Example:
check process sshd with pidfile /var/run/sshd.pid
if changed pid then exec "/my/script"
PPID TESTING
Monit can test the parent process identification number (ppid) of a process for changes. This test is implicit and Monit will send
an alert in the case of failure by default.
The syntax for the ppid statement is:
IF CHANGED PPID [[<X>] <Y> CYCLES] THEN action
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
This test is useful for detecting changes of a process parent.
Example:
check process myproc with pidfile /var/run/myproc.pid
if changed ppid then exec "/my/script"
UPTIME TESTING
The uptime statement may only be used in a check process service entry. If specified in the control file, Monit will test the process
uptime.
Syntax (keywords are in capital):
IF UPTIME [[operator] value [unit]] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y> CYCLES] THEN action]
operator is a choice of "<", ">", "!=", "==" in C notation, "GT", "LT", "EQ", "NE" in shell sh notation and "GREATER", "LESS", "EQUAL",
"NOTEQUAL" in human readable form (if not specified, default is EQUAL).
value is an uptime watermark.
unit is either "SECOND", "MINUTE", "HOUR" or "DAY" (it is also possible to use "SECONDS", "MINUTES", "HOURS", or "DAYS").
action is a choice of "ALERT", "RESTART", "START", "STOP", "EXEC" or "UNMONITOR".
Example of restarting the process if the uptime exceeded 3 days:
check process myapp with pidfile /var/run/myapp.pid
start program = "/etc/init.d/myapp start"
stop program = "/etc/init.d/myapp stop"
if uptime > 3 days then restart
CONNECTION TESTING
Monit is able to perform connection testing via networked ports or via Unix sockets. A connection test may only be used within a check
process or within a check host service entry in the Monit control file.
If a service listens on one or more sockets, Monit can connect to the port (using either tcp or udp) and verify that the service will
accept a connection and that it is possible to write and read from the socket. If a connection is not accepted or if there is a problem
with socket i/o, Monit will assume that something is wrong and execute a specified action. If Monit is compiled with openssl, then ssl
based network services can also be tested.
The full syntax for the statement used for connection testing is as follows (keywords are in capital and optional statements in
[brackets]),
IF FAILED [host] port [type] [protocol|{send/expect}+] [timeout] [retry] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y>
CYCLES] THEN action]
or for Unix sockets,
IF FAILED [unixsocket] [type] [protocol|{send/expect}+] [timeout] [retry] [[<X>] <Y> CYCLES] THEN action [ELSE IF SUCCEEDED [[<X>] <Y>
CYCLES] THEN action]
host:HOST hostname. Optionally specify the host to connect to. If the host is not given then localhost is assumed if this test is used
inside a process entry. If this test was used inside a remote host entry then the entry's remote host is assumed. Although host is
intended for testing name based virtual host in a HTTP server running on local or remote host, it does allow the connection statement to be
used to test a server running on another machine. This may be useful: for instance, if you use Apache httpd as a front-end and an
application-server as the back-end running on another machine, this statement may be used to test that the back-end server is running and
if not raise an alert.
port:PORT number. The port number to connect to.
unixsocket:UNIXSOCKET PATH. Specifies the path to a Unix socket. Servers based on Unix sockets always run on the local machine and do not
use a port.
type:TYPE {TCP|UDP|TCPSSL}. Optionally specify the socket type Monit should use when trying to connect to the port. The different socket
types are TCP, UDP or TCPSSL, where TCP is a regular stream based socket, UDP is a datagram socket and TCPSSL specifies that Monit should
use a TCP socket with SSL when connecting to a port. The default socket type is TCP. If TCPSSL is used you may optionally specify the
SSL/TLS protocol to be used and the md5 sum of the server's certificate. The TCPSSL options are:
TCPSSL [SSLAUTO|SSLV2|SSLV3|TLSV1] [CERTMD5 md5sum]
proto(col):PROTO {protocols}. Optionally specify the protocol Monit should speak when a connection is established. At the moment Monit
knows how to speak:
APACHE-STATUS
DNS
DWP
FTP
GPS
HTTP
IMAP
CLAMAV
LDAP2
LDAP3
LMTP
MEMCACHE
MYSQL
NNTP
NTP3
POP
POSTFIX-POLICY
RADIUS
RDATE
RSYNC
SIP
SMTP
SSH
TNS
PGSQL
If you have compiled Monit with ssl support, Monit can also speak the SSL variants such as:
HTTPS
FTPS
POPS
IMAPS
To use the SSL protocol support you need to define the socket as SSL and use the general protocol name (for example in the case of
HTTPS):
TYPE TCPSSL PROTOCOL HTTP
If the server's protocol is not found in this list, simply do not specify the protocol and Monit will utilize a
default test, including test if it is possible to read and write to the port. This default test is in most cases more than good enough to
deduce if the server behind the port is up or not.
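The sketch below puts the pieces described so far into one process entry: a plain connection using the default test, HTTP spoken over
an SSL socket, and a Unix socket. The service name, paths, port numbers and the timeout are hypothetical:
check process myapp with pidfile /var/run/myapp.pid
start program = "/etc/init.d/myapp start"
stop program = "/etc/init.d/myapp stop"
# plain TCP connection, no protocol given, so the default test is used
if failed port 8080 then restart
# general protocol name combined with an SSL socket type
if failed port 8443 type tcpssl protocol http
with timeout 15 seconds
then restart
# Unix socket instead of a port
if failed unixsocket /var/run/myapp.sock then restart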
The protocol statement is:
PROTO(COL) {name}
The HTTP protocol supports in addition:
o REQUEST
o HOSTHEADER
o CHECKSUM
PROTO(COL) HTTP [REQUEST {"/path"} [with CHECKSUM checksum] [with HOSTHEADER "string"]]
The Host header option can be used to explicitly specify the HTTP host header in the request. If not used, Monit will use the hostname or
IP-address of the host as specified in the statement. Specifying a host header is useful if you want to connect to the host using an
IP-address, and the web-server handles name based virtual hosts. Example:
if failed host 192.168.1.100 port 8080 protocol http
and request '/testing' hostheader 'example.com'
with timeout 20 seconds for 2 cycles
then alert
In addition to the standard protocols, the APACHE-STATUS protocol is a test of a specific server type, rather than a generic protocol.
Server performance is examined using the status page generated by Apache's mod_status, which is expected to be at its default address of
http://www.example.com/server-status. Currently the APACHE-STATUS protocol examines the percentage of Apache child processes which are
o logging (loglimit)
o closing connections (closelimit)
o performing DNS lookups (dnslimit)
o in keepalive with a client (keepalivelimit)
o replying to a client (replylimit)
o receiving a request (requestlimit)
o initialising (startlimit)
o waiting for incoming connections (waitlimit)
o gracefully closing down (gracefullimit)
o performing cleanup procedures (cleanuplimit)
Each of these quantities can be compared against a value relative to the total number of active Apache child processes. If the comparison
expression is true the chosen action is performed.
The apache-status protocol statement is formally defined as (keywords in uppercase):
PROTO(COL) {limit} OP PERCENT [OR {limit} OP PERCENT]*
where {limit} is one or more of: loglimit, closelimit, dnslimit, keepalivelimit, replylimit, requestlimit, startlimit, waitlimit,
gracefullimit or cleanuplimit. The operator OP is one of: [<|=|>].
You can combine all of these tests into one expression or you can choose to test a certain limit. If you combine the limits you must join
them together using the OR keyword.
Here's an example where we test for a loglimit of more than 10 percent, a dnslimit over 50 percent and a waitlimit less than 20 percent of
processes. See also more examples below in the example section.
protocol apache-status
loglimit > 10% or
dnslimit > 50% or
waitlimit < 20%
then alert
Obviously, do not use this test unless the httpd server you are testing is Apache Httpd and mod_status is activated on the server.
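As a fuller sketch, the same test could be placed in a complete service entry like the one below. The pidfile path, the init script
paths and the host name are assumptions chosen for illustration only; adjust them to your installation.
# hypothetical pidfile and init script locations
check process apache with pidfile /var/run/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  # hypothetical host; mod_status must be enabled on it
  if failed host www.example.com port 80
    protocol apache-status
      loglimit > 10% or
      dnslimit > 50% or
      waitlimit < 20%
  then alert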
send/expect: {SEND|EXPECT} "string" .... If Monit does not support the protocol spoken by the server, you can write your own protocol test
using send and expect strings. The SEND statement sends a string to the server port and the EXPECT statement compares a string read from
the server with the string given in the expect statement. If your system supports POSIX regular expressions, you can use regular
expressions in the expect string; see regex(7) to learn more about the types of regular expressions you can use in an expect string.
Otherwise the string is used as it is. The send/expect statement is:
[{SEND|EXPECT} "string"]+
Note that Monit will send a string as it is, and you must remember to include CR and LF in the string sent to the server if the protocol
expects such characters to terminate a string (most text-based protocols used over the Internet do). Likewise, Monit will read up to 256
bytes from the server and use this string when comparing against the expect string. If the server sends strings terminated by CRLF
(i.e. "\r\n"), you may need to add the same terminating characters to the string you expect from the server.
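For illustration, a hand-written test for a line-based protocol might look like the sketch below. It assumes a mail server on 127.0.0.1
port 25 that greets with a 220 line and answers HELO with 250 and QUIT with 221; the host, port, HELO name and reply codes are assumptions
about the server being tested, not something Monit requires.
# hypothetical host and port; replies are matched as regular expressions
if failed host 127.0.0.1 port 25
  expect "^220"
  send "HELO localhost.localdomain\r\n"
  expect "^250"
  send "QUIT\r\n"
  expect "^221"
then alert
Each send string ends in \r\n because this protocol terminates commands with CRLF.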
As mentioned above, Monit limits the expect input to 256 bytes by default. You can override the default value by using this set statement
at the top of the Monit configuration file:
SET EXPECTBUFFER <number> ["b"|"kb"|"mb"]
For example, to set the expect buffer to read 10 kilobytes:
set expectbuffer 10 kb
Note: if you want to test the number of bytes returned from the server, you need to work around a bound limit in POSIX regex. You
cannot use something like expect ".{5000}" because the maximum number allowed in a bound is usually 255. However, this should work:
expect "(.{250}){20}"
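Since (.{250}){20} has to match 250 x 20 = 5000 characters, the expect buffer presumably needs to be raised above its default as well,
otherwise Monit would not read enough data for the expression to match. A sketch combining the two, assuming a hypothetical service on
192.168.1.100 port 8080 that sends at least 5000 bytes in reply to the request shown:
# at the top of the control file
set expectbuffer 10 kb
# inside a service entry; host, port and request are hypothetical
if failed host 192.168.1.100 port 8080
  send "GET / HTTP/1.0\r\n\r\n"
  expect "(.{250}){20}"
then alert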
You can use non-printable characters in a send string if needed. Use the hex notation \0xHEXHEX to send any char in the range
\0x00-\0xFF, that is, 0-255 in decimal. This may be useful when testing some network protocols, particularly those over UDP. For example,
to test a Quake 3 server you can use the following:
send "