Difference: NagiosPassiveCheckGuide (1 vs. 4)

Revision 4 - 2006/11/07 - Main.TerrenceMartin

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios Passive Check Guide

Line: 96 to 96
  #i'll setup NSCA on the clients so that this config script can be copied verbatim.
Changed:
<
<
$nsca_host="t2sentry0.t2.ucsd.edu";
>
>
$nsca_host="t2sentry0.local";
 $config="/etc/nagios/send_nsca.cfg";
 $send_nsca="/usr/sbin/send_nsca -c $config -H $nsca_host";

Revision 3 - 2006/11/07 - Main.BruceThayre

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Nagios Passive Check Guide

Line: 19 to 19
  In order to properly report to Nagios your script must output a properly formatted service check packet.
Added:
>
>
To check the status of passive and active checks, see http://t2sentry0.t2.ucsd.edu/nagios
 

Nagios Passive Check Script output

Configuring your test to report a passive check to Nagios involves adding a little blurb to the end of your script that outputs one of three codes, OK (code 0), warning (code 1) or critical (code 2), and then pipes that output in the proper format to the send_nsca command on the local machine running your test.

Revision 2 - 2006/10/02 - Main.TerrenceMartin

Line: 1 to 1
 
META TOPICPARENT name="WebHome"
Changed:
<
<
A passive check in Nagios is essentially a cron job, run on the client, that reports the status
of whatever service you want to monitor. The output is sent to Nagios via the NSCA service. The only issue
is getting your script to output a properly formatted service check packet, so if you already have a script,
you'll have to make some adjustments for Nagios to process its output correctly.

1.  Since most of you will be monitoring specific services, you'll probably want to set the
notification thresholds yourself.  This involves adding a little blurb to the end of your script that
outputs one of three codes, OK (code 0), warning (code 1) or critical (code 2), and then pipes that
output in the proper format to NSCA.

>
>

Nagios Passive Check Guide

 
Changed:
<
<
The proper format of a service check packet is four fields on one line, separated by tabs (the send_nsca default delimiter): host_name, description, return_code, plugin_output
>
>

Contents

 
Changed:
<
<
An example packet:
>
>

Introduction

This document describes, with examples, how to create a passive check for Nagios at UCSD. A passive check is one in which the test initiates the update and sends information to Nagios. This is opposed to an active check, where Nagios connects to the system to be checked and actively runs some sort of test.

A passive check is a cron job run on the client that reports the status of the service you are monitoring. The details of the program that performs the check are specific to the checks being performed and can be as simple or as complex as you like. However, the check program must produce output that is either:

  • One of three states, OK, WARN or CRITICAL (0, 1 and 2 respectively); or
  • A numerical value that Nagios can interpret itself and assign an OK, WARN or CRITICAL state to.

The former is preferred so that Nagios does not have to contain as much knowledge of your test. Otherwise Nagios has to be taught what is a potential WARN or CRITICAL state, and this increases the complexity of the Nagios configuration.
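
As a minimal shell sketch of the preferred approach, the check itself can turn a measured value into one of the three states before anything is sent to Nagios. The function name classify and the "higher is worse" convention (for example, percent of disk used) are assumptions made for illustration, not part of the UCSD scripts.

```shell
# classify VALUE WARN CRIT
# Prints the Nagios return code for an integer VALUE where higher is worse:
# 0 (OK) below WARN, 1 (WARN) from WARN up to CRIT, 2 (CRITICAL) at or above CRIT.
classify() {
    if [ "$1" -ge "$3" ]; then
        echo 2
    elif [ "$1" -ge "$2" ]; then
        echo 1
    else
        echo 0
    fi
}
```

For example, code=$(classify 85 80 90) sets code to 1, since 85 is past the warning threshold but below the critical one.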

In order to properly report to Nagios your script must output a properly formatted service check packet.

Nagios Passive Check Script output

Configuring your test to report a passive check to Nagios involves adding a little blurb to the end of your script that outputs one of three codes, OK (code 0), warning (code 1) or critical (code 2), and then pipes that output in the proper format to the send_nsca command on the local machine running your test.

Example packet

The following packet, whose four fields (host, service, return code, plugin output) are separated by tabs, is piped to the send_nsca command.

 localhost TestMessage 0 This is a test message
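
A minimal shell sketch of building such a packet and piping it to send_nsca is shown here. The function names and the SEND_NSCA override variable are hypothetical; the default command line simply reuses the values from config.pl in this guide.

```shell
# SEND_NSCA is a hypothetical override variable; the default reuses the
# send_nsca command line from config.pl in this guide.
SEND_NSCA="${SEND_NSCA:-/usr/sbin/send_nsca -c /etc/nagios/send_nsca.cfg -H t2sentry0.local}"

# format_packet HOST SERVICE CODE MESSAGE
# Prints one tab-delimited line: host, service, return code, plugin output.
format_packet() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# send_packet HOST SERVICE CODE MESSAGE
# Pipes the formatted line to the send command (left unquoted on purpose so
# that the command and its options are split into separate words).
send_packet() {
    format_packet "$@" | $SEND_NSCA
}
```

With send_nsca installed and configured, send_packet localhost TestMessage 0 "This is a test message" would submit the example packet above.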
Added:
>
>

Example Script

  Here is an example script that checks the output of "df":
Added:
>
>
 #!/usr/bin/perl

#config.pl contains host and nsca information which i'll show at the end of this script

Line: 67 to 85
 open(SEND,"|$send_nsca") || die "Could not run $send_nsca: $!\n";
 print SEND "$host\t$service\t$code\t$result\n";
 close SEND;
Added:
>
>
  Contents of config.pl:
Added:
>
>
 #!/usr/bin/perl

#i'll setup NSCA on the clients so that this config script can be copied verbatim.

Line: 102 to 123
 if ($RESULT =~ /OK/) { $code = 0; }
 if ($RESULT =~ /WARNING/) { $code = 1; }
 if ($RESULT =~ /CRITICAL/) { $code = 2; }
Added:
>
>

Configuring Nagios to Support your new Service

 
Changed:
<
<
2. Once your script is NSCA compliant, send me the service name (e.g. df_check for the first example) and the host the check will be run on.
>
>
Once your script is NSCA compliant, send the service name (e.g. df_check for the first example) to the UCSD Nagios administrator, along with the host the check will be run on.
  *As seen in the df_check example, to make changing threshold values more convenient, it's best not to hard code
Changed:
<
<
them into the script. And I will set up the crontab as so:
>
>
them into the script.*

Depending on where the passive check runs and whether or not you have administrative access, you may need the Nagios administrator to configure cron. Otherwise you can configure cron to run your passive check yourself. Here is an example crontab file.

 SHELL=/bin/bash
 PATH=/sbin:/bin:/usr/sbin:/usr/bin
 HOME=/
Deleted:
<
<
0-59 * * * * root /usr/lib/nagios/passive/check_up > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/check_condor_master > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/freeswap 80 50 "Swap" > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/freespace 20 10 / "/ Free Space" > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/freespace 20 10 /state/data "/state/data Free Space" > /dev/null 2>&1
 0-59 * * * * root /usr/lib/nagios/passive/df_check > /dev/null 2>&1

# run-parts

Line: 124 to 146
 02 4 * * * root run-parts /etc/cron.daily
 22 4 * * 0 root run-parts /etc/cron.weekly
 42 4 1 * * root run-parts /etc/cron.monthly
Added:
>
>

Additional Passive Service Check Examples

...

  -- BruceThayre - 30 Sep 2006
Changed:
<
<
>
>
-- TerrenceMartin - 2 Oct 2006

Revision 1 - 2006/10/01 - Main.BruceThayre

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"
A passive check in Nagios is essentially a cron job, run on the client, that reports the status
of whatever service you want to monitor. The output is sent to Nagios via the NSCA service. The only issue
is getting your script to output a properly formatted service check packet, so if you already have a script,
you'll have to make some adjustments for Nagios to process its output correctly.

1.  Since most of you will be monitoring specific services, you'll probably want to set the
notification thresholds yourself.  This involves adding a little blurb to the end of your script that
outputs one of three codes, OK (code 0), warning (code 1) or critical (code 2), and then pipes that
output in the proper format to NSCA.

The proper format of a service check packet is four fields on one line, separated by tabs (the send_nsca default delimiter):
host_name, description, return_code, plugin_output

An example packet: 
localhost      TestMessage      0      This is a test message

Here is an example script that checks the output of "df": 

#!/usr/bin/perl

#config.pl contains host and nsca information which i'll show at the end of this script 
require "/usr/lib/nagios/passive/config.pl";


$service="df_check";

# Set the timeout for the process
$timeout=5;

eval {
    local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
    alarm $timeout;
    $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid) {
        # Parent: wait for the child to finish
        wait();
    }
    else {
        # Child: run df, discarding its output
        exec "df > /dev/null";
    }
    alarm 0;
};

## This script has only binary output: either the command works or it doesn't.
## An example of trinary output can be found further down.
if ($@) {
    kill 9,$pid; # Hopefully it will be killable
    $result = "CRITICAL - df did NOT return within $timeout seconds";
    $code = 2;
}
else {
    $code = 0;
    $result = "OK - df returned within $timeout seconds";
}


# Get our hostname
##This part just reports back the hostname, for this script it's cabinet-x-y-z.local
$hostname=`hostname`;
$hostname =~ /(\d+-\d+-\d+\.local)/;
$host="cabinet-$1";

##Following the conventions of this script, one can copy this segment of code which formats/sends the packet via nsca
open(SEND,"|$send_nsca") || die "Could not run $send_nsca: $!\n";
print SEND "$host\t$service\t$code\t$result\n";
close SEND;
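
For comparison, the same timeout idea can be sketched in shell. This is not the Perl alarm/fork approach used above: it swaps in the GNU coreutils timeout command, which exits with status 124 when the limit is reached. The function name run_with_timeout is hypothetical, and, like df_check, the sketch treats any return within the limit as OK.

```shell
# run_with_timeout SECONDS COMMAND [ARGS...]
# Prints "CODE<TAB>MESSAGE". Relies on GNU coreutils `timeout`, which exits
# with status 124 when the time limit is reached.
run_with_timeout() {
    limit="$1"
    shift
    timeout "$limit" "$@" > /dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        printf '2\tCRITICAL - %s did NOT return within %s seconds\n' "$1" "$limit"
    else
        printf '0\tOK - %s returned within %s seconds\n' "$1" "$limit"
    fi
}
```

Calling run_with_timeout 5 df prints the return code and message separated by a tab, ready to be placed after the host and service fields of a packet.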

Contents of config.pl: 
#!/usr/bin/perl

#i'll setup NSCA on the clients so that this config script can be copied verbatim. 

$nsca_host="t2sentry0.t2.ucsd.edu"; 
$config="/etc/nagios/send_nsca.cfg";
$send_nsca="/usr/sbin/send_nsca -c $config -H $nsca_host"; 

Example of trinary output: 

#!/usr/bin/perl

require "/usr/lib/nagios/passive/config.pl";


$warning=shift;
$critical=shift;
$device=shift;
$service=shift;

$cmd="/usr/lib/nagios/plugins/check_disk -w $warning -c $critical $device";


$hostname=`hostname`;
$hostname =~ /(\d+-\d+-\d+\.local)/;
$host="cabinet-$1";


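# Run the plugin and capture its status output
$RESULT=`$cmd`;
chomp $RESULT;

# Map the plugin's status word to a Nagios return code; default to 3 (UNKNOWN)
$code = 3;
if ($RESULT =~ /OK/) { $code = 0; }
if ($RESULT =~ /WARNING/) { $code = 1; }
if ($RESULT =~ /CRITICAL/) { $code = 2; }

# Format and send the packet via NSCA, as in the df_check example
open(SEND,"|$send_nsca") || die "Could not run $send_nsca: $!\n";
print SEND "$host\t$service\t$code\t$RESULT\n";
close SEND;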

2.  Once your script is NSCA compliant, send me the service name (e.g. df_check for the first example)
 and the host the check will be run on.

*As seen in the df_check example, to make changing threshold values more convenient, it's best not to hard code
them into the script.  I will set up the crontab like so:
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
HOME=/

0-59 * * * * root /usr/lib/nagios/passive/check_up > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/check_condor_master > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/freeswap 80 50 "Swap" > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/freespace 20 10 / "/ Free Space" > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/freespace 20 10 /state/data "/state/data Free Space" > /dev/null 2>&1
0-59 * * * * root /usr/lib/nagios/passive/df_check > /dev/null 2>&1

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly
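
The order of the two redirections on a cron command line decides whether stderr is discarded along with stdout. The shell processes redirections left to right, so 2>&1 > /dev/null and > /dev/null 2>&1 behave differently. A minimal sketch, using a hypothetical noisy function in place of a real check script:

```shell
# noisy is a hypothetical stand-in for a check script: it writes one line
# to stdout and one line to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stderr is duplicated onto the current stdout first, then stdout alone is
# sent to /dev/null, so the stderr line is still captured (cron would mail it).
first="$(noisy 2>&1 > /dev/null)"

# stdout goes to /dev/null first, then stderr follows it: nothing is captured.
second="$(noisy > /dev/null 2>&1)"

echo "first:  [$first]"
echo "second: [$second]"
```

Only the second form silences a cron job completely; the first still lets error output through.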

-- Main.BruceThayre - 30 Sep 2006

 