Nagios Passive Check Guide

Introduction

This document describes, with examples, how to create a passive check for Nagios at UCSD. In a passive check, the system being tested initiates the update and sends the result to Nagios. In an active check, by contrast, Nagios connects to the system being checked and runs the test itself.

A passive check is a cron job, run on the client, that reports the status of the service you are monitoring. The program that performs the check is specific to what is being checked and can be as simple or as complex as you like; however, it must produce output that is either:

  • one of three states, OK, WARNING, or CRITICAL (codes 0, 1, and 2 respectively); or
  • a numerical value that Nagios can interpret itself and map to OK, WARNING, or CRITICAL.

The former is preferred because it keeps knowledge of your test out of Nagios. Otherwise Nagios has to be taught what counts as a WARNING or CRITICAL state, which increases the complexity of the Nagios configuration.
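To illustrate the preferred approach, here is a minimal sketch of keeping the threshold logic in the check script itself, so that Nagios only ever receives a state code. The subroutine name `classify` and the assumption that a higher value is worse (for example, percent of disk used) are hypothetical, not part of the UCSD setup.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a measured value into a Nagios state inside the
# check script. Assumes "higher is worse" (e.g. percent of disk used).
# Returns (code, text): 0 = OK, 1 = WARNING, 2 = CRITICAL.
sub classify {
    my ($value, $warn, $crit) = @_;
    return (2, "CRITICAL - value $value >= $crit") if $value >= $crit;
    return (1, "WARNING - value $value >= $warn")  if $value >= $warn;
    return (0, "OK - value $value");
}
```

For example, `classify(85, 80, 90)` returns `(1, "WARNING - value 85 >= 80")`, which maps directly onto the code and text fields of the packet sent to Nagios.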

To report to Nagios correctly, your script must output a properly formatted service check packet.

The status of both passive and active checks can be viewed at: http://t2sentry0.t2.ucsd.edu/nagios

Nagios Passive Check Script output

To make your test report a passive check result to Nagios, add a short block to the end of your script that produces one of three codes, OK (0), WARNING (1), or CRITICAL (2), and pipes the result, in the proper format, to the send_nsca command on the machine running the test.

Example packet

The following packet is piped to the send_nsca command. Its fields are separated by tabs: host name, service name, return code, and plugin output.

localhost      TestMessage      0      This is a test message
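send_nsca reads one result per line, with the host name, service name, return code, and plugin output separated by tabs. A minimal sketch of building such a line in Perl follows; the subroutine name `format_packet` is hypothetical. It also accepts code 3, which Nagios treats as UNKNOWN, and replaces tabs and newlines inside fields so that one result stays on one line.

```perl
use strict;
use warnings;

# Hypothetical helper: build one line of send_nsca input
# (host, service, return code, plugin output, separated by tabs).
sub format_packet {
    my ($host, $service, $code, $output) = @_;
    # 0 = OK, 1 = WARNING, 2 = CRITICAL (3 is Nagios' UNKNOWN)
    die "return code must be 0-3\n"
        unless defined $code && $code =~ /^[0-3]\z/;
    # A tab or newline inside a field would break the tab-delimited
    # format, so replace any run of them with a single space.
    for ($host, $service, $output) { s/[\t\r\n]+/ /g; }
    return join("\t", $host, $service, $code, $output) . "\n";
}
```

The example packet above would then be produced by `format_packet("localhost", "TestMessage", 0, "This is a test message")`.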

Example Script

Here is an example script that checks whether "df" completes within a timeout (its output is discarded):

#!/usr/bin/perl

# config.pl contains the host and NSCA settings; its contents are shown after this script
require "/usr/lib/nagios/passive/config.pl";


$service="df_check";

# Set the timeout for the process
$timeout=5;

eval {
    local $SIG{ALRM} = sub { die "alarm\n" }; # NB: \n required
    alarm $timeout;
    $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid) {
        wait();
    }
    else {
        # If exec fails, exit so the child does not go on to send a report itself
        exec("df > /dev/null") or exit 127;
    }
    alarm 0;
};
## This script has only binary output: either the command works or it doesn't.
## An example of trinary output can be found further down.
if ($@) {
    kill 9, $pid; # Hopefully it will be killable
    $result = "CRITICAL - df did NOT return within $timeout seconds";
    $code = 2;
}
else {
    $code = 0;
    $result = "OK - df returned within $timeout seconds";
}


# Get our hostname
## This part builds the host name Nagios knows; for this script it is cabinet-x-y-z.local
$hostname=`hostname`;
$hostname =~ /(\d+-\d+-\d+\.local)/;
$host="cabinet-$1";

## This segment formats the packet and sends it via NSCA; it can be copied as-is into other scripts that follow these conventions
open(SEND,"|$send_nsca") || die "Could not run $send_nsca: $!\n";
print SEND "$host\t$service\t$code\t$result\n";
close SEND;
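The df_check script above only detects a hang; it does not look at whether df itself succeeded. A minimal sketch of the same alarm-based timeout that also returns the command's exit status is shown below. The subroutine name `run_with_timeout` and its return values are hypothetical, and this is a variation on the script rather than the method used at UCSD.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command with a timeout and report its exit status.
# Returns ('timeout', undef) if the command did not finish in time,
# otherwise ('done', $exit_status), where -1 means it was killed by a signal.
sub run_with_timeout {
    my ($cmd, $timeout) = @_;
    my $pid = fork();
    die "fork() failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: replace this process with the command; exit if exec fails.
        exec($cmd) or exit 127;
    }
    my $status;
    eval {
        local $SIG{ALRM} = sub { die "alarm\n" };   # NB: \n required
        alarm $timeout;
        waitpid($pid, 0);
        $status = $?;
        alarm 0;
    };
    if ($@) {
        die $@ unless $@ eq "alarm\n";   # re-throw anything but our timeout
        kill 9, $pid;                    # SIGKILL the hung command
        waitpid($pid, 0);                # reap it so no zombie is left behind
        return ('timeout', undef);
    }
    # A command killed by a signal has no exit status; report it as -1.
    return ('done', ($status & 127) ? -1 : $status >> 8);
}
```

In df_check, `my ($state, $status) = run_with_timeout("df > /dev/null", $timeout);` could then report CRITICAL when `$state eq 'timeout'` or when `$status != 0`.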

Contents of config.pl:

#!/usr/bin/perl

# I'll set up NSCA on the clients so that this config script can be copied verbatim.

$nsca_host="t2sentry0.local";
$config="/etc/nagios/send_nsca.cfg";
$send_nsca="/usr/sbin/send_nsca -c $config -H $nsca_host";

1; # a file loaded with require must return a true value

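Example of trinary output. This script wraps the check_disk plugin and takes four command-line arguments, in order: the warning threshold, the critical threshold, the device, and the service name.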

#!/usr/bin/perl

require "/usr/lib/nagios/passive/config.pl";


$warning=shift;
$critical=shift;
$device=shift;
$service=shift;

$cmd="/usr/lib/nagios/plugins/check_disk -w $warning -c $critical $device";


$hostname=`hostname`;
$hostname =~ /(\d+-\d+-\d+\.local)/;
$host="cabinet-$1";


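$RESULT=`$cmd`;
chomp $RESULT;

# The last matching pattern wins, so CRITICAL overrides WARNING, which overrides OK.
if ($RESULT =~ /OK/) { $code = 0; }
if ($RESULT =~ /WARNING/) { $code = 1; }
if ($RESULT =~ /CRITICAL/) { $code = 2; }

# Send the packet via NSCA, exactly as in the df_check example
open(SEND,"|$send_nsca") || die "Could not run $send_nsca: $!\n";
print SEND "$host\t$service\t$code\t$RESULT\n";
close SEND;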
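The trinary script above derives the code by matching check_disk's text output. Nagios plugins also signal their state through their exit status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so a minimal sketch of using that instead is shown below. The subroutine name `run_plugin` is hypothetical, and treating unexpected exit values as UNKNOWN is an assumption of this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: run a Nagios plugin such as check_disk and use its
# exit status as the passive check code instead of matching the text.
# Returns (code, first line of output).
sub run_plugin {
    my ($cmd) = @_;
    my $output = `$cmd`;
    my $code;
    if ($? == -1 || ($? & 127)) {
        $code = 3;                  # plugin could not start or died from a signal
    } else {
        $code = $? >> 8;
        $code = 3 if $code > 3;     # any other exit value is treated as UNKNOWN
    }
    # Use only the first line of output as the status text.
    my ($text) = split /\n/, (defined $output ? $output : '');
    $text = "(no output from $cmd)" unless defined $text && length $text;
    return ($code, $text);
}
```

In the trinary script, `($code, $RESULT) = run_plugin($cmd);` could replace the three pattern matches; the packet would then carry 3 (UNKNOWN) if check_disk itself fails rather than an empty code.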

Configuring Nagios to Support your new Service

Once your script is NSCA-compliant, send the UCSD Nagios administrator the service name (e.g. df_check for the first example) along with the host the check will run on.

*To make threshold values easier to change, it's best not to hard-code them into the script; pass them as arguments instead, as the trinary check_disk example does.*

Depending on where the passive check runs and whether you have administrative access, you may need the Nagios administrator to configure cron; otherwise, you can configure cron to run your passive check yourself. Here is an example /etc/crontab file.

SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
HOME=/

# Run df_check every minute and discard all of its output
0-59 * * * * root /usr/lib/nagios/passive/df_check > /dev/null 2>&1

# run-parts
01 * * * * root run-parts /etc/cron.hourly
02 4 * * * root run-parts /etc/cron.daily
22 4 * * 0 root run-parts /etc/cron.weekly
42 4 1 * * root run-parts /etc/cron.monthly

Additional Passive Service Check Examples

...

-- BruceThayre - 30 Sep 2006

-- TerrenceMartin - 2 Oct 2006

Topic revision: r4 - 2006/11/07 - 22:57:18 - TerrenceMartin
 