[Pkg-nagios-devel] Bug#405587: check_disk incorrectly computes high water marks in the perfdata output

Ralph Rößner roessner at capcom.de
Thu Jan 4 20:06:38 CET 2007


Package: nagios-plugins
Version: 1.4.5-1
Tags: patch

When working on multiple paths (-p options), check_disk fails to
re-initialize variables for each check. This results in incorrect high
water mark values displayed in the performance data. Note that the check
itself is performed correctly; only the perfdata output is wrong.

Example invocation of the distributed check_disk (line breaks inserted
for readability):

pluto:~# /usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p / -p /ccftp
DISK OK - free space: / 25628 MB (95% inode=98%);
 /ccftp 86131 MB (75% inode=99%);
 | /=1100MB;22526;25342;0;28158
 /ccftp=27913MB;22526;25342;0;120148

Note that the high water mark Mbyte counts calculated for the / file
system are repeated for the /ccftp file system even though it has a
different size than the first one. The repeated values are not 80% or
90%, respectively, of the /ccftp file system's size. A re-evaluation of
the check condition based on the performance data would even place the
/ccftp file system into critical state (as it does for our monitoring
setup).

The expected result would reflect the different file system sizes, like
this (invocation of a patched plugin):

pluto:~# /tmp/check_disk -w 20% -c 10% -p / -p /ccftp
DISK OK - free space: / 25628 MB (95% inode=98%);
 /ccftp 86131 MB (75% inode=99%);
 | /=1100MB;22526;25342;0;28158
 /ccftp=27913MB;96118;108133;0;120148


The reason for this incorrect behaviour is that the variables holding
the high water marks for warning and critical state are reused for each
pass of the test loop without being reinitialized. Part of the high
water mark calculation is taking the minimum of the previously
calculated high water mark and the result of a fresh calculation (this
is intended to combine the percentage and absolute threshold
conditions). Thus, the lower high water marks of the first file system
are effectively reused for the second, larger, file system.

For confirmation, if you reorder the -p arguments so that the file
systems are checked in decreasing size order then the high water marks
are also decreasing, and the minimum calculation does no harm:

pluto:~# /tmp/check_disk -w 20% -c 10% -p /ccftp -p /
DISK OK - free space: /ccftp 86131 MB (75% inode=99%);
 / 25628 MB (95% inode=98%);
 | /ccftp=27913MB;96118;108133;0;120148
 /=1100MB;22526;25342;0;28158


The following patch adds the reinitialization of the high water mark variables
within the test loop. The amount of context displayed has been extended
to show the first high water mark calculation that follows. The
initialization values are the same as those used at the start of the
function (but outside the test loop).

--8><-----------
***************
*** 279,298 ****
--- 279,302 ----
        temp_result = get_status(dused_pct, path->usedspace_percent);
        if (verbose >=3) printf("Usedspace_percent result=%d\n", temp_result);
        disk_result = max_state( disk_result, temp_result );

        temp_result = get_status(dused_inodes_percent, path->usedinodes_percent);
        if (verbose >=3) printf("Usedinodes_percent result=%d\n", temp_result);
        disk_result = max_state( disk_result, temp_result );

        result = max_state(result, disk_result);

+       /* reset high tide values */
+       warning_high_tide = UINT_MAX;
+       critical_high_tide = UINT_MAX;
+
        /* What a mess of units. The output shows free space, the perf data shows used space. Yikes!
           Hack here. Trying to get warn/crit levels from freespace_(units|percent) for perf
           data. Assumption that start=0. Roll on new syntax...
        */
        if (path->freespace_units->warning != NULL) {
          warning_high_tide = dtotal_units - path->freespace_units->warning->end;
        }
        if (path->freespace_percent->warning != NULL) {
          warning_high_tide = abs( min( (double) warning_high_tide, (double) (1.0 - path->freespace_percent->warning->end/100)*dtotal_units ));
        }
--8><-----------

A check_disk plugin with this patch applied produced my second example
output.


The bug is contained within the check_disk.c code and does not depend on
other software, so I have omitted the dependency, libc, and kernel
information for nagios-plugins-basic on our test system.

Sincerely,
   Ralph Rößner



