[Nut-upsdev] Re: [nut-Patches][303751] Checking UPS Temperature

Peter Selinger selinger at mathstat.dal.ca
Thu Jan 4 18:52:12 CET 2007


One disadvantage of handling it through a script is that it will not
be done by default. Most users probably don't know about the problem
of burning batteries, as it is not very common.

A potential problem with Eric Wilde's patch is that it is not general
enough; some UPS models have a boolean OVERHEAT flag although they
don't report the actual temperature. So the UPSOVERTEMP mechanism will
not work for such models. The decision as to which temperature is "too high"
should be made in the driver, not in upsmon, as the normal operating
temperature could differ for different devices. For devices that
support temperature readings, this could be based on a threshold
(which can be made user-settable via a driver configuration variable
if necessary). For devices that only have a boolean OVERHEAT flag,
the driver should report this flag directly.
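
To make the driver-side decision concrete, here is a minimal sketch in C.
It is not taken from any existing NUT driver: the struct temp_source type,
its fields, and the driver_overheat() function are hypothetical names made
up for illustration. It only shows the two cases described above, a flag
reported by the device and a threshold applied to a reading.

```c
#include <stdbool.h>

/* Hypothetical per-device description; not a NUT data structure. */
struct temp_source {
	bool	has_reading;	/* device reports a numeric temperature */
	bool	has_flag;	/* device reports a boolean overheat bit */
	double	threshold_c;	/* driver default or user override, Celsius */
};

/* Decide whether the driver should raise OVERHEAT for this poll. */
static bool driver_overheat(const struct temp_source *src,
	double reading_c, bool device_flag)
{
	/* A device-supplied flag is reported directly. */
	if (src->has_flag && device_flag)
		return true;

	/* Otherwise apply the threshold, if a reading exists. */
	if (src->has_reading && reading_c > src->threshold_c)
		return true;

	return false;
}
```

With a threshold of 60.0, a reading of 81 would raise the flag and a
reading of 45 would not; a device without readings depends on its own
flag alone.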

A few drivers already support the OVERHEAT flag in ups.status. (None
seem to support ups.alarm, although this was perhaps originally
intended for this purpose). I wonder if it would make sense to allow
upsmon to react to the OVERHEAT flag.
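
As a rough sketch of what reacting to the flag would involve on the
client side, the check below looks for a whole token in a
space-separated ups.status string such as "OL OVERHEAT". The function
name status_has_token() is hypothetical, and this is not how upsmon
currently parses status; it only illustrates matching the complete word
rather than a substring.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if the space-separated status string contains the token. */
static bool status_has_token(const char *status, const char *token)
{
	size_t tlen = strlen(token);
	const char *p = status;

	if (tlen == 0)
		return false;

	while (*p != '\0') {
		size_t len;

		/* skip separators */
		while (*p == ' ')
			p++;

		/* length of the current word */
		len = strcspn(p, " ");
		if (len == tlen && strncmp(p, token, tlen) == 0)
			return true;

		p += len;
	}
	return false;
}
```

A caller could then issue a notification when
status_has_token(status, "OVERHEAT") changes from false to true, in the
same way the existing status flags are handled.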

-- Peter

Comment By: Arjen de Korte (adkorte-guest)
> Date: 2007-01-04 16:35
> 
> Message:
> You can handle this through a script that monitors the UPS
> temperature through upsc without any changes to upsmon by
> running
> 
> 	upsc myups@somewhere ups.temperature
> 
> and parsing the results. If it determines the temperature is
> too high, it could send off a message to the operator or
> switch the UPS off by sending an instcmd that shuts it down
> and keeps it off.
> 
> I'm not in favor of doing this in upsmon, since only the
> 'ups.status' is guaranteed to be available for each driver.
> If we start adding variables that *might* be supported,
> there is no end to the number of possible variables. Where
> would we stop?
> 
> Furthermore, polling for the temperature doesn't need to be
> done as frequently as the line voltage, since it won't
> change that quickly (unless the UPS *is* on fire already).
> You don't need the near instantaneous reaction like we have
> for input/battery state changes.
> 
> Adding 'TEMP' to the 'ups.status' might be a good idea, but
> requires changes to the driver. It would be a much better
> option than changing upsmon in the way proposed here though.
> It should be the driver that decides something is not right,
> with upsmon then acting upon that notice. I'm against reversing the
> order of events, since if upsmon is somehow not able to talk
> to the driver, nothing is done to resolve the situation.
> 
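
The parsing step of such an external monitor might look like the sketch
below. It assumes the upsc output for a single variable is the bare
value on one line, and that the text has already been read into a
string; parse_reading() and reading_too_hot() are hypothetical helper
names. Unlike atof(), strtod() makes it possible to tell "no number at
all" apart from a reading of zero.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parse one line of text such as "43.5\n" into a temperature. */
static bool parse_reading(const char *text, double *out)
{
	char *end;
	double value;

	value = strtod(text, &end);
	if (end == text)
		return false;	/* nothing numeric, e.g. empty or an error message */

	/* allow only trailing whitespace after the number */
	while (*end != '\0') {
		if (!isspace((unsigned char)*end))
			return false;
		end++;
	}

	*out = value;
	return true;
}

/* Hypothetical policy helper: true if text holds a reading above limit_c. */
static bool reading_too_hot(const char *text, double limit_c)
{
	double t;

	return parse_reading(text, &t) && t > limit_c;
}
```

Output that does not parse is treated as "no reading" rather than as
0.0, so a UPS without a temperature sensor never triggers the alert.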


nut-patches at alioth.debian.org wrote:
> 
> Patches item #303751, was opened at 2006-08-12 00:04
> >Status: Closed
> Priority: 3
> Submitted By: Eric Wilde (ewilde-guest)
> Assigned to: Nobody (None)
> Summary: Checking UPS Temperature 
> >Resolution: Rejected
> Group: None
> Category: None
> 
> 
> Initial Comment:
> Last week, one of my UPS burned the batteries up (plates buckled, cases bulging, several of the sealed vent caps opened, plastic welded together).  The batteries eventually appear to have shorted and the UPS shut down, without warning, despite being on line power (lucky the equipment it was powering had a sense of humor).  From reading the log file posthumously, I see that the internal temps in the UPS reached 81 degrees Celsius, which is pretty hot.
> 
> Normal operating temperatures for this UPS are in the 40-50 degree range.  It went up into the 75-80 degree range 36 hours before the batteries shorted out so it appears that increased temperature is an excellent predictor of battery failure.
> 
> This being the case, I added the following code to upsmon to monitor temperature (changes based on nut-2.0.0).
> 
>                                  Eric Wilde
> 
>  
> --- upsmon.h.orig	2004-03-08 07:09:28.000000000 -0500
> +++ upsmon.h	2006-08-11 13:38:03.000000000 -0400
> @@ -29,4 +29,5 @@
>  /* was ST_FIRST 0x080 */
>  #define ST_CONNECTED	0x100	/* upscli_connect returned OK		*/
> +#define ST_OVERTEMP	0x200	/* UPS is running overtemp		*/ //EW
>  
>  /* required contents of flag file */
> @@ -72,4 +73,5 @@
>  #define NOTIFY_NOCOMM	8	/* UPS hasn't been contacted in awhile	*/
>  #define NOTIFY_NOPARENT	9	/* privileged parent process died	*/
> +#define NOTIFY_OVERTEMP	10	/* UPS went to overtemp			*/ //EW
>  
>  /* notify flag values */
> @@ -101,4 +103,5 @@
>  	{ NOTIFY_NOCOMM,   "NOCOMM",   NULL, "UPS %s is unavailable", 0 },
>  	{ NOTIFY_NOPARENT, "NOPARENT", NULL, "upsmon parent process died - shutdown impossible", 0 },
> +	{ NOTIFY_OVERTEMP, "OVERTEMP", NULL, "UPS %s is running at an excessive temperature", 0 }, //EW
>  	{ 0, NULL, NULL, NULL, 0 }
>  };
> 
> 
> --- upsmon.c.orig	2004-01-31 16:00:02.000000000 -0500
> +++ upsmon.c	2006-08-11 16:11:15.000000000 -0400
> @@ -50,4 +50,7 @@
>  static	int	rbwarntime = 43200;
>  
> +	/* default UPS overtemp value (degrees Celsius - 0.0 means ignore) */ //EW
> +static	double	upsovertemp = 0.0; //EW
> +
>  	/* default "all communications down" warning interval (seconds) */
>  static	int	nocommwarntime = 300;
> @@ -546,4 +549,13 @@
>  	}
>  
> +//EW >>>>>>
> +	if (!strcmp(var, "temp")) {
> +		query[0] = "VAR";
> +		query[1] = ups->upsname;
> +		query[2] = "ups.temperature";
> +		numq = 3;
> +	}
> +//EW <<<<<<
> +
>  	if (numq == 0) {
>  		upslogx(LOG_ERR, "get_var: programming error: var=%s", var);
> @@ -770,4 +782,21 @@
>  }
>  
> +//EW >>>>>>
> +static void ups_overtemp(utype *ups)
> +{
> +	if (flag_isset(ups->status, ST_OVERTEMP)) {		/* no change */
> +		debug("ups_overtemp(%s) (no change)\n", ups->sys);
> +		return;
> +	}
> +
> +	debug("ups_overtemp(%s) (first time)\n", ups->sys);
> +
> +	/* must have changed from !OVERTEMP to OVERTEMP, so notify */
> +
> +	do_notify(ups, NOTIFY_OVERTEMP);
> +	setflag(&ups->status, ST_OVERTEMP);
> +}
> +//EW <<<<<<
> +
>  /* cleanly close the connection to a given UPS */
>  static void drop_connection(utype *ups)
> @@ -1163,4 +1192,12 @@
>  	}
>  
> +//EW >>>>>>
> +	/* UPSOVERTEMP <num> */
> +	if (!strcmp(arg[0], "UPSOVERTEMP")) {
> +		upsovertemp = atof(arg[1]);
> +		return 1;
> +		}
> +//EW <<<<<<
> +
>  	/* NOCOMMWARNTIME <num> */
>  	if (!strcmp(arg[0], "NOCOMMWARNTIME")) {
> @@ -1563,4 +1600,31 @@
>  }
>  
> +//EW >>>>>>
> +/* deal with the ups.temperature for this ups */
> +static void parse_temperature(utype *ups, char *temperature)
> +{
> +	double temp;
> +
> +	debug("     temperature: [%s]\n", temperature);
> +
> +	/* empty response is ignored -- not all ups return temperatures */
> +	if (!strcmp(temperature, "")) {
> +		clearflag(&ups->status, ST_OVERTEMP);
> +		return;
> +	}
> +
> +	/* get the temperature as a double */
> +	temp = atof(temperature);
> +
> +	/* check the temperature against the overtemp value */
> +	if (temp > upsovertemp)
> +		ups_overtemp(ups);
> +	else
> +		clearflag(&ups->status, ST_OVERTEMP);
> +
> +	debug("\n");
> +}
> +//EW <<<<<<
> +
>  /* see what the status of the UPS is and handle any changes */
>  static void pollups(utype *ups)
> @@ -1578,4 +1642,19 @@
>  		debug("polling ups: %s\n", ups->sys);
>  
> +//EW >>>>>>
> +	/* if the user wants us to check for overtemp */
> +	if (upsovertemp > 0.0) {
> +		set_alarm();
> +
> +		if (get_var(ups, "temp", status, sizeof(status)) == 0) {
> +			clear_alarm();
> +			parse_temperature(ups, status);
> +		}
> +
> +		/* fallthrough: no communications */
> +		clear_alarm();
> +	}
> +//EW <<<<<<
> +
>  	set_alarm();
>  
> 
> 
> +++ upsmon.config (changes somewhere in the config file)
> 
> # --------------------------------------------------------------------------
> # UPSOVERTEMP - Temperature (in Celsius) which is too high for operation
> #
> # upsmon will check all UPS that return temperature information against this
> # value.  If the UPS temperature exceeds this value, an OVERTEMP notification
> # will be generated.
> #
> # Note that certain UPS are renowned for cooking and even burning up batteries
> # (some reports of spectacular battery fires have been received).  From actual
> # observed log data, it appears that prior to burning up the batteries, the
> # UPS internal temperature rises significantly.  Hence, monitoring the UPS
> # temperature can be a valuable tool towards detecting battery cooking, before
> # the UPS burns the place down (the UPS is supposed to solve problems, not
> # cause them, isn't it).
> #
> # Once again, typical observed internal temperatures are in the 40 to 50 degree
> # Celsius range.  Observed temperatures of 80 degrees Celsius prior to an
> # actual battery failure are indicative of pending failure.  Thus, to be safe,
> # the UPSOVERTEMP value should be set in the 60-70 degree range.
> 
> UPSOVERTEMP 60.0
> 
> # OVERTEMP : The UPSOVERTEMP value has been exceeded (for UPS that return temp)
> 
> NOTIFYFLAG OVERTEMP SYSLOG+EXEC
> 
> 


