[Nut-upsdev] Suspend-to-disk & NUT (solved!)

Arjen de Korte nut+devel at de-korte.org
Fri Jan 27 13:14:12 UTC 2006


I finally nailed down the problem with upsd declaring data stale when
resuming from suspend. The problem is twofold (not surprisingly):

1) The dstate_dataok() and dstate_datastale() routines in
'drivers/dstate.c' will only 'broadcast' *changes* in the driver state.
Since the driver has no notion of time, it won't notice at all that it was
suspended. Therefore, after resuming it is business as usual for the driver
and it won't broadcast any state change (it doesn't need to).
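
To illustrate the behaviour described in 1), here is a minimal sketch of a
change-only broadcast. It is not the actual code from 'drivers/dstate.c';
the names sketch_dataok(), sketch_datastale(), broadcast() and the
'broadcasts' counter are made up for this example.

```c
#include <stdio.h>

/* Hypothetical stand-in for the driver state, not the real dstate.c. */
static int stale = 1;           /* 1 = data stale, 0 = data ok */
static int broadcasts = 0;      /* messages that would have reached upsd */

/* Stands in for writing a message to the socket that upsd listens on. */
static void broadcast(const char *msg)
{
        broadcasts++;
        printf("%s\n", msg);
}

static void sketch_dataok(void)
{
        if (stale == 0)
                return;         /* no change, nothing is sent */

        stale = 0;
        broadcast("DATAOK");
}

static void sketch_datastale(void)
{
        if (stale == 1)
                return;         /* no change, nothing is sent */

        stale = 1;
        broadcast("DATASTALE");
}
```

In this sketch, calling sketch_dataok() twice in a row sends only one
message. A driver that was already in the 'ok' state before the suspend
therefore stays silent after the resume.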

2) In 'server/upsd.c' the routine 'sstate_dead()' is used to update the
driver state. This is where the actual problem is located:

> int sstate_dead(upstype *ups, int maxage)
> {
>         time_t        now;
>         double        elapsed;
>
>         /* an unconnected ups is always dead */
>         if (ups->sock_fd == -1)
>                 return 1;        /* dead */
>
>         time(&now);
>
>         /* ignore DATAOK/DATASTALE unless the dump is done */
>         if (ups->dumpdone)
>                 if (!ups->data_ok)
>                         return 1;        /* dead */
>
>         elapsed = difftime(now, ups->last_heard);
>         if (elapsed > maxage)
>                 return 1;        /* dead */

Oops, there it is! Unless the suspend-to-disk and subsequent wake-up take
less than 'maxage' (default 15) seconds, the UPS will be declared dead and
we won't even try to see if it is still alive. A suspend will virtually
always last longer than that, so we have a deadlock here: the function
returns before it ever gets to sending a ping, and the driver won't speak
up unprompted (see 1). Instead of using the current time, I propose to use
'ups->last_ping'. What we actually want to know is whether the time
elapsed between the last time we checked and the last time we heard an
answer is greater than 'maxage' seconds. This is not necessarily related
to the current time (after a suspend-to-disk, for instance, it isn't). This
obsoletes my previous patch to make changes in 'drivers/dstate.c'. I will
create a patch later today and try to upload that to the Development
branch (if I manage to make that work).

>         /* somewhere beyond the halfway point - prod it to make it talk */
>         if (elapsed > (maxage / 2))
>                 sendping(ups, now, maxage);
>         return 0;
> }
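
As a rough sketch of the idea of comparing 'last_ping' with 'last_heard'
instead of the current time, something like the following could work. This
is not the actual patch: 'struct ups_sketch' and sketch_dead() are
hypothetical, the current time is passed in as a parameter so the example
can be exercised without a real clock, and updating 'last_ping' stands in
for the call to sendping().

```c
#include <time.h>

/* Hypothetical, stripped-down stand-in for the relevant upstype fields. */
struct ups_sketch {
        time_t  last_heard;     /* last time the driver said anything */
        time_t  last_ping;      /* last time we prodded the driver */
};

/* Returns 1 if the UPS should be considered dead, 0 otherwise. */
static int sketch_dead(struct ups_sketch *ups, time_t now, int maxage)
{
        /* dead only if a ping went unanswered for more than maxage seconds */
        if (difftime(ups->last_ping, ups->last_heard) > maxage)
                return 1;       /* dead */

        /* quiet for more than half of maxage: prod it to make it talk */
        if (difftime(now, ups->last_heard) > maxage / 2.0)
                ups->last_ping = now;   /* stands in for sendping() */

        return 0;
}
```

With 'last_heard' and 'last_ping' both at 1000 and a resume at 5000, the
first check does not report the UPS as dead but sends a ping instead. If
the driver answers, 'last_heard' moves past 'last_ping' and the next check
passes too. Note that in this simple form a driver that has not answered
by the next check would be reported dead right away, since 'last_ping' is
then far ahead of 'last_heard'.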

Furthermore, I think 'docs/new-drivers.txt' needs to be updated. From what
is written there, I assumed that calling dstate_dataok() regularly is
needed and will tell upsd that the driver is still there. It turns out it
isn't, and that one only needs to call it when the state has changed. If
the state hasn't changed, the call will be silently ignored anyway (unless
we have plans to do something else with it). In fact, the UPS is declared
dead because it is not responding to a 'ping', not because
dstate_dataok() hasn't been called lately.

Best regards,
Arjen
-- 
Eindhoven - The Netherlands
Key fingerprint - 66 4E 03 2C 9D B5 CB 9B  7A FE 7E C1 EE 88 BC 57


