[Nut-upsdev] NUT drivers socket polling issues

vencik vencik at razdva.cz
Fri Oct 12 17:45:35 UTC 2012


Hello gents,

while getting rid of gettimeofday, I've spotted something I'd like
to discuss in implementation of socket polling in
driver/main.c::main and driver/dstate.c::dstate_poll_fds

My observations follow:

1/ I'm not very happy with the poll mechanism design as whole.
I mean, typically, you implement the poll as { poll(); process(); }*
loop; in pseudo-code:

shut_down = false;

fd_set.init(...);
timeout.set(...);

while (!shut_down) {
     status = poll(fd_set, timeout);

     switch (status) {
         case POLL_ERROR:
             // Handle error ...

             break;

         case POLL_TIMEOUT:
             // Handle timeout ...

             timeout.set(...);

             break;

         case POLL_EVENT:
             // Handle data ...

             break;
}


I.e. you typically have a poll function call (select is used in NUT),
and you process the result of it in a loop.

In driver/main.c, this is done like this { poll_and_process(); }*,
i.e. there's an empty loop around function dstate_poll_fds
which calls select, processes the result and returns.
This approach has the drawback that logic of the { poll(); process(); }*
loop is cut by an interface; you need to propagate info about poll result
through that interface, which is problematic and ugly.


2/ There is actually possibility of busy loop in case the socket becomes
invalid (and any decent code should always count with that).
If this happen, select will return and the dstate_poll_fds
function will return almost immediately, too (after logging a message).
Almost all the time, the return code shall be 0 (timeout didn't occur).
This means that the dstate_poll_fds function will be immediately
re-executed, select will almost immediately return...  Busy loop.


3/ Setting of the timeout is just weird.
I actually believe that the intention was completely different
than what's implemented.
The dstate_poll_fds function 1st argument is the struct timeval
passed by value(!)  Ergo, each time it's called in the { 
poll_and_process() }*
loop, it's simply set to the same time (in future).
The function then computes a difference with the actual time
and uses the result as select timeout...
I.e, if the poll_interval was simply passed to the function
and directly used as select timeout, the result would be exactly
the same.
I think this definitely isn't the intended functionality; obviously,
what was ment is to poll at most poll_interval seconds, then
update data and poll again; i.e. the timeout should've been
propagated through the dstate_poll_fds interface while being
updated by select.  If done like this, the code would actually make
sense, although...

4/ The propagated timeout MUST NOT be passed directly to select
... although there is one small problem: in case of error,
select invalidates BOTH the FD set AND the timeout.
So, the timeout may actually get set to just weird very high
numbers, which will painfully mess the next (and further) polling.
No problem, though, we just need to copy the timeout for select.
However, this whole mechanism is just strange; hence my point 1/.
Not to mention that it can't really work, now (because of the timeout
copying with every call).


I think this code should be reviewed.  For now, I'm going to
just fix it so that it works as probably intended, before,
but that won't fix neither the potential busy loop problem
nor the broken { poll(); process() }* polling logic.
I don't want to rewrite that in scope of "getting rid of
obsolete gettimeofdate" task; I think a feature req.
or something is due way.


Comments?  Ideas?  Recommendations?  All welcome...

Thanks,

Regards,

vasek

--
Vaclav Krpec

Network UPS Tools Project
Eaton Opensource Team

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.alioth.debian.org/pipermail/nut-upsdev/attachments/20121012/901e6844/attachment.html>


More information about the Nut-upsdev mailing list