[Pkg-samba-maint] DO NOT REPLY [Bug 3204] winbindd: Exceeding 200 client connections, no idle connection found

samba-bugs at samba.org samba-bugs at samba.org
Tue Jan 13 20:37:42 UTC 2009


https://bugzilla.samba.org/show_bug.cgi?id=3204

------- Comment #61 from realrichardsharpe at gmail.com  2009-01-13 14:37 CST -------
Ahhhh, here, I suspect, is the problem. Here is a log entry:

lib/util_tdb.c:tdb_chainlock_with_timeout_internal(84) 
tdb_chainlock_with_timeout_internal: alarm (40) timed out for key
ranger.msbdomain.lan in tdb /etc/samba/secrets.tdb
[2008/12/21 10:33:51.959971, 0, pid=17551/winbindd]
nsswitch/winbindd_cm.c:cm_prepare_connection(644)  cm_prepare_connection: mutex
grab failed for <dc name redacted>

What has happened here is twofold:

1. The code in 3.0.25 (up to and possibly including 3.0.30) had a bug in it
because we were not properly handling timeouts in the brlock code in
tdb/common/lock.c. The timeout handler would be called in
tdb_chainlock_with_timeout_internal, but the loop in
tdb/common/lock.c:tdb_brlock did this:

        do {
                ret = fcntl(tdb->fd,lck_type,&fl);
        } while (ret == -1 && errno == EINTR);

Which took us straight back into the fcntl. If the problem was simply that some
other process (winbindd?) had held the lock for an extraordinary period of time
(longer than the 40 second timeout), the timeout handler would be called but
then we would go back to waiting on the lock.
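
To make the effect of that loop concrete, here is a minimal simulation. It is
only a sketch: fake_blocking_lock, interrupts_left, brlock_retry_forever and
brlock_stop_on_alarm are hypothetical names standing in for the real fcntl call
and tdb_brlock, and the second loop is just one possible way to honour the
alarm, not the change that went into Samba.

```c
#include <errno.h>
#include <signal.h>

/* Set when the (simulated) SIGALRM fires, as the handler in util_tdb.c does. */
static volatile sig_atomic_t gotalarm;

/* Hypothetical stand-in for fcntl(fd, F_SETLKW, &fl): the call is
 * "interrupted by the alarm" interrupts_left times, then gets the lock. */
static int interrupts_left;

static int fake_blocking_lock(void)
{
        if (interrupts_left > 0) {
                interrupts_left--;
                gotalarm = 1;           /* what the alarm handler would do */
                errno = EINTR;
                return -1;
        }
        return 0;                       /* lock acquired */
}

/* Same shape as the quoted loop: EINTR always means "try again", so the
 * alarm is recorded but never ends the wait. */
static int brlock_retry_forever(void)
{
        int ret;

        do {
                ret = fake_blocking_lock();
        } while (ret == -1 && errno == EINTR);
        return ret;
}

/* Illustrative alternative (an assumption): give up once the alarm fired. */
static int brlock_stop_on_alarm(void)
{
        int ret;

        do {
                ret = fake_blocking_lock();
        } while (ret == -1 && errno == EINTR && !gotalarm);
        return ret;
}
```

With one simulated interruption, brlock_retry_forever returns 0 with gotalarm
set (the lock is held even though the timeout fired), while
brlock_stop_on_alarm returns -1 with errno still EINTR.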

2. When we finally got the lock, we would return to
tdb_chainlock_with_timeout_internal, which had a bug. It just looked at the
timeout count and, if it was non-zero, returned an error. Now the process that
was waiting for the lock has the lock/mutex but does not know it, and is
unlikely to release it. This would be more likely if multiple processes were
waiting for the mutex ...

I alerted Jeremy to a race in the 3.0.31 and above code and he has fixed that
in the latest release, so I think this problem will be fixed by upgrading to
the latest release, or by backporting the single-line change. The change is
roughly this:

diff --git a/source3/lib/util_tdb.c b/source3/lib/util_tdb.c
index bb568bc..8ceaa46 100644
--- a/source3/lib/util_tdb.c
+++ b/source3/lib/util_tdb.c
@@ -64,7 +64,7 @@ static int tdb_chainlock_with_timeout_internal( TDB_CONTEXT *tdb, TDB_DATA key,
                alarm(0);
                tdb_setalarm_sigptr(tdb, NULL);
                CatchSignal(SIGALRM, SIGNAL_CAST SIG_IGN);
-               if (gotalarm) {
+               if (gotalarm && (ret == -1)) {
                        DEBUG(0,("tdb_chainlock_with_timeout_internal: alarm (%u) timed out for key %s in tdb %s\n",
                                timeout, key.dptr, tdb_name(tdb)));
                        /* TODO: If we time out waiting for a lock, it might

