[Pkg-openldap-devel] Bug#464024: Bug#464024: syncrepl provider kills consumer by sending truncated cookie

Quanah Gibson-Mount quanah at zimbra.com
Sat Feb 9 16:27:58 UTC 2008


This has been filed upstream as ITS#5362

--Quanah

--On February 4, 2008 8:33:30 PM +0100 Ralph Rößner <roessner at capcom.de> 
wrote:

> Package: slapd
> Version: 2.4.7-3
> Severity: Important
>
> Hi,
>
> when our syncrepl consumers (refreshOnly mode) query the provider for
> changes, the provider will sometimes send back an intermediate message
> that has the synchronization cookie truncated (the csn is missing). This
> causes the consumer to die (segfault). Upon restart, the consumer
> database will be empty. In a rarer case, the consumer will survive but
> have its database cleaned out as well. This problem appeared after the
> upgrade from 2.3.83-1+lenny1.
>
> Our LDAP infrastructure contains a syncrepl provider and three consumers
> in refreshOnly mode. Two of the consumers get an identical subset of the
> data and are configured alike except for the replication user, while the
> third serves a different purpose. All consumers have been hit by the
> problem; the ones configured alike die at the same time. The problem
> appears at apparently random intervals, from a few hours to a few days.
>
> Since then I have tried a few changes to our configuration and an
> upgrade to 2.4.7-4, mainly to keep things alive (mail customers not
> being happy). This has yielded only one result, namely that switching to
> refreshAndPersist mode avoids the problem: I had one of the identically
> configured consumers running in refreshAndPersist, and it survived when
> the other failed.
>
> I have set up a test consumer server, copying the existing
> configuration, and it has nicely duplicated the problem, even
> reproducibly for a stretch of time. So I am able to provide sane (i.e.
> without a lot of queries for mail addresses) debug logs that show the
> consumer failing. I have also captured a debug log of the provider
> working at the replication query, from a later point in time since
> restarting the provider to change the log level has cleared the problem
> for a while.
>
> You will notice in the logs that the intermediate message returned to
> the client contains a cookie that stops after the "csn=" string, i.e. it
> does not actually contain a value for the csn. I think that is what
> kills the consumer. I don't have a clue why the provider does that.
>
> I have provided a network trace (in pcap format) of the exchange,
> leaving out the handshake and bind request message to avoid password
> disclosure. Unless I'm mistaken, the refreshDeletes flag of the
> intermediate message is set to TRUE, indicating multiple deletes
> (right?). This fits well with the rare case of the consumer deleting all
> its entries (which I have not been able to get logs of so far).
>
> From the usual use of our provider server I would have expected zero or
> one changes within the poll interval, and definitely no deleted objects.
> So the fact that the provider is trying to send a sync id set at all
> and flag it as deletes looks suspicious to me. The test consumer server
> has never logged such an intermediate message as reaction to a
> synchronization search except in these fatal cases, for the few days
> that it has been running debug enabled now.
>
> Now I hope that someone has an idea about what might be going wrong in
> the provider server. I can just speculate that the problems we observe
> are symptoms of a deeper problem.
>
> Some software versions:
>
> slapd: 2.4.7-3
> libc6: 2.7-6
> libdb4.2: 4.2.52+dfsg-4
> libgnutls13: 2.0.4-1
> libiodbc2: 3.52.6-1
> libldap-2.4-2: 2.4.7-3
>
> Attached files:
>
> slapd.conf.keldon - provider configuration file
> slapd.conf.gorkon - test consumer configuration file
> slapd.crash.capture - network trace of the consumer - provider
>                     communication while performing the deadly replication
> slapd.crash.strace - syscall trace of the consumer at the same time
> slapd.crash.log -   debug log of the consumer, levels
>                     sync+stats+acl+trace, at the same time
> provider.log -      debug log of the provider, levels sync+stats+trace,
>                     at a later deadly replication
>
> Sincerely,
>    Ralph Rößner
>
> --
> Ralph Rößner
> CAPCom AG < http://www.capcom.de >
> Rundeturmstr. 10, 64283 Darmstadt, Germany
> Phone +49 6151 155 900, Fax +49 6151 155 909
>
> Vorstand: Luc Neumann (Vorsitzender)
> Vorsitzender des Aufsichtsrats: Prof. Dr.-Ing. José L. Encarnação
> Sitz der Gesellschaft: Darmstadt, Registergericht: Darmstadt HRB 8090
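
For readers following the report above: the failure hinges on a cookie
that ends right after "csn=". The sketch below shows one way a log
checker could flag such a cookie. It assumes the cookie is a
comma-separated list of key=value pairs; the sample strings and the
function names are made up for illustration and are not taken from the
attached logs or from the slapd source.

```python
def parse_sync_cookie(cookie: str) -> dict:
    """Split a 'key=value,key=value' cookie string into a dict."""
    fields = {}
    for part in cookie.split(","):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def has_usable_csn(cookie: str) -> bool:
    """Return True only if the cookie carries a non-empty csn value."""
    return bool(parse_sync_cookie(cookie).get("csn"))


# Hypothetical sample values, not copied from the attached logs.
complete = "rid=001,csn=20080204193330.000000Z#000000#000#000000"
truncated = "rid=001,csn="

print(has_usable_csn(complete))   # True
print(has_usable_csn(truncated))  # False
```

This only detects the symptom in captured cookies; it says nothing about
why the provider emits the truncated value.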



--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration




