Bug#845569: exim4-daemon-heavy: Memory leak in callouts (fixed already in official Exim Git repo)

Mon Jan 2 10:26:25 UTC 2017

Hi,

Andreas Metzler <ametzler at bebt.de> (Sat 31 Dec 2016 17:55:30 CET):
> On 2016-11-24 "Heiko Schlittermann (HS12-RIPE)" <hs at schlittermann.de> wrote:
> > Package: exim4-daemon-heavy
> > Version: 4.84.2-2+deb8u1
> > Severity: important
> > Tags: upstream patch
> 
> > Dear Maintainer,
> 
> > Current Exim versions have a memory leak when doing callouts via TLS
> > connections. I can reproduce the problem and I've fixed it.
> 
> > The fix is already pushed to the upstream repository of Exim (as I'm
> > one of the Exim developers).
> 
> > Commit ed62aae3051c9a713d35c8ae516fbd193d1401ba contains the fix.
> [...]
> 
> Hello Heiko,
> 
> thanks for the report with fix (in the branch).
> 
> Would you mind explaining why this is an important bug? Afaiu most exim
> processes a short lived and I also would think that the respective
> structure would not be huge. So at a glance I would have expected a
> normal or even minor severity (... which would not be eligible for a
> stable update.)

You're right, most Exim processes are short-lived. But the callout is done
by the (just forked) receiving process itself, *not* by another subprocess.
Thus, if a huge number of addresses has to be checked by callouts, the
leaked memory accumulates in that single process and the leak hurts.

I discovered the problem on a central mailhub. One of the satellites is
a mailing list server (Mailman), sending via its local Exim instance to
the central mailhub. With the default configuration of Mailman and Exim,
a single message arrived with a batch of about 4k recipient addresses.

The receiving Exim on the mailhub tried to verify these 4k addresses via
TLS callouts. After about 1k addresses, approx. 4G¹ of RAM was exhausted
and the receiving process crashed. Fortunately the callout results were
stored in the callout cache, so on the next connection the first 1k
addresses were verified from the cache entries, but the 2nd 1k addresses
caused the receiver to crash during callouts… After about the 4th attempt
all addresses were verified and the mail went through.
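
To put rough numbers on this retry behaviour, here is a minimal sketch in
Python. It is only a toy model, not Exim code: the function name and its
parameters are made up for illustration, and it assumes that every
completed callout is cached before the crash and that no cache entry
expires between attempts.

```python
def attempts_until_verified(total_addresses, callouts_before_crash):
    """Count connection attempts until every recipient is verified.

    Toy model (assumption, not Exim's implementation): each attempt
    skips addresses already in the callout cache, then performs fresh
    callouts until the leak exhausts memory after
    `callouts_before_crash` of them.
    """
    if callouts_before_crash <= 0:
        raise ValueError("callouts_before_crash must be positive")

    cached = 0      # addresses already answered from the callout cache
    attempts = 0
    while True:
        attempts += 1
        remaining = total_addresses - cached
        if remaining <= callouts_before_crash:
            # The rest fits into one process lifetime: mail goes through.
            return attempts
        # The process crashes, but its callout results stay cached.
        cached += callouts_before_crash


print(attempts_until_verified(4000, 1000))  # 4
```

With the approximate figures from above (4k recipients, a crash after
about 1k callouts) the model needs 4 attempts. A shorter cache lifetime
would break the assumption that `cached` only ever grows.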

In the above setup the delay was a major disaster. In other cases you might
have far fewer addresses to check, or much looser constraints on
delivery time … But the leak is clearly a bug and the fix is easy. (There
are even possible workarounds, on the sender's side and on the
receiver's side. Because of the callout cache it was kind of
self-healing, but with shorter cache times and longer retry intervals
this wouldn't work anymore.)

As one of the Exim developers I'd really like to see this bug fixed in
Exim releases that are distributed as "stable". If you need help with
backporting, I can assist you.

¹) I'm not sure about the real numbers, maybe it was 1.5k addresses
   and 8G RAM, but I think you get the idea. (It was reproducible.)

    Best regards from Dresden/Germany
    Heiko Schlittermann
-- 
 SCHLITTERMANN.de ---------------------------- internet & unix support -
 Heiko Schlittermann, Dipl.-Ing. (TU) - {fon,fax}: +49.351.802998{1,3} -
 gnupg encrypted messages are welcome --------------- key ID: F69376CE -
 ! key id 7CBF764A and 972EAC9F are revoked since 2015-01 ------------ -