[Pkg-xen-devel] Bug#571634: xen-utils-common: using --physdev-out in the OUTPUT, FORWARD and POSTROUTING

Travis Millican travism at fmi-dallas.com
Thu Sep 16 17:46:03 UTC 2010


I recently encountered this in the logs of a new Debian Xen Dom0, and
having now spent the better part of a day researching and testing, I've
come to the conclusion that this is not a bug in xen-utils-common or
even iptables; it's merely the consequence of structural changes to the
core netfilter code starting in the 2.6.20 kernel.

This is rather long, but the issue is complicated. Please bear with me :)

I can't say with any certainty why some of you are having problems
doing packet forwarding on DomUs with iptables in place, but I suspect
it is a matter of misunderstanding how the bridging and routing kernel
code now interact, and the implications for the physdev iptables module.
Which is entirely understandable. I certainly didn't really "grok" what
was going on until spending quite a few hours reading up on it. I'm glad
I did, though, as I now know more than I ever wanted to about the
kernel's netfilter code...

An absolutely invaluable resource on this subject is the
"ebtables/iptables interaction on a Linux-based bridge" document
published by the ebtables developers:

http://ebtables.sourceforge.net/br_fw_ia/br_fw_ia.html

I don't know who specifically wrote it, but I can't thank them enough.
If you're like me, you'll have to read this slowly and several times
before it totally sinks in. I now have a copy of their detailed packet
flow chart (bottom of the article) printed out and hanging next to my
workstation :)

The long and short of it is this -- there are two different general
processes that an IP packet or link-layer frame can follow through the
core netfilter code in the kernel: the bridging process and the routing
process. Although iptables is "network layer" as opposed to "link
layer", some of the iptables chains are hooked in both the routing
process and the bridging process. The significance of this is that for
these chains (the filter table's FORWARD chain being a crucial example),
there are two entirely different contexts in which the chain may be
processed. That is, iptables installs itself into the kernel hooks in
both the bridging code and the routing code. More particularly, if the
OUTPUT, FORWARD, or POSTROUTING chains are called from the routing
context, no bridging decision has (yet) been made. Therefore it is not
possible for --physdev-out to ever match in this context, even though
it might naively seem to be a logical thing to do in certain situations.
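
To make the two contexts concrete, here is a toy user-space model in C.
It is not kernel code: the type and function names, and the "vif1.0"
port name, are made up for illustration. The only thing it encodes is
the point above, that a packet reaching one of these chains via the
routing process has no bridge output port yet, so a --physdev-out style
comparison has nothing to compare against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout: a toy model, not the kernel's data structures. */
enum traversal { TRAVERSAL_ROUTED, TRAVERSAL_BRIDGED };

struct toy_packet {
    enum traversal path;     /* which process delivered the packet to the chain */
    const char *physoutdev;  /* bridge port picked by the bridging decision, or NULL */
};

/* Build the packet as OUTPUT, FORWARD or POSTROUTING would see it. */
struct toy_packet toy_packet_at_chain(enum traversal path, const char *bridge_port)
{
    struct toy_packet p;

    p.path = path;
    /* Routing context: no bridging decision has been made, so no port is known. */
    p.physoutdev = (path == TRAVERSAL_BRIDGED) ? bridge_port : NULL;
    return p;
}

/* Model of a --physdev-out comparison: it can only succeed if a port is known. */
bool toy_physdev_out_matches(const struct toy_packet *p, const char *wanted_port)
{
    assert(p != NULL && wanted_port != NULL);
    if (p->physoutdev == NULL)
        return false;
    return strcmp(p->physoutdev, wanted_port) == 0;
}
```

In this model a packet built with TRAVERSAL_ROUTED never matches, no
matter which port name the rule asks for, while a TRAVERSAL_BRIDGED
packet matches exactly when the port names agree.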

And that brings us to the kernel warning emitted with respect to
iptables. Admittedly, the text of this warning could stand to be
revised a bit, as it does tend to give the wrong impression. Taking a
look at xt_physdev.c in the kernel code would be useful in figuring out
what the warning truly indicates:

In function physdev_mt_check, xt_physdev.c wrote:
> if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
>     info->bitmask & ~XT_PHYSDEV_OP_MASK)
>         return false;
> if (info->bitmask & XT_PHYSDEV_OP_OUT &&
>     (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
>      info->invert & XT_PHYSDEV_OP_BRIDGED) &&
>     par->hook_mask & ((1 << NF_INET_LOCAL_OUT) |
>     (1 << NF_INET_FORWARD) | (1 << NF_INET_POST_ROUTING))) {
>         printk(KERN_WARNING "physdev match: using --physdev-out in the "
>                "OUTPUT, FORWARD and POSTROUTING chains for non-bridged "
>                "traffic is not supported anymore.\n");
>         if (par->hook_mask & (1 << NF_INET_LOCAL_OUT))
>                 return false;
> }

In a nutshell, this warning is emitted whenever you use the
--physdev-out option in the OUTPUT, FORWARD, or POSTROUTING chains and
either of the following is true:

    1. You haven't included the --physdev-is-bridged option as well.
    2. You have explicitly tried to apply this rule to non-bridged
       traffic by including "! --physdev-is-bridged" in the rule.
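
For anyone who wants to poke at the condition outside the kernel, below
is a user-space re-statement of the quoted check in C. Treat it as a
sketch: the function name, the struct and the three-way result enum are
my own, and the constant values are what I believe the kernel headers
define, so verify them against your own tree. Note that in the quoted
code the OUTPUT case goes one step further and returns false, which I
read as the rule being refused rather than merely warned about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values assumed to mirror xt_physdev.h and netfilter.h; check your headers. */
#define XT_PHYSDEV_OP_IN      0x01
#define XT_PHYSDEV_OP_OUT     0x02
#define XT_PHYSDEV_OP_BRIDGED 0x04
#define XT_PHYSDEV_OP_ISIN    0x08
#define XT_PHYSDEV_OP_ISOUT   0x10
#define XT_PHYSDEV_OP_MASK    (0x20 - 1)

enum { NF_INET_PRE_ROUTING, NF_INET_LOCAL_IN, NF_INET_FORWARD,
       NF_INET_LOCAL_OUT, NF_INET_POST_ROUTING };

/* Hypothetical stand-ins for the kernel's match info and return value. */
struct physdev_info_sketch {
    uint8_t bitmask;  /* which --physdev-* options the rule uses */
    uint8_t invert;   /* which of those options are negated with "!" */
};

enum check_result { CHECK_OK, CHECK_WARN, CHECK_REJECT };

enum check_result physdev_check_sketch(const struct physdev_info_sketch *info,
                                       unsigned int hook_mask)
{
    const unsigned int out_hooks = (1u << NF_INET_LOCAL_OUT) |
                                   (1u << NF_INET_FORWARD) |
                                   (1u << NF_INET_POST_ROUTING);

    assert(info != NULL);

    /* At least one known option must be set, and no unknown bits. */
    if (!(info->bitmask & XT_PHYSDEV_OP_MASK) ||
        (info->bitmask & ~XT_PHYSDEV_OP_MASK))
        return CHECK_REJECT;

    /* --physdev-out without a non-inverted --physdev-is-bridged. */
    if ((info->bitmask & XT_PHYSDEV_OP_OUT) &&
        (!(info->bitmask & XT_PHYSDEV_OP_BRIDGED) ||
         (info->invert & XT_PHYSDEV_OP_BRIDGED)) &&
        (hook_mask & out_hooks)) {
        /* The quoted code prints the warning here, then fails for OUTPUT. */
        if (hook_mask & (1u << NF_INET_LOCAL_OUT))
            return CHECK_REJECT;
        return CHECK_WARN;
    }
    return CHECK_OK;
}
```

Under this sketch, a FORWARD rule with only --physdev-out yields
CHECK_WARN, the same rule with --physdev-is-bridged added yields
CHECK_OK, and a --physdev-out rule without it in OUTPUT yields
CHECK_REJECT.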

Here is the description of --physdev-is-bridged in the man page:

IPTABLES(8) wrote:
> Matches  if  the  packet  is  being bridged and therefore is not
> being routed.  This is only useful in the FORWARD and  POSTROUT-
> ING chains.

Note that both of these chains are also named in the warning. This is
not merely coincidental. Since we can probably rule out the second
condition for emitting this warning, the most likely reason that you are
seeing this is the first.

In plain English, what this warning actually indicates is that you have
written a rule which *might* be processed in the context of the routing
process, but which can never match in that context. This could
lead to unexpected behavior -- namely that the rule
never matches any traffic at all. Therefore the netfilter code in the
kernel would like to see you explicitly acknowledge this situation by
always using --physdev-is-bridged whenever you use --physdev-out. By
doing this, you make it clear to anyone who might be looking at your
rules that this particular rule can only be used to match packets that
have arrived at iptables via the bridging process.

If it is your intention that your rule will only apply to bridged
traffic in the first place, and if you have verified that the rule is
in fact matching all of the traffic that you intend, you *can* safely
ignore this warning. However, as a matter of best-practices, I would
recommend that you go ahead and add --physdev-is-bridged to the rule for
readability/maintainability reasons anyway.

All that said, the nature of this warning might be a little less
confusing if you know its history. Prior to kernel version 2.6.20, there
were some deferred hooks in netfilter code that would allow the
aforementioned chains to be processed *after* bridging had occurred,
even for packets that followed the routing process rather than the
bridging process. This was yanked for the reasons given in the
changelog:

http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.20

> [NETFILTER]: bridge-netfilter: remove deferred hooks
>  
> Remove the deferred hooks and all related code as scheduled in
> feature-removal-schedule.

And per that:

http://www.linuxhq.com/kernel/v2.6/20/Documentation/feature-removal-schedule.txt

> What:   Bridge netfilter deferred IPv4/IPv6 output hook calling
> When:   January 2007
> Why:   The deferred output hooks are a layering violation causing unusual
>    and broken behaviour on bridge devices. Examples of things they
>    break include QoS classifation using the MARK or CLASSIFY targets,
>    the IPsec policy match and connection tracking with VLANs on a
>    bridge. Their only use is to enable bridge output port filtering
>    within iptables with the physdev match, which can also be done by
>    combining iptables and ebtables using netfilter marks. Until it
>    will get removed the hook deferral is disabled by default and is
>    only enabled when needed.

Now we know why/when the feature was removed, but the warning message
itself is still fairly confusing to the uninitiated. To explain that,
we'll have to take a look at the patch history for xt_physdev.c.

http://www.linuxhq.com/kernel/v2.6/20-rc4/net/netfilter/xt_physdev.c

We can see that the previous incarnation of this warning was a
deprecation warning:

> printk(KERN_WARNING "physdev match: using --physdev-out in the "
>      "OUTPUT, FORWARD and POSTROUTING chains for non-bridged "
>      "traffic is deprecated and breaks other things, it will "
>      "be removed in January 2007. See Documentation/"
>      "feature-removal-schedule.txt for details. This doesn't "
>      "affect you in case you're using it for purely bridged "
>      "traffic.\n");

In other words, the warning was originally added back when using
--physdev-out was still possible for non-bridged traffic, but after that
feature had already been slated for removal. I.e. it was added to warn
people that their existing rules using --physdev-out might be broken
soon when the netfilter deferred hooks were removed from the kernel.
Once that change was committed, the warning was edited in a
less-than-clear fashion.

Unfortunately, there's no feasible way for this warning to only be
emitted when the rule is actually processed from a non-bridging
(routing) context, because doing so would require placing the check
inside of the callback functions that are hooked into the netfilter
code. I am not a kernel developer, but I suspect that this would have
an unacceptably negative performance impact on iptables.
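
The difference between the two designs can be sketched in C as follows.
This is a toy with made-up names, not how the kernel is structured
internally; it only contrasts a check that runs once when a rule is
loaded with a hypothetical one that runs for every packet, which is the
cost I am guessing at above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rule record for illustration only. */
struct toy_rule {
    bool uses_physdev_out;   /* rule contains --physdev-out */
    bool requires_bridged;   /* rule also contains --physdev-is-bridged */
    unsigned long warnings;  /* how many times a warning would be logged */
    unsigned long branches;  /* extra per-packet tests spent on the warning */
};

/* Existing design: inspect the rule once, at load time. */
void toy_check_on_insert(struct toy_rule *rule)
{
    assert(rule != NULL);
    if (rule->uses_physdev_out && !rule->requires_bridged)
        rule->warnings++;
}

/* Hypothetical alternative: decide for each packet whether to warn. */
void toy_check_per_packet(struct toy_rule *rule,
                          const bool *packet_is_bridged, size_t n)
{
    assert(rule != NULL && packet_is_bridged != NULL);
    for (size_t i = 0; i < n; i++) {
        rule->branches++;  /* this test is paid on every packet */
        if (rule->uses_physdev_out && !packet_is_bridged[i])
            rule->warnings++;
    }
}
```

With the load-time check the rule costs one test in total and produces
at most one warning, whether or not routed traffic ever reaches it. The
per-packet variant would warn only when it is actually relevant, but it
pays one extra test per packet and logs once per routed packet.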

So...there we are.

If you're still having issues routing traffic in a Xen DomU, it may or
may not be because of the condition flagged by the warning. The changes
made to the kernel in 2.6.20 are not a "bug", but they may require that
you re-think how you process traffic on a host that functions as both a
bridge and a router (or as a combination brouter). It should still be
possible for you to achieve whatever it is you're trying to do with the
post-2.6.20 kernel, but you may need to get a bit more sophisticated.
My main recommendation is to check out the ebtables package. Ebtables
is to the bridging/link-layer process what iptables is to the
routing/network-layer process. The two are very similar by design, and
they can be used together to handle more complicated bridging + routing
setups.

-- 
Travis Millican




