The MTU for IP traffic encapsulated inside PPPoE traffic is smaller
than the MTU of the Ethernet device (1500). Connection tracking
gathers all IP packets and sometimes will refragment them in
ip_fragment(). We then need to subtract the length of the
encapsulating header from the mtu used in ip_fragment(). The check in
br_nf_dev_queue_xmit() which determines if ip_fragment() has to be
called is also updated for the PPPoE-encapsulated packets.
nf_bridge_copy_header() is also updated to make sure the PPPoE data
length field has the correct value.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
- fix IP DNAT on vlan- or pppoe-encapsulated traffic: The functions
neigh_hh_output() or dst->neighbour->output() overwrite the complete
Ethernet header, although we only need the destination MAC address.
For encapsulated packets, they ended up overwriting the encapsulating
header. The new code copies the Ethernet source MAC address and
protocol number before calling dst->neighbour->output(). The Ethernet
source MAC and protocol number are copied back in place in
br_nf_pre_routing_finish_bridge_slow(). This also makes the IP DNAT
more transparent because in the old scheme the source MAC of the
bridge was copied into the source address in the Ethernet header. We
also let skb->protocol equal ETH_P_IP resp. ETH_P_IPV6 during the
execution of the PF_INET resp. PF_INET6 hooks.
- Speed up IP DNAT by calling neigh_hh_bridge() instead of
neigh_hh_output(): if dst->hh is available, we already know the MAC
address so we can just copy it.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Remove br_netfilter.c::br_nf_local_out(). The function
br_nf_local_out() was needed because the PF_BRIDGE::LOCAL_OUT hook
could be called when IP DNAT happens on to-be-bridged traffic. The
new scheme eliminates this mess.
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
ip_refrag isn't used anymore in the bridge-netfilter code
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
bridge-netfilter: cleanup br_netfilter.c
- remove some of the graffiti at the head of br_netfilter.c
- remove __br_dnat_complain()
- remove KERN_INFO messages when CONFIG_NETFILTER_DEBUG is defined
Signed-off-by: Bart De Schuymer <bdschuym@pandora.be>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The IGMP3 report parsing is looking at the wrong address for
group records. This patch fixes it.
Reported-by: Banyeer <banyeer@yahoo.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Restore function signatures from bool to int so that we can report
memory allocation failures or similar using -ENOMEM rather than
always having to pass -EINVAL back.
// <smpl>
@@
type bool;
identifier check, par;
@@
-bool check
+int check
(struct xt_tgchk_param *par) { ... }
// </smpl>
Minus the change it does to xt_ct_find_proto.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Restore function signatures from bool to int so that we can report
memory allocation failures or similar using -ENOMEM rather than
always having to pass -EINVAL back.
This semantic patch may not be too precise (checking for functions
that use xt_mtchk_param rather than functions referenced by
xt_match.checkentry), but reviewed, it produced the intended result.
// <smpl>
@@
type bool;
identifier check, par;
@@
-bool check
+int check
(struct xt_mtchk_param *par) { ... }
// </smpl>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
The first argument to NF_HOOK* is an nfproto since quite some time.
Commit v2.6.27-2457-gfdc9314 was the first to practically start using
the new names. Do that now for the remaining NF_HOOK calls.
The semantic patch used was:
// <smpl>
@@
@@
(NF_HOOK
|NF_HOOK_THRESH
)(
-PF_BRIDGE,
+NFPROTO_BRIDGE,
...)
@@
@@
NF_HOOK(
-PF_INET6,
+NFPROTO_IPV6,
...)
@@
@@
NF_HOOK(
-PF_INET,
+NFPROTO_IPV4,
...)
// </smpl>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Supplement to 1159683ef4.
Downgrade the log level to INFO for most checkentry messages as they
are, IMO, just an extra information to the -EINVAL code that is
returned as part of a parameter "constraint violation". Leave errors
to real errors, such as being unable to create a LED trigger.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
We never actually use iph again so this assignment can be removed.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not desired for underlaying devices to change type. At the time,
there is for example possible to have bond with changed type from
Ethernet to Infiniband as a port of a bridge. This patch fixes this.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ENOMEM is a very obvious error code (cf. EINVAL), so I think we do not
really need a warning message. Not to mention that if the allocation
fails, the user is most likely going to get a stack trace from slab
already.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
The shared packet statistics are a potential source of slow down
on bridged traffic. Convert to per-cpu array, but only keep those
statistics which change per-packet.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without CONFIG_BRIDGE_IGMP_SNOOPING,
BR_INPUT_SKB_CB(skb)->mrouters_only is not appropriately
initialized, so we can see garbage.
A clear option to fix this is to set it even without that
config, but we cannot optimize out the branch.
Let's introduce a macro that returns value of mrouters_only
and let it return 0 without CONFIG_BRIDGE_IGMP_SNOOPING.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Michael Braun <michael-dev@fami-braun.de>
bridge: Fix br_forward crash in promiscuous mode
It's a linux-next kernel from 2010-03-12 on an x86 system and it
OOPs in the bridge module in br_pass_frame_up (called by
br_handle_frame_finish) because brdev cannot be dereferenced (its set to
a non-null value).
Adding some BUG_ON statements revealed that
BR_INPUT_SKB_CB(skb)->brdev == br-dev
(as set in br_handle_frame_finish first)
only holds until br_forward is called.
The next call to br_pass_frame_up then fails.
Digging deeper it seems that br_forward either frees the skb or passes
it to NF_HOOK which will in turn take care of freeing the skb. The
same is holds for br_pass_frame_ip. So it seems as if two independent
skb allocations are required. As far as I can see, commit
b33084be19 ("bridge: Avoid unnecessary
clone on forward path") removed skb duplication and so likely causes
this crash. This crash does not happen on 2.6.33.
I've therefore modified br_forward the same way br_flood has been
modified so that the skb is not freed if skb0 is going to be used
and I can confirm that the attached patch resolves the issue for me.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since all callers of br_mdb_ip_get need to check whether the
hash table is NULL, this patch moves the check into the function.
This fixes the two callers (query/leave handler) that didn't
check it.
Reported-by: Michael Braun <michael-dev@fami-braun.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
bridge: ensure to unlock in error path in br_multicast_query().
drivers/net/tulip/eeprom.c: fix bogus "(null)" in tulip init messages
sky2: Avoid rtnl_unlock without rtnl_lock
ipv6: Send netlink notification when DAD fails
drivers/net/tg3.c: change the field used with the TG3_FLAG_10_100_ONLY constant
ipconfig: Handle devices which take some time to come up.
mac80211: Fix memory leak in ieee80211_if_write()
mac80211: Fix (dynamic) power save entry
ipw2200: use kmalloc for large local variables
ath5k: read eeprom IQ calibration values correctly for G mode
ath5k: fix I/Q calibration (for real)
ath5k: fix TSF reset
ath5k: use fixed antenna for tx descriptors
libipw: split ieee->networks into small pieces
mac80211: Fix sta_mtx unlocking on insert STA failure path
rt2x00: remove KSEG1ADDR define from rt2x00soc.h
net: add ColdFire support to the smc91x driver
asix: fix setting mac address for AX88772
ipv6 ip6_tunnel: eliminate unused recursion field from ip6_tnl{}.
net: Fix dev_mc_add()
...
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thanks to Paul McKenny for pointing out that it is incorrect to use
synchronize_rcu_bh to ensure that pending callbacks have completed.
Instead we should use rcu_barrier_bh.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Paul McKenney correctly pointed out, __br_mdb_ip_get needs
to use the RCU list walking primitive in order to work correctly
on platforms where data-dependency ordering is not guaranteed.
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We dereference "port" on the lines immediately before and immediately
after the test so port should hopefully never be null here.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_multicast calls ip_send_check(), so it should depend on INET.
built-in:
br_multicast.c:(.text+0x88cf4): undefined reference to `ip_send_check'
or modular:
ERROR: "ip_send_check" [net/bridge/bridge.ko] undefined!
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following build error when IGMP_SNOOPING is not enabled.
In file included from net/bridge/br.c:24:
net/bridge/br_private.h: In function 'br_multicast_is_router':
net/bridge/br_private.h:361: error: 'struct net_bridge' has no member named 'multicast_router'
net/bridge/br_private.h:362: error: 'struct net_bridge' has no member named 'multicast_router'
net/bridge/br_private.h:363: error: 'struct net_bridge' has no member named 'multicast_router_timer'
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to the IGMP parameters related to the
snooping function of the bridge. This includes various time
values and retransmission limits.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to control the hash elasticity/max
parameters. The elasticity setting does not take effect until
the next new multicast group is added. At which point it is
checked and if after rehashing it still can't be satisfied then
snooping will be disabled.
The max setting on the other hand takes effect immediately. It
must be a power of two and cannot be set to a value less than the
current number of multicast group entries. This is the only way
to shrink the multicast hash.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to disable IGMP snooping completely
through a sysfs toggle. It also allows the user to reenable
snooping when it has been automatically disabled due to hash
collisions. If the collisions have not been resolved however
the system will refuse to reenable snooping.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows the user to forcibly enable/disable ports as
having multicast routers attached. A port with a multicast router
will receive all multicast traffic.
The value 0 disables it completely. The default is 1 which lets
the system automatically detect the presence of routers (currently
this is limited to picking up queries), and 2 means that the port
will always receive all multicast traffic.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch finally hooks up the multicast snooping module to the
data path. In particular, all multicast packets passing through
the bridge are fed into the module and switched by it.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch hooks up the bridge start/stop and add/delete/disable
port functions to the new multicast module.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds code to perform selective multicast forwarding.
We forward multicast traffic to a set of ports plus all multicast
router ports. In order to avoid duplications among these two
sets of ports, we order all ports by the numeric value of their
pointers. The two lists are then walked in lock-step to eliminate
duplicates.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the core functionality of IGMP snooping support
without actually hooking it up. So this patch should be a no-op
as far as the bridge's external behaviour is concerned.
All the new code and data is controlled by the Kconfig option
BRIDGE_IGMP_SNOOPING. A run-time toggle is also available.
The multicast switching is done using an hash table that is
lockless on the read-side through RCU. On the write-side the
new multicast_lock is used for all operations. The hash table
supports dynamic growth/rehashing.
The hash table will be rehashed if any chain length exceeds a
preset limit. If rehashing does not reduce the maximum chain
length then snooping will be disabled.
These features may be added in future (in no particular order):
* IGMPv3 source support
* Non-querier router detection
* IPv6
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves the main loop body in br_flood into the function
may_deliver. The code that clones an skb and delivers it is moved
into the deliver_clone function.
This allows this to be reused by the future multicast forward
function.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch makes BR_INPUT_SKB_CB available on the xmit path so
that we could avoid passing the br pointer around for the purpose
of collecting device statistics.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the packet is delivered to the local bridge device we may
end up cloning it unnecessarily if no bridge port can receive
the packet in br_flood.
This patch avoids this by moving the skb_clone into br_flood.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows tail-call on the call to br_pass_frame_up
in br_handle_frame_finish. This is now possible because of the
previous patch to call br_pass_frame_up last.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment we deliver to the local bridge port via the function
br_pass_frame_up before all other ports. There is no requirement
for this.
For the purpose of IGMP snooping, it would be more convenient if
we did the local port last. Therefore this patch rearranges the
bridge input processing so that the local bridge port gets to see
the packet last (if at all).
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the required handlers to convert 32 bit
ebtables mark match and match target structs to 64bit layout.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
ebt_limit structure is larger on 64 bit systems due
to "long" type used in the (kernel-only) data section.
Setting .compatsize is enough in this case, these values
have no meaning in userspace.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
ebtables can be compiled to perform userspace-side padding of
structures. In that case, all the structures are already in the
'native' format expected by the kernel.
This tries to determine what format the userspace program is
using.
For most set/getsockopts, this can be done by checking
the len argument for sizeof(compat_ebt_replace) and
re-trying the native handler on error.
In case of EBT_SO_GET_ENTRIES, the native handler is tried first,
it will error out early when checking the *len argument
(the compat version has to defer this check until after
iterating over the kernel data set once, to adjust for all
the structure size differences).
As this would cause error printks, remove those as well, as
recommended by Bart de Schuymer.
Signed-off-by: Florian Westphal <fw@strlen.de>
Main code for 32 bit userland ebtables binary with 64 bit kernels
support.
Tested on x86_64 kernel only, using 64bit ebtables binary
for output comparision.
At least ebt_mark, m_mark and ebt_limit need CONFIG_COMPAT hooks, too.
remaining problem:
The ebtables userland makefile has:
ifeq ($(shell uname -m),sparc64)
CFLAGS+=-DEBT_MIN_ALIGN=8 -DKERNEL_64_USERSPACE_32
endif
struct ebt_replace, ebt_entry_match etc. then contain userland-side
padding, i.e. even if we are called from a 32 bit userland, the
structures may already be in the right format.
This problem is addressed in a follow-up patch.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
allows to call do_update_counters() from upcoming CONFIG_COMPAT
code instead of copy&pasting the same code.
Signed-off-by: Florian Westphal <fw@strlen.de>
once CONFIG_COMPAT support is added to ebtables, the new
copy_counters_to_user function can be called instead of duplicating
code.
Also remove last use of MEMPRINT, as requested by Bart De Schuymer.
Signed-off-by: Florian Westphal <fw@strlen.de>
once CONFIG_COMPAT support is merged this allows
to call do_replace_finish() after doing the CONFIG_COMPAT conversion
instead of copy & pasting this.
Signed-off-by: Florian Westphal <fw@strlen.de>
This will cause trouble once CONFIG_COMPAT support is added to ebtables.
xt_compat_*_offset() calculate the kernel/userland structure size delta
using:
XT_ALIGN(size) - COMPAT_XT_ALIGN(size)
If the match/target sizes are aligned at registration time,
delta is always zero.
Should have zero effect for existing systems: xtables uses
XT_ALIGN() whenever it deals with match/target sizes.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
next_offset must be > 0, otherwise this loops forever.
The offset also contains the size of the ebt_entry structure
itself, so anything smaller is invalid.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch removes the unused age_list member from the net_bridge
structure.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ->net to match destructor list like ->net in constructor list.
Make sure it's set in ebtables/iptables/ip6tables, this requires to
propagate netns up to *_unregister_table().
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Some complex match modules (like xt_hashlimit/xt_recent) want netns
information at constructor and destructor time. We propably can play
games at match destruction time, because netns can be passed in object,
but I think it's cleaner to explicitly pass netns.
Add ->net, make sure it's set from ebtables/iptables/ip6tables code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
__net_init/__net_exit are apparently not going away, so use them
to full extent.
In some cases __net_init was removed, because it was called from
__net_exit code.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
normal users are currently allowed to set/modify ebtables rules.
Restrict it to processes with CAP_NET_ADMIN.
Note that this cannot be reproduced with unmodified ebtables binary
because it uses SOCK_RAW.
Signed-off-by: Florian Westphal <fwestphal@astaro.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
mac80211: fix reorder buffer release
iwmc3200wifi: Enable wimax core through module parameter
iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
iwmc3200wifi: Coex table command does not expect a response
iwmc3200wifi: Update wiwi priority table
iwlwifi: driver version track kernel version
iwlwifi: indicate uCode type when fail dump error/event log
iwl3945: remove duplicated event logging code
b43: fix two warnings
ipw2100: fix rebooting hang with driver loaded
cfg80211: indent regulatory messages with spaces
iwmc3200wifi: fix NULL pointer dereference in pmkid update
mac80211: Fix TX status reporting for injected data frames
ath9k: enable 2GHz band only if the device supports it
airo: Fix integer overflow warning
rt2x00: Fix padding bug on L2PAD devices.
WE: Fix set events not propagated
b43legacy: avoid PPC fault during resume
b43: avoid PPC fault during resume
tcp: fix a timewait refcnt race
...
Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
kernel/sysctl_check.c
net/ipv4/sysctl_net_ipv4.c
net/ipv6/addrconf.c
net/sctp/sysctl.c
Not including net/atm/
Compiled tested x86 allyesconfig only
Added a > 80 column line or two, which I ignored.
Existing checkpatch plaints willfully, cheerfully ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A number of people have tried to add a wireless interface
(in managed mode) to a bridge and then complained that it
doesn't work. It cannot work, however, because in 802.11
networks all packets need to be acknowledged and as such
need to be sent to the right address. Promiscuous doesn't
help here. The wireless address format used for these
links has only space for three addresses, the
* transmitter, which must be equal to the sender (origin)
* receiver (on the wireless medium), which is the AP in
the case of managed mode
* the recipient (destination), which is on the APs local
network segment
In an IBSS, it is similar, but the receiver and recipient
must match and the third address is used as the BSSID.
To avoid such mistakes in the future, disallow adding a
wireless interface to a bridge.
Felix has recently added a four-address mode to the AP
and client side that can be used (after negotiating that
it is possible, which must happen out-of-band by setting
up both sides) for bridging, so allow that case.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Resolve the conflict between v2.6.32-rc7 where dn_def_dev_handler
gets a small bug fix and the sysctl tree where I am removing all
sysctl strategy routines.
Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.
In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.
Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Conflicts:
drivers/net/usb/cdc_ether.c
All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.
Signed-off-by: David S. Miller <davem@davemloft.net>
add_del_if() is called with RTNL, we can use __dev_get_by_index()
instead of [dev_get_by_index() + dev_put()]
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bridge code assumes ethernet addressing, so be more strict in
the what is allowed. This showed up when GRE had a bug and was not
using correct address format.
Add some more comments for increased clarity.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Speedup module unloading by factorizing synchronize_rcu() calls
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow enable/disable UFO on bridge device via ethtool
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a potential double-kfree in net/bridge/br_if.c. If br_fdb_insert
fails, then the kobject is put back (which calls kfree due to the kobject
release), and then kfree is called again on the net_bridge_port. This
patch fixes the crash.
Thanks to Stephen Hemminger for the one-line fix.
Signed-off-by: Jeff Hansen <x@jeffhansen.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's unused.
It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.
It _was_ used in two places at arch/frv for some reason.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The Ethernet framing is used for a lot of devices these days. Most
prominent are WiFi and WiMAX based devices. However for userspace
application it is important to classify these devices correctly and
not only see them as Ethernet devices. The daemons like HAL, DeviceKit
or even NetworkManager with udev support tries to do the classification
in userspace with a lot trickery and extra system calls. This is not
good and actually reaches its limitations. Especially since the kernel
does know the type of the Ethernet device it is pretty stupid.
To solve this problem the underlying device type needs to be set and
then the value will be exported as DEVTYPE via uevents and available
within udev.
# cat /sys/class/net/wlan0/uevent
DEVTYPE=wlan
INTERFACE=wlan0
IFINDEX=5
This is similar to subsystems like USB and SCSI that distinguish
between hosts, devices, disks, partitions etc.
The new SET_NETDEV_DEVTYPE() is a convenience helper to set the actual
device type. All device types are free form, but for convenience the
same strings as used with RFKILL are choosen.
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 19eda87 (netfilter: change return types of check functions for
Ebtables extensions) broke the ebtables ulog module by missing a return
value conversion.
Signed-off-by: Patrick McHardy <kaber@trash.net>
commit f216f082b2
([NETFILTER]: bridge netfilter: deal with martians correctly)
added a refcount leak on in_dev.
Instead of using in_dev_get(), we can use __in_dev_get_rcu(),
as netfilter hooks are running under rcu_read_lock(), as pointed
by Patrick.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The inputted table is never modified, so should be considered const.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch adds a 'hairpin' (also called 'reflective relay') mode
port configuration to the Linux Ethernet bridge kernel module.
A bridge supporting hairpin forwarding mode can send frames back
out through the port the frame was received on.
Hairpin mode is required to support basic VEPA (Virtual
Ethernet Port Aggregator) capabilities.
You can find additional information on VEPA here:
http://tech.groups.yahoo.com/group/evb/http://www.ieee802.org/1/files/public/docs2009/new-hudson-vepa_seminar-20090514d.pdfhttp://www.internet2.edu/presentations/jt2009jul/20090719-congdon.pdf
An additional patch 'bridge-utils: Add 'hairpin' port forwarding mode'
is provided to allow configuring hairpin mode from userspace tools.
Signed-off-by: Paul Congdon <paul.congdon@hp.com>
Signed-off-by: Anna Fischer <anna.fischer@hp.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
ebt_log uses its own implementation of print_mac to print MAC addresses.
This patch converts it to use the %pM conversion specifier for printk.
Signed-off-by: Tobias Klauser <klto@zhaw.ch>
Signed-off-by: Patrick McHardy <kaber@trash.net>
String literals are constant, and usually, we can also tag the array
of pointers const too, moving it to the .rodata section.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
kobject_init_and_add will alloc memory for kobj->name, so in br_add_if
error path, simply use kobject_del will not free memory for kobj->name.
Fix by using kobject_put instead, kobject_put will internally calls
kobject_del and frees memory for kobj->name.
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No functional change -- just for easier reading.
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is the result of an automatic spatch transformation to convert
all ndo_start_xmit() return values of 0 to NETDEV_TX_OK.
Some occurences are missed by the automatic conversion, those will be
handled in a seperate patch.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
When unloading modules that uses call_rcu() callbacks, then we must
use rcu_barrier(). This module uses syncronize_net() which is not
enough to be sure that all callback has been completed.
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes FDB entry check for ATM LANE bridge integration.
There's no point in holding a FDB entry around SKB building.
br_fdb_get()/br_fdb_put() pair are changed into single br_fdb_test_addr()
hook that checks if the addr has FDB entry pointing to other port
to the one the request arrived on.
FDB entry refcounting is removed as it's not used anywhere else.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define three accessors to get/set dst attached to a skb
struct dst_entry *skb_dst(const struct sk_buff *skb)
void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
void skb_dst_drop(struct sk_buff *skb)
This one should replace occurrences of :
dst_release(skb->dst)
skb->dst = NULL;
Delete skb->dst field
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb
Delete skb->rtable field
Setting rtable is not allowed, just set dst instead as rtable is an alias.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Holding rtnl_lock when we are unregistering the sysfs files can
deadlock if we unconditionally take rtnl_lock in a sysfs file. So fix
it with the now familiar patter of: rtnl_trylock and syscall_restart()
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If bridge is configured with no STP and forwarding delay of 0 (which
is typical for virtualization) then when link starts it will flood all
packets for the first 20 seconds.
This bug was introduced by a combination of earlier changes:
* forwarding database uses hold time of zero to indicate
user wants to always flood packets
* optimzation of the case of forwarding delay of 0 avoids the initial
timer tick
The fix is to just skip all the topology change detection code if
kernel STP is not being used.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the bridge catches all STP packets; even if STP is turned
off. This prevents other systems (which do have STP turned on)
from being able to detect loops in the network.
With this patch, if STP is off, then any packet sent to the STP
multicast group address is forwarded to all ports.
Based on earlier patch by Joakim Tjernlund with changes
to go through forwarding (not local chain), and optimization
that only last octet needs to be checked.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
br_nf_dev_queue_xmit only checks for ETH_P_IP packets for fragmenting but not
VLAN packets. This results in dropping of large VLAN packets. This can be
observed when connection tracking is enabled. Connection tracking re-assembles
fragmented packets, and these have to re-fragmented when transmitting out. Also,
make sure only refragmented packets are defragmented as per suggestion from
Patrick McHardy.
Signed-off-by: Saikiran Madugula <hummerbliss@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch renames the ebt_ulog nf_logger from "ulog" to "ebt_ulog" to
be in sync with other modules naming. As this name was currently only
used for informational purpose, the renaming should be harmless.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ebt_ulog module does not follow the fixed convention about function
return. Loading the module is triggering the following message:
sys_init_module: 'ebt_ulog'->init suspiciously returned 1, it should follow 0/-E convention
sys_init_module: loading module anyway...
Pid: 2334, comm: modprobe Not tainted 2.6.29-rc5edenwall0-00883-g199e57b #146
Call Trace:
[<c0441b81>] ? printk+0xf/0x16
[<c02311af>] sys_init_module+0x107/0x186
[<c0202cfa>] syscall_call+0x7/0xb
The following patch fixes the return treatment in ebt_ulog_init()
function.
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the declaration of the logger structure in ebt_log
and ebt_ulog: I forgot to remove the const option from their declaration
in the commit ca735b3aaa ("netfilter:
use a linked list of loggers").
Pointed-out-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes an crash when empty bond device is added to a bridge.
If an interface with invalid ethernet address (all zero) is added
to a bridge, then bridge code detects it when setting up the forward
databas entry. But the error unwind is broken, the bridge port object
can get freed twice: once when ref count went to zeo, and once by kfree.
Since object is never really accessible, just free it.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch changes the return value of nlmsg_notify() as follows:
If NETLINK_BROADCAST_ERROR is set by any of the listeners and
an error in the delivery happened, return the broadcast error;
else if there are no listeners apart from the socket that
requested a change with the echo flag, return the result of the
unicast notification. Thus, with this patch, the unicast
notification is handled in the same way of a broadcast listener
that has set the NETLINK_BROADCAST_ERROR socket flag.
This patch is useful in case that the caller of nlmsg_notify()
wants to know the result of the delivery of a netlink notification
(including the broadcast delivery) and take any action in case
that the delivery failed. For example, ctnetlink can drop packets
if the event delivery failed to provide reliable logging and
state-synchronization at the cost of dropping packets.
This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which makes rtnl_set_sk_err() reports errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The initialization of the lock element is not needed
since the lock is always initialized in ebt_register_table.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Base versions handle constant folding now.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8cc784ee (netfilter: change return types of match functions
for ebtables extensions) broke ebtables matches by inverting the
sense of match/nomatch.
Reported-by: Matt Cross <matthltc@us.ibm.com>
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPPOE/VLAN processing code in the bridge netfilter is broken
by design. The VLAN tag and the PPPOE session ID are an integral
part of the packet flow information, yet they're completely
ignored by the bridge netfilter. This is potentially a security
hole as it treats all VLANs and PPPOE sessions as the same.
What's more, it's actually broken for PPPOE as the bridge netfilter
tries to trim the packets to the IP length without adjusting the
PPPOE header (and adjusting the PPPOE header isn't much better
since the PPPOE peer may require the padding to be present).
Therefore we should disable this by default.
It does mean that people relying on this feature may lose networking
depending on how their bridge netfilter rules are configured.
However, IMHO the problems this code causes are serious enough to
warrant this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the bridge FORWARD/POST_ROUTING chains treats all
non-IPv4 packets as IPv6. This packet fixes that by returning
NF_ACCEPT on non-IP packets instead, just as is done in PRE_ROUTING.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
In each case, if the NULL test is necessary, then the dereference should be
moved below the NULL test.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@
type T;
expression E;
identifier i,fld;
statement S;
@@
- T i = E->fld;
+ T i;
... when != E
when != i
if (E == NULL) S
+ i = E->fld;
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
As GRE tries to call the update_pmtu function on skb->dst and
bridge supplies an skb->dst that has a NULL ops field, all is
not well.
This patch fixes this by giving the bridge device an ops field
with an update_pmtu function. For the moment I've left all
other fields blank but we can fill them in later should the
need arise.
Based on report and patch by Philip Craig.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert to net_device_ops function table.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We have some reasons to kill netdev->priv:
1. netdev->priv is equal to netdev_priv().
2. netdev_priv() wraps the calculation of netdev->priv's offset, obviously
netdev_priv() is more flexible than netdev->priv.
But we cann't kill netdev->priv, because so many drivers reference to it
directly.
This patch is a safe convert for netdev->priv to netdev_priv(netdev).
Since all of the netdev->priv is only for read.
But it is too big to be sent in one mail.
I split it to 4 parts and make every part smaller than 100,000 bytes,
which is max size allowed by vger.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that ebt_unregister_table() can be called during netns stop, and module
pinning scheme can't prevent netns stop, do table cleanup by hand.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* return ebt_table from ebt_register_table(), module code will save it into
per-netns data for unregistration
* duplicate ebt_table at the very beginning of registration -- it's added into
list, so one ebt_table wouldn't end up in many lists (and each netns has
different one)
* introduce underscored tables in individial modules, this is temporary to not
break bisection.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
* propagate netns from userspace, register table in passed netns
* remporarily register every ebt_table in init_net
P. S.: one needs to add ".netns_ok = 1" to igmp_protocol to test with
ebtables(8) in netns.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
I want to compile out proc_* and sysctl_* handlers totally and
stub them to NULL depending on config options, however usage of &
will prevent this, since taking adress of NULL pointer will break
compilation.
So, drop & in front of every ->proc_handler and every ->strategy
handler, it was never needed in fact.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Open code NIP6_FMT in the one call inside sscanf and one user
of NIP6() that could use %p6 in the netfilter code.
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
My change
commit e2a6b85247
net: Enable TSO if supported by at least one device
didn't do what was intended because the netdev_compute_features
function was designed for conjunctions. So what happened was that
it would simply take the TSO status of the last constituent device.
This patch extends it to support both conjunctions and disjunctions
under the new name of netdev_increment_features.
It also adds a new function netdev_fix_features which does the
sanity checking that usually occurs upon registration. This ensures
that the computation doesn't result in an illegal combination
since this checking is absent when the change is initiated via
ethtool.
The two users of netdev_compute_features have been converted.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
(Supplements: ee999d8b95)
NFPROTO_ARP actually has a different value from NF_ARP, so ensure all
callers use the new value so that packets _do_ get delivered to the
registered hooks.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some code here depends on CONFIG_KMOD to not try to load
protocol modules or similar, replace by CONFIG_MODULES
where more than just request_module depends on CONFIG_KMOD
and and also use try_then_request_module in ebtables.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
By passing in the family through which extensions were invoked, a bit
of data space can be reclaimed. The "family" member will be added to
the parameter structures and the check functions be adjusted.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for target extensions' destroy functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for target extensions' checkentry functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for target extensions' target functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for match extensions' destroy functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
This patch does this for match extensions' checkentry functions.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The function signatures for Xtables extensions have grown over time.
It involves a lot of typing/replication, and also a bit of stack space
even if they are not used. Realize an NFWS2008 idea and pack them into
structs. The skb remains outside of the struct so gcc can continue to
apply its optimizations.
This patch does this for match extensions' match functions.
A few ambiguities have also been addressed. The "offset" parameter for
example has been renamed to "fragoff" (there are so many different
offsets already) and "protoff" to "thoff" (there is more than just one
protocol here, so clarify).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
It used to be that {ip,ip6,etc}_tables called extension->checkentry
themselves, but this can be moved into the xtables core.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Usually -EINVAL is used when checkentry fails (see *_tables).
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Ebtables ORs (1 << NF_BR_NUMHOOKS) into the hook mask to indicate that
the extension was called from a base chain. So this also needs to be
present in the extensions' ->hooks.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
The function signatures will be changed to match those of Xtables, and
the datalen argument will be gone. ebt_among unfortunately relies on
it, so we need to obtain it somehow.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
and (try to) consistently use u_int8_t for the L3 family.
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Hemminger <shemming@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bridge as netdevice doesn't cross netns boundaries.
Bridge ports and bridge itself live in same netns.
Notifiers are fixed.
netns propagated from userspace socket for setup and teardown.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Stephen Hemminger <shemming@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dushan Tcholich reports that on his system ksoftirqd can consume
between %6 to %10 of cpu time, and cause ~200 context switches per
second.
He then correlated this with a report by bdupree@techfinesse.com:
http://marc.info/?l=linux-kernel&m=119613299024398&w=2
and the culprit cause seems to be starting the bridge interface.
In particular, when starting the bridge interface, his scripts
are specifying a hello timer interval of "0".
The bridge hello time can't be safely set to values less than 1
second, otherwise it is possible to end up with a runaway timer.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add more ethtool generic operations to dump the bridge offload
settings.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Stephen Hemminger <shemminger@vyatta.com>
Based upon original patch by Herbert Xu, which contained
the following problem description:
--------------------
When the forward delay is set to zero, we still delay the setting
of the forwarding state by one or possibly two timers depending
on whether STP is enabled. This could either turn out to be
instantaneous, or horribly slow depending on the load of the
machine.
As there is nothing preventing us from enabling forwarding straight
away, this patch eliminates this potential delay by executing the
code directly if the forward delay is zero.
The effect of this problem is that immediately after the carrier
comes on a port, the bridge will drop all packets received from
that port until it enters forwarding mode, thus causing unnecessary
packet loss.
Note that this patch doesn't fully remove the delay due to the
link watcher. We should also check the carrier state when we
are about to drop an incoming packet because the port is disabled.
But that's for another patch.
--------------------
This version of the fix takes a different approach, in that
it just does the state change directly.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the following warning due to incompatible pointer
assignment:
net/bridge/br_netfilter.c: In function 'br_netfilter_rtable_init':
net/bridge/br_netfilter.c:116: warning: assignment from incompatible
pointer type
This warning is due to commit 4adf0af681
from July 30 (send correct MTU value in PMTU (revised)).
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When bridging interfaces with different MTUs, the bridge correctly chooses
the minimum of the MTUs of the physical devices as the bridges MTU. But
when a frame is passed which fits through the incoming, but not through
the outgoing interface, a "Fragmentation Needed" packet is generated.
However, the propagated MTU is hardcoded to 1500, which is wrong in this
situation. The sender will repeat the packet again with the same frame
size, and the same problem will occur again.
Instead of sending 1500, the (correct) MTU value of the bridge is now sent
via PMTU. To achieve this, the corresponding rtable structure is stored
in its net_bridge structure.
Modified to get rid of fake_net_device as well.
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without CONFIG_NET_NS, namespace is always &init_net.
Compiler will be able to omit namespace comparisons with this patch.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_set_promiscuity/allmulti might overflow.
Commit: "netdevice: Fix promiscuity and allmulti overflow" in net-next makes
dev_set_promiscuity/allmulti return error number if overflow happened.
Here, we check the positive increment for promiscuity to get error return.
Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The help text should refer to nflog instead of ulog. Noticed by
Krzysztof Halasa <khc@pm.waw.pl>.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the STP demux layer for receiving STP PDUs instead of directly
registering with LLC.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unregistering a bridge device may cause virtual devices stacked on the
bridge, like vlan or macvlan devices, to be unregistered as well.
br_cleanup_bridges() uses for_each_netdev_safe() to iterate over all
devices during cleanup. This is not enough however, if one of the
additionally unregistered devices is next in the list to the bridge
device, it will get freed as well and the iteration continues on
the freed element.
Restart iteration after each bridge device removal from the beginning to
fix this, similar to what rtnl_link_unregister() does.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add skb_warn_if_lro() to test whether an skb was received with LRO and
warn if so.
Change br_forward(), ip_forward() and ip6_forward() to call it) and
discard the skb if it returns true.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded. It can also confuse the GSO on output.
Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.
Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix bridge netfilter code so that it uses CONFIG_IPV6 as needed:
net/built-in.o: In function `ebt_filter_ip6':
ebt_ip6.c:(.text+0x87c37): undefined reference to `ipv6_skip_exthdr'
net/built-in.o: In function `ebt_log_packet':
ebt_log.c:(.text+0x88dee): undefined reference to `ipv6_skip_exthdr'
make[1]: *** [.tmp_vmlinux1] Error 1
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally, the bridge just chooses the smallest mac address as the
bridge id and mac address of bridge device. But if the administrator
has explictly set the interface address then don't change it.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Any frame addressed to link-local addresses should be processed by local
receive path. The earlier code would process them only if STP was enabled.
Since there are other frames like LACP for bonding, we should always
process them.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes CVS keywords that weren't updated for a long time
from comments.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It implements matching functions for IPv6 address & traffic class
(merged from the patch sent by Jan Engelhardt [jengelh@computergmbh.de]
http://marc.info/?l=netfilter-devel&m=120182168424052&w=2), protocol,
and layer-4 port id. Corresponding watcher logging function is also
added for IPv6.
Signed-off-by: Kuo-lang Tseng <kuo-lang.tseng@intel.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Even though bridges require 6 fields from struct net_device_stats,
the on-device stats are always there, so we may just use them.
The br_dev_get_stats is no longer required after this.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move rcu-protected lists from list.h into a new header file rculist.h.
This is done because list are a very used primitive structure all over the
kernel and it's currently impossible to include other header files in this
list.h without creating some circular dependencies.
For example, list.h implements rcu-protected list and uses rcu_dereference()
without including rcupdate.h. It actually compiles because users of
rcu_dereference() are macros. Others RCU functions could be used too but
aren't probably because of this.
Therefore this patch creates rculist.h which includes rcupdates without to
many changes/troubles.
Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Josh Triplett <josh@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This actually had to be merged with the patch #1, but I decided not to
mix two changes in one patch.
There are already two calls to free_netdev() in there, so merge them
into one.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case the register_netdevice() call fails the device is leaked,
since the out: label is just rtnl_unlock()+return.
Free the device.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The forwarding table binary interface (my bad choice), only exposes
the port number of the first 8 bits. The bridge code was limited to
256 ports at the time, but now the kernel supports up 1024 ports, so
the upper bits are lost when doing:
brctl showmacs
The fix is to squeeze the extra bits into small hole left in data
structure, to maintain binary compatiablity.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>