OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nikolay Aleksandrov	72f4af4e47	net: bridge: export also pvid flag in the xstats flags When I added support to export the vlan entry flags via xstats I forgot to add support for the pvid since it is manually matched, so check if the entry matches the vlan_group's pvid and set the flag appropriately. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-08-26 11:45:28 -07:00
Sabrina Dubroca	4249fc1f02	netfilter: ebtables: put module reference when an incorrect extension is found commit `bcf4934288` ("netfilter: ebtables: Fix extension lookup with identical name") added a second lookup in case the extension that was found during the first lookup matched another extension with the same name, but didn't release the reference on the incorrect module. Fixes: `bcf4934288` ("netfilter: ebtables: Fix extension lookup with identical name") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Phil Sutter <phil@nwl.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-08-25 13:18:06 +02:00
Liping Zhang	960fa72f67	netfilter: nft_meta: improve the validity check of pkttype set expr "meta pkttype set" is only supported on prerouting chain with bridge family and ingress chain with netdev family. But the validate check is incomplete, and the user can add the nft rules on input chain with bridge family, for example: # nft add table bridge filter # nft add chain bridge filter input {type filter hook input \ priority 0 \;} # nft add chain bridge filter test # nft add rule bridge filter test meta pkttype set unicast # nft add rule bridge filter input jump test This patch fixes the problem. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-08-25 13:12:03 +02:00
Nikolay Aleksandrov	61ba1a2da9	net: bridge: export vlan flags with the stats Use one of the vlan xstats padding fields to export the vlan flags. This is needed in order to be able to distinguish between master (bridge) and port vlan entries in user-space when dumping the bridge vlan stats. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-08-18 23:18:42 -07:00
Nikolay Aleksandrov	d5ff8c41b5	net: bridge: consolidate bridge and port linkxstats calls In the bridge driver we usually have the same function working for both port and bridge. In order to follow that logic and also avoid code duplication, consolidate the bridge_ and brport_ linkxstats calls into one since they share most of their code. As a side effect this allows us to dump the vlan stats also via the slave call which is in preparation for the upcoming per-port vlan stats and vlan flag dumping. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-08-18 23:18:42 -07:00
Toshiaki Makita	7bb90c3715	bridge: Fix problems around fdb entries pointing to the bridge device Adding fdb entries pointing to the bridge device uses fdb_insert(), which lacks various checks and does not respect added_by_user flag. As a result, some inconsistent behavior can happen: * Adding temporary entries succeeds but results in permanent entries. * Same goes for "dynamic" and "use". * Changing mac address of the bridge device causes deletion of user-added entries. * Replacing existing entries looks successful from userspace but actually not, regardless of NLM_F_EXCL flag. Use the same logic as other entries and fix them. Fixes: `3741873b4f` ("bridge: allow adding of fdb entries pointing to the bridge device") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-08-09 21:42:44 -07:00
Ido Schimmel	baedbe5588	bridge: Fix incorrect re-injection of LLDP packets Commit `8626c56c82` ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") caused LLDP packets arriving through a bridge port to be re-injected to the Rx path with skb->dev set to the bridge device, but this breaks the lldpad daemon. The lldpad daemon opens a packet socket with protocol set to ETH_P_LLDP for any valid device on the system, which doesn't not include soft devices such as bridge and VLAN. Since packet sockets (ptype_base) are processed in the Rx path after the Rx handler, LLDP packets with skb->dev set to the bridge device never reach the lldpad daemon. Fix this by making the bridge's Rx handler re-inject LLDP packets with RX_HANDLER_PASS, which effectively restores the behaviour prior to the mentioned commit. This means netfilter will never receive LLDP packets coming through a bridge port, as I don't see a way in which we can have okfn() consume the packet without breaking existing behaviour. I've already carried out a similar fix for STP packets in commit `56fae404fb` ("bridge: Fix incorrect re-injection of STP packets"). Fixes: `8626c56c82` ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Cc: Florian Westphal <fw@strlen.de> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-25 10:53:34 -07:00
Vivien Didelot	9e0b27fe5a	net: bridge: br_set_ageing_time takes a clock_t Change the ageing_time type in br_set_ageing_time() from u32 to what it is expected to be, i.e. a clock_t. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-25 10:30:03 -07:00
Vivien Didelot	dba479f3d6	net: bridge: fix br_stp_enable_bridge comment br_stp_enable_bridge() does take the br->lock spinlock. Fix its wrongly pasted comment and use the same as br_stp_disable_bridge(). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-25 10:30:03 -07:00
Nikolay Aleksandrov	37b090e6be	net: bridge: remove _deliver functions and consolidate forward code Before this patch we had two flavors of most forwarding functions - _forward and _deliver, the difference being that the latter are used when the packets are locally originated. Instead of all this function pointer passing and code duplication, we can just pass a boolean noting that the packet was locally originated and use that to perform the necessary checks in __br_forward. This gives a minor performance improvement but more importantly consolidates the forwarding paths. Also add a kernel doc comment to explain the exported br_forward()'s arguments. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-16 19:57:38 -07:00
Nikolay Aleksandrov	b35c5f632b	net: bridge: drop skb2/skb0 variables and use a local_rcv boolean Currently if the packet is going to be received locally we set skb0 or sometimes called skb2 variables to the original skb. This can get confusing and also we can avoid one conditional on the fast path by simply using a boolean and passing it around. Thanks to Roopa for the name suggestion. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-16 19:57:38 -07:00
Nikolay Aleksandrov	e151aab9b5	net: bridge: rearrange flood vs unicast receive paths This patch removes one conditional from the unicast path by using the fact that skb is NULL only when the packet is multicast or is local. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-16 19:57:37 -07:00
Nikolay Aleksandrov	46c0772d85	net: bridge: minor style adjustments in br_handle_frame_finish Trivial style changes in br_handle_frame_finish. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-16 19:57:37 -07:00
Nikolay Aleksandrov	a65056ecf4	net: bridge: extend MLD/IGMP query stats As was suggested this patch adds support for the different versions of MLD and IGMP query types. Since the user visible structure is still in net-next we can augment it instead of adding netlink attributes. The distinction between the different IGMP/MLD query types is done as suggested in Section 7.1, RFC 3376 [1] and Section 8.1, RFC 3810 [2] based on query payload size and code for IGMP. Since all IGMP packets go through multicast_rcv() and it uses ip_mc_check_igmp/ipv6_mc_check_mld we can be sure that at least the ip/ipv6 header can be directly used. [1] https://tools.ietf.org/html/rfc3376#section-7 [2] https://tools.ietf.org/html/rfc3810#section-8.1 Suggested-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-09 17:40:09 -04:00
David S. Miller	30d0844bdc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en.h drivers/net/ethernet/mellanox/mlx5/core/en_main.c drivers/net/usb/r8152.c All three conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-06 10:35:22 -07:00
David S. Miller	ae3e4562e2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Don't use userspace datatypes in bridge netfilter code, from Tobin Harding. 2) Iterate only once over the expectation table when removing the helper module, instead of once per-netns, from Florian Westphal. 3) Extra sanitization in xt_hook_ops_alloc() to return error in case we ever pass zero hooks, xt_hook_ops_alloc(): 4) Handle NFPROTO_INET from the logging core infrastructure, from Liping Zhang. 5) Autoload loggers when TRACE target is used from rules, this doesn't change the behaviour in case the user already selected nfnetlink_log as preferred way to print tracing logs, also from Liping Zhang. 6) Conntrack slabs with SLAB_HWCACHE_ALIGN to allow rearranging fields by cache lines, increases the size of entries in 11% per entry. From Florian Westphal. 7) Skip zone comparison if CONFIG_NF_CONNTRACK_ZONES=n, from Florian. 8) Remove useless defensive check in nf_logger_find_get() from Shivani Bhardwaj. 9) Remove zone extension as place it in the conntrack object, this is always include in the hashing and we expect more intensive use of zones since containers are in place. Also from Florian Westphal. 10) Owner match now works from any namespace, from Eric Bierdeman. 11) Make sure we only reply with TCP reset to TCP traffic from nf_reject_ipv4, patch from Liping Zhang. 12) Introduce --nflog-size to indicate amount of network packet bytes that are copied to userspace via log message, from Vishwanath Pai. This obsoletes --nflog-range that has never worked, it was designed to achieve this but it has never worked. 13) Introduce generic macros for nf_tables object generation masks. 14) Use generation mask in table, chain and set objects in nf_tables. This allows fixes interferences with ongoing preparation phase of the commit protocol and object listings going on at the same time. This update is introduced in three patches, one per object. 15) Check if the object is active in the next generation for element deactivation in the rbtree implementation, given that deactivation happens from the commit phase path we have to observe the future status of the object. 16) Support for deletion of just added elements in the hash set type. 17) Allow to resize hashtable from /proc entry, not only from the obscure /sys entry that maps to the module parameter, from Florian Westphal. 18) Get rid of NFT_BASECHAIN_DISABLED, this code is not exercised anymore since we tear down the ruleset whenever the netdevice goes away. 19) Support for matching inverted set lookups, from Arturo Borrero. 20) Simplify the iptables_mangle_hook() by removing a superfluous extra branch. 21) Introduce ether_addr_equal_masked() and use it from the netfilter codebase, from Joe Perches. 22) Remove references to "Use netfilter MARK value as routing key" from the Netfilter Kconfig description given that this toggle doesn't exists already for 10 years, from Moritz Sichert. 23) Introduce generic NF_INVF() and use it from the xtables codebase, from Joe Perches. 24) Setting logger to NONE via /proc was not working unless explicit nul-termination was included in the string. This fixes seems to leave the former behaviour there, so we don't break backward. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-06 09:15:15 -07:00
Jiri Pirko	18bfb924f0	net: introduce default neigh_construct/destroy ndo calls for L2 upper devices L2 upper device needs to propagate neigh_construct/destroy calls down to lower devices. Do this by defining default ndo functions and use them in team, bond, bridge and vlan. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-07-05 09:06:28 -07:00
Joe Perches	c37a2dfa67	netfilter: Convert FWINV<[foo]> macros and uses to NF_INVF netfilter uses multiple FWINV #defines with identical form that hide a specific structure variable and dereference it with a invflags member. $ git grep "#define FWINV" include/linux/netfilter_bridge/ebtables.h:#define FWINV(bool,invflg) ((bool) ^ !!(info->invflags & invflg)) net/bridge/netfilter/ebtables.c:#define FWINV2(bool, invflg) ((bool) ^ !!(e->invflags & invflg)) net/ipv4/netfilter/arp_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(arpinfo->invflags & (invflg))) net/ipv4/netfilter/ip_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ipinfo->invflags & (invflg))) net/ipv6/netfilter/ip6_tables.c:#define FWINV(bool, invflg) ((bool) ^ !!(ip6info->invflags & (invflg))) net/netfilter/xt_tcpudp.c:#define FWINVTCP(bool, invflg) ((bool) ^ !!(tcpinfo->invflags & (invflg))) Consolidate these macros into a single NF_INVF macro. Miscellanea: o Neaten the alignment around these uses o A few lines are > 80 columns for intelligibility Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-03 10:55:07 +02:00
Joe Perches	4ae89ad924	etherdevice.h & bridge: netfilter: Add and use ether_addr_equal_masked There are code duplications of a masked ethernet address comparison here so make it a separate function instead. Miscellanea: o Neaten alignment of FWINV macro uses to make it clearer for the reader Signed-off-by: Joe Perches <joe@perches.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-07-01 16:37:06 +02:00
Shmulik Ladkani	fedbb6b4ff	ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_output ip_skb_dst_mtu uses skb->sk, assuming it is an AF_INET socket (e.g. it calls ip_sk_use_pmtu which casts sk as an inet_sk). However, in the case of UDP tunneling, the skb->sk is not necessarily an inet socket (could be AF_PACKET socket, or AF_UNSPEC if arriving from tun/tap). OTOH, the sk passed as an argument throughout IP stack's output path is the one which is of PMTU interest: - In case of local sockets, sk is same as skb->sk; - In case of a udp tunnel, sk is the tunneling socket. Fix, by passing ip_finish_output's sk to ip_skb_dst_mtu. This augments `7026b1ddb6` 'netfilter: Pass socket pointer down through okfn().' Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-30 09:02:48 -04:00
Nikolay Aleksandrov	1080ab95e3	net: bridge: add support for IGMP/MLD stats and export them via netlink This patch adds stats support for the currently used IGMP/MLD types by the bridge. The stats are per-port (plus one stat per-bridge) and per-direction (RX/TX). The stats are exported via netlink via the new linkxstats API (RTM_GETSTATS). In order to minimize the performance impact, a new option is used to enable/disable the stats - multicast_stats_enabled, similar to the recent vlan stats. Also in order to avoid multiple IGMP/MLD type lookups and checks, we make use of the current "igmp" member of the bridge private skb->cb region to record the type on Rx (both host-generated and external packets pass by multicast_rcv()). We can do that since the igmp member was used as a boolean and all the valid IGMP/MLD types are positive values. The normal bridge fast-path is not affected at all, the only affected paths are the flooding ones and since we make use of the IGMP/MLD type, we can quickly determine if the packet should be counted using cache-hot data (cb's igmp member). We add counters for: * IGMP Queries * IGMP Leaves * IGMP v1/v2/v3 reports * MLD Queries * MLD Leaves * MLD v1/v2 reports These are invaluable when monitoring or debugging complex multicast setups with bridges. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-30 06:18:24 -04:00
Nikolay Aleksandrov	80e73cc563	net: rtnetlink: add support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute This patch adds support for the IFLA_STATS_LINK_XSTATS_SLAVE attribute which allows to export per-slave statistics if the master device supports the linkxstats callback. The attribute is passed down to the linkxstats callback and it is up to the callback user to use it (an example has been added to the only current user - the bridge). This allows us to query only specific slaves of master devices like bridge ports and export only what we're interested in instead of having to dump all ports and searching only for a single one. This will be used to export per-port IGMP/MLD stats and also per-port vlan stats in the future, possibly other statistics as well. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-30 06:15:04 -04:00
Nikolay Aleksandrov	565ce8f32a	net: bridge: fix vlan stats continue counter I made a dumb off-by-one mistake when I added the vlan stats counter dumping code. The increment should happen before the check, not after otherwise we miss one entry when we continue dumping. Fixes: `a60c090361` ("bridge: netlink: export per-vlan stats") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-29 05:33:35 -04:00
daniel	0888d5f3c0	Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address The bridge is falsly dropping ipv6 mulitcast packets if there is: 1. No ipv6 address assigned on the brigde. 2. No external mld querier present. 3. The internal querier enabled. When the bridge fails to build mld queries, because it has no ipv6 address, it slilently returns, but keeps the local querier enabled. This specific case causes confusing packet loss. Ipv6 multicast snooping can only work if: a) An external querier is present OR b) The bridge has an ipv6 address an is capable of sending own queries Otherwise it has to forward/flood the ipv6 multicast traffic, because snooping cannot work. This patch fixes the issue by adding a flag to the bridge struct that indicates that there is currently no ipv6 address assinged to the bridge and returns a false state for the local querier in __br_multicast_querier_exists(). Special thanks to Linus Lüssing. Fixes: `d1d81d4c3d` ("bridge: check return value of ipv6_dev_get_saddr()") Signed-off-by: Daniel Danzberger <daniel@dd-wrt.com> Acked-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-28 08:03:04 -04:00
Ido Schimmel	56fae404fb	bridge: Fix incorrect re-injection of STP packets Commit `8626c56c82` ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") fixed incorrect usage of NF_HOOK's return value by consuming packets in okfn via br_pass_frame_up(). However, this function re-injects packets to the Rx path with skb->dev set to the bridge device, which breaks kernel's STP, as all STP packets appear to originate from the bridge device itself. Instead, if STP is enabled and bridge isn't a 802.1ad bridge, then learn packet's SMAC and inject it back to the Rx path for further processing by the packet handlers. The patch also makes netfilter's behavior consistent with regards to packets destined to the Bridge Group Address, as no hook registered at LOCAL_IN will ever be called, regardless if STP is enabled or not. Cc: Florian Westphal <fw@strlen.de> Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Fixes: `8626c56c82` ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-10 22:41:58 -07:00
Toshiaki Makita	0b148def40	bridge: Don't insert unnecessary local fdb entry on changing mac address The missing br_vlan_should_use() test caused creation of an unneeded local fdb entry on changing mac address of a bridge device when there is a vlan which is configured on a bridge port but not on the bridge device. Fixes: `2594e9064a` ("bridge: vlan: add per-vlan struct and move to rhashtables") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-06-08 00:31:38 -07:00
Tobin C Harding	402f9030cb	bridge: netfilter: checkpatch data type fixes checkpatch produces data type 'checks'. This patch amends them by changing, for example: uint8_t -> u8 Signed-off-by: Tobin C Harding <me@tobin.cc> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-06-07 17:15:29 +02:00
David S. Miller	e800072c18	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net In netdevice.h we removed the structure in net-next that is being changes in 'net'. In macsec.c and rtnetlink.c we have overlaps between fixes in 'net' and the u64 attribute changes in 'net-next'. The mlx5 conflicts have to do with vxlan support dependencies. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-09 15:59:24 -04:00
Linus Lüssing	856ce5d083	bridge: fix igmp / mld query parsing With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMP and MLD query parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header. If there is a querier somewhere else, then this either causes the multicast snooping to stay disabled even though it could be enabled. Or, if we have the querier enabled too, then this can create unnecessary IGMP / MLD query messages on the link. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: `9afd85c9e4` ("net: Export IGMP/MLD message validation code") Reported-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-06 12:55:13 -04:00
Nikolay Aleksandrov	31ca0458a6	net: bridge: fix old ioctl unlocked net device walk get_bridge_ifindices() is used from the old "deviceless" bridge ioctl calls which aren't called with rtnl held. The comment above says that it is called with rtnl but that is not really the case. Here's a sample output from a test ASSERT_RTNL() which I put in get_bridge_ifindices and executed "brctl show": [ 957.422726] RTNL: assertion failed at net/bridge//br_ioctl.c (30) [ 957.422925] CPU: 0 PID: 1862 Comm: brctl Tainted: G W O 4.6.0-rc4+ #157 [ 957.423009] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 957.423009] 0000000000000000 ffff880058adfdf0 ffffffff8138dec5 0000000000000400 [ 957.423009] ffffffff81ce8380 ffff880058adfe58 ffffffffa05ead32 0000000000000001 [ 957.423009] 00007ffec1a444b0 0000000000000400 ffff880053c19130 0000000000008940 [ 957.423009] Call Trace: [ 957.423009] [<ffffffff8138dec5>] dump_stack+0x85/0xc0 [ 957.423009] [<ffffffffa05ead32>] br_ioctl_deviceless_stub+0x212/0x2e0 [bridge] [ 957.423009] [<ffffffff81515beb>] sock_ioctl+0x22b/0x290 [ 957.423009] [<ffffffff8126ba75>] do_vfs_ioctl+0x95/0x700 [ 957.423009] [<ffffffff8126c159>] SyS_ioctl+0x79/0x90 [ 957.423009] [<ffffffff8163a4c0>] entry_SYSCALL_64_fastpath+0x23/0xc1 Since it only reads bridge ifindices, we can use rcu to safely walk the net device list. Also remove the wrong rtnl comment above. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-05 23:32:30 -04:00
Nikolay Aleksandrov	a60c090361	bridge: netlink: export per-vlan stats Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and get_linkxstats_size) in order to export the per-vlan stats. The paddings were added because soon these fields will be needed for per-port per-vlan stats (or something else if someone beats me to it) so avoiding at least a few more netlink attributes. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-02 22:27:06 -04:00
Nikolay Aleksandrov	6dada9b10a	bridge: vlan: learn to count Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets allocated a per-cpu stats which is then set in each per-port vlan context for quick access. The br_allowed_ingress() common function is used to account for Rx packets and the br_handle_vlan() common function is used to account for Tx packets. Stats accounting is performed only if the bridge-wide vlan_stats_enabled option is set either via sysfs or netlink. A struct hole between vlan_enabled and vlan_proto is used for the new option so it is in the same cache line. Currently it is binary (on/off) but it is intentionally restricted to exactly 0 and 1 since other values will be used in the future for different purposes (e.g. per-port stats). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-02 22:27:06 -04:00
Eric Dumazet	1d01550359	ipv6: rename IP6_INC_STATS_BH() Rename IP6_INC_STATS_BH() to __IP6_INC_STATS() and IP6_ADD_STATS_BH() to __IP6_ADD_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:24 -04:00
Eric Dumazet	b45386efa2	net: rename IP_INC_STATS_BH() Rename IP_INC_STATS_BH() to __IP_INC_STATS(), to better express this is used in non preemptible context. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 22:48:23 -04:00
David S. Miller	c0cc53162a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor overlapping changes in the conflicts. In the macsec case, the change of the default ID macro name overlapped with the 64-bit netlink attribute alignment fixes in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 15:43:10 -04:00
Nicolas Dichtel	12a0faa3bd	bridge: use nla_put_u64_64bit() Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-25 15:09:10 -04:00
Elad Raz	45ebcce568	bridge: mdb: Marking port-group as offloaded There is a race-condition when updating the mdb offload flag without using the mulicast_lock. This reverts commit `9e8430f8d6` ("bridge: mdb: Passing the port-group pointer to br_mdb module"). This patch marks offloaded MDB entry as "offload" by changing the port- group flags and marks it as MDB_PG_FLAGS_OFFLOAD. When switchdev PORT_MDB succeeded and adds a multicast group, a completion callback is been invoked "br_mdb_complete". The completion function locks the multicast_lock and finds the right net_bridge_port_group and marks it as offloaded. Fixes: `9e8430f8d6` ("bridge: mdb: Passing the port-group pointer to br_mdb module") Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:23:32 -04:00
Elad Raz	6dd684c0fe	bridge: mdb: Common function for mdb entry translation There is duplicate code that translates br_mdb_entry to br_ip let's wrap it in a common function. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:23:32 -04:00
David S. Miller	1602f49b58	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 18:51:33 -04:00
Xin Long	bf871ad792	bridge: a netlink notification should be sent when those attributes are changed by ioctl Now when we change the attributes of bridge or br_port by netlink, a relevant netlink notification will be sent, but if we change them by ioctl or sysfs, no notification will be sent. We should ensure that whenever those attributes change internally or from sysfs/ioctl, that a netlink notification is sent out to listeners. Also, NetworkManager will use this in the future to listen for out-of-band bridge master attribute updates and incorporate them into the runtime configuration. This patch is used for ioctl. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:42:33 -04:00
Xin Long	bdaf0d5d98	bridge: a netlink notification should be sent when those attributes are changed by br_sysfs_if Now when we change the attributes of bridge or br_port by netlink, a relevant netlink notification will be sent, but if we change them by ioctl or sysfs, no notification will be sent. We should ensure that whenever those attributes change internally or from sysfs/ioctl, that a netlink notification is sent out to listeners. Also, NetworkManager will use this in the future to listen for out-of-band bridge master attribute updates and incorporate them into the runtime configuration. This patch is used for br_sysfs_if, and we also move br_ifinfo_notify out of store_flag. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:42:33 -04:00
Xin Long	047831a9b9	bridge: a netlink notification should be sent when those attributes are changed by br_sysfs_br Now when we change the attributes of bridge or br_port by netlink, a relevant netlink notification will be sent, but if we change them by ioctl or sysfs, no notification will be sent. We should ensure that whenever those attributes change internally or from sysfs/ioctl, that a netlink notification is sent out to listeners. Also, NetworkManager will use this in the future to listen for out-of-band bridge master attribute updates and incorporate them into the runtime configuration. This patch is used for br_sysfs_br. and we also need to remove some rtnl_trylock in old functions so that we can call it in a common one. For group_addr_store, we cannot make it use store_bridge_parm, because it's not a string-to-long convert, we will add notification on it individually. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:42:33 -04:00
Xin Long	4436156b6f	bridge: simplify the stp_state_store by calling store_bridge_parm There are some repetitive codes in stp_state_store, we can remove them by calling store_bridge_parm. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:42:32 -04:00
Xin Long	347db6b49e	bridge: simplify the forward_delay_store by calling store_bridge_parm There are some repetitive codes in forward_delay_store, we can remove them by calling store_bridge_parm. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:42:32 -04:00
Xin Long	14f31bb39f	bridge: simplify the flush_store by calling store_bridge_parm There are some repetitive codes in flush_store, we can remove them by calling store_bridge_parm, also, it would send rtnl notification after we add it in store_bridge_parm in the following patches. Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-13 22:42:32 -04:00
David S. Miller	da0caadf0a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains the first batch of Netfilter updates for your net-next tree. 1) Define pr_fmt() in nf_conntrack, from Weongyo Jeong. 2) Define and register netfilter's afinfo for the bridge family, this comes in preparation for native nfqueue's bridge for nft, from Stephane Bryant. 3) Add new attributes to store layer 2 and VLAN headers to nfqueue, also from Stephane Bryant. 4) Parse new NFQA_VLAN and NFQA_L2HDR nfqueue netlink attributes coming from userspace, from Stephane Bryant. 5) Use net->ipv6.devconf_all->hop_limit instead of hardcoded hop_limit in IPv6 SYNPROXY, from Liping Zhang. 6) Remove unnecessary check for dst == NULL in nf_reject_ipv6, from Haishuang Yan. 7) Deinline ctnetlink event report functions, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-12 22:34:56 -04:00
Phil Sutter	bcf4934288	netfilter: ebtables: Fix extension lookup with identical name If a requested extension exists as module and is not loaded, ebt_check_match() might accidentally use an NFPROTO_UNSPEC one with same name and fail. Reproduced with limit match: Given xt_limit and ebt_limit both built as module, the following would fail: modprobe xt_limit ebtables -I INPUT --limit 1/s -j ACCEPT The fix is to make ebt_check_match() distrust a found NFPROTO_UNSPEC extension and retry after requesting an appropriate module. Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-04-13 01:16:57 +02:00
David S. Miller	32fa270c8a	Revert "bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr" This reverts commit `c862cc9b70`. Patch lacks a real-name Signed-off-by. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 15:42:45 -04:00
Bastien Philbert	c862cc9b70	bridge: Fix incorrect variable assignment on error path in br_sysfs_addbr This fixes the incorrect variable assignment on error path in br_sysfs_addbr for when the call to kobject_create_and_add fails to assign the value of -EINVAL to the returned variable of err rather then incorrectly return zero making callers think this function has succeededed due to the previous assignment being assigned zero when assigning it the successful return value of the call to sysfs_create_group which is zero. Signed-off-by: Bastien Philbert <bastienphilbert@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-04 16:12:37 -04:00
Haishuang Yan	5e263f7126	bridge: Allow set bridge ageing time when switchdev disabled When NET_SWITCHDEV=n, switchdev_port_attr_set will return -EOPNOTSUPP, we should ignore this error code and continue to set the ageing time. Fixes: `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-30 15:38:13 -04:00
Stephane Bryant	ac28634456	netfilter: bridge: add nf_afinfo to enable queuing to userspace This just adds and registers a nf_afinfo for the ethernet bridge, which enables queuing to userspace for the AF_BRIDGE family. No checksum computation is done. Signed-off-by: Stephane Bryant <stephane.ml.bryant@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-29 13:24:37 +02:00
Liping Zhang	29421198c3	netfilter: ipv4: fix NULL dereference Commit `fa50d974d1` ("ipv4: Namespaceify ip_default_ttl sysctl knob") use sock_net(skb->sk) to get the net namespace, but we can't assume that sk_buff->sk is always exist, so when it is NULL, oops will happen. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Reviewed-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-28 17:59:29 +02:00
Pablo Neira Ayuso	b301f25387	netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES Make sure the table names via getsockopt GET_ENTRIES is nul-terminated in ebtables and all the x_tables variants and their respective compat code. Uncovered by KASAN. Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-28 17:59:24 +02:00
Eric Dumazet	ae74f10068	bridge: update max_gso_segs and max_gso_size It can be useful to lower max_gso_segs on NIC with very low number of TX descriptors like bcmgenet. However, this is defeated by bridge since it does not propagate the lower value of max_gso_segs and max_gso_size. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Petri Gynther <pgynther@google.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-21 13:35:56 -04:00
Florian Westphal	8626c56c82	bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict Zefir Kurtisi reported kernel panic with an openwrt specific patch. However, it turns out that mainline has a similar bug waiting to happen. Once NF_HOOK() returns the skb is in undefined state and must not be used. Moreover, the okfn must consume the skb to support async processing (NF_QUEUE). Current okfn in this spot doesn't consume it and caller assumes that NF_HOOK return value tells us if skb was freed or not, but thats wrong. It "works" because no in-tree user registers a NFPROTO_BRIDGE hook at LOCAL_IN that returns STOLEN or NF_QUEUE verdicts. Once we add NF_QUEUE support for nftables bridge this will break -- NF_QUEUE holds the skb for async processing, caller will erronoulsy return RX_HANDLER_PASS and on reinject netfilter will access free'd skb. Fix this by pushing skb up the stack in the okfn instead. NB: It also seems dubious to use LOCAL_IN while bypassing PRE_ROUTING completely in this case but this is how its been forever so it seems preferable to not change this. Cc: Felix Fietkau <nbd@openwrt.org> Cc: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-14 15:46:41 -04:00
Stephen Hemminger	4c656c13b2	bridge: allow zero ageing time This fixes a regression in the bridge ageing time caused by: commit `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") There are users of Linux bridge which use the feature that if ageing time is set to 0 it causes entries to never expire. See: https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge For a pure software bridge, it is unnecessary for the code to have arbitrary restrictions on what values are allowable. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-11 14:58:58 -05:00
David S. Miller	4c38cd61ae	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter updates for your net-next tree, they are: 1) Remove useless debug message when deleting IPVS service, from Yannick Brosseau. 2) Get rid of compilation warning when CONFIG_PROC_FS is unset in several spots of the IPVS code, from Arnd Bergmann. 3) Add prandom_u32 support to nft_meta, from Florian Westphal. 4) Remove unused variable in xt_osf, from Sudip Mukherjee. 5) Don't calculate IP checksum twice from netfilter ipv4 defrag hook since fixing af_packet defragmentation issues, from Joe Stringer. 6) On-demand hook registration for iptables from netns. Instead of registering the hooks for every available netns whenever we need one of the support tables, we register this on the specific netns that needs it, patchset from Florian Westphal. 7) Add missing port range selection to nf_tables masquerading support. BTW, just for the record, there is a typo in the description of `5f6c253ebe` ("netfilter: bridge: register hooks only when bridge interface is added") that refers to the cluster match as deprecated, but it is actually the CLUSTERIP target (which registers hooks inconditionally) the one that is scheduled for removal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-08 14:25:20 -05:00
David S. Miller	810813c47a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Several cases of overlapping changes, as well as one instance (vxlan) of a bug fix in 'net' overlapping with code movement in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-08 12:34:12 -05:00
Florian Westphal	5f6c253ebe	netfilter: bridge: register hooks only when bridge interface is added This moves bridge hooks to a register-when-needed scheme. We use a device notifier to register the 'call-iptables' netfilter hooks only once a bridge gets added. This means that if the initial namespace uses a bridge, newly created network namespaces no longer get the PRE_ROUTING ipt_sabotage hook. It will registered in that network namespace once a bridge is created within that namespace. A few modules still use global hooks: - conntrack - bridge PF_BRIDGE hooks - IPVS - CLUSTER match (deprecated) - SYNPROXY As long as these modules are not loaded/used, a new network namespace has empty hook list and NF_HOOK() will boil down to single list_empty test even if initial namespace does stateless packet filtering. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2016-03-02 20:05:25 +01:00
WANG Cong	64d4e3431e	net: remove skb_sender_cpu_clear() After commit `52bd2d62ce` ("net: better skb->sender_cpu and skb->napi_id cohabitation") skb_sender_cpu_clear() becomes empty and can be removed. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 17:36:47 -05:00
Nikolay Aleksandrov	59f78f9f6c	bridge: mcast: add support for more router port information dumping Allow for more multicast router port information to be dumped such as timer and type attributes. For that that purpose we need to extend the MDBA_ROUTER_PORT attribute similar to how it was done for the mdb entries recently. The new format is thus: [MDBA_ROUTER_PORT] = { <- nested attribute u32 ifindex <- router port ifindex for user-space compatibility [MDBA_ROUTER_PATTR attributes] } This way it remains compatible with older users (they'll simply retrieve the u32 in the beginning) and new users can parse the remaining attributes. It would also allow to add future extensions to the router port without breaking compatibility. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov	a55d8246ab	bridge: mcast: add support for temporary port router Add support for a temporary router port which doesn't depend only on the incoming query. It can be refreshed if set to the same value, which is a no-op for the rest. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov	4950cfd1e6	bridge: mcast: do nothing if port's multicast_router is set to the same val This is needed for the upcoming temporary port router. There's no point to go through the logic if the value is the same. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Nikolay Aleksandrov	7f0aec7a66	bridge: mcast: use names for the different multicast_router types Using raw values makes it difficult to extend and also understand the code, give them names and do explicit per-option manipulation in br_multicast_set_port_router. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 16:55:07 -05:00
Paolo Abeni	45493d47c3	bridge: notify enslaved devices of headroom changes On bridge needed_headroom changes, the enslaved devices are notified via the ndo_set_rx_headroom method Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-03-01 15:54:30 -05:00
MINOURA Makoto / 箕浦真	472681d57a	net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump. When the send skbuff reaches the end, nlmsg_put and friends returns -EMSGSIZE but it is silently thrown away in ndo_fdb_dump. It is called within a for_each_netdev loop and the first fdb entry of a following netdev could fit in the remaining skbuff. This breaks the mechanism of cb->args[0] and idx to keep track of the entries that are already dumped, which results missing entries in bridge fdb show command. Signed-off-by: Minoura Makoto <minoura@valinux.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-26 15:04:02 -05:00
David Decotigny	702b26a24d	net: bridge: use __ethtool_get_ksettings Signed-off-by: David Decotigny <decot@googlers.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-25 22:06:46 -05:00
David S. Miller	b633353115	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-23 00:09:14 -05:00
Nikolay Aleksandrov	2125715635	bridge: mdb: add support for more attributes and export timer Currently mdb entries are exported directly as a structure inside MDBA_MDB_ENTRY_INFO attribute, we can't really extend it without breaking user-space. In order to export new mdb fields, I've converted the MDBA_MDB_ENTRY_INFO into a nested attribute which starts like before with struct br_mdb_entry (without header, as it's casted directly in iproute2) and continues with MDBA_MDB_EATTR_ attributes. This way we keep compatibility with older users and can export new data. I've tested this with iproute2, both with and without support for the added attribute and it works fine. So basically we again have MDBA_MDB_ENTRY_INFO with struct br_mdb_entry inside but it may contain also some additional MDBA_MDB_EATTR_ attributes such as MDBA_MDB_EATTR_TIMER which can be parsed by user-space. So the new structure is: [MDBA_MDB] = { [MDBA_MDB_ENTRY] = { [MDBA_MDB_ENTRY_INFO] [MDBA_MDB_ENTRY_INFO] { <- Nested attribute struct br_mdb_entry <- nla_put_nohdr() [MDBA_MDB_ENTRY attributes] <- normal netlink attributes } } } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-19 15:27:36 -05:00
Nikolay Aleksandrov	76cc173d48	bridge: mdb: reduce the indentation level in br_mdb_fill_info Switch the port check and skip if it's null, this allows us to reduce one indentation level. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-19 15:27:36 -05:00
Vivien Didelot	7c25b16dbb	net: bridge: log port STP state on change Remove the shared br_log_state function and print the info directly in br_set_state, where the net_bridge_port state is actually changed. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 14:20:08 -05:00
Ido Schimmel	7fbac984f3	bridge: switchdev: Offload VLAN flags to hardware bridge When VLANs are created / destroyed on a VLAN filtering bridge (MASTER flag set), the configuration is passed down to the hardware. However, when only the flags (e.g. PVID) are toggled, the configuration is done in the software bridge alone. While it is possible to pass these flags to hardware when invoked with the SELF flag set, this creates inconsistency with regards to the way the VLANs are initially configured. Pass the flags down to the hardware even when the VLAN already exists and only the flags are toggled. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-18 11:18:11 -05:00
Nikolay Borisov	fa50d974d1	ipv4: Namespaceify ip_default_ttl sysctl knob Signed-off-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-16 20:42:54 -05:00
Arnd Bergmann	56bb7fd994	bridge: mdb: avoid uninitialized variable warning A recent change to the mdb code confused the compiler to the point where it did not realize that the port-group returned from br_mdb_add_group() is always valid when the function returns a nonzero return value, so we get a spurious warning: net/bridge/br_mdb.c: In function 'br_mdb_add': net/bridge/br_mdb.c:542:4: error: 'pg' may be used uninitialized in this function [-Werror=maybe-uninitialized] __br_mdb_notify(dev, entry, RTM_NEWMDB, pg); Slightly rearranging the code in br_mdb_add_group() makes the problem go away, as gcc is clever enough to see that both functions check for 'ret != 0'. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `9e8430f8d6` ("bridge: mdb: Passing the port-group pointer to br_mdb module") Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-16 15:37:28 -05:00
Elad Raz	9e8430f8d6	bridge: mdb: Passing the port-group pointer to br_mdb module Passing the port-group to br_mdb in order to allow direct access to the structure. br_mdb will later use the structure to reflect HW reflection status via "state" variable. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-09 04:42:47 -05:00
Elad Raz	9d06b6d8a3	bridge: mdb: Separate br_mdb_entry->state from net_bridge_port_group->state Change net_bridge_port_group 'state' member to 'flags' and define new set of flags internal to the kernel. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-02-09 04:42:47 -05:00
Ido Schimmel	4f2c6ae5c6	switchdev: Require RTNL mutex to be held when sending FDB notifications When switchdev drivers process FDB notifications from the underlying device they resolve the netdev to which the entry points to and notify the bridge using the switchdev notifier. However, since the RTNL mutex is not held there is nothing preventing the netdev from disappearing in the middle, which will cause br_switchdev_event() to dereference a non-existing netdev. Make switchdev drivers hold the lock at the beginning of the notification processing session and release it once it ends, after notifying the bridge. Also, remove switchdev_mutex and fdb_lock, as they are no longer needed when RTNL mutex is held. Fixes: `03bf0c2812` ("switchdev: introduce switchdev notifier") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-28 16:21:31 -08:00
Nikolay Aleksandrov	c6894dec8e	bridge: fix lockdep addr_list_lock false positive splat After promisc mode management was introduced a bridge device could do dev_set_promiscuity from its ndo_change_rx_flags() callback which in turn can be called after the bridge's addr_list_lock has been taken (e.g. by dev_uc_add). This causes a false positive lockdep splat because the port interfaces' addr_list_lock is taken when br_manage_promisc() runs after the bridge's addr list lock was already taken. To remove the false positive introduce a custom bridge addr_list_lock class and set it on bridge init. A simple way to reproduce this is with the following: $ brctl addbr br0 $ ip l add l br0 br0.100 type vlan id 100 $ ip l set br0 up $ ip l set br0.100 up $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering $ brctl addif br0 eth0 Splat: [ 43.684325] ============================================= [ 43.684485] [ INFO: possible recursive locking detected ] [ 43.684636] 4.4.0-rc8+ #54 Not tainted [ 43.684755] --------------------------------------------- [ 43.684906] brctl/1187 is trying to acquire lock: [ 43.685047] (_xmit_ETHER){+.....}, at: [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40 [ 43.685460] but task is already holding lock: [ 43.685618] (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80 [ 43.686015] other info that might help us debug this: [ 43.686316] Possible unsafe locking scenario: [ 43.686743] CPU0 [ 43.686967] ---- [ 43.687197] lock(_xmit_ETHER); [ 43.687544] lock(_xmit_ETHER); [ 43.687886] * DEADLOCK * [ 43.688438] May be due to missing lock nesting notation [ 43.688882] 2 locks held by brctl/1187: [ 43.689134] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81510317>] rtnl_lock+0x17/0x20 [ 43.689852] #1: (_xmit_ETHER){+.....}, at: [<ffffffff815072a7>] dev_uc_add+0x27/0x80 [ 43.690575] stack backtrace: [ 43.690970] CPU: 0 PID: 1187 Comm: brctl Not tainted 4.4.0-rc8+ #54 [ 43.691270] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 [ 43.691770] ffffffff826a25c0 ffff8800369fb8e0 ffffffff81360ceb ffffffff826a25c0 [ 43.692425] ffff8800369fb9b8 ffffffff810d0466 ffff8800369fb968 ffffffff81537139 [ 43.693071] ffff88003a08c880 0000000000000000 00000000ffffffff 0000000002080020 [ 43.693709] Call Trace: [ 43.693931] [<ffffffff81360ceb>] dump_stack+0x4b/0x70 [ 43.694199] [<ffffffff810d0466>] __lock_acquire+0x1e46/0x1e90 [ 43.694483] [<ffffffff81537139>] ? netlink_broadcast_filtered+0x139/0x3e0 [ 43.694789] [<ffffffff8153b5da>] ? nlmsg_notify+0x5a/0xc0 [ 43.695064] [<ffffffff810d10f5>] lock_acquire+0xe5/0x1f0 [ 43.695340] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40 [ 43.695623] [<ffffffff815edea5>] _raw_spin_lock_bh+0x45/0x80 [ 43.695901] [<ffffffff8150169e>] ? dev_set_rx_mode+0x1e/0x40 [ 43.696180] [<ffffffff8150169e>] dev_set_rx_mode+0x1e/0x40 [ 43.696460] [<ffffffff8150189c>] dev_set_promiscuity+0x3c/0x50 [ 43.696750] [<ffffffffa0586845>] br_port_set_promisc+0x25/0x50 [bridge] [ 43.697052] [<ffffffffa05869aa>] br_manage_promisc+0x8a/0xe0 [bridge] [ 43.697348] [<ffffffffa05826ee>] br_dev_change_rx_flags+0x1e/0x20 [bridge] [ 43.697655] [<ffffffff81501532>] __dev_set_promiscuity+0x132/0x1f0 [ 43.697943] [<ffffffff81501672>] __dev_set_rx_mode+0x82/0x90 [ 43.698223] [<ffffffff815072de>] dev_uc_add+0x5e/0x80 [ 43.698498] [<ffffffffa05b3c62>] vlan_device_event+0x542/0x650 [8021q] [ 43.698798] [<ffffffff8109886d>] notifier_call_chain+0x5d/0x80 [ 43.699083] [<ffffffff810988b6>] raw_notifier_call_chain+0x16/0x20 [ 43.699374] [<ffffffff814f456e>] call_netdevice_notifiers_info+0x6e/0x80 [ 43.699678] [<ffffffff814f4596>] call_netdevice_notifiers+0x16/0x20 [ 43.699973] [<ffffffffa05872be>] br_add_if+0x47e/0x4c0 [bridge] [ 43.700259] [<ffffffffa058801e>] add_del_if+0x6e/0x80 [bridge] [ 43.700548] [<ffffffffa0588b5f>] br_dev_ioctl+0xaf/0xc0 [bridge] [ 43.700836] [<ffffffff8151a7ac>] dev_ifsioc+0x30c/0x3c0 [ 43.701106] [<ffffffff8151aac9>] dev_ioctl+0xf9/0x6f0 [ 43.701379] [<ffffffff81254345>] ? mntput_no_expire+0x5/0x450 [ 43.701665] [<ffffffff812543ee>] ? mntput_no_expire+0xae/0x450 [ 43.701947] [<ffffffff814d7b02>] sock_do_ioctl+0x42/0x50 [ 43.702219] [<ffffffff814d8175>] sock_ioctl+0x1e5/0x290 [ 43.702500] [<ffffffff81242d0b>] do_vfs_ioctl+0x2cb/0x5c0 [ 43.702771] [<ffffffff81243079>] SyS_ioctl+0x79/0x90 [ 43.703033] [<ffffffff815eebb6>] entry_SYSCALL_64_fastpath+0x16/0x7a CC: Vlad Yasevich <vyasevic@redhat.com> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Bridge list <bridge@lists.linux-foundation.org> CC: Andy Gospodarek <gospo@cumulusnetworks.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: `2796d0c648` ("bridge: Automatically manage port promiscuous mode.") Reported-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-15 15:40:45 -05:00
Elad Raz	f1fecb1d10	bridge: Reflect MDB entries to hardware Offload MDB changes per port to hardware Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-10 16:50:21 -05:00
David S. Miller	9b59377b75	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) Release nf_tables objects on netns destructions via nft_release_afinfo(). 2) Destroy basechain and rules on netdevice removal in the new netdev family. 3) Get rid of defensive check against removal of inactive objects in nf_tables. 4) Pass down netns pointer to our existing nfnetlink callbacks, as well as commit() and abort() nfnetlink callbacks. 5) Allow to invert limit expression in nf_tables, so we can throttle overlimit traffic. 6) Add packet duplication for the netdev family. 7) Add forward expression for the netdev family. 8) Define pr_fmt() in conntrack helpers. 9) Don't leave nfqueue configuration on inconsistent state in case of errors, from Ken-ichirou MATSUZAWA, follow up patches are also from him. 10) Skip queue option handling after unbind. 11) Return error on unknown both in nfqueue and nflog command. 12) Autoload ctnetlink when NFQA_CFG_F_CONNTRACK is set. 13) Add new NFTA_SET_USERDATA attribute to store user data in sets, from Carlos Falgueras. 14) Add support for 64 bit byteordering changes nf_tables, from Florian Westphal. 15) Add conntrack byte/packet counter matching support to nf_tables, also from Florian. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-08 20:53:16 -05:00
David S. Miller	9e0efaf6b4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2016-01-06 22:54:18 -05:00
Elad Raz	404cdbf089	bridge: add vlan filtering change for new bridged device Notifying hardware about newly bridged port vlan-aware changes. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 14:42:41 -05:00
Elad Raz	6b72a77020	bridge: add vlan filtering change notification Notifying hardware about bridge vlan-aware changes. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 14:42:40 -05:00
Elad Raz	08474cc1e6	bridge: Propagate vlan add failure to user Disallow adding interfaces to a bridge when vlan filtering operation failed. Send the failure code to the user. Signed-off-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-06 14:42:40 -05:00
Hannes Frederic Sowa	ff62198553	bridge: Only call /sbin/bridge-stp for the initial network namespace [I stole this patch from Eric Biederman. He wrote:] > There is no defined mechanism to pass network namespace information > into /sbin/bridge-stp therefore don't even try to invoke it except > for bridge devices in the initial network namespace. > > It is possible for unprivileged users to cause /sbin/bridge-stp to be > invoked for any network device name which if /sbin/bridge-stp does not > guard against unreasonable arguments or being invoked twice on the > same network device could cause problems. [Hannes: changed patch using netns_eq] Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-01-05 16:46:17 -05:00
David S. Miller	c07f30ad68	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	2015-12-31 18:20:10 -05:00
Pablo Neira Ayuso	df05ef874b	netfilter: nf_tables: release objects on netns destruction We have to release the existing objects on netns removal otherwise we leak them. Chains are unregistered in first place to make sure no packets are walking on our rules and sets anymore. The object release happens by when we unregister the family via nft_release_afinfo() which is called from nft_unregister_afinfo() from the corresponding __net_exit path in every family. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-28 18:34:35 +01:00
Geliang Tang	aeb7ed14fe	bridge: use kobj_to_dev instead of to_dev kobj_to_dev has been defined in linux/device.h, so I replace to_dev with it. Signed-off-by: Geliang Tang <geliangtang@163.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-23 22:26:48 -05:00
Ido Schimmel	ef9cdd0fed	switchdev: bridge: Pass ageing time as clock_t instead of jiffies The bridge's ageing time is offloaded to hardware when: 1) A port joins a bridge 2) The ageing time of the bridge is changed In the first case the ageing time is offloaded as jiffies, but in the second case it's offloaded as clock_t, which is what existing switchdev drivers expect to receive. Fixes: `6ac311ae8b` ("Adding switchdev ageing notification on port bridged") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-22 15:56:44 -05:00
David S. Miller	59ce9670ce	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains the first batch of Netfilter updates for the upcoming 4.5 kernel. This batch contains userspace netfilter header compilation fixes, support for packet mangling in nf_tables, the new tracing infrastructure for nf_tables and cgroup2 support for iptables. More specifically, they are: 1) Two patches to include dependencies in our netfilter userspace headers to resolve compilation problems, from Mikko Rapeli. 2) Four comestic cleanup patches for the ebtables codebase, from Ian Morris. 3) Remove duplicate include in the netfilter reject infrastructure, from Stephen Hemminger. 4) Two patches to simplify the netfilter defragmentation code for IPv6, patch from Florian Westphal. 5) Fix root ownership of /proc/net netfilter for unpriviledged net namespaces, from Philip Whineray. 6) Get rid of unused fields in struct nft_pktinfo, from Florian Westphal. 7) Add mangling support to our nf_tables payload expression, from Patrick McHardy. 8) Introduce a new netlink-based tracing infrastructure for nf_tables, from Florian Westphal. 9) Change setter functions in nfnetlink_log to be void, from Rami Rosen. 10) Add netns support to the cttimeout infrastructure. 11) Add cgroup2 support to iptables, from Tejun Heo. 12) Introduce nfnl_dereference_protected() in nfnetlink, from Florian. 13) Add support for mangling pkttype in the nf_tables meta expression, also from Florian. BTW, I need that you pull net into net-next, I have another batch that requires changes that I don't yet see in net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-18 15:37:42 -05:00
Ido Schimmel	6ff64f6f92	switchdev: Pass original device to port netdev driver switchdev drivers need to know the netdev on which the switchdev op was invoked. For example, the STP state of a VLAN interface configured on top of a port can change while being member in a bridge. In this case, the underlying driver should only change the STP state of that particular VLAN and not of all the VLANs configured on the port. However, current switchdev infrastructure only passes the port netdev down to the driver. Solve that by passing the original device down to the driver as part of the required switchdev object / attribute. This doesn't entail any change in current switchdev drivers. It simply enables those supporting stacked devices to know the originating device and act accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-15 11:58:20 -05:00
Pablo Neira Ayuso	a4ec80082c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Resolve conflict between commit `264640fc2c` ("ipv6: distinguish frag queues by device for multicast and link-local packets") from the net tree and commit `029f7f3b87` ("netfilter: ipv6: nf_defrag: avoid/free clone operations") from the nf-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/ipv6/netfilter/nf_conntrack_reasm.c	2015-12-14 20:31:16 +01:00
Florian Westphal	e639f7ab07	netfilter: nf_tables: wrap tracing with a static key Only needed when meta nftrace rule(s) were added. The assumption is that no such rules are active, so the call to nft_trace_init is "never" needed. When nftrace rules are active, we always call the nft_trace_* functions, but will only send netlink messages when all of the following are true: - traceinfo structure was initialised - skb->nf_trace == 1 - at least one subscriber to trace group. Adding an extra conditional (static_branch ... && skb->nf_trace) nft_trace_init( ..) Is possible but results in a larger nft_do_chain footprint. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-12-09 13:23:13 +01:00
Jiri Pirko	29bf24afb2	net: add possibility to pass information about upper device via notifier Sometimes the drivers and other code would find it handy to know some internal information about upper device being changed. So allow upper-code to pass information down to notifier listeners during linking. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-03 11:49:25 -05:00
Jiri Pirko	6dffb0447c	net: propagate upper priv via netdev_master_upper_dev_link Eliminate netdev_master_upper_dev_link_private and pass priv directly as a parameter of netdev_master_upper_dev_link. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-12-03 11:49:25 -05:00
Ian Morris	c1bc1d257b	netfilter-bridge: layout of if statements Eliminate some checkpatch issues by improved layout of if statements. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-23 17:54:41 +01:00
Ian Morris	abcdd9a623	netfilter-bridge: brace placement Change brace placement to eliminate checkpatch error. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-23 17:54:40 +01:00
Ian Morris	7f495ad946	netfilter-bridge: use netdev style comments Changes comments to use netdev style. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-23 17:54:39 +01:00
Ian Morris	052a4bc49d	netfilter-bridge: Cleanse indentation Fixes a bunch of issues detected by checkpatch with regards to code indentation. No changes detected by objdiff. Signed-off-by: Ian Morris <ipm@chirality.org.uk> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-11-23 17:54:39 +01:00
Ido Schimmel	bbe14f5429	switchdev: bridge: Check return code is not EOPNOTSUPP When NET_SWITCHDEV=n, switchdev_port_attr_set simply returns EOPNOTSUPP. In this case we should not emit errors and warnings to the kernel log. Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Fixes: `0bc05d585d` ("switchdev: allow caller to explicitly request attr_set as deferred") Fixes: `6ac311ae8b` ("Adding switchdev ageing notification on port bridged") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-16 14:56:03 -05:00
Vlad Yasevich	8a921265e2	Revert "bridge: Allow forward delay to be cfgd when STP enabled" This reverts commit `34c2d9fb04`. There are 2 reasons for this revert: 1) The commit in question doesn't do what it says it does. The description reads: "Allow bridge forward delay to be configured when Spanning Tree is enabled." This was already the case before the commit was made. What the commit actually do was disallow invalid values or 'forward_delay' when STP was turned off. 2) The above change was actually a change in the user observed behavior and broke things like libvirt and other network configs that set 'forward_delay' to 0 without enabling STP. The value of 0 is actually used when STP is turned off to immediately mark the bridge as forwarding. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-10 15:41:36 -05:00
Ido Schimmel	eca1e006cf	bridge: vlan: Use rcu_dereference instead of rtnl_dereference br_should_learn() is protected by RCU and not by RTNL, so use correct flavor of nbp_vlan_group(). Fixes: `907b1e6e83` ("bridge: vlan: use proper rcu for the vlgrp member") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 16:27:39 -05:00
Ido Schimmel	ddd611d3ff	bridge: vlan: Use correct flag name in comment The flag used to indicate if a VLAN should be used for filtering - as opposed to context only - on the bridge itself (e.g. br0) is called 'brentry' and not 'brvlan'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:40:11 -05:00
Ido Schimmel	07bc588fc1	bridge: vlan: Prevent possible use-after-free When adding a port to a bridge we initialize VLAN filtering on it. We do not bail out in case an error occurred in nbp_vlan_init, as it can be used as a non VLAN filtering bridge. However, if VLAN filtering is required and an error occurred in nbp_vlan_init, we should set vlgrp to NULL, so that VLAN filtering functions (e.g. br_vlan_find, br_get_pvid) will know the struct is invalid and will not try to access it. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-11-02 15:40:10 -05:00
Roopa Prabhu	b7af1472af	bridge: set is_local and is_static before fdb entry is added to the fdb hashtable Problem Description: We can add fdbs pointing to the bridge with NULL ->dst but that has a few race conditions because br_fdb_insert() is used which first creates the fdb and then, after the fdb has been published/linked, sets "is_local" to 1 and in that time frame if a packet arrives for that fdb it may see it as non-local and either do a NULL ptr dereference in br_forward() or attach the fdb to the port where it arrived, and later br_fdb_insert() will make it local thus getting a wrong fdb entry. Call chain br_handle_frame_finish() -> br_forward(): But in br_handle_frame_finish() in order to call br_forward() the dst should not be local i.e. skb != NULL, whenever the dst is found to be local skb is set to NULL so we can't forward it, and here comes the problem since it's running only with RCU when forwarding packets it can see the entry before "is_local" is set to 1 and actually try to dereference NULL. The main issue is that if someone sends a packet to the switch while it's adding the entry which points to the bridge device, it may dereference NULL ptr. This is needed now after we can add fdbs pointing to the bridge. This poses a problem for br_fdb_update() as well, while someone's adding a bridge fdb, but before it has is_local == 1, it might get moved to a port if it comes as a source mac and then it may get its "is_local" set to 1 This patch changes fdb_create to take is_local and is_static as arguments to set these values in the fdb entry before it is added to the hash. Also adds null check for port in br_forward. Fixes: `3741873b4f` ("bridge: allow adding of fdb entries pointing to the bridge device") Reported-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-30 12:13:05 +09:00
Arad, Ronen	b1974ed05e	netlink: Rightsize IFLA_AF_SPEC size calculation if_nlmsg_size() overestimates the minimum allocation size of netlink dump request (when called from rtnl_calcit()) or the size of the message (when called from rtnl_getlink()). This is because ext_filter_mask is not supported by rtnl_link_get_af_size() and rtnl_link_get_size(). The over-estimation is significant when at least one netdev has many VLANs configured (8 bytes for each configured VLAN). This patch-set "rightsizes" the protocol specific attribute size calculation by propagating ext_filter_mask to rtnl_link_get_af_size() and adding this a argument to get_link_af_size op in rtnl_af_ops. Bridge module already used filtering aware sizing for notifications. br_get_link_af_size_filtered() is consistent with the modified get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops. br_get_link_af_size() becomes unused and thus removed. Signed-off-by: Ronen Arad <ronen.arad@intel.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 19:15:20 -07:00
Elad Raz	6ac311ae8b	Adding switchdev ageing notification on port bridged Configure ageing time to the HW for newly bridged device CC: Scott Feldman <sfeldma@gmail.com> CC: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Elad Raz <eladr@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-21 07:50:57 -07:00
Pablo Neira Ayuso	f0a0a978b6	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next This merge resolves conflicts with `75aec9df3a` ("bridge: Remove br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve netns support in the network stack that reached upstream via David's net-next tree. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Conflicts: net/bridge/br_netfilter_hooks.c	2015-10-17 14:28:03 +02:00
Florian Westphal	2ffbceb2b0	netfilter: remove hook owner refcounting since commit `8405a8fff3` ("netfilter: nf_qeueue: Drop queue entries on nf_unregister_hook") all pending queued entries are discarded. So we can simply remove all of the owner handling -- when module is removed it also needs to unregister all its hooks. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-16 18:21:39 +02:00
Jiri Pirko	56607386e8	bridge: defer switchdev fdb del call in fdb_del_external_learn Since spinlock is held here, defer the switchdev operation. Also, ensure that defered switchdev ops are processed before port master device is unlinked. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:50 -07:00
Jiri Pirko	850d0cbc91	switchdev: remove pointers from switchdev objects When object is used in deferred work, we cannot use pointers in switchdev object structures because the memory they point at may be already used by someone else. So rather do local copy of the value. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:49 -07:00
Jiri Pirko	0bc05d585d	switchdev: allow caller to explicitly request attr_set as deferred Caller should know if he can call attr_set directly (when holding RTNL) or if he has to defer the att_set processing for later. This also allows drivers to sleep inside attr_set and report operation status back to switchdev core. Switchdev core then warns if status is not ok, instead of silent errors happening in drivers. Benefit from newly introduced switchdev deferred ops infrastructure. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-15 06:09:48 -07:00
Nikolay Aleksandrov	f409d0ed87	bridge: vlan: move back vlan_flush Ido Schimmel reported a problem with switchdev devices because of the order change of del_nbp operations, more specifically the move of nbp_vlan_flush() which deletes all vlans and frees vlgrp after the rx_handler has been unregistered. So in order to fix this move vlan_flush back where it was and make it destroy the rhtable after NULLing vlgrp and waiting a grace period to make sure noone can see it. Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:58 -07:00
Nikolay Aleksandrov	b8d02c3cac	bridge: vlan: drop unnecessary flush code As Ido Schimmel pointed out the vlan_vid_del() code in nbp_vlan_flush is unnecessary (and is actually a remnant of the old vlan code) so we can remove it. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:56 -07:00
Nikolay Aleksandrov	e9c953eff7	bridge: vlan: use rcu for vlan_list traversal in br_fill_ifinfo br_fill_ifinfo is called by br_ifinfo_notify which can be called from many contexts with different locks held, sometimes it relies upon bridge's spinlock only which is a problem for the vlan code, so use explicitly rcu for that to avoid problems. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:54 -07:00
Nikolay Aleksandrov	907b1e6e83	bridge: vlan: use proper rcu for the vlgrp member The bridge and port's vlgrp member is already used in RCU way, currently we rely on the fact that it cannot disappear while the port exists but that is error-prone and we might miss places with improper locking (either RCU or RTNL must be held to walk the vlan_list). So make it official and use RCU for vlgrp to catch offenders. Introduce proper vlgrp accessors and use them consistently throughout the code. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:57:52 -07:00
Nikolay Aleksandrov	af3793921d	bridge: fix gc_timer mod/del race condition commit `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") introduced a timer race condition because the gc_timer can get rearmed after it's supposedly stopped and flushed in br_dev_delete() leading to a use of freed memory. So take rtnl to sync with bridge destruction when setting ageing_timer. Here's the trace reproduced with these two commands running in parallel: while :; do echo 10000 > /sys/class/net/br0/bridge/ageing_timer; done; while :; do brctl addbr br0; ip l set br0 up; ip l set br0 down; brctl delbr br0; done; [ 300.000029] BUG: unable to handle kernel paging request at ffffffff811c59d3 [ 300.000263] IP: [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0 [ 300.000422] PGD 1a0f067 PUD 1a10063 PMD 10001e1 [ 300.000639] Oops: 0003 [#1] SMP [ 300.000793] Modules linked in: bridge stp llc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd snd_hda_codec_generic qxl drm_kms_helper psmouse pcspkr ttm snd_hda_intel 9pnet_virtio evdev serio_raw joydev snd_hda_codec 9pnet virtio_balloon drm snd_hwdep virtio_console snd_hda_core pvpanic snd_pcm i2c_piix4 snd_timer acpi_cpufreq parport_pc snd parport soundcore button processor i2c_core ipv6 autofs4 hid_generic usbhid hid ext4 crc16 mbcache jbd2 sg sr_mod cdrom ata_generic virtio_blk virtio_net e1000 ehci_pci uhci_hcd ehci_hcd usbcore usb_common floppy ata_piix libata virtio_pci virtio_ring virtio scsi_mod [ 300.004008] CPU: 1 PID: 1169 Comm: bash Not tainted 4.3.0-rc3+ #46 [ 300.004008] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 300.004008] task: ffff880035be2200 ti: ffff88003795c000 task.ti: ffff88003795c000 [ 300.004008] RIP: 0010:[<ffffffff810f168e>] [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0 [ 300.004008] RSP: 0018:ffff88003fd03e78 EFLAGS: 00010046 [ 300.004008] RAX: ffff88003fd0ef60 RBX: 840fc78949c08548 RCX: 00000001ffffffff [ 300.004008] RDX: 0000000000000000 RSI: ffffffff811c59d3 RDI: ffff88003fd0df00 [ 300.004008] RBP: ffff88003fd03e78 R08: 00000000ffffffff R09: 0000000000000000 [ 300.004008] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003fd0df00 [ 300.004008] R13: 0000000000000000 R14: 0000000000000001 R15: ffffffff816032e0 [ 300.004008] FS: 00007fcbdd609700(0000) GS:ffff88003fd00000(0000) knlGS:0000000000000000 [ 300.004008] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 300.004008] CR2: ffffffff811c59d3 CR3: 0000000037879000 CR4: 00000000000406e0 [ 300.004008] Stack: [ 300.004008] ffff88003fd03ea8 ffffffff810f1775 ffff88003c8cb958 ffff88003fd0df00 [ 300.004008] 0000000000000000 0000000000000001 ffff88003fd03f18 ffffffff810f28c4 [ 300.004008] ffff88003fd0eb68 ffff88003fd0e968 ffff88003fd0e768 ffff88003fd0df68 [ 300.004008] Call Trace: [ 300.004008] <IRQ> [ 300.004008] [<ffffffff810f1775>] cascade+0x45/0x70 [ 300.004008] [<ffffffff810f28c4>] run_timer_softirq+0x2f4/0x340 [ 300.004008] [<ffffffff8107e380>] __do_softirq+0xd0/0x440 [ 300.004008] [<ffffffff8107e8a3>] irq_exit+0xb3/0xc0 [ 300.004008] [<ffffffff815c2032>] smp_apic_timer_interrupt+0x42/0x50 [ 300.004008] [<ffffffff815bfe37>] apic_timer_interrupt+0x87/0x90 [ 300.004008] <EOI> [ 300.004008] [<ffffffff811fb80c>] ? create_object+0x13c/0x2e0 [ 300.004008] [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70 [ 300.004008] [<ffffffff8109b23e>] ? __kernel_text_address+0x4e/0x70 [ 300.004008] [<ffffffff8101e17f>] print_context_stack+0x7f/0xf0 [ 300.004008] [<ffffffff8101d55b>] dump_trace+0x11b/0x300 [ 300.004008] [<ffffffff8102970b>] save_stack_trace+0x2b/0x50 [ 300.004008] [<ffffffff811fb80c>] create_object+0x13c/0x2e0 [ 300.004008] [<ffffffff815b2e8e>] kmemleak_alloc+0x4e/0xb0 [ 300.004008] [<ffffffff811e475d>] kmem_cache_alloc_trace+0x18d/0x2f0 [ 300.004008] [<ffffffff8128b139>] kernfs_fop_open+0xc9/0x380 [ 300.004008] [<ffffffff8120214f>] do_dentry_open+0x1ff/0x2f0 [ 300.004008] [<ffffffff8128b070>] ? kernfs_fop_release+0x70/0x70 [ 300.004008] [<ffffffff812034f9>] vfs_open+0x59/0x60 [ 300.004008] [<ffffffff812130de>] path_openat+0x1ce/0x1260 [ 300.004008] [<ffffffff812154ae>] do_filp_open+0x7e/0xe0 [ 300.004008] [<ffffffff812251ff>] ? __alloc_fd+0xaf/0x180 [ 300.004008] [<ffffffff8120387b>] do_sys_open+0x12b/0x210 [ 300.004008] [<ffffffff8120397e>] SyS_open+0x1e/0x20 [ 300.004008] [<ffffffff815bf0b6>] entry_SYSCALL_64_fastpath+0x16/0x7a [ 300.004008] Code: 66 90 48 8b 46 10 48 8b 4f 40 55 48 89 c2 48 89 e5 48 29 ca 48 81 fa ff 00 00 00 77 20 0f b6 c0 48 8d 44 c7 68 48 8b 10 48 85 d2 <48> 89 16 74 04 48 89 72 08 48 89 30 48 89 46 08 5d c3 48 81 fa [ 300.004008] RIP [<ffffffff810f168e>] __internal_add_timer+0x2e/0xd0 [ 300.004008] RSP <ffff88003fd03e78> [ 300.004008] CR2: ffffffff811c59d3 Fixes: `c62987bbd8` ("bridge: push bridge setting ageing_time down to switchdev") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-13 04:50:17 -07:00
Nikolay Aleksandrov	6623c60dc2	bridge: vlan: enforce no pvid flag in vlan ranges Currently it's possible for someone to send a vlan range to the kernel with the pvid flag set which will result in the pvid bouncing from a vlan to vlan and isn't correct, it also introduces problems for hardware where it doesn't make sense having more than 1 pvid. iproute2 already enforces this, so let's enforce it on kernel-side as well. Reported-by: Elad Raz <eladr@mellanox.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:59:15 -07:00
Jiri Pirko	0944d6b5a2	bridge: try switchdev op first in __vlan_vid_add/del Some drivers need to implement both switchdev vlan ops and vid_add/kill ndos. For that to work in bridge code, we need to try switchdev op first when adding/deleting vlan id. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 19:35:20 -07:00
Arnd Bergmann	c932245811	netfilter: bridge: avoid unused label warning With the ARM mini2440_defconfig, the bridge netfilter code gets built with both CONFIG_NF_DEFRAG_IPV4 and CONFIG_NF_DEFRAG_IPV6 disabled, which leads to a harmless gcc warning: net/bridge/br_netfilter_hooks.c: In function 'br_nf_dev_queue_xmit': net/bridge/br_netfilter_hooks.c:792:2: warning: label 'drop' defined but not used [-Wunused-label] This gets rid of the warning by cleaning up the code to avoid the respective #ifdefs causing this problem, and replacing them with if(IS_ENABLED()) checks. I have verified that the resulting object code is unchanged, and an additional advantage is that we now get compile coverage of the unused functions in more configurations. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `dd302b59bd` ("netfilter: bridge: don't leak skb in error paths") Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-10-12 17:48:36 +02:00
Scott Feldman	c62987bbd8	bridge: push bridge setting ageing_time down to switchdev Use SWITCHDEV_F_SKIP_EOPNOTSUPP to skip over ports in bridge that don't support setting ageing_time (or setting bridge attrs in general). If push fails, don't update ageing_time in bridge and return err to user. If push succeeds, update ageing_time in bridge and run gc_timer now to recalabrate when to run gc_timer next, based on new ageing_time. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:20:20 -07:00
Roopa Prabhu	3741873b4f	bridge: allow adding of fdb entries pointing to the bridge device This patch enables adding of fdb entries pointing to the bridge device. This can be used to propagate mac address of vlan interfaces configured on top of the vlan filtering bridge. Before: $bridge fdb add 44:38:39:00:27:9f dev bridge RTNETLINK answers: Invalid argument After: $bridge fdb add 44:38:39:00:27:9f dev bridge Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-12 05:11:58 -07:00
Nikolay Aleksandrov	5d6ae479ab	bridge: netlink: add support for port's multicast_router attribute Add IFLA_BRPORT_MULTICAST_ROUTER to allow setting/getting port's multicast_router via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:34 -07:00
Nikolay Aleksandrov	9b0c6e4deb	bridge: netlink: allow to flush port's fdb Add IFLA_BRPORT_FLUSH to allow flushing port's fdb similar to sysfs's flush. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:32 -07:00
Nikolay Aleksandrov	61c0a9a83e	bridge: netlink: export port's timer values Add the following attributes in order to export port's timer values: IFLA_BRPORT_MESSAGE_AGE_TIMER, IFLA_BRPORT_FORWARD_DELAY_TIMER and IFLA_BRPORT_HOLD_TIMER. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:31 -07:00
Nikolay Aleksandrov	e08e838ac5	bridge: netlink: export port's topology_change_ack and config_pending Add IFLA_BRPORT_TOPOLOGY_CHANGE_ACK and IFLA_BRPORT_CONFIG_PENDING to allow getting port's topology_change_ack and config_pending respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:30 -07:00
Nikolay Aleksandrov	42d452c4b5	bridge: netlink: export port's id and number Add IFLA_BRPORT_(ID\|NO) to allow getting port's port_id and port_no respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:29 -07:00
Nikolay Aleksandrov	96f94e7f4a	bridge: netlink: export port's designated cost and port Add IFLA_BRPORT_DESIGNATED_(COST\|PORT) to allow getting the port's designated cost and port respectively via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:29 -07:00
Nikolay Aleksandrov	80df9a2692	bridge: netlink: export port's bridge id Add IFLA_BRPORT_BRIDGE_ID to allow getting the designated bridge id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:28 -07:00
Nikolay Aleksandrov	4ebc7660ab	bridge: netlink: export port's root id Add IFLA_BRPORT_ROOT_ID to allow getting the designated root id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:49:27 -07:00
Nikolay Aleksandrov	4917a1548f	bridge: netlink: make br_fill_info's frame size smaller When KASAN is enabled the frame size grows > 2048 bytes and we get a warning, so make it smaller. net/bridge/br_netlink.c: In function 'br_fill_info': >> net/bridge/br_netlink.c:1110:1: warning: the frame size of 2160 bytes >> is larger than 2048 bytes [-Wframe-larger-than=] Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-07 04:15:57 -07:00
David S. Miller	40e106801e	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next Eric W. Biederman says: ==================== net: Pass net through ip fragmention This is the next installment of my work to pass struct net through the output path so the code does not need to guess how to figure out which network namespace it is in, and ultimately routes can have output devices in another network namespace. This round focuses on passing net through ip fragmentation which we seem to call from about everywhere. That is the main ip output paths, the bridge netfilter code, and openvswitch. This has to happend at once accross the tree as function pointers are involved. First some prep work is done, then ipv4 and ipv6 are converted and then temporary helper functions are removed. ==================== Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-05 03:39:31 -07:00
Nikolay Aleksandrov	0f963b7592	bridge: netlink: add support for default_pvid Add IFLA_BR_VLAN_DEFAULT_PVID to allow setting/getting bridge's default_pvid via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:07 -07:00
Nikolay Aleksandrov	93870cc02a	bridge: netlink: add support for netfilter tables config Add support to allow getting/setting netfilter tables settings. Currently these are IFLA_BR_NF_CALL_IPTABLES, IFLA_BR_NF_CALL_IP6TABLES and IFLA_BR_NF_CALL_ARPTABLES. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:07 -07:00
Nikolay Aleksandrov	7e4df51eb3	bridge: netlink: add support for igmp's intervals Add support to set/get all of the igmp's configurable intervals via netlink. These currently are: IFLA_BR_MCAST_LAST_MEMBER_INTVL IFLA_BR_MCAST_MEMBERSHIP_INTVL IFLA_BR_MCAST_QUERIER_INTVL IFLA_BR_MCAST_QUERY_INTVL IFLA_BR_MCAST_QUERY_RESPONSE_INTVL IFLA_BR_MCAST_STARTUP_QUERY_INTVL Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov	b89e6babad	bridge: netlink: add support for multicast_startup_query_count Add IFLA_BR_MCAST_STARTUP_QUERY_CNT to allow setting/getting br->multicast_startup_query_count via netlink. Also align the ifla comments. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov	79b859f573	bridge: netlink: add support for multicast_last_member_count Add IFLA_BR_MCAST_LAST_MEMBER_CNT to allow setting/getting br->multicast_last_member_count via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov	858079fdae	bridge: netlink: add support for igmp's hash_max Add IFLA_BR_MCAST_HASH_MAX to allow setting/getting br->hash_max via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:06 -07:00
Nikolay Aleksandrov	431db3c050	bridge: netlink: add support for igmp's hash_elasticity Add IFLA_BR_MCAST_HASH_ELASTICITY to allow setting/getting br->hash_elasticity via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:05 -07:00
Nikolay Aleksandrov	ba062d7cc6	bridge: netlink: add support for multicast_querier Add IFLA_BR_MCAST_QUERIER to allow setting/getting br->multicast_querier via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:04 -07:00
Nikolay Aleksandrov	295141d904	bridge: netlink: add support for multicast_query_use_ifaddr Add IFLA_BR_MCAST_QUERY_USE_IFADDR to allow setting/getting br->multicast_query_use_ifaddr via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:03 -07:00
Nikolay Aleksandrov	89126327f9	bridge: netlink: add support for multicast_snooping Add IFLA_BR_MCAST_SNOOPING to allow enabling/disabling multicast snooping via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:02 -07:00
Nikolay Aleksandrov	a9a6bc70f5	bridge: netlink: add support for multicast_router Add IFLA_BR_MCAST_ROUTER to allow setting and retrieving br->multicast_router when igmp snooping is enabled. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:01 -07:00
Nikolay Aleksandrov	150217c688	bridge: netlink: add fdb flush Simple attribute that flushes the bridge's fdb. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:46:01 -07:00
Nikolay Aleksandrov	111189abc5	bridge: netlink: add group_addr support Add IFLA_BR_GROUP_ADDR attribute to allow setting and retrieving the group_addr via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:59 -07:00
Nikolay Aleksandrov	d76bd14e0f	bridge: netlink: export all timers Export the following bridge timers (also exported via sysfs): IFLA_BR_HELLO_TIMER, IFLA_BR_TCN_TIMER, IFLA_BR_TOPOLOGY_CHANGE_TIMER, IFLA_BR_GC_TIMER via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:59 -07:00
Nikolay Aleksandrov	ed4163098e	bridge: netlink: export topology_change and topology_change_detected Add IFLA_BR_TOPOLOGY_CHANGE and IFLA_BR_TOPOLOGY_CHANGE_DETECTED and export them via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:58 -07:00
Nikolay Aleksandrov	684dd248be	bridge: netlink: export root path cost Add IFLA_BR_ROOT_PATH_COST and export it via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:57 -07:00
Nikolay Aleksandrov	8762ba680f	bridge: netlink: export root port Add IFLA_BR_ROOT_PORT and export it via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:56 -07:00
Nikolay Aleksandrov	7599a2201f	bridge: netlink: export bridge id Add IFLA_BR_BRIDGE_ID and export br->bridge_id via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:55 -07:00
Nikolay Aleksandrov	5127c81f84	bridge: netlink: export root id Add IFLA_BR_ROOT_ID and export br->designated_root via netlink. For this purpose add struct ifla_bridge_id that would represent struct bridge_id. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:54 -07:00
Nikolay Aleksandrov	7910228b6b	bridge: netlink: add group_fwd_mask support Add IFLA_BR_GROUP_FWD_MASK attribute to allow setting and retrieving the group_fwd_mask via netlink. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:45:53 -07:00
Nikolay Aleksandrov	6be144f62f	bridge: vlan: use br_vlan_should_use to simplify __vlan_add/del The checks that lead to num_vlans change are always what br_vlan_should_use checks for, namely if the vlan is only a context or not and depending on that it's either not counted or counted as a real/used vlan respectively. Also give better explanation in br_vlan_should_use's comment. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:43:50 -07:00
Nikolay Aleksandrov	2ffdf508d2	bridge: vlan: drop master_flags from __vlan_add There's only one user now and we can include the flag directly. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:43:49 -07:00
Nikolay Aleksandrov	f8ed289fab	bridge: vlan: use br_vlan_(get\|put)_master to deal with refcounts Introduce br_vlan_(get\|put)_master which take a reference (or create the master vlan first if it didn't exist) and drop a reference respectively. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:43:48 -07:00
Nikolay Aleksandrov	586c2b573e	bridge: vlan: use rcu list for the ordered vlan list When I did the conversion to rhashtable I missed the required locking of one important user of the vlan list - br_get_link_af_size_filtered() which is called: br_ifinfo_notify() -> br_nlmsg_size() -> br_get_link_af_size_filtered() and the notifications can be sent without holding rtnl. Before this conversion the function relied on using rcu and since we already use rcu to destroy the vlans, we can simply migrate the list to use the rcu helpers. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-04 16:43:47 -07:00
Jiri Pirko	9e8f4a548a	switchdev: push object ID back to object structure Suggested-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:40 -07:00
Jiri Pirko	648b4a995a	switchdev: bring back switchdev_obj and use it as a generic object param Replace "void *obj" with a generic structure. Introduce couple of helpers along that. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:39 -07:00
Jiri Pirko	52ba57cfdc	switchdev: rename switchdev_obj_fdb to switchdev_obj_port_fdb Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:39 -07:00
Jiri Pirko	8f24f3095d	switchdev: rename switchdev_obj_vlan to switchdev_obj_port_vlan Make the struct name in sync with object id name. Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:38 -07:00
Jiri Pirko	1f86839874	switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_* To be aligned with obj. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:37 -07:00
Jiri Pirko	57d80838da	switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_* Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-03 04:49:36 -07:00
Nikolay Aleksandrov	248234ca02	bridge: vlan: don't pass flags when creating context only We should not pass the original flags when creating a context vlan only because they may contain some flags that change behaviour in the bridge. The new global context should be with minimal set of flags, so pass 0 and let br_vlan_add() set the master flag only. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:05 -07:00
Nikolay Aleksandrov	263344e64c	bridge: vlan: fix possible null ptr derefs on port init and deinit When a new port is being added we need to make vlgrp available after rhashtable has been initialized and when removing a port we need to flush the vlans and free the resources after we're sure noone can use the port, i.e. after it's removed from the port list and synchronize_rcu is executed. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:05 -07:00
Nikolay Aleksandrov	77751ee8ae	bridge: vlan: move pvid inside net_bridge_vlan_group One obvious way to converge more code (which was also used by the previous vlan code) is to move pvid inside net_bridge_vlan_group. This allows us to simplify some and remove other port-specific functions. Also gives us the ability to simply pass the vlan group and use all of the contained information. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:04 -07:00
Nikolay Aleksandrov	468e794458	bridge: vlan: fix possible null vlgrp deref while registering new port While a new port is being initialized the rx_handler gets set, but the vlans get initialized later in br_add_if() and in that window if we receive a frame with a link-local address we can try to dereference p->vlgrp in: br_handle_frame() -> br_handle_local_finish() -> br_should_learn() Fix this by checking vlgrp before using it. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:04 -07:00
Nikolay Aleksandrov	8af78b6487	bridge: vlan: adjust rhashtable initial size and hash locks size As Stephen pointed out the default initial size is more than we need, so let's start small (4 elements, thus nelem_hint = 3). Also limit the hash locks to the number of CPUs as we don't need any write-side scaling and this looks like the minimum. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-10-01 18:24:03 -07:00
Eric W. Biederman	75aec9df3a	bridge: Remove br_nf_push_frag_xmit_sk Now that this compatability function no longer has any callers remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-09-30 01:45:04 -05:00
Eric W. Biederman	7d8c6e3915	ipv6: Pass struct net through ip6_fragment Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>	2015-09-30 01:45:03 -05:00
Eric W. Biederman	694869b3c5	ipv4: Pass struct net through ip_fragment Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-09-30 01:45:03 -05:00
David S. Miller	4bf1b54f9d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following pull request contains Netfilter/IPVS updates for net-next containing 90 patches from Eric Biederman. The main goal of this batch is to avoid recurrent lookups for the netns pointer, that happens over and over again in our Netfilter/IPVS code. The idea consists of passing netns pointer from the hook state to the relevant functions and objects where this may be needed. You can find more information on the IPVS updates from Simon Horman's commit merge message: `c3456026ad` ("Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next"). Exceptionally, this time, I'm not posting the patches again on netdev, Eric already Cc'ed this mailing list in the original submission. If you need me to make, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:46:21 -07:00
Vivien Didelot	ab06900230	net: switchdev: abstract object in add/del ops Similar to the notifier_call callback of a notifier_block, change the function signature of switchdev add and del operations to: int switchdev_port_obj_add/del(struct net_device dev, enum switchdev_obj_id id, void obj); This allows the caller to pass a specific switchdev_obj_* structure instead of the generic switchdev_obj one. Drivers implementation of these operations and switchdev have been changed accordingly. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 21:31:59 -07:00
Nikolay Aleksandrov	2594e9064a	bridge: vlan: add per-vlan struct and move to rhashtables This patch changes the bridge vlan implementation to use rhashtables instead of bitmaps. The main motivation behind this change is that we need extensible per-vlan structures (both per-port and global) so more advanced features can be introduced and the vlan support can be extended. I've tried to break this up but the moment net_port_vlans is changed and the whole API goes away, thus this is a larger patch. A few short goals of this patch are: - Extensible per-vlan structs stored in rhashtables and a sorted list - Keep user-visible behaviour (compressed vlans etc) - Keep fastpath ingress/egress logic the same (optimizations to come later) Here's a brief list of some of the new features we'd like to introduce: - per-vlan counters - vlan ingress/egress mapping - per-vlan igmp configuration - vlan priorities - avoid fdb entries replication (e.g. local fdb scaling issues) The structure is kept single for both global and per-port entries so to avoid code duplication where possible and also because we'll soon introduce "port0 / aka bridge as port" which should simplify things further (thanks to Vlad for the suggestion!). Now we have per-vlan global rhashtable (bridge-wide) and per-vlan port rhashtable, if an entry is added to a port it'll get a pointer to its global context so it can be quickly accessed later. There's also a sorted vlan list which is used for stable walks and some user-visible behaviour such as the vlan ranges, also for error paths. VLANs are stored in a "vlan group" which currently contains the rhashtable, sorted vlan list and the number of "real" vlan entries. A good side-effect of this change is that it resembles how hw keeps per-vlan data. One important note after this change is that if a VLAN is being looked up in the bridge's rhashtable for filtering purposes (or to check if it's an existing usable entry, not just a global context) then the new helper br_vlan_should_use() needs to be used if the vlan is found. In case the lookup is done only with a port's vlan group, then this check can be skipped. Things tested so far: - basic vlan ingress/egress - pvids - untagged vlans - undef CONFIG_BRIDGE_VLAN_FILTERING - adding/deleting vlans in different scenarios (with/without global ctx, while transmitting traffic, in ranges etc) - loading/removing the module while having/adding/deleting vlans - extracting bridge vlan information (user ABI), compressed requests - adding/deleting fdbs on vlans - bridge mac change, promisc mode - default pvid change - kmemleak ON during the whole time Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-29 13:36:06 -07:00
Eric W. Biederman	c1444c6357	bridge: Pass net into br_validate_ipv4 and br_validate_ipv6 The network namespace is easiliy available in state->net so use it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-29 20:21:32 +02:00
Ian Wilson	34c2d9fb04	bridge: Allow forward delay to be cfgd when STP enabled Allow bridge forward delay to be configured when Spanning Tree is enabled. Signed-off-by: Ian Wilson <iwilson@brocade.com> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-27 19:09:38 -07:00
David S. Miller	4963ed48f2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-26 16:08:27 -07:00
Siva Mannem	dcd45e0649	bridge: don't age externally added FDB entries Signed-off-by: Siva Mannem <siva.mannem.lnx@gmail.com> Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Premkumar Jonnala <pjonnala@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-23 14:35:58 -07:00
Scott Feldman	a79e88d9fb	bridge: define some min/max/default ageing time constants Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-23 14:35:58 -07:00
Eric W. Biederman	06198b34a3	netfilter: Pass priv instead of nf_hook_ops to netfilter hooks Only pass the void *priv parameter out of the nf_hook_ops. That is all any of the functions are interested now, and by limiting what is passed it becomes simpler to change implementation details. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 22:00:16 +02:00
Eric W. Biederman	88182a0e0c	netfilter: nf_tables: Use pkt->net instead of computing net from the passed net_devices Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:49 +02:00
Eric W. Biederman	686c9b5080	netfilter: x_tables: Use par->net instead of computing from the passed net devices Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:25 +02:00
Eric W. Biederman	156c196f60	netfilter: x_tables: Pass struct net in xt_action_param As xt_action_param lives on the stack this does not bloat any persistent data structures. This is a first step in making netfilter code that needs to know which network namespace it is executing in simpler. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:14 +02:00
Eric W. Biederman	6aa187f21c	netfilter: nf_tables: kill nft_pktinfo.ops - Add nft_pktinfo.pf to replace ops->pf - Add nft_pktinfo.hook to replace ops->hooknum This simplifies the code, makes it more readable, and likely reduces cache line misses. Maintainability is enhanced as the details of nft_hook_ops are of no concern to the recpients of nft_pktinfo. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:58:01 +02:00
Eric W. Biederman	97b59c3a91	netfilter: ebtables: Simplify the arguments to ebt_do_table Nearly everything thing of interest to ebt_do_table is already present in nf_hook_state. Simplify ebt_do_table by just passing in the skb, nf_hook_state, and the table. This make the code easier to read and maintenance easier. To support this create an nf_hook_state on the stack in ebt_broute (the only caller without a nf_hook_state already available). This new nf_hook_state adds no new computations to ebt_broute, but does use a few more bytes of stack. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-09-18 21:57:35 +02:00
Eric W. Biederman	0c4b51f005	netfilter: Pass net into okfn This is immediately motivated by the bridge code that chains functions that call into netfilter. Without passing net into the okfns the bridge code would need to guess about the best expression for the network namespace to process packets in. As net is frequently one of the first things computed in continuation functions after netfilter has done it's job passing in the desired network namespace is in many cases a code simplification. To support this change the function dst_output_okfn is introduced to simplify passing dst_output as an okfn. For the moment dst_output_okfn just silently drops the struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	9dff2c966a	netfilter: Use nf_hook_state.net Instead of saying "net = dev_net(state->in?state->in:state->out)" just say "state->net". As that information is now availabe, much less confusing and much less error prone. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	29a26a5680	netfilter: Pass struct net into the netfilter hooks Pass a network namespace parameter into the netfilter hooks. At the call site of the netfilter hooks the path a packet is taking through the network stack is well known which allows the network namespace to be easily and reliabily. This allows the replacement of magic code like "dev_net(state->in?:state->out)" that appears at the start of most netfilter hooks with "state->net". In almost all cases the network namespace passed in is derived from the first network device passed in, guaranteeing those paths will not see any changes in practice. The exceptions are: xfrm/xfrm_output.c:xfrm_output_resume() xs_net(skb_dst(skb)->xfrm) ipvs/ip_vs_xmit.c:ip_vs_nat_send_or_cont() ip_vs_conn_net(cp) ipvs/ip_vs_xmit.c:ip_vs_send_or_cont() ip_vs_conn_net(cp) ipv4/raw.c:raw_send_hdrinc() sock_net(sk) ipv6/ip6_output.c:ip6_xmit() sock_net(sk) ipv6/ndisc.c:ndisc_send_skb() dev_net(skb->dev) not dev_net(dst->dev) ipv6/raw.c:raw6_send_hdrinc() sock_net(sk) br_netfilter_hooks.c:br_nf_pre_routing_finish() dev_net(skb->dev) before skb->dev is set to nf_bridge->physindev In all cases these exceptions seem to be a better expression for the network namespace the packet is being processed in then the historic "dev_net(in?in:out)". I am documenting them in case something odd pops up and someone starts trying to track down what happened. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	04eb44890e	bridge: Add br_netif_receive_skb remove netif_receive_skb_sk netif_receive_skb_sk is only called once in the bridge code, replace it with a bridge specific function that calls netif_receive_skb. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:37 -07:00
Eric W. Biederman	f2d74cf88c	bridge: Cache net in br_nf_pre_routing_finish This is prep work for passing net to the netfilter hooks. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:36 -07:00
Eric W. Biederman	6532948b2e	bridge: Pass net into br_nf_push_frag_xmit When struct net starts being passed through the ipv4 and ipv6 fragment routines br_nf_push_frag_xmit will need to take a net parameter. Prepare br_nf_push_frag_xmit before that is needed and introduce br_nf_push_frag_xmit_sk for the call sites that still need the old calling conventions. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:36 -07:00
Eric W. Biederman	8d4df0b930	bridge: Pass net into br_nf_ip_fragment This is a prep work for passing struct net through ip_do_fragment and later the netfilter okfn. Doing this independently makes the later code changes clearer. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:36 -07:00
Eric W. Biederman	1f19c578df	bridge: Introduce br_send_bpdu_finish The function dev_queue_xmit_skb_sk is unncessary and very confusing. Introduce br_send_bpdu_finish to remove the need for dev_queue_xmit_skb_sk, and have br_send_bpdu_finish call dev_queue_xmit. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-17 17:18:35 -07:00
Linus Lüssing	c2d4fbd216	bridge: fix igmpv3 / mldv2 report parsing With the newly introduced helper functions the skb pulling is hidden in the checksumming function - and undone before returning to the caller. The IGMPv3 and MLDv2 report parsing functions in the bridge still assumed that the skb is pointing to the beginning of the IGMP/MLD message while it is now kept at the beginning of the IPv4/6 header, breaking the message parsing and creating packet loss. Fixing this by taking the offset between IP and IGMP/MLD header into account, too. Fixes: `9afd85c9e4` ("net: Export IGMP/MLD message validation code") Reported-by: Tobias Powalowski <tobias.powalowski@googlemail.com> Tested-by: Tobias Powalowski <tobias.powalowski@googlemail.com> Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-11 15:08:20 -07:00
Vivien Didelot	7a577f013d	net: bridge: remove unnecessary switchdev include Remove the unnecessary switchdev.h include from br_netlink.c. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-08 22:33:14 -07:00
Vivien Didelot	bf361ad381	net: bridge: check __vlan_vid_del for error Since __vlan_del can return an error code, change its inner function __vlan_vid_del to return an eventual error from switchdev_port_obj_del. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-09-08 22:28:45 -07:00
David S. Miller	581a5f2a61	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter/IPVS updates for your net-next tree. In sum, patches to address fallout from the previous round plus updates from the IPVS folks via Simon Horman, they are: 1) Add a new scheduler to IPVS: The weighted overflow scheduling algorithm directs network connections to the server with the highest weight that is currently available and overflows to the next when active connections exceed the node's weight. From Raducu Deaconu. 2) Fix locking ordering in IPVS, always take rtnl_lock in first place. Patch from Julian Anastasov. 3) Allow to indicate the MTU to the IPVS in-kernel state sync daemon. From Julian Anastasov. 4) Enhance multicast configuration for the IPVS state sync daemon. Also from Julian. 5) Resolve sparse warnings in the nf_dup modules. 6) Fix a linking problem when CONFIG_NF_DUP_IPV6 is not set. 7) Add ICMP codes 5 and 6 to IPv6 REJECT target, they are more informative subsets of code 1. From Andreas Herz. 8) Revert the jumpstack size calculation from mark_source_chains due to chain depth miscalculations, from Florian Westphal. 9) Calm down more sparse warning around the Netfilter tree, again from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-28 16:29:59 -07:00
Florian Westphal	851345c5bb	netfilter: reduce sparse warnings bridge/netfilter/ebtables.c:290:26: warning: incorrect type in assignment (different modifiers) -> remove __pure annotation. ipv6/netfilter/ip6t_SYNPROXY.c:240:27: warning: cast from restricted __be16 -> switch ntohs to htons and vice versa. netfilter/core.c:391:30: warning: symbol 'nfq_ct_nat_hook' was not declared. Should it be static? -> delete it, got removed net/netfilter/nf_synproxy_core.c:221:48: warning: cast to restricted __be32 -> Use __be32 instead of u32. Tested with objdiff that these changes do not affect generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-28 21:04:12 +02:00
Nikolay Aleksandrov	b22fbf22f8	bridge: fdb: rearrange net_bridge_fdb_entry While looking into fixing the local entries scalability issue I noticed that the structure is badly arranged because vlan_id would fall in a second cache line while keeping rcu which is used only when deleting in the first, so re-arrange the structure and push rcu to the end so we can get 16 bytes which can be used for other fields (by pushing rcu fully in the second 64 byte chunk). With this change all the core necessary information when doing fdb lookups will be available in a single cache line. pahole before (note vlan_id): struct net_bridge_fdb_entry { struct hlist_node hlist; /* 0 16 / struct net_bridge_port dst; /* 16 8 / struct callback_head rcu; / 24 16 / long unsigned int updated; / 40 8 / long unsigned int used; / 48 8 / mac_addr addr; / 56 6 / unsigned char is_local:1; / 62: 7 1 / unsigned char is_static:1; / 62: 6 1 / unsigned char added_by_user:1; / 62: 5 1 / unsigned char added_by_external_learn:1; / 62: 4 1 / / XXX 4 bits hole, try to pack / / XXX 1 byte hole, try to pack / / --- cacheline 1 boundary (64 bytes) --- / __u16 vlan_id; / 64 2 / / size: 72, cachelines: 2, members: 11 / / sum members: 65, holes: 1, sum holes: 1 / / bit holes: 1, sum bit holes: 4 bits / / padding: 6 / / last cacheline: 8 bytes / } pahole after (note vlan_id): struct net_bridge_fdb_entry { struct hlist_node hlist; / 0 16 / struct net_bridge_port dst; /* 16 8 / long unsigned int updated; / 24 8 / long unsigned int used; / 32 8 / mac_addr addr; / 40 6 / __u16 vlan_id; / 46 2 / unsigned char is_local:1; / 48: 7 1 / unsigned char is_static:1; / 48: 6 1 / unsigned char added_by_user:1; / 48: 5 1 / unsigned char added_by_external_learn:1; / 48: 4 1 / / XXX 4 bits hole, try to pack / / XXX 7 bytes hole, try to pack / struct callback_head rcu; / 56 16 / / --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- / / size: 72, cachelines: 2, members: 11 / / sum members: 65, holes: 1, sum holes: 7 / / bit holes: 1, sum bit holes: 4 bits / / last cacheline: 8 bytes */ } Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 16:38:52 -07:00
Toshiaki Makita	d2d427b392	bridge: Add netlink support for vlan_protocol attribute This enables bridge vlan_protocol to be configured through netlink. When CONFIG_BRIDGE_VLAN_FILTERING is disabled, kernel behaves the same way as this feature is not implemented. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-27 15:35:33 -07:00
David S. Miller	dc25b25897	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/usb/qmi_wwan.c Overlapping additions of new device IDs to qmi_wwan.c Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-21 11:44:04 -07:00

... 2 3 4 5 6 ...

1588 Commits