OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Nikolay Aleksandrov	4fb0ef585e	bonding: convert arp_ip_target to use the new option API This patch adds the necessary changes so arp_ip_target would use the new bonding option API. This option is an exception because of the way it's currently implemented that's why its netlink code is a bit different from the other options to keep the functionality as before and at the same time to have a single set function. This patch also fixes a few stylistic errors. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov	7bdb04ed0d	bonding: convert arp_interval to use the new option API This patch adds the necessary changes so arp_interval would use the new bonding option API. The "default" definition has been removed as it was 0. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov	1df6b6aa33	bonding: convert fail_over_mac to use the new option API This patch adds the necessary changes so fail_over_mac would use the new bonding option API. Also fixes a trivial copy/paste error in bond_check_params where the wrong variable was used for the error msg. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov	edf36b24c5	bonding: convert arp_all_targets to use the new option API This patch adds the necessary changes so arp_all_targets would use the new bonding option API. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov	162288810c	bonding: convert arp_validate to use the new option API This patch adds the necessary changes so arp_validate would use the new bonding option API. Also fix some trivial/style errors. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:42 -08:00
Nikolay Aleksandrov	a4b32ce7f8	bonding: convert xmit_hash_policy to use the new option API This patch adds the necessary changes so xmit_hash_policy would use the new bonding option API. Also fix some trivial/style errors. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:41 -08:00
Nikolay Aleksandrov	aa59d8517d	bonding: convert packets_per_slave to use the new option API This patch adds the necessary changes so packets_per_slave would use the new bonding option API. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:41 -08:00
Nikolay Aleksandrov	2b3798d5e1	bonding: convert mode setting to use the new option API This patch makes the bond's mode setting use the new option API and adds support for dependency printing which relies on having an entry for the mode option in the bond_opts[] array. Also add the ability to print the mode name when mode dependency fails and fix some trivial/style errors. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:41 -08:00
Nikolay Aleksandrov	0911736245	bonding: add infrastructure for an option API This patch adds the necessary basic infrastructure to support centralized and unified option manipulation API for the bonding. The new structure bond_option will be used to describe each option with its dependencies on modes which will be checked automatically thus removing a lot of duplicated code. Also automatic range checking is added for some options. Currently the option setting function requires RTNL to be acquired prior to calling it, since many options already rely on RTNL it seemed like the best choice to protect all against common race conditions. In order to add an option the following steps need to be done: 1. Add an entry BOND_OPT_<option> to bond_options.h so it gets a unique id and a bit corresponding to the id 2. Add a bond_option entry to the bond_opts[] array in bond_options.c which describes the option, its dependencies and its manipulation function 3. Add code to export the option through sysfs and/or as a module parameter (the sysfs export will be made automatically in the future) The options can have different flags set, currently the following are supported: BOND_OPTFLAG_NOSLAVES - require that the bond device has no slaves prior to setting the option BOND_OPTFLAG_IFDOWN - require that the bond device is down prior to setting the option BOND_OPTFLAG_RAWVAL - don't parse the value but return it raw for the option to parse There's a new value structure to describe different types of values which can have the following flags: BOND_VALFLAG_DEFAULT - marks the default option (permanent string alias to this option is "default") BOND_VALFLAG_MIN - the minimum value that this option can have BOND_VALFLAG_MAX - the maximum value that this option can have An example would be nice here, so if we have an option which can have the values "off"(2), "special"(4, default) and supports a range, say 16 - 32, it should be defined as follows: "off", 2, "special", 4, BOND_VALFLAG_DEFAULT, "rangemin", 16, BOND_VALFLAG_MIN, "rangemax", 32, BOND_VALFLAG_MAX So we have the valid intervals: [2, 2], [4, 4], [16, 32] Also the valid strings: "off" = 2, "special" and "default" = 4 "rangemin" = 16, "rangemax" = 32 BOND_VALFLAG_(MIN\|MAX) can be used to specify a valid range for an option, if MIN is omitted then 0 is considered as a minimum. If an exact match is found in the values[] table it will be returned, otherwise the range is tried (if available). The option parameter passing is done by using a special structure called bond_opt_value which can take either a string or a value to parse. One of the bond_opt_init(val\|str) macros should be used depending on which one does the user want to parse (string or value). Then a call to __bond_opt_set should be done under RTNL. Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-22 15:38:41 -08:00
David S. Miller	374d112523	Merge branch 'reciprocal' Hannes Frederic Sowa says: ==================== reciprocal_divide update This patch is on top of `aee636c480` ("bpf: do not use reciprocal divide") from Eric that sits in net tree. It will not create a merge conflict, but it depends on this one, so we suggest, if possible, to merge net into net-next. We are proposing this change with only small modifications from the v2 version, namely updating the name of trim to reciprocal_scale (as commented on by Ben Hutchings and Eric Dumazet, thanks!). We thought about introducing the reciprocal_divide algorithm in parallel to the one already used by the kernel but faced organizational issues, leading us to the conclusion that it is best to just replace the old one: We could not come up with names for the different implementations and also with a way to describe the differences to guide developers which one to choose in which situation. This is because we cannot specify the correct semantics for the version which is currently used by the kernel. Altough it seems to not be causing problems in the kernel, we cannot surely say so in the case of flex_array for the future. Current usage seems ok, but future users could run into problems. Changelog: v1->v2: - changed name to prandom_u32_max in p1 - changed name to trim in p2 - reworked code in p3 v2->v3: - p1 and p3 stays unchanged, only small update in commit message in p3 - changed name to reciprocal_scale in p2 - fixed kernel doc format v3->v4: - pseduo -> pseudo (thanks to Tilman Schmidt) v4->v5: - fix pseduo -> pseudo for real now, sorry for the noise ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:19:26 -08:00
Hannes Frederic Sowa	809fa972fd	reciprocal_divide: update/correction of the algorithm Jakub Zawadzki noticed that some divisions by reciprocal_divide() were not correct [1][2], which he could also show with BPF code after divisions are transformed into reciprocal_value() for runtime invariance which can be passed to reciprocal_divide() later on; reverse in BPF dump ended up with a different, off-by-one K in some situations. This has been fixed by Eric Dumazet in commit `aee636c480` ("bpf: do not use reciprocal divide"). This follow-up patch improves reciprocal_value() and reciprocal_divide() to work in all cases by using Granlund and Montgomery method, so that also future use is safe and without any non-obvious side-effects. Known problems with the old implementation were that division by 1 always returned 0 and some off-by-ones when the dividend and divisor where very large. This seemed to not be problematic with its current users, as far as we can tell. Eric Dumazet checked for the slab usage, we cannot surely say so in the case of flex_array. Still, in order to fix that, we propose an extension from the original implementation from commit `6a2d7a955d` resp. [3][4], by using the algorithm proposed in "Division by Invariant Integers Using Multiplication" [5], Torbjörn Granlund and Peter L. Montgomery, that is, pseudocode for q = n/d where q, n, d is in u32 universe: 1) Initialization: int l = ceil(log_2 d) uword m' = floor((1<<32)((1<<l)-d)/d)+1 int sh_1 = min(l,1) int sh_2 = max(l-1,0) 2) For q = n/d, all uword: uword t = (nm')>>32 q = (t+((n-t)>>sh_1))>>sh_2 The assembler implementation from Agner Fog [6] also helped a lot while implementing. We have tested the implementation on x86_64, ppc64, i686, s390x; on x86_64/haswell we're still half the latency compared to normal divide. Joint work with Daniel Borkmann. [1] http://www.wireshark.org/~darkjames/reciprocal-buggy.c [2] http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c [3] https://gmplib.org/~tege/division-paper.pdf [4] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html [5] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.2556 [6] http://www.agner.org/optimize/asmlib.zip Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Austin S Hemmelgarn <ahferroin7@gmail.com> Cc: linux-kernel@vger.kernel.org Cc: Jesse Gross <jesse@nicira.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: Veaceslav Falico <vfalico@redhat.com> Cc: Jay Vosburgh <fubar@us.ibm.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Daniel Borkmann	89770b0a69	net: introduce reciprocal_scale helper and convert users As David Laight suggests, we shouldn't necessarily call this reciprocal_divide() when users didn't requested a reciprocal_value(); lets keep the basic idea and call it reciprocal_scale(). More background information on this topic can be found in [1]. Joint work with Hannes Frederic Sowa. [1] http://homepage.cs.uiowa.edu/~jones/bcd/divide.html Suggested-by: David Laight <david.laight@aculab.com> Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Daniel Borkmann	f337db64af	random32: add prandom_u32_max and convert open coded users Many functions have open coded a function that returns a random number in range [0,N-1]. Under the assumption that we have a PRNG such as taus113 with being well distributed in [0, ~0U] space, we can implement such a function as uword t = (n*m')>>32, where m' is a random number obtained from PRNG, n the right open interval border and t our resulting random number, with n,m',t in u32 universe. Lets go with Joe and simply call it prandom_u32_max(), although technically we have an right open interval endpoint, but that we have documented. Other users can further be migrated to the new prandom_u32_max() function later on; for now, we need to make sure to migrate reciprocal_divide() users for the reciprocal_divide() follow-up fixup since their function signatures are going to change. Joint work with Hannes Frederic Sowa. Cc: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:17:20 -08:00
Moni Shoua	6cd28f044b	net/mlx4_core: Remove unnecessary validation for port number This is a fix to a regression introduced by commit: "982290a net/mlx4_core: Check port number for validity before accessing data" IPoIB could not attach to multicast group and we get this in dmesg: [144214.145008] ib0: failed to attach to multicast group, ret = -22 [144214.145016] ib0: couldn't attach QP to multicast group ff12:401b:ffff:0000:0000:0000:ffff:ffff [144214.145019] ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -22 The cause to the problem is because port is extracted from gid[5]. Which is only valid for Ethernet. Removed this validation in mlx4_qp_attach_common(), which is accessed from both Ethernet and IB flows. Error flow for bad port value in Ethernet is already exists in that function. Signed-off-by: Moni Shoua <monis@mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:14:27 -08:00
Somnath Kotur	a6b74e01f0	be2net: Fix be_vlan_add/rem_vid() routines The current logic to put interface into VLAN Promiscous mode is not correct. We should increment "adapter->vlans_added" before calling be_vid_config(). Also removed some unwanted log messages. Signed-off-by: Kalesh AP <kalesh.purayil@emulex.com> Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:12:59 -08:00
Ariel Elior	076d132958	bnx2x: Fix VF flr flow When a VF originating from a given PF is flr-ed, that PF gets an interrupt from the chip management and takes a part in the flr process. This patch fixes several corner cases in which the driver performs its part of the flr flow out-of-order, causing the FW to assert due to badly timed messages received from the driver. Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by: Ariel Elior <ariele@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 23:12:59 -08:00
David S. Miller	14e481445d	net: Missing change from the ether_addr_copy() fixups. Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 22:54:01 -08:00
Daniel Borkmann	3769229931	net: filter: let bpf_tell_extensions return SKF_AD_MAX Michal Sekletar added in commit `ea02f9411d` ("net: introduce SO_BPF_EXTENSIONS") a facility where user space can enquire the BPF ancillary instruction set, which is imho a step into the right direction for letting user space high-level to BPF optimizers make an informed decision for possibly using these extensions. The original rationale was to return through a getsockopt(2) a bitfield of which instructions are supported and which are not, as of right now, we just return 0 to indicate a base support for SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Limitations of this approach are that this API which we need to maintain for a long time can only support a maximum of 32 extensions, and needs to be additionally maintained/updated when each new extension that comes in. I thought about this a bit more and what we can do here to overcome this is to just return SKF_AD_MAX. Since we never remove any extension since we cannot break user space and always linearly increase SKF_AD_MAX on each newly added extension, user space can make a decision on what extensions are supported in the whole set of extensions and which aren't, by just checking which of them from the whole set have an offset < SKF_AD_MAX of the underlying kernel. Since SKF_AD_MAX must be updated each time we add new ones, we don't need to introduce an additional enum and got maintenance for free. At some point in time when SO_BPF_EXTENSIONS becomes ubiquitous for most kernels, then an application can simply make use of this and easily be run on newer or older underlying kernels without needing to be recompiled, of course. Since that is for 3.14, it's not too late to do this change. Cc: Michal Sekletar <msekleta@redhat.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Michal Sekletar <msekleta@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:57:43 -08:00
David S. Miller	9be68c1ae0	net: Fix some fallout from the etner_addr_copy() changes. net/appletalk/aarp.c: In function ‘__aarp_send_query’: net/appletalk/aarp.c:137:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration] ... net/atm/lec.c: In function ‘send_to_lecd’: net/atm/lec.c:524:3: warning: passing argument 1 of ‘ether_addr_copy’ from incompatible pointer type [enabled by default] In file included from net/atm/lec.c:17:0: include/linux/etherdevice.h:227:20: note: expected ‘u8 ’ but argument is of type ‘unsigned char ()[6]’ ... net/caif/caif_usb.c: In function ‘cfusbl_create’: net/caif/caif_usb.c:108:2: error: implicit declaration of function ‘ether_addr_copy’ [-Werror=implicit-function-declaration] Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:57:26 -08:00
David S. Miller	656edac678	Merge branch 'sctp' Wang Weidong says: ==================== sctp: remove some macro locking wrappers In sctp.h we can find some macro locking wrappers. As Neil point out that: "Its because in the origional implementation of the sctp protocol, there was a user space test harness which built the kernel module for userspace execution to cary our some unit testing on the code. It did so by redefining some of those locking macros to user space friendly code. IIRC we haven't use those unit tests in years, and so should be removing them, not adding them to other locations." So I remove them. ==================== Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:41:46 -08:00
wangweidong	5bc1d1b4a2	sctp: remove macros sctp_bh_[un]lock_sock Redefined bh_[un]lock_sock to sctp_bh[un]lock_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:41:36 -08:00
wangweidong	048ed4b626	sctp: remove macros sctp_{lock\|release}_sock Redefined {lock\|release}_sock to sctp_{lock\|release}_sock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:41:36 -08:00
wangweidong	1b0de194f1	sctp: remove macros sctp_read_[un]lock Redefined read_[un]lock to sctp_read_[un]lock for user space friendly code which we haven't use in years, and the macros we never used, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	387602dfdc	sctp: remove macros sctp_write_[un]_lock Redefined write_[un]lock to sctp_write_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	3c8e43ba9f	sctp: remove macros sctp_spin_[un]lock Redefined spin_[un]lock to sctp_spin_[un]lock for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:41 -08:00
wangweidong	79b91130a2	sctp: remove macros sctp_local_bh_{disable\|enable} Redefined local_bh_{disable\|enable} to sctp_local_bh_{disable\|enable} for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:40 -08:00
wangweidong	940287ee10	sctp: remove macros sctp_spin_[un]lock_irqrestore Redefined spin_[un]lock_irqstore to sctp_spin_[un]lock_irqrestore for user space friendly code which we haven't use in years, so removing them. Signed-off-by: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:40:40 -08:00
Joe Perches	d08f161a10	dsa: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:05 -08:00
Joe Perches	9ea08b1299	pktgen: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:05 -08:00
Joe Perches	c62326abac	netpoll: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:05 -08:00
Joe Perches	34b2cff4ee	caif_usb: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
Joe Perches	116e853f7f	atm: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
Joe Perches	90ccb6aa40	appletalk: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Convert struct aarp_entry.hwaddr[6] to hwaddr[ETH_ALEN]. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
Joe Perches	07fc67befd	8021q: Use ether_addr_copy Use ether_addr_copy instead of memcpy(a, b, ETH_ALEN) to save some cycles on arm and powerpc. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:13:04 -08:00
David S. Miller	70fccb7e34	Merge branch 'gro_udp_encap' Or Gerlitz says: ==================== net: Add GRO support for UDP encapsulating protocols This series adds GRO handlers for protocols that do UDP encapsulation, with the intent of being able to coalesce packets which encapsulate packets belonging to the same TCP session. For GRO purposes, the destination UDP port takes the role of the ether type field in the ethernet header or the next protocol in the IP header. The UDP GRO handler will only attempt to coalesce packets whose destination port is registered to have gro handler. The patches done against net-next `75e4364f67` "net: stmmac: fix NULL pointer dereference in stmmac_get_tx_hwtstamp" Or. v4 --> v5 changes: - followed Eric's directives to avoid using atomic get/put ops on the udp gro receive and complete callbacks and instead keep the rcu_read_lock when calling the next handler on the chain. v3 --> v4 changes: - applied feedback from Tom on some micro-optimizations that save branches and goto directives in the udp gro logic - applied feedback from Eric on correct RCU programming for the add/remove flow of the upper protocols udp gro handlers v2 --> v3 changes: - moved to use linked list to store the udp gro handlers, this solves the problem of consuming 512KB of memory for the handlers. - use a mark on the skb GRO CB data to disallow running the udp gro_receive twice on a packet, this solves the problem of udp encapsulated packets whose inner VM packet is udp and happen to carry a port which has registered offloads - and flush it. - invoke the udp offload protocol registration and de-registration from the vxlan driver in a sleepable context ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:12 -08:00
Or Gerlitz	dc01e7d344	net: Add GRO support for vxlan traffic Add GRO handlers for vxlann, by using the UDP GRO infrastructure. For single TCP session that goes through vxlan tunneling I got nice improvement from 6.8Gbs to 11.5Gbs --> UDP/VXLAN GRO disabled $ netperf -H 192.168.52.147 -c -C $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 65536 65536 10.00 6799.75 12.54 24.79 0.604 1.195 --> UDP/VXLAN GRO enabled $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 87380 65536 65536 10.00 11562.72 24.90 20.34 0.706 0.577 Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:04 -08:00
Or Gerlitz	e27a2f8395	net: Export gro_find_by_type helpers Export the gro_find_receive/complete_by_type helpers to they can be invoked by the gro callbacks of encapsulation protocols such as vxlan. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:04 -08:00
Or Gerlitz	b582ef0990	net: Add GRO support for UDP encapsulating protocols Add GRO handlers for protocols that do UDP encapsulation, with the intent of being able to coalesce packets which encapsulate packets belonging to the same TCP session. For GRO purposes, the destination UDP port takes the role of the ether type field in the ethernet header or the next protocol in the IP header. The UDP GRO handler will only attempt to coalesce packets whose destination port is registered to have gro handler. Use a mark on the skb GRO CB data to disallow (flush) running the udp gro receive code twice on a packet. This solves the problem of udp encapsulated packets whose inner VM packet is udp and happen to carry a port which has registered offloads. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 18:05:04 -08:00
Vince Bridgers	2618abb73c	stmmac: Fix kernel crashes for jumbo frames These changes correct the following issues with jumbo frames on the stmmac driver: 1) The Synopsys EMAC can be configured to support different FIFO sizes at core configuration time. There's no way to query the controller and know the FIFO size, so the driver needs to get this information from the device tree in order to know how to correctly handle MTU changes and setting up dma buffers. The default max-frame-size is as currently used, which is the size of a jumbo frame. 2) The driver was enabling Jumbo frames by default, but was not allocating dma buffers of sufficient size to handle the maximum possible packet size that could be received. This led to memory corruption since DMAs were occurring beyond the extent of the allocated receive buffers for certain types of network traffic. kernel BUG at net/core/skbuff.c:126! Internal error: Oops - BUG: 0 [#1] SMP ARM Modules linked in: CPU: 0 PID: 563 Comm: sockperf Not tainted 3.13.0-rc6-01523-gf7111b9 #31 task: ef35e580 ti: ef252000 task.ti: ef252000 PC is at skb_panic+0x60/0x64 LR is at skb_panic+0x60/0x64 pc : [<c03c7c3c>] lr : [<c03c7c3c>] psr: 60000113 sp : ef253c18 ip : 60000113 fp : 00000000 r10: ef3a5400 r9 : 00000ebc r8 : ef3a546c r7 : ee59f000 r6 : ee59f084 r5 : ee59ff40 r4 : ee59f140 r3 : 000003e2 r2 : 00000007 r1 : c0b9c420 r0 : 0000007d Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 2e8ac04a DAC: 00000015 Process sockperf (pid: 563, stack limit = 0xef252248) Stack: (0xef253c18 to 0xef254000) 3c00: 00000ebc ee59f000 3c20: ee59f084 ee59ff40 ee59f140 c04a9cd8 ee8c50c0 00000ebc ee59ff40 00000000 3c40: ee59f140 c02d0ef0 00000056 ef1eda80 ee8c50c0 00000ebc 22bbef29 c0318f8c 3c60: 00000056 ef3a547c ffe2c716 c02c9c90 c0ba1298 ef3a5838 ef3a5838 ef3a5400 3c80: 000020c0 ee573840 000055cb ef3f2050 c053f0e0 c0319214 22b9b085 22d92813 3ca0: 00001c80 004b8e00 ef3a5400 ee573840 ef3f2064 22d92813 ef3f2064 000055cb 3cc0: ef3f2050 c031a19c ef252000 00000000 00000000 c0561bc0 00000000 ff00ffff 3ce0: c05621c0 ef3a5400 ef3f2064 ee573840 00000020 ef3f2064 000055cb ef3f2050 3d00: c053f0e0 c031cad0 c053e740 00000e60 00000000 00000000 ee573840 ef3a5400 3d20: ef0a6e00 00000000 ef3f2064 c032507c 00010000 00000020 c0561bc0 c0561bc0 3d40: ee599850 c032799c 00000000 ee573840 c055a380 ef3a5400 00000000 ef3f2064 3d60: ef3f2050 c032799c 0101c7c0 2b6755cb c059a280 c030e4d8 000055cb ffffffff 3d80: ee574fc0 c055a380 ee574000 ee573840 00002b67 ee573840 c03fe9c4 c053fa68 3da0: c055a380 00001f6f 00000000 ee573840 c053f0e0 c0304fdc ef0a6e01 ef3f2050 3dc0: ee573858 ef031000 ee573840 c03055d8 c0ba0c40 ef000f40 00100100 c053f0dc 3de0: c053ffdc c053f0f0 00000008 00000000 ef031000 c02da948 00001140 00000000 3e00: c0563c78 ef253e5f 00000020 ee573840 00000020 c053f0f0 ef313400 ee573840 3e20: c053f0e0 00000000 00000000 c05380c0 ef313400 00001000 00000015 c02df280 3e40: ee574000 ef001e00 00000000 00001080 00000042 005cd980 ef031500 ef031500 3e60: 00000000 c02df824 ef031500 c053e390 c0541084 f00b1e00 c05925e8 c02df864 3e80: 00001f5c ef031440 c053e390 c0278524 00000002 00000000 c0b9eb48 c02df280 3ea0: ee8c7180 00000100 c0542ca8 00000015 00000040 ef031500 ef031500 ef031500 3ec0: c027803c ef252000 00000040 000000ec c05380c0 c0b9eb40 c0b9eb48 c02df940 3ee0: ef060780 ffffa4dd c0564a9c c056343c 002e80a8 00000080 ef031500 00000001 3f00: c053808c ef252000 fffec100 00000003 00000004 002e80a8 0000000c c00258f0 3f20: 002e80a8 c005e704 00000005 00000100 c05634d0 c0538080 c05333e0 00000000 3f40: 0000000a c0565580 c05380c0 ffffa4dc c05434f4 00400100 00000004 c0534cd4 3f60: 00000098 00000000 fffec100 002e80a8 00000004 002e80a8 002a20e0 c0025da8 3f80: c0534cd4 c000f020 fffec10c c053ea60 ef253fb0 c0008530 0000ffe2 b6ef67f4 3fa0: 40000010 ffffffff 00000124 c0012f3c 0000ffe2 002e80f0 0000ffe2 00004000 3fc0: becb6338 becb6334 00000004 00000124 002e80a8 00000004 002e80a8 002a20e0 3fe0: becb6300 becb62f4 002773bb b6ef67f4 40000010 ffffffff 00000000 00000000 [<c03c7c3c>] (skb_panic+0x60/0x64) from [<c02d0ef0>] (skb_put+0x4c/0x50) [<c02d0ef0>] (skb_put+0x4c/0x50) from [<c0318f8c>] (tcp_collapse+0x314/0x3ec) [<c0318f8c>] (tcp_collapse+0x314/0x3ec) from [<c0319214>] (tcp_try_rmem_schedule+0x1b0/0x3c4) [<c0319214>] (tcp_try_rmem_schedule+0x1b0/0x3c4) from [<c031a19c>] (tcp_data_queue+0x480/0xe6c) [<c031a19c>] (tcp_data_queue+0x480/0xe6c) from [<c031cad0>] (tcp_rcv_established+0x180/0x62c) [<c031cad0>] (tcp_rcv_established+0x180/0x62c) from [<c032507c>] (tcp_v4_do_rcv+0x13c/0x31c) [<c032507c>] (tcp_v4_do_rcv+0x13c/0x31c) from [<c032799c>] (tcp_v4_rcv+0x718/0x73c) [<c032799c>] (tcp_v4_rcv+0x718/0x73c) from [<c0304fdc>] (ip_local_deliver+0x98/0x274) [<c0304fdc>] (ip_local_deliver+0x98/0x274) from [<c03055d8>] (ip_rcv+0x420/0x758) [<c03055d8>] (ip_rcv+0x420/0x758) from [<c02da948>] (__netif_receive_skb_core+0x44c/0x5bc) [<c02da948>] (__netif_receive_skb_core+0x44c/0x5bc) from [<c02df280>] (netif_receive_skb+0x48/0xb4) [<c02df280>] (netif_receive_skb+0x48/0xb4) from [<c02df824>] (napi_gro_flush+0x70/0x94) [<c02df824>] (napi_gro_flush+0x70/0x94) from [<c02df864>] (napi_complete+0x1c/0x34) [<c02df864>] (napi_complete+0x1c/0x34) from [<c0278524>] (stmmac_poll+0x4e8/0x5c8) [<c0278524>] (stmmac_poll+0x4e8/0x5c8) from [<c02df940>] (net_rx_action+0xc4/0x1e4) [<c02df940>] (net_rx_action+0xc4/0x1e4) from [<c00258f0>] (__do_softirq+0x12c/0x2e8) [<c00258f0>] (__do_softirq+0x12c/0x2e8) from [<c0025da8>] (irq_exit+0x78/0xac) [<c0025da8>] (irq_exit+0x78/0xac) from [<c000f020>] (handle_IRQ+0x44/0x90) [<c000f020>] (handle_IRQ+0x44/0x90) from [<c0008530>] (gic_handle_irq+0x2c/0x5c) [<c0008530>] (gic_handle_irq+0x2c/0x5c) from [<c0012f3c>] (__irq_usr+0x3c/0x60) 3) The driver was setting the dma buffer size after allocating dma buffers, which caused a system panic when changing the MTU. BUG: Bad page state in process ifconfig pfn:2e850 page:c0b72a00 count:0 mapcount:0 mapping: (null) index:0x0 page flags: 0x200(arch_1) Modules linked in: CPU: 0 PID: 566 Comm: ifconfig Not tainted 3.13.0-rc6-01523-gf7111b9 #29 [<c001547c>] (unwind_backtrace+0x0/0xf8) from [<c00122dc>] (show_stack+0x10/0x14) [<c00122dc>] (show_stack+0x10/0x14) from [<c03c793c>] (dump_stack+0x70/0x88) [<c03c793c>] (dump_stack+0x70/0x88) from [<c00b2620>] (bad_page+0xc8/0x118) [<c00b2620>] (bad_page+0xc8/0x118) from [<c00b302c>] (get_page_from_freelist+0x744/0x870) [<c00b302c>] (get_page_from_freelist+0x744/0x870) from [<c00b40f4>] (__alloc_pages_nodemask+0x118/0x86c) [<c00b40f4>] (__alloc_pages_nodemask+0x118/0x86c) from [<c00b4858>] (__get_free_pages+0x10/0x54) [<c00b4858>] (__get_free_pages+0x10/0x54) from [<c00cba1c>] (kmalloc_order_trace+0x24/0xa0) [<c00cba1c>] (kmalloc_order_trace+0x24/0xa0) from [<c02d199c>] (__kmalloc_reserve.isra.21+0x24/0x70) [<c02d199c>] (__kmalloc_reserve.isra.21+0x24/0x70) from [<c02d240c>] (__alloc_skb+0x68/0x13c) [<c02d240c>] (__alloc_skb+0x68/0x13c) from [<c02d3930>] (__netdev_alloc_skb+0x3c/0xe8) [<c02d3930>] (__netdev_alloc_skb+0x3c/0xe8) from [<c0279378>] (stmmac_open+0x63c/0x1024) [<c0279378>] (stmmac_open+0x63c/0x1024) from [<c02e18cc>] (__dev_open+0xa0/0xfc) [<c02e18cc>] (__dev_open+0xa0/0xfc) from [<c02e1b40>] (__dev_change_flags+0x94/0x158) [<c02e1b40>] (__dev_change_flags+0x94/0x158) from [<c02e1c24>] (dev_change_flags+0x18/0x48) [<c02e1c24>] (dev_change_flags+0x18/0x48) from [<c0337bc0>] (devinet_ioctl+0x638/0x700) [<c0337bc0>] (devinet_ioctl+0x638/0x700) from [<c02c7aec>] (sock_ioctl+0x64/0x290) [<c02c7aec>] (sock_ioctl+0x64/0x290) from [<c0100890>] (do_vfs_ioctl+0x78/0x5b8) [<c0100890>] (do_vfs_ioctl+0x78/0x5b8) from [<c0100e0c>] (SyS_ioctl+0x3c/0x5c) [<c0100e0c>] (SyS_ioctl+0x3c/0x5c) from [<c000e760>] The fixes have been verified using reproducible, automated testing. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 17:05:27 -08:00
Vince Bridgers	369ea818bc	dts: Add a binding for Synopsys emac max-frame-size This change adds a parameter for the Synopsys 10/100/1000 stmmac Ethernet driver to configure the maximum frame size supported by the EMAC driver. Synopsys allows the FIFO sizes to be configured when the cores are built for a particular device, but do not provide a way for the driver to read information from the device about the maximum MTU size supported as limited by the device's FIFO size. Signed-off-by: Vince Bridgers <vbridgers2013@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 17:05:27 -08:00
Dan Carpenter	08d4d217df	rxrpc: out of bound read in debug code Smatch complains because we are using an untrusted index into the rxrpc_acks[] array. It's just a read and it's only in the debug code, but it's simple enough to add a check and fix it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 17:02:52 -08:00
Yegor Yefremov	2fa053a0a2	8021q: update description Replace deprecated 'vconfig' tool with 'ip' from 'iproute2'. Add some beautifications like replacing 'ethernet' with 'Ethernet' and removing unneeded spaces. Signed-off-by: Yegor Yefremov <yegorslists@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 17:01:25 -08:00
Hannes Frederic Sowa	82b276cd2b	ipv6: protect protocols not handling ipv4 from v4 connection/bind attempts Some ipv6 protocols cannot handle ipv4 addresses, so we must not allow connecting and binding to them. sendmsg logic does already check msg->name for this but must trust already connected sockets which could be set up for connection to ipv4 address family. Per-socket flag ipv6only is of no use here, as it is under users control by setsockopt. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:59:19 -08:00
FX Le Bail	446fab5933	ipv6: enable anycast addresses as source addresses in ICMPv6 error messages - Uses ipv6_anycast_destination() in icmp6_send(). Suggested-by: Bill Fink <billfink@mindspring.com> Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:53:23 -08:00
Peter Pan(潘卫平)	4d83e17730	tcp: delete redundant calls of tcp_mtup_init() As tcp_rcv_state_process() has already calls tcp_mtup_init() for non-fastopen sock, we can delete the redundant calls of tcp_mtup_init() in tcp_{v4,v6}_syn_recv_sock(). Signed-off-by: Weiping Pan <panweiping3@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:52:31 -08:00
Daniel Borkmann	f0d4eb29d1	packet: fix a couple of cppcheck warnings Doesn't bring much, but also doesn't hurt us to fix 'em: 1) In tpacket_rcv() flush dcache page we can restirct the scope for start and end and remove one layer of indent. 2) In tpacket_destruct_skb() we can restirct the scope for ph. 3) In alloc_one_pg_vec_page() we can remove the NULL assignment and change spacing a bit. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:51:42 -08:00
Duan Jiong	967680e027	ipv4: remove the useless argument from ip_tunnel_hash() Since commit c544193214("GRE: Refactor GRE tunneling code") introduced function ip_tunnel_hash(), the argument itn is no longer in use, so remove it. Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:48:18 -08:00
stephen hemminger	8d78316386	bond: make slave_sysfs_ops static Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:45:41 -08:00
Sabrina Dubroca	f14fe8a848	net: remove unnecessary initializations in net_dev_init softnet_data is already set to 0, no need to use memset or initialize specific fields to 0 or NULL afterwards. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:44:45 -08:00
Jesse Brandeburg	ead5139aa1	net: add vxlan description Add a description to the vxlan module, helping save the world from the minions of destruction and confusion. Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2014-01-21 16:28:07 -08:00

1 2 3 4 5 ...

416651 Commits All Branches Search

416651 Commits

All Branches