There is a possible issue with the use, or lack thereof, of sk_refcnt and
sk_wmem_alloc in the wifi ack status functionality.
Specifically, if a socket requests acknowledgements and its sk_refcnt drops
to 0, leaving it waiting only on sk_wmem_alloc to reach 0, it is possible
for sock_queue_err_skb to orphan the last buffer, resulting in __sk_free
being called on the socket. After this the buffer is enqueued on
sk_error_queue, but the queue has already been flushed, resulting in at
least a memory leak, if not data corruption.
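A minimal sketch of the kind of fix this implies (not necessarily the exact
patch): pin the socket with its own reference for the duration of the
enqueue, so that releasing the last wmem charge cannot free the socket
underneath us.

	sock_hold(sk);			/* keep sk alive across the enqueue */
	skb_orphan(skb);		/* may release the last sk_wmem_alloc charge */
	if (sock_queue_err_skb(sk, skb))
		kfree_skb(skb);		/* queueing failed: don't leak the buffer */
	sock_put(sk);			/* safe to drop our reference now */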
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change adds some documentation for the skb_clone_sk() call. It is
meant to help clarify the purpose of the function for other developers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit c801e3cc19 ("ipv4: Clarify in docs that accept_local requires rp_filter.").
It is not needed anymore since commit 1dced6a854 ("ipv4: Restore accept_local behaviour in fib_validate_source()").
Suggested-by: Julian Anastasov <ja@ssi.bg>
Cc: Gregory Detal <gregory.detal@uclouvain.be>
Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Hannes Frederic Sowa <hannes@redhat.com>
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since BPF JIT depends on the availability of the module_alloc() and
module_free() helpers (HAVE_BPF_JIT and MODULES), we had better build
that code only when BPF_JIT is enabled in our config, just like with
other JIT code. This fixes builds for arm/marzen_defconfig and
sh/rsk7269_defconfig.
====================
kernel/built-in.o: In function `bpf_jit_binary_alloc':
/home/cwang/linux/kernel/bpf/core.c:144: undefined reference to `module_alloc'
kernel/built-in.o: In function `bpf_jit_binary_free':
/home/cwang/linux/kernel/bpf/core.c:164: undefined reference to `module_free'
make: *** [vmlinux] Error 1
====================
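A sketch of the guard this implies in kernel/bpf/core.c (prototype shown for
context, bodies abridged; not the exact hunk):

#ifdef CONFIG_BPF_JIT
struct bpf_binary_header *
bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
		     unsigned int alignment,
		     bpf_jit_fill_hole_t bpf_fill_ill_insns);

void bpf_jit_binary_free(struct bpf_binary_header *hdr)
{
	/* only the JIT allocator helpers reference module_alloc() and
	 * module_free(), so compile them out when no JIT is configured
	 */
	module_free(NULL, hdr);
}
#endif /* CONFIG_BPF_JIT */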
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 738cbe72ad ("net: bpf: consolidate JIT binary allocator")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hariprasad Shenai says:
====================
cxgb4: Allow FW size up to 1MB, support for S25FL032P flash and misc. fixes
This patch series adds support to allow a FW size of up to 1MB and adds
support for the S25FL032P flash. It fixes t4_flash_erase_sectors to throw an
error when the erase sectors aren't in the flash, and adds a warning message
for adapters with flashes smaller than 2Mb. It also adds the device id of a
new adapter and removes the device id of a debug adapter.
The patch series is created against the 'net-next' tree and includes patches
for the cxgb4 and cxgb4vf drivers.
We have included all the maintainers of the respective drivers. Kindly review
the changes and let us know in case of any review comments.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for Spansion S25FL032P flash
Based on original work by Dimitris Michailidis
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This fixes the following sparse warning:
sparse: symbol 'tipc_update_nametbl' was not declared. Should it be static?
Also, the function is changed to return a bool indicating success, rather
than a potentially freed pointer.
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Freeing resources before being disconnected from the phy and before calling
the core driver is wrong and should not happen. It also avoids a delay of
4-5s caused by the timeout of phy_disconnect().
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
nf-next pull request
The following patchset contains Netfilter/IPVS updates for your
net-next tree. Regarding nf_tables, most updates focus on consolidating
the NAT infrastructure and adding support for masquerading. More
specifically, they are:
1) use __u8 instead of u_int8_t in arptables header, from
Mike Frysinger.
2) Add support to match by skb->pkttype to the meta expression, from
Ana Rey.
3) Add support to match by cpu to the meta expression, also from
Ana Rey.
4) Fix a smatch warning about IPSET_ATTR_MARKMASK validation, patch from
Vytas Dauksa.
5) Fix the range support for IPv4 in the netnet and netportnet hash types,
from Sergey Popovich.
6) Fix missing-field-initializer warnings, from Mark Rustad.
7) Fix possible integer overflows in ipset reported by Dan Carpenter,
from Jozsef Kadlecsik.
Jozsef Kadlecsick.
8) Filter out accounting objects in nfacct by type, so you can
selectively reset quotas, from Alexey Perevalov.
9) Move specific NAT IPv4 functions to the core so x_tables and
nf_tables can share the same NAT IPv4 engine.
10) Use the new NAT IPv4 functions from nft_chain_nat_ipv4.
11) Move specific NAT IPv6 functions to the core so x_tables and
nf_tables can share the same NAT IPv6 engine.
12) Use the new NAT IPv6 functions from nft_chain_nat_ipv6.
13) Refactor code to add nft_delrule(), which can be reused in the
enhancement of the NFT_MSG_DELTABLE to remove a table and its
content, from Arturo Borrero.
14) Add a helper function to unregister chain hooks, from
Arturo Borrero.
15) A cleanup to rename to nft_delrule_by_chain for consistency with
the new nft_*() functions, also from Arturo.
16) Add support to match devgroup to the meta expression, from Ana Rey.
17) Reduce stack usage for IPVS socket option, from Julian Anastasov.
18) Remove unnecessary textsearch state initialization in xt_string,
from Bojan Prtvar.
19) Add several helper functions to nf_tables, more work to prepare
the enhancement of NFT_MSG_DELTABLE, again from Arturo Borrero.
20) Enhance NFT_MSG_DELTABLE to delete a table and its content, from
Arturo Borrero.
21) Support NAT flags in the nat expression to indicate the flavour,
e.g. fully random, from Arturo.
22) Add missing audit code to ebtables when replacing tables, from
Nicolas Dichtel.
23) Generalize the IPv4 masquerading code to allow its re-use from
nf_tables, from Arturo.
24) Generalize the IPv6 masquerading code, also from Arturo.
25) Add the new masq expression to support IPv4/IPv6 masquerading
from nf_tables, also from Arturo.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the more common pr_warn.
Other miscellanea:
o Coalesce formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the more common pr_warn.
Coalesce formats.
Realign arguments.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert says:
====================
net: enable GRO for IPIP and SIT
This patch set populates the IPIP and SIT offload structures with
gro_receive and gro_complete functions, which enables the use of GRO
for these protocols. Also, it fixes a problem in IPv6 where we were not
properly initializing flush_id.
Performance results are below. Note that these tests were done on bnx2x,
which doesn't provide RX checksum offload of IPIP or SIT (i.e. does
not give CHECKSUM_COMPLETE). Also, we only get a 2-tuple hash for RSS
rather than a 4-tuple hash in this case, so all the packets between two
hosts wind up on the same queue. The net result is that the interrupting
CPU is the bottleneck in GRO (checksumming every packet there).
Testing:
netperf TCP_STREAM between two hosts using bnx2x.
* Before fix
  IPIP, 1 connection:    6.53% CPU utilization,  6544.71 Mbps
  IPIP, 20 connections: 13.79% CPU utilization,  9284.54 Mbps
  SIT,  1 connection:    6.68% CPU utilization,  5653.36 Mbps
  SIT,  20 connections: 18.88% CPU utilization,  9154.61 Mbps
* After fix
  IPIP, 1 connection:    5.73% CPU utilization,  9279.53 Mbps
  IPIP, 20 connections:  7.14% CPU utilization,  7279.35 Mbps
  SIT,  1 connection:    2.95% CPU utilization,  9143.36 Mbps
  SIT,  20 connections:  7.09% CPU utilization,  6255.3 Mbps
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ipv6_gro_receive and ipv6_gro_complete to sit_offload to
support GRO.
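Roughly, the resulting offload structure looks like this (a sketch based on
the description; pre-existing callbacks omitted):

static const struct net_offload sit_offload = {
	.callbacks = {
		/* existing .gso_* callbacks unchanged */
		.gro_receive	= ipv6_gro_receive,
		.gro_complete	= ipv6_gro_complete,
	},
};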
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add inet_gro_receive and inet_gro_complete to ipip_offload to
support GRO.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In TCP GRO we check flush_id, which is derived from the IP identifier.
In the IPv4 GRO path, flush_id is set with the expectation that every
matched packet increments the IP identifier. In IPv6, flush_id is
never set and is thus uninitialized. What's worse is that in IPv6
over IPv4 encapsulation, the IP identifier is taken from the outer
header, which the Linux stack currently does not increment on every
packet, so GRO in this case never matches packets (the identifier is
not increasing).
This patch clears flush_id for every matched packet in IPv6
gro_receive. We need to do this each time to overwrite the setting
that would be done in IPv4 gro_receive based on the outer header in
IPv6 over IPv4 encapsulation.
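A sketch of the shape of the change in ipv6_gro_receive (loop abridged; the
flush_id assignment is the point):

for (p = *head; p; p = p->next) {
	if (!NAPI_GRO_CB(p)->same_flow)
		continue;

	/* ... existing IPv6 header comparison decides same_flow ... */

	/* flush_id tracks IPv4 ID progression; for an IPv6 flow,
	 * overwrite whatever an outer IPv4 gro_receive may have set so
	 * it cannot force a bogus flush later in TCP GRO.
	 */
	NAPI_GRO_CB(p)->flush_id = 0;
}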
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the much more common pr_warn instead of pr_warning.
Other miscellanea:
o Typo fixes submiting/submitting
o Coalesce formats
o Realign arguments
o Add missing terminating '\n' to formats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/udp_offload.c:339:5: warning: symbol 'udp4_gro_complete' was
not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Fixes: 57c67ff4bd ("udp: additional GRO support")
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/core/net_namespace.c:227:18: warning: incorrect type in argument 1
(different address spaces)
net/core/net_namespace.c:227:18: expected void const *<noident>
net/core/net_namespace.c:227:18: got struct net_generic [noderef]
<asn:4>*gen
We can use rcu_access_pointer() here as read-side access to the pointer
was removed at least one grace period ago.
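For illustration (not the exact hunk), the pattern is:

/* the old data is only freed after the last read-side users are gone,
 * so rcu_access_pointer() is sufficient here and avoids the __rcu
 * address-space warning a plain access would trigger
 */
old_ng = rcu_access_pointer(net->gen);	/* variable name illustrative */
kfree(old_ng);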
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv6/udp_offload.c:159:5: warning: symbol 'udp6_gro_complete' was
not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 57c67ff4bd ("udp: additional GRO support")
Cc: Tom Herbert <therbert@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove one sparse warning:
net/ipv4/ip_sockglue.c:328:22: warning: incorrect type in assignment (different address spaces)
net/ipv4/ip_sockglue.c:328:22: expected struct ip_ra_chain [noderef] <asn:4>*next
net/ipv4/ip_sockglue.c:328:22: got struct ip_ra_chain *[assigned] ra
And replace one rcu_assign_pointer() by RCU_INIT_POINTER() where applicable.
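The two idioms side by side, roughly as they appear in ip_ra_control()
(variable names follow the warning above; placement approximate):

RCU_INIT_POINTER(new_ra->next, ra);	/* new_ra is not yet visible to
					 * readers, so no ordering barrier
					 * is needed when setting ->next
					 */
rcu_assign_pointer(*rap, new_ra);	/* publication point: keeps the
					 * barrier that orders the above
					 * initialization before readers
					 * can see new_ra
					 */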
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not used anywhere, so just remove these.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of looping in the code, let's use the kernel extension for dumping
small buffers.
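Assuming the extension in question is the %*ph printk specifier (which
hex-dumps small buffers of up to 64 bytes), the conversion looks roughly
like:

/* before: open-coded loop */
for (i = 0; i < len; i++)
	pr_debug("%02x ", buf[i]);

/* after: let the printk core format the small buffer in one call */
pr_debug("%*ph\n", len, buf);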
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace mdelay with usleep_range to avoid a busy loop.
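For illustration (the delay values here are made up, not taken from the
driver):

mdelay(20);			/* before: busy-waits the CPU for 20 ms */

usleep_range(20000, 21000);	/* after: sleeps; the min/max range (in
				 * microseconds) lets the timer code
				 * coalesce wakeups
				 */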
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Few packets have timestamping enabled. Exit sock_tx_timestamp quickly
in this common case.
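The fast path amounts to something like this (a sketch; the exact condition
checked may differ):

static inline void sock_tx_timestamp(const struct sock *sk, __u8 *tx_flags)
{
	*tx_flags = 0;
	/* common case: no tx timestamping flags set on this socket */
	if (unlikely(sk->sk_tsflags))
		__sock_tx_timestamp(sk, tx_flags);
}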
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not used anymore since ddecf0f ("net_sched: sfq: add optional RED on top
of SFQ").
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert the normal transmit completion path from dev_kfree_skb_any()
to dev_consume_skb_any() to help keep dropped packet profiling
meaningful.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nikolay Aleksandrov says:
====================
bonding: get rid of bond->lock
This patch-set removes the last users of bond->lock and converts the places
that needed it for sync to use curr_slave_lock or RCU as appropriate.
I've run this with lockdep and have stress-tested it via loading/unloading
and enslaving/releasing in parallel while reading bond's proc output, and I
didn't see any issues. Please pay special attention to the procfs change;
I've done about an hour of stress-testing on it and have checked that the
event that causes the bonding to delete its proc entry (NETDEV_UNREGISTER)
is called before ndo_uninit() and the freeing of the dev, so any readers
will sync with that. Also ran sparse checks and there were no splats.
v2: Add patch 0001/cxgb4 bond->lock removal, RTNL should be held in the
notifier call, the other patches are the same. Also tested with
allmodconfig to make sure there're no more users of bond->lock.
Changes from the RFC:
use RCU in procfs instead of RTNL since RTNL might lead to a deadlock with
unloading and also is much slower. The bond destruction syncs with proc
via the proc locks. There's one new patch that converts primary_slave to
use RCU, as it was necessary to fix longstanding bugs in sysfs and
procfs and to make it easy to migrate bond's procfs to RCU. And of course
rebased on top of current net-next.
This is the first patch-set in a series that should simplify the bond's
locking requirements and will make it easier to define the locking
conditions necessary for the various paths. The goal is to rely on RTNL
and RCU alone; an extra lock would be needed only in a few special cases,
which would be documented very well.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The usage of bond->lock in bond_main.c was completely unnecessary, as it
didn't help to sync with anything; most of the spots already held RTNL.
Since there are no more users of bond->lock, remove it.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We're safe to remove the bond->lock use from the arp targets because
arp_rcv_probe no longer acquires bond->lock, only rcu_read_lock.
Also, setting the primary slave is safe because no one uses bond->lock
as a syncing mechanism for that anymore.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use RCU to protect against slave release. The proc show function will sync
with the bond destruction via the proc locks and the fact that the bond is
released after NETDEV_UNREGISTER, which causes the bonding to remove its
proc entry.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is necessary mainly for two bonding call sites, procfs and
sysfs, where it was dereferenced without any real protection.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We can remove the lock/unlock as it's no longer necessary since
RTNL should be held while calling bond_alb_set_mac_address().
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In 3ad mode the only syncing needed by bond->lock is for the wq
and the recv handler, so change them to use curr_slave_lock.
There're no locking dependencies here as 3ad doesn't use
curr_slave_lock at all.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTNL should already be held in the notifier call, so the slave list can
be traversed without a problem; remove the unnecessary bond->lock.
CC: Hariprasad S <hariprasad@chelsio.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enables EMAC Rockchip support on Radxa Rock boards.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for the EMAC Rockchip driver on RK3188 SoCs.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the necessary binding documentation for the EMAC Rockchip platform
driver found in RK3066 and RK3188 SoCs.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch defines a platform glue layer for Rockchip SoCs which
support the arc-emac driver. It ensures that the regulator for the RMII
interface is on before trying to connect to the ethernet controller, and
it applies the right speed and mode changes to the GRF when ethernet
settings change.
Signed-off-by: Romain Perier <romain.perier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann says:
====================
BPF updates
[ Set applies on top of current net-next but also on top of
Alexei's latest patches. Please see individual patches for
more details. ]
Changelog:
v1->v2:
- Removed paragraph in 1st commit message
- Rest stays the same
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have a special kmemcheck annotation
for bitfields used in the bpf_prog structure.
We currently have jited:1, len:31, and thus when accessing len
with CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.
As we don't need the whole bit universe for the pages member, we
can just split it into a u16 and use a bool flag for jited instead
of a bitfield.
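As a rough sketch of the layout change being described (illustrative only,
not the actual bpf_prog definition):

/* before: both fields share one 32-bit storage unit, which kmemcheck
 * cannot track per field, so a read of 'len' looks uninitialized
 */
	u32	jited:1,
		len:31;

/* after: plain members, each with its own tracked storage; per the
 * message, the page count doesn't need the full bit range and fits a
 * u16, and jited becomes a plain bool
 */
	u16	pages;
	bool	jited;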
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the ARM variant for 314beb9bca ("x86: bpf_jit_comp: secure bpf
jit against spraying attacks").
It is now possible to implement it due to commits 75374ad47c ("ARM: mm:
Define set_memory_* functions for ARM") and dca9aa92fc ("ARM: add
DEBUG_SET_MODULE_RONX option to Kconfig") which added infrastructure for
this facility.
Thus, this patch makes sure the BPF-generated JIT code is marked RO, like
other kernel text sections, and also lets the generated JIT code start
at a pseudo-random offset instead of on a page boundary. The holes are
filled with illegal instructions.
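A condensed sketch of the two techniques (names follow the usual BPF JIT
pattern; sizes and details are illustrative):

/* place the code at a random offset inside the image and pad the
 * remaining hole with illegal instructions
 */
start = prandom_u32() % hole;		/* hole = unused space in the image */
image = &header->image[start];
/* ... emit the program into image ... */

/* once the image is final, write-protect it like other kernel text */
set_memory_ro((unsigned long)header, header->pages);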
JIT tested on armv7hl with BPF test suite.
Reference: http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Mircea Gherzan <mgherzan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 314beb9bca ("x86: bpf_jit_comp: secure bpf jit against spraying
attacks"), later replicated in aa2d2c73c2 ("s390/bpf,jit: address
randomize and write protect jit code") for the s390 architecture, added
write protection for BPF JIT images and a random start address for the
JIT code, so that it's no longer on a page boundary.
Since both use a very similar allocator for the BPF binary header,
we can consolidate this code into the BPF core, as it's mostly JIT
independent anyway.
This will also allow for future archs that support DEBUG_SET_MODULE_RONX
to just reuse instead of reimplementing it.
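For reference, an arch JIT uses the consolidated helpers roughly like this
(a sketch; error handling trimmed and the fill pattern is arch-specific):

static void jit_fill_hole(void *area, unsigned int size)
{
	/* pad unused space with a trapping/illegal instruction pattern */
	memset(area, 0xcc, size);	/* value illustrative (int3 on x86) */
}

	header = bpf_jit_binary_alloc(proglen, &image, 4, jit_fill_hole);
	if (!header)
		return;
	/* ... emit instructions into image ... */

	/* and on teardown */
	bpf_jit_binary_free(header);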
JIT tested on x86_64 and s390x with BPF test suite.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck reported high false sharing on the dst refcount in the TCP
stack when prequeue is used. Prequeue is the mechanism used when a thread
is blocked in recvmsg()/read() on a TCP socket, using a blocking model
rather than the non-blocking select()/poll()/epoll() one.
We already try to use RCU in input path as much as possible, but we were
forced to take a refcount on the dst when skb escaped RCU protected
region. When/if the user thread runs on different cpu, dst_release()
will then touch dst refcount again.
Commit 093162553c ("tcp: force a dst refcount when prequeue packet")
was an example of a race fix.
It turns out the only remaining usage of skb->dst for a packet stored
in a TCP socket prequeue is IP early demux.
We can add logic to detect when IP early demux is probably going
to use skb->dst. Because we do an optimistic check rather than duplicate
existing logic, we need to guard inet_sk_rx_dst_set() and
inet6_sk_rx_dst_set() against using a NULL dst.
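The guard amounts to something like this (a sketch; refcounting and the
rest of the function omitted):

void inet_sk_rx_dst_set(struct sock *sk, const struct sk_buff *skb)
{
	struct dst_entry *dst = skb_dst(skb);

	/* with the optimistic early-demux check, a prequeued skb may
	 * legitimately carry no dst; only cache it when it is present
	 */
	if (dst) {
		sk->sk_rx_dst = dst;
		inet_sk(sk)->rx_dst_ifindex = skb->skb_iif;
	}
}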
Many thanks to Alexander for providing a nice bug report, git bisection,
and reproducer.
Tested using Alexander's script on a 40Gb NIC, 8 RX queues.
Hosts have 24 cores, 48 hyper threads.
echo 0 >/proc/sys/net/ipv4/tcp_autocorking
for i in `seq 0 47`
do
for j in `seq 0 2`
do
netperf -H $DEST -t TCP_STREAM -l 1000 \
-c -C -T $i,$i -P 0 -- \
-m 64 -s 64K -D &
done
done
Before patch : ~6Mpps and ~95% cpu usage on receiver
After patch : ~9Mpps and ~35% cpu usage on receiver.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>