OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jason Abele	28981e5eb4	cfg80211: fix n_reg_rules to match world_regdom There are currently 8 rules in the world_regdom, but only the first 6 are applied due to an incorrect value for n_reg_rules. This causes channels 149-165 and 60GHz to be disabled. Signed-off-by: Jason Abele <jason@aether.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-02-24 10:58:37 +01:00
Johannes Berg	a18c7192aa	nl80211: fix memory leak in monitor flags parsing If monitor flags parsing results in active monitor but that isn't supported, the already allocated message is leaked. Fix this by moving the allocation after this check. Reported-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-02-24 10:56:42 +01:00
Samuel Tan	5528fae886	nl80211: use loop index as type for net detect frequency results We currently add nested members of the NL80211_ATTR_SCAN_FREQUENCIES as NLA_U32 attributes of type NL80211_ATTR_WIPHY_FREQ in cfg80211_net_detect_results. However, since there can be an arbitrary number of frequency results, we should use the loop index of the loop used to add the frequency results to NL80211_ATTR_SCAN_FREQUENCIES as the type (i.e. nla_type) for each result attribute, rather than a fixed type. This change is in line with how nested members are added to NL80211_ATTR_SCAN_FREQUENCIES in the functions nl80211_send_wowlan_nd and nl80211_add_scan_req. Signed-off-by: Samuel Tan <samueltan@chromium.org> Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-02-24 10:53:50 +01:00
Eliad Peller	104f5a6206	mac80211: clear sdata->radar_required If ieee80211_vif_use_channel() fails, we have to clear sdata->radar_required (which we might have just set). Failing to do it results in stale radar_required field which prevents starting new scan requests. Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Eliad Peller <eliad@wizery.com> [use false instead of 0] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-02-24 10:51:06 +01:00
Trond Myklebust	7a27eae362	NFS: RDMA Client Sparse Fix #2 This patch fixes another sparse fix found by Dan Carpenter's tool. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJU66HvAAoJENfLVL+wpUDrJdYP/Ap2xiMpEmO6Zl4h/MiDJKYs GIL3o+Ayc+wzlga2qBuNMfFBqzochCL2xqgS/9bvA0z/OT3ilq+FKZRr8FuDsmb+ 5XtIS49Kvo2mjWNpnh8GMoCbBoQfkBMJLETtR0KhhrMBNX5sidBPi2cSeZJxqnAw 9LIc3Od9jyZSLh2je3gJCKxh90CVWgjs9OfJookfksNyVMnwK+TRWjo1FInEfkW6 2fMFrC20xBg0qWWgtZL73ib8ojaIwv5MxnyFNgsiv/xcDWShZhdaMizBlqjDiNUq co9S1skJ9RTgInoDLeWvKoKkc7JjkLTpSzqsZx+Kke75p/D41Sah215jp0yyz2tb fEi+lZ8q1f5iorjamw5UDQPAuNcPx/eBzQNmkFGbue2xDP/kOXErb5AtnfMpDqav FXX/C3Vf6BxeAVB+8VifcdKEchE6cN9St6SBUludIisdmQWLLjSxazTAG7MXZdYW X0dxJ7D2hcwGEbjYKdNOn6a8+N3m1++XiJdPc1KSN71EO6z0xqjQM+3PytXCXaGI t3dKCDoEM/Ee1+4VxVa02OjW62ApNDd8APrCrn+8zMglG0eFwPmocNx/U42WanXd bGnEI9qd6mvn22eJrrOc55gShaDjovhXRmvvc2E8tHKVnaMXdfG8o5CjtB6LPXbW 6deK+m1/LxkzFhRvMySs =ooZn -----END PGP SIGNATURE----- Merge tag 'nfs-rdma-for-4.0-3' of git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: RDMA Client Sparse Fix #2 This patch fixes another sparse fix found by Dan Carpenter's tool. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-4.0-3' of git://git.linux-nfs.org/projects/anna/nfs-rdma: xprtrdma: Store RDMA credits in unsigned variables	2015-02-23 19:16:43 -05:00
Marcelo Leitner	77751427a1	ipv6: addrconf: validate new MTU before applying it Currently we don't check if the new MTU is valid or not and this allows one to configure a smaller than minimum allowed by RFCs or even bigger than interface own MTU, which is a problem as it may lead to packet drops. If you have a daemon like NetworkManager running, this may be exploited by remote attackers by forging RA packets with an invalid MTU, possibly leading to a DoS. (NetworkManager currently only validates for values too small, but not for too big ones.) The fix is just to make sure the new value is valid. That is, between IPV6_MIN_MTU and interface's MTU. Note that similar check is already performed at ndisc_router_discovery(), for when kernel itself parses the RA. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-23 18:16:12 -05:00
Catalin Marinas	d720d8cec5	net: compat: Ignore MSG_CMSG_COMPAT in compat_sys_{send, recv}msg With commit `a7526eb5d0` (net: Unbreak compat_sys_{send,recv}msg), the MSG_CMSG_COMPAT flag is blocked at the compat syscall entry points, changing the kernel compat behaviour from the one before the commit it was trying to fix (`1be374a051`, net: Block MSG_CMSG_COMPAT in send(m)msg and recv(m)msg). On 32-bit kernels (!CONFIG_COMPAT), MSG_CMSG_COMPAT is 0 and the native 32-bit sys_sendmsg() allows flag 0x80000000 to be set (it is ignored by the kernel). However, on a 64-bit kernel, the compat ABI is different with commit `a7526eb5d0`. This patch changes the compat_sys_{send,recv}msg behaviour to the one prior to commit `1be374a051`. The problem was found running 32-bit LTP (sendmsg01) binary on an arm64 kernel. Arguably, LTP should not pass 0xffffffff as flags to sendmsg() but the general rule is not to break user ABI (even when the user behaviour is not entirely sane). Fixes: `a7526eb5d0` (net: Unbreak compat_sys_{send,recv}msg) Cc: Andy Lutomirski <luto@amacapital.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-23 17:22:05 -05:00
Fabian Frederick	a948f8ce77	irda: replace current->state by set_current_state() Use helper functions to access current->state. Direct assignments are prone to races and therefore buggy. current->state = TASK_RUNNING can be replaced by __set_current_state() Thanks to Peter Zijlstra for the exact definition of the problem. Suggested-By: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-23 17:21:11 -05:00
Chuck Lever	9b1dcbc8cf	xprtrdma: Store RDMA credits in unsigned variables Dan Carpenter's static checker pointed out: net/sunrpc/xprtrdma/rpc_rdma.c:879 rpcrdma_reply_handler() warn: can 'credits' be negative? "credits" is defined as an int. The credits value comes from the server as a 32-bit unsigned integer. A malicious or broken server can plant a large unsigned integer in that field which would result in an underflow in the following logic, potentially triggering a deadlock of the mount point by blocking the client from issuing more RPC requests. net/sunrpc/xprtrdma/rpc_rdma.c: 876 credits = be32_to_cpu(headerp->rm_credit); 877 if (credits == 0) 878 credits = 1; /* don't deadlock */ 879 else if (credits > r_xprt->rx_buf.rb_max_requests) 880 credits = r_xprt->rx_buf.rb_max_requests; 881 882 cwnd = xprt->cwnd; 883 xprt->cwnd = credits << RPC_CWNDSHIFT; 884 if (xprt->cwnd > cwnd) 885 xprt_release_rqst_cong(rqst->rq_task); Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `eba8ff660b` ("xprtrdma: Move credit update to RPC . . .") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-02-23 16:54:04 -05:00
Rasmus Villemoes	46b9e4bb76	decnet: Fix obvious o/0 typo Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-23 15:28:50 -05:00
Neal Cardwell	6514890f7a	tcp: fix tcp_should_expand_sndbuf() to use tcp_packets_in_flight() tcp_should_expand_sndbuf() does not expand the send buffer if we have filled the congestion window. However, it should use tcp_packets_in_flight() instead of tp->packets_out to make this check. Testing has established that the difference matters a lot if there are many SACKed packets, causing a needless performance shortfall. Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-22 23:07:11 -05:00
Eric Dumazet	52d6c8c6ca	net: pktgen: disable xmit_clone on virtual devices Trying to use burst capability (aka xmit_more) on a virtual device like bonding is not supported. For example, skb might be queued multiple times on a qdisc, with various list corruptions. Fixes: `38b2cf2982` ("net: pktgen: packet bursting via skb->xmit_more") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-22 22:43:20 -05:00
Julian Anastasov	528c943f3b	ipvs: add missing ip_vs_pe_put in sync code ip_vs_conn_fill_param_sync() gets in param.pe a module reference for persistence engine from __ip_vs_pe_getbyname() but forgets to put it. Problem occurs in backup for sync protocol v1 (2.6.39). Also, pe_data usually comes in sync messages for connection templates and ip_vs_conn_new() copies the pointer only in this case. Make sure pe_data is not leaked if it comes unexpectedly for normal connections. Leak can happen only if bogus messages are sent to backup server. Fixes: `fe5e7a1efb` ("IPVS: Backup, Adding Version 1 receive capability") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-02-22 16:16:36 -05:00
Pablo Neira Ayuso	02263db00b	netfilter: nf_tables: fix addition/deletion of elements from commit/abort We have several problems in this path: 1) There is a use-after-free when removing individual elements from the commit path. 2) We have to uninit() the data part of the element from the abort path to avoid a chain refcount leak. 3) We have to check for set->flags to see if there's a mapping, instead of the element flags. 4) We have to check for !(flags & NFT_SET_ELEM_INTERVAL_END) to skip elements that are part of the interval that have no data part, so they don't need to be uninit(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-02-22 21:05:08 +01:00
Arturo Borrero	2156d321b8	netfilter: nft_compat: don't truncate ethernet protocol type to u8 Use u16 for protocol and then cast it to __be16 >> net/netfilter/nft_compat.c:140:37: sparse: incorrect type in assignment (different base types) net/netfilter/nft_compat.c:140:37: expected restricted __be16 [usertype] ethproto net/netfilter/nft_compat.c:140:37: got unsigned char [unsigned] [usertype] proto >> net/netfilter/nft_compat.c:351:37: sparse: incorrect type in assignment (different base types) net/netfilter/nft_compat.c:351:37: expected restricted __be16 [usertype] ethproto net/netfilter/nft_compat.c:351:37: got unsigned char [unsigned] [usertype] proto Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-02-22 21:04:06 +01:00
Alexander Drozdov	3f34b24a73	af_packet: allow packets defragmentation not only for hash fanout type Packets defragmentation was introduced for PACKET_FANOUT_HASH only, see `7736d33f42` ("packet: Add pre-defragmentation support for ipv4 fanouts") It may be useful to have defragmentation enabled regardless of fanout type. Without that, the AF_PACKET user may have to: 1. Collect fragments from different rings 2. Defragment by itself Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-21 23:00:18 -05:00
Matthew Thode	a4176a9391	net: reject creation of netdev names with colons colons are used as a separator in netdev device lookup in dev_ioctl.c Specific functions are SIOCGIFTXQLEN SIOCETHTOOL SIOCSIFNAME Signed-off-by: Matthew Thode <mthode@mthode.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-21 21:45:25 -05:00
Linus Torvalds	24a52e412e	NFS client updates for Linux 3.20 Highlights include: - Fix a use-after-free in decode_cb_sequence_args() - Fix a compile error when #undef CONFIG_PROC_FS - NFSv4.1 backchannel spinlocking issue - Cleanups in the NFS unstable write code requested by Linus - NFSv4.1 fix issues when the server denies our backchannel request - Cleanups in create_session and bind_conn_to_session -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU51zBAAoJEGcL54qWCgDyyvwQALp7vxXCTwC3/OSoPkhujd2C HGMoI2nVSl8CY0LFYxoT6auhefZCgQChmQXkGtlZwO+bkEr9rvJ6ii8lhuDR+JXF M0bAwOX+bNxUFkpqYYF3Q4Hi//tMJCIZdqUp2irtyFLL/qlNoN2ktoEYMIjMY5uO 4L1fxj9KaVuMtFuqt3xeSSe41LaXxitKyefyVJfbqz5qbcPGiXzS7WKYAun4RyvM 7rCir9kfnxxEX3+hc7xHxWeFJnW0jUMklrjNvnrubnpHE/fX2sUAnjX2hmx5bLg/ 4puxADzhPT4f3LdGDKXVWaULuTy20VksOJnB82TKJ3rLIELJixGZw88svOrFcGvt 5CMI8BvOihwn8ov+sSj7Xedz4046btwA1YHkUcwPV3LZAlyx/FSq4nasO4Cn27yl OPdkcAL1YR5I83mEKA+8BOVXJuFx5vKhkwmMdReJkBmxsWbSwB/qp6caPD9DtuXI K0qJYWHMqN+Dv9npi0Q4WR6vmnzxV+Eq7Z2D9WPW0FL8nT3STE0eWIrNipyqOv+7 bHptPxrrZej+i3c922bZ6hdaE2uAlwG8FPgEGhHFtm+s09RgDPDG7NiaeCutf9cQ 9ub82Hlk9nLTqq5X3poUPV35RS6THnZcybhngNMy8F1cPSUTs+/H+l91sEW/Zhgw odMB/DEa9sRnZGX5rQXE =NieR -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull more NFS client updates from Trond Myklebust: "Highlights include: - Fix a use-after-free in decode_cb_sequence_args() - Fix a compile error when #undef CONFIG_PROC_FS - NFSv4.1 backchannel spinlocking issue - Cleanups in the NFS unstable write code requested by Linus - NFSv4.1 fix issues when the server denies our backchannel request - Cleanups in create_session and bind_conn_to_session" * tag 'nfs-for-3.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.1: Clean up bind_conn_to_session NFSv4.1: Always set up a forward channel when binding the session NFSv4.1: Don't set up a backchannel if the server didn't agree to do so NFSv4.1: Clean up create_session pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit NFSv4: Kill unused nfs_inode->delegation_state field NFS: struct nfs_commit_info.lock must always point to inode->i_lock nfs: Can call nfs_clear_page_commit() instead nfs: Provide and use helper functions for marking a page as unstable SUNRPC: Always manipulate rpc_rqst::rq_bc_pa_list under xprt->bc_pa_lock SUNRPC: Fix a compile error when #undef CONFIG_PROC_FS NFSv4.1: Convert open-coded array allocation calls to kmalloc_array() NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args	2015-02-21 14:02:59 -08:00
David S. Miller	ee92259849	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains updates for your net tree, they are: 1) Fix removal of destination in IPVS when the new mixed family support is used, from Alexey Andriyanov via Simon Horman. 2) Fix module refcount undeflow in nft_compat when reusing a match / target. 3) Fix iptables-restore when the recent match is used with a new hitcount that exceeds threshold, from Florian Westphal. 4) Fix stack corruption in xt_socket due to using stack storage to save the inner IPv6 header, from Eric Dumazet. I'll follow up soon with another batch with more fixes that are still cooking. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-20 17:36:20 -05:00
Dan Carpenter	278f7b4fff	caif: fix a signedness bug in cfpkt_iterate() The cfpkt_iterate() function can return -EPROTO on error, but the function is a u16 so the negative value gets truncated to a positive unsigned short. This causes a static checker warning. The only caller which might care is cffrml_receive(), when it's checking the frame checksum. I modified cffrml_receive() so that it never says -EPROTO is a valid checksum. Also this isn't ever going to be inlined so I removed the "inline". Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-20 17:35:14 -05:00
Rami Rosen	65e9256c4e	ethtool: Add hw-switch-offload to netdev_features_strings. commit `aafb3e98b2` (netdev: introduce new NETIF_F_HW_SWITCH_OFFLOAD feature flag for switch device offloads) add a new feature without adding it to netdev_features_strings array; this patch fixes this. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-20 16:36:43 -05:00
Eric Dumazet	997d5c3f44	sock: sock_dequeue_err_skb() needs hard irq safety Non NAPI drivers can call skb_tstamp_tx() and then sock_queue_err_skb() from hard IRQ context. Therefore, sock_dequeue_err_skb() needs to block hard irq or corruptions or hangs can happen. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `364a9e9324` ("sock: deduplicate errqueue dequeue") Fixes: `cb820f8e4b` ("net: Provide a generic socket error queue delivery method for Tx time stamps.") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-20 15:52:21 -05:00
Pravin B Shelar	7b4577a9da	openvswitch: Fix net exit. Open vSwitch allows moving internal vport to different namespace while still connected to the bridge. But when namespace deleted OVS does not detach these vports, that results in dangling pointer to netdevice which causes kernel panic as follows. This issue is fixed by detaching all ovs ports from the deleted namespace at net-exit. BUG: unable to handle kernel NULL pointer dereference at 0000000000000028 IP: [<ffffffffa0aadaa5>] ovs_vport_locate+0x35/0x80 [openvswitch] Oops: 0000 [#1] SMP Call Trace: [<ffffffffa0aa6391>] lookup_vport+0x21/0xd0 [openvswitch] [<ffffffffa0aa65f9>] ovs_vport_cmd_get+0x59/0xf0 [openvswitch] [<ffffffff8167e07c>] genl_family_rcv_msg+0x1bc/0x3e0 [<ffffffff8167e319>] genl_rcv_msg+0x79/0xc0 [<ffffffff8167d919>] netlink_rcv_skb+0xb9/0xe0 [<ffffffff8167deac>] genl_rcv+0x2c/0x40 [<ffffffff8167cffd>] netlink_unicast+0x12d/0x1c0 [<ffffffff8167d3da>] netlink_sendmsg+0x34a/0x6b0 [<ffffffff8162e140>] sock_sendmsg+0xa0/0xe0 [<ffffffff8162e5e8>] ___sys_sendmsg+0x408/0x420 [<ffffffff8162f541>] __sys_sendmsg+0x51/0x90 [<ffffffff8162f592>] SyS_sendmsg+0x12/0x20 [<ffffffff81764ee9>] system_call_fastpath+0x12/0x17 Reported-by: Assaf Muller <amuller@redhat.com> Fixes: 46df7b81454("openvswitch: Add support for network namespaces.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Reviewed-by: Thomas Graf <tgraf@noironetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-20 15:32:08 -05:00
Ignacy Gawędzki	34eea79e26	ematch: Fix auto-loading of ematch modules. In tcf_em_validate(), after calling request_module() to load the kind-specific module, set em->ops to NULL before returning -EAGAIN, so that module_put() is not called again by tcf_em_tree_destroy(). Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-20 15:30:56 -05:00
Alexander Drozdov	fba04a9e0c	ipv4: ip_check_defrag should correctly check return value of skb_copy_bits skb_copy_bits() returns zero on success and negative value on error, so it is needed to invert the condition in ip_check_defrag(). Fixes: `1bf3751ec9` ("ipv4: ip_check_defrag must not modify skb before unsharing") Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-20 15:22:38 -05:00
Linus Torvalds	4533f6e27a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph changes from Sage Weil: "On the RBD side, there is a conversion to blk-mq from Christoph, several long-standing bug fixes from Ilya, and some cleanup from Rickard Strandqvist. On the CephFS side there is a long list of fixes from Zheng, including improved session handling, a few IO path fixes, some dcache management correctness fixes, and several blocking while !TASK_RUNNING fixes. The core code gets a few cleanups and Chaitanya has added support for TCP_NODELAY (which has been used on the server side for ages but we somehow missed on the kernel client). There is also an update to MAINTAINERS to fix up some email addresses and reflect that Ilya and Zheng are doing most of the maintenance for RBD and CephFS these days. Do not be surprised to see a pull request come from one of them in the future if I am unavailable for some reason" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits) MAINTAINERS: update Ceph and RBD maintainers libceph: kfree() in put_osd() shouldn't depend on authorizer libceph: fix double __remove_osd() problem rbd: convert to blk-mq ceph: return error for traceless reply race ceph: fix dentry leaks ceph: re-send requests when MDS enters reconnecting stage ceph: show nocephx_require_signatures and notcp_nodelay options libceph: tcp_nodelay support rbd: do not treat standalone as flatten ceph: fix atomic_open snapdir ceph: properly mark empty directory as complete client: include kernel version in client metadata ceph: provide seperate {inode,file}_operations for snapdir ceph: fix request time stamp encoding ceph: fix reading inline data when i_size > PAGE_SIZE ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions) ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps) ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync) rbd: fix error paths in rbd_dev_refresh() ...	2015-02-19 14:14:42 -08:00
Ignacy Gawędzki	1c4cff0cf5	gen_stats.c: Duplicate xstats buffer for later use The gnet_stats_copy_app() function gets called, more often than not, with its second argument a pointer to an automatic variable in the caller's stack. Therefore, to avoid copying garbage afterwards when calling gnet_stats_finish_copy(), this data is better copied to a dynamically allocated memory that gets freed after use. [xiyou.wangcong@gmail.com: remove a useless kfree()] Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-19 15:45:53 -05:00
Linus Torvalds	b11a278397	Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull kconfig updates from Michal Marek: "Yann E Morin was supposed to take over kconfig maintainership, but this hasn't happened. So I'm sending a few kconfig patches that I collected: - Fix for missing va_end in kconfig - merge_config.sh displays used if given too few arguments - s/boolean/bool/ in Kconfig files for consistency, with the plan to only support bool in the future" * 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: kconfig: use va_end to match corresponding va_start merge_config.sh: Display usage if given too few arguments kconfig: use bool instead of boolean for type definition attributes	2015-02-19 10:36:45 -08:00
Ilya Dryomov	b28ec2f37e	libceph: kfree() in put_osd() shouldn't depend on authorizer `a255651d4c` ("ceph: ensure auth ops are defined before use") made kfree() in put_osd() conditional on the authorizer. A mechanical mistake most likely - fix it. Cc: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-02-19 14:27:51 +03:00
Ilya Dryomov	7eb71e0351	libceph: fix double __remove_osd() problem It turns out it's possible to get __remove_osd() called twice on the same OSD. That doesn't sit well with rb_erase() - depending on the shape of the tree we can get a NULL dereference, a soft lockup or a random crash at some point in the future as we end up touching freed memory. One scenario that I was able to reproduce is as follows: <osd3 is idle, on the osd lru list> <con reset - osd3> con_fault_finish() osd_reset() <osdmap - osd3 down> ceph_osdc_handle_map() <takes map_sem> kick_requests() <takes request_mutex> reset_changed_osds() __reset_osd() __remove_osd() <releases request_mutex> <releases map_sem> <takes map_sem> <takes request_mutex> __kick_osd_requests() __reset_osd() __remove_osd() <-- !!! A case can be made that osd refcounting is imperfect and reworking it would be a proper resolution, but for now Sage and I decided to fix this by adding a safe guard around __remove_osd(). Fixes: http://tracker.ceph.com/issues/8087 Cc: Sage Weil <sage@redhat.com> Cc: stable@vger.kernel.org # 3.9+: 7c6e6fc53e73: libceph: assert both regular and lingering lists in __remove_osd() Cc: stable@vger.kernel.org # 3.9+: cc9f1f518cec: libceph: change from BUG to WARN for __remove_osd() asserts Cc: stable@vger.kernel.org # 3.9+ Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-02-19 14:27:50 +03:00
Chaitanya Huilgol	ba988f87f5	libceph: tcp_nodelay support TCP_NODELAY socket option set on connection sockets, disables Nagle’s algorithm and improves latency characteristics. tcp_nodelay(default)/notcp_nodelay option flags provided to enable/disable setting the socket option. Signed-off-by: Chaitanya Huilgol <chaitanya.huilgol@sandisk.com> [idryomov@redhat.com: NO_TCP_NODELAY -> TCP_NODELAY, minor adjustments] Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2015-02-19 13:31:40 +03:00
Ilya Dryomov	f646912d10	libceph: use mon_client.c/put_generic_request() more Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2015-02-19 13:31:37 +03:00
Ilya Dryomov	7a6fdeb2b1	libceph: nuke pool op infrastructure On Mon, Dec 22, 2014 at 5:35 PM, Sage Weil <sage@newdream.net> wrote: > On Mon, 22 Dec 2014, Ilya Dryomov wrote: >> Actually, pool op stuff has been unused for over two years - looks like >> it was added for rbd create_snap and that got ripped out in 2012. It's >> unlikely we'd ever need to manage pools or snaps from the kernel client >> so I think it makes sense to nuke it. Sage? > > Yep! Signed-off-by: Ilya Dryomov <idryomov@redhat.com>	2015-02-19 13:31:37 +03:00
Linus Torvalds	53861af9a1	OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work. Thanks, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU5B9cAAoJENkgDmzRrbjxPacP/jajliXX353JJ/g/hkZ6oDN5 o7FhELBKiUMr7enVZYwj2BBYk5OM36nB9pQkiqHMSbjJGoS5IK70enxb4YRxSHBn YCLblZMNqutGS0kclZ9DDysztjAhxH7CvLM6pMZ7eHP0f3+FM/QhbxHfbG9DTBUH 2U/nybvd3M/+YBe7ptwQdrH8aOCAD6RTIsXellfm99dNMK6K/5lqnWQ98WSXmNXq vyvdaAQsqqUkmxtajjcBumaCH4/SehOJJjUqojCMsR3aBkgOBWDZJURMek+KA5Dt X996fBsTAlvTtCUKRrmLTb2ScDH7fu+jwbWRqMYDk8zpEr3XqiLTTPV4/TiHGmi7 Wiw3g1wIY1YbETlZyongB5MIoVyUfmDAd+bT8nBsj3KIITD84gOUQFDMl6d63c0I z6A9Pu/UzpJGsXZT3WoFLi6TO67QyhOseqZnhS4wBgLabjxffNM7yov9RVKUVH/n JHunnpUk2iTtSgscBarOBz5867dstuurnaUIspZthVBo6y6N0z+GrU+agJ8Y4DXx mvwzeYLhQH2208PjxPFiah/kA/gHNm1m678TbpS+CUsgmpQiJ4gTwtazDSi4TwZY Hs9T9GulkzpZIzEyKL3qG2TsfyDhW5Avn+GvKInAT9+Fkig4BnP3DUONBxcwGZ78 eI3FDUWsE36NqE5ECWmz =ivCe -----END PGP SIGNATURE----- Merge tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull virtio updates from Rusty Russell: "OK, this has the big virtio 1.0 implementation, as specified by OASIS. On top of tht is the major rework of lguest, to use PCI and virtio 1.0, to double-check the implementation. Then comes the inevitable fixes and cleanups from that work" * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits) virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice. virtio_net: unconditionally define struct virtio_net_hdr_v1. tools/lguest: don't use legacy definitions for net device in example launcher. virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined. tools/lguest: use common error macros in the example launcher. tools/lguest: give virtqueues names for better error messages tools/lguest: more documentation and checking of virtio 1.0 compliance. lguest: don't look in console features to find emerg_wr. tools/lguest: don't start devices until DRIVER_OK status set. tools/lguest: handle indirect partway through chain. tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI) tools/lguest: rename virtio_pci_cfg_cap field to match spec. tools/lguest: fix features_accepted logic in example launcher. tools/lguest: handle device reset correctly in example launcher. virtual: Documentation: simplify and generalize paravirt_ops.txt lguest: remove NOTIFY call and eventfd facility. lguest: remove NOTIFY facility from demonstration launcher. lguest: use the PCI console device's emerg_wr for early boot messages. lguest: always put console in PCI slot #1. ...	2015-02-18 09:24:01 -08:00
Trond Myklebust	65d2918e71	Merge branch 'cleanups' Merge cleanups requested by Linus. * cleanups: (3 commits) pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit nfs: Can call nfs_clear_page_commit() instead nfs: Provide and use helper functions for marking a page as unstable	2015-02-18 07:28:37 -08:00
Linus Torvalds	f5af19d10d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking updates from David Miller: 1) Missing netlink attribute validation in nft_lookup, from Patrick McHardy. 2) Restrict ipv6 partial checksum handling to UDP, since that's the only case it works for. From Vlad Yasevich. 3) Clear out silly device table sentinal macros used by SSB and BCMA drivers. From Joe Perches. 4) Make sure the remote checksum code never creates a situation where the remote checksum is applied yet the tunneling metadata describing the remote checksum transformation is still present. Otherwise an external entity might see this and apply the checksum again. From Tom Herbert. 5) Use msecs_to_jiffies() where applicable, from Nicholas Mc Guire. 6) Don't explicitly initialize timer struct fields, use setup_timer() and mod_timer() instead. From Vaishali Thakkar. 7) Don't invoke tg3_halt() without the tp->lock held, from Jun'ichi Nomura. 8) Missing __percpu annotation in ipvlan driver, from Eric Dumazet. 9) Don't potentially perform skb_get() on shared skbs, also from Eric Dumazet. 10) Fix COW'ing of metrics for non-DST_HOST routes in ipv6, from Martin KaFai Lau. 11) Fix merge resolution error between the iov_iter changes in vhost and some bug fixes that occurred at the same time. From Jason Wang. 12) If rtnl_configure_link() fails we have to perform a call to ->dellink() before unregistering the device. From WANG Cong. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (39 commits) net: dsa: Set valid phy interface type rtnetlink: call ->dellink on failure when ->newlink exists com20020-pci: add support for eae single card vhost_net: fix wrong iter offset when setting number of buffers net: spelling fixes net/core: Fix warning while make xmldocs caused by dev.c net: phy: micrel: disable NAND-tree for KSZ8021, KSZ8031, KSZ8051, KSZ8081 ipv6: fix ipv6_cow_metrics for non DST_HOST case openvswitch: Fix key serialization. r8152: restore hw settings hso: fix rx parsing logic when skb allocation fails tcp: make sure skb is not shared before using skb_get() bridge: netfilter: Move sysctl-specific error code inside #ifdef ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc ipvlan: add a missing __percpu pcpu_stats tg3: Hold tp->lock before calling tg3_halt() from tg3_init_one() bgmac: fix device initialization on Northstar SoCs (condition typo) qlcnic: Delete existing multicast MAC list before adding new net/mlx5_core: Fix configuration of log_uar_page_sz sunvnet: don't change gso data on clones ...	2015-02-17 17:41:19 -08:00
David Ramos	a1d1e9be5a	svcrpc: fix memory leak in gssp_accept_sec_context_upcall Our UC-KLEE tool found a kernel memory leak of 512 bytes (on x86_64) for each call to gssp_accept_sec_context_upcall() (net/sunrpc/auth_gss/gss_rpc_upcall.c). Since it appears that this call can be triggered by remote connections (at least, from a cursory a glance at the call chain), it may be exploitable to cause kernel memory exhaustion. We found the bug in kernel 3.16.3, but it appears to date back to commit `9dfd87da1a` (2013-08-20). The gssp_accept_sec_context_upcall() function performs a pair of calls to gssp_alloc_receive_pages() and gssp_free_receive_pages(). The first allocates memory for arg->pages. The second then frees the pages pointed to by the arg->pages array, but not the array itself. Reported-by: David A. Ramos <daramos@stanford.edu> Fixes: `9dfd87da1a` ("rpc: fix huge kmalloc's in gss-proxy”) Signed-off-by: David A. Ramos <daramos@stanford.edu> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-02-17 18:09:02 -05:00
Guenter Roeck	19334920ea	net: dsa: Set valid phy interface type If the phy interface mode is not found in devicetree, or if devicetree is not configured, of_get_phy_mode returns -ENODEV. The current code sets the phy interface mode to the return value from of_get_phy_mode without checking if it is valid. This invalid phy interface mode is passed as parameter to of_phy_connect or to phy_connect_direct. This sets the phy interface mode to the invalid value, which in turn causes problems for any code using phydev->interface. Fixes: `b31f65fb43` ("net: dsa: slave: Fix autoneg for phys on switch MDIO bus") Fixes: `0d8bcdd383` ("net: dsa: allow for more complex PHY setups") Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-17 10:37:39 -08:00
Eric Dumazet	78296c97ca	netfilter: xt_socket: fix a stack corruption bug As soon as extract_icmp6_fields() returns, its local storage (automatic variables) is deallocated and can be overwritten. Lets add an additional parameter to make sure storage is valid long enough. While we are at it, adds some const qualifiers. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `b64c9256a9` ("tproxy: added IPv6 support to the socket match") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-02-16 17:00:48 +01:00
Florian Westphal	cef9ed86ed	netfilter: xt_recent: don't reject rule if new hitcount exceeds table max given: -A INPUT -m recent --update --seconds 30 --hitcount 4 and iptables-save > foo then iptables-restore < foo will fail with: kernel: xt_recent: hitcount (4) is larger than packets to be remembered (4) for table DEFAULT Even when the check is fixed, the restore won't work if the hitcount is increased to e.g. 6, since by the time checkentry runs it will find the 'old' incarnation of the table. We can avoid this by increasing the maximum threshold silently; we only have to rm all the current entries of the table (these entries would not have enough room to handle the increased hitcount). This even makes (not-very-useful) -A INPUT -m recent --update --seconds 30 --hitcount 4 -A INPUT -m recent --update --seconds 30 --hitcount 42 work. Fixes: `abc86d0f99` (netfilter: xt_recent: relax ip_pkt_list_tot restrictions) Tracked-down-by: Chris Vine <chris@cvine.freeserve.co.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-02-16 17:00:47 +01:00
Pablo Neira Ayuso	520aa7414b	netfilter: nft_compat: fix module refcount underflow Feb 12 18:20:42 nfdev kernel: ------------[ cut here ]------------ Feb 12 18:20:42 nfdev kernel: WARNING: CPU: 4 PID: 4359 at kernel/module.c:963 module_put+0x9b/0xba() Feb 12 18:20:42 nfdev kernel: CPU: 4 PID: 4359 Comm: ebtables-compat Tainted: G W 3.19.0-rc6+ #43 [...] Feb 12 18:20:42 nfdev kernel: Call Trace: Feb 12 18:20:42 nfdev kernel: [<ffffffff815fd911>] dump_stack+0x4c/0x65 Feb 12 18:20:42 nfdev kernel: [<ffffffff8103e6f7>] warn_slowpath_common+0x9c/0xb6 Feb 12 18:20:42 nfdev kernel: [<ffffffff8109919f>] ? module_put+0x9b/0xba Feb 12 18:20:42 nfdev kernel: [<ffffffff8103e726>] warn_slowpath_null+0x15/0x17 Feb 12 18:20:42 nfdev kernel: [<ffffffff8109919f>] module_put+0x9b/0xba Feb 12 18:20:42 nfdev kernel: [<ffffffff813ecf7c>] nft_match_destroy+0x45/0x4c Feb 12 18:20:42 nfdev kernel: [<ffffffff813e683f>] nf_tables_rule_destroy+0x28/0x70 Reported-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>	2015-02-16 17:00:36 +01:00
WANG Cong	7afb8886a0	rtnetlink: call ->dellink on failure when ->newlink exists Ignacy reported that when eth0 is down and add a vlan device on top of it like: ip link add link eth0 name eth0.1 up type vlan id 1 We will get a refcount leak: unregister_netdevice: waiting for eth0.1 to become free. Usage count = 2 The problem is when rtnl_configure_link() fails in rtnl_newlink(), we simply call unregister_device(), but for stacked device like vlan, we almost do nothing when we unregister the upper device, more work is done when we unregister the lower device, so call its ->dellink(). Reported-by: Ignacy Gawedzki <ignacy.gawedzki@green-communications.fr> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-15 08:30:10 -08:00
Stephen Hemminger	ca9f1fd263	net: spelling fixes Spelling errors caught by codespell. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-14 20:36:08 -08:00
Masanari Iida	4a26e453d9	net/core: Fix warning while make xmldocs caused by dev.c This patch fix following warning wile make xmldocs. Warning(.//net/core/dev.c:5345): No description found for parameter 'bonding_info' Warning(.//net/core/dev.c:5345): Excess function parameter 'netdev_bonding_info' description in 'netdev_bonding_info_change' This warning starts to appear after following patch was added into Linus's tree during merger period. commit `61bd3857ff` net/core: Add event for a change in slave state Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-14 20:34:49 -08:00
Martin KaFai Lau	3b4711757d	ipv6: fix ipv6_cow_metrics for non DST_HOST case ipv6_cow_metrics() currently assumes only DST_HOST routes require dynamic metrics allocation from inetpeer. The assumption breaks when ndisc discovered router with RTAX_MTU and RTAX_HOPLIMIT metric. Refer to ndisc_router_discovery() in ndisc.c and note that dst_metric_set() is called after the route is created. This patch creates the metrics array (by calling dst_cow_metrics_generic) in ipv6_cow_metrics(). Test: radvd.conf: interface qemubr0 { AdvLinkMTU 1300; AdvCurHopLimit 30; prefix fd00:face:face:face::/64 { AdvOnLink on; AdvAutonomous on; AdvRouterAddr off; }; }; Before: [root@qemu1 ~]# ip -6 r show \| egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec fe80::/64 dev eth0 proto kernel metric 256 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec After: [root@qemu1 ~]# ip -6 r show \| egrep -v unreachable fd00:face:face:face::/64 dev eth0 proto kernel metric 256 expires 27sec mtu 1300 fe80::/64 dev eth0 proto kernel metric 256 mtu 1300 default via fe80::74df:d0ff:fe23:8ef2 dev eth0 proto ra metric 1024 expires 27sec mtu 1300 hoplimit 30 Fixes: `8e2ec63917` (ipv6: don't use inetpeer to store metrics for routes.) Signed-off-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-14 20:26:16 -08:00
Pravin B Shelar	26ad0b8358	openvswitch: Fix key serialization. Fix typo where mask is used rather than key. Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.") Reported-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-14 20:20:40 -08:00
Tejun Heo	f09068276c	net: use %pb[l] to print bitmaps including cpumasks and nodemasks printk and friends can now format bitmaps using '%pb[l]'. cpumask and nodemask also provide cpumask_pr_args() and nodemask_pr_args() respectively which can be used to generate the two printf arguments necessary to format the specified cpu/nodemask. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-13 21:21:38 -08:00
Chuck Lever	813b00d63f	SUNRPC: Always manipulate rpc_rqst::rq_bc_pa_list under xprt->bc_pa_lock Other code that accesses rq_bc_pa_list holds xprt->bc_pa_lock. xprt_complete_bc_request() should do the same. Fixes: `2ea24497a1` ("SUNRPC: RPC callbacks may be split . . .") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-13 17:41:10 -05:00
Eric Dumazet	ba34e6d9d3	tcp: make sure skb is not shared before using skb_get() IPv6 can keep a copy of SYN message using skb_get() in tcp_v6_conn_request() so that caller wont free the skb when calling kfree_skb() later. Therefore TCP fast open has to clone the skb it is queuing in child->sk_receive_queue, as all skbs consumed from receive_queue are freed using __kfree_skb() (ie assuming skb->users == 1) Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Fixes: `5b7ed0892f` ("tcp: move fastopen functions to tcp_fastopen.c") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-13 07:11:40 -08:00
Vladimir Davydov	f48b80a5e2	memcg: cleanup static keys decrement Move memcg_socket_limit_enabled decrement to tcp_destroy_cgroup (called from memcg_destroy_kmem -> mem_cgroup_sockets_destroy) and zap a bunch of wrapper functions. Although this patch moves static keys decrement from __mem_cgroup_free to mem_cgroup_css_free, it does not introduce any functional changes, because the keys are incremented on setting the limit (tcp or kmem), which can only happen after successful mem_cgroup_css_online. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Glauber Costa <glommer@parallels.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David S. Miller <davem@davemloft.net> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-12 18:54:10 -08:00
Linus Torvalds	61845143fe	Merge branch 'for-3.20' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "The main change is the pNFS block server support from Christoph, which allows an NFS client connected to shared disk to do block IO to the shared disk in place of NFS reads and writes. This also requires xfs patches, which should arrive soon through the xfs tree, barring unexpected problems. Support for other filesystems is also possible if there's interest. Thanks also to Chuck Lever for continuing work to get NFS/RDMA into shape" * 'for-3.20' of git://linux-nfs.org/~bfields/linux: (32 commits) nfsd: default NFSv4.2 to on nfsd: pNFS block layout driver exportfs: add methods for block layout exports nfsd: add trace events nfsd: update documentation for pNFS support nfsd: implement pNFS layout recalls nfsd: implement pNFS operations nfsd: make find_any_file available outside nfs4state.c nfsd: make find/get/put file available outside nfs4state.c nfsd: make lookup/alloc/unhash_stid available outside nfs4state.c nfsd: add fh_fsid_match helper nfsd: move nfsd_fh_match to nfsfh.h fs: add FL_LAYOUT lease type fs: track fl_owner for leases nfs: add LAYOUT_TYPE_MAX enum value nfsd: factor out a helper to decode nfstime4 values sunrpc/lockd: fix references to the BKL nfsd: fix year-2038 nfs4 state problem svcrdma: Handle additional inline content svcrdma: Move read list XDR round-up logic ...	2015-02-12 10:39:41 -08:00
Geert Uytterhoeven	9672723973	bridge: netfilter: Move sysctl-specific error code inside #ifdef If CONFIG_SYSCTL=n: net/bridge/br_netfilter.c: In function ‘br_netfilter_init’: net/bridge/br_netfilter.c:996: warning: label ‘err1’ defined but not used Move the label and the code after it inside the existing #ifdef to get rid of the warning. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-12 08:44:46 -08:00
Jan Stancek	4762fb9804	ipv6: fix possible deadlock in ip6_fl_purge / ip6_fl_gc Use spin_lock_bh in ip6_fl_purge() to prevent following potentially deadlock scenario between ip6_fl_purge() and ip6_fl_gc() timer. ================================= [ INFO: inconsistent lock state ] 3.19.0 #1 Not tainted --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (ip6_fl_lock){+.?...}, at: [<ffffffff8171155d>] ip6_fl_gc+0x2d/0x180 {SOFTIRQ-ON-W} state was registered at: [<ffffffff810ee9a0>] __lock_acquire+0x4a0/0x10b0 [<ffffffff810efd54>] lock_acquire+0xc4/0x2b0 [<ffffffff81751d2d>] _raw_spin_lock+0x3d/0x80 [<ffffffff81711798>] ip6_flowlabel_net_exit+0x28/0x110 [<ffffffff815f9759>] ops_exit_list.isra.1+0x39/0x60 [<ffffffff815fa320>] cleanup_net+0x100/0x1e0 [<ffffffff810ad80a>] process_one_work+0x20a/0x830 [<ffffffff810adf4b>] worker_thread+0x11b/0x460 [<ffffffff810b42f4>] kthread+0x104/0x120 [<ffffffff81752bfc>] ret_from_fork+0x7c/0xb0 irq event stamp: 84640 hardirqs last enabled at (84640): [<ffffffff81752080>] _raw_spin_unlock_irq+0x30/0x50 hardirqs last disabled at (84639): [<ffffffff81751eff>] _raw_spin_lock_irq+0x1f/0x80 softirqs last enabled at (84628): [<ffffffff81091ad1>] _local_bh_enable+0x21/0x50 softirqs last disabled at (84629): [<ffffffff81093b7d>] irq_exit+0x12d/0x150 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(ip6_fl_lock); <Interrupt> lock(ip6_fl_lock); * DEADLOCK * Signed-off-by: Jan Stancek <jstancek@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-12 07:13:03 -08:00
huaibin Wang	ac37e2515c	xfrm: release dst_orig in case of error in xfrm_lookup() dst_orig should be released on error. Function like __xfrm_route_forward() expects that behavior. Since a recent commit, xfrm_lookup() may also be called by xfrm_lookup_route(), which expects the opposite. Let's introduce a new flag (XFRM_LOOKUP_KEEP_DST_REF) to tell what should be done in case of error. Fixes: f92ee61982d("xfrm: Generate blackhole routes only from route lookup functions") Signed-off-by: huaibin Wang <huaibin.wang@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-02-12 07:10:56 +01:00
Linus Torvalds	8cc748aa76	Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer updates from James Morris: "Highlights: - Smack adds secmark support for Netfilter - /proc/keys is now mandatory if CONFIG_KEYS=y - TPM gets its own device class - Added TPM 2.0 support - Smack file hook rework (all Smack users should review this!)" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits) cipso: don't use IPCB() to locate the CIPSO IP option SELinux: fix error code in policydb_init() selinux: add security in-core xattr support for pstore and debugfs selinux: quiet the filesystem labeling behavior message selinux: Remove unused function avc_sidcmp() ima: /proc/keys is now mandatory Smack: Repair netfilter dependency X.509: silence asn1 compiler debug output X.509: shut up about included cert for silent build KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y MAINTAINERS: email update tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device smack: fix possible use after frees in task_security() callers smack: Add missing logging in bidirectional UDS connect check Smack: secmark support for netfilter Smack: Rework file hooks tpm: fix format string error in tpm-chip.c char/tpm/tpm_crb: fix build error smack: Fix a bidirectional UDS connect check typo smack: introduce a special case for tmpfs in smack_d_instantiate() ...	2015-02-11 20:25:11 -08:00
Linus Torvalds	59d53737a8	Merge branch 'akpm' (patches from Andrew) Merge second set of updates from Andrew Morton: "More of MM" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (83 commits) mm/nommu.c: fix arithmetic overflow in __vm_enough_memory() mm/mmap.c: fix arithmetic overflow in __vm_enough_memory() vmstat: Reduce time interval to stat update on idle cpu mm/page_owner.c: remove unnecessary stack_trace field Documentation/filesystems/proc.txt: describe /proc/<pid>/map_files mm: incorporate read-only pages into transparent huge pages vmstat: do not use deferrable delayed work for vmstat_update mm: more aggressive page stealing for UNMOVABLE allocations mm: always steal split buddies in fallback allocations mm: when stealing freepages, also take pages created by splitting buddy page mincore: apply page table walker on do_mincore() mm: /proc/pid/clear_refs: avoid split_huge_page() mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP) mempolicy: apply page table walker on queue_pages_range() arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma() memcg: cleanup preparation for page table walk numa_maps: remove numa_maps->vma numa_maps: fix typo in gather_hugetbl_stats pagemap: use walk->vma instead of calling find_vma() clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk() ...	2015-02-11 18:23:28 -08:00
Linus Torvalds	6f83e5bd3e	NFS client updates for Linux 3.20 Highlights incluse: Features: - Removing the forced serialisation of open()/close() calls in NFSv4.x (x>0) makes for a significant performance improvement in metadata intensive workloads. - Full support for the pNFS "flexible files" layout type - Further RPC/RDMA client improvements from Chuck Bugfixes: - Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING - Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL - Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, - Stable fix: Ensure we reference the inode for return-on-close in delegreturn - Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to the same source address/port combination during a disconnect/reconnect event. This is a requirement imposed by most NFSv3 server duplicate reply cache implementations. Optimisations: - Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT Other: - Add Anna Schumaker as co-maintainer for the NFS client -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU2swgAAoJEGcL54qWCgDyCWoP/1bxN8PesqaiwsBm3fsEqcra WZtMirDIpJYpHwgysdv9t5otBQrb7GrLlNyGZ9NBOVNakifoyj2tHe+/XGDx7Qny iYxXam0QdyjLU+bi4QoG4bdFncwQ/NmC6fqoG0rc25Il96Oggnc6LeSwL6Koc3CD QitRLLi/PaU5qtuaV80+tYMJiqZbpBdVjB+xfSpu7rhyWzm1QNdEeQYor5CozzMi 6cRJuvHgjoZ1xriCWdxQHjqOiEaKNLwfm3uZ3XVaaUAIDhStXugdhIihj3J6Wi7k MKNuY+AKJiy3yOdFfhYALyq+TPundDbYoM9x1foigjgP8zxXVfIU3VS6l33TSlzX zH+/lcnXAHFWjFYoAijG1gv1H+OYcTuDlKaYAShQ/cOkTfWFrmlWv+pZs3SSkmPY 4Aeu97YYOkB5ZZ7wTWKksQMeAu/LYNRSA3h+ANvEIR+NLlTSQTcaChlvBmS0IY5D qMmko1Xgmsxv+B8UeIY7PLfGBGrUdFho1JiDTfL8Xk7fGOfM7iBtMeaMAfdyOSUq AMqH9EDUUOWaFDggO2iisLtMCY6kJ0iFGKRTwzR38jAqm3bjWaIDitUqshNrNbC+ mbwvAVxn0IFSCJGFsVd3kD2rTLGDElZ25GLFW9JMalarE6nlLG7e4p65g209Q9bT HYKiyinJJM2Zji07kmG/ =c47U -----END PGP SIGNATURE----- Merge tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights incluse: Features: - Removing the forced serialisation of open()/close() calls in NFSv4.x (x>0) makes for a significant performance improvement in metadata intensive workloads. - Full support for the pNFS "flexible files" layout type - Further RPC/RDMA client improvements from Chuck Bugfixes: - Stable fix: NFSv4.1 backchannel calls blocking operations with !TASK_RUNNING - Stable fix: pnfs_generic_pg_init_read/write can be called with lseg == NULL - Stable fix: Fix an Oopsable condition when nsm_mon_unmon is called as part of the namespace cleanup, - Stable fix: Ensure we reference the inode for return-on-close in delegreturn - Use SO_REUSEPORT to ensure that NFSv3 TCP connections can rebind to the same source address/port combination during a disconnect/ reconnect event. This is a requirement imposed by most NFSv3 server duplicate reply cache implementations. Optimisations: - Ask for no NFSv4.1 delegations on OPEN if using O_DIRECT Other: - Add Anna Schumaker as co-maintainer for the NFS client" * tag 'nfs-for-3.20-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (119 commits) SUNRPC: Cleanup to remove xs_tcp_close() pnfs: delete an unintended goto pnfs/flexfiles: Do not dprintk after the free SUNRPC: Fix stupid typo in xs_sock_set_reuseport SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG SUNRPC: Handle connection reset more efficiently. SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT SUNRPC: Remove TCP socket linger code SUNRPC: Remove TCP client connection reset hack SUNRPC: TCP/UDP always close the old socket before reconnecting SUNRPC: Add helpers to prevent socket create from racing SUNRPC: Ensure xs_reset_transport() resets the close connection flags SUNRPC: Do not clear the source port in xs_reset_transport SUNRPC: Handle EADDRINUSE on connect SUNRPC: Set SO_REUSEPORT socket option for TCP connections NFSv4.1: Fix pnfs_put_lseg races NFSv4.1: pnfs_send_layoutreturn should use GFP_NOFS ...	2015-02-11 17:14:54 -08:00
Andrea Arcangeli	7e33912849	mm: gup: use get_user_pages_unlocked This allows those get_user_pages calls to pass FAULT_FLAG_ALLOW_RETRY to the page fault in order to release the mmap_sem during the I/O. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Peter Feiner <pfeiner@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:05 -08:00
Johannes Weiner	650c5e5654	mm: page_counter: pull "-1" handling out of page_counter_memparse() The unified hierarchy interface for memory cgroups will no longer use "-1" to mean maximum possible resource value. In preparation for this, make the string an argument and let the caller supply it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-02-11 17:06:02 -08:00
Tom Herbert	fe881ef11c	gue: Use checksum partial with remote checksum offload Change remote checksum handling to set checksum partial as default behavior. Added an iflink parameter to configure not using checksum partial (calling csum_partial to update checksum). Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 15:12:13 -08:00
Tom Herbert	15e2396d4e	net: Infrastructure for CHECKSUM_PARTIAL with remote checsum offload This patch adds infrastructure so that remote checksum offload can set CHECKSUM_PARTIAL instead of calling csum_partial and writing the modfied checksum field. Add skb_remcsum_adjust_partial function to set an skb for using CHECKSUM_PARTIAL with remote checksum offload. Changed skb_remcsum_process and skb_gro_remcsum_process to take a boolean argument to indicate if checksum partial can be set or the checksum needs to be modified using the normal algorithm. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 15:12:12 -08:00
Tom Herbert	6db93ea13b	udp: Set SKB_GSO_UDP_TUNNEL* in UDP GRO path Properly set GSO types and skb->encapsulation in the UDP tunnel GRO complete so that packets are properly represented for GSO. This sets SKB_GSO_UDP_TUNNEL or SKB_GSO_UDP_TUNNEL_CSUM depending on whether non-zero checksums were received, and sets SKB_GSO_TUNNEL_REMCSUM if the remote checksum option was processed. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 15:12:10 -08:00
Tom Herbert	26c4f7da3e	net: Fix remcsum in GRO path to not change packet Remote checksum offload processing is currently the same for both the GRO and non-GRO path. When the remote checksum offload option is encountered, the checksum field referred to is modified in the packet. So in the GRO case, the packet is modified in the GRO path and then the operation is skipped when the packet goes through the normal path based on skb->remcsum_offload. There is a problem in that the packet may be modified in the GRO path, but then forwarded off host still containing the remote checksum option. A remote host will again perform RCO but now the checksum verification will fail since GRO RCO already modified the checksum. To fix this, we ensure that GRO restores a packet to it's original state before returning. In this model, when GRO processes a remote checksum option it still changes the checksum per the algorithm but on return from lower layer processing the checksum is restored to its original value. In this patch we add define gro_remcsum structure which is passed to skb_gro_remcsum_process to save offset and delta for the checksum being changed. After lower layer processing, skb_gro_remcsum_cleanup is called to restore the checksum before returning from GRO. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 15:12:09 -08:00
Geert Uytterhoeven	13101602c4	openvswitch: Add missing initialization in validate_and_copy_set_tun() net/openvswitch/flow_netlink.c: In function ‘validate_and_copy_set_tun’: net/openvswitch/flow_netlink.c:1749: warning: ‘err’ may be used uninitialized in this function If ipv4_tun_from_nlattr() returns a different positive value than OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS, err will be uninitialized, and validate_and_copy_set_tun() may return an undefined value instead of a zero success indicator. Initialize err to zero to fix this. Fixes: `1dd144cf5b` ("openvswitch: Support VXLAN Group Policy extension") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 14:40:15 -08:00
Pravin B Shelar	b35725a285	openvswitch: Reset key metadata for packet execution. Userspace packet execute command pass down flow key for given packet. But userspace can skip some parameter with zero value. Therefore kernel needs to initialize key metadata to zero. Fixes: `0714812134` ("openvswitch: Eliminate memset() from flow_extract.") Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 14:40:15 -08:00
Sowmini Varadhan	80ad0d4a7a	rds: rds_cong_queue_updates needs to defer the congestion update transmission When the RDS transport is TCP, we cannot inline the call to rds_send_xmit from rds_cong_queue_update because (a) we are already holding the sock_lock in the recv path, and will deadlock when tcp_setsockopt/tcp_sendmsg try to get the sock lock (b) cong_queue_update does an irqsave on the rds_cong_lock, and this will trigger warnings (for a good reason) from functions called out of sock_lock. This patch reverts the change introduced by `2fa57129d` ("RDS: Bypass workqueue when queueing cong updates"). The patch has been verified for both RDS/TCP as well as RDS/RDMA to ensure that there are not regressions for either transport: - for verification of RDS/TCP a client-server unit-test was used, with the server blocked in gdb and thus unable to drain its rcvbuf, eventually triggering a RDS congestion update. - for RDS/RDMA, the standard IB regression tests were used Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 14:35:44 -08:00
Vlad Yasevich	bf250a1fa7	ipv6: Partial checksum only UDP packets ip6_append_data is used by other protocols and some of them can't be partially checksummed. Only partially checksum UDP protocol. Fixes: `32dce968dd` (ipv6: Allow for partial checksums on non-ufo packets) Reported-by: Sabrina Dubroca <sd@queasysnail.net> Tested-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 14:32:08 -08:00
David S. Miller	4a3046d68a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains two small Netfilter updates for your net-next tree, they are: 1) Add ebtables support to nft_compat, from Arturo Borrero. 2) Fix missing validation of the SET_ID attribute in the lookup expressions, from Patrick McHardy. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-11 14:27:24 -08:00
Paul Moore	04f81f0154	cipso: don't use IPCB() to locate the CIPSO IP option Using the IPCB() macro to get the IPv4 options is convenient, but unfortunately NetLabel often needs to examine the CIPSO option outside of the scope of the IP layer in the stack. While historically IPCB() worked above the IP layer, due to the inclusion of the inet_skb_param struct at the head of the {tcp,udp}_skb_cb structs, recent commit `971f10ec` ("tcp: better TCP_SKB_CB layout to reduce cache line misses") reordered the tcp_skb_cb struct and invalidated this IPCB() trick. This patch fixes the problem by creating a new function, cipso_v4_optptr(), which locates the CIPSO option inside the IP header without calling IPCB(). Unfortunately, this isn't as fast as a simple lookup so some additional tweaks were made to limit the use of this new function. Cc: <stable@vger.kernel.org> # 3.18 Reported-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Tested-by: Casey Schaufler <casey@schaufler-ca.com>	2015-02-11 14:46:37 -05:00
Trond Myklebust	c627d31ba0	SUNRPC: Cleanup to remove xs_tcp_close() xs_tcp_close() is now just a call to xs_tcp_shutdown(), so remove it, and replace the entry in xs_tcp_ops. Suggested-by: Anna Schumaker <anna.schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-10 11:06:04 -05:00
Fan Du	b0f9ca53cb	ipv4: Namespecify TCP PMTU mechanism Packetization Layer Path MTU Discovery works separately beside Path MTU Discovery at IP level, different net namespace has various requirements on which one to chose, e.g., a virutalized container instance would require TCP PMTU to probe an usable effective mtu for underlying tunnel, while the host would employ classical ICMP based PMTU to function. Hence making TCP PMTU mechanism per net namespace to decouple two functionality. Furthermore the probe base MSS should also be configured separately for each namespace. Signed-off-by: Fan Du <fan.du@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 18:45:00 -08:00
David S. Miller	2573beec56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:35:57 -08:00
Trond Myklebust	402e23b4ed	SUNRPC: Fix stupid typo in xs_sock_set_reuseport Yes, kernel_setsockopt() hates you for using a char argument. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 17:31:02 -05:00
Yuchung Cheng	531c94a968	tcp: don't include Fast Open option in SYN-ACK on pure SYN-data If a server has enabled Fast Open and it receives a pure SYN-data packet (without a Fast Open option), it won't accept the data but it incorrectly returns a SYN-ACK with a Fast Open cookie and also increments the SNMP stat LINUX_MIB_TCPFASTOPENPASSIVEFAIL. This patch makes the server include a Fast Open cookie in SYN-ACK only if the SYN has some Fast Open option (i.e., when client requests or presents a cookie). Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:26:55 -08:00
Thomas Graf	fd3137cd33	openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set This avoids setting TUNNEL_VXLAN_OPT for VXLAN frames which don't have any GBP metadata set. It is not invalid to set it but unnecessary. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:25:52 -08:00
Vlad Yasevich	8381eacf5c	ipv6: Make __ipv6_select_ident static Make __ipv6_select_ident() static as it isn't used outside the file. Fixes: `0508c07f5e` (ipv6: Select fragment id during UFO segmentation if not set.) Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:21:03 -08:00
Vlad Yasevich	51f30770e5	ipv6: Fix fragment id assignment on LE arches. Recent commit: `0508c07f5e` Author: Vlad Yasevich <vyasevich@gmail.com> Date: Tue Feb 3 16:36:15 2015 -0500 ipv6: Select fragment id during UFO segmentation if not set. Introduced a bug on LE in how ipv6 fragment id is assigned. This was cought by nightly sparce check: Resolve the following sparce error: net/ipv6/output_core.c:57:38: sparse: incorrect type in assignment (different base types) net/ipv6/output_core.c:57:38: expected restricted __be32 [usertype] ip6_frag_id net/ipv6/output_core.c:57:38: got unsigned int [unsigned] [assigned] [usertype] id Fixes: `0508c07f5e` (ipv6: Select fragment id during UFO segmentation if not set.) Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:21:03 -08:00
Toshiaki Makita	25d3b493a5	bridge: Fix inability to add non-vlan fdb entry Bridge's default_pvid adds a vid by default, by which we cannot add a non-vlan fdb entry by default, because br_fdb_add() adds fdb entries for all vlans instead of a non-vlan one when any vlan is configured. # ip link add br0 type bridge # ip link set eth0 master br0 # bridge fdb add 12:34:56:78:90:ab dev eth0 master temp # bridge fdb show brport eth0 \| grep 12:34:56:78:90:ab 12:34:56:78:90:ab dev eth0 vlan 1 static We expect a non-vlan fdb entry as well as vlan 1: 12:34:56:78:90:ab dev eth0 static To fix this, we need to insert a non-vlan fdb entry if vlan is not specified, even when any vlan is configured. Fixes: `5be5a2df40` ("bridge: Add filtering support for default_pvid") Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:19:04 -08:00
Andrew Lunn	b750f5b427	net: dsa: Remove redundant phy_attach() dsa_slave_phy_setup() finds the phy for the port via device tree and using of_phy_connect(), or it uses the fall back of taking a phy from the switch internal mdio bus and calling phy_connect_direct(). Either way, if a phy is found, phy_attach_direct() is called to attach the phy to the slave device. In dsa_slave_create(), a second call to phy_attach() is made. This results in the warning "PHY already attached". Remove this second, redundant attaching of the phy. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 14:05:42 -08:00
Richard Alpe	941787b829	tipc: remove tipc_snprintf tipc_snprintf() was heavily utilized by the old netlink API which no longer exists (now netlink compat). In this patch we swap tipc_snprintf() to the identical scnprintf() in the only remaining occurrence. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	22ae7cff50	tipc: nl compat add noop and remove legacy nl framework Add TIPC_CMD_NOOP to compat layer and remove the old framework. All legacy nl commands are now converted to the compat layer in netlink_compat.c. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	5a81a6377b	tipc: convert legacy nl stats show to nl compat Convert TIPC_CMD_SHOW_STATS to compat layer. This command does not have any counterpart in the new API, meaning it now solely exists as a function in the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	3c26181c5b	tipc: convert legacy nl net id get to nl compat Convert TIPC_CMD_GET_NETID to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	964f9501c1	tipc: convert legacy nl net id set to nl compat Convert TIPC_CMD_SET_NETID to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	d7cc75d3cb	tipc: convert legacy nl node addr set to nl compat Convert TIPC_CMD_SET_NODE_ADDR to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	4b28cb581d	tipc: convert legacy nl node dump to nl compat Convert TIPC_CMD_GET_NODES to compat dumpit and remove global node counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:49 -08:00
Richard Alpe	5bfc335a63	tipc: convert legacy nl media dump to nl compat Convert TIPC_CMD_GET_MEDIA_NAMES to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	487d2a3a13	tipc: convert legacy nl socket dump to nl compat Convert socket (port) listing to compat dumpit call. If a socket (port) has publications a second dumpit call is issued to collect them and format then into the legacy buffer before continuing to process the sockets (ports). Command converted in this patch: TIPC_CMD_SHOW_PORTS Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	44a8ae94fd	tipc: convert legacy nl name table dump to nl compat Add functionality for printing a dump header and convert TIPC_CMD_SHOW_NAME_TABLE to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	1817877b3c	tipc: convert legacy nl link stat reset to nl compat Convert TIPC_CMD_RESET_LINK_STATS to compat doit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	37e2d4843f	tipc: convert legacy nl link prop set to nl compat Convert setting of link proprieties to compat doit calls. Commands converted in this patch: TIPC_CMD_SET_LINK_TOL TIPC_CMD_SET_LINK_PRI TIPC_CMD_SET_LINK_WINDOW Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	357ebdbfca	tipc: convert legacy nl link dump to nl compat Convert TIPC_CMD_GET_LINKS to compat dumpit and remove global link counter solely used by the legacy API. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:48 -08:00
Richard Alpe	f2b3b2d4cc	tipc: convert legacy nl link stat to nl compat Add functionality for safely appending string data to a TLV without keeping write count in the caller. Convert TIPC_CMD_SHOW_LINK_STATS to compat dumpit. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	9ab154658a	tipc: convert legacy nl bearer enable/disable to nl compat Introduce a framework for transcoding legacy nl action into actions (.doit) calls from the new nl API. This is done by converting the incoming TLV data into netlink data with nested netlink attributes. Unfortunately due to the randomness of the legacy API we can't do this generically so each legacy netlink command requires a specific transcoding recipe. In this case for bearer enable and bearer disable. Convert TIPC_CMD_ENABLE_BEARER and TIPC_CMD_DISABLE_BEARER into doit compat calls. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	d0796d1ef6	tipc: convert legacy nl bearer dump to nl compat Introduce a framework for dumping netlink data from the new netlink API and formatting it to the old legacy API format. This is done by looping the dump data and calling a format handler for each entity, in this case a bearer. We dump until either all data is dumped or we reach the limited buffer size of the legacy API. Remember, the legacy API doesn't scale. In this commit we convert TIPC_CMD_GET_BEARER_NAMES to use the compat layer. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Richard Alpe	bfb3e5dd8d	tipc: move and rename the legacy nl api to "nl compat" The new netlink API is no longer "v2" but rather the standard API and the legacy API is now "nl compat". We split them into separate start/stop and put them in different files in order to further distinguish them. Signed-off-by: Richard Alpe <richard.alpe@ericsson.com> Reviewed-by: Erik Hugne <erik.hugne@ericsson.com> Reviewed-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-09 13:20:47 -08:00
Trond Myklebust	54c0987492	SUNRPC: Define xs_tcp_fin_timeout only if CONFIG_SUNRPC_DEBUG Now that the linger code is gone, the xs_tcp_fin_timeout variable has no real function. Keep it for now, since it is part of the /proc interface, but only define it if that /proc interface is enabled. Suggested-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:27:45 -05:00
Trond Myklebust	b70ae915e4	SUNRPC: Handle connection reset more efficiently. If the connection reset is due to an active call on our side, then the state change is sometimes not reported. Catch those instances using xs_error_report() instead. Also remove the xs_tcp_shutdown() call in xs_tcp_send_request() as the change in behaviour makes it redundant. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:27:42 -05:00
Trond Myklebust	9e2b9f3776	SUNRPC: Remove the redundant XPRT_CONNECTION_CLOSE flag Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:26:06 -05:00
Trond Myklebust	caf4ccd4e8	SUNRPC: Make xs_tcp_close() do a socket shutdown rather than a sock_release Use of socket shutdown() means that we monitor the shutdown process through the xs_tcp_state_change() callback, so it is preferable to a full close in all cases unless we're destroying the transport. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 11:20:44 -05:00
Trond Myklebust	0efeac261c	SUNRPC: Ensure xs_tcp_shutdown() requests a full close of the connection The previous behaviour left the connection half-open in order to try to scrape the last replies from the socket. Now that we have more reliable reconnection, change the behaviour to close down the socket faster. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 09:31:11 -05:00
Trond Myklebust	505936f59f	SUNRPC: Cleanup to remove remaining uses of XPRT_CONNECTION_ABORT Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 09:20:40 -05:00
Trond Myklebust	9cbc94fb06	SUNRPC: Remove TCP socket linger code Now that we no longer use the partial shutdown code when closing the socket, we no longer need to worry about the TCP linger2 state. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-09 09:20:40 -05:00
Steffen Klassert	044a832a77	xfrm: Fix local error reporting crash with interfamily tunnels We set the outer mode protocol too early. As a result, the local error handler might dispatch to the wrong address family and report the error to a wrong socket type. We fix this by setting the outer protocol to the skb after we accessed the inner mode for the last time, right before we do the atcual encapsulation where we switch finally to the outer mode. Reported-by: Chris Ruehl <chris.ruehl@gtsys.com.hk> Tested-by: Chris Ruehl <chris.ruehl@gtsys.com.hk> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-02-09 11:14:17 +01:00
Eric Dumazet	93c1af6ca9	net:rfs: adjust table size checking Make sure root user does not try something stupid. Also make sure mask field in struct rps_sock_flow_table does not share a cache line with the potentially often dirtied flow table. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: `567e4b7973` ("net: rfs: add hash collision detection") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 21:54:09 -08:00
Alexey Andriyanov	dd3733b3e7	ipvs: fix inability to remove a mixed-family RS The current code prevents any operation with a mixed-family dest unless IP_VS_CONN_F_TUNNEL flag is set. The problem is that it's impossible for the client to follow this rule, because ip_vs_genl_parse_dest does not even read the destination conn_flags when cmd = IPVS_CMD_DEL_DEST (need_full_dest = 0). Also, not every client can pass this flag when removing a dest. ipvsadm, for example, does not support the "-i" command line option together with the "-d" option. This change disables any checks for mixed-family on IPVS_CMD_DEL_DEST command. Signed-off-by: Alexey Andriyanov <alan@al-an.info> Fixes: `bc18d37f67` ("ipvs: Allow heterogeneous pools now that we support them") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-02-09 14:13:30 +09:00
Trond Myklebust	4efdd92c92	SUNRPC: Remove TCP client connection reset hack Instead we rely on SO_REUSEPORT to provide the reconnection semantics that we need for NFSv2/v3. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:30 -05:00
Trond Myklebust	de84d89030	SUNRPC: TCP/UDP always close the old socket before reconnecting It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket() or xs_tcp_setup_socket(), since they do not own the correct locks. Instead, do it in xs_connect(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:30 -05:00
Trond Myklebust	718ba5b873	SUNRPC: Add helpers to prevent socket create from racing The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up. This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:29 -05:00
Trond Myklebust	6cc7e90836	SUNRPC: Ensure xs_reset_transport() resets the close connection flags Otherwise, we may end up looping. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:28 -05:00
Trond Myklebust	76698b2358	SUNRPC: Do not clear the source port in xs_reset_transport Now that we can reuse bound ports after a close, we never really want to clear the transport's source port after it has been set. Doing so really messes up the NFSv3 DRC on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:28 -05:00
Trond Myklebust	3913c78c3a	SUNRPC: Handle EADDRINUSE on connect Now that we're setting SO_REUSEPORT, we still need to handle the case where a connect() is attempted, but the old socket is still lingering. Essentially, all we want to do here is handle the error by waiting a few seconds and then retrying. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 21:47:27 -05:00
Eric Dumazet	567e4b7973	net: rfs: add hash collision detection Receive Flow Steering is a nice solution but suffers from hash collisions when a mix of connected and unconnected traffic is received on the host, when flow hash table is populated. Also, clearing flow in inet_release() makes RFS not very good for short lived flows, as many packets can follow close(). (FIN , ACK packets, ...) This patch extends the information stored into global hash table to not only include cpu number, but upper part of the hash value. I use a 32bit value, and dynamically split it in two parts. For host with less than 64 possible cpus, this gives 6 bits for the cpu number, and 26 (32-6) bits for the upper part of the hash. Since hash bucket selection use low order bits of the hash, we have a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big enough. If the hash found in flow table does not match, we fallback to RPS (if it is enabled for the rxqueue). This means that a packet for an non connected flow can avoid the IPI through a unrelated/victim CPU. This also means we no longer have to clear the table at socket close time, and this helps short lived flows performance. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 16:53:57 -08:00
Sabrina Dubroca	3e97fa7059	gre/ipip: use be16 variants of netlink functions encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead of nla_{get,put}_u16. Fixes the sparse warnings: warning: incorrect type in assignment (different base types) expected restricted __be32 [addressable] [usertype] o_key got restricted __be16 [addressable] [usertype] i_flags warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] sport got unsigned short warning: incorrect type in assignment (different base types) expected restricted __be16 [usertype] dport got unsigned short warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] sport warning: incorrect type in argument 3 (different base types) expected unsigned short [unsigned] [usertype] value got restricted __be16 [usertype] dport Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 16:28:06 -08:00
Trond Myklebust	4dda9c8a5e	SUNRPC: Set SO_REUSEPORT socket option for TCP connections When using TCP, we need the ability to reuse port numbers after a disconnection, so that the NFSv3 server knows that we're the same client. Currently we use a hack to work around the TCP socket's TIME_WAIT: we send an RST instead of closing, which doesn't always work... The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple TCP connections to the same source address+port combination, and thus to use ordinary TCP close() instead of the current hack. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-02-08 18:52:11 -05:00
Jon Paul Maloy	51a00daf73	tipc: fix bug in socket reception function In commit `c637c10355` ("tipc: resolve race problem at unicast message reception") we introduced a time limit for how long the function tipc_sk_eneque() would be allowed to execute its loop. Unfortunately, the test for when this limit is passed was put in the wrong place, resulting in a lost message when the test is true. We fix this by moving the test to before we dequeue the next buffer from the input queue. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 13:09:25 -08:00
Michael Büsch	662f5533c4	rt6_probe_deferred: Do not depend on struct ordering rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred. But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work. This works, because struct work_struct is the first element of struct __rt6_probe_work. Change it to kfree struct __rt6_probe_work to not implicitly depend on struct work_struct being the first element. This does not affect the generated code. Signed-off-by: Michael Buesch <m@bues.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 13:00:43 -08:00
Trond Myklebust	bc3203cdca	NFS: RDMA Client Sparse Fixes This patch fixes a sparse warning in the initial submission. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJU09WdAAoJENfLVL+wpUDr2HMP/2rvjkncrlbumSUlkMMUPenZ PUM6dDAnvvhkRj/YrhmJ94v9SPb4BsR0ay4apn5aqmoEHqEy/LfAsoy1TlcspRO6 DwIKD8wHQ9RkpIVaEt1IGhApWlWK3R5FqoDTSMW9VLq72QuCsnbX3EXWN9B+hgT7 UVmcsGe0/5zd9v8RXzWJjyabOsx7rx7gNuuAyULO3RudNMzEpldMf7oHq0RwndOt akxSSvRSfMFyfJscfI3F2IJsbXBST+ZzzJ0oRXXWn0RfiWR+Z438Nk9HDuOfGBFv LACoELeJxtBEFAjBv6GnBILdSxRi6dTZPpAEzvNR45SJAk4c1gGDqgUbyIYXqeYM k8alEMCS+eKvDe7DL6N1Nlxq6oc5pAt7BYYpRzCz+2dd7gfHIOwurBnZwFgxN/rK GhtXCaBqqVr4lmu/lbgWjVFN/Jt6TGZ6+FqQhDM70CmJGBWbOEJNE6H50R7SiFpv 7UapXg9iknCpjexkZS3cyAB0Qu31qvIK64ZzsPhVafmSYh9ed5hlT2WSwphfDki5 ZQOktuPFtVkOI55Rz+2EpNNYe9nRuL5wBqD4Na3uUiEWgbnDLRmx+sDyzcfPujTN trFTwUO082SFK0yZVusZ04vlqjfzxSd4aYSeC5Zzgt0CYJCZd7z0tFrdLpyO/fDn PLfvcJyFRBBbaPgiwBHg =3fRY -----END PGP SIGNATURE----- Merge tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma NFS: RDMA Client Sparse Fixes This patch fixes a sparse warning in the initial submission. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> * tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma: xprtrdma: Address sparse complaint in rpcr_to_rdmar()	2015-02-08 10:37:34 -05:00
Neal Cardwell	4fb17a6091	tcp: mitigate ACK loops for connections as tcp_timewait_sock Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is represented by a tcp_timewait_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:13 -08:00
Neal Cardwell	f2b2c582e8	tcp: mitigate ACK loops for connections as tcp_sock Ensure that in state ESTABLISHED, where the connection is represented by a tcp_sock, we rate limit dupacks in response to incoming packets (a) with TCP timestamps that fail PAWS checks, or (b) with sequence numbers or ACK numbers that are out of the acceptable window. We do not send a dupack in response to out-of-window packets if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a dupack in response to an out-of-window packet. There is already a similar (although global) rate-limiting mechanism for "challenge ACKs". When deciding whether to send a challence ACK, we first consult the new per-connection rate limit, and then the global rate limit. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:12 -08:00
Neal Cardwell	a9b2c06dbe	tcp: mitigate ACK loops for connections as tcp_request_sock In the SYN_RECV state, where the TCP connection is represented by tcp_request_sock, we now rate-limit SYNACKs in response to a client's retransmitted SYNs: we do not send a SYNACK in response to client SYN if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we last sent a SYNACK in response to a client's retransmitted SYN. This allows the vast majority of legitimate client connections to proceed unimpeded, even for the most aggressive platforms, iOS and MacOS, which actually retransmit SYNs 1-second intervals for several times in a row. They use SYN RTO timeouts following the progression: 1,1,1,1,1,2,4,8,16,32. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:12 -08:00
Neal Cardwell	032ee42369	tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks Helpers for mitigating ACK loops by rate-limiting dupacks sent in response to incoming out-of-window packets. This patch includes: - rate-limiting logic - sysctl to control how often we allow dupacks to out-of-window packets - SNMP counter for cases where we rate-limited our dupack sending The rate-limiting logic in this patch decides to not send dupacks in response to out-of-window segments if (a) they are SYNs or pure ACKs and (b) the remote endpoint is sending them faster than the configured rate limit. We rate-limit our responses rather than blocking them entirely or resetting the connection, because legitimate connections can rely on dupacks in response to some out-of-window segments. For example, zero window probes are typically sent with a sequence number that is below the current window, and ZWPs thus expect to thus elicit a dupack in response. We allow dupacks in response to TCP segments with data, because these may be spurious retransmissions for which the remote endpoint wants to receive DSACKs. This is safe because segments with data can't realistically be part of ACK loops, which by their nature consist of each side sending pure/data-less ACKs to each other. The dupack interval is controlled by a new sysctl knob, tcp_invalid_ratelimit, given in milliseconds, in case an administrator needs to dial this upward in the face of a high-rate DoS attack. The name and units are chosen to be analogous to the existing analogous knob for ICMP, icmp_ratelimit. The default value for tcp_invalid_ratelimit is 500ms, which allows at most one such dupack per 500ms. This is chosen to be 2x faster than the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule 2.4). We allow the extra 2x factor because network delay variations can cause packets sent at 1 second intervals to be compressed and arrive much closer. Reported-by: Avery Fay <avery@mixpanel.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 01:03:12 -08:00
Pravin B Shelar	ca539345f8	openvswitch: Initialize unmasked key and uid len Flow alloc needs to initialize unmasked key pointer. Otherwise it can crash kernel trying to free random unmasked-key pointer. general protection fault: 0000 [#1] SMP 3.19.0-rc6-net-next+ #457 Hardware name: Supermicro X7DWU/X7DWU, BIOS 1.1 04/30/2008 RIP: 0010:[<ffffffff8111df0e>] [<ffffffff8111df0e>] kfree+0xac/0x196 Call Trace: [<ffffffffa060bd87>] flow_free+0x21/0x59 [openvswitch] [<ffffffffa060bde0>] ovs_flow_free+0x21/0x23 [openvswitch] [<ffffffffa0605b4a>] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch] [<ffffffffa0605995>] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch] [<ffffffff811fe6fb>] ? nla_parse+0x4f/0xec [<ffffffff8139a2fc>] genl_family_rcv_msg+0x26d/0x2c9 [<ffffffff8107620f>] ? __lock_acquire+0x90e/0x9aa [<ffffffff8139a3be>] genl_rcv_msg+0x66/0x89 [<ffffffff8139a358>] ? genl_family_rcv_msg+0x2c9/0x2c9 [<ffffffff81399591>] netlink_rcv_skb+0x3e/0x95 [<ffffffff81399898>] ? genl_rcv+0x18/0x37 [<ffffffff813998a7>] genl_rcv+0x27/0x37 [<ffffffff81399033>] netlink_unicast+0x103/0x191 [<ffffffff81399382>] netlink_sendmsg+0x2c1/0x310 [<ffffffff811007ad>] ? might_fault+0x50/0xa0 [<ffffffff8135c773>] do_sock_sendmsg+0x5f/0x7a [<ffffffff8135c799>] sock_sendmsg+0xb/0xd [<ffffffff8135cacf>] ___sys_sendmsg+0x1a3/0x218 [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86 [<ffffffff8115a9d0>] ? fsnotify+0x32c/0x348 [<ffffffff8115a720>] ? fsnotify+0x7c/0x348 [<ffffffff8113e5f5>] ? __fget+0xaa/0xbf [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86 [<ffffffff8135cccd>] __sys_sendmsg+0x3d/0x5e [<ffffffff8135cd02>] SyS_sendmsg+0x14/0x16 [<ffffffff81411852>] system_call_fastpath+0x12/0x17 Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.") CC: Joe Stringer <joestringer@nicira.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-08 00:51:14 -08:00
Roopa Prabhu	1fd0bddb61	bridge: add missing bridge port check for offloads This patch fixes a missing bridge port check caught by smatch. setlink/dellink of attributes like vlans can come for a bridge device and there is no need to offload those today. So, this patch adds a bridge port check. (In these cases however, the BRIDGE_SELF flags will always be set and we may not hit a problem with the current code). smatch complaint: The patch 68e331c785b8: "bridge: offload bridge port attributes to switch asic if feature flag set" from Jan 29, 2015, leads to the following Smatch complaint: net/bridge/br_netlink.c:552 br_setlink() error: we previously assumed 'p' could be null (see line 518) net/bridge/br_netlink.c 517 518 if (p && protinfo) { ^ Check for NULL. Reported-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:49:47 -08:00
Eric Dumazet	91e83133e7	net: use netif_rx_ni() from process context Hotpluging a cpu might be rare, yet we have to use proper handlers when taking over packets found in backlog queues. dev_cpu_callback() runs from process context, thus we should call netif_rx_ni() to properly invoke softirq handler. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:43:52 -08:00
Sowmini Varadhan	d0a47d3272	rds: Make rds_message_copy_from_user() return 0 on success. Commit `083735f4b0` ("rds: switch rds_message_copy_from_user() to iov_iter") breaks rds_message_copy_from_user() semantics on success, and causes it to return nbytes copied, when it should return 0. This commit fixes that bug. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:41:56 -08:00
Rasmus Villemoes	11ac11999b	net: rds: Remove repeated function names from debug output The macro rdsdebug is defined as pr_debug("%s(): " fmt, __func__ , ##args) Hence it doesn't make sense to include the name of the calling function explicitly in the format string passed to rdsdebug. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:41:56 -08:00
Jarno Rajahalme	83d2b9ba1a	net: openvswitch: Support masked set actions. OVS userspace already probes the openvswitch kernel module for OVS_ACTION_ATTR_SET_MASKED support. This patch adds the kernel module implementation of masked set actions. The existing set action sets many fields at once. When only a subset of the IP header fields, for example, should be modified, all the IP fields need to be exact matched so that the other field values can be copied to the set action. A masked set action allows modification of an arbitrary subset of the supported header bits without requiring the rest to be matched. Masked set action is now supported for all writeable key types, except for the tunnel key. The set tunnel action is an exception as any input tunnel info is cleared before action processing starts, so there is no tunnel info to mask. The kernel module converts all (non-tunnel) set actions to masked set actions. This makes action processing more uniform, and results in less branching and duplicating the action processing code. When returning actions to userspace, the fully masked set actions are converted back to normal set actions. We use a kernel internal action code to be able to tell the userspace provided and converted masked set actions apart. Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:40:17 -08:00
David S. Miller	3c09e92fb6	NFC: 3.20 second pull request This is the second NFC pull request for 3.20. It brings: - NCI NFCEE (NFC Execution Environment, typically an embedded or external secure element) discovery and enabling/disabling support. In order to communicate with an NFCEE, we also added NCI's logical connections support to the NCI stack. - HCI over NCI protocol support. Some secure elements only understand HCI and thus we need to send them HCI frames when they're part of an NCI chipset. - NFC_EVT_TRANSACTION userspace API addition. Whenever an application running on a secure element needs to notify its host counterpart, we send an NFC_EVENT_SE_TRANSACTION event to userspace through the NFC netlink socket. - Secure element and HCI transaction event support for the st21nfcb chipset. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJU06ozAAoJEIqAPN1PVmxKdtkP/2CeQ0+xJnJWgyh4xjkVxfPb wPYrz3cHeHdnVX36mnIdRoQZRjxh/Ef/vgK8RsXohcldzHA9D8GjxRpj0GrkVGQ9 xoDkiHsFDdsZHeezSrlnXNnObvZrkeMsmnUDJxfwLtcX1nzFKzBgW2GaDiNjUazC r5Ip4YIfoCaVLq5MSqYRqJ6uhplXd/ePdtPoo54L/E81y4ptQi8pUsQBy4K3GrFn FbXoemcymhi9AVBGl+OlppZ93t6bdOV1D5tcOwpytAbfa3jp2oUnJPZay7EoWruR aj8l5v3P+SSphAJwaGSbny0L83tTS7sq+N4QG+VxxdMkw2YjpmxgN3AGguNBtPGG SUThpZV6QhrkAOnwzP2BJnh68WSU1eJrGvk8zUWW0SanA8k2jaHO/VzXCA3/ZkC4 IFcXl4Jr6R3iUxDFgS/6MB5E9EGedzFTacsWoHVyZPtaDAQp4WVdAIRQ4QKgJCLw 1yRmCeVunTsrs6O0aenU1xHinPlnx+tkuiNh4H64APQYTz54WwXH1/S04OL1SkqI mTzUvFgWqJvSIOuU19g0HBw+xZ/9vvX8LLkm9kSTbe7tcmFH5CI7ZCN/hNDHvMSd sjNYOyag/SAHJ01tTKcSPAgWO49bHWPodI04zP0lK0gMLjKvRknVMdTgIDfu49HN 2da9E23iBmNNgG79iVHN =8caD -----END PGP SIGNATURE----- Merge tag 'nfc-next-3.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next NFC: 3.20 second pull request This is the second NFC pull request for 3.20. It brings: - NCI NFCEE (NFC Execution Environment, typically an embedded or external secure element) discovery and enabling/disabling support. In order to communicate with an NFCEE, we also added NCI's logical connections support to the NCI stack. - HCI over NCI protocol support. Some secure elements only understand HCI and thus we need to send them HCI frames when they're part of an NCI chipset. - NFC_EVT_TRANSACTION userspace API addition. Whenever an application running on a secure element needs to notify its host counterpart, we send an NFC_EVENT_SE_TRANSACTION event to userspace through the NFC netlink socket. - Secure element and HCI transaction event support for the st21nfcb chipset. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:22:25 -08:00
Daniel Borkmann	364d5716a7	rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY ifla_vf_policy[] is wrong in advertising its individual member types as NLA_BINARY since .type = NLA_BINARY in combination with .len declares the len member as max attribute length [0, len]. The issue is that when do_setvfinfo() is being called to set up a VF through ndo handler, we could set corrupted data if the attribute length is less than the size of the related structure itself. The intent is exactly the opposite, namely to make sure to pass at least data of minimum size of len. Fixes: `ebc08a6f47` ("rtnetlink: Add VF config code to rtnetlink") Cc: Mitch Williams <mitch.a.williams@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:13:10 -08:00
Tobias Waldekranz	e04449fcf2	dsa: correctly determine the number of switches in a system The number of connected switches was sourced from the number of children to the DSA node, change it to the number of available children, skipping any disabled switches. Fixes: `5e95329b70` ("dsa: add device tree bindings to register DSA switches") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-07 22:07:37 -08:00
Daniel Borkmann	11b1f8288d	ipv6: addrconf: add missing validate_link_af handler We still need a validate_link_af() handler with an appropriate nla policy, similarly as we have in IPv4 case, otherwise size validations are not being done properly in that case. Fixes: `f53adae4ea` ("net: ipv6: add tokenized interface identifier support") Fixes: `bc91b0f07a` ("ipv6: addrconf: implement address generation modes") Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-06 12:51:33 -08:00
Hajime Tazaki	cd3bafc73d	xfrm6: Fix a offset value for network header in _decode_session6 When a network-layer header has multiple IPv6 extension headers, then offset for mobility header goes wrong. This regression breaks an xfrm policy lookup for a particular receive packet. Binding update packets of Mobile IPv6 are all discarded without this fix. Fixes: `de3b7a06df` ("xfrm6: Fix transport header offset in _decode_session6.") Signed-off-by: Hajime Tazaki <tazaki@sfc.wide.ad.jp> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>	2015-02-06 07:00:32 +01:00
Jon Paul Maloy	cb1b728096	tipc: eliminate race condition at multicast reception In a previous commit in this series we resolved a race problem during unicast message reception. Here, we resolve the same problem at multicast reception. We apply the same technique: an input queue serializing the delivery of arriving buffers. The main difference is that here we do it in two steps. First, the broadcast link feeds arriving buffers into the tail of an arrival queue, which head is consumed at the socket level, and where destination lookup is performed. Second, if the lookup is successful, the resulting buffer clones are fed into a second queue, the input queue. This queue is consumed at reception in the socket just like in the unicast case. Both queues are protected by the same lock, -the one of the input queue. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:03 -08:00
Jon Paul Maloy	3c724acdd5	tipc: simplify socket multicast reception The structure 'tipc_port_list' is used to collect port numbers representing multicast destination socket on a receiving node. The list is not based on a standard linked list, and is in reality optimized for the uncommon case that there are more than one multicast destinations per node. This makes the list handling unecessarily complex, and as a consequence, even the socket multicast reception becomes more complex. In this commit, we replace 'tipc_port_list' with a new 'struct tipc_plist', which is based on a standard list. We give the new list stack (push/pop) semantics, someting that simplifies the implementation of the function tipc_sk_mcast_rcv(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:03 -08:00
Jon Paul Maloy	708ac32cb5	tipc: simplify connection abort notifications when links break The new input message queue in struct tipc_link can be used for delivering connection abort messages to subscribing sockets. This makes it possible to simplify the code for such cases. This commit removes the temporary list in tipc_node_unlock() used for transforming abort subscriptions to messages. Instead, the abort messages are now created at the moment of lost contact, and then added to the last failed link's generic input queue for delivery to the sockets concerned. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	c637c10355	tipc: resolve race problem at unicast message reception TIPC handles message cardinality and sequencing at the link layer, before passing messages upwards to the destination sockets. During the upcall from link to socket no locks are held. It is therefore possible, and we see it happen occasionally, that messages arriving in different threads and delivered in sequence still bypass each other before they reach the destination socket. This must not happen, since it violates the sequentiality guarantee. We solve this by adding a new input buffer queue to the link structure. Arriving messages are added safely to the tail of that queue by the link, while the head of the queue is consumed, also safely, by the receiving socket. Sequentiality is secured per socket by only allowing buffers to be dequeued inside the socket lock. Since there may be multiple simultaneous readers of the queue, we use a 'filter' parameter to reduce the risk that they peek the same buffer from the queue, hence also reducing the risk of contention on the receiving socket locks. This solves the sequentiality problem, and seems to cause no measurable performance degradation. A nice side effect of this change is that lock handling in the functions tipc_rcv() and tipc_bcast_rcv() now becomes uniform, something that will enable future simplifications of those functions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	94153e36e7	tipc: use existing sk_write_queue for outgoing packet chain The list for outgoing traffic buffers from a socket is currently allocated on the stack. This forces us to initialize the queue for each sent message, something costing extra CPU cycles in the most critical data path. Later in this series we will introduce a new safe input buffer queue, something that would force us to initialize even the spinlock of the outgoing queue. A closer analysis reveals that the queue always is filled and emptied within the same lock_sock() session. It is therefore safe to use a queue aggregated in the socket itself for this purpose. Since there already exists a queue for this in struct sock, sk_write_queue, we introduce use of that queue in this commit. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	e3a77561e7	tipc: split up function tipc_msg_eval() The function tipc_msg_eval() is in reality doing two related, but different tasks. First it tries to find a new destination for named messages, in case there was no first lookup, or if the first lookup failed. Second, it does what its name suggests, evaluating the validity of the message and its destination, and returning an appropriate error code depending on the result. This is confusing, and in this commit we choose to break it up into two functions. A new function, tipc_msg_lookup_dest(), first attempts to find a new destination, if the message is of the right type. If this lookup fails, or if the message should not be subject to a second lookup, the already existing tipc_msg_reverse() is called. This function performs prepares the message for rejection, if applicable. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	d570d86497	tipc: enqueue arrived buffers in socket in separate function The code for enqueuing arriving buffers in the function tipc_sk_rcv() contains long code lines and currently goes to two indentation levels. As a cosmetic preparaton for the next commits, we break it out into a separate function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:02 -08:00
Jon Paul Maloy	1186adf7df	tipc: simplify message forwarding and rejection in socket layer Despite recent improvements, the handling of error codes and return values at reception of messages in the socket layer is still confusing. In this commit, we try to make it more comprehensible. First, we separate between the return values coming from the functions called by tipc_sk_rcv(), -those are TIPC specific error codes, and the return values returned by tipc_sk_rcv() itself. Second, we don't use the returned TIPC error code as indication for whether a buffer should be forwarded/rejected or not; instead we use the buffer pointer passed along with filter_msg(). This separation is necessary because we sometimes want to forward messages even when there is no error (i.e., protocol messages and successfully secondary looked up data messages). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:01 -08:00
Jon Paul Maloy	c5898636c4	tipc: reduce usage of context info in socket and link The most common usage of namespace information is when we fetch the own node addess from the net structure. This leads to a lot of passing around of a parameter of type 'struct net *' between functions just to make them able to obtain this address. However, in many cases this is unnecessary. The own node address is readily available as a member of both struct tipc_sock and tipc_link, and can be fetched from there instead. The fact that the vast majority of functions in socket.c and link.c anyway are maintaining a pointer to their respective base structures makes this option even more compelling. In this commit, we introduce the inline functions tsk_own_node() and link_own_node() to make it easy for functions to fetch the node address from those structs instead of having to pass along and dereference the namespace struct. In particular, we make calls to the msg_xx() functions in msg.{h,c} context independent by directly passing them the own node address as parameter when needed. Those functions should be regarded as leaves in the code dependency tree, and it is hence desirable to keep them namspace unaware. Apart from a potential positive effect on cache behavior, these changes make it easier to introduce the changes that will follow later in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 16:00:01 -08:00
Sabrina Dubroca	7744b5f369	pktgen: fix UDP checksum computation This patch fixes two issues in UDP checksum computation in pktgen. First, the pseudo-header uses the source and destination IP addresses. Currently, the ports are used for IPv4. Second, the UDP checksum covers both header and data. So we need to generate the data earlier (move pktgen_finalize_skb up), and compute the checksum for UDP header + data. Fixes: `c26bf4a513` ("pktgen: Add UDPCSUM flag to support UDP checksums") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 15:41:34 -08:00
Erik Kline	c58da4c659	net: ipv6: allow explicitly choosing optimistic addresses RFC 4429 ("Optimistic DAD") states that optimistic addresses should be treated as deprecated addresses. From section 2.1: Unless noted otherwise, components of the IPv6 protocol stack should treat addresses in the Optimistic state equivalently to those in the Deprecated state, indicating that the address is available for use but should not be used if another suitable address is available. Optimistic addresses are indeed avoided when other addresses are available (i.e. at source address selection time), but they have not heretofore been available for things like explicit bind() and sendmsg() with struct in6_pktinfo, etc. This change makes optimistic addresses treated more like deprecated addresses than tentative ones. Signed-off-by: Erik Kline <ek@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 15:37:41 -08:00
Miroslav Urbanek	233c96fc07	flowcache: Fix kernel panic in flow_cache_flush_task flow_cache_flush_task references a structure member flow_cache_gc_work where it should reference flow_cache_flush_task instead. Kernel panic occurs on kernels using IPsec during XFRM garbage collection. The garbage collection interval can be shortened using the following sysctl settings: net.ipv4.xfrm4_gc_thresh=4 net.ipv6.xfrm6_gc_thresh=4 With the default settings, our productions servers crash approximately once a week. With the settings above, they crash immediately. Fixes: `ca925cf153` ("flowcache: Make flow cache name space aware") Reported-by: Tomáš Charvát <tc@excello.cz> Tested-by: Jan Hejl <jh@excello.cz> Signed-off-by: Miroslav Urbanek <mu@miroslavurbanek.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 14:38:53 -08:00
David S. Miller	6e03f896b5	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/vxlan.c drivers/vhost/net.c include/linux/if_vlan.h net/core/dev.c The net/core/dev.c conflict was the overlap of one commit marking an existing function static whilst another was adding a new function. In the include/linux/if_vlan.h case, the type used for a local variable was changed in 'net', whereas the function got rewritten to fix a stacked vlan bug in 'net-next'. In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next' overlapped with an endainness fix for VHOST 1.0 in 'net'. In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter in 'net-next' whereas in 'net' there was a bug fix to pass in the correct network namespace pointer in calls to this function. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 14:33:28 -08:00
Chuck Lever	b625a61698	xprtrdma: Address sparse complaint in rpcr_to_rdmar() With "make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__": linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: warning: incorrect type in initializer (different base types) linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: expected restricted __be32 [usertype] buffer linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: got unsigned int [usertype] rq_buffer As far as I can tell this is a false positive. Reported-by: kbuild-all@01.org Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-02-05 15:38:29 -05:00
Eric Dumazet	a409caecb2	sit: fix some __be16/u16 mismatches Fixes following sparse warnings : net/ipv6/sit.c:1509:32: warning: incorrect type in assignment (different base types) net/ipv6/sit.c:1509:32: expected restricted __be16 [usertype] sport net/ipv6/sit.c:1509:32: got unsigned short net/ipv6/sit.c:1514:32: warning: incorrect type in assignment (different base types) net/ipv6/sit.c:1514:32: expected restricted __be16 [usertype] dport net/ipv6/sit.c:1514:32: got unsigned short net/ipv6/sit.c:1711:38: warning: incorrect type in argument 3 (different base types) net/ipv6/sit.c:1711:38: expected unsigned short [unsigned] [usertype] value net/ipv6/sit.c:1711:38: got restricted __be16 [usertype] sport net/ipv6/sit.c:1713:38: warning: incorrect type in argument 3 (different base types) net/ipv6/sit.c:1713:38: expected unsigned short [unsigned] [usertype] value net/ipv6/sit.c:1713:38: got restricted __be16 [usertype] dport Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 00:43:14 -08:00
Eric Dumazet	2ce1ee1780	net: remove some sparse warnings netdev_adjacent_add_links() and netdev_adjacent_del_links() are static. queue->qdisc has __rcu annotation, need to use RCU_INIT_POINTER() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 00:41:17 -08:00
Sabrina Dubroca	d1e158e2d7	ip6_gre: fix endianness errors in ip6gre_err info is in network byte order, change it back to host byte order before use. In particular, the current code sets the MTU of the tunnel to a wrong (too big) value. Fixes: `c12b395a46` ("gre: Support GRE over IPv6") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-02-05 00:33:10 -08:00

1 2 3 4 5 ...

36535 Commits