OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Vivien Didelot	1525c386a1	net: switchdev: change fdb addr for a byte array The address in the switchdev_obj_fdb structure is currently represented as a pointer. Replacing it for a 6-byte array allows switchdev to carry addresses directly read from hardware registers, not stored by the switch chip driver (as in Rocker). Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:48:08 -07:00
Masanari Iida	4933d85c51	net:wimax: Fix doucble word "the the" in networking.xml This patch fix a double word "the the" in Documentation/DocBook/networking.xml and Documentation/DocBook/networking/API-Wimax-report-rfkill-sw.html. These files are generated from comment in source, so I had to fix the typo in net/wimax/io-rfkill.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-09 22:43:52 -07:00
Tom Herbert	10e4ea7514	net: Fix race condition in store_rps_map There is a race condition in store_rps_map that allows jump label count in rps_needed to go below zero. This can happen when concurrently attempting to set and a clear map. Scenario: 1. rps_needed count is zero 2. New map is assigned by setting thread, but rps_needed count _not_ yet incremented (rps_needed count still zero) 2. Map is cleared by second thread, old_map set to that just assigned 3. Second thread performs static_key_slow_dec, rps_needed count now goes negative Fix is to increment or decrement rps_needed under the spinlock. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 15:56:56 -07:00
David S. Miller	95a428f3be	Included changes: - prevent DAT from replying on behalf of local clients and confuse L2 bridges - fix crash on double list removal of TT objects (tt_local_entry) - fix crash due to missing NULL checks - initialize bw values for new GWs objects to prevent memory leak -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQIcBAABCAAGBQJVwdK+AAoJEOb/4TMchkvfO9UP/i8vIBZ0XabuYlG9r5GnYh8s 1WvO7jbe4zYhe3WrfAibcfcxMT6v0aI6hc/VRCb36zTNT3VJeh76+fKGwLC4ZU08 UdefhzCavV/gg2fX0b67aA/SsC6I9FrZAG8VOt5GXhMWqjZYSwbxMKLrNe/QRJks hUzfcQbmFjIMSTK2Zf4/MV0AHzzPKbHixE8t3to2DmlHhZ6lCWV5l42nv/H/s8js uxNI06rnQBY7nY06rZEualznEZpN7AFtFeIw9zXDZ0PgkIPrsjShwu/ZyzVK9wg1 4ld1JQjf1uKbrOqvdY4bAx8EUUQQ3n68513IXuKLgAdPcBGYVFo8atbk9K24QPue KzO5tqg7ELf6EO1/haH/VLOTCrMxiC1C62CuNxQn1rDIUAIUrtX9Rqt2f2bS75KG tGNJTUfQfpNCY4IlqaUE8cdzABiY6cLIf68gSmXl/FfBzVNWDc1Totsi/B6C2WEM PYjXnmPc97tz0/iWRpvOMd1DxoMXnM0odH4gifUt3Osl0hKMbelhnGKACCvHM8jK 3EOOz1yZIEoX9OkwtFnCji+web884ASUJjnyrNJgbyOY3juj2B0hSrWgK++trUMS +5oZMkFtHaar0l+JRc2Xhf/GqtkY50lGlwcPlOukDyKJ+RvPTRsShIYhns75Z2us dWqAlJEdf3kRZuDri8O3 =JfaB -----END PGP SIGNATURE----- Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge Antonio Quartulli says: ==================== Included changes: - prevent DAT from replying on behalf of local clients and confuse L2 bridges - fix crash on double list removal of TT objects (tt_local_entry) - fix crash due to missing NULL checks - initialize bw values for new GWs objects to prevent memory leak ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 15:51:10 -07:00
Wenyu Zhang	e05176a328	openvswitch: Make 100 percents packets sampled when sampling rate is 1. When sampling rate is 1, the sampling probability is UINT32_MAX. The packet should be sampled even the prandom32() generate the number of UINT32_MAX. And none packet need be sampled when the probability is 0. Signed-off-by: Wenyu Zhang <wenyuz@vmware.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 12:00:38 -07:00
Alexei Starovoitov	da8b43c0e1	vxlan: combine VXLAN_FLOWBASED into VXLAN_COLLECT_METADATA IFLA_VXLAN_FLOWBASED is useless without IFLA_VXLAN_COLLECT_METADATA, so combine them into single IFLA_VXLAN_COLLECT_METADATA flag. 'flowbased' doesn't convey real meaning of the vxlan tunnel mode. This mode can be used by routing, tc+bpf and ovs. Only ovs is strictly flow based, so 'collect metadata' is a better name for this tunnel mode. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 11:46:34 -07:00
Sowmini Varadhan	467fa15356	RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns. Register pernet subsys init/stop functions that will set up and tear down per-net RDS-TCP listen endpoints. Unregister pernet subusys functions on 'modprobe -r' to clean up these end points. Enable keepalive on both accept and connect socket endpoints. The keepalive timer expiration will ensure that client socket endpoints will be removed as appropriate from the netns when an interface is removed from a namespace. Register a device notifier callback that will clean up all sockets (and thus avoid the need to wait for keepalive timeout) when the loopback device is unregistered from the netns indicating that the netns is getting deleted. Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 11:29:58 -07:00
Sowmini Varadhan	d5a8ac28a7	RDS-TCP: Make RDS-TCP work correctly when it is set up in a netns other than init_net Open the sockets calling sock_create_kern() with the correct struct net pointer, and use that struct net pointer when verifying the address passed to rds_bind(). Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-07 11:29:57 -07:00
Andreas Schultz	3499abb249	netfilter: nfacct: per network namespace support - Move the nfnl_acct_list into the network namespace, initialize and destroy it per namespace - Keep track of refcnt on nfacct objects, the old logic does not longer work with a per namespace list - Adjust xt_nfacct to pass the namespace when registring objects Signed-off-by: Andreas Schultz <aschultz@tpip.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:50:56 +02:00
Pablo Neira Ayuso	d2168e849e	netfilter: nft_limit: add per-byte limiting This patch adds a new NFTA_LIMIT_TYPE netlink attribute to indicate the type of limiting. Contrary to per-packet limiting, the cost is calculated from the packet path since this depends on the packet length. The burst attribute indicates the number of bytes in which the rate can be exceeded. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:50:50 +02:00
Pablo Neira Ayuso	8bdf362642	netfilter: nft_limit: constant token cost per packet The cost per packet can be calculated from the control plane path since this doesn't ever change. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:50 +02:00
Pablo Neira Ayuso	3e87baafa4	netfilter: nft_limit: add burst parameter This patch adds the burst parameter. This burst indicates the number of packets that can exceed the limit. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:50 +02:00
Pablo Neira Ayuso	f8d3a6bc76	netfilter: nft_limit: factor out shared code with per-byte limiting This patch prepares the introduction of per-byte limiting. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:49 +02:00
Pablo Neira Ayuso	dba27ec1bc	netfilter: nft_limit: convert to token-based limiting at nanosecond granularity Rework the limit expression to use a token-based limiting approach that refills the bucket gradually. The tokens are calculated at nanosecond granularity instead jiffies to improve precision. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:49 +02:00
Pablo Neira Ayuso	09e4e42a00	netfilter: nft_limit: rename to nft_limit_pkts To prepare introduction of bytes ratelimit support. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:49 +02:00
Pablo Neira Ayuso	d877f07112	netfilter: nf_tables: add nft_dup expression This new expression uses the nf_dup engine to clone packets to a given gateway. Unlike xt_TEE, we use an index to indicate output interface which should be fine at this stage. Moreover, change to the preemtion-safe this_cpu_read(nf_skb_duplicated) from nf_dup_ipv{4,6} to silence a lockdep splat. Based on the original tee expression from Arturo Borrero Gonzalez, although this patch has diverted quite a bit from this initial effort due to the change to support maps. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:49 +02:00
Pablo Neira Ayuso	bbde9fc182	netfilter: factor out packet duplication for IPv4/IPv6 Extracted from the xtables TEE target. This creates two new modules for IPv4 and IPv6 that are shared between the TEE target and the new nf_tables dup expressions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:49 +02:00
Pablo Neira Ayuso	24b7811fa5	netfilter: xt_TEE: get rid of WITH_CONNTRACK definition Use IS_ENABLED(CONFIG_NF_CONNTRACK) instead. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:48 +02:00
Pablo Neira Ayuso	0c45e76960	netfilter: nft_counter: convert it to use per-cpu counters This patch converts the existing seqlock to per-cpu counters. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-07 11:49:48 +02:00
Nikolay Aleksandrov	786c2077ec	bridge: netlink: account for the IFLA_BRPORT_PROXYARP_WIFI attribute size and policy The attribute size wasn't accounted for in the get_slave_size() callback (br_port_get_slave_size) when it was introduced, so fix it now. Also add a policy entry for it in br_port_policy. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: `842a9ae08a` ("bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 23:54:42 -07:00
Nikolay Aleksandrov	355b9f9df1	bridge: netlink: account for the IFLA_BRPORT_PROXYARP attribute size and policy The attribute size wasn't accounted for in the get_slave_size() callback (br_port_get_slave_size) when it was introduced, so fix it now. Also add a policy entry for it in br_port_policy. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: `958501163d` ("bridge: Add support for IEEE 802.11 Proxy ARP") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 23:54:42 -07:00
Oleg Nesterov	7ba8bd75dd	net: pktgen: don't abuse current->state in pktgen_thread_worker() Commit `1fbe4b46ca` "net: pktgen: kill the Wait for kthread_stop code in pktgen_thread_worker()" removed (in particular) the final __set_current_state(TASK_RUNNING) and I didn't notice the previous set_current_state(TASK_INTERRUPTIBLE). This triggers the warning in __might_sleep() after return. Afaics, we can simply remove both set_current_state()'s, and we could do this a long ago right after `ef87979c27` "pktgen: better scheduler friendliness" which changed pktgen_thread_worker() to use wait_event_interruptible_timeout(). Reported-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 23:52:44 -07:00
Roopa Prabhu	3dcb615e68	af_mpls: add null dev check in find_outdev This patch adds null dev check for the 'cfg->rc_via_table == NEIGH_LINK_TABLE or dev_get_by_index() failed' case Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 22:03:58 -07:00
Dan Carpenter	5a9348b54d	mpls: small cleanup in inet/inet6_fib_lookup_dev() We recently changed this code from returning NULL to returning ERR_PTR. There are some left over NULL assignments which we can remove. We can preserve the error code from ip_route_output() instead of always returning -ENODEV. Also these functions use a mix of gotos and direct returns. There is no cleanup necessary so I changed the gotos to direct returns. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 21:56:55 -07:00
Herbert Xu	a0a2a66024	net: Fix skb_set_peeked use-after-free bug The commit `738ac1ebb9` ("net: Clone skb before setting peeked flag") introduced a use-after-free bug in skb_recv_datagram. This is because skb_set_peeked may create a new skb and free the existing one. As it stands the caller will continue to use the old freed skb. This patch fixes it by making skb_set_peeked return the new skb (or the old one if unchanged). Fixes: `738ac1ebb9` ("net: Clone skb before setting peeked flag") Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Brenden Blanco <bblanco@plumgrid.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-06 21:55:47 -07:00
Jakub Pawlowski	cb92205bad	Bluetooth: fix MGMT_EV_NEW_LONG_TERM_KEY event This patch fixes how MGMT_EV_NEW_LONG_TERM_KEY event is build. Right now val vield is filled with only 1 byte, instead of whole value. This bug was introduced in commit `1fc62c526a` ("Bluetooth: Fix exposing full value of shortened LTKs") Before that patch, if you paired with device using bluetoothd using simple pairing, and then restarted bluetoothd, you would be able to re-connect, but device would fail to establish encryption and would terminate connection. After this patch connecting after bluetoothd restart works fine. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-08-06 16:36:03 +02:00
Devesh Sharma	d0f36c46de	xprtrdma: take HCA driver refcount at client This is a rework of the following patch sent almost a year back: http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html In presence of active mount if someone tries to rmmod vendor-driver, the command remains stuck forever waiting for destruction of all rdma-cm-id. in worst case client can crash during shutdown with active mounts. The existing code assumes that ia->ri_id->device cannot change during the lifetime of a transport. xprtrdma do not have support for DEVICE_REMOVAL event either. Lifting that assumption and adding support for DEVICE_REMOVAL event is a long chain of work, and is in plan. The community decided that preventing the hang right now is more important than waiting for architectural changes. Thus, this patch introduces a temporary workaround to acquire HCA driver module reference count during the mount of a nfs-rdma mount point. Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:22:56 -04:00
Chuck Lever	860477d1ff	xprtrdma: Count RDMA_NOMSG type calls RDMA_NOMSG type calls are less efficient than RDMA_MSG. Count NOMSG calls so administrators can tell if they happen to be used more than expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:28 -04:00
Chuck Lever	763f7e4e4b	xprtrdma: Clean up xprt_rdma_print_stats() checkpatch.pl complained about the seq_printf() format string split across lines and the use of %Lu. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:28 -04:00
Chuck Lever	2fcc213a18	xprtrdma: Fix large NFS SYMLINK calls Repair how rpcrdma_marshal_req() chooses which RDMA message type to use for large non-WRITE operations so that it picks RDMA_NOMSG in the correct situations, and sets up the marshaling logic to SEND only the RPC/RDMA header. Large NFSv2 SYMLINK requests now use RDMA_NOMSG calls. The Linux NFS server XDR decoder for NFSv2 SYMLINK does not handle having the pathname argument arrive in a separate buffer. The decoder could be fixed, but this is simpler and RDMA_NOMSG can be used in a variety of other situations. Ensure that the Linux client continues to use "RDMA_MSG + read list" when sending large NFSv3 SYMLINK requests, which is more efficient than using RDMA_NOMSG. Large NFSv4 CREATE(NF4LNK) requests are changed to use "RDMA_MSG + read list" just like NFSv3 (see Section 5 of RFC 5667). Before, these did not work at all. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:28 -04:00
Chuck Lever	677eb17e94	xprtrdma: Fix XDR tail buffer marshalling Currently xprtrdma appends an extra chunk element to the RPC/RDMA read chunk list of each NFSv4 WRITE compound. The extra element contains the final GETATTR operation in the compound. The result is an extra RDMA READ operation to transfer a very short piece of each NFS WRITE compound (typically 16 bytes). This is inefficient. It is also incorrect. The client is sending the trailing GETATTR at the same Position as the preceding WRITE data payload. Whether or not RFC 5667 allows the GETATTR to appear in a read chunk, RFC 5666 requires that these two separate RPC arguments appear at two distinct Positions. It can also be argued that the GETATTR operation is not bulk data, and therefore RFC 5667 forbids its appearance in a read chunk at all. Although RFC 5667 is not precise about when using a read list with NFSv4 COMPOUND is allowed, the intent is that only data arguments not touched by NFS (ie, read and write payloads) are to be sent using RDMA READ or WRITE. The NFS client constructs GETATTR arguments itself, and therefore is required to send the trailing GETATTR operation as additional inline content, not as a data payload. NB: This change is not backwards compatible. Some older servers do not accept inline content following the read list. The Linux NFS server should handle this content correctly as of commit `a97c331f9a` ("svcrdma: Handle additional inline content"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:28 -04:00
Chuck Lever	33943b2974	xprtrdma: Don't provide a reply chunk when expecting a short reply Currently Linux always offers a reply chunk, even when the reply can be sent inline (ie. is smaller than 1KB). On the client, registering a memory region can be expensive. A server may choose not to use the reply chunk, wasting the cost of the registration. This is a change only for RPC replies smaller than 1KB which the server constructs in the RPC reply send buffer. Because the elements of the reply must be XDR encoded, a copy-free data transfer has no benefit in this case. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:27 -04:00
Chuck Lever	02eb57d8f4	xprtrdma: Always provide a write list when sending NFS READ The client has been setting up a reply chunk for NFS READs that are smaller than the inline threshold. This is not efficient: both the server and client CPUs have to copy the reply's data payload into and out of the memory region that is then transferred via RDMA. Using the write list, the data payload is moved by the device and no extra data copying is necessary. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-By: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:27 -04:00
Chuck Lever	5457ced0b5	xprtrdma: Account for RPC/RDMA header size when deciding to inline When the size of the RPC message is near the inline threshold (1KB), the client would allow messages to be sent that were a few bytes too large. When marshaling RPC/RDMA requests, ensure the combined size of RPC/RDMA header and RPC header do not exceed the inline threshold. Endpoints typically reject RPC/RDMA messages that exceed the size of their receive buffers. The two server implementations I test with (Linux and Solaris) use receive buffers that are larger than the client’s inline threshold. Thus so far this has been benign, observed only by code inspection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:27 -04:00
Chuck Lever	b3221d6a53	xprtrdma: Remove logic that constructs RDMA_MSGP type calls RDMA_MSGP type calls insert a zero pad in the middle of the RPC message to align the RPC request's data payload to the server's alignment preferences. A server can then "page flip" the payload into place to avoid a data copy in certain circumstances. However: 1. The client has to have a priori knowledge of the server's preferred alignment 2. Requests eligible for RDMA_MSGP are requests that are small enough to have been sent inline, and convey a data payload at the _end_ of the RPC message Today 1. is done with a sysctl, and is a global setting that is copied during mount. Linux does not support CCP to query the server's preferences (RFC 5666, Section 6). A small-ish NFSv3 WRITE might use RDMA_MSGP, but no NFSv4 compound fits bullet 2. Thus the Linux client currently leaves RDMA_MSGP disabled. The Linux server handles RDMA_MSGP, but does not use any special page flipping, so it confers no benefit. Clean up the marshaling code by removing the logic that constructs RDMA_MSGP type calls. This also reduces the maximum send iovec size from four to just two elements. /proc/sys/sunrpc/rdma_inline_write_padding is a kernel API, and thus is left in place. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:27 -04:00
Chuck Lever	d1ed857e57	xprtrdma: Clean up rpcrdma_ia_open() Untangle the end of rpcrdma_ia_open() by moving DMA MR set-up, which is different for each registration method, to the .ro_open functions. This is refactoring only. No behavior change is expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:27 -04:00
Chuck Lever	e531dcabec	xprtrdma: Remove last ib_reg_phys_mr() call site All HCA providers have an ib_get_dma_mr() verb. Thus rpcrdma_ia_open() will either grab the device's local_dma_key if one is available, or it will call ib_get_dma_mr(). If ib_get_dma_mr() fails, rpcrdma_ia_open() fails and no transport is created. Therefore execution never reaches the ib_reg_phys_mr() call site in rpcrdma_register_internal(), so it can be removed. The remaining logic in rpcrdma_{de}register_internal() is folded into rpcrdma_{alloc,free}_regbuf(). This is clean up only. No behavior change is expected. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-By: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:26 -04:00
Chuck Lever	d231093023	xprtrdma: Don't fall back to PHYSICAL memory registration PHYSICAL memory registration uses a single rkey for all of the client's memory, thus is insecure. It is still useful in some cases for testing. Retain the ability to select PHYSICAL memory registration capability via /proc/sys/sunrpc/rdma_memreg_strategy, but don't fall back to it if the HCA does not support FRWR or FMR. This means amso1100 no longer works out of the box with NFS/RDMA. When using amso1100 HCAs, set the memreg_strategy sysctl to 6 before performing NFS/RDMA mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:26 -04:00
Chuck Lever	864be126fe	xprtrdma: Raise maximum payload size to one megabyte The point of larger rsize and wsize is to reduce the per-byte cost of memory registration and deregistration. Modern HCAs can typically handle a megabyte or more with a single registration operation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-By: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:26 -04:00
Chuck Lever	5231eb9773	xprtrdma: Make xprt_setup_rdma() agnostic to family of server address In particular, recognize when an IPv6 connection is bound. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>	2015-08-05 16:21:26 -04:00
Joe Stringer	f58e5aa7b8	netfilter: conntrack: Use flags in nf_ct_tmpl_alloc() The flags were ignored for this function when it was introduced. Also fix the style problem in kzalloc. Fixes: `0838aa7fc` (netfilter: fix netns dependencies with conntrack templates) Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-05 10:56:43 +02:00
David S. Miller	9dc20a6496	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next, they are: 1) A couple of cleanups for the netfilter core hook from Eric Biederman. 2) Net namespace hook registration, also from Eric. This adds a dependency with the rtnl_lock. This should be fine by now but we have to keep an eye on this because if we ever get the per-subsys nfnl_lock before rtnl we have may problems in the future. But we have room to remove this in the future by propagating the complexity to the clients, by registering hooks for the init netns functions. 3) Update nf_tables to use the new net namespace hook infrastructure, also from Eric. 4) Three patches to refine and to address problems from the new net namespace hook infrastructure. 5) Switch to alternate jumpstack in xtables iff the packet is reentering. This only applies to a very special case, the TEE target, but Eric Dumazet reports that this is slowing down things for everyone else. So let's only switch to the alternate jumpstack if the tee target is in used through a static key. This batch also comes with offline precalculation of the jumpstack based on the callchain depth. From Florian Westphal. 6) Minimal SCTP multihoming support for our conntrack helper, from Michal Kubecek. 7) Reduce nf_bridge_info per skbuff scratchpad area to 32 bytes, from Florian Westphal. 8) Fix several checkpatch errors in bridge netfilter, from Bernhard Thaler. 9) Get rid of useless debug message in ip6t_REJECT, from Subash Abhinov. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-04 23:57:45 -07:00
Simon Wunderlich	27a4d5efd4	batman-adv: initialize up/down values when adding a gateway Without this initialization, gateways which actually announce up/down bandwidth of 0/0 could be added. If these nodes get purged via _batadv_purge_orig() later, the gw_node structure does not get removed since batadv_gw_node_delete() updates the gw_node with up/down bandwidth of 0/0, and the updating function then discards the change and does not free gw_node. This results in leaking the gw_node structures, which references other structures: gw_node -> orig_node -> orig_node_ifinfo -> hardif. When removing the interface later, the open reference on the hardif may cause hangs with the infamous "unregister_netdevice: waiting for mesh1 to become free. Usage count = 1" message. Signed-off-by: Simon Wunderlich <simon@open-mesh.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2015-08-05 00:31:47 +02:00
Marek Lindner	ef72706a05	batman-adv: protect tt_local_entry from concurrent delete events The tt_local_entry deletion performed in batadv_tt_local_remove() was neither protecting against simultaneous deletes nor checking whether the element was still part of the list before calling hlist_del_rcu(). Replacing the hlist_del_rcu() call with batadv_hash_remove() provides adequate protection via hash spinlocks as well as an is-element-still-in-hash check to avoid 'blind' hash removal. Fixes: `068ee6e204` ("batman-adv: roaming handling mechanism redesign") Reported-by: alfonsname@web.de Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2015-08-05 00:31:47 +02:00
Marek Lindner	354136bcc3	batman-adv: fix kernel crash due to missing NULL checks batadv_softif_vlan_get() may return NULL which has to be verified by the caller. Fixes: `35df3b298f` ("batman-adv: fix TT VLAN inconsistency on VLAN re-add") Reported-by: Ryan Thompson <ryan@eero.com> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2015-08-05 00:31:46 +02:00
Antonio Quartulli	f202a666e9	batman-adv: avoid DAT to mess up LAN state When a node running DAT receives an ARP request from the LAN for the first time, it is likely that this node will request the ARP entry through the distributed ARP table (DAT) in the mesh. Once a DAT reply is received the asking node must check if the MAC address for which the IP address has been asked is local. If it is, the node must drop the ARP reply bceause the client should have replied on its own locally. Forwarding this reply means fooling any L2 bridge (e.g. Ethernet switches) lying between the batman-adv node and the LAN. This happens because the L2 bridge will think that the client sending the ARP reply lies somewhere in the mesh, while this node is sitting in the same LAN. Reported-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch> Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>	2015-08-05 00:31:46 +02:00
Subash Abhinov Kasiviswanathan	a6cd379b4d	netfilter: ip6t_REJECT: Remove debug messages from reject_tg6() Make it similar to reject_tg() in ipt_REJECT. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-08-04 11:12:51 +02:00
Robert Shearman	a6affd24f4	mpls: Use definition for reserved label checks In multiple locations there are checks for whether the label in hand is a reserved label or not using the arbritray value of 16. Factor this out into a #define for better maintainability and for documentation. Signed-off-by: Robert Shearman <rshearma@brocade.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 22:35:00 -07:00
Robert Shearman	0335f5b500	ipv4: apply lwtunnel encap for locally-generated packets lwtunnel encap is applied for forwarded packets, but not for locally-generated packets. This is because the output function is not overridden in __mkroute_output, unlike it is in __mkroute_input. The lwtunnel state is correctly set on the rth through the call to rt_set_nexthop, so all that needs to be done is to override the dst output function to be lwtunnel_output if there is lwtunnel state present and it requires output redirection. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 22:26:14 -07:00
Robert Shearman	abf7c1c540	lwtunnel: set skb protocol and dev In the locally-generated packet path skb->protocol may not be set and this is required for the lwtunnel encap in order to get the lwtstate. This would otherwise have been set by ip_output or ip6_output so set skb->protocol prior to calling the lwtunnel encap function. Additionally set skb->dev in case it is needed further down the transmit path. Signed-off-by: Robert Shearman <rshearma@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 22:26:13 -07:00
Eric Dumazet	10e2eb878f	udp: fix dst races with multicast early demux Multicast dst are not cached. They carry DST_NOCACHE. As mentioned in commit `f886497212` ("ipv4: fix dst race in sk_dst_get()"), these dst need special care before caching them into a socket. Caching them is allowed only if their refcnt was not 0, ie we must use atomic_inc_not_zero() Also, we must use READ_ONCE() to fetch sk->sk_rx_dst, as mentioned in commit `d0c294c53a` ("tcp: prevent fetching dst twice in early demux code") Fixes: `421b3885bf` ("udp: ipv4: Add udp early demux") Tested-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz> Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Gregory Hoggarth <Gregory.Hoggarth@alliedtelesis.co.nz> Reported-by: Alex Gartrell <agartrell@fb.com> Cc: Michal Kubeček <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 22:16:50 -07:00
Nikolay Aleksandrov	58da018053	bridge: mdb: fix vlan_enabled access when vlans are not configured Instead of trying to access br->vlan_enabled directly use the provided helper br_vlan_enabled(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 16:20:51 -07:00
Daniel Borkmann	a5c90b29e5	act_bpf: properly support late binding of bpf action to a classifier Since the introduction of the BPF action in `d23b8ad8ab` ("tc: add BPF based action"), late binding was not working as expected. I.e. setting the action part for a classifier only via 'bpf index <num>', where <num> is the index of an existing action, is being rejected by the kernel due to other missing parameters. It doesn't make sense to require these parameters such as BPF opcodes etc, as they are not going to be used anyway: in this case, they're just allocated/parsed and then freed again w/o doing anything meaningful. Instead, parse and verify the remaining parameters after the test on tcf_hash_check(), when we really know that we're dealing with creation of a new action or replacement of an existing one and where late binding is thus irrelevant. After patch, test case is now working: FOO="1,6 0 0 4294967295," tc actions add action bpf bytecode "$FOO" tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action bpf index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 2 bind 1 tc filter show dev foo filter protocol all pref 49152 bpf filter protocol all pref 49152 bpf handle 0x1 flowid 1:1 bytecode '1,6 0 0 4294967295' action order 1: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 2 bind 1 Late binding of a BPF action can be useful for preloading maps (e.g. before they hit traffic) in case of eBPF programs, or to share a single eBPF action with multiple classifiers. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 16:05:56 -07:00
Satish Ashok	e44deb2f0c	bridge: mdb: add/del entry on all vlans if vlan_filter is enabled and vid is 0 Before this patch when a vid was not specified, the entry was added with vid 0 which is useless when vlan_filtering is enabled. This patch makes the entry to be added on all configured vlans when vlan filtering is enabled and respectively deleted from all, if the entry vid is 0. This is also closer to the way fdb works with regard to vid 0 and vlan filtering. Example: Setup: $ bridge vlan add vid 256 dev eth4 $ bridge vlan add vid 1024 dev eth4 $ bridge vlan add vid 64 dev eth3 $ bridge vlan add vid 128 dev eth3 $ bridge vlan port vlan ids eth3 1 PVID Egress Untagged 64 128 eth4 1 PVID Egress Untagged 256 1024 $ echo 1 > /sys/class/net/br0/bridge/vlan_filtering Before: $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 $ bridge mdb dev br0 port eth3 grp 239.0.0.1 temp After: $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 $ bridge mdb dev br0 port eth3 grp 239.0.0.1 temp vid 1 dev br0 port eth3 grp 239.0.0.1 temp vid 128 dev br0 port eth3 grp 239.0.0.1 temp vid 64 Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 15:43:35 -07:00
Dan Carpenter	468b732b6f	rds: fix an integer overflow test in rds_info_getsockopt() "len" is a signed integer. We check that len is not negative, so it goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison is type promoted to unsigned long. ULONG_MAX - 4095 is a higher than INT_MAX so the condition can never be true. I don't know if this is harmful but it seems safe to limit "len" to INT_MAX - 4095. Fixes: `a8c879a7ee` ('RDS: Info and stats') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 15:20:16 -07:00
Toshiaki Makita	6678053092	bridge: Don't segment multiple tagged packets on bridge device Bridge devices don't need to segment multiple tagged packets since thier ports can segment them. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 14:24:50 -07:00
WANG Cong	636dba8e12	act_mirred: avoid calling tcf_hash_release() when binding When we share an action within a filter, the bind refcnt should increase, therefore we should not call tcf_hash_release(). Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 14:13:28 -07:00
Glenn Griffin	3576fd794b	openvswitch: Fix L4 checksum handling when dealing with IP fragments openvswitch modifies the L4 checksum of a packet when modifying the ip address. When an IP packet is fragmented only the first fragment contains an L4 header and checksum. Prior to this change openvswitch would modify all fragments, modifying application data in non-first fragments, causing checksum failures in the reassembled packet. Signed-off-by: Glenn Griffin <ggriffin.kernel@gmail.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-03 14:03:08 -07:00
Daniel Borkmann	ba7591d8b2	ebpf: add skb->hash to offset map for usage in {cls, act}_bpf or filters Add skb->hash to the __sk_buff offset map, so it can be accessed from an eBPF program. We currently already do this for classic BPF filters, but not yet on eBPF, it might be useful as a demuxer in combination with helpers like bpf_clone_redirect(), toy example: __section("cls-lb") int ingress_main(struct __sk_buff skb) { unsigned int which = 3 + (skb->hash & 7); / bpf_skb_store_bytes(skb, ...); / / bpf_l{3,4}_csum_replace(skb, ...); */ bpf_clone_redirect(skb, which, 0); return -1; } I was thinking whether to add skb_get_hash(), but then concluded the raw skb->hash seems fine in this case: we can directly access the hash w/o extra eBPF helper function call, it's filled out by many NICs on ingress, and in case the entropy level would not be sufficient, people can still implement their own specific sw fallback hash mix anyway. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-02 17:20:47 -07:00
Eric Dumazet	3d0e0af406	fq_codel: explicitly reset flows in ->reset() Alex reported the following crash when using fq_codel with htb: crash> bt PID: 630839 TASK: ffff8823c990d280 CPU: 14 COMMAND: "tc" [... snip ...] #8 [ffff8820ceec17a0] page_fault at ffffffff8160a8c2 [exception RIP: htb_qlen_notify+24] RIP: ffffffffa0841718 RSP: ffff8820ceec1858 RFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88241747b400 RDX: ffff88241747b408 RSI: 0000000000000000 RDI: ffff8811fb27d000 RBP: ffff8820ceec1868 R8: ffff88120cdeff24 R9: ffff88120cdeff30 R10: 0000000000000bd4 R11: ffffffffa0840919 R12: ffffffffa0843340 R13: 0000000000000000 R14: 0000000000000001 R15: ffff8808dae5c2e8 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [...] qdisc_tree_decrease_qlen at ffffffff81565375 #10 [...] fq_codel_dequeue at ffffffffa084e0a0 [sch_fq_codel] #11 [...] fq_codel_reset at ffffffffa084e2f8 [sch_fq_codel] #12 [...] qdisc_destroy at ffffffff81560d2d #13 [...] htb_destroy_class at ffffffffa08408f8 [sch_htb] #14 [...] htb_put at ffffffffa084095c [sch_htb] #15 [...] tc_ctl_tclass at ffffffff815645a3 #16 [...] rtnetlink_rcv_msg at ffffffff81552cb0 [... snip ...] As Jamal pointed out, there is actually no need to call dequeue to purge the queued skb's in reset, data structures can be just reset explicitly. Therefore, we reset everything except config's and stats, so that we would have a fresh start after device flipping. Fixes: `4b549a2ef4` ("fq_codel: Fair Queue Codel AQM") Reported-by: Alex Gartrell <agartrell@fb.com> Cc: Alex Gartrell <agartrell@fb.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> [xiyou.wangcong@gmail.com: added codel_vars_init() and qdisc_qstats_backlog_dec()] Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-08-02 17:20:01 -07:00
David S. Miller	5510b3c2a1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: arch/s390/net/bpf_jit_comp.c drivers/net/ethernet/ti/netcp_ethss.c net/bridge/br_multicast.c net/ipv4/ip_fragment.c All four conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 23:52:20 -07:00
Linus Torvalds	7c764cec37	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Must teardown SR-IOV before unregistering netdev in igb driver, from Alex Williamson. 2) Fix ipv6 route unreachable crash in IPVS, from Alex Gartrell. 3) Default route selection in ipv4 should take the prefix length, table ID, and TOS into account, from Julian Anastasov. 4) sch_plug must have a reset method in order to purge all buffered packets when the qdisc is reset, likewise for sch_choke, from WANG Cong. 5) Fix deadlock and races in slave_changelink/br_setport in bridging. From Nikolay Aleksandrov. 6) mlx4 bug fixes (wrong index in port even propagation to VFs, overzealous BUG_ON assertion, etc.) from Ido Shamay, Jack Morgenstein, and Or Gerlitz. 7) Turn off klog message about SCTP userspace interface compat that makes no sense at all, from Daniel Borkmann. 8) Fix unbounded restarts of inet frag eviction process, causing NMI watchdog soft lockup messages, from Florian Westphal. 9) Suspend/resume fixes for r8152 from Hayes Wang. 10) Fix busy loop when MSG_WAITALL\|MSG_PEEK is used in TCP recv, from Sabrina Dubroca. 11) Fix performance regression when removing a lot of routes from the ipv4 routing tables, from Alexander Duyck. 12) Fix device leak in AF_PACKET, from Lars Westerhoff. 13) AF_PACKET also has a header length comparison bug due to signedness, from Alexander Drozdov. 14) Fix bug in EBPF tail call generation on x86, from Daniel Borkmann. 15) Memory leaks, TSO stats, watchdog timeout and other fixes to thunderx driver from Sunil Goutham and Thanneeru Srinivasulu. 16) act_bpf can leak memory when replacing programs, from Daniel Borkmann. 17) WOL packet fixes in gianfar driver, from Claudiu Manoil. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits) stmmac: fix missing MODULE_LICENSE in stmmac_platform gianfar: Enable device wakeup when appropriate gianfar: Fix suspend/resume for wol magic packet gianfar: Fix warning when CONFIG_PM off act_pedit: check binding before calling tcf_hash_release() net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket net: sched: fix refcount imbalance in actions r8152: reset device when tx timeout r8152: add pre_reset and post_reset qlcnic: Fix corruption while copying act_bpf: fix memory leaks when replacing bpf programs net: thunderx: Fix for crash while BGX teardown net: thunderx: Add PCI driver shutdown routine net: thunderx: Fix crash when changing rss with mutliple traffic flows net: thunderx: Set watchdog timeout value net: thunderx: Wakeup TXQ only if CQE_TX are processed net: thunderx: Suppress alloc_pages() failure warnings net: thunderx: Fix TSO packet statistic net: thunderx: Fix memory leak when changing queue count net: thunderx: Fix RQ_DROP miscalculation ...	2015-07-31 17:10:56 -07:00
Tom Herbert	be26849bfb	ipv6: Disable flowlabel state ranges by default Per RFC6437 stateful flow labels (e.g. labels set by flow label manager) cannot "disturb" nodes taking part in stateless flow labels. While the ranges only reduce the flow label entropy by one bit, it is conceivable that this might bias the algorithm on some routers causing a load imbalance. For best results on the Internet we really need the full 20 bits. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 17:07:11 -07:00
Tom Herbert	42240901f7	ipv6: Implement different admin modes for automatic flow labels Change the meaning of net.ipv6.auto_flowlabels to provide a mode for automatic flow labels generation. There are four modes: 0: flow labels are disabled 1: flow labels are enabled, sockets can opt-out 2: flow labels are allowed, sockets can opt-in 3: flow labels are enabled and enforced, no opt-out for sockets np->autoflowlabel is initialized according to the sysctl value. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 17:07:11 -07:00
Tom Herbert	67800f9b1f	ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel We can't call skb_get_hash here since the packet is not complete to do flow_dissector. Create hash based on flowi6 instead. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 17:07:11 -07:00
Tom Herbert	f70ea018da	net: Add functions to get skb->hash based on flow structures Add skb_get_hash_flowi6 and skb_get_hash_flowi4 which derive an sk_buff hash from flowi6 and flowi4 structures respectively. These functions can be called when creating a packet in the output path where the new sk_buff does not yet contain a fully formed packet that is parsable by flow dissector. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 17:07:11 -07:00
Florian Fainelli	04ff53f96a	net: dsa: Add netconsole support Add support for using DSA slave network devices with netconsole, which requires us to allocate and free custom netpoll instances and invoke the parent network device poll controller callback. In order for netconsole to work, we need to construct the DSA tag, but not queue the skb for transmission on the master network device xmit function. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 15:45:37 -07:00
Florian Fainelli	4ed70ce9f0	net: dsa: Refactor transmit path to eliminate duplication All tagging protocols do the same thing: increment device statistics, make room for the tag to be inserted, create the tag, invoke the parent network device transmit function. In order to prepare for adding netpoll support, which requires the tag creation, but not using the parent network device transmit function, do some little refactoring which eliminates duplication between the 4 tagging protocols supported. We need to return a sk_buff pointer back to the caller because the tag specific transmit function may have to reallocate the original skb (e.g: tag_trailer.c) and this is the one we should be transmitting, not the original sk_buff we were passed. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 15:45:37 -07:00
Joe Perches	85b1d8bbfd	br2684: Remove unnecessary formatting macros b1 and bs Use vsprintf extension %pI4 instead. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 15:25:52 -07:00
WANG Cong	5175f7106c	act_pedit: check binding before calling tcf_hash_release() When we share an action within a filter, the bind refcnt should increase, therefore we should not call tcf_hash_release(). Fixes: `1a29321ed0` ("net_sched: act: Dont increment refcnt on replace") Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 15:22:34 -07:00
Roopa Prabhu	bf21563acc	af_mpls: fix undefined reference to ip6_route_output Undefined reference to ip6_route_output and ip_route_output was reported with CONFIG_INET=n and CONFIG_IPV6=n. This patch uses ipv6_stub_impl.ipv6_dst_lookup instead of ip6_route_output. And wraps affected code under IS_ENABLED(CONFIG_INET) and IS_ENABLED(CONFIG_IPV6). Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 15:21:30 -07:00
Roopa Prabhu	343d60aada	ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup for use cases where sk is not available (like mpls). sk appears to be needed to get the namespace 'net' and is optional otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup to take net argument. sk remains optional. All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified to pass net. I have modified them to use already available 'net' in the scope of the call. I can change them to sock_net(sk) to avoid any unintended change in behaviour if sock namespace is different. They dont seem to be from code inspection. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 15:21:30 -07:00
Alexei Starovoitov	d3aa45ce6b	bpf: add helpers to access tunnel metadata Introduce helpers to let eBPF programs attached to TC manipulate tunnel metadata: bpf_skb_[gs]et_tunnel_key(skb, key, size, flags) skb: pointer to skb key: pointer to 'struct bpf_tunnel_key' size: size of 'struct bpf_tunnel_key' flags: room for future extensions First eBPF program that uses these helpers will allocate per_cpu metadata_dst structures that will be used on TX. On RX metadata_dst is allocated by tunnel driver. Typical usage for TX: struct bpf_tunnel_key tkey; ... populate tkey ... bpf_skb_set_tunnel_key(skb, &tkey, sizeof(tkey), 0); bpf_clone_redirect(skb, vxlan_dev_ifindex, 0); RX: struct bpf_tunnel_key tkey = {}; bpf_skb_get_tunnel_key(skb, &tkey, sizeof(tkey), 0); ... lookup or redirect based on tkey ... 'struct bpf_tunnel_key' will be extended in the future by adding elements to the end and the 'size' argument will indicate which fields are populated, thereby keeping backwards compatibility. The 'flags' argument may be used as well when the 'size' is not enough or to indicate completely different layout of bpf_tunnel_key. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-31 15:20:22 -07:00
Jon Paul Maloy	440d8963cd	tipc: clean up link creation We simplify the link creation function tipc_link_create() and the way the link struct it is connected to the node struct. In particular, we remove the duplicate initialization of some fields which are anyway set in tipc_link_reset(). Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:15 -07:00
Jon Paul Maloy	9073fb8be3	tipc: use temporary, non-protected skb queue for bundle reception Currently, when we extract small messages from a message bundle, or when many messages have accumulated in the link arrival queue, those messages are added one by one to the lock protected link input queue. This may increase contention with the reader of that queue, in the function tipc_sk_rcv(). This commit introduces a temporary, unprotected input queue in tipc_link_rcv() for such cases. Only when the arrival queue has been emptied, and the function is ready to return, does it splice the whole temporary queue into the real input queue. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:15 -07:00
Jon Paul Maloy	23d8335d78	tipc: remove implicit message delivery in node_unlock() After the most recent changes, all access calls to a link which may entail addition of messages to the link's input queue are postpended by an explicit call to tipc_sk_rcv(), using a reference to the correct queue. This means that the potentially hazardous implicit delivery, using tipc_node_unlock() in combination with a binary flag and a cached queue pointer, now has become redundant. This commit removes this implicit delivery mechanism both for regular data messages and for binding table update messages. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	598411d70f	tipc: make resetting of links non-atomic In order to facilitate future improvements to the locking structure, we want to make resetting and establishing of links non-atomic. I.e., the functions tipc_node_link_up() and tipc_node_link_down() should be called from outside the node lock context, and grab/release the node lock themselves. This requires that we can freeze the link state from the moment it is set to RESETTING or PEER_RESET in one lock context until it is set to RESET or ESTABLISHING in a later context. The recently introduced link FSM makes this possible, so we are now ready to introduce the above change. This commit implements this. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	cf148816ac	tipc: move received discovery data evaluation inside node.c The node lock is currently grabbed and and released in the function tipc_disc_rcv() in the file discover.c. As a preparation for the next commits, we need to move this node lock handling, along with the code area it is covering, to node.c. This commit introduces this change. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	662921cd0a	tipc: merge link->exec_mode and link->state into one FSM Until now, we have been handling link failover and synchronization by using an additional link state variable, "exec_mode". This variable is not independent of the link FSM state, something causing a risk of inconsistencies, apart from the fact that it clutters the code. The conditions are now in place to define a new link FSM that covers all existing use cases, including failover and synchronization, and eliminate the "exec_mode" field altogether. The FSM must also support non-atomic resetting of links, which will be introduced later. The new link FSM is shown below, with 7 states and 8 events. Only events leading to state change are shown as edges. +------------------------------------+ \|RESET_EVT \| \| \| \| +--------------+ \| +-----------------\| SYNCHING \|-----------------+ \| \|FAILURE_EVT +--------------+ PEER_RESET_EVT\| \| \| A \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|SYNCH_ \|SYNCH_ \| \| \| \|BEGIN_EVT \|END_EVT \| \| \| \| \| \| \| V \| V V \| +-------------+ +--------------+ +------------+ \| \| RESETTING \|<---------\| ESTABLISHED \|--------->\| PEER_RESET \| \| +-------------+ FAILURE_ +--------------+ PEER_ +------------+ \| \| EVT \| A RESET_EVT \| \| \| \| \| \| \| \| \| \| \| \| \| +--------------+ \| \| \| RESET_EVT\| \|RESET_EVT \|ESTABLISH_EVT \| \| \| \| \| \| \| \| \| \| \| \| V V \| \| \| +-------------+ +--------------+ RESET_EVT\| +--->\| RESET \|--------->\| ESTABLISHING \|<----------------+ +-------------+ PEER_ +--------------+ \| A RESET_EVT \| \| \| \| \| \| \| \|FAILOVER_ \|FAILOVER_ \|FAILOVER_ \|BEGIN_EVT \|END_EVT \|BEGIN_EVT \| \| \| V \| \| +-------------+ \| \| FAILINGOVER \|<----------------+ +-------------+ These changes are fully backwards compatible. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	5045f7b900	tipc: move protocol message sending away from link FSM The implementation of the link FSM currently takes decisions about and sends out link protocol messages. This is unnecessary, since such actions are not the result of any link state change, and are even decided based on non-FSM state information ("silent_intv_cnt"). We now move the sending of unicast link protocol messages to the function tipc_link_timeout(), and the initial broadcast synchronization message to tipc_node_link_up(). The latter is done because a link instance should not need to know whether it is the first or second link to a destination. Such information is now restricted to and handled by the link aggregation layer in node.c Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	6e498158a8	tipc: move link synch and failover to link aggregation level Link failover and synchronization have until now been handled by the links themselves, forcing them to have knowledge about and to access parallel links in order to make the two algorithms work correctly. In this commit, we move the control part of this functionality to the link aggregation level in node.c, which is the right location for this. As a result, the two algorithms become easier to follow, and the link implementation becomes simpler. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:14 -07:00
Jon Paul Maloy	66996b6c47	tipc: extend node FSM In the next commit, we will move link synch/failover orchestration to the link aggregation level. In order to do this, we first need to extend the node FSM with two more states, NODE_SYNCHING and NODE_FAILINGOVER, plus four new events to enter and leave those states. This commit introduces this change, without yet making use of it. The node FSM now looks as follows: +-----------------------------------------+ \| PEER_DOWN_EVT\| \| \| +------------------------+----------------+ \| \|SELF_DOWN_EVT \| \| \| \| \| \| \| \| +-----------+ +-----------+ \| \| \|NODE_ \| \|NODE_ \| \| \| +----------\|FAILINGOVER\|<---------\|SYNCHING \|------------+ \| \| \|SELF_ +-----------+ FAILOVER_+-----------+ PEER_ \| \| \| \|DOWN_EVT \| A BEGIN_EVT A \| DOWN_EVT\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|FAILOVER_\|FAILOVER_ \|SYNCH_ \|SYNCH_ \| \| \| \| \|END_EVT \|BEGIN_EVT \|BEGIN_EVT\|END_EVT \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| +--------------+ \| \| \| \| \| +------->\| SELF_UP_ \|<-------+ \| \| \| \| +----------------\| PEER_UP \|------------------+ \| \| \| \| \|SELF_DOWN_EVT +--------------+ PEER_DOWN_EVT\| \| \| \| \| \| A A \| \| \| \| \| \| \| \| \| \| \| \| \| \| PEER_UP_EVT\| \|SELF_UP_EVT \| \| \| \| \| \| \| \| \| \| \| V V V \| \| V V V +------------+ +-----------+ +-----------+ +------------+ \|SELF_DOWN_ \| \|SELF_UP_ \| \|PEER_UP_ \| \|PEER_DOWN \| \|PEER_LEAVING\|<------\|PEER_COMING\| \|SELF_COMING\|------>\|SELF_LEAVING\| +------------+ SELF_ +-----------+ +-----------+ PEER_ +------------+ \| DOWN_EVT A A DOWN_EVT \| \| \| \| \| \| \| \| \| \| SELF_UP_EVT\| \|PEER_UP_EVT \| \| \| \| \| \| \| \| \| \|PEER_DOWN_EVT +--------------+ SELF_DOWN_EVT\| +------------------->\| SELF_DOWN_ \|<--------------------+ \| PEER_DOWN \| +--------------+ Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
Jon Paul Maloy	655fb243b8	tipc: reverse call order for link_reset()->node_link_down() In many cases the call order when a link is reset goes as follows: tipc_node_xx()->tipc_link_reset()->tipc_node_link_down() This is not the right order if we want the node to be in control, so in this commit we change the order to: tipc_node_xx()->tipc_node_link_down()->tipc_link_reset() The fact that tipc_link_reset() now is called from only one location with a well-defined state will also facilitate later simplifications of tipc_link_reset() and the link FSM. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
Jon Paul Maloy	6144a996a6	tipc: move all link_reset() calls to link aggregation level In line with our effort to let the node level have full control over its links, we want to move all link reset calls from link.c to node.c. Some of the calls can be moved by simply moving the calling function, when this is the right thing to do. For the remaining calls we use the now established technique of returning a TIPC_LINK_DOWN_EVT flag from tipc_link_rcv(), whereafter we perform the reset call when the call returns. This change serves as a preparation for the coming commits. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
Jon Paul Maloy	cbeb83ca68	tipc: eliminate function tipc_link_activate() The function tipc_link_activate() is redundant, since it mostly performs settings that have already been done in a preceding tipc_link_reset(). There are three exceptions to this: - The actual state change to TIPC_LINK_WORKING. This should anyway be done in the FSM, and not in a separate function. - Registration of the link with the bearer. This should be done by the node, since we don't want the link to have any knowledge about its specific bearer. - Call to tipc_node_link_up() for user access registration. With the new role distribution between link aggregation and link level this becomes the wrong call order; tipc_node_link_up() should instead be called directly as a result of a TIPC_LINK_UP event, hence by the node itself. This commit implements those changes. Tested-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 17:25:13 -07:00
David S. Miller	29a3060aa7	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2015-07-30 Here's a set of Bluetooth & 802.15.4 patches intended for the 4.3 kernel. - Cleanups & fixes to mac802154 - Refactoring of Intel Bluetooth HCI driver - Various coding style fixes to Bluetooth HCI drivers - Support for Intel Lightning Peak Bluetooth devices - Generic class code in interface descriptor in btusb to match more HW - Refactoring of Bluetooth HS code together with a new config option - Support for BCM4330B1 Broadcom UART controller Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 16:16:43 -07:00
Sowmini Varadhan	8a68173691	net: sk_clone_lock() should only do get_net() if the parent is not a kernel socket The newsk returned by sk_clone_lock should hold a get_net() reference if, and only if, the parent is not a kernel socket (making this similar to sk_alloc()). E.g,. for the SYN_RECV path, tcp_v4_syn_recv_sock->..inet_csk_clone_lock sets up the syn_recv newsk from sk_clone_lock. When the parent (listen) socket is a kernel socket (defined in sk_alloc() as having sk_net_refcnt == 0), then the newsk should also have a 0 sk_net_refcnt and should not hold a get_net() reference. Fixes: `26abe14379` ("net: Modify sk_alloc to not reference count the netns of kernel sockets.") Acked-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 15:59:12 -07:00
Hangbin Liu	8013d1d7ea	net/ipv6: add sysctl option accept_ra_min_hop_limit Commit `6fd99094de` ("ipv6: Don't reduce hop limit for an interface") disabled accept hop limit from RA if it is smaller than the current hop limit for security stuff. But this behavior kind of break the RFC definition. RFC 4861, 6.3.4. Processing Received Router Advertisements A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time, and Retrans Timer) may contain a value denoting that it is unspecified. In such cases, the parameter should be ignored and the host should continue using whatever value it is already using. If the received Cur Hop Limit value is non-zero, the host SHOULD set its CurHopLimit variable to the received value. So add sysctl option accept_ra_min_hop_limit to let user choose the minimum hop limit value they can accept from RA. And set default to 1 to meet RFC standards. Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 15:56:40 -07:00
Daniel Borkmann	28e6b67f0b	net: sched: fix refcount imbalance in actions Since commit `55334a5db5` ("net_sched: act: refuse to remove bound action outside"), we end up with a wrong reference count for a tc action. Test case 1: FOO="1,6 0 0 4294967295," BAR="1,6 0 0 4294967294," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 \ action bpf bytecode "$FOO" tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 1 bind 1 tc actions replace action bpf bytecode "$BAR" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 1 ref 2 bind 1 tc actions replace action bpf bytecode "$FOO" index 1 tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 1 ref 3 bind 1 Test case 2: FOO="1,6 0 0 4294967295," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 1 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 2 bind 1 tc actions add action drop index 1 RTNETLINK answers: File exists [...] tc actions show action gact action order 0: gact action pass random type none pass val 0 index 1 ref 3 bind 1 What happens is that in tcf_hash_check(), we check tcf_common for a given index and increase tcfc_refcnt and conditionally tcfc_bindcnt when we've found an existing action. Now there are the following cases: 1) We do a late binding of an action. In that case, we leave the tcfc_refcnt/tcfc_bindcnt increased and are done with the ->init() handler. This is correctly handeled. 2) We replace the given action, or we try to add one without replacing and find out that the action at a specific index already exists (thus, we go out with error in that case). In case of 2), we have to undo the reference count increase from tcf_hash_check() in the tcf_hash_check() function. Currently, we fail to do so because of the 'tcfc_bindcnt > 0' check which bails out early with an -EPERM error. Now, while commit `55334a5db5` prevents 'tc actions del action ...' on an already classifier-bound action to drop the reference count (which could then become negative, wrap around etc), this restriction only accounts for invocations outside a specific action's ->init() handler. One possible solution would be to add a flag thus we possibly trigger the -EPERM ony in situations where it is indeed relevant. After the patch, above test cases have correct reference count again. Fixes: `55334a5db5` ("net_sched: act: refuse to remove bound action outside") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-30 14:20:39 -07:00
Alexander Aring	5857d1dbae	Bluetooth: 6lowpan: Fix possible race This patch fix a possible race after calling register_netdev. After calling netdev_register it could be possible that netdev_ops callbacks use the uninitialized private data of lowpan_dev. By moving the initialization of this data before netdev_register we can be sure that initialized private data is be used after netdev_register. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 14:11:36 +02:00
Lennert Buytenhek	c22ff7b4e7	mac802154: Fix memory corruption with global deferred transmit state. When transmitting a packet via a mac802154 driver that can sleep in its transmit function, mac802154 defers the call to the driver's transmit function to a per-device workqueue. However, mac802154 uses a single global work_struct for this, which means that if you have more than one registered mac802154 interface in the system, and you transmit on more than one of them at the same time, you'll very easily cause memory corruption. This patch moves the deferred transmit processing state from global variables to struct ieee802154_local, and this seems to fix the memory corruption issue. Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 14:08:55 +02:00
Dan Carpenter	1a727c6361	netfilter: nf_conntrack: checking for IS_ERR() instead of NULL We recently changed this from nf_conntrack_alloc() to nf_ct_tmpl_alloc() so the error handling needs to changed to check for NULL instead of IS_ERR(). Fixes: `0838aa7fcf` ('netfilter: fix netns dependencies with conntrack templates') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-30 14:04:19 +02:00
Christophe JAILLET	54c9ee3992	Bluetooth: cmtp: Do not use list_for_each_safe when not needed There is no need to use the safe version of list_for_each here. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 13:50:35 +02:00
Bernhard Thaler	f4b3eee727	netfilter: bridge: do not initialize statics to 0 or NULL Fix checkpatch.pl "ERROR: do not initialise statics to 0 or NULL" for all statics explicitly initialized to 0. Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-30 13:46:04 +02:00
Florian Westphal	72b1e5e4ca	netfilter: bridge: reduce nf_bridge_info to 32 bytes again We can use union for most of the temporary cruft (original ipv4/ipv6 address, source mac, physoutdev) since they're used during different stages of br netfilter traversal. Also get rid of the last two ->mask users. Shrinks struct from 48 to 32 on 64bit arch. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-30 13:37:42 +02:00
Arron Wang	df9b89c7e4	Bluetooth: Move create/accept phy link completed callback to amp.c To avoid amp module hooks from hci_event.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 13:37:22 +02:00
Arron Wang	b3d3914006	Bluetooth: Move amp assoc read/write completed callback to amp.c To avoid amp module hooks from hci_event.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 13:37:22 +02:00
Arron Wang	839278823c	Bluetooth: Move get info completed callback to a2mp.c To avoid a2mp module hooks from hci_event.c and send getinfo response operation only required by a2mp module, we can move this callback to a2mp.c Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 13:37:22 +02:00
Arron Wang	a77a6a14e5	Bluetooth: Move high speed specific event under BT_HS option Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 13:31:59 +02:00
Arron Wang	244bc37759	Bluetooth: Add BT_HS config option Move A2MP Module under BT_HS config option and allow the user have flexible option to choose the feature only they need a2mp_discover_amp() & a2mp_channel_create() are a2mp module entry point for master and slave, and this is dynamic invoked depends on the userspace or remote request, then we defined their implementation depends on BT_HS config Signed-off-by: Arron Wang <arron.wang@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-30 13:31:59 +02:00
Michal Kubeček	d7ee351904	netfilter: nf_ct_sctp: minimal multihoming support Currently nf_conntrack_proto_sctp module handles only packets between primary addresses used to establish the connection. Any packets between secondary addresses are classified as invalid so that usual firewall configurations drop them. Allowing HEARTBEAT and HEARTBEAT-ACK chunks to establish a new conntrack would allow traffic between secondary addresses to pass through. A more sophisticated solution based on the addresses advertised in the initial handshake (and possibly also later dynamic address addition and removal) would be much harder to implement. Moreover, in general we cannot assume to always see the initial handshake as it can be routed through a different path. The patch adds two new conntrack states: SCTP_CONNTRACK_HEARTBEAT_SENT - a HEARTBEAT chunk seen but not acked SCTP_CONNTRACK_HEARTBEAT_ACKED - a HEARTBEAT acked by HEARTBEAT-ACK State transition rules: - HEARTBEAT_SENT responds to usual chunks the same way as NONE (so that the behaviour changes as little as possible) - HEARTBEAT_ACKED responds to usual chunks the same way as ESTABLISHED does, except the resulting state is HEARTBEAT_ACKED rather than ESTABLISHED - previously existing states except NONE are preserved when HEARTBEAT or HEARTBEAT-ACK is seen - NONE (in the initial direction) changes to HEARTBEAT_SENT on HEARTBEAT and to CLOSED on HEARTBEAT-ACK - HEARTBEAT_SENT changes to HEARTBEAT_ACKED on HEARTBEAT-ACK in the reply direction - HEARTBEAT_SENT and HEARTBEAT_ACKED are preserved on HEARTBEAT and HEARTBEAT-ACK otherwise Normally, vtag is set from the INIT chunk for the reply direction and from the INIT-ACK chunk for the originating direction (i.e. each of these defines vtag value for the opposite direction). For secondary conntracks, we can't rely on seeing INIT/INIT-ACK and even if we have seen them, we would need to connect two different conntracks. Therefore simplified logic is applied: vtag of first packet in each direction (HEARTBEAT in the originating and HEARTBEAT-ACK in reply direction) is saved and all following packets in that direction are compared with this saved value. While INIT and INIT-ACK define vtag for the opposite direction, vtags extracted from HEARTBEAT and HEARTBEAT-ACK are always for their direction. Default timeout values for new states are HEARTBEAT_SENT: 30 seconds (default hb_interval) HEARTBEAT_ACKED: 210 seconds (hb_interval * path_max_retry + max_rto) (We cannot expect to see the shutdown sequence so that, unlike ESTABLISHED, the HEARTBEAT_ACKED timeout shouldn't be too long.) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-30 12:59:25 +02:00
Pablo Neira Ayuso	f0ad462189	netfilter: nf_conntrack: silence warning on falling back to vmalloc() Since `88eab472ec` ("netfilter: conntrack: adjust nf_conntrack_buckets default value"), the hashtable can easily hit this warning. We got reports from users that are getting this message in a quite spamming fashion, so better silence this. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>	2015-07-30 12:32:55 +02:00
Daniel Borkmann	f4eaed28c7	act_bpf: fix memory leaks when replacing bpf programs We currently trigger multiple memory leaks when replacing bpf actions, besides others: comm "tc", pid 1909, jiffies 4294851310 (age 1602.796s) hex dump (first 32 bytes): 01 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................ 18 b0 98 6d 00 88 ff ff 00 00 00 00 00 00 00 00 ...m............ backtrace: [<ffffffff817e623e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8120a22d>] __vmalloc_node_range+0x1bd/0x2c0 [<ffffffff8120a37a>] __vmalloc+0x4a/0x50 [<ffffffff811a8d0a>] bpf_prog_alloc+0x3a/0xa0 [<ffffffff816c0684>] bpf_prog_create+0x44/0xa0 [<ffffffffa09ba4eb>] tcf_bpf_init+0x28b/0x3c0 [act_bpf] [<ffffffff816d7001>] tcf_action_init_1+0x191/0x1b0 [<ffffffff816d70a2>] tcf_action_init+0x82/0xf0 [<ffffffff816d4d12>] tcf_exts_validate+0xb2/0xc0 [<ffffffffa09b5838>] cls_bpf_modify_existing+0x98/0x340 [cls_bpf] [<ffffffffa09b5cd6>] cls_bpf_change+0x1a6/0x274 [cls_bpf] [<ffffffff816d56e5>] tc_ctl_tfilter+0x335/0x910 [<ffffffff816b9145>] rtnetlink_rcv_msg+0x95/0x240 [<ffffffff816df34f>] netlink_rcv_skb+0xaf/0xc0 [<ffffffff816b909e>] rtnetlink_rcv+0x2e/0x40 [<ffffffff816deaaf>] netlink_unicast+0xef/0x1b0 Issue is that the old content from tcf_bpf is allocated and needs to be released when we replace it. We seem to do that since the beginning of act_bpf on the filter and insns, later on the name as well. Example test case, after patch: # FOO="1,6 0 0 4294967295," # BAR="1,6 0 0 4294967294," # tc actions add action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$BAR" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967294' default-action pipe index 2 ref 1 bind 0 # tc actions replace action bpf bytecode "$FOO" index 2 # tc actions show action bpf action order 0: bpf bytecode '1,6 0 0 4294967295' default-action pipe index 2 ref 1 bind 0 # tc actions del action bpf index 2 [...] # echo "scan" > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak \| grep "comm \"tc\"" \| wc -l 0 Fixes: `d23b8ad8ab` ("tc: add BPF based action") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 23:56:22 -07:00
Thomas Graf	dcc38c033b	openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN This readds the config option CONFIG_OPENVSWITCH_VXLAN to avoid a hard dependency of OVS on VXLAN. It moves the VXLAN config compat code to vport-vxlan.c and allows compliation as a module. Fixes: `614732eaa1` ("openvswitch: Use regular VXLAN net_device device") Fixes: `2661371ace` ("openvswitch: fix compilation when vxlan is a module") Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 23:03:10 -07:00
Eric Dumazet	c8507fb235	ipv6: flush nd cache on IFF_NOARP change This patch is the IPv6 equivalent of commit `6c8b4e3ff8` ("arp: flush arp cache on IFF_NOARP change") Without it, we keep buggy neighbours in the cache, with destination MAC address equal to our own MAC address. Tested: tcpdump -i eth0 -s 0 ip6 -n -e & ip link set dev eth0 arp off ping6 remote // sends buggy frames ip link set dev eth0 arp on ping6 remote // should work once kernel is patched Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mario Fanelli <mariofanelli@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 23:01:39 -07:00
Bogdan Hamciuc	1ce4b2f44d	net: pktgen: Remove unused 'allocated_skbs' field Field pktgen_dev.allocated_skbs had been written to, but never read from. The number of allocated skbs can be deduced anyway, from the total number of sent packets and the 'clone_skb' param. Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 23:00:39 -07:00
Bogdan Hamciuc	879c7220e8	net: pktgen: Observe needed_headroom of the device Allocate enough space so as not to force the outgoing net device to do skb_realloc_headroom(). Signed-off-by: Bogdan Hamciuc <bogdan.hamciuc@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 23:00:39 -07:00
Thomas Graf	92a99bf3ba	lwtunnel: Make lwtun_encaps[] static Any external user should use the registration API instead of accessing this directly. Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 22:59:39 -07:00
Tom Herbert	877d1f6291	net: Set sk_txhash from a random number This patch creates sk_set_txhash and eliminates protocol specific inet_set_txhash and ip6_set_txhash. sk_set_txhash simply sets a random number instead of performing flow dissection. sk_set_txash is also allowed to be called multiple times for the same socket, we'll need this when redoing the hash for negative routing advice. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 22:44:04 -07:00
Jon Maloy	5a4c355229	tipc: fix bug in broadcast synch message create function In commit `d999297c3d` ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_build_bcast_sync_msg(), which carries initial synchronization data between two nodes at first contact and at re-contact. In this function, we missed to add synchronization data, with the effect that the broadcast link endpoints will fail to synchronize correctly at re-contact between a running and a restarted node. All other cases work as intended. With this commit, we fix this bug. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 16:48:16 -07:00
Tobias Klauser	73d0fcf2f4	packet: remove handling of tx_ring from prb_shutdown_retire_blk_timer() Follow `e8e85cc5eb` ("packet: remove handling of tx_ring") and remove the tx_ring parameter from prb_shutdown_retire_blk_timer() as it is only called with tx_ring = 0. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 15:11:12 -07:00
Nikolay Aleksandrov	7ae90a4f96	bridge: mdb: fix delmdb state in the notification Since mdb states were introduced when deleting an entry the state was left as it was set in the delete request from the user which leads to the following output when doing a monitor (for example): $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 temp ^^^ Note the "temp" state in the delete notification which is wrong since the entry was permanent, the state in a delete is always reported as "temp" regardless of the real state of the entry. After this patch: $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent There's one important note to make here that the state is actually not matched when doing a delete, so one can delete a permanent entry by stating "temp" in the end of the command, I've chosen this fix in order not to break user-space tools which rely on this (incorrect) behaviour. So to give an example after this patch and using the wrong state: $ bridge mdb add dev br0 port eth3 grp 239.0.0.1 permanent (monitor) dev br0 port eth3 grp 239.0.0.1 permanent $ bridge mdb del dev br0 port eth3 grp 239.0.0.1 temp (monitor) dev br0 port eth3 grp 239.0.0.1 permanent Note the state of the entry that got deleted is correct in the notification. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: `ccb1c31a7a` ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 15:02:30 -07:00
Satish Ashok	544586f742	bridge: mcast: give fast leave precedence over multicast router and querier When fast leave is configured on a bridge port and an IGMP leave is received for a group, the group is not deleted immediately if there is a router detected or if multicast querier is configured. Ideally the group should be deleted immediately when fast leave is configured. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 14:57:05 -07:00
Toshiaki Makita	df356d5e81	bridge: Fix network header pointer for vlan tagged packets There are several devices that can receive vlan tagged packets with CHECKSUM_PARTIAL like tap, possibly veth and xennet. When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded by bridge to a device with the IP_CSUM feature, they end up with checksum error because before entering bridge, the network header is set to ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(), get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later. Since the network header is exepected to be ETH_HLEN in flow-dissection and hash-calculation in RPS in rx path, and since the header pointer fix is needed only in tx path, set the appropriate network header on forwarding packets. Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 12:20:16 -07:00
Alexander Drozdov	dbd46ab412	packet: tpacket_snd(): fix signed/unsigned comparison tpacket_fill_skb() can return a negative value (-errno) which is stored in tp_len variable. In that case the following condition will be (but shouldn't be) true: tp_len > dev->mtu + dev->hard_header_len as dev->mtu and dev->hard_header_len are both unsigned. That may lead to just returning an incorrect EMSGSIZE errno to the user. Fixes: `52f1454f62` ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-29 00:09:58 -07:00
Eric Dumazet	11c91ef98f	arp: filter NOARP neighbours for SIOCGARP When arp is off on a device, and ioctl(SIOCGARP) is queried, a buggy answer is given with MAC address of the device, instead of the mac address of the destination/gateway. We filter out NUD_NOARP neighbours for /proc/net/arp, we must do the same for SIOCGARP ioctl. Tested: lpaa23:~# ./arp 10.246.7.190 MAC=00:01:e8:22:cb:1d // correct answer lpaa23:~# ip link set dev eth0 arp off lpaa23:~# cat /proc/net/arp # check arp table is now 'empty' IP address HW type Flags HW address Mask Device lpaa23:~# ./arp 10.246.7.190 MAC=00:1a:11:c3:0d:7f // buggy answer before patch (this is eth0 mac) After patch : lpaa23:~# ip link set dev eth0 arp off lpaa23:~# ./arp 10.246.7.190 ioctl(SIOCGARP) failed: No such device or address Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Vytautas Valancius <valas@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-28 23:41:24 -07:00
David Ward	865b804244	net/ipv4: suppress NETDEV_UP notification on address lifetime update This notification causes the FIB to be updated, which is not needed because the address already exists, and more importantly it may undo intentional changes that were made to the FIB after the address was originally added. (As a point of comparison, when an address becomes deprecated because its preferred lifetime expired, a notification on this chain is not generated.) The motivation for this commit is fixing an incompatibility between DHCP clients which set and update the address lifetime according to the lease, and a commercial VPN client which replaces kernel routes in a way that outbound traffic is sent only through the tunnel (and disconnects if any further route changes are detected via netlink). Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-28 23:38:13 -07:00
Nikolay Aleksandrov	76b91c32dd	bridge: stp: when using userspace stp stop kernel hello and hold timers These should be handled only by the respective STP which is in control. They become problematic for devices with limited resources with many ports because the hold_timer is per port and fires each second and the hello timer fires each 2 seconds even though it's global. While in user-space STP mode these timers are completely unnecessary so it's better to keep them off. Also ensure that when the bridge is up these timers are started only when running with kernel STP. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-28 23:33:20 -07:00
Linus Torvalds	d8132e08d2	NFS client bugfixes for Linux 4.2 Highlights include: Stable patches: - Fix a situation where the client uses the wrong (zero) stateid. - Fix a memory leak in nfs_do_recoalesce Bugfixes: - Plug a memory leak when ->prepare_layoutcommit fails - Fix an Oops in the NFSv4 open code - Fix a backchannel deadlock - Fix a livelock in sunrpc when sendmsg fails due to low memory availability - Don't revalidate the mapping if both size and change attr are up to date - Ensure we don't miss a file extension when doing pNFS - Several fixes to handle NFSv4.1 sequence operation status bits correctly - Several pNFS layout return bugfixes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVt6RGAAoJEGcL54qWCgDyiDIP/2+fUM7Tc1llCxYbM2WLC6Ar 34v5yVwO96MqhI4L2mXB5FJvr4LP2/EZ4ZExMcf4ymT7pgJnjFK4nEv9IHUSy6xb ea+oS9GjvFSeGdkukJLRniNER5/ZG3GWkojlHNJCgByoIVRK4ISXF/qL9w2sedGw +5ejvjqie9NmBnBXMq8DRlU+kXhVYCF6E9qWATwUNK5Eq2eeQnDbA2w9ACSBVK3W LhCvZi0eBq7krSbHob018PmlQ0VPvmYwk5xL4d//FvcaNj/utk82VjAZCdKOK1sH qn8hcKgVeVko/3jwcUp6m3zAkKZ1IX/XaXJeHbosnKG/g0vy3hQirpa/g2iDTQ4H NXOSwcsd6syReZDZbQTxbvaSOp5ACxZAQKYLnlPerJ/hMpXDQCEAwyeAFKzEaKz4 FfF0VJF+30w9PJk3wgk2DF66xbYVfHyvrLtVcb/ki8gb91cH09i+nFFSSfHQBMLh +ciHg7rOyXnbXoCaW9fBvONz2sCYDwbHATmhpWWZIx/3UTDf5owxHFa3BFDgGKnD jyiPjMh6I3JUE+Qm1zwInsfsskBKRSl2BdJgTHBGY5ODuQGF/sogOmvgbrT7Ox3t kbL8nzCydqLixM+4aw61nYakZqgDsKNER5Ggr+lkv4AZ2dH6IeP2IZjuoHLLylvZ dyqHwpCjoUtmYAUr166U =wlUD -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client bugfixes from Trond Myklebust: "Highlights include: Stable patches: - Fix a situation where the client uses the wrong (zero) stateid. - Fix a memory leak in nfs_do_recoalesce Bugfixes: - Plug a memory leak when ->prepare_layoutcommit fails - Fix an Oops in the NFSv4 open code - Fix a backchannel deadlock - Fix a livelock in sunrpc when sendmsg fails due to low memory availability - Don't revalidate the mapping if both size and change attr are up to date - Ensure we don't miss a file extension when doing pNFS - Several fixes to handle NFSv4.1 sequence operation status bits correctly - Several pNFS layout return bugfixes" * tag 'nfs-for-4.2-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits) nfs: Fix an oops caused by using other thread's stack space in ASYNC mode nfs: plug memory leak when ->prepare_layoutcommit fails SUNRPC: Report TCP errors to the caller sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. NFSv4.2: handle NFS-specific llseek errors NFS: Don't clear desc->pg_moreio in nfs_do_recoalesce() NFS: Fix a memory leak in nfs_do_recoalesce NFS: nfs_mark_for_revalidate should always set NFS_INO_REVAL_PAGECACHE NFS: Remove the "NFS_CAP_CHANGE_ATTR" capability NFS: Set NFS_INO_REVAL_PAGECACHE if the change attribute is uninitialised NFS: Don't revalidate the mapping if both size and change attr are up to date NFSv4/pnfs: Ensure we don't miss a file extension NFSv4: We must set NFS_OPEN_STATE flag in nfs_resync_open_stateid_locked SUNRPC: xprt_complete_bc_request must also decrement the free slot count SUNRPC: Fix a backchannel deadlock pNFS: Don't throw out valid layout segments pNFS: pnfs_roc_drain() fix a race with open pNFS: Fix races between return-on-close and layoutreturn. pNFS: pnfs_roc_drain should return 'true' when sleeping pNFS: Layoutreturn must invalidate all existing layout segments. ...	2015-07-28 09:37:44 -07:00
Lars Westerhoff	158cd4af8d	packet: missing dev_put() in packet_do_bind() When binding a PF_PACKET socket, the use count of the bound interface is always increased with dev_hold in dev_get_by_{index,name}. However, when rebound with the same protocol and device as in the previous bind the use count of the interface was not decreased. Ultimately, this caused the deletion of the interface to fail with the following message: unregister_netdevice: waiting for dummy0 to become free. Usage count = 1 This patch moves the dev_put out of the conditional part that was only executed when either the protocol or device changed on a bind. Fixes: `902fefb82e` ('packet: improve socket create/bind latency in some cases') Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 15:38:58 -07:00
Trond Myklebust	f580dd0428	SUNRPC: Report TCP errors to the caller Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-27 17:56:57 -04:00
Alexander Duyck	1513069edc	fib_trie: Drop unnecessary calls to leaf_pull_suffix It was reported that update_suffix was taking a long time on systems where a large number of leaves were attached to a single node. As it turns out fib_table_flush was calling update_suffix for each leaf that didn't have all of the aliases stripped from it. As a result, on this large node removing one leaf would result in us calling update_suffix for every other leaf on the node. The fix is to just remove the calls to leaf_pull_suffix since they are redundant as we already have a call in resize that will go through and update the suffix length for the node before we exit out of fib_table_flush or fib_table_flush_external. Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 14:29:11 -07:00
NeilBrown	743c69e7c0	sunrpc: translate -EAGAIN to -ENOBUFS when socket is writable. The networking layer does not reliably report the distinction between a non-block write failing because: 1/ the queue is too full already and 2/ a memory allocation attempt failed. The distinction is important because in the first case it is appropriate to retry as soon as the socket reports that it is writable, and in the second case a small delay is required as the socket will most likely report as writable but kmalloc could still fail. sk_stream_wait_memory() exhibits this distinction nicely, setting 'vm_wait' if a small wait is needed. However in the non-blocking case it always returns -EAGAIN no matter the cause of the failure. This -EAGAIN call get all the way to sunrpc. The sunrpc layer expects EAGAIN to indicate the first cause, and ENOBUFS to indicate the second. Various documentation suggests that this is not unreasonable, but does not guarantee the desired error codes. The result of getting -EAGAIN when -ENOBUFS is expected is that the send is tried again in a tight loop and soft lockups are reported. so: add tests after calls to xs_sendpages() to translate -EAGAIN into -ENOBUFS if the socket is writable. This cannot happen inside xs_sendpages() as the test for "is socket writable" is different between TCP and UDP. With this change, the tight loop retrying xs_sendpages() becomes a loop which only retries every 250ms, and so will not trigger a soft-lockup warning. It is possible that the write did fail because the queue was too full and by the time xs_sendpages() completed, the queue was writable again. In this case an extra 250ms delay is inserted that isn't really needed. This circumstance suggests a degree of congestion so a delay is not necessarily a bad thing, and it can only cause a single 250ms delay, not a series of them. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-27 11:16:56 -04:00
Dan Carpenter	e11f40b935	lwtunnel: use kfree_skb() instead of vanilla kfree() kfree_skb() is correct here. Fixes: `ffce41962e` ('lwtunnel: support dst output redirect function') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:24:59 -07:00
Eric Dumazet	99d7662a04	tcp: tso: allow deferring under reordering state While doing experiments with reordering resilience, we found linux senders were not able to send at full speed under reordering, because every incoming SACK was releasing one MSS. This patch removes the limitation, as we did for CWR state in commit `a0ea700e40` ("tcp: tso: allow CA_CWR state in tcp_tso_should_defer()") Neal Cardwell had a concern about limited transmit so Yuchung conducted experiments on GFE and found nothing worth adding an extra check on fast path : if (icsk->icsk_ca_state == TCP_CA_Disorder && tcp_sk(sk)->reordering == sysctl_tcp_reordering) goto send_now; Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:23:20 -07:00
Martin KaFai Lau	8d6c31bf57	ipv6: Avoid rt6_probe() taking writer lock in the fast path The patch checks neigh->nud_state before acquiring the writer lock. Note that rt6_probe() is only used in CONFIG_IPV6_ROUTER_PREF. 40 udpflood processes and a /64 gateway route are used. The gateway has NUD_PERMANENT. Each of them is run for 30s. At the end, the total number of finished sendto(): Before: 55M After: 95M Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Julian Anastasov <ja@ssi.bg> CC: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:08:25 -07:00
Martin KaFai Lau	990edb428c	ipv6: Re-arrange code in rt6_probe() It is a prep work for the next patch to remove write_lock from rt6_probe(). 1. Reduce the number of if(neigh) check. From 4 to 1. 2. Bring the write_(un)lock() closer to the operations that the lock is protecting. Hopefully, the above make rt6_probe() more readable. Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Julian Anastasov <ja@ssi.bg> Cc: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:08:25 -07:00
Sabrina Dubroca	dfbafc9953	tcp: fix recv with flags MSG_WAITALL \| MSG_PEEK Currently, tcp_recvmsg enters a busy loop in sk_wait_data if called with flags = MSG_WAITALL \| MSG_PEEK. sk_wait_data waits for sk_receive_queue not empty, but in this case, the receive queue is not empty, but does not contain any skb that we can use. Add a "last skb seen on receive queue" argument to sk_wait_data, so that it sleeps until the receive queue has new skbs. Link: https://bugzilla.kernel.org/show_bug.cgi?id=99461 Link: https://sourceware.org/bugzilla/show_bug.cgi?id=18493 Link: https://bugzilla.redhat.com/show_bug.cgi?id=1205258 Reported-by: Enrico Scholz <rh-bugzilla@ensc.de> Reported-by: Dan Searle <dan@censornet.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:06:53 -07:00
Nicolas Dichtel	5a6228a0b4	lwtunnel: change prototype of lwtunnel_state_get() It saves some lines and simplify a bit the code when the state is returning by this function. It's also useful to handle a NULL entry. To avoid too long lines, I've also renamed lwtunnel_state_get() and lwtunnel_state_put() to lwtstate_get() and lwtstate_put(). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:02:49 -07:00
Nicolas Dichtel	d943659508	ipv6: copy lwtstate in ip6_rt_copy_init() We need to copy this field (ip6_rt_cache_alloc() and ip6_rt_pcpu_alloc() use ip6_rt_copy_init() to build a dst). CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: `19e42e4515` ("ipv6: support for fib route lwtunnel encap attributes") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:02:49 -07:00
Nicolas Dichtel	6673a9f4e3	ipv6: use lwtunnel_output6() only if flag redirect is set This function make sense only when LWTUNNEL_STATE_OUTPUT_REDIRECT is set. The check is already done in IPv4. CC: Thomas Graf <tgraf@suug.ch> CC: Roopa Prabhu <roopa@cumulusnetworks.com> Fixes: `74a0f2fe8e` ("ipv6: rt6_info output redirect to tunnel output") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 01:00:51 -07:00
subashab@codeaurora.org	b469139e81	dev: Spelling fix in comments Fix the following typo - unchainged -> unchanged Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-27 00:54:57 -07:00
David S. Miller	03de104f7b	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2015-07-23 Here's another one-patch pull request for 4.2 which targets a potential NULL pointer dereference in the LE Security Manager code that can be triggered by using older user space tools. The issue has been there since 4.0 so there's the appropriate "Cc: stable" in place. Let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 21:53:08 -07:00
Satish Ashok	949f1e39a6	bridge: mdb: notify on router port add and del Send notifications on router port add and del/expire, re-use the already existing MDBA_ROUTER and send NEWMDB/DELMDB netlink notifications respectively. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 21:19:58 -07:00
Thomas Graf	597798e438	openvswitch: Retrieve tunnel metadata when receiving from vport-netdev Retrieve the tunnel metadata for packets received by a net_device and provide it to ovs_vport_receive() for flow key extraction. [This hunk was in the GRE patch in the initial series and missed the cut for the initial submission for merging.] Fixes: `614732eaa1` ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 21:19:11 -07:00
Nikolay Aleksandrov	caaecdd3d3	inet: frags: remove INET_FRAG_EVICTED and use list_evictor for the test We can simply remove the INET_FRAG_EVICTED flag to avoid all the flags race conditions with the evictor and use a participation test for the evictor list, when we're at that point (after inet_frag_kill) in the timer there're 2 possible cases: 1. The evictor added the entry to its evictor list while the timer was waiting for the chainlock or 2. The timer unchained the entry and the evictor won't see it In both cases we should be able to see list_evictor correctly due to the sync on the chainlock. Joint work with Florian Westphal. Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 21:00:15 -07:00
Florian Westphal	5719b296fb	inet: frag: don't wait for timer deletion when evicting Frank reports 'NMI watchdog: BUG: soft lockup' errors when load is high. Instead of (potentially) unbounded restarts of the eviction process, just skip to the next entry. One caveat is that, when a netns is exiting, a timer may still be running by the time inet_evict_bucket returns. We use the frag memory accounting to wait for outstanding timers, so that when we free the percpu counter we can be sure no running timer will trip over it. Reported-and-tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 21:00:14 -07:00
Florian Westphal	0e60d245a0	inet: frag: change *_frag_mem_limit functions to take netns_frags as argument Followup patch will call it after inet_frag_queue was freed, so q->net doesn't work anymore (but netf = q->net; free(q); mem_limit(netf) would). Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 21:00:14 -07:00
Florian Westphal	d1fe19444d	inet: frag: don't re-use chainlist for evictor commit `65ba1f1ec0` ("inet: frags: fix a race between inet_evict_bucket and inet_frag_kill") describes the bug, but the fix doesn't work reliably. Problem is that ->flags member can be set on other cpu without chainlock being held by that task, i.e. the RMW-Cycle can clear INET_FRAG_EVICTED bit after we put the element on the evictor private list. We can crash when walking the 'private' evictor list since an element can be deleted from list underneath the evictor. Join work with Nikolay Alexandrov. Fixes: `b13d3cbfb8` ("inet: frag: move eviction of queues to work queue") Reported-by: Johan Schuijt <johan@transip.nl> Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Nikolay Alexandrov <nikolay@cumulusnetworks.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 21:00:14 -07:00
Nicolas Dichtel	2661371ace	openvswitch: fix compilation when vxlan is a module With CONFIG_VXLAN=m and CONFIG_OPENVSWITCH=y, there was the following compilation error: LD init/built-in.o net/built-in.o: In function `vxlan_tnl_create': .../net/openvswitch/vport-netdev.c:322: undefined reference to `vxlan_dev_create' make: *** [vmlinux] Error 1 CC: Thomas Graf <tgraf@suug.ch> Fixes: `614732eaa1` ("openvswitch: Use regular VXLAN net_device device") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 20:56:55 -07:00
Julian Anastasov	88f6432036	ipv4: be more aggressive when probing alternative gateways Currently, we do not notice if new alternative gateways are added. We can do it by checking for present neigh entry. Also, gateways that are currently probed (NUD_INCOMPLETE) can be skipped from round-robin probing. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 20:56:27 -07:00
Wei-Chun Chao	48fb6b5545	ipv6: fix crash over flow-based vxlan device Similar check was added in ip_rcv but not in ipv6_rcv. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff81734e0a>] ipv6_rcv+0xfa/0x500 Call Trace: [<ffffffff816c9786>] ? ip_rcv+0x296/0x400 [<ffffffff817732d2>] ? packet_rcv+0x52/0x410 [<ffffffff8168e99f>] __netif_receive_skb_core+0x63f/0x9a0 [<ffffffffc02b34a0>] ? br_handle_frame_finish+0x580/0x580 [bridge] [<ffffffff8109912c>] ? update_rq_clock.part.81+0x1c/0x40 [<ffffffff8168ed18>] __netif_receive_skb+0x18/0x60 [<ffffffff8168fa1f>] process_backlog+0x9f/0x150 Fixes: `ee122c79d4` (vxlan: Flow based tunneling) Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 20:54:56 -07:00
Daniel Borkmann	81296fc673	net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the (as per RFC deprecated) SCTP_SNDRCV via commit `bbbea41d5e` ("net: sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1]. Imho, it was not a good idea, and we should just revert that message for a couple of reasons: 1) It's uapi and therefore set in stone forever. 2) To be able to run on older and newer kernels, an SCTP application would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/ SCTP_RCVINFO support, so that on older kernels, it can make use of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO. In my (limited) experience, a lot of SCTP appliances are migrating to newer kernels only ve(ee)ry slowly. 3) Some people don't have the chance to change their applications, f.e. due to proprietary legacy stuff. So, they'll hit this warning in fast path and are stuck with older kernels. But i.e. due to point 1) I really fail to see the benefit of a warning. So just revert that for now, the issue was reported up Jamal. [1] http://thread.gmane.org/gmane.linux.network/321960/ Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Tuexen <tuexen@fh-muenster.de> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 16:32:41 -07:00
Jon Paul Maloy	cda3696d3d	tipc: clean up socket layer message reception When a message is received in a socket, one of the call chains tipc_sk_rcv()->tipc_sk_enqueue()->filter_rcv()(->tipc_sk_proto_rcv()) or tipc_sk_backlog_rcv()->filter_rcv()(->tipc_sk_proto_rcv()) are followed. At each of these levels we may encounter situations where the message may need to be rejected, or a new message produced for transfer back to the sender. Despite recent improvements, the current code for doing this is perceived as awkward and hard to follow. Leveraging the two previous commits in this series, we now introduce a more uniform handling of such situations. We let each of the functions in the chain itself produce/reverse the message to be returned to the sender, but also perform the actual forwarding. This simplifies the necessary logics within each function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 16:31:50 -07:00
Jon Paul Maloy	bcd3ffd4f6	tipc: introduce new tipc_sk_respond() function Currently, we use the code sequence if (msg_reverse()) tipc_link_xmit_skb() at numerous locations in socket.c. The preparation of arguments for these calls, as well as the sequence itself, makes the code unecessarily complex. In this commit, we introduce a new function, tipc_sk_respond(), that performs this call combination. We also replace some, but not yet all, of these explicit call sequences with calls to the new function. Notably, we let the function tipc_sk_proto_rcv() use the new function to directly send out PROBE_REPLY messages, instead of deferring this to the calling tipc_sk_rcv() function, as we do now. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 16:31:50 -07:00
Jon Paul Maloy	29042e19f2	tipc: let function tipc_msg_reverse() expand header when needed The shortest TIPC message header, for cluster local CONNECTED messages, is 24 bytes long. With this format, the fields "dest_node" and "orig_node" are optimized away, since they in reality are redundant in this particular case. However, the absence of these fields leads to code inconsistencies that are difficult to handle in some cases, especially when we need to reverse or reject messages at the socket layer. In this commit, we concentrate the handling of the absent fields to one place, by letting the function tipc_msg_reverse() reallocate the buffer and expand the header to 32 bytes when necessary. This means that the socket code now can assume that the two previously absent fields are present in the header when a message needs to be rejected. This opens up for some further simplifications of the socket code. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 16:31:50 -07:00
Nikolay Aleksandrov	963ad94853	bridge: netlink: fix slave_changelink/br_setport race conditions Since slave_changelink support was added there have been a few race conditions when using br_setport() since some of the port functions it uses require the bridge lock. It is very easy to trigger a lockup due to some internal spin_lock() usage without bh disabled, also it's possible to get the bridge into an inconsistent state. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: `3ac636b859` ("bridge: implement rtnl_link_ops->slave_changelink") Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-26 16:27:22 -07:00
David S. Miller	485164381c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains ten Netfilter/IPVS fixes, they are: 1) Address refcount leak when creating an expectation from the ctnetlink interface. 2) Fix bug splat in the IDLETIMER target related to sysfs, from Dmitry Torokhov. 3) Resolve panic for unreachable route in IPVS with locally generated traffic in the output path, from Alex Gartrell. 4) Fix wrong source address in rare cases for tunneled traffic in IPVS, from Julian Anastasov. 5) Fix crash if scheduler is changed via ipvsadm -E, again from Julian. 6) Make sure skb->sk is unset for forwarded traffic through IPVS, again from Alex Gartrell. 7) Fix crash with IPVS sync protocol v0 and FTP, from Julian. 8) Reset sender cpu for forwarded traffic in IPVS, also from Julian. 9) Allocate template conntracks through kmalloc() to resolve netns dependency problems with the conntrack kmem_cache. 10) Fix zones with expectations that clash using the same tuple, from Joe Stringer. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-25 00:18:10 -07:00
Konstantin Khlebnikov	cc9f4daa63	cgroup: net_cls: fix false-positive "suspicious RCU usage" In dev_queue_xmit() net_cls protected with rcu-bh. [ 270.730026] =============================== [ 270.730029] [ INFO: suspicious RCU usage. ] [ 270.730033] 4.2.0-rc3+ #2 Not tainted [ 270.730036] ------------------------------- [ 270.730040] include/linux/cgroup.h:353 suspicious rcu_dereference_check() usage! [ 270.730041] other info that might help us debug this: [ 270.730043] rcu_scheduler_active = 1, debug_locks = 1 [ 270.730045] 2 locks held by dhclient/748: [ 270.730046] #0: (rcu_read_lock_bh){......}, at: [<ffffffff81682b70>] __dev_queue_xmit+0x50/0x960 [ 270.730085] #1: (&qdisc_tx_lock){+.....}, at: [<ffffffff81682d60>] __dev_queue_xmit+0x240/0x960 [ 270.730090] stack backtrace: [ 270.730096] CPU: 0 PID: 748 Comm: dhclient Not tainted 4.2.0-rc3+ #2 [ 270.730098] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011 [ 270.730100] 0000000000000001 ffff8800bafeba58 ffffffff817ad487 0000000000000007 [ 270.730103] ffff880232a0a780 ffff8800bafeba88 ffffffff810ca4f2 ffff88022fb23e00 [ 270.730105] ffff880232a0a780 ffff8800bafebb68 ffff8800bafebb68 ffff8800bafebaa8 [ 270.730108] Call Trace: [ 270.730121] [<ffffffff817ad487>] dump_stack+0x4c/0x65 [ 270.730148] [<ffffffff810ca4f2>] lockdep_rcu_suspicious+0xe2/0x120 [ 270.730153] [<ffffffff816a62d2>] task_cls_state+0x92/0xa0 [ 270.730158] [<ffffffffa00b534f>] cls_cgroup_classify+0x4f/0x120 [cls_cgroup] [ 270.730164] [<ffffffff816aac74>] tc_classify_compat+0x74/0xc0 [ 270.730166] [<ffffffff816ab573>] tc_classify+0x33/0x90 [ 270.730170] [<ffffffffa00bcb0a>] htb_enqueue+0xaa/0x4a0 [sch_htb] [ 270.730172] [<ffffffff81682e26>] __dev_queue_xmit+0x306/0x960 [ 270.730174] [<ffffffff81682b70>] ? __dev_queue_xmit+0x50/0x960 [ 270.730176] [<ffffffff816834a3>] dev_queue_xmit_sk+0x13/0x20 [ 270.730185] [<ffffffff81787770>] dev_queue_xmit+0x10/0x20 [ 270.730187] [<ffffffff8178b91c>] packet_snd.isra.62+0x54c/0x760 [ 270.730190] [<ffffffff8178be25>] packet_sendmsg+0x2f5/0x3f0 [ 270.730203] [<ffffffff81665245>] ? sock_def_readable+0x5/0x190 [ 270.730210] [<ffffffff817b64bb>] ? _raw_spin_unlock+0x2b/0x40 [ 270.730216] [<ffffffff8173bcbc>] ? unix_dgram_sendmsg+0x5cc/0x640 [ 270.730219] [<ffffffff8165f367>] sock_sendmsg+0x47/0x50 [ 270.730221] [<ffffffff8165f42f>] sock_write_iter+0x7f/0xd0 [ 270.730232] [<ffffffff811fd4c7>] __vfs_write+0xa7/0xf0 [ 270.730234] [<ffffffff811fe5b8>] vfs_write+0xb8/0x190 [ 270.730236] [<ffffffff811fe8c2>] SyS_write+0x52/0xb0 [ 270.730239] [<ffffffff817b6bae>] entry_SYSCALL_64_fastpath+0x12/0x76 Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-25 00:13:18 -07:00
WANG Cong	77e62da6e6	sch_choke: drop all packets in queue during reset Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-24 22:57:15 -07:00
WANG Cong	fe6bea7f1f	sch_plug: purge buffered packets during reset Otherwise the skbuff related structures are not correctly refcount'ed. Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-24 22:57:14 -07:00
Rosen, Rami	6ca91c6040	bridge: Fix setting a flag in br_fill_ifvlaninfo_range(). This patch fixes setting of vinfo.flags in the br_fill_ifvlaninfo_range() method. The assignment of vinfo.flags &= ~BRIDGE_VLAN_INFO_RANGE_BEGIN has no effect and is unneeded, as vinfo.flags value is overriden by the immediately following vinfo.flags = flags \| BRIDGE_VLAN_INFO_RANGE_END assignement. Signed-off-by: Rami Rosen <rami.rosen@intel.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-24 22:56:22 -07:00
Julian Anastasov	2392debc2b	ipv4: consider TOS in fib_select_default fib_select_default considers alternative routes only when res->fi is for the first alias in res->fa_head. In the common case this can happen only when the initial lookup matches the first alias with highest TOS value. This prevents the alternative routes to require specific TOS. This patch solves the problem as follows: - routes that require specific TOS should be returned by fib_select_default only when TOS matches, as already done in fib_table_lookup. This rule implies that depending on the TOS we can have many different lists of alternative gateways and we have to keep the last used gateway (fa_default) in first alias for the TOS instead of using single tb_default value. - as the aliases are ordered by many keys (TOS desc, fib_priority asc), we restrict the possible results to routes with matching TOS and lowest metric (fib_priority) and routes that match any TOS, again with lowest metric. For example, packet with TOS 8 can not use gw3 (not lowest metric), gw4 (different TOS) and gw6 (not lowest metric), all other gateways can be used: tos 8 via gw1 metric 2 <--- res->fa_head and res->fi tos 8 via gw2 metric 2 tos 8 via gw3 metric 3 tos 4 via gw4 tos 0 via gw5 tos 0 via gw6 metric 1 Reported-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-24 22:46:11 -07:00
Julian Anastasov	18a912e9a8	ipv4: fib_select_default should match the prefix fib_trie starting from 4.1 can link fib aliases from different prefixes in same list. Make sure the alternative gateways are in same table and for same prefix (0) by checking tb_id and fa_slen. Fixes: `79e5ad2ceb` ("fib_trie: Remove leaf_info") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-24 22:46:09 -07:00
Linus Torvalds	d1a343a023	virtio/vhost: fixes for 4.2 Bugfixes and documentation fixes. Igor's patch that allows users to tweak memory table size is borderline, but it does fix known crashes, so I merged it. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJVpBzpAAoJECgfDbjSjVRpQXEIAMEqetqPuRduynjIw2HNktle fe/UUhvipwTsAM4R2pmcEl5tW04A/M54RkXN4iVy0rPshAfG3Fh4XTjLmzSGU0fI KD6qX8/Zc/+DUWnfe3aUC6jOOrjb7c4xRKOlQ9X8lZgi2M6AzrPeoZHFTtbX34CU 2kcnv5Sb1SaI/t2SaCc2CaKilQalEOcd59gGjje2QXjidZnIVHwONrOyjBiINUy6 GzfTgvAje8ZiG6951W3HDwwSfcqXezin27LJqnZgaLCwTCKt2gdQ2MAKjrfP2aVF +gX2B4XxcFLutMVx/obsjvA1ceipubyUauRLB3mnO3P5VOj1qbofa2lj4pzQ80k= =EKPr -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio/vhost fixes from Michael Tsirkin: "Bugfixes and documentation fixes. Igor's patch that allows users to tweak memory table size is borderline, but it does fix known crashes, so I merged it" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost: add max_mem_regions module parameter vhost: extend memory regions allocation to vmalloc 9p/trans_virtio: reset virtio device on remove virtio/s390: rename drivers/s390/kvm -> drivers/s390/virtio MAINTAINERS: separate section for s390 virtio drivers virtio: define virtio_pci_cfg_cap in header. virtio: Fix typecast of pointer in vring_init() virtio scsi: fix unused variable warning vhost: use binary search instead of linear in find_region() virtio_net: document VIRTIO_NET_CTRL_GUEST_OFFLOADS	2015-07-23 13:07:04 -07:00
Jakub Pawlowski	9a0a8a8e85	Bluetooth: Move IRK checking logic in preparation to new connect method Move IRK checking logic in preparation to new connect method. Also make sure that MGMT_STATUS_INVALID_PARAMS is returned when non identity address is passed to ADD_DEVICE. Right now MGMT_STATUS_FAILED is returned, which might be misleading. Signed-off-by: Jakub Pawlowski <jpawlowski@google.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>	2015-07-23 17:10:51 +02:00
Dean Jenkins	e432c72c46	Bluetooth: __l2cap_wait_ack() add defensive timeout Add a timeout to prevent the do while loop running in an infinite loop. This ensures that the channel will be instructed to close within 10 seconds so prevents l2cap_sock_shutdown() getting stuck forever. Returns -ENOLINK when the timeout is reached. The channel will be subequently closed and not all data will be ACK'ed. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:51 +02:00
Dean Jenkins	cb02a25583	Bluetooth: __l2cap_wait_ack() use msecs_to_jiffies() Use msecs_to_jiffies() instead of using HZ so that it is easier to specify the time in milliseconds. Also add a #define L2CAP_WAIT_ACK_POLL_PERIOD to specify the 200ms polling period so that it is defined in a single place. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:51 +02:00
Dean Jenkins	451e4c6c6b	Bluetooth: Add BT_DBG to l2cap_sock_shutdown() Add helpful BT_DBG debug to l2cap_sock_shutdown() and __l2cap_wait_ack() so that the code flow can be analysed. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:51 +02:00
Dean Jenkins	f65468f6e2	Bluetooth: Make __l2cap_wait_ack more efficient Use chan->state instead of chan->conn because waiting for ACK's is only possible in the BT_CONNECTED state. Also avoids reference to the conn structure so makes locking easier. Only call __l2cap_wait_ack() when the needed condition of chan->unacked_frames > 0 && chan->state == BT_CONNECTED is true and convert the while loop to a do while loop. __l2cap_wait_ack() change the function prototype to pass in the chan variable as chan is already available in the calling function l2cap_sock_shutdown(). Avoids locking issues. Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:51 +02:00
Dean Jenkins	2baea85dec	Bluetooth: L2CAP ERTM shutdown protect sk and chan During execution of l2cap_sock_shutdown() which might sleep, the sk and chan structures can be in an unlocked condition which potentially allows the structures to be freed by other running threads. Therefore, there is a possibility of a malfunction or memory reuse after being freed. Keep the sk and chan structures alive during the execution of l2cap_sock_shutdown() by using their respective hold and put functions. This allows the structures to be freeable at the end of l2cap_sock_shutdown(). Signed-off-by: Kautuk Consul <Kautuk_Consul@mentor.com> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:50 +02:00
Varka Bhadram	d10270ce94	mac802154: fix ieee802154_rx handling Instead of passing ieee802154_hw pointer to ieee802154_rx, we can directly pass the ieee802154_local pointer. Signed-off-by: Varka Bhadram <varkabhadram@gmail.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:50 +02:00
Varka Bhadram	729a8989b3	mac802154: do not export ieee802154_rx() Right now there are no other users for ieee802154_rx() in kernel. So lets remove EXPORT_SYMBOL() for this. Also it moves the function prototype from global header file to local header file. Signed-off-by: Varka Bhadram <varkabhadram@gmail.com> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:50 +02:00
Alexander Aring	3cf24cf8c3	mac802154: cfg: add suspend and resume callbacks This patch introduces suspend and resume callbacks to mac802154. When doing suspend we calling the stop driver callback which should stop the receiving of frames. A transceiver should go into low-power mode then. Calling resume will call the start driver callback, which starts receiving again and allow to transmit frames. This was tested only with the fakelb driver and a qemu vm by doing the following commands: echo "devices" > /sys/power/pm_test echo "freeze" > /sys/power/state while doing some high traffic between two fakelb phys. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:49 +02:00
Varka Bhadram	a6cb869b3b	cfg802154: add PM hooks This patch help to implement suspend/resume in mac802154, these hooks will be run before the device is suspended and after it resumes. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:49 +02:00
Alexander Aring	c4227c8a62	mac802154: util: add stop_device utility function This patch adds ieee802154_stop_device for preparing a utility function to stop the ieee802154 device. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:49 +02:00
Varka Bhadram	927e031c7c	mac802154: remove unused macro This patch removes the unused macro which was removed with the rework of linux-wpan kernel. Signed-off-by: Varka Bhadram <varkab@cdac.in> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:48 +02:00
Varka Bhadram	8f451829dd	mac802154: use WARN_ON() macro This patch will generate the warning if the required driver ops were not defined. Also it removes unnecessary debug message. Signed-off-by: Varka Bhadram <varkab@cdac.in> Acked-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:48 +02:00
Alexander Aring	301d7d42a2	6lowpan: add request for ipv6 module The iphc module depends on CONFIG_IPV6, because it's not very useful to build the module without IPv6 support. Recently an user reported about issues for setting an IPv6 address to a 6LoWPAN interface. The issues was solved by modprobe the ipv6 module before. To avoid such user issues we try to request the ipv6 module when the 6LoWPAN module is loaded. Signed-off-by: Alexander Aring <alex.aring@gmail.com> Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:48 +02:00
Alexander Aring	d77b4852b4	mac802154: add llsec address update workaround This patch adds a workaround for using the new nl802154 netlink interface with the old ieee802154 netlink interface togehter. The nl802154 currently supports no access for llsec layer, currently there are users outside which are using both interfaces at the same time. This patch adds a necessary call when addresses are updated. Reported-by: Simon Vincent <simon.vincent@xsilon.com> Suggested-by: Phoebe Buckheister <phoebe.buckheister@itwm.fraunhofer.de> Signed-off-by: Alexander Aring <alex.aring@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-07-23 17:10:48 +02:00
Johan Hedberg	25ba265390	Bluetooth: Fix NULL pointer dereference in smp_conn_security The l2cap_conn->smp pointer may be NULL for various valid reasons where SMP has failed to initialize properly. One such scenario is when crypto support is missing, another when the adapter has been powered on through a legacy method. The smp_conn_security() function should have the appropriate check for this situation to avoid NULL pointer dereferences. Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Cc: stable@vger.kernel.org # 4.0+	2015-07-23 16:41:24 +02:00
Pablo Neira Ayuso	3bbd14e0a2	netfilter: rename local nf_hook_list to hook_list `085db2c045` ("netfilter: Per network namespace netfilter hooks.") introduced a new nf_hook_list that is global, so let's avoid this overlap. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-07-23 16:18:35 +02:00
Pablo Neira Ayuso	7181ebafd4	netfilter: fix possible removal of wrong hook nf_unregister_net_hook() uses the nf_hook_ops fields as tuple to look up for the corresponding hook in the list. However, we may have two hooks with exactly the same configuration. This shouldn't be a problem for nftables since every new chain has an unique priv field set, but this may still cause us problems in the future, so better address this problem now by keeping a reference to the original nf_hook_ops structure to make sure we delete the right hook from nf_unregister_net_hook(). Fixes: `085db2c045` ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-07-23 16:18:34 +02:00
Pablo Neira Ayuso	2385eb0c5f	netfilter: nf_queue: fix nf_queue_nf_hook_drop() This function reacquires the rtnl_lock() which is already held by nf_unregister_hook(). This can be triggered via: modprobe nf_conntrack_ipv4 && rmmod nf_conntrack_ipv4 [ 720.628746] INFO: task rmmod:3578 blocked for more than 120 seconds. [ 720.628749] Not tainted 4.2.0-rc2+ #113 [ 720.628752] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 720.628754] rmmod D ffff8800ca46fd58 0 3578 3571 0x00000080 [...] [ 720.628783] Call Trace: [ 720.628790] [<ffffffff8152ea0b>] schedule+0x6b/0x90 [ 720.628795] [<ffffffff8152ecb3>] schedule_preempt_disabled+0x13/0x20 [ 720.628799] [<ffffffff8152ff55>] mutex_lock_nested+0x1f5/0x380 [ 720.628803] [<ffffffff81462622>] ? rtnl_lock+0x12/0x20 [ 720.628807] [<ffffffff81462622>] ? rtnl_lock+0x12/0x20 [ 720.628812] [<ffffffff81462622>] rtnl_lock+0x12/0x20 [ 720.628817] [<ffffffff8148ab25>] nf_queue_nf_hook_drop+0x15/0x160 [ 720.628825] [<ffffffff81488d48>] nf_unregister_net_hook+0x168/0x190 [ 720.628831] [<ffffffff81488e24>] nf_unregister_hook+0x64/0x80 [ 720.628837] [<ffffffff81488e60>] nf_unregister_hooks+0x20/0x30 [...] Moreover, nf_unregister_net_hook() should only destroy the queue for this netns, not for every netns. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Fixes: `085db2c045` ("netfilter: Per network namespace netfilter hooks.") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-07-23 16:17:58 +02:00
Thomas Graf	045a0fa0c5	ip_tunnel: Call ip_tunnel_core_init() from inet_init() Convert the module_init() to a invocation from inet_init() since ip_tunnel_core is part of the INET built-in. Fixes: `3093fbe7ff` ("route: Per route IP tunnel metadata via lightweight tunnel") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-23 01:28:21 -07:00
David S. Miller	c5e40ee287	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/bridge/br_mdb.c br_mdb.c conflict was a function call being removed to fix a bug in 'net' but whose signature was changed in 'net-next'. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-23 00:41:16 -07:00
Linus Torvalds	c5dfd654d0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Don't use shared bluetooth antenna in iwlwifi driver for management frames, from Emmanuel Grumbach. 2) Fix device ID check in ath9k driver, from Felix Fietkau. 3) Off by one in xen-netback BUG checks, from Dan Carpenter. 4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann. 5) Fix races in setting peeked bit flag in SKBs during datagram receive. If it's shared we have to clone it otherwise the value can easily be corrupted. Fix from Herbert Xu. 6) Revert fec clock handling change, causes regressions. From Fabio Estevam. 7) Fix use after free in fq_codel and sfq packet schedulers, from WANG Cong. 8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.) from WANG Cong and Konstantin Khlebnikov. 9) Memory leak in act_bpf packet action, from Alexei Starovoitov. 10) ARM bpf JIT bug fixes from Nicolas Schichan. 11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from Michael S Tsirkin. 12) Destruction of bond with different ARP header types not handled correctly, fix from Nikolay Aleksandrov. 13) Revert GRO receive support in ipv6 SIT tunnel driver, causes regressions because the GRO packets created cannot be processed properly on the GSO side if we forward the frame. From Herbert Xu. 14) TCCR update race and other fixes to ravb driver from Sergei Shtylyov. 15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet. 16) Fix panics on packet scheduler filter replace, from Daniel Borkmann. 17) Make sure AF_PACKET sees properly IP headers in defragmented frames (via PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee. 18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian Westphal. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits) ravb: fix ring memory allocation net: phy: dp83867: Fix warning check for setting the internal delay openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes netlink: don't hold mutex in rcu callback when releasing mmapd ring ARM: net: fix vlan access instructions in ARM JIT. ARM: net: handle negative offsets in BPF JIT. ARM: net: fix condition for load_order > 0 when translating load instructions. tcp: suppress a division by zero warning drivers: net: cpsw: remove tx event processing in rx napi poll inet: frags: fix defragmented packet's IP header for af_packet net: mvneta: fix refilling for Rx DMA buffers stmmac: fix setting of driver data in stmmac_dvr_probe sched: cls_flow: fix panic on filter replace sched: cls_flower: fix panic on filter replace sched: cls_bpf: fix panic on filter replace net/mdio: fix mdio_bus_match for c45 PHY net: ratelimit warnings about dst entry refcount underflow or overflow caif: fix leaks and race in caif_queue_rcv_skb() qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355 ravb: fix race updating TCCR ...	2015-07-22 14:45:25 -07:00
Trond Myklebust	1980bd4d82	SUNRPC: xprt_complete_bc_request must also decrement the free slot count Calling xprt_complete_bc_request() effectively causes the slot to be allocated, so it needs to decrement the backchannel free slot count as well. Fixes: `0d2a970d0a` ("SUNRPC: Fix a backchannel race") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-22 17:10:50 -04:00
Trond Myklebust	68514471ce	SUNRPC: Fix a backchannel deadlock xprt_alloc_bc_request() cannot call xprt_free_bc_request() without deadlocking, since it already holds the xprt->bc_pa_lock. Reported-by: Chuck Lever <chuck.lever@oracle.com> Fixes: `0d2a970d0a` ("SUNRPC: Fix a backchannel race") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-22 16:33:59 -04:00
Erik Kline	3985e8a361	ipv6: sysctl to restrict candidate source addresses Per RFC 6724, section 4, "Candidate Source Addresses": It is RECOMMENDED that the candidate source addresses be the set of unicast addresses assigned to the interface that will be used to send to the destination (the "outgoing" interface). Add a sysctl to enable this behaviour. Signed-off-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-22 10:54:11 -07:00
Roopa Prabhu	de18547d48	mpls_iptunnel: fix sparse warn: remove incorrect rcu_dereference fix for: net/mpls/mpls_iptunnel.c:73:19: sparse: incompatible types in comparison expression (different address spaces) remove incorrect rcu_dereference possibly left over from earlier revisions of the code. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-22 10:50:23 -07:00
Joe Stringer	4b31814d20	netfilter: nf_conntrack: Support expectations in different zones When zones were originally introduced, the expectation functions were all extended to perform lookup using the zone. However, insertion was not modified to check the zone. This means that two expectations which are intended to apply for different connections that have the same tuple but exist in different zones cannot both be tracked. Fixes: `5d0aa2ccd4` (netfilter: nf_conntrack: add support for "conntrack zones") Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-22 17:00:47 +02:00
Rick Jones	b56ea2985d	net: track success and failure of TCP PMTU probing Track success and failure of TCP PMTU probing. Signed-off-by: Rick Jones <rick.jones2@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 22:36:33 -07:00
Roopa Prabhu	01faef2ceb	mpls: make RTA_OIF optional If user did not specify an oif, try and get it from the via address. If failed to get device, return with -ENODEV. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 22:27:19 -07:00
Chris J Arges	bac541e463	openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes Some architectures like POWER can have a NUMA node_possible_map that contains sparse entries. This causes memory corruption with openvswitch since it allocates flow_cache with a multiple of num_possible_nodes() and assumes the node variable returned by for_each_node will index into flow->stats[node]. Use nr_node_ids to allocate a maximal sparse array instead of num_possible_nodes(). The crash was noticed after `3af229f2` was applied as it changed the node_possible_map to match node_online_map on boot. Fixes: `3af229f207` Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 22:26:03 -07:00
Florian Westphal	0470eb99b4	netlink: don't hold mutex in rcu callback when releasing mmapd ring Kirill A. Shutemov says: This simple test-case trigers few locking asserts in kernel: int main(int argc, char *argv) { unsigned int block_size = 16 4096; struct nl_mmap_req req = { .nm_block_size = block_size, .nm_block_nr = 64, .nm_frame_size = 16384, .nm_frame_nr = 64 * block_size / 16384, }; unsigned int ring_size; int fd; fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC); if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0) exit(1); if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0) exit(1); ring_size = req.nm_block_nr * req.nm_block_size; mmap(NULL, 2 * ring_size, PROT_READ\|PROT_WRITE, MAP_SHARED, fd, 0); return 0; } +++ exited with 0 +++ BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220 #1: ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70 #2: (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0 Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20 CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014 ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102 0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002 ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98 Call Trace: <IRQ> [<ffffffff81929ceb>] dump_stack+0x4f/0x7b [<ffffffff81085a9d>] ___might_sleep+0x16d/0x270 [<ffffffff81085bed>] __might_sleep+0x4d/0x90 [<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430 [<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20 [<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350 [<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70 [<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150 [<ffffffff817e484d>] __sk_free+0x1d/0x160 [<ffffffff817e49a9>] sk_free+0x19/0x20 [..] Cong Wang says: We can't hold mutex lock in a rcu callback, [..] Thomas Graf says: The socket should be dead at this point. It might be simpler to add a netlink_release_ring() function which doesn't require locking at all. Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Diagnosed-by: Cong Wang <cwang@twopensource.com> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 22:22:56 -07:00
Eric Dumazet	89e478a2aa	tcp: suppress a division by zero warning Andrew Morton reported following warning on one ARM build with gcc-4.4 : net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc': net/ipv4/inet_hashtables.c:617: warning: division by zero Even guarded with a test on sizeof(spinlock_t), compiler does not like current construct on a !CONFIG_SMP build. Remove the warning by using a temporary variable. Fixes: `095dc8e0c3` ("tcp: fix/cleanup inet_ehash_locks_alloc()") Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 22:13:13 -07:00
Jon Paul Maloy	16040894b2	tipc: fix compatibility bug In commit `d999297c3d` ("tipc: reduce locking scope during packet reception") we introduced a new function tipc_link_proto_rcv(). This function contains a bug, so that it sometimes by error sends out a non-zero link priority value in created protocol messages. The bug may lead to an extra link reset at initial link establising with older nodes. This will never happen more than once, whereafter the link will work as intended. We fix this bug in this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 16:23:50 -07:00
Mathias Krause	e181a54304	net: #ifdefify sk_classid member of struct sock The sk_classid member is only required when CONFIG_CGROUP_NET_CLASSID is enabled. #ifdefify it to reduce the size of struct sock on 32 bit systems, at least. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 16:04:30 -07:00
Thomas Graf	614732eaa1	openvswitch: Use regular VXLAN net_device device This gets rid of all OVS specific VXLAN code in the receive and transmit path by using a VXLAN net_device to represent the vport. Only a small shim layer remains which takes care of handling the VXLAN specific OVS Netlink configuration. Unexports vxlan_sock_add(), vxlan_sock_release(), vxlan_xmit_skb() since they are no longer needed. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:07 -07:00
Thomas Graf	c9db965c52	openvswitch: Abstract vport name through ovs_vport_name() This allows to get rid of the get_name() vport ops later on. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:07 -07:00
Thomas Graf	be4ace6e6b	openvswitch: Move dev pointer into vport itself This is the first step in representing all OVS vports as regular struct net_devices. Move the net_device pointer into the vport structure itself to get rid of struct vport_netdev. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:07 -07:00
Thomas Graf	34ae932a40	openvswitch: Make tunnel set action attach a metadata dst Utilize the new metadata dst to attach encapsulation instructions to the skb. The existing egress_tun_info via the OVS_CB() is left in place until all tunnel vports have been converted to the new method. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:06 -07:00
Thomas Graf	e7030878fc	fib: Add fib rule match on tunnel id This add the ability to select a routing table based on the tunnel id which allows to maintain separate routing tables for each virtual tunnel network. ip rule add from all tunnel-id 100 lookup 100 ip rule add from all tunnel-id 200 lookup 200 A new static key controls the collection of metadata at tunnel level upon demand. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:06 -07:00
Thomas Graf	3093fbe7ff	route: Per route IP tunnel metadata via lightweight tunnel This introduces a new IP tunnel lightweight tunnel type which allows to specify IP tunnel instructions per route. Only IPv4 is supported at this point. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:06 -07:00
Thomas Graf	1b7179d3ad	route: Extend flow representation with tunnel key Add a new flowi_tunnel structure which is a subset of ip_tunnel_key to allow routes to match on tunnel metadata. For now, the tunnel id is added to flowi_tunnel which allows for routes to be bound to specific virtual tunnels. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:06 -07:00
Thomas Graf	0accfc268f	arp: Inherit metadata dst when creating ARP requests If output device wants to see the dst, inherit the dst of the original skb and pass it on to generate the ARP request. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:05 -07:00
Thomas Graf	f38a9eb1f7	dst: Metadata destinations Introduces a new dst_metadata which enables to carry per packet metadata between forwarding and processing elements via the skb->dst pointer. The structure is set up to be a union. Thus, each separate type of metadata requires its own dst instance. If demand arises to carry multiple types of metadata concurrently, metadata dst entries can be made stackable. The metadata dst entry is refcnt'ed as expected for now but a non reference counted use is possible if the reference is forced before queueing the skb. In order to allow allocating dsts with variable length, the existing dst_alloc() is split into a dst_alloc() and dst_init() function. The existing dst_init() function to initialize the subsystem is being renamed to dst_subsys_init() to make it clear what is what. The check before ip_route_input() is changed to ignore metadata dsts and drop the dst inside the routing function thus allowing to interpret metadata in a later commit. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:05 -07:00
Thomas Graf	773a69d64b	icmp: Don't leak original dst into ip_route_input() ip_route_input() unconditionally overwrites the dst. Hide the original dst attached to the skb by calling skb_dst_set(skb, NULL) prior to ip_route_input(). Reported-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:05 -07:00
Thomas Graf	1d8fff9073	ip_tunnel: Make ovs_tunnel_info and ovs_key_ipv4_tunnel generic Rename the tunnel metadata data structures currently internal to OVS and make them generic for use by all IP tunnels. Both structures are kernel internal and will stay that way. Their members are exposed to user space through individual Netlink attributes by OVS. It will therefore be possible to extend/modify these structures without affecting user ABI. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:05 -07:00
Roopa Prabhu	e3e4712ec0	mpls: ip tunnel support This implementation uses lwtunnel infrastructure to register hooks for mpls tunnel encaps. It picks cues from iptunnel_encaps infrastructure and previous mpls iptunnel RFC patches from Eric W. Biederman and Robert Shearman Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:05 -07:00
Roopa Prabhu	face0188e3	mpls: export mpls functions for use by mpls iptunnels Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:04 -07:00
Roopa Prabhu	74a0f2fe8e	ipv6: rt6_info output redirect to tunnel output This is similar to ipv4 redirect of dst output to lwtunnel output function for encapsulation and xmit. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:04 -07:00
Roopa Prabhu	8602a62502	ipv4: redirect dst output to lwtunnel output For input routes with tunnel encap state this patch redirects dst output functions to lwtunnel_output which later resolves to the corresponding lwtunnel output function. This has been tested to work with mpls ip tunnels. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:04 -07:00
Roopa Prabhu	ffce41962e	lwtunnel: support dst output redirect function This patch introduces lwtunnel_output function to call corresponding lwtunnels output function to xmit the packet. It adds two variants lwtunnel_output and lwtunnel_output6 for ipv4 and ipv6 respectively today. But this is subject to change when lwtstate will reside in dst or dst_metadata (as per upstream discussions). Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:04 -07:00
Roopa Prabhu	19e42e4515	ipv6: support for fib route lwtunnel encap attributes This patch adds support in ipv6 fib functions to parse Netlink RTA encap attributes and attach encap state data to rt6_info. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:04 -07:00
Roopa Prabhu	571e722676	ipv4: support for fib route lwtunnel encap attributes This patch adds support in ipv4 fib functions to parse user provided encap attributes and attach encap state data to fib_nh and rtable. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:03 -07:00
Roopa Prabhu	499a242568	lwtunnel: infrastructure for handling light weight tunnels like mpls Provides infrastructure to parse/dump/store encap information for light weight tunnels like mpls. Encap information for such tunnels is associated with fib routes. This infrastructure is based on previous suggestions from Eric Biederman to follow the xfrm infrastructure. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:39:03 -07:00
Edward Hyunkoo Jee	0848f6428b	inet: frags: fix defragmented packet's IP header for af_packet When ip_frag_queue() computes positions, it assumes that the passed sk_buff does not contain L2 headers. However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly functions can be called on outgoing packets that contain L2 headers. Also, IPv4 checksum is not corrected after reassembly. Fixes: `7736d33f42` ("packet: Add pre-defragmentation support for ipv4 fanouts.") Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 10:29:23 -07:00
Jakub Wilk	0c199a903d	xfrm: Fix a typo Signed-off-by: Jakub Wilk <jwilk@jwilk.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:28:36 -07:00
Daniel Borkmann	32b2f4b196	sched: cls_flow: fix panic on filter replace The following test case causes a NULL pointer dereference in cls_flow: tc filter add dev foo parent 1: handle 0x1 flow hash keys dst action ok tc filter replace dev foo parent 1: pref 49152 handle 0x1 \ flow hash keys mark action drop To be more precise, actually two different panics are fixed, the first occurs because tcf_exts_init() is not called on the newly allocated filter when we do a replace. And the second panic uncovered after that happens since the arguments of list_replace_rcu() are swapped, the old element needs to be the first argument and the new element the second. Fixes: `70da9f0bf9` ("net: sched: cls_flow use RCU") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:25:03 -07:00
Daniel Borkmann	ff3532f265	sched: cls_flower: fix panic on filter replace The following test case causes a NULL pointer dereference in cls_flower: tc filter add dev foo parent 1: flower eth_type ipv4 action ok flowid 1:1 tc filter replace dev foo parent 1: pref 49152 handle 0x1 \ flower eth_type ipv6 action ok flowid 1:1 The problem is that commit `77b9900ef5` ("tc: introduce Flower classifier") accidentally swapped the arguments of list_replace_rcu(), the old element needs to be the first argument and the new element the second. Fixes: `77b9900ef5` ("tc: introduce Flower classifier") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:25:03 -07:00
Daniel Borkmann	f6bfc46da6	sched: cls_bpf: fix panic on filter replace The following test case causes a NULL pointer dereference in cls_bpf: FOO="1,6 0 0 4294967295," tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok tc filter replace dev foo parent 1: pref 49152 handle 0x1 \ bpf bytecode "$FOO" flowid 1:1 action drop The problem is that commit `1f947bf151` ("net: sched: rcu'ify cls_bpf") accidentally swapped the arguments of list_replace_rcu(), the old element needs to be the first argument and the new element the second. Fixes: `1f947bf151` ("net: sched: rcu'ify cls_bpf") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:25:02 -07:00
Marcelo Ricardo Leitner	b52effd263	sctp: fix cut and paste issue in comment Cookie ACK is always received by the association initiator, so fix the comment to avoid confusion. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:21:32 -07:00
Marcelo Ricardo Leitner	0ca50d12fe	sctp: fix src address selection if using secondary addresses In short, sctp is likely to incorrectly choose src address if socket is bound to secondary addresses. This patch fixes it by adding a new check that checks if such src address belongs to the interface that routing identified as output. This is enough to avoid rp_filter drops on remote peer. Details: Currently, sctp will do a routing attempt without specifying the src address and compare the returned value (preferred source) with the addresses that the socket is bound to. When using secondary addresses, this will not match. Then it will try specifying each of the addresses that the socket is bound to and re-routing, checking if that address is valid as src for that dst. Thing is, this check alone is weak: # ip r l 192.168.100.0/24 dev eth1 proto kernel scope link src 192.168.100.149 192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.147 # ip a l 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:15:18:6a brd ff:ff:ff:ff:ff:ff inet 192.168.122.147/24 brd 192.168.122.255 scope global dynamic eth0 valid_lft 2160sec preferred_lft 2160sec inet 192.168.122.148/24 scope global secondary eth0 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:fe15:186a/64 scope link valid_lft forever preferred_lft forever 3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:b3:91:46 brd ff:ff:ff:ff:ff:ff inet 192.168.100.149/24 brd 192.168.100.255 scope global dynamic eth1 valid_lft 2162sec preferred_lft 2162sec inet 192.168.100.148/24 scope global secondary eth1 valid_lft forever preferred_lft forever inet6 fe80::5054:ff:feb3:9146/64 scope link valid_lft forever preferred_lft forever 4: ens9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 link/ether 52:54:00:05:47:ee brd ff:ff:ff:ff:ff:ff inet6 fe80::5054:ff:fe05:47ee/64 scope link valid_lft forever preferred_lft forever # ip r g 192.168.100.193 from 192.168.122.148 192.168.100.193 from 192.168.122.148 dev eth1 cache Even if you specify an interface: # ip r g 192.168.100.193 from 192.168.122.148 oif eth1 192.168.100.193 from 192.168.122.148 dev eth1 cache Although this would be valid, peers using rp_filter will drop such packets as their src doesn't match the routes for that interface. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:20:17 -07:00
Marcelo Ricardo Leitner	07868284e5	sctp: reduce indent level on sctp_v4_get_dst Paves the day for the next patch. Functionality stays untouched. Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:20:17 -07:00
David S. Miller	27dfead164	Some fixes for the current cycle: 1. Arik introduced an rtnl-locked regulatory API to be able to differentiate between place do/don't have the RTNL; this fixes missing locking in some of the code paths 2. Two small mesh bugfixes from Bob, one to avoid treating a certain malformed over-the-air frame and one to avoid sending a garbage field over the air. 3. A fix for powersave during WoWLAN suspend from Krishna Chaitanya. 4. A fix for a powersave vs. aggregation teardown race, from Michal. 5. Thomas reduced the loglevel of CRDA messages to avoid spamming the kernel log with mostly irrelevant information. 6. Tom fixed a dangling debugfs directory pointer that could cause crashes if subsequent addition of the same interface to debugfs failed for some reason. 7. A fix from myself for a list corruption issue in mac80211 during combined interface shutdown/removal - shut down interfaces first and only then remove them to avoid that. -----BEGIN PGP SIGNATURE----- iQIcBAABCAAGBQJVqQMwAAoJEDBSmw7B7bqrZzkQAIjMKojlJRouN/N/aF7ym2pC eAboLMC+XHubQoq2H01k5ZOSrLL1kElhkB7pLas+q00gTFyavLzEcEiFqCNuLwPH lQEwLXTDUeiaVWekOYJev/ONtaDdwUXoB4BPAA3Ih4EAk9fEBtcWiWeLDgOLOS8P eYVqcMV733cOTjhYImEQnhnm3qrcwSCF1vTOJaN4Gf/qqw6j2ilq5wU1TvPyh0TA 1EP5Elb9hy1sud5X6shrsOBqkBrPoO1p3V4EeoHkxl8welqxXdqGvmA3K0sloGZT 7RiL8PD4QVyISy1NrBDnNMRRgj6BD1aLC+clmECmmgYvGGcqbzLtB3CWUCV6oQmb TC4NmgJkKNVTvQnoqxQEL8JYSs/E2ITRKyMi3sfIYAyz1dVuQf1RkZZzB22rQWT2 PaLq/k+vpS7E3OD3XB53flB/k7Y6j/OwJb/rE7i2vqSn3kcbua8H7dzd7p+AE5FA ZF//u2GBDgZeMaA9BvifByWy2+yvAEcD5/U9XkWqJ7t+HohKteLJj/scHT89pto3 n0NZ7RVRMNQ9mz14UJiVnFOL/81AjmiU123S5UIIMkmVE5Zrn7VTZlN6fVY4Fcsh AtxHQesOlCw8T4lFLxgyKkEl7bxATQ2OMR6vWsZQraRHSqIuK8JDABRokIlzoFn/ xC/Yn1vTaBuj+2nif/F0 =US5Y -----END PGP SIGNATURE----- Merge tag 'mac80211-for-davem-2015-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Some fixes for the current cycle: 1. Arik introduced an rtnl-locked regulatory API to be able to differentiate between place do/don't have the RTNL; this fixes missing locking in some of the code paths 2. Two small mesh bugfixes from Bob, one to avoid treating a certain malformed over-the-air frame and one to avoid sending a garbage field over the air. 3. A fix for powersave during WoWLAN suspend from Krishna Chaitanya. 4. A fix for a powersave vs. aggregation teardown race, from Michal. 5. Thomas reduced the loglevel of CRDA messages to avoid spamming the kernel log with mostly irrelevant information. 6. Tom fixed a dangling debugfs directory pointer that could cause crashes if subsequent addition of the same interface to debugfs failed for some reason. 7. A fix from myself for a list corruption issue in mac80211 during combined interface shutdown/removal - shut down interfaces first and only then remove them to avoid that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:17:53 -07:00
Konstantin Khlebnikov	8bf4ada2e2	net: ratelimit warnings about dst entry refcount underflow or overflow Kernel generates a lot of warnings when dst entry reference counter overflows and becomes negative. That bug was seen several times at machines with outdated 3.10.y kernels. Most like it's already fixed in upstream. Anyway that flood completely kills machine and makes further debugging impossible. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:11:19 -07:00
Eric Dumazet	b8a23e8d8e	caif: fix leaks and race in caif_queue_rcv_skb() 1) If sk_filter() is applied, skb was leaked (not freed) 2) Testing SOCK_DEAD twice is racy : packet could be freed while already queued. 3) Remove obsolete comment about caching skb->len Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-21 00:02:44 -07:00
Alexei Starovoitov	4d9c5c53ac	test_bpf: add bpf_skb_vlan_push/pop() tests improve accuracy of timing in test_bpf and add two stress tests: - {skb->data[0], get_smp_processor_id} repeated 2k times - {skb->data[0], vlan_push} x 68 followed by {skb->data[0], vlan_pop} x 68 1st test is useful to test performance of JIT implementation of BPF_LD_ABS together with BPF_CALL instructions. 2nd test is stressing skb_vlan_push/pop logic together with skb->data access via BPF_LD_ABS insn which checks that re-caching of skb->data is done correctly. In order to call bpf_skb_vlan_push() from test_bpf.ko have to add three export_symbol_gpl. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:52:32 -07:00
Alexei Starovoitov	4e10df9a60	bpf: introduce bpf_skb_vlan_push/pop() helpers Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via helper functions. These functions may change skb->data/hlen which are cached by some JITs to improve performance of ld_abs/ld_ind instructions. Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls, re-compute header len and re-cache skb->data/hlen back into cpu registers. Note, skb->data/hlen are not directly accessible from the programs, so any changes to skb->data done either by these helpers or by other TC actions are safe. eBPF JIT supported by three architectures: - arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is. - x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls (experiments showed that conditional re-caching is slower). - s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present in the program (re-caching is tbd). These helpers allow more scalable handling of vlan from the programs. Instead of creating thousands of vlan netdevs on top of eth0 and attaching TC+ingress+bpf to all of them, the program can be attached to eth0 directly and manipulate vlans as necessary. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:52:31 -07:00
Jon Paul Maloy	d999297c3d	tipc: reduce locking scope during packet reception We convert packet/message reception according to the same principle we have been using for message sending and timeout handling: We move the function tipc_rcv() to node.c, hence handling the initial packet reception at the link aggregation level. The function grabs the node lock, selects the receiving link, and accesses it via a new call tipc_link_rcv(). This function appends buffers to the input queue for delivery upwards, but it may also append outgoing packets to the xmit queue, just as we do during regular message sending. The latter will happen when buffers are forwarded from the link backlog, or when retransmission is requested. Upon return of this function, and after having released the node lock, tipc_rcv() delivers/tranmsits the contents of those queues, but it may also perform actions such as link activation or reset, as indicated by the return flags from the link. This reduces the number of cpu cycles spent inside the node spinlock, and reduces contention on that lock. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	1a20cc254e	tipc: introduce node contact FSM The logics for determining when a node is permitted to establish and maintain contact with its peer node becomes non-trivial in the presence of multiple parallel links that may come and go independently. A known failure scenario is that one endpoint registers both its links to the peer lost, cleans up it binding table, and prepares for a table update once contact is re-establihed, while the other endpoint may see its links reset and re-established one by one, hence seeing no need to re-synchronize the binding table. To avoid this, a node must not allow re-establishing contact until it has confirmation that even the peer has lost both links. Currently, the mechanism for handling this consists of setting and resetting two state flags from different locations in the code. This solution is hard to understand and maintain. A closer analysis even reveals that it is not completely safe. In this commit we do instead introduce an FSM that keeps track of the conditions for when the node can establish and maintain links. It has six states and four events, and is strictly based on explicit knowledge about the own node's and the peer node's contact states. Only events leading to state change are shown as edges in the figure below. +--------------+ \| SELF_UP/ \| +---------------->\| PEER_COMING \|-----------------+ SELF_ \| +--------------+ \|PEER_ ESTBL_ \| \| \|ESTBL_ CONTACT\| SELF_LOST_CONTACT \| \|CONTACT \| v \| \| +--------------+ \| \| PEER_ \| SELF_DOWN/ \| SELF_ \| \| LOST_ +--\| PEER_LEAVING \|<--+ LOST_ v +-------------+ CONTACT \| +--------------+ \| CONTACT +-----------+ \| SELF_DOWN/ \|<----------+ +----------\| SELF_UP/ \| \| PEER_DOWN \|<----------+ +----------\| PEER_UP \| +-------------+ SELF_ \| +--------------+ \| PEER_ +-----------+ \| LOST_ +--\| SELF_LEAVING/\|<--+ LOST_ A \| CONTACT \| PEER_DOWN \| CONTACT \| \| +--------------+ \| \| A \| PEER_ \| PEER_LOST_CONTACT \| \|SELF_ ESTBL_ \| \| \|ESTBL_ CONTACT\| +--------------+ \|CONTACT +---------------->\| PEER_UP/ \|-----------------+ \| SELF_COMING \| +--------------+ Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	8a1577c96f	tipc: move link supervision timer to node level In our effort to move control of the links to the link aggregation layer, we move the perodic link supervision timer to struct tipc_node. The new timer is shared between all links belonging to the node, thus saving resources, while still kicking the FSM on both its pertaining links at each expiration. The current link timer and corresponding functions are removed. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	333ef69ed2	tipc: simplify link timer implementation We create a second, simpler, link timer function, tipc_link_timeout(). The new function makes use of the new FSM function introduced in the previous commit, and just like it, takes a buffer queue as parameter. It returns an event bit field and potentially a link protocol packet to the caller. The existing timer function, link_timeout(), is still needed for a while, so we redesign it to become a wrapper around the new function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:16 -07:00
Jon Paul Maloy	6ab30f9cbe	tipc: improve link FSM implementation The link FSM implementation is currently unnecessarily complex. It sometimes checks for conditional state outside the FSM data before deciding next state, and often performs actions directly inside the FSM logics. In this commit, we create a second, simpler FSM implementation, that as far as possible acts only on states and events that it is strictly defined for, and postpone any actions until it is finished with its decisions. It also returns an event flag field and an a buffer queue which may potentially contain a protocol message to be sent by the caller. Unfortunately, we cannot yet make the FSM "clean", in the sense that its decisions are only based on FSM state and event, and that state changes happen only here. That will have to wait until the activate/reset logics has been cleaned up in a future commit. We also rename the link states as follows: WORKING_WORKING -> TIPC_LINK_WORKING WORKING_UNKNOWN -> TIPC_LINK_PROBING RESET_UNKNOWN -> TIPC_LINK_RESETTING RESET_RESET -> TIPC_LINK_ESTABLISHING The existing FSM function, link_state_event(), is still needed for a while, so we redesign it to make use of the new function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	426cc2b86d	tipc: introduce new link protocol msg create function As a preparation for later changes, we introduce a new function tipc_link_build_proto_msg(). Instead of actually sending the created protocol message, it only creates it and adds it to the head of a skb queue provided by the caller. Since we still need the existing function tipc_link_protocol_xmit() for a while, we redesign it to make use of the new function. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	d3504c3449	tipc: clean up definitions and usage of link flags The status flag LINK_STOPPED is not needed any more, since the mechanism for delayed deletion of links has been removed. Likewise, LINK_STARTED and LINK_START_EVT are unnecessary, because we can just as well start the link timer directly from inside tipc_link_create(). We eliminate these flags in this commit. Instead of the above flags, we now introduce three new link modes, TIPC_LINK_OPEN, TIPC_LINK_BLOCKED and TIPC_LINK_TUNNEL. The values indicate whether, and in the case of TIPC_LINK_TUNNEL, which, messages the link is allowed to receive in this state. TIPC_LINK_BLOCKED also blocks timer-driven protocol messages to be sent out, and any change to the link FSM. Since the modes are mutually exclusive, we convert them to state values, and rename the 'flags' field in struct tipc_link to 'exec_mode'. Finally, we move the #defines for link FSM states and events from link.h into enums inside the file link.c, which is the real usage scope of these definitions. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	af9b028e27	tipc: make media xmit call outside node spinlock context Currently, message sending is performed through a deep call chain, where the node spinlock is grabbed and held during a significant part of the transmission time. This is clearly detrimental to overall throughput performance; it would be better if we could send the message after the spinlock has been released. In this commit, we do instead let the call revert on the stack after the buffer chain has been added to the transmission queue, whereafter clones of the buffers are transmitted to the device layer outside the spinlock scope. As a further step in our effort to separate the roles of the node and link entities we also move the function tipc_link_xmit() to node.c, and rename it to tipc_node_xmit(). Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	22d85c7942	tipc: change sk_buffer handling in tipc_link_xmit() When the function tipc_link_xmit() is given a buffer list for transmission, it currently consumes the list both when transmission is successful and when it fails, except for the special case when it encounters link congestion. This behavior is inconsistent, and needs to be corrected if we want to avoid problems in later commits in this series. In this commit, we change this to let the function consume the list only when transmission is successful, and leave the list with the sender in all other cases. We also modifiy the socket code so that it adapts to this change, i.e., purges the list when a non-congestion error code is returned. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:15 -07:00
Jon Paul Maloy	36e78a463b	tipc: use bearer index when looking up active links struct tipc_node currently holds two arrays of link pointers; one, indexed by bearer identity, which contains all links irrespective of current state, and one two-slot array for the currently active link or links. The latter array contains direct pointers into the elements of the former. This has the effect that we cannot know the bearer id of a link when accessing it via the "active_links[]" array without actually dereferencing the pointer, something we want to avoid in some cases. In this commit, we do instead store the bearer identity in the "active_links" array, and use this as an index to find the right element in the overall link entry array. This change should be seen as a preparation for the later commits in this series. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	d39bbd445d	tipc: move link input queue to tipc_node At present, the link input queue and the name distributor receive queues are fields aggregated in struct tipc_link. This is a hazard, because a link might be deleted while a receiving socket still keeps reference to one of the queues. This commit fixes this bug. However, rather than adding yet another reference counter to the critical data path, we move the two queues to safe ground inside struct tipc_node, which is already protected, and let the link code only handle references to the queues. This is also in line with planned later changes in this area. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	d3a43b907a	tipc: move link creation from neighbor discoverer to node As a step towards turning links into node internal entities, we move the creation of links from the neighbor discovery logics to the node's link control logics. We also create an additional entry for the link's media address in the newly introduced struct tipc_link_entry, since this is where it is needed in the upcoming commits. The current copy in struct tipc_link is kept for now, but will be removed later. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Jon Paul Maloy	9d13ec65ed	tipc: introduce link entry structure to struct tipc_node struct 'tipc_node' currently contains two arrays for link attributes, one for the link pointers, and one for the usable link MTUs. We now group those into a new struct 'tipc_link_entry', and intoduce one single array consisting of such enties. Apart from being a cosmetic improvement, this is a starting point for the strict master-slave relation between node and link that we will introduce in the following commits. Reviewed-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 20:41:14 -07:00
Scott Feldman	1a3b2ec93d	switchdev: add offload_fwd_mark generator helper skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be unique for device and may even be unique for a sub-set of ports within device, so add switchdev helper function to generate unique marks based on port's switch ID and group_ifindex. group_ifindex would typically be the container dev's ifindex, such as the bridge's ifindex. The generator uses a global hash table to store offload_fwd_marks hashed by {switch ID, group_ifindex} key. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:44 -07:00
Scott Feldman	d754f98b50	net: add phys ID compare helper to test if two IDs are the same Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:44 -07:00
Scott Feldman	0c4f691ff6	net: don't reforward packets already forwarded by offload device Just before queuing skb for xmit on port, check if skb has been marked by switchdev port driver as already fordwarded by device. If so, drop skb. A non-zero skb->offload_fwd_mark field is set by the switchdev port driver/device on ingress to indicate the skb has already been forwarded by the device to egress ports with matching dev->skb_mark. The switchdev port driver would assign a non-zero dev->offload_skb_mark for each device port netdev during registration, for example. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 18:32:44 -07:00
Herbert Xu	fdbf5b097b	Revert "sit: Add gro callbacks to sit_offload" This patch reverts `19424e052f` ("sit: Add gro callbacks to sit_offload") because it generates packets that cannot be handled even by our own GSO. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 16:52:28 -07:00
Nikolay Aleksandrov	a7ce45a74b	bridge: mcast: fix br_multicast_dev_del warn when igmp snooping is not defined Fix: net/bridge/br_if.c: In function 'br_dev_delete': >> net/bridge/br_if.c:284:2: error: implicit declaration of function >> 'br_multicast_dev_del' [-Werror=implicit-function-declaration] br_multicast_dev_del(br); ^ cc1: some warnings being treated as errors when igmp snooping is not defined. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 16:19:03 -07:00
Phil Sutter	a0a9f33bdf	net/ipv6: update flowi6_oif in ip6_dst_lookup_flow if not set Newly created flows don't have flowi6_oif set (at least if the associated socket is not interface-bound). This leads to a mismatch in __xfrm6_selector_match() for policies which specify an interface in the selector (sel->ifindex != 0). Backtracing shows this happens in code-paths originating from e.g. ip6_datagram_connect(), rawv6_sendmsg() or tcp_v6_connect(). (UDP was not tested for.) In summary, this patch fixes policy matching on outgoing interface for locally generated packets. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:59:32 -07:00
Satish Ashok	e10177abf8	bridge: multicast: fix handling of temp and perm entries When the bridge (or port) is brought down/up flush only temp entries and leave the perm ones. Flush perm entries only when deleting the bridge device or the associated port. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:49:10 -07:00
Nikolay Aleksandrov	ef8299de7e	bridge: multicast: notify on group delete Group notifications were not sent when a group expired or was deleted due to bridge/port device being deleted. So add br_mdb_notify() to br_multicast_del_pg(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:49:10 -07:00
Daniel Borkmann	8d20aabe1c	ebpf: add helper to retrieve net_cls's classid cookie It would be very useful to retrieve the net_cls's classid from an eBPF program to allow for a more fine-grained classification, it could be directly used or in conjunction with additional policies. I.e. docker, but also tooling such as cgexec, can easily run applications via net_cls cgroups: cgcreate -g net_cls:/foo echo 42 > foo/net_cls.classid cgexec -g net_cls:foo <prog> Thus, their respecitve classid cookie of foo can then be looked up on the egress path to apply further policies. The helper is desigend such that a non-zero value returns the cgroup id. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:41:30 -07:00
Daniel Borkmann	b87a173e25	cls_cgroup: factor out classid retrieval Split out retrieving the cgroups net_cls classid retrieval into its own function, so that it can be reused later on from other parts of the traffic control subsystem. If there's no skb->sk, then the small helper returns 0 as well, which in cls_cgroup terms means 'could not classify'. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-20 12:41:30 -07:00
Chuck Lever	31193fe5f6	svcrdma: Remove svc_rdma_fastreg() Commit `0bf4828983` ("svcrdma: refactor marshalling logic") removed the last call site for svc_rdma_fastreg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-07-20 14:58:47 -04:00
Chuck Lever	10dc451218	svcrdma: Clean up svc_rdma_get_reply_array() Kernel coding conventions frown upon having large nontrivial functions in header files, and the preference these days is to allow the compiler to make inlining decisions if possible. As these functions are re-homed into a .c file, be sure that comparisons with fields in struct rpcrdma_msg are with be32 constants. This is a refactoring change; no behavior change is intended. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-07-20 14:58:47 -04:00
Chuck Lever	9d11b51ce7	svcrdma: Fix send_reply() scatter/gather set-up The Linux NFS server returns garbage in the data payload of inline NFS/RDMA READ replies. These are READs of under 1000 bytes or so where the client has not provided either a reply chunk or a write list. The NFS server delivers the data payload for an NFS READ reply to the transport in an xdr_buf page list. If the NFS client did not provide a reply chunk or a write list, send_reply() is supposed to set up a separate sge for the page containing the READ data, and another sge for XDR padding if needed, then post all of the sges via a single SEND Work Request. The problem is send_reply() does not advance through the xdr_buf when setting up scatter/gather entries for SEND WR. It always calls dma_map_xdr with xdr_off set to zero. When there's more than one sge, dma_map_xdr() sets up the SEND sge's so they all point to the xdr_buf's head. The current Linux NFS/RDMA client always provides a reply chunk or a write list when performing an NFS READ over RDMA. Therefore, it does not exercise this particular case. The Linux server has never had to use more than one extra sge for building RPC/RDMA replies with a Linux client. However, an NFS/RDMA client _is_ allowed to send small NFS READs without setting up a write list or reply chunk. The NFS READ reply fits entirely within the inline reply buffer in this case. This is perhaps a more efficient way of performing NFS READs that the Linux NFS/RDMA client may some day adopt. Fixes: `b432e6b3d9` ('svcrdma: Change DMA mapping logic to . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-07-20 14:58:47 -04:00
Shirley Ma	ff79c74dca	NFS/RDMA Release resources in svcrdma when device is removed When removing underlying RDMA device, the rmmod will hang forever if there are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts could also prevent the server from shutting down. Further debugging shows that the existing connections are not teared down and resource are not released when receiving RDMA_CM_EVENT_DEVICE_REMOVAL event. It seems the original code missing svc_xprt_put() in RDMA_CM_EVENT_REMOVAL event handler thus svc_xprt_free is never invoked to release the existing connection resources. The patch has been passed removing, adding device back and forth without stopping NFS/RDMA service. This will also allow a device to be unplugged and swapped out without shutting down NFS service. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252 Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2015-07-20 14:58:47 -04:00
Pablo Neira Ayuso	b64f48dcda	Merge tag 'ipvs-fixes-for-v4.2' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs Simon Horman says: ==================== IPVS Fixes for v4.2 please consider this fix for v4.2. For reasons that are not clear to me it is a bumper crop. It seems to me that they are all relevant to stable. Please let me know if you need my help to get the fixes into stable. * ipvs: fix ipv6 route unreach panic This problem appears to be present since IPv6 support was added to IPVS in v2.6.28. * ipvs: skb_orphan in case of forwarding This appears to resolve a problem resulting from a side effect of `41063e9dd1` ("ipv4: Early TCP socket demux.") which was included in v3.6. * ipvs: do not use random local source address for tunnels This appears to resolve a problem introduced by `026ace060d` ("ipvs: optimize dst usage for real server") in v3.10. * ipvs: fix crash if scheduler is changed This appears to resolve a problem introduced by `ceec4c3816` ("ipvs: convert services to rcu") in v3.10. Julian has provided backports of the fix: * [PATCHv2 3.10.81] ipvs: fix crash if scheduler is changed http://www.spinics.net/lists/lvs-devel/msg04008.html * [PATCHv2 3.12.44,3.14.45,3.18.16,4.0.6] ipvs: fix crash if scheduler is changed http://www.spinics.net/lists/lvs-devel/msg04007.html Please let me know how you would like to handle guiding these backports into stable. * ipvs: fix crash with sync protocol v0 and FTP This appears to resolve a problem introduced by `749c42b620` ("ipvs: reduce sync rate with time thresholds") in v3.5 ==================== Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-20 15:01:19 +02:00
Pablo Neira Ayuso	0838aa7fcf	netfilter: fix netns dependencies with conntrack templates Quoting Daniel Borkmann: "When adding connection tracking template rules to a netns, f.e. to configure netfilter zones, the kernel will endlessly busy-loop as soon as we try to delete the given netns in case there's at least one template present, which is problematic i.e. if there is such bravery that the priviledged user inside the netns is assumed untrusted. Minimal example: ip netns add foo ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1 ip netns del foo What happens is that when nf_ct_iterate_cleanup() is being called from nf_conntrack_cleanup_net_list() for a provided netns, we always end up with a net->ct.count > 0 and thus jump back to i_see_dead_people. We don't get a soft-lockup as we still have a schedule() point, but the serving CPU spins on 100% from that point onwards. Since templates are normally allocated with nf_conntrack_alloc(), we also bump net->ct.count. The issue why they are not yet nf_ct_put() is because the per netns .exit() handler from x_tables (which would eventually invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is called in the dependency chain at a later point in time than the per netns .exit() handler for the connection tracker. This is clearly a chicken'n'egg problem: after the connection tracker .exit() handler, we've teared down all the connection tracking infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be invoked at a later point in time during the netns cleanup, as that would lead to a use-after-free. At the same time, we cannot make x_tables depend on the connection tracker module, so that the xt_ct_tg_destroy() would be invoked earlier in the cleanup chain." Daniel confirms this has to do with the order in which modules are loaded or having compiled nf_conntrack as modules while x_tables built-in. So we have no guarantees regarding the order in which netns callbacks are executed. Fix this by allocating the templates through kmalloc() from the respective SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache. Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch is marked as unlikely since conntrack templates are rarely allocated and only from the configuration plane path. Note that templates are not kept in any list to avoid further dependencies with nf_conntrack anymore, thus, the tmpl larval list is removed. Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Daniel Borkmann <daniel@iogearbox.net>	2015-07-20 14:58:19 +02:00
Eric W. Biederman	e317fa505d	netfilter: Fix memory leak in nf_register_net_hook In the rare case that when it is a attempted to use a per network device netfilter hook and the network device does not exist the newly allocated structure can leak. Be a good citizen and free the newly allocated structure in the error handling code. Fixes: `085db2c045` ("netfilter: Per network namespace netfilter hooks.") Reported-by: kbuild@01.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-20 09:15:50 +02:00
Denys Vlasenko	eb6d9293df	mac80211: deinline rate_control_rate_init, rate_control_rate_update With this .config: http://busybox.net/~vda/kernel_config, after deinlining these functions have sizes and callsite counts as follows: rate_control_rate_init: 554 bytes, 8 calls rate_control_rate_update: 1596 bytes, 5 calls Total size reduction: about 11 kbytes. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: John Linville <linville@tuxdriver.com> CC: Michal Kazior <michal.kazior@tieto.com> CC: Johannes Berg <johannes.berg@intel.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:50:02 +02:00
Denys Vlasenko	727da60be9	mac80211: deinline drv_sta_state With this .config: http://busybox.net/~vda/kernel_config, after deinlining the function size is 3132 bytes and there are 7 callsites. Total size reduction: about 20 kbytes. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: John Linville <linville@tuxdriver.com> CC: Michal Kazior <michal.kazior@tieto.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: linux-wireless@vger.kernel.org Cc: netdev@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:48:50 +02:00
Bob Copeland	0e0060fcfb	mac80211: select an AID when creating new mesh STAs Instead of using peer link id for AID, generate a new AID when creating mesh STAs in the kernel peering manager. This enables smaller TIM elements and more closely follows the standard, and it also enables mesh to work on drivers that require a valid AID when the STA is inserted (ath10k firmware has this requirement, for example). In the case of userspace-managed stations, we use the AID from NL80211_CMD_NEW_STATION. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:47:58 +02:00
Bob Copeland	a69bd8e60b	mac80211: mesh: separate plid and aid concepts According to 802.11-2012 13.3.1, a mesh STA should assign an AID upon receipt of a mesh peering open frame rather than using the link id of the peer. Using the peer link id has two potential issues: it may not be unique among the peers, and by its nature it is random, so the TIM may not compress well. In preparation for allocating it properly, use sta->sta.aid, but keep the existing behavior of using the plid in the aid we send. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:47:11 +02:00
Bob Copeland	fa87a6566c	mac80211: reorder mesh_plink to remove forward decl Move mesh_plink_frame_tx() above the first caller to remove the forward declaration. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:47:02 +02:00
Eliad Peller	b0485e9f3d	mac80211: clear local->suspended before calling drv_resume() Currently, mac80211 calls drv_resume() on wowlan resume, but drops any incoming frame until local->suspended is cleared later on. This requires the low-level driver to support a new state, in which it is expected to fully work (as it was resumed) but not passing rx frames yet (as they will be dropped). iwlwifi (and probably other drivers as well) has issues supporting such mode. Since in the wowlan case we already short-circuit ieee80211_reconfig, there's nothing that prevents us from clearing local->suspend before calling drv_resume(), and letting the low-level driver work normally. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:40:46 +02:00
Arik Nemtsov	42d8d78961	mac80211: TDLS: deny ch-switch req on disallowed channels If a TDLS station is not allowed to beacon on a channel, don't accept a channel switch request to this channel. Move channel building code up to avoid lockdep violations - reg_can_beacon needs to take the wdev lock. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:40:35 +02:00
Arik Nemtsov	c8ff71e667	mac80211: TDLS: handle chan-switch in RTNL locked work Move TDLS channel-switch Rx handling into an RTNL locked work. This is required to add proper regulatory checking to incoming channel-switch requests. Queue incoming requests in a dedicated skb queue and handle the request in a device-specific work to avoid deadlocking on interface removal. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:40:15 +02:00
Johannes Berg	72bbe3d1c2	Merge branch 'mac80211' into mac80211-next This is necessary to merge the new TDLS and mesh patches, as they depend on some fixes. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:39:41 +02:00
Sara Sharon	322cd406da	mac80211: Add support for declaring MU-MIMO capability Add support for declaring MU-MIMO beamformee capability for relevant hardware. When sending association request, the capability is included if both hardware and the AP support it, and no other virtual interface is using it. This is in order to avoid multiple interfaces using MU-MIMO in parallel which might lead to contradictions in the group-id mechanism. Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:29 +02:00
Johannes Berg	ccc6bb96ff	mac80211: account TX MSDUs properly with segmentation offload If an SKB will be segmented by the driver, count it for multiple MSDUs that are being transmitted rather than just a single. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:28 +02:00
John Linville	841b351cf9	wireless: remove superfluous if statement in regulatory code Commit `eeca9fce1d` ('cfg80211: Schedule timeout for all CRDA calls') left behind a superfluous check after it removed some earlier code. In reg_process_hint, the test of "treatment == REG_REQ_IGNORE \|\| treatment == REG_REQ_ALREADY_SET" is superfluous because the code in the if-then branch is identical to the code after the if statement. Coverity CID #1295939 I also removed the unnecessary assignment of treatment in this case, and added a comment reminding any future patch authors to ensure that treatment is properly assigned before it is used after the switch. Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:27 +02:00
Johannes Berg	33d8783c58	cfg80211: allow mgmt_frame_register callback to sleep This callback is currently not allowed to sleep, which makes it more difficult to implement proper driver methods in mac80211 than it has to be. Instead of doing asynchronous work here in mac80211, make it possible for the callback to sleep by doing some asynchronous work in cfg80211. This also enables improvements to other drivers, like ath6kl, that would like to sleep in this callback. While at it, also fix the code to call the driver on the implicit unregistration when an interface is removed, and do that also when a P2P-Device wdev is destroyed (otherwise we leak the structs.) Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:26 +02:00
Johannes Berg	69f1322368	mac80211: shrink struct ieee80211_fragment_entry Most of the fields in this struct use too wide types, change that to shrink the struct from 64 to 48 bytes (on 64-bit.) This results in a total saving of 64 bytes for each interface. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:25 +02:00
Johannes Berg	a76d5e0a23	mac80211: mesh: move fail_avg into mesh struct This value is only used in mesh, so move it into the new mesh sub-struct of the station info. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:24 +02:00
Krishna Chaitanya	a3ebb4e1b7	mac80211: minstrel_ht: handle peers in dynamic SMPS In case of Dynamic SMPS enable RTS/CTS for all rates. Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com> [change comment] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:20 +02:00
Chun-Yeow Yeoh	932e628da2	mac80211: mesh process the target only subfield for mesh hwmp This patch does the following: - Remove unnecessary flags field used by PERR element - Use the per target flags defined in <linux/ieee80211.h> - Process the target only subfield based on case E2 of IEEE802.11-2012 13.10.9.3 Signed-off-by: Chun-Yeow Yeoh <yeohchunyeow@gmail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:19 +02:00
Arik Nemtsov	d51c2ea370	mac80211: TDLS: correctly configure SMPS state The IEEE802.11-2012 specification is vague regarding SMPS operation during TDLS. It does not define a clear way to transition between SMPS states. To avoid interop issues, set SMPS to off when TDLS peers are connected. Accomplish this by extending the definition of the AUTOMATIC state. If the driver forces a state other than OFF, disconnect all TDLS peers. While at it, avoid changing the SMPS state of the peer STA. We have no way to control it, so try and behave correctly towards it. Move the TDLS peer-teardown function to where the rest of the TDLS code resides. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:18 +02:00
Bob Copeland	3633ebebab	mac80211: enable assoc check for mesh interfaces We already set a station to be associated when peering completes, both in user space and in the kernel. Thus we should always have an associated sta before sending data frames to that station. Failure to check assoc state can cause crashes in the lower-level driver due to transmitting unicast data frames before driver sta structures (e.g. ampdu state in ath9k) are initialized. This occurred when forwarding in the presence of fixed mesh paths: frames were transmitted to stations with whom we hadn't yet completed peering. Cc: stable@vger.kernel.org Reported-by: Alexis Green <agreen@cococorp.com> Tested-by: Jesse Jones <jjones@cococorp.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:17 +02:00
Jesse Jones	d8f0300a7a	mac80211: mac80211: Check SN for deactivated mpaths When processing a PREQ or PREP it's critical to use the incoming SN. If that is improperly done routing loops and other types of badness can happen. But the code was always processing path messages for deactivated paths. This path fixes that so that if we have a valid SN then we use it to verify that it is a message we can accept. For reference the relevant section of the standard is 13.10.8.4 which doesn't address the deactivated path case at all. I also included a special case for when our peer reboots or restarts networking. This is an important case because without it there can be a very long delay before we accept path messages from that peer. It's also a simple case and intimately associated with processing messages for deactivated paths so I used one patch instead of two. Signed-off-by: Alexis Green <agreen@cococorp.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:16 +02:00
Jesse Jones	d82547106f	mac80211: mesh: don't invalidate SN on discovery failure The 2012 spec mentions that path SNs can be invalid when created (see section 13.10.8.4 table 13-9) but AFAICT never talks about invalidating SNs. Which makes sense: if we have figured out the path to a target at a certain SN then we want to remember that fact. Failing to do so can lead to routing loops because if we don't have a valid SN then we have no way of knowing whether an incoming path message leads to or away from the target. However currently when discovery fails we zero out mpath->flags which clears MESH_PATH_SN_VALID. This patch fixes that so that only the discovery relevant flags are cleared. Signed-off-by: Alexis Green <agreen@cococorp.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:14 +02:00
Alexis Green	703ee73a41	mac80211: mesh: add missing case to PERR processing When the nexthop is unable to resolve its own nexthop it will send back a PERR with a zero target_sn. According to section 13.10.11.4.3 step b in the 2012 standard that perr should be forwarded and the associated mpath->sn should be incremented. Neither one of those was happening which is rather bad because the originator was not told that packets are black holing. Signed-off-by: Alexis Green <agreen@cococorp.com> CC: Jesse Jones <jjones@cococorp.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:13 +02:00
Arik Nemtsov	0fabfaafec	mac80211: upgrade BW of TDLS peers when possible Define a station chandef, to be used for wider-bw TDLS peers. When both peers support the feature, upgrade the channel bandwidth to the maximum allowed by both peers and regulatory. Currently widths up to 80MHz are supported in the 5GHz band. When a TDLS peer connects/disconnects recalculate the channel type of the current chanctx. Make the chanctx width calculation consider wider-bw TDLS peers and similarly fix the max_required_bw calculation for the chanctx min_def. Since the sta->bandwidth is calculated only later on, take bss_conf.chandef.width as the minimal width for station interface. Set the upgraded channel width in the VHT-operation set during TDLS setup. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:12 +02:00
Arik Nemtsov	b98fb44ffc	mac80211: define TDLS wider BW support bits Allow a device to specify support for the TDLS wider-bandwidth feature. Indicate this support during TDLS setup in the ext-capab IE and set an appropriate station flag when our TDLS peer supports it. This feature gives TDLS peers the ability to use a wider channel than the base width of the BSS. For instance VHT capable TDLS peers connected on a 20MHz channel can extend the channel to 80MHz, if regulatory considerations allow it. Do not cap the bandwidth of such stations by the current BSS channel width in mac80211. Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:11 +02:00
Eliad Peller	7584f88f9e	mac80211: clear local->in_reconfig on reconfig error If reconfiguration fails, local->in_reconfig is never cleaned, resulting in rx frames being dropped next time the device is started. Signed-off-by: Eliad Peller <eliadx.peller@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:10 +02:00
Johannes Berg	6513e98e05	mac80211: allow passing NULL to ieee80211_vif_to_wdev() Simply return NULL in this case, instead of crashing. This can simplify callers that would otherwise have to check for this explicitly. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:09 +02:00
Wojciech Dubowik	e996ec2a4d	mac80211: avoid unnecessary beacon deref on CSA counter update The beacon struct is already available in many contexts that are also already in an RCU read-locked section. Avoid that by using the existing beacon struct pointer directly. Signed-off-by: Wojciech Dubowik <Wojciech.Dubowik@neratec.com> [rewrite subject/add commit message] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:08 +02:00
Johannes Berg	1365770248	mac80211: move mesh STA parameters code to own function The code was always a bit awkward due to the 80-col restriction and got worse in the previous patch. Refactor it a bit into its own function to make it read nicer. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:07 +02:00
Johannes Berg	433f5bc1c0	mac80211: move mesh related station fields to own struct There are now a fairly large number of mesh fields that really aren't needed in any other modes; move those into their own structure and allocate them separately. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:06 +02:00
Johannes Berg	e414eea77d	mac80211: remove IEEE80211_RX_FRAGMENTED There's a long-standing TODO item to use this flag in the cooked monitor RX, but clearly it was never needed and now this hasn't been used by userspace for a long time, so no userspace changes could require it now. Remove the unused flag. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:04 +02:00
Johannes Berg	ac100ce52a	mac80211: duplicate station's MAC address for hash table Currently, the station hash table lookup (or iteration) must access two cachelines for each station - the one with the hash table node, and the one with the MAC address. Duplicate the MAC address next to the hash node to get rid of this. Since the MAC address is static there's no consistency problem introduced by this. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:03 +02:00
Johannes Berg	981d94a801	mac80211: support device/driver PN check for CCMP/GCMP When there are multiple RX queues, the PN checks in mac80211 cannot be used since packets might be processed out of order on different CPUs. Allow the driver to report that the PN has been checked, drivers that will use multi-queue RX will have to set this flag. For now, the flag is only valid when the frame has been decrypted, in theory that restriction doesn't have to be there, but in practice the hardware will have decrypted the frame already. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:02 +02:00
Johannes Berg	77c96404a4	mac80211: remove key TX/RX counter This counter is inherently racy (since it can be incremented by RX as well as by concurrent TX) and only available in debugfs. Instead of fixing it to be per-CPU or similar, remove it for now. If needed it should be added without races and with proper nl80211, perhaps even addressing the threshold reporting TODO item that's been there since the code was originally added. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:01 +02:00
Johannes Berg	0c028b5fd1	mac80211: remove zero-length A-MPDU subframe reporting As there's no driver using this capability and reporting zero-length A-MPDU subframes for radiotap monitoring, remove the capability to free up two RX flags. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:38:00 +02:00
Johannes Berg	af9f9b22be	mac80211: don't store napi struct When introducing multiple RX queues, a single NAPI struct will not be sufficient. Instead of trying to store multiple, simply change the API to have the NAPI struct passed to the RX function. This of course means that drivers using rx_irqsafe() cannot use NAPI, but that seems a reasonable trade-off, particularly since only two of all drivers are currently using it at all. While at it, we can now remove the IEEE80211_RX_REORDER_TIMER flag again since this code path cannot have a napi struct anyway. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:37:59 +02:00
Johannes Berg	798a457dfb	mac80211: fix comment referring to RX queue There are no RX queues in mac80211 (yet), the comment should refer to the TID (including one slot for non-QoS) rather than 'RX queue'. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:37:58 +02:00
Johannes Berg	a682849329	mac80211: move ieee80211_get_bssid into RX file This function is only used in the RX code, so moving it into that file gives the compiler better optimisation possibilities and also allows us to remove the check for short frames (which in the RX path cannot happen, but as a generic utility needed to be checked.) Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:37:57 +02:00
Johannes Berg	9ad8b21b74	mac80211: remove short frame test and counter Short frames less than 16 octets are already blocked in the monitor code by the should_drop_frame() function, and cannot get into the regular RX path. Therefore, this check can never trigger and the counter invariably stays zero. Remove the useless code. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:37:55 +02:00
Johannes Berg	16bf948081	mac80211: remove sta_info.gtk_idx This struct member is only assigned, never used otherwise; remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:37:54 +02:00
Johannes Berg	cf47161ad2	mac80211: rename 'sta_inf' variable to more common 'sta' We typically use 'sta' for the station info struct, and if needed 'pubsta' for the public (driver-visible) portion thereof. Do this in the ieee80211_sta_ps_transition() function. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:37:53 +02:00
Johannes Berg	5c48f12017	mac80211: remove exposing 'mfp' to drivers There's no driver using this, so remove it. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:37:52 +02:00
Arik Nemtsov	923b352f19	cfg80211: use RTNL locked reg_can_beacon for IR-relaxation The RTNL is required to check for IR-relaxation conditions that allow more channels to beacon. Export an RTNL locked version of reg_can_beacon and use it where possible in AP/STA interface type flows, where IR-relaxation may be applicable. Fixes: `06f207fc54` ("cfg80211: change GO_CONCURRENT to IR_CONCURRENT for STA") Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 15:02:02 +02:00
Bob Copeland	b3e7de873d	mac80211: add missing length check for confirm frames Although mesh_rx_plink_frame() already checks that frames have enough bytes for the action code plus another two bytes for capability/reason code, it doesn't take into account that confirm frames also have an additional two-byte aid. As a result, a corrupt frame could cause a subsequent subtraction to wrap around to ill effect. Add another check for this case. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 14:39:42 +02:00
Bob Copeland	2ea752cd2c	mac80211: correct aid location in peering frames According to 802.11-2012 8.5.16.3.2 AID comes directly after the capability bytes in mesh peering confirm frames. The existing code, however, was adding a 2 byte offset to this location, resulting in garbage data going out over the air. Remove the offset to fix it. Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 14:38:10 +02:00
Thomas Petazzoni	042ab5fc7a	wireless: regulatory: reduce log level of CRDA related messages With a basic Linux userspace, the messages "Calling CRDA to update world regulatory domain" appears 10 times after boot every second or so, followed by a final "Exceeded CRDA call max attempts. Not calling CRDA". For those of us not having the corresponding userspace parts, having those messages repeatedly displayed at boot time is a bit annoying, so this commit reduces their log level to pr_debug(). Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 14:37:23 +02:00
Johannes Berg	d8d9008cfb	mac80211: shut down interfaces before destroying interface list If the hardware is unregistered while interfaces are up, mac80211 will unregister all interfaces, which in turns causes mac80211 to be called again to remove them all from the driver and eventually shut down the hardware. During this shutdown, however, it's currently already unsafe to iterate the list of interfaces atomically, as the list is manipulated in an unsafe manner. This puts an undue burden on the driver - it must stop all its activities before calling ieee80211_unregister_hw(), while in the normal stop path it can do all cleanup in the stop method. If, for example, it's using the iteration during RX for some reason, it would have to stop RX before unregistering to avoid crashes. Fix this problem by closing all interfaces before unregistering them. This will cause the driver stop to have completed before we manipulate the interface list, and after the driver is stopped and has called ieee80211_unregister_hw() it really musn't be iterating any more as the memory will be freed as well. Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 11:16:26 +02:00
Chaitanya T K	541b6ed7ce	mac80211: wowlan: enable powersave if suspend while ps-polling If for any reason we're in the middle of PS-polling or awake after TX due to dynamic powersave while going to suspend, go back to save power. This might cause a response frame to get lost, but since we can't really wait for it while going to suspend that's still better than not enabling powersave which would cause higher power usage during (and possibly even after) suspend. Note that this really only affects the very few drivers that use the powersave implementation in mac80211. Signed-off-by: Chaitanya T K <chaitanya.mgit@gmail.com> [rewrite misleading commit log] Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 11:13:21 +02:00
Michal Kazior	e9de01907e	mac80211: don't clear all tx flags when requeing When acting as AP and a PS-Poll frame is received associated station is marked as one in a Service Period. This state is kept until Tx status for released frame is reported. While a station is in Service Period PS-Poll frames are ignored. However if PS-Poll was received during A-MPDU teardown it was possible to have the to-be released frame re-queued back to pending queue. In such case the frame was stripped of 2 important flags: (a) IEEE80211_TX_CTL_NO_PS_BUFFER (b) IEEE80211_TX_STATUS_EOSP Stripping of (a) led to the frame that was to be released to be queued back to ps_tx_buf queue. If station remained to use only PS-Poll frames the re-queued frame (and new ones) was never actually transmitted because mac80211 would ignore subsequent PS-Poll frames due to station being in Service Period. There was nothing left to clear the Service Period bit (no xmit -> no tx status -> no SP end), i.e. the AP would have the station stuck in Service Period. Beacon TIM would repeatedly prompt station to poll for frames but it would get none. Once (a) is not stripped (b) becomes important because it's the main condition to clear the Service Period bit of the station when Tx status for the released frame is reported back. This problem was observed with ath9k acting as P2P GO in some testing scenarios but isn't limited to it. AP operation with mac80211 based Tx A-MPDU control combined with clients using PS-Poll frames is subject to this race. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 11:06:21 +02:00
Tom Hughes	4479004e64	mac80211: clear subdir_stations when removing debugfs If we don't do this, and we then fail to recreate the debugfs directory during a mode change, then we will fail later trying to add stations to this now bogus directory: BUG: unable to handle kernel NULL pointer dereference at 0000006c IP: [<c0a92202>] mutex_lock+0x12/0x30 Call Trace: [<c0678ab4>] start_creating+0x44/0xc0 [<c0679203>] debugfs_create_dir+0x13/0xf0 [<f8a938ae>] ieee80211_sta_debugfs_add+0x6e/0x490 [mac80211] Cc: stable@kernel.org Signed-off-by: Tom Hughes <tom@compton.nu> Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2015-07-17 10:53:19 +02:00
YOSHIFUJI Hideaki	c15df306fc	ipv6: Remove unused arguments for __ipv6_dev_get_saddr(). Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-16 01:00:56 -07:00
Anuradha Karuppiah	88d6378bd6	netlink: changes for setting and clearing protodown via netlink. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:39:40 -07:00
Anuradha Karuppiah	d746d707a8	net core: Add protodown support. This patch introduces the proto_down flag that can be used by user space applications to notify switch drivers that errors have been detected on the device. The switch driver can react to protodown notification by doing a phys down on the associated switch port. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:39:40 -07:00
Alexei Starovoitov	ddf06c1e56	tc: act_bpf: fix memory leak prog->bpf_ops is populated when act_bpf is used with classic BPF and prog->bpf_name is optionally used with extended BPF. Fix memory leak when act_bpf is released. Fixes: `d23b8ad8ab` ("tc: add BPF based action") Fixes: `a8cb5f556b` ("act_bpf: add initial eBPF support for actions") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:36:35 -07:00
WANG Cong	c0afd9ce4d	fq_codel: fix return value of fq_codel_drop() The ->drop() is supposed to return the number of bytes it dropped, however fq_codel_drop() returns the index of the flow where it drops a packet from. Fix this by introducing a helper to wrap fq_codel_drop(). Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:36:35 -07:00
WANG Cong	e8d092aafd	net_sched: fix a use-after-free in sfq Fixes: `25331d6ce4` ("net: sched: implement qstat helper routines") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:36:34 -07:00
YOSHIFUJI Hideaki/吉藤英明	c0b8da1e76	ipv6: Fix finding best source address in ipv6_dev_get_saddr(). Commit `9131f3de2` ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") did not properly update best source address available. Plus, it introduced possible NULL pointer dereference. Bug was reported by Erik Kline <ek@google.com>. Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>. Fixes: `9131f3de24` ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 21:06:13 -07:00
Eric Dumazet	03645a11a5	ipv6: lock socket in ip6_datagram_connect() ip6_datagram_connect() is doing a lot of socket changes without socket being locked. This looks wrong, at least for udp_lib_rehash() which could corrupt lists because of concurrent udp_sk(sk)->udp_portaddr_hash accesses. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:25:51 -07:00
Andrea Parri	40bdc5360d	pkt_sched: sch_qfq: remove unused member of struct qfq_sched The member (u32) "num_active_agg" of struct qfq_sched has been unused since its introduction in `462dbc9101` "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT) there is no active plan to use it; this removes the member. Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:21:31 -07:00
WANG Cong	052cbda41f	fq_codel: fix a use-after-free Fixes: `25331d6ce4` ("net: sched: implement qstat helper routines") Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:18:21 -07:00
Yuchung Cheng	f82b681a51	tcp: don't use F-RTO on non-recurring timeouts Currently F-RTO may repeatedly send new data packets on non-recurring timeouts in CA_Loss mode. This is a bug because F-RTO (RFC5682) should only be used on either new recovery or recurring timeouts. This exacerbates the recovery progress during frequent timeout & repair, because we prioritize sending new data packets instead of repairing the holes when the bandwidth is already scarce. Fix it by correcting the test of a new recovery episode. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:17:21 -07:00
Nikolay Aleksandrov	5ebc784625	bridge: mdb: fix double add notification Since the mdb add/del code was introduced there have been 2 br_mdb_notify calls when doing br_mdb_add() resulting in 2 notifications on each add. Example: Command: bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent Before patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent [MDB]dev br0 port eth1 grp 239.0.0.1 permanent After patch: root@debian:~# bridge monitor all [MDB]dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: `cfd5675435` ("bridge: add support of adding and deleting mdb entries") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:15:01 -07:00
Satish Ashok	bc8c20acae	bridge: multicast: treat igmpv3 report with INCLUDE and no sources as a leave A report with INCLUDE/Change_to_include and empty source list should be treated as a leave, specified by RFC 3376, section 3.1: "If the requested filter mode is INCLUDE and the requested source list is empty, then the entry corresponding to the requested interface and multicast address is deleted if present. If no such entry is present, the request is ignored." Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 17:10:40 -07:00
Linus Torvalds	9090fdb9c9	Changes for 4.2-rc - Mainly fix-ups for the various 4.2 items -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVpWPTAAoJELgmozMOVy/dRjkQALGpY+m0Q+dfS4geU+JvvH0/ tTNJevWA1lu5xIgzk8aW+RgNOt+IQV2K6XnKOdKWcdE2wg9Yg9/8Y7atxeNSLoB6 mdUfstmTHoDFRXea1wog45QVVv9ABqSXIOmFd/PQSf2pEHvwgmVaxO2KW5n0n71H 0G9nw28jx+Igd1IjX3UezbB4Rm5jeG6exQEM5KKQ/l/cD2Pt/DIr9OmTmkMhj4Ft 4ppC+JsDHZMT2mQljEB1UL6Va1BGgULn4JwZviXRGgTEFiS+rtUkWXy9CVeNIV+A bo6aHEBlO8zbj9xg0POkHGBP2XCkDpsc5EP6/zon30MLpOTpnnTp2zq/bhGpyr5w ytH1x6V8Od1Or9BrHygy1yLGaqaVthIuZMReLq0Bu26px8mTECdu8bUjwxEvbvep u51vQ75h99td5XtXBHULcd/I8mUb3JqbEHqig0ChLOcnOMh72KLql8qfhLvzrYSv gFMvnqUZWq4WYET8cKcWypj5+cIME4objOim6T/TG+zTCmgbPnhwBsgvujLhfhDy JhUjUfS+31S99I4smL7ceKQpfA/7awLuZf/C20zdLZO2wXnx7VmbNlhQA3mvssyt ZOXeLkLGcXDIT3egNj5VWQR/KlVtvUS9FNVslxqo4ItyZZwJtbHTwRAgATp3XRwD Po7E7bWoUQqpanhgiX4Q =N/5G -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "Mainly fix-ups for the various 4.2 items" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (24 commits) IB/core: Destroy ocrdma_dev_id IDR on module exit IB/core: Destroy multcast_idr on module exit IB/mlx4: Optimize do_slave_init IB/mlx4: Fix memory leak in do_slave_init IB/mlx4: Optimize freeing of items on error unwind IB/mlx4: Fix use of flow-counters for process_mad IB/ipath: Convert use of __constant_<foo> to <foo> IB/ipoib: Set MTU to max allowed by mode when mode changes IB/ipoib: Scatter-Gather support in connected mode IB/ucm: Fix bitmap wrap when devnum > IB_UCM_MAX_DEVICES IB/ipoib: Prevent lockdep warning in __ipoib_ib_dev_flush IB/ucma: Fix lockdep warning in ucma_lock_files rds: rds_ib_device.refcount overflow RDMA/nes: Fix for incorrect recording of the MAC address RDMA/nes: Fix for resolving the neigh RDMA/core: Fixes for port mapper client registration IB/IPoIB: Fix bad error flow in ipoib_add_port() IB/mlx4: Do not attemp to report HCA clock offset on VFs IB/cm: Do not queue work to a device that's going away IB/srp: Avoid using uninitialized variable ...	2015-07-15 17:03:03 -07:00
Herbert Xu	89c22d8c3b	net: Fix skb csum races when peeking When we calculate the checksum on the recv path, we store the result in the skb as an optimisation in case we need the checksum again down the line. This is in fact bogus for the MSG_PEEK case as this is done without any locking. So multiple threads can peek and then store the result to the same skb, potentially resulting in bogus skb states. This patch fixes this by only storing the result if the skb is not shared. This preserves the optimisations for the few cases where it can be done safely due to locking or other reasons, e.g., SIOCINQ. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 15:59:58 -07:00
Richard Stearn	da278622bf	NET: AX.25: Stop heartbeat timer on disconnect. This may result in a kernel panic. The bug has always existed but somehow we've run out of luck now and it bites. Signed-off-by: Richard Stearn <richard@rns-stearn.demon.co.uk> Cc: stable@vger.kernel.org # all branches Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 15:59:58 -07:00
Herbert Xu	738ac1ebb9	net: Clone skb before setting peeked flag Shared skbs must not be modified and this is crucial for broadcast and/or multicast paths where we use it as an optimisation to avoid unnecessary cloning. The function skb_recv_datagram breaks this rule by setting peeked without cloning the skb first. This causes funky races which leads to double-free. This patch fixes this by cloning the skb and replacing the skb in the list when setting skb->peeked. Fixes: `a59322be07` ("[UDP]: Only increment counter on first peek/recv") Reported-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 15:59:58 -07:00
Daniel Borkmann	035d210f92	rtnetlink: reject non-IFLA_VF_PORT attributes inside IFLA_VF_PORTS Similarly as in commit `4f7d2cdfdd` ("rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver"), we have a double nesting of netlink attributes, i.e. IFLA_VF_PORTS only contains IFLA_VF_PORT that is nested itself. While IFLA_VF_PORTS is a verified attribute from ifla_policy[], we only check if the IFLA_VF_PORTS container has IFLA_VF_PORT attributes and then pass the attribute's content itself via nla_parse_nested(). It would be more correct to reject inner types other than IFLA_VF_PORT instead of continuing parsing and also similarly as in commit `4f7d2cdfdd`, to check for a minimum of NLA_HDRLEN. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Roopa Prabhu <roopa@cumulusnetworks.com> Cc: Scott Feldman <sfeldma@gmail.com> Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-15 15:53:27 -07:00
Florian Westphal	6c7941dee9	netfilter: xtables: remove __pure annotation sparse complains: ip_tables.c:361:27: warning: incorrect type in assignment (different modifiers) ip_tables.c:361:27: expected struct ipt_entry [assigned] e ip_tables.c:361:27: got struct ipt_entry [pure] doesn't change generated code. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 18:18:07 +02:00
Florian Westphal	dcebd3153e	netfilter: add and use jump label for xt_tee Don't bother testing if we need to switch to alternate stack unless TEE target is used. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 18:18:06 +02:00
Florian Westphal	7814b6ec6d	netfilter: xtables: don't save/restore jumpstack offset In most cases there is no reentrancy into ip/ip6tables. For skbs sent by REJECT or SYNPROXY targets, there is one level of reentrancy, but its not relevant as those targets issue an absolute verdict, i.e. the jumpstack can be clobbered since its not used after the target issues absolute verdict (ACCEPT, DROP, STOLEN, etc). So the only special case where it is relevant is the TEE target, which returns XT_CONTINUE. This patch changes ip(6)_do_table to always use the jump stack starting from 0. When we detect we're operating on an skb sent via TEE (percpu nf_skb_duplicated is 1) we switch to an alternate stack to leave the original one alone. Since there is no TEE support for arptables, it doesn't need to test if tee is active. The jump stack overflow tests are no longer needed as well -- since ->stacksize is the largest call depth we cannot exceed it. A much better alternative to the external jumpstack would be to just declare a jumps[32] stack on the local stack frame, but that would mean we'd have to reject iptables rulesets that used to work before. Another alternative would be to start rejecting rulesets with a larger call depth, e.g. 1000 -- in this case it would be feasible to allocate the entire stack in the percpu area which would avoid one dereference. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 18:18:06 +02:00
Florian Westphal	e7c8899f3e	netfilter: move tee_active to core This prepares for a TEE like expression in nftables. We want to ensure only one duplicate is sent, so both will use the same percpu variable to detect duplication. The other use case is detection of recursive call to xtables, but since we don't want dependency from nft to xtables core its put into core.c instead of the x_tables core. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 18:18:05 +02:00
Florian Westphal	98d1bd802c	netfilter: xtables: compute exact size needed for jumpstack The {arp,ip,ip6tables} jump stack is currently sized based on the number of user chains. However, its rather unlikely that every user defined chain jumps to the next, so lets use the existing loop detection logic to also track the chain depths. The stacksize is then set to the largest chain depth seen. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 18:18:04 +02:00
Eric W. Biederman	fd2ecda034	netfilter: nftables: Only run the nftables chains in the proper netns - Register the nftables chains in the network namespace that they need to run in. - Remove the hacks that stopped chains running in the wrong network namespace. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 18:17:36 +02:00
Eric W. Biederman	085db2c045	netfilter: Per network namespace netfilter hooks. - Add a new set of functions for registering and unregistering per network namespace hooks. - Modify the old global namespace hook functions to use the per network namespace hooks in their implementation, so their remains a single list that needs to be walked for any hook (this is important for keeping the hook priority working and for keeping the code walking the hooks simple). - Only allow registering the per netdevice hooks in the network namespace where the network device lives. - Dynamically allocate the structures in the per network namespace hook list in nf_register_net_hook, and unregister them in nf_unregister_net_hook. Dynamic allocate is required somewhere as the number of network namespaces are not fixed so we might as well allocate them in the registration function. The chain of registered hooks on any list is expected to be small so the cost of walking that list to find the entry we are unregistering should also be small. Performing the management of the dynamically allocated list entries in the registration and unregistration functions keeps the complexity from spreading. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>	2015-07-15 18:17:26 +02:00
Eric W. Biederman	0edcf282b0	netfilter: Factor out the hook list selection from nf_register_hook - Add a new function find_nf_hook_list to select the nf_hook_list - Fail nf_register_hook when asked for a per netdevice hook list when support for per netdevice hook lists is not built into the kernel. - Move the hook list head selection outside of nf_hook_mutex as nothing in the selection requires the hook list, and error handling is simpler if a mutex is not held. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 17:51:44 +02:00
Eric W. Biederman	4c0911566d	netfilter: Simply the tests for enabling and disabling the ingress queue hook Replace an overcomplicated switch statement with a simple if statement. This also removes the ingress queue enable outside of nf_hook_mutex as the protection provided by the mutex is not necessary and the code is clearer having both of the static key increments together. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 17:51:43 +02:00
Markus Elfring	0d6ef0688d	ipvs: Delete an unnecessary check before the function call "module_put" The module_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-15 17:51:22 +02:00
Wengang Wang	4fabb59449	rds: rds_ib_device.refcount overflow Fixes: `3e0249f9c0` ("RDS/IB: add refcount tracking to struct rds_ib_device") There lacks a dropping on rds_ib_device.refcount in case rds_ib_alloc_fmr failed(mr pool running out). this lead to the refcount overflow. A complain in line 117(see following) is seen. From vmcore: s_ib_rdma_mr_pool_depleted is 2147485544 and rds_ibdev->refcount is -2147475448. That is the evidence the mr pool is used up. so rds_ib_alloc_fmr is very likely to return ERR_PTR(-EAGAIN). 115 void rds_ib_dev_put(struct rds_ib_device *rds_ibdev) 116 { 117 BUG_ON(atomic_read(&rds_ibdev->refcount) <= 0); 118 if (atomic_dec_and_test(&rds_ibdev->refcount)) 119 queue_work(rds_wq, &rds_ibdev->free_work); 120 } fix is to drop refcount when rds_ib_alloc_fmr failed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Haggai Eran <haggaie@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>	2015-07-14 13:20:11 -04:00
Julian Anastasov	e3895c0334	ipvs: call skb_sender_cpu_clear Reset XPS's sender_cpu on forwarding. Signed-off-by: Julian Anastasov <ja@ssi.bg> Fixes: `2bd82484bb` ("xps: fix xps for stacked devices") Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Julian Anastasov	56184858d1	ipvs: fix crash with sync protocol v0 and FTP Fix crash in 3.5+ if FTP is used after switching sync_version to 0. Fixes: `749c42b620` ("ipvs: reduce sync rate with time thresholds") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Alex Gartrell	71563f3414	ipvs: skb_orphan in case of forwarding It is possible that we bind against a local socket in early_demux when we are actually going to want to forward it. In this case, the socket serves no purpose and only serves to confuse things (particularly functions which implicitly expect sk_fullsock to be true, like ip_local_out). Additionally, skb_set_owner_w is totally broken for non full-socks. Signed-off-by: Alex Gartrell <agartrell@fb.com> Fixes: `41063e9dd1` ("ipv4: Early TCP socket demux.") Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Julian Anastasov	05f00505a8	ipvs: fix crash if scheduler is changed I overlooked the svc->sched_data usage from schedulers when the services were converted to RCU in 3.10. Now the rare ipvsadm -E command can change the scheduler but due to the reverse order of ip_vs_bind_scheduler and ip_vs_unbind_scheduler we provide new sched_data to the old scheduler resulting in a crash. To fix it without changing the scheduler methods we have to use synchronize_rcu() only for the editing case. It means all svc->scheduler readers should expect a NULL value. To avoid breakage for the service listing and ipvsadm -R we can use the "none" name to indicate that scheduler is not assigned, a state when we drop new connections. Reported-by: Alexander Vasiliev <a.vasylev@404-group.com> Fixes: `ceec4c3816` ("ipvs: convert services to rcu") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Julian Anastasov	4754957f04	ipvs: do not use random local source address for tunnels Michael Vallaly reports about wrong source address used in rare cases for tunneled traffic. Looks like __ip_vs_get_out_rt in 3.10+ is providing uninitialized dest_dst->dst_saddr.ip because ip_vs_dest_dst_alloc uses kmalloc. While we retry after seeing EINVAL from routing for data that does not look like valid local address, it still succeeded when this memory was previously used from other dests and with different local addresses. As result, we can use valid local address that is not suitable for our real server. Fix it by providing 0.0.0.0 every time our cache is refreshed. By this way we will get preferred source address from routing. Reported-by: Michael Vallaly <lvs@nolatency.com> Fixes: `026ace060d` ("ipvs: optimize dst usage for real server") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
Alex Gartrell	326bf17ea5	ipvs: fix ipv6 route unreach panic Previously there was a trivial panic unshare -n /bin/bash <<EOF ip addr add dev lo face::1/128 ipvsadm -A -t [face::1]:15213 ipvsadm -a -t [face::1]:15213 -r b00c::1 echo boom \| nc face::1 15213 EOF This patch allows us to replicate the net logic above and simply capture the skb_dst(skb)->dev and use that for the purpose of the invocation. Signed-off-by: Alex Gartrell <agartrell@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2015-07-14 16:41:27 +09:00
David S. Miller	638d3c6381	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/bridge/br_mdb.c Minor conflict in br_mdb.c, in 'net' we added a memset of the on-stack 'ip' variable whereas in 'net-next' we assign a new member 'vid'. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-13 17:28:09 -07:00
Nikolay Aleksandrov	74fe61f17e	bridge: mdb: add vlan support for user entries Until now all user mdb entries were added in vlan 0, this patch adds support to allow the user to specify the vlan for the entry. About the uapi change a hole in struct br_mdb_entry is used so the size and offsets are kept the same (verified with pahole and tested with older iproute2). Example: $ bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000 dev br0 port eth1 grp 239.0.0.1 permanent vlan 200 dev br0 port eth1 grp 239.0.0.1 permanent Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-13 14:41:26 -07:00
Tom Herbert	de551f2eb2	net: Build IPv6 into kernel by default This patch makes the default to build IPv6 into the kernel. IPv6 now has significant traction and any remaining vestiges of IPv6 not being provided parity with IPv4 should be swept away. IPv6 is now core to the Internet and kernel. Points on IPv6 adoption: - Per Google statistics, IPv6 usage has reached 7% on the Internet and continues to exhibit an exponential growth rate https://www.google.com/intl/en/ipv6/statistics.html - Just a few days ago ARIN officially depleted its IPv4 pool - IPv6 only data centers are being successfully built (e.g. at Facebook) This patch changes the IPv6 Kconfig for IPV6. Default for CONFIG_IPV6 is set to "y" and the text has been updated to reflect the maturity of IPv6. Impact: Under some circumstances building modules in to kernel might have a performance advantage. In my testing, I did notice a very slight improvement. This will obviously increase the size of the kernel image. In my configuration I see: IPv6 as module: text data bss dec hex filename 9703666 1899288 933888 12536842 bf4c0a vmlinux IPv6 built into kernel text data bss dec hex filename 9436490 1879600 913408 12229498 ba9b7a vmlinux Which increases text size by ~270K (2.8% increase in size for me). If image size is an issue, presumably for a device which does not do IP networking (IMO we should be discouraging IPv4-only devices), IPV6 can be disabled or still built as a module. Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-13 13:10:21 -07:00
Linus Torvalds	f760b87f8f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) Missing list head init in bluetooth hidp session creation, from Tedd Ho-Jeong An. 2) Don't leak SKB in bridge netfilter error paths, from Florian Westphal. 3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien Grall. 4) Fix regression in IP over hamradio bpq encapsulation, from Ralf Baechle. 5) Fix race between rhashtable resize events and table walks, from Phil Sutter. 6) Missing validation of IFLA_VF_INFO netlink attributes, fix from Daniel Borkmann. 7) Missing security layer socket state initialization in tipc code, from Stephen Smalley. 8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from Denys Vlasenko. 9) Missing minor_idr destroy on module unload on macvtap driver, from Johannes Thumshirn. 10) Various pktgen kernel thread races, from Oleg Nesterov. 11) Fix races that can cause packets to be processed in the backlog even after a device attached to that SKB has been fully unregistered. From Julian Anastasov. 12) bcmgenet driver doesn't account packet drops vs. errors properly, fix from Petri Gynther. 13) Array index validation and off by one fix in DSA layer from Florian Fainelli * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits) can: replace timestamp as unique skb attribute ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux can: c_can: Fix default pinmux glitch at init can: rcar_can: unify error messages can: rcar_can: print request_irq() error code can: rcar_can: fix typo in error message can: rcar_can: print signed IRQ # can: rcar_can: fix IRQ check net: dsa: Fix off-by-one in switch address parsing net: dsa: Test array index before use net: switchdev: don't abort unsupported operations net: bcmgenet: fix accounting of packet drops vs errors cdc_ncm: update specs URL Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets bridge: mdb: allow the user to delete mdb entry if there's a querier net: call rcu_read_lock early in process_backlog net: do not process device backlog during unregistration bridge: fix potential crash in __netdev_pick_tx() net: axienet: Fix devm_ioremap_resource return value check ...	2015-07-13 11:18:25 -07:00
Pierre Morel	ea52bf8eda	9p/trans_virtio: reset virtio device on remove On device shutdown/removal, virtio drivers need to trigger a reset on the device; if this is neglected, the virtio core will complain about non-zero device status. This patch resets the status when the 9p virtio driver is removed from the system by calling vdev->config->reset on the virtio_device to send a reset to the host virtio device. Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2015-07-13 19:47:10 +03:00
Dmitry Torokhov	484836ec2d	netfilter: IDLETIMER: fix lockdep warning Dynamically allocated sysfs attributes should be initialized with sysfs_attr_init() otherwise lockdep will be angry with us: [ 45.468653] BUG: key ffffffc030fad4e0 not in .data! [ 45.468655] ------------[ cut here ]------------ [ 45.468666] WARNING: CPU: 0 PID: 1176 at /mnt/host/source/src/third_party/kernel/v3.18/kernel/locking/lockdep.c:2991 lockdep_init_map+0x12c/0x490() [ 45.468672] DEBUG_LOCKS_WARN_ON(1) [ 45.468672] CPU: 0 PID: 1176 Comm: iptables Tainted: G U W 3.18.0 #43 [ 45.468674] Hardware name: XXX [ 45.468675] Call trace: [ 45.468680] [<ffffffc0002072b4>] dump_backtrace+0x0/0x10c [ 45.468683] [<ffffffc0002073d0>] show_stack+0x10/0x1c [ 45.468688] [<ffffffc000a86cd4>] dump_stack+0x74/0x94 [ 45.468692] [<ffffffc000217ae0>] warn_slowpath_common+0x84/0xb0 [ 45.468694] [<ffffffc000217b84>] warn_slowpath_fmt+0x4c/0x58 [ 45.468697] [<ffffffc0002530a4>] lockdep_init_map+0x128/0x490 [ 45.468701] [<ffffffc000367ef0>] __kernfs_create_file+0x80/0xe4 [ 45.468704] [<ffffffc00036862c>] sysfs_add_file_mode_ns+0x104/0x170 [ 45.468706] [<ffffffc00036870c>] sysfs_create_file_ns+0x58/0x64 [ 45.468711] [<ffffffc000930430>] idletimer_tg_checkentry+0x14c/0x324 [ 45.468714] [<ffffffc00092a728>] xt_check_target+0x170/0x198 [ 45.468717] [<ffffffc000993efc>] check_target+0x58/0x6c [ 45.468720] [<ffffffc000994c64>] translate_table+0x30c/0x424 [ 45.468723] [<ffffffc00099529c>] do_ipt_set_ctl+0x144/0x1d0 [ 45.468728] [<ffffffc0009079f0>] nf_setsockopt+0x50/0x60 [ 45.468732] [<ffffffc000946870>] ip_setsockopt+0x8c/0xb4 [ 45.468735] [<ffffffc0009661c0>] raw_setsockopt+0x10/0x50 [ 45.468739] [<ffffffc0008c1550>] sock_common_setsockopt+0x14/0x20 [ 45.468742] [<ffffffc0008bd190>] SyS_setsockopt+0x88/0xb8 [ 45.468744] ---[ end trace 41d156354d18c039 ]--- Signed-off-by: Dmitry Torokhov <dtor@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-13 17:23:25 +02:00
David S. Miller	cee9f6d018	linux-can-fixes-for-4.2-20150712 -----BEGIN PGP SIGNATURE----- iQEcBAABCgAGBQJVorxbAAoJEP5prqPJtc/H5zgH/2nvkmT3aQ1gBKdU8L+Ten5D LIKYyDJ67agNoMfEC/5xWOm7P3X8Qi8cGc326/9AZE22QLE5x6fMCq/SmEC4MA25 GKu2ocwosdPVhIDQOY33IYZHxxtV8UZjt5KdNbQnNO6+iqE6tX5CbgueMlcWusry WmPCOlezTqHydXo0TYYLPjmqJ66RlrQMWYKvcB0SaxLYQfkBGbkW+fei3wdE4kW1 t7cocRj6Ievv59wCeF+G+fpviTVQVR5bknGA0J3lku9onATOYfdArEAD9FRajygG KhA/eNr0YEA40VO/baUInuEAJ/YvDqdk9LoyJ1DOXgUv1ysxYSxu/44G/IcZsOU= =+zVD -----END PGP SIGNATURE----- Merge tag 'linux-can-fixes-for-4.2-20150712' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2015-07-12 this is a pull request of 8 patchs for net/master. Sergei Shtylyov contributes 5 patches for the rcar_can driver, fixing the IRQ check and several info and error messages. There are two patches by J.D. Schroeder and Roger Quadros for the c_can driver and dra7x-evm device tree, which precent a glitch in the DCAN1 pinmux. Oliver Hartkopp provides a better approach to make the CAN skbs unique, the timestamp is replaced by a counter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-12 22:24:01 -07:00
Oliver Hartkopp	d3b58c47d3	can: replace timestamp as unique skb attribute Commit `514ac99c64` "can: fix multiple delivery of a single CAN frame for overlapping CAN filters" requires the skb->tstamp to be set to check for identical CAN skbs. Without timestamping to be required by user space applications this timestamp was not generated which lead to commit `36c01245eb` "can: fix loss of CAN frames in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs by introducing several __net_timestamp() calls. This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb() to add __net_timestamp() after skbuff creation to prevent the frame loss fixed in mainline Linux. This patch removes the timestamp dependency and uses an atomic counter to create an unique identifier together with the skbuff pointer. Btw: the new skbcnt element introduced in struct can_skb_priv has to be initialized with zero in out-of-tree drivers which are not using alloc_can{,fd}_skb() too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>	2015-07-12 21:13:22 +02:00
Florian Fainelli	c8cf89f73f	net: dsa: Fix off-by-one in switch address parsing cd->sw_addr is used as a MDIO bus address, which cannot exceed PHY_MAX_ADDR (32), our check was off-by-one. Fixes: `5e95329b70` ("dsa: add device tree bindings to register DSA switches") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-11 23:25:16 -07:00
Florian Fainelli	8f5063e97f	net: dsa: Test array index before use port_index is used an index into an array, and this information comes from Device Tree, make sure that port_index is not equal to the array size before using it. Move the check against port_index earlier in the loop. Fixes: 5e95329b701c: ("dsa: add device tree bindings to register DSA switches") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-11 23:25:16 -07:00
Vivien Didelot	2ee94014d9	net: switchdev: don't abort unsupported operations There is no need to abort attribute setting or object addition, if the prepare phase returned operation not supported. Thus, abort these two transactions only if the error is not -EOPNOTSUPP. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-11 21:29:55 -07:00
Florian Westphal	14fe22e334	Revert "ipv4: use skb coalescing in defragmentation" This reverts commit `3cc4949269`. There is nothing wrong with coalescing during defragmentation, it reduces truesize overhead and simplifies things for the receiving socket (no fraglist walk needed). However, it also destroys geometry of the original fragments. While that doesn't cause any breakage (we make sure to not exceed largest original size) ip_do_fragment contains a 'fastpath' that takes advantage of a present frag list and results in fragments that (in most cases) match what was received. In case its needed the coalescing could be done later, when we're sure the skb is not forwarded. But discussion during NFWS resulted in 'lets just remove this for now'. Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-11 21:16:40 -07:00
Phil Sutter	8220ea2324	net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets Reconsidering my commit `20462155` "net: inet_diag: export IPV6_V6ONLY sockopt", I am not happy with the limitations it causes for socket analysing code in userspace. Exporting the value only if it is set makes it hard for userspace to decide whether the option is not set or the kernel does not support exporting the option at all. >From an auditor's perspective, the interesting question for listening AF_INET6 sockets is: "Does it NOT have IPV6_V6ONLY set?" Because it is the unexpected case. This patch allows to answer this question reliably. Signed-off-by: Phil Sutter <phil@nwl.cc> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 23:25:24 -07:00
YOSHIFUJI Hideaki/吉藤英明	9131f3de24	ipv6: Do not iterate over all interfaces when finding source address on specific interface. If outgoing interface is specified and the candidate address is restricted to the outgoing interface, it is enough to iterate over that given interface only. Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 23:19:25 -07:00
Satish Ashok	51ed7f3e7d	bridge: mdb: allow the user to delete mdb entry if there's a querier Until now when a querier was present static entries couldn't be deleted. Fix this and allow the user to manipulate the mdb with or without a querier. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 18:18:00 -07:00
Julian Anastasov	2c17d27c36	net: call rcu_read_lock early in process_backlog Incoming packet should be either in backlog queue or in RCU read-side section. Otherwise, the final sequence of flush_backlog() and synchronize_net() may miss packets that can run without device reference: CPU 1 CPU 2 skb->dev: no reference process_backlog:__skb_dequeue process_backlog:local_irq_enable on_each_cpu for flush_backlog => IPI(hardirq): flush_backlog - packet not found in backlog CPU delayed ... synchronize_net - no ongoing RCU read-side sections netdev_run_todo, rcu_barrier: no ongoing callbacks __netif_receive_skb_core:rcu_read_lock - too late free dev process packet for freed dev Fixes: `6e583ce524` ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 18:16:36 -07:00
Julian Anastasov	e9e4dd3267	net: do not process device backlog during unregistration commit `381c759d99` ("ipv4: Avoid crashing in ip_error") fixes a problem where processed packet comes from device with destroyed inetdev (dev->ip_ptr). This is not expected because inetdev_destroy is called in NETDEV_UNREGISTER phase and packets should not be processed after dev_close_many() and synchronize_net(). Above fix is still required because inetdev_destroy can be called for other reasons. But it shows the real problem: backlog can keep packets for long time and they do not hold reference to device. Such packets are then delivered to upper levels at the same time when device is unregistered. Calling flush_backlog after NETDEV_UNREGISTER_FINAL still accounts all packets from backlog but before that some packets continue to be delivered to upper levels long after the synchronize_net call which is supposed to wait the last ones. Also, as Eric pointed out, processed packets, mostly from other devices, can continue to add new packets to backlog. Fix the problem by moving flush_backlog early, after the device driver is stopped and before the synchronize_net() call. Then use netif_running check to make sure we do not add more packets to backlog. We have to do it in enqueue_to_backlog context when the local IRQ is disabled. As result, after the flush_backlog and synchronize_net sequence all packets should be accounted. Thanks to Eric W. Biederman for the test script and his valuable feedback! Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Fixes: `6e583ce524` ("net: eliminate refcounting in backlog queue") Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-10 18:16:36 -07:00
Pablo Neira Ayuso	95dd8653de	netfilter: ctnetlink: put back references to master ct and expect objects We have to put back the references to the master conntrack and the expectation that we just created, otherwise we'll leak them. Fixes: `0ef71ee1a5` ("netfilter: ctnetlink: refactor ctnetlink_create_expect") Reported-by: Tim Wiess <Tim.Wiess@watchguard.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-10 14:18:03 +02:00
Eric Dumazet	a7d35f9d73	bridge: fix potential crash in __netdev_pick_tx() Commit `c29390c6df` ("xps: must clear sender_cpu before forwarding") fixed an issue in normal forward path, caused by sender_cpu & napi_id skb fields being an union. Bridge is another point where skb can be forwarded, so we need the same cure. Bug triggers if packet was received on a NIC using skb_mark_napi_id() Fixes: `2bd82484bb` ("xps: fix xps for stacked devices") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Bob Liu <bob.liu@oracle.com> Tested-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 22:48:42 -07:00
Eric Dumazet	a4e2405cc5	tcp: do not export tcp_init_xmit_timers() After commit `900f65d361` ("tcp: move duplicate code from tcp_v4_init_sock()/tcp_v6_init_sock()"), we no longer need to export tcp_init_xmit_timers() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:44:38 -07:00
Nikolay Aleksandrov	09cf0211f9	bridge: mdb: fill state in br_mdb_notify Fill also the port group state when sending notifications. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:41:11 -07:00
Masatake YAMATO	cb1c61680d	route: remove unsed variable in __mkroute_input flags local variable in __mkroute_input is not used as a variable. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:34:34 -07:00
Tom Herbert	35a256fee5	ipv6: Nonlocal bind Add support to allow non-local binds similar to how this was done for IPv4. Non-local binds are very useful in emulating the Internet in a box, etc. This add the ip_nonlocal_bind sysctl under ipv6. Testing: Set up nonlocal binding and receive routing on a host, e.g.: ip -6 rule add from ::/0 iif eth0 lookup 200 ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200 sysctl -w net.ipv6.ip_nonlocal_bind=1 Set up routing to 2001:0:0:1::/64 on peer to go to first host ping6 -I 2001:0:0:1::1 peer-address -- to verify Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 21:09:10 -07:00
Eric Dumazet	dbe7faa404	inet: inet_twsk_deschedule factorization inet_twsk_deschedule() calls are followed by inet_twsk_put(). Only particular case is in inet_twsk_purge() but there is no point to defer the inet_twsk_put() after re-enabling BH. Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put() and move the inet_twsk_put() inside. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:12:20 -07:00
Eric Dumazet	fc01538f9f	inet: simplify timewait refcounting timewait sockets have a complex refcounting logic. Once we realize it should be similar to established and syn_recv sockets, we can use sk_nulls_del_node_init_rcu() and remove inet_twsk_unhash() In particular, deferred inet_twsk_put() added in commit `13475a30b6` ("tcp: connect() race with timewait reuse") looks unecessary : When removing a timewait socket from ehash or bhash, caller must own a reference on the socket anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:12:20 -07:00
Florian Westphal	8b58a39846	ipv6: use flag instead of u16 for hop in inet6_skb_parm Hop was always either 0 or sizeof(struct ipv6hdr). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:06:59 -07:00
Oleg Nesterov	1fbe4b46ca	net: pktgen: kill the "Wait for kthread_stop" code in pktgen_thread_worker() pktgen_thread_worker() doesn't need to wait for kthread_stop(), it can simply exit. Just pktgen_create_thread() and pg_net_exit() should do get_task_struct()/put_task_struct(). kthread_stop(dead_thread) is fine. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:05:32 -07:00
Oleg Nesterov	fecdf8be2d	net: pktgen: fix race between pktgen_thread_worker() and kthread_stop() pktgen_thread_worker() is obviously racy, kthread_stop() can come between the kthread_should_stop() check and set_current_state(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Stancek <jstancek@redhat.com> Reported-by: Marcelo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 15:05:32 -07:00
Yuchung Cheng	b20a3fa30a	tcp: update congestion state first before raising cwnd The congestion state and cwnd can be updated in the wrong order. For example, upon receiving a dubious ACK, we incorrectly raise the cwnd first (tcp_may_raise_cwnd()/tcp_cong_avoid()) because the state is still Open, then enter recovery state to reduce cwnd. For another example, if the ACK indicates spurious timeout or retransmits, we first revert the cwnd reduction and congestion state back to Open state. But we don't raise the cwnd even though the ACK does not indicate any congestion. To fix this problem we should first call tcp_fastretrans_alert() to process the dubious ACK and update the congestion state, then call tcp_may_raise_cwnd() that raises cwnd based on the current state. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:22:52 -07:00
Yuchung Cheng	76174004a0	tcp: do not slow start when cwnd equals ssthresh In the original design slow start is only used to raise cwnd when cwnd is stricly below ssthresh. It makes little sense to slow start when cwnd == ssthresh: especially when hystart has set ssthresh in the initial ramp, or after recovery when cwnd resets to ssthresh. Not doing so will also help reduce the buffer bloat slightly. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:22:52 -07:00
Yuchung Cheng	071d5080e3	tcp: add tcp_in_slow_start helper Add a helper to test the slow start condition in various congestion control modules and other places. This is to prepare a slight improvement in policy as to exactly when to slow start. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:22:52 -07:00
Alexander Duyck	1007f59dce	net: skb_defer_rx_timestamp should check for phydev before setting up classify This change makes it so that the call skb_defer_rx_timestamp will first check for a phydev before going in and manipulating the skb->data and skb->len values. By doing this we can avoid unnecessary work on network devices that don't support phydev. As a result we reduce the total instruction count needed to process this on most devices. Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:17:15 -07:00
Jon Maxwell	2251ae46af	tcp: v1 always send a quick ack when quickacks are enabled V1 of this patch contains Eric Dumazet's suggestion to move the per dst RTAX_QUICKACK check into tcp_in_quickack_mode(). Thanks Eric. I ran some tests and after setting the "ip route change quickack 1" knob there were still many delayed ACKs sent. This occured because when icsk_ack.quick=0 the !icsk_ack.pingpong value is subsequently ignored as tcp_in_quickack_mode() checks both these values. The condition for a quick ack to trigger requires that both icsk_ack.quick != 0 and icsk_ack.pingpong=0. Currently only icsk_ack.pingpong is controlled by the knob. But the icsk_ack.quick value changes dynamically depending on heuristics. The crux of the matter is that delayed acks still cannot be entirely disabled even with the RTAX_QUICKACK per dst knob enabled. This patch ensures that a quick ack is always sent when the RTAX_QUICKACK per dst knob is turned on. The "ip route change quickack 1" knob was recently added to enable quickacks. It was modeled around the TCP_QUICKACK setsockopt() option. This issue is that even with "ip route change quickack 1" enabled we still see delayed ACKs under some conditions. It would be nice to be able to completely disable delayed ACKs. Here is an example: # netstat -s\|grep dela 3 delayed acks sent For all routes enable the knob # ip route change quickack 1 Generate some traffic across a slow link and we still see the delayed acks. # netstat -s\|grep dela 106 delayed acks sent 1 delayed acks further delayed because of locked socket The issue is that both the "ip route change quickack 1" knob and the TCP_QUICKACK option set the icsk_ack.pingpong variable to 0. However at the business end in the __tcp_ack_snd_check() routine, tcp_in_quickack_mode() checks that both icsk_ack.quick != 0 and icsk_ack.pingpong=0 in order to trigger a quickack. As icsk_ack.quick is determined by heuristics it can be 0. When that occurs the icsk_ack.pingpong value is ignored and a delayed ACK is sent regardless. This patch moves the RTAX_QUICKACK per dst check into the tcp_in_quickack_mode() routine which ensures that a quickack is always sent when the quickack knob is enabled for that dst. Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 14:15:44 -07:00
Ilya Dryomov	c44bd69c0c	libceph: treat sockaddr_storage with uninitialized family as blank addr_is_blank() should return true if family is neither AF_INET nor AF_INET6. This is what its counterpart entity_addr_t::is_blank_ip() is doing and it is the right thing to do: in process_banner() we check if our address is blank and if it is "learn" it from our peer. As it is, we never learn our address and always send out a blank one. This goes way back to ceph.git commit dd732cbfc1c9 ("use sockaddr_storage; and some ipv6 support groundwork") from 2009. While at at, do not open-code ipv6_addr_any() and use INADDR_ANY constant instead of 0. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>	2015-07-09 20:30:34 +03:00
Ilya Dryomov	757856d2b9	libceph: enable ceph in a non-default network namespace Grab a reference on a network namespace of the 'rbd map' (in case of rbd) or 'mount' (in case of ceph) process and use that to open sockets instead of always using init_net and bailing if network namespace is anything but init_net. Be careful to not share struct ceph_client instances between different namespaces and don't add any code in the !CONFIG_NET_NS case. This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Sage Weil <sage@redhat.com>	2015-07-09 20:30:34 +03:00
David S. Miller	ace15bbb39	Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree. This batch mostly comes with patches to address fallout from the previous merge window cycle, they are: 1) Use entry->state.hook_list from nf_queue() instead of the global nf_hooks which is not valid when used from NFPROTO_NETDEV, this should cause no problems though since we have no userspace queueing for that family, but let's fix this now for the sake of correctness. Patch from Eric W. Biederman. 2) Fix compilation breakage in bridge netfilter if CONFIG_NF_DEFRAG_IPV4 is not set, from Bernhard Thaler. 3) Use percpu jumpstack in arptables too, now that there's a single copy of the rule blob we can't store the return address there anymore. Patch from Florian Westphal. 4) Fix a skb leak in the xmit path of bridge netfilter, problem there since 2.6.37 although it should be not possible to hit invalid traffic there, also from Florian. 5) Eric Leblond reports that when loading a large ruleset with many missing modules after a fresh boot, nf_tables can take long time commit it. Fix this by processing the full batch until the end, even on missing modules, then abort only once and restart processing. 6) Add bridge netfilter files to the MAINTAINER files. 7) Fix a net_device refcount leak in the new IPV6 bridge netfilter code, from Julien Grall. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-09 00:03:10 -07:00
Andy Gospodarek	974d7af5fc	ipv4: add support for linkdown sysctl to netconf This kernel patch exports the value of the new ignore_routes_with_linkdown via netconf. v2: changes to notify userspace via netlink when sysctl values change and proposed for 'net' since this could be considered a bugfix Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 23:34:53 -07:00
Nikolay Aleksandrov	f1158b74e5	bridge: mdb: zero out the local br_ip variable before use Since commit `b0e9a30dd6` ("bridge: Add vlan id to multicast groups") there's a check in br_ip_equal() for a matching vlan id, but the mdb functions were not modified to use (or at least zero it) so when an entry was added it would have a garbage vlan id (from the local br_ip variable in __br_mdb_add/del) and this would prevent it from being matched and also deleted. So zero out the whole local ip var to protect ourselves from future changes and also to fix the current bug, since there's no vlan id support in the mdb uapi - use always vlan id 0. Example before patch: root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent RTNETLINK answers: Invalid argument After patch: root@debian:~# bridge mdb add dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb del dev br0 port eth1 grp 239.0.0.1 permanent root@debian:~# bridge mdb Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org> Fixes: `b0e9a30dd6` ("bridge: Add vlan id to multicast groups") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:10:40 -07:00
Stephen Smalley	fdd75ea8df	net/tipc: initialize security state for new connection socket Calling connect() with an AF_TIPC socket would trigger a series of error messages from SELinux along the lines of: SELinux: Invalid class 0 type=AVC msg=audit(1434126658.487:34500): avc: denied { <unprintable> } for pid=292 comm="kworker/u16:5" scontext=system_u:system_r:kernel_t:s0 tcontext=system_u:object_r:unlabeled_t:s0 tclass=<unprintable> permissive=0 This was due to a failure to initialize the security state of the new connection sock by the tipc code, leaving it with junk in the security class field and an unlabeled secid. Add a call to security_sk_clone() to inherit the security state from the parent socket. Reported-by: Tim Shearer <tim.shearer@overturenetworks.com> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul@paul-moore.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:08:23 -07:00
Timo Teräs	fc24f2b209	ip_tunnel: fix ipv4 pmtu check to honor inner ip header df Frag needed should be sent only if the inner header asked to not fragment. Currently fragmentation is broken if the tunnel has df set, but df was not asked in the original packet. The tunnel's df needs to be still checked to update internally the pmtu cache. Commit `23a3647bc4` broke it, and this commit fixes the ipv4 df check back to the way it was. Fixes: `23a3647bc4` ("ip_tunnels: Use skb-len to PMTU check.") Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Timo Teräs <timo.teras@iki.fi> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:03:09 -07:00
Daniel Borkmann	4f7d2cdfdd	rtnetlink: verify IFLA_VF_INFO attributes before passing them to driver Jason Gunthorpe reported that since commit `c02db8c629` ("rtnetlink: make SR-IOV VF interface symmetric"), we don't verify IFLA_VF_INFO attributes anymore with respect to their policy, that is, ifla_vfinfo_policy[]. Before, they were part of ifla_policy[], but they have been nested since placed under IFLA_VFINFO_LIST, that contains the attribute IFLA_VF_INFO, which is another nested attribute for the actual VF attributes such as IFLA_VF_MAC, IFLA_VF_VLAN, etc. Despite the policy being split out from ifla_policy[] in this commit, it's never applied anywhere. nla_for_each_nested() only does basic nla_ok() testing for struct nlattr, but it doesn't know about the data context and their requirements. Fix, on top of Jason's initial work, does 1) parsing of the attributes with the right policy, and 2) using the resulting parsed attribute table from 1) instead of the nla_for_each_nested() loop (just like we used to do when still part of ifla_policy[]). Reference: http://thread.gmane.org/gmane.linux.network/368913 Fixes: `c02db8c629` ("rtnetlink: make SR-IOV VF interface symmetric") Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Sucheta Chakraborty <sucheta.chakraborty@qlogic.com> Cc: Greg Rose <gregory.v.rose@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Cc: Rony Efraim <ronye@mellanox.com> Cc: Vlad Zolotarov <vladz@cloudius-systems.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Vlad Zolotarov <vladz@cloudius-systems.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 16:01:52 -07:00
Nicolas Dichtel	95ec655bc4	Revert "dev: set iflink to 0 for virtual interfaces" This reverts commit `e1622baf54`. The side effect of this commit is to add a '@NONE' after each virtual interface name with a 'ip link'. It may break existing scripts. Reported-by: Olivier Hartkopp <socketcan@hartkopp.net> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 15:52:33 -07:00
Eric Dumazet	d339727c2b	net: graceful exit from netif_alloc_netdev_queues() User space can crash kernel with ip link add ifb10 numtxqueues 100000 type ifb We must replace a BUG_ON() by proper test and return -EINVAL for crazy values. Fixes: `60877a32bc` ("net: allow large number of tx queues") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 15:46:17 -07:00
Satish Ashok	f7e2965db1	bridge: mdb: start delete timer for temp static entries Start the delete timer when adding temp static entries so they can expire. Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: `ccb1c31a7a` ("bridge: add flags to distinguish permanent mdb entires") Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 14:52:58 -07:00
Eric Dumazet	32f675bbc9	net_sched: gen_estimator: extend pps limit rate estimators are limited to 4 Mpps, which was fine years ago, but too small with current hardware generation. Lets use 2^5 scaling instead of 2^10 to get 128 Mpps new limit. On 64bit arch, use an "unsigned long" for temp storage and remove limit. (We do not expect 32bit arches to be able to reach this point) Tested: tc -s -d filter sh dev eth0 parent ffff: filter protocol ip pref 1 u32 filter protocol ip pref 1 u32 fh 800: ht divisor 1 filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:15 match 07000000/ff000000 at 12 action order 1: gact action drop random type none pass val 0 index 1 ref 1 bind 1 installed 166 sec Action statistics: Sent 39734251496 bytes 863788076 pkt (dropped 863788117, overlimits 0 requeues 0) rate 4067Mbit 11053596pps backlog 0b 0p requeues 0 Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:59:20 -07:00
Eric Dumazet	2ee22a90c7	net_sched: act_mirred: remove spinlock in fast path Like act_gact, act_mirred can be lockless in packet processing 1) Use percpu stats 2) update lastuse only every clock tick to avoid false sharing 3) use rcu to protect tcfm_dev 4) Remove spinlock usage, as it is no longer needed. Next step : add multi queue capability to ifb device Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: John Fastabend <john.fastabend@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	56e5d1ca18	net_sched: act_gact: remove spinlock in fast path Final step for gact RCU operation : 1) Use percpu stats 2) update lastuse only every clock tick to avoid false sharing 3) Remove spinlock acquisition, as it is no longer needed. Since this is the last contended lock in packet RX when tc gact is used, this gives impressive gain. My host with 8 RX queues was handling 5 Mpps before the patch, and more than 11 Mpps after patch. Tested: On receiver : dev=eth0 tc qdisc del dev $dev ingress 2>/dev/null tc qdisc add dev $dev ingress tc filter del dev $dev root pref 10 2>/dev/null tc filter del dev $dev pref 10 2>/dev/null tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \ u32 match ip src 7.0.0.0/8 flowid 1:15 action drop Sender sends packets flood from 7/8 network Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	8f2ae965b7	net_sched: act_gact: read tcfg_ptype once Third step for gact RCU operation : Following patch will get rid of spinlock protection, so we need to read tcfg_ptype once. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	cc6510a950	net_sched: act_gact: use a separate packet counters for gact_determ() Second step for gact RCU operation : We want to get rid of the spinlock protecting gact operations. Stats (packets/bytes) will soon be per cpu. gact_determ() would not work without a central packet counter, so lets add it for this mode. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	cef5ecf96b	net_sched: act_gact: make tcfg_pval non zero First step for gact RCU operation : Instead of testing if tcfg_pval is zero or not, just make it 1. No change in behavior, but slightly faster code. The smp_rmb()/smp_wmb() barriers, while not strictly needed at this stage are added for upcoming spinlock removal. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:42 -07:00
Eric Dumazet	519c818e8f	net: sched: add percpu stats to actions Reuse existing percpu infrastructure John Fastabend added for qdisc. This patch adds a new cpustats parameter to tcf_hash_create() and all actions pass false, meaning this patch should have no effect yet. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:41 -07:00
Eric Dumazet	24ea591d22	net: sched: extend percpu stats helpers qdisc_bstats_update_cpu() and other helpers were added to support percpu stats for qdisc. We want to add percpu stats for tc action, so this patch add common helpers. qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update() qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop() Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:50:41 -07:00
Yuchung Cheng	3759824da8	tcp: PRR uses CRB mode by default and SS mode conditionally PRR slow start is often too aggressive especially when drops are caused by traffic policers. The policers mainly use token bucket to enforce the rate so sending (twice) faster than the delivery rate causes excessive drops. This patch changes PRR to the conservative reduction bound (CRB) mode in RFC 6937 by default. CRB follows the packet conservation rule to send at most the delivery rate by default. But if many packets are lost and the pipe is empty, CRB may take N round trips to repair N losses. We conditionally turn on slow start mode if all these conditions are made to speed up the recovery: 1) on the second round or later in recovery 2) retransmission sent in the previous round is delivered on this ACK 3) no retransmission is marked lost on this ACK By using packet conservation by default, this change reduces the loss retransmits signicantly on networks that deploy traffic policers, up to 20% reduction of overall loss rate. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:29:46 -07:00
Yuchung Cheng	291a00d1a7	tcp: reduce cwnd if retransmit is lost in CA_Loss If the retransmission in CA_Loss is lost again, we should not continue to slow start or raise cwnd in congestion avoidance mode. Instead we should enter fast recovery and use PRR to reduce cwnd, following the principle in RFC5681: "... or the loss of a retransmission, should be taken as two indications of congestion and, therefore, cwnd (and ssthresh) MUST be lowered twice in this case." This is especially important to reduce loss when the CA_Loss state was caused by a traffic policer dropping the entire inflight. The CA_Loss state has a problem where a loss of L packets causes the sender to send a burst of L packets. So a policer that's dropping most packets in a given RTT can cause a huge retransmit storm. By contrast, PRR includes logic to bound the number of outbound packets that result from a given ACK. So switching to CA_Recovery on lost retransmits in CA_Loss avoids this retransmit storm problem when in CA_Loss. Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Nandita Dukkipati <nanditad@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-08 13:29:45 -07:00
Julien Grall	86e8971800	netfilter: bridge: Use __in6_dev_get rather than in6_dev_get in br_validate_ipv6 The commit `efb6de9b4b` "netfilter: bridge: forward IPv6 fragmented packets" introduced a new function br_validate_ipv6 which take a reference on the inet6 device. Although, the reference is not released at the end. This will result to the impossibility to destroy any netdevice using ipv6 and bridge. It's possible to directly retrieve the inet6 device without taking a reference as all netfilter hooks are protected by rcu_read_lock via nf_hook_slow. Spotted while trying to destroy a Xen guest on the upstream Linux: "unregister_netdevice: waiting for vif1.0 to become free. Usage count = 1" Signed-off-by: Julien Grall <julien.grall@citrix.com> Cc: Bernhard Thaler <bernhard.thaler@wvnet.at> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: fw@strlen.de Cc: ian.campbell@citrix.com Cc: wei.liu2@citrix.com Cc: Bob Liu <bob.liu@oracle.com> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-08 11:02:16 +02:00
Linus Torvalds	1dc51b8288	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work" [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits) 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation fs/file.c: __fget() and dup2() atomicity rules fs/file.c: don't acquire files->file_lock in fd_install() fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation vfs: avoid creation of inode number 0 in get_next_ino namei: make set_root_rcu() return void make simple_positive() public ufs: use dir_pages instead of ufs_dir_pages() pagemap.h: move dir_pages() over there remove the pointless include of lglock.h fs: cleanup slight list_entry abuse xfs: Correctly lock inode when removing suid and file capabilities fs: Call security_ops->inode_killpriv on truncate fs: Provide function telling whether file_remove_privs() will do anything ...	2015-07-04 19:36:06 -07:00
Linus Torvalds	9b284cbdb5	bluetooth: fix list handling Commit `835a6a2f86` ("Bluetooth: Stop sabotaging list poisoning") thought that the code was sabotaging the list poisoning when NULL'ing out the list pointers and removed it. But what was going on was that the bluetooth code was using NULL pointers for the list as a way to mark it empty, and that commit just broke it (and replaced the test with NULL with a "list_empty()" test on a uninitialized list instead, breaking things even further). So fix it all up to use the regular and real list_empty() handling (which does not use NULL, but a pointer to itself), also making sure to initialize the list properly (the previous NULL case was initialized implicitly by the session being allocated with kzalloc()) This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong An. [ I would normally expect to get this through the bt tree, but I'm going to release -rc1, so I'm just committing this directly - Linus ] Reported-and-tested-by: Jörg Otte <jrg.otte@gmail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Original-by: Tedd Ho-Jeong An <tedd.an@intel.com> Original-by: Marcel Holtmann <marcel@holtmann.org>: Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-07-04 19:11:33 -07:00
Al Viro	0f1db7dee2	9p: cope with bogus responses from server in p9_client_{read,write} if server claims to have written/read more than we'd told it to, warn and cap the claimed byte count to avoid advancing more than we are ready to.	2015-07-04 16:17:39 -04:00
Al Viro	67e808fbb0	p9_client_write(): avoid double p9_free_req() Braino in "9p: switch p9_client_write() to passing it struct iov_iter *"; if response is impossible to parse and we discard the request, get the out of the loop right there. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-07-04 16:11:05 -04:00
Al Viro	a84b69cb6e	9p: forgetting to cancel request on interrupted zero-copy RPC If we'd already sent a request and decide to abort it, we must issue TFLUSH properly and not just blindly reuse the tag, or we'll get seriously screwed when response eventually arrives and we confuse it for response to later request that had reused the same tag. Cc: stable@vger.kernel.org # v3.2 and later Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2015-07-04 16:04:19 -04:00
Angga	4c938d22c8	ipv6: Make MLD packets to only be processed locally Before commit `daad151263` ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") MLD packets were only processed locally. After the change, a copy of MLD packet goes through ip6_mr_input, causing MRT6MSG_NOCACHE message to be generated to user space. Make MLD packet only processed locally. Fixes: `daad151263` ("ipv6: Make ipv6_is_mld() inline and use it from ip6_mc_input().") Signed-off-by: Hermin Anggawijaya <hermin.anggawijaya@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-03 09:52:38 -07:00
Markus Elfring	92b80eb33c	netlink: Delete an unnecessary check before the function call "module_put" The module_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-03 09:27:43 -07:00
Markus Elfring	f6e1c91666	net-RDS: Delete an unnecessary check before the function call "module_put" The module_put() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-03 09:27:42 -07:00
Markus Elfring	87775312a8	net-ipv6: Delete an unnecessary check before the function call "free_percpu" The free_percpu() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-03 09:27:42 -07:00
Trond Myklebust	b5872f0c67	SUNRPC: Don't confuse ENOBUFS with a write_space issue ENOBUFS means that memory allocations are failing due to an actual low memory situation. It should not be confused with being out of socket buffer space. Handle the problem by just punting to the delay in call_status. Reported-by: Neil Brown <neilb@suse.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-03 09:42:54 -04:00
Trond Myklebust	93aa6c7bbc	SUNRPC: Don't reencode message if transmission failed with ENOBUFS If we're running out of buffer memory when transmitting data, then we want to just delay for a moment, and then continue transmitting the remainder of the message. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>	2015-07-03 09:42:12 -04:00
Nikolay Aleksandrov	462e1ead92	bridge: vlan: fix usage of vlan 0 and 4095 again Vlan ids 0 and 4095 were disallowed by commit: `8adff41c3d` ("bridge: Don't use VID 0 and 4095 in vlan filtering") but then the check was removed when vlan ranges were introduced by: `bdced7ef78` ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") So reintroduce the vlan range check. Before patch: [root@testvm ~]# bridge vlan add vid 0 dev eth0 master (succeeds) After Patch: [root@testvm ~]# bridge vlan add vid 0 dev eth0 master RTNETLINK answers: Invalid argument Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: `bdced7ef78` ("bridge: support for multiple vlans and vlan ranges in setlink and dellink requests") Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-02 12:19:17 -07:00
David S. Miller	c4555d16d9	Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Johan Hedberg says: ==================== pull request: bluetooth 2015-07-02 A couple of regressions crept in because of a patch to use proper list APIs rather than manually reading & writing the next/prev pointers (commit `835a6a2f86`). Turns out this was masking a few bugs: a missing INIT_LIST_HEAD() call and incorrectly using list_del() rather than list_del_init(). The two patches in this set fix these, and it'd be nice they could still make it to 4.2-rc1 to avoid new bug reports from users. Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2015-07-02 12:17:11 -07:00
Linus Torvalds	0c76c6ba24	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph updates from Sage Weil: "We have a pile of bug fixes from Ilya, including a few patches that sync up the CRUSH code with the latest from userspace. There is also a long series from Zheng that fixes various issues with snapshots, inline data, and directory fsync, some simplification and improvement in the cap release code, and a rework of the caching of directory contents. To top it off there are a few small fixes and cleanups from Benoit and Hong" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits) rbd: use GFP_NOIO in rbd_obj_request_create() crush: fix a bug in tree bucket decode libceph: Fix ceph_tcp_sendpage()'s more boolean usage libceph: Remove spurious kunmap() of the zero page rbd: queue_depth map option rbd: store rbd_options in rbd_device rbd: terminate rbd_opts_tokens with Opt_err ceph: fix ceph_writepages_start() rbd: bump queue_max_segments ceph: rework dcache readdir crush: sync up with userspace crush: fix crash from invalid 'take' argument ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL ceph: pre-allocate data structure that tracks caps flushing ceph: re-send flushing caps (which are revoked) in reconnect stage ceph: send TID of the oldest pending caps flush to MDS ceph: track pending caps flushing globally ceph: track pending caps flushing accurately libceph: fix wrong name "Ceph filesystem for Linux" ceph: fix directory fsync ...	2015-07-02 11:35:00 -07:00
Linus Torvalds	8688d9540c	NFS client updates for Linux 4.2 Highlights include: Stable patches: - Fix a crash in the NFSv4 file locking code. - Fix an fsync() regression, where we were failing to retry I/O in some circumstances. - Fix an infinite loop in NFSv4.0 OPEN stateid recovery - Fix a memory leak when an attempted pnfs fails. - Fix a memory leak in the backchannel code - Large hostnames were not supported correctly in NFSv4.1 - Fix a pNFS/flexfiles bug that was impeding error reporting on I/O. - Fix a couple of credential issues in pNFS/flexfiles Bugfixes + cleanups: - Open flag sanity checks in the NFSv4 atomic open codepath - More NFSv4 delegation related bugfixes - Various NFSv4.1 backchannel bugfixes and cleanups - Fix the NFS swap socket code - Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code - Fix a UDP transport deadlock issue Features: - More RDMA client transport improvements - NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVlWQgAAoJEGcL54qWCgDyXtcP/2Y3HJ9xu5qU3Bo/jzCAw4E1 jPPMSFAz4kqy/LGoslyc1cNDEiKGzJYWU8TtCGI3KAyNxb6n3pT1mEE1tvIsSdis D8bpV13M452PPpZYrBawIf4+OuohXmuYHpFiVNSpLbH3Uo7dthvFFnbqCGaGlnqY rXYZHAnx637OGBcJsT4AXCUz12ILvxMYRnqwW6Xn+j9JmwR1coQX3v8W8e7SMf6i J+zOny7Uetjrg1U9C9uQB6ZvIoxUMo9QOVmtGCwsBl8lM3fLmzaQfcUf9fm76pMT yTrKJs4jBLvVf00bRHFDv9EHWCy97oqCkeQEw1EY2lnxp/lmM5SiI4zQqjbf0QTW 5VQScT1MK6xwHoUbuI/sYdXXR8KGDVT1xCFFHUNcg69CvgqdgWslPQY7xLJMvUJZ vBWfWDd8ppdCw2ZVX4ae/bnhfc+/mVh4wRPF7tgVAjT0pobBV9xMOeMkF4mo76Wa pvo/nTRMt68hpESVSvq9dYEMVhy5haqFhPrSbyAGOpT4SE2V3RCCZQfhu15TMKdW BdvItG+mdAVPbIHqhx7vRdAudcOEZKyxbFA+l3E5FyCAXLV7XS3M8CEl3P1w7gmm Ccr8DW9abKFJf1RAKdX3stexIoJLGTwciSMR5smsbup/xNcx/fRgx2f1w31JMPxb kG3Izfk25w9uGSsbR39D =AREr -----END PGP SIGNATURE----- Merge tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix a crash in the NFSv4 file locking code. - Fix an fsync() regression, where we were failing to retry I/O in some circumstances. - Fix an infinite loop in NFSv4.0 OPEN stateid recovery - Fix a memory leak when an attempted pnfs fails. - Fix a memory leak in the backchannel code - Large hostnames were not supported correctly in NFSv4.1 - Fix a pNFS/flexfiles bug that was impeding error reporting on I/O. - Fix a couple of credential issues in pNFS/flexfiles Bugfixes + cleanups: - Open flag sanity checks in the NFSv4 atomic open codepath - More NFSv4 delegation related bugfixes - Various NFSv4.1 backchannel bugfixes and cleanups - Fix the NFS swap socket code - Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code - Fix a UDP transport deadlock issue Features: - More RDMA client transport improvements - NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles" * tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits) nfs: Remove invalid tk_pid from debug message nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh nfs: Drop bad comment in nfs41_walk_client_list() nfs: Remove unneeded micro checking of CONFIG_PROC_FS nfs: Don't setting FILE_CREATED flags always nfs: Use remove_proc_subtree() instead remove_proc_entry() nfs: Remove unused argument in nfs_server_set_fsinfo() nfs: Fix a memory leak when meeting an unsupported state protect nfs: take extra reference to fl->fl_file when running a LOCKU operation NFSv4: When returning a delegation, don't reclaim an incompatible open mode. NFSv4.2: LAYOUTSTATS is optional to implement NFSv4.2: Fix up a decoding error in layoutstats pNFS/flexfiles: Fix the reset of struct pgio_header when resending pNFS/flexfiles: Turn off layoutcommit for servers that don't need it pnfs/flexfiles: protect ktime manipulation with mirror lock nfs: provide pnfs_report_layoutstat when NFS42 is disabled nfs: verify open flags before allowing open nfs: always update creds in mirror, even when we have an already connected ds nfs: fix potential credential leak in ff_layout_update_mirror_cred pnfs/flexfiles: report layoutstat regularly ...	2015-07-02 11:32:23 -07:00
Linus Torvalds	9d90f03531	Replace module_init with appropriate alternate initcall in non modules. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVkO6nAAoJEOvOhAQsB9HWpHMP/Aknc+lmX2dZeIn96gdkP+UK 1qL24C5oq2sm/9yTZLdoXbyApLaaTbAJHS9O4kolaOU6uOs3JrgtXqL1697PVp1R qV4f4DOzXmmEHaE2oO21afAri3tXIVQNqA2NQl2TmKfwz0Atu01Vj5RJPu/ZOBPl dONXcFnE6nO2p7AEFRP/GfDZwkng4xALyZPhwL7tJDAeGaBpqG/n2hCuq+Szn9g8 wjTFACBdad/mRrYsL6YsWZ1e+LKI8vsArQbdPTam+jPaEUlK7yjFReFKCJVzL2JP xfQoTcCgFztzTUV0JTGR9sqeYA3WH9AkJOFDxNE/eIili4xiTh789WbEpHLVECSX 1LsW025I3DkRWBPT4L+9ZP805ha71kNXDFc5N3XJkzrCYaFvD2BgsUzxi6FXj7aC 9lEVKt6xO04FFG5SwTKnO0f8PEhPemZH3BDnVvjBDWQYLjUcPSNz7bfyHUhif0G5 ulOGVB0ncJJF9iP8PyZs1RA/F8kKxXWnhYMIHzvl0f0vLUA7rAKsACnhBgq8s9ZQ uM5YjzU91Z/4pe5C2E5MmQIZ84b79ZPsee1lF0GJdjK5W3PDvnCjIdXfQ5M/f3S8 76cssXWNhS78/P+19YqirLeb0u7Zw0jf73m9t9ywRgcByWfY5ZUDm0DFpQnWKkoR QY/aFO/yHKTO3VHj8Ril =KDJO -----END PGP SIGNATURE----- Merge tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux Pull module_init replacement part two from Paul Gortmaker: "Replace module_init with appropriate alternate initcall in non modules. This series converts non-modular code that is using the module_init() call to hook itself into the system to instead use one of our alternate priority initcalls. Unlike the previous series that used device_initcall and hence was a runtime no-op, these commits change to one of the alternate initcalls, because (a) we have them and (b) it seems like the right thing to do. For example, it would seem logical to use arch_initcall for arch specific setup code and fs_initcall for filesystem setup code. This does mean however, that changes in the init ordering will be taking place, and so there is a small risk that some kind of implicit init ordering issue may lie uncovered. But I think it is still better to give these ones sensible priorities than to just assign them all to device_initcall in order to exactly preserve the old ordering. Thad said, we have already made similar changes in core kernel code in commit `c96d6660dc` ("kernel: audit/fix non-modular users of module_init in core code") without any regressions reported, so this type of change isn't without precedent. It has also got the same local testing and linux-next coverage as all the other pull requests that I'm sending for this merge window have got. Once again, there is an unused module_exit function removal that shows up as an outlier upon casual inspection of the diffstat" * tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling mm/page_owner.c: use late_initcall to hook in enabling lib/list_sort: use late_initcall to hook in self tests arm: use subsys_initcall in non-modular pl320 IPC code powerpc: don't use module_init for non-modular core hugetlb code powerpc: use subsys_initcall for Freescale Local Bus x86: don't use module_init for non-modular core bootflag code netfilter: don't use module_init/exit in core IPV4 code fs/notify: don't use module_init for non-modular inotify_user code mm: replace module_init usages with subsys_initcall in nommu.c	2015-07-02 10:36:29 -07:00
Pablo Neira Ayuso	6742b9e310	netfilter: nfnetlink: keep going batch handling on missing modules After a fresh boot with no modules in place at all and a large rulesets, the existing nfnetlink_rcv_batch() funcion can take long time to commit the ruleset due to the many abort path. This is specifically a problem for the existing client of this code, ie. nf_tables, since it results in several synchronize_rcu() call in a row. This patch changes the policy to keep full batch processing on missing modules errors so we abort only once. Reported-by: Eric Leblond <eric@regit.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-02 17:59:33 +02:00
Florian Westphal	dd302b59bd	netfilter: bridge: don't leak skb in error paths br_nf_dev_queue_xmit must free skb in its error path. NF_DROP is misleading -- its an okfn, not a netfilter hook. Fixes: `462fb2af97` ("bridge : Sanitize skb before it enters the IP stack") Fixes: `efb6de9b4b` ("netfilter: bridge: forward IPv6 fragmented packets") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-02 17:59:26 +02:00
Florian Westphal	3bd229976f	netfilter: arptables: use percpu jumpstack commit `482cfc3185` ("netfilter: xtables: avoid percpu ruleset duplication") Unlike ip and ip6tables, arp tables were never converted to use the percpu jump stack. It still uses the rule blob to store return address, which isn't safe anymore since we now share this blob among all processors. Because there is no TEE support for arptables, we don't need to cope with reentrancy, so we can use loocal variable to hold stack offset. Fixes: `482cfc3185` ("netfilter: xtables: avoid percpu ruleset duplication") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-02 17:58:59 +02:00
Bernhard Thaler	a1bc1b356a	netfilter: bridge: fix CONFIG_NF_DEFRAG_IPV4/6 related warnings/errors br_nf_ip_fragment() is not needed when neither CONFIG_NF_DEFRAG_IPV4 nor CONFIG_NF_DEFRAG_IPV6 is set. struct brnf_frag_data must be available if either CONFIG_NF_DEFRAG_IPV4 or CONFIG_NF_DEFRAG_IPV6 is set. Fixes: `efb6de9b4b` ("netfilter: bridge: forward IPv6 fragmented packets") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-02 15:03:13 +02:00
Eric W. Biederman	f307170d6e	netfilter: nf_queue: Don't recompute the hook_list head If someone sends packets from one of the netdevice ingress hooks to the a userspace queue, and then userspace later accepts the packet, the netfilter code can enter an infinite loop as the list head will never be found. Pass in the saved list_head to avoid this. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2015-07-02 15:03:13 +02:00
Linus Torvalds	47ebed96ff	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Pull networking fixes from David Miller: 1) mlx4 driver bug fixes (TX queue wakeups, csum complete indications) from Ido Shamay, Eran Ben Elisha, and Or Gerlitz. 2) Missing unlock in error path of PTP support in renesas driver, from Dan Carpenter. 3) Add Vitesse 8641 phy IDs to vitesse PHY driver, from Shaohui Xie. 4) Bnx2x driver bug fixes (linearization of encap packets, scratchpad parity error notifications, flow-control and speed settings) from Yuval Mintz, Manish Chopra, Shahed Shaikh, and Ariel Elior. 5) ipv6 extension header parsing in the igb chip has a HW errata, disable it. Frm Todd Fujinaka. 6) Fix PCI link state locking issue in e1000e driver, from Yanir Lubetkin. 7) Cure panics during MTU change in i40e, from Mitch Williams. 8) Don't leak promisc refs in DSA slave driver, from Gilad Ben-Yossef. 9) Add missing HAS_DMA dep to VIA Rhine driver, from Geery Uytterhoeven. 10) Make sure DMA map/unmap calls are symmetric in bnx2x driver, from Michal Schmidt. 11) Workaround for MDIO access problems in bcm7xxx devices, from FLorian Fainelli. 12) Fix races in SCTP protocol between OTTB responses and route removals, from Alexander Sverdlin. 13) Fix jumbo frame checksum issue with some mvneta devices, from Simon Guinot. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (58 commits) sock_diag: don't broadcast kernel sockets net: mvneta: disable IP checksum with jumbo frames for Armada 370 ARM: mvebu: update Ethernet compatible string for Armada XP net: mvneta: introduce compatible string "marvell, armada-xp-neta" api: fix compatibility of linux/in.h with netinet/in.h net: icplus: fix typo in constant name sis900: Trivial: Fix typos in enums stmmac: Trivial: fix typo in constant name sctp: Fix race between OOTB responce and route removal net-Liquidio: Delete unnecessary checks before the function call "vfree" vmxnet3: Bump up driver version number amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation net: phy: mdio-bcm-unimac: workaround initial read failures for integrated PHYs net: bcmgenet: workaround initial read failures for integrated PHYs net: phy: bcm7xxx: workaround MDIO management controller initial read bnx2x: fix DMA API usage net: via: VIA_RHINE and VIA_VELOCITY should depend on HAS_DMA net/phy: tune get_phy_c45_ids to support more c45 phy bnx2x: fix lockdep splat net: fec: don't access RACC register when not available ...	2015-07-01 14:58:07 -07:00
Linus Torvalds	02201e3f1b	Minor merge needed, due to function move. Main excitement here is Peter Zijlstra's lockless rbtree optimization to speed module address lookup. He found some abusers of the module lock doing that too. A little bit of parameter work here too; including Dan Streetman's breaking up the big param mutex so writing a parameter can load another module (yeah, really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were appended too. Cheers, Rusty. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVkgKHAAoJENkgDmzRrbjxQpwQAJVmBN6jF3SnwbQXv9vRixjH 58V33sb1G1RW+kXxQ3/e8jLX/4VaN479CufruXQp+IJWXsN/CH0lbC3k8m7u50d7 b1Zeqd/Yrh79rkc11b0X1698uGCSMlzz+V54Z0QOTEEX+nSu2ZZvccFS4UaHkn3z rqDo00lb7rxQz8U25qro2OZrG6D3ub2q20TkWUB8EO4AOHkPn8KWP2r429Axrr0K wlDWDTTt8/IsvPbuPf3T15RAhq1avkMXWn9nDXDjyWbpLfTn8NFnWmtesgY7Jl4t GjbXC5WYekX3w2ZDB9KaT/DAMQ1a7RbMXNSz4RX4VbzDl+yYeSLmIh2G9fZb1PbB PsIxrOgy4BquOWsJPm+zeFPSC3q9Cfu219L4AmxSjiZxC3dlosg5rIB892Mjoyv4 qxmg6oiqtc4Jxv+Gl9lRFVOqyHZrTC5IJ+xgfv1EyP6kKMUKLlDZtxZAuQxpUyxR HZLq220RYnYSvkWauikq4M8fqFM8bdt6hLJnv7bVqllseROk9stCvjSiE3A9szH5 OgtOfYV5GhOeb8pCZqJKlGDw+RoJ21jtNCgOr6DgkNKV9CX/kL/Puwv8gnA0B0eh dxCeB7f/gcLl7Cg3Z3gVVcGlgak6JWrLf5ITAJhBZ8Lv+AtL2DKmwEWS/iIMRmek tLdh/a9GiCitqS0bT7GE =tWPQ -----END PGP SIGNATURE----- Merge tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module updates from Rusty Russell: "Main excitement here is Peter Zijlstra's lockless rbtree optimization to speed module address lookup. He found some abusers of the module lock doing that too. A little bit of parameter work here too; including Dan Streetman's breaking up the big param mutex so writing a parameter can load another module (yeah, really). Unfortunately that broke the usual suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were appended too" * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits) modules: only use mod->param_lock if CONFIG_MODULES param: fix module param locks when !CONFIG_SYSFS. rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE() module: add per-module param_lock module: make perm const params: suppress unused variable error, warn once just in case code changes. modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'. kernel/module.c: avoid ifdefs for sig_enforce declaration kernel/workqueue.c: remove ifdefs over wq_power_efficient kernel/params.c: export param_ops_bool_enable_only kernel/params.c: generalize bool_enable_only kernel/module.c: use generic module param operaters for sig_enforce kernel/params: constify struct kernel_param_ops uses sysfs: tightened sysfs permission checks module: Rework module_addr_{min,max} module: Use __module_address() for module_address_lookup() module: Make the mod_tree stuff conditional on PERF_EVENTS \|\| TRACING module: Optimize __module_address() using a latched RB-tree rbtree: Implement generic latch_tree seqlock: Introduce raw_read_seqcount_latch() ...	2015-07-01 10:49:25 -07:00
Ilya Dryomov	82cd003a77	crush: fix a bug in tree bucket decode struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe() should be used. -Wconversion catches this, but I guess it went unnoticed in all the noise it spews. The actual problem (at least for common crushmaps) isn't the u32 -> u8 truncation though - it's the advancement by 4 bytes instead of 1 in the crushmap buffer. Fixes: http://tracker.ceph.com/issues/2759 Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Josh Durgin <jdurgin@redhat.com>	2015-07-01 00:46:35 +03:00
Tedd Ho-Jeong An	ab944c83f6	Bluetooth: Reinitialize the list after deletion for session user list If the user->list is deleted with list_del(), it doesn't initialize the entry which can cause the issue with list_empty(). According to the comment from the list.h, list_empty() returns false even if the list is empty and put the entry in an undefined state. /** * list_del - deletes entry from list. * @entry: the element to delete from the list. * Note: list_empty() on entry does not return true after this, the entry is * in an undefined state. */ Because of this behavior, list_empty() returns false even if list is empty when the device is reconnected. So, user->list needs to be re-initialized after list_del(). list.h already have a macro list_del_init() which deletes the entry and initailze it again. Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com> Tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-06-30 21:46:19 +02:00
Craig Gallek	b922622ec6	sock_diag: don't broadcast kernel sockets Kernel sockets do not hold a reference for the network namespace to which they point. Socket destruction broadcasting relies on the network namespace and will cause the splat below when a kernel socket is destroyed. This fix simply ignores kernel sockets when they are destroyed. Reported as: general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 1 PID: 9130 Comm: kworker/1:1 Not tainted 4.1.0-gelk-debug+ #1 Workqueue: sock_diag_events sock_diag_broadcast_destroy_work Stack: ffff8800b9c586c0 ffff8800b9c586c0 ffff8800ac4692c0 ffff8800936d4a90 ffff8800352efd38 ffffffff8469a93e ffff8800352efd98 ffffffffc09b9b90 ffff8800352efd78 ffff8800ac4692c0 ffff8800b9c586c0 ffff8800831b6ab8 Call Trace: [<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10 [<ffffffffc09b9b90>] ? inet_diag_handler_get_info+0x110/0x1fb [inet_diag] [<ffffffff845c868d>] netlink_broadcast+0x1d/0x20 [<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10 [<ffffffff845b2bf5>] sock_diag_broadcast_destroy_work+0xd5/0x160 [<ffffffff8408ea97>] process_one_work+0x147/0x420 [<ffffffff8408f0f9>] worker_thread+0x69/0x470 [<ffffffff8409fda3>] ? preempt_count_sub+0xa3/0xf0 [<ffffffff8408f090>] ? rescuer_thread+0x320/0x320 [<ffffffff84093cd7>] kthread+0x107/0x120 [<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0 [<ffffffff8469d31f>] ret_from_fork+0x3f/0x70 [<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0 Tested: Using a debug kernel while 'ss -E' is running: ip netns add test-ns ip netns delete test-ns Fixes: `eb4cb00852` sock_diag: define destruction multicast groups Fixes: `26abe14379` net: Modify sk_alloc to not reference count the netns of kernel sockets. Reported-by: Dave Jones <davej@codemonkey.org.uk> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-30 10:00:26 -07:00
Benoît Canet	c2cfa19400	libceph: Fix ceph_tcp_sendpage()'s more boolean usage From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h: bool last_piece; /* current is last piece / In ceph_msg_data_next(): last_piece = cursor->last_piece; A call to ceph_msg_data_next() is followed by: ret = ceph_tcp_sendpage(con->sock, page, page_offset, length, last_piece); while ceph_tcp_sendpage() is: static int ceph_tcp_sendpage(struct socket sock, struct page page, int offset, size_t size, bool more) The logic is inverted: correct it. Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-06-29 20:03:46 +03:00
Alexander Sverdlin	29c4afc4e9	sctp: Fix race between OOTB responce and route removal There is NULL pointer dereference possible during statistics update if the route used for OOTB responce is removed at unfortunate time. If the route exists when we receive OOTB packet and we finally jump into sctp_packet_transmit() to send ABORT, but in the meantime route is removed under our feet, we take "no_route" path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...). But sctp_ootb_pkt_new() used to prepare responce packet doesn't call sctp_transport_set_owner() and therefore there is no asoc associated with this packet. Probably temporary asoc just for OOTB responces is overkill, so just introduce a check like in all other places in sctp_packet_transmit(), where "asoc" is dereferenced. To reproduce this, one needs to 0. ensure that sctp module is loaded (otherwise ABORT is not generated) 1. remove default route on the machine 2. while true; do ip route del [interface-specific route] ip route add [interface-specific route] done 3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT responce On x86_64 the crash looks like this: BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp] PGD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ... CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1 Hardware name: ... task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000 RIP: 0010:[<ffffffffa05ec9ac>] [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp] RSP: 0018:ffff880127c037b8 EFLAGS: 00010296 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480 RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700 RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28 R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0 FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0 Stack: ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400 ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520 0000000000000000 0000000000000001 0000000000000000 0000000000000000 Call Trace: <IRQ> [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp] [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp] [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp] [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210 [<ffffffff810e0329>] ? update_process_times+0x59/0x60 [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0 [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0 [<ffffffff8101f599>] ? read_tsc+0x9/0x10 [<ffffffff8116d4b5>] ? put_page+0x55/0x60 [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100 [<ffffffff81462b68>] ? skb_free_head+0x58/0x80 [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic] [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0 [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp] [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp] [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp] [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp] [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp] [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30 [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150 [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210 [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90 [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370 [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0 [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50 [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60 [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0 [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120 [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169] [<ffffffff8147896a>] net_rx_action+0x21a/0x360 [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0 [<ffffffff8107912d>] irq_exit+0xad/0xb0 [<ffffffff8157d158>] do_IRQ+0x58/0xf0 [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d <EOI> [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20 [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp] [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0 [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20 [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480 [<ffffffff8156b365>] rest_init+0x85/0x90 [<ffffffff818eb035>] start_kernel+0x48b/0x4ac [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120 [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184 Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9 RIP [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp] RSP <ffff880127c037b8> CR2: 0000000000000020 ---[ end trace 5aec7fd2dc983574 ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff) drm_kms_helper: panic occurred, switching back to text console ---[ end Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-29 09:28:42 -07:00
Gilad Ben-Yossef	4fdeddfe04	dsa: fix promiscuity leak on slave dev open error DSA master netdev promiscuity counter was not being properly decremented on slave device open error path. Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com> CC: Gilad Ben-Yossef <giladb@ezchip.com> CC: David S. Miller <davem@davemloft.net> CC: Florian Fainelli <f.fainelli@gmail.com> CC: Guenter Roeck <linux@roeck-us.net> CC: Andrew Lunn <andrew@lunn.ch> CC: Scott Feldman <sfeldma@gmail.com> Acked-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-28 16:57:08 -07:00
David Miller	1830fcea5b	net: Kill sock->sk_protinfo No more users, so it can now be removed. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-28 16:55:44 -07:00
David Miller	3200392b88	ax25: Stop using sock->sk_protinfo. Just make a ax25_sock structure that provides the ax25_cb pointer. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-28 16:55:44 -07:00
Geert Uytterhoeven	8e690ffdbc	flow_dissector: Pre-initialize ip_proto in __skb_flow_dissect() net/core/flow_dissector.c: In function ‘__skb_flow_dissect’: net/core/flow_dissector.c:132: warning: ‘ip_proto’ may be used uninitialized in this function Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-28 16:53:54 -07:00
Andy Gospodarek	96ac5cc963	ipv4: fix RCU lockdep warning from linkdown changes The following lockdep splat was seen due to the wrong context for grabbing in_dev. =============================== [ INFO: suspicious RCU usage. ] 4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244 Not tainted ------------------------------- include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by ip/403: #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff81453305>] rtnl_lock+0x17/0x19 #1: ((inetaddr_chain).rwsem){.+.+.+}, at: [<ffffffff8105c327>] __blocking_notifier_call_chain+0x35/0x6a stack backtrace: CPU: 2 PID: 403 Comm: ip Not tainted 4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244 0000000000000001 ffff8800b189b728 ffffffff8150a542 ffffffff8107a8b3 ffff880037bbea40 ffff8800b189b758 ffffffff8107cb74 ffff8800379dbd00 ffff8800bec85800 ffff8800bf9e13c0 00000000000000ff ffff8800b189b7d8 Call Trace: [<ffffffff8150a542>] dump_stack+0x4c/0x6e [<ffffffff8107a8b3>] ? up+0x39/0x3e [<ffffffff8107cb74>] lockdep_rcu_suspicious+0xf7/0x100 [<ffffffff814b63c3>] fib_dump_info+0x227/0x3e2 [<ffffffff814b6624>] rtmsg_fib+0xa6/0x116 [<ffffffff814b978f>] fib_table_insert+0x316/0x355 [<ffffffff814b362e>] fib_magic+0xb7/0xc7 [<ffffffff814b4803>] fib_add_ifaddr+0xb1/0x13b [<ffffffff814b4d09>] fib_inetaddr_event+0x36/0x90 [<ffffffff8105c086>] notifier_call_chain+0x4c/0x71 [<ffffffff8105c340>] __blocking_notifier_call_chain+0x4e/0x6a [<ffffffff8105c370>] blocking_notifier_call_chain+0x14/0x16 [<ffffffff814a7f50>] __inet_insert_ifa+0x1a5/0x1b3 [<ffffffff814a894d>] inet_rtm_newaddr+0x350/0x35f [<ffffffff81457866>] rtnetlink_rcv_msg+0x17b/0x18a [<ffffffff8107e7c3>] ? trace_hardirqs_on+0xd/0xf [<ffffffff8146965f>] ? netlink_deliver_tap+0x1cb/0x1f7 [<ffffffff814576eb>] ? rtnl_newlink+0x72a/0x72a ... This patch resolves that splat. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-28 16:47:12 -07:00
Jon Paul Maloy	7d967b673c	tipc: purge backlog queue counters when broadcast link is reset In commit `1f66d161ab` ("tipc: introduce starvation free send algorithm") we introduced a counter per priority level for buffers in the link backlog queue. We also introduced a new function tipc_link_purge_backlog(), to reset these counters to zero when the link is reset. Unfortunately, we missed to call this function when the broadcast link is reset, with the result that the values of these counters might be permanently skewed when new nodes are attached. This may in the worst case lead to permananent, but spurious, broadcast link congestion, where no broadcast packets can be sent at all. We fix this bug with this commit. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-28 16:43:02 -07:00
Linus Torvalds	d2c3ac7e7e	Merge branch 'for-4.2' of git://linux-nfs.org/~bfields/linux Pull nfsd updates from Bruce Fields: "A relatively quiet cycle, with a mix of cleanup and smaller bugfixes" * 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits) sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key() nfsd: wrap too long lines in nfsd4_encode_read nfsd: fput rd_file from XDR encode context nfsd: take struct file setup fully into nfs4_preprocess_stateid_op nfsd: refactor nfs4_preprocess_stateid_op nfsd: clean up raparams handling nfsd: use swap() in sort_pacl_range() rpcrdma: Merge svcrdma and xprtrdma modules into one svcrdma: Add a separate "max data segs macro for svcrdma svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL svcrdma: Keep rpcrdma_msg fields in network byte-order svcrdma: Fix byte-swapping in svc_rdma_sendto.c nfsd: Update callback sequnce id only CB_SEQUENCE success nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying svcrdma: Remove svc_rdma_xdr_decode_deferred_req() SUNRPC: Move EXPORT_SYMBOL for svc_process uapi/nfs: Add NFSv4.1 ACL definitions nfsd: Remove dead declarations nfsd: work around a gcc-5.1 warning nfsd: Checking for acl support does not require fetching any acls ...	2015-06-27 10:14:39 -07:00
Tedd Ho-Jeong An	7c258670ce	Bluetooth: hidp: Initialize list header of hidp session user When new hidp session is created, list header in l2cap_user is not initialized and this causes list_empty() to fail in l2cap_register_user() even if l2cap_user list is empty. Signed-off-by: Tedd Ho-Jeong An <tedd.an@intel.com> Tested-by: Jörg Otte <jrg.otte@gmail.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>	2015-06-26 20:00:21 +02:00
Benoît Canet	6ba8edc0bc	libceph: Remove spurious kunmap() of the zero page ceph_tcp_sendpage already does the work of mapping/unmapping the zero page if needed. Signed-off-by: Benoît Canet <benoit.canet@nodalink.com> Reviewed-by: Alex Elder <elder@linaro.org> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-06-25 18:30:56 +03:00
Jamal Hadi Salim	b175c3a44f	net: sched: flower fix typo Fix typo in the validation rules for flower's attributes Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-25 05:23:02 -07:00
Ilya Dryomov	b459be739f	crush: sync up with userspace .. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff between kernel and userspace"). This fixes a bunch of recently pulled coding style issues and makes includes a bit cleaner. A patch "crush:Make the function crush_ln static" from Nicholas Krause <xerofoify@gmail.com> is folded in as crush_ln() has been made static in userspace as well. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-06-25 11:49:31 +03:00
Ilya Dryomov	8f529795ba	crush: fix crash from invalid 'take' argument Verify that the 'take' argument is a valid device or bucket. Otherwise ignore it (do not add the value to the working vector). Reflects ceph.git commit 9324d0a1af61e1c234cc48e2175b4e6320fff8f4. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-06-25 11:49:31 +03:00
Hong Zhiguo	6c13a6bb55	libceph: fix wrong name "Ceph filesystem for Linux" modinfo libceph prints the module name "Ceph filesystem for Linux", which is same as the real fs module ceph. It's confusing. Signed-off-by: Hong Zhiguo <zhiguohong@tencent.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-06-25 11:49:30 +03:00
Ilya Dryomov	216639dd50	libceph: a couple tweaks for wait loops - return -ETIMEDOUT instead of -EIO in case of timeout - wait_event_interruptible_timeout() returns time left until timeout and since it can be almost LONG_MAX we had better assign it to long Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-06-25 11:49:29 +03:00
Ilya Dryomov	a319bf56a6	libceph: store timeouts in jiffies, verify user input There are currently three libceph-level timeouts that the user can specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of these are in seconds and no checking is done on user input: negative values are accepted, we multiply them all by HZ which may or may not overflow, arbitrarily large jiffies then get added together, etc. There is also a bug in the way mount_timeout=0 is handled. It's supposed to mean "infinite timeout", but that's not how wait.h APIs treat it and so __ceph_open_session() for example will busy loop without much chance of being interrupted if none of ceph-mons are there. Fix all this by verifying user input, storing timeouts capped by msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies() helper for all user-specified waits to handle infinite timeouts correctly. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-06-25 11:49:29 +03:00
Ilya Dryomov	b01da6a08c	libceph: use kvfree() instead of open-coding it This one sneaked in through vfs tree with commit `2b777c9dd9` ("ceph_sync_read: stop poking into iov_iter guts"). Signed-off-by: Ilya Dryomov <idryomov@gmail.com>	2015-06-25 11:49:28 +03:00
Yan, Zheng	144cba1493	libceph: allow setting osd_req_op's flags Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-06-25 11:49:27 +03:00
Yan, Zheng	66ba609f7b	libceph: properly release STAT request's raw_data_in Signed-off-by: Yan, Zheng <zyan@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>	2015-06-25 11:49:27 +03:00
Linus Torvalds	e0456717e4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...	2015-06-24 16:49:49 -07:00
Nikolay Aleksandrov	1ea2d020ba	bridge: vlan: flush the dynamically learned entries on port vlan delete Add a new argument to br_fdb_delete_by_port which allows to specify a vid to match when flushing entries and use it in nbp_vlan_delete() to flush the dynamically learned entries of the vlan/port pair when removing a vlan from a port. Before this patch only the local mac was being removed and the dynamically learned ones were left to expire. Note that the do_all argument is still respected and if specified, the vid will be ignored. Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 05:40:55 -07:00
Nikolay Aleksandrov	9aa6638216	bridge: multicast: add a comment to br_port_state_selection about blocking state Add a comment to explain why we're not disabling port's multicast when it goes in blocking state. Since there's a check in the timer's function which bypasses the timer if the port's in blocking/disabled state, the timer will simply expire and stop without sending more queries. Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 05:40:54 -07:00
David S. Miller	3a07bd6fea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: drivers/net/ethernet/mellanox/mlx4/main.c net/packet/af_packet.c Both conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 02:58:51 -07:00
Phil Sutter	204621551b	net: inet_diag: export IPV6_V6ONLY sockopt For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is exported to userspace. It indicates whether a socket bound to in6addr_any listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not listed by e.g. 'ss -l -4'. This patch is accompanied by an appropriate one for iproute2 to enable the additional information in 'ss -e'. Signed-off-by: Phil Sutter <phil@nwl.cc> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 02:51:39 -07:00
Andy Gospodarek	0eeb075fad	net: ipv4 sysctl option to ignore routes when nexthop link is down This feature is only enabled with the new per-interface or ipv4 global sysctls called 'ignore_routes_with_linkdown'. net.ipv4.conf.all.ignore_routes_with_linkdown = 0 net.ipv4.conf.default.ignore_routes_with_linkdown = 0 net.ipv4.conf.lo.ignore_routes_with_linkdown = 0 ... When the above sysctls are set, will report to userspace that a route is dead and will no longer resolve to this nexthop when performing a fib lookup. This will signal to userspace that the route will not be selected. The signalling of a RTNH_F_DEAD is only passed to userspace if the sysctl is enabled and link is down. This was done as without it the netlink listeners would have no idea whether or not a nexthop would be selected. The kernel only sets RTNH_F_DEAD internally if the interface has IFF_UP cleared. With the new sysctl set, the following behavior can be observed (interface p8p1 is link-down): default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15 cache While the route does remain in the table (so it can be modified if needed rather than being wiped away as it would be if IFF_UP was cleared), the proper next-hop is chosen automatically when the link is down. Now interface p8p1 is linked-up: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2 90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1 cache local 80.0.0.1 dev lo src 80.0.0.1 cache <local> 80.0.0.2 dev p8p1 src 80.0.0.1 cache and the output changes to what one would expect. If the sysctl is not set, the following output would be expected when p8p1 is down: default via 10.0.5.2 dev p9p1 10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15 70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1 80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown 90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown 90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2 Since the dead flag does not appear, there should be no expectation that the kernel would skip using this route due to link being down. v2: Split kernel changes into 2 patches, this actually makes a behavioral change if the sysctl is set. Also took suggestion from Alex to simplify code by only checking sysctl during fib lookup and suggestion from Scott to add a per-interface sysctl. v3: Code clean-ups to make it more readable and efficient as well as a reverse path check fix. v4: Drop binary sysctl v5: Whitespace fixups from Dave v6: Style changes from Dave and checkpatch suggestions v7: One more checkpatch fixup Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 02:15:54 -07:00
Andy Gospodarek	8a3d03166f	net: track link-status of ipv4 nexthops Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are reachable via an interface where carrier is off. No action is taken, but additional flags are passed to userspace to indicate carrier status. This also includes a cleanup to fib_disable_ip to more clearly indicate what event made the function call to replace the more cryptic force option previously used. v2: Split out kernel functionality into 2 patches, this patch simply sets and clears new nexthop flag RTNH_F_LINKDOWN. v3: Cleanups suggested by Alex as well as a bug noticed in fib_sync_down_dev and fib_sync_up when multipath was not enabled. v5: Whitespace and variable declaration fixups suggested by Dave. v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them all. Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 02:15:54 -07:00
Vivien Didelot	5c8079d049	net: switchdev: ignore unsupported bridge flags switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS attributes, but a driver doesn't need to implement this in order to get bridge link information. So error out only on errors different than -EOPNOTSUPP. (This is a follow-up patch for 7d4f8d8.) Fixes: `8793d0a664` ("switchdev: add new switchdev_port_bridge_getlink") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 01:06:34 -07:00
Julian Anastasov	34b99df4e6	ip: report the original address of ICMP messages ICMP messages can trigger ICMP and local errors. In this case serr->port is 0 and starting from Linux 4.0 we do not return the original target address to the error queue readers. Add function to define which errors provide addr_offset. With this fix my ping command is not silent anymore. Fixes: `c247f0534c` ("ip: fix error queue empty skb handling") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-24 00:48:08 -07:00
Linus Torvalds	f9d1b5a31a	Changes for 4.2 - A large cleanup of how device capabilities are checked for various features - Additional cleanups in the MAD processing - Update to the srp driver - Creation and use of centralized log message helpers - Add const to a number of args to calls and clean up call chain - Add support for extended cq create verb - Add support for timestamps on cq completion - Add support for processing OPA MAD packets -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJVeyzqAAoJELgmozMOVy/di3wP/jml4F9crvQn7UBJjGm/rgcI wzZ2GZTqxQE8dn+W6gQsdKOzy0Ibxx5UYGp9ruInuxAcVh9t1PcylanasaiGMEtY mrGRFjipJ9jYa+yDQDTQi8EFMClZuMSvtRLKjzYITudCXQck37V+F5YlP6VphjX7 JeiM4a+4rD0ukk5PKGvUw51sP1eawKtEdUvnqcOEI2tJgQmzJBP4mXrhVtS/0wSc Pi8TRN5QKi3Drom/tK9QQ/ncoYngi4BKLfszCeU373HJq6qXqsxBYvs3jX6MPzfv Aooj272JxBgCYxkmEfECezDpmi3PbWDJjXj/xCLjfhjISDtHHHVLGVMODZpwUEsL 2wBgwlzdajVopSbSLvsjQNtQw25s7sDWpu+TFKbS0u+W2d0ZOyipM1Xeje+OtDHQ clhwvDhgSfeI/bJ1YdtNLbvINrwsfZD213zD+WH21A/9weAVr3hEfTuSaNFiTiRn 5yywP36TM0wH90KhiWoLrztcHvoE5p7kGuqzv04MRjrMMNHEJK2/IhWvT97Ewngu vWrZl7QRzXYcGspCOp2aJW9Wr2rhGRrv28TF+thpNrIJOB2JM4q4koCKZCcI0s2D E6pY2YQSzvrA/ZSfcWIg4yhugcycIJkOf7ur2N/U43cwGXtaCzPWVnKMApmdnVOO ZEMwD3OZ1OGcCHLhRL8Y =yISf -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: - a large cleanup of how device capabilities are checked for various features - additional cleanups in the MAD processing - update to the srp driver - creation and use of centralized log message helpers - add const to a number of args to calls and clean up call chain - add support for extended cq create verb - add support for timestamps on cq completion - add support for processing OPA MAD packets * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (92 commits) IB/mad: Add final OPA MAD processing IB/mad: Add partial Intel OPA MAD support IB/mad: Add partial Intel OPA MAD support IB/core: Add OPA MAD core capability flag IB/mad: Add support for additional MAD info to/from drivers IB/mad: Convert allocations from kmem_cache to kzalloc IB/core: Add ability for drivers to report an alternate MAD size. IB/mad: Support alternate Base Versions when creating MADs IB/mad: Create a generic helper for DR forwarding checks IB/mad: Create a generic helper for DR SMP Recv processing IB/mad: Create a generic helper for DR SMP Send processing IB/mad: Split IB SMI handling from MAD Recv handler IB/mad cleanup: Generalize processing of MAD data IB/mad cleanup: Clean up function params -- find_mad_agent IB/mlx4: Add support for CQ time-stamping IB/mlx4: Add mmap call to map the hardware clock IB/core: Pass hardware specific data in query_device IB/core: Add timestamp_mask and hca_core_clock to query_device IB/core: Extend ib_uverbs_create_cq IB/core: Add CQ creation time-stamping flag ...	2015-06-23 15:53:26 -07:00
Linus Torvalds	cb8a4deaf9	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial Pull trivial tree updates from Jiri Kosina: "As usual, mostly comment, kerneldoc and printk() fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: lpfc: Grammar s/an negative/a negative/ ARM: lib/lib1funcs.S: fix typo s/substractions/subtractions/ cx25821: cx25821-medusa-reg.h: fix 0x0x prefix lib: crc-itu-t.[ch] fix 0x0x prefix in integer constants rapidio: Fix kerneldoc and comment qla4xxx: Fix printk() in qla4_83xx_read_reset_template() and qla4_83xx_pre_loopback_config() treewide: Kconfig: fix wording / spelling usb/serial: fix grammar in Kconfig help text for FTDI_SIO megaraid_sas: fix kerneldoc netfilter: ebtables: fix comment grammar drm/radeon: fix comment isdn: fix grammar in comment ARM: KVM: fix comment	2015-06-23 14:08:54 -07:00
Scott Feldman	e9fdaec0e0	switchdev: change BUG_ON to WARN for attr set failure case This particular BUG_ON condition was checking for attr set err in the COMMIT phase, which isn't expected (it's a driver bug if PREPARE phase is OK but COMMIT fails). But BUG_ON() is too strong for this case, so change to WARN(). BUG_ON() would be warranted if the system was corrupted beyond repair, but this is not the case here. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-23 06:57:17 -07:00
Scott Feldman	7d4f8d871a	switchdev; add VLAN support for port's bridge_getlink One more missing piece of the puzzle. Add vlan dump support to switchdev port's bridge_getlink. iproute2 "bridge vlan show" cmd already knows how to show the vlans installed on the bridge and the device , but (until now) no one implemented the port vlan part of the netlink PF_BRIDGE:RTM_GETLINK msg. Before this patch, "bridge vlan show": $ bridge -c vlan show port vlan ids sw1p1 30-34 << bridge side vlans 57 sw1p1 << device side vlans (missing) sw1p2 57 sw1p2 sw1p3 sw1p4 br0 None (When the port is bridged, the output repeats the vlan list for the vlans on the bridge side of the port and the vlans on the device side of the port. The listing above show no vlans for the device side even though they are installed). After this patch: $ bridge -c vlan show port vlan ids sw1p1 30-34 << bridge side vlan 57 sw1p1 30-34 << device side vlans 57 3840 PVID sw1p2 57 sw1p2 57 3840 PVID sw1p3 3842 PVID sw1p4 3843 PVID br0 None I re-used ndo_dflt_bridge_getlink to add vlan fill call-back func. switchdev support adds an obj dump for VLAN objects, using the same call-back scheme as FDB dump. Support included for both compressed and un-compressed vlan dumps. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-23 06:56:18 -07:00
Scott Feldman	3e3a78b495	switchdev: rename vlan vid_start to vid_begin Use vid_begin/end to be consistent with BRIDGE_VLAN_INFO_RANGE_BEGIN/END. Signed-off-by: Scott Feldman <sfeldma@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2015-06-23 06:56:18 -07:00

... 7 8 9 10 11 ...

38980 Commits