Commit Graph

45146 Commits

Author SHA1 Message Date
Yuchung Cheng e636f8b010 tcp: new helper for RACK to detect loss
Create a new helper tcp_rack_detect_loss to prepare the upcoming
RACK reordering timer patch.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 22:37:16 -05:00
Yuchung Cheng db8da6bb57 tcp: new helper function for RACK loss detection
Create a new helper tcp_rack_mark_skb_lost to prepare the
upcoming RACK reordering timer support.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 22:37:16 -05:00
Shannon Nelson 003c941057 tcp: fix tcp_fastopen unaligned access complaints on sparc
Fix up a data alignment issue on sparc by swapping the order
of the cookie byte array field with the length field in
struct tcp_fastopen_cookie, and making it a proper union
to clean up the typecasting.

This addresses log complaints like these:
    log_unaligned: 113 callbacks suppressed
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360
    Kernel unaligned access at TPC[9764ac] tcp_try_fastopen+0x2ec/0x360
    Kernel unaligned access at TPC[9764c8] tcp_try_fastopen+0x308/0x360
    Kernel unaligned access at TPC[9764e4] tcp_try_fastopen+0x324/0x360
    Kernel unaligned access at TPC[976490] tcp_try_fastopen+0x2d0/0x360

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 12:31:24 -05:00
David Lebrun fa79581ea6 ipv6: sr: fix several BUGs when preemption is enabled
When CONFIG_PREEMPT=y, CONFIG_IPV6=m and CONFIG_SEG6_HMAC=y,
seg6_hmac_init() is called during the initialization of the ipv6 module.
This causes a subsequent call to smp_processor_id() with preemption
enabled, resulting in the following trace.

[   20.451460] BUG: using smp_processor_id() in preemptible [00000000] code: systemd/1
[   20.452556] caller is debug_smp_processor_id+0x17/0x19
[   20.453304] CPU: 0 PID: 1 Comm: systemd Not tainted 4.9.0-rc5-00973-g46738b1 #1
[   20.454406]  ffffc9000062fc18 ffffffff813607b2 0000000000000000 ffffffff81a7f782
[   20.455528]  ffffc9000062fc48 ffffffff813778dc 0000000000000000 00000000001dcf98
[   20.456539]  ffffffffa003bd08 ffffffff81af93e0 ffffc9000062fc58 ffffffff81377905
[   20.456539] Call Trace:
[   20.456539]  [<ffffffff813607b2>] dump_stack+0x63/0x7f
[   20.456539]  [<ffffffff813778dc>] check_preemption_disabled+0xd1/0xe3
[   20.456539]  [<ffffffff81377905>] debug_smp_processor_id+0x17/0x19
[   20.460260]  [<ffffffffa0061f3b>] seg6_hmac_init+0xfa/0x192 [ipv6]
[   20.460260]  [<ffffffffa0061ccc>] seg6_init+0x39/0x6f [ipv6]
[   20.460260]  [<ffffffffa006121a>] inet6_init+0x21a/0x321 [ipv6]
[   20.460260]  [<ffffffffa0061000>] ? 0xffffffffa0061000
[   20.460260]  [<ffffffff81000457>] do_one_initcall+0x8b/0x115
[   20.460260]  [<ffffffff811328a3>] do_init_module+0x53/0x1c4
[   20.460260]  [<ffffffff8110650a>] load_module+0x1153/0x14ec
[   20.460260]  [<ffffffff81106a7b>] SYSC_finit_module+0x8c/0xb9
[   20.460260]  [<ffffffff81106a7b>] ? SYSC_finit_module+0x8c/0xb9
[   20.460260]  [<ffffffff81106abc>] SyS_finit_module+0x9/0xb
[   20.460260]  [<ffffffff810014d1>] do_syscall_64+0x62/0x75
[   20.460260]  [<ffffffff816834f0>] entry_SYSCALL64_slow_path+0x25/0x25

Moreover, dst_cache_* functions also call smp_processor_id(), generating
a similar trace.

This patch uses raw_cpu_ptr() in seg6_hmac_init() rather than this_cpu_ptr()
and disable preemption when using dst_cache_* functions.

Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-13 12:29:55 -05:00
Michal Kazior dbef536211 mac80211: prevent skb/txq mismatch
Station structure is considered as not uploaded
(to driver) until drv_sta_state() finishes. This
call is however done after the structure is
attached to mac80211 internal lists and hashes.
This means mac80211 can lookup (and use) station
structure before it is uploaded to a driver.

If this happens (structure exists, but
sta->uploaded is false) fast_tx path can still be
taken. Deep in the fastpath call the sta->uploaded
is checked against to derive "pubsta" argument for
ieee80211_get_txq(). If sta->uploaded is false
(and sta is actually non-NULL) ieee80211_get_txq()
effectively downgraded to vif->txq.

At first glance this may look innocent but coerces
mac80211 into a state that is almost guaranteed
(codel may drop offending skb) to crash because a
station-oriented skb gets queued up on
vif-oriented txq. The ieee80211_tx_dequeue() ends
up looking at info->control.flags and tries to use
txq->sta which in the fail case is NULL.

It's probably pointless to pretend one can
downgrade skb from sta-txq to vif-txq.

Since downgrading unicast traffic to vif->txq must
not be done there's no txq to put a frame on if
sta->uploaded is false. Therefore the code is made
to fall back to regular tx() op path if the
described condition is hit.

Only drivers using wake_tx_queue were affected.

Example crash dump before fix:

 Unable to handle kernel paging request at virtual address ffffe26c
 PC is at ieee80211_tx_dequeue+0x204/0x690 [mac80211]
 [<bf4252a4>] (ieee80211_tx_dequeue [mac80211]) from
 [<bf4b1388>] (ath10k_mac_tx_push_txq+0x54/0x1c0 [ath10k_core])
 [<bf4b1388>] (ath10k_mac_tx_push_txq [ath10k_core]) from
 [<bf4bdfbc>] (ath10k_htt_txrx_compl_task+0xd78/0x11d0 [ath10k_core])
 [<bf4bdfbc>] (ath10k_htt_txrx_compl_task [ath10k_core])
 [<bf51c5a4>] (ath10k_pci_napi_poll+0x54/0xe8 [ath10k_pci])
 [<bf51c5a4>] (ath10k_pci_napi_poll [ath10k_pci]) from
 [<c0572e90>] (net_rx_action+0xac/0x160)

Reported-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 14:47:21 +01:00
Felix Fietkau 43071d8fb3 mac80211: initialize SMPS field in HT capabilities
ibss and mesh modes copy the ht capabilites from the band without
overriding the SMPS state. Unfortunately the default value 0 for the
SMPS field means static SMPS instead of disabled.

This results in HT ibss and mesh setups using only single-stream rates,
even though SMPS is not supposed to be active.

Initialize SMPS to disabled for all bands on ieee80211_hw_register to
ensure that the value is sane where it is not overriden with the real
SMPS state.

Reported-by: Elektra Wagenrad <onelektra@gmx.net>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
[move VHT TODO comment to a better place]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 11:31:26 +01:00
Purushottam Kushwaha 3093ebbeab cfg80211: Specify the reason for connect timeout
This enhances the connect timeout API to also carry the reason for the
timeout. These reason codes for the connect time out are represented by
enum nl80211_timeout_reason and are passed to user space through a new
attribute NL80211_ATTR_TIMEOUT_REASON (u32).

Signed-off-by: Purushottam Kushwaha <pkushwah@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[keep gfp_t argument last]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 09:46:18 +01:00
vamsi krishna bf95ecdba9 cfg80211: Add support to sched scan to report better BSSs
Enhance sched scan to support option of finding a better BSS while in
connected state. Firmware scans the medium and reports when it finds a
known BSS which has better RSSI than the current connected BSS. New
attributes to specify the relative RSSI (compared to the current BSS)
are added to the sched scan to implement this.

Signed-off-by: vamsi krishna <vamsin@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 09:40:41 +01:00
vamsi krishna ab5bb2d51b cfg80211: Add support for randomizing TA of Public Action frames
Add support to use a random local address (Address 2 = TA in transmit
and the same address in receive functionality) for Public Action frames
in order to improve privacy of WLAN clients. Applications fill the
random transmit address in the frame buffer in the NL80211_CMD_FRAME
command. This can be used only with the drivers that indicate support
for random local address by setting the new
NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA and/or
NL80211_EXT_FEATURE_MGMT_TX_RANDOM_TA_CONNECTED in ext_features.

The driver needs to configure receive behavior to accept frames to the
specified random address during the time the frame exchange is pending
and such frames need to be acknowledged similarly to frames sent to the
local permanent address when this random address functionality is not
used.

Signed-off-by: vamsi krishna <vamsin@qti.qualcomm.com>
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 09:39:47 +01:00
Johannes Berg 10b2eb6949 wext: uninline stream addition functions
With 78, 111 and 85 bytes respectively (on x86-64), the
functions iwe_stream_add_event(), iwe_stream_add_point()
and iwe_stream_add_value() really shouldn't be inlines.

It appears that at least my compiler already decided
the same, and created a single instance of each one
of them for each file using it, but that's still a
number of instances in the system overall, which this
reduces.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-13 09:38:42 +01:00
Eric Dumazet 717ac5ce13 ipv6: sr: static percpu allocation for hmac_ring
Current allocations are not NUMA aware, and lack proper
cleanup in case of error.

It is perfectly fine to use static per cpu allocations for 256 bytes
per cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Lebrun <david.lebrun@uclouvain.be>
Acked-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 16:52:19 -05:00
Nikolay Aleksandrov 8fb472c09b ipmr: improve hash scalability
Recently we started using ipmr with thousands of entries and easily hit
soft lockups on smaller devices. The reason is that the hash function
uses the high order bits from the src and dst, but those don't change in
many common cases, also the hash table  is only 64 elements so with
thousands it doesn't scale at all.
This patch migrates the hash table to rhashtable, and in particular the
rhl interface which allows for duplicate elements to be chained because
of the MFC_PROXY support (*,G; *,*,oif cases) which allows for multiple
duplicate entries to be added with different interfaces (IMO wrong, but
it's been in for a long time).

And here are some results from tests I've run in a VM:
 mr_table size (default, allocated for all namespaces):
  Before                    After
   49304 bytes               2400 bytes

 Add 65000 routes (the diff is much larger on smaller devices):
  Before                    After
   1m42s                     58s

 Forwarding 256 byte packets with 65000 routes (test done in a VM):
  Before                    After
   3 Mbps / ~1465 pps        122 Mbps / ~59000 pps

As a bonus we no longer see the soft lockups on smaller devices which
showed up even with 2000 entries before.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 16:48:26 -05:00
Sriharsha Basavapatna ce1ca7d2d1 svcrdma: avoid duplicate dma unmapping during error recovery
In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
problems when the iova being unmapped is subsequently reused. Remove
the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
handle it.

Fixes: 412a15c0fe ("svcrdma: Port to new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12 16:14:47 -05:00
Eric Dumazet c1ce1560a1 secure_seq: fix sparse errors
Fixes following warnings :

net/core/secure_seq.c:125:28: warning: incorrect type in argument 1
(different base types)
net/core/secure_seq.c:125:28:    expected unsigned int const [unsigned]
[usertype] a
net/core/secure_seq.c:125:28:    got restricted __be32 [usertype] saddr
net/core/secure_seq.c:125:35: warning: incorrect type in argument 2
(different base types)
net/core/secure_seq.c:125:35:    expected unsigned int const [unsigned]
[usertype] b
net/core/secure_seq.c:125:35:    got restricted __be32 [usertype] daddr
net/core/secure_seq.c:125:43: warning: cast from restricted __be16
net/core/secure_seq.c:125:61: warning: restricted __be16 degrades to
integer

Fixes: 7cd23e5300 ("secure_seq: use SipHash in place of MD5")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 15:57:10 -05:00
Scott Mayhew 546125d161 sunrpc: don't call sleeping functions from the notifier block callbacks
The inet6addr_chain is an atomic notifier chain, so we can't call
anything that might sleep (like lock_sock)... instead of closing the
socket from svc_age_temp_xprts_now (which is called by the notifier
function), just have the rpc service threads do it instead.

Cc: stable@vger.kernel.org
Fixes: c3d4879e01 "sunrpc: Add a function to close..."
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12 15:56:40 -05:00
J. Bruce Fields 78794d1890 svcrpc: don't leak contexts on PROC_DESTROY
Context expiry times are in units of seconds since boot, not unix time.

The use of get_seconds() here therefore sets the expiry time decades in
the future.  This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests.  We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.

Cc: stable@vger.kernel.org
Fixes: c5b29f885a "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-01-12 15:56:14 -05:00
David Ahern 8a430ed50b net: ipv4: fix table id in getroute response
rtm_table is an 8-bit field while table ids are allowed up to u32. Commit
709772e6e0 ("net: Fix routing tables with id > 255 for legacy software")
added the preference to set rtm_table in dumps to RT_TABLE_COMPAT if the
table id is > 255. The table id returned on get route requests should do
the same.

Fixes: c36ba6603a ("net: Allow user to get table id from route lookup")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 15:18:20 -05:00
David Ahern ea7a80858f net: lwtunnel: Handle lwtunnel_fill_encap failure
Handle failure in lwtunnel_fill_encap adding attributes to skb.

Fixes: 571e722676 ("ipv4: support for fib route lwtunnel encap attributes")
Fixes: 19e42e4515 ("ipv6: support for fib route lwtunnel encap attributes")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 15:11:43 -05:00
Wei Yongjun 79471b10d6 lwt_bpf: bpf_lwt_prog_cmp() can be static
Fixes the following sparse warning:

net/core/lwt_bpf.c:355:5: warning:
 symbol 'bpf_lwt_prog_cmp' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 10:04:40 -05:00
Daniel Borkmann 62c7989b24 bpf: allow b/h/w/dw access for bpf's cb in ctx
When structs are used to store temporary state in cb[] buffer that is
used with programs and among tail calls, then the generated code will
not always access the buffer in bpf_w chunks. We can ease programming
of it and let this act more natural by allowing for aligned b/h/w/dw
sized access for cb[] ctx member. Various test cases are attached as
well for the selftest suite. Potentially, this can also be reused for
other program types to pass data around.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 10:00:31 -05:00
Daniel Borkmann 6b8cc1d11e bpf: pass original insn directly to convert_ctx_access
Currently, when calling convert_ctx_access() callback for the various
program types, we pass in insn->dst_reg, insn->src_reg, insn->off from
the original instruction. This information is needed to rewrite the
instruction that is based on the user ctx structure into a kernel
representation for the ctx. As we'd like to allow access size beyond
just BPF_W, we'd need also insn->code for that in order to decode the
original access size. Given that, lets just pass insn directly to the
convert_ctx_access() callback and work on that to not clutter the
callback with even more arguments we need to pass when everything is
already contained in insn. So lets go through that once, no functional
change.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 10:00:31 -05:00
Ursula Braun 143c017108 smc: ETH_ALEN as memcpy length for mac addresses
When creating an SMC connection, there is a CLC (connection layer control)
handshake to prepare for RDMA traffic. The corresponding code is part of
commit 0cfdd8f92c ("smc: connection and link group creation").
Mac addresses to be exchanged in the handshake are copied with a wrong
length of 12 instead of 6 bytes. Following code overwrites the wrongly
copied code, but nevertheless the correct length should already be used for
the preceding mac address copying. Use ETH_ALEN for the memcpy length with
mac addresses.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: 0cfdd8f92c ("smc: connection and link group creation")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:47:01 -05:00
Ursula Braun 526735ddc0 net: fix AF_SMC related typo
When introducing the new socket family AF_SMC in
commit ac7138746e ("smc: establish new socket family"),
a typo in af_family_clock_key_strings has slipped in.
This patch repairs it.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Fixes: ac7138746e ("smc: establish new socket family")
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:47:01 -05:00
Florian Fainelli 738b35ccee net: core: Make netif_wake_subqueue a wrapper
netif_wake_subqueue() is duplicating the same thing that netif_tx_wake_queue()
does, so make it call it directly after looking up the queue from the index.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-12 09:18:05 -05:00
Johannes Berg 2cc6f5a715 mac80211: set wifi_acked[_valid] bits for transmitted SKBs
There may be situations in which the in-kernel originator of an
SKB cares about its wifi transmission status. To have that, set
the wifi_acked[_valid] bits before freeing/orphaning the SKB if
the destructor is set. The originator can then use it in there.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-12 10:15:19 +01:00
David Spinadel cef0acd4d7 mac80211: Add RX flag to indicate ICV stripped
Add a flag that indicates that the WEP ICV was stripped from an
RX packet, allowing the device to not transfer that if it's
already checked.

Signed-off-by: David Spinadel <david.spinadel@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-12 10:15:18 +01:00
David S. Miller 02ac5d1487 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Two AF_* families adding entries to the lockdep tables
at the same time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 14:43:39 -05:00
Linus Torvalds ba836a6f5a Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "27 fixes.

  There are three patches that aren't actually fixes. They're simple
  function renamings which are nice-to-have in mainline as ongoing net
  development depends on them."

* akpm: (27 commits)
  timerfd: export defines to userspace
  mm/hugetlb.c: fix reservation race when freeing surplus pages
  mm/slab.c: fix SLAB freelist randomization duplicate entries
  zram: support BDI_CAP_STABLE_WRITES
  zram: revalidate disk under init_lock
  mm: support anonymous stable page
  mm: add documentation for page fragment APIs
  mm: rename __page_frag functions to __page_frag_cache, drop order from drain
  mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
  mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
  mm: don't dereference struct page fields of invalid pages
  mailmap: add codeaurora.org names for nameless email commits
  signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
  mm: pmd dirty emulation in page fault handler
  ipc/sem.c: fix incorrect sem_lock pairing
  lib/Kconfig.debug: fix frv build failure
  mm: get rid of __GFP_OTHER_NODE
  mm: fix remote numa hits statistics
  mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
  ocfs2: fix crash caused by stale lvb with fsdlm plugin
  ...
2017-01-11 11:15:15 -08:00
Simon Horman 99d31326cb net/sched: cls_flower: Support matching on ARP
Support matching on ARP operation, and hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

Example usage:

tc qdisc add dev eth0 ingress

tc filter add dev eth0 protocol arp parent ffff: flower indev eth0 \
	arp_op request arp_sip 10.0.0.1 action drop
tc filter add dev eth0 protocol rarp parent ffff: flower indev eth0 \
	arp_op reply arp_tha 52:54:3f:00:00:00/24 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 11:02:47 -05:00
Simon Horman 55733350e5 flow disector: ARP support
Allow dissection of (R)ARP operation hardware and protocol addresses
for Ethernet hardware and IPv4 protocol addresses.

There are currently no users of FLOW_DISSECTOR_KEY_ARP.
A follow-up patch will allow FLOW_DISSECTOR_KEY_ARP to be used by the
flower classifier.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 11:02:47 -05:00
Johannes Berg d2941df8fb mac80211: recalculate min channel width on VHT opmode changes
When an associated station changes its VHT operating mode this
can/will affect the bandwidth it's using, and consequently we
must recalculate the minimum bandwidth we need to use. Failure
to do so can lead to one of two scenarios:
 1) we use a too high bandwidth, this is benign
 2) we use a too narrow bandwidth, causing rate control and
    actual PHY configuration to be out of sync, which can in
    turn cause problems/crashes

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:35:05 +01:00
Johannes Berg 96aa2e7cf1 mac80211: calculate min channel width correctly
In the current minimum chandef code there's an issue in that the
recalculation can happen after rate control is initialized for a
station that has a wider bandwidth than the current chanctx, and
then rate control can immediately start using those higher rates
which could cause problems.

Observe that first of all that this problem is because we don't
take non-associated and non-uploaded stations into account. The
restriction to non-associated is quite pointless and is one of
the causes for the problem described above, since the rate init
will happen before the station is set to associated; no frames
could actually be sent until associated, but the rate table can
already contain higher rates and that might cause problems.

Also, rejecting non-uploaded stations is wrong, since the rate
control can select higher rates for those as well.

Secondly, it's then necessary to recalculate the minimal config
before initializing rate control, so that when rate control is
initialized, the higher rates are already available. This can be
done easily by adding the necessary function call in rate init.

Change-Id: Ib9bc02d34797078db55459d196993f39dcd43070
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:34:51 +01:00
Beni Lev 06f7c88c10 cfg80211: consider VHT opmode on station update
Currently, this attribute is only fetched on station addition, but
not on station change. Since this info is only present in the assoc
request, with full station state support in the driver it cannot be
present when the station is added.

Thus, add support for changing the VHT opmode on station update if
done before (or while) the station is marked as associated. After
this, ignore it, since it used to be ignored.

Signed-off-by: Beni Lev <beni.lev@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:34:25 +01:00
Emmanuel Grumbach d7f842442f mac80211: fix the TID on NDPs sent as EOSP carrier
In the commit below, I forgot to translate the mac80211's
AC to QoS IE order. Moreover, the condition in the if was
wrong. Fix both issues.
This bug would hit only with clients that didn't set all
the ACs as delivery enabled.

Fixes: f438ceb81d ("mac80211: uapsd_queues is in QoS IE order")
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:32:58 +01:00
Cedric Izoard c38c39bf7c mac80211: Fix headroom allocation when forwarding mesh pkt
This patch fix issue introduced by my previous commit that
tried to ensure enough headroom was present, and instead
broke it.

When forwarding mesh pkt, mac80211 may also add security header,
and it must therefore be taken into account in the needed headroom.

Fixes: d8da0b5d64 ("mac80211: Ensure enough headroom when forwarding mesh pkt")
Signed-off-by: Cedric Izoard <cedric.izoard@ceva-dsp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 16:07:29 +01:00
Colin Ian King eb004603c8 sctp: Fix spelling mistake: "Atempt" -> "Attempt"
Trivial fix to spelling mistake in WARN_ONCE message

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 10:01:01 -05:00
David Ahern 7a18c5b9fb net: ipv4: Fix multipath selection with vrf
fib_select_path does not call fib_select_multipath if oif is set in the
flow struct. For VRF use cases oif is always set, so multipath route
selection is bypassed. Use the FLOWI_FLAG_SKIP_NH_OIF to skip the oif
check similar to what is done in fib_table_lookup.

Add saddr and proto to the flow struct for the fib lookup done by the
VRF driver to better match hash computation for a flow.

Fixes: 613d09b30f ("net: Use VRF device index for lookups on TX")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:59:55 -05:00
Florian Fainelli 592050b254 Revert "net: dsa: Implement ndo_get_phys_port_id"
This reverts commit 3a543ef479 ("net: dsa:
Implement ndo_get_phys_port_id") since it misuses the purpose of
ndo_get_phys_port_id(). We have ndo_get_phys_port_name() to do the
correct thing for us now.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:55:54 -05:00
Florian Fainelli 44bb765cf0 net: dsa: Implement ndo_get_phys_port_name()
Return the physical port number of a DSA created network device using
ndo_get_phys_port_name().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:55:54 -05:00
Arnd Bergmann 73b3514735 cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
We now 'select SOCK_CGROUP_DATA' but Kconfig complains that this is
not right when CONFIG_NET is disabled and there is no socket interface:

warning: (CGROUP_BPF) selects SOCK_CGROUP_DATA which has unmet direct dependencies (NET)

I don't know what the correct solution for this is, but simply removing
the dependency on NET from SOCK_CGROUP_DATA by moving it out of the
'if NET' section avoids the warning and does not produce other build
errors.

Fixes: 483c4933ea ("cgroup: Fix CGROUP_BPF config")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:47:10 -05:00
Vivien Didelot 9f91484f6f net: dsa: make "label" property optional for dsa2
In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
phandles are respectively mandatory and exclusive to CPU port and DSA
link device tree nodes.

Simplify dsa2.c a bit by checking the presence of such phandle instead
of checking the redundant "label" property.

Then the Linux philosophy for Ethernet switch ports is to expose them to
userspace as standard NICs by default. Thus use the standard enumerated
"eth%d" device name if no "label" property is provided for a user port.
This allows to save DTS files from subjective net device names.

If one wants to rename an interface, udev rules can be used as usual.

Of course the current behavior is unchanged, and the optional "label"
property for user ports has precedence over the enumerated name.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 09:26:15 -05:00
Eric Dumazet 7cfd5fd5a9 gro: use min_t() in skb_gro_reset_offset()
On 32bit arches, (skb->end - skb->data) is not 'unsigned int',
so we shall use min_t() instead of min() to avoid a compiler error.

Fixes: 1272ce87fa ("gro: Enter slow-path if there is no tailroom")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-11 08:15:40 -05:00
Jorge Ramirez-Ortiz 7acec26cec cfg80211: wext does not need to set monitor channel in managed mode
There is not a valid reason to attempt setting the monitor channel
while in managed mode. Since this code path only deals with this mode,
remove the code block.

Johannes: I'll note that the comment indicated it was for backward
compatibility, but the code wasn't functional since switching the
monitor channel isn't supported (any more?) when in managed mode, as
that mode owns the channel configuration. Additionally, since monitor
can't be done on a managed mode interface, this would only have had
any effect to start with if a separate monitor interface is present,
in which case it's better to change the channel through that anyway,
if even possible.

Signed-off-by: Jorge Ramirez-Ortiz <jorge.ramirez-ortiz@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-11 14:10:44 +01:00
Alexander Duyck 8c2dd3e4a4 mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
Patch series "Page fragment updates", v4.

This patch series takes care of a few cleanups for the page fragments
API.

First we do some renames so that things are much more consistent.  First
we move the page_frag_ portion of the name to the front of the functions
names.  Secondly we split out the cache specific functions from the
other page fragment functions by adding the word "cache" to the name.

Finally I added a bit of documentation that will hopefully help to
explain some of this.  I plan to revisit this later as we get things
more ironed out in the near future with the changes planned for the DMA
setup to support eXpress Data Path.

This patch (of 3):

This patch renames the page frag functions to be more consistent with
other APIs.  Specifically we place the name page_frag first in the name
and then have either an alloc or free call name that we append as the
suffix.  This makes it a bit clearer in terms of naming.

In addition we drop the leading double underscores since we are
technically no longer a backing interface and instead the front end that
is called from the networking APIs.

Link: http://lkml.kernel.org/r/20170104023854.13451.67390.stgit@localhost.localdomain
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Herbert Xu 57ea52a865 gro: Disable frag0 optimization on IPv6 ext headers
The GRO fast path caches the frag0 address.  This address becomes
invalid if frag0 is modified by pskb_may_pull or its variants.
So whenever that happens we must disable the frag0 optimization.

This is usually done through the combination of gro_header_hard
and gro_header_slow, however, the IPv6 extension header path did
the pulling directly and would continue to use the GRO fast path
incorrectly.

This patch fixes it by disabling the fast path when we enter the
IPv6 extension header path.

Fixes: 78a478d0ef ("gro: Inline skb_gro_header and cache frag0 virtual address")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:30:33 -05:00
Herbert Xu 1272ce87fa gro: Enter slow-path if there is no tailroom
The GRO path has a fast-path where we avoid calling pskb_may_pull
and pskb_expand by directly accessing frag0.  However, this should
only be done if we have enough tailroom in the skb as otherwise
we'll have to expand it later anyway.

This patch adds the check by capping frag0_len with the skb tailroom.

Fixes: cb18978cbf ("gro: Open-code final pskb_may_pull")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:26:12 -05:00
Julian Wiedmann dc5367bcc5 net/af_iucv: don't use paged skbs for TX on HiperSockets
With commit e53743994e
("af_iucv: use paged SKBs for big outbound messages"),
we transmit paged skbs for both of AF_IUCV's transport modes
(IUCV or HiperSockets).
The qeth driver for Layer 3 HiperSockets currently doesn't
support NETIF_F_SG, so these skbs would just be linearized again
by the stack.
Avoid that overhead by using paged skbs only for IUCV transport.

cc stable, since this also circumvents a significant skb leak when
sending large messages (where the skb then needs to be linearized).

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # v4.8+
Fixes: e53743994e ("af_iucv: use paged SKBs for big outbound messages")
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:08:29 -05:00
Sowmini Varadhan a505e58252 packet: pdiag_put_ring() should return TX_RING info for TPACKET_V3
Commit 7f953ab2ba ("af_packet: TX_RING support for TPACKET_V3")
now makes it possible to use TX_RING with TPACKET_V3, so make the
the relevant information available via 'ss -e -a --packet'

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 21:02:42 -05:00
Anna, Suman 5d722b3024 net: add the AF_QIPCRTR entries to family name tables
Commit bdabad3e36 ("net: Add Qualcomm IPC router") introduced a
new address family. Update the family name tables accordingly so
that the lockdep initialization can use the proper names for this
family.

Cc: Courtney Cavin <courtney.cavin@sonymobile.com>
Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Suman Anna <s-anna@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:50:59 -05:00
Stephen Boyd 3512a1ad56 net: qrtr: Mark 'buf' as little endian
Failure to mark this pointer as __le32 causes checkers like
sparse to complain:

net/qrtr/qrtr.c:274:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:274:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:274:16:    got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:275:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:275:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:275:16:    got restricted __le32 [usertype] <noident>
net/qrtr/qrtr.c:276:16: warning: incorrect type in assignment (different base types)
net/qrtr/qrtr.c:276:16:    expected unsigned int [unsigned] [usertype] <noident>
net/qrtr/qrtr.c:276:16:    got restricted __le32 [usertype] <noident>

Silence it.

Cc: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:45:04 -05:00
Florian Fainelli faf3a932fb net: dsa: Ensure validity of dst->ds[0]
It is perfectly possible to have non zero indexed switches being present
in a DSA switch tree, in such a case, we will be deferencing a NULL
pointer while dsa_cpu_port_ethtool_{setup,restore}. Be more defensive
and ensure that dst->ds[0] is valid before doing anything with it.

Fixes: 0c73c523cf ("net: dsa: Initialize CPU port ethtool ops per tree")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 20:18:52 -05:00
Eric Dumazet d9584d8ccc net: skb_flow_get_be16() can be static
Removes following sparse complain :

net/core/flow_dissector.c:70:8: warning: symbol 'skb_flow_get_be16'
was not declared. Should it be static?

Fixes: 972d3876fa ("flow dissector: ICMP support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 13:30:13 -05:00
Tobias Klauser dc647ec88e net: socket: Make unnecessarily global sockfs_setattr() static
Make sockfs_setattr() static as it is not used outside of net/socket.c

This fixes the following GCC warning:
net/socket.c:534:5: warning: no previous prototype for ‘sockfs_setattr’ [-Wmissing-prototypes]

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-10 11:29:50 -05:00
Florian Westphal 75cda62d9c xfrm: state: simplify rcu_read_unlock handling in two spots
Instead of:
  if (foo) {
      unlock();
      return bar();
   }
   unlock();
do:
   unlock();
   if (foo)
       return bar();

This is ok because rcu protected structure is only dereferenced before
the conditional.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:14 +01:00
Florian Westphal 711059b975 xfrm: add and use xfrm_state_afinfo_get_rcu
xfrm_init_tempstate is always called from within rcu read side section.
We can thus use a simpler function that doesn't call rcu_read_lock
again.

While at it, also make xfrm_init_tempstate return value void, the
return value was never tested.

A followup patch will replace remaining callers of xfrm_state_get_afinfo
with xfrm_state_afinfo_get_rcu variant and then remove the 'old'
get_afinfo interface.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal af5d27c4e1 xfrm: remove xfrm_state_put_afinfo
commit 44abdc3047
("xfrm: replace rwlock on xfrm_state_afinfo with rcu") made
xfrm_state_put_afinfo equivalent to rcu_read_unlock.

Use spatch to replace it with direct calls to rcu_read_unlock:

@@
struct xfrm_state_afinfo *a;
@@

-  xfrm_state_put_afinfo(a);
+  rcu_read_unlock();

old:
 text    data     bss     dec     hex filename
22570      72     424   23066    5a1a xfrm_state.o
 1612       0       0    1612     64c xfrm_output.o
new:
22554      72     424   23050    5a0a xfrm_state.o
 1596       0       0    1596     63c xfrm_output.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal 423826a7b1 xfrm: avoid rcu sparse warning
xfrm/xfrm_state.c:1973:21: error: incompatible types in comparison expression (different address spaces)

Harmless, but lets fix it to reduce the noise.

While at it, get rid of unneeded NULL check, its never hit:

net/ipv4/xfrm4_state.c: xfrm_state_register_afinfo(&xfrm4_state_afinfo);
net/ipv6/xfrm6_state.c: return xfrm_state_register_afinfo(&xfrm6_state_afinfo);
net/ipv6/xfrm6_state.c: xfrm_state_unregister_afinfo(&xfrm6_state_afinfo);

... are the only callsites.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:13 +01:00
Florian Westphal 961a9c0a4a xfrm: remove unused function
Has been ifdef'd out for more than 10 years, remove it.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-10 10:57:12 +01:00
Vivien Didelot 3a89eaa65d net: dsa: select NET_SWITCHDEV
The support for DSA Ethernet switch chips depends on TCP/IP networking,
thus explicit that HAVE_NET_DSA depends on INET.

DSA uses SWITCHDEV, thus select it instead of depending on it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 17:17:30 -05:00
Eric Dumazet b369e7fd41 tcp: make TCP_INFO more consistent
tcp_get_info() has to lock the socket, so lets lock it
for an extended critical section, so that various fields
have consistent values.

This solves an annoying issue that some applications
reported when multiple counters are updated during one
particular rx/rx event, and TCP_INFO was called from
another cpu.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 17:07:54 -05:00
Alexei Starovoitov 39f19ebbf5 bpf: rename ARG_PTR_TO_STACK
since ARG_PTR_TO_STACK is no longer just pointer to stack
rename it to ARG_PTR_TO_MEM and adjust comment.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:56:27 -05:00
Eric Dumazet 6bb629db5e tcp: do not export tcp_peer_is_proven()
After commit 1fb6f159fd ("tcp: add tcp_conn_request"),
tcp_peer_is_proven() no longer needs to be exported.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:39 -05:00
Pavel Tikhomirov b007f09072 ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"

v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:38 -05:00
Alexander Alemayhu 67c408cfa8 ipv6: fix typos
o s/approriate/appropriate
o s/discouvery/discovery

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:34:15 -05:00
Ursula Braun f16a7dd5cf smc: netlink interface for SMC sockets
Support for SMC socket monitoring via netlink sockets of protocol
NETLINK_SOCK_DIAG.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:41 -05:00
Ursula Braun b38d732477 smc: socket closing and linkgroup cleanup
smc_shutdown() and smc_release() handling
delayed linkgroup cleanup for linkgroups without connections

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun 952310ccf2 smc: receive data from RMBE
move RMBE data into user space buffer and update managing cursors

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun e6727f3900 smc: send data (through RDMA)
copy data to kernel send buffer, and trigger RDMA write

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun 5f08318f61 smc: connection data control (CDC)
send and receive CDC messages (via IB message send and CQE)

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun 9bf9abead2 smc: link layer control (LLC)
send and receive LLC messages CONFIRM_LINK (via IB message send and CQE)

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:40 -05:00
Ursula Braun bd4ad57718 smc: initialize IB transport incl. PD, MR, QP, CQ, event, WR
Prepare the link for RDMA transport:
Create a queue pair (QP) and move it into the state Ready-To-Receive (RTR).

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun f38ba179c6 smc: work request (WR) base for use by LLC and CDC
The base containers for RDMA transport are work requests and completion
queue entries processed through Infiniband verbs:
* allocate and initialize these areas
* map these areas to DMA
* implement the basic communication consisting of work request posting
  and receival of completion queue events

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun cd6851f303 smc: remote memory buffers (RMBs)
* allocate data RMB memory for sending and receiving
* size depends on the maximum socket send and receive buffers
* allocated RMBs are kept during life time of the owning link group
* map the allocated RMBs to DMA

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun 0cfdd8f92c smc: connection and link group creation
* create smc_connection for SMC-sockets
* determine suitable link group for a connection
* create a new link group if necessary

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Ursula Braun a046d57da1 smc: CLC handshake (incl. preparation steps)
* CLC (Connection Layer Control) handshake

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:39 -05:00
Thomas Richter 6812baabf2 smc: establish pnet table management
Connection creation with SMC-R starts through an internal
TCP-connection. The Ethernet interface for this TCP-connection is not
restricted to the Ethernet interface of a RoCE device. Any existing
Ethernet interface belonging to the same physical net can be used, as
long as there is a defined relation between the Ethernet interface and
some RoCE devices. This relation is defined with the help of an
identification string called "Physical Net ID" or short "pnet ID".
Information about defined pnet IDs and their related Ethernet
interfaces and RoCE devices is stored in the SMC-R pnet table.

A pnet table entry consists of the identifying pnet ID and the
associated network and IB device.
This patch adds pnet table configuration support using the
generic netlink message interface referring to network and IB device
by their names. Commands exist to add, delete, and display pnet table
entries, and to flush or display the entire pnet table.

There are cross-checks to verify whether the ethernet interfaces
or infiniband devices really exist in the system. If either device
is not available, the pnet ID entry is not created.
Loss of network devices and IB devices is also monitored;
a pnet ID entry is removed when an associated network or
IB device is removed.

Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:38 -05:00
Ursula Braun a4cf0443c4 smc: introduce SMC as an IB-client
* create a list of SMC IB-devices

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:38 -05:00
Ursula Braun ac7138746e smc: establish new socket family
* enable smc module loading and unloading
 * register new socket family
 * basic smc socket creation and deletion
 * use backing TCP socket to run CLC (Connection Layer Control)
   handshake of SMC protocol
 * Setup for infiniband traffic is implemented in follow-on patches.
   For now fallback to TCP socket is always used.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:38 -05:00
Ursula Braun 4b9d07a440 net: introduce keepalive function in struct proto
Direct call of tcp_set_keepalive() function from protocol-agnostic
sock_setsockopt() function in net/core/sock.c violates network
layering. And newly introduced protocol (SMC-R) will need its own
keepalive function. Therefore, add "keepalive" function pointer
to "struct proto", and call it from sock_setsockopt() via this pointer.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Reviewed-by: Utz Bacher <utz.bacher@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 16:07:37 -05:00
Jesper Dangaard Brouer 7ba91ecb16 net: for rate-limited ICMP replies save one atomic operation
It is possible to avoid the atomic operation in icmp{v6,}_xmit_lock,
by checking the sysctl_icmp_msgs_per_sec ratelimit before these calls,
as pointed out by Eric Dumazet, but the BH disabled state must be correct.

The icmp_global_allow() call states it must be called with BH
disabled.  This protection was given by the calls icmp_xmit_lock and
icmpv6_xmit_lock.  Thus, split out local_bh_disable/enable from these
functions and maintain it explicitly at callers.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:49:12 -05:00
Jesper Dangaard Brouer c0303efeab net: reduce cycles spend on ICMP replies that gets rate limited
This patch split the global and per (inet)peer ICMP-reply limiter
code, and moves the global limit check to earlier in the packet
processing path.  Thus, avoid spending cycles on ICMP replies that
gets limited/suppressed anyhow.

The global ICMP rate limiter icmp_global_allow() is a good solution,
it just happens too late in the process.  The kernel goes through the
full route lookup (return path) for the ICMP message, before taking
the rate limit decision of not sending the ICMP reply.

Details: The kernels global rate limiter for ICMP messages got added
in commit 4cdf507d54 ("icmp: add a global rate limitation").  It is
a token bucket limiter with a global lock.  It brilliantly avoids
locking congestion by only updating when 20ms (HZ/50) were elapsed. It
can then avoids taking lock when credit is exhausted (when under
pressure) and time constraint for refill is not yet meet.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:49:12 -05:00
Jesper Dangaard Brouer 8d9ba388f3 Revert "icmp: avoid allocating large struct on stack"
This reverts commit 9a99d4a50c ("icmp: avoid allocating large struct
on stack"), because struct icmp_bxm no really a large struct, and
allocating and free of this small 112 bytes hurts performance.

Fixes: 9a99d4a50c ("icmp: avoid allocating large struct on stack")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:49:12 -05:00
David S. Miller aaa9c1071d RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWHNwyPSw1s6N8H32AQKoqw//Wi8fpY/7SlQ8UT0RcF4KlBtfKux4dhMh
 c4P2ARqEi3hVHz0MAJSYwhJDiXmPT8FboXq7yQmXj7DpkwDUgEHJlOZyoZFrStWC
 hE72lbwD/m57jYgTG694wJZnGvTtqBEEkoMMIiUTSpEkSxB8aGsL+8dP9E6Q5hBS
 ixLUHINdjaubsu+uzlI3MZdDk7TWBwp5fNekf4Jbjlb9anoICEkJsjZJHTR9n3nM
 d9QpEbh42+YHAn2EFL8gXN+Cb7o75QppT3K+b68Pz43yvPgMLd78Q4tSN0aCo190
 9ynR1szpniiw3T/xW0dGanpRjKLs7HZubTujc1oQ+TD1Q1Uh+2/nZWb9PxWAAe3S
 CW+ssn6slv9IS+KXyoIMbDtyPaJOu1pMxYcFVXlZOAPXnYGl8P0A610f8u9833jT
 OEqVKQ/bHAPiiTl2X/ATzCePhATtoYUq7jIc71pP01WK+o054bzm0r9Wyjxgs7g6
 iPi4cfueZFOJMilkE9ZWuIws43YDv5wIEOWtpTkRCIHKCmkeVXkDfdRnnXhJCUeF
 6y3iW0staR/pnTqI6g8LEnGku2gbteBQNCueYoJA5jsxLyl6oJw1Bur7yGTzzPnJ
 SP+9+RBlyGI5EzIcqQWsReOhGY4U/hOWDtltYR/gmlhlQ2o/iO4U1aiN0qa1AiaH
 3ixixVygYOA=
 =H/FD
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20170109' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
afs: Refcount afs_call struct

These patches provide some tracepoints for AFS and fix a potential leak by
adding refcounting to the afs_call struct.

The patches are:

 (1) Add some tracepoints for logging incoming calls and monitoring
     notifications from AF_RXRPC and data reception.

 (2) Get rid of afs_wait_mode as it didn't turn out to be as useful as
     initially expected.  It can be brought back later if needed.  This
     clears some stuff out that I don't then need to fix up in (4).

 (3) Allow listen(..., 0) to be used to disable listening.  This makes
     shutting down the AFS cache manager server in the kernel much easier
     and the accounting simpler as we can then be sure that (a) all
     preallocated afs_call structs are relesed and (b) no new incoming
     calls are going to be started.

     For the moment, listening cannot be reenabled.

 (4) Add refcounting to the afs_call struct to fix a potential multiple
     release detected by static checking and add a tracepoint to follow the
     lifecycle of afs_call objects.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:47:52 -05:00
Florian Fainelli a82f67afe8 net: dsa: Make dsa_switch_ops const
Now that we have properly encapsulated and made drivers utilize exported
functions, we can switch dsa_switch_ops to be a annotated with const.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:44:50 -05:00
Florian Fainelli ab3d408d3f net: dsa: Encapsulate legacy switch drivers into dsa_switch_driver
In preparation for making struct dsa_switch_ops const, encapsulate it
within a dsa_switch_driver which has a list pointer and a pointer to
dsa_switch_ops. This allows us to take the list_head pointer out of
dsa_switch_ops, which is written to by {un,}register_switch_driver.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 15:44:50 -05:00
David S. Miller bb1d303444 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-09 15:39:11 -05:00
Davide Caratti c008b33f3e net/sched: act_csum: compute crc32c on SCTP packets
modify act_csum to compute crc32c on IPv4/IPv6 packets having SCTP in
their payload, and extend UAPI definitions accordingly.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 14:36:57 -05:00
Davide Caratti ab9d226e3e net/sched: Kconfig: select LIBCRC32C if NET_ACT_CSUM is selected
LIBCRC32C is needed to compute crc32c on SCTP packets.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 14:36:57 -05:00
Alexandru Moise 58fa118f3d cls_u32: don't bother explicitly initializing ->divisor to zero
This struct member is already initialized to zero upon root_ht's
allocation via kzalloc().

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 14:27:16 -05:00
Jason A. Donenfeld fe62d05b29 syncookies: use SipHash in place of SHA1
SHA1 is slower and less secure than SipHash, and so replacing syncookie
generation with SipHash makes natural sense. Some BSDs have been doing
this for several years in fact.

The speedup should be similar -- and even more impressive -- to the
speedup from the sequence number fix in this series.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 13:58:57 -05:00
Jason A. Donenfeld 7cd23e5300 secure_seq: use SipHash in place of MD5
This gives a clear speed and security improvement. Siphash is both
faster and is more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, for IPv6, we simply create
a layout by a simple anonymous struct, for which gcc generates
rather efficient code. For IPv4, we pass the values directly to the
short input convenience functions.

64-bit x86_64:
[    1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
[    1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
[    1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
[    1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526

32-bit x86:
[    1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
[    1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
[    1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
[    1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Laight <David.Laight@aculab.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 13:58:57 -05:00
David Ahern eafea7390e net: ipv4: remove disable of bottom half in inet_rtm_getroute
Nothing about the route lookup requires bottom half to be disabled.
Remove the local_bh_disable ... local_bh_enable around ip_route_input.
This appears to be a vestige of days gone by as it has been there
since the beginning of git time.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 13:54:44 -05:00
yuan linyu 1e9116327e net: change init_inodecache() return void
sock_init() call it but not check it's return value,
so change it to void return and add an internal BUG_ON() check.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-09 12:05:29 -05:00
Pau Espin Pedrol bf99b4ded5 tcp: fix mark propagation with fwmark_reflect enabled
Otherwise, RST packets generated by the TCP stack for non-existing
sockets always have mark 0.
The mark from the original packet is assigned to the netns_ipv4/6
socket used to send the response so that it can get copied into the
response skb when the socket sends it.

Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 18:01:03 +01:00
Pau Espin Pedrol cc31d43b41 netfilter: use fwmark_reflect in nf_send_reset
Otherwise, RST packets generated by ipt_REJECT always have mark 0 when
the routing is checked later in the same code path.

Fixes: e110861f86 ("net: add a sysctl to reflect the fwmark on replies")
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Pau Espin Pedrol <pau.espin@tessares.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 18:01:03 +01:00
Willem de Bruijn ec23189049 xtables: extend matches and targets with .usersize
In matches and targets that define a kernel-only tail to their
xt_match and xt_target data structs, add a field .usersize that
specifies up to where data is to be shared with userspace.

Performed a search for comment "Used internally by the kernel" to find
relevant matches and targets. Manually inspected the structs to derive
a valid offsetof.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 17:24:55 +01:00
Willem de Bruijn 4915f7bbc4 xtables: use match, target and data copy_to_user helpers in compat
Convert compat to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 17:24:55 +01:00
Willem de Bruijn b5040f6c33 ebtables: use match, target and data copy_to_user helpers
Convert ebtables to copying entries, matches and targets one by one.

The solution is analogous to that of generic xt_(match|target)_to_user
helpers, but is applied to different structs.

Convert existing helpers ebt_make_XXXname helpers that overwrite
fields of an already copy_to_user'd struct with ebt_XXX_to_user
helpers that copy all relevant fields of the struct from scratch.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 17:24:54 +01:00
Willem de Bruijn 244b531bee arptables: use match, target and data copy_to_user helpers
Convert arptables to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 17:24:54 +01:00
Willem de Bruijn e47ddb2c46 ip6tables: use match, target and data copy_to_user helpers
Convert ip6tables to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 17:24:54 +01:00
Willem de Bruijn f77bc5b23f iptables: use match, target and data copy_to_user helpers
Convert iptables to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 17:24:53 +01:00
Willem de Bruijn f32815d21d xtables: add xt_match, xt_target and data copy_to_user functions
xt_entry_target, xt_entry_match and their private data may contain
kernel data.

Introduce helper functions xt_match_to_user, xt_target_to_user and
xt_data_to_user that copy only the expected fields. These replace
existing logic that calls copy_to_user on entire structs, then
overwrites select fields.

Private data is defined in xt_match and xt_target. All matches and
targets that maintain kernel data store this at the tail of their
private structure. Extend xt_match and xt_target with .usersize to
limit how many bytes of data are copied. The remainder is cleared.

If compatsize is specified, usersize can only safely be used if all
fields up to usersize use platform-independent types. Otherwise, the
compat_to_user callback must be defined.

This patch does not yet enable the support logic.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-09 17:24:53 +01:00
Andrzej Zaborowski bd2522b168 cfg80211: NL80211_ATTR_SOCKET_OWNER support for CMD_CONNECT
Disconnect or deauthenticate when the owning socket is closed if this
flag is supplied to CMD_CONNECT or CMD_ASSOCIATE.  This may be used
to ensure userspace daemon doesn't leave an unmanaged connection behind.

In some situations it would be possible to account for that, to some
degree, in the deamon restart code or in the up/down scripts without
the use of this attribute.  But there will be systems where the daemon
can go away for varying periods without a warning due to local resource
management.

Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-09 13:08:47 +01:00
Johannes Berg 4ef8c1c93f cfg80211: size various nl80211 messages correctly
Ilan reported that sometimes nl80211 messages weren't working if
the frames being transported got very large, which was really a
problem for userspace-to-kernel messages, but prompted me to look
at the code.

Upon review, I found various places where variable-length data is
transported in an nl80211 message but the message isn't allocated
taking that into account. This shouldn't cause any problems since
the frames aren't really that long, apart in one place where two
(possibly very long frames) might not fit.

Fix all the places (that I found) that get variable length data
from the driver and put it into a message to take the length of
the variable data into account. The 100 there is just a safe
constant for the remaining message overhead (it's usually around
50 for most messages.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-09 13:08:21 +01:00
David Howells 210f035316 rxrpc: Allow listen(sock, 0) to be used to disable listening
Allow listen() with a backlog of 0 to be used to disable listening on an
AF_RXRPC socket.  This also releases any preallocation, thereby making it
easier for a kernel service to account for all allocated call structures
when shutting down the service.

The socket cannot thereafter have listening reenabled, but must rather be
closed and reopened.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-01-09 11:10:02 +00:00
Michał Kępień 9b8e34e211 rfkill: Add rfkill-any LED trigger
Add a new "global" (i.e. not per-rfkill device) LED trigger, rfkill-any,
which may be useful on laptops with a single "radio LED" and multiple
radio transmitters.  The trigger is meant to turn a LED on whenever
there is at least one radio transmitter active and turn it off
otherwise.

Signed-off-by: Michał Kępień <kernel@kempniu.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-09 11:40:33 +01:00
Johannes Berg eeb0d56fab mac80211: implement multicast forwarding on fast-RX path
In AP (or VLAN) mode, when unicast 802.11 packets are received,
they might actually be multicast after conversion. In this case
the fast-RX path didn't handle them properly to send them back
to the wireless medium. Implement that by copying the SKB and
sending it back out.

The possible alternative would be to just punt the packet back
to the regular (slow) RX path, but since we have almost all of
the required code here already it's not so complicated to add
here. Punting it back would also mean acquiring the spinlock,
which would be bad for the stated purpose of the fast-RX path,
to enable well-performing parallel RX.

Cc: stable@vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-09 11:12:04 +01:00
Willem de Bruijn bc31c905e9 net-tc: convert tc_from to tc_from_ingress and tc_redirected
The tc_from field fulfills two roles. It encodes whether a packet was
redirected by an act_mirred device and, if so, whether act_mirred was
called on ingress or egress. Split it into separate fields.

The information is needed by the special IFB loop, where packets are
taken out of the normal path by act_mirred, forwarded to IFB, then
reinjected at their original location (ingress or egress) by IFB.

The IFB device cannot use skb->tc_at_ingress, because that may have
been overwritten as the packet travels from act_mirred to ifb_xmit,
when it passes through tc_classify on the IFB egress path. Cache this
value in skb->tc_from_ingress.

That field is valid only if a packet arriving at ifb_xmit came from
act_mirred. Other packets can be crafted to reach ifb_xmit. These
must be dropped. Set tc_redirected on redirection and drop all packets
that do not have this bit set.

Both fields are set only on cloned skbs in tc actions, so original
packet sources do not have to clear the bit when reusing packets
(notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 20:58:52 -05:00
Willem de Bruijn 8dc07fdbf2 net-tc: convert tc_at to tc_at_ingress
Field tc_at is used only within tc actions to distinguish ingress from
egress processing. A single bit is sufficient for this purpose.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 20:58:52 -05:00
Willem de Bruijn a5135bcfba net-tc: convert tc_verd to integer bitfields
Extract the remaining two fields from tc_verd and remove the __u16
completely. TC_AT and TC_FROM are converted to equivalent two-bit
integer fields tc_at and tc_from. Where possible, use existing
helper skb_at_tc_ingress when reading tc_at. Introduce helper
skb_reset_tc to clear fields.

Not documenting tc_from and tc_at, because they will be replaced
with single bit fields in follow-on patches.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 20:58:52 -05:00
Willem de Bruijn e7246e122a net-tc: extract skip classify bit from tc_verd
Packets sent by the IFB device skip subsequent tc classification.
A single bit governs this state. Move it out of tc_verd in
anticipation of removing that __u16 completely.

The new bitfield tc_skip_classify temporarily uses one bit of a
hole, until tc_verd is removed completely in a follow-up patch.

Remove the bit hole comment. It could be 2, 3, 4 or 5 bits long.
With that many options, little value in documenting it.

Introduce a helper function to deduplicate the logic in the two
sites that check this bit.

The field tc_skip_classify is set only in IFB on skbs cloned in
act_mirred, so original packet sources do not have to clear the
bit when reusing packets (notably, pktgen and octeon).

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 20:58:52 -05:00
Willem de Bruijn d6264071ce net-tc: make MAX_RECLASSIFY_LOOP local
This field is no longer kept in tc_verd. Remove it from the global
definition of that struct.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 20:58:52 -05:00
stephen hemminger bc1f44709c net: make ndo_get_stats64 a void function
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.

Fix all drivers with ndo_get_stats64 to have a void function.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 17:51:44 -05:00
David Ahern dc33da59ff net: ipv4: Remove flow arg from ip_mkroute_input
fl4 arg is not used; remove it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 17:14:35 -05:00
David Ahern 9f09eaeae2 net: ipmr: Remove nowait arg to ipmr_get_route
ipmr_get_route has 1 caller and the nowait arg is 0. Remove the arg and
simplify ipmr_get_route accordingly.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-08 17:14:35 -05:00
Vivien Didelot 111427f6eb net: dsa: move HWMON support to its own file
Isolate the HWMON support in DSA in its own file. Currently only the
legacy DSA code is concerned.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-07 22:37:22 -05:00
Paul Moore bcd5e1a49f netlabel: add CALIPSO to the list of built-in protocols
When we added CALIPSO support in Linux v4.8 we forgot to add it to the
list of supported protocols with display at boot.

Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 22:20:45 -05:00
Guillaume Nault c5fdae0440 l2tp: rework socket comparison in __l2tp_ip*_bind_lookup()
Split conditions, so that each test becomes clearer.

Also, for l2tp_ip, check if "laddr" is 0. This prevents a socket from
binding to the unspecified address when other sockets are already bound
using the same device (if any), connection ID and namespace.

Same thing for l2tp_ip6: add ipv6_addr_any(laddr) and
ipv6_addr_any(raddr) tests to ensure that an IPv6 unspecified address
passed as parameter is properly treated a wildcard.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 22:18:56 -05:00
Guillaume Nault 986f7cbc51 l2tp: remove useless NULL check in __l2tp_ip*_bind_lookup()
If "l2tp" was NULL, that'd mean "sk" is NULL too. This can't happen
since "sk" is returned by sk_for_each_bound().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 22:18:56 -05:00
Guillaume Nault bb39b0bdc8 l2tp: make __l2tp_ip*_bind_lookup() parameters 'const'
Add const qualifier wherever possible for __l2tp_ip_bind_lookup() and
__l2tp_ip6_bind_lookup().

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 22:18:56 -05:00
Guillaume Nault 8cf2f70453 l2tp: remove redundant addr_len check in l2tp_ip_bind()
addr_len's value has already been verified at this point.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 22:18:56 -05:00
santosh.shilimkar@oracle.com 780e982905 RDS: validate the requested traces user input against max supported
Larger than supported value can lead to array read/write overflow.

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 22:14:26 -05:00
Xin Long a83863174a sctp: prepare asoc stream for stream reconf
sctp stream reconf, described in RFC 6525, needs a structure to
save per stream information in assoc, like stream state.

In the future, sctp stream scheduler also needs it to save some
stream scheduler params and queues.

This patchset is to prepare the stream array in assoc for stream
reconf. It defines sctp_stream that includes stream arrays inside
to replace ssnmap.

Note that we use different structures for IN and OUT streams, as
the members in per OUT stream will get more and more different
from per IN stream.

v1->v2:
  - put these patches into a smaller group.
v2->v3:
  - define sctp_stream to contain stream arrays, and create stream.c
    to put stream-related functions.
  - merge 3 patches into 1, as new sctp_stream has the same name
    with before.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 21:07:26 -05:00
Eric Garver df560056d9 udp: inuse checks can quit early for reuseport
UDP lib inuse checks will walk the entire hash bucket to check if the
portaddr is in use. In the case of reuseport we can stop searching when
we find a matching reuseport.

On a 16-core VM a test program that spawns 16 threads that each bind to
1024 sockets (one per 10ms) takes 1m45s. With this change it takes 11s.

Also add a cond_resched() when the port is not specified.

Signed-off-by: Eric Garver <e@erig.me>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 20:56:48 -05:00
David S. Miller 219a808fa1 Another single fix, to correctly handle destruction of a
single netlink socket having ownership of multiple objects
 (scheduled scan requests and interfaces.)
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYb47lAAoJEGt7eEactAAdGg0QAJBevWlObLIDrFJIzoq7caPc
 UV9/QqNCi8RzU0FrJBMn8xOWvTTEiL5WNLm2SstKg4SQNYCfn0DNTw8GTfeQiAtr
 ppbkDm+sEyANVOEneyMggEJlD1C0h3t8M7z6qYWJRuHndBrLzS/Safv44YNhQOeg
 dkOwwlAgc5VHxOB2pFfAWS6N+F0l+uqLE30NomOVcNZv19PRZgPq5VA9IqrXbNMl
 r9/cDYp9d3tSPB/LGd6rQLT6t4bTDKiYKtdxEkrBf8ceFLMhCc/IAzv+o92cSx6H
 r5ZrJ08rUgcJfgtZi88EK7phF+tQjR94pBnvKfyH9afsyLYzTNrdBrUuugTkKnVi
 J4oN3Sc/rhw3yagA1uXDzNXj6RtMKw+c8UUYuGNt+1nv9Hx6ZeDoVpT+Ms21JNTz
 ejSc10I+E5GjjBOSj+w8r0ofgDlK1B0+S3cs5M4mwS8d0W0ibDcanaru5DIwfnd2
 w1UJiTLQHnFG4Gp9LdoNlF1lXPfaQlXbXeXc0OlmQhnBL82lqbR14ast1dRk1HXi
 LpEqJFgToUmiVemjYsepI9//gK4vz3BCokOnjtnMtxVEUyv1Ud2eDPHx0DrKu+Oq
 JAJY4q6g8R2k8vRL+eupFIEeZTyz++7Yh2YsWv26EgB2mNvE1LCVBPeUg30wrX6+
 BcyYx4lgXlkeHH0kxnVA
 =dUOq
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2017-01-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Another single fix, to correctly handle destruction of a
single netlink socket having ownership of multiple objects
(scheduled scan requests and interfaces.)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 16:26:19 -05:00
David Forster 93e246f783 vti6: fix device register to report IFLA_INFO_KIND
vti6 interface is registered before the rtnl_link_ops block
is attached. As a result the resulting RTM_NEWLINK is missing
IFLA_INFO_KIND. Re-order attachment of rtnl_link_ops block to fix.

Signed-off-by: Dave Forster <dforster@brocade.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 16:09:09 -05:00
David Ahern c7b371e34c net: ipv4: make fib_select_default static
fib_select_default has a single caller within the same file.
Make it static.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 15:57:50 -05:00
David Ahern 0c8d803f39 net: ipv4: Simplify rt_fill_info
rt_fill_info has only 1 caller and both of the last 2 args -- nowait
and flags -- are hardcoded to 0. Given that remove them as input arguments
and simplify rt_fill_info accordingly.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 15:57:01 -05:00
Vivien Didelot 7558828ade net: dsa: remove version string
The dsa_driver_version string is irrelevant and has not been bumped
since its introduction about 9 years ago. Kill it.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 15:49:10 -05:00
Mahesh Bandewar f784ad3d79 ipv6: do not send RTM_DELADDR for tentative addresses
RTM_NEWADDR notification is sent when IFA_F_TENTATIVE is cleared from
the address. So if the address is added and deleted before DAD probes
completes, the RTM_DELADDR will be sent for which there was no
RTM_NEWADDR causing asymmetry in notification. However if the same
logic is used while sending RTM_DELADDR notification, this asymmetry
can be avoided.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
CC: Patrick McHardy <kaber@trash.net>
CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-06 15:39:31 -05:00
Rafał Miłecki e691ac2f75 cfg80211: support ieee80211-freq-limit DT property
This patch adds a helper for reading that new property and applying
limitations of supported channels specified this way.
It is used with devices that normally support a wide wireless band but
in a given config are limited to some part of it (usually due to board
design). For example a dual-band chipset may be able to support one band
only because of used antennas.
It's also common that tri-band routers have separated radios for lower
and higher part of 5 GHz band and it may be impossible to say which is
which without a DT info.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
[add new function to documentation, fix link]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-06 14:01:13 +01:00
Rafał Miłecki 4787cfa084 cfg80211: move function checking range fit to util.c
It is needed for another cfg80211 helper that will be out of reg.c so
move it to common util.c file and make it non-static.

Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-06 13:54:04 +01:00
Florian Westphal b3b73b8e6d xfrm: state: do not acquire lock in get_mtu helpers
Once flow cache gets removed the mtu initialisation happens for every skb
that gets an xfrm attached, so this lock starts to show up in perf.

It is not obvious why this lock is required -- the caller holds
reference on the state struct, type->destructor is only called from the
state gc worker (all state structs on gc list must have refcount 0).

xfrm_init_state already has been called (else private data accessed
by type->get_mtu() would not be set up).

So just remove the lock -- the race on the state (DEAD?) doesn't
matter (could change right after dropping the lock too).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-06 08:44:56 +01:00
Soheil Hassas Yeganeh ad02c4f547 tcp: provide timestamps for partial writes
For TCP sockets, TX timestamps are only captured when the user data
is successfully and fully written to the socket. In many cases,
however, TCP writes can be partial for which no timestamp is
collected.

Collect timestamps whenever any user data is (fully or partially)
copied into the socket. Pass tcp_write_queue_tail to tcp_tx_timestamp
instead of the local skb pointer since it can be set to NULL on
the error path.

Note that tcp_write_queue_tail can be NULL, even if bytes have been
copied to the socket. This is because acknowledgements are being
processed in tcp_sendmsg(), and by the time tcp_tx_timestamp is
called tcp_write_queue_tail can be NULL. For such cases, this patch
does not collect any timestamps (i.e., it is best-effort).

This patch is written with suggestions from Willem de Bruijn and
Eric Dumazet.

Change-log V1 -> V2:
	- Use sockc.tsflags instead of sk->sk_tsflags.
	- Use the same code path for normal writes and errors.

Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-05 14:56:16 -05:00
David S. Miller 37e65dc184 RxRPC rewrite
-----BEGIN PGP SIGNATURE-----
 
 iQIVAwUAWG5WZ/Sw1s6N8H32AQJwQxAAoN5jfbwbpEX9wxfdzCQPaM3c08ZoY8V9
 H7/+Anjv0R3nwy4OaESAAnx6i/gAc5lN7lwQMINT2hpmVSeZ/isuTVNYZorgASrt
 Y2+rsgbg1tP7rIuwssM4hqCP7oxGDdGUoHR7PrmmC2nCFCptDmSfoaXFzojqxAf8
 8KI0gBidIrZqsI8IYJIVAZXYdkAlIVK+M+IzVrVAFMpV9ZQaRssr2VJbqnzz/rRu
 ycOB9/YYZySsFH4VQccROdS+GLsftl2C8Xm4R+wPyAlTLaiVhyUK5NZ60GTdHhSw
 o4yJI2jiQIX09uuiVGHICjuTH5GfdiLzmSwwgC6bqb2pyKDtXGDVdVr0ewlbHYnY
 +oDaDgg37v+FNAVjjsKytISXaxdpHKz3lqzMj707Roa19LeRM0q7sD9HuHUm86oi
 HXru5DYNIGE6VzrCViUFp35cBe8b3GxDntx0702mRCpnDA/fUSnHl0fMhx71BNDV
 3uocBbY3jXKhNPs+eVZdV3E6xOT/FjEJMRJKk+2Z9F+B5XH5RQ7Z1KluphhFHkce
 TxSD1RxHIJL64BWzBbxEmCrcf6kR5NBB+xmoHi4TCoeR8IqWBhQ7DLKhYDPK2289
 /Uq1c4OCQmWocHJTzNn47Od2MCU6BtbCI9JttVDb03u7sbYDZ+b/Me/FE22vRSZB
 miuy4PJTA6s=
 =C+Ip
 -----END PGP SIGNATURE-----

Merge tag 'rxrpc-rewrite-20170105' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

David Howells says:

====================
rxrpc: Update tracing and proc interfaces

This set of patches fixes and extends tracing:

 (1) Fix the handling of enum-to-string translations so that external
     tracing tools can make use of it by using TRACE_DEFINE_ENUM.

 (2) Extend a couple of tracepoints to export some extra available
     information and add three new tracepoints to allow monitoring of
     received DATA packets, call disconnection and improper/implicit call
     termination.

and adds a bit more procfs-exported information:

 (3) Show a call's hard-ACK cursors in /proc/net/rxrpc_calls.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-05 12:43:28 -05:00
David S. Miller d896b3120b Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains accumulated Netfilter fixes for your
net tree:

1) Ensure quota dump and reset happens iff we can deliver numbers to
   userspace.

2) Silence splat on incorrect use of smp_processor_id() from nft_queue.

3) Fix an out-of-bound access reported by KASAN in
   nf_tables_rule_destroy(), patch from Florian Westphal.

4) Fix layer 4 checksum mangling in the nf_tables payload expression
   with IPv6.

5) Fix a race in the CLUSTERIP target from control plane path when two
   threads run to add a new configuration object. Serialize invocations
   of clusterip_config_init() using spin_lock. From Xin Long.

6) Call br_nf_pre_routing_finish_bridge_finish() once we are done with
   the br_nf_pre_routing_finish() hook. From Artur Molchanov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-05 11:49:57 -05:00
Volodymyr Bendiuga 5e6eb45698 net:dsa: check for EPROBE_DEFER from dsa_dst_parse()
Since there can be multiple dsa switches stacked together but
not all of devicetree nodes available at the time of calling
dsa_dst_parse(), EPROBE_DEFER can be returned by it. When this
happens, only the last dsa switch has to be deleted by
dsa_dst_del_ds(), but not the whole list, because next time linux
cames back to this function it will try to add only the last dsa
switch which returned EPROBE_DEFER.

Signed-off-by: Volodymyr Bendiuga <volodymyr.bendiuga@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-05 11:39:53 -05:00
David S. Miller 76eb75be79 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2017-01-05 11:03:07 -05:00
Geliang Tang 4cc4b72c13 netfilter: xt_connlimit: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-05 13:27:02 +01:00
Davide Caratti cf6e007eef netfilter: conntrack: validate SCTP crc32c in PREROUTING
implement sctp_error to let nf_conntrack_in validate crc32c on the packet
transport header. Assign skb->ip_summed to CHECKSUM_UNNECESSARY and return
NF_ACCEPT in case of successful validation; otherwise, return -NF_ACCEPT to
let netfilter skip connection tracking, like other protocols do.

Besides preventing corrupted packets from matching conntrack entries, this
fixes functionality of REJECT target: it was not generating any ICMP upon
reception of SCTP packets, because it was computing RFC 1624 checksum on
the packet and systematically mismatching crc32c in the SCTP header.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-05 13:24:47 +01:00
Davide Caratti 300ae14946 netfilter: select LIBCRC32C together with SCTP conntrack
nf_conntrack needs to compute crc32c when dealing with SCTP packets.
Moreover, NF_NAT_PROTO_SCTP (currently selecting LIBCRC32C) can be enabled
only if conntrack support for SCTP is enabled. Therefore, move enabling of
kernel support for crc32c so that it is selected when NF_CT_PROTO_SCTP=y.

Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-05 13:24:43 +01:00
Arend Van Spriel 343884c87e cfg80211: only pass sband to set_mandatory_flags_band()
The supported band structure contains the band is applies to
so no need to pass it separately. Also added a default case
to the switch for completeness. The current code base does not
call this function with NUM_NL80211_BANDS but kept that case
statement although default case would cover that.

Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-05 12:58:03 +01:00
David Howells 3e018daf04 rxrpc: Show a call's hard-ACK cursors in /proc/net/rxrpc_calls
Show a call's hard-ACK cursors in /proc/net/rxrpc_calls so that a call's
progress can be more easily monitored.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-01-05 11:39:44 +00:00
David Howells b1d9f7fde0 rxrpc: Add some more tracing
Add the following extra tracing information:

 (1) Modify the rxrpc_transmit tracepoint to record the Tx window size as
     this is varied by the slow-start algorithm.

 (2) Modify the rxrpc_rx_ack tracepoint to record more information from
     received ACK packets.

 (3) Add an rxrpc_rx_data tracepoint to record the information in DATA
     packets.

 (4) Add an rxrpc_disconnect_call tracepoint to record call disconnection,
     including the reason the call was disconnected.

 (5) Add an rxrpc_improper_term tracepoint to record implicit termination
     of a call by a client either by starting a new call on a particular
     connection channel without first transmitting the final ACK for the
     previous call.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-01-05 11:39:12 +00:00
David Howells b54a134a7d rxrpc: Fix handling of enums-to-string translation in tracing
Fix the way enum values are translated into strings in AF_RXRPC
tracepoints.  The problem with just doing a lookup in a normal flat array
of strings or chars is that external tracing infrastructure can't find it.
Rather, TRACE_DEFINE_ENUM must be used.

Also sort the enums and string tables to make it easier to keep them in
order so that a future patch to __print_symbolic() can be optimised to try
a direct lookup into the table first before iterating over it.

A couple of _proto() macro calls are removed because they refered to tables
that got moved to the tracing infrastructure.  The relevant data can be
found by way of tracing.

Signed-off-by: David Howells <dhowells@redhat.com>
2017-01-05 10:38:33 +00:00
Johannes Berg 753aacfd2e nl80211: fix sched scan netlink socket owner destruction
A single netlink socket might own multiple interfaces *and* a
scheduled scan request (which might belong to another interface),
so when it goes away both may need to be destroyed.

Remove the schedule_scan_stop indirection to fix this - it's only
needed for interface destruction because of the way this works
right now, with a single work taking care of all interfaces.

Cc: stable@vger.kernel.org
Fixes: 93a1e86ce1 ("nl80211: Stop scheduled scan if netlink client disappears")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-05 10:59:53 +01:00
Daniel Borkmann 57ea884b0d packet: fix panic in __packet_set_timestamp on tpacket_v3 in tx mode
When TX timestamping is in use with TPACKET_V3's TX ring, then we'll
hit the BUG() in __packet_set_timestamp() when ring buffer slot is
returned to user space via tpacket_destruct_skb(). This is due to v3
being assumed as unreachable here, but since 7f953ab2ba ("af_packet:
TX_RING support for TPACKET_V3") it's not anymore. Fix it by filling
the timestamp back into the ring slot.

Fixes: 7f953ab2ba ("af_packet: TX_RING support for TPACKET_V3")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 23:55:42 -05:00
Vivien Didelot a896eee334 net: dsa: remove out label in dsa_switch_setup_one
The "out" label in dsa_switch_setup_one() is useless, thus remove it.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:29:27 -05:00
David S. Miller ac4340fc3c net: Assert at build time the assumptions we make about the CMSG header.
It must always be the case that CMSG_ALIGN(sizeof(hdr)) == sizeof(hdr).

Otherwise there are missing adjustments in the various calculations
that parse and build these things.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:24:19 -05:00
yuan linyu 1ff8cebf49 scm: remove use CMSG{_COMPAT}_ALIGN(sizeof(struct {compat_}cmsghdr))
sizeof(struct cmsghdr) and sizeof(struct compat_cmsghdr) already aligned.
remove use CMSG_ALIGN(sizeof(struct cmsghdr)) and
CMSG_COMPAT_ALIGN(sizeof(struct compat_cmsghdr)) keep code consistent.

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-04 13:04:37 -05:00
Johannes Berg 5ec71dd7f1 cfg80211: sysfs: use wiphy_name()
Instead of open-coding dev_name(), use the wiphy_name() inline
to make the code easier to understand. While at it, clean up
some coding style.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-04 08:24:49 +01:00
Sven Eckelmann 4ea33ef0f9 batman-adv: Decrease hardif refcnt on fragmentation send error
An error before the hardif is found has to free the skb. But every error
after that has to free the skb + put the hard interface.

Fixes: 8def0be82d ("batman-adv: Consume skb in batadv_frag_send_packet")
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-04 08:22:04 +01:00
Alexander Alemayhu 1365e547c6 xfrm: trivial typos
o s/descentant/descendant
o s/workarbound/workaround

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2017-01-04 06:49:28 +01:00
Yotam Gigi ec2507d2a3 net/sched: cls_matchall: Fix error path
Fix several error paths in matchall:
 - Release reference to actions in case the hardware fails offloading
   (relevant to skip_sw only)
 - Fix error path in case tcf_exts initialization/validation fail

Fixes: bf3994d2ed ("net/sched: introduce Match-all classifier")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 18:58:27 -05:00
Jon Paul Maloy 365ad353c2 tipc: reduce risk of user starvation during link congestion
The socket code currently handles link congestion by either blocking
and trying to send again when the congestion has abated, or just
returning to the user with -EAGAIN and let him re-try later.

This mechanism is prone to starvation, because the wakeup algorithm is
non-atomic. During the time the link issues a wakeup signal, until the
socket wakes up and re-attempts sending, other senders may have come
in between and occupied the free buffer space in the link. This in turn
may lead to a socket having to make many send attempts before it is
successful. In extremely loaded systems we have observed latency times
of several seconds before a low-priority socket is able to send out a
message.

In this commit, we simplify this mechanism and reduce the risk of the
described scenario happening. When a message is attempted sent via a
congested link, we now let it be added to the link's backlog queue
anyway, thus permitting an oversubscription of one message per source
socket. We still create a wakeup item and return an error code, hence
instructing the sender to block or stop sending. Only when enough space
has been freed up in the link's backlog queue do we issue a wakeup event
that allows the sender to continue with the next message, if any.

The fact that a socket now can consider a message sent even when the
link returns a congestion code means that the sending socket code can
be simplified. Also, since this is a good opportunity to get rid of the
obsolete 'mtu change' condition in the three socket send functions, we
now choose to refactor those functions completely.

Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:13:05 -05:00
Jon Paul Maloy 4d8642d896 tipc: modify struct tipc_plist to be more versatile
During multicast reception we currently use a simple linked list with
push/pop semantics to store port numbers.

We now see a need for a more generic list for storing values of type
u32. We therefore make some modifications to this list, while replacing
the prefix 'tipc_plist_' with 'u32_'. We also add a couple of new
functions which will come to use in the next commits.

Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:13:05 -05:00
Jon Paul Maloy 8c44e1af16 tipc: unify tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() functions
The functions tipc_wait_for_sndpkt() and tipc_wait_for_sndmsg() are very
similar. The latter function is also called from two locations, and
there will be more in the coming commits, which will all need to test on
different conditions.

Instead of making yet another duplicates of the function, we now
introduce a new macro tipc_wait_for_cond() where the wakeup condition
can be stated as an argument to the call. This macro replaces all
current and future uses of the two functions, which can now be
eliminated.

Acked-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:13:05 -05:00
Reiter Wolfgang 3b48ab2248 drop_monitor: consider inserted data in genlmsg_end
Final nlmsg_len field update must reflect inserted net_dm_drop_point
data.

This patch depends on previous patch:
"drop_monitor: add missing call to genlmsg_end"

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:09:44 -05:00
Sowmini Varadhan 7f953ab2ba af_packet: TX_RING support for TPACKET_V3
Although TPACKET_V3 Rx has some benefits over TPACKET_V2 Rx, *_v3
does not currently have TX_RING support. As a result an application
that wants the best perf for Tx and Rx (e.g. to handle request/response
transacations) ends up needing 2 sockets, one with *_v2 for Tx and
another with *_v3 for Rx.

This patch enables TPACKET_V2 compatible Tx features in TPACKET_V3
so that an application can use a single descriptor to get the benefits
of _v3 RX_RING and _v2 TX_RING. An application may do a block-send by
first filling up multiple frames in the Tx ring and then triggering a
transmit. This patch only support fixed size Tx frames for TPACKET_V3,
and requires that tp_next_offset must be zero.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 11:00:27 -05:00
Nikolay Aleksandrov 1708ebc963 ipmr, ip6mr: add RTNH_F_UNRESOLVED flag to unresolved cache entries
While working with ipmr, we noticed that it is impossible to determine
if an entry is actually unresolved or its IIF interface has disappeared
(e.g. virtual interface got deleted). These entries look almost
identical to user-space when dumping or receiving notifications. So in
order to recognize them add a new RTNH_F_UNRESOLVED flag which is set when
sending an unresolved cache entry to user-space.

Suggested-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 10:04:31 -05:00
Alexander Duyck 5350d54f6c ipv4: Do not allow MAIN to be alias for new LOCAL w/ custom rules
In the case of custom rules being present we need to handle the case of the
LOCAL table being intialized after the new rule has been added.  To address
that I am adding a new check so that we can make certain we don't use an
alias of MAIN for LOCAL when allocating a new table.

Fixes: 0ddcf43d5d ("ipv4: FIB Local/MAIN table collapse")
Reported-by: Oliver Brunel <jjk@jjacky.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-03 09:38:34 -05:00
Liping Zhang 949a358418 netfilter: nft_ct: add average bytes per packet support
Similar to xt_connbytes, user can match how many average bytes per packet
a connection has transferred so far.

Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-03 14:33:26 +01:00
Florian Westphal 9700ba805b netfilter: nat: merge udp and udplite helpers
udplite nat was copied from udp nat, they are virtually 100% identical.
Not really surprising given udplite is just udp with partial csum coverage.

old:
   text    data     bss     dec     hex filename
  11606    1457     210   13273    33d9 nf_nat.ko
    330       0       2     332     14c nf_nat_proto_udp.o
    276       0       2     278     116 nf_nat_proto_udplite.o
new:
   text    data     bss     dec     hex filename
  11598    1457     210   13265    33d1 nf_nat.ko
    640       0       4     644     284 nf_nat_proto_udp.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-03 14:33:25 +01:00
Florian Westphal e4781421e8 netfilter: merge udp and udplite conntrack helpers
udplite was copied from udp, they are virtually 100% identical.

This adds udplite tracker to udp instead, removes udplite module,
and then makes the udplite tracker builtin.

udplite will then simply re-use udp timeout settings.
It makes little sense to add separate sysctls, nowadays we have
fine-grained timeout policy support via the CT target.

old:
 text    data     bss     dec     hex filename
 1633     672       0    2305     901 nf_conntrack_proto_udp.o
 1756     672       0    2428     97c nf_conntrack_proto_udplite.o
69526   17937     268   87731   156b3 nf_conntrack.ko

new:
 text    data     bss     dec     hex filename
 2442    1184       0    3626     e2a nf_conntrack_proto_udp.o
68565   17721     268   86554   1521a nf_conntrack.ko

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-01-03 14:33:25 +01:00
Santosh Shilimkar 3289025aed RDS: add receive message trace used by application
Socket option to tap receive path latency in various stages
in nano seconds. It can be enabled on selective sockets using
using SO_RDS_MSG_RXPATH_LATENCY socket option. RDS will return
the data to application with RDS_CMSG_RXPATH_LATENCY in defined
format. Scope is left to add more trace points for future
without need of change in the interface.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:59 -08:00
Avinash Repaka f9fb69adb6 RDS: make message size limit compliant with spec
RDS support max message size as 1M but the code doesn't check this
in all cases. Patch fixes it for RDMA & non-RDMA and RDS MR size
and its enforced irrespective of underlying transport.

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:57 -08:00
Venkat Venkatsubra 192a798f52 RDS: add stat for socket recv memory usage
Tracks the receive side memory added to scokets and removed from sockets.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:56 -08:00
Santosh Shilimkar cf657269d3 RDS: IB: fix panic due to handlers running post teardown
Shutdown code reaping loop takes care of emptying the
CQ's before they being destroyed. And once tasklets are
killed, the hanlders are not expected to run.

But because of core tasklet code issues, tasklet handler could
still run even after tasklet_kill,
RDS IB shutdown code already reaps the CQs before freeing
cq/qp resources so as such the handlers have nothing left
to do post shutdown.

On other hand any handler running after teardown and trying
to access already freed qp/cq resources causes issues
Patch fixes this race by  makes sure that handlers returns
without any action post teardown.

Reviewed-by: Wengang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:55 -08:00
Santosh Shilimkar 941f8d55f6 RDS: RDMA: Fix the composite message user notification
When application sends an RDS RDMA composite message consist of
RDMA transfer to be followed up by non RDMA payload, it expect to
be notified *only* when the full message gets delivered. RDS RDMA
notification doesn't behave this way though.

Thanks to Venkat for debug and root casuing the issue
where only first part of the message(RDMA) was
successfully delivered but remainder payload delivery failed.
In that case, application should not be notified with
a false positive of message delivery success.

Fix this case by making sure the user gets notified only after
the full message delivery.

Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:54 -08:00
Santosh Shilimkar be2f76eacc RDS: IB: Add vector spreading for cqs
Based on available device vectors, allocate cqs accordingly to
get better spread of completion vectors which helps performace
great deal..

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:52 -08:00
Santosh Shilimkar 09b2b8f528 RDS: IB: add few useful cache stasts
Tracks the ib receive cache total, incoming and frag allocations.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:51 -08:00
Santosh Shilimkar 581d53c91c RDS: IB: track and log active side endpoint in connection
Useful to know the active and passive end points in a
RDS IB connection.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:50 -08:00
Santosh Shilimkar c536a06887 RDS: RDMA: silence the use_once mr log flood
In absence of extension headers, message log will keep
flooding the console. As such even without use_once we can
clean up the MRs so its not really an error case message
so make it debug message

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:49 -08:00
Santosh Shilimkar 5601245931 RDS: IB: split the mr registration and invalidation path
MR invalidation in RDS is done in background thread and not in
data path like registration. So break the dependency between them
which helps to remove the performance bottleneck.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:47 -08:00
Santosh Shilimkar 584a8279a4 RDS: RDMA: return appropriate error on rdma map failures
The first message to a remote node should prompt a new
connection even if it is RDMA operation. For RDMA operation
the MR mapping can fail because connections is not yet up.

Since the connection establishment is asynchronous,
we make sure the map failure because of unavailable
connection reach to the user by appropriate error code.
Before returning to the user, lets trigger the connection
so that its ready for the next retry.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:46 -08:00
Qing Huang 8d5d8a5fd7 RDS: RDMA: start rdma listening after init
This prevents RDS from handling incoming rdma packets before RDS
completes initializing its recv/send components.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:45 -08:00
Santosh Shilimkar 3e56c2f856 RDS: RDMA: fix the ib_map_mr_sg_zbva() argument
Fixes warning: Using plain integer as NULL pointer

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:44 -08:00
Santosh Shilimkar fab8688d71 RDS: IB: make the transport retry count smallest
Transport retry is not much useful since it indicate packet loss
in fabric so its better to failover fast rather than longer retry.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:43 -08:00
Santosh Shilimkar ff3f19a2f6 RDS: IB: include faddr in connection log
Also use pr_* for it.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:41 -08:00
Santosh Shilimkar bb7897631d RDS: mark few internal functions static to make sparse build happy
Fixes below warnings:
warning: symbol 'rds_send_probe' was not declared. Should it be static?
warning: symbol 'rds_send_ping' was not declared. Should it be static?
warning: symbol 'rds_tcp_accept_one_path' was not declared. Should it be static?
warning: symbol 'rds_walk_conn_path_info' was not declared. Should it be static?

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:40 -08:00
Santosh Shilimkar f69b22e65e RDS: log the address on bind failure
It's useful to know the IP address when RDS fails to bind a
connection. Thus, adding it to the error message.

Orabug: 21894138
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
2017-01-02 14:02:39 -08:00
Michal Tesar 7ababb7826 igmp: Make igmp group member RFC 3376 compliant
5.2. Action on Reception of a Query

 When a system receives a Query, it does not respond immediately.
 Instead, it delays its response by a random amount of time, bounded
 by the Max Resp Time value derived from the Max Resp Code in the
 received Query message.  A system may receive a variety of Queries on
 different interfaces and of different kinds (e.g., General Queries,
 Group-Specific Queries, and Group-and-Source-Specific Queries), each
 of which may require its own delayed response.

 Before scheduling a response to a Query, the system must first
 consider previously scheduled pending responses and in many cases
 schedule a combined response.  Therefore, the system must be able to
 maintain the following state:

 o A timer per interface for scheduling responses to General Queries.

 o A per-group and interface timer for scheduling responses to Group-
   Specific and Group-and-Source-Specific Queries.

 o A per-group and interface list of sources to be reported in the
   response to a Group-and-Source-Specific Query.

 When a new Query with the Router-Alert option arrives on an
 interface, provided the system has state to report, a delay for a
 response is randomly selected in the range (0, [Max Resp Time]) where
 Max Resp Time is derived from Max Resp Code in the received Query
 message.  The following rules are then used to determine if a Report
 needs to be scheduled and the type of Report to schedule.  The rules
 are considered in order and only the first matching rule is applied.

 1. If there is a pending response to a previous General Query
    scheduled sooner than the selected delay, no additional response
    needs to be scheduled.

 2. If the received Query is a General Query, the interface timer is
    used to schedule a response to the General Query after the
    selected delay.  Any previously pending response to a General
    Query is canceled.
--8<--

Currently the timer is rearmed with new random expiration time for
every incoming query regardless of possibly already pending report.
Which is not aligned with the above RFE.
It also might happen that higher rate of incoming queries can
postpone the report after the expiration time of the first query
causing group membership loss.

Now the per interface general query timer is rearmed only
when there is no pending report already scheduled on that interface or
the newly selected expiration time is before the already pending
scheduled report.

Signed-off-by: Michal Tesar <mtesar@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 13:01:03 -05:00
Ian Kumlien d0af683407 flow_dissector: Update pptp handling to avoid null pointer deref.
__skb_flow_dissect can be called with a skb or a data packet, either
can be NULL. All calls seems to have been moved to __skb_header_pointer
except the pptp handling which is still calling skb_header_pointer.

skb_header_pointer will use skb->data and thus:
[  109.556866] BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
[  109.557102] IP: [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.557263] PGD 0
[  109.557338]
[  109.557484] Oops: 0000 [#1] SMP
[  109.557562] Modules linked in: chaoskey
[  109.557783] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.9.0 #79
[  109.557867] Hardware name: Supermicro A1SRM-LN7F/LN5F/A1SRM-LN7F-2758, BIOS 1.0c 11/04/2015
[  109.557957] task: ffff94085c27bc00 task.stack: ffffb745c0068000
[  109.558041] RIP: 0010:[<ffffffff88dc02f8>]  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.558203] RSP: 0018:ffff94087fc83d40  EFLAGS: 00010206
[  109.558286] RAX: 0000000000000130 RBX: ffffffff8975bf80 RCX: ffff94084fab6800
[  109.558373] RDX: 0000000000000010 RSI: 000000000000000c RDI: 0000000000000000
[  109.558460] RBP: 0000000000000b88 R08: 0000000000000000 R09: 0000000000000022
[  109.558547] R10: 0000000000000008 R11: ffff94087fc83e04 R12: 0000000000000000
[  109.558763] R13: ffff94084fab6800 R14: ffff94087fc83e04 R15: 000000000000002f
[  109.558979] FS:  0000000000000000(0000) GS:ffff94087fc80000(0000) knlGS:0000000000000000
[  109.559326] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  109.559539] CR2: 0000000000000080 CR3: 0000000281809000 CR4: 00000000001026e0
[  109.559753] Stack:
[  109.559957]  000000000000000c ffff94084fab6822 0000000000000001 ffff94085c2b5fc0
[  109.560578]  0000000000000001 0000000000002000 0000000000000000 0000000000000000
[  109.561200]  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  109.561820] Call Trace:
[  109.562027]  <IRQ>
[  109.562108]  [<ffffffff88dfb4fa>] ? eth_get_headlen+0x7a/0xf0
[  109.562522]  [<ffffffff88c5a35a>] ? igb_poll+0x96a/0xe80
[  109.562737]  [<ffffffff88dc912b>] ? net_rx_action+0x20b/0x350
[  109.562953]  [<ffffffff88546d68>] ? __do_softirq+0xe8/0x280
[  109.563169]  [<ffffffff8854704a>] ? irq_exit+0xaa/0xb0
[  109.563382]  [<ffffffff8847229b>] ? do_IRQ+0x4b/0xc0
[  109.563597]  [<ffffffff8902d4ff>] ? common_interrupt+0x7f/0x7f
[  109.563810]  <EOI>
[  109.563890]  [<ffffffff88d57530>] ? cpuidle_enter_state+0x130/0x2c0
[  109.564304]  [<ffffffff88d57520>] ? cpuidle_enter_state+0x120/0x2c0
[  109.564520]  [<ffffffff8857eacf>] ? cpu_startup_entry+0x19f/0x1f0
[  109.564737]  [<ffffffff8848d55a>] ? start_secondary+0x12a/0x140
[  109.564950] Code: 83 e2 20 a8 80 0f 84 60 01 00 00 c7 04 24 08 00
00 00 66 85 d2 0f 84 be fe ff ff e9 69 fe ff ff 8b 34 24 89 f2 83 c2
04 66 85 c0 <41> 8b 84 24 80 00 00 00 0f 49 d6 41 8d 31 01 d6 41 2b 84
24 84
[  109.569959] RIP  [<ffffffff88dc02f8>] __skb_flow_dissect+0xa88/0xce0
[  109.570245]  RSP <ffff94087fc83d40>
[  109.570453] CR2: 0000000000000080

Fixes: ab10dccb11 ("rps: Inspect PPTP encapsulated by GRE to get flow hash")
Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-02 12:53:34 -05:00
Ilan peer 57629915d5 mac80211: Fix addition of mesh configuration element
The code was setting the capabilities byte to zero,
after it was already properly set previously. Fix it.

The bug was found while debugging hwsim mesh tests failures
that happened since the commit mentioned below.

Fixes: 76f43b4c0a ("mac80211: Remove invalid flag operations in mesh TSF synchronization")
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-02 11:58:26 +01:00
Johannes Berg 35f432a03e mac80211: initialize fast-xmit 'info' later
In ieee80211_xmit_fast(), 'info' is initialized to point to the skb
that's passed in, but that skb may later be replaced by a clone (if
it was shared), leading to an invalid pointer.

This can lead to use-after-free and also later crashes since the
real SKB's info->hw_queue doesn't get initialized properly.

Fix this by assigning info only later, when it's needed, after the
skb replacement (may have) happened.

Cc: stable@vger.kernel.org
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2017-01-02 11:28:25 +01:00
Guillaume Nault a9b2dff80b l2tp: take remote address into account in l2tp_ip and l2tp_ip6 socket lookups
For connected sockets, __l2tp_ip{,6}_bind_lookup() needs to check the
remote IP when looking for a matching socket. Otherwise a connected
socket can receive traffic not originating from its peer.

Drop l2tp_ip_bind_lookup() and l2tp_ip6_bind_lookup() instead of
updating their prototype, as these functions aren't used.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 22:07:20 -05:00
Guillaume Nault 97b84fd6d9 l2tp: consider '::' as wildcard address in l2tp_ip6 socket lookup
An L2TP socket bound to the unspecified address should match with any
address. If not, it can't receive any packet and __l2tp_ip6_bind_lookup()
can't prevent another socket from binding on the same device/tunnel ID.

While there, rename the 'addr' variable to 'sk_laddr' (local addr), to
make following patch clearer.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 22:07:20 -05:00
Reiter Wolfgang 4200462d88 drop_monitor: add missing call to genlmsg_end
Update nlmsg_len field with genlmsg_end to enable userspace processing
using nlmsg_next helper. Also adds error handling.

Signed-off-by: Reiter Wolfgang <wr0112358@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 22:00:26 -05:00
Eric Biggers e1a3a60a2e net: socket: don't set sk_uid to garbage value in ->setattr()
->setattr() was recently implemented for socket files to sync the socket
inode's uid to the new 'sk_uid' member of struct sock.  It does this by
copying over the ia_uid member of struct iattr.  However, ia_uid is
actually only valid when ATTR_UID is set in ia_valid, indicating that
the uid is being changed, e.g. by chown.  Other metadata operations such
as chmod or utimes leave ia_uid uninitialized.  Therefore, sk_uid could
be set to a "garbage" value from the stack.

Fix this by only copying the uid over when ATTR_UID is set.

Fixes: 86741ec254 ("net: core: Add a UID field to struct sock.")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-01 11:53:34 -05:00
Simon Wunderlich 32bcad4b7f batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2017-01-01 00:09:49 +01:00
David Ahern 7bb387c5ab net: Allow IP_MULTICAST_IF to set index to L3 slave
IP_MULTICAST_IF fails if sk_bound_dev_if is already set and the new index
does not match it. e.g.,

    ntpd[15381]: setsockopt IP_MULTICAST_IF 192.168.1.23 fails: Invalid argument

Relax the check in setsockopt to allow setting mc_index to an L3 slave if
sk_bound_dev_if points to an L3 master.

Make a similar change for IPv6. In this case change the device lookup to
take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for
the lookup of a potential L3 master device.

This really only silences a setsockopt failure since uses of mc_index are
secondary to sk_bound_dev_if if it is set. In both cases, if either index
is an L3 slave or master, lookups are directed to the same FIB table so
relaxing the check at setsockopt time causes no harm.

Patch is based on a suggested change by Darwin for a problem noted in
their code base.

Suggested-by: Darwin Dingel <darwin.dingel@alliedtelesis.co.nz>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-30 15:24:47 -05:00
Artur Molchanov 14221cc45c bridge: netfilter: Fix dropping packets that moving through bridge interface
Problem:
br_nf_pre_routing_finish() calls itself instead of
br_nf_pre_routing_finish_bridge(). Due to this bug reverse path filter drops
packets that go through bridge interface.

User impact:
Local docker containers with bridge network can not communicate with each
other.

Fixes: c5136b15ea ("netfilter: bridge: add and use br_nf_hook_thresh")
Signed-off-by: Artur Molchanov <artur.molchanov@synesis.ru>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-30 18:22:50 +01:00
David Ahern f5a0aab84b net: ipv4: dst for local input routes should use l3mdev if relevant
IPv4 output routes already use l3mdev device instead of loopback for dst's
if it is applicable. Change local input routes to do the same.

This fixes icmp responses for unreachable UDP ports which are directed
to the wrong table after commit 9d1a6c4ea4 because local_input
routes use the loopback device. Moving from ingress device to loopback
loses the L3 domain causing responses based on the dst to get to lost.

Fixes: 9d1a6c4ea4 ("net: icmp_route_lookup should use rt dev to
		       determine L3 domain")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 22:27:23 -05:00
Florian Fainelli 3a543ef479 net: dsa: Implement ndo_get_phys_port_id
Implement ndo_get_phys_port_id() by returning the physical port number
of the switch this per-port DSA created network interface corresponds
to.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 22:16:53 -05:00
Matthias Tafelmeier 3d48b53fb2 net: dev_weight: TX/RX orthogonality
Oftenly, introducing side effects on packet processing on the other half
of the stack by adjusting one of TX/RX via sysctl is not desirable.
There are cases of demand for asymmetric, orthogonal configurability.

This holds true especially for nodes where RPS for RFS usage on top is
configured and therefore use the 'old dev_weight'. This is quite a
common base configuration setup nowadays, even with NICs of superior processing
support (e.g. aRFS).

A good example use case are nodes acting as noSQL data bases with a
large number of tiny requests and rather fewer but large packets as responses.
It's affordable to have large budget and rx dev_weights for the
requests. But as a side effect having this large a number on TX
processed in one run can overwhelm drivers.

This patch therefore introduces an independent configurability via sysctl to
userland.

Signed-off-by: Matthias Tafelmeier <matthias.tafelmeier@gmx.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 15:38:35 -05:00
Marcelo Ricardo Leitner bfd2e4b873 sctp: refactor sctp_datamsg_from_user
This patch refactors sctp_datamsg_from_user() in an attempt to make it
better to read and avoid code duplication for handling the last
fragment.

It also avoids doing division and remaining operations. Even though, it
should still operate similarly as before this patch.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 14:44:03 -05:00
Wei Zhang f0c16ba893 net: fix incorrect original ingress device index in PKTINFO
When we send a packet for our own local address on a non-loopback
interface (e.g. eth0), due to the change had been introduced from
commit 0b922b7a82 ("net: original ingress device index in PKTINFO"), the
original ingress device index would be set as the loopback interface.
However, the packet should be considered as if it is being arrived via the
sending interface (eth0), otherwise it would break the expectation of the
userspace application (e.g. the DHCPRELEASE message from dhcp_release
binary would be ignored by the dnsmasq daemon, since it come from lo which
is not the interface dnsmasq bind to)

Fixes: 0b922b7a82 ("net: original ingress device index in PKTINFO")
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Wei Zhang <asuka.com@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 14:07:41 -05:00
Mathias Krause 4775cc1f2d rtnl: stats - add missing netlink message size checks
We miss to check if the netlink message is actually big enough to contain
a struct if_stats_msg.

Add a check to prevent userland from sending us short messages that would
make us access memory beyond the end of the message.

Fixes: 10c9ead9f3 ("rtnetlink: add new RTM_GETSTATS message to dump...")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 14:05:15 -05:00
Dave Jones de8499cee5 ipv6: remove unnecessary inet6_sk check
np is already assigned in the variable declaration of ping_v6_sendmsg.
At this point, we have already dereferenced np several times, so the
NULL check is also redundant.

Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 12:05:49 -05:00
Zheng Li e4c5e13aa4 ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
There is an inconsistent conditional judgement between __ip6_append_data
and ip6_finish_output functions, the variable length in __ip6_append_data
just include the length of application's payload and udp6 header, don't
include the length of ipv6 header, but in ip6_finish_output use
(skb->len > ip6_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ipv6 header.

That causes some particular application's udp6 payloads whose length are
between (MTU - IPv6 Header) and MTU were fragmented by ip6_fragment even
though the rst->dev support UFO feature.

Add the length of ipv6 header to length in __ip6_append_data to keep
consistent conditional judgement as ip6_finish_output for ip6 fragment.

Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 11:55:17 -05:00
Haishuang Yan fee83d097b ipv4: Namespaceify tcp_max_syn_backlog knob
Different namespace application might require different maximal
number of remembered connection requests.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 11:38:31 -05:00
Haishuang Yan 1946e672c1 ipv4: Namespaceify tcp_tw_recycle and tcp_max_tw_buckets knob
Different namespace application might require fast recycling
TIME-WAIT sockets independently of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-29 11:38:31 -05:00
Augusto Mecking Caringi 9dd0f896d2 net: atm: Fix warnings in net/atm/lec.c when !CONFIG_PROC_FS
This patch fixes the following warnings when CONFIG_PROC_FS is not set:

linux/net/atm/lec.c: In function ‘lane_module_cleanup’:
linux/net/atm/lec.c:1062:27: error: ‘atm_proc_root’ undeclared (first
use in this function)
remove_proc_entry("lec", atm_proc_root);
                           ^
linux/net/atm/lec.c:1062:27: note: each undeclared identifier is
reported only once for each function it appears in

Signed-off-by: Augusto Mecking Caringi <augustocaringi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 15:11:32 -05:00
Paul Blakey 0df0f207aa net/sched: cls_flower: Fix missing addr_type in classify
Since we now use a non zero mask on addr_type, we are matching on its
value (IPV4/IPV6). So before this fix, matching on enc_src_ip/enc_dst_ip
failed in SW/classify path since its value was zero.
This patch sets the proper value of addr_type for encapsulated packets.

Fixes: 970bfcd097 ('net/sched: cls_flower: Use mask for addr_type')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:28:13 -05:00
Marcelo Ricardo Leitner b77b7565a6 sctp: add pr_debug for tracking asocs not found
This pr_debug may help identify why the system is generating some
Aborts. It's not something a sysadmin would be expected to use.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:26:17 -05:00
Marcelo Ricardo Leitner 509e7a311f sctp: sctp_chunk_length_valid should return bool
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:06:31 -05:00
Marcelo Ricardo Leitner 66b91d2cd0 sctp: remove return value from sctp_packet_init/config
There is no reason to use this cascading. It doesn't add anything.
Let's remove it and simplify.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:06:31 -05:00
Marcelo Ricardo Leitner 0630c56e40 sctp: simplify addr copy
Make it a bit easier to read.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:06:31 -05:00
Marcelo Ricardo Leitner 1ff0156167 sctp: reduce indent level in sctp_sf_shut_8_4_5
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:06:30 -05:00
Marcelo Ricardo Leitner eab59075d3 sctp: reduce indent level at sctp_sf_tabort_8_4_8
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-28 14:06:30 -05:00
Linus Torvalds 8f18e4d03e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.

    The most important is to not assume the packet is RX just because
    the destination address matches that of the device. Such an
    assumption causes problems when an interface is put into loopback
    mode.

 2) If we retry when creating a new tc entry (because we dropped the
    RTNL mutex in order to load a module, for example) we end up with
    -EAGAIN and then loop trying to replay the request. But we didn't
    reset some state when looping back to the top like this, and if
    another thread meanwhile inserted the same tc entry we were trying
    to, we re-link it creating an enless loop in the tc chain. Fix from
    Daniel Borkmann.

 3) There are two different WRITE bits in the MDIO address register for
    the stmmac chip, depending upon the chip variant. Due to a bug we
    could set them both, fix from Hock Leong Kweh.

 4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  net: stmmac: fix incorrect bit set in gmac4 mdio addr register
  r8169: add support for RTL8168 series add-on card.
  net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
  openvswitch: upcall: Fix vlan handling.
  ipv4: Namespaceify tcp_tw_reuse knob
  net: korina: Fix NAPI versus resources freeing
  net, sched: fix soft lockup in tc_classify
  net/mlx4_en: Fix user prio field in XDP forward
  tipc: don't send FIN message from connectionless socket
  ipvlan: fix multicast processing
  ipvlan: fix various issues in ipvlan_process_multicast()
2016-12-27 16:04:37 -08:00
Jason Wang be26727772 net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
After commit 73b62bd085 ("virtio-net:
remove the warning before XDP linearizing"), there's no users for
bpf_warn_invalid_xdp_buffer(), so remove it. This is a revert for
commit f23bc46c30.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27 12:28:07 -05:00
pravin shelar df30f7408b openvswitch: upcall: Fix vlan handling.
Networking stack accelerate vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.

Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets").
CC: Jarno Rajahalme <jarno@ovn.org>
CC: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27 12:28:07 -05:00
Haishuang Yan 56ab6b9300 ipv4: Namespaceify tcp_tw_reuse knob
Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.

Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-27 12:28:07 -05:00
Daniel Borkmann 628185cfdd net, sched: fix soft lockup in tc_classify
Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.

What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.

This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.

This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.

tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.

Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit
12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").

Fixes: 12186be7d2 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup")
Reported-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Tested-by: Shahar Klein <shahark@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-26 11:24:10 -05:00
Thomas Gleixner 1f3a8e49d8 ktime: Get rid of ktime_equal()
No point in going through loops and hoops instead of just comparing the
values.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:23 +01:00
Thomas Gleixner 8b0e195314 ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Thomas Gleixner 2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Jon Paul Maloy 693c56491f tipc: don't send FIN message from connectionless socket
In commit 6f00089c73 ("tipc: remove SS_DISCONNECTING state") the
check for socket type is in the wrong place, causing a closing socket
to always send out a FIN message even when the socket was never
connected. This is normally harmless, since the destination node for
such messages most often is zero, and the message will be dropped, but
it is still a wrong and confusing behavior.

We fix this in this commit.

Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 17:53:47 -05:00
Marcelo Ricardo Leitner 1636098c46 sctp: fix recovering from 0 win with small data chunks
Currently if SCTP closes the receive window with window pressure, mostly
caused by excessive skb overhead on payload/overheads ratio, SCTP will
close the window abruptly while saving the delta on rwnd_press. It will
start recovering rwnd as the chunks are consumed by the application and
the rwnd_press will be only recovered after rwnd reach the same value as
of rwnd_press, mostly to prevent silly window syndrome.

Thing is, this is very inefficient with small data chunks, as with those
it will never reach back that value, and thus it will never recover from
such pressure. This means that we will not issue window updates when
recovering from 0 window and will rely on a sender retransmit to notice
it.

The fix here is to remove such threshold, as no value is good enough: it
depends on the (avg) chunk sizes being used.

Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by
sending SIGSTOP to netserver, sleep 1.2, and SIGCONT.
Rate limited to 845kbps, for visibility. Capture done at netserver side.

Previously:
01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [
01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS
01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS
01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS
01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap
  (window update missed)
04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS
04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g
04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS
04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS
04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g

After:
02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153]
02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S
02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S
02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S
02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga
03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508]
03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017]
03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526]
03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035]
03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544]
03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053]
03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562]
  (following data transmission triggered by window updates above)
03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S
03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000]
03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S
03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S
03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894]
03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S
03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 14:01:35 -05:00
Marcelo Ricardo Leitner 58b94d88de sctp: do not loose window information if in rwnd_over
It's possible that we receive a packet that is larger than current
window. If it's the first packet in this way, it will cause it to
increase rwnd_over. Then, if we receive another data chunk (specially as
SCTP allows you to have one data chunk in flight even during 0 window),
rwnd_over will be overwritten instead of added to.

In the long run, this could cause the window to grow bigger than its
initial size, as rwnd_over would be charged only for the last received
data chunk while the code will try open the window for all packets that
were received and had its value in rwnd_over overwritten. This, then,
can lead to the worsening of payload/buffer ratio and cause rwnd_press
to kick in more often.

The fix is to sum it too, same as is done for rwnd_press, so that if we
receive 3 chunks after closing the window, we still have to release that
same amount before re-opening it.

Log snippet from sctp_test exhibiting the issue:
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)
[  146.209232] sctp: sctp_assoc_rwnd_decrease:
association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1!
[  146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000
rwnd decreased by 1 to (0, 1, 114221)

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 14:01:35 -05:00
Ido Schimmel 53f800e3ba neigh: Send netevent after marking neigh as dead
neigh_cleanup_and_release() is always called after marking a neighbour
as dead, but it only notifies user space and not in-kernel listeners of
the netevent notification chain.

This can cause multiple problems. In my specific use case, it causes the
listener (a switch driver capable of L3 offloads) to believe a neighbour
entry is still valid, and is thus erroneously kept in the device's
table.

Fix that by sending a netevent after marking the neighbour as dead.

Fixes: a6bf9e933d ("mlxsw: spectrum_router: Offload neighbours based on NUD state change")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 12:31:18 -05:00
Dave Jones a98f917589 ipv6: handle -EFAULT from skb_copy_bits
By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
 [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 12:20:39 -05:00
Willem de Bruijn 39b2dd765e inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets
Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within
the packet. For sockets that have transport headers pulled, transport
offset can be negative. Use signed comparison to avoid overflow.

Fixes: e6afc8ace6 ("udp: remove headers from UDP packets before queueing")
Reported-by: Nisar Jagabar <njagabar@cloudmark.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 12:19:18 -05:00
Or Gerlitz d9724772e6 net/sched: cls_flower: Mandate mask when matching on flags
When matching on flags, we should require the user to provide the
mask and avoid using an all-ones mask. Not doing so causes matching
on flags provided w.o mask to hit on the value being unset for all
flags, which may not what the user wanted to happen.

Fixes: faa3ffce78 ('net/sched: cls_flower: Add support for matching on flags')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Paul Blakey <paulb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 11:59:56 -05:00
Or Gerlitz dc594ecd41 net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6
The UDP dst port was provided to the helper function which sets the
IPv6 IP tunnel meta-data under a wrong param order, fix that.

Fixes: 75bfbca01e ('net/sched: act_tunnel_key: Add UDP dst port option')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23 11:59:56 -05:00
Xin Long 6c5d5cfbe3 netfilter: ipt_CLUSTERIP: check duplicate config when initializing
Now when adding an ipt_CLUSTERIP rule, it only checks duplicate config in
clusterip_config_find_get(). But after that, there may be still another
thread to insert a config with the same ip, then it leaves proc_create_data
to do duplicate check.

It's more reasonable to check duplicate config by ipt_CLUSTERIP itself,
instead of checking it by proc fs duplicate file check. Before, when proc
fs allowed duplicate name files in a directory, It could even crash kernel
because of use-after-free.

This patch is to check duplicate config under the protection of clusterip
net lock when initializing a new config and correct the return err.

Note that it also moves proc file node creation after adding new config, as
proc_create_data may sleep, it couldn't be called under the clusterip_net
lock. clusterip_config_find_get returns NULL if c->pde is null to make sure
it can't be used until the proc file node creation is done.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-23 15:20:01 +01:00
Lorenzo Colitti 7d99569460 net: ipv4: Don't crash if passing a null sk to ip_do_redirect.
Commit e2d118a1cb ("net: inet: Support UID-based routing in IP
protocols.") made ip_do_redirect call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.

Fix this by getting the network namespace from the skb instead.

Fixes: e2d118a1cb ("net: inet: Support UID-based routing in IP protocols.")
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-22 11:15:10 -05:00
Eric Dumazet 0a9648f129 tcp: add a missing barrier in tcp_tasklet_func()
Madalin reported crashes happening in tcp_tasklet_func() on powerpc64

Before TSQ_QUEUED bit is cleared, we must ensure the changes done
by list_del(&tp->tsq_node); are committed to memory, otherwise
corruption might happen, as an other cpu could catch TSQ_QUEUED
clearance too soon.

We can notice that old kernels were immune to this bug, because
TSQ_QUEUED was cleared after a bh_lock_sock(sk)/bh_unlock_sock(sk)
section, but they could have missed a kick to write additional bytes,
when NIC interrupts for a given flow are spread to multiple cpus.

Affected TCP flows would need an incoming ACK or RTO timer to add more
packets to the pipe. So overall situation should be better now.

Fixes: b223feb9de ("tcp: tsq: add shortcut in tcp_tasklet_func()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Madalin Bucur <madalin.bucur@nxp.com>
Tested-by: Madalin Bucur <madalin.bucur@nxp.com>
Tested-by: Xing Lei <xing.lei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-21 15:30:27 -05:00
Geliang Tang a763f78cea RDS: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:22:49 -05:00
Geliang Tang 7f7cd56c33 net_sched: sch_netem: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:22:48 -05:00
Geliang Tang e124557d60 net_sched: sch_fq: use rb_entry()
To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:22:48 -05:00
Xin Long b8607805dd sctp: not copying duplicate addrs to the assoc's bind address list
sctp.local_addr_list is a global address list that is supposed to include
all the local addresses. sctp updates this list according to NETDEV_UP/
NETDEV_DOWN notifications.

However, if multiple NICs have the same address, the global list would
have duplicate addresses. Even if for one NIC, promote secondaries in
__inet_del_ifa can also lead to accumulating duplicate addresses.

When sctp binds address 'ANY' and creates a connection, it copies all
the addresses from global list into asoc's bind addr list, which makes
sctp pack the duplicate addresses into INIT/INIT_ACK packets.

This patch is to filter the duplicate addresses when copying the addrs
from global list in sctp_copy_local_addr_list and unpacking addr_param
from cookie in sctp_raw_to_bind_addrs to asoc's bind addr list.

Note that we can't filter the duplicate addrs when global address list
gets updated, As NETDEV_DOWN event may remove an addr that still exists
in another NIC.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:15:45 -05:00
Xin Long 165f2cf640 sctp: reduce indent level in sctp_copy_local_addr_list
This patch is to reduce indent level by using continue when the addr
is not allowed, and also drop end_copy by using break.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:15:44 -05:00
Jarno Rajahalme 87e159c59d openvswitch: Add a missing break statement.
Add a break statement to prevent fall-through from
OVS_KEY_ATTR_ETHERNET to OVS_KEY_ATTR_TUNNEL.  Without the break
actions setting ethernet addresses fail to validate with log messages
complaining about invalid tunnel attributes.

Fixes: 0a6410fbde ("openvswitch: netlink: support L3 packets")
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 14:07:41 -05:00
zheng li 0a28cfd51e ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
There is an inconsistent conditional judgement in __ip_append_data and
ip_finish_output functions, the variable length in __ip_append_data just
include the length of application's payload and udp header, don't include
the length of ip header, but in ip_finish_output use
(skb->len > ip_skb_dst_mtu(skb)) as judgement, and skb->len include the
length of ip header.

That causes some particular application's udp payload whose length is
between (MTU - IP Header) and MTU were fragmented by ip_fragment even
though the rst->dev support UFO feature.

Add the length of ip header to length in __ip_append_data to keep
consistent conditional judgement as ip_finish_output for ip fragment.

Signed-off-by: Zheng Li <james.z.li@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-20 10:45:22 -05:00
Johannes Berg 7b854982b2 Revert "rfkill: Add rfkill-any LED trigger"
This reverts commit 73f4f76a19.

As Mike reported, and I should've seen in review, we can't call
the new LED functions, which acquire the mutex, from places like
rfkill_set_sw_state() that are documented to be callable from
any context the user likes to use. For Mike's case it led to a
deadlock, but other scenarios are possible.

Reported-by: Михаил Кринкин <krinkin.m.u@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-20 08:45:54 +01:00
Linus Torvalds 52f40e9d65 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes and cleanups from David Miller:

 1) Revert bogus nla_ok() change, from Alexey Dobriyan.

 2) Various bpf validator fixes from Daniel Borkmann.

 3) Add some necessary SET_NETDEV_DEV() calls to hsis_femac and hip04
    drivers, from Dongpo Li.

 4) Several ethtool ksettings conversions from Philippe Reynes.

 5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.

 6) XDP support for virtio_net, from John Fastabend.

 7) Fix NAT handling within a vrf, from David Ahern.

 8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
  net: mv643xx_eth: fix build failure
  isdn: Constify some function parameters
  mlxsw: spectrum: Mark split ports as such
  cgroup: Fix CGROUP_BPF config
  qed: fix old-style function definition
  net: ipv6: check route protocol when deleting routes
  r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
  irda: w83977af_ir: cleanup an indent issue
  net: sfc: use new api ethtool_{get|set}_link_ksettings
  net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
  net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
  net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
  net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
  bpf: fix mark_reg_unknown_value for spilled regs on map value marking
  bpf: fix overflow in prog accounting
  bpf: dynamically allocate digest scratch buffer
  gtp: Fix initialization of Flags octet in GTPv1 header
  gtp: gtp_check_src_ms_ipv4() always return success
  net/x25: use designated initializers
  isdn: use designated initializers
  ...
2016-12-17 20:17:04 -08:00
David S. Miller 67a72a5891 Three fixes:
* avoid a WARN_ON() when trying to use WEP with AP_VLANs
  * ensure enough headroom on mesh forwarding packets
  * don't report unknown/invalid rates to userspace
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCgAGBQJYU+BMAAoJEGt7eEactAAdtnkP/1/Rftp3Ll2m+EqJn1yNUCiQ
 K/XC5xT1khSavb/+Tdv4W8Fy40y008Ljrn5a4erB1g0YhXQXbtp43hxmHrWa9Q8q
 pDngQspUUI14hX9JBaWsvMnvXNjy5KK2DsEHxdmjEYy8+eSzqwefc0VqBFvQoEhT
 X3WWTm4rgn2dtI9QhfSKp4eyStNRjavqAtUhPI1TqCYZRyLrICQ2nQ6yrTjbEN9Z
 Wtwgo3VUTv30MuSuijklgi7mXnY+yjj35qBYMqlnKtArSrMIVRExwN9LybWOVCOx
 k9o+aDJCNX21V1l2TC5IlRksQBbjhjcMP71v/SFTYJlmph8eS4ZtrApoZyr0v5TD
 1IBfgyHDl4jcsFi1v+PtgpGhh0OV+KHeh1k2lK2YTq2UfPdiuevv1OhAp3H0ZyPj
 ZpSEKpdKqOV/R8G3g6K0YTnf/Jvd5F6aVZcrgeQGNUD8QNDavUrdcfl1sqwCMrl6
 LVZmVh8bYPKEqGzpsG8Xa//4eu6Xbgz9ny49CXwLcpVnUbhOXZLawGoYd4g9Tb0/
 j3HbM5KNNCp5U9EY9jQ22ih2xgt3ntiBkZUNPUczroqZ1U1v/rdusOb7oiF520jo
 iWQeE2+XUYHIAokVE4iMP2SS55LI1BkDr5cGtysw5PSneVzpnLSMi3qIa2nZV1jA
 Z6w0fDb5gvYVpt0viXQE
 =QHRs
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2016-12-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Three fixes:
 * avoid a WARN_ON() when trying to use WEP with AP_VLANs
 * ensure enough headroom on mesh forwarding packets
 * don't report unknown/invalid rates to userspace
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 21:41:33 -05:00
Mantas M c2ed1880fd net: ipv6: check route protocol when deleting routes
The protocol field is checked when deleting IPv4 routes, but ignored for
IPv6, which causes problems with routing daemons accidentally deleting
externally set routes (observed by multiple bird6 users).

This can be verified using `ip -6 route del <prefix> proto something`.

Signed-off-by: Mantas Mikulėnas <grawity@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 21:37:06 -05:00
Kees Cook e999cb43d5 net/x25: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:56:57 -05:00
Kees Cook 9d1c0ca5e1 net: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:56:57 -05:00
Kees Cook 99a5e178bd ATM: use designated initializers
Prepare to mark sensitive kernel structures for randomization by making
sure they're using designated initializers. These were identified during
allyesconfig builds of x86, arm, and arm64, with most initializer fixes
extracted from grsecurity.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:56:57 -05:00
John Fastabend f23bc46c30 net: xdp: add invalid buffer warning
This adds a warning for drivers to use when encountering an invalid
buffer for XDP. For normal cases this should not happen but to catch
this in virtual/qemu setups that I may not have expected from the
emulation layer having a standard warning is useful.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:48:55 -05:00
Xin Long 08abb79542 sctp: sctp_transport_lookup_process should rcu_read_unlock when transport is null
Prior to this patch, sctp_transport_lookup_process didn't rcu_read_unlock
when it failed to find a transport by sctp_addrs_lookup_transport.

This patch is to fix it by moving up rcu_read_unlock right before checking
transport and also to remove the out path.

Fixes: 1cceda7849 ("sctp: fix the issue sctp_diag uses lock_sock in rcu_read_lock")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:43:23 -05:00
Xin Long 5cb2cd68dd sctp: sctp_epaddr_lookup_transport should be protected by rcu_read_lock
Since commit 7fda702f93 ("sctp: use new rhlist interface on sctp transport
rhashtable"), sctp has changed to use rhlist_lookup to look up transport, but
rhlist_lookup doesn't call rcu_read_lock inside, unlike rhashtable_lookup_fast.

It is called in sctp_epaddr_lookup_transport and sctp_addrs_lookup_transport.
sctp_addrs_lookup_transport is always in the protection of rcu_read_lock(),
as __sctp_lookup_association is called in rx path or sctp_lookup_association
which are in the protection of rcu_read_lock() already.

But sctp_epaddr_lookup_transport is called by sctp_endpoint_lookup_assoc, it
doesn't call rcu_read_lock, which may cause "suspicious rcu_dereference_check
usage' in __rhashtable_lookup.

This patch is to fix it by adding rcu_read_lock in sctp_endpoint_lookup_assoc
before calling sctp_epaddr_lookup_transport.

Fixes: 7fda702f93 ("sctp: use new rhlist interface on sctp transport rhashtable")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:43:23 -05:00
LABBE Corentin 616f6b4023 irda: irnet: add member name to the miscdevice declaration
Since the struct miscdevice have many members, it is dangerous to init
it without members name relying only on member order.

This patch add member name to the init declaration.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:29:31 -05:00
LABBE Corentin 33de4d1bb9 irda: irnet: Remove unused IRNET_MAJOR define
The IRNET_MAJOR define is not used, so this patch remove it.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:17:24 -05:00
LABBE Corentin 24c946cc5d irnet: ppp: move IRNET_MINOR to include/linux/miscdevice.h
This patch move the define for IRNET_MINOR to include/linux/miscdevice.h
It is better that all minor number definitions are in the same place.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:17:24 -05:00
LABBE Corentin e292823559 irda: irnet: Move linux/miscdevice.h include
The only use of miscdevice is irda_ppp so no need to include
linux/miscdevice.h for all irda files.
This patch move the linux/miscdevice.h include to irnet_ppp.h

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:16:31 -05:00
LABBE Corentin 078497a4d9 irda: irproc.c: Remove unneeded linux/miscdevice.h include
irproc.c does not use any miscdevice so this patch remove this
unnecessary inclusion.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:16:31 -05:00
Tom Herbert 0643ee4fd1 inet: Fix get port to handle zero port number with soreuseport set
A user may call listen with binding an explicit port with the intent
that the kernel will assign an available port to the socket. In this
case inet_csk_get_port does a port scan. For such sockets, the user may
also set soreuseport with the intent a creating more sockets for the
port that is selected. The problem is that the initial socket being
opened could inadvertently choose an existing and unreleated port
number that was already created with soreuseport.

This patch adds a boolean parameter to inet_bind_conflict that indicates
rather soreuseport is allowed for the check (in addition to
sk->sk_reuseport). In calls to inet_bind_conflict from inet_csk_get_port
the argument is set to true if an explicit port is being looked up (snum
argument is nonzero), and is false if port scan is done.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:13:19 -05:00
Tom Herbert 9af7e923fd inet: Don't go into port scan when looking for specific bind port
inet_csk_get_port is called with port number (snum argument) that may be
zero or nonzero. If it is zero, then the intent is to find an available
ephemeral port number to bind to. If snum is non-zero then the caller
is asking to allocate a specific port number. In the latter case we
never want to perform the scan in ephemeral port range. It is
conceivable that this can happen if the "goto again" in "tb_found:"
is done. This patch adds a check that snum is zero before doing
the "goto again".

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 11:13:19 -05:00
Paul Blakey f93bd17b91 net/sched: cls_flower: Use masked key when calling HW offloads
Zero bits on the mask signify a "don't care" on the corresponding bits
in key. Some HWs require those bits on the key to be zero. Since these
bits are masked anyway, it's okay to provide the masked key to all
drivers.

Fixes: 5b33f48842 ('net/flower: Introduce hardware offload support')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 10:44:35 -05:00
Paul Blakey 970bfcd097 net/sched: cls_flower: Use mask for addr_type
When addr_type is set, mask should also be set.

Fixes: 66530bdf85 ('sched,cls_flower: set key address type when present')
Fixes: bc3103f1ed ('net/sched: cls_flower: Classify packet in ip tunnels')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-17 10:44:35 -05:00
Linus Torvalds 59331c215d A varied set of changes:
- a large rework of cephx auth code to cope with CONFIG_VMAP_STACK
   (myself).  Also fixed a deadlock caused by a bogus allocation on the
   writeback path and authorize reply verification.
 
 - a fix for long stalls during fsync (Jeff Layton).  The client now
   has a way to force the MDS log flush, leading to ~100x speedups in
   some synthetic tests.
 
 - a new [no]require_active_mds mount option (Zheng Yan).  On mount, we
   will now check whether any of the MDSes are available and bail rather
   than block if none are.  This check can be avoided by specifying the
   "no" option.
 
 - a couple of MDS cap handling fixes and a few assorted patches
   throughout.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJYVByGAAoJEEp/3jgCEfOLBqkH/A7nVf7ObSDYmLuYgg1gJ8zq
 4zDDE42S4yZwayAVpn3UjbfPuez5J44lsdXitExdfiHOdIQZDa/WqAbSqQ48HCSg
 7sG6ecRWg3G5zG0psPZnB+S5wGMvsLXmj2hvzV1lt2t0lI5bDLSlNRSnElbhilD/
 8Z7+Ni2go8DMC9o49SJU32lBW7IByKl4p4flveItgwUvGkIFNd8OT3CyPBUqonQs
 lRCeImRYU8Jghb+ifnRxWSbuDf7pZAPc9kL0vibpUUT/1bH6iHsedKp37WQKqc/w
 KDSNnKiZcz0gY/hJeLqE3ymCIKO6SU+JkMQSaYNTouLO5fQsRr8/uWQXSe6S5oc=
 =ypWx
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A varied set of changes:

   - a large rework of cephx auth code to cope with CONFIG_VMAP_STACK
     (myself). Also fixed a deadlock caused by a bogus allocation on the
     writeback path and authorize reply verification.

   - a fix for long stalls during fsync (Jeff Layton). The client now
     has a way to force the MDS log flush, leading to ~100x speedups in
     some synthetic tests.

   - a new [no]require_active_mds mount option (Zheng Yan).

     On mount, we will now check whether any of the MDSes are available
     and bail rather than block if none are. This check can be avoided
     by specifying the "no" option.

   - a couple of MDS cap handling fixes and a few assorted patches
     throughout"

* tag 'ceph-for-4.10-rc1' of git://github.com/ceph/ceph-client: (32 commits)
  libceph: remove now unused finish_request() wrapper
  libceph: always signal completion when done
  ceph: avoid creating orphan object when checking pool permission
  ceph: properly set issue_seq for cap release
  ceph: add flags parameter to send_cap_msg
  ceph: update cap message struct version to 10
  ceph: define new argument structure for send_cap_msg
  ceph: move xattr initialzation before the encoding past the ceph_mds_caps
  ceph: fix minor typo in unsafe_request_wait
  ceph: record truncate size/seq for snap data writeback
  ceph: check availability of mds cluster on mount
  ceph: fix splice read for no Fc capability case
  ceph: try getting buffer capability for readahead/fadvise
  ceph: fix scheduler warning due to nested blocking
  ceph: fix printing wrong return variable in ceph_direct_read_write()
  crush: include mapper.h in mapper.c
  rbd: silence bogus -Wmaybe-uninitialized warning
  libceph: no need to drop con->mutex for ->get_authorizer()
  libceph: drop len argument of *verify_authorizer_reply()
  libceph: verify authorize reply on connect
  ...
2016-12-16 11:23:34 -08:00
Linus Torvalds ff0f962ca3 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This update contains:

   - try to clone on copy-up

   - allow renaming a directory

   - split source into managable chunks

   - misc cleanups and fixes

  It does not contain the read-only fd data inconsistency fix, which Al
  didn't like. I'll leave that to the next year..."

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (36 commits)
  ovl: fix reStructuredText syntax errors in documentation
  ovl: fix return value of ovl_fill_super
  ovl: clean up kstat usage
  ovl: fold ovl_copy_up_truncate() into ovl_copy_up()
  ovl: create directories inside merged parent opaque
  ovl: opaque cleanup
  ovl: show redirect_dir mount option
  ovl: allow setting max size of redirect
  ovl: allow redirect_dir to default to "on"
  ovl: check for emptiness of redirect dir
  ovl: redirect on rename-dir
  ovl: lookup redirects
  ovl: consolidate lookup for underlying layers
  ovl: fix nested overlayfs mount
  ovl: check namelen
  ovl: split super.c
  ovl: use d_is_dir()
  ovl: simplify lookup
  ovl: check lower existence of rename target
  ovl: rename: simplify handling of lower/merged directory
  ...
2016-12-16 10:58:12 -08:00
Linus Torvalds 759b2656b2 The one new feature is support for a new NFSv4.2 mode_umask attribute
that makes ACL inheritance a little more useful in environments that
 default to restrictive umasks.  Requires client-side support, also on
 its way for 4.10.
 
 Other than that, miscellaneous smaller fixes and cleanup, especially to
 the server rdma code.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYVAEqAAoJECebzXlCjuG+VM0QAKaR+ibSM31Ahpnrgit5/wrb
 n630KDFztO7iqEeuHfPQ4/n05T2QR0JWsLpjLMFvx88Gy4gyXYk9cuDPIrNKX1IS
 3/nnhBo0+EVnjODjufommCrtbPZlqOSsS3N03vWkB7rTi8QYsWBOThh+XLRJYOXo
 LZzJE1WmXNeCXV1kXPBsauryywql1fmwTXBzmIf1HbzoGAVROMEA2qqh4Z3nb7BP
 sJuGchWx0STBOuAa278ighXQPUW2lUft9uzw2bssOtMwfNyOs/Pd6nx4F1Lg6WwD
 1UQXoiR8K3PqelZfoeFJ05v0css/sbNKep+huWRdOXZj3Kjpa20lKBX8xHfat7sN
 1OQ4FHx8ToigX3c+wwtlCqRMCcIxqUYkRjqzPHyeBiSSSp0rLrId44rI5x/K0yay
 3bkGw7hFDSzc0Nq2uZgmtlbyTC71hLNhkWe7ThofcVG/pS0JtAqBiKIVwXJPh/e0
 PLmVHYGU6Xowjag5edJlXY1tlIlxtWfqsWUarCXS5bfKUa3UjMVSjyuljsDqqJsn
 96fEWu7DiUo4HeGYmf8MJoeZYV2y0DKSQGeguVkUKWp2DoTzinQHTfdKvrZVwNuu
 hVE9/QeWzUvPY13HOUaKD2skozhbUChqv0NHESKUv8gxE3svTEpYZkXrE74WNqMk
 l/WXAhw+RdKZof4+qdjU
 =JANY
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "The one new feature is support for a new NFSv4.2 mode_umask attribute
  that makes ACL inheritance a little more useful in environments that
  default to restrictive umasks. Requires client-side support, also on
  its way for 4.10.

  Other than that, miscellaneous smaller fixes and cleanup, especially
  to the server rdma code"

[ The client side of the umask attribute was merged yesterday ]

* tag 'nfsd-4.10' of git://linux-nfs.org/~bfields/linux:
  nfsd: add support for the umask attribute
  sunrpc: use DEFINE_SPINLOCK()
  svcrdma: Further clean-up of svc_rdma_get_inv_rkey()
  svcrdma: Break up dprintk format in svc_rdma_accept()
  svcrdma: Remove unused variable in rdma_copy_tail()
  svcrdma: Remove unused variables in xprt_rdma_bc_allocate()
  svcrdma: Remove svc_rdma_op_ctxt::wc_status
  svcrdma: Remove DMA map accounting
  svcrdma: Remove BH-disabled spin locking in svc_rdma_send()
  svcrdma: Renovate sendto chunk list parsing
  svcauth_gss: Close connection when dropping an incoming message
  svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm
  nfsd: constify reply_cache_stats_operations structure
  nfsd: update workqueue creation
  sunrpc: GFP_KERNEL should be GFP_NOFS in crypto code
  nfsd: catch errors in decode_fattr earlier
  nfsd: clean up supported attribute handling
  nfsd: fix error handling for clients that fail to return the layout
  nfsd: more robust allocation failure handling in nfsd_reply_cache_init
2016-12-16 10:48:28 -08:00
Linus Torvalds 9a19a6db37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - more ->d_init() stuff (work.dcache)

 - pathname resolution cleanups (work.namei)

 - a few missing iov_iter primitives - copy_from_iter_full() and
   friends. Either copy the full requested amount, advance the iterator
   and return true, or fail, return false and do _not_ advance the
   iterator. Quite a few open-coded callers converted (and became more
   readable and harder to fuck up that way) (work.iov_iter)

 - several assorted patches, the big one being logfs removal

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  logfs: remove from tree
  vfs: fix put_compat_statfs64() does not handle errors
  namei: fold should_follow_link() with the step into not-followed link
  namei: pass both WALK_GET and WALK_MORE to should_follow_link()
  namei: invert WALK_PUT logics
  namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
  namei: saner calling conventions for mountpoint_last()
  namei.c: get rid of user_path_parent()
  switch getfrag callbacks to ..._full() primitives
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
  [iov_iter] new primitives - copy_from_iter_full() and friends
  don't open-code file_inode()
  ceph: switch to use of ->d_init()
  ceph: unify dentry_operations instances
  lustre: switch to use of ->d_init()
2016-12-16 10:24:44 -08:00
Arend Van Spriel 505a2e882b nl80211: rework {sched_,}scan event related functions
A couple of functions used with scan events were named with
term "send" although they were only preparing the the event
message so renamed those.

Also remove nl80211_send_sched_scan_results() in favor of
just calling nl80211_send_sched_scan() with the right value.

Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com>
[mention nl80211_send_sched_scan_results() in the commit log]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-16 13:32:31 +01:00
Miklos Szeredi beef5121f3 Revert "af_unix: fix hard linked sockets on overlay"
This reverts commit eb0a4a47ae.

Since commit 51f7e52dc9 ("ovl: share inode for hard link") there's no
need to call d_real_inode() to check two overlay inodes for equality.

Side effect of this revert is that it's no longer possible to connect one
socket on overlayfs to one on the underlying layer (something which didn't
make sense anyway).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-12-16 11:02:53 +01:00
Arnd Bergmann f6b4122c93 rfkill: hide unused goto label
A cleanup introduced a harmless warning in some configurations:

net/rfkill/core.c: In function 'rfkill_init':
net/rfkill/core.c:1350:1: warning: label 'error_input' defined but not used [-Wunused-label]

This adds another #ifdef around the label to match that around the
caller.

Fixes: 6124c53ede ("rfkill: Cleanup error handling in rfkill_init()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-16 10:05:39 +01:00
Linus Torvalds 73e2e0c9b1 NFS client updates for Linux 4.10
Highlights include:
 
 Stable bugfixes:
  - Fix a pnfs deadlock between read resends and layoutreturn
  - Don't invalidate the layout stateid while a layout return is outstanding
  - Don't schedule a layoutreturn if the layout stateid is marked as invalid
  - On a pNFS error, do not send LAYOUTGET until the LAYOUTRETURN is complete
  - SUNRPC: fix refcounting problems with auth_gss messages.
 
 Features:
  - Add client support for the NFSv4 umask attribute.
  - NFSv4: Correct support for flock() stateids.
  - Add a LAYOUTRETURN operation to CLOSE and DELEGRETURN when return-on-close
    is specified
  - Allow the pNFS/flexfiles layoutstat information to piggyback on LAYOUTRETURN
  - Optimise away redundant GETATTR calls when doing state recovery and/or
    when not required by cache revalidation rules or close-to-open cache
    consistency.
  - Attribute cache improvements
  - RPC/RDMA support for SG_GAP devices
 
 Bugfixes:
  - NFS: Fix performance regressions in readdir
  - pNFS/flexfiles: Fix a deadlock on LAYOUTGET
  - NFSv4: Add missing nfs_put_lock_context()
  - NFSv4.1: Fix regression in callback retry handling
  - Fix false positive NFSv4.0 trunking detection.
  - pNFS/flexfiles: Only send layoutstats updates for mirrors that were updated
  - Various layout stateid related bugfixes
  - RPC/RDMA bugfixes
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYUyemAAoJEGcL54qWCgDy96wP/Ry86cknfLUqLKJCbFVV4nV8
 HdovCY8if8JQO0HUPDJ25ITvoRJNVRRwJMWnVq5XHRrPUHletDks6/UYfa63UDMv
 umHvGST1cQPU1G+vBIQ3sdkVi1X1GeyBY4rU8aDWxLyKWwyeNptCK12i80ifyaGV
 GZIIxuKVDOFS15M7NwMPRkrBacF8TyVK6S7275z6ZNmhFtvYwMAbvMxLabTwWAe8
 4A03m4RDBTYhQIc2xLJbHfOTYoHi34l90wrn3C7Wv0I2zp8EJlzCY2tSbYKhfPg7
 0HVKNdruRL+cHwLwJEcjFbxOg9MArgRxyup3dwAYQq7Ivsf9oR8/D61CDhanXAzy
 cAWyrCyxaAoPWCOb8k4OFRh6jOF9LBGb5WTNpXRi1LoGrbvi6/WLlJccV60325wd
 gmSAiwIE7aLG8pFk54J0Et86VaQ6qQNBUtJY/4m87uf1FSv3yzQvh7qDr7s+t8ZQ
 kmSTZJzMWZLEEeyvEPZCfjygFu7n4PuTePJu31217styvat39TpY2p0HaaMhgC0V
 /Y0ygGH7VlGp0oaVQ70CtBzGsCWTKU2DU8di7nvsCKg6iLv89QBILIJhVeP42tKd
 juNCWVw4bpW1Zex7HXKecKfMXkDJ4qSDLFzGWj6Ue85f/rCOSKQH01jfvwBlvtBc
 3E6fk85ExTw2+siHWiGy
 =MapM
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Stable bugfixes:
   - Fix a pnfs deadlock between read resends and layoutreturn
   - Don't invalidate the layout stateid while a layout return is
     outstanding
   - Don't schedule a layoutreturn if the layout stateid is marked as
     invalid
   - On a pNFS error, do not send LAYOUTGET until the LAYOUTRETURN is
     complete
   - SUNRPC: fix refcounting problems with auth_gss messages.

  Features:
   - Add client support for the NFSv4 umask attribute.
   - NFSv4: Correct support for flock() stateids.
   - Add a LAYOUTRETURN operation to CLOSE and DELEGRETURN when
     return-on-close is specified
   - Allow the pNFS/flexfiles layoutstat information to piggyback on
     LAYOUTRETURN
   - Optimise away redundant GETATTR calls when doing state recovery
     and/or when not required by cache revalidation rules or
     close-to-open cache consistency.
   - Attribute cache improvements
   - RPC/RDMA support for SG_GAP devices

  Bugfixes:
   - NFS: Fix performance regressions in readdir
   - pNFS/flexfiles: Fix a deadlock on LAYOUTGET
   - NFSv4: Add missing nfs_put_lock_context()
   - NFSv4.1: Fix regression in callback retry handling
   - Fix false positive NFSv4.0 trunking detection.
   - pNFS/flexfiles: Only send layoutstats updates for mirrors that were
     updated
   - Various layout stateid related bugfixes
   - RPC/RDMA bugfixes"

* tag 'nfs-for-4.10-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (82 commits)
  SUNRPC: fix refcounting problems with auth_gss messages.
  nfs: add support for the umask attribute
  pNFS/flexfiles: Ensure we have enough buffer for layoutreturn
  pNFS/flexfiles: Remove a redundant parameter in ff_layout_encode_ioerr()
  pNFS/flexfiles: Fix a deadlock on LAYOUTGET
  pNFS: Layoutreturn must free the layout after the layout-private data
  pNFS/flexfiles: Fix ff_layout_add_ds_error_locked()
  NFSv4: Add missing nfs_put_lock_context()
  pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid
  NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery()
  NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE
  NFS: Only look at the change attribute cache state in nfs_check_verifier
  NFS: Fix incorrect size revalidation when holding a delegation
  NFS: Fix incorrect mapping revalidation when holding a delegation
  pNFS/flexfiles: Support sending layoutstats in layoutreturn
  pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturn
  NFS: Fix up read of mirror stats
  pNFS/flexfiles: Clean up layoutstats
  pNFS/flexfiles: Refactor encoding of the layoutreturn payload
  pNFS: Add a layoutreturn callback to performa layout-private setup
  ...
2016-12-15 18:47:31 -08:00
Linus Torvalds ed3c5a0be3 virtio, vhost: new device, fixes, speedups
This includes the new virtio crypto device, and fixes all over the
 place.  In particular enabling endian-ness checks for sparse builds
 found some bugs which this fixes.  And it appears that everyone is in
 agreement that disabling endian-ness sparse checks shouldn't be
 necessary any longer.
 
 So this enables them for everyone, and drops __CHECK_ENDIAN__
 and __bitwise__ APIs.
 
 IRQ handling in virtio has been refactored somewhat, the
 larger switch to IRQ_SHARED will have to wait as
 it proved too aggressive.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJYUxYEAAoJECgfDbjSjVRp5lgH/22HKRyb3+M+z3oH6R9rJmz5
 T4y3XI4yDOTlh93VzxlrHjHNBnoWRvzV5hn6BKH6bTbSZ87TabNhfws11FKGvhER
 G1ipl/DvwytvvWgZ5dFdcC4x/0wpWawt2jgpEpPP33VDVkGJFEEAGj6GX10ClX99
 ggrNfzUCHOAFaIWzC29i7gYMnYHIJDUqK6ycDxZebzsE/c12SNRGASxei2D+6eYC
 YkdVg0c/d7Wsk+ZO1ugiA6omO4UdvPAVvxUkvd4YphRikwEWH7gGuz558wiSo4VN
 iEMZvyYXSEjx4B2Hg8+mH63zWROEpCmaToUix9+4AF7YhkaeX5fICNdkAPdtxc8=
 =urXH
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:
 "virtio, vhost: new device, fixes, speedups

  This includes the new virtio crypto device, and fixes all over the
  place. In particular enabling endian-ness checks for sparse builds
  found some bugs which this fixes. And it appears that everyone is in
  agreement that disabling endian-ness sparse checks shouldn't be
  necessary any longer.

  So this enables them for everyone, and drops the __CHECK_ENDIAN__ and
  __bitwise__ APIs.

  IRQ handling in virtio has been refactored somewhat, the larger switch
  to IRQ_SHARED will have to wait as it proved too aggressive"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits)
  Makefile: drop -D__CHECK_ENDIAN__ from cflags
  fs/logfs: drop __CHECK_ENDIAN__
  Documentation/sparse: drop __CHECK_ENDIAN__
  linux: drop __bitwise__ everywhere
  checkpatch: replace __bitwise__ with __bitwise
  Documentation/sparse: drop __bitwise__
  tools: enable endian checks for all sparse builds
  linux/types.h: enable endian checks for all sparse builds
  virtio_mmio: Set dev.release() to avoid warning
  vhost: remove unused feature bit
  virtio_ring: fix description of virtqueue_get_buf
  vhost/scsi: Remove unused but set variable
  tools/virtio: use {READ,WRITE}_ONCE() in uaccess.h
  vringh: kill off ACCESS_ONCE()
  tools/virtio: fix READ_ONCE()
  crypto: add virtio-crypto driver
  vhost: cache used event for better performance
  vsock: lookup and setup guest_cid inside vhost_vsock_lock
  virtio_pci: split vp_try_to_find_vqs into INTx and MSI-X variants
  virtio_pci: merge vp_free_vectors into vp_del_vqs
  ...
2016-12-15 18:13:41 -08:00
Michael S. Tsirkin 6bdf1e0efb Makefile: drop -D__CHECK_ENDIAN__ from cflags
That's the default now, no need for makefiles to set it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
2016-12-16 00:13:43 +02:00
Michael S. Tsirkin 9efeccacd3 linux: drop __bitwise__ everywhere
__bitwise__ used to mean "yes, please enable sparse checks
unconditionally", but now that we dropped __CHECK_ENDIAN__
__bitwise is exactly the same.
There aren't many users, replace it by __bitwise everywhere.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Akced-by: Lee Duncan <lduncan@suse.com>
2016-12-16 00:13:41 +02:00
Linus Torvalds 4d5b57e05a Updates for 4.10 kernel merge window
- Shared mlx5 updates with net stack (will drop out on merge if Dave's
   tree has already been merged)
 - Driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe
 - Debug cleanups
 - New connection rejection helpers
 - SRP updates
 - Various misc fixes
 - New paravirt driver from vmware
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJYUbAPAAoJELgmozMOVy/dMXcP/iuG5MNzfN8Ny1JftyBQGWg3
 cqoQ2OLj9CsXjwVB+5EqbcZHRZY852lKONaLoDKkIOx4YAXO2YuIKOp944vN7EQx
 96wfqzT1F5jzAcy5mYZXgLaStGFDAwejKMqeHd0LfJj3OEtemGnVPWYzyqSQmSKo
 dzJraS1Z9GIRppzU5WaRpB9PtRBkqIqGJ5vZ0EKLGhed5hYY5r0iMJB0GfriMRDO
 lJ4UUVfpsAoLPnqDBFH6IMn2V2UeAw9IR5zNa1mrM1RBfvt/uYTxrw1w3p9WoaNs
 GRodhk4DCeAfeyqzVPNBLyXZ4Zq4FzGe3UWM4qysJ1RR4oFNw9Cuw0Fqk8mrfznr
 7hv5TpGIckRZiKf8l6e+qLirF0qGtXJg29j2vPVQI9i5nSj95g1agA81PnLQlLLb
 flWyxeMj81my7lfMHN1xcV6pqPEKMCOysZmfcvVfJd2XxpjuVD7ekl/YXWp8o8kU
 YPdQMqPD626XsD8VpPdMszb9FPmx0JD0HEv+Y1rIFX8JegEI+c3H2X0dqC27T/Ou
 FEPWOy025EgHm0Fh/7eIzkG6tjZ4JHoCugJAcxNZGj2XW4eB6r5vY8UwJ8iQRv+n
 PVYHiy0UoIRePh0mrdOSSphGZMi/GO/DsqKwCtAMEK43WqZQju6wR7QSIGkh66mp
 4uSHJqpf3YEYylxGMhk3
 =QeGy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma updates from Doug Ledford:
 "This is the complete update for the rdma stack for this release cycle.

  Most of it is typical driver and core updates, but there is the
  entirely new VMWare pvrdma driver. You may have noticed that there
  were changes in DaveM's pull request to the bnxt Ethernet driver to
  support a RoCE RDMA driver. The bnxt_re driver was tentatively set to
  be pulled in this release cycle, but it simply wasn't ready in time
  and was dropped (a few review comments still to address, and some
  multi-arch build issues like prefetch() not working across all
  arches).

  Summary:

   - shared mlx5 updates with net stack (will drop out on merge if
     Dave's tree has already been merged)

   - driver updates: cxgb4, hfi1, hns-roce, i40iw, mlx4, mlx5, qedr, rxe

   - debug cleanups

   - new connection rejection helpers

   - SRP updates

   - various misc fixes

   - new paravirt driver from vmware"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (210 commits)
  IB: Add vmw_pvrdma driver
  IB/mlx4: fix improper return value
  IB/ocrdma: fix bad initialization
  infiniband: nes: return value of skb_linearize should be handled
  MAINTAINERS: Update Intel RDMA RNIC driver maintainers
  MAINTAINERS: Remove Mitesh Ahuja from emulex maintainers
  IB/core: fix unmap_sg argument
  qede: fix general protection fault may occur on probe
  IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
  mlx5, calc_sq_size(): Make a debug message more informative
  mlx5: Remove a set-but-not-used variable
  mlx5: Use { } instead of { 0 } to init struct
  IB/srp: Make writing the add_target sysfs attr interruptible
  IB/srp: Make mapping failures easier to debug
  IB/srp: Make login failures easier to debug
  IB/srp: Introduce a local variable in srp_add_one()
  IB/srp: Fix CONFIG_DYNAMIC_DEBUG=n build
  IB/multicast: Check ib_find_pkey() return value
  IPoIB: Avoid reading an uninitialized member variable
  IB/mad: Fix an array index check
  ...
2016-12-15 12:03:32 -08:00
Thomas Huehn 38252e9ef6 mac80211: minstrel: avoid port control frames for sampling
Makes connections more reliable

Signed-off-by: Thomas Huehn <thomas.huehn@evernet-eg.de>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:53 +01:00
Felix Fietkau 73b16589b8 mac80211: minstrel_ht: remove obsolete #if for >= 3 streams
This was added during early development when 3x3 hardware was not very
common yet. This is completely unnecessary now.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:53 +01:00
Felix Fietkau 4cae4cd10d mac80211: minstrel: make prob_ewma u16 instead of u32
Saves about 1.2 KiB memory per station

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:53 +01:00
Felix Fietkau 4f0bc9c61b mac80211: minstrel: store probability variance instead of standard deviation
This avoids the costly int_sqrt calls in the statistics update and moves
it to the debugfs code instead.
This also fixes an overflow in the previous standard deviation
calculation.

Signed-off-by: Thomas Huehn <thomas.huehn@evernet-eg.de>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:52 +01:00
Felix Fietkau 0217eefa64 mac80211: minstrel: reduce MINSTREL_SCALE
The loss of a bit of extra precision does not hurt the calculation, 12
bits is still enough to calculate probabilities well. Reducing the scale
makes it easier to avoid overflows

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:52 +01:00
Felix Fietkau 1109dc392e mac80211: minstrel: remove cur_prob from debugfs
This field is redundant, because it is simply last success divided by
last attempt count. Removing it from the rate stats struct saves about
1.2 KiB per HT station.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:52 +01:00
Felix Fietkau 95cd470c75 mac80211: check for MCS in ieee80211_duration before fetching chanctx
Makes the code a bit more efficient

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:52 +01:00
Felix Fietkau 70550df2d3 mac80211: minstrel_ht: make att_hist and succ_hist u32 instead of u64
They are only used for debugging purposes and take a very long time to
overflow. Visibly reduces the size of the per-sta rate control data.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:52 +01:00
Felix Fietkau 782dda00ab mac80211: minstrel_ht: move short preamble check out of get_rate
Test short preamble support in minstrel_ht_update_caps instead of
looking at the per-packet flag. Makes the code more efficient.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:52 +01:00
Felix Fietkau 41d085835d mac80211: minstrel_ht: move supported bitrate mask out of group data
Improves dcache footprint by ensuring that fewer cache lines need to be
touched.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 11:07:52 +01:00
Koen Vandeputte 0c2e384267 mac80211: only alloc mem if a station doesn't exist yet
This speeds up the function in case a station already exists by avoiding
calling an expensive kzalloc just to free it again after the next check.

Signed-off-by: Koen Vandeputte <koen.vandeputte@ncentric.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 10:56:52 +01:00
Ben Greear a17d93ff3a mac80211: fix legacy and invalid rx-rate report
This fixes obtaining the rate info via sta_set_sinfo
when the rx rate is invalid (for instance, on IBSS
interface that has received no frames from one of its
peers).

Also initialize rinfo->flags for legacy rates, to not
rely on the whole sinfo being initialized to zero.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-15 10:54:48 +01:00
Michael S. Tsirkin f83f12d660 vsock/virtio: fix src/dst cid format
These fields are 64 bit, using le32_to_cpu and friends
on these will not do the right thing.
Fix this up.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:59:19 +02:00
Michael S. Tsirkin 819483d806 vsock/virtio: mark an internal function static
virtio_transport_alloc_pkt is only used locally, make it static.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:59:19 +02:00
Michael S. Tsirkin 6c7efafdd5 vsock/virtio: add a missing __le annotation
guest cid is read from config space, therefore it's in little endian
format and is treated as such, annotate it accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2016-12-15 06:59:18 +02:00
Linus Torvalds a57cb1c1d7 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - a few misc things

 - kexec updates

 - DMA-mapping updates to better support networking DMA operations

 - IPC updates

 - various MM changes to improve DAX fault handling

 - lots of radix-tree changes, mainly to the test suite. All leading up
   to reimplementing the IDA/IDR code to be a wrapper layer over the
   radix-tree. However the final trigger-pulling patch is held off for
   4.11.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (114 commits)
  radix tree test suite: delete unused rcupdate.c
  radix tree test suite: add new tag check
  radix-tree: ensure counts are initialised
  radix tree test suite: cache recently freed objects
  radix tree test suite: add some more functionality
  idr: reduce the number of bits per level from 8 to 6
  rxrpc: abstract away knowledge of IDR internals
  tpm: use idr_find(), not idr_find_slowpath()
  idr: add ida_is_empty
  radix tree test suite: check multiorder iteration
  radix-tree: fix replacement for multiorder entries
  radix-tree: add radix_tree_split_preload()
  radix-tree: add radix_tree_split
  radix-tree: add radix_tree_join
  radix-tree: delete radix_tree_range_tag_if_tagged()
  radix-tree: delete radix_tree_locate_item()
  radix-tree: improve multiorder iterators
  btrfs: fix race in btrfs_free_dummy_fs_info()
  radix-tree: improve dump output
  radix-tree: make radix_tree_find_next_bit more useful
  ...
2016-12-14 17:25:18 -08:00
Matthew Wilcox 444306129a rxrpc: abstract away knowledge of IDR internals
Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
to IDR_SIZE.

Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Pablo Neira Ayuso 053d20f571 netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set
If the NFT_PAYLOAD_L4CSUM_PSEUDOHDR flag is set, then mangle layer 4
checksum. This should not depend on csum_type NFT_PAYLOAD_CSUM_INET
since IPv6 header has no checksum field, but still an update of any of
the pseudoheader fields may trigger a layer 4 checksum update.

Fixes: 1814096980 ("netfilter: nft_payload: layer 4 checksum adjustment for pseudoheader fields")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:39:11 +01:00
Florian Westphal 3e38df136e netfilter: nf_tables: fix oob access
BUG: KASAN: slab-out-of-bounds in nf_tables_rule_destroy+0xf1/0x130 at addr ffff88006a4c35c8
Read of size 8 by task nft/1607

When we've destroyed last valid expr, nft_expr_next() returns an invalid expr.
We must not dereference it unless it passes != nft_expr_last() check.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:39:06 +01:00
Pablo Neira Ayuso c2e756ff9e netfilter: nft_queue: use raw_smp_processor_id()
Using smp_processor_id() causes splats with PREEMPT_RCU:

[19379.552780] BUG: using smp_processor_id() in preemptible [00000000] code: ping/32389
[19379.552793] caller is debug_smp_processor_id+0x17/0x19
[...]
[19379.552823] Call Trace:
[19379.552832]  [<ffffffff81274e9e>] dump_stack+0x67/0x90
[19379.552837]  [<ffffffff8129a4d4>] check_preemption_disabled+0xe5/0xf5
[19379.552842]  [<ffffffff8129a4fb>] debug_smp_processor_id+0x17/0x19
[19379.552849]  [<ffffffffa07c42dd>] nft_queue_eval+0x35/0x20c [nft_queue]

No need to disable preemption since we only fetch the numeric value, so
let's use raw_smp_processor_id() instead.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:39:01 +01:00
Pablo Neira Ayuso 8010d7feb2 netfilter: nft_quota: reset quota after dump
Dumping of netlink attributes may fail due to insufficient room in the
skbuff, so let's reset consumed quota if we succeed to put netlink
attributes into the skbuff.

Fixes: 43da04a593 ("netfilter: nf_tables: atomic dump and reset for stateful objects")
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2016-12-14 23:38:51 +01:00
Linus Torvalds dcdaa2f948 Merge branch 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore:
 "After the small number of patches for v4.9, we've got a much bigger
  pile for v4.10.

  The bulk of these patches involve a rework of the audit backlog queue
  to enable us to move the netlink multicasting out of the task/thread
  that generates the audit record and into the kernel thread that emits
  the record (just like we do for the audit unicast to auditd).

  While we were playing with the backlog queue(s) we fixed a number of
  other little problems with the code, and from all the testing so far
  things look to be in much better shape now. Doing this also allowed us
  to re-enable disabling IRQs for some netns operations ("netns: avoid
  disabling irq for netns id").

  The remaining patches fix some small problems that are well documented
  in the commit descriptions, as well as adding session ID filtering
  support"

* 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit:
  audit: use proper refcount locking on audit_sock
  netns: avoid disabling irq for netns id
  audit: don't ever sleep on a command record/message
  audit: handle a clean auditd shutdown with grace
  audit: wake up kauditd_thread after auditd registers
  audit: rework audit_log_start()
  audit: rework the audit queue handling
  audit: rename the queues and kauditd related functions
  audit: queue netlink multicast sends just like we do for unicast sends
  audit: fixup audit_init()
  audit: move kaudit thread start from auditd registration to kaudit init (#2)
  audit: add support for session ID user filter
  audit: fix formatting of AUDIT_CONFIG_CHANGE events
  audit: skip sessionid sentinel value when auto-incrementing
  audit: tame initialization warning len_abuf in audit_log_execve_info
  audit: less stack usage for /proc/*/loginuid
2016-12-14 14:06:40 -08:00
Ilya Dryomov 45ee2c1d66 libceph: remove now unused finish_request() wrapper
Kill the wrapper and rename __finish_request() to finish_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-14 22:39:08 +01:00
Ilya Dryomov c297eb4269 libceph: always signal completion when done
r_safe_completion is currently, and has always been, signaled only if
on-disk ack was requested.  It's there for fsync and syncfs, which wait
for in-flight writes to flush - all data write requests set ONDISK.

However, the pool perm check code introduced in 4.2 sends a write
request with only ACK set.  An unfortunately timed syncfs can then hang
forever: r_safe_completion won't be signaled because only an unsafe
reply was requested.

We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
that is somewhat incomplete and yet another special case.  Instead,
rename this completion to r_done_completion and always signal it when
the OSD client is done with the request, whether unsafe, safe, or
error.  This is a bit cleaner and helps with the cancellation code.

Reported-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-14 22:39:08 +01:00
Paul Moore fba143c66a netns: avoid disabling irq for netns id
Bring back commit bc51dddf98 ("netns: avoid disabling irq for netns
id") now that we've fixed some audit multicast issues that caused
problems with original attempt.  Additional information, and history,
can be found in the links below:

 * https://github.com/linux-audit/audit-kernel/issues/22
 * https://github.com/linux-audit/audit-kernel/issues/23

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
2016-12-14 13:06:04 -05:00
Steve Wise 39384f04d0 rds_rdma: log the connection reject message
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-12-14 11:38:28 -05:00
Masashi Honma 445cd452fe mac80211: Use appropriate name for functions and messages
These functions drifts TSF timers, not TBTT.

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 16:22:27 +01:00
Masashi Honma 76f43b4c0a mac80211: Remove invalid flag operations in mesh TSF synchronization
mesh_sync_offset_adjust_tbtt() implements Extensible synchronization
framework ([1] 13.13.2 Extensible synchronization framework). It shall
not operate the flag "TBTT Adjusting subfield" ([1] 8.4.2.100.8 Mesh
Capability), since it is used only for MBCA ([1] 13.13.4 Mesh beacon
collision avoidance, see 13.13.4.4.3 TBTT scanning and adjustment
procedures for detail). So this patch remove the flag operations.

[1] IEEE Std 802.11 2012

Signed-off-by: Masashi Honma <masashi.honma@gmail.com>
[remove adjusting_tbtt entirely, since it's now unused]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 16:22:16 +01:00
Michał Kępień 73f4f76a19 rfkill: Add rfkill-any LED trigger
Add a new "global" (i.e. not per-rfkill device) LED trigger, rfkill-any,
which may be useful on laptops with a single "radio LED" and multiple
radio transmitters.  The trigger is meant to turn a LED on whenever
there is at least one radio transmitter active and turn it off
otherwise.

This requires taking rfkill_global_mutex before calling rfkill_set_block()
in rfkill_resume(): since __rfkill_any_led_trigger_event() is called from
rfkill_set_block() unconditionally, each caller of the latter needs to
take care of locking rfkill_global_mutex.

Signed-off-by: Michał Kępień <kernel@kempniu.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 16:20:29 +01:00
Michał Kępień 6124c53ede rfkill: Cleanup error handling in rfkill_init()
Use a separate label per error condition in rfkill_init() to make it a
bit cleaner and easier to extend.

Signed-off-by: Michał Kępień <kernel@kempniu.pl>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 16:20:29 +01:00
Cedric Izoard d8da0b5d64 mac80211: Ensure enough headroom when forwarding mesh pkt
When a buffer is duplicated during MESH packet forwarding,
this patch ensures that the new buffer has enough headroom.

Signed-off-by: Cedric Izoard <cedric.izoard@ceva-dsp.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 16:08:37 +01:00
Ben Greear 4a5eccaa93 mac80211: Show pending txqlen in debugfs.
Could be useful for debugging memory consumption issues,
and perhaps power-save as well.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2016-12-13 16:05:12 +01:00