Commit Graph

51983 Commits

Author SHA1 Message Date
Yafang Shao ea5d0c3249 tcp: add new SNMP counter for drops when try to queue in rcv queue
When sk_rmem_alloc is larger than the receive buffer and we can't
schedule more memory for it, the skb will be dropped.

In above situation, if this skb is put into the ofo queue,
LINUX_MIB_TCPOFODROP is incremented to track it.

While if this skb is put into the receive queue, there's no record.
So a new SNMP counter is introduced to track this behavior.

LINUX_MIB_TCPRCVQDROP:  Number of packets meant to be queued in rcv queue
			but dropped because socket rcvbuf limit hit.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 18:43:53 +09:00
Yuchung Cheng c860e997e9 tcp: fix Fast Open key endianness
Fast Open key could be stored in different endian based on the CPU.
Previously hosts in different endianness in a server farm using
the same key config (sysctl value) would produce different cookies.
This patch fixes it by always storing it as little endian to keep
same API for LE hosts.

Reported-by: Daniele Iamartino <danielei@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-30 18:40:46 +09:00
Simon Horman 0ed5269f9e net/sched: add tunnel option support to act_tunnel_key
Allow setting tunnel options using the act_tunnel_key action.

Options are expressed as class:type:data and multiple options
may be listed using a comma delimiter.

 # ip link add name geneve0 type geneve dstport 0 external
 # tc qdisc add dev eth0 ingress
 # tc filter add dev eth0 protocol ip parent ffff: \
     flower indev eth0 \
        ip_proto udp \
        action tunnel_key \
            set src_ip 10.0.99.192 \
            dst_ip 10.0.99.193 \
            dst_port 6081 \
            id 11 \
            geneve_opts 0102:80:00800022,0102:80:00800022 \
    action mirred egress redirect dev geneve0

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
Pieter Jansen van Vuuren 256c87c17c net: check tunnel option type in tunnel flags
Check the tunnel option type stored in tunnel flags when creating options
for tunnels. Thereby ensuring we do not set geneve, vxlan or erspan tunnel
options on interfaces that are not associated with them.

Make sure all users of the infrastructure set correct flags, for the BPF
helper we have to set all bits to keep backward compatibility.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
Simon Horman 9d7298cd1d net/sched: act_tunnel_key: add extended ack support
Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG
during validation of user input.

Cc: Alexander Aring <aring@mojatatu.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
Simon Horman a1165b5919 net/sched: act_tunnel_key: disambiguate metadata dst error cases
Metadata may be NULL for one of two reasons:
* Missing user input
* Failure to allocate the metadata dst

Disambiguate these case by returning -EINVAL for the former and -ENOMEM
for the latter rather than -EINVAL for both cases.

This is in preparation for using extended ack to provide more information
to users when parsing their input.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 23:50:26 +09:00
Christoph Hellwig e88958e636 net: handle NULL ->poll gracefully
The big aio poll revert broke various network protocols that don't
implement ->poll as a patch in the aio poll serie removed sock_no_poll
and made the common code handle this case.

Reported-by: syzbot+57727883dbad76db2ef0@syzkaller.appspotmail.com
Reported-by: syzbot+cdb0d3176b53d35ad454@syzkaller.appspotmail.com
Reported-by: syzbot+2c7e8f74f8b2571c87e8@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Fixes: a11e1d432b ("Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-29 06:51:51 -07:00
Xin Long b0e9a2fe3f sctp: add support for SCTP_REUSE_PORT sockopt
This feature is actually already supported by sk->sk_reuse which can be
set by socket level opt SO_REUSEADDR. But it's not working exactly as
RFC6458 demands in section 8.1.27, like:

  - This option only supports one-to-one style SCTP sockets
  - This socket option must not be used after calling bind()
    or sctp_bindx().

Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
work in linux.

To separate it from the socket level version, this patch adds 'reuse' in
sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
setup limitations that are needed when it is being enabled.

"It should be noted that the behavior of the socket-level socket option
to reuse ports and/or addresses for SCTP sockets is unspecified", so it
leaves SO_REUSEADDR as is for the compatibility.

Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
functionality is nearly identical to SO_REUSEADDR, but with some
extra restrictions. Here it uses 'reuse' in sctp_sock instead of
'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
added in another patch.

Thanks to Neil to make this clear.

v1->v2:
  - add sctp_sk->reuse to separate it from the socket level version.
v2->v3:
  - improve changelog according to Marcelo's suggestion.

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 22:20:55 +09:00
David S. Miller 0933cc294f Just three fixes:
* fix HT operation in mesh mode
  * disable preemption in control frame TX
  * check nla_parse_nested() return values
    where missing (two places)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAls15CcACgkQB8qZga/f
 l8RUkw/+IQjrtvjKN0Y5rdEy8lgvmd16/umIp6XQ+uBDi3sOrcQRy6G0hwMHg+4F
 HZ7UbsazCVDwhwtTL/BPTJLWBJRDM9hTIppw6A4yQy7b9dp7EJVfgk4FxC2nsD1z
 rTzNdTDtrXsLleBOjPxPV0/q6uw4B6fjXhJZvvtPXZUhaB3Uf1gEKrDTMDqHbh5K
 GbPI1DU6k5M0A8co9wwrEyPGBlq0DRd8amftMWzIP0EOZSCK2BD3zIscZkRCT7dG
 +Jd2SFhuvhKvFEHuoQPMxAzotsDpIP7q2ejOSSSs3bHaindTWVmzi7XZgIQ3rT/O
 P8pBV9G4LLaquLNSvM+ihGUxGJXUrYFdsuchvnVr/tYYf3ozGh24XcodVihg47RC
 T7r0y07Ai4wuxI0aYoP26L5pANl4AEsyooKe5lOArUpHDGDWC9c9LiZAveg0d7vJ
 5UtWc+Zlf5n93mHd9/52ktCRo7jqEYFMEU5phwrCKn+YJV6u73BggHp79aAidi9I
 WMOHiLyiOsSfq8PmqASpV2/mz6oStCG188tto6o5CmeMVk6F1EbiMThajdikoNoM
 6K5kGHXYi/wGz4wsohUqWG2KQEM6+pxZeF2GFktGCuKsx7BDfR9V7K2Oub+3VD9Y
 LZXWHYRRq0YDUzXOEaUz1292QE8OAbYyWnNAZTfc8B7whrA5wRA=
 =g4XY
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
Just three fixes:
 * fix HT operation in mesh mode
 * disable preemption in control frame TX
 * check nla_parse_nested() return values
   where missing (two places)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 22:09:26 +09:00
Shakeel Butt e699e2c6a6 net, mm: account sock objects to kmemcg
Currently the kernel accounts the memory for network traffic through
mem_cgroup_[un]charge_skmem() interface. However the memory accounted
only includes the truesize of sk_buff which does not include the size of
sock objects. In our production environment, with opt-out kmem
accounting, the sock kmem caches (TCP[v6], UDP[v6], RAW[v6], UNIX) are
among the top most charged kmem caches and consume a significant amount
of memory which can not be left as system overhead. So, this patch
converts the kmem caches of all sock objects to SLAB_ACCOUNT.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 21:56:27 +09:00
Omer Efrat a421775058 mac80211: use BIT_ULL for NL80211_STA_INFO_* attribute types
The BIT macro uses unsigned long which some architectures handle as 32 bit
and therefore might cause macro's shift to overflow when used on a value
equals or larger than 32 (NL80211_STA_INFO_RX_DURATION and afterwards).

Since 'filled' member in station_info changed to u64, BIT_ULL macro
should be used with all NL80211_STA_INFO_* attribute types instead of BIT
to prevent future possible bugs when one will use BIT macro for higher
attributes by mistake.

This commit cleans up all usages of BIT macro with the above field
in mac80211 by changing it to BIT_ULL instead.

Signed-off-by: Omer Efrat <omer.efrat@tandemg.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:53:09 +02:00
Omer Efrat 397c657a06 cfg80211: use BIT_ULL for NL80211_STA_INFO_* attribute types
The BIT macro uses unsigned long which some architectures handle as 32 bit
and therefore might cause macro's shift to overflow when used on a value
equals or larger than 32 (NL80211_STA_INFO_RX_DURATION and afterwards).

Since 'filled' member in station_info changed to u64, BIT_ULL macro
should be used with all NL80211_STA_INFO_* attribute types instead of BIT
to prevent future possible bugs when one will use BIT macro for higher
attributes by mistake.

This commit cleans up all usages of BIT macro with the above field
in cfg80211 by changing it to BIT_ULL instead. In addition, there are
some places which don't use BIT nor BIT_ULL macros so align those as well.

Signed-off-by: Omer Efrat <omer.efrat@tandemg.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:52:23 +02:00
Johannes Berg f0c0407d2a mac80211: remove unnecessary NULL check
We don't need to check if he_oper is NULL before calling
ieee80211_verify_sta_he_mcs_support() as it - now - will
correctly check this itself. Remove the redundant check.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:51:39 +02:00
Gustavo A. R. Silva 47aa7861b9 mac80211: fix potential null pointer dereference
he_op is being dereferenced before it is null checked, hence there
is a potential null pointer dereference.

Fix this by moving the pointer dereference after he_op has been
properly null checked.

Notice that, currently, he_op is already being null checked before
calling this function at 4593:

4593	if (!he_oper ||
4594	    !ieee80211_verify_sta_he_mcs_support(sband, he_oper))
4595		ifmgd->flags |= IEEE80211_STA_DISABLE_HE;

but in case ieee80211_verify_sta_he_mcs_support is ever called
without verifying he_oper is not null, we will end up having a
null pointer dereference. So, we better don't take any chances.

Addresses-Coverity-ID: 1470068 ("Dereference before null check")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:50:43 +02:00
Arnd Bergmann fe0984d389 cfg80211: track time using boottime
The cfg80211 layer uses get_seconds() to read the current time
in its supend handling. This function is deprecated because of the 32-bit
time_t overflow, and it can cause unexpected behavior when the time
changes due to settimeofday() calls or leap second updates.

In many cases, we want to use monotonic time instead, however cfg80211
explicitly tracks the time spent in suspend, so this changes the
driver over to use ktime_get_boottime_seconds(), which is slightly
slower, but not used in a fastpath here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:49:28 +02:00
Johannes Berg 95bca62fb7 nl80211: check nla_parse_nested() return values
At the very least we should check the return value if
nla_parse_nested() is called with a non-NULL policy.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:44:51 +02:00
Bob Copeland 188f60ab8e nl80211: relax ht operation checks for mesh
Commit 9757235f45, "nl80211: correct checks for
NL80211_MESHCONF_HT_OPMODE value") relaxed the range for the HT
operation field in meshconf, while also adding checks requiring
the non-greenfield and non-ht-sta bits to be set in certain
circumstances.  The latter bit is actually reserved for mesh BSSes
according to Table 9-168 in 802.11-2016, so in fact it should not
be set.

wpa_supplicant sets these bits because the mesh and AP code share
the same implementation, but authsae does not.  As a result, some
meshconf updates from authsae which set only the NONHT_MIXED
protection bits were being rejected.

In order to avoid breaking userspace by changing the rules again,
simply accept the values with or without the bits set, and mask
off the reserved bit to match the spec.

While in here, update the 802.11-2012 reference to 802.11-2016.

Fixes: 9757235f45 ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value")
Cc: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Bob Copeland <bobcopeland@fb.com>
Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
Reviewed-by: Masashi Honma <masashi.honma@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:39:30 +02:00
Denis Kenzior e7441c9274 mac80211: disable BHs/preemption in ieee80211_tx_control_port()
On pre-emption enabled kernels the following print was being seen due to
missing local_bh_disable/local_bh_enable calls.  mac80211 assumes that
pre-emption is disabled in the data path.

    BUG: using smp_processor_id() in preemptible [00000000] code: iwd/517
    caller is __ieee80211_subif_start_xmit+0x144/0x210 [mac80211]
    [...]
    Call Trace:
    dump_stack+0x5c/0x80
    check_preemption_disabled.cold.0+0x46/0x51
    __ieee80211_subif_start_xmit+0x144/0x210 [mac80211]

Fixes: 9118064914 ("mac80211: Add support for tx_control_port")
Signed-off-by: Denis Kenzior <denkenz@gmail.com>
[commit message rewrite, fixes tag]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-29 09:39:08 +02:00
Tom Herbert b6e71bdebb ila: Flush netlink command to clear xlat table
Add ILA_CMD_FLUSH netlink command to clear the ILA translation table.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 11:32:55 +09:00
Tom Herbert ad68147ef2 ila: Create main ila source file
Create a main ila file that contains the module initialization functions
as well as netlink definitions. Previously these were defined in
ila_xlat and ila_common. This approach allows better extensibility.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 11:32:55 +09:00
Tom Herbert b893281715 ila: Call library function alloc_bucket_locks
To allocate the array of bucket locks for the hash table we now
call library function alloc_bucket_spinlocks.

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 11:32:55 +09:00
Tom Herbert f7a2ba5ab9 ila: Fix use of rhashtable walk in ila_xlat.c
Perform better EAGAIN handling, handle case where ila_dump_info
fails and we missed objects in the dump, and add a skip index
to skip over ila entires in a list on a rhashtable node that have
already been visited (by a previous call to ila_nl_dump).

Signed-off-by: Tom Herbert <tom@quantonium.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-29 11:32:55 +09:00
David Ahern 4c79579b44 bpf: Change bpf_fib_lookup to return lookup status
For ACLs implemented using either FIB rules or FIB entries, the BPF
program needs the FIB lookup status to be able to drop the packet.
Since the bpf_fib_lookup API has not reached a released kernel yet,
change the return code to contain an encoding of the FIB lookup
result and return the nexthop device index in the params struct.

In addition, inform the BPF program of any post FIB lookup reason as
to why the packet needs to go up the stack.

The fib result for unicast routes must have an egress device, so remove
the check that it is non-NULL.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-29 00:02:02 +02:00
Linus Torvalds a11e1d432b Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL
The poll() changes were not well thought out, and completely
unexplained.  They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead.  That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case.  The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
  individually because that would just be unnecessarily messy  - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-06-28 10:40:47 -07:00
Flavio Leitner 9c4c325252 skbuff: preserve sock reference when scrubbing the skb.
The sock reference is lost when scrubbing the packet and that breaks
TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
performance impacts of about 50% in a single TCP stream when crossing
network namespaces.

XPS breaks because the queue mapping stored in the socket is not
available, so another random queue might be selected when the stack
needs to transmit something like a TCP ACK, or TCP Retransmissions.
That causes packet re-ordering and/or performance issues.

TSQ breaks because it orphans the packet while it is still in the
host, so packets are queued contributing to the buffer bloat problem.

Preserving the sock reference fixes both issues. The socket is
orphaned anyways in the receiving path before any relevant action
and on TX side the netfilter checks if the reference is local before
use it.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:21:32 +09:00
Flavio Leitner f564650106 netfilter: check if the socket netns is correct.
Netfilter assumes that if the socket is present in the skb, then
it can be used because that reference is cleaned up while the skb
is crossing netns.

We want to change that to preserve the socket reference in a future
patch, so this is a preparation updating netfilter to check if the
socket netns matches before use it.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:21:32 +09:00
Roman Mashak 4305274153 net sched actions: avoid bitwise operation on signed value in pedit
Since char can be unsigned or signed, and bitwise operators may have
implementation-dependent results when performed on signed operands,
declare 'u8 *' operand instead.

Suggested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:12:03 +09:00
Roman Mashak 95b0d2dc13 net sched actions: fix misleading text strings in pedit action
Change "tc filter pedit .." to "tc actions pedit .." in error
messages to clearly refer to pedit action.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:12:03 +09:00
Roman Mashak 6ff7586e38 net sched actions: use sizeof operator for buffer length
Replace constant integer with sizeof() to clearly indicate
the destination buffer length in skb_header_pointer() calls.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:12:03 +09:00
Roman Mashak 544377cd25 net sched actions: fix sparse warning
The variable _data in include/asm-generic/sections.h defines sections,
this causes sparse warning in pedit:

net/sched/act_pedit.c:293:35: warning: symbol '_data' shadows an earlier one
./include/asm-generic/sections.h:36:13: originally declared here

Therefore rename the variable.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:12:03 +09:00
Roman Mashak 80f0f574cc net sched actions: fix coding style in pedit action
Fix coding style issues in tc pedit action detected by the
checkpatch script.

Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:12:03 +09:00
Yousuk Seung 0a9fe5c375 netem: slotting with non-uniform distribution
Extend slotting with support for non-uniform distributions. This is
similar to netem's non-uniform distribution delay feature.

Commit f043efeae2f1 ("netem: support delivering packets in delayed
time slots") added the slotting feature to approximate the behaviors
of media with packet aggregation but only supported a uniform
distribution for delays between transmission attempts. Tests with TCP
BBR with emulated wifi links with non-uniform distributions produced
more useful results.

Syntax:
   slot dist DISTRIBUTION DELAY JITTER [packets MAX_PACKETS] \
      [bytes MAX_BYTES]

The syntax and use of the distribution table is the same as in the
non-uniform distribution delay feature. A file DISTRIBUTION must be
present in TC_LIB_DIR (e.g. /usr/lib/tc) containing numbers scaled by
NETEM_DIST_SCALE. A random value x is selected from the table and it
takes DELAY + ( x * JITTER ) as delay. Correlation between values is not
supported.

Examples:
  Normal distribution delay with mean = 800us and stdev = 100us.
  > tc qdisc add dev eth0 root netem slot dist normal 800us 100us

  Optionally set the max slot size in bytes and/or packets.
  > tc qdisc add dev eth0 root netem slot dist normal 800us 100us \
    bytes 64k packets 42

Signed-off-by: Yousuk Seung <ysseung@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:06:24 +09:00
Ursula Braun 24ac3a08e6 net/smc: rebuild nonblocking connect
The recent poll change may lead to stalls for non-blocking connecting
SMC sockets, since sock_poll_wait is no longer performed on the
internal CLC socket, but on the outer SMC socket.  kernel_connect() on
the internal CLC socket returns with -EINPROGRESS, but the wake up
logic does not work in all cases. If the internal CLC socket is still
in state TCP_SYN_SENT when polled, sock_poll_wait() from sock_poll()
does not sleep. It is supposed to sleep till the state of the internal
CLC socket switches to TCP_ESTABLISHED.

This problem triggered a redesign of the SMC nonblocking connect logic.
This patch introduces a connect worker covering all connect steps
followed by a wake up of socket waiters. It allows to get rid of all
delays and locks in smc_poll().

Fixes: c0129a0614 ("smc: convert to ->poll_mask")
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:03:55 +09:00
Eric Dumazet 15ecbe94a4 tcp: add one more quick ack after after ECN events
Larry Brakmo proposal ( https://patchwork.ozlabs.org/patch/935233/
tcp: force cwnd at least 2 in tcp_cwnd_reduction) made us rethink
about our recent patch removing ~16 quick acks after ECN events.

tcp_enter_quickack_mode(sk, 1) makes sure one immediate ack is sent,
but in the case the sender cwnd was lowered to 1, we do not want
to have a delayed ack for the next packet we will receive.

Fixes: 522040ea5f ("tcp: do not aggressively quick ack after ECN events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 22:01:04 +09:00
Masahiro Yamada 8e75887d32 bpfilter: include bpfilter_umh in assembly instead of using objcopy
What we want here is to embed a user-space program into the kernel.
Instead of the complex ELF magic, let's simply wrap it in the assembly
with the '.incbin' directive.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 21:39:16 +09:00
Doron Roberts-Kedes 977c7114eb strparser: Remove early eaten to fix full tcp receive buffer stall
On receving an incomplete message, the existing code stores the
remaining length of the cloned skb in the early_eaten field instead of
incrementing the value returned by __strp_recv. This defers invocation
of sock_rfree for the current skb until the next invocation of
__strp_recv, which returns early_eaten if early_eaten is non-zero.

This behavior causes a stall when the current message occupies the very
tail end of a massive skb, and strp_peek/need_bytes indicates that the
remainder of the current message has yet to arrive on the socket. The
TCP receive buffer is totally full, causing the TCP window to go to
zero, so the remainder of the message will never arrive.

Incrementing the value returned by __strp_recv by the amount otherwise
stored in early_eaten prevents stalls of this nature.

Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 21:37:26 +09:00
Guillaume Nault a408194aa0 l2tp: define helper for parsing struct sockaddr_pppol2tp*
'sockaddr_len' is checked against various values when entering
pppol2tp_connect(), to verify its validity. It is used again later, to
find out which sockaddr structure was passed from user space. This
patch combines these two operations into one new function in order to
simplify pppol2tp_connect().

A new structure, l2tp_connect_info, is used to pass sockaddr data back
to pppol2tp_connect(), to avoid passing too many parameters to
l2tp_sockaddr_get_info(). Also, the first parameter is void* in order
to avoid casting between all sockaddr_* structures manually.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 16:06:50 +09:00
Eric Dumazet 242b1bbe51 tcp: remove one indentation level in tcp_create_openreq_child
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 16:02:31 +09:00
Masahiro Yamada 88e85a7daf bpfilter: check compiler capability in Kconfig
With the brand-new syntax extension of Kconfig, we can directly
check the compiler capability in the configuration phase.

If the cc-can-link.sh fails, the BPFILTER_UMH is automatically
hidden by the dependency.

I also deleted 'default n', which is no-op.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 13:36:39 +09:00
David S. Miller 0901441839 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree:

1) Missing netlink attribute validation in nf_queue, uncovered by KASAN,
   from Eric Dumazet.

2) Use pointer to sysctl table, save us 192 bytes of memory per netns.
   Also from Eric.

3) Possible use-after-free when removing conntrack helper modules due
   to missing synchronize RCU call. From Taehee Yoo.

4) Fix corner case in systcl writes to nf_log that lead to appending
   data to uninitialized buffer, from Jann Horn.

5) Jann Horn says we may indefinitely block other users of nf_log_mutex
   if a userspace access in proc_dostring() blocked e.g. due to a
   userfaultfd.

6) Fix garbage collection race for unconfirmed conntrack entries,
   from Florian Westphal.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 13:32:44 +09:00
Roopa Prabhu 8e326289e3 neighbour: force neigh_invalidate when NUD_FAILED update is from admin
In systems where neigh gc thresh holds are set to high values,
admin deleted neigh entries (eg ip neigh flush or ip neigh del) can
linger around in NUD_FAILED state for a long time until periodic gc kicks
in. This patch forces neigh_invalidate when NUD_FAILED neigh_update is
from an admin.

Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-27 15:40:45 +09:00
Kees Cook 3463e51dc3 net/tls: Remove VLA usage on nonce
It looks like the prior VLA removal, commit b16520f749 ("net/tls: Remove
VLA usage"), and a new VLA addition, commit c46234ebb4 ("tls: RX path
for ktls"), passed in the night. This removes the newly added VLA, which
happens to have its bounds based on the same max value.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-27 10:39:52 +09:00
Jason A. Donenfeld 7c8f4e6dc3 fib_rules: match rules based on suppress_* properties too
Two rules with different values of suppress_prefix or suppress_ifgroup
are not the same. This fixes an -EEXIST when running:

   $ ip -4 rule add table main suppress_prefixlength 0

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Fixes: f9d4b0c1e9 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule")
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-27 10:33:05 +09:00
Sowmini Varadhan c809195f55 rds: clean up loopback rds_connections on netns deletion
The RDS core module creates rds_connections based on callbacks
from rds_loop_transport when sending/receiving packets to local
addresses.

These connections will need to be cleaned up when they are
created from a netns that is not init_net, and that netns is deleted.

Add the changes aligned with the changes from
commit ebeeb1ad9b ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management") for
rds_loop_transport

Reported-and-tested-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-27 10:11:03 +09:00
Florian Westphal b36e4523d4 netfilter: nf_conncount: fix garbage collection confirm race
Yi-Hung Wei and Justin Pettit found a race in the garbage collection scheme
used by nf_conncount.

When doing list walk, we lookup the tuple in the conntrack table.
If the lookup fails we remove this tuple from our list because
the conntrack entry is gone.

This is the common cause, but turns out its not the only one.
The list entry could have been created just before by another cpu, i.e. the
conntrack entry might not yet have been inserted into the global hash.

The avoid this, we introduce a timestamp and the owning cpu.
If the entry appears to be stale, evict only if:
 1. The current cpu is the one that added the entry, or,
 2. The timestamp is older than two jiffies

The second constraint allows GC to be taken over by other
cpu too (e.g. because a cpu was offlined or napi got moved to another
cpu).

We can't pretend the 'doubtful' entry wasn't in our list.
Instead, when we don't find an entry indicate via IS_ERR
that entry was removed ('did not exist' or withheld
('might-be-unconfirmed').

This most likely also fixes a xt_connlimit imbalance earlier reported by
Dmitry Andrianov.

Cc: Dmitry Andrianov <dmitry.andrianov@alertme.com>
Reported-by: Justin Pettit <jpettit@vmware.com>
Reported-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-26 18:28:57 +02:00
Jann Horn ce00bf07cc netfilter: nf_log: don't hold nf_log_mutex during user access
The old code would indefinitely block other users of nf_log_mutex if
a userspace access in proc_dostring() blocked e.g. due to a userfaultfd
region. Fix it by moving proc_dostring() out of the locked region.

This is a followup to commit 266d07cb1c ("netfilter: nf_log: fix
sleeping function called from invalid context"), which changed this code
from using rcu_read_lock() to taking nf_log_mutex.

Fixes: 266d07cb1c ("netfilter: nf_log: fix sleeping function calle[...]")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-26 16:48:40 +02:00
Jann Horn dffd22aed2 netfilter: nf_log: fix uninit read in nf_log_proc_dostring
When proc_dostring() is called with a non-zero offset in strict mode, it
doesn't just write to the ->data buffer, it also reads. Make sure it
doesn't read uninitialized data.

Fixes: c6ac37d8d8 ("netfilter: nf_log: fix error on write NONE to [...]")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-26 16:48:23 +02:00
John Hurley 326367427c net: sched: call reoffload op on block callback reg
Call the reoffload tcf_proto_op on all tcf_proto nodes in all chains of a
block when a callback tries to register to a block that already has
offloaded rules. If all existing rules cannot be offloaded then the
registration is rejected. This replaces the previous policy of rejecting
such callback registration outright.

On unregistration of a callback, the rules are flushed for that given cb.
The implementation of block sharing in the NFP driver, for example,
duplicates shared rules to all devs bound to a block. This meant that
rules could still exist in hw even after a device is unbound from a block
(assuming the block still remains active).

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:33 +09:00
John Hurley 7e916b7680 net: sched: cls_bpf: implement offload tcf_proto_op
Add the offload tcf_proto_op in cls_bpf to generate an offload message for
each bpf prog in the given tcf_proto. Call the specified callback with
this new offload message. The function only returns an error if the
callback rejects adding a 'hardware only' prog.

A prog contains a flag to indicate if it is in hardware or not. To
ensure the offload function properly maintains this flag, keep a reference
counter for the number of instances of the prog that are in hardware. Only
update the flag when this counter changes from or to 0.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:33 +09:00
John Hurley 530d995123 net: sched: cls_u32: implement offload tcf_proto_op
Add the offload tcf_proto_op in cls_u32 to generate an offload message for
each filter and the hashtable in the given tcf_proto. Call the specified
callback with this new offload message. The function only returns an error
if the callback rejects adding a 'hardware only' rule.

A filter contains a flag to indicate if it is in hardware or not. To
ensure the offload function properly maintains this flag, keep a reference
counter for the number of instances of the filter that are in hardware.
Only update the flag when this counter changes from or to 0.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:33 +09:00
John Hurley 0efd1b3a13 net: sched: cls_matchall: implement offload tcf_proto_op
Add the reoffload tcf_proto_op in matchall to generate an offload message
for each filter in the given tcf_proto. Call the specified callback with
this new offload message. The function only returns an error if the
callback rejects adding a 'hardware only' rule.

Ensure matchall flags correctly report if the rule is in hw by keeping a
reference counter for the number of instances of the rule offloaded. Only
update the flag when this counter changes from or to 0.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:33 +09:00
John Hurley 31533cba43 net: sched: cls_flower: implement offload tcf_proto_op
Add the reoffload tcf_proto_op in flower to generate an offload message
for each filter in the given tcf_proto. Call the specified callback with
this new offload message. The function only returns an error if the
callback rejects adding a 'hardware only' rule.

A filter contains a flag to indicate if it is in hardware or not. To
ensure the reoffload function properly maintains this flag, keep a
reference counter for the number of instances of the filter that are in
hardware. Only update the flag when this counter changes from or to 0. Add
a generic helper function to implement this behaviour.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:32 +09:00
John Hurley 60513bd82c net: sched: pass extack pointer to block binds and cb registration
Pass the extact struct from a tc qdisc add to the block bind function and,
in turn, to the setup_tc ndo of binding device via the tc_block_offload
struct. Pass this back to any block callback registrations to allow
netlink logging of fails in the bind process.

Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 23:21:32 +09:00
Guillaume Nault 2685fbb804 l2tp: make l2tp_xmit_core() return void
It always returns 0, and nobody reads the return value anyway.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 22:55:51 +09:00
Guillaume Nault 363a341d19 l2tp: avoid duplicate l2tp_pernet() calls
Replace 'l2tp_pernet(tunnel->l2tp_net)' with 'pn', which has been set
on the preceding line.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 22:55:51 +09:00
Guillaume Nault d08532bb50 l2tp: don't export l2tp_tunnel_closeall()
This function is only used in l2tp_core.c.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 22:55:51 +09:00
Guillaume Nault 2e67560ef6 l2tp: don't export l2tp_session_queue_purge()
This function is only used in l2tp_core.c.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 22:55:51 +09:00
Guillaume Nault e484b1c227 l2tp: remove l2tp_tunnel_priv()
This function, and the associated .priv field, are unused.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 22:55:51 +09:00
Guillaume Nault c3612f0e90 l2tp: remove .show from struct l2tp_tunnel
This callback has never been implemented.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 22:55:51 +09:00
Guillaume Nault 877375e485 l2tp: remove pppol2tp_session_close()
l2tp_core.c verifies that ->session_close() is defined before calling
it. There's no need for a stub.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 22:55:51 +09:00
Yafang Shao fb223502ec tcp: add SNMP counter for zero-window drops
It will be helpful if we could display the drops due to zero window or no
enough window space.
So a new SNMP MIB entry is added to track this behavior.
This entry is named LINUX_MIB_TCPZEROWINDOWDROP and published in
/proc/net/netstat in TcpExt line as TCPZeroWindowDrop.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 11:49:08 +09:00
David Miller 07d78363dc net: Convert NAPI gro list into a small hash table.
Improve the performance of GRO receive by splitting flows into
multiple hash chains.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 11:33:04 +09:00
David Miller d4546c2509 net: Convert GRO SKB handling to list_head.
Manage pending per-NAPI GRO packets via list_head.

Return an SKB pointer from the GRO receive handlers.  When GRO receive
handlers return non-NULL, it means that this SKB needs to be completed
at this time and removed from the NAPI queue.

Several operations are greatly simplified by this transformation,
especially timing out the oldest SKB in the list when gro_count
exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
in reverse order.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-26 11:33:04 +09:00
David S. Miller 9ff3b40e41 Merge ra.kernel.org:/pub/scm/linux/kernel/git/davem/net 2018-06-26 08:07:17 +09:00
Linus Torvalds 6f0d349d92 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix netpoll OOPS in r8169, from Ville Syrjälä.

 2) Fix bpf instruction alignment on powerpc et al., from Eric Dumazet.

 3) Don't ignore IFLA_MTU attribute when creating new ipvlan links. From
    Xin Long.

 4) Fix use after free in AF_PACKET, from Eric Dumazet.

 5) Mis-matched RTNL unlock in xen-netfront, from Ross Lagerwall.

 6) Fix VSOCK loopback on big-endian, from Claudio Imbrenda.

 7) Missing RX buffer offset correction when computing DMA addresses in
    mvneta driver, from Antoine Tenart.

 8) Fix crashes in DCCP's ccid3_hc_rx_send_feedback, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
  sfc: make function efx_rps_hash_bucket static
  strparser: Corrected typo in documentation.
  qmi_wwan: add support for the Dell Wireless 5821e module
  cxgb4: when disabling dcb set txq dcb priority to 0
  net_sched: remove a bogus warning in hfsc
  net: dccp: switch rx_tstamp_last_feedback to monotonic clock
  net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
  net: Remove depends on HAS_DMA in case of platform dependency
  MAINTAINERS: Add file patterns for dsa device tree bindings
  net: mscc: make sparse happy
  net: mvneta: fix the Rx desc DMA address in the Rx path
  Documentation: e1000: Fix docs build error
  Documentation: e100: Fix docs build error
  Documentation: e1000: Use correct heading adornment
  Documentation: e100: Use correct heading adornment
  ipv6: mcast: fix unsolicited report interval after receiving querys
  vhost_net: validate sock before trying to put its fd
  VSOCK: fix loopback on big-endian systems
  net: ethernet: ti: davinci_cpdma: make function cpdma_desc_pool_create static
  xen-netfront: Update features after registering netdev
  ...
2018-06-25 15:58:17 +08:00
Vakul Garg 0ef8b4567d tls: Removed unused variable
Removed unused variable 'rxm' from tls_queue().

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-24 23:54:18 +09:00
Cong Wang fe0b082fed net_sched: remove unused htb drop_list
After commit a09ceb0e08 ("sched: remove qdisc->drop"),
it is no longer used.

Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-24 16:42:46 +09:00
Cong Wang 35b42da69e net_sched: remove a bogus warning in hfsc
In update_vf():

  cftree_remove(cl);
  update_cfmin(cl->cl_parent);

the cl_cfmin of cl->cl_parent is intentionally updated to 0
when that parent only has one child. And if this parent is
root qdisc, we could end up, in hfsc_schedule_watchdog(),
that we can't decide the next schedule time for qdisc watchdog.
But it seems safe that we can just skip it, as this watchdog is
not always scheduled anyway.

Thanks to Marco for testing all the cases, nothing is broken.

Reported-by: Marco Berizzi <pupilla@libero.it>
Tested-by: Marco Berizzi <pupilla@libero.it>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23 10:58:46 +09:00
Joe Perches 6c1f0a1ffb net: drivers/net: Convert random_ether_addr to eth_random_addr
random_ether_addr is a #define for eth_random_addr which is
generally preferred in kernel code by ~3:1

Convert the uses of random_ether_addr to enable removing the #define

Miscellanea:

o Convert &vfmac[0] to equivalent vfmac and avoid unnecessary line wrap

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23 10:49:14 +09:00
Eric Dumazet 0ce4e70ff0 net: dccp: switch rx_tstamp_last_feedback to monotonic clock
To compute delays, better not use time of the day which can
be changed by admins or malicious programs.

Also change ccid3_first_li() to use s64 type for delta variable
to avoid potential overflows.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23 10:46:44 +09:00
Eric Dumazet 74174fe563 net: dccp: avoid crash in ccid3_hc_rx_send_feedback()
On fast hosts or malicious bots, we trigger a DCCP_BUG() which
seems excessive.

syzbot reported :

BUG: delta (-6195) <= 0 at net/dccp/ccids/ccid3.c:628/ccid3_hc_rx_send_feedback()
CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.18.0-rc1+ #112
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 ccid3_hc_rx_send_feedback net/dccp/ccids/ccid3.c:628 [inline]
 ccid3_hc_rx_packet_recv.cold.16+0x38/0x71 net/dccp/ccids/ccid3.c:793
 ccid_hc_rx_packet_recv net/dccp/ccid.h:185 [inline]
 dccp_deliver_input_to_ccids+0xf0/0x280 net/dccp/input.c:180
 dccp_rcv_established+0x87/0xb0 net/dccp/input.c:378
 dccp_v4_do_rcv+0x153/0x180 net/dccp/ipv4.c:654
 sk_backlog_rcv include/net/sock.h:914 [inline]
 __sk_receive_skb+0x3ba/0xd80 net/core/sock.c:517
 dccp_v4_rcv+0x10f9/0x1f58 net/dccp/ipv4.c:875
 ip_local_deliver_finish+0x2eb/0xda0 net/ipv4/ip_input.c:215
 NF_HOOK include/linux/netfilter.h:287 [inline]
 ip_local_deliver+0x1e9/0x750 net/ipv4/ip_input.c:256
 dst_input include/net/dst.h:450 [inline]
 ip_rcv_finish+0x823/0x2220 net/ipv4/ip_input.c:396
 NF_HOOK include/linux/netfilter.h:287 [inline]
 ip_rcv+0xa18/0x1284 net/ipv4/ip_input.c:492
 __netif_receive_skb_core+0x2488/0x3680 net/core/dev.c:4628
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
 process_backlog+0x219/0x760 net/core/dev.c:5373
 napi_poll net/core/dev.c:5771 [inline]
 net_rx_action+0x7da/0x1980 net/core/dev.c:5837
 __do_softirq+0x2e8/0xb17 kernel/softirq.c:284
 run_ksoftirqd+0x86/0x100 kernel/softirq.c:645
 smpboot_thread_fn+0x417/0x870 kernel/smpboot.c:164
 kthread+0x345/0x410 kernel/kthread.c:240
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: dccp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23 10:46:43 +09:00
Hangbin Liu 6c6da92808 ipv6: mcast: fix unsolicited report interval after receiving querys
After recieving MLD querys, we update idev->mc_maxdelay with max_delay
from query header. This make the later unsolicited reports have the same
interval with mc_maxdelay, which means we may send unsolicited reports with
long interval time instead of default configured interval time.

Also as we will not call ipv6_mc_reset() after device up. This issue will
be there even after leave the group and join other groups.

Fixes: fc4eba58b4 ("ipv6: make unsolicited report intervals configurable for mld")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-23 10:27:45 +09:00
Eric Dumazet cadefe5f58 tcp_bbr: fix bbr pacing rate for internal pacing
This commit makes BBR use only the MSS (without any headers) to
calculate pacing rates when internal TCP-layer pacing is used.

This is necessary to achieve the correct pacing behavior in this case,
since tcp_internal_pacing() uses only the payload length to calculate
pacing delays.

Signed-off-by: Kevin Yang <yyd@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22 13:59:22 +09:00
Wei Wang 3f6c65d625 tcp: ignore rcv_rtt sample with old ts ecr value
When receiving multiple packets with the same ts ecr value, only try
to compute rcv_rtt sample with the earliest received packet.
This is because the rcv_rtt calculated by later received packets
could possibly include long idle time or other types of delay.
For example:
(1) server sends last packet of reply with TS val V1
(2) client ACKs last packet of reply with TS ecr V1
(3) long idle time passes
(4) client sends next request data packet with TS ecr V1 (again!)
At this time, the rcv_rtt computed on server with TS ecr V1 will be
inflated with the idle time and should get ignored.

Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22 13:45:01 +09:00
NeilBrown 9f9a707738 rhashtable: remove nulls_base and related code.
This "feature" is unused, undocumented, and untested and so doesn't
really belong.  A patch is under development to properly implement
support for detecting when a search gets diverted down a different
chain, which the common purpose of nulls markers.

This patch actually fixes a bug too.  The table resizing allows a
table to grow to 2^31 buckets, but the hash is truncated to 27 bits -
any growth beyond 2^27 is wasteful an ineffective.

This patch results in NULLS_MARKER(0) being used for all chains,
and leaves the use of rht_is_a_null() to test for it.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22 13:43:27 +09:00
NeilBrown 0eb71a9da5 rhashtable: split rhashtable.h
Due to the use of rhashtables in net namespaces,
rhashtable.h is included in lots of the kernel,
so a small changes can required a large recompilation.
This makes development painful.

This patch splits out rhashtable-types.h which just includes
the major type declarations, and does not include (non-trivial)
inline code.  rhashtable.h is no longer included by anything
in the include/ directory.
Common include files only include rhashtable-types.h so a large
recompilation is only triggered when that changes.

Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22 13:43:27 +09:00
Claudio Imbrenda e5ab564c9e VSOCK: fix loopback on big-endian systems
The dst_cid and src_cid are 64 bits, therefore 64 bit accessors should be
used, and in fact in virtio_transport_common.c only 64 bit accessors are
used. Using 32 bit accessors for 64 bit values breaks big endian systems.

This patch fixes a wrong use of le32_to_cpu in virtio_transport_send_pkt.

Fixes: b911682318 ("VSOCK: add loopback to virtio_transport")

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22 09:34:08 +09:00
Paolo Abeni 44a5cd436e cls_flower: fix use after free in flower S/W path
If flower filter is created without the skip_sw flag, fl_mask_put()
can race with fl_classify() and we can destroy the mask rhashtable
while a lookup operation is accessing it.

 BUG: unable to handle kernel paging request at 00000000000911d1
 PGD 0 P4D 0
 SMP PTI
 CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
 RIP: 0010:rht_bucket_nested+0x20/0x60
 Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
 RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
 RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
 RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
 RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
 R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
 R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
 FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
 Call Trace:
  fl_lookup+0x134/0x140 [cls_flower]
  fl_classify+0xf3/0x180 [cls_flower]
  tcf_classify+0x78/0x150
  __netif_receive_skb_core+0x69e/0xa50
  netif_receive_skb_internal+0x42/0xf0
  tun_get_user+0xdd5/0xfd0 [tun]
  tun_sendmsg+0x52/0x70 [tun]
  handle_tx+0x2b3/0x5f0 [vhost_net]
  vhost_worker+0xab/0x100 [vhost]
  kthread+0xf8/0x130
  ret_from_fork+0x35/0x40
 Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
 CR2: 00000000000911d1

Fix the above waiting for a RCU grace period before destroying the
rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
must run in process context, as pointed out by Cong Wang.

v1 -> v2: use tcf_queue_work to run rhashtable_destroy().

Fixes: 05cd271fd6 ("cls_flower: Support multiple masks per priority")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22 07:24:07 +09:00
Eric Dumazet 945d015ee0 net/packet: fix use-after-free
We should put copy_skb in receive_queue only after
a successful call to virtio_net_hdr_from_skb().

syzbot report :

BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553

CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __skb_unlink include/linux/skbuff.h:1843 [inline]
 __skb_dequeue include/linux/skbuff.h:1863 [inline]
 skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
 skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
 packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
 packet_release+0x630/0xd90 net/packet/af_packet.c:2991
 __sock_release+0xd7/0x260 net/socket.c:603
 sock_close+0x19/0x20 net/socket.c:1186
 __fput+0x35b/0x8b0 fs/file_table.c:209
 ____fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1b08/0x2750 kernel/exit.c:865
 do_group_exit+0x177/0x440 kernel/exit.c:968
 __do_sys_exit_group kernel/exit.c:979 [inline]
 __se_sys_exit_group kernel/exit.c:977 [inline]
 __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4448e9
Code: Bad RIP value.
RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 4553:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
 skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
 tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
 deliver_skb net/core/dev.c:1925 [inline]
 deliver_ptype_list_skb net/core/dev.c:1940 [inline]
 __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
 call_write_iter include/linux/fs.h:1795 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
 vfs_write+0x1f8/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 4553:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
 kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
 __kfree_skb net/core/skbuff.c:642 [inline]
 kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
 tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
 deliver_skb net/core/dev.c:1925 [inline]
 deliver_ptype_list_skb net/core/dev.c:1940 [inline]
 __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
 __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
 netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
 netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
 tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
 tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
 tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
 call_write_iter include/linux/fs.h:1795 [inline]
 new_sync_write fs/read_write.c:474 [inline]
 __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
 vfs_write+0x1f8/0x560 fs/read_write.c:549
 ksys_write+0x101/0x260 fs/read_write.c:598
 __do_sys_write fs/read_write.c:610 [inline]
 __se_sys_write fs/read_write.c:607 [inline]
 __x64_sys_write+0x73/0xb0 fs/read_write.c:607
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801b044ecc0
 which belongs to the cache skbuff_head_cache of size 232
The buggy address is located 0 bytes inside of
 232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
The buggy address belongs to the page:
page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
>ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                           ^
 ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc

Fixes: 58d19b19cd ("packet: vnet_hdr support for tpacket_rcv")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-22 07:17:42 +09:00
Linus Torvalds 27db64f65f NFS client bugfixes for Linux 4.18
Hightlights include:
 
 Bugfixes:
 - Fix an rcu deadlock in nfs_delegation_find_inode()
 - Fix NFSv4 deadlocks due to not freeing the session slot in layoutget
 - Don't send layoutreturn if the layout is already invalid
 - Prevent duplicate XID allocation
 - flexfiles: Don't tie up all the rpciod threads in resends
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbK9avAAoJEA4mA3inWBJclq0P/1VCigyDlsbtdby3z2leV84k
 l0asrGjOQndljJ7I21awAgEo8KvXOd66cMv6YT3+UqEW18aNblH4/ngjyId6hPVb
 RZDX7tsG16ZHEqfe9f9irNZo90mdvuSC4ChJ/CbbPesaK9pblE1d76b/qUVr4FUX
 Gj7JPAC5ckoiZXPFfRWfc+o7JnGvs5wkEuDTy+ig6v7BRdL64hdPG3veRNpmLIAZ
 uS/NCyRpO+nFN/ukmvuoI2ZQ3qfHubHBD+rHxr1UKT/ad7dywLmL2UBaYQ0Tl3bq
 /iSQHutgJYj/80VaRTqdlLt/m4ebUZg+9BEZgM5MvqBWkXcpXND51zxExVJN4cGW
 BOytqjLz0gP1OGb8w+Oow58K8l4XyEgHe2CtZ6Yz8Vwof7nchkpv7RSX50hJFIcA
 YlikeDyDzfOmTT6ove5kF31WQSa3Bk6OMEei0of6hWU3UVHyEdr9az73pm/CLSHE
 /R7w0osU3B9tmQD4btQeJ2DxP+syQwhelOYodyVTwOlkmmGg7DSV7fehnGyH8t8f
 I4Yp8f0raiYGbwonYVE2+zDO140VRETEfTE4XQZnn41fZUfB74oIqk77JtgvGMk2
 /+XFNCYBGadHdSBdxyJmhSjhoAWrhgChEIz1G12SiHrNvqIRY/uHhdCX1Ut5vlPf
 5aqyn/yXm6rUH7aNh/Gd
 =tz0M
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Hightlights include:

   - fix an rcu deadlock in nfs_delegation_find_inode()

   - fix NFSv4 deadlocks due to not freeing the session slot in
     layoutget

   - don't send layoutreturn if the layout is already invalid

   - prevent duplicate XID allocation

   - flexfiles: Don't tie up all the rpciod threads in resends"

* tag 'nfs-for-4.18-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  pNFS/flexfiles: Process writeback resends from nfsiod context as well
  pNFS/flexfiles: Don't tie up all the rpciod threads in resends
  sunrpc: Prevent duplicate XID allocation
  pNFS: Don't send layoutreturn if the layout is already invalid
  pNFS: Always free the session slot on error in nfs4_layoutget_handle_exception
  NFS: Fix an rcu deadlock in nfs_delegation_find_inode()
2018-06-22 06:21:34 +09:00
Marcelo Ricardo Leitner fedb1bd3d2 sctp: fix erroneous inc of snmp SctpFragUsrMsgs
Currently it is incrementing SctpFragUsrMsgs when the user message size
is of the exactly same size as the maximum fragment size, which is wrong.

The fix is to increment it only when user message is bigger than the
maximum fragment size.

Fixes: bfd2e4b873 ("sctp: refactor sctp_datamsg_from_user")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-21 12:49:33 +09:00
Vakul Garg 456488cd95 strparser: Don't schedule in workqueue in paused state
In function strp_data_ready(), it is useless to call queue_work if
the state of strparser is already paused. The state checking should
be done before calling queue_work. The change reduces the context
switches and improves the ktls-rx throughput by approx 20% (measured
on cortex-a53 based platform).

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-21 09:54:05 +09:00
Matteo Croce c24fb5e68e bpfilter: fix user mode helper cross compilation
Use $(OBJDUMP) instead of literal 'objdump' to avoid
using host toolchain when cross compiling.

Fixes: 421780fd49 ("bpfilter: fix build error")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Reported-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-21 09:19:19 +09:00
Willem de Bruijn 9887cba199 ip: limit use of gso_size to udp
The ipcm(6)_cookie field gso_size is set only in the udp path. The ip
layer copies this to cork only if sk_type is SOCK_DGRAM. This check
proved too permissive. Ping and l2tp sockets have the same type.

Limit to sockets of type SOCK_DGRAM and protocol IPPROTO_UDP to
exclude ping sockets.

v1 -> v2
- remove irrelevant whitespace changes

Fixes: bec1f6f697 ("udp: generate gso with UDP_SEGMENT")
Reported-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 14:41:04 +09:00
Matteo Croce 8b26a06ad4 bpfilter: ignore binary files
net/bpfilter/bpfilter_umh is a binary file generated when bpfilter is
enabled, add it to .gitignore to avoid committing it.

Fixes: d2ba09c17a ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 09:07:46 +09:00
Matteo Croce 421780fd49 bpfilter: fix build error
bpfilter Makefile assumes that the system locale is en_US, and the
parsing of objdump output fails.
Set LC_ALL=C and, while at it, rewrite the objdump parsing so it spawns
only 2 processes instead of 7.

Fixes: d2ba09c17a ("net: add skeleton of bpfilter kernel module")
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 09:07:46 +09:00
Davide Caratti cbf56c2962 net/sched: act_ife: preserve the action control in case of error
in the following script

 # tc actions add action ife encode allow prio pass index 42
 # tc actions replace action ife encode allow tcindex drop index 42

the action control should remain equal to 'pass', if the kernel failed
to replace the TC action. Pospone the assignment of the action control,
to ensure it is not overwritten in the error path of tcf_ife_init().

Fixes: ef6980b6be ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 09:03:58 +09:00
Davide Caratti 0a889b9404 net/sched: act_ife: fix recursive lock and idr leak
a recursive lock warning [1] can be observed with the following script,

 # $TC actions add action ife encode allow prio pass index 42
 IFE type 0xED3E
 # $TC actions replace action ife encode allow tcindex pass index 42

in case the kernel was unable to run the last command (e.g. because of
the impossibility to load 'act_meta_skbtcindex'). For a similar reason,
the kernel can leak idr in the error path of tcf_ife_init(), because
tcf_idr_release() is not called after successful idr reservation:

 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No such file or directory
 We have an error talking to the kernel
 # $TC actions add action ife encode allow tcindex index 47
 IFE type 0xED3E
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # $TC actions add action ife encode use mark 7 type 0xfefe pass index 47
 IFE type 0xFEFE
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

Since tcfa_lock is already taken when the action is being edited, a call
to tcf_idr_release() wrongly makes tcf_idr_cleanup() take the same lock
again. On the other hand, tcf_idr_release() needs to be called in the
error path of tcf_ife_init(), to undo the last tcf_idr_create() invocation.
Fix both problems in tcf_ife_init().
Since the cleanup() routine can now be called when ife->params is NULL,
also add a NULL pointer check to avoid calling kfree_rcu(NULL, rcu).

 [1]
 ============================================
 WARNING: possible recursive locking detected
 4.17.0-rc4.kasan+ #417 Tainted: G            E
 --------------------------------------------
 tc/3932 is trying to acquire lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_cleanup+0x19/0x80 [act_ife]

 but task is already holding lock:
 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(&p->tcfa_lock)->rlock);
   lock(&(&p->tcfa_lock)->rlock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 2 locks held by tc/3932:
  #0: 000000007ca8e990 (rtnl_mutex){+.+.}, at: tcf_ife_init+0xf61/0x13c0 [act_ife]
  #1: 000000005097c9a6 (&(&p->tcfa_lock)->rlock){+...}, at: tcf_ife_init+0xf6d/0x13c0 [act_ife]

 stack backtrace:
 CPU: 3 PID: 3932 Comm: tc Tainted: G            E     4.17.0-rc4.kasan+ #417
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 Call Trace:
  dump_stack+0x9a/0xeb
  __lock_acquire+0xf43/0x34a0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? debug_check_no_locks_freed+0x2b0/0x2b0
  ? __mutex_lock+0x62f/0x1240
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_acquire+0x10b/0x330
  lock_acquire+0x10b/0x330
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  _raw_spin_lock_bh+0x38/0x70
  ? tcf_ife_cleanup+0x19/0x80 [act_ife]
  tcf_ife_cleanup+0x19/0x80 [act_ife]
  __tcf_idr_release+0xff/0x350
  tcf_ife_init+0xdde/0x13c0 [act_ife]
  ? ife_exit_net+0x290/0x290 [act_ife]
  ? __lock_is_held+0xb4/0x140
  tcf_action_init_1+0x67b/0xad0
  ? tcf_action_dump_old+0xa0/0xa0
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? memset+0x1f/0x40
  tcf_action_init+0x30f/0x590
  ? tcf_action_init_1+0xad0/0xad0
  ? memset+0x1f/0x40
  tc_ctl_action+0x48e/0x5e0
  ? mutex_lock_io_nested+0x1160/0x1160
  ? tca_action_gd+0x990/0x990
  ? sched_clock+0x5/0x10
  ? find_held_lock+0x39/0x1d0
  rtnetlink_rcv_msg+0x4da/0x990
  ? validate_linkmsg+0x680/0x680
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  netlink_rcv_skb+0x127/0x350
  ? validate_linkmsg+0x680/0x680
  ? netlink_ack+0x970/0x970
  ? __kmalloc_node_track_caller+0x304/0x3a0
  netlink_unicast+0x40f/0x5d0
  ? netlink_attachskb+0x580/0x580
  ? _copy_from_iter_full+0x187/0x760
  ? import_iovec+0x90/0x390
  netlink_sendmsg+0x67f/0xb50
  ? netlink_unicast+0x5d0/0x5d0
  ? copy_msghdr_from_user+0x206/0x340
  ? netlink_unicast+0x5d0/0x5d0
  sock_sendmsg+0xb3/0xf0
  ___sys_sendmsg+0x60a/0x8b0
  ? copy_msghdr_from_user+0x340/0x340
  ? lock_downgrade+0x5e0/0x5e0
  ? tty_write_lock+0x18/0x50
  ? kvm_sched_clock_read+0x1a/0x30
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1d0
  ? lock_downgrade+0x5e0/0x5e0
  ? lock_acquire+0x10b/0x330
  ? __audit_syscall_entry+0x316/0x690
  ? current_kernel_time64+0x6b/0xd0
  ? __fget_light+0x55/0x1f0
  ? __sys_sendmsg+0xd2/0x170
  __sys_sendmsg+0xd2/0x170
  ? __ia32_sys_shutdown+0x70/0x70
  ? syscall_trace_enter+0x57a/0xd60
  ? rcu_read_lock_sched_held+0xdc/0x110
  ? __bpf_trace_sys_enter+0x10/0x10
  ? do_syscall_64+0x22/0x480
  do_syscall_64+0xa5/0x480
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
 RIP: 0033:0x7fd646988ba0
 RSP: 002b:00007fffc9fab3c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 00007fffc9fab4f0 RCX: 00007fd646988ba0
 RDX: 0000000000000000 RSI: 00007fffc9fab440 RDI: 0000000000000003
 RBP: 000000005b28c8b3 R08: 0000000000000002 R09: 0000000000000000
 R10: 00007fffc9faae20 R11: 0000000000000246 R12: 0000000000000000
 R13: 00007fffc9fab504 R14: 0000000000000001 R15: 000000000066c100

Fixes: 4e8c861550 ("net sched: net sched: ife action fix late binding")
Fixes: ef6980b6be ("introduce IFE action")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 09:03:58 +09:00
Li RongQing 7892bd0810 net: propagate dev_get_valid_name return code
if dev_get_valid_name failed, propagate its return code

and remove the setting err to ENODEV, it will be set to
0 again before dev_change_net_namespace exits.

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 08:12:57 +09:00
David Ahern 8c43bd1706 net/tcp: Fix socket lookups with SO_BINDTODEVICE
Similar to 69678bcd4d ("udp: fix SO_BINDTODEVICE"), TCP socket lookups
need to fail if dev_match is not true. Currently, a packet to a given port
can match a socket bound to device when it should not. In the VRF case,
this causes the lookup to hit a VRF socket and not a global socket
resulting in a response trying to go through the VRF when it should not.

Fixes: 3fa6f616a7 ("net: ipv4: add second dif to inet socket lookups")
Fixes: 4297a0ef08 ("net: ipv6: add second dif to inet6 socket lookups")
Reported-by: Lou Berger <lberger@labn.net>
Diagnosed-by: Renato Westphal <renato@opensourcerouting.org>
Tested-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 08:03:06 +09:00
Eric Dumazet 9b0a8da8c4 net/ipv6: respect rcu grace period before freeing fib6_info
syzbot reported use after free that is caused by fib6_info being
freed without a proper RCU grace period.

CPU: 0 PID: 1407 Comm: udevd Not tainted 4.17.0+ #39
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 __read_once_size include/linux/compiler.h:188 [inline]
 find_rr_leaf net/ipv6/route.c:705 [inline]
 rt6_select net/ipv6/route.c:761 [inline]
 fib6_table_lookup+0x12b7/0x14d0 net/ipv6/route.c:1823
 ip6_pol_route+0x1c2/0x1020 net/ipv6/route.c:1856
 ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2082
 fib6_rule_lookup+0x211/0x6d0 net/ipv6/fib6_rules.c:122
 ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2110
 ip6_route_output include/net/ip6_route.h:82 [inline]
 icmpv6_xrlim_allow net/ipv6/icmp.c:211 [inline]
 icmp6_send+0x147c/0x2da0 net/ipv6/icmp.c:535
 icmpv6_send+0x17a/0x300 net/ipv6/ip6_icmp.c:43
 ip6_link_failure+0xa5/0x790 net/ipv6/route.c:2244
 dst_link_failure include/net/dst.h:427 [inline]
 ndisc_error_report+0xd1/0x1c0 net/ipv6/ndisc.c:695
 neigh_invalidate+0x246/0x550 net/core/neighbour.c:892
 neigh_timer_handler+0xaf9/0xde0 net/core/neighbour.c:978
 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
 expire_timers kernel/time/timer.c:1363 [inline]
 __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:284
 invoke_softirq kernel/softirq.c:364 [inline]
 irq_exit+0x1d1/0x200 kernel/softirq.c:404
 exiting_irq arch/x86/include/asm/apic.h:527 [inline]
 smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863
 </IRQ>
RIP: 0010:strlen+0x5e/0xa0 lib/string.c:482
Code: 24 00 74 3b 48 bb 00 00 00 00 00 fc ff df 4c 89 e0 48 83 c0 01 48 89 c2 48 89 c1 48 c1 ea 03 83 e1 07 0f b6 14 1a 38 ca 7f 04 <84> d2 75 23 80 38 00 75 de 48 83 c4 08 4c 29 e0 5b 41 5c 5d c3 48
RSP: 0018:ffff8801af117850 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RAX: ffff880197f53bd0 RBX: dffffc0000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff81c5b06c RDI: ffff880197f53bc0
RBP: ffff8801af117868 R08: ffff88019a976540 R09: 0000000000000000
R10: ffff88019a976540 R11: 0000000000000000 R12: ffff880197f53bc0
R13: ffff880197f53bc0 R14: ffffffff899e4e90 R15: ffff8801d91c6a00
 strlen include/linux/string.h:267 [inline]
 getname_kernel+0x24/0x370 fs/namei.c:218
 open_exec+0x17/0x70 fs/exec.c:882
 load_elf_binary+0x968/0x5610 fs/binfmt_elf.c:780
 search_binary_handler+0x17d/0x570 fs/exec.c:1653
 exec_binprm fs/exec.c:1695 [inline]
 __do_execve_file.isra.35+0x16fe/0x2710 fs/exec.c:1819
 do_execveat_common fs/exec.c:1866 [inline]
 do_execve fs/exec.c:1883 [inline]
 __do_sys_execve fs/exec.c:1964 [inline]
 __se_sys_execve fs/exec.c:1959 [inline]
 __x64_sys_execve+0x8f/0xc0 fs/exec.c:1959
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1576a46207
Code: 77 19 f4 48 89 d7 44 89 c0 0f 05 48 3d 00 f0 ff ff 76 e0 f7 d8 64 41 89 01 eb d8 f7 d8 64 41 89 01 eb df b8 3b 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 02 f3 c3 48 8b 15 00 8c 2d 00 f7 d8 64 89 02
RSP: 002b:00007ffff2784568 EFLAGS: 00000202 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1576a46207
RDX: 0000000001215b10 RSI: 00007ffff2784660 RDI: 00007ffff2785670
RBP: 0000000000625500 R08: 000000000000589c R09: 000000000000589c
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000001215b10
R13: 0000000000000007 R14: 0000000001204250 R15: 0000000000000005

Allocated by task 12188:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
 kmalloc include/linux/slab.h:513 [inline]
 kzalloc include/linux/slab.h:706 [inline]
 fib6_info_alloc+0xbb/0x280 net/ipv6/ip6_fib.c:152
 ip6_route_info_create+0x782/0x2b50 net/ipv6/route.c:3013
 ip6_route_add+0x23/0xb0 net/ipv6/route.c:3154
 ipv6_route_ioctl+0x5a5/0x760 net/ipv6/route.c:3660
 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
 sock_ioctl+0x30d/0x680 net/socket.c:1097
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 1402:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.c:3813
 fib6_info_destroy+0x29b/0x350 net/ipv6/ip6_fib.c:207
 fib6_info_release include/net/ip6_fib.h:286 [inline]
 __ip6_del_rt_siblings net/ipv6/route.c:3235 [inline]
 ip6_route_del+0x11c4/0x13b0 net/ipv6/route.c:3316
 ipv6_route_ioctl+0x616/0x760 net/ipv6/route.c:3663
 inet6_ioctl+0x100/0x1f0 net/ipv6/af_inet6.c:546
 sock_do_ioctl+0xe4/0x3e0 net/socket.c:973
 sock_ioctl+0x30d/0x680 net/socket.c:1097
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:500 [inline]
 do_vfs_ioctl+0x1cf/0x16f0 fs/ioctl.c:684
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701
 __do_sys_ioctl fs/ioctl.c:708 [inline]
 __se_sys_ioctl fs/ioctl.c:706 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The buggy address belongs to the object at ffff8801b5df2580
 which belongs to the cache kmalloc-256 of size 256
The buggy address is located 8 bytes inside of
 256-byte region [ffff8801b5df2580, ffff8801b5df2680)
The buggy address belongs to the page:
page:ffffea0006d77c80 count:1 mapcount:0 mapping:ffff8801da8007c0 index:0xffff8801b5df2e40
flags: 0x2fffc0000000100(slab)
raw: 02fffc0000000100 ffffea0006c5cc48 ffffea0007363308 ffff8801da8007c0
raw: ffff8801b5df2e40 ffff8801b5df2080 0000000100000006 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801b5df2480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b5df2500: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
> ffff8801b5df2580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                      ^
 ffff8801b5df2600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff8801b5df2680: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb

Fixes: a64efe142f ("net/ipv6: introduce fib6_info struct and helpers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: David Ahern <dsahern@gmail.com>
Reported-by: syzbot+9e6d75e3edef427ee888@syzkaller.appspotmail.com
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 07:57:23 +09:00
Joel Stanley 6e42a3f5cd net/ncsi: Use netdev_dbg for debug messages
This moves all of the netdev_printk(KERN_DEBUG, ...) messages over to
netdev_dbg.

As Joe explains:

> netdev_dbg is not included in object code unless
> DEBUG is defined or CONFIG_DYNAMIC_DEBUG is set.
> And then, it is not emitted into the log unless
> DEBUG is set or this specific netdev_dbg is enabled
> via the dynamic debug control file.

Which is what we're after in this case.

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 07:26:58 +09:00
Joel Stanley 5d3b146736 net/ncsi: Drop no more channels message
This does not provide useful information. As the ncsi maintainer said:

 > either we get a channel or broadcom has gone out to lunch

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 07:26:58 +09:00
Joel Stanley 87975a0117 net/ncsi: Silence debug messages
In normal operation we see this series of messages as the host drives
the network device:

 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
 ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
 ftgmac100 1e660000.ethernet eth0: NCSI interface down
 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI interface up
 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state down
 ftgmac100 1e660000.ethernet eth0: NCSI: suspending channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI: channel 0 link down after config
 ftgmac100 1e660000.ethernet eth0: NCSI interface down
 ftgmac100 1e660000.ethernet eth0: NCSI: LSC AEN - channel 0 state up
 ftgmac100 1e660000.ethernet eth0: NCSI: configuring channel 0
 ftgmac100 1e660000.ethernet eth0: NCSI interface up

This makes all of these messages netdev_dbg. They are still useful to
debug eg. misbehaving network device firmware, but we do not need them
filling up the kernel logs in normal operation.

Acked-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 07:26:58 +09:00
Chuck Lever 0dae72d581 sunrpc: Prevent duplicate XID allocation
Krzysztof Kozlowski <krzk@kernel.org> reports that a heavy NFSv4
WRITE workload against a slow NFS server causes his Raspberry Pi
clients to stall. Krzysztof bisected it to commit 37ac86c3a7
("SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock") .

I was able to reproduce similar behavior and it appears that rarely
the RPC client layer is re-allocating an XID for an RPC that it has
already partially sent. This results in the client ignoring the
subsequent reply, which carries the original XID.

For various reasons, checking !req->rq_xmit_bytes_sent in
xprt_prepare_transmit is not a 100% reliable mechanism for
determining when a fresh XID is needed.

Trond's preference is to allocate the XID at the time each rpc_rqst
slot is initialized.

This patch should also address a gcc 4.1.2 complaint reported by
Geert Uytterhoeven <geert@linux-m68k.org>.

Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Fixes: 37ac86c3a7 ("SUNRPC: Initialize rpc_rqst outside of ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-06-19 08:53:48 -04:00
Luca Coelho 41cbb0f5a2 mac80211: add support for HE
Add support for HE in mac80211 conforming with P802.11ax_D1.4.

Johannes: Fix another bug with the buf_size comparison in agg-rx.c.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-18 22:40:32 +02:00
Johannes Berg b8042b3da9 ieee80211: bump IEEE80211_MAX_AMPDU_BUF to support HE
Bump the IEEE80211_MAX_AMPDU_BUF size to 0x100 for HE support
and - for now - use IEEE80211_MAX_AMPDU_BUF_HT everywhere.

This is derived from my internal patch, parts of which Luca
had sent upstream.

Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2018-06-18 22:39:39 +02:00
Gao Feng ad9852af97 netfilter: nf_ct_helper: Fix possible panic after nf_conntrack_helper_unregister
The helper module would be unloaded after nf_conntrack_helper_unregister,
so it may cause a possible panic caused by race.

nf_ct_iterate_destroy(unhelp, me) reset the helper of conntrack as NULL,
but maybe someone has gotten the helper pointer during this period. Then
it would panic, when it accesses the helper and the module was unloaded.

Take an example as following:
CPU0                                                   CPU1
ctnetlink_dump_helpinfo
helper = rcu_dereference(help->helper);
                                                       unhelp
                                                       set helper as NULL
                                                       unload helper module
helper->to_nlattr(skb, ct);

As above, the cpu0 tries to access the helper and its module is unloaded,
then the panic happens.

Signed-off-by: Gao Feng <gfree.wind@vip.163.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-18 14:15:12 +02:00
Eric Dumazet 9ce7bc036a netfilter: ipv6: nf_defrag: reduce struct net memory waste
It is a waste of memory to use a full "struct netns_sysctl_ipv6"
while only one pointer is really used, considering netns_sysctl_ipv6
keeps growing.

Also, since "struct netns_frags" has cache line alignment,
it is better to move the frags_hdr pointer outside, otherwise
we spend a full cache line for this pointer.

This saves 192 bytes of memory per netns.

Fixes: c038a767cd ("ipv6: add a new namespace for nf_conntrack_reasm")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-18 14:13:25 +02:00
Eric Dumazet ba062ebb2c netfilter: nf_queue: augment nfqa_cfg_policy
Three attributes are currently not verified, thus can trigger KMSAN
warnings such as :

BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline]
BUG: KMSAN: uninit-value in nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
CPU: 1 PID: 4521 Comm: syz-executor120 Not tainted 4.17.0+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x185/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1117
 __msan_warning_32+0x70/0xc0 mm/kmsan/kmsan_instr.c:620
 __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline]
 __fswab32 include/uapi/linux/swab.h:59 [inline]
 nfqnl_recv_config+0x939/0x17d0 net/netfilter/nfnetlink_queue.c:1268
 nfnetlink_rcv_msg+0xb2e/0xc80 net/netfilter/nfnetlink.c:212
 netlink_rcv_skb+0x37e/0x600 net/netlink/af_netlink.c:2448
 nfnetlink_rcv+0x2fe/0x680 net/netfilter/nfnetlink.c:513
 netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
 netlink_unicast+0x1680/0x1750 net/netlink/af_netlink.c:1336
 netlink_sendmsg+0x104f/0x1350 net/netlink/af_netlink.c:1901
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg net/socket.c:639 [inline]
 ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
 __sys_sendmsg net/socket.c:2155 [inline]
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x43fd59
RSP: 002b:00007ffde0e30d28 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd59
RDX: 0000000000000000 RSI: 0000000020000080 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401680
R13: 0000000000401710 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:279 [inline]
 kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:189
 kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:315
 kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan.c:322
 slab_post_alloc_hook mm/slab.h:446 [inline]
 slab_alloc_node mm/slub.c:2753 [inline]
 __kmalloc_node_track_caller+0xb35/0x11b0 mm/slub.c:4395
 __kmalloc_reserve net/core/skbuff.c:138 [inline]
 __alloc_skb+0x2cb/0x9e0 net/core/skbuff.c:206
 alloc_skb include/linux/skbuff.h:988 [inline]
 netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
 netlink_sendmsg+0x76e/0x1350 net/netlink/af_netlink.c:1876
 sock_sendmsg_nosec net/socket.c:629 [inline]
 sock_sendmsg net/socket.c:639 [inline]
 ___sys_sendmsg+0xec8/0x1320 net/socket.c:2117
 __sys_sendmsg net/socket.c:2155 [inline]
 __do_sys_sendmsg net/socket.c:2164 [inline]
 __se_sys_sendmsg net/socket.c:2162 [inline]
 __x64_sys_sendmsg+0x331/0x460 net/socket.c:2162
 do_syscall_64+0x15b/0x230 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: fdb694a01f ("netfilter: Add fail-open support")
Fixes: 829e17a1a6 ("[NETFILTER]: nfnetlink_queue: allow changing queue length through netlink")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-18 14:13:24 +02:00
Konstantin Khlebnikov 7e85dc8cb3 net_sched: blackhole: tell upper qdisc about dropped packets
When blackhole is used on top of classful qdisc like hfsc it breaks
qlen and backlog counters because packets are disappear without notice.

In HFSC non-zero qlen while all classes are inactive triggers warning:
WARNING: ... at net/sched/sch_hfsc.c:1393 hfsc_dequeue+0xba4/0xe90 [sch_hfsc]
and schedules watchdog work endlessly.

This patch return __NET_XMIT_BYPASS in addition to NET_XMIT_SUCCESS,
this flag tells upper layer: this packet is gone and isn't queued.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-17 08:42:33 +09:00
David Woodhouse 9bbe60a67b atm: Preserve value of skb->truesize when accounting to vcc
ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
which they are to be sent. But it doesn't take ownership of those
packets from the sock (if any) which originally owned them. They should
remain owned by their actual sender until they've left the box.

There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
for certain skbs, precisely to avoid messing up sk_wmem_alloc
accounting. Ideally that hack would cover the ATM use case too, but it
doesn't — skbs which aren't owned by any sock, for example PPP control
frames, still get their truesize adjusted when the low-level ATM driver
adds headroom.

This has always been an issue, it seems. The truesize of a packet
increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
for normal traffic, only for control frames. So I think we just got away
with it, and we probably needed to send 2GiB of LCP echo frames before
the misaccounting would ever have caused a problem and caused
atm_may_send() to start refusing packets.

Commit 14afee4b60 ("net: convert sock.sk_wmem_alloc from atomic_t to
refcount_t") did exactly what it was intended to do, and turned this
mostly-theoretical problem into a real one, causing PPPoATM to fail
immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
starts refusing to allow new packets.

The least intrusive solution to this problem is to stash the value of
skb->truesize that was accounted to the VCC, in a new member of the
ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
value instead of the then-current value of skb->truesize.

Fixes: 158f323b98 ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: David Woodhouse <dwmw2@infradead.org>
Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-17 08:27:01 +09:00
David S. Miller 0841d98641 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2018-06-16

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Fix a panic in devmap handling in generic XDP where return type
   of __devmap_lookup_elem() got changed recently but generic XDP
   code missed the related update, from Toshiaki.

2) Fix a freeze when BPF progs are loaded that include BPF to BPF
   calls when JIT is enabled where we would later bail out via error
   path w/o dropping kallsyms, and another one to silence syzkaller
   splats from locking prog read-only, from Daniel.

3) Fix a bug in test_offloads.py BPF selftest which must not assume
   that the underlying system have no BPF progs loaded prior to test,
   and one in bpftool to fix accuracy of program load time, from Jakub.

4) Fix a bug in bpftool's probe for availability of the bpf(2)
   BPF_TASK_FD_QUERY subcommand, from Yonghong.

5) Fix a regression in AF_XDP's XDP_SKB receive path where queue
   id check got erroneously removed, from Björn.

6) Fix missing state cleanup in BPF's xfrm tunnel test, from William.

7) Check tunnel type more accurately in BPF's tunnel collect metadata
   kselftest, from Jian.

8) Fix missing Kconfig fragments for BPF kselftests, from Anders.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-17 07:54:24 +09:00
Linus Torvalds 9215310cf1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Various netfilter fixlets from Pablo and the netfilter team.

 2) Fix regression in IPVS caused by lack of PMTU exceptions on local
    routes in ipv6, from Julian Anastasov.

 3) Check pskb_trim_rcsum for failure in DSA, from Zhouyang Jia.

 4) Don't crash on poll in TLS, from Daniel Borkmann.

 5) Revert SO_REUSE{ADDR,PORT} change, it regresses various things
    including Avahi mDNS. From Bart Van Assche.

 6) Missing of_node_put in qcom/emac driver, from Yue Haibing.

 7) We lack checking of the TCP checking in one special case during SYN
    receive, from Frank van der Linden.

 8) Fix module init error paths of mac80211 hwsim, from Johannes Berg.

 9) Handle 802.1ad properly in stmmac driver, from Elad Nachman.

10) Must grab HW caps before doing quirk checks in stmmac driver, from
    Jose Abreu.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  net: stmmac: Run HWIF Quirks after getting HW caps
  neighbour: skip NTF_EXT_LEARNED entries during forced gc
  net: cxgb3: add error handling for sysfs_create_group
  tls: fix waitall behavior in tls_sw_recvmsg
  tls: fix use-after-free in tls_push_record
  l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
  l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
  mlxsw: spectrum_switchdev: Fix port_vlan refcounting
  mlxsw: spectrum_router: Align with new route replace logic
  mlxsw: spectrum_router: Allow appending to dev-only routes
  ipv6: Only emit append events for appended routes
  stmmac: added support for 802.1ad vlan stripping
  cfg80211: fix rcu in cfg80211_unregister_wdev
  mac80211: Move up init of TXQs
  mac80211_hwsim: fix module init error paths
  cfg80211: initialize sinfo in cfg80211_get_station
  nl80211: fix some kernel doc tag mistakes
  hv_netvsc: Fix the variable sizes in ipsecv2 and rsc offload
  rds: avoid unenecessary cong_update in loop transport
  l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
  ...
2018-06-16 07:39:34 +09:00
Toshiaki Makita 6d5fc19579 xdp: Fix handling of devmap in generic XDP
Commit 67f29e07e1 ("bpf: devmap introduce dev_map_enqueue") changed
the return value type of __devmap_lookup_elem() from struct net_device *
to struct bpf_dtab_netdev * but forgot to modify generic XDP code
accordingly.

Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
net_device is expected, then skb->dev was set to invalid value.

v2:
- Fix compiler warning without CONFIG_BPF_SYSCALL.

Fixes: 67f29e07e1 ("bpf: devmap introduce dev_map_enqueue")
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-15 23:47:15 +02:00
Roopa Prabhu f6a6f203d5 neighbour: skip NTF_EXT_LEARNED entries during forced gc
Commit 9ce33e4653 ("neighbour: support for NTF_EXT_LEARNED flag")
added support for NTF_EXT_LEARNED for neighbour entries.
NTF_EXT_LEARNED entries are neigh entries managed by control
plane (eg: Ethernet VPN implementation in FRR routing suite).
Periodic gc already excludes these entries. This patch extends
it to forced gc which the earlier patch missed.

Fixes: 9ce33e4653 ("neighbour: support for NTF_EXT_LEARNED flag")
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:37:33 -07:00
Daniel Borkmann 06030dbaf3 tls: fix waitall behavior in tls_sw_recvmsg
Current behavior in tls_sw_recvmsg() is to wait for incoming tls
messages and copy up to exactly len bytes of data that the user
provided. This is problematic in the sense that i) if no packet
is currently queued in strparser we keep waiting until one has been
processed and pushed into tls receive layer for tls_wait_data() to
wake up and push the decrypted bits to user space. Given after
tls decryption, we're back at streaming data, use sock_rcvlowat()
hint from tcp socket instead. Retain current behavior with MSG_WAITALL
flag and otherwise use the hint target for breaking the loop and
returning to application. This is done if currently no ctx->recv_pkt
is ready, otherwise continue to process it from our strparser
backlog.

Fixes: c46234ebb4 ("tls: RX path for ktls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:14:30 -07:00
Daniel Borkmann a447da7d00 tls: fix use-after-free in tls_push_record
syzkaller managed to trigger a use-after-free in tls like the
following:

  BUG: KASAN: use-after-free in tls_push_record.constprop.15+0x6a2/0x810 [tls]
  Write of size 1 at addr ffff88037aa08000 by task a.out/2317

  CPU: 3 PID: 2317 Comm: a.out Not tainted 4.17.0+ #144
  Hardware name: LENOVO 20FBCTO1WW/20FBCTO1WW, BIOS N1FET47W (1.21 ) 11/28/2016
  Call Trace:
   dump_stack+0x71/0xab
   print_address_description+0x6a/0x280
   kasan_report+0x258/0x380
   ? tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_push_record.constprop.15+0x6a2/0x810 [tls]
   tls_sw_push_pending_record+0x2e/0x40 [tls]
   tls_sk_proto_close+0x3fe/0x710 [tls]
   ? tcp_check_oom+0x4c0/0x4c0
   ? tls_write_space+0x260/0x260 [tls]
   ? kmem_cache_free+0x88/0x1f0
   inet_release+0xd6/0x1b0
   __sock_release+0xc0/0x240
   sock_close+0x11/0x20
   __fput+0x22d/0x660
   task_work_run+0x114/0x1a0
   do_exit+0x71a/0x2780
   ? mm_update_next_owner+0x650/0x650
   ? handle_mm_fault+0x2f5/0x5f0
   ? __do_page_fault+0x44f/0xa50
   ? mm_fault_error+0x2d0/0x2d0
   do_group_exit+0xde/0x300
   __x64_sys_exit_group+0x3a/0x50
   do_syscall_64+0x9a/0x300
   ? page_fault+0x8/0x30
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This happened through fault injection where aead_req allocation in
tls_do_encryption() eventually failed and we returned -ENOMEM from
the function. Turns out that the use-after-free is triggered from
tls_sw_sendmsg() in the second tls_push_record(). The error then
triggers a jump to waiting for memory in sk_stream_wait_memory()
resp. returning immediately in case of MSG_DONTWAIT. What follows is
the trim_both_sgl(sk, orig_size), which drops elements from the sg
list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
when the socket is being closed, where tls_sk_proto_close() callback
is invoked. The tls_complete_pending_work() will figure that there's
a pending closed tls record to be flushed and thus calls into the
tls_push_pending_closed_record() from there. ctx->push_pending_record()
is called from the latter, which is the tls_sw_push_pending_record()
from sw path. This again calls into tls_push_record(). And here the
tls_fill_prepend() will panic since the buffer address has been freed
earlier via trim_both_sgl(). One way to fix it is to move the aead
request allocation out of tls_do_encryption() early into tls_push_record().
This means we don't prep the tls header and advance state to the
TLS_PENDING_CLOSED_RECORD before allocation which could potentially
fail happened. That fixes the issue on my side.

Fixes: 3c4d755915 ("tls: kernel TLS support")
Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com
Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:14:30 -07:00
Guillaume Nault ecd012e45a l2tp: filter out non-PPP sessions in pppol2tp_tunnel_ioctl()
pppol2tp_tunnel_ioctl() can act on an L2TPv3 tunnel, in which case
'session' may be an Ethernet pseudo-wire.

However, pppol2tp_session_ioctl() expects a PPP pseudo-wire, as it
assumes l2tp_session_priv() points to a pppol2tp_session structure. For
an Ethernet pseudo-wire l2tp_session_priv() points to an l2tp_eth_sess
structure instead, making pppol2tp_session_ioctl() access invalid
memory.

Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:12:37 -07:00
Guillaume Nault de9bada5d3 l2tp: reject creation of non-PPP sessions on L2TPv2 tunnels
The /proc/net/pppol2tp handlers (pppol2tp_seq_*()) iterate over all
L2TPv2 tunnels, and rightfully expect that only PPP sessions can be
found there. However, l2tp_netlink accepts creating Ethernet sessions
regardless of the underlying tunnel version.

This confuses pppol2tp_seq_session_show(), which expects that
l2tp_session_priv() returns a pppol2tp_session structure. When the
session is an Ethernet pseudo-wire, a struct l2tp_eth_sess is returned
instead. This leads to invalid memory access when
pppol2tp_session_get_sock() later tries to dereference ps->sk.

Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:12:37 -07:00
Ido Schimmel 6eba08c362 ipv6: Only emit append events for appended routes
Current code will emit an append event in the FIB notification chain for
any route added with NLM_F_APPEND set, even if the route was not
appended to any existing route.

This is inconsistent with IPv4 where such an event is only emitted when
the new route is appended after an existing one.

Align IPv6 behavior with IPv4, thereby allowing listeners to more easily
handle these events.

Fixes: f34436a430 ("net/ipv6: Simplify route replace and appending into multipath route")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:11:16 -07:00
David S. Miller 41f9ba67f0 A handful of fixes:
* missing RCU grace period enforcement led to drivers freeing
    data structures before; fix from Dedy Lansky.
  * hwsim module init error paths were messed up; fixed it myself
    after a report from Colin King (who had sent a partial patch)
  * kernel-doc tag errors; fix from Luca Coelho
  * initialize the on-stack sinfo data structure when getting
    station information; fix from Sven Eckelmann
  * TXQ state dumping is now done from init, and when TXQs aren't
    initialized yet at that point, bad things happen, move the
    initialization; fix from Toke Høiland-Jørgensen.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAlsjswEACgkQB8qZga/f
 l8T7pg/8CkKKwJRR/0ZIrYOyku/yEuZHpZ071/9gFhUna86zEymC/NDDuu27frny
 G4TYJuQl/x5fyTLMteFkLd4oqiBP43X5Mh/UZtTG5jZVfU2W5PQSALSHn8SEQID6
 u2hQBY1vM82CQbgUH6UjlIps40Y92urP5OG6LhT9F/9DijPYgrkxoNote+4x5wgM
 gdL3yvXrmsExdgRFjuHTUXX8pT2J01UyxJAmuF0k5jMQzj6JGnwrLiSjUe4/a1qn
 qJC72GYAezXVjv8noIW+B+e5jwABj4AbL9lJbU8lxHymTykTVqLlJbtMWygFoF2Y
 rw8Gc2+eqafDnHzlgO0GXls2X89cbpmxBd3rdxq6J16oiAyNDmILw+w5Ap85bzck
 1lQ3OvzOi25+qTndHBuV4wK066UbYJVCybgN25RRbM0q+soTHRZrpk8wFC27fGW8
 Qt1BmyLBMvTDBkjBQ983QxLOJ8o0iNMuyKPxnULmveBRuhGfrDVOf8w7ueDSGJ61
 vlTBA3gUBCHrtl7L67Tr6iun4y8n8YWhOz6VljaN6L5FFg5fVX15d9me6hXmAtar
 ETbQFTPNWpFxYRElaIe+RFdtLkBFbE9rGLUZ1gQZ+WhH5YT+hDSqLm9WOJ61PbTv
 VmNKrelwjdXL06Ei6O6ptRO7hoy+N0Z2khqAOpvNwaE30NojGqQ=
 =siAL
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-for-davem-2018-06-15' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A handful of fixes:
 * missing RCU grace period enforcement led to drivers freeing
   data structures before; fix from Dedy Lansky.
 * hwsim module init error paths were messed up; fixed it myself
   after a report from Colin King (who had sent a partial patch)
 * kernel-doc tag errors; fix from Luca Coelho
 * initialize the on-stack sinfo data structure when getting
   station information; fix from Sven Eckelmann
 * TXQ state dumping is now done from init, and when TXQs aren't
   initialized yet at that point, bad things happen, move the
   initialization; fix from Toke Høiland-Jørgensen.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-15 09:08:26 -07:00
Luca Coelho c4cbaf7973 cfg80211: Add support for HE
Add support for the HE in cfg80211 and also add userspace API to
nl80211 to send rate information out, conforming with P802.11ax_D2.0.

Signed-off-by: Liad Kaufman <liad.kaufman@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Signed-off-by: Ido Yariv <idox.yariv@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 14:03:56 +02:00
Antonio Quartulli 446faa15c6 nl80211: report 4ADDR status with GET_INTERFACE
User space tools might be interested in knowing the current
status of the 4ADDR property of an interface (when supported).

Send the status along with the other attributes when replying
to a GET_INTERFACE netlink query.

Signed-off-by: Antonio Quartulli <a@unstable.cc>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:38:40 +02:00
Johannes Berg b9771d41ae mac80211: support scan features for improved scan privacy
Support the new random SN and minimal probe request contents
scan flags for the case of software scan - for hardware scan
the drivers need to opt in, but may need to do only that,
depending on their implementation.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:34:40 +02:00
Johannes Berg 2e076f1990 nl80211: add scan features for improved scan privacy
Add the scan flags for randomized SN and minimized probe request
content for improved scan privacy.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:34:33 +02:00
Johannes Berg 45ad683484 mac80211: split ieee80211_send_probe_req()
This function is passed many more parameters in the scan case
than in the MLME case, and differentiates the two cases inside.
Split it up and make both versions static to simplify things.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:34:25 +02:00
Johannes Berg 00387f3215 mac80211: add probe request building flags
Add flags to pass through to probe request building and
change the "bool directed" to be one of them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:34:21 +02:00
Johannes Berg db0a4ad80d nl80211: refactor common code in scan flags checks
There's a very common pattern to check for a scan flag and
then reject it if an extended feature flag isn't set, factor
this out into a helper function.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:34:16 +02:00
Johannes Berg 1d211d4316 cfg80211: use better order for kcalloc() arguments
The arguments should be (# of elements, size of each) instead
of the other way around, which really ends up being mostly
equivalent but smatch complains about it, so swap them.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:34:06 +02:00
Dedy Lansky bf2b61a683 cfg80211: fix rcu in cfg80211_unregister_wdev
Callers of cfg80211_unregister_wdev can free the wdev object
immediately after this function returns. This may crash the kernel
because this wdev object is still in use by other threads.
Add synchronize_rcu() after list_del_rcu to make sure wdev object can
be safely freed.

Signed-off-by: Dedy Lansky <dlansky@codeaurora.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:05:14 +02:00
Toke Høiland-Jørgensen dc8b274f09 mac80211: Move up init of TXQs
On init, ieee80211_if_add() dumps the interface. Since that now includes a
dump of the TXQ state, we need to initialise that before the dump happens.
So move up the TXQ initialisation to to before the call to
ieee80211_if_add().

Fixes: 52539ca89f ("cfg80211: Expose TXQ stats and parameters to userspace")
Reported-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
Tested-by: Niklas Cassel <niklas.cassel@linaro.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:02:42 +02:00
Sven Eckelmann 3c12d04868 cfg80211: initialize sinfo in cfg80211_get_station
Most of the implementations behind cfg80211_get_station will not initialize
sinfo to zero before manipulating it. For example, the member "filled",
which indicates the filled in parts of this struct, is often only modified
by enabling certain bits in the bitfield while keeping the remaining bits
in their original state. A caller without a preinitialized sinfo.filled can
then no longer decide which parts of sinfo were filled in by
cfg80211_get_station (or actually the underlying implementations).

cfg80211_get_station must therefore take care that sinfo is initialized to
zero. Otherwise, the caller may tries to read information which was not
filled in and which must therefore also be considered uninitialized. In
batadv_v_elp_get_throughput's case, an invalid "random" expected throughput
may be stored for this neighbor and thus the B.A.T.M.A.N V algorithm may
switch to non-optimal neighbors for certain destinations.

Fixes: 7406353d43 ("cfg80211: implement cfg80211_get_station cfg80211 API")
Reported-by: Thomas Lauer <holminateur@gmail.com>
Reported-by: Marcel Schmidt <ff.z-casparistrasse@mailbox.org>
Cc: b.a.t.m.a.n@lists.open-mesh.org
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
2018-06-15 13:01:47 +02:00
Santosh Shilimkar f1693c63ab rds: avoid unenecessary cong_update in loop transport
Loop transport which is self loopback, remote port congestion
update isn't relevant. Infact the xmit path already ignores it.
Receive path needs to do the same.

Reported-by: syzbot+4c20b3866171ce8441d2@syzkaller.appspotmail.com
Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14 19:01:49 -07:00
Guillaume Nault bda06be215 l2tp: clean up stale tunnel or session in pppol2tp_connect's error path
pppol2tp_connect() may create a tunnel or a session. Remove them in
case of error.

Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14 17:10:19 -07:00
Guillaume Nault 3e1bc8bf97 l2tp: prevent pppol2tp_connect() from creating kernel sockets
If 'fd' is negative, l2tp_tunnel_create() creates a tunnel socket using
the configuration passed in 'tcfg'. Currently, pppol2tp_connect() sets
the relevant fields to zero, tricking l2tp_tunnel_create() into setting
up an unusable kernel socket.

We can't set 'tcfg' with the required fields because there's no way to
get them from the current connect() parameters. So let's restrict
kernel sockets creation to the netlink API, which is the original use
case.

Fixes: 789a4a2c61 ("l2tp: Add support for static unmanaged L2TPv3 tunnels")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14 17:10:19 -07:00
Guillaume Nault 7ac6ab1f8a l2tp: only accept PPP sessions in pppol2tp_connect()
l2tp_session_priv() returns a struct pppol2tp_session pointer only for
PPPoL2TP sessions. In particular, if the session is an L2TP_PWTYPE_ETH
pseudo-wire, l2tp_session_priv() returns a pointer to an l2tp_eth_sess
structure, which is much smaller than struct pppol2tp_session. This
leads to invalid memory dereference when trying to lock ps->sk_lock.

Fixes: d9e31d17ce ("l2tp: Add L2TP ethernet pseudowire support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14 17:10:19 -07:00
Guillaume Nault 90904ff5f9 l2tp: fix pseudo-wire type for sessions created by pppol2tp_connect()
Define cfg.pw_type so that the new session is created with its .pwtype
field properly set (L2TP_PWTYPE_PPP).

Not setting the pseudo-wire type had several annoying effects:

  * Invalid value returned in the L2TP_ATTR_PW_TYPE attribute when
    dumping sessions with the netlink API.

  * Impossibility to delete the session using the netlink API (because
    l2tp_nl_cmd_session_delete() gets the deletion callback function
    from an array indexed by the session's pseudo-wire type).

Also, there are several cases where we should check a session's
pseudo-wire type. For example, pppol2tp_connect() should refuse to
connect a session that is not PPPoL2TP, but that requires the session's
.pwtype field to be properly set.

Fixes: f7faffa3ff ("l2tp: Add L2TPv3 protocol support")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14 17:10:19 -07:00
Frank van der Linden 4fd44a98ff tcp: verify the checksum of the first data segment in a new connection
commit 079096f103 ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.

But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly.  These lower-level processing functions do not do any checksum
verification.

Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.

Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Balbir Singh <bsingharora@gmail.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14 17:04:41 -07:00
Linus Torvalds dc594c39f7 The main piece is a set of libceph changes that revamps how OSD
requests are aborted, improving CephFS ENOSPC handling and making
 "umount -f" actually work (Zheng and myself).  The rest is mostly
 mount option handling cleanups from Chengguang and assorted fixes
 from Zheng, Luis and Dongsheng.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJbIkigAAoJEEp/3jgCEfOL3EUH/1s7Ib3FgFzG/SPPKISxZOGr
 ndZGg0rPT9mPIQ4rp6t0z/cDlMrluPmCK3sWrAPe//sZz9iZiuip+mCL0gUFXFNr
 1kL2xDKkJzGxtP3UlUvr5CC6bnxLdeBXJRBDLk/swtphuqArKndlbN/iLZnCZivT
 uJDk+vZTwNJ3UhQP4QdnOQLV60NYs+q4euTqbZF3+pDiRiONbxRfXC3adFsc8zL9
 zlie3CHPbrQHWMsfNvbfM3rBH1WhTwEssDm+IEFlKl19q9SKP2WPZfmBcE1pmZ58
 AhIMoNGdQha1FXS6N96kaPaqFgeysPnEPoyHDqLxsUMKqsvJlOEZsK1jujza4rE=
 =EfXm
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "The main piece is a set of libceph changes that revamps how OSD
  requests are aborted, improving CephFS ENOSPC handling and making
  "umount -f" actually work (Zheng and myself).

  The rest is mostly mount option handling cleanups from Chengguang and
  assorted fixes from Zheng, Luis and Dongsheng.

* tag 'ceph-for-4.18-rc1' of git://github.com/ceph/ceph-client: (31 commits)
  rbd: flush rbd_dev->watch_dwork after watch is unregistered
  ceph: update description of some mount options
  ceph: show ino32 if the value is different with default
  ceph: strengthen rsize/wsize/readdir_max_bytes validation
  ceph: fix alignment of rasize
  ceph: fix use-after-free in ceph_statfs()
  ceph: prevent i_version from going back
  ceph: fix wrong check for the case of updating link count
  libceph: allocate the locator string with GFP_NOFAIL
  libceph: make abort_on_full a per-osdc setting
  libceph: don't abort reads in ceph_osdc_abort_on_full()
  libceph: avoid a use-after-free during map check
  libceph: don't warn if req->r_abort_on_full is set
  libceph: use for_each_request() in ceph_osdc_abort_on_full()
  libceph: defer __complete_request() to a workqueue
  libceph: move more code into __complete_request()
  libceph: no need to call flush_workqueue() before destruction
  ceph: flush pending works before shutdown super
  ceph: abort osd requests on force umount
  libceph: introduce ceph_osdc_abort_requests()
  ...
2018-06-15 07:24:58 +09:00
Xin Long 9951912200 sctp: define sctp_packet_gso_append to build GSO frames
Now sctp GSO uses skb_gro_receive() to append the data into head
skb frag_list. However it actually only needs very few code from
skb_gro_receive(). Besides, NAPI_GRO_CB has to be set while most
of its members are not needed here.

This patch is to add sctp_packet_gso_append() to build GSO frames
instead of skb_gro_receive(), and it would avoid many unnecessary
checks and make the code clearer.

Note that sctp will use page frags instead of frag_list to build
GSO frames in another patch. But it may take time, as sctp's GSO
frames may have different size. skb_segment() can only split it
into the frags with the same size, which would break the border
of sctp chunks.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-14 10:25:53 -07:00
David S. Miller 60d061e347 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter patches for your net tree:

1) Fix NULL pointer dereference from nf_nat_decode_session() if NAT is
   not loaded, from Prashant Bhole.

2) Fix socket extension module autoload.

3) Don't bogusly reject sets with the NFT_SET_EVAL flag set on from
   the dynset extension.

4) Fix races with nf_tables module removal and netns exit path,
   patches from Florian Westphal.

5) Don't hit BUG_ON if jumpstack goes too deep, instead hit
   WARN_ON_ONCE, from Taehee Yoo.

6) Another NULL pointer dereference from ctnetlink, again if NAT is
   not loaded, from Florian Westphal.

7) Fix x_tables match list corruption in xt_connmark module removal
   path, also from Florian.

8) nf_conncount doesn't properly deal with conntrack zones, hence
   garbage collector may get rid of entries in a different zone.
   From Yi-Hung Wei.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-13 14:04:48 -07:00
Linus Torvalds b08fc5277a - Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
 - Explicitly reported overflow fixes (Silvio, Kees)
 - Add missing kvcalloc() function (Kees)
 - Treewide conversions of allocators to use either 2-factor argument
   variant when available, or array_size() and array3_size() as needed (Kees)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b
 oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF
 bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7
 oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI
 EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P
 fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2
 zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb
 ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8
 tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3
 BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3
 LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh
 AtfvEeaPHaOyD8/h2Q==
 =zUUp
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull more overflow updates from Kees Cook:
 "The rest of the overflow changes for v4.18-rc1.

  This includes the explicit overflow fixes from Silvio, further
  struct_size() conversions from Matthew, and a bug fix from Dan.

  But the bulk of it is the treewide conversions to use either the
  2-factor argument allocators (e.g. kmalloc(a * b, ...) into
  kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
  b) into vmalloc(array_size(a, b)).

  Coccinelle was fighting me on several fronts, so I've done a bunch of
  manual whitespace updates in the patches as well.

  Summary:

   - Error path bug fix for overflow tests (Dan)

   - Additional struct_size() conversions (Matthew, Kees)

   - Explicitly reported overflow fixes (Silvio, Kees)

   - Add missing kvcalloc() function (Kees)

   - Treewide conversions of allocators to use either 2-factor argument
     variant when available, or array_size() and array3_size() as needed
     (Kees)"

* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
  treewide: Use array_size in f2fs_kvzalloc()
  treewide: Use array_size() in f2fs_kzalloc()
  treewide: Use array_size() in f2fs_kmalloc()
  treewide: Use array_size() in sock_kmalloc()
  treewide: Use array_size() in kvzalloc_node()
  treewide: Use array_size() in vzalloc_node()
  treewide: Use array_size() in vzalloc()
  treewide: Use array_size() in vmalloc()
  treewide: devm_kzalloc() -> devm_kcalloc()
  treewide: devm_kmalloc() -> devm_kmalloc_array()
  treewide: kvzalloc() -> kvcalloc()
  treewide: kvmalloc() -> kvmalloc_array()
  treewide: kzalloc_node() -> kcalloc_node()
  treewide: kzalloc() -> kcalloc()
  treewide: kmalloc() -> kmalloc_array()
  mm: Introduce kvcalloc()
  video: uvesafb: Fix integer overflow in allocation
  UBIFS: Fix potential integer overflow in allocation
  leds: Use struct_size() in allocation
  Convert intel uncore to struct_size
  ...
2018-06-12 18:28:00 -07:00
Kees Cook fd7becedb1 treewide: Use array_size() in vzalloc_node()
The vzalloc_node() function has no 2-factor argument form, so
multiplication factors need to be wrapped in array_size(). This patch
replaces cases of:

        vzalloc_node(a * b, node)

with:
        vzalloc_node(array_size(a, b), node)

as well as handling cases of:

        vzalloc_node(a * b * c, node)

with:

        vzalloc_node(array3_size(a, b, c), node)

This does, however, attempt to ignore constant size factors like:

        vzalloc_node(4 * 1024, node)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vzalloc_node(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vzalloc_node(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vzalloc_node(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vzalloc_node(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vzalloc_node(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vzalloc_node(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vzalloc_node(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc_node(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc_node(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vzalloc_node(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vzalloc_node(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc_node(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vzalloc_node(C1 * C2 * C3, ...)
|
  vzalloc_node(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vzalloc_node(C1 * C2, ...)
|
  vzalloc_node(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook fad953ce0b treewide: Use array_size() in vzalloc()
The vzalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vzalloc(a * b)

with:
        vzalloc(array_size(a, b))

as well as handling cases of:

        vzalloc(a * b * c)

with:

        vzalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vzalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vzalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vzalloc(C1 * C2 * C3, ...)
|
  vzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vzalloc(C1 * C2, ...)
|
  vzalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 42bc47b353 treewide: Use array_size() in vmalloc()
The vmalloc() function has no 2-factor argument form, so multiplication
factors need to be wrapped in array_size(). This patch replaces cases of:

        vmalloc(a * b)

with:
        vmalloc(array_size(a, b))

as well as handling cases of:

        vmalloc(a * b * c)

with:

        vmalloc(array3_size(a, b, c))

This does, however, attempt to ignore constant size factors like:

        vmalloc(4 * 1024)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  vmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  vmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  vmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  vmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_ID
+	array_size(COUNT_ID, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_ID)
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_ID
+	array_size(COUNT_ID, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT_CONST)
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT_CONST
+	array_size(COUNT_CONST, sizeof(THING))
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

  vmalloc(
-	SIZE * COUNT
+	array_size(COUNT, SIZE)
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  vmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  vmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  vmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  vmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  vmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  vmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  vmalloc(C1 * C2 * C3, ...)
|
  vmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants.
@@
expression E1, E2;
constant C1, C2;
@@

(
  vmalloc(C1 * C2, ...)
|
  vmalloc(
-	E1 * E2
+	array_size(E1, E2)
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 778e1cdd81 treewide: kvzalloc() -> kvcalloc()
The kvzalloc() function has a 2-factor argument form, kvcalloc(). This
patch replaces cases of:

        kvzalloc(a * b, gfp)

with:
        kvcalloc(a * b, gfp)

as well as handling cases of:

        kvzalloc(a * b * c, gfp)

with:

        kvzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kvcalloc(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kvzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kvzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kvzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kvzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kvzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kvzalloc
+ kvcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kvzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kvzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kvzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kvzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kvzalloc(sizeof(THING) * C2, ...)
|
  kvzalloc(sizeof(TYPE) * C2, ...)
|
  kvzalloc(C1 * C2 * C3, ...)
|
  kvzalloc(C1 * C2, ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kvzalloc
+ kvcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 344476e16a treewide: kvmalloc() -> kvmalloc_array()
The kvmalloc() function has a 2-factor argument form, kvmalloc_array(). This
patch replaces cases of:

        kvmalloc(a * b, gfp)

with:
        kvmalloc_array(a * b, gfp)

as well as handling cases of:

        kvmalloc(a * b * c, gfp)

with:

        kvmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kvmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kvmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kvmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kvmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kvmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kvmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kvmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kvmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kvmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kvmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kvmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kvmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kvmalloc
+ kvmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kvmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kvmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kvmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kvmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kvmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kvmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kvmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kvmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kvmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kvmalloc(C1 * C2 * C3, ...)
|
  kvmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kvmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kvmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kvmalloc(sizeof(THING) * C2, ...)
|
  kvmalloc(sizeof(TYPE) * C2, ...)
|
  kvmalloc(C1 * C2 * C3, ...)
|
  kvmalloc(C1 * C2, ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kvmalloc
+ kvmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Kees Cook 6da2ec5605 treewide: kmalloc() -> kmalloc_array()
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
patch replaces cases of:

        kmalloc(a * b, gfp)

with:
        kmalloc_array(a * b, gfp)

as well as handling cases of:

        kmalloc(a * b * c, gfp)

with:

        kmalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kmalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kmalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The tools/ directory was manually excluded, since it has its own
implementation of kmalloc().

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kmalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kmalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kmalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kmalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kmalloc
+ kmalloc_array
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kmalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kmalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kmalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kmalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kmalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kmalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kmalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kmalloc(sizeof(THING) * C2, ...)
|
  kmalloc(sizeof(TYPE) * C2, ...)
|
  kmalloc(C1 * C2 * C3, ...)
|
  kmalloc(C1 * C2, ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kmalloc
+ kmalloc_array
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Cong Wang c0129a0614 smc: convert to ->poll_mask
smc->clcsock is an internal TCP socket, after TCP socket
converts to ->poll_mask, ->poll doesn't exist any more.
So just convert smc socket to ->poll_mask too.

Fixes: 2c7d3daceb ("net/tcp: convert to ->poll_mask")
Reported-by: syzbot+f5066e369b2d5fff630f@syzkaller.appspotmail.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12 15:37:09 -07:00
Bart Van Assche cdb8744d80 Revert "net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets"
Revert the patch mentioned in the subject because it breaks at least
the Avahi mDNS daemon. That patch namely causes the Ubuntu 18.04 Avahi
daemon to fail to start:

Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully called chroot().
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Successfully dropped remaining capabilities.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: No service file found in /etc/avahi/services.
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: SO_REUSEADDR failed: Structure needs cleaning
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: Failed to create server: No suitable network protocol available
Jun 12 09:49:24 ubuntu-vm avahi-daemon[529]: avahi-daemon 0.7 exiting.
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
Jun 12 09:49:24 ubuntu-vm systemd[1]: avahi-daemon.service: Failed with result 'exit-code'.
Jun 12 09:49:24 ubuntu-vm systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.

Fixes: f396922d86 ("net: do not allow changing SO_REUSEADDR/SO_REUSEPORT on bound sockets")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-12 11:09:23 -07:00
Yi-Hung Wei 21ba8847f8 netfilter: nf_conncount: Fix garbage collection with zones
Currently, we use check_hlist() for garbage colleciton. However, we
use the ‘zone’ from the counted entry to query the existence of
existing entries in the hlist. This could be wrong when they are in
different zones, and this patch fixes this issue.

Fixes: e59ea3df3f ("netfilter: xt_connlimit: honor conntrack zone if available")
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 20:07:07 +02:00
Florian Westphal fc6ddbecce netfilter: xt_connmark: fix list corruption on rmmod
This needs to use xt_unregister_targets, else new revision is left
on the list which then causes list to point to a target struct that has been free'd.

Fixes: 472a73e007 ("netfilter: xt_conntrack: Support bit-shifting for CONNMARK & MARK targets.")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 19:35:52 +02:00
Florian Westphal c05a45c086 netfilter: ctnetlink: avoid null pointer dereference
Dan Carpenter points out that deref occurs after NULL check, we should
re-fetch the pointer and check that instead.

Fixes: 2c205dd398 ("netfilter: add struct nf_nat_hook and use it")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 19:31:07 +02:00
Taehee Yoo adc972c5b8 netfilter: nf_tables: use WARN_ON_ONCE instead of BUG_ON in nft_do_chain()
When depth of chain is bigger than NFT_JUMP_STACK_SIZE, the nft_do_chain
crashes. But there is no need to crash hard here.

Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 19:30:11 +02:00
Florian Westphal 0a2cf5ee43 netfilter: nf_tables: close race between netns exit and rmmod
If net namespace is exiting while nf_tables module is being removed
we can oops:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
 IP: nf_tables_flowtable_event+0x43/0xf0 [nf_tables]
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP PTI
 Modules linked in: nf_tables(-) nfnetlink [..]
  unregister_netdevice_notifier+0xdd/0x130
  nf_tables_module_exit+0x24/0x3a [nf_tables]
  SyS_delete_module+0x1c5/0x240
  do_syscall_64+0x74/0x190

Avoid this by attempting to take reference on the net namespace from
the notifiers.  If it fails the namespace is exiting already, and nft
core is taking care of cleanup work.

We also need to make sure the netdev hook type gets removed
before netns ops removal, else notifier might be invoked with device
event for a netns where net->nft was never initialised (because
pernet ops was removed beforehand).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 19:28:18 +02:00
Florian Westphal 71ad00c50d netfilter: nf_tables: fix module unload race
We must first remove the nfnetlink protocol handler when nf_tables module
is unloaded -- we don't want userspace to submit new change requests once
we've started to tear down nft state.

Furthermore, nfnetlink must not call any subsystem function after
call_batch returned -EAGAIN.

EAGAIN means the subsys mutex was dropped, so its unlikely but possible that
nf_tables subsystem was removed due to 'rmmod nf_tables' on another cpu.

Therefore, we must abort batch completely and not move on to next part of
the batch.

Last, we can't invoke ->abort unless we've checked that the subsystem is
still registered.

Change netns exit path of nf_tables to make sure any incompleted
transaction gets removed on exit.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 19:28:18 +02:00
Pablo Neira Ayuso 215a31f19d netfilter: nft_dynset: do not reject set updates with NFT_SET_EVAL
NFT_SET_EVAL is signalling the kernel that this sets can be updated from
the evaluation path, even if there are no expressions attached to the
element. Otherwise, set updates with no expressions fail. Update
description to describe the right semantics.

Fixes: 22fe54d5fe ("netfilter: nf_tables: add support for dynamic set updates")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 19:12:48 +02:00
Pablo Neira Ayuso 3fb61eca18 netfilter: nft_socket: fix module autoload
Add alias definition for module autoload when adding socket rules.

Fixes: 554ced0a6e ("netfilter: nf_tables: add support for native socket matching")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2018-06-12 19:12:45 +02:00