Commit Graph

61815 Commits

Author SHA1 Message Date
Linus Torvalds a1d21081a6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:
 "Some merge window fallout, some longer term fixes:

   1) Handle headroom properly in lapbether and x25_asy drivers, from
      Xie He.

   2) Fetch MAC address from correct r8152 device node, from Thierry
      Reding.

   3) In the sw kTLS path we should allow MSG_CMSG_COMPAT in sendmsg,
      from Rouven Czerwinski.

   4) Correct fdputs in socket layer, from Miaohe Lin.

   5) Revert troublesome sockptr_t optimization, from Christoph Hellwig.

   6) Fix TCP TFO key reading on big endian, from Jason Baron.

   7) Missing CAP_NET_RAW check in nfc, from Qingyu Li.

   8) Fix inet fastreuse optimization with tproxy sockets, from Tim
      Froidcoeur.

   9) Fix 64-bit divide in new SFC driver, from Edward Cree.

  10) Add a tracepoint for prandom_u32 so that we can more easily
      perform usage analysis. From Eric Dumazet.

  11) Fix rwlock imbalance in AF_PACKET, from John Ogness"

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  net: openvswitch: introduce common code for flushing flows
  af_packet: TPACKET_V3: fix fill status rwlock imbalance
  random32: add a tracepoint for prandom_u32()
  Revert "ipv4: tunnel: fix compilation on ARCH=um"
  net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
  net: ethernet: stmmac: Disable hardware multicast filter
  net: stmmac: dwmac1000: provide multicast filter fallback
  ipv4: tunnel: fix compilation on ARCH=um
  vsock: fix potential null pointer dereference in vsock_poll()
  sfc: fix ef100 design-param checking
  net: initialize fastreuse on inet_inherit_port
  net: refactor bind_bucket fastreuse into helper
  net: phy: marvell10g: fix null pointer dereference
  net: Fix potential memory leak in proto_register()
  net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
  ionic_lif: Use devm_kcalloc() in ionic_qcq_alloc()
  net/nfc/rawsock.c: add CAP_NET_RAW check.
  hinic: fix strncpy output truncated compile warnings
  drivers/net/wan/x25_asy: Added needed_headroom and a skb->len check
  net/tls: Fix kmap usage
  ...
2020-08-13 20:03:11 -07:00
Tonghao Zhang 1f3a090b90 net: openvswitch: introduce common code for flushing flows
To avoid some issues, for example RCU usage warning and double free,
we should flush the flows under ovs_lock. This patch refactors
table_instance_destroy and introduces table_instance_flow_flush
which can be invoked by __dp_destroy or ovs_flow_tbl_flush.

Fixes: 50b0e61b32 ("net: openvswitch: fix possible memleak on destroy flow-table")
Reported-by: Johan Knöös <jknoos@google.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2020-August/050489.html
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-13 15:53:30 -07:00
John Ogness 88fd1cb80d af_packet: TPACKET_V3: fix fill status rwlock imbalance
After @blk_fill_in_prog_lock is acquired there is an early out vnet
situation that can occur. In that case, the rwlock needs to be
released.

Also, since @blk_fill_in_prog_lock is only acquired when @tp_version
is exactly TPACKET_V3, only release it on that exact condition as
well.

And finally, add sparse annotation so that it is clearer that
prb_fill_curr_block() and prb_clear_blk_fill_status() are acquiring
and releasing @blk_fill_in_prog_lock, respectively. sparse is still
unable to understand the balance, but the warnings are now on a
higher level that make more sense.

Fixes: 632ca50f2c ("af_packet: TPACKET_V3: replace busy-wait loop")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-13 15:37:30 -07:00
John Fastabend 84f44df664 bpf: sock_ops sk access may stomp registers when dst_reg = src_reg
Similar to patch ("bpf: sock_ops ctx access may stomp registers") if the
src_reg = dst_reg when reading the sk field of a sock_ops struct we
generate xlated code,

  53: (61) r9 = *(u32 *)(r9 +28)
  54: (15) if r9 == 0x0 goto pc+3
  56: (79) r9 = *(u64 *)(r9 +0)

This stomps on the r9 reg to do the sk_fullsock check and then when
reading the skops->sk field instead of the sk pointer we get the
sk_fullsock. To fix use similar pattern noted in the previous fix
and use the temp field to save/restore a register used to do
sk_fullsock check.

After the fix the generated xlated code reads,

  52: (7b) *(u64 *)(r9 +32) = r8
  53: (61) r8 = *(u32 *)(r9 +28)
  54: (15) if r9 == 0x0 goto pc+3
  55: (79) r8 = *(u64 *)(r9 +32)
  56: (79) r9 = *(u64 *)(r9 +0)
  57: (05) goto pc+1
  58: (79) r8 = *(u64 *)(r9 +32)

Here r9 register was in-use so r8 is chosen as the temporary register.
In line 52 r8 is saved in temp variable and at line 54 restored in case
fullsock != 0. Finally we handle fullsock == 0 case by restoring at
line 58.

This adds a new macro SOCK_OPS_GET_SK it is almost possible to merge
this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a
bit more confusing than just adding a new macro despite a bit of
duplicating code.

Fixes: 1314ef5611 ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159718349653.4728.6559437186853473612.stgit@john-Precision-5820-Tower
2020-08-13 22:40:40 +02:00
John Fastabend fd09af0107 bpf: sock_ops ctx access may stomp registers in corner case
I had a sockmap program that after doing some refactoring started spewing
this splat at me:

[18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[...]
[18610.807359] Call Trace:
[18610.807370]  ? 0xffffffffc114d0d5
[18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
[18610.807391]  tcp_connect+0x895/0xd50
[18610.807400]  tcp_v4_connect+0x465/0x4e0
[18610.807407]  __inet_stream_connect+0xd6/0x3a0
[18610.807412]  ? __inet_stream_connect+0x5/0x3a0
[18610.807417]  inet_stream_connect+0x3b/0x60
[18610.807425]  __sys_connect+0xed/0x120

After some debugging I was able to build this simple reproducer,

 __section("sockops/reproducer_bad")
 int bpf_reproducer_bad(struct bpf_sock_ops *skops)
 {
        volatile __maybe_unused __u32 i = skops->snd_ssthresh;
        return 0;
 }

And along the way noticed that below program ran without splat,

__section("sockops/reproducer_good")
int bpf_reproducer_good(struct bpf_sock_ops *skops)
{
        volatile __maybe_unused __u32 i = skops->snd_ssthresh;
        volatile __maybe_unused __u32 family;

        compiler_barrier();

        family = skops->family;
        return 0;
}

So I decided to check out the code we generate for the above two
programs and noticed each generates the BPF code you would expect,

0000000000000000 <bpf_reproducer_bad>:
;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
       0:       r1 = *(u32 *)(r1 + 96)
       1:       *(u32 *)(r10 - 4) = r1
;       return 0;
       2:       r0 = 0
       3:       exit

0000000000000000 <bpf_reproducer_good>:
;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
       0:       r2 = *(u32 *)(r1 + 96)
       1:       *(u32 *)(r10 - 4) = r2
;       family = skops->family;
       2:       r1 = *(u32 *)(r1 + 20)
       3:       *(u32 *)(r10 - 8) = r1
;       return 0;
       4:       r0 = 0
       5:       exit

So we get reasonable assembly, but still something was causing the null
pointer dereference. So, we load the programs and dump the xlated version
observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
translated by the skops access helpers.

int bpf_reproducer_bad(struct bpf_sock_ops * skops):
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   0: (61) r1 = *(u32 *)(r1 +28)
   1: (15) if r1 == 0x0 goto pc+2
   2: (79) r1 = *(u64 *)(r1 +0)
   3: (61) r1 = *(u32 *)(r1 +2340)
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   4: (63) *(u32 *)(r10 -4) = r1
; return 0;
   5: (b7) r0 = 0
   6: (95) exit

int bpf_reproducer_good(struct bpf_sock_ops * skops):
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   0: (61) r2 = *(u32 *)(r1 +28)
   1: (15) if r2 == 0x0 goto pc+2
   2: (79) r2 = *(u64 *)(r1 +0)
   3: (61) r2 = *(u32 *)(r2 +2340)
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   4: (63) *(u32 *)(r10 -4) = r2
; family = skops->family;
   5: (79) r1 = *(u64 *)(r1 +0)
   6: (69) r1 = *(u16 *)(r1 +16)
; family = skops->family;
   7: (63) *(u32 *)(r10 -8) = r1
; return 0;
   8: (b7) r0 = 0
   9: (95) exit

Then we look at lines 0 and 2 above. In the good case we do the zero
check in r2 and then load 'r1 + 0' at line 2. Do a quick cross-check
into the bpf_sock_ops check and we can confirm that is the 'struct
sock *sk' pointer field. But, in the bad case,

   0: (61) r1 = *(u32 *)(r1 +28)
   1: (15) if r1 == 0x0 goto pc+2
   2: (79) r1 = *(u64 *)(r1 +0)

Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
line 2 we read the 'r1 +0' as a pointer. Now jumping back to our spat,

[18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001

The 0x01 makes sense because that is exactly the fullsock value. And
its not a valid dereference so we splat.

To fix we need to guard the case when a program is doing a sock_ops field
access with src_reg == dst_reg. This is already handled in the load case
where the ctx_access handler uses a tmp register being careful to
store the old value and restore it. To fix the get case test if
src_reg == dst_reg and in this case do the is_fullsock test in the
temporary register. Remembering to restore the temporary register before
writing to either dst_reg or src_reg to avoid smashing the pointer into
the struct holding the tmp variable.

Adding this inline code to test_tcpbpf_kern will now be generated
correctly from,

  9: r2 = *(u32 *)(r2 + 96)

to xlated code,

  12: (7b) *(u64 *)(r2 +32) = r9
  13: (61) r9 = *(u32 *)(r2 +28)
  14: (15) if r9 == 0x0 goto pc+4
  15: (79) r9 = *(u64 *)(r2 +32)
  16: (79) r2 = *(u64 *)(r2 +0)
  17: (61) r2 = *(u32 *)(r2 +2348)
  18: (05) goto pc+1
  19: (79) r9 = *(u64 *)(r2 +32)

And in the normal case we keep the original code, because really this
is an edge case. From this,

  9: r2 = *(u32 *)(r6 + 96)

to xlated code,

  22: (61) r2 = *(u32 *)(r6 +28)
  23: (15) if r2 == 0x0 goto pc+2
  24: (79) r2 = *(u64 *)(r6 +0)
  25: (61) r2 = *(u32 *)(r2 +2348)

So three additional instructions if dst == src register, but I scanned
my current code base and did not see this pattern anywhere so should
not be a big deal. Further, it seems no one else has hit this or at
least reported it so it must a fairly rare pattern.

Fixes: 9b1f3d6e5a ("bpf: Refactor sock_ops_convert_ctx_access")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/159718347772.4728.2781381670567919577.stgit@john-Precision-5820-Tower
2020-08-13 22:40:36 +02:00
Florian Westphal 59136aa3b2 netfilter: nf_tables: free chain context when BINDING flag is missing
syzbot found a memory leak in nf_tables_addchain() because the chain
object is not free'd correctly on error.

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: syzbot+c99868fde67014f7e9f5@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-13 04:17:46 +02:00
Florian Westphal 2404b73c3f netfilter: avoid ipv6 -> nf_defrag_ipv6 module dependency
nf_ct_frag6_gather is part of nf_defrag_ipv6.ko, not ipv6 core.

The current use of the netfilter ipv6 stub indirections  causes a module
dependency between ipv6 and nf_defrag_ipv6.

This prevents nf_defrag_ipv6 module from being removed because ipv6 can't
be unloaded.

Remove the indirection and always use a direct call.  This creates a
depency from nf_conntrack_bridge to nf_defrag_ipv6 instead:

modinfo nf_conntrack
depends:        nf_conntrack,nf_defrag_ipv6,bridge

.. and nf_conntrack already depends on nf_defrag_ipv6 anyway.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-13 04:16:15 +02:00
Andrii Nakryiko 068d9d1eba bpf: Fix XDP FD-based attach/detach logic around XDP_FLAGS_UPDATE_IF_NOEXIST
Enforce XDP_FLAGS_UPDATE_IF_NOEXIST only if new BPF program to be attached is
non-NULL (i.e., we are not detaching a BPF program).

Fixes: d4baa9368a ("bpf, xdp: Extract common XDP program attachment logic")
Reported-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20200812022923.1217922-1-andriin@fb.com
2020-08-12 18:00:49 -07:00
David S. Miller 9643609423 Revert "ipv4: tunnel: fix compilation on ARCH=um"
This reverts commit 06a7a37be5.

The bug was already fixed, this added a dup include.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-12 13:26:37 -07:00
Eric Dumazet 2e0d8fef5f net: accept an empty mask in /sys/class/net/*/queues/rx-*/rps_cpus
We must accept an empty mask in store_rps_map(), or we are not able
to disable RPS on a queue.

Fixes: 07bbecb341 ("net: Restrict receive packets queuing to housekeeping CPUs")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Cc: Alex Belits <abelits@marvell.com>
Cc: Nitesh Narayan Lal <nitesh@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Maciej Żenczykowski <maze@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-12 13:18:16 -07:00
Johannes Berg 06a7a37be5 ipv4: tunnel: fix compilation on ARCH=um
With certain configurations, a 64-bit ARCH=um errors
out here with an unknown csum_ipv6_magic() function.
Include the right header file to always have it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-12 12:57:51 -07:00
Stefano Garzarella 1980c05844 vsock: fix potential null pointer dereference in vsock_poll()
syzbot reported this issue where in the vsock_poll() we find the
socket state at TCP_ESTABLISHED, but 'transport' is null:
  general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN
  KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097]
  CPU: 0 PID: 8227 Comm: syz-executor.2 Not tainted 5.8.0-rc7-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  RIP: 0010:vsock_poll+0x75a/0x8e0 net/vmw_vsock/af_vsock.c:1038
  Call Trace:
   sock_poll+0x159/0x460 net/socket.c:1266
   vfs_poll include/linux/poll.h:90 [inline]
   do_pollfd fs/select.c:869 [inline]
   do_poll fs/select.c:917 [inline]
   do_sys_poll+0x607/0xd40 fs/select.c:1011
   __do_sys_poll fs/select.c:1069 [inline]
   __se_sys_poll fs/select.c:1057 [inline]
   __x64_sys_poll+0x18c/0x440 fs/select.c:1057
   do_syscall_64+0x60/0xe0 arch/x86/entry/common.c:384
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

This issue can happen if the TCP_ESTABLISHED state is set after we read
the vsk->transport in the vsock_poll().

We could put barriers to synchronize, but this can only happen during
connection setup, so we can simply check that 'transport' is valid.

Fixes: c0cfa2d8a7 ("vsock: add multi-transports support")
Reported-and-tested-by: syzbot+a61bac2fcc1a7c6623fe@syzkaller.appspotmail.com
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-12 12:56:06 -07:00
Linus Torvalds 7c2a69f610 Xiubo has completed his work on filesystem client metrics, they are
sent to all available MDSes once per second now.  Other than that, we
 have a lot of fixes and cleanups all around the filesystem, including
 a tweak to cut down on MDS request resends in multi-MDS setups from
 Yanhu and fixups for SELinux symlink labeling and MClientSession
 message decoding from Jeff.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAl80I3MTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi7rlB/4/tegnfsB+O6MQPt+DkSxjQ6+90J/O
 aDj3jnwHoUBUitTP92wHdy/TanlZMX0UQqq4caWt1Ot0Rveb6XTfTIzEDtG2ejYQ
 dcH6LB3dS9Qvjrwh7oRawx8Xux/RdIIA99OvAX7K+4IeLKP5PjmwfdM1Mm0agfoS
 V8qXm8cGyqF2lWown092Znr8s6OSsDEaUbjB2tlz5msWeK9rSFD5bDYOBdOTkBCU
 5v5BIUwWWoU9YFMWDJgVNS5l2Scq5yTDEleZcCCJMovKNrhBi9DcwUDKrDQJ3+5v
 UmfI5M/L99Ng7yAcPqbeOZ1HibuOrdmqAoR6lCiD+PPEPrf99rVLRinQ
 =2NtI
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "Xiubo has completed his work on filesystem client metrics, they are
  sent to all available MDSes once per second now.

  Other than that, we have a lot of fixes and cleanups all around the
  filesystem, including a tweak to cut down on MDS request resends in
  multi-MDS setups from Yanhu and fixups for SELinux symlink labeling
  and MClientSession message decoding from Jeff"

* tag 'ceph-for-5.9-rc1' of git://github.com/ceph/ceph-client: (22 commits)
  ceph: handle zero-length feature mask in session messages
  ceph: use frag's MDS in either mode
  ceph: move sb->wb_pagevec_pool to be a global mempool
  ceph: set sec_context xattr on symlink creation
  ceph: remove redundant initialization of variable mds
  ceph: fix use-after-free for fsc->mdsc
  ceph: remove unused variables in ceph_mdsmap_decode()
  ceph: delete repeated words in fs/ceph/
  ceph: send client provided metric flags in client metadata
  ceph: periodically send perf metrics to MDSes
  ceph: check the sesion state and return false in case it is closed
  libceph: replace HTTP links with HTTPS ones
  ceph: remove unnecessary cast in kfree()
  libceph: just have osd_req_op_init() return a pointer
  ceph: do not access the kiocb after aio requests
  ceph: clean up and optimize ceph_check_delayed_caps()
  ceph: fix potential mdsc use-after-free crash
  ceph: switch to WARN_ON_ONCE in encode_supported_features()
  ceph: add global total_caps to count the mdsc's total caps number
  ceph: add check_session_state() helper and make it global
  ...
2020-08-12 12:51:31 -07:00
Tim Froidcoeur d76f3351ce net: initialize fastreuse on inet_inherit_port
In the case of TPROXY, bind_conflict optimizations for SO_REUSEADDR or
SO_REUSEPORT are broken, possibly resulting in O(n) instead of O(1) bind
behaviour or in the incorrect reuse of a bind.

the kernel keeps track for each bind_bucket if all sockets in the
bind_bucket support SO_REUSEADDR or SO_REUSEPORT in two fastreuse flags.
These flags allow skipping the costly bind_conflict check when possible
(meaning when all sockets have the proper SO_REUSE option).

For every socket added to a bind_bucket, these flags need to be updated.
As soon as a socket that does not support reuse is added, the flag is
set to false and will never go back to true, unless the bind_bucket is
deleted.

Note that there is no mechanism to re-evaluate these flags when a socket
is removed (this might make sense when removing a socket that would not
allow reuse; this leaves room for a future patch).

For this optimization to work, it is mandatory that these flags are
properly initialized and updated.

When a child socket is created from a listen socket in
__inet_inherit_port, the TPROXY case could create a new bind bucket
without properly initializing these flags, thus preventing the
optimization to work. Alternatively, a socket not allowing reuse could
be added to an existing bind bucket without updating the flags, causing
bind_conflict to never be called as it should.

Call inet_csk_update_fastreuse when __inet_inherit_port decides to create
a new bind_bucket or use a different bind_bucket than the one of the
listen socket.

Fixes: 093d282321 ("tproxy: fix hash locking issue when using port redirection in __inet_inherit_port()")
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-11 15:49:08 -07:00
Tim Froidcoeur 62ffc589ab net: refactor bind_bucket fastreuse into helper
Refactor the fastreuse update code in inet_csk_get_port into a small
helper function that can be called from other places.

Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Tim Froidcoeur <tim.froidcoeur@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-11 15:49:08 -07:00
Miaohe Lin 0f5907af39 net: Fix potential memory leak in proto_register()
If we failed to assign proto idx, we free the twsk_slab_name but forget to
free the twsk_slab. Add a helper function tw_prot_cleanup() to free these
together and also use this helper function in proto_unregister().

Fixes: b45ce32135 ("sock: fix potential memory leak in proto_register()")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-11 15:36:14 -07:00
Qingyu Li 26896f0146 net/nfc/rawsock.c: add CAP_NET_RAW check.
When creating a raw AF_NFC socket, CAP_NET_RAW needs to be checked first.

Signed-off-by: Qingyu Li <ieatmuttonchuan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-11 10:34:30 -07:00
Ira Weiny b06c19d9f8 net/tls: Fix kmap usage
When MSG_OOB is specified to tls_device_sendpage() the mapped page is
never unmapped.

Hold off mapping the page until after the flags are checked and the page
is actually needed.

Fixes: e8f6979981 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-11 10:20:34 -07:00
Linus Torvalds 97d052ea3f A set of locking fixes and updates:
- Untangle the header spaghetti which causes build failures in various
     situations caused by the lockdep additions to seqcount to validate that
     the write side critical sections are non-preemptible.
 
   - The seqcount associated lock debug addons which were blocked by the
     above fallout.
 
     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict per
     CPU seqcounts. As the lock is not part of the seqcount, lockdep cannot
     validate that the lock is held.
 
     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored and
     write_seqcount_begin() has a lockdep assertion to validate that the
     lock is held.
 
     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API is
     unchanged and determines the type at compile time with the help of
     _Generic which is possible now that the minimal GCC version has been
     moved up.
 
     Adding this lockdep coverage unearthed a handful of seqcount bugs which
     have been addressed already independent of this.
 
     While generaly useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if the
     writers are serialized by an associated lock, which leads to the well
     known reader preempts writer livelock. RT prevents this by storing the
     associated lock pointer independent of lockdep in the seqcount and
     changing the reader side to block on the lock when a reader detects
     that a writer is in the write side critical section.
 
  - Conversion of seqcount usage sites to associated types and initializers.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl8xmPYTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoTuQEACyzQCjU8PgehPp9oMqWzaX2fcVyuZO
 QU2yw6gmz2oTz3ZHUNwdW8UnzGh2OWosK3kDruoD9FtSS51lER1/ISfSPCGfyqxC
 KTjOcB1Kvxwq/3LcCx7Zi3ZxWApat74qs3EhYhKtEiQ2Y9xv9rLq8VV1UWAwyxq0
 eHpjlIJ6b6rbt+ARslaB7drnccOsdK+W/roNj4kfyt+gezjBfojGRdMGQNMFcpnv
 shuTC+vYurAVIiVA/0IuizgHfwZiXOtVpjVoEWaxg6bBH6HNuYMYzdSa/YrlDkZs
 n/aBI/Xkvx+Eacu8b1Zwmbzs5EnikUK/2dMqbzXKUZK61eV4hX5c2xrnr1yGWKTs
 F/juh69Squ7X6VZyKVgJ9RIccVueqwR2EprXWgH3+RMice5kjnXH4zURp0GHALxa
 DFPfB6fawcH3Ps87kcRFvjgm6FBo0hJ1AxmsW1dY4ACFB9azFa2euW+AARDzHOy2
 VRsUdhL9CGwtPjXcZ/9Rhej6fZLGBXKr8uq5QiMuvttp4b6+j9FEfBgD4S6h8csl
 AT2c2I9LcbWqyUM9P4S7zY/YgOZw88vHRuDH7tEBdIeoiHfrbSBU7EQ9jlAKq/59
 f+Htu2Io281c005g7DEeuCYvpzSYnJnAitj5Lmp/kzk2Wn3utY1uIAVszqwf95Ul
 81ppn2KlvzUK8g==
 =7Gj+
 -----END PGP SIGNATURE-----

Merge tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Thomas Gleixner:
 "A set of locking fixes and updates:

   - Untangle the header spaghetti which causes build failures in
     various situations caused by the lockdep additions to seqcount to
     validate that the write side critical sections are non-preemptible.

   - The seqcount associated lock debug addons which were blocked by the
     above fallout.

     seqcount writers contrary to seqlock writers must be externally
     serialized, which usually happens via locking - except for strict
     per CPU seqcounts. As the lock is not part of the seqcount, lockdep
     cannot validate that the lock is held.

     This new debug mechanism adds the concept of associated locks.
     sequence count has now lock type variants and corresponding
     initializers which take a pointer to the associated lock used for
     writer serialization. If lockdep is enabled the pointer is stored
     and write_seqcount_begin() has a lockdep assertion to validate that
     the lock is held.

     Aside of the type and the initializer no other code changes are
     required at the seqcount usage sites. The rest of the seqcount API
     is unchanged and determines the type at compile time with the help
     of _Generic which is possible now that the minimal GCC version has
     been moved up.

     Adding this lockdep coverage unearthed a handful of seqcount bugs
     which have been addressed already independent of this.

     While generally useful this comes with a Trojan Horse twist: On RT
     kernels the write side critical section can become preemtible if
     the writers are serialized by an associated lock, which leads to
     the well known reader preempts writer livelock. RT prevents this by
     storing the associated lock pointer independent of lockdep in the
     seqcount and changing the reader side to block on the lock when a
     reader detects that a writer is in the write side critical section.

   - Conversion of seqcount usage sites to associated types and
     initializers"

* tag 'locking-urgent-2020-08-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  locking/seqlock, headers: Untangle the spaghetti monster
  locking, arch/ia64: Reduce <asm/smp.h> header dependencies by moving XTP bits into the new <asm/xtp.h> header
  x86/headers: Remove APIC headers from <asm/smp.h>
  seqcount: More consistent seqprop names
  seqcount: Compress SEQCNT_LOCKNAME_ZERO()
  seqlock: Fold seqcount_LOCKNAME_init() definition
  seqlock: Fold seqcount_LOCKNAME_t definition
  seqlock: s/__SEQ_LOCKDEP/__SEQ_LOCK/g
  hrtimer: Use sequence counter with associated raw spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  vfs: Use sequence counter with associated spinlock
  timekeeping: Use sequence counter with associated raw spinlock
  xfrm: policy: Use sequence counters with associated lock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  sched: tasks: Use sequence counter with associated spinlock
  ...
2020-08-10 19:07:44 -07:00
Jason Baron f19008e676 tcp: correct read of TFO keys on big endian systems
When TFO keys are read back on big endian systems either via the global
sysctl interface or via getsockopt() using TCP_FASTOPEN_KEY, the values
don't match what was written.

For example, on s390x:

# echo "1-2-3-4" > /proc/sys/net/ipv4/tcp_fastopen_key
# cat /proc/sys/net/ipv4/tcp_fastopen_key
02000000-01000000-04000000-03000000

Instead of:

# cat /proc/sys/net/ipv4/tcp_fastopen_key
00000001-00000002-00000003-00000004

Fix this by converting to the correct endianness on read. This was
reported by Colin Ian King when running the 'tcp_fastopen_backup_key' net
selftest on s390x, which depends on the read value matching what was
written. I've confirmed that the test now passes on big and little endian
systems.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Fixes: 438ac88009 ("net: fastopen: robustness and endianness fixes for SipHash")
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Eric Dumazet <edumazet@google.com>
Reported-and-tested-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10 12:12:35 -07:00
Christoph Hellwig 519a8a6cf9 net: Revert "net: optimize the sockptr_t for unified kernel/user address spaces"
This reverts commits 6d04fe15f7 and
a31edb2059.

It turns out the idea to share a single pointer for both kernel and user
space address causes various kinds of problems.  So use the slightly less
optimal version that uses an extra bit, but which is guaranteed to be safe
everywhere.

Fixes: 6d04fe15f7 ("net: optimize the sockptr_t for unified kernel/user address spaces")
Reported-by: Eric Dumazet <edumazet@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-10 12:06:44 -07:00
Florian Westphal 2f941622fd netfilter: nft_compat: remove flush counter optimization
WARNING: CPU: 1 PID: 16059 at lib/refcount.c:31 refcount_warn_saturate+0xdf/0xf
[..]
 __nft_mt_tg_destroy+0x42/0x50 [nft_compat]
 nft_target_destroy+0x63/0x80 [nft_compat]
 nf_tables_expr_destroy+0x1b/0x30 [nf_tables]
 nf_tables_rule_destroy+0x3a/0x70 [nf_tables]
 nf_tables_exit_net+0x186/0x3d0 [nf_tables]

Happens when a compat expr is destoyed from abort path.
There is no functional impact; after this work queue is flushed
unconditionally if its pending.

This removes the waitcount optimization.  Test of repeated
iptables-restore of a ~60k kubernetes ruleset doesn't indicate
a slowdown.  In case the counter is needed after all for some workloads
we can revert this and increment the refcount for the
!= NFT_PREPARE_TRANS case to avoid the increment/decrement imbalance.

While at it, also flush for match case, this was an oversight
in the original patch.

Fixes: ffe8923f10 ("netfilter: nft_compat: make sure xtables destructors have run")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-10 13:03:36 +02:00
Stephen Suryaputra b428336676 netfilter: nf_tables: nft_exthdr: the presence return value should be little-endian
On big-endian machine, the returned register data when the exthdr is
present is not being compared correctly because little-endian is
assumed. The function nft_cmp_fast_mask(), called by nft_cmp_fast_eval()
and nft_cmp_fast_init(), calls cpu_to_le32().

The following dump also shows that little endian is assumed:

$ nft --debug=netlink add rule ip recordroute forward ip option rr exists counter
ip
  [ exthdr load ipv4 1b @ 7 + 0 present => reg 1 ]
  [ cmp eq reg 1 0x01000000 ]
  [ counter pkts 0 bytes 0 ]

Lastly, debug print in nft_cmp_fast_init() and nft_cmp_fast_eval() when
RR option exists in the packet shows that the comparison fails because
the assumption:

nft_cmp_fast_init:189 priv->sreg=4 desc.len=8 mask=0xff000000 data.data[0]=0x10003e0
nft_cmp_fast_eval:57 regs->data[priv->sreg=4]=0x1 mask=0xff000000 priv->data=0x1000000

v2: use nft_reg_store8() instead (Florian Westphal). Also to avoid the
    warnings reported by kernel test robot.

Fixes: dbb5281a1f ("netfilter: nf_tables: add support for matching IPv4 options")
Fixes: c078ca3b0c ("netfilter: nft_exthdr: Add support for existence check")
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-10 13:02:43 +02:00
Linus Torvalds 7a6b60441f Highlights:
- Support for user extended attributes on NFS (RFC 8276)
 - Further reduce unnecessary NFSv4 delegation recalls
 
 Notable fixes:
 
 - Fix recent krb5p regression
 - Address a few resource leaks and a rare NULL dereference
 
 Other:
 
 - De-duplicate RPC/RDMA error handling and other utility functions
 - Replace storage and display of kernel memory addresses by tracepoints
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAl8oBt0ACgkQM2qzM29m
 f5dTFQ/9H72E6gr1onsia0/Py0CO8F9qzLgmUBl1vVYAh2/vPqUL1ypxrC5OYrAy
 TOqESTsJvmGluCFc/77XUTD7NvJY3znIWim49okwDiyee4Y14ZfRhhCxyyA6Z94E
 FjJQb5TbF1Mti4X3dN8Gn7O1Y/BfTjDAAXnXGlTA1xoLcxM5idWIj+G8x0bPmeDb
 2fTbgsoETu6MpS2/L6mraXVh3d5ESOJH+73YvpBl0AhYPzlNASJZMLtHtd+A/JbO
 IPkMP/7UA5DuJtWGeuQ4I4D5bQNpNWMfN6zhwtih4IV5bkRC7vyAOLG1R7w9+Ufq
 58cxPiorMcsg1cHnXG0Z6WVtbMEdWTP/FzmJdE5RC7DEJhmmSUG/R0OmgDcsDZET
 GovPARho01yp80GwTjCIctDHRRFRL4pdPfr8PjVHetSnx9+zoRUT+D70Zeg/KSy2
 99gmCxqSY9BZeHoiVPEX/HbhXrkuDjUSshwl98OAzOFmv6kbwtLntgFbWlBdE6dB
 mqOxBb73zEoZ5P9GA2l2ShU3GbzMzDebHBb9EyomXHZrLejoXeUNA28VJ+8vPP5S
 IVHnEwOkdJrNe/7cH4jd/B0NR6f8Da/F9kmkLiG2GNPMqQ8bnVhxTUtZkcAE+fd4
 f34qLxsoht70wSSfISjBs7hP5KxEM1lOAf0w0RpycPUKJNV1FB0=
 =OEpF
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6

Pull NFS server updates from Chuck Lever:
 "Highlights:
   - Support for user extended attributes on NFS (RFC 8276)
   - Further reduce unnecessary NFSv4 delegation recalls

  Notable fixes:
   - Fix recent krb5p regression
   - Address a few resource leaks and a rare NULL dereference

  Other:
   - De-duplicate RPC/RDMA error handling and other utility functions
   - Replace storage and display of kernel memory addresses by tracepoints"

* tag 'nfsd-5.9' of git://git.linux-nfs.org/projects/cel/cel-2.6: (38 commits)
  svcrdma: CM event handler clean up
  svcrdma: Remove transport reference counting
  svcrdma: Fix another Receive buffer leak
  SUNRPC: Refresh the show_rqstp_flags() macro
  nfsd: netns.h: delete a duplicated word
  SUNRPC: Fix ("SUNRPC: Add "@len" parameter to gss_unwrap()")
  nfsd: avoid a NULL dereference in __cld_pipe_upcall()
  nfsd4: a client's own opens needn't prevent delegations
  nfsd: Use seq_putc() in two functions
  svcrdma: Display chunk completion ID when posting a rw_ctxt
  svcrdma: Record send_ctxt completion ID in trace_svcrdma_post_send()
  svcrdma: Introduce Send completion IDs
  svcrdma: Record Receive completion ID in svc_rdma_decode_rqst
  svcrdma: Introduce Receive completion IDs
  svcrdma: Introduce infrastructure to support completion IDs
  svcrdma: Add common XDR encoders for RDMA and Read segments
  svcrdma: Add common XDR decoders for RDMA and Read segments
  SUNRPC: Add helpers for decoding list discriminators symbolically
  svcrdma: Remove declarations for functions long removed
  svcrdma: Clean up trace_svcrdma_send_failed() tracepoint
  ...
2020-08-09 13:58:04 -07:00
Miaohe Lin 7c7ab580db net: Convert to use the fallthrough macro
Convert the uses of fallthrough comments to fallthrough macro.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08 14:29:09 -07:00
Miaohe Lin 11f920d2aa net: Use helper function ip_is_fragment()
Use helper function ip_is_fragment() to check ip fragment.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08 14:24:53 -07:00
Miaohe Lin 47260ba937 net: Remove meaningless jump label out_fs
The out_fs jump label has nothing to do but goto out.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08 14:23:21 -07:00
Miaohe Lin ce787a5a07 net: Set fput_needed iff FDPUT_FPUT is set
We should fput() file iff FDPUT_FPUT is set. So we should set fput_needed
accordingly.

Fixes: 00e188ef6a ("sockfd_lookup_light(): switch to fdget^W^Waway from fget_light")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08 14:21:42 -07:00
Miaohe Lin 6b07edebe6 net: Use helper function fdput()
Use helper function fdput() to fput() the file iff FDPUT_FPUT is set.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-08 14:19:16 -07:00
Rouven Czerwinski 1c3b63f155 net/tls: allow MSG_CMSG_COMPAT in sendmsg
Trying to use ktls on a system with 32-bit userspace and 64-bit kernel
results in a EOPNOTSUPP message during sendmsg:

  setsockopt(3, SOL_TLS, TLS_TX, …, 40) = 0
  sendmsg(3, …, msg_flags=0}, 0) = -1 EOPNOTSUPP (Operation not supported)

The tls_sw implementation does strict flag checking and does not allow
the MSG_CMSG_COMPAT flag, which is set if the message comes in through
the compat syscall.

This patch adds MSG_CMSG_COMPAT to the flag check to allow the usage of
the TLS SW implementation on systems using the compat syscall path.

Note that the same check is present in the sendmsg path for the TLS
device implementation, however the flag hasn't been added there for lack
of testing hardware.

Signed-off-by: Rouven Czerwinski <r.czerwinski@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07 17:40:45 -07:00
David S. Miller 64cae2fb48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2020-08-08

The following pull-request contains BPF updates for your *net* tree.

We've added 11 non-merge commits during the last 2 day(s) which contain
a total of 24 files changed, 216 insertions(+), 135 deletions(-).

The main changes are:

1) Fix UAPI for BPF map iterator before it gets frozen to allow for more
   extensions/customization in future, from Yonghong Song.

2) Fix selftests build to undo verbose build output, from Andrii Nakryiko.

3) Fix inlining compilation error on bpf_do_trace_printk() due to variable
   argument lists, from Stanislav Fomichev.

4) Fix an uninitialized pointer warning at btf__parse_raw() in libbpf,
   from Daniel T. Lee.

5) Fix several compilation warnings in selftests with regards to ignoring
   return value, from Jianlin Lv.

6) Fix interruptions by switching off timeout for BPF tests, from Jiri Benc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07 17:33:08 -07:00
Paolo Abeni 7ee2492635 mptcp: fix warn at shutdown time for unaccepted msk sockets
With commit b93df08ccd ("mptcp: explicitly track the fully
established status"), the status of unaccepted mptcp closed in
mptcp_sock_destruct() changes from TCP_SYN_RECV to TCP_ESTABLISHED.

As a result mptcp_sock_destruct() does not perform the proper
cleanup and inet_sock_destruct() will later emit a warn.

Address the issue updating the condition tested in mptcp_sock_destruct().
Also update the related comment.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/66
Reported-and-tested-by: Christoph Paasch <cpaasch@apple.com>
Fixes: b93df08ccd ("mptcp: explicitly track the fully established status")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-07 17:26:16 -07:00
Linus Torvalds 1fa2c0a0c8 Fix SCM_RIGHTS compat mode
-----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8trzgWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqrDEACRlxrvLRsMX5MCdZ3SQnnHkGbp
 7C/OP331cRe0HnAp1B3P8VhP/Y81fDiCnrP9cb7yoeMASCvCUhVQlMZpLVpppOUE
 NK0XQdjMlHJGh9qTcgWGd49KpFOFFYIeiv4NaddlmScb8Q+ZkMKiQWURNR65iM8c
 QLcpfryZEZd7eEZdH8Xtn6ItrYEYTldXSC8oc+e3ev0HCg0D7E4nBaqp1JzHT9I2
 Zku98lyap2sxSBZlTKVppowZ7RPqViwRkvGMofWE7brbJ2+H89YMXhj0ZMFQ8+8z
 htbG6KCWUP7kdUwcyuBgUX8/+cGMsb9R8U34ktxO7GWq1stHAph4e59kJ0cRKjNp
 v7vFL5nA6kvgYjZ9Y7dQoeKQWFAcN7qK2/DR/11ECx21pHbR/6LgyPDTYYgN88Bp
 wcbNgPr1AI7bxu/FKe+6a+qgIjV5Ukb4SxNKU7pvkKKSDCzhMxGGCRY9Pmip2gmv
 EHKOox7QgONGhvuTrE2CYr5hAO5alKbciTXnofAmCpi1ddTrpv5wVDDSSJXK9xNe
 uZjMvYGGnJWw8KLZiqbC6P1k92Lm8hfFzPuLBuuUILrG5MZsAiLGIOleXTaaehPS
 ZRPE1h2A95lFulOPXdX0fdQkWeVSf/IkBNxA3Uz/xM7QaF75n83VypFxVfEFzmos
 7R14u32ySUeuZX0/wg==
 =5TFR
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.9-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp fix from Kees Cook:
 "This fixes my typo in the SCM_RIGHTS refactoring that broke compat
  handling.

  Thanks to Thadeu Lima de Souza Cascardo for tracking it down, and to
  Christian Zigotzky and Alex Xu for their reports"

* tag 'seccomp-v5.9-rc1-fix1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  net/scm: Fix typo in SCM_RIGHTS compat refactoring
2020-08-07 13:16:22 -07:00
Kees Cook 16b89f6953 net/scm: Fix typo in SCM_RIGHTS compat refactoring
When refactoring the SCM_RIGHTS code, I accidentally mis-merged my
native/compat diffs, which entirely broke using SCM_RIGHTS in compat
mode. Use the correct helper.

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Link: https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-August/216156.html
Reported-by: "Alex Xu (Hello71)" <alex_y_xu@yahoo.ca>
Link: https://lore.kernel.org/lkml/1596812929.lz7fuo8r2w.none@localhost/
Suggested-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Fixes: c0029de509 ("net/scm: Regularize compat handling of scm_detach_fds()")
Tested-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2020-08-07 12:43:25 -07:00
Linus Torvalds 81e11336d9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few MM hotfixes

 - kthread, tools, scripts, ntfs and ocfs2

 - some of MM

Subsystems affected by this patch series: kthread, tools, scripts, ntfs,
ocfs2 and mm (hofixes, pagealloc, slab-generic, slab, slub, kcsan,
debug, pagecache, gup, swap, shmem, memcg, pagemap, mremap, mincore,
sparsemem, vmalloc, kasan, pagealloc, hugetlb and vmscan).

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits)
  mm: vmscan: consistent update to pgrefill
  mm/vmscan.c: fix typo
  khugepaged: khugepaged_test_exit() check mmget_still_valid()
  khugepaged: retract_page_tables() remember to test exit
  khugepaged: collapse_pte_mapped_thp() protect the pmd lock
  khugepaged: collapse_pte_mapped_thp() flush the right range
  mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
  mm: thp: replace HTTP links with HTTPS ones
  mm/page_alloc: fix memalloc_nocma_{save/restore} APIs
  mm/page_alloc.c: skip setting nodemask when we are in interrupt
  mm/page_alloc: fallbacks at most has 3 elements
  mm/page_alloc: silence a KASAN false positive
  mm/page_alloc.c: remove unnecessary end_bitidx for [set|get]_pfnblock_flags_mask()
  mm/page_alloc.c: simplify pageblock bitmap access
  mm/page_alloc.c: extract the common part in pfn_to_bitidx()
  mm/page_alloc.c: replace the definition of NR_MIGRATETYPE_BITS with PB_migratetype_bits
  mm/shuffle: remove dynamic reconfiguration
  mm/memory_hotplug: document why shuffle_zone() is relevant
  mm/page_alloc: remove nr_free_pagecache_pages()
  mm: remove vm_total_pages
  ...
2020-08-07 11:39:33 -07:00
Waiman Long 453431a549 mm, treewide: rename kzfree() to kfree_sensitive()
As said by Linus:

  A symmetric naming is only helpful if it implies symmetries in use.
  Otherwise it's actively misleading.

  In "kzalloc()", the z is meaningful and an important part of what the
  caller wants.

  In "kzfree()", the z is actively detrimental, because maybe in the
  future we really _might_ want to use that "memfill(0xdeadbeef)" or
  something. The "zero" part of the interface isn't even _relevant_.

The main reason that kzfree() exists is to clear sensitive information
that should not be leaked to other future users of the same memory
objects.

Rename kzfree() to kfree_sensitive() to follow the example of the recently
added kvfree_sensitive() and make the intention of the API more explicit.
In addition, memzero_explicit() is used to clear the memory to make sure
that it won't get optimized away by the compiler.

The renaming is done by using the command sequence:

  git grep -w --name-only kzfree |\
  xargs sed -i 's/kzfree/kfree_sensitive/'

followed by some editing of the kfree_sensitive() kerneldoc and adding
a kzfree backward compatibility macro in slab.h.

[akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
[akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Joe Perches <joe@perches.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-08-07 11:33:22 -07:00
Linus Torvalds 86cfccb669 dlm for 5.9
This set includes a some improvements to the dlm
 networking layer: improving the ability to trace
 dlm messages for debugging, and improved handling
 of bad messages or disrupted connections.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJfLCPxAAoJEDgbc8f8gGmqz04P/2hvv/4rXo9AOgnnstvZV1Qy
 Yo01Cy807vB1c3jhIJryM2gG61GNH22RAHc2NcfjJwy04HH/1IEr6P48Po3qYEnS
 8fZ8B9msxpsujVOrRoeBuLN8elI1HftyNVWaVjH7xtD+fLCDLu9i10kv3aeS+DiB
 T6f7yQQv7hgXS3xGvlMr2//aLwGD2ZdcRbkOEGo+k7yUjQbIDH/wdZWcPLh6y4yT
 p20i2ulYKjEZFmXDMa17diONISeGO6iaDhee24XPDwNDp8qI1iPGJsmxltMmn8Qf
 d2HPF1IDh4eM8lCwmqBtjYTnJd6rAW0v3+Ek1+wzQKVeXLFiz/MEyuOldtpsqmMO
 8Og0vr6zfTCjFo8uvyj+cF7Fcj0yIPWg1yb7EauqqxreK8V9GBA1V2ZXYVd8xwea
 thrAUaq8f+PYQ9uy1FsN3xaO3BFN1VpcvHu4/3gU3OudnZZt2Ae670RYHKC0bq8D
 2tSsqaiDnlvniHgh4xvtNIvRANkDS1ZSbkUPZhMHL7DnRJn66oDIfCr7NMbZwvCa
 AS0q6suUFyXFbAEJcY6XWxe3aQ3WuxIClT84MgzX/dAK2Qcl8ryWGGSVc0dp4Vl1
 cd8MtmpnIWsnxqNRl4jn6cfolDheaxL8nouLtJ+3/dC9VkyDyfmrtnM+8aTZKHoa
 3/xrBuVkEJAwkAAr8Pb8
 =qgti
 -----END PGP SIGNATURE-----

Merge tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm

Pull dlm updates from David Teigland:
 "This set includes a some improvements to the dlm networking layer:
  improving the ability to trace dlm messages for debugging, and
  improved handling of bad messages or disrupted connections"

* tag 'dlm-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  fs: dlm: implement tcp graceful shutdown
  fs: dlm: change handling of reconnects
  fs: dlm: don't close socket on invalid message
  fs: dlm: set skb mark per peer socket
  fs: dlm: set skb mark for listen socket
  net: sock: add sock_set_mark
  dlm: Fix kobject memleak
2020-08-06 19:44:25 -07:00
Linus Torvalds 96e3f3c16b - Add support to enable/disable the thermal zones resulting on core code and
drivers cleanup (Andrzej Pietrasiewicz)
 
 - Add generic netlink support for userspace notifications: events, temperature
   and discovery commands (Daniel Lezcano)
 
 - Fix redundant initialization for a ret variable (Colin Ian King)
 
 - Remove the clock cooling code as it is used nowhere (Amit Kucheria)
 
 - Add the rcar_gen3_thermal's r8a774e1 support (Marian-Cristian Rotariu)
 
 - Replace all references to thermal.txt in the documentation to the
   corresponding yaml files (Amit Kucheria)
 
 - Add maintainer entry for the IPA (Lukasz Luba)
 
 - Add support for MSM8939 for the tsens (Shawn Guo)
 
 - Update power allocator and devfreq cooling to SPDX licensing (Lukasz Luba)
 
 - Add Cannon Lake Low Power PCH support (Sumeet Pawnikar)
 
 - Add tsensor support for V2 mediatek thermal system (Henry Yen)
 
 - Fix thermal zone lookup by ID for the core code (Thierry Reding)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAl8q7tsACgkQqDIjiipP
 6E+5Rwf7BFEn5YXPvng8cmnAlgvEBc9DdT6mGSo0NpFm9MdUxXlaqvw3WWSGyqWQ
 +z0Ka7lmn5XyiMsVN11++Snp+79X17HzZf9SXO3glyIpAn+5prTDRhzzj0/jPrtS
 sEeI++DrILsKKMGVljzftLmwNJN9DkUDNcnmWmZdCDbYVEKtP9Pjf2wBjAnXj7sX
 JA3CkHRMwYLEQbfaKz37M11cYM+LqbDOlb6U11YWgAGGJ7d7zNYRf2/YSYPM4AN6
 iE6j0E+3jIlXesULsap1AzeJaBq+wFxj1FL2TUZ8KscvRrm3AucqzNAT2M/Bc5Az
 XLKKzc6Gp9JfqB5KXhX2EDu7VRnDBg==
 =cSMN
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal updates from Daniel Lezcano:

 - Add support to enable/disable the thermal zones resulting on core
   code and drivers cleanup (Andrzej Pietrasiewicz)

 - Add generic netlink support for userspace notifications: events,
   temperature and discovery commands (Daniel Lezcano)

 - Fix redundant initialization for a ret variable (Colin Ian King)

 - Remove the clock cooling code as it is used nowhere (Amit Kucheria)

 - Add the rcar_gen3_thermal's r8a774e1 support (Marian-Cristian
   Rotariu)

 - Replace all references to thermal.txt in the documentation to the
   corresponding yaml files (Amit Kucheria)

 - Add maintainer entry for the IPA (Lukasz Luba)

 - Add support for MSM8939 for the tsens (Shawn Guo)

 - Update power allocator and devfreq cooling to SPDX licensing (Lukasz
   Luba)

 - Add Cannon Lake Low Power PCH support (Sumeet Pawnikar)

 - Add tsensor support for V2 mediatek thermal system (Henry Yen)

 - Fix thermal zone lookup by ID for the core code (Thierry Reding)

* tag 'thermal-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux: (40 commits)
  thermal: intel: intel_pch_thermal: Add Cannon Lake Low Power PCH support
  thermal: mediatek: Add tsensor support for V2 thermal system
  thermal: mediatek: Prepare to add support for other platforms
  thermal: Update power allocator and devfreq cooling to SPDX licensing
  MAINTAINERS: update entry to thermal governors file name prefixing
  thermal: core: Add thermal zone enable/disable notification
  thermal: qcom: tsens-v0_1: Add support for MSM8939
  dt-bindings: tsens: qcom: Document MSM8939 compatible
  thermal: core: Fix thermal zone lookup by ID
  thermal: int340x: processor_thermal: fix: update Jasper Lake PCI id
  thermal: imx8mm: Support module autoloading
  thermal: ti-soc-thermal: Fix reversed condition in ti_thermal_expose_sensor()
  MAINTAINERS: Add maintenance information for IPA
  thermal: rcar_gen3_thermal: Do not shadow thcode variable
  dt-bindings: thermal: Get rid of thermal.txt and replace references
  thermal: core: Move initialization after core initcall
  thermal: netlink: Improve the initcall ordering
  net: genetlink: Move initialization to core_initcall
  thermal: rcar_gen3_thermal: Add r8a774e1 support
  thermal/drivers/clock_cooling: Remove clock_cooling code
  ...
2020-08-06 18:10:55 -07:00
Yonghong Song 5e7b30205c bpf: Change uapi for bpf iterator map elements
Commit a5cbe05a66 ("bpf: Implement bpf iterator for
map elements") added bpf iterator support for
map elements. The map element bpf iterator requires
info to identify a particular map. In the above
commit, the attr->link_create.target_fd is used
to carry map_fd and an enum bpf_iter_link_info
is added to uapi to specify the target_fd actually
representing a map_fd:
    enum bpf_iter_link_info {
	BPF_ITER_LINK_UNSPEC = 0,
	BPF_ITER_LINK_MAP_FD = 1,

	MAX_BPF_ITER_LINK_INFO,
    };

This is an extensible approach as we can grow
enumerator for pid, cgroup_id, etc. and we can
unionize target_fd for pid, cgroup_id, etc.
But in the future, there are chances that
more complex customization may happen, e.g.,
for tasks, it could be filtered based on
both cgroup_id and user_id.

This patch changed the uapi to have fields
	__aligned_u64	iter_info;
	__u32		iter_info_len;
for additional iter_info for link_create.
The iter_info is defined as
	union bpf_iter_link_info {
		struct {
			__u32   map_fd;
		} map;
	};

So future extension for additional customization
will be easier. The bpf_iter_link_info will be
passed to target callback to validate and generic
bpf_iter framework does not need to deal it any
more.

Note that map_fd = 0 will be considered invalid
and -EBADF will be returned to user space.

Fixes: a5cbe05a66 ("bpf: Implement bpf iterator for map elements")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
2020-08-06 16:39:14 -07:00
Alexander Aring 84d1c61740 net: sock: add sock_set_mark
This patch adds a new socket helper function to set the mark value for a
kernel socket.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2020-08-06 10:30:50 -05:00
Linus Torvalds 47ec5303d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from David Miller:

 1) Support 6Ghz band in ath11k driver, from Rajkumar Manoharan.

 2) Support UDP segmentation in code TSO code, from Eric Dumazet.

 3) Allow flashing different flash images in cxgb4 driver, from Vishal
    Kulkarni.

 4) Add drop frames counter and flow status to tc flower offloading,
    from Po Liu.

 5) Support n-tuple filters in cxgb4, from Vishal Kulkarni.

 6) Various new indirect call avoidance, from Eric Dumazet and Brian
    Vazquez.

 7) Fix BPF verifier failures on 32-bit pointer arithmetic, from
    Yonghong Song.

 8) Support querying and setting hardware address of a port function via
    devlink, use this in mlx5, from Parav Pandit.

 9) Support hw ipsec offload on bonding slaves, from Jarod Wilson.

10) Switch qca8k driver over to phylink, from Jonathan McDowell.

11) In bpftool, show list of processes holding BPF FD references to
    maps, programs, links, and btf objects. From Andrii Nakryiko.

12) Several conversions over to generic power management, from Vaibhav
    Gupta.

13) Add support for SO_KEEPALIVE et al. to bpf_setsockopt(), from Dmitry
    Yakunin.

14) Various https url conversions, from Alexander A. Klimov.

15) Timestamping and PHC support for mscc PHY driver, from Antoine
    Tenart.

16) Support bpf iterating over tcp and udp sockets, from Yonghong Song.

17) Support 5GBASE-T i40e NICs, from Aleksandr Loktionov.

18) Add kTLS RX HW offload support to mlx5e, from Tariq Toukan.

19) Fix the ->ndo_start_xmit() return type to be netdev_tx_t in several
    drivers. From Luc Van Oostenryck.

20) XDP support for xen-netfront, from Denis Kirjanov.

21) Support receive buffer autotuning in MPTCP, from Florian Westphal.

22) Support EF100 chip in sfc driver, from Edward Cree.

23) Add XDP support to mvpp2 driver, from Matteo Croce.

24) Support MPTCP in sock_diag, from Paolo Abeni.

25) Commonize UDP tunnel offloading code by creating udp_tunnel_nic
    infrastructure, from Jakub Kicinski.

26) Several pci_ --> dma_ API conversions, from Christophe JAILLET.

27) Add FLOW_ACTION_POLICE support to mlxsw, from Ido Schimmel.

28) Add SK_LOOKUP bpf program type, from Jakub Sitnicki.

29) Refactor a lot of networking socket option handling code in order to
    avoid set_fs() calls, from Christoph Hellwig.

30) Add rfc4884 support to icmp code, from Willem de Bruijn.

31) Support TBF offload in dpaa2-eth driver, from Ioana Ciornei.

32) Support XDP_REDIRECT in qede driver, from Alexander Lobakin.

33) Support PCI relaxed ordering in mlx5 driver, from Aya Levin.

34) Support TCP syncookies in MPTCP, from Flowian Westphal.

35) Fix several tricky cases of PMTU handling wrt. briding, from Stefano
    Brivio.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2056 commits)
  net: thunderx: initialize VF's mailbox mutex before first usage
  usb: hso: remove bogus check for EINPROGRESS
  usb: hso: no complaint about kmalloc failure
  hso: fix bailout in error case of probe
  ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
  selftests/net: relax cpu affinity requirement in msg_zerocopy test
  mptcp: be careful on subflow creation
  selftests: rtnetlink: make kci_test_encap() return sub-test result
  selftests: rtnetlink: correct the final return value for the test
  net: dsa: sja1105: use detected device id instead of DT one on mismatch
  tipc: set ub->ifindex for local ipv6 address
  ipv6: add ipv6_dev_find()
  net: openvswitch: silence suspicious RCU usage warning
  Revert "vxlan: fix tos value before xmit"
  ptp: only allow phase values lower than 1 period
  farsync: switch from 'pci_' to 'dma_' API
  wan: wanxl: switch from 'pci_' to 'dma_' API
  hv_netvsc: do not use VF device if link is down
  dpaa2-eth: Fix passing zero to 'PTR_ERR' warning
  net: macb: Properly handle phylink on at91sam9x
  ...
2020-08-05 20:13:21 -07:00
Stefano Brivio 8ed54f167a ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM
On architectures defining _HAVE_ARCH_IPV6_CSUM, we get
csum_ipv6_magic() defined by means of arch checksum.h headers. On
other architectures, we actually need to include net/ip6_checksum.h
to be able to use it.

Without this include, building with defconfig breaks at least for
s390.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 4cb47a8644 ("tunnels: PMTU discovery support for directly bridged IP packets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:30:36 -07:00
Paolo Abeni adf7341064 mptcp: be careful on subflow creation
Nicolas reported the following oops:

[ 1521.392541] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[ 1521.394189] #PF: supervisor read access in kernel mode
[ 1521.395376] #PF: error_code(0x0000) - not-present page
[ 1521.396607] PGD 0 P4D 0
[ 1521.397156] Oops: 0000 [#1] SMP PTI
[ 1521.398020] CPU: 0 PID: 22986 Comm: kworker/0:2 Not tainted 5.8.0-rc4+ #109
[ 1521.399618] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 1521.401728] Workqueue: events mptcp_worker
[ 1521.402651] RIP: 0010:mptcp_subflow_create_socket+0xf1/0x1c0
[ 1521.403954] Code: 24 08 89 44 24 04 48 8b 7a 18 e8 2a 48 d4 ff 8b 44 24 04 85 c0 75 7a 48 8b 8b 78 02 00 00 48 8b 54 24 08 48 8d bb 80 00 00 00 <48> 8b 89 c0 00 00 00 48 89 8a c0 00 00 00 48 8b 8b 78 02 00 00 8b
[ 1521.408201] RSP: 0000:ffffabc4002d3c60 EFLAGS: 00010246
[ 1521.409433] RAX: 0000000000000000 RBX: ffffa0b9ad8c9a00 RCX: 0000000000000000
[ 1521.411096] RDX: ffffa0b9ae78a300 RSI: 00000000fffffe01 RDI: ffffa0b9ad8c9a80
[ 1521.412734] RBP: ffffa0b9adff2e80 R08: ffffa0b9af02d640 R09: ffffa0b9ad923a00
[ 1521.414333] R10: ffffabc4007139f8 R11: fefefefefefefeff R12: ffffabc4002d3cb0
[ 1521.415918] R13: ffffa0b9ad91fa58 R14: ffffa0b9ad8c9f9c R15: 0000000000000000
[ 1521.417592] FS:  0000000000000000(0000) GS:ffffa0b9af000000(0000) knlGS:0000000000000000
[ 1521.419490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1521.420839] CR2: 00000000000000c0 CR3: 000000002951e006 CR4: 0000000000160ef0
[ 1521.422511] Call Trace:
[ 1521.423103]  __mptcp_subflow_connect+0x94/0x1f0
[ 1521.425376]  mptcp_pm_create_subflow_or_signal_addr+0x200/0x2a0
[ 1521.426736]  mptcp_worker+0x31b/0x390
[ 1521.431324]  process_one_work+0x1fc/0x3f0
[ 1521.432268]  worker_thread+0x2d/0x3b0
[ 1521.434197]  kthread+0x117/0x130
[ 1521.435783]  ret_from_fork+0x22/0x30

on some unconventional configuration.

The MPTCP protocol is trying to create a subflow for an
unaccepted server socket. That is allowed by the RFC, even
if subflow creation will likely fail.
Unaccepted sockets have still a NULL sk_socket field,
avoid the issue by failing earlier.

Reported-and-tested-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
Fixes: 7d14b0d2b9 ("mptcp: set correct vfs info for subflows")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:24:20 -07:00
Xin Long 5a6f6f5791 tipc: set ub->ifindex for local ipv6 address
Without ub->ifindex set for ipv6 address in tipc_udp_enable(),
ipv6_sock_mc_join() may make the wrong dev join the multicast
address in enable_mcast(). This causes that tipc links would
never be created.

So fix it by getting the right netdev and setting ub->ifindex,
as it does for ipv4 address.

Reported-by: Shuang Li <shuali@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:19:52 -07:00
Xin Long 81f6cb3122 ipv6: add ipv6_dev_find()
This is to add an ip_dev_find like function for ipv6, used to find
the dev by saddr.

It will be used by TIPC protocol. So also export it.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:19:52 -07:00
Tonghao Zhang 5845589ed6 net: openvswitch: silence suspicious RCU usage warning
ovs_flow_tbl_destroy always is called from RCU callback
or error path. It is no need to check if rcu_read_lock
or lockdep_ovsl_is_held was held.

ovs_dp_cmd_fill_info always is called with ovs_mutex,
So use the rcu_dereference_ovsl instead of rcu_dereference
in ovs_flow_tbl_masks_cache_size.

Fixes: 9bf24f594c ("net: openvswitch: make masks cache size configurable")
Cc: Eelco Chaudron <echaudro@redhat.com>
Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com
Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-05 12:11:46 -07:00
Linus Torvalds 2324d50d05 It's been a busy cycle for documentation - hopefully the busiest for a
while to come.  Changes include:
 
  - Some new Chinese translations
 
  - Progress on the battle against double words words and non-HTTPS URLs
 
  - Some block-mq documentation
 
  - More RST conversions from Mauro.  At this point, that task is
    essentially complete, so we shouldn't see this kind of churn again for a
    while.  Unless we decide to switch to asciidoc or something...:)
 
  - Lots of typo fixes, warning fixes, and more.
 -----BEGIN PGP SIGNATURE-----
 
 iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
 d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
 cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
 BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
 7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
 9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
 0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
 =JQLZ
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.9' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's been a busy cycle for documentation - hopefully the busiest for a
  while to come. Changes include:

   - Some new Chinese translations

   - Progress on the battle against double words words and non-HTTPS
     URLs

   - Some block-mq documentation

   - More RST conversions from Mauro. At this point, that task is
     essentially complete, so we shouldn't see this kind of churn again
     for a while. Unless we decide to switch to asciidoc or
     something...:)

   - Lots of typo fixes, warning fixes, and more"

* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
  scripts/kernel-doc: optionally treat warnings as errors
  docs: ia64: correct typo
  mailmap: add entry for <alobakin@marvell.com>
  doc/zh_CN: add cpu-load Chinese version
  Documentation/admin-guide: tainted-kernels: fix spelling mistake
  MAINTAINERS: adjust kprobes.rst entry to new location
  devices.txt: document rfkill allocation
  PCI: correct flag name
  docs: filesystems: vfs: correct flag name
  docs: filesystems: vfs: correct sync_mode flag names
  docs: path-lookup: markup fixes for emphasis
  docs: path-lookup: more markup fixes
  docs: path-lookup: fix HTML entity mojibake
  CREDITS: Replace HTTP links with HTTPS ones
  docs: process: Add an example for creating a fixes tag
  doc/zh_CN: add Chinese translation prefer section
  doc/zh_CN: add clearing-warn-once Chinese version
  doc/zh_CN: add admin-guide index
  doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
  futex: MAINTAINERS: Re-add selftests directory
  ...
2020-08-04 22:47:54 -07:00
Olga Kornievskaia 7de62bc09f SUNRPC dont update timeout value on connection reset
Current behaviour: every time a v3 operation is re-sent to the server
we update (double) the timeout. There is no distinction between whether
or not the previous timer had expired before the re-sent happened.

Here's the scenario:
1. Client sends a v3 operation
2. Server RST-s the connection (prior to the timeout) (eg., connection
is immediately reset)
3. Client re-sends a v3 operation but the timeout is now 120sec.

As a result, an application sees 2mins pause before a retry in case
server again does not reply.

Instead, this patch proposes to keep track off when the minor timeout
should happen and if it didn't, then don't update the new timeout.
Value is updated based on the previous value to make timeouts
predictable.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-08-04 23:17:11 -04:00
Linus Torvalds 3950e97543 Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve updates from Eric Biederman:
 "During the development of v5.7 I ran into bugs and quality of
  implementation issues related to exec that could not be easily fixed
  because of the way exec is implemented. So I have been diggin into
  exec and cleaning up what I can.

  This cycle I have been looking at different ideas and different
  implementations to see what is possible to improve exec, and cleaning
  the way exec interfaces with in kernel users. Only cleaning up the
  interfaces of exec with rest of the kernel has managed to stabalize
  and make it through review in time for v5.9-rc1 resulting in 2 sets of
  changes this cycle.

   - Implement kernel_execve

   - Make the user mode driver code a better citizen

  With kernel_execve the code size got a little larger as the copying of
  parameters from userspace and copying of parameters from userspace is
  now separate. The good news is kernel threads no longer need to play
  games with set_fs to use exec. Which when combined with the rest of
  Christophs set_fs changes should security bugs with set_fs much more
  difficult"

* 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits)
  exec: Implement kernel_execve
  exec: Factor bprm_stack_limits out of prepare_arg_pages
  exec: Factor bprm_execve out of do_execve_common
  exec: Move bprm_mm_init into alloc_bprm
  exec: Move initialization of bprm->filename into alloc_bprm
  exec: Factor out alloc_bprm
  exec: Remove unnecessary spaces from binfmts.h
  umd: Stop using split_argv
  umd: Remove exit_umh
  bpfilter: Take advantage of the facilities of struct pid
  exit: Factor thread_group_exited out of pidfd_poll
  umd: Track user space drivers with struct pid
  bpfilter: Move bpfilter_umh back into init data
  exec: Remove do_execve_file
  umh: Stop calling do_execve_file
  umd: Transform fork_usermode_blob into fork_usermode_driver
  umd: Rename umd_info.cmdline umd_info.driver_name
  umd: For clarity rename umh_info umd_info
  umh: Separate the user mode driver and the user mode helper support
  umh: Remove call_usermodehelper_setup_file.
  ...
2020-08-04 14:27:25 -07:00
Linus Torvalds fd76a74d94 audit/stable-5.9 PR 20200803
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl8okpIUHHBhdWxAcGF1
 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXNqOQ/8D+m9Ykcby3csEKsp8YtsaukEu62U
 lRVaxzRNO9wwB24aFwDFuJnIkmsSi/s/O4nBsy2mw+Apn+uDCvHQ9tBU07vlNn2f
 lu27YaTya7YGlqoe315xijd8tyoX99k8cpQeixvAVr9/jdR09yka7SJ8O7X9mjV7
 +SUVDiKCplPKpiwCCRS9cqD7F64T6y35XKzbrzYqdP0UOF2XelZo/Evt5rDRvWUf
 5qDN2tP+iM/Fvu5lCfczFwAeivfAdxjQ11n783hx8Ms2qyiaKQCzbEwjqAslmkbs
 1k/+ED0NjzXX1ne0JZaz/bk0wsMnmOoa8o+NDcyd7Za/cj5prUZi7kBy+xry4YV8
 qKJ40Lk0flCWgUpm6bkYVOByIYHk0gmfBNvjilqf25NR/eOC/9e9ir8PywvYUW/7
 kvVK37+N/a3LnFj80sZpIeqqnNU8z9PV1i7//5/kDuKvz94Bq83TJDO6pPKvqDtC
 njQfCFoHwdEeF8OalK793lIiYaoODqvbkWKChKMqziODJ4ZP8AW06gXpEbEWn7G3
 TTnJx7hqzR9t90vBQJeO3Fromfn+9TDlZVdX+EGO8gIqUiLGr0r7LPPep4VkDbNw
 LxMYKeC2cgRp8Z+XXPDxfXSDL2psTwg6CXcDrXcYnUyBo/yerpBvbJkeaR0h+UR0
 j6cvMX+T39X2JXM=
 =Xs3M
 -----END PGP SIGNATURE-----

Merge tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit

Pull audit updates from Paul Moore:
 "Aside from some smaller bug fixes, here are the highlights:

   - add a new backlog wait metric to the audit status message, this is
     intended to help admins determine how long processes have been
     waiting for the audit backlog queue to clear

   - generate audit records for nftables configuration changes

   - generate CWD audit records for for the relevant LSM audit records"

* tag 'audit-pr-20200803' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
  audit: report audit wait metric in audit status reply
  audit: purge audit_log_string from the intra-kernel audit API
  audit: issue CWD record to accompany LSM_AUDIT_DATA_* records
  audit: use the proper gfp flags in the audit_log_nfcfg() calls
  audit: remove unused !CONFIG_AUDITSYSCALL __audit_inode* stubs
  audit: add gfp parameter to audit_log_nfcfg
  audit: log nftables configuration change events
  audit: Use struct_size() helper in alloc_chunk
2020-08-04 14:20:26 -07:00
Linus Torvalds 9ecc6ea491 seccomp updates for v5.9-rc1
- Improved selftest coverage, timeouts, and reporting
 - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)
 - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers
 - Introduce "addfd" command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oZcQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJomDD/4x3j7eXREcXDsHOmlgEaHWGx4l
 JldHFQhV5GjmD7gOkPcoZSG7NfG7F6VpwAJg7ZoR3qUkem7K8DFucxqgo1RldCot
 nigleeLX6JeMS0Z+iwjAVZd+5t4xG4J/7GGDHIIMiG5qvwJ0Yf64o1bkjaB2Q/Bv
 tluBg0WF32kFMG/ZwyY/V2QDbbue97CFPflybOh1o2nWbVzmUlFEEum3UUvZsxc8
 smMsattJyuAV7kcEKzKrs8b010NdFZqwdbub5Np9W3XEXGBYMdIPoNsOQGmB9wby
 j2ui0lzboXRG997jM7TCd1l/XZAv8aAwvPplw3FJRybzkOGs9NDyLMoz87yJpR1T
 xp511vnMyMbyKIGdungkt7cIyzaictHwaYzznsmuNdCPEjTaIQJr1ctsa4GEgtqf
 pnkktZ9YbMCcHU0CtZ8GlOVqA9wE+FUm0/u0zgikzJQsB+HcNItiARTTTHRyco7p
 VJCqK8o4Zx4ELV7QNkSH4nhFkVgRopvrvBiPAGro/qwGOofBg8W8wM8O1+V/MDmp
 zSU22v4SncT1Xb7dtmdJqDEeHfDikhaCAb4Je2hsGQWzbdAqwHGlpa7vpk9x3Q5r
 L+XyP+Z+rPHlXYyypJwUvvOQhXOmP0zYxcEHxByqIBfXiwy+3dN4tDDfatWbccwl
 uTlTDM8kmQn6QzSztA==
 =yb55
 -----END PGP SIGNATURE-----

Merge tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull seccomp updates from Kees Cook:
 "There are a bunch of clean ups and selftest improvements along with
  two major updates to the SECCOMP_RET_USER_NOTIF filter return:
  EPOLLHUP support to more easily detect the death of a monitored
  process, and being able to inject fds when intercepting syscalls that
  expect an fd-opening side-effect (needed by both container folks and
  Chrome). The latter continued the refactoring of __scm_install_fd()
  started by Christoph, and in the process found and fixed a handful of
  bugs in various callers.

   - Improved selftest coverage, timeouts, and reporting

   - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner)

   - Refactor __scm_install_fd() into __receive_fd() and fix buggy
     callers

   - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun
     Dhillon)"

* tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits)
  selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD
  seccomp: Introduce addfd ioctl to seccomp user notifier
  fs: Expand __receive_fd() to accept existing fd
  pidfd: Replace open-coded receive_fd()
  fs: Add receive_fd() wrapper for __receive_fd()
  fs: Move __scm_install_fd() to __receive_fd()
  net/scm: Regularize compat handling of scm_detach_fds()
  pidfd: Add missing sock updates for pidfd_getfd()
  net/compat: Add missing sock updates for SCM_RIGHTS
  selftests/seccomp: Check ENOSYS under tracing
  selftests/seccomp: Refactor to use fixture variants
  selftests/harness: Clean up kern-doc for fixtures
  seccomp: Use -1 marker for end of mode 1 syscall list
  seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID
  selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall()
  selftests/seccomp: Make kcmp() less required
  seccomp: Use pr_fmt
  selftests/seccomp: Improve calibration loop
  selftests/seccomp: use 90s as timeout
  selftests/seccomp: Expand benchmark to per-filter measurements
  ...
2020-08-04 14:11:08 -07:00
Linus Torvalds 99ea1521a0 Remove uninitialized_var() macro for v5.9-rc1
- Clean up non-trivial uses of uninitialized_var()
 - Update documentation and checkpatch for uninitialized_var() removal
 - Treewide removal of uninitialized_var()
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oYLQWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJsfjEACvf0D3WL3H7sLHtZ2HeMwOgAzq
 il08t6vUscINQwiIIK3Be43ok3uQ1Q+bj8sr2gSYTwunV2IYHFferzgzhyMMno3o
 XBIGd1E+v1E4DGBOiRXJvacBivKrfvrdZ7AWiGlVBKfg2E0fL1aQbe9AYJ6eJSbp
 UGqkBkE207dugS5SQcwrlk1tWKUL089lhDAPd7iy/5RK76OsLRCJFzIerLHF2ZK2
 BwvA+NWXVQI6pNZ0aRtEtbbxwEU4X+2J/uaXH5kJDszMwRrgBT2qoedVu5LXFPi8
 +B84IzM2lii1HAFbrFlRyL/EMueVFzieN40EOB6O8wt60Y4iCy5wOUzAdZwFuSTI
 h0xT3JI8BWtpB3W+ryas9cl9GoOHHtPA8dShuV+Y+Q2bWe1Fs6kTl2Z4m4zKq56z
 63wQCdveFOkqiCLZb8s6FhnS11wKtAX4czvXRXaUPgdVQS1Ibyba851CRHIEY+9I
 AbtogoPN8FXzLsJn7pIxHR4ADz+eZ0dQ18f2hhQpP6/co65bYizNP5H3h+t9hGHG
 k3r2k8T+jpFPaddpZMvRvIVD8O2HvJZQTyY6Vvneuv6pnQWtr2DqPFn2YooRnzoa
 dbBMtpon+vYz6OWokC5QNWLqHWqvY9TmMfcVFUXE4AFse8vh4wJ8jJCNOFVp8On+
 drhmmImUr1YylrtVOw==
 =xHmk
 -----END PGP SIGNATURE-----

Merge tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull uninitialized_var() macro removal from Kees Cook:
 "This is long overdue, and has hidden too many bugs over the years. The
  series has several "by hand" fixes, and then a trivial treewide
  replacement.

   - Clean up non-trivial uses of uninitialized_var()

   - Update documentation and checkpatch for uninitialized_var() removal

   - Treewide removal of uninitialized_var()"

* tag 'uninit-macro-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  compiler: Remove uninitialized_var() macro
  treewide: Remove uninitialized_var() usage
  checkpatch: Remove awareness of uninitialized_var() macro
  mm/debug_vm_pgtable: Remove uninitialized_var() usage
  f2fs: Eliminate usage of uninitialized_var() macro
  media: sur40: Remove uninitialized_var() usage
  KVM: PPC: Book3S PR: Remove uninitialized_var() usage
  clk: spear: Remove uninitialized_var() usage
  clk: st: Remove uninitialized_var() usage
  spi: davinci: Remove uninitialized_var() usage
  ide: Remove uninitialized_var() usage
  rtlwifi: rtl8192cu: Remove uninitialized_var() usage
  b43: Remove uninitialized_var() usage
  drbd: Remove uninitialized_var() usage
  x86/mm/numa: Remove uninitialized_var() usage
  docs: deprecated.rst: Add uninitialized_var()
2020-08-04 13:49:43 -07:00
Linus Torvalds 427714f258 tasklets API update for v5.9-rc1
- Prepare for tasklet API modernization (Romain Perier, Allen Pais, Kees Cook)
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oXpMWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJtJgEACVb88nzYwu5mC5ZcfvwSyXeQsR
 eDpCkX5HT6CsxlOn0/YJvxUtkkerQftbRuAXrzoUpQkpyBh82PviVZFKDS7NE9Lc
 6xPqloi2gbZ8EfgMraVynL+9lpLh0+qNCM7LPg4xT+JxMDLut/nWRdrp8d7uBYfQ
 AXV6CV4Tc4ijOMROV6AEVVdSTzkRCbiqUnRDBLETBfiJOdDn5MgJgxicWvN5FTpu
 PiUVF3CtWaKCRfQO/GEAXTG65hOtmql5IbX9n7uooNu/wCCnEFfVUus1uTcsrqxN
 ByrZ56NVPoO7z2jYLt8Lft3myo2e/mn88PKqrzS2p9GPn0VBv7rcO3ePmbbHL/RA
 mp+pg8wdpmKrHv4YGfsF+obT1v8f6VJoTLUt5S/WqZAzl1sVJgEJdAkjmDKythGG
 yYKKCemMceMMzLXxnFAYMzdXzdXZ3YEpiW4UkBb77EhUisDrLxCHSL5t4UzyWnuO
 Gtzw7N69iHPHLsxAk1hESAD8sdlk2EdN6vzJVelOsiW955x1hpR+msvNpwZwBqdq
 A2h8VnnrxLK2APl93T5VW9T6kvhzaTwLhoCH+oKklE+U0XJTAYZ4D/AcRVghBvMg
 bC1+1vDx+t/S+8P308evPQnEygLtL2I+zpPnBA1DZzHRAoY8inCLc5HQOfr6pi/f
 koNTtKkmSSKaFSYITw==
 =hb+e
 -----END PGP SIGNATURE-----

Merge tag 'tasklets-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull tasklets API update from Kees Cook:
 "These are the infrastructure updates needed to support converting the
  tasklet API to something more modern (and hopefully for removal
  further down the road).

  There is a 300-patch series waiting in the wings to get set out to
  subsystem maintainers, but these changes need to be present in the
  kernel first. Since this has some treewide changes, I carried this
  series for -next instead of paining Thomas with it in -tip, but it's
  got his Ack.

  This is similar to the timer_struct modernization from a while back,
  but not nearly as messy (I hope). :)

   - Prepare for tasklet API modernization (Romain Perier, Allen Pais,
     Kees Cook)"

* tag 'tasklets-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  tasklet: Introduce new initialization API
  treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()
  usb: gadget: udc: Avoid tasklet passing a global
2020-08-04 13:40:35 -07:00
David S. Miller ee895a30ef Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Flush the cleanup xtables worker to make sure destructors
   have completed, from Florian Westphal.

2) iifgroup is matching erroneously, also from Florian.

3) Add selftest for meta interface matching, from Florian Westphal.

4) Move nf_ct_offload_timeout() to header, from Roi Dayan.

5) Call nf_ct_offload_timeout() from flow_offload_add() to
   make sure garbage collection does not evict offloaded flow,
   from Roi Dayan.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04 13:32:39 -07:00
Stefano Brivio 4cb47a8644 tunnels: PMTU discovery support for directly bridged IP packets
It's currently possible to bridge Ethernet tunnels carrying IP
packets directly to external interfaces without assigning them
addresses and routes on the bridged network itself: this is the case
for UDP tunnels bridged with a standard bridge or by Open vSwitch.

PMTU discovery is currently broken with those configurations, because
the encapsulation effectively decreases the MTU of the link, and
while we are able to account for this using PMTU discovery on the
lower layer, we don't have a way to relay ICMP or ICMPv6 messages
needed by the sender, because we don't have valid routes to it.

On the other hand, as a tunnel endpoint, we can't fragment packets
as a general approach: this is for instance clearly forbidden for
VXLAN by RFC 7348, section 4.3:

   VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
   fragment encapsulated VXLAN packets due to the larger frame size.
   The destination VTEP MAY silently discard such VXLAN fragments.

The same paragraph recommends that the MTU over the physical network
accomodates for encapsulations, but this isn't a practical option for
complex topologies, especially for typical Open vSwitch use cases.

Further, it states that:

   Other techniques like Path MTU discovery (see [RFC1191] and
   [RFC1981]) MAY be used to address this requirement as well.

Now, PMTU discovery already works for routed interfaces, we get
route exceptions created by the encapsulation device as they receive
ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
we already rebuild those messages with the appropriate MTU and route
them back to the sender.

Add the missing bits for bridged cases:

- checks in skb_tunnel_check_pmtu() to understand if it's appropriate
  to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
  RFC 4443 section 2.4 for ICMPv6. This function is already called by
  UDP tunnels

- a new function generating those ICMP or ICMPv6 replies. We can't
  reuse icmp_send() and icmp6_send() as we don't see the sender as a
  valid destination. This doesn't need to be generic, as we don't
  cover any other type of ICMP errors given that we only provide an
  encapsulation function to the sender

While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
we might receive GSO buffers here, and the passed headroom already
includes the inner MAC length, so we don't have to account for it
a second time (that would imply three MAC headers on the wire, but
there are just two).

This issue became visible while bridging IPv6 packets with 4500 bytes
of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
bytes of encapsulation headroom, we would advertise MTU as 3950, and
we would reject fragmented IPv6 datagrams of 3958 bytes size on the
wire. We're exclusively dealing with network MTU here, though, so we
could get Ethernet frames up to 3964 octets in that case.

v2:
- moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
- split IPv4/IPv6 functions (David Ahern)

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04 13:01:45 -07:00
Stefano Brivio df23bb18b4 ipv4: route: Ignore output interface in FIB lookup for PMTU route
Currently, processes sending traffic to a local bridge with an
encapsulation device as a port don't get ICMP errors if they exceed
the PMTU of the encapsulated link.

David Ahern suggested this as a hack, but it actually looks like
the correct solution: when we update the PMTU for a given destination
by means of updating or creating a route exception, the encapsulation
might trigger this because of PMTU discovery happening either on the
encapsulation device itself, or its lower layer. This happens on
bridged encapsulations only.

The output interface shouldn't matter, because we already have a
valid destination. Drop the output interface restriction from the
associated route lookup.

For UDP tunnels, we will now have a route exception created for the
encapsulation itself, with a MTU value reflecting its headroom, which
allows a bridge forwarding IP packets originated locally to deliver
errors back to the sending socket.

The behaviour is now consistent with IPv6 and verified with selftests
pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in
this series.

v2:
- reset output interface only for bridge ports (David Ahern)
- add and use netif_is_any_bridge_port() helper (David Ahern)

Suggested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-04 13:01:45 -07:00
David S. Miller 2e7199bd77 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-08-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

The main changes are:

1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
   syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.

2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
   in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
   unwinder errors, from Song Liu.

4) Allow cgroup local storage map to be shared between programs on the same
   cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
   load instructions, from Jean-Philippe Brucker.

6) Follow-up fixes on BPF socket lookup in combination with reuseport group
   handling. Also add related BPF selftests, from Jakub Sitnicki.

7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
   socket create/release as well as bind functions, from Stanislav Fomichev.

8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
   xdp_statistics, from Peilin Ye.

9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
    fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.

11) Fix a bpftool segfault due to missing program type name and make it more robust
    to prevent them in future gaps, from Quentin Monnet.

12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
    resolver issue, from John Fastabend.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:27:40 -07:00
David S. Miller 76769c38b4 mlx5-updates-2020-08-03
This patchset introduces some updates to mlx5 driver.
 
 1) Jakub converts mlx5 to use the new udp tunnel infrastructure.
    Starting with a hack to allow drivers to request a static configuration
    of the default vxlan port, and then a patch that converts mlx5.
 
 2) Parav implements change_carrier ndo for VF eswitch representors,
    to speedup link state control of representors netdevices.
 
 3) Alex Vesker, makes a simple update to software steering to fix an issue
    with push vlan action sequence
 
 4) Leon removes a redundant dump stack on error flow.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl8oRdgACgkQSD+KveBX
 +j4/LQgAkSjNzOaS7bVDzhoYL3aBQOMIzgocJUeVi7xXH8IO1uy55mNDrKBqjxbW
 dy9U9VsvV5i2V2qkkQLvHVkoDSg8Buo2Uxu4OrZHOLN0KfbFrra4VvmB1CzEBix8
 FICnQaZZcE7529P04TgZ8Mo9vRb5VdJFhqED5Nvegy+y8FolEsQYbjIoDBE6wa0j
 Meqa/29+XCE5FzTOjbbQWizAnRZMbkxtSSreDNgeHxke9eMSO+fmwKScng63QUfl
 7nfU6dW6A0d1kHhpL5RqAFOcmkpSdqYaA3SA+/8pPT9X3yOAkxE6KTKGIixpB9JX
 zQt+Wkna49jJ/JfDQB5vgww5c0HjAQ==
 =j0fG
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-08-03

This patchset introduces some updates to mlx5 driver.

1) Jakub converts mlx5 to use the new udp tunnel infrastructure.
   Starting with a hack to allow drivers to request a static configuration
   of the default vxlan port, and then a patch that converts mlx5.

2) Parav implements change_carrier ndo for VF eswitch representors,
   to speedup link state control of representors netdevices.

3) Alex Vesker, makes a simple update to software steering to fix an issue
   with push vlan action sequence

4) Leon removes a redundant dump stack on error flow.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:24:30 -07:00
Paolo Abeni 8555c6bfd5 mptcp: fix bogus sendmsg() return code under pressure
In case of memory pressure, mptcp_sendmsg() may call
sk_stream_wait_memory() after succesfully xmitting some
bytes. If the latter fails we currently return to the
user-space the error code, ignoring the succeful xmit.

Address the issue always checking for the xmitted bytes
before mptcp_sendmsg() completes.

Fixes: f296234c98 ("mptcp: Add handling of incoming MP_JOIN requests")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:14:20 -07:00
Ido Schimmel c88e11e047 devlink: Pass extack when setting trap's action and group's parameters
A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.

Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:06:46 -07:00
Amit Cohen 08e335f6ad devlink: Add early_drop trap
Add the packet trap that can report packets that were ECN marked due to RED
AQM.

Signed-off-by: Amit Cohen <amitc@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:06:46 -07:00
YueHaibing 80fbbb1672 fib: Fix undef compile warning
net/core/fib_rules.c:26:7: warning: "CONFIG_IP_MULTIPLE_TABLES" is not defined, evaluates to 0 [-Wundef]
 #elif CONFIG_IP_MULTIPLE_TABLES
       ^~~~~~~~~~~~~~~~~~~~~~~~~

Fixes: 8b66a6fd34 ("fib: fix another fib_rules_ops indirect call wrapper problem")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-By: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:01:49 -07:00
Geliang Tang 190f8b060e mptcp: use mptcp_for_each_subflow in mptcp_stream_accept
Use mptcp_for_each_subflow in mptcp_stream_accept instead of
open-coding.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:01:14 -07:00
David S. Miller ee494f42a3 A few more changes, notably:
* handle new SAE (WPA3 authentication) status codes in the correct way
  * fix a while that should be an if instead, avoiding infinite loops
  * handle beacon filtering changing better
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAl8n/fgACgkQB8qZga/f
 l8TiOA/+PwGoXiP7WPulI+V8Bzv/Tmg6rqKEUFX1hxaGQOAlr9hqV2jvdxkOZmIM
 sQc19vT4TUMiwrdYHOKcKlY5jhi5r8aIzxSA98mGqqKsvERhJWuzoaPR5aHuqOUD
 VxUidOieV+H4Ol1YxV1Tx4eCmLR8ZPjg9EebMGNPOrASg7JXCjVXb5Pp8XXmsuDn
 uXdPc9QQ+DMLqCEF+1C+jhH2lDUfyk78AgPuXw7E3CP/copiKljKL7AApoFebpkM
 Cu0g6hYLBDAnkuZgL7PBw/MQThN5oMLDeX8RNK0uXDy/kdnXd7RL0IMcQpmwTn81
 ZeewZjYq/Ha64sscR87kbzzFM15Vo/UhyhDfqQki/2zrQ26GoQdtgbj/OB3CMMd/
 8kBtZPoItlIkpO18kN9gBO/nB0cGghtsLn8u8pKdQ5RiIv1Q4JqerN4rHCyMgLrk
 FeOH25feAtuJ5MlOkIuHDYTfxNZnw1DE6DQv0hZuui1dbMPHUZuMFrCgZscuVBfj
 HOzhcyvs2bdCU4ZKLGiiq1zTxPply9rESmk4tvlDo5strqCKUEez8VgtZIPw1LZp
 pD43UHAfRiBEAF1pO/l1E6LyTvHMHYFKMiqzwCEXZXf2L276hlQ1rYoHeWEP0hhO
 N03gwHiMs1xug6MeYhsdeyR0tqXb1N7XwnkpSwgtVOH2z+pCBx4=
 =9iOZ
 -----END PGP SIGNATURE-----

Merge tag 'mac80211-next-for-davem-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next

Johannes Berg says:

====================
A few more changes, notably:
 * handle new SAE (WPA3 authentication) status codes in the correct way
 * fix a while that should be an if instead, avoiding infinite loops
 * handle beacon filtering changing better
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:00:22 -07:00
Ioana-Ruxandra Stăncioi 88fab21c69 seg6_iptunnel: Refactor seg6_lwt_headroom out of uapi header
Refactor the function seg6_lwt_headroom out of the seg6_iptunnel.h uapi
header, because it is only used in seg6_iptunnel.c. Moreover, it is only
used in the kernel code, as indicated by the "#ifdef __KERNEL__".

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Ioana-Ruxandra Stăncioi <stancioi@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 17:57:40 -07:00
Jianfeng Wang 730e700e2c tcp: apply a floor of 1 for RTT samples from TCP timestamps
For retransmitted packets, TCP needs to resort to using TCP timestamps
for computing RTT samples. In the common case where the data and ACK
fall in the same 1-millisecond interval, TCP senders with millisecond-
granularity TCP timestamps compute a ca_rtt_us of 0. This ca_rtt_us
of 0 propagates to rs->rtt_us.

This value of 0 can cause performance problems for congestion control
modules. For example, in BBR, the zero min_rtt sample can bring the
min_rtt and BDP estimate down to 0, reduce snd_cwnd and result in a
low throughput. It would be hard to mitigate this with filtering in
the congestion control module, because the proper floor to apply would
depend on the method of RTT sampling (using timestamp options or
internally-saved transmission timestamps).

This fix applies a floor of 1 for the RTT sample delta from TCP
timestamps, so that seq_rtt_us, ca_rtt_us, and rs->rtt_us will be at
least 1 * (USEC_PER_SEC / TCP_TS_HZ).

Note that the receiver RTT computation in tcp_rcv_rtt_measure() and
min_rtt computation in tcp_update_rtt_min() both already apply a floor
of 1 timestamp tick, so this commit makes the code more consistent in
avoiding this edge case of a value of 0.

Signed-off-by: Jianfeng Wang <jfwang@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kevin Yang <yyd@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 17:54:03 -07:00
Huang Guobin fbc97de84e tipc: Use is_broadcast_ether_addr() instead of memcmp()
Using is_broadcast_ether_addr() instead of directly use
memcmp() to determine if the ethernet address is broadcast
address.

spatch with a semantic match is used to found this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Huang Guobin <huangguobin4@huawei.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 16:21:46 -07:00
David S. Miller f2e0b29a9a Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

1) UAF in chain binding support from previous batch, from Dan Carpenter.

2) Queue up delayed work to expire connections with no destination,
   from Andrew Sy Kim.

3) Use fallthrough pseudo-keyword, from Gustavo A. R. Silva.

4) Replace HTTP links with HTTPS, from Alexander A. Klimov.

5) Remove superfluous null header checks in ip6tables, from
   Gaurav Singh.

6) Add extended netlink error reporting for expression.

7) Report EEXIST on overlapping chain, set elements and flowtable
   devices.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 16:03:18 -07:00
Vincent Duvert d0f6ba2ef2 appletalk: Fix atalk_proc_init() return path
Add a missing return statement to atalk_proc_init so it doesn't return
-ENOMEM when successful.  This allows the appletalk module to load
properly.

Fixes: e2bcd8b0ce ("appletalk: use remove_proc_subtree to simplify procfs code")
Link: https://www.downtowndougbrown.com/2020/08/hacking-up-a-fix-for-the-broken-appletalk-kernel-module-in-linux-5-1-and-newer/
Reported-by: Christopher KOBAYASHI <chris@disavowed.jp>
Reported-by: Doug Brown <doug@downtowndougbrown.com>
Signed-off-by: Vincent Duvert <vincent.ldev@duvert.net>
[lukas: add missing tags]
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v5.1+
Cc: Yue Haibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:48:32 -07:00
Miaohe Lin 2f631133c4 net: Pass NULL to skb_network_protocol() when we don't care about vlan depth
When we don't care about vlan depth, we could pass NULL instead of the
address of a unused local variable to skb_network_protocol() as a param.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:38:31 -07:00
Miaohe Lin c15fc199b3 net: Use __skb_pagelen() directly in skb_cow_data()
In fact, skb_pagelen() - skb_headlen() is equal to __skb_pagelen(), use it
directly to avoid unnecessary skb_headlen() call.

Also fix the CHECK note of checkpatch.pl:
    Comparison to NULL could be written "!__pskb_pull_tail"

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:38:31 -07:00
Lorenzo Bianconi 622e32b7d4 net: gre: recompute gre csum for sctp over gre tunnels
The GRE tunnel can be used to transport traffic that does not rely on a
Internet checksum (e.g. SCTP). The issue can be triggered creating a GRE
or GRETAP tunnel and transmitting SCTP traffic ontop of it where CRC
offload has been disabled. In order to fix the issue we need to
recompute the GRE csum in gre_gso_segment() not relying on the inner
checksum.
The issue is still present when we have the CRC offload enabled.
In this case we need to disable the CRC offload if we require GRE
checksum since otherwise skb_checksum() will report a wrong value.

Fixes: 90017accff ("sctp: Add GSO support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:29:44 -07:00
Nikolay Aleksandrov fd65e5a95d net: bridge: clear bridge's private skb space on xmit
We need to clear all of the bridge private skb variables as they can be
stale due to the packet being recirculated through the stack and then
transmitted through the bridge device. Similar memset is already done on
bridge's input. We've seen cases where proxyarp_replied was 1 on routed
multicast packets transmitted through the bridge to ports with neigh
suppress which were getting dropped. Same thing can in theory happen with
the port isolation bit as well.

Fixes: 821f1b21ca ("bridge: add new BR_NEIGH_SUPPRESS port flag to suppress arp and nd flood")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:26:46 -07:00
Florent Fourcot ae79dbf609 ipv6/addrconf: use a boolean to choose between UNREGISTER/DOWN
"how" was used as a boolean. Change the type to bool, and improve
variable name

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:24:14 -07:00
Florent Fourcot d208a42a62 ipv6/addrconf: call addrconf_ifdown with consistent values
Second parameter of addrconf_ifdown "how" is used as a boolean
internally. It does not make sense to call it with something different
of 0 or 1.

This value is set to 2 in all git history.

Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:24:14 -07:00
Eelco Chaudron 9bf24f594c net: openvswitch: make masks cache size configurable
This patch makes the masks cache size configurable, or with
a size of 0, disable it.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:17:48 -07:00
Eelco Chaudron 9d2f627b7e net: openvswitch: add masks cache hit counter
Add a counter that counts the number of masks cache hits, and
export it through the megaflow netlink statistics.

Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:17:48 -07:00
Gaurav Singh 71fed0bc86 ethtool: ethnl_set_linkmodes: remove redundant null check
info cannot be NULL here since its being accessed earlier
in the function: nlmsg_parse(info->nlhdr...). Remove this
redundant NULL check.

Signed-off-by: Gaurav Singh <gaurav1086@gmail.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:11:49 -07:00
Peilin Ye 9aba6c5b49 openvswitch: Prevent kernel-infoleak in ovs_ct_put_key()
ovs_ct_put_key() is potentially copying uninitialized kernel stack memory
into socket buffers, since the compiler may leave a 3-byte hole at the end
of `struct ovs_key_ct_tuple_ipv4` and `struct ovs_key_ct_tuple_ipv6`. Fix
it by initializing `orig` with memset().

Fixes: 9dd7f8907c ("openvswitch: Add original direction conntrack tuple to sw_flow_key.")
Suggested-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peilin Ye <yepeilin.cs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:09:44 -07:00
wenxu 038ebb1a71 net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct
When openvswitch conntrack offload with act_ct action. Fragment packets
defrag in the ingress tc act_ct action and miss the next chain. Then the
packet pass to the openvswitch datapath without the mru. The over
mtu packet will be dropped in output action in openvswitch for over mtu.

"kernel: net2: dropped over-mtu packet: 1528 > 1500"

This patch add mru in the tc_skb_ext for adefrag and miss next chain
situation. And also add mru in the qdisc_skb_cb. The act_ct set the mru
to the qdisc_skb_cb when the packet defrag. And When the chain miss,
The mru is set to tc_skb_ext which can be got by ovs datapath.

Fixes: b57dc7c13e ("net/sched: Introduce action ct")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 15:04:48 -07:00
Linus Torvalds e4cbce4d13 The main changes in this cycle were:
- Improve uclamp performance by using a static key for the fast path
 
  - Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for
    better power efficiency of RT tasks on battery powered devices.
    (The default is to maximize performance & reduce RT latencies.)
 
  - Improve utime and stime tracking accuracy, which had a fixed boundary
    of error, which created larger and larger relative errors as the values
    become larger. This is now replaced with more precise arithmetics,
    using the new mul_u64_u64_div_u64() helper in math64.h.
 
  - Improve the deadline scheduler, such as making it capacity aware
 
  - Improve frequency-invariant scheduling
 
  - Misc cleanups in energy/power aware scheduling
 
  - Add sched_update_nr_running tracepoint to track changes to nr_running
 
  - Documentation additions and updates
 
  - Misc cleanups and smaller fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oJDURHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ixLg//bqWzFlfWirvngTgDxDnplwUTyKXmMCcq
 R1IYhlyK2O5FxvhbRmdmW11W3yzyTPvgCs6Q/70negGaPNe2w1OxfxiK9NMKz5eu
 M1LoXas7pL5g7Pr/ZxxHk/8VqJLV4t9MkodiiInmV6lTaznT3sU6a/kpYQjJyFnG
 Tuu9jd6JhdRKmePDJnNmUBoGQ7JiOQDcX4HtkcQ3OA+An3624tmJzbW1yts+uj7J
 ZWo2EY60RfbA9MxQXGPOaR/nAjngWs4Q6tddAh10mftsPq1gR2iFUKju1d31MQt/
 RHLdiqJf+AyUC4popKG7a+7ilCKMBwPociSreTJNPyEUQ1X4AM3vUVk4yjUoiDph
 k2WdsCF8/JRdhXg0NnrpPUqOaAbQj53EeXnitEb92E7WyTZgLOvAtpV//xZo6utp
 2QHerfrQ9SoGQjz/ho78za5vQtV1x25yDhd+X4XV4QEhIy85G9/2JCpC/Kc/TXLf
 OO7A4X69XztKTEJhP60g8ldCPUe4N2vbh1vKY6oAD8AFQVVNZ6n7375/Qa//b0/k
 ++hcYkPc2EK97/aBFdvzDgqb7aUo7Mtn2ibke16sQU4szulaoRuAHQG4jdGKMwbD
 dk2VBoxyxeYFXWHsNneSe87+ha3sd0dSN0ul1EB/SlFrVELMvy634YXnMYGW8ima
 PzyPB0ezpuA=
 =PbO7
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Improve uclamp performance by using a static key for the fast path

 - Add the "sched_util_clamp_min_rt_default" sysctl, to optimize for
   better power efficiency of RT tasks on battery powered devices.
   (The default is to maximize performance & reduce RT latencies.)

 - Improve utime and stime tracking accuracy, which had a fixed boundary
   of error, which created larger and larger relative errors as the
   values become larger. This is now replaced with more precise
   arithmetics, using the new mul_u64_u64_div_u64() helper in math64.h.

 - Improve the deadline scheduler, such as making it capacity aware

 - Improve frequency-invariant scheduling

 - Misc cleanups in energy/power aware scheduling

 - Add sched_update_nr_running tracepoint to track changes to nr_running

 - Documentation additions and updates

 - Misc cleanups and smaller fixes

* tag 'sched-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched/doc: Factorize bits between sched-energy.rst & sched-capacity.rst
  sched/doc: Document capacity aware scheduling
  sched: Document arch_scale_*_capacity()
  arm, arm64: Fix selection of CONFIG_SCHED_THERMAL_PRESSURE
  Documentation/sysctl: Document uclamp sysctl knobs
  sched/uclamp: Add a new sysctl to control RT default boost value
  sched/uclamp: Fix a deadlock when enabling uclamp static key
  sched: Remove duplicated tick_nohz_full_enabled() check
  sched: Fix a typo in a comment
  sched/uclamp: Remove unnecessary mutex_init()
  arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE
  sched: Cleanup SCHED_THERMAL_PRESSURE kconfig entry
  arch_topology, sched/core: Cleanup thermal pressure definition
  trace/events/sched.h: fix duplicated word
  linux/sched/mm.h: drop duplicated words in comments
  smp: Fix a potential usage of stale nr_cpus
  sched/fair: update_pick_idlest() Select group with lowest group_util when idle_cpus are equal
  sched: nohz: stop passing around unused "ticks" parameter.
  sched: Better document ttwu()
  sched: Add a tracepoint to track rq->nr_running
  ...
2020-08-03 14:58:38 -07:00
Dmitry Yakunin 21594c4408 bpf: Allow to specify ifindex for skb in bpf_prog_test_run_skb
Now skb->dev is unconditionally set to the loopback device in current net
namespace. But if we want to test bpf program which contains code branch
based on ifindex condition (eg filters out localhost packets) it is useful
to allow specifying of ifindex from userspace. This patch adds such option
through ctx_in (__sk_buff) parameter.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200803090545.82046-3-zeil@yandex-team.ru
2020-08-03 23:32:23 +02:00
Dmitry Yakunin fa5cb548ce bpf: Setup socket family and addresses in bpf_prog_test_run_skb
Now it's impossible to test all branches of cgroup_skb bpf program which
accesses skb->family and skb->{local,remote}_ip{4,6} fields because they
are zeroed during socket allocation. This commit fills socket family and
addresses from related fields in constructed skb.

Signed-off-by: Dmitry Yakunin <zeil@yandex-team.ru>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200803090545.82046-2-zeil@yandex-team.ru
2020-08-03 23:31:59 +02:00
Linus Torvalds 8f0cb6660a These are the latest RCU bits for v5.9:
- kfree_rcu updates
   - RCU tasks updates
   - Read-side scalability tests
   - SRCU updates
   - Torture-test updates
   - Documentation updates
   - Miscellaneous fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8n80ERHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1gauA/+NtuExW9V9cPDZ8AAp6x6QfoEIgqN4VEk
 pYuyP0+ZbmwH+h8z7qPqMrwxUHQnhef7gqtlWa7wj9MawbEbmqnA/3uivjX/3Aao
 bGMMXkqXppc6hgwktgLNk8vfq3LRVEH2P0i0I+Tymgxu3DCHSGRep4LWfdAS/q3z
 4pe5JXqdMx+Qnfy/bsVxJTaJAncMq1LQNAtWY1TIwK8L8RmpXrj5dvuLKUr7q+zl
 P+BfXyrdX+x05TpmHHnI/bR3w9yASL32E0S3IaQYRRqH8TsUIGHWe13Ib6hKXXG5
 j7W5KrsOgr0fQBxi+JW2fgGQkrua4o7yk4H2Ygj+Fi5RvP2uqNZdvXFAlP2cUMu/
 7Pg8+7kC6jKIrwpD03s9ZZzm0QN3jsCxFs2PEkkHMzjXbe1CI4tIkTH6ex1uvjR2
 v3OhCIp6ypxpEIJbFQucia0iQ4NF+evKjqCvRkbepqQ096jg+CNFh0VG0Tp8XR+y
 Gk9B9oXvLLPMd6ah5CI9nLJKiMWVRV8mvvqspoblGo//+39ksh4mzxm865tFXYg4
 C+DPJvKlY15Ib5eJ/xr8EZ/oS0K2sUF9sMYnK4P8QMhyTBMbpAZiljHYK+Wujt8I
 g/JCWxrEMv3LHPY9/guB5Nod/Qb4Jqqm9iE9qEX3MQxtt2O2nmmWd91pzFcUXlFU
 RDBWYJ63Okg=
 =rNhf
 -----END PGP SIGNATURE-----

Merge tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull RCU updates from Ingo Molnar:

 - kfree_rcu updates

 - RCU tasks updates

 - Read-side scalability tests

 - SRCU updates

 - Torture-test updates

 - Documentation updates

 - Miscellaneous fixes

* tag 'core-rcu-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (109 commits)
  torture: Remove obsolete "cd $KVM"
  torture: Avoid duplicate specification of qemu command
  torture: Dump ftrace at shutdown only if requested
  torture: Add kvm-tranform.sh script for qemu-cmd files
  torture: Add more tracing crib notes to kvm.sh
  torture: Improve diagnostic for KCSAN-incapable compilers
  torture: Correctly summarize build-only runs
  torture: Pass --kmake-arg to all make invocations
  rcutorture: Check for unwatched readers
  torture: Abstract out console-log error detection
  torture: Add a stop-run capability
  torture: Create qemu-cmd in --buildonly runs
  rcu/rcutorture: Replace 0 with false
  torture: Add --allcpus argument to the kvm.sh script
  torture: Remove whitespace from identify_qemu_vcpus output
  rcutorture: NULL rcu_torture_current earlier in cleanup code
  rcutorture: Handle non-statistic bang-string error messages
  torture: Set configfile variable to current scenario
  rcutorture: Add races with task-exit processing
  locktorture: Use true and false to assign to bool variables
  ...
2020-08-03 14:31:33 -07:00
Linus Torvalds ab5c60b79a Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add support for allocating transforms on a specific NUMA Node
   - Introduce the flag CRYPTO_ALG_ALLOCATES_MEMORY for storage users

  Algorithms:
   - Drop PMULL based ghash on arm64
   - Fixes for building with clang on x86
   - Add sha256 helper that does the digest in one go
   - Add SP800-56A rev 3 validation checks to dh

  Drivers:
   - Permit users to specify NUMA node in hisilicon/zip
   - Add support for i.MX6 in imx-rngc
   - Add sa2ul crypto driver
   - Add BA431 hwrng driver
   - Add Ingenic JZ4780 and X1000 hwrng driver
   - Spread IRQ affinity in inside-secure and marvell/cesa"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (157 commits)
  crypto: sa2ul - Fix inconsistent IS_ERR and PTR_ERR
  hwrng: core - remove redundant initialization of variable ret
  crypto: x86/curve25519 - Remove unused carry variables
  crypto: ingenic - Add hardware RNG for Ingenic JZ4780 and X1000
  dt-bindings: RNG: Add Ingenic RNG bindings.
  crypto: caam/qi2 - add module alias
  crypto: caam - add more RNG hw error codes
  crypto: caam/jr - remove incorrect reference to caam_jr_register()
  crypto: caam - silence .setkey in case of bad key length
  crypto: caam/qi2 - create ahash shared descriptors only once
  crypto: caam/qi2 - fix error reporting for caam_hash_alloc
  crypto: caam - remove deadcode on 32-bit platforms
  crypto: ccp - use generic power management
  crypto: xts - Replace memcpy() invocation with simple assignment
  crypto: marvell/cesa - irq balance
  crypto: inside-secure - irq balance
  crypto: ecc - SP800-56A rev 3 local public key validation
  crypto: dh - SP800-56A rev 3 local public key validation
  crypto: dh - check validity of Z before export
  lib/mpi: Add mpi_sub_ui()
  ...
2020-08-03 10:40:14 -07:00
Jakub Kicinski 966e505976 udp_tunnel: add the ability to hard-code IANA VXLAN
mlx5 has the IANA VXLAN port (4789) hard coded by the device,
instead of being added dynamically when tunnels are created.

To support this add a workaround flag to struct udp_tunnel_nic_info.
Skipping updates for the port is fairly trivial, dumping the hard
coded port via ethtool requires some code duplication. The port
is not a part of any real table, we dump it in a special table
which has no tunnel types supported and only one entry.

This is the last known workaround / hack needed to convert
all drivers to the new infra.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-08-03 10:13:54 -07:00
Loic Poulain 0b91111fb1 mac80211: Do not report beacon loss if beacon filtering enabled
mac80211.h says: Beacon filter support is advertised with the
IEEE80211_VIF_BEACON_FILTER interface capability. The driver needs to
enable beacon filter support whenever power save is enabled, that is
IEEE80211_CONF_PS is set. When power save is enabled, the stack will
not check for beacon loss and the driver needs to notify about loss
of beacons with ieee80211_beacon_loss().

Some controllers may want to dynamically enable the beacon filter
capabilities on power save entry (CONF_PS) and disable it on exit.
This is the case for the wcn36xx driver which only supports beacon
filtering in PS mode (no CONNECTION_MONITOR support).

When the mac80211 beacon monitor timer expires, the beacon filter
flag must be checked again in case it as been changed in between
(e.g. vif moved to PS mode).

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/1592471863-31402-1-git-send-email-loic.poulain@linaro.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 13:02:06 +02:00
Roi Dayan 4203b19c27 netfilter: flowtable: Set offload timeout when adding flow
On heavily loaded systems the GC can take time to go over all existing
conns and reset their timeout. At that time other calls like from
nf_conntrack_in() can call of nf_ct_is_expired() and see the conn as
expired. To fix this when we set the offload bit we should also reset
the timeout instead of counting on GC to finish first iteration over
all conns before the initial timeout.

Fixes: 90964016e5 ("netfilter: nf_conntrack: add IPS_OFFLOAD status bit")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-03 12:37:24 +02:00
Roi Dayan 73f9407b3e netfilter: conntrack: Move nf_ct_offload_timeout to header file
To be used by callers from other modules.

[ Rename DAY to NF_CT_DAY to avoid possible symbol name pollution
  issue --Pablo ]

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-03 12:36:47 +02:00
Alexander A. Klimov 94f17c00d6 libceph: replace HTTP links with HTTPS ones
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
	  If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

[ idryomov: Do the same for the CRUSH paper and replace
  ceph.newdream.net with ceph.io. ]

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:26 +02:00
Jeff Layton 042f649810 libceph: just have osd_req_op_init() return a pointer
The caller can just ignore the return. No need for this wrapper that
just casts the other function to void.

[ idryomov: argument alignment ]

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:05:25 +02:00
Johannes Berg 5981fe5b05 mac80211: fix misplaced while instead of if
This never was intended to be a 'while' loop, it should've
just been an 'if' instead of 'while'. Fix this.

I noticed this while applying another patch from Ben that
intended to fix a busy loop at this spot.

Cc: stable@vger.kernel.org
Fixes: b16798f5b9 ("mac80211: mark station unauthorized before key removal")
Reported-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20200803110209.253009ae41ff.I3522aad099392b31d5cf2dcca34cbac7e5832dde@changeid
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 11:03:14 +02:00
Ilya Dryomov 6e6f0f0116 libceph: dump class and method names on method calls
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2020-08-03 11:03:01 +02:00
Ilya Dryomov 5133ba8f15 libceph: use target_copy() in send_linger()
Instead of copying just oloc, oid and flags, copy the entire
linger target.  This is more for consistency than anything else,
as send_linger() -> submit_request() -> __submit_request() sends
the request regardless of what calc_target() says (i.e. both on
CALC_TARGET_NO_ACTION and CALC_TARGET_NEED_RESEND).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2020-08-03 11:03:01 +02:00
Miaohe Lin 3b1648f109 nl80211: use eth_zero_addr() to clear mac address
Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/1596273349-24333-1-git-send-email-linmiaohe@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:56:22 +02:00
Miaohe Lin 47d76e3190 mac80211: use eth_zero_addr() to clear mac address
Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Link: https://lore.kernel.org/r/1596273158-24183-1-git-send-email-linmiaohe@huawei.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:56:04 +02:00
John Crispin 6628d00116 mac8211: fix struct initialisation
Sparse showed up with the following error.
net/mac80211/agg-rx.c:480:43: warning: Using plain integer as NULL pointer

Fixes: 2ab4587675 (mac80211: add support for the ADDBA extension element)
Signed-off-by: John Crispin <john@phrozen.org>
Link: https://lore.kernel.org/r/20200803084540.179908-1-john@phrozen.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:55:17 +02:00
Jouni Malinen 4e56cde15f mac80211: Handle special status codes in SAE commit
SAE authentication has been extended with H2E (IEEE 802.11 REVmd) and PK
(WFA) options. Those extensions use special status code values in the
SAE commit messages (Authentication frame with transaction sequence
number 1) to identify which extension is in use. mac80211 was
interpreting those new values as the AP denying authentication and that
resulted in failure to complete SAE authentication in some cases.

Fix this by adding exceptions for the new status code values 126 and
127.

Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
Link: https://lore.kernel.org/r/20200731183830.18735-1-jouni@codeaurora.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2020-08-03 10:54:54 +02:00
Pablo Neira Ayuso 77a92189ec netfilter: nf_tables: report EEXIST on overlaps
Replace EBUSY by EEXIST in the following cases:

- If the user adds a chain with a different configuration such as different
  type, hook and priority.

- If the user adds a non-base chain that clashes with an existing basechain.

- If the user adds a { key : value } mapping element and the key exists
  but the value differs.

- If the device already belongs to an existing flowtable.

User describe that this error reporting is confusing:

- https://bugzilla.netfilter.org/show_bug.cgi?id=1176
- https://bugzilla.netfilter.org/show_bug.cgi?id=1413

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2020-08-02 19:53:45 +02:00
David S. Miller bd0b33b248 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Resolved kernel/bpf/btf.c using instructions from merge commit
69138b34a7

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-02 01:02:12 -07:00