Commit Graph

56202 Commits

Author SHA1 Message Date
Eric Dumazet 858f501744 tcp: do not recycle cloned skbs
It is illegal to change arbitrary fields in skb_shared_info if the
skb is cloned.

Before calling skb_zcopy_clear() we need to ensure this rule,
therefore we need to move the test from sk_stream_alloc_skb()
to sk_wmem_free_skb()

Fixes: 4f661542a4 ("tcp: fix zerocopy and notsent_lowat issues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-15 09:22:41 -07:00
Sinan Kaya 432d82200f net: replace CONFIG_DEBUG_KERNEL with CONFIG_DEBUG_MISC
CONFIG_DEBUG_KERNEL should not impact code generation.  Use the newly
defined CONFIG_DEBUG_MISC instead to keep the current code.

Link: http://lkml.kernel.org/r/20190413224438.10802-6-okaya@kernel.org
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc:  Chris Zankel <chris@zankel.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 19:52:50 -07:00
Sabrina Dubroca feadc4b6cf rtnetlink: always put IFLA_LINK for links with a link-netnsid
Currently, nla_put_iflink() doesn't put the IFLA_LINK attribute when
iflink == ifindex.

In some cases, a device can be created in a different netns with the
same ifindex as its parent. That device will not dump its IFLA_LINK
attribute, which can confuse some userspace software that expects it.
For example, if the last ifindex created in init_net and foo are both
8, these commands will trigger the issue:

    ip link add parent type dummy                   # ifindex 9
    ip link add link parent netns foo type macvlan  # ifindex 9 in ns foo

So, in case a device puts the IFLA_LINK_NETNSID attribute in a dump,
always put the IFLA_LINK attribute as well.

Thanks to Dan Winship for analyzing the original OpenShift bug down to
the missing netlink attribute.

v2: change Fixes tag, it's been here forever, as Nicolas Dichtel said
    add Nicolas' ack
v3: change Fixes tag
    fix subject typo, spotted by Edward Cree

Analyzed-by: Dan Winship <danw@redhat.com>
Fixes: d8a5ec6727 ("[NET]: netlink support for moving devices between network namespaces.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14 15:40:01 -07:00
Yuchung Cheng cd736d8b67 tcp: fix retrans timestamp on passive Fast Open
Commit c7d13c8faa ("tcp: properly track retry time on
passive Fast Open") sets the start of SYNACK retransmission
time on passive Fast Open in "retrans_stamp". However the
timestamp is not reset upon the handshake has completed. As a
result, future data packet retransmission may not update it in
tcp_retransmit_skb(). This may lead to socket aborting earlier
unexpectedly by retransmits_timed_out() since retrans_stamp remains
the SYNACK rtx time.

This bug only manifests on passive TFO sender that a) suffered
SYNACK timeout and then b) stalls on very first loss recovery. Any
successful loss recovery would reset the timestamp to avoid this
issue.

Fixes: c7d13c8faa ("tcp: properly track retry time on passive Fast Open")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14 15:17:49 -07:00
Linus Torvalds 318222a35b Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - a few misc things and hotfixes

 - ocfs2

 - almost all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (139 commits)
  kernel/memremap.c: remove the unused device_private_entry_fault() export
  mm: delete find_get_entries_tag
  mm/huge_memory.c: make __thp_get_unmapped_area static
  mm/mprotect.c: fix compilation warning because of unused 'mm' variable
  mm/page-writeback: introduce tracepoint for wait_on_page_writeback()
  mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags
  mm/Kconfig: update "Memory Model" help text
  mm/vmscan.c: don't disable irq again when count pgrefill for memcg
  mm: memblock: make keeping memblock memory opt-in rather than opt-out
  hugetlbfs: always use address space in inode for resv_map pointer
  mm/z3fold.c: support page migration
  mm/z3fold.c: add structure for buddy handles
  mm/z3fold.c: improve compression by extending search
  mm/z3fold.c: introduce helper functions
  mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist
  mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig
  mm/vmscan.c: simplify shrink_inactive_list()
  fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback
  xen/privcmd-buf.c: convert to use vm_map_pages_zero()
  xen/gntdev.c: convert to use vm_map_pages()
  ...
2019-05-14 10:10:55 -07:00
Ira Weiny 73b0140bf0 mm/gup: change GUP fast to use flags rather than a write 'bool'
To facilitate additional options to get_user_pages_fast() change the
singular write parameter to be gup_flags.

This patch does not change any functionality.  New functionality will
follow in subsequent patches.

Some of the get_user_pages_fast() call sites were unchanged because they
already passed FOLL_WRITE or 0 for the write parameter.

NOTE: It was suggested to change the ordering of the get_user_pages_fast()
arguments to ensure that callers were converted.  This breaks the current
GUP call site convention of having the returned pages be the final
parameter.  So the suggestion was rejected.

Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marshall <hubcap@omnibond.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:46 -07:00
Ira Weiny 932f4a630a mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERM
Pach series "Add FOLL_LONGTERM to GUP fast and use it".

HFI1, qib, and mthca, use get_user_pages_fast() due to its performance
advantages.  These pages can be held for a significant time.  But
get_user_pages_fast() does not protect against mapping FS DAX pages.

Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which
retains the performance while also adding the FS DAX checks.  XDP has also
shown interest in using this functionality.[1]

In addition we change get_user_pages() to use the new FOLL_LONGTERM flag
and remove the specialized get_user_pages_longterm call.

[1] https://lkml.org/lkml/2019/3/19/939

"longterm" is a relative thing and at this point is probably a misnomer.
This is really flagging a pin which is going to be given to hardware and
can't move.  I've thought of a couple of alternative names but I think we
have to settle on if we are going to use FL_LAYOUT or something else to
solve the "longterm" problem.  Then I think we can change the flag to a
better name.

Secondly, it depends on how often you are registering memory.  I have
spoken with some RDMA users who consider MR in the performance path...
For the overall application performance.  I don't have the numbers as the
tests for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an aside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

This patch (of 7):

This patch starts a series which aims to support FOLL_LONGTERM in
get_user_pages_fast().  Some callers who would like to do a longterm (user
controlled pin) of pages with the fast variant of GUP for performance
purposes.

Rather than have a separate get_user_pages_longterm() call, introduce
FOLL_LONGTERM and change the longterm callers to use it.

This patch does not change any functionality.  In the short term
"longterm" or user controlled pins are unsafe for Filesystems and FS DAX
in particular has been blocked.  However, callers of get_user_pages_fast()
were not "protected".

FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it
requires vmas to determine if DAX is in use.

NOTE: In merging with the CMA changes we opt to change the
get_user_pages() call in check_and_migrate_cma_pages() to a call of
__get_user_pages_locked() on the newly migrated pages.  This makes the
code read better in that we are calling __get_user_pages_locked() on the
pages before and after a potential migration.

As a side affect some of the interfaces are cleaned up but this is not the
primary purpose of the series.

In review[1] it was asked:

<quote>
> This I don't get - if you do lock down long term mappings performance
> of the actual get_user_pages call shouldn't matter to start with.
>
> What do I miss?

A couple of points.

First "longterm" is a relative thing and at this point is probably a
misnomer.  This is really flagging a pin which is going to be given to
hardware and can't move.  I've thought of a couple of alternative names
but I think we have to settle on if we are going to use FL_LAYOUT or
something else to solve the "longterm" problem.  Then I think we can
change the flag to a better name.

Second, It depends on how often you are registering memory.  I have spoken
with some RDMA users who consider MR in the performance path...  For the
overall application performance.  I don't have the numbers as the tests
for HFI1 were done a long time ago.  But there was a significant
advantage.  Some of which is probably due to the fact that you don't have
to hold mmap_sem.

Finally, architecturally I think it would be good for everyone to use
*_fast.  There are patches submitted to the RDMA list which would allow
the use of *_fast (they reworking the use of mmap_sem) and as soon as they
are accepted I'll submit a patch to convert the RDMA core as well.  Also
to this point others are looking to use *_fast.

As an asside, Jasons pointed out in my previous submission that *_fast and
*_unlocked look very much the same.  I agree and I think further cleanup
will be coming.  But I'm focused on getting the final solution for DAX at
the moment.

</quote>

[1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965

[ira.weiny@intel.com: v3]
  Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com
Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14 09:47:45 -07:00
John Fastabend cabede8b4f bpf: sockmap fix msg->sg.size account on ingress skb
When converting a skb to msg->sg we forget to set the size after the
latest ktls/tls code conversion. This patch can be reached by doing
a redir into ingress path from BPF skb sock recv hook. Then trying to
read the size fails.

Fix this by setting the size.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-14 01:31:43 +02:00
John Fastabend c42253cc88 bpf: sockmap remove duplicate queue free
In tcp bpf remove we free the cork list and purge the ingress msg
list. However we do this before the ref count reaches zero so it
could be possible some other access is in progress. In this case
(tcp close and/or tcp_unhash) we happen to also hold the sock
lock so no path exists but lets fix it otherwise it is extremely
fragile and breaks the reference counting rules. Also we already
check the cork list and ingress msg queue and free them once the
ref count reaches zero so its wasteful to check twice.

Fixes: 604326b41a ("bpf, sockmap: convert to generic sk_msg interface")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-14 01:31:43 +02:00
John Fastabend 014894360e bpf: sockmap, only stop/flush strp if it was enabled at some point
If we try to call strp_done on a parser that has never been
initialized, because the sockmap user is only using TX side for
example we get the following error.

  [  883.422081] WARNING: CPU: 1 PID: 208 at kernel/workqueue.c:3030 __flush_work+0x1ca/0x1e0
  ...
  [  883.422095] Workqueue: events sk_psock_destroy_deferred
  [  883.422097] RIP: 0010:__flush_work+0x1ca/0x1e0

This had been wrapped in a 'if (psock->parser.enabled)' logic which
was broken because the strp_done() was never actually being called
because we do a strp_stop() earlier in the tear down logic will
set parser.enabled to false. This could result in a use after free
if work was still in the queue and was resolved by the patch here,
1d79895aef ("sk_msg: Always cancel strp work before freeing the
psock"). However, calling strp_stop(), done by the patch marked in
the fixes tag, only is useful if we never initialized a strp parser
program and never initialized the strp to start with. Because if
we had initialized a stream parser strp_stop() would have been called
by sk_psock_drop() earlier in the tear down process.  By forcing the
strp to stop we get past the WARNING in strp_done that checks
the stopped flag but calling cancel_work_sync on work that has never
been initialized is also wrong and generates the warning above.

To fix check if the parser program exists. If the program exists
then the strp work has been initialized and must be sync'd and
cancelled before free'ing any structures. If no program exists we
never initialized the stream parser in the first place so skip the
sync/cancel logic implemented by strp_done.

Finally, remove the strp_done its not needed and in the case where we
are using the stream parser has already been called.

Fixes: e8e3437762 ("bpf: Stop the psock parser before canceling its work")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-14 01:31:31 +02:00
Eric Dumazet b1c17a9a35 flow_dissector: disable preemption around BPF calls
Various things in eBPF really require us to disable preemption
before running an eBPF program.

syzbot reported :

BUG: assuming atomic context at net/core/flow_dissector.c:737
in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3
2 locks held by syz-executor.3/24710:
 #0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850
 #1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822
CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x172/0x1f0 lib/dump_stack.c:113
 __cant_sleep kernel/sched/core.c:6165 [inline]
 __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142
 bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737
 __skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853
 skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline]
 skb_probe_transport_header include/linux/skbuff.h:2500 [inline]
 skb_probe_transport_header include/linux/skbuff.h:2493 [inline]
 tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940
 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
 call_write_iter include/linux/fs.h:1872 [inline]
 do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693
 do_iter_write fs/read_write.c:970 [inline]
 do_iter_write+0x184/0x610 fs/read_write.c:951
 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
 do_writev+0x15b/0x330 fs/read_write.c:1058
 __do_sys_writev fs/read_write.c:1131 [inline]
 __se_sys_writev fs/read_write.c:1128 [inline]
 __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128
 do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Fixes: d58e468b11 ("flow_dissector: implements flow dissector BPF hook")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Petar Penkov <ppenkov@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-13 09:53:42 -07:00
David S. Miller 3ebb41bf47 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

1) Postpone chain policy update to drop after transaction is complete,
   from Florian Westphal.

2) Add entry to flowtable after confirmation to fix UDP flows with
   packets going in one single direction.

3) Reference count leak in dst object, from Taehee Yoo.

4) Check for TTL field in flowtable datapath, from Taehee Yoo.

5) Fix h323 conntrack helper due to incorrect boundary check,
   from Jakub Jankowski.

6) Fix incorrect rcu dereference when fetching basechain stats,
   from Florian Westphal.

7) Missing error check when adding new entries to flowtable,
   from Taehee Yoo.

8) Use version field in nfnetlink message to honor the nfgen_family
   field, from Kristian Evensen.

9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6,
   from Subash Abhinov Kasiviswanathan.

10) Prevent dying entries from being added to the flowtable,
    from Taehee Yoo.

11) Don't hit WARN_ON() with malformed blob in ebtables with
    trailing data after last rule, reported by syzbot, patch
    from Florian Westphal.

12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel
    code.

13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian
    Westphal.

This batch comes with a conflict that can be fixed with this patch:

diff --cc include/uapi/linux/netfilter/nf_tables.h
index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys
   * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
   * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
   * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
 - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack
+  * @NFT_CT_ID: conntrack id
   */
  enum nft_ct_keys {
  	NFT_CT_STATE,
@@@ -991,6 -993,8 +992,7 @@@
  	NFT_CT_DST_IP,
  	NFT_CT_SRC_IP6,
  	NFT_CT_DST_IP6,
 -	NFT_CT_TIMEOUT,
+ 	NFT_CT_ID,
  	__NFT_CT_MAX
  };
  #define NFT_CT_MAX		(__NFT_CT_MAX - 1)

That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer,
I can also solve this conflict here, just let me know.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-13 08:55:15 -07:00
Hariprasad Kelam 3285a9aa65 net: dccp : proto: remove Unneeded variable "err"
Fix below issue reported by coccicheck

net/dccp/proto.c:266:5-8: Unneeded variable: "err". Return "0" on line
310

Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-12 13:21:30 -07:00
Vladimir Oltean 8767137510 net: dsa: Initialize DSA_SKB_CB(skb)->deferred_xmit variable
The sk_buff control block can have any contents on xmit put there by the
stack, so initialization is mandatory, since we are checking its value
after the actual DSA xmit (the tagger may have changed it).

The DSA_SKB_ZERO() macro could have been used for this purpose, but:
- Zeroizing a 48-byte memory region in the hotpath is best avoided.
- It would have triggered a warning with newer compilers since
  __dsa_skb_cb contains a structure within a structure, and the {0}
  initializer was incorrect for that purpose.

So simply remove the DSA_SKB_ZERO() macro and initialize the
deferred_xmit variable by hand (which should be done for all further
dsa_skb_cb variables which need initialization - currently none - to
avoid the performance penalty).

Fixes: 97a69a0dea ("net: dsa: Add support for deferred xmit")
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-12 13:19:46 -07:00
Nicholas Mc Guire 8f5e24514c net: qrtr: use protocol endiannes variable
sparse was unable to verify endiannes correctness due to reassignment
from le32_to_cpu to the same variable - fix this warning up by providing
a proper __le32 type and initializing it. This is not actually fixing
any bug - rather just addressing the sparse warning.

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-11 09:59:48 -07:00
YueHaibing b96a54154b dsa: tag_brcm: Fix build error without CONFIG_NET_DSA_TAG_BRCM_PREPEND
Fix gcc build error:

net/dsa/tag_brcm.c:211:16: error: brcm_prepend_netdev_ops undeclared here (not in a function); did you mean brcm_netdev_ops?
 DSA_TAG_DRIVER(brcm_prepend_netdev_ops);
                ^
./include/net/dsa.h:708:10: note: in definition of macro DSA_TAG_DRIVER
  .ops = &__ops,       \
          ^~~~~
./include/net/dsa.h:701:36: warning: dsa_tag_driver_brcm_prepend_netdev_ops defined but not used [-Wunused-variable]
 #define DSA_TAG_DRIVER_NAME(__ops) dsa_tag_driver ## _ ## __ops
                                    ^
./include/net/dsa.h:707:30: note: in expansion of macro DSA_TAG_DRIVER_NAME
 static struct dsa_tag_driver DSA_TAG_DRIVER_NAME(__ops) = {  \
                              ^~~~~~~~~~~~~~~~~~~
net/dsa/tag_brcm.c:211:1: note: in expansion of macro DSA_TAG_DRIVER
 DSA_TAG_DRIVER(brcm_prepend_netdev_ops);

Like the CONFIG_NET_DSA_TAG_BRCM case,
brcm_prepend_netdev_ops and DSA_TAG_PROTO_BRCM_PREPEND
should be wrappeed by CONFIG_NET_DSA_TAG_BRCM_PREPEND.

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: b74b70c449 ("net: dsa: Support prepended Broadcom tag")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-10 15:06:45 -07:00
Tobin C. Harding bdfad5aec1 bridge: Fix error path for kobject_init_and_add()
Currently error return from kobject_init_and_add() is not followed by a
call to kobject_put().  This means there is a memory leak.  We currently
set p to NULL so that kfree() may be called on it as a noop, the code is
arguably clearer if we move the kfree() up closer to where it is
called (instead of after goto jump).

Remove a goto label 'err1' and jump to call to kobject_put() in error
return from kobject_init_and_add() fixing the memory leak.  Re-name goto
label 'put_back' to 'err1' now that we don't use err1, following current
nomenclature (err1, err2 ...).  Move call to kfree out of the error
code at bottom of function up to closer to where memory was allocated.
Add comment to clarify call to kfree().

Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-10 15:05:08 -07:00
Linus Torvalds 601e6bcc4e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several bug fixes, many are quick merge-window regression cures:

   - When NLM_F_EXCL is not set, allow same fib rule insertion. From
     Hangbin Liu.

   - Several cures in sja1105 DSA driver (while loop exit condition fix,
     return of negative u8, etc.) from Vladimir Oltean.

   - Handle tx/rx delays in realtek PHY driver properly, from Serge
     Semin.

   - Double free in cls_matchall, from Pieter Jansen van Vuuren.

   - Disable SIOCSHWTSTAMP in macvlan/vlan containers, from Hangbin Liu.

   - Endainness fixes in aqc111, from Oliver Neukum.

   - Handle errors in packet_init properly, from Haibing Yue.

   - Various W=1 warning fixes in kTLS, from Jakub Kicinski"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits)
  nfp: add missing kdoc
  net/tls: handle errors from padding_length()
  net/tls: remove set but not used variables
  docs/btf: fix the missing section marks
  nfp: bpf: fix static check error through tightening shift amount adjustment
  selftests: bpf: initialize bpf_object pointers where needed
  packet: Fix error path in packet_init
  net/tcp: use deferred jump label for TCP acked data hook
  net: aquantia: fix undefined devm_hwmon_device_register_with_info reference
  aqc111: fix double endianness swap on BE
  aqc111: fix writing to the phy on BE
  aqc111: fix endianness issue in aqc111_change_mtu
  vlan: disable SIOCSHWTSTAMP in container
  macvlan: disable SIOCSHWTSTAMP in container
  tipc: fix hanging clients using poll with EPOLLOUT flag
  tuntap: synchronize through tfiles array instead of tun->numqueues
  tuntap: fix dividing by zero in ebpf queue selection
  dwmac4_prog_mtl_tx_algorithms() missing write operation
  ptp_qoriq: fix NULL access if ptp dt node missing
  net/sched: avoid double free on matchall reoffload
  ...
2019-05-09 17:00:51 -07:00
Jakub Kicinski b53f4976fb net/tls: handle errors from padding_length()
At the time padding_length() is called the record header
is still part of the message.  If malicious TLS 1.3 peer
sends an all-zero record padding_length() will stop at
the record header, and return full length of the data
including the tail_size.

Subsequent subtraction of prot->overhead_size from rxm->full_len
will cause rxm->full_len to turn negative.  skb accessors,
however, will always catch resulting out-of-bounds operation,
so in practice this fix comes down to returning the correct
error code.  It also fixes a set but not used warning.

This code was added by commit 130b392c6c ("net: tls: Add tls 1.3 support").

CC: Dave Watson <davejwatson@fb.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09 16:37:39 -07:00
Jakub Kicinski 88c80bee88 net/tls: remove set but not used variables
Commit 4504ab0e6e ("net/tls: Inform user space about send buffer availability")
made us report write_space regardless whether partial record
push was successful or not.  Remove the now unused return value
to clean up the following W=1 warning:

net/tls/tls_device.c: In function ‘tls_device_write_space’:
net/tls/tls_device.c:546:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable]
  int rc = 0;
      ^~

CC: Vakul Garg <vakul.garg@nxp.com>
CC: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09 16:37:39 -07:00
Linus Torvalds 06cbd26d31 NFS client updates for Linux 5.2
Stable bugfixes:
 - Fall back to MDS if no deviceid is found rather than aborting   # v4.11+
 - NFS4: Fix v4.0 client state corruption when mount
 
 Features:
 - Much improved handling of soft mounts with NFS v4.0
   - Reduce risk of false positive timeouts
   - Faster failover of reads and writes after a timeout
   - Added a "softerr" mount option to return ETIMEDOUT instead of
     EIO to the application after a timeout
 - Increase number of xprtrdma backchannel requests
 - Add additional xprtrdma tracepoints
 - Improved send completion batching for xprtrdma
 
 Other bugfixes and cleanups:
 - Return -EINVAL when NFS v4.2 is passed an invalid dedup mode
 - Reduce usage of GFP_ATOMIC pages in SUNRPC
 - Various minor NFS over RDMA cleanups and bugfixes
 - Use the correct container namespace for upcalls
 - Don't share superblocks between user namespaces
 - Various other container fixes
 - Make nfs_match_client() killable to prevent soft lockups
 - Don't mark all open state for recovery when handling recallable state revoked flag
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlzUjdcACgkQ18tUv7Cl
 QOsUiw/+OirzlZI7XeHfpZ/CwS7A+tSk3AAg9PDS1gjbfylER0g++GpA08tXnmDt
 JdUnBKYC5ujLyAqxN1j7QK+EvmXZQro8rucJxhEdPJMIQDC65fQQnmW7efl2bAEv
 CAWNDCf9Xe4g6X8LSR5jrnaMV4kuOQBYX4wqrrmaV8I+g/A/GKXW262KWnAv+w1M
 Y1ZlX+d1Gm8hODXhvqz4lldW6bkyrpWpU9BKUtYSYnSR0x1fam6PLPuCTm74fEDR
 N/Tgy5XvJi4xgti4SOZ/dI2O/Oqu6ut81PEPlhs8sTX04G8bLhr+hl3rSksCZFlu
 Afz9Hcnxg6XYB3Va7j7AO67H5SbyX4Zyj5cRMipXQE7Ebc1iXo5lu3vdhAEOAtNx
 fdNJlqD86MC/XWbtM+DfWlD+KjtpZ+lkxN+xuMgC/kVaPTeFI7nEWM796hJP/4no
 EYtnSLbSpJyH6F7wH9IL5V2EJYFxbzTvnPSTxV+QNZ0HgF17gTY0AGmQBzDE5bF0
 tfQteOG6MYXMHg64pTEzjlowlXOWdnE5TnuaFpt64/yP+hVznZMepBMSkxZO1xYt
 jc1wQlJkv/SyVH7cMGsj5lw3A6zwTrLManDUUmrLjIsVVmh4dk8WKlNtWQmvf1v6
 nFBklUa2GzH8LWKRT2ftNGcUeEiCuw/QF9oE5T/V7/7SQ/wmmvA=
 =skb2
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
   - Fall back to MDS if no deviceid is found rather than aborting   # v4.11+
   - NFS4: Fix v4.0 client state corruption when mount

  Features:
   - Much improved handling of soft mounts with NFS v4.0:
       - Reduce risk of false positive timeouts
       - Faster failover of reads and writes after a timeout
       - Added a "softerr" mount option to return ETIMEDOUT instead of
         EIO to the application after a timeout
   - Increase number of xprtrdma backchannel requests
   - Add additional xprtrdma tracepoints
   - Improved send completion batching for xprtrdma

  Other bugfixes and cleanups:
   - Return -EINVAL when NFS v4.2 is passed an invalid dedup mode
   - Reduce usage of GFP_ATOMIC pages in SUNRPC
   - Various minor NFS over RDMA cleanups and bugfixes
   - Use the correct container namespace for upcalls
   - Don't share superblocks between user namespaces
   - Various other container fixes
   - Make nfs_match_client() killable to prevent soft lockups
   - Don't mark all open state for recovery when handling recallable
     state revoked flag"

* tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (69 commits)
  SUNRPC: Rebalance a kref in auth_gss.c
  NFS: Fix a double unlock from nfs_match,get_client
  nfs: pass the correct prototype to read_cache_page
  NFSv4: don't mark all open state for recovery when handling recallable state revoked flag
  SUNRPC: Fix an error code in gss_alloc_msg()
  SUNRPC: task should be exit if encode return EKEYEXPIRED more times
  NFS4: Fix v4.0 client state corruption when mount
  PNFS fallback to MDS if no deviceid found
  NFS: make nfs_match_client killable
  lockd: Store the lockd client credential in struct nlm_host
  NFS: When mounting, don't share filesystems between different user namespaces
  NFS: Convert NFSv2 to use the container user namespace
  NFSv4: Convert the NFS client idmapper to use the container user namespace
  NFS: Convert NFSv3 to use the container user namespace
  SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall
  SUNRPC: Use the client user namespace when encoding creds
  NFS: Store the credential of the mount process in the nfs_server
  SUNRPC: Cache cred of process creating the rpc_client
  xprtrdma: Remove stale comment
  xprtrdma: Update comments that reference ib_drain_qp
  ...
2019-05-09 14:33:15 -07:00
YueHaibing 36096f2f4f packet: Fix error path in packet_init
kernel BUG at lib/list_debug.c:47!
invalid opcode: 0000 [#1
CPU: 0 PID: 12914 Comm: rmmod Tainted: G        W         5.1.0+ #47
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:__list_del_entry_valid+0x53/0x90
Code: 48 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00 00 5d c3 48
89 fe 48 89 c2 48 c7 c7 18 75 fe 82 e8 cb 34 78 ff <0f> 0b 48 89 fe 48 c7 c7 50 75 fe 82 e8 ba 34 78 ff 0f 0b 48 89 f2
RSP: 0018:ffffc90001c2fe40 EFLAGS: 00010286
RAX: 000000000000004e RBX: ffffffffa0184000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff888237a17788 RDI: 00000000ffffffff
RBP: ffffc90001c2fe40 R08: 0000000000000000 R09: 0000000000000000
R10: ffffc90001c2fe10 R11: 0000000000000000 R12: 0000000000000000
R13: ffffc90001c2fe50 R14: ffffffffa0184000 R15: 0000000000000000
FS:  00007f3d83634540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000555c350ea818 CR3: 0000000231677000 CR4: 00000000000006f0
Call Trace:
 unregister_pernet_operations+0x34/0x120
 unregister_pernet_subsys+0x1c/0x30
 packet_exit+0x1c/0x369 [af_packet
 __x64_sys_delete_module+0x156/0x260
 ? lockdep_hardirqs_on+0x133/0x1b0
 ? do_syscall_64+0x12/0x1f0
 do_syscall_64+0x6e/0x1f0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

When modprobe af_packet, register_pernet_subsys
fails and does a cleanup, ops->list is set to LIST_POISON1,
but the module init is considered to success, then while rmmod it,
BUG() is triggered in __list_del_entry_valid which is called from
unregister_pernet_subsys. This patch fix error handing path in
packet_init to avoid possilbe issue if some error occur.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09 13:45:46 -07:00
Chuck Lever 5940d1cf9f SUNRPC: Rebalance a kref in auth_gss.c
Restore the kref_get that matches the gss_put_auth(gss_msg->auth)
done by gss_release_msg().

Fixes: ac83228a71 ("SUNRPC: Use namespace of listening daemon ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-09 16:27:24 -04:00
Dan Carpenter fe31ce83cb SUNRPC: Fix an error code in gss_alloc_msg()
If kstrdup_const() then this function returns zero (success) but it
should return -ENOMEM.

Fixes: ac83228a71 ("SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-09 16:26:56 -04:00
ZhangXiaoxu 9c5948c248 SUNRPC: task should be exit if encode return EKEYEXPIRED more times
If the rpc.gssd always return cred success, but now the cred is
expired, then the task will loop in call_refresh and call_transmit.

Exit the rpc task after retry.

Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-09 16:26:56 -04:00
Jakub Kicinski 494bc1d281 net/tcp: use deferred jump label for TCP acked data hook
User space can flip the clean_acked_data_enabled static branch
on and off with TLS offload when CONFIG_TLS_DEVICE is enabled.
jump_label.h suggests we use the delayed version in this case.

Deferred branches now also don't take the branch mutex on
decrement, so we avoid potential locking issues.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09 11:13:57 -07:00
David S. Miller d7e163ced4 This feature/cleanup patchset includes the following patches:
- bump version strings, by Simon Wunderlich (we forgot to include
    this patch previously ...)
 
  - fix multicast tt/tvlv worker locking, by Linus Lüssing
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE1ilQI7G+y+fdhnrfoSvjmEKSnqEFAlzUKp4WHHN3QHNpbW9u
 d3VuZGVybGljaC5kZQAKCRChK+OYQpKeoX1kD/oCq7+iLO1oaewGrD/gnrGJu2Wk
 FDlVRRBvUOwdJM2zo7o5xLCU1+Pohv0H5uBIXUw4vhcXoqhkqyysLz9lotPrxRG1
 KD1Jl8FwOPLkSiLPxj7QurfBU8G0V/A7fYkAt+RTTkk9iSniL8lCc6hE6hj6KimP
 GaqnkYJyudhOkimYyLb4exUfq4Ls7J95x3LhN+YzXk8PgN3qgrD/dVuSu7dmuyse
 bfTP1dyWBtQiByEZqNCqrWnj2nBTvCQW9UwuOCg1W1z5coh5GWXTvL61ePu35/Vn
 0LxUNNHbbkjVwbVz9T3ExMTHlrV2nEOaZnw4kiuVAbqDNDqG5XkqD1iG2H5wVh74
 BPTtk2hd2b8PrvBsyzj2Q+RaTEk0o3jIbP6A/jM+9GpVukT1M0SohcYEFISUeh1f
 PXMBpAc7qVrP8+VefCkXGFWbKyRg60k2cJQipS/Q9Aef7xOv8DyDbOrnd/a5xPUS
 zXFRAsKx16gA8SUxNgYz/67jwamVhplHSpgKG2KFtEh0CxJJBY7Z9hlEPM3uZLmT
 QHUbTS+FTlBNS/Go48pnGd5QTEGNeP5gQkiovtLmIRULRTfbwbI1/4VQCp9/XahT
 Vj+4RhhULC7i4joqEeKv+GyO36tydsU93yBt66Xo0JI8W9SLfga5zREdrmlGYRZr
 ECBszSxPpQ8zp3A2Hg==
 =Dq5n
 -----END PGP SIGNATURE-----

Merge tag 'batadv-net-for-davem-20190509' of git://git.open-mesh.org/linux-merge

Simon Wunderlich says:

====================
This feature/cleanup patchset includes the following patches:

 - bump version strings, by Simon Wunderlich (we forgot to include
   this patch previously ...)

 - fix multicast tt/tvlv worker locking, by Linus Lüssing
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09 09:44:17 -07:00
Hangbin Liu 873017af77 vlan: disable SIOCSHWTSTAMP in container
With NET_ADMIN enabled in container, a normal user could be mapped to
root and is able to change the real device's rx filter via ioctl on
vlan, which would affect the other ptp process on host. Fix it by
disabling SIOCSHWTSTAMP in container.

Fixes: a6111d3c93 ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09 09:31:16 -07:00
Parthasarathy Bhuvaragan ff946833b7 tipc: fix hanging clients using poll with EPOLLOUT flag
commit 517d7c79bd ("tipc: fix hanging poll() for stream sockets")
introduced a regression for clients using non-blocking sockets.
After the commit, we send EPOLLOUT event to the client even in
TIPC_CONNECTING state. This causes the subsequent send() to fail
with ENOTCONN, as the socket is still not in TIPC_ESTABLISHED state.

In this commit, we:
- improve the fix for hanging poll() by replacing sk_data_ready()
  with sk_state_change() to wake up all clients.
- revert the faulty updates introduced by commit 517d7c79bd
  ("tipc: fix hanging poll() for stream sockets").

Fixes: 517d7c79bd ("tipc: fix hanging poll() for stream sockets")
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09 09:26:09 -07:00
Linus Torvalds dce45af5c2 5.2 Merge Window pull request
This has been a smaller cycle than normal. One new driver was accepted,
 which is unusual, and at least one more driver remains in review on the
 list.
 
 - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma
 
 - Many patches from MatthewW converting radix tree and IDR users to use
   xarray
 
 - Introduction of tracepoints to the MAD layer
 
 - Build large SGLs at the start for DMA mapping and get the driver to
   split them
 
 - Generally clean SGL handling code throughout the subsystem
 
 - Support for restricting RDMA devices to net namespaces for containers
 
 - Progress to remove object allocation boilerplate code from drivers
 
 - Change in how the mlx5 driver shows representor ports linked to VFs
 
 - mlx5 uapi feature to access the on chip SW ICM memory
 
 - Add a new driver for 'EFA'. This is HW that supports user space packet
   processing through QPs in Amazon's cloud
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAlzTIU0ACgkQOG33FX4g
 mxrGKQ/8CqpyvuCyZDW5ovO4DI4YlzYSPXehWlwxA4CWhU1AYTujutnNOdZdngnz
 atTthOlJpZWJV26orvvzwIOi4qX/5UjLXEY3HYdn07JP1Z4iT7E3P4W2sdU3vdl3
 j8bU7xM7ZWmnGxrBZ6yQlVRadEhB8+HJIZWMw+wx66cIPnvU+g9NgwouH67HEEQ3
 PU8OCtGBwNNR508WPiZhjqMDfi/3BED4BfCihFhMbZEgFgObjRgtCV0M33SSXKcR
 IO2FGNVuDAUBlND3vU9guW1+M77xE6p1GvzkIgdCp6qTc724NuO5F2ngrpHKRyZT
 CxvBhAJI6tAZmjBVnmgVJex7rA8p+y/8M/2WD6GE3XSO89XVOkzNBiO2iTMeoxXr
 +CX6VvP2BWwCArxsfKMgW3j0h/WVE9w8Ciej1628m1NvvKEV4AGIJC1g93lIJkRN
 i3RkJ5PkIrdBrTEdKwDu1FdXQHaO7kGgKvwzJ7wBFhso8BRMrMfdULiMbaXs2Bw1
 WdL5zoSe/bLUpPZxcT9IjXRxY5qR0FpIOoo6925OmvyYe/oZo1zbitS5GGbvV90g
 tkq6Jb+aq8ZKtozwCo+oMcg9QPLYNibQsnkL3QirtURXWCG467xdgkaJLdF6s5Oh
 cp+YBqbR/8HNMG/KQlCfnNQKp1ci8mG3EdthQPhvdcZ4jtbqnSI=
 =TS64
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "This has been a smaller cycle than normal. One new driver was
  accepted, which is unusual, and at least one more driver remains in
  review on the list.

  Summary:

   - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4,
     vmw_pvrdma

   - Many patches from MatthewW converting radix tree and IDR users to
     use xarray

   - Introduction of tracepoints to the MAD layer

   - Build large SGLs at the start for DMA mapping and get the driver to
     split them

   - Generally clean SGL handling code throughout the subsystem

   - Support for restricting RDMA devices to net namespaces for
     containers

   - Progress to remove object allocation boilerplate code from drivers

   - Change in how the mlx5 driver shows representor ports linked to VFs

   - mlx5 uapi feature to access the on chip SW ICM memory

   - Add a new driver for 'EFA'. This is HW that supports user space
     packet processing through QPs in Amazon's cloud"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits)
  RDMA/ipoib: Allow user space differentiate between valid dev_port
  IB/core, ipoib: Do not overreact to SM LID change event
  RDMA/device: Don't fire uevent before device is fully initialized
  lib/scatterlist: Remove leftover from sg_page_iter comment
  RDMA/efa: Add driver to Kconfig/Makefile
  RDMA/efa: Add the efa module
  RDMA/efa: Add EFA verbs implementation
  RDMA/efa: Add common command handlers
  RDMA/efa: Implement functions that submit and complete admin commands
  RDMA/efa: Add the ABI definitions
  RDMA/efa: Add the com service API definitions
  RDMA/efa: Add the efa_com.h file
  RDMA/efa: Add the efa.h header file
  RDMA/efa: Add EFA device definitions
  RDMA: Add EFA related definitions
  RDMA/umem: Remove hugetlb flag
  RDMA/bnxt_re: Use core helpers to get aligned DMA address
  RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size
  RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks
  RDMA/umem: Add API to find best driver supported page size in an MR
  ...
2019-05-09 09:02:46 -07:00
Florian Westphal 680f6af533 netfilter: ebtables: CONFIG_COMPAT: reject trailing data after last rule
If userspace provides a rule blob with trailing data after last target,
we trigger a splat, then convert ruleset to 64bit format (with trailing
data), then pass that to do_replace_finish() which then returns -EINVAL.

Erroring out right away avoids the splat plus unneeded translation and
error unwind.

Fixes: 81e675c227 ("netfilter: ebtables: add CONFIG_COMPAT support")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-09 08:54:49 +02:00
Pieter Jansen van Vuuren 5f05836831 net/sched: avoid double free on matchall reoffload
Avoid freeing cls_mall.rule twice when failing to setup flow_action
offload used in the hardware intermediate representation. This is
achieved by returning 0 when the setup fails but the skip software
flag has not been set.

Fixes: f00cbf1968 ("net/sched: use the hardware intermediate representation for matchall")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08 16:34:58 -07:00
Geert Uytterhoeven f319ca6557 openvswitch: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)
Commit 4806e97572 ("netfilter: replace NF_NAT_NEEDED with
IS_ENABLED(CONFIG_NF_NAT)") removed CONFIG_NF_NAT_NEEDED, but a new user
popped up afterwards.

Fixes: fec9c271b8 ("openvswitch: load and reference the NAT helper.")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08 09:43:15 -07:00
David Ahern 19e4e76806 ipv4: Fix raw socket lookup for local traffic
inet_iif should be used for the raw socket lookup. inet_iif considers
rt_iif which handles the case of local traffic.

As it stands, ping to a local address with the '-I <dev>' option fails
ever since ping was changed to use SO_BINDTODEVICE instead of
cmsg + IP_PKTINFO.

IPv6 works fine.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08 09:38:30 -07:00
Hangbin Liu e9919a24d3 fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied
With commit 153380ec4b ("fib_rules: Added NLM_F_EXCL support to
fib_nl_newrule") we now able to check if a rule already exists. But this
only works with iproute2. For other tools like libnl, NetworkManager,
it still could add duplicate rules with only NLM_F_CREATE flag, like

[localhost ~ ]# ip rule
0:      from all lookup local
32766:  from all lookup main
32767:  from all lookup default
100000: from 192.168.7.5 lookup 5
100000: from 192.168.7.5 lookup 5

As it doesn't make sense to create two duplicate rules, let's just return
0 if the rule exists.

Fixes: 153380ec4b ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule")
Reported-by: Thomas Haller <thaller@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08 09:32:10 -07:00
Linus Torvalds 80f232121b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support AES128-CCM ciphers in kTLS, from Vakul Garg.

   2) Add fib_sync_mem to control the amount of dirty memory we allow to
      queue up between synchronize RCU calls, from David Ahern.

   3) Make flow classifier more lockless, from Vlad Buslov.

   4) Add PHY downshift support to aquantia driver, from Heiner
      Kallweit.

   5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces
      contention on SLAB spinlocks in heavy RPC workloads.

   6) Partial GSO offload support in XFRM, from Boris Pismenny.

   7) Add fast link down support to ethtool, from Heiner Kallweit.

   8) Use siphash for IP ID generator, from Eric Dumazet.

   9) Pull nexthops even further out from ipv4/ipv6 routes and FIB
      entries, from David Ahern.

  10) Move skb->xmit_more into a per-cpu variable, from Florian
      Westphal.

  11) Improve eBPF verifier speed and increase maximum program size,
      from Alexei Starovoitov.

  12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit
      spinlocks. From Neil Brown.

  13) Allow tunneling with GUE encap in ipvs, from Jacky Hu.

  14) Improve link partner cap detection in generic PHY code, from
      Heiner Kallweit.

  15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan
      Maguire.

  16) Remove SKB list implementation assumptions in SCTP, your's truly.

  17) Various cleanups, optimizations, and simplifications in r8169
      driver. From Heiner Kallweit.

  18) Add memory accounting on TX and RX path of SCTP, from Xin Long.

  19) Switch PHY drivers over to use dynamic featue detection, from
      Heiner Kallweit.

  20) Support flow steering without masking in dpaa2-eth, from Ioana
      Ciocoi.

  21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri
      Pirko.

  22) Increase the strict parsing of current and future netlink
      attributes, also export such policies to userspace. From Johannes
      Berg.

  23) Allow DSA tag drivers to be modular, from Andrew Lunn.

  24) Remove legacy DSA probing support, also from Andrew Lunn.

  25) Allow ll_temac driver to be used on non-x86 platforms, from Esben
      Haabendal.

  26) Add a generic tracepoint for TX queue timeouts to ease debugging,
      from Cong Wang.

  27) More indirect call optimizations, from Paolo Abeni"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits)
  cxgb4: Fix error path in cxgb4_init_module
  net: phy: improve pause mode reporting in phy_print_status
  dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings
  net: macb: Change interrupt and napi enable order in open
  net: ll_temac: Improve error message on error IRQ
  net/sched: remove block pointer from common offload structure
  net: ethernet: support of_get_mac_address new ERR_PTR error
  net: usb: smsc: fix warning reported by kbuild test robot
  staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check
  net: dsa: support of_get_mac_address new ERR_PTR error
  net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats
  vrf: sit mtu should not be updated when vrf netdev is the link
  net: dsa: Fix error cleanup path in dsa_init_module
  l2tp: Fix possible NULL pointer dereference
  taprio: add null check on sched_nest to avoid potential null pointer dereference
  net: mvpp2: cls: fix less than zero check on a u32 variable
  net_sched: sch_fq: handle non connected flows
  net_sched: sch_fq: do not assume EDT packets are ordered
  net: hns3: use devm_kcalloc when allocating desc_cb
  net: hns3: some cleanup for struct hns3_enet_ring
  ...
2019-05-07 22:03:58 -07:00
David S. Miller a9e41a5296 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Minor conflict with the DSA legacy code removal.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 17:22:09 -07:00
Linus Torvalds cf482a49af Driver core/kobject patches for 5.2-rc1
Here is the "big" set of driver core patches for 5.2-rc1
 
 There are a number of ACPI patches in here as well, as Rafael said they
 should go through this tree due to the driver core changes they
 required.  They have all been acked by the ACPI developers.
 
 There are also a number of small subsystem-specific changes in here, due
 to some changes to the kobject core code.  Those too have all been acked
 by the various subsystem maintainers.
 
 As for content, it's pretty boring outside of the ACPI changes:
   - spdx cleanups
   - kobject documentation updates
   - default attribute groups for kobjects
   - other minor kobject/driver core fixes
 
 All have been in linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXNHDbw8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynDAgCfbb4LBR6I50wFXb8JM/R6cAS7qrsAn1unshKV
 8XCYcif2RxjtdJWXbjdm
 =/rLh
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core/kobject updates from Greg KH:
 "Here is the "big" set of driver core patches for 5.2-rc1

  There are a number of ACPI patches in here as well, as Rafael said
  they should go through this tree due to the driver core changes they
  required. They have all been acked by the ACPI developers.

  There are also a number of small subsystem-specific changes in here,
  due to some changes to the kobject core code. Those too have all been
  acked by the various subsystem maintainers.

  As for content, it's pretty boring outside of the ACPI changes:
   - spdx cleanups
   - kobject documentation updates
   - default attribute groups for kobjects
   - other minor kobject/driver core fixes

  All have been in linux-next for a while with no reported issues"

* tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
  kobject: clean up the kobject add documentation a bit more
  kobject: Fix kernel-doc comment first line
  kobject: Remove docstring reference to kset
  firmware_loader: Fix a typo ("syfs" -> "sysfs")
  kobject: fix dereference before null check on kobj
  Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
  init/config: Do not select BUILD_BIN2C for IKCONFIG
  Provide in-kernel headers to make extending kernel easier
  kobject: Improve doc clarity kobject_init_and_add()
  kobject: Improve docs for kobject_add/del
  driver core: platform: Fix the usage of platform device name(pdev->name)
  livepatch: Replace klp_ktype_patch's default_attrs with groups
  cpufreq: schedutil: Replace default_attrs field with groups
  padata: Replace padata_attr_type default_attrs field with groups
  irqdesc: Replace irq_kobj_type's default_attrs field with groups
  net-sysfs: Replace ktype default_attrs field with groups
  block: Replace all ktype default_attrs with groups
  samples/kobject: Replace foo_ktype's default_attrs field with groups
  kobject: Add support for default attribute groups to kobj_type
  driver core: Postpone DMA tear-down until after devres release for probe failure
  ...
2019-05-07 13:01:40 -07:00
Pieter Jansen van Vuuren d6787147e1 net/sched: remove block pointer from common offload structure
Based on feedback from Jiri avoid carrying a pointer to the tcf_block
structure in the tc_cls_common_offload structure. Instead store
a flag in driver private data which indicates if offloads apply
to a shared block at block binding time.

Suggested-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:23:40 -07:00
Petr Štetiar a51645f70f net: ethernet: support of_get_mac_address new ERR_PTR error
There was NVMEM support added to of_get_mac_address, so it could now
return ERR_PTR encoded error values, so we need to adjust all current
users of of_get_mac_address to this new fact.

While at it, remove superfluous is_valid_ether_addr as the MAC address
returned from of_get_mac_address is always valid and checked by
is_valid_ether_addr anyway.

Fixes: d01f449c00 ("of_net: add NVMEM support to of_get_mac_address")
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:22:47 -07:00
Petr Štetiar 4974f9b7e0 net: dsa: support of_get_mac_address new ERR_PTR error
There was NVMEM support added to of_get_mac_address, so it could now
return ERR_PTR encoded error values, so we need to adjust all current
users of of_get_mac_address to this new fact.

While at it, remove superfluous is_valid_ether_addr as the MAC address
returned from of_get_mac_address is always valid and checked by
is_valid_ether_addr anyway.

Fixes: d01f449c00 ("of_net: add NVMEM support to of_get_mac_address")
Signed-off-by: Petr Štetiar <ynezz@true.cz>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:22:47 -07:00
Stephen Suryaputra ff6ab32bd4 vrf: sit mtu should not be updated when vrf netdev is the link
VRF netdev mtu isn't typically set and have an mtu of 65536. When the
link of a tunnel is set, the tunnel mtu is changed from 1480 to the link
mtu minus tunnel header. In the case of VRF netdev is the link, then the
tunnel mtu becomes 65516. So, fix it by not setting the tunnel mtu in
this case.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:19:19 -07:00
YueHaibing 68be930249 net: dsa: Fix error cleanup path in dsa_init_module
BUG: unable to handle kernel paging request at ffffffffa01c5430
PGD 3270067 P4D 3270067 PUD 3271063 PMD 230bc5067 PTE 0
Oops: 0000 [#1
CPU: 0 PID: 6159 Comm: modprobe Not tainted 5.1.0+ #33
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:raw_notifier_chain_register+0x16/0x40
Code: 63 f8 66 90 e9 5d ff ff ff 90 90 90 90 90 90 90 90 90 90 90 55 48 8b 07 48 89 e5 48 85 c0 74 1c 8b 56 10 3b 50 10 7e 07 eb 12 <39> 50 10 7c 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 08
RSP: 0018:ffffc90001c33c08 EFLAGS: 00010282
RAX: ffffffffa01c5420 RBX: ffffffffa01db420 RCX: 4fcef45928070a8b
RDX: 0000000000000000 RSI: ffffffffa01db420 RDI: ffffffffa01b0068
RBP: ffffc90001c33c08 R08: 000000003e0a33d0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000094443661 R12: ffff88822c320700
R13: ffff88823109be80 R14: 0000000000000000 R15: ffffc90001c33e78
FS:  00007fab8bd08540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffa01c5430 CR3: 00000002297ea000 CR4: 00000000000006f0
Call Trace:
 register_netdevice_notifier+0x43/0x250
 ? 0xffffffffa01e0000
 dsa_slave_register_notifier+0x13/0x70 [dsa_core
 ? 0xffffffffa01e0000
 dsa_init_module+0x2e/0x1000 [dsa_core
 do_one_initcall+0x6c/0x3cc
 ? do_init_module+0x22/0x1f1
 ? rcu_read_lock_sched_held+0x97/0xb0
 ? kmem_cache_alloc_trace+0x325/0x3b0
 do_init_module+0x5b/0x1f1
 load_module+0x1db1/0x2690
 ? m_show+0x1d0/0x1d0
 __do_sys_finit_module+0xc5/0xd0
 __x64_sys_finit_module+0x15/0x20
 do_syscall_64+0x6b/0x1d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Cleanup allocated resourses if there are errors,
otherwise it will trgger memleak.

Fixes: c9eb3e0f87 ("net: dsa: Add support for learning FDB through notification")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:17:23 -07:00
YueHaibing 638a3a1e34 l2tp: Fix possible NULL pointer dereference
BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
PGD 0 P4D 0
Oops: 0000 [#1
CPU: 0 PID: 5697 Comm: modprobe Tainted: G        W         5.1.0-rc7+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:__lock_acquire+0x53/0x10b0
Code: 8b 1c 25 40 5e 01 00 4c 8b 6d 10 45 85 e4 0f 84 bd 06 00 00 44 8b 1d 7c d2 09 02 49 89 fe 41 89 d2 45 85 db 0f 84 47 02 00 00 <48> 81 3f a0 05 70 83 b8 00 00 00 00 44 0f 44 c0 83 fe 01 0f 86 3a
RSP: 0018:ffffc90001c07a28 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff88822f038440 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000128
RBP: ffffc90001c07a88 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000128 R15: 0000000000000000
FS:  00007fead0811540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000128 CR3: 00000002310da000 CR4: 00000000000006f0
Call Trace:
 ? __lock_acquire+0x24e/0x10b0
 lock_acquire+0xdf/0x230
 ? flush_workqueue+0x71/0x530
 flush_workqueue+0x97/0x530
 ? flush_workqueue+0x71/0x530
 l2tp_exit_net+0x170/0x2b0 [l2tp_core
 ? l2tp_exit_net+0x93/0x2b0 [l2tp_core
 ops_exit_list.isra.6+0x36/0x60
 unregister_pernet_operations+0xb8/0x110
 unregister_pernet_device+0x25/0x40
 l2tp_init+0x55/0x1000 [l2tp_core
 ? 0xffffffffa018d000
 do_one_initcall+0x6c/0x3cc
 ? do_init_module+0x22/0x1f1
 ? rcu_read_lock_sched_held+0x97/0xb0
 ? kmem_cache_alloc_trace+0x325/0x3b0
 do_init_module+0x5b/0x1f1
 load_module+0x1db1/0x2690
 ? m_show+0x1d0/0x1d0
 __do_sys_finit_module+0xc5/0xd0
 __x64_sys_finit_module+0x15/0x20
 do_syscall_64+0x6b/0x1d0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7fead031a839
Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe8d9acca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000560078398b80 RCX: 00007fead031a839
RDX: 0000000000000000 RSI: 000056007659dc2e RDI: 0000000000000003
RBP: 000056007659dc2e R08: 0000000000000000 R09: 0000560078398b80
R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000
R13: 00005600783a04a0 R14: 0000000000040000 R15: 0000560078398b80
Modules linked in: l2tp_core(+) e1000 ip_tables ipv6 [last unloaded: l2tp_core
CR2: 0000000000000128
---[ end trace 8322b2b8bf83f8e1

If alloc_workqueue fails in l2tp_init, l2tp_net_ops
is unregistered on failure path. Then l2tp_exit_net
is called which will flush NULL workqueue, this patch
add a NULL check to fix it.

Fixes: 67e04c29ec ("l2tp: unregister l2tp_net_ops on failure path")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:16:47 -07:00
Colin Ian King e4acf42741 taprio: add null check on sched_nest to avoid potential null pointer dereference
The call to nla_nest_start_noflag can return a null pointer and currently
this is not being checked and this can lead to a null pointer dereference
when the null pointer sched_nest is passed to function nla_nest_end. Fix
this by adding in a null pointer check.

Addresses-Coverity: ("Dereference null return value")
Fixes: a3d43c0d56 ("taprio: Add support adding an admin schedule")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:15:33 -07:00
Linus Torvalds 41bc10cabe stream_open related patches for Linux 5.2
https://lore.kernel.org/linux-fsdevel/CAHk-=wg1tFzcaX2v9Z91vPJiBR486ddW5MtgDL02-fOen2F0Aw@mail.gmail.com/T/#m5b2d9ad3aeacea4bd6aa1964468ac074bf3aa5bf
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEECVWwJCUO7/z+QjZbZsp4hBP2dUkFAlzR1UgQHGtpcnJAbmV4
 ZWRpLmNvbQAKCRBmyniEE/Z1SZBiEACGw1LzUmjV9eBYFjqaUkgX/Zfcu42D4Ek2
 8MuWnNdRabtpGQq0LccYlfoL3yH5xECp14IkCgJvkjqoZ3CcqWcv6uDxf0WtnUqZ
 wPx1RYZykb4RZj2A6/ndhInReP4AlXICyTVulKb+BquVkemMvmXX8k+bkr/msKfT
 9jdKWFIn+ANNABt3y2D7ywZvs9mkxIx+Fti+tVV4BFBeGfUuj4ArZBOHnngRnIk/
 XYlQ7FVzENSPSB+3GvL34jTGEzo8suPHKhHQlIhtcd5hwzVRZKE2sdVXsCc6/WbY
 YnT32gmT1/+cUuDl1mZSiQY5R4Xkb07k6/jNrdmjQpwmWbZu90cuRhb+JBXwnmjZ
 2Wgy3sfwYISDxtePukg1iYePlHlVlGTYqMo3AQrTBs/gEwCKWrsKQb98mRxlf1YK
 e2mdtmq6upYoorLFQesfRgrCg4GTBiPkrR3amXsFgJ2O5fhV6R98ZdGSv4kip19f
 ZNoc/t1EtKGwyAJwjINduv36E3RSHODWwSPtSnmSS1ieCGToY1SI3bVUkFM4C0tO
 5GMdSugHgXRGGVbTd/VftndJm6Wtj8b1j8c/1Vh04Q8qbKKJDRTDzAbK1v8oLaDh
 UXAKMIc8uY4caZy3/bTAB2Ou9dibrSi8Oc+LwZqJlwIcbkwn/IGNvmwtWv4ehorE
 N7EhCFZsFQ==
 =Mavg
 -----END PGP SIGNATURE-----

Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux

Pull stream_open conversion from Kirill Smelkov:

 - remove unnecessary double nonseekable_open from drivers/char/dtlk.c
   as noticed by Pavel Machek while reviewing nonseekable_open ->
   stream_open mass conversion.

 - the mass conversion patch promised in commit 10dce8af34 ("fs:
   stream_open - opener for stream-like files so that read and write can
   run simultaneously without deadlock") and is automatically generated
   by running

        $ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci

   I've verified each generated change manually - that it is correct to
   convert - and each other nonseekable_open instance left - that it is
   either not correct to convert there, or that it is not converted due
   to current stream_open.cocci limitations. More details on this in the
   patch.

 - finally, change VFS to pass ppos=NULL into .read/.write for files
   that declare themselves streams. It was suggested by Rasmus Villemoes
   and makes sure that if ppos starts to be erroneously used in a stream
   file, such bug won't go unnoticed and will produce an oops instead of
   creating illusion of position change being taken into account.

   Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to
   use stream_open()" that will be hopefully coming via FUSE tree,
   because fs/fuse/ uses new-style .read_iter/.write_iter, and for these
   accessors position is still passed as non-pointer kiocb.ki_pos .

* tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux:
  vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
  *: convert stream-like files from nonseekable_open -> stream_open
  dtlk: remove double call to nonseekable_open
2019-05-07 12:15:13 -07:00
Eric Dumazet 37c0aead79 net_sched: sch_fq: handle non connected flows
FQ packet scheduler assumed that packets could be classified
based on their owning socket.

This means that if a UDP server uses one UDP socket to send
packets to different destinations, packets all land
in one FQ flow.

This is unfair, since each TCP flow has a unique bucket, meaning
that in case of pressure (fully utilised uplink), TCP flows
have more share of the bandwidth.

If we instead detect unconnected sockets, we can use a stochastic
hash based on the 4-tuple hash.

This also means a QUIC server using one UDP socket will properly
spread the outgoing packets to different buckets, and in-kernel
pacing based on EDT model will no longer risk having big rb-tree on
one flow.

Note that UDP application might provide the skb->hash in an
ancillary message at sendmsg() time to avoid the cost of a dissection
in fq packet scheduler.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:09:25 -07:00
Eric Dumazet eeb84aa0d0 net_sched: sch_fq: do not assume EDT packets are ordered
TCP stack makes sure packets for a given flow are monotically
increasing, but we want to allow UDP packets to use EDT as
well, so that QUIC servers can use in-kernel pacing.

This patch adds a per-flow rb-tree on which packets might
be stored. We still try to use the linear list for the
typical cases where packets are queued with monotically
increasing skb->tstamp, since queue/dequeue packets on
a standard list is O(1).

Note that the ability to store packets in arbitrary EDT
order will allow us to implement later a per TCP socket
mechanism adding delays (with jitter eventually) and reorders,
to implement convenient network emulators.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 12:09:25 -07:00
Linus Torvalds 168e153d5e Merge branch 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode freeing updates from Al Viro:
 "Introduction of separate method for RCU-delayed part of
  ->destroy_inode() (if any).

  Pretty much as posted, except that destroy_inode() stashes
  ->free_inode into the victim (anon-unioned with ->i_fops) before
  scheduling i_callback() and the last two patches (sockfs conversion
  and folding struct socket_wq into struct socket) are excluded - that
  pair should go through netdev once davem reopens his tree"

* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
  orangefs: make use of ->free_inode()
  shmem: make use of ->free_inode()
  hugetlb: make use of ->free_inode()
  overlayfs: make use of ->free_inode()
  jfs: switch to ->free_inode()
  fuse: switch to ->free_inode()
  ext4: make use of ->free_inode()
  ecryptfs: make use of ->free_inode()
  ceph: use ->free_inode()
  btrfs: use ->free_inode()
  afs: switch to use of ->free_inode()
  dax: make use of ->free_inode()
  ntfs: switch to ->free_inode()
  securityfs: switch to ->free_inode()
  apparmor: switch to ->free_inode()
  rpcpipe: switch to ->free_inode()
  bpf: switch to ->free_inode()
  mqueue: switch to ->free_inode()
  ufs: switch to ->free_inode()
  coda: switch to ->free_inode()
  ...
2019-05-07 10:57:05 -07:00
Jeff Layton b726ec972c libceph: make ceph_pr_addr take an struct ceph_entity_addr pointer
GCC9 is throwing a lot of warnings about unaligned accesses by
callers of ceph_pr_addr. All of the current callers are passing a
pointer to the sockaddr inside struct ceph_entity_addr.

Fix it to take a pointer to a struct ceph_entity_addr instead,
and then have the function make a copy of the sockaddr before
printing it.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07 19:43:05 +02:00
Jeff Layton cede185b1b libceph: fix unaligned accesses in ceph_entity_addr handling
GCC9 is throwing a lot of warnings about unaligned access. This patch
fixes some of them by changing most of the sockaddr handling functions
to take a pointer to struct ceph_entity_addr instead of struct
sockaddr_storage.  The lower functions can then make copies or do
unaligned accesses as needed.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2019-05-07 19:43:05 +02:00
David S. Miller 14cfbdac66 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2019-05-06

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Two AF_XDP libbpf fixes for socket teardown; first one an invalid
   munmap and the other one an invalid skmap cleanup, both from Björn.

2) More graceful CONFIG_DEBUG_INFO_BTF handling when pahole is not
   present in the system to generate vmlinux btf info, from Andrii.

3) Fix libbpf and thus fix perf build error with uClibc on arc
   architecture, from Vineet.

4) Fix missing libbpf_util.h header install in libbpf, from William.

5) Exclude bash-completion/bpftool from .gitignore pattern, from Masahiro.

6) Fix up rlimit in test_libbpf_open kselftest test case, from Yonghong.

7) Minor misc cleanups.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07 09:29:16 -07:00
Linus Torvalds 0968621917 Printk changes for 5.2
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA
 lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0
 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO
 EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK
 r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k
 FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq
 uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj
 lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR
 waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w
 wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55
 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10
 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY=
 =WZfC
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow state reset of printk_once() calls.

 - Prevent crashes when dereferencing invalid pointers in vsprintf().
   Only the first byte is checked for simplicity.

 - Make vsprintf warnings consistent and inlined.

 - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
   modifiers.

 - Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  lib/vsprintf: Make function pointer_string static
  vsprintf: Limit the length of inlined error messages
  vsprintf: Avoid confusion between invalid address and value
  vsprintf: Prevent crash when dereferencing invalid pointers
  vsprintf: Consolidate handling of unknown pointer specifiers
  vsprintf: Factor out %pO handler as kobject_string()
  vsprintf: Factor out %pV handler as va_format()
  vsprintf: Factor out %p[iI] handler as ip_addr_string()
  vsprintf: Do not check address of well-known strings
  vsprintf: Consistent %pK handling for kptr_restrict == 0
  vsprintf: Shuffle restricted_pointer()
  printk: Tie printk_once / printk_deferred_once into .data.once for reset
  treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
  lib/test_printf: Switch to bitmap_zalloc()
2019-05-07 09:18:12 -07:00
Linus Torvalds 81ff5d2cba Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:
   - Add support for AEAD in simd
   - Add fuzz testing to testmgr
   - Add panic_on_fail module parameter to testmgr
   - Use per-CPU struct instead multiple variables in scompress
   - Change verify API for akcipher

  Algorithms:
   - Convert x86 AEAD algorithms over to simd
   - Forbid 2-key 3DES in FIPS mode
   - Add EC-RDSA (GOST 34.10) algorithm

  Drivers:
   - Set output IV with ctr-aes in crypto4xx
   - Set output IV in rockchip
   - Fix potential length overflow with hashing in sun4i-ss
   - Fix computation error with ctr in vmx
   - Add SM4 protected keys support in ccree
   - Remove long-broken mxc-scc driver
   - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits)
  crypto: ccree - use a proper le32 type for le32 val
  crypto: ccree - remove set but not used variable 'du_size'
  crypto: ccree - Make cc_sec_disable static
  crypto: ccree - fix spelling mistake "protedcted" -> "protected"
  crypto: caam/qi2 - generate hash keys in-place
  crypto: caam/qi2 - fix DMA mapping of stack memory
  crypto: caam/qi2 - fix zero-length buffer DMA mapping
  crypto: stm32/cryp - update to return iv_out
  crypto: stm32/cryp - remove request mutex protection
  crypto: stm32/cryp - add weak key check for DES
  crypto: atmel - remove set but not used variable 'alg_name'
  crypto: picoxcell - Use dev_get_drvdata()
  crypto: crypto4xx - get rid of redundant using_sd variable
  crypto: crypto4xx - use sync skcipher for fallback
  crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
  crypto: crypto4xx - fix ctr-aes missing output IV
  crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA
  crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o
  crypto: ccree - handle tee fips error during power management resume
  crypto: ccree - add function to handle cryptocell tee fips error
  ...
2019-05-06 20:15:06 -07:00
Linus Torvalds a0e928ed7c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Ingo Molnar:
 "This cycle had the following changes:

   - Timer tracing improvements (Anna-Maria Gleixner)

   - Continued tasklet reduction work: remove the hrtimer_tasklet
     (Thomas Gleixner)

   - Fix CPU hotplug remove race in the tick-broadcast mask handling
     code (Thomas Gleixner)

   - Force upper bound for setting CLOCK_REALTIME, to fix ABI
     inconsistencies with handling values that are close to the maximum
     supported and the vagueness of when uptime related wraparound might
     occur. Make the consistent maximum the year 2232 across all
     relevant ABIs and APIs. (Thomas Gleixner)

   - various cleanups and smaller fixes"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick: Fix typos in comments
  tick/broadcast: Fix warning about undefined tick_broadcast_oneshot_offline()
  timekeeping: Force upper bound for setting CLOCK_REALTIME
  timer/trace: Improve timer tracing
  timer/trace: Replace deprecated vsprintf pointer extension %pf by %ps
  timer: Move trace point to get proper index
  tick/sched: Update tick_sched struct documentation
  tick: Remove outgoing CPU from broadcast masks
  timekeeping: Consistently use unsigned int for seqcount snapshot
  softirq: Remove tasklet_hrtimer
  xfrm: Replace hrtimer tasklet with softirq hrtimer
  mac80211_hwsim: Replace hrtimer tasklet with softirq hrtimer
2019-05-06 14:50:46 -07:00
Linus Torvalds 5ba2a4b12f Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "This cycles's RCU changes include:

   - a couple of straggling RCU flavor consolidation updates

   - SRCU updates

   - RCU CPU stall-warning updates

   - torture-test updates

   - an LKMM commit adding support for synchronize_srcu_expedited()

   - documentation updates

   - miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  net/ipv4/netfilter: Update comment from call_rcu_bh() to call_rcu()
  tools/memory-model: Add support for synchronize_srcu_expedited()
  doc/kprobes: Update obsolete RCU update functions
  torture: Suppress false-positive CONFIG_INITRAMFS_SOURCE complaint
  locktorture: NULL cxt.lwsa and cxt.lrsa to allow bad-arg detection
  rcuperf: Fix cleanup path for invalid perf_type strings
  rcutorture: Fix cleanup path for invalid torture_type strings
  rcutorture: Fix expected forward progress duration in OOM notifier
  rcutorture: Remove ->ext_irq_conflict field
  rcutorture: Make rcutorture_extend_mask() comment match the code
  tools/.../rcutorture: Convert to SPDX license identifier
  torture: Don't try to offline the last CPU
  rcu: Fix nohz status in stall warning
  rcu: Move forward-progress checkers into tree_stall.h
  rcu: Move irq-disabled stall-warning checking to tree_stall.h
  rcu: Organize functions in tree_stall.h
  rcu: Move FAST_NO_HZ stall-warning code to tree_stall.h
  rcu: Inline RCU stall-warning info helper functions
  rcu: Move rcu_print_task_exp_stall() to tree_exp.h
  rcu: Inline RCU task stall-warning helper functions
  ...
2019-05-06 12:04:02 -07:00
Kirill Smelkov c5bf68fe0c *: convert stream-like files from nonseekable_open -> stream_open
Using scripts/coccinelle/api/stream_open.cocci added in 10dce8af34
("fs: stream_open - opener for stream-like files so that read and write
can run simultaneously without deadlock"), search and convert to
stream_open all in-kernel nonseekable_open users for which read and
write actually do not depend on ppos and where there is no other methods
in file_operations which assume @offset access.

I've verified each generated change manually - that it is correct to convert -
and each other nonseekable_open instance left - that it is either not correct
to convert there, or that it is not converted due to current stream_open.cocci
limitations. The script also does not convert files that should be valid to
convert, but that currently have .llseek = noop_llseek or generic_file_llseek
for unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)

Among cases converted 14 were potentially vulnerable to read vs write deadlock
(see details in 10dce8af34):

	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/infiniband/core/user_mad.c:988:1-17: ERROR: umad_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/input/misc/uinput.c:401:1-17: ERROR: uinput_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.

and the rest were just safe to convert to stream_open because their read and
write do not use ppos at all and corresponding file_operations do not
have methods that assume @offset file access(*):

	arch/powerpc/platforms/52xx/mpc52xx_gpt.c:631:8-24: WARNING: mpc52xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/um/drivers/harddog_kern.c:88:8-24: WARNING: harddog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	arch/x86/kernel/cpu/microcode/core.c:430:33-49: WARNING: microcode_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/char/ds1620.c:215:8-24: WARNING: ds1620_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/char/dtlk.c:301:1-17: WARNING: dtlk_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/char/ipmi/ipmi_watchdog.c:840:9-25: WARNING: ipmi_wdog_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/char/pcmcia/scr24x_cs.c:95:8-24: WARNING: scr24x_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/char/tb0219.c:246:9-25: WARNING: tb0219_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/firewire/nosy.c:306:8-24: WARNING: nosy_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/hwmon/fschmd.c:840:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/hwmon/w83793.c:1344:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/infiniband/core/ucma.c:1747:8-24: WARNING: ucma_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/infiniband/core/ucm.c:1178:8-24: WARNING: ucm_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/infiniband/core/uverbs_main.c:1086:8-24: WARNING: uverbs_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/input/joydev.c:282:1-17: WARNING: joydev_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/pci/switch/switchtec.c:393:1-17: WARNING: switchtec_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/platform/chrome/cros_ec_debugfs.c:135:8-24: WARNING: cros_ec_console_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/rtc/rtc-ds1374.c:470:9-25: WARNING: ds1374_wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/rtc/rtc-m41t80.c:805:9-25: WARNING: wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/s390/char/tape_char.c:293:2-18: WARNING: tape_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/s390/char/zcore.c:194:8-24: WARNING: zcore_reipl_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/s390/crypto/zcrypt_api.c:528:8-24: WARNING: zcrypt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/spi/spidev.c:594:1-17: WARNING: spidev_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/staging/pi433/pi433_if.c:974:1-17: WARNING: pi433_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/acquirewdt.c:203:8-24: WARNING: acq_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/advantechwdt.c:202:8-24: WARNING: advwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/alim1535_wdt.c:252:8-24: WARNING: ali_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/alim7101_wdt.c:217:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/ar7_wdt.c:166:8-24: WARNING: ar7_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/at91rm9200_wdt.c:113:8-24: WARNING: at91wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/ath79_wdt.c:135:8-24: WARNING: ath79_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/bcm63xx_wdt.c:119:8-24: WARNING: bcm63xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/cpu5wdt.c:143:8-24: WARNING: cpu5wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/cpwd.c:397:8-24: WARNING: cpwd_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/eurotechwdt.c:319:8-24: WARNING: eurwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/f71808e_wdt.c:528:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/gef_wdt.c:232:8-24: WARNING: gef_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/geodewdt.c:95:8-24: WARNING: geodewdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/ib700wdt.c:241:8-24: WARNING: ibwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/ibmasr.c:326:8-24: WARNING: asr_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/indydog.c:80:8-24: WARNING: indydog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/intel_scu_watchdog.c:307:8-24: WARNING: intel_scu_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/iop_wdt.c:104:8-24: WARNING: iop_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/it8712f_wdt.c:330:8-24: WARNING: it8712f_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/ixp4xx_wdt.c:68:8-24: WARNING: ixp4xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/ks8695_wdt.c:145:8-24: WARNING: ks8695wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/m54xx_wdt.c:88:8-24: WARNING: m54xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/machzwd.c:336:8-24: WARNING: zf_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/mixcomwd.c:153:8-24: WARNING: mixcomwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/mtx-1_wdt.c:121:8-24: WARNING: mtx1_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/mv64x60_wdt.c:136:8-24: WARNING: mv64x60_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/nuc900_wdt.c:134:8-24: WARNING: nuc900wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/nv_tco.c:164:8-24: WARNING: nv_tco_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pc87413_wdt.c:289:8-24: WARNING: pc87413_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pcwd.c:698:8-24: WARNING: pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pcwd.c:737:8-24: WARNING: pcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pcwd_pci.c:581:8-24: WARNING: pcipcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pcwd_pci.c:623:8-24: WARNING: pcipcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pcwd_usb.c:488:8-24: WARNING: usb_pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pcwd_usb.c:527:8-24: WARNING: usb_pcwd_temperature_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pika_wdt.c:121:8-24: WARNING: pikawdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/pnx833x_wdt.c:119:8-24: WARNING: pnx833x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/rc32434_wdt.c:153:8-24: WARNING: rc32434_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/rdc321x_wdt.c:145:8-24: WARNING: rdc321x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/riowd.c:79:1-17: WARNING: riowd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sa1100_wdt.c:62:8-24: WARNING: sa1100dog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sbc60xxwdt.c:211:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sbc7240_wdt.c:139:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sbc8360.c:274:8-24: WARNING: sbc8360_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sbc_epx_c3.c:81:8-24: WARNING: epx_c3_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sbc_fitpc2_wdt.c:78:8-24: WARNING: fitpc2_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sb_wdog.c:108:1-17: WARNING: sbwdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sc1200wdt.c:181:8-24: WARNING: sc1200wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sc520_wdt.c:261:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/sch311x_wdt.c:319:8-24: WARNING: sch311x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/scx200_wdt.c:105:8-24: WARNING: scx200_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/smsc37b787_wdt.c:369:8-24: WARNING: wb_smsc_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/w83877f_wdt.c:227:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/w83977f_wdt.c:301:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wafer5823wdt.c:200:8-24: WARNING: wafwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/watchdog_dev.c:828:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdrtas.c:379:8-24: WARNING: wdrtas_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdrtas.c:445:8-24: WARNING: wdrtas_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdt285.c:104:1-17: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdt977.c:276:8-24: WARNING: wdt977_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdt.c:424:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdt.c:484:8-24: WARNING: wdt_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdt_pci.c:464:8-24: WARNING: wdtpci_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
	drivers/watchdog/wdt_pci.c:527:8-24: WARNING: wdtpci_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	net/batman-adv/log.c:105:1-17: WARNING: batadv_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	sound/core/control.c:57:7-23: WARNING: snd_ctl_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
	sound/core/rawmidi.c:385:7-23: WARNING: snd_rawmidi_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	sound/core/seq/seq_clientmgr.c:310:7-23: WARNING: snd_seq_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
	sound/core/timer.c:1428:7-23: WARNING: snd_timer_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.

One can also recheck/review the patch via generating it with explanation comments included via

	$ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci SPFLAGS="-D explain"

(*) This second group also contains cases with read/write deadlocks that
stream_open.cocci don't yet detect, but which are still valid to convert to
stream_open since ppos is not used. For example drivers/pci/switch/switchtec.c
calls wait_for_completion_interruptible() in its .read, but stream_open.cocci
currently detects only "wait_event*" as blocking.

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James R. Van Zandt" <jrv@vanzandt.mv.com>
Cc: Corey Minyard <minyard@acm.org>
Cc: Harald Welte <laforge@gnumonks.org>
Acked-by: Lubomir Rintel <lkundrak@v3.sk> [scr24x_cs]
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Johan Hovold <johan@kernel.org>
Cc: David Herrmann <dh.herrmann@googlemail.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
Cc: Jean Delvare <jdelvare@suse.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>	[watchdog/* hwmon/*]
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
Acked-by: Logan Gunthorpe <logang@deltatee.com> [drivers/pci/switch/switchtec]
Acked-by: Bjorn Helgaas <bhelgaas@google.com> [drivers/pci/switch/switchtec]
Cc: Benson Leung <bleung@chromium.org>
Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> [platform/chrome]
Cc: Alessandro Zummo <a.zummo@towertech.it>
Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> [rtc/*]
Cc: Mark Brown <broonie@kernel.org>
Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: bcm-kernel-feedback-list@broadcom.com
Cc: Wan ZongShun <mcuos.com@gmail.com>
Cc: Zwane Mwaikambo <zwanem@gmail.com>
Cc: Marek Lindner <mareklindner@neomailbox.ch>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Antonio Quartulli <a@unstable.cc>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Jaroslav Kysela <perex@perex.cz>
Cc: Takashi Iwai <tiwai@suse.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
2019-05-06 17:46:41 +03:00
Taehee Yoo 8cd2bc981c netfilter: nf_flow_table: do not flow offload deleted conntrack entries
Conntrack entries can be deleted by the masquerade module. In that case,
flow offload should be deleted too, but GC and data-path of flow offload
do not check for conntrack status bits, hence flow offload entries will
be removed only by the timeout.

Update garbage collector and data-path to check for ct->status. If
IPS_DYING_BIT is set, garbage collector removes flow offload entries and
data-path routine ignores them.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-06 15:15:09 +02:00
Subash Abhinov Kasiviswanathan b33c448c4f netfilter: nf_conntrack_h323: Remove deprecated config check
CONFIG_NF_CONNTRACK_IPV6 has been deprecated so replace it with a check
for IPV6 instead.

Use nf_ip6_route6() instead of v6ops->route() and keep the IS_MODULE()
in nf_ipv6_ops as mentioned by Florian so that direct calls are used
when IPV6 is builtin and indirect calls are used only when IPV6 is a
module.

Fixes: a0ae2562c6 ("netfilter: conntrack: remove l3proto abstraction")
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-06 15:15:09 +02:00
Kristian Evensen f8e6089820 netfilter: ctnetlink: Resolve conntrack L3-protocol flush regression
Commit 59c08c69c2 ("netfilter: ctnetlink: Support L3 protocol-filter
on flush") introduced a user-space regression when flushing connection
track entries. Before this commit, the nfgen_family field was not used
by the kernel and all entries were removed. Since this commit,
nfgen_family is used to filter out entries that should not be removed.
One example a broken tool is conntrack. conntrack always sets
nfgen_family to AF_INET, so after 59c08c69c2 only IPv4 entries were
removed with the -F parameter.

Pablo Neira Ayuso suggested using nfgenmsg->version to resolve the
regression, and this commit implements his suggestion. nfgenmsg->version
is so far set to zero, so it is well-suited to be used as a flag for
selecting old or new flush behavior. If version is 0, nfgen_family is
ignored and all entries are used. If user-space sets the version to one
(or any other value than 0), then the new behavior is used. As version
only can have two valid values, I chose not to add a new
NFNETLINK_VERSION-constant.

Fixes: 59c08c69c2 ("netfilter: ctnetlink: Support L3 protocol-filter on flush")
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-06 15:15:02 +02:00
Linus Lüssing a3c7cd0cdf batman-adv: mcast: fix multicast tt/tvlv worker locking
Syzbot has reported some issues with the locking assumptions made for
the multicast tt/tvlv worker: It was able to trigger the WARN_ON() in
batadv_mcast_mla_tt_retract() and batadv_mcast_mla_tt_add().
While hard/not reproduceable for us so far it seems that the
delayed_work_pending() we use might not be quite safe from reordering.

Therefore this patch adds an explicit, new spinlock to protect the
update of the mla_list and flags in bat_priv and then removes the
WARN_ON(delayed_work_pending()).

Reported-by: syzbot+83f2d54ec6b7e417e13f@syzkaller.appspotmail.com
Reported-by: syzbot+050927a651272b145a5d@syzkaller.appspotmail.com
Reported-by: syzbot+979ffc89b87309b1b94b@syzkaller.appspotmail.com
Reported-by: syzbot+f9f3f388440283da2965@syzkaller.appspotmail.com
Fixes: cbebd363b2 ("batman-adv: Use own timer for multicast TT and TVLV updates")
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-05-06 11:40:46 +02:00
Simon Wunderlich bdc76fd299 batman-adv: Start new development cycle
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
2019-05-06 11:40:46 +02:00
Cong Wang b362487a3b sch_htb: redefine htb qdisc overlimits
In commit 3c75f6ee13 ("net_sched: sch_htb: add per class overlimits counter")
we added an overlimits counter for each HTB class which could
properly reflect how many times we use up all the bandwidth
on each class. However, the overlimits counter in HTB qdisc
does not, it is way bigger than the sum of each HTB class.
In fact, this qdisc overlimits counter increases when we have
no skb to dequeue, which happens more often than we run out of
bandwidth.

It makes more sense to make this qdisc overlimits counter just
be a sum of each HTB class, in case people still get confused.

I have verified this patch with one single HTB class, where HTB
qdisc counters now always match HTB class counters as expected.

Eric suggested we could fold this field into 'direct_pkts' as
we only use its 32bit on 64bit CPU, this saves one cache line.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:55:20 -07:00
Vladimir Oltean 227d07a07e net: dsa: sja1105: Add support for traffic through standalone ports
In order to support this, we are creating a make-shift switch tag out of
a VLAN trunk configured on the CPU port. Termination of normal traffic
on switch ports only works when not under a vlan_filtering bridge.
Termination of management (PTP, BPDU) traffic works under all
circumstances because it uses a different tagging mechanism
(incl_srcpt). We are making use of the generic CONFIG_NET_DSA_TAG_8021Q
code and leveraging it from our own CONFIG_NET_DSA_TAG_SJA1105.

There are two types of traffic: regular and link-local.

The link-local traffic received on the CPU port is trapped from the
switch's regular forwarding decisions because it matched one of the two
DMAC filters for management traffic.

On transmission, the switch requires special massaging for these
link-local frames. Due to a weird implementation of the switching IP, by
default it drops link-local frames that originate on the CPU port.
It needs to be told where to forward them to, through an SPI command
("management route") that is valid for only a single frame.
So when we're sending link-local traffic, we are using the
dsa_defer_xmit mechanism.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:52:42 -07:00
Vladimir Oltean 97a69a0dea net: dsa: Add support for deferred xmit
Some hardware needs to take work to get convinced to receive frames on
the CPU port (such as the sja1105 which takes temporary L2 forwarding
rules over SPI that last for a single frame). Such work needs a
sleepable context, and because the regular .ndo_start_xmit is atomic,
this cannot be done in the tagger. So introduce a generic DSA mechanism
that sets up a transmit skb queue and a workqueue for deferred
transmission.

The new driver callback (.port_deferred_xmit) is in dsa_switch and not
in the tagger because the operations that require sleeping typically
also involve interacting with the hardware, and not simply skb
manipulations. Therefore having it there simplifies the structure a bit
and makes it unnecessary to export functions from the driver to the
tagger.

The driver is responsible of calling dsa_enqueue_skb which transfers it
to the master netdevice. This is so that it has a chance of performing
some more work afterwards, such as cleanup or TX timestamping.

To tell DSA that skb xmit deferral is required, I have thought about
changing the return type of the tagger .xmit from struct sk_buff * into
a enum dsa_tx_t that could potentially encode a DSA_XMIT_DEFER value.

But the trailer tagger is reallocating every skb on xmit and therefore
making a valid use of the pointer return value. So instead of reworking
the API in complicated ways, right now a boolean property in the newly
introduced DSA_SKB_CB is set.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:52:42 -07:00
Vladimir Oltean cc1939e4b3 net: dsa: Allow drivers to filter packets they can decode source port from
Frames get processed by DSA and redirected to switch port net devices
based on the ETH_P_XDSA multiplexed packet_type handler found by the
network stack when calling eth_type_trans().

The running assumption is that once the DSA .rcv function is called, DSA
is always able to decode the switch tag in order to change the skb->dev
from its master.

However there are tagging protocols (such as the new DSA_TAG_PROTO_SJA1105,
user of DSA_TAG_PROTO_8021Q) where this assumption is not completely
true, since switch tagging piggybacks on the absence of a vlan_filtering
bridge. Moreover, management traffic (BPDU, PTP) for this switch doesn't
rely on switch tagging, but on a different mechanism. So it would make
sense to at least be able to terminate that.

Having DSA receive traffic it can't decode would put it in an impossible
situation: the eth_type_trans() function would invoke the DSA .rcv(),
which could not change skb->dev, then eth_type_trans() would be invoked
again, which again would call the DSA .rcv, and the packet would never
be able to exit the DSA filter and would spiral in a loop until the
whole system dies.

This happens because eth_type_trans() doesn't actually look at the skb
(so as to identify a potential tag) when it deems it as being
ETH_P_XDSA. It just checks whether skb->dev has a DSA private pointer
installed (therefore it's a DSA master) and that there exists a .rcv
callback (everybody except DSA_TAG_PROTO_NONE has that). This is
understandable as there are many switch tags out there, and exhaustively
checking for all of them is far from ideal.

The solution lies in introducing a filtering function for each tagging
protocol. In the absence of a filtering function, all traffic is passed
to the .rcv DSA callback. The tagging protocol should see the filtering
function as a pre-validation that it can decode the incoming skb. The
traffic that doesn't match the filter will bypass the DSA .rcv callback
and be left on the master netdevice, which wasn't previously possible.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:52:42 -07:00
Vladimir Oltean f9bbe4477c net: dsa: Optional VLAN-based port separation for switches without tagging
This patch provides generic DSA code for using VLAN (802.1Q) tags for
the same purpose as a dedicated switch tag for injection/extraction.
It is based on the discussions and interest that has been so far
expressed in https://www.spinics.net/lists/netdev/msg556125.html.

Unlike all other DSA-supported tagging protocols, CONFIG_NET_DSA_TAG_8021Q
does not offer a complete solution for drivers (nor can it). Instead, it
provides generic code that driver can opt into calling:
- dsa_8021q_xmit: Inserts a VLAN header with the specified contents.
  Can be called from another tagging protocol's xmit function.
  Currently the LAN9303 driver is inserting headers that are simply
  802.1Q with custom fields, so this is an opportunity for code reuse.
- dsa_8021q_rcv: Retrieves the TPID and TCI from a VLAN-tagged skb.
  Removing the VLAN header is left as a decision for the caller to make.
- dsa_port_setup_8021q_tagging: For each user port, installs an Rx VID
  and a Tx VID, for proper untagged traffic identification on ingress
  and steering on egress. Also sets up the VLAN trunk on the upstream
  (CPU or DSA) port. Drivers are intentionally left to call this
  function explicitly, depending on the context and hardware support.
  The expected switch behavior and VLAN semantics should not be violated
  under any conditions. That is, after calling
  dsa_port_setup_8021q_tagging, the hardware should still pass all
  ingress traffic, be it tagged or untagged.

For uniformity with the other tagging protocols, a module for the
dsa_8021q_netdev_ops structure is registered, but the typical usage is
to set up another tagging protocol which selects CONFIG_NET_DSA_TAG_8021Q,
and calls the API from tag_8021q.h. Null function definitions are also
provided so that a "depends on" is not forced in the Kconfig.

This tagging protocol only works when switch ports are standalone, or
when they are added to a VLAN-unaware bridge. It will probably remain
this way for the reasons below.

When added to a bridge that has vlan_filtering 1, the bridge core will
install its own VLANs and reset the pvids through switchdev. For the
bridge core, switchdev is a write-only pipe. All VLAN-related state is
kept in the bridge core and nothing is read from DSA/switchdev or from
the driver. So the bridge core will break this port separation because
it will install the vlan_default_pvid into all switchdev ports.

Even if we could teach the bridge driver about switchdev preference of a
certain vlan_default_pvid (task difficult in itself since the current
setting is per-bridge but we would need it per-port), there would still
exist many other challenges.

Firstly, in the DSA rcv callback, a driver would have to perform an
iterative reverse lookup to find the correct switch port. That is
because the port is a bridge slave, so its Rx VID (port PVID) is subject
to user configuration. How would we ensure that the user doesn't reset
the pvid to a different value (which would make an O(1) translation
impossible), or to a non-unique value within this DSA switch tree (which
would make any translation impossible)?

Finally, not all switch ports are equal in DSA, and that makes it
difficult for the bridge to be completely aware of this anyway.
The CPU port needs to transmit tagged packets (VLAN trunk) in order for
the DSA rcv code to be able to decode source information.
But the bridge code has absolutely no idea which switch port is the CPU
port, if nothing else then just because there is no netdevice registered
by DSA for the CPU port.
Also DSA does not currently allow the user to specify that they want the
CPU port to do VLAN trunking anyway. VLANs are added to the CPU port
using the same flags as they were added on the user port.

So the VLANs installed by dsa_port_setup_8021q_tagging per driver
request should remain private from the bridge's and user's perspective,
and should not alter the VLAN semantics observed by the user.

In the current implementation a VLAN range ending at 4095 (VLAN_N_VID)
is reserved for this purpose. Each port receives a unique Rx VLAN and a
unique Tx VLAN. Separate VLANs are needed for Rx and Tx because they
serve different purposes: on Rx the switch must process traffic as
untagged and process it with a port-based VLAN, but with care not to
hinder bridging. On the other hand, the Tx VLAN is where the
reachability restrictions are imposed, since by tagging frames in the
xmit callback we are telling the switch onto which port to steer the
frame.

Some general guidance on how this support might be employed for
real-life hardware (some comments made by Florian Fainelli):

- If the hardware supports VLAN tag stacking, it should somehow back
  up its private VLAN settings when the bridge tries to override them.
  Then the driver could re-apply them as outer tags. Dedicating an outer
  tag per bridge device would allow identical inner tag VID numbers to
  co-exist, yet preserve broadcast domain isolation.

- If the switch cannot handle VLAN tag stacking, it should disable this
  port separation when added as slave to a vlan_filtering bridge, in
  that case having reduced functionality.

- Drivers for old switches that don't support the entire VLAN_N_VID
  range will need to rework the current range selection mechanism.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:52:42 -07:00
Vladimir Oltean 146c1bed44 net: dsa: Export symbols for dsa_port_vid_{add, del}
This is needed so that the newly introduced tag_8021q may access these
core DSA functions when built as a module.

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:52:42 -07:00
Vladimir Oltean b2243b369c net: dsa: Call driver's setup callback after setting up its switchdev notifier
This allows the driver to perform some manipulations of its own during
setup, using generic switchdev calls. Having the notifiers registered at
setup time is important because otherwise any switchdev transaction
emitted during this time would be ignored (dispatched to an empty call
chain).

One current usage scenario is for the driver to request DSA to set up
802.1Q based switch tagging for its ports.

There is no danger for the driver setup code to start racing now with
switchdev events emitted from the network stack (such as bridge core)
even if the notifier is registered earlier. This is because the network
stack needs a net_device as a vehicle to perform switchdev operations,
and the slave net_devices are registered later than the core driver
setup anyway (ds->ops->setup in dsa_switch_setup vs dsa_port_setup).

Luckily DSA doesn't need a net_device to carry out switchdev callbacks,
and therefore drivers shouldn't assume either that net_devices are
available at the time their switchdev callbacks get invoked.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com>-
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:52:42 -07:00
Pieter Jansen van Vuuren 88c44a5200 net/sched: add block pointer to tc_cls_common_offload structure
Some actions like the police action are stateful and could share state
between devices. This is incompatible with offloading to multiple devices
and drivers might want to test for shared blocks when offloading.
Store a pointer to the tcf_block structure in the tc_cls_common_offload
structure to allow drivers to determine when offloads apply to a shared
block.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren 12f02b6b15 net/sched: allow stats updates from offloaded police actions
Implement the stats_update callback for the police action that
will be used by drivers for hardware offload.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren b7fe4ab8a6 net/sched: extend matchall offload for hardware statistics
Introduce a new command for matchall classifiers that allows hardware
to update statistics.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren 8c8cfc6ed2 net/sched: add police action to the hardware intermediate representation
Add police action to the hardware intermediate representation which
would subsequently allow it to be used by drivers for offload.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren fa762da94d net/sched: move police action structures to header
Move tcf_police_params, tcf_police and tc_police_compat structures to a
header. Making them usable to other code for example drivers that would
offload police actions to hardware.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren dfcb19f0fa net/sched: remove unused functions for matchall offload
Cleanup unused functions and variables after porting to the newer
intermediate representation.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren 9681e8b3ef net/dsa: use intermediate representation for matchall offload
Updates dsa hardware switch handling infrastructure to use the newer
intermediate representation for flow actions in matchall offloads.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:24 -07:00
Pieter Jansen van Vuuren f00cbf1968 net/sched: use the hardware intermediate representation for matchall
Extends matchall offload to make use of the hardware intermediate
representation. More specifically, this patch moves the native TC
actions in cls_matchall offload to the newer flow_action
representation. This ultimately allows us to avoid a direct
dependency on native TC actions for matchall.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:23 -07:00
Pieter Jansen van Vuuren a7a7be6087 net/sched: add sample action to the hardware intermediate representation
Add sample action to the hardware intermediate representation model which
would subsequently allow it to be used by drivers for offload.

Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:49:23 -07:00
David S. Miller 1ffad6d1af Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

===================
Netfilter updates for net-next

The following batch contains Netfilter updates for net-next, they are:

1) Move nft_expr_clone() to nft_dynset, from Paul Gortmaker.

2) Do not include module.h from net/netfilter/nf_tables.h,
   also from Paul.

3) Restrict conntrack sysctl entries to boolean, from Tonghao Zhang.

4) Several patches to add infrastructure to autoload NAT helper
   modules from their respective conntrack helper, this also includes
   the first client of this code in OVS, patches from Flavio Leitner.

5) Add support to match for conntrack ID, from Brett Mastbergen.

6) Spelling fix in connlabel, from Colin Ian King.

7) Use struct_size() from hashlimit, from Gustavo A. R. Silva.

8) Add optimized version of nf_inet_addr_mask(), from Li RongQing.
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 21:35:08 -07:00
Gustavo A. R. Silva eabb478219 netfilter: xt_hashlimit: use struct_size() helper
Make use of the struct_size() helper instead of an open-coded version
in order to avoid any potential type mistakes, in particular in the
context in which this code is being used.

So, replace code of the following form:

sizeof(struct xt_hashlimit_htable) + sizeof(struct hlist_head) * size

with:

struct_size(hinfo, hash, size)

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-06 01:03:04 +02:00
Taehee Yoo 43c8f13118 netfilter: nf_flow_table: fix missing error check for rhashtable_insert_fast
rhashtable_insert_fast() may return an error value when memory
allocation fails, but flow_offload_add() does not check for errors.
This patch just adds missing error checking.

Fixes: ac2a66665e ("netfilter: add generic flow table infrastructure")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-06 00:36:30 +02:00
Florian Westphal edbd82c5fb netfilter: nf_tables: fix base chain stat rcu_dereference usage
Following splat gets triggered when nfnetlink monitor is running while
xtables-nft selftests are running:

net/netfilter/nf_tables_api.c:1272 suspicious rcu_dereference_check() usage!
other info that might help us debug this:

1 lock held by xtables-nft-mul/27006:
 #0: 00000000e0f85be9 (&net->nft.commit_mutex){+.+.}, at: nf_tables_valid_genid+0x1a/0x50
Call Trace:
 nf_tables_fill_chain_info.isra.45+0x6cc/0x6e0
 nf_tables_chain_notify+0xf8/0x1a0
 nf_tables_commit+0x165c/0x1740

nf_tables_fill_chain_info() can be called both from dumps (rcu read locked)
or from the transaction path if a userspace process subscribed to nftables
notifications.

In the 'table dump' case, rcu_access_pointer() cannot be used: We do not
hold transaction mutex so the pointer can be NULLed right after the check.
Just unconditionally fetch the value, then have the helper return
immediately if its NULL.

In the notification case we don't hold the rcu read lock, but updates are
prevented due to transaction mutex. Use rcu_dereference_check() to make lockdep
aware of this.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-06 00:36:29 +02:00
Jakub Jankowski f5e85ce8e7 netfilter: nf_conntrack_h323: restore boundary check correctness
Since commit bc7d811ace ("netfilter: nf_ct_h323: Convert
CHECK_BOUND macro to function"), NAT traversal for H.323
doesn't work, failing to parse H323-UserInformation.
nf_h323_error_boundary() compares contents of the bitstring,
not the addresses, preventing valid H.323 packets from being
conntrack'd.

This looks like an oversight from when CHECK_BOUND macro was
converted to a function.

To fix it, stop dereferencing bs->cur and bs->end.

Fixes: bc7d811ace ("netfilter: nf_ct_h323: Convert CHECK_BOUND macro to function")
Signed-off-by: Jakub Jankowski <shasta@toxcorp.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-06 00:36:17 +02:00
David S. Miller 19ab5f4023 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Johan Hedberg says:

====================
pull request: bluetooth-next 2019-05-05

Here's one more bluetooth-next pull request for 5.2:

 - Fixed Command Complete event handling check for matching opcode
 - Added support for Qualcomm WCN3998 controller, along with DT bindings
 - Added default address for Broadcom BCM2076B1 controllers

Please let me know if there are any issues pulling. Thanks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 13:10:36 -07:00
Paolo Abeni 8c3c447b3c net: use indirect calls helpers at the socket layer
This avoids an indirect call per {send,recv}msg syscall in
the common (IPv6 or IPv4 socket) case.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 10:38:04 -07:00
Paolo Abeni 97ff7ffb11 net: use indirect calls helpers at early demux stage
So that we avoid another indirect call per RX packet, if
early demux is enabled.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 10:38:04 -07:00
Paolo Abeni 0e219ae48c net: use indirect calls helpers for L3 handler hooks
So that we avoid another indirect call per RX packet in the common
case.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 10:38:04 -07:00
Paolo Abeni f5737cbadb net: use indirect calls helpers for ptype hook
This avoids an indirect call per RX IPv6/IPv4 packet.
Note that we don't want to use the indirect calls helper for taps.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 10:38:04 -07:00
João Paulo Rechi Vita f80c5dad7b Bluetooth: Ignore CC events not matching the last HCI command
This commit makes the kernel not send the next queued HCI command until
a command complete arrives for the last HCI command sent to the
controller. This change avoids a problem with some buggy controllers
(seen on two SKUs of QCA9377) that send an extra command complete event
for the previous command after the kernel had already sent a new HCI
command to the controller.

The problem was reproduced when starting an active scanning procedure,
where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
command. When this happends the kernel ends up not processing the
command complete for the following commmand, LE_SET_SCAN_PARAM, and
ultimately behaving as if a passive scanning procedure was being
performed, when in fact controller is performing an active scanning
procedure. This makes it impossible to discover BLE devices as no device
found events are sent to userspace.

This problem is reproducible on 100% of the attempts on the affected
controllers. The extra command complete event can be seen at timestamp
27.420131 on the btmon logs bellow.

Bluetooth monitor ver 5.50
= Note: Linux version 5.0.0+ (x86_64)                                  0.352340
= Note: Bluetooth subsystem version 2.22                               0.352343
= New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0)               [hci0] 0.352344
= Open Index: 80:C5:F2:8F:87:84                                 [hci0] 0.352345
= Index Info: 80:C5:F2:8F:87:84 (Qualcomm)                      [hci0] 0.352346
@ MGMT Open: bluetoothd (privileged) version 1.14             {0x0001} 0.352347
@ MGMT Open: btmon (privileged) version 1.14                  {0x0002} 0.352366
@ MGMT Open: btmgmt (privileged) version 1.14                {0x0003} 27.302164
@ MGMT Command: Start Discovery (0x0023) plen 1       {0x0003} [hci0] 27.302310
        Address type: 0x06
          LE Public
          LE Random
< HCI Command: LE Set Random Address (0x08|0x0005) plen 6   #1 [hci0] 27.302496
        Address: 15:60:F2:91:B2:24 (Non-Resolvable)
> HCI Event: Command Complete (0x0e) plen 4                 #2 [hci0] 27.419117
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7  #3 [hci0] 27.419244
        Type: Active (0x01)
        Interval: 11.250 msec (0x0012)
        Window: 11.250 msec (0x0012)
        Own address type: Random (0x01)
        Filter policy: Accept all advertisement (0x00)
> HCI Event: Command Complete (0x0e) plen 4                 #4 [hci0] 27.420131
      LE Set Random Address (0x08|0x0005) ncmd 1
        Status: Success (0x00)
< HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2      #5 [hci0] 27.420259
        Scanning: Enabled (0x01)
        Filter duplicates: Enabled (0x01)
> HCI Event: Command Complete (0x0e) plen 4                 #6 [hci0] 27.420969
      LE Set Scan Parameters (0x08|0x000b) ncmd 1
        Status: Success (0x00)
> HCI Event: Command Complete (0x0e) plen 4                 #7 [hci0] 27.421983
      LE Set Scan Enable (0x08|0x000c) ncmd 1
        Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 4        {0x0003} [hci0] 27.422059
      Start Discovery (0x0023) plen 1
        Status: Success (0x00)
        Address type: 0x06
          LE Public
          LE Random
@ MGMT Event: Discovering (0x0013) plen 2             {0x0003} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2             {0x0002} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)
@ MGMT Event: Discovering (0x0013) plen 2             {0x0001} [hci0] 27.422067
        Address type: 0x06
          LE Public
          LE Random
        Discovery: Enabled (0x01)

Signed-off-by: João Paulo Rechi Vita <jprvita@endlessm.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2019-05-05 19:29:04 +02:00
Dan Carpenter fdd1a8103a net: atm: clean up a range check
The code works fine but the problem is that check for negatives is a
no-op:

	if (arg < 0)
		i = 0;

The "i" value isn't used.  We immediately overwrite it with:

	i = array_index_nospec(arg, MAX_LEC_ITF);

The array_index_nospec() macro returns zero if "arg" is out of bounds so
this works, but the dead code is confusing and it doesn't look very
intentional.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 10:25:52 -07:00
Colin Ian King d14a108d51 net: rds: fix spelling mistake "syctl" -> "sysctl"
There is a spelling mistake in a pr_warn warning. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 10:19:43 -07:00
Matteo Croce 594725db0c cls_cgroup: avoid panic when receiving a packet before filter set
When a cgroup classifier is added, there is a small time interval in
which tp->root is NULL. If we receive a packet in this small time slice
a NULL pointer dereference will happen, leading to a kernel panic:

    # mkdir /sys/fs/cgroup/net_cls/0
    # echo 0x100001 >  /sys/fs/cgroup/net_cls/0/net_cls.classid
    # echo $$ >/sys/fs/cgroup/net_cls/0/tasks
    # ping -qfb 255.255.255.255 -I eth0 &>/dev/null &
    # tc qdisc add dev eth0 root handle 10: htb
    # while : ; do
    > tc filter add dev eth0 parent 10: protocol ip prio 10 handle 1: cgroup
    > tc filter delete dev eth0
    > done
    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028
    Mem abort info:
      ESR = 0x96000005
      Exception class = DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
    Data abort info:
      ISV = 0, ISS = 0x00000005
      CM = 0, WnR = 0
    user pgtable: 4k pages, 39-bit VAs, pgdp = 0000000098a7ff91
    [0000000000000028] pgd=0000000000000000, pud=0000000000000000
    Internal error: Oops: 96000005 [#1] SMP
    Modules linked in: sch_htb cls_cgroup algif_hash af_alg nls_iso8859_1 nls_cp437 vfat fat xhci_plat_hcd m25p80 spi_nor xhci_hcd mtd usbcore usb_common spi_orion sfp i2c_mv64xxx phy_generic mdio_i2c marvell10g i2c_core mvpp2 mvmdio phylink sbsa_gwdt ip_tables x_tables autofs4
    Process ping (pid: 5421, stack limit = 0x00000000b20b1505)
    CPU: 3 PID: 5421 Comm: ping Not tainted 5.1.0-rc6 #31
    Hardware name: Marvell 8040 MACCHIATOBin Double-shot (DT)
    pstate: 60000005 (nZCv daif -PAN -UAO)
    pc : cls_cgroup_classify+0x80/0xec [cls_cgroup]
    lr : cls_cgroup_classify+0x34/0xec [cls_cgroup]
    sp : ffffff8012e6b850
    x29: ffffff8012e6b850 x28: ffffffc423dd3c00
    x27: ffffff801093ebc0 x26: ffffffc425a85b00
    x25: 0000000020000000 x24: 0000000000000000
    x23: ffffff8012e6b910 x22: ffffffc428db4900
    x21: ffffff8012e6b910 x20: 0000000000100001
    x19: 0000000000000000 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000000
    x15: 0000000000000000 x14: 0000000000000000
    x13: 0000000000000000 x12: 000000000000001c
    x11: 0000000000000018 x10: ffffff8012e6b840
    x9 : 0000000000003580 x8 : 000000000000009d
    x7 : 0000000000000002 x6 : ffffff8012e6b860
    x5 : 000000007cd66ffe x4 : 000000009742a193
    x3 : ffffff800865b4d8 x2 : ffffff8012e6b910
    x1 : 0000000000000400 x0 : ffffffc42c38f300
    Call trace:
     cls_cgroup_classify+0x80/0xec [cls_cgroup]
     tcf_classify+0x78/0x138
     htb_enqueue+0x74/0x320 [sch_htb]
     __dev_queue_xmit+0x3e4/0x9d0
     dev_queue_xmit+0x24/0x30
     ip_finish_output2+0x2e4/0x4d0
     ip_finish_output+0x1d8/0x270
     ip_mc_output+0xa8/0x240
     ip_local_out+0x58/0x68
     ip_send_skb+0x2c/0x88
     ip_push_pending_frames+0x44/0x50
     raw_sendmsg+0x458/0x830
     inet_sendmsg+0x54/0xe8
     sock_sendmsg+0x34/0x50
     __sys_sendto+0xd0/0x120
     __arm64_sys_sendto+0x30/0x40
     el0_svc_common.constprop.0+0x88/0xf8
     el0_svc_handler+0x2c/0x38
     el0_svc+0x8/0xc
    Code: 39496001 360002a1 b9425c14 34000274 (79405260)

Fixes: ed76f5edcc ("net: sched: protect filter_chain list with filter_chain_lock mutex")
Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 10:00:17 -07:00
Colin Ian King ca96534630 openvswitch: check for null pointer return from nla_nest_start_noflag
The call to nla_nest_start_noflag can return null in the unlikely
event that nla_put returns -EMSGSIZE.  Check for this condition to
avoid a null pointer dereference on pointer nla_reply.

Addresses-Coverity: ("Dereference null return value")
Fixes: 11efd5cb04 ("openvswitch: Support conntrack zone limit")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 00:52:07 -07:00
David Ahern a5995e7107 ipv4: Move exception bucket to nh_common
Similar to the cached routes, make IPv4 exceptions accessible when
using an IPv6 nexthop struct with IPv4 routes. Simplify the exception
functions by passing in fib_nh_common since that is all it needs,
and then cleanup the call sites that have extraneous fib_nh conversions.

As with the cached routes this is a change in location only, from fib_nh
up to fib_nh_common; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 00:47:16 -07:00
David Ahern 87063a1fa6 ipv4: Pass fib_nh_common to rt_cache_route
Now that the cached routes are in fib_nh_common, pass it to
rt_cache_route and simplify its callers. For rt_set_nexthop,
the tclassid becomes the last user of fib_nh so move the
container_of under the #ifdef CONFIG_IP_ROUTE_CLASSID.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 00:47:16 -07:00
David Ahern 0f457a3662 ipv4: Move cached routes to fib_nh_common
While the cached routes, nh_pcpu_rth_output and nh_rth_input, are IPv4
specific, a later patch wants to make them accessible for IPv6 nexthops
with IPv4 routes using a fib6_nh. Move the cached routes from fib_nh to
fib_nh_common and update references.

Initialization of the cached entries is moved to fib_nh_common_init,
and free is moved to fib_nh_common_release.

Change in location only, from fib_nh up to fib_nh_common; no functional
change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-05 00:47:16 -07:00
YueHaibing 71f150f4c2 bpf: Use PTR_ERR_OR_ZERO in bpf_fd_sk_storage_update_elem()
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-05-04 23:20:58 -07:00
David Ahern 7fcd1e033d ipmr_base: Do not reset index in mr_table_dump
e is the counter used to save the location of a dump when an
skb is filled. Once the walk of the table is complete, mr_table_dump
needs to return without resetting that index to 0. Dump of a specific
table is looping because of the reset because there is no way to
indicate the walk of the table is done.

Move the reset to the caller so the dump of each table starts at 0,
but the loop counter is maintained if a dump fills an skb.

Fixes: e1cedae1ba ("ipmr: Refactor mr_rtm_dumproute")
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:38:15 -04:00
Eelco Chaudron a734d1f4c2 net: openvswitch: return an error instead of doing BUG_ON()
For all other error cases in queue_userspace_packet() the error is
returned, so it makes sense to do the same for these two error cases.

Reported-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Acked-by: Flavio Leitner <fbl@sysclose.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:36:36 -04:00
Michal Kubecek 05d7f547be genetlink: do not validate dump requests if there is no policy
Unlike do requests, dump genetlink requests now perform strict validation
by default even if the genetlink family does not set policy and maxtype
because it does validation and parsing on its own (e.g. because it wants to
allow different message format for different commands). While the null
policy will be ignored, maxtype (which would be zero) is still checked so
that any attribute will fail validation.

The solution is to only call __nla_validate() from genl_family_rcv_msg()
if family->maxtype is set.

Fixes: ef6243acb4 ("genetlink: optionally validate strictly/dumps")
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-04 01:27:10 -04:00