If the netdevice is already part of a flowtable, return EBUSY. I cannot
find a valid usecase for having two flowtables bound to the same
netdevice. We can still have two flowtable where the device set is
disjoint.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
While working on 16338a9b3a ("bpf, arm64: fix out of bounds access in
tail call") I noticed that ppc64 JIT is partially affected as well. While
the bound checking is correctly performed as unsigned comparison, the
register with the index value however, is never truncated into 32 bit
space, so e.g. a index value of 0x100000000ULL with a map of 1 element
would pass with PPC_CMPLW() whereas we later on continue with the full
64 bit register value. Therefore, as we do in interpreter and other JITs
truncate the value to 32 bit initially in order to fix access.
Fixes: ce0761419f ("powerpc/bpf: Implement support for tail calls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
r8152 driver handles TSO packets (limited to ~16KB) quite well,
but pretends each TSO logical packet is a single packet on the wire.
There is also some error since headers are accounted once, but
error rate is small enough that we do not care.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Discrete TPMs are often connected over slow serial buses which, on
some platforms, can have glitches causing bit flips. If a bit does
flip it could cause an overrun if it's in one of the size parameters,
so sanity check that we're not overrunning the provided buffer when
doing a memcpy().
Signed-off-by: Jeremy Boone <jeremy.boone@nccgroup.trust>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Discrete TPMs are often connected over slow serial buses which, on
some platforms, can have glitches causing bit flips. In all the
driver _recv() functions, we need to use a u32 to unmarshal the
response size, otherwise a bit flip of the 31st bit would cause the
expected variable to go negative, which would then try to read a huge
amount of data. Also sanity check that the expected amount of data is
large enough for the TPM header.
Signed-off-by: Jeremy Boone <jeremy.boone@nccgroup.trust>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Discrete TPMs are often connected over slow serial buses which, on
some platforms, can have glitches causing bit flips. In all the
driver _recv() functions, we need to use a u32 to unmarshal the
response size, otherwise a bit flip of the 31st bit would cause the
expected variable to go negative, which would then try to read a huge
amount of data. Also sanity check that the expected amount of data is
large enough for the TPM header.
Signed-off-by: Jeremy Boone <jeremy.boone@nccgroup.trust>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Discrete TPMs are often connected over slow serial buses which, on
some platforms, can have glitches causing bit flips. In all the
driver _recv() functions, we need to use a u32 to unmarshal the
response size, otherwise a bit flip of the 31st bit would cause the
expected variable to go negative, which would then try to read a huge
amount of data. Also sanity check that the expected amount of data is
large enough for the TPM header.
Signed-off-by: Jeremy Boone <jeremy.boone@nccgroup.trust>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
Discrete TPMs are often connected over slow serial buses which, on
some platforms, can have glitches causing bit flips. In all the
driver _recv() functions, we need to use a u32 to unmarshal the
response size, otherwise a bit flip of the 31st bit would cause the
expected variable to go negative, which would then try to read a huge
amount of data. Also sanity check that the expected amount of data is
large enough for the TPM header.
Signed-off-by: Jeremy Boone <jeremy.boone@nccgroup.trust>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: James Morris <james.morris@microsoft.com>
The Makefile lacks a couple of line continuation backslashes
in an `if' clause, which produces an error when make versions
prior to 4.x are used for building the tests.
$ make
make[1]: Entering directory `/[...]/linux/tools/testing/selftests/futex'
/bin/sh: -c: line 5: syntax error: unexpected end of file
make[1]: *** [all] Error 1
make[1]: Leaving directory `/[...]/linux/tools/testing/selftests/futex'
make: *** [all] Error 2
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Commit 343a8d17fa (cpufreq: scpi: remove arm_big_little dependency)
changed the cpufreq driver on juno from arm_big_little to scpi.
The scpi set_target function does not call the frequency-invariance
setter function arch_set_freq_scale() like the arm_big_little set_target
function does. As a result the task scheduler load and utilization
signals are not frequency-invariant on this platform anymore.
Fix this by adding a call to arch_set_freq_scale() into
scpi_cpufreq_set_target().
Fixes: 343a8d17fa (cpufreq: scpi: remove arm_big_little dependency)
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull idr fixes from Matthew Wilcox:
"One test-suite build fix for you and one run-time regression fix.
The regression fix includes new tests to make sure they don't pop back
up."
* 'idr-2018-02-06' of git://git.infradead.org/users/willy/linux-dax:
idr: Fix handling of IDs above INT_MAX
radix tree test suite: Fix build
It is entirely possible that the BIOS wasn't able to assign resources to a
device. In this case don't crash in pci_release_resource() when we try to
resize the resource.
Fixes: 8bb705e3e7 ("PCI: Add pci_resize_resource() for resizing BARs")
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
CC: stable@vger.kernel.org # v4.15+
This stops the driver from trying to probe the ATA slave
interface. The vendor code enables the slave interface
but the driver in the vendor tree does not make use of
it.
Setting it to muxmode 0 disables the slave interface:
the hardware only has the master interface connected
to the one harddrive slot anyways.
Without this change booting takes excessive time, so it
is very annoying to end users.
Fixes: dd5c0561db ("ARM: dts: Add basic devicetree for D-Link DNS-313")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The CONFIG_LIRC symbol has changed from 'tristate' to 'bool, so we now
get a warning for omap2plus_defconfig:
arch/arm/configs/omap2plus_defconfig:322:warning: symbol value 'm' invalid for LIRC
This changes the file to mark the symbol as built-in to get rid of the
warning.
Fixes: a60d64b15c ("media: lirc: lirc interface should not be a raw decoder")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
device-dax instances when those are meant to be explicitly allowed.
Fixes: 2bb6d28370 ("mm: introduce get_user_pages_longterm")
Cc: <stable@vger.kernel.org>
Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
Acked-by: Jane Chu <jane.chu@oracle.com>
Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In Patch:
[7a862fb] brd: remove dax support
Dan Williams has removed the only might_sleep
implementation of ->direct_access.
So we no longer need to check for it.
CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Boaz Harrosh <boazh@netapp.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This reverts commit 5c38bd1b82.
skb->mark contains the mark the encapsulated traffic which
can result in incorrect routing decisions being made such
as routing loops if the route chosen is via tunnel itself.
The correct method should be to use tunnel->fwmark.
Signed-off-by: Thomas Winter <thomas.winter@alliedtelesis.co.nz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a VLAN is added on a port, a reference is taken on the
corresponding master VLAN entry. If it does not already exist, then it
is created and a reference taken.
However, in the second case a reference is not really taken when
CONFIG_REFCOUNT_FULL is enabled as refcount_inc() is replaced by
refcount_inc_not_zero().
Fix this by using refcount_set() on a newly created master VLAN entry.
Fixes: 2512775985 ("net, bridge: convert net_bridge_vlan.refcnt from atomic_t to refcount_t")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renesas R-Car V3H (R8A77980) SoC has the R-Car gen3 compatible EtherAVB
device, so document the SoC specific bindings.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <horms+renesas@verge.net.au>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Added MODULE_ALIAS("rpmsg:IPCRTR") to ensure qrtr-smd and qrtr will load
when IPCRTR channel is detected.
Signed-off-by: Ramon Fried <rfried@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
test_bpf() is taking 1.6 seconds nowadays, it is time
to add a schedule point in it.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Khalid reported that the kernel selftests are currently failing:
selftests: test_bpf.sh
========================================
test_bpf: [FAIL]
not ok 1..8 selftests: test_bpf.sh [FAIL]
He bisected it to 6ce711f275 ("idr: Make
1-based IDRs more efficient").
The root cause is doing a signed comparison in idr_alloc_u32() instead
of an unsigned comparison. I went looking for any similar problems and
found a couple (which would each result in the failure to warn in two
situations that aren't supposed to happen).
I knocked up a few test-cases to prove that I was right and added them
to the test-suite.
Reported-by: Khalid Aziz <khalid.aziz@oracle.com>
Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Sometimes when physical lines have a just good noise to make the protocol
handshaking fail, but the carrier detect still good. Then after remove of
the noise, nobody will trigger this protocol to be start again to cause
the link to never come back. The fix is when the carrier is still on, not
terminate the protocol handshaking.
Signed-off-by: Denis Du <dudenis2000@yahoo.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
'struct blk_user_trace_setup' is passed to BLKTRACESETUP, not
BLKTRACESTART.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We don't flush batched XDP packets through xdp_do_flush_map(), this
will cause packets stall at TX queue. Consider we don't do XDP on NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().
Note, this in fact won't try to batch packets through devmap, we could
address in the future.
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 761876c857 ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Except for tuntap, all other drivers' XDP was implemented at NAPI
poll() routine in a bh. This guarantees all XDP operation were done at
the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
for tuntap, we do it in process context and we try to protect XDP
processing by RCU reader lock. This is insufficient since
CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
breaks the assumption that all XDP were processed in the same CPU.
Fixing this by simply disabling preemption during XDP processing.
Fixes: 761876c857 ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 762c330d67. The
reason is we try to batch packets for devmap which causes calling
xdp_do_flush() in the process context. Simply disabling preemption
may not work since process may move among processors which lead
xdp_do_flush() to miss some flushes on some processors.
So simply revert the patch, a follow-up patch will add the xdp flush
correctly.
Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 762c330d67 ("tuntap: add missing xdp flush")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add check for build_skb enabled ring in ixgbe_dma_sync_frag().
In that case &skb_shinfo(skb)->frags[0] may not always be set which
can lead to a crash. Instead we derive the page offset from skb->data.
Fixes: 42073d91a2
("ixgbe: Have the CPU take ownership of the buffers sooner")
CC: stable <stable@vger.kernel.org>
Reported-by: Ambarish Soman <asoman@redhat.com>
Suggested-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not valid for orion5x to use mac_pton().
First of all, the orion5x buffer is not NULL terminated. mac_pton()
has no business operating on non-NULL terminated buffers because
only the caller can know that this is valid and in what manner it
is ok to parse this NULL'less buffer.
Second of all, orion5x operates on an __iomem pointer, which cannot
be dereferenced using normal C pointer operations. Accesses to
such areas much be performed with the proper iomem accessors.
Fixes: 4904dbda41 ("ARM: orion5x: use mac_pton() helper")
Signed-off-by: David S. Miller <davem@davemloft.net>
When specifying string type mount option (e.g., logdev)
several times in a mount, current option parsing may
cause memory leak. Hence, call kfree for previous one
in this case.
Signed-off-by: Chengguang Xu <cgxu519@icloud.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Pull x86 fixes from Thomas Gleixner:
"Yet another pile of melted spectrum related changes:
- sanitize the array_index_nospec protection mechanism: Remove the
overengineered array_index_nospec_mask_check() magic and allow
const-qualified types as index to avoid temporary storage in a
non-const local variable.
- make the microcode loader more robust by properly propagating error
codes. Provide information about new feature bits after micro code
was updated so administrators can act upon.
- optimizations of the entry ASM code which reduce code footprint and
make the code simpler and faster.
- fix the {pmd,pud}_{set,clear}_flags() implementations to work
properly on paravirt kernels by removing the address translation
operations.
- revert the harmful vmexit_fill_RSB() optimization
- use IBRS around firmware calls
- teach objtool about retpolines and add annotations for indirect
jumps and calls.
- explicitly disable jumplabel patching in __init code and handle
patching failures properly instead of silently ignoring them.
- remove indirect paravirt calls for writing the speculation control
MSR as these calls are obviously proving the same attack vector
which is tried to be mitigated.
- a few small fixes which address build issues with recent compiler
and assembler versions"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()
KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
objtool, retpolines: Integrate objtool with retpoline support more closely
x86/entry/64: Simplify ENCODE_FRAME_POINTER
extable: Make init_kernel_text() global
jump_label: Warn on failed jump_label patching attempt
jump_label: Explicitly disable jump labels in __init code
x86/entry/64: Open-code switch_to_thread_stack()
x86/entry/64: Move ASM_CLAC to interrupt_entry()
x86/entry/64: Remove 'interrupt' macro
x86/entry/64: Move the switch_to_thread_stack() call to interrupt_entry()
x86/entry/64: Move ENTER_IRQ_STACK from interrupt macro to interrupt_entry
x86/entry/64: Move PUSH_AND_CLEAR_REGS from interrupt macro to helper function
x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP
objtool: Add module specific retpoline rules
objtool: Add retpoline validation
objtool: Use existing global variables for options
x86/mm/sme, objtool: Annotate indirect call in sme_encrypt_execute()
x86/boot, objtool: Annotate indirect jump in secondary_startup_64()
x86/paravirt, objtool: Annotate indirect calls
...
- optimization for the exitless interrupt support that was merged in 4.16-rc1
- improve the branch prediction blocking for nested KVM
- replace some jump tables with switch statements to improve expoline performance
- fixes for multiple epoch facility
ARM:
- fix the interaction of userspace irqchip VMs with in-kernel irqchip VMs
- make sure we can build 32-bit KVM/ARM with gcc-8.
x86:
- fixes for AMD SEV
- fixes for Intel nested VMX, emulated UMIP and a dump_stack() on VM startup
- fixes for async page fault migration
- small optimization to PV TLB flush (new in 4.16-rc1)
- syzkaller fixes
Generic:
- compiler warning fixes
- syzkaller fixes
- more improvements to the kvm_stat tool
Two more small Spectre fixes are going to reach you via Ingo.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEbBAABAgAGBQJakL/fAAoJEL/70l94x66Dzp4H9j6qMzgOTAQ0bYmupQp81tad
V8lNabVSNi0UBYwk2D44oNigtNjQckE18KGnjuJ4tZW+GZ+D7zrrHrKXWtATXgxP
SIfHj+raSd/lgJoy6HLu/N0oT6wS+PdZMYFgSu600Vi618lGKGX1SIAwBhjoxdMX
7QKKAuPcDZ1qgGddhWaLnof28nQQEWcCAVfFeVojmM0TyhvSbgSysh/Gq10ydybh
NVUfgP3fzLtT9gVngX/ZtbogNkltPYmucpI+wT3nWfsgBic783klfWrfpnC/GM85
OeXLVhHwVLG6tXUGhb4ULO+F9HwRGX31+er6iIxmwH9PvqnQMRcQ0Xxf2gbNXg==
=YmH6
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"s390:
- optimization for the exitless interrupt support that was merged in 4.16-rc1
- improve the branch prediction blocking for nested KVM
- replace some jump tables with switch statements to improve expoline performance
- fixes for multiple epoch facility
ARM:
- fix the interaction of userspace irqchip VMs with in-kernel irqchip VMs
- make sure we can build 32-bit KVM/ARM with gcc-8.
x86:
- fixes for AMD SEV
- fixes for Intel nested VMX, emulated UMIP and a dump_stack() on VM startup
- fixes for async page fault migration
- small optimization to PV TLB flush (new in 4.16-rc1)
- syzkaller fixes
Generic:
- compiler warning fixes
- syzkaller fixes
- more improvements to the kvm_stat tool
Two more small Spectre fixes are going to reach you via Ingo"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (40 commits)
KVM: SVM: Fix SEV LAUNCH_SECRET command
KVM: SVM: install RSM intercept
KVM: SVM: no need to call access_ok() in LAUNCH_MEASURE command
include: psp-sev: Capitalize invalid length enum
crypto: ccp: Fix sparse, use plain integer as NULL pointer
KVM: X86: Avoid traversing all the cpus for pv tlb flush when steal time is disabled
x86/kvm: Make parse_no_xxx __init for kvm
KVM: x86: fix backward migration with async_PF
kvm: fix warning for non-x86 builds
kvm: fix warning for CONFIG_HAVE_KVM_EVENTFD builds
tools/kvm_stat: print 'Total' line for multiple events only
tools/kvm_stat: group child events indented after parent
tools/kvm_stat: separate drilldown and fields filtering
tools/kvm_stat: eliminate extra guest/pid selection dialog
tools/kvm_stat: mark private methods as such
tools/kvm_stat: fix debugfs handling
tools/kvm_stat: print error on invalid regex
tools/kvm_stat: fix crash when filtering out all non-child trace events
tools/kvm_stat: avoid 'is' for equality checks
tools/kvm_stat: use a more pythonic way to iterate over dictionaries
...
James Chapman says:
====================
l2tp: fix API races discovered by syzbot
This patch series addresses several races with L2TP APIs discovered by
syzbot. There are no functional changes.
The set of patches 1-5 in combination fix the following syzbot reports.
19c09769f WARNING in debug_print_object
347bd5acd KASAN: use-after-free Read in inet_shutdown
6e6a5ec8d general protection fault in pppol2tp_connect
9df43faf0 KASAN: use-after-free Read in pppol2tp_connect
My first attempts to fix these issues were as net-next patches but
the series included other refactoring and cleanup work. I was asked to
separate out the bugfixes and redo for the net tree, which is what
these patches are.
The changes are:
1. Fix inet_shutdown races when L2TP tunnels and sessions close. (patches 1-2)
2. Fix races with tunnel and its socket. (patch 3)
3. Fix race in pppol2tp_release with session and its socket. (patch 4)
4. Fix tunnel lookup use-after-free. (patch 5)
All of the syzbot reproducers hit races in the tunnel and pppol2tp
session create and destroy paths. These tests create and destroy
pppol2tp tunnels and sessions rapidly using multiple threads,
provoking races in several tunnel/session create/destroy paths. The
key problem was that each tunnel/session socket could be destroyed
while its associated tunnel/session object still existed (patches 3,
4). Patch 5 addresses a problem with the way tunnels are removed from
the tunnel list. Patch 5 is tagged that it addresses all four syzbot
issues, though all 5 patches are needed.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
pppol2tp_release uses call_rcu to put the final ref on its socket. But
the session object doesn't hold a ref on the session socket so may be
freed while the pppol2tp_put_sk RCU callback is scheduled. Fix this by
having the session hold a ref on its socket until the session is
destroyed. It is this ref that is dropped via call_rcu.
Sessions are also deleted via l2tp_tunnel_closeall. This must now also put
the final ref via call_rcu. So move the call_rcu call site into
pppol2tp_session_close so that this happens in both destroy paths. A
common destroy path should really be implemented, perhaps with
l2tp_tunnel_closeall calling l2tp_session_delete like pppol2tp_release
does, but this will be looked at later.
ODEBUG: activate active (active state 1) object type: rcu_head hint: (null)
WARNING: CPU: 3 PID: 13407 at lib/debugobjects.c:291 debug_print_object+0x166/0x220
Modules linked in:
CPU: 3 PID: 13407 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #38
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:debug_print_object+0x166/0x220
RSP: 0018:ffff880013647a00 EFLAGS: 00010082
RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff814d3333
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88001a59f6d0
RBP: ffff880013647a40 R08: 0000000000000000 R09: 0000000000000001
R10: ffff8800136479a8 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff86161420 R14: ffffffff85648b60 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88001a580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020e77000 CR3: 0000000006022000 CR4: 00000000000006e0
Call Trace:
debug_object_activate+0x38b/0x530
? debug_object_assert_init+0x3b0/0x3b0
? __mutex_unlock_slowpath+0x85/0x8b0
? pppol2tp_session_destruct+0x110/0x110
__call_rcu.constprop.66+0x39/0x890
? __call_rcu.constprop.66+0x39/0x890
call_rcu_sched+0x17/0x20
pppol2tp_release+0x2c7/0x440
? fcntl_setlk+0xca0/0xca0
? sock_alloc_file+0x340/0x340
sock_release+0x92/0x1e0
sock_close+0x1b/0x20
__fput+0x296/0x6e0
____fput+0x1a/0x20
task_work_run+0x127/0x1a0
do_exit+0x7f9/0x2ce0
? SYSC_connect+0x212/0x310
? mm_update_next_owner+0x690/0x690
? up_read+0x1f/0x40
? __do_page_fault+0x3c8/0xca0
do_group_exit+0x10d/0x330
? do_group_exit+0x330/0x330
SyS_exit_group+0x22/0x30
do_syscall_64+0x1e0/0x730
? trace_hardirqs_off_thunk+0x1a/0x1c
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f362e471259
RSP: 002b:00007ffe389abe08 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f362e471259
RDX: 00007f362e471259 RSI: 000000000000002e RDI: 0000000000000000
RBP: 00007ffe389abe30 R08: 0000000000000000 R09: 00007f362e944270
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
R13: 00007ffe389abf50 R14: 0000000000000000 R15: 0000000000000000
Code: 8d 3c dd a0 8f 64 85 48 89 fa 48 c1 ea 03 80 3c 02 00 75 7b 48 8b 14 dd a0 8f 64 85 4c 89 f6 48 c7 c7 20 85 64 85 e
8 2a 55 14 ff <0f> 0b 83 05 ad 2a 68 04 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41
Fixes: ee40fb2e1e ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The tunnel socket tunnel->sock (struct sock) is accessed when
preparing a new ppp session on a tunnel at pppol2tp_session_init. If
the socket is closed by a thread while another is creating a new
session, the threads race. In pppol2tp_connect, the tunnel object may
be created if the pppol2tp socket is associated with the special
session_id 0 and the tunnel socket is looked up using the provided
fd. When handling this, pppol2tp_connect cannot sock_hold the tunnel
socket to prevent it being destroyed during pppol2tp_connect since
this may itself may race with the socket being destroyed. Doing
sockfd_lookup in pppol2tp_connect isn't sufficient to prevent
tunnel->sock going away either because a given tunnel socket fd may be
reused between calls to pppol2tp_connect. Instead, have
l2tp_tunnel_create sock_hold the tunnel socket before it does
sockfd_put. This ensures that the tunnel's socket is always extant
while the tunnel object exists. Hold a ref on the socket until the
tunnel is destroyed and ensure that all tunnel destroy paths go
through a common function (l2tp_tunnel_delete) since this will do the
final sock_put to release the tunnel socket.
Since the tunnel's socket is now guaranteed to exist if the tunnel
exists, we no longer need to use sockfd_lookup via l2tp_sock_to_tunnel
to derive the tunnel from the socket since this is always
sk_user_data.
Also, sessions no longer sock_hold the tunnel socket since sessions
already hold a tunnel ref and the tunnel sock will not be freed until
the tunnel is freed. Removing these sock_holds in
l2tp_session_register avoids a possible sock leak in the
pppol2tp_connect error path if l2tp_session_register succeeds but
attaching a ppp channel fails. The pppol2tp_connect error path could
have been fixed instead and have the sock ref dropped when the session
is freed, but doing a sock_put of the tunnel socket when the session
is freed would require a new session_free callback. It is simpler to
just remove the sock_hold of the tunnel socket in
l2tp_session_register, now that the tunnel socket lifetime is
guaranteed.
Finally, some init code in l2tp_tunnel_create is reordered to ensure
that the new tunnel object's refcount is set and the tunnel socket ref
is taken before the tunnel socket destructor callbacks are set.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 4360 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #34
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:pppol2tp_session_init+0x1d6/0x500
RSP: 0018:ffff88001377fb40 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88001636a940 RCX: ffffffff84836c1d
RDX: 0000000000000045 RSI: 0000000055976744 RDI: 0000000000000228
RBP: ffff88001377fb60 R08: ffffffff84836bc8 R09: 0000000000000002
R10: ffff88001377fab8 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88001636aac8 R14: ffff8800160f81c0 R15: 1ffff100026eff76
FS: 00007ffb3ea66700(0000) GS:ffff88001a400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020e77000 CR3: 0000000016261000 CR4: 00000000000006f0
Call Trace:
pppol2tp_connect+0xd18/0x13c0
? pppol2tp_session_create+0x170/0x170
? __might_fault+0x115/0x1d0
? lock_downgrade+0x860/0x860
? __might_fault+0xe5/0x1d0
? security_socket_connect+0x8e/0xc0
SYSC_connect+0x1b6/0x310
? SYSC_bind+0x280/0x280
? __do_page_fault+0x5d1/0xca0
? up_read+0x1f/0x40
? __do_page_fault+0x3c8/0xca0
SyS_connect+0x29/0x30
? SyS_accept+0x40/0x40
do_syscall_64+0x1e0/0x730
? trace_hardirqs_off_thunk+0x1a/0x1c
entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7ffb3e376259
RSP: 002b:00007ffeda4f6508 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000020e77012 RCX: 00007ffb3e376259
RDX: 000000000000002e RSI: 0000000020e77000 RDI: 0000000000000004
RBP: 00007ffeda4f6540 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
R13: 00007ffeda4f6660 R14: 0000000000000000 R15: 0000000000000000
Code: 80 3d b0 ff 06 02 00 0f 84 07 02 00 00 e8 13 d6 db fc 49 8d bc 24 28 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f
a 48 c1 ea 03 <80> 3c 02 00 0f 85 ed 02 00 00 4d 8b a4 24 28 02 00 00 e8 13 16
Fixes: 80d84ef3ff ("l2tp: prevent l2tp_tunnel_delete racing with userspace close")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously, if a ppp session was closed, we called inet_shutdown to mark
the socket as unconnected such that userspace would get errors and
then close the socket. This could race with userspace closing the
socket. Instead, leave userspace to close the socket in its own time
(our session will be detached anyway).
BUG: KASAN: use-after-free in inet_shutdown+0x5d/0x1c0
Read of size 4 at addr ffff880010ea3ac0 by task syzbot_347bd5ac/8296
CPU: 3 PID: 8296 Comm: syzbot_347bd5ac Not tainted 4.16.0-rc1+ #91
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x101/0x157
? inet_shutdown+0x5d/0x1c0
print_address_description+0x78/0x260
? inet_shutdown+0x5d/0x1c0
kasan_report+0x240/0x360
__asan_load4+0x78/0x80
inet_shutdown+0x5d/0x1c0
? pppol2tp_show+0x80/0x80
pppol2tp_session_close+0x68/0xb0
l2tp_tunnel_closeall+0x199/0x210
? udp_v6_flush_pending_frames+0x90/0x90
l2tp_udp_encap_destroy+0x6b/0xc0
? l2tp_tunnel_del_work+0x2e0/0x2e0
udpv6_destroy_sock+0x8c/0x90
sk_common_release+0x47/0x190
udp_lib_close+0x15/0x20
inet_release+0x85/0xd0
inet6_release+0x43/0x60
sock_release+0x53/0x100
? sock_alloc_file+0x260/0x260
sock_close+0x1b/0x20
__fput+0x19f/0x380
____fput+0x1a/0x20
task_work_run+0xd2/0x110
exit_to_usermode_loop+0x18d/0x190
do_syscall_64+0x389/0x3b0
entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x7fe240a45259
RSP: 002b:00007fe241132df8 EFLAGS: 00000297 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fe240a45259
RDX: 00007fe240a45259 RSI: 0000000000000000 RDI: 00000000000000a5
RBP: 00007fe241132e20 R08: 00007fe241133700 R09: 0000000000000000
R10: 00007fe241133700 R11: 0000000000000297 R12: 0000000000000000
R13: 00007ffc49aff84f R14: 0000000000000000 R15: 00007fe241141040
Allocated by task 8331:
save_stack+0x43/0xd0
kasan_kmalloc+0xad/0xe0
kasan_slab_alloc+0x12/0x20
kmem_cache_alloc+0x144/0x3e0
sock_alloc_inode+0x22/0x130
alloc_inode+0x3d/0xf0
new_inode_pseudo+0x1c/0x90
sock_alloc+0x30/0x110
__sock_create+0xaa/0x4c0
SyS_socket+0xbe/0x130
do_syscall_64+0x128/0x3b0
entry_SYSCALL_64_after_hwframe+0x26/0x9b
Freed by task 8314:
save_stack+0x43/0xd0
__kasan_slab_free+0x11a/0x170
kasan_slab_free+0xe/0x10
kmem_cache_free+0x88/0x2b0
sock_destroy_inode+0x49/0x50
destroy_inode+0x77/0xb0
evict+0x285/0x340
iput+0x429/0x530
dentry_unlink_inode+0x28c/0x2c0
__dentry_kill+0x1e3/0x2f0
dput.part.21+0x500/0x560
dput+0x24/0x30
__fput+0x2aa/0x380
____fput+0x1a/0x20
task_work_run+0xd2/0x110
exit_to_usermode_loop+0x18d/0x190
do_syscall_64+0x389/0x3b0
entry_SYSCALL_64_after_hwframe+0x26/0x9b
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When blkdev_open() races with device removal and creation it can happen
that unhashed bdev inode gets associated with newly created gendisk
like:
CPU0 CPU1
blkdev_open()
bdev = bd_acquire()
del_gendisk()
bdev_unhash_inode(bdev);
remove device
create new device with the same number
__blkdev_get()
disk = get_gendisk()
- gets reference to gendisk of the new device
Now another blkdev_open() will not find original 'bdev' as it got
unhashed, create a new one and associate it with the same 'disk' at
which point problems start as we have two independent page caches for
one device.
Fix the problem by verifying that the bdev inode didn't get unhashed
before we acquired gendisk reference. That way we make sure gendisk can
get associated only with visible bdev inodes.
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When two blkdev_open() calls for a partition race with device removal
and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
blkdev_open(). The race can happen as follows:
CPU0 CPU1 CPU2
del_gendisk()
bdev_unhash_inode(part1);
blkdev_open(part1, O_EXCL) blkdev_open(part1, O_EXCL)
bdev = bd_acquire() bdev = bd_acquire()
blkdev_get(bdev)
bd_start_claiming(bdev)
- finds old inode 'whole'
bd_prepare_to_claim() -> 0
bdev_unhash_inode(whole);
<device removed>
<new device under same
number created>
blkdev_get(bdev);
bd_start_claiming(bdev)
- finds new inode 'whole'
bd_prepare_to_claim()
- this also succeeds as we have
different 'whole' here...
- bad things happen now as we
have two exclusive openers of
the same bdev
The problem here is that block device opens can see various intermediate
states while gendisk is shutting down and then being recreated.
We fix the problem by introducing new lookup_sem in gendisk that
synchronizes gendisk deletion with get_gendisk() and furthermore by
making sure that get_gendisk() does not return gendisk that is being (or
has been) deleted. This makes sure that once we ever manage to look up
newly created bdev inode, we are also guaranteed that following
get_gendisk() will either return failure (and we fail open) or it
returns gendisk for the new device and following bdget_disk() will
return new bdev inode (i.e., blkdev_open() follows the path as if it is
completely run after new device is created).
Reported-and-analyzed-by: Hou Tao <houtao1@huawei.com>
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When two blkdev_open() calls race with device removal and recreation,
__blkdev_get() can use looked up gendisk after it is freed:
CPU0 CPU1 CPU2
del_gendisk(disk);
bdev_unhash_inode(inode);
blkdev_open() blkdev_open()
bdev = bd_acquire(inode);
- creates and returns new inode
bdev = bd_acquire(inode);
- returns the same inode
__blkdev_get(devt) __blkdev_get(devt)
disk = get_gendisk(devt);
- got structure of device going away
<finish device removal>
<new device gets
created under the same
device number>
disk = get_gendisk(devt);
- got new device structure
if (!bdev->bd_openers) {
does the first open
}
if (!bdev->bd_openers)
- false
} else {
put_disk_and_module(disk)
- remember this was old device - this was last ref and disk is
now freed
}
disk_unblock_events(disk); -> oops
Fix the problem by making sure we drop reference to disk in
__blkdev_get() only after we are really done with it.
Reported-by: Hou Tao <houtao1@huawei.com>
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add a proper counterpart to get_disk_and_module() -
put_disk_and_module(). Currently it is opencoded in several places.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Rename get_disk() to get_disk_and_module() to make sure what the
function does. It's not a great name but at least it is now clear that
put_disk() is not it's counterpart.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit 8ddcd65325 "block: introduce GENHD_FL_HIDDEN" added handling of
hidden devices to get_gendisk() but forgot to drop module reference
which is also acquired by get_disk(). Drop the reference as necessary.
Arguably the function naming here is misleading as put_disk() is *not*
the counterpart of get_disk() but let's fix that in the follow up
commit since that will be more intrusive.
Fixes: 8ddcd65325
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Introduce __smp_{mb,rmb,wmb}, and rely on the generic definitions
for smp_{mb,rmb,wmb}. A first consequence is that smp_{mb,rmb,wmb}
map to a compiler barrier on !SMP (while their definition remains
unchanged on SMP). As a further consequence, smp_load_acquire and
smp_store_release have "fence rw,rw" instead of "fence iorw,iorw".
Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
We don't want a separate module for vb2-trace.
That fixes this warning:
WARNING: modpost: missing MODULE_LICENSE() in drivers/media/common/videobuf2/vb2-trace.o
When building as module.
While here, add a SPDX header.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>