Patch series "userfaultfd non-cooperative further update for 4.11 merge
window".
Unfortunately I noticed one relevant bug in userfaultfd_exit while doing
more testing. I've been doing testing before and this was also tested
by kbuild bot and exercised by the selftest, but this bug never
reproduced before.
I dropped userfaultfd_exit as result. I dropped it because of
implementation difficulty in receiving signals in __mmput and because I
think -ENOSPC as result from the background UFFDIO_COPY should be enough
already.
Before I decided to remove userfaultfd_exit, I noticed userfaultfd_exit
wasn't exercised by the selftest and when I tried to exercise it, after
moving it to a more correct place in __mmput where it would make more
sense and where the vma list is stable, it resulted in the
event_wait_completion in D state. So then I added the second patch to
be sure even if we call userfaultfd_event_wait_completion too late
during task exit(), we won't risk to generate tasks in D state. The
same check exists in handle_userfault() for the same reason, except it
makes a difference there, while here is just a robustness check and it's
run under WARN_ON_ONCE.
While looking at the userfaultfd_event_wait_completion() function I
looked back at its callers too while at it and I think it's not ok to
stop executing dup_fctx on the fcs list because we relay on
userfaultfd_event_wait_completion to execute
userfaultfd_ctx_put(fctx->orig) which is paired against
userfaultfd_ctx_get(fctx->orig) in dup_userfault just before
list_add(fcs). This change only takes care of fctx->orig but this area
also needs further review looking for similar problems in fctx->new.
The only patch that is urgent is the first because it's an use after
free during a SMP race condition that affects all processes if
CONFIG_USERFAULTFD=y. Very hard to reproduce though and probably
impossible without SLUB poisoning enabled.
This patch (of 3):
I once reproduced this oops with the userfaultfd selftest, it's not
easily reproducible and it requires SLUB poisoning to reproduce.
general protection fault: 0000 [#1] SMP
Modules linked in:
CPU: 2 PID: 18421 Comm: userfaultfd Tainted: G ------------ T 3.10.0+ #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
task: ffff8801f83b9440 ti: ffff8801f833c000 task.ti: ffff8801f833c000
RIP: 0010:[<ffffffff81451299>] [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
RSP: 0018:ffff8801f833fe80 EFLAGS: 00010202
RAX: ffff8801f833ffd8 RBX: 6b6b6b6b6b6b6b6b RCX: ffff8801f83b9440
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800baf18600
RBP: ffff8801f833fee8 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: ffffffff8127ceb3 R12: 0000000000000000
R13: ffff8800baf186b0 R14: ffff8801f83b99f8 R15: 00007faed746c700
FS: 0000000000000000(0000) GS:ffff88023fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007faf0966f028 CR3: 0000000001bc6000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
do_exit+0x297/0xd10
SyS_exit+0x17/0x20
tracesys+0xdd/0xe2
Code: 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 48 83 ec 58 48 8b 1f 48 85 db 75 11 eb 73 66 0f 1f 44 00 00 48 8b 5b 10 48 85 db 74 64 <4c> 8b a3 b8 00 00 00 4d 85 e4 74 eb 41 f6 84 24 2c 01 00 00 80
RIP [<ffffffff81451299>] userfaultfd_exit+0x29/0xa0
RSP <ffff8801f833fe80>
---[ end trace 9fecd6dcb442846a ]---
In the debugger I located the "mm" pointer in the stack and walking
mm->mmap->vm_next through the end shows the vma->vm_next list is fully
consistent and it is null terminated list as expected. So this has to
be an SMP race condition where userfaultfd_exit was running while the
vma list was being modified by another CPU.
When userfaultfd_exit() run one of the ->vm_next pointers pointed to
SLAB_POISON (RBX is the vma pointer and is 0x6b6b..).
The reason is that it's not running in __mmput but while there are still
other threads running and it's not holding the mmap_sem (it can't as it
has to wait the even to be received by the manager). So this is an use
after free that was happening for all processes.
One more implementation problem aside from the race condition:
userfaultfd_exit has really to check a flag in mm->flags before walking
the vma or it's going to slowdown the exit() path for regular tasks.
One more implementation problem: at that point signals can't be
delivered so it would also create a task in D state if the manager
doesn't read the event.
The major design issue: it overall looks superfluous as the manager can
check for -ENOSPC in the background transfer:
if (mmget_not_zero(ctx->mm)) {
[..]
} else {
return -ENOSPC;
}
It's safer to roll it back and re-introduce it later if at all.
[rppt@linux.vnet.ibm.com: documentation fixup after removal of UFFD_EVENT_EXIT]
Link: http://lkml.kernel.org/r/1488345437-4364-1-git-send-email-rppt@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20170224181957.19736-2-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All exit paths from gup_pte_range() require pte_unmap() of the original
pte page before returning. Refactor the code to have a single exit
point to do the unmap.
This mirrors the flow of the generic gup_pte_range() in mm/gup.c.
Link: http://lkml.kernel.org/r/148804251828.36605.14910389618497006945.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
gup_pte_range() fails to check pte_allows_gup() before translating a DAX
pte entry, pte_devmap(), to a page. This allows writes to read-only
mappings, and bypasses the DAX cacheline dirty tracking due to missed
'mkwrite' faults. The gup_huge_pmd() path and the gup_huge_pud() path
correctly check pte_allows_gup() before checking for _devmap() entries.
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Link: http://lkml.kernel.org/r/148804251312.36605.12665024794196605053.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Xiong Zhou <xzhou@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We use pte_write() to check whethwer the pte entry is writable. This is
mostly used to later mark the pte read only if it is writable. The other
use of pte_write() is to check whether the pte_entry is writable so that
hardware page table entry can be marked accordingly. This is used in kvm
where we look at qemu page table entry and update hardware hash page table
for the guest with correct write enable bit.
With the above, for the first usage we should also check the savedwrite
bit so that we can correctly clear the savedwite bit. For the later, we
add a new variant __pte_write().
With this we can revert write_protect_page part of 595cd8f256 ("mm/ksm:
handle protnone saved writes when making page write protect"). But I left
it as it is as an example code for savedwrite check.
Fixes: c137a2757b ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write")
Link: http://lkml.kernel.org/r/1488203787-17849-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to mark pages of parent process read only on fork. Numa fault
pte needs a protnone ptes variant with saved write flag set. On fork we
need to make sure we remove the saved write bit. Instead of adding the
protnone check in the caller update ptep_set_wrprotect variants to clear
savedwrite bit.
Without this we see random segfaults in application on fork.
Fixes: c137a2757b ("powerpc/mm/autonuma: switch ppc64 to its own implementation of saved write")
Link: http://lkml.kernel.org/r/1488203787-17849-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typos and add the following to the scripts/spelling.txt:
overide||override
While we are here, fix the doubled "address" in the touched line
Documentation/devicetree/bindings/regulator/ti-abb-regulator.txt.
Also, fix the comment block style in the touched hunks in
drivers/media/dvb-frontends/drx39xyj/drx_driver.h.
Link: http://lkml.kernel.org/r/1481573103-11329-21-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix typos and add the following to the scripts/spelling.txt:
disble||disable
disbled||disabled
I kept the TSL2563_INT_DISBLED in /drivers/iio/light/tsl2563.c
untouched. The macro is not referenced at all, but this commit is
touching only comment blocks just in case.
Link: http://lkml.kernel.org/r/1481573103-11329-20-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__do_fault assumes vmf->page has been initialized and is valid if
VM_FAULT_NOPAGE is not returned by vma->vm_ops->fault(vma, vmf).
handle_userfault() in turn should return VM_FAULT_NOPAGE if it doesn't
return VM_FAULT_SIGBUS or VM_FAULT_RETRY (the other two possibilities).
This VM_FAULT_NOPAGE case is only invoked when signal are pending and it
didn't matter for anonymous memory before. It only started to matter
since shmem was introduced. hugetlbfs also takes a different path and
doesn't exercise __do_fault.
Link: http://lkml.kernel.org/r/20170228154201.GH5816@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Three fixes for intel_pstate problems related to the passive
mode (in which it acts as a regular cpufreq scaling driver), two
for the handling of global P-state limits and one for the handling
of the cpu_frequency tracepoint in that mode (Rafael Wysocki).
- Three fixes for the handling of P-state limits in intel_pstate in
the active mode (Rafael Wysocki).
- Introduction of a new cpufreq.off=1 kernel command line argument
that will disable cpufreq entirely if passed to the kernel and
is simply hooked up to the existing code used by Xen (Len Brown).
- Fix for the schedutil cpufreq governor to prevent it from using
stale raw frequency values in configurations with mutiple CPUs
sharing one policy object and a cleanup for it reducing its
overhead slightly (Viresh Kumar).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCAAGBQJYwc9bAAoJEILEb/54YlRxU24P/2RPFpRYNtGNfbnvmyNjcA5H
MLQ/i0HEqhBVsfsj2Kmy6sRkY/X/kqnOiD2PzGcaOMTtMx2FcLNHzNqH0P0I7A4W
9wzUcq0SC5FzTJcLryN4gtJa3tfBDHjr6peAcZyji5t3DbGa+mygVwH8IGyr4feo
u7PE/6eXLfkRIySwG/kCI/YVE8tWuhjDHbjhR1pjgyrMjhDbPP480Mgsi4eDRRTO
BwGVyQJXoMo9e7/vZ8Y8Fkt7PMxYeyeE1yAGHkJzjkFfdrkZrn3IUPfgVgxy8IPV
N3w1BB+H84duEswPZH2patdIKQxXG3r0eaGHTXZIwy+5sHyc3mUMJ1FeHR67gasd
Z4p4hylTP+dGZGDo4G7PbVX4V6zEP+LSxgOQYjbo4n6k3nJJOrIF4wt+l5+tQz5m
Y+V6XVHgZm1LyFgjj06jMeVSmk6SS7X0qBHJ4WcfGzV/J+vStXmIvAmljs+cd/u2
R9z1W/rxelZFKT+4lRCuztpBYfJvlE5nXyM1XwC6Rz6QjFax0pJAHUPFznWyq24C
GlMAvQWPapDEoOvDrS/QKeqX1ROOYf2FwHs+uvPPCvOxMjL9NCQsQ34tqCM3MiL/
ebk3uZ0xC1e0LaOB87xr7tfsag12MyZzSCwX0D8nBLBLvntpa/xQwE5+4vu9ZguD
xlarnIsEh6bTwTVH6DJP
=Gwp7
-----END PGP SIGNATURE-----
Merge tag 'pm-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix several issues in the intel_pstate driver and one issue in
the schedutil cpufreq governor, clean up that governor a bit and hook
up existing code for disabling cpufreq to a new kernel command line
option.
Specifics:
- Three fixes for intel_pstate problems related to the passive mode
(in which it acts as a regular cpufreq scaling driver), two for the
handling of global P-state limits and one for the handling of the
cpu_frequency tracepoint in that mode (Rafael Wysocki).
- Three fixes for the handling of P-state limits in intel_pstate in
the active mode (Rafael Wysocki).
- Introduction of a new cpufreq.off=1 kernel command line argument
that will disable cpufreq entirely if passed to the kernel and is
simply hooked up to the existing code used by Xen (Len Brown).
- Fix for the schedutil cpufreq governor to prevent it from using
stale raw frequency values in configurations with mutiple CPUs
sharing one policy object and a cleanup for it reducing its
overhead slightly (Viresh Kumar)"
* tag 'pm-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
cpufreq: intel_pstate: Fix global settings in active mode
cpufreq: Add the "cpufreq.off=1" cmdline option
cpufreq: schedutil: Pass sg_policy to get_next_freq()
cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
cpufreq: intel_pstate: Do not use performance_limits in passive mode
Pull block fixes from Jens Axboe:
"Sending this a bit sooner than I otherwise would have, as a fix in the
merge window had some unfortunate issues and side effects for some
folks.
This contains:
- Fixes from Jan for the bdi registration/unregistration. These have
been tested by the various parties reporting issues, and should be
solid at this point.
- Also from Jan, fix for axonram gendisk registration.
- A stable fix for zram from Johannes.
- A small series from Ming, fixing up some long standing issues with
blk-mq hardware queue kobject initialization and registration.
- A fix for sed opal from Jon, fixing a nonsensical range check and
some set-but-not-used variables.
- A fix from Neil for a long standing deadlock issue for stacking
device drivers. With this in place, dm/md don't have to work around
the issue anymore, and can be properly fixed up"
* 'for-linus' of git://git.kernel.dk/linux-block:
axonram: Fix gendisk handling
blk: improve order of bio handling in generic_make_request()
Revert "scsi, block: fix duplicate bdi name registration crashes"
block: Make del_gendisk() safer for disks without queues
bdi: Fix use-after-free in wb_congested_put()
block: Allow bdi re-registration
block/sed: Fix opal user range check and unused variables
zram: set physical queue limits to avoid array out of bounds accesses
blk-mq: free hctx->cpumask in release handler of hctx's kobject
blk-mq: make lifetime consistent between hctx and its kobject
blk-mq: make lifetime consitent between q/ctx and its kobject
blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYwLA0AAoJEAhfPr2O5OEVbd0P/0dycXafb2UkwpQiyzN7j62T
95CB3YySddBUecT3WUNA5DIwD0rcImdzd6JSOiB12kREbxzSwTDP+Qpi1+7+ra55
T6F+nYoc4ptTTQtHPhXrgXXJUdqvQEg/zIb6fzRM+VBkEz7qM3WJCuokdbtzyebN
Z2YvwOxsprnZLdUm+loFlnNOHIstE7XcMCtoZFUQwr5lBvVc/SrhypfkJTaKG4Og
qggnaZW+yEu++mILGOPUmbHbKGxr5qKm5Aijj3L73T/XYloNRwHFvxv48/VrJkG6
hfYLV1FAo1Y5kfmUde1vUOhtMH5eNvz4Sg42KkYCOvJgngi78WYP+/YyenT0yMp4
BGSpLjaUML7zgz2TdkwDdfIzLAPPvvOtSoDyyzP9ELM6vUaUZpf8xPBrjHc6ZZy3
Tndu8IOzlOEFc4njcV+jzRBWqzTLRlxGsP8POKzDeZKTHj/DmAs+LzVnWtLHNEWE
rvem/A3zoo919YVolkkN/vdTWExBIplg2xwmdmfDLA/ZDYw8AbHUsGnT4SQM5UAl
7cHhhh+XZ9ORihrghYvHw4yZq6Nky8P/WgREMbD7XHOEW7sydnhI5xvFPVpWS/Uz
7+SfZFerMxoX8N9+E8UZ7aROO/dbzt8RBXdfHrThhEu/7SCHVEk5PzdRArshjtoK
4DnHrEN6evtmY0XrMPiy
=Liea
-----END PGP SIGNATURE-----
Merge tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
"Media regression fixes:
- serial_ir: fix a Kernel crash during boot on Kernel 4.11-rc1, due
to an IRQ code called too early
- other IR regression fixes at lirc and at the raw IR decoding
- a deadlock fix at the RC nuvoton driver
- fix another issue with DMA on stack at dw2102 driver
There's an extra patch there that change a driver interface for the
SoC VSP1 driver, with is shared between the DRM and V4L2 driver. The
patch itself is trivial, and was acked by David Arlie"
* tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] v4l: vsp1: Adapt vsp1_du_setup_lif() interface to use a structure
[media] dw2102: don't do DMA on stack
[media] rc: protocol is not set on register for raw IR devices
[media] rc: raw decoder for keymap protocol is not loaded on register
[media] rc: nuvoton: fix deadlock in nvt_write_wakeup_codes
[media] lirc: fix dead lock between open and wakeup_filter
[media] serial_ir: ensure we're ready to receive interrupts
We added new gem ioctl flags and the new fences ioctl, but forgot
to bump the version.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reject it if there are any invalid flags or domains.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJYwPOdAAoJELDendYovxMvaREH/jWjZt38HQFyWYmpGN/jTl5e
fl2kK8PcYR50WDVMROG50MoWeDWj4OiCHkQTln/BIckBfi895qNE+S27Z6ZvPBcY
Xqx+lbcKej1KU5O11Kmmuz7Jz/h3pyP09lY7vG50pxLMBVJy8L2P3Oj66fB4MbY0
u5DQtPSwRDlf86gNisQuRHDYIF+LZ+ZQD5SL0hRz5UStnxojbX0oxP/ijz/tyshP
Qk5PZXWLOTcWn8mvKJu9wqfNur9FLT+FE8dYzAqa8hoLECl3wR3jUxGb1kNVt+GB
GuK6AHtJ7plVjfMYaAJtjYJnBaCGTCt3GuSzGJhoES0RkC/u1knuQBe87hKhzCc=
=gYGK
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fix and cleanup from Juergen Gross:
"This contains one fix for MSIX handling under Xen and a trivial
cleanup patch"
* tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xenbus: Remove duplicate inclusion of linux/init.h
xen: do not re-use pirq number cached in pci device msi msg data
For full 5-level paging we need a helper to allocate p4d page table.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Convert all non-architecture-specific code to 5-level paging.
It's mostly mechanical adding handling one more page table level in
places where we deal with pud_t.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Like with pgtable-nopud.h for 4-level paging, this new header is base
for converting an architectures to properly folded p4d_t level.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an architecture uses 4level-fixup.h we don't need to do anything as
it includes 5level-fixup.h.
If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK
before inclusion of the header. It makes asm-generic code to use
5level-fixup.h.
If an architecture has 4-level paging or folds levels on its own,
include 5level-fixup.h directly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are going to introduce <asm-generic/pgtable-nop4d.h> to provide
abstraction for properly (in opposite to 5level-fixup.h hack) folded
p4d level. The new header will be included from pgtable-nopud.h.
If an architecture uses <asm-generic/nop*d.h>, we cannot use
5level-fixup.h directly to quickly convert the architecture to 5-level
paging as it would conflict with pgtable-nop4d.h.
With this patch an architecture can define __ARCH_USE_5LEVEL_HACK before
inclusion <asm-genenric/nop*d.h> to use 5level-fixup.h.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We are going to switch core MM to 5-level paging abstraction.
This is preparation step which adds <asm-generic/5level-fixup.h>
As with 4level-fixup.h, the new header allows quickly make all
architectures compatible with 5-level paging in core MM.
In long run we would like to switch architectures to properly folded p4d
level by using <asm-generic/pgtable-nop4d.h>, but it requires more
changes to arch-specific code.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Look for 'la57' in /proc/cpuinfo to see if your machine supports 5-level
paging.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Upstream commit 98d74f9cea ("xhci: fix 10 second timeout on removal of
PCI hotpluggable xhci controllers") fixes a problem with hot pluggable PCI
xhci controllers which can result in excessive timeouts, to the point where
the system reports a deadlock.
The same problem is seen with hot pluggable xhci controllers using the
xhci-plat driver, such as the driver used for Type-C ports on rk3399.
Similar to hot-pluggable PCI controllers, the driver for this chip
removes the xhci controller from the system when the Type-C cable is
disconnected.
The solution for PCI devices works just as well for non-PCI devices
and avoids the problem.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to xHCI spec, HCIVERSION containing a BCD encoding
of the xHCI specification revision number, 0100h corresponds
to xHCI version 1.0. Change "100" as "0x100".
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: stable <stable@vger.kernel.org>
Fixes: 04abb6de28 ("xhci: Read and parse new xhci
1.1 capability register")
Signed-off-by: Peter Chen <peter.chen@nxp.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hcc_params is set in xhci_gen_setup() called from usb_add_hcd(),
so checks the Maximum Primary Stream Array Size in the hcc_params
register after adding primary hcd.
Signed-off-by: Chunfeng Yun <chunfeng.yun@mediatek.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit b0c1e95ab4. It
contains a flaw and the next version has more features added which makes
me want to move it to the next cycle.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
This reverts commit 02dbfa5e55. I grabbed
the wrong version from the list and will pull the proper one from Peter
Rosin's mux tree.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
After commit 7999eecb7e ("i2c: exynos5: fix arbitration lost handling"),
some I2C transactions are failing because the TRANSFER_DONE_AUTO field is
not set in the I2C_TRANS_STATUS register so the i2c->status value is left
to -EINVAL causing the i2c->msg_complete completion to never be signaled.
For example, when reading the time of an I2C rtc on an Exynos5800 machine:
$ cat /sys/class/rtc/rtc0/time
[ 25.924594] exynos5-hsi2c 12e10000.i2c: rx timeout
[ 65.028365] max77686-rtc max77802-rtc: Fail to read time reg(-22)
cat: /sys/class/rtc/rtc0/time: Invalid argument
The Exynos5422 manual states clearly that most I2C_TRANS_STATUS reg bits
(including TRANSFER_DONE_AUTO) are cleared after the register is read. So
reading has side effects and should only be done if HSI2C_INT_I2C was set.
Fixes: 7999eecb7e ("i2c: exynos5: fix arbitration lost handling")
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
vgic updates:
- Honour disabling the ITS
- Don't deadlock when deactivating own interrupts via MMIO
- Correctly expose the lact of IRQ/FIQ bypass on GICv3
I/O virtualization:
- Make KVM_CAP_NR_MEMSLOTS big enough for large guests with
many PCIe devices
General bug fixes:
- Gracefully handle exception generated with syndroms that
the host doesn't understand
- Properly invalidate TLBs on VHE systems
-----BEGIN PGP SIGNATURE-----
iQFJBAABCAAzFiEEaVjJ8iM8Xp1syoGKqzCcdLk7HqEFAljBIgcVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJEKswnHS5Ox6hXKUH/j2K+UJrvBISpjEnF8v1rXcXSMxZ
8z/MT/bsmjpX0t/+YBpmNYrEA06RCWGDt4J5kGnl4imS+Xtqr16XeKW/rrOlydE5
JlvKXs31AWxym9ASnmLf8A2rODNtOWXOUrbYLm6VDUHb8E5ou1U2ywSVOvXqMSYD
pHtoD9PRqIQqxJtcV5DtJ3Xgg7AsIdmeBtBz1UGmi9rwKMk1hfwXrIPb2HCraiuY
/11RdIuWy5py62fi+x+coyXtpbCyhDDcGjiHHu4eqCqXtIZ91KkDSPFXfx14PhHB
UTTUBPKFwCdY+FFsWJtX20I+jGQt1nVL1yMbytdVJbmOixtHPQkjz17ReRU=
=hplc
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm
KVM/ARM updates for v4.11-rc2
vgic updates:
- Honour disabling the ITS
- Don't deadlock when deactivating own interrupts via MMIO
- Correctly expose the lact of IRQ/FIQ bypass on GICv3
I/O virtualization:
- Make KVM_CAP_NR_MEMSLOTS big enough for large guests with
many PCIe devices
General bug fixes:
- Gracefully handle exception generated with syndroms that
the host doesn't understand
- Properly invalidate TLBs on VHE systems
Before trying to do nested_get_page() in nested_vmx_merge_msr_bitmap(),
we have already checked that the MSR bitmap address is valid (4k aligned
and within physical limits). SDM doesn't specify what happens if the
there is no memory mapped at the valid address, but Intel CPUs treat the
situation as if the bitmap was configured to trap all MSRs.
KVM already does that by returning false and a correct handling doesn't
need the guest-trigerrable warning that was reported by syzkaller:
(The warning was originally there to catch some possible bugs in nVMX.)
------------[ cut here ]------------
WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709
nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline]
WARNING: CPU: 0 PID: 7832 at arch/x86/kvm/vmx.c:9709
nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 7832 Comm: syz-executor1 Not tainted 4.10.0+ #229
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
panic+0x1fb/0x412 kernel/panic.c:179
__warn+0x1c4/0x1e0 kernel/panic.c:540
warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
nested_vmx_merge_msr_bitmap arch/x86/kvm/vmx.c:9709 [inline]
nested_get_vmcs12_pages+0xfb6/0x15c0 arch/x86/kvm/vmx.c:9640
enter_vmx_non_root_mode arch/x86/kvm/vmx.c:10471 [inline]
nested_vmx_run+0x6186/0xaab0 arch/x86/kvm/vmx.c:10561
handle_vmlaunch+0x1a/0x20 arch/x86/kvm/vmx.c:7312
vmx_handle_exit+0xfc0/0x3f00 arch/x86/kvm/vmx.c:8526
vcpu_enter_guest arch/x86/kvm/x86.c:6982 [inline]
vcpu_run arch/x86/kvm/x86.c:7044 [inline]
kvm_arch_vcpu_ioctl_run+0x1418/0x4840 arch/x86/kvm/x86.c:7205
kvm_vcpu_ioctl+0x673/0x1120 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2570
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
[Jim Mattson explained the bare metal behavior: "I believe this behavior
would be documented in the chipset data sheet rather than the SDM,
since the chipset returns all 1s for an unclaimed read."]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
* pm-cpufreq:
cpufreq: intel_pstate: Do not reinit performance limits in ->setpolicy
cpufreq: intel_pstate: Fix intel_pstate_verify_policy()
cpufreq: intel_pstate: Fix global settings in active mode
cpufreq: Add the "cpufreq.off=1" cmdline option
cpufreq: intel_pstate: Avoid triggering cpu_frequency tracepoint unnecessarily
cpufreq: intel_pstate: Fix intel_cpufreq_verify_policy()
cpufreq: intel_pstate: Do not use performance_limits in passive mode
- irqchip/crossbar: Some type tidying up
- irqchip/gicv3-its: Workaround for a Qualcomm erratum
- irqdomain: Compile for for systems that don't use CONFIG_IRQ_DOMAIN
-----BEGIN PGP SIGNATURE-----
iQFJBAABCAAzFiEEaVjJ8iM8Xp1syoGKqzCcdLk7HqEFAljBLCQVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJEKswnHS5Ox6h6h8IAJKt0NHSLV4vQH51MdX1flj8uyw0
xzeosx/r7hsV/apbaFcciJuaOZNDYdVH+a1NboklpDe2g8N1eSwSITodXnsg3seO
1EyGWkZT9lWbz8/febvG8WPLLZJu0CeD9wNRT7cePTUgNZgGRR1IN6ySM/TtWixA
csIL9kgdRvChcU6hydTPceBzN5pSpZ169+BTeEL5CVTggRiq24fJlmBwM7caOu1Z
nZDsKckUQA5D/VO0CiopE6BpCZTBzB6TTC2251DEEfVxyL6/M947lOtsC4EXK4y2
6CjWM4II24ImvPmqYDNIwzK1ZimhWO0Uo8jy1LZZKVpiXXXk8zm6LKxqJbU=
=JV9b
-----END PGP SIGNATURE-----
Merge tag 'irq-fixes-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip/irqdomain updates for 4.11-rc2 from Marc Zyngier
- irqchip/crossbar: Some type tidying up
- irqchip/gicv3-its: Workaround for a Qualcomm erratum
- irqdomain: Compile for for systems that don't use CONFIG_IRQ_DOMAIN
Fixed up minor conflict in the crossbar driver.
Here's a fix for a digi_acceleport regression in -rc1, and some fixes
for long-standing issues in three other drivers, including a
NULL-pointer dereference and a couple of information leaks that could be
triggered by a malicious device.
Signed-off-by: Johan Hovold <johan@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCAAvFiEEHszNKQClByu0A+9RQQ3kT97htJUFAljBKGkRHGpvaGFuQGtl
cm5lbC5vcmcACgkQQQ3kT97htJVKgA/+MjxjlUXCmwpgOrZNsdbDnS1mev1lUVVI
kVEN5o/mOmUHlLzfaBduQlNWqFENWyanfSpHvz5XJDTtD7T7+o0AJAiCW/tnf4jX
r4mOH/Y38L3KbnoWEVw7p7kcrUcghqpQiIsG9iK4wfG+d2bDl1rV7tt9i9FY3hah
5Sl7lzb38g/hQdJ0Hp5rL9sxN5dBgHcHkFchiPUSRrlar0cFM6xCD3U9cYakv6/D
ec31UuskIhKB6B0TzNNT+lkIOVFZEvwYnDzW3AkGYycvRtMZzjIr598x38Xb/wlm
K3qY9p6LR/oQCjqslLLU8FOnUofKwRkz9YdMzwORaOoVaOkyCbfQ4ESuave3wQGs
5kUoP83Fu4d+lQvj6jNS0s7a3BioAt7whqxfJcptqEevEVNwpNw73vooowKxqprq
c/4+KaYoUvITfu/q/Lbl9hfXxIzeUKdD/y1TLBvw9DLmrG5furbgdG3UfRD/gpkR
7DIPKacIhrA4omXM+HBT4yHDgwIbJmqFDRcDBP2+QcwKb+PLcs4UFVka50cxBwpu
iNHqw6ZllX54jVUInGii3wYEf2aYQJbK4x0QwCNGMUU7/ld6reRJDWoQHKAT936g
D6Boli3+fIHhjJjleyRc7+/Gvhff61DWun8aOV6iwYcQ6crUuGCEdSV+uDkQkH3P
DujQYqzsHss=
=wkt6
-----END PGP SIGNATURE-----
Merge tag 'usb-serial-4.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus
Johan writes:
USB-serial fixes for v4.11-rc2
Here's a fix for a digi_acceleport regression in -rc1, and some fixes
for long-standing issues in three other drivers, including a
NULL-pointer dereference and a couple of information leaks that could be
triggered by a malicious device.
Signed-off-by: Johan Hovold <johan@kernel.org>
A recent change claimed to fix an off-by-one error in the OOB-port
completion handler, but instead introduced such an error. This could
specifically led to modem-status changes going unnoticed, effectively
breaking TIOCMGET.
Note that the offending commit fixes a loop-condition underflow and is
marked for stable, but should not be backported without this fix.
Reported-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 2d38088921 ("USB: serial: digi_acceleport: fix OOB data sanity check")
Cc: stable <stable@vger.kernel.org> # v2.6.30
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The platform_data header file was dropped in the merged version of the
USB251xB driver. Therefore remove its reference from the MAINTAINERS file.
Signed-off-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark the reg property as required and furthermore fix some typos and
spellings in the documentation.
Signed-off-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Rename oc-delay-* to oc-delay-us and make it expect a time value.
Furthermore add -ms suffix to power-on-time. There changes were
suggested by Rob Herring in https://lkml.org/lkml/2017/2/15/1283.
Signed-off-by: Richard Leitner <richard.leitner@skidata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Remove the max_{power,current}_{sp,bp} properties of the usb251xb driver
from devicetree. This is done to simplify the dt bindings as requested
by Rob Herring in https://lkml.org/lkml/2017/2/15/1283. If those
properties are ever needed by somebody they can be enabled again easily.
Signed-off-by: Richard Leitner <richard.leitner@skidata.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This USB-SATA bridge chip is used in a StarTech enclosure for
optical drives.
Without the quirk MakeMKV fails during the key exchange with an
installed BluRay drive:
> Error 'Scsi error - ILLEGAL REQUEST:COPY PROTECTION KEY EXCHANGE FAILURE - KEY NOT ESTABLISHED'
> occurred while issuing SCSI command AD010..080002400 to device 'SG:dev_11:2'
Signed-off-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to verify that we have the required interrupt-out endpoint for
IOWarrior56 devices to avoid dereferencing a NULL-pointer in write
should a malicious device lack such an endpoint.
Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Cc: stable <stable@vger.kernel.org> # 2.6.21
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Make sure to check for the required interrupt-in endpoint to avoid
dereferencing a NULL-pointer should a malicious device lack such an
endpoint.
Note that a fairly recent change purported to fix this issue, but added
an insufficient test on the number of endpoints only, a test which can
now be removed.
Fixes: 4ec0ef3a82 ("USB: iowarrior: fix oops with malicious USB descriptors")
Fixes: 946b960d13 ("USB: add driver for iowarrior devices.")
Cc: stable <stable@vger.kernel.org> # 2.6.21
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver doesn't have a struct of_device_id table but supported devices
are registered via Device Trees. This is working on the assumption that a
I2C device registered via OF will always match a legacy I2C device ID and
that the MODALIAS reported will always be of the form i2c:<device>.
But this could change in the future so the correct approach is to have an
OF device ID table if the devices are registered via OF.
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In patch 2e2aa1bc7eff90ecm, USB suspend and wakeup control requests are
passed to SFR_OHCIICR register. If a processor does not have such a
register, this hub control request will be dropped.
If no such a SFR register is available, all USB suspend control requests
will now be processed using ohci_hub_control()
(like before patch 2e2aa1bc7eff90ecm.)
Tested on an Atmel AT91SAM9G20 with an on-board TI TUSB2046B hub chip
If the last USB device is unplugged from the USB hub, the hub goes into
sleep and will not wakeup when an USB devices is inserted.
Fixes: 2e2aa1bc7e ("usb: ohci-at91: Forcibly suspend ports while USB suspend")
Signed-off-by: Jelle Martijn Kok <jmkok@youcom.nl>
Tested-by: Wenyou Yang <wenyou.yang@atmel.com>
Cc: Wenyou Yang <wenyou.yang@atmel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@vger.kernel.org>
Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com>
Reviewed-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Having only 32 memslots is a real constraint for the maximum
number of PCI devices that can be assigned to a single guest.
Assuming each PCI device/virtual function having two memory BAR
regions, we could assign only 15 devices/virtual functions to a
guest.
Hence increase KVM_USER_MEM_SLOTS to 512 as done in other archs like
powerpc.
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Linu Cherian <linu.cherian@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Return KVM_USER_MEM_SLOTS for userspace capability query on
NR_MEMSLOTS.
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Linu Cherian <linu.cherian@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>