Commit Graph

692584 Commits

Author SHA1 Message Date
Rafael J. Wysocki 3de559d4df Merge branches 'acpi-soc', 'acpi-wdat' and 'acpi-cppc'
* acpi-soc:
  ACPI: APD: Fix HID for Hisilicon Hip07/08
  ACPI / LPSS: Only call pwm_add_table() for the first PWM controller

* acpi-wdat:
  ACPI / watchdog: Fix init failure with overlapping register regions

* acpi-cppc:
  mailbox: pcc: Fix crash when request PCC channel 0
2017-08-03 20:30:18 +02:00
Rafael J. Wysocki 78aa904a77 Merge branches 'pm-core' and 'pm-misc'
* pm-core:
  PM / runtime: Document new pm_runtime_set_suspended() constraint

* pm-misc:
  thunderbolt: icm: Ignore mailbox errors in icm_suspend()
2017-08-03 20:29:45 +02:00
Rafael J. Wysocki 8a05c3115d Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate'
* pm-cpufreq-x86:
  cpufreq: x86: Make scaling_cur_freq behave more as expected

* pm-cpufreq-docs:
  cpufreq: docs: Add missing cpuinfo_cur_freq description

* intel_pstate:
  cpufreq: intel_pstate: Drop ->get from intel_pstate structure
2017-08-03 20:29:24 +02:00
Radim Krčmář 53a5abd839 KVM/ARM Fixes for v4.13-rc4
- Yet another race with VM destruction plugged
 - A set of small vgic fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlmDOZ4VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDnwQP/ji7d7bbWybmeuWKXm8TYt6CNPBr
 JC57OQHytU1+ccagjeUPaAgUAqWSQWOwfbqPuS1afshJu1ZZsN851QNmcr5W6bXQ
 EwjICGWcAdKSMLnhRDdf+uXQbFSkkcaxsXnJWjmD0EE8ylaXCCkdV278xhTx0hVO
 yiov/xWNzLMa3O3W/l4SIKQ4UNyeQ7f+Od1Vkf7iUpaaFcz6s3RNsPAOwc7Thq7L
 eLk4SgXMLDNHz5HwdLoOp3RwQsNe9A7uh05z9VOav0YAyLG5vf29JhMroCO/4P/x
 1HgWvpGwzxAUqUcHbW4FSOsbydEltlWMdfSoAF7BtPCLudmmsxfMJJlK3FKLvV+P
 MsO2FdXiz6/ZLI9/ds7YJDOfc1cGSVK07Efx+SLXD5FAnkFYq4SElmI4sK8BWc5U
 Dugw3j8u9M6jTiVKBioFo7yRT5iHG9O0A+6DtsMQ0iREfLolC7EEzpfGB/vWpKk5
 BbdwDUiu6JxXEBJJqyP948iGfuIrikAx2mcf3dmEHVKhEnKi+MN/H5c9BKGAJ3sf
 Fm1xF99IG+m3L3zXiukQ11OqsCx32kqLiXaejVttscFg/C33OYmGOhsk066hHCzx
 LtoxrHUl7rwF+ssnKHq4Az2mYzZtquEq13O5tgleQcVwH/2+6EgWDf3mDSy1em0h
 5IXgqk+PsaQ5n0jL
 =TMR9
 -----END PGP SIGNATURE-----

Merge tag 'kvm-arm-for-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm

KVM/ARM Fixes for v4.13-rc4

- Yet another race with VM destruction plugged
- A set of small vgic fixes
2017-08-03 17:59:58 +02:00
Christoffer Dall 3af4e414af KVM: arm/arm64: vgic: Use READ_ONCE fo cmpxchg
There is a small chance that the compiler could generate separate loads
for the dist->propbaser which could be modified from another CPU.  As we
want to make sure we atomically update the entire value, and don't race
with other updates, guarantee that the cmpxchg operation compares
against the original value.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2017-08-03 15:47:36 +01:00
Wanpeng Li 6550c4df7e KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit"
------------[ cut here ]------------
 WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
 CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
 RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
Call Trace:
  vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
  ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
  kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
  ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
  ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
  kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
  ? __fget+0xfc/0x210
  do_vfs_ioctl+0xa4/0x6a0
  ? __fget+0x11d/0x210
  SyS_ioctl+0x79/0x90
  do_syscall_64+0x8f/0x750
  ? trace_hardirqs_on_thunk+0x1a/0x1c
  entry_SYSCALL64_slow_path+0x25/0x25

This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which
means that tells the kernel to not make use of any IOAPICs that may be present
in the system.

Actually external_intr variable in nested_vmx_vmexit() is the req_int_win
variable passed from vcpu_enter_guest() which means that the L0's userspace
requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) &&
L0's userspace reqeusts an irq window) is true, so there is no interrupt which
L1 requires to inject to L2, we should not attempt to emualte "Acknowledge
interrupt on exit" for the irq window requirement in this scenario.

This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit"
if there is no L1 requirement to inject an interrupt to L2.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
[Added code comment to make it obvious that the behavior is not correct.
 We should do a userspace exit with open interrupt window instead of the
 nested VM exit.  This patch still improves the behavior, so it was
 accepted as a (temporary) workaround.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-03 15:38:11 +02:00
Shawn Lin 7c84b8b43d mmc: block: bypass the queue even if usage is present for hotplug
The commit 304419d8a7 ("mmc: core: Allocate per-request data using the
block layer core") refactored mechanism of queue handling caused
mmc_init_request() can be called just after mmc_cleanup_queue() caused null
pointer dereference.

Another commit bbdc74dc19 ("mmc: block: Prevent new req entering queue
after its cleanup") tried to fix the problem. However it actually miss one
corner case.

We could still reproduce the issue mentioned with these steps:
(1) insert a SD card and mount it
(2) hotplug it, so it will leave md->usage still be counted
(3) reboot the system which will sync data and umount the card

[Unable to handle kernel NULL pointer dereference at virtual address
00000000
[user pgtable: 4k pages, 48-bit VAs, pgd = ffff80007bab3000
[[0000000000000000] *pgd=000000007a828003, *pud=0000000078dce003,
*pmd=000000007aab6003, *pte=0000000000000000
[Internal error: Oops: 96000007 [#1] PREEMPT SMP
[Modules linked in:
[CPU: 3 PID: 3507 Comm: umount Tainted: G        W
4.13.0-rc1-next-20170720-00012-g9d9bf45 #33
[Hardware name: Firefly-RK3399 Board (DT)
[task: ffff80007a1de200 task.stack: ffff80007a01c000
[PC is at mmc_init_request+0x14/0xc4
[LR is at alloc_request_size+0x4c/0x74
[pc : [<ffff0000087d7150>] lr : [<ffff000008378fe0>] pstate: 600001c5
[sp : ffff80007a01f8f0

....

[[<ffff0000087d7150>] mmc_init_request+0x14/0xc4
[[<ffff000008378fe0>] alloc_request_size+0x4c/0x74
[[<ffff00000817ac28>] mempool_create_node+0xb8/0x17c
[[<ffff00000837aadc>] blk_init_rl+0x9c/0x120
[[<ffff000008396580>] blkg_alloc+0x110/0x234
[[<ffff000008396ac8>] blkg_create+0x424/0x468
[[<ffff00000839877c>] blkg_lookup_create+0xd8/0x14c
[[<ffff0000083796bc>] generic_make_request_checks+0x368/0x3b0
[[<ffff00000837b050>] generic_make_request+0x1c/0x240

So mmc_blk_put wouldn't calling blk_cleanup_queue which actually the
QUEUE_FLAG_DYING and QUEUE_FLAG_BYPASS should stay. Block core expect
blk_queue_bypass_{start, end} internally to bypass/drain the queue before
actually dying the queue, so it didn't expose API to set the queue bypass.
I think we should set QUEUE_FLAG_BYPASS whenever queue is removed, although
the md->usage is still counted, as no dispatch queue could be found then.

Fixes: 304419d8a7 ("mmc: core: Allocate per-request data using the block layer core")
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-03 11:00:39 +02:00
Ludovic Desroches 7a1e3f1431 mmc: sdhci-of-at91: force card detect value for non removable devices
When the device is non removable, the card detect signal is often used
for another purpose i.e. muxed to another SoC peripheral or used as a
GPIO. It could lead to wrong behaviors depending the default value of
this signal if not muxed to the SDHCI controller.

Fixes: bb5f8ea4d5 ("mmc: sdhci-of-at91: introduce driver for the Atmel SDMMC")
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2017-08-03 10:50:55 +02:00
Linus Torvalds 19ec50a438 Two more NFS client bugfixes for 4.13
Stable fix:
 - Fix EXCHANGE_ID corrupt verifier issue
 
 Other fix:
 - Fix double frees in nfs4_test_session_trunk()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAlmCNJAACgkQ18tUv7Cl
 QOvlKhAAjjKMtdwYDV7B3SJvDfa5KNH9nNaMjKoD/OzWavDL7jRVRUWayeJiy4sf
 wJAMsG280ztag1Mr/OnClESZThWNrHTnKjq/P8kuefJz+pPvWP+AaQ/8QSM/rulM
 Y7qfh+FZb2t7lK80VAZTXEi4bvQzlkIV2285a34VIUm3VTpwl3HvltkBMLFztDSb
 sdaQh+N48FOYdG9RMncxpzFfRvUILzP9GFMg6WAFAcbr1wSsT9MfLqd+ANgOidE8
 X2Ch9l0jN51m1dFGmMUZ5HMmuhUh2m50SXhmUOTpS7fj+k11OWsV2IIWHKPk/LjI
 5E0aUPDU2TOE9noLSfkguAyxvqAGeAHPDIE7QM9vTGgktzXWGzh++ODeSzkIDp+5
 iZ+cYLkme1lOg1nEqj1/SrIZXHP7ZCvK8iqNgoqT8rIp6zE1tLPnNAbRDmnwC/VH
 PrfINATV8llN/3vFojWJfD1enDe3+dALPINLVQOjwjafq6d5/hzRdCGqWpCDjmlU
 esDoCkoWGUjadIr6EZGtDAzSZgafsUx6d6QnGtkIUsz1d0FIXH6holB5EKaQpOLf
 dVb9iO5R2EcVP+WvB3KjA5y6MCNvoVqebxMvLBPCYsu3fI8fQ0sgSjgSJXi0fRzg
 G7DUsKcBGVKarDMXTUReL2G6lN7h0f5EsM1WnA9KF0TV9aah/ZE=
 =Ddq9
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:
 "Two fixes from Trond this time, now that he's back from his vacation.
  The first is a stable fix for the EXCHANGE_ID issue on the mailing
  list, and the other fixes a double-free situation that he found at the
  same time.

  Stable fix:
   - Fix EXCHANGE_ID corrupt verifier issue

  Other fix:
   - Fix double frees in nfs4_test_session_trunk()"

* tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4: Fix double frees in nfs4_test_session_trunk()
  NFSv4: Fix EXCHANGE_ID corrupt verifier issue
2017-08-02 20:56:44 -07:00
Annie Cherkaev 9f5af546e6 isdn/i4l: fix buffer overflow
This fixes a potential buffer overflow in isdn_net.c caused by an
unbounded strcpy.

[ ISDN seems to be effectively unmaintained, and the I4L driver in
  particular is long deprecated, but in case somebody uses this..
    - Linus ]

Signed-off-by: Jiten Thakkar <jitenmt@gmail.com>
Signed-off-by: Annie Cherkaev <annie.cherk@gmail.com>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 20:43:36 -07:00
Tero Kristo f54d2cd3c1 clk: keystone: sci-clk: Fix sci_clk_get
Currently a bug in the sci_clk_get implementation causes it to always
return a clock belonging to the last device in the static list of clock
data. This is due to a bug in the init code that causes the array
used by sci_clk_get to only be populated with the clocks for the last
device, as each device overwrites the entire array with its own clocks.

Fix this by calculating the actual number of clocks for the SoC, and
allocating the whole array in one go. Also, we don't need the handle
to the init data array anymore after doing this, instead we can
just compare the dev_id / clk_id against the registered clocks and
use binary search for speed.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
Reported-by: Dave Gerlach <d-gerlach@ti.com>
Fixes: b745c0794e ("clk: keystone: Add sci-clk driver support")
Cc: Nishanth Menon <nm@ti.com>
Tested-by: Franklin Cooper <fcooper@ti.com>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2017-08-02 18:37:26 -07:00
Jan Kara 19ec8e4858 ocfs2: don't clear SGID when inheriting ACLs
When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
set, DIR1 is expected to have SGID bit set (and owning group equal to
the owning group of 'DIR0').  However when 'DIR0' also has some default
ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
'DIR1' to get cleared if user is not member of the owning group.

Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
into ocfs2_iop_set_acl().  That way the function will not be called when
inheriting ACLs which is what we want as it prevents SGID bit clearing
and the mode has been properly set by posix_acl_create() anyway.  Also
posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
mode itself.

Fixes: 073931017b ("posix_acl: Clear SGID bit when setting file permissions")
Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:13 -07:00
Kan Liang 1ee1c3f5b5 mm: allow page_cache_get_speculative in interrupt context
Kernel panic when calling the IRQ-safe __get_user_pages_fast in NMI
handler.

The bug was introduced by commit 2947ba054a ("x86/mm/gup: Switch GUP
to the generic get_user_page_fast() implementation").

The original x86 __get_user_page_fast used plain get_page() or
page_ref_add().  However, the generic __get_user_page_fast uses
page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).

There is no reason to prevent page_cache_get_speculative from using in
interrupt context.  According to the author, putting a BUG_ON there is
just because the code is not verifying correctness of interrupt races.
I did some tests in interrupt context.  There is no issue found.

Removing VM_BUG_ON(in_interrupt()) for page_cache_get_speculative().

Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
Fixes: 2947ba054a ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
Signed-off-by: Kan Liang <kan.liang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ying Huang <ying.huang@intel.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:13 -07:00
Mike Rapoport 5a18b64e3f userfaultfd: non-cooperative: flush event_wqh at release time
There may still be threads waiting on event_wqh at the time the
userfault file descriptor is closed.  Flush the events wait-queue to
prevent waiting threads from hanging.

Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes: 9cd75c3cd4 ("userfaultfd: non-cooperative: add ability to report
non-PF events from uffd descriptor")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:13 -07:00
Kees Cook ade9f91b32 ipc: add missing container_of()s for randstruct
When building with the randstruct gcc plugin, the layout of the IPC
structs will be randomized, which requires any sub-structure accesses to
use container_of().  The proc display handlers were missing the needed
container_of()s since the iterator is passing in the top-level struct
kern_ipc_perm.

This would lead to crashes when running the "lsipc" program after the
system had IPC registered (e.g. after starting up Gnome):

  general protection fault: 0000 [#1] PREEMPT SMP
  ...
  RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
  ...
  Call Trace:
    sysvipc_shm_proc_show+0x5e/0x150
    sysvipc_proc_show+0x1a/0x30
    seq_read+0x2e9/0x3f0
  ...

Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
Fixes: 3859a271a0 ("randstruct: Mark various structs for randomization")
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Dima Zavin 89affbf5d9 cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
In codepaths that use the begin/retry interface for reading
mems_allowed_seq with irqs disabled, there exists a race condition that
stalls the patch process after only modifying a subset of the
static_branch call sites.

This problem manifested itself as a deadlock in the slub allocator,
inside get_any_partial.  The loop reads mems_allowed_seq value (via
read_mems_allowed_begin), performs the defrag operation, and then
verifies the consistency of mem_allowed via the read_mems_allowed_retry
and the cookie returned by xxx_begin.

The issue here is that both begin and retry first check if cpusets are
enabled via cpusets_enabled() static branch.  This branch can be
rewritted dynamically (via cpuset_inc) if a new cpuset is created.  The
x86 jump label code fully synchronizes across all CPUs for every entry
it rewrites.  If it rewrites only one of the callsites (specifically the
one in read_mems_allowed_retry) and then waits for the
smp_call_function(do_sync_core) to complete while a CPU is inside the
begin/retry section with IRQs off and the mems_allowed value is changed,
we can hang.

This is because begin() will always return 0 (since it wasn't patched
yet) while retry() will test the 0 against the actual value of the seq
counter.

The fix is to use two different static keys: one for begin
(pre_enable_key) and one for retry (enable_key).  In cpuset_inc(), we
first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
always return a valid seqcount if are enabling cpusets.  Similarly, when
disabling cpusets via cpuset_dec(), we first ensure that callers of
cpuset_mems_allowed_retry() will start ignoring the seqcount value
before we let cpuset_mems_allowed_begin() return 0.

The relevant stack traces of the two stuck threads:

  CPU: 1 PID: 1415 Comm: mkdir Tainted: G L  4.9.36-00104-g540c51286237 #4
  Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
  task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
  RIP: smp_call_function_many+0x1f9/0x260
  Call Trace:
    smp_call_function+0x3b/0x70
    on_each_cpu+0x2f/0x90
    text_poke_bp+0x87/0xd0
    arch_jump_label_transform+0x93/0x100
    __jump_label_update+0x77/0x90
    jump_label_update+0xaa/0xc0
    static_key_slow_inc+0x9e/0xb0
    cpuset_css_online+0x70/0x2e0
    online_css+0x2c/0xa0
    cgroup_apply_control_enable+0x27f/0x3d0
    cgroup_mkdir+0x2b7/0x420
    kernfs_iop_mkdir+0x5a/0x80
    vfs_mkdir+0xf6/0x1a0
    SyS_mkdir+0xb7/0xe0
    entry_SYSCALL_64_fastpath+0x18/0xad

  ...

  CPU: 2 PID: 1 Comm: init Tainted: G L  4.9.36-00104-g540c51286237 #4
  Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
  task: ffff8818087c0000 task.stack: ffffc90000030000
  RIP: int3+0x39/0x70
  Call Trace:
    <#DB> ? ___slab_alloc+0x28b/0x5a0
    <EOE> ? copy_process.part.40+0xf7/0x1de0
    __slab_alloc.isra.80+0x54/0x90
    copy_process.part.40+0xf7/0x1de0
    copy_process.part.40+0xf7/0x1de0
    kmem_cache_alloc_node+0x8a/0x280
    copy_process.part.40+0xf7/0x1de0
    _do_fork+0xe7/0x6c0
    _raw_spin_unlock_irq+0x2d/0x60
    trace_hardirqs_on_caller+0x136/0x1d0
    entry_SYSCALL_64_fastpath+0x5/0xad
    do_syscall_64+0x27/0x350
    SyS_clone+0x19/0x20
    do_syscall_64+0x60/0x350
    entry_SYSCALL64_slow_path+0x25/0x25

Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
Fixes: 46e700abc4 ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
Signed-off-by: Dima Zavin <dmitriyz@waymo.com>
Reported-by: Cliff Spradlin <cspradlin@waymo.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Mike Rapoport 9d95aa4bad userfaultfd_zeropage: return -ENOSPC in case mm has gone
In the non-cooperative userfaultfd case, the process exit may race with
outstanding mcopy_atomic called by the uffd monitor.  Returning -ENOSPC
instead of -EINVAL when mm is already gone will allow uffd monitor to
distinguish this case from other error conditions.

Unfortunately I overlooked userfaultfd_zeropage when updating
userfaultd_copy().

Link: http://lkml.kernel.org/r/1501136819-21857-1-git-send-email-rppt@linux.vnet.ibm.com
Fixes: 96333187ab ("userfaultfd_copy: return -ENOSPC in case mm has gone")
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:12 -07:00
Heiko Carstens 167d0f258f mm: take memory hotplug lock within numa_zonelist_order_handler()
Andre Wild reported the following warning:

  WARNING: CPU: 2 PID: 1205 at kernel/cpu.c:240 lockdep_assert_cpus_held+0x4c/0x60
  Modules linked in:
  CPU: 2 PID: 1205 Comm: bash Not tainted 4.13.0-rc2-00022-gfd2b2c57ec20 #10
  Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
  task: 00000000701d8100 task.stack: 0000000073594000
  Krnl PSW : 0704f00180000000 0000000000145e24 (lockdep_assert_cpus_held+0x4c/0x60)
  ...
  Call Trace:
   lockdep_assert_cpus_held+0x42/0x60)
   stop_machine_cpuslocked+0x62/0xf0
   build_all_zonelists+0x92/0x150
   numa_zonelist_order_handler+0x102/0x150
   proc_sys_call_handler.isra.12+0xda/0x118
   proc_sys_write+0x34/0x48
   __vfs_write+0x3c/0x178
   vfs_write+0xbc/0x1a0
   SyS_write+0x66/0xc0
   system_call+0xc4/0x2b0
   locks held by bash/1205:
   #0:  (sb_writers#4){.+.+.+}, at: vfs_write+0xa6/0x1a0
   #1:  (zl_order_mutex){+.+...}, at: numa_zonelist_order_handler+0x44/0x150
   #2:  (zonelists_mutex){+.+...}, at: numa_zonelist_order_handler+0xf4/0x150
  Last Breaking-Event-Address:
    lockdep_assert_cpus_held+0x48/0x60

This can be easily triggered with e.g.

    echo n > /proc/sys/vm/numa_zonelist_order

In commit 3f906ba236 ("mm/memory-hotplug: switch locking to a percpu
rwsem") memory hotplug locking was changed to fix a potential deadlock.

This also switched the stop_machine() invocation within
build_all_zonelists() to stop_machine_cpuslocked() which now expects
that online cpus are locked when being called.

This assumption is not true if build_all_zonelists() is being called
from numa_zonelist_order_handler().

In order to fix this simply add a mem_hotplug_begin()/mem_hotplug_done()
pair to numa_zonelist_order_handler().

Link: http://lkml.kernel.org/r/20170726111738.38768-1-heiko.carstens@de.ibm.com
Fixes: 3f906ba236 ("mm/memory-hotplug: switch locking to a percpu rwsem")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Andre Wild <wild@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:11 -07:00
Tetsuo Handa b0ba2d0faf mm/page_io.c: fix oops during block io poll in swapin path
When a thread is OOM-killed during swap_readpage() operation, an oops
occurs because end_swap_bio_read() is calling wake_up_process() based on
an assumption that the thread which called swap_readpage() is still
alive.

  Out of memory: Kill process 525 (polkitd) score 0 or sacrifice child
  Killed process 525 (polkitd) total-vm:528128kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
  oom_reaper: reaped process 525 (polkitd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
  general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
  Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp ppdev pcspkr vmw_balloon sg shpchp vmw_vmci parport_pc parport i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi vmwgfx ahci libahci drm_kms_helper ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi scsi_transport_spi ttm e1000 mptscsih drm mptbase i2c_core libata serio_raw
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc2-next-20170725 #129
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
  task: ffffffffb7c16500 task.stack: ffffffffb7c00000
  RIP: 0010:__lock_acquire+0x151/0x12f0
  Call Trace:
   <IRQ>
   lock_acquire+0x59/0x80
   _raw_spin_lock_irqsave+0x3b/0x4f
   try_to_wake_up+0x3b/0x410
   wake_up_process+0x10/0x20
   end_swap_bio_read+0x6f/0xf0
   bio_endio+0x92/0xb0
   blk_update_request+0x88/0x270
   scsi_end_request+0x32/0x1c0
   scsi_io_completion+0x209/0x680
   scsi_finish_command+0xd4/0x120
   scsi_softirq_done+0x120/0x140
   __blk_mq_complete_request_remote+0xe/0x10
   flush_smp_call_function_queue+0x51/0x120
   generic_smp_call_function_single_interrupt+0xe/0x20
   smp_trace_call_function_single_interrupt+0x22/0x30
   smp_call_function_single_interrupt+0x9/0x10
   call_function_single_interrupt+0xa7/0xb0
   </IRQ>
  RIP: 0010:native_safe_halt+0x6/0x10
   default_idle+0xe/0x20
   arch_cpu_idle+0xa/0x10
   default_idle_call+0x1e/0x30
   do_idle+0x187/0x200
   cpu_startup_entry+0x6e/0x70
   rest_init+0xd0/0xe0
   start_kernel+0x456/0x477
   x86_64_start_reservations+0x24/0x26
   x86_64_start_kernel+0xf7/0x11a
   secondary_startup_64+0xa5/0xa5
  Code: c3 49 81 3f 20 9e 0b b8 41 bc 00 00 00 00 44 0f 45 e2 83 fe 01 0f 87 62 ff ff ff 89 f0 49 8b 44 c7 08 48 85 c0 0f 84 52 ff ff ff <f0> ff 80 98 01 00 00 8b 3d 5a 49 c4 01 45 8b b3 18 0c 00 00 85
  RIP: __lock_acquire+0x151/0x12f0 RSP: ffffa01f39e03c50
  ---[ end trace 6c441db499169b1e ]---
  Kernel panic - not syncing: Fatal exception in interrupt
  Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
  ---[ end Kernel panic - not syncing: Fatal exception in interrupt

Fix it by holding a reference to the thread.

[akpm@linux-foundation.org: add comment]
Fixes: 23955622ff ("swap: add block io poll in swapin path")
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Shaohua Li <shli@fb.com>
Cc: Tim Chen <tim.c.chen@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jens Axboe <axboe@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 17:16:11 -07:00
Minchan Kim 3189c82056 zram: do not free pool->size_class
Mike reported kernel goes oops with ltp:zram03 testcase.

  zram: Added device: zram0
  zram0: detected capacity change from 0 to 107374182400
  BUG: unable to handle kernel paging request at 0000306d61727a77
  IP: zs_map_object+0xb9/0x260
  PGD 0
  P4D 0
  Oops: 0000 [#1] SMP
  Dumping ftrace buffer:
     (ftrace buffer empty)
  Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
   nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
  CPU: 6 PID: 12356 Comm: swapon Tainted: G            E   4.13.0.g87b2c3f-default #194
  Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698     , BIOS -[D6E150AUS-1.10]- 12/15/2010
  task: ffff880158d2c4c0 task.stack: ffffc90001680000
  RIP: 0010:zs_map_object+0xb9/0x260
  Call Trace:
   zram_bvec_rw.isra.26+0xe8/0x780 [zram]
   zram_rw_page+0x6e/0xa0 [zram]
   bdev_read_page+0x81/0xb0
   do_mpage_readpage+0x51a/0x710
   mpage_readpages+0x122/0x1a0
   blkdev_readpages+0x1d/0x20
   __do_page_cache_readahead+0x1b2/0x270
   ondemand_readahead+0x180/0x2c0
   page_cache_sync_readahead+0x31/0x50
   generic_file_read_iter+0x7e7/0xaf0
   blkdev_read_iter+0x37/0x40
   __vfs_read+0xce/0x140
   vfs_read+0x9e/0x150
   SyS_read+0x46/0xa0
   entry_SYSCALL_64_fastpath+0x1a/0xa5
  Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 <8b> 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
  RIP: zs_map_object+0xb9/0x260 RSP: ffffc90001683988
  CR2: 0000306d61727a77

He bisected the problem is [1].

After commit cf8e0fedf0 ("mm/zsmalloc: simplify zs_max_alloc_size
handling"), zram doesn't use double pointer for pool->size_class any
more in zs_create_pool so counter function zs_destroy_pool don't need to
free it, either.

Otherwise, it does kfree wrong address and then, kernel goes Oops.

Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
Fixes: cf8e0fedf0 ("mm/zsmalloc: simplify zs_max_alloc_size handling")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:47 -07:00
Jonathan Corbet d16977f3a6 kthread: fix documentation build warning
The kerneldoc comment for kthread_create() had an incorrect argument
name, leading to a warning in the docs build.

Correct it, and make one more small step toward a warning-free build.

Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.net
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:47 -07:00
Arnd Bergmann e7701557bf kasan: avoid -Wmaybe-uninitialized warning
gcc-7 produces this warning:

  mm/kasan/report.c: In function 'kasan_report':
  mm/kasan/report.c:351:3: error: 'info.first_bad_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     print_shadow_for_address(info->first_bad_addr);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  mm/kasan/report.c:360:27: note: 'info.first_bad_addr' was declared here

The code seems fine as we only print info.first_bad_addr when there is a
shadow, and we always initialize it in that case, but this is relatively
hard for gcc to figure out after the latest rework.

Adding an intialization to the most likely value together with the other
struct members shuts up that warning.

Fixes: b235b9808664 ("kasan: unify report headers")
Link: https://patchwork.kernel.org/patch/9641417/
Link: http://lkml.kernel.org/r/20170725152739.4176967-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Alexander Potapenko <glider@google.com>
Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Mike Rapoport b228237193 userfaultfd: non-cooperative: notify about unmap of destination during mremap
When mremap is called with MREMAP_FIXED it unmaps memory at the
destination address without notifying userfaultfd monitor.

If the destination were registered with userfaultfd, the monitor has no
way to distinguish between the old and new ranges and to properly relate
the page faults that would occur in the destination region.

Fixes: 897ab3e0c4 ("userfaultfd: non-cooperative: add event for memory unmaps")
Link: http://lkml.kernel.org/r/1500276876-3350-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Mel Gorman 3ea277194d mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

        CPU0                            CPU1
        ----                            ----
                                        user accesses memory using RW PTE
                                        [PTE now cached in TLB]
        try_to_unmap_one()
        ==> ptep_get_and_clear()
        ==> set_tlb_ubc_flush_pending()
                                        mprotect(addr, PROT_READ)
                                        ==> change_pte_range()
                                        ==> [ PTE non-present - no flush ]

                                        user writes using cached RW PTE
        ...

        try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access.  For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB.  However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption.  In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch.  Other users such as gup as a race with
reclaim looks just at PTEs.  huge page variants should be ok as they
don't race with reclaim.  mincore only looks at PTEs.  userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected.  His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: <stable@vger.kernel.org>	[v4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Kefeng Wang 27e37d84e5 pid: kill pidhash_size in pidhash_init()
After commit 3d375d7859 ("mm: update callers to use HASH_ZERO flag"),
drop unused pidhash_size in pidhash_init().

Link: http://lkml.kernel.org/r/1500389267-49222-1-git-send-email-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Pavel Tatashin <Pasha.Tatashin@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
Daniel Jordan 2be7cfed99 mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
Commit 9a291a7c94 ("mm/hugetlb: report -EHWPOISON not -EFAULT when
FOLL_HWPOISON is specified") causes __get_user_pages to ignore certain
errors from follow_hugetlb_page.  After such error, __get_user_pages
subsequently calls faultin_page on the same VMA and start address that
follow_hugetlb_page failed on instead of returning the error immediately
as it should.

In follow_hugetlb_page, when hugetlb_fault returns a value covered under
VM_FAULT_ERROR, follow_hugetlb_page returns it without setting nr_pages
to 0 as __get_user_pages expects in this case, which causes the
following to happen in __get_user_pages: the "while (nr_pages)" check
succeeds, we skip the "if (!vma..." check because we got a VMA the last
time around, we find no page with follow_page_mask, and we call
faultin_page, which calls hugetlb_fault for the second time.

This issue also slightly changes how __get_user_pages works.  Before, it
only returned error if it had made no progress (i = 0).  But now,
follow_hugetlb_page can clobber "i" with an error code since its new
return path doesn't check for progress.  So if "i" is nonzero before a
failing call to follow_hugetlb_page, that indication of progress is lost
and __get_user_pages can return error even if some pages were
successfully pinned.

To fix this, change follow_hugetlb_page so that it updates nr_pages,
allowing __get_user_pages to fail immediately and restoring the "error
only if no progress" behavior to __get_user_pages.

Tested that __get_user_pages returns when expected on error from
hugetlb_fault in follow_hugetlb_page.

Fixes: 9a291a7c94 ("mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified")
Link: http://lkml.kernel.org/r/1500406795-58462-1-git-send-email-daniel.m.jordan@oracle.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: James Morse <james.morse@arm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: zhong jiang <zhongjiang@huawei.com>
Cc: <stable@vger.kernel.org>	[4.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-02 16:34:46 -07:00
David Matlack c9f04407f2 KVM: nVMX: mark vmcs12 pages dirty on L2 exit
The host physical addresses of L1's Virtual APIC Page and Posted
Interrupt descriptor are loaded into the VMCS02. The CPU may write
to these pages via their host physical address while L2 is running,
bypassing address-translation-based dirty tracking (e.g. EPT write
protection). Mark them dirty on every exit from L2 to prevent them
from getting out of sync with dirty tracking.

Also mark the virtual APIC page and the posted interrupt descriptor
dirty when KVM is virtualizing posted interrupt processing.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02 22:41:04 +02:00
David Matlack 8ca44e88c3 kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown
According to the Intel SDM, software cannot rely on the current VMCS to be
coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12
flushes.

24.11.1 Software Use of Virtual-Machine Control Structures
...
  If a logical processor leaves VMX operation, any VMCSs active on
  that logical processor may be corrupted (see below). To prevent
  such corruption of a VMCS that may be used either after a return
  to VMX operation or on another logical processor, software should
  execute VMCLEAR for that VMCS before executing the VMXOFF instruction
  or removing power from the processor (e.g., as part of a transition
  to the S3 and S4 power states).
...

This fixes a "suspicious rcu_dereference_check() usage!" warning during
kvm_vm_release() because nested_release_vmcs12() calls
kvm_vcpu_write_guest_page() without holding kvm->srcu.

Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02 22:41:03 +02:00
Paolo Bonzini 9f744c5974 KVM: nVMX: do not pin the VMCS12
Since the current implementation of VMCS12 does a memcpy in and out
of guest memory, we do not need current_vmcs12 and current_vmcs12_page
anymore.  current_vmptr is enough to read and write the VMCS12.

And David Matlack noted:

  This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
  VMCS12 page by using kvm_write_guest. nested_release_page() only marks
  the struct page dirty.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
[Added David Matlack's note and nested_release_page_clean() fix.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02 22:41:03 +02:00
Paolo Bonzini 3898da947b KVM: avoid using rcu_dereference_protected
During teardown, accesses to memslots and buses are using
rcu_dereference_protected with an always-true condition because
these accesses are done outside the usual mutexes.  This
is because the last reference is gone and there cannot be any
concurrent modifications, but rcu_dereference_protected is
ugly and unobvious.

Instead, check the refcount in kvm_get_bus and __kvm_memslots.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02 22:41:02 +02:00
Longpeng(Mike) ebd28fcb55 KVM: X86: init irq->level in kvm_pv_kick_cpu_op
'lapic_irq' is a local variable and its 'level' field isn't
initialized, so 'level' is random, it doesn't matter but
makes UBSAN unhappy:

UBSAN: Undefined behaviour in .../lapic.c:...
load of value 10 is not a valid value for type '_Bool'
...
Call Trace:
 [<ffffffff81f030b6>] dump_stack+0x1e/0x20
 [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55
 [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162
 [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm]
 [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm]
 [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm]
 [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm]
 [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel]
 [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel]
...

Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02 22:41:01 +02:00
Wanpeng Li f4ef191086 KVM: X86: Fix loss of pending INIT due to race
When SMP VM start, AP may lost INIT because of receiving INIT between
kvm_vcpu_ioctl_x86_get/set_vcpu_events.

       vcpu 0                             vcpu 1
                                   kvm_vcpu_ioctl_x86_get_vcpu_events
                                     events->smi.latched_init = 0
  send INIT to vcpu1
    set vcpu1's pending_events
                                   kvm_vcpu_ioctl_x86_set_vcpu_events
                                      if (events->smi.latched_init == 0)
                                        clear INIT in pending_events

This patch fixes it by just update SMM related flags if we are in SMM.

Thanks Peng Hao for the report and original commit message.

Reported-by: Peng Hao <peng.hao2@zte.com.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-02 22:41:01 +02:00
Felix Kuehling 68c9793d63 drm/amdgpu: Use list_del_init in amdgpu_mn_unregister
Otherwise bo->shadow_list (which is aliased by bo->mn_list) will not
appear empty in amdgpu_ttm_bo_destroy and cause an oops when freeing
former userptr BOs.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2017-08-02 14:29:58 -04:00
Jean Delvare 5694785cf0 drm/amdgpu: Fix undue fallthroughs in golden registers initialization
As I was staring at the si_init_golden_registers code, I noticed that
the Pitcairn initialization silently falls through the Cape Verde
initialization, and the Oland initialization falls through the Hainan
initialization. However there is no comment stating that this is
intentional, and the radeon driver doesn't have any such fallthrough,
so I suspect this is not supposed to happen.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: 62a3755341 ("drm/amdgpu: add si implementation v10")
Cc: Ken Wang <Qingqing.Wang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Marek Olšák" <maraeo@gmail.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Flora Cui <Flora.Cui@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2017-08-02 14:29:42 -04:00
Linus Torvalds 4d3f5d04d6 platform-drivers-x86 for v4.13-3
Fix two bugs under error or abnormal usage conditions. Correct a config
 dependency.
 
 dell-wmi:
  - Fix driver interface version query
 
 wmi:
  - Fix error handling in acpi_wmi_init()
 
 peaq-wmi:
  - select INPUT_POLLDEV
 -----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZgfUFAAoJEKbMaAwKp364u/IIAKvumEHC/ULEY/18hUvw0/nm
 hJn479pcvVKW7huDG0PF6v6KstYgWJmY28ZZV0jnBeAlKwZA+mr5y4skbTNZRkaC
 GK2w3PeWywCiDjME9IKbrBFjnNIiFLplH4Jjotjf+1g0fTp7SE0s5CrttU8oXSne
 DKhlyo1JsN+NnamxtvJBWMXZAIYbw5qjfy9vnaTPaKFokVEYAeXfI4Jn2Ue9vw0q
 D4xgxrd/UL6/WL4erezb80jqVzsdCHhN5AKQ5PeDbTRmLDJK8VxjvYH7LPtjQS9T
 L4VU1rqUCRitNHu+Z3daqOySk3zPhZZxckJZPimBojPhKEK98qUFHZLpWCVpeXU=
 =ZTnI
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
 "Fix two bugs under error or abnormal usage conditions. Correct a
  config dependency:

  dell-wmi:
   - Fix driver interface version query

  wmi:
   - Fix error handling in acpi_wmi_init()

  peaq-wmi:
   - select INPUT_POLLDEV"

* tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: dell-wmi: Fix driver interface version query
  platform/x86: wmi: Fix error handling in acpi_wmi_init()
  platform/x86: peaq-wmi: select INPUT_POLLDEV
2017-08-02 09:43:28 -07:00
Stephen Boyd 5c0858f12b Allwinner clock fixes for 4.13
One critical clock fix for sun5i (A10s/A13/R8) which enables propogation
 of clock rate changes from the "cpu" clock to it's parent PLL clock.
 This fixes cpufreq related crashes that have been observed on KernelCI
 with the C.H.I.P. and multi_v7_defconfig.
 -----BEGIN PGP SIGNATURE-----
 
 iQJCBAABCgAsFiEE2nN1m/hhnkhOWjtHOJpUIZwPJDAFAlmBWB8OHHdlbnNAY3Np
 ZS5vcmcACgkQOJpUIZwPJDBGvRAAkwO1p1Aw62i5QmrGox3iyWabH/Rbon0M+tLQ
 jP+kVukWrLR8REclkpwXqSK07Blg5m3o1MVMJXBKKOD6NAKCPOJoN75K/aX44fzZ
 msrTuuzGkixovkG7p8ln253v/TOoBMwwK+QDtTOk7zBNEZuRKevKAUOglQTJYTmy
 um7cVsisv5Yr89vtG767o0UyP82TUbnga6yLYY+72PbEKP1MnJtR0gzQp6Y9zxmW
 Oyt7MaHH7km6SDGrQUWgQ8SvZ36s3uLxPfrtPx0zbTE1sM31sgdHf4kZmciP7Yap
 eiLoHxV+twVxrgYh/Fl7BxOMfSqdIpZrzL3z2N3Hc53oARjOSl9c3N12UVzC0D0K
 lRldFpmW26U21xMp22033j6uW9daU2PmyybNyJu892DRfY8RcX2N7otwIgAtxEtY
 SKvVaG/uin2WVXs5FsxbpBUOm5xEJ04547TXuEegGjKyEv8+d1lCO6yU8i6xdUE0
 qBzNm5rNuuSTD5zKu41YpEWhMWxEFEJE+Cx/tsCOhaySDuKKOrV7Bbd15ec3PjRD
 llv2TvJNiBAkm4pakOsLZcDKk6RWqBEoN+6GXuC/66Dm+k+rR0nsRthDk418pDF2
 T8YgG1p5wETdh/DfnUnwg63xN/dVs4On8IJkGunznTTGMxHiYduQsmqAxkgaRq6h
 k4insAE=
 =uoCy
 -----END PGP SIGNATURE-----

Merge tag 'sunxi-clk-fixes-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into clk-fixes

Pull one Allwinner clock fix from Chen-Yu Tsai:

One critical clock fix for sun5i (A10s/A13/R8) which enables propagation
of clock rate changes from the "cpu" clock to it's parent PLL clock.
This fixes cpufreq related crashes that have been observed on KernelCI
with the C.H.I.P. and multi_v7_defconfig.

* tag 'sunxi-clk-fixes-for-4.13' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  clk: sunxi-ng: sun5i: Add clk_set_rate_parent to the CPU clock
2017-08-02 09:11:44 -07:00
Stephen Boyd b48d7c5b89 Meson Clock fixes:
- mpll: fix mpll0 fractional part ignored
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZgHG0AAoJEHfc29rIyEnRKCwQAIoujlfO+NULaxl7JgGagZDy
 nJ1FU1kShDtm/pgWbW52cC92zgCJl5PGihxnwPVdJWd59TNKjPoREr1m5S8mWXPv
 IOVVoDg5VRfLVhwcuETi0aiXUmFjRQusgjM9NXyZpTcx2MtPY2CB92OMoaiITJgm
 a85tsfpptCQqAfixYsp8IqiHMnbLRj2Ntbf5+bHTcHrUnLmUUIJh6Fa7+16f8KJo
 JaYQ95X1c1ems4ECrYFINCgfNpi659deIum/UPZKysX0f8c78k0ANV9sMAyBazs8
 dhEdOWaGAohzyGJZg+btXLLnaRNGqVExP8qZ2X3+11Ob9r3VS95+sN7X19x+4q7C
 jKxgreD4jR3vv5ayXv/H15GK/y6xdDyO7vTkiMA9km8o5FK1Q/NYAlEbJx0Sr/v6
 Z3Dvp1IDdTNCzVa4ASmAyqOTOsx8SSqp6CdlpoQrutSrbnQsVZ8hleYrxjrGbsKC
 StB1g9JQEqOurOzMd50CdreKghCtQeI3TegxEKmhm++Q9sBD7KVhOpPw9ccty181
 GaEdfcdscKouT90c4tog3EwFihjoi2oidlU9PioJTIfeBfAE7q8P4ozMjzP7DCVd
 rSg9thk20oZYnep9UHF0NzM8JXv1dKcPTw44dkGRtj4JL9F0Grv8xFwgvXHG9SMU
 sRcgc13O8rNF6jjUQeJn
 =sv+n
 -----END PGP SIGNATURE-----

Merge tag 'meson-clk-fixes-for-4.13-rc4-v2' of git://github.com/baylibre/clk-meson into clk-fixes

Pull one Meson clock fix from Neil Armstrong

* tag 'meson-clk-fixes-for-4.13-rc4-v2' of git://github.com/baylibre/clk-meson:
  clk: meson: mpll: fix mpll0 fractional part ignored
2017-08-02 09:09:42 -07:00
Linus Torvalds 33611ba0fe SCSI fixes on 20170802
This seven is mostly minor build, Kconfig and error leg fixes.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZgeACAAoJEAVr7HOZEZN4vxwP/0GZc1EWDubS8DZZOr4X25WA
 jofQFCrWYf9jBcCA64pv7kF5wCbQk0MFXMvs2u5YHGMV8U4VyquNT2zEYm+R5ZD8
 C3sIdgZyxb7x+KkUJ5NxvDBIzAQuQcPEUPZspgi809veoATQ28qoMO1NkOJzbJGE
 nUs/6TZcpA5xGdqXtI3G6IsbvuAFO1hmI2aIUAj5UVKV+B9ILGsxWHPzYPErgtra
 afTJ8jO4vYW/wcyY7/DG/V8UAvU/0pe9NZqgt0Gn+XiSq5C9HnqS9+BWjU9nx98g
 EiPuvP6UBq7zGmjFQdUTwYuysbPDwyVfB1l7RJFVeBTPj6sINFrqTQPU8qwy96S8
 gxfObnfbEkx4zJmv4iP8HtAgRQMVsVN64ICD6oeS4JQWmXoz90S0vCLwrTQv36ll
 UHmBaSKMqPHTt+emEmup3rcOBU+4PIVq40jQjUyBCJXd/kg5GUz7Og++/NHX8JDZ
 qhVJ/VhkBacnDD9fZLQ2cMfcB/E5pUmEgx3MycZty8pXx0AkttWyB0lIddln2yiL
 qpZwbJ4nbhS7TO1/ScZlHEqp/rVY3aaIvLovWrhrmy4N1KBk41EmSxCokxkv+Kmd
 Twy35eHLDT9Hn0emT+y12Ul5y0cTxy1JMOcMf6LLXAHQsKgg2khm3JfHcXC7H4mO
 H/FpxwQT7z47yU3FF2mZ
 =/+Ot
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "These seven patches are mostly minor build, Kconfig and error leg
  fixes"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: qedi: Fix return code in qedi_ep_connect()
  scsi: lpfc: fix linking against modular NVMe support
  scsi: scsi_transport_fc: return -EBUSY for deleted vport
  scsi: libcxgbi: add check for valid cxgbi_task_data
  scsi: aic7xxx: fix firmware build with O=path
  scsi: megaraid_sas: fix memleak in megasas_alloc_cmdlist_fusion
  scsi: qedi: Add ISCSI_BOOT_SYSFS to Kconfig
2017-08-02 08:43:19 -07:00
Takashi Iwai 5ef26e966d ASoC: Fixes for v4.13
Quite a few fixes here that have been sent since the merge window, the
 biggest one is the fix from Tony for some confusion with the device
 property API which was causing issues with the of-graph card.  This is
 fixed with some changes in the graph API itself as it seemed very likely
 to be error prone.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAlmB51ATHGJyb29uaWVA
 a2VybmVsLm9yZwAKCRAk1otyXVSH0G39B/90DuMEckKIUVTPyIpyHetKl46eFRDJ
 9t3EnovAZJX6zZ7bdlkcCg33wGmcIkn9PFfLpPcnbxYeKP0xGah/GGKec1K/aeTn
 FiD2LjmO5Cc0adk5tEwqiVrL0cOVrUdXuMpvhoMQqD1EBzsgDQtUpG6EL8SGbrtd
 dfSLFDrYHIBO2b3iZOZPJvqMPnh9jJ4zEyBwVpQGOCoQdD/AP5S84McxwGhlJfl4
 UqAY8keZYlW/Gl1DSnc8BDDCuLrexdCEUAR/0Vz3QFQk+sSnKVlkE+XXo0L4AVZl
 XoDGjB5DL8FjXPG7yKO0rAytFFFzSJCWwBRnESAIGKz9o5HIUfOZtE8Q
 =10Px
 -----END PGP SIGNATURE-----

Merge tag 'asoc-fix-v4.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v4.13

Quite a few fixes here that have been sent since the merge window, the
biggest one is the fix from Tony for some confusion with the device
property API which was causing issues with the of-graph card.  This is
fixed with some changes in the graph API itself as it seemed very likely
to be error prone.
2017-08-02 17:11:45 +02:00
Gregory CLEMENT d7a65c4905 ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge
The number of pins in South Bridge is 30 and not 29. There is a fix for
the driver for the pinctrl, but a fix is also need at device tree level
for the GPIO.

Fixes: afda007fed ("ARM64: dts: marvell: Add pinctrl nodes for Armada
3700")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
2017-08-02 16:00:00 +02:00
Trond Myklebust d9cb73300a NFSv4: Fix double frees in nfs4_test_session_trunk()
rpc_clnt_add_xprt() expects the callback function to be synchronous, and
expects to release the transport and switch references itself.

Fixes: 04fa2c6bb5 ("NFS pnfs data server multipath session trunking")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-02 09:45:55 -04:00
Sergei A. Trusov 3f3c371421 ALSA: hda - Fix speaker output from VAIO VPCL14M1R
Sony VAIO VPCL14M1R needs the quirk to make the speaker working properly.

Tested-by: Dmitriy <mexx400@yandex.ru>
Cc: <stable@vger.kernel.org>
Signed-off-by: Sergei A. Trusov <sergei.a.trusov@ya.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2017-08-02 14:18:49 +02:00
Sergei Shtylyov f29bb7861a powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
of_irq_to_resource() has recently been fixed to return negative error #'s
along with 0 in case of failure, however the Freescale MPC832x RDB board
code still only regards 0 as a failure indication -- fix it up.

Fixes: 7a4228bbff ("of: irq: use of_irq_get() in of_irq_to_resource()")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Scott Wood <oss@buserror.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-08-02 19:26:52 +10:00
Michał Mirosław 9e9509e38f gpio: tegra: fix unbalanced chained_irq_enter/exit
When more than one GPIO IRQs are triggered simultaneously,
tegra_gpio_irq_handler() called chained_irq_exit() multiple
times for one chained_irq_enter().

Fixes: 3c92db9ac0
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
[Also changed the variable to a bool]
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2017-08-02 10:42:38 +02:00
Andy Lutomirski 51391caf99 platform/x86: dell-wmi: Fix driver interface version query
When I converted dell-wmi to the new bus infrastructure, I left the
call to dell_wmi_check_descriptor_buffer() in dell_wmi_init().  This
could cause two problems:

 - An error message when loading the driver on a system without
   dell-wmi.  We'd try to read the event descriptor even if the WMI
   GUID wasn't there.

 - A possible race if dell-wmi was loaded manually before wmi was
   fully initialized.

Fix it by moving the call to the probe function where it belongs.

Fixes: bff589be59 ("platform/x86: dell-wmi: Convert to the WMI bus infrastructure")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2017-08-01 15:41:43 -07:00
Trond Myklebust fd40559c86 NFSv4: Fix EXCHANGE_ID corrupt verifier issue
The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
changed to be asynchronous by commit 8d89bd70bc. If we interrrupt
the call to rpc_wait_for_completion_task(), we can therefore end up
transmitting random stack contents in lieu of the verifier.

Fixes: 8d89bd70bc ("NFS setup async exchange_id")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-08-01 16:28:55 -04:00
Wanpeng Li 337c017ccd KVM: async_pf: make rcu irq exit if not triggered from idle task
WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0
 CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1
 RIP: 0010:rcu_note_context_switch+0x207/0x6b0
 Call Trace:
  __schedule+0xda/0xba0
  ? kvm_async_pf_task_wait+0x1b2/0x270
  schedule+0x40/0x90
  kvm_async_pf_task_wait+0x1cc/0x270
  ? prepare_to_swait+0x22/0x70
  do_async_page_fault+0x77/0xb0
  ? do_async_page_fault+0x77/0xb0
  async_page_fault+0x28/0x30
 RIP: 0010:__d_lookup_rcu+0x90/0x1e0

I encounter this when trying to stress the async page fault in L1 guest w/
L2 guests running.

Commit 9b132fbe54 (Add rcu user eqs exception hooks for async page
fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu
idle eqs when needed, to protect the code that needs use rcu.  However,
we need to call the pair even if the function calls schedule(), as seen
from the above backtrace.

This patch fixes it by informing the RCU subsystem exit/enter the irq
towards/away from idle for both n.halted and !n.halted.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-01 22:24:18 +02:00
Paolo Bonzini b96fb43977 KVM: nVMX: fixes to nested virt interrupt injection
There are three issues in nested_vmx_check_exception:

1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported
by Wanpeng Li;

2) it should rebuild the interruption info and exit qualification fields
from scratch, as reported by Jim Mattson, because the values from the
L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes
a page fault, the EPT misconfig's exit qualification is incorrect).

3) CR2 and DR6 should not be written for exception intercept vmexits
(CR2 only for AMD).

This patch fixes the first two and adds a comment about the last,
outlining the fix.

Cc: Jim Mattson <jmattson@google.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-08-01 22:24:17 +02:00
Paolo Bonzini 7313c69805 KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12
Do this in the caller of nested_vmx_vmexit instead.

nested_vmx_check_exception was doing a vmwrite to the vmcs02's
VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move
the field to vmcs12->vm_exit_intr_error_code.  However that isn't
possible on pre-Haswell machines.  Moving the vmcs12 write to the
callers fixes it.

Reported-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1,
 thanks to fengguang.wu@intel.com]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
2017-08-01 22:23:25 +02:00
Linus Torvalds 26c5cebfdb Merge branch 'parisc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parsic fixes from Helge Deller:

 - Our cache flushing code ran into a BUG in case context is not
   current. Fix it by flushing the whole cache in such rare situations
   (by Dave Anglin).

 - Fix a "sleeping function called from invalid context BUG" in our
   pdc_stable driver by rearranging our locks (by James Bottomley)

 - The thread and irq stacks require more than 16 KB since kernel 4.11.
   Increase both to 32 KB.

 - Define CONFIG_CPU_BIG_ENDIAN unconditionally on parisc to avoid wrong
   behaviour in qrwlock functions (by Babu Moger).

* 'parisc-4.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Define CONFIG_CPU_BIG_ENDIAN
  parisc: pdc_stable: Fix locking when creating sysfs links
  parisc: Increase thread and stack size to 32kb
  parisc: Handle vma's whose context is not current in flush_cache_range
2017-08-01 13:20:24 -07:00