Commit Graph

146234 Commits

Author SHA1 Message Date
Linus Torvalds b9abdcfd10 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Assorted fixes.

  Some of that is only a matter with fault injection (broken handling of
  small allocation failure in various mount-related places), but the
  last one is a root-triggerable stack overflow, and combined with
  userns it gets really nasty ;-/"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Don't leak MNT_INTERNAL away from internal mounts
  mm,vmscan: Allow preallocating memory for register_shrinker().
  rpc_pipefs: fix double-dput()
  orangefs_kill_sb(): deal with allocation failures
  jffs2_kill_sb(): deal with failed allocations
  hypfs_kill_super(): deal with failed allocations
2018-04-20 09:15:14 -07:00
Marc Zyngier 85bd0ba1ff arm/arm64: KVM: Add PSCI version selection API
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1
or 1.0 to a guest, defaulting to the latest version of the PSCI
implementation that is compatible with the requested version. This is
no different from doing a firmware upgrade on KVM.

But in order to give a chance to hypothetical badly implemented guests
that would have a fit by discovering something other than PSCI 0.2,
let's provide a new API that allows userspace to pick one particular
version of the API.

This is implemented as a new class of "firmware" registers, where
we expose the PSCI version. This allows the PSCI version to be
save/restored as part of a guest migration, and also set to
any supported version if the guest requires it.

Cc: stable@vger.kernel.org #4.16
Reviewed-by: Christoffer Dall <cdall@kernel.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-04-20 16:32:23 +01:00
Linus Torvalds 854da23875 MIPS fixes for 4.17-rc2
Some MIPS fixes for 4.17:
 
  - io: Add barriers to read*() & write*()
 
  - dts: Fix boston PCI bus DTC warnings (4.17)
 
  - memset: Several corner case fixes (one 3.10, others longer)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAlrZ9EUACgkQbAtpk944
 dnor8w/9EildyR3SOF4wGjfxQROnDnDzTdXSFEqldiVg7w6f9HmTtGLcHvP3i91t
 LqtSBslRk91qoj9/3Lop7Q4Ik5lnWOyhBzww4EienUr5DIHedV0G59aa7IzH8cZV
 Zke0s014z0wSZb1BcByKLc5ewpXVV1uaARxCp7jphh1WJZB4BaiISh35qn06v3Op
 xKQ1vNNDwyc/9Z/KxoR522ujmEiguzd/LTEfFmhA1Njy8gYbyciXWBEEeccQ9fOq
 kr0P2wg37+sqEvFkmKWnUO8bQq8M3J12hPzOuZRcvtqAcsFduRT1lIDrvbcKzRjD
 Eesuuh4p6+nFkpPnQw8rTfe5hnDN9f2l7FWSzMmOY02pwgG50uYIl2LDg9f4IG5o
 h89opjmHMYRHws+yeFsCkmDswMjauHuWI3ZyIa3xJ07OvZfO4LhRwq9oTn5B6iJl
 ymDV77Z73mXGy7MOZ56miT68+vTfAG3lxHe+Bnflr0IJ7xiDNqbkrvY8eMUQFAMY
 TCpv0MF3Se9TB9xqmQ2RuCVd1iGn/88m3AXcNlIDwAIcxAb2vikUDHS2rSEgQAo/
 7er9gZE8NnKo0qyQZustJDN+iLCyXoBbHpy8ZxJ89AYfDwLPyK3XmALjt7T7dZzn
 326fXY4hy0EKuXsTX2YIHrzhWSBbR0tH8xx2aqu3AtrXY7K/Hcc=
 =TGOs
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips

Pull MIPS fixes from James Hogan:

 - io: Add barriers to read*() & write*()

 - dts: Fix boston PCI bus DTC warnings (4.17)

 - memset: Several corner case fixes (one 3.10, others longer)

* tag 'mips_fixes_4.17_1' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
  MIPS: uaccess: Add micromips clobbers to bzero invocation
  MIPS: memset.S: Fix clobber of v1 in last_fixup
  MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
  MIPS: memset.S: EVA & fault support for small_memset
  MIPS: dts: Boston: Fix PCI bus dtc warnings:
  MIPS: io: Add barrier after register read in readX()
  MIPS: io: Prevent compiler reordering writeX()
2018-04-20 08:25:31 -07:00
Linus Torvalds d08de37b8c powerpc fixes for 4.17 #3
Fix an off-by-one bug in our alternative asm patching which leads to incorrectly
 patched code. This bug lay dormant for nearly 10 years but we finally hit it
 due to a recent change.
 
 Fix lockups when running KVM guests on Power8 due to a missing check when a
 thread that's running KVM comes out of idle.
 
 Fix an out-of-spec behaviour in the XIVE code (P9 interrupt controller).
 
 Fix EEH handling of bridge MMIO windows.
 
 Prevent crashes in our RFI fallback flush handler if firmware didn't tell us the
 size of the L1 cache (only seen on simulators).
 
 Thanks to:
   Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJa2eo9ExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYCb
 AxAAqXGKeXg1ZJw+OHR8pteDYLGylVTRhWZykO3rYkJ7DXtcb9Qr17VOyitVZ93y
 uNbu8Nm4KMqKeOYijqyXjK9kjVx47K3V9p95CBDy5yA4hGrV+I/YIZUqANwRZNat
 1Bn+eGlyLBGNRlpZ/CnFTm/W/LH0fDJV8tmecXmegGRwcHU7M0TT8t9XgRT1iX04
 2uUhBFCltwSmLiFJTOth8VbppTCzf2mZd1GsBM1lM/pwCd5j1LZO9lHoY0wTEddj
 v+lsCzp2IGO0G8xPFEg+jDa/dOZ1y86L8feir7/TRkwe73EesSux7PN7tWpWA1dE
 iDJ4fdHiBL/PIB6XjFcCUsED2MaLO/01DpqRfM79HSXlxtQQBN1HJYmIALowwFLm
 UTGOEkXTv/V43TQrAAcFSMZJ6A+r8qsfkA+ROuohG5mTr8xhJ4yLh7YbptSZOWZk
 zwC58UAFEhKZHrZO3Vt8DFR0NSbkURfDLEgCszheBESqRh3HmiLYxo7UrBvfkL8G
 rXUSbH7i+4jsr7ehdkoEAoGvq0f9jSmBxQ0h4UmZW1TyTL6Q/EdRbwA5PBtgin6m
 x/viq3Im9l0IRrMlNUT/WhjMwzPbp1GaTiiMjCaxEfCQhoP1Dfb94lsL4CXaXfHa
 rNzXmNRRM2fDAVINYLOK39UjZdnrONwxznI6KmBWkvHJLYA=
 =Ep0t
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix an off-by-one bug in our alternative asm patching which leads to
   incorrectly patched code. This bug lay dormant for nearly 10 years
   but we finally hit it due to a recent change.

 - Fix lockups when running KVM guests on Power8 due to a missing check
   when a thread that's running KVM comes out of idle.

 - Fix an out-of-spec behaviour in the XIVE code (P9 interrupt
   controller).

 - Fix EEH handling of bridge MMIO windows.

 - Prevent crashes in our RFI fallback flush handler if firmware didn't
   tell us the size of the L1 cache (only seen on simulators).

Thanks to: Benjamin Herrenschmidt, Madhavan Srinivasan, Michael Neuling.

* tag 'powerpc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kvm: Fix lockups when running KVM guests on Power8
  powerpc/eeh: Fix enabling bridge MMIO windows
  powerpc/xive: Fix trying to "push" an already active pool VP
  powerpc/64s: Default l1d_size to 64K in RFI fallback flush
  powerpc/lib: Fix off-by-one in alternate feature patching
2018-04-20 08:23:30 -07:00
Linus Torvalds c2d94c5214 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes and kexec-file-load from Martin Schwidefsky:
 "After the common code kexec patches went in via Andrew we can now push
  the architecture parts to implement the kexec-file-load system call.

  Plus a few more bug fixes and cleanups, this includes an update to the
  default configurations"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/signal: cleanup uapi struct sigaction
  s390: rename default_defconfig to debug_defconfig
  s390: remove gcov defconfig
  s390: update defconfig
  s390: add support for IBM z14 Model ZR1
  s390: remove couple of duplicate includes
  s390/boot: remove unused COMPILE_VERSION and ccflags-y
  s390/nospec: include cpu.h
  s390/decompressor: Ignore file vmlinux.bin.full
  s390/kexec_file: add generated files to .gitignore
  s390/Kconfig: Move kexec config options to "Processor type and features"
  s390/kexec_file: Add ELF loader
  s390/kexec_file: Add crash support to image loader
  s390/kexec_file: Add image loader
  s390/kexec_file: Add kexec_file_load system call
  s390/kexec_file: Add purgatory
  s390/kexec_file: Prepare setup.h for kexec_file_load
  s390/smsgiucv: disable SMSG on module unload
  s390/sclp: avoid potential usage of uninitialized value
2018-04-20 08:01:38 -07:00
Oskar Senft 15a3e845b0 perf/x86/intel/uncore: Fix SBOX support for Broadwell CPUs
SBOX on some Broadwell CPUs is broken because it's enabled unconditionally
despite the fact that there are no SBOXes available.

Check the Power Control Unit CAPID4 register to determine the number of
available SBOXes on the particular CPU before trying to enable them. If
there are none, nullify the SBOX descriptor so it isn't tried to be
initialized.

Signed-off-by: Oskar Senft <osk@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mark van Dijk <mark@voidzero.net>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1521810690-2576-2-git-send-email-kan.liang@linux.intel.com
2018-04-20 13:17:50 +02:00
Stephane Eranian d7717587ac perf/x86/intel/uncore: Revert "Remove SBOX support for Broadwell server"
This reverts commit 3b94a89166 ("perf/x86/intel/uncore: Remove
SBOX support for Broadwell server")

Revert because there exists a proper workaround for Broadwell-EP servers
without SBOX now. Note that BDX-DE does not have a SBOX.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kan Liang <kan.liang@intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: ak@linux.intel.com
Cc: osk@google.com
Cc: mark@voidzero.net
Link: https://lkml.kernel.org/r/1521810690-2576-1-git-send-email-kan.liang@linux.intel.com
2018-04-20 12:41:17 +02:00
Joerg Roedel 05189820da x86/power/64: Fix page-table setup for temporary text mapping
On a system with 4-level page-tables there is no p4d, so the pud in the pgd
should be mapped. The old code before commit fb43d6cb91 already did that.

The change from above commit causes an invalid page-table which causes
undefined behavior. In one report it caused triple faults.

Fix it by changing the p4d back to pud.

Fixes: fb43d6cb91 ('x86/mm: Do not auto-massage page protections')
Reported-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Tested-by: Borislav Petkov <bp@suse.de>
Cc: linux-pm@vger.kernel.org
Cc: rjw@rjwysocki.net
Cc: pavel@ucw.cz
Cc: hpa@zytor.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/1524162360-26179-1-git-send-email-joro@8bytes.org
2018-04-20 11:52:00 +02:00
Tony Lindgren fb289e3ab1 Merge branch 'omap-for-v4.17/fixes-ti-sysc' into omap-for-v4.17/fixes 2018-04-19 15:48:46 -07:00
Laura Abbott eb0b4aa89c x86/xen: Remove use of VLAs
There's an ongoing effort to remove VLAs[1] from the kernel to eventually
turn on -Wvla. It turns out, the few VLAs in use in Xen produce only a
single entry array that is always bounded by GDT_SIZE. Clean up the code to
get rid of the VLA and the loop.

[1] https://lkml.org/lkml/2018/3/7/621

Signed-off-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

[boris: Use BUG_ON(size>PAGE_SIZE) instead of GDT_SIZE]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2018-04-19 16:35:51 -04:00
Michael Ellerman 56376c5864 powerpc/kvm: Fix lockups when running KVM guests on Power8
When running KVM guests on Power8 we can see a lockup where one CPU
stops responding. This often leads to a message such as:

  watchdog: CPU 136 detected hard LOCKUP on other CPUs 72
  Task dump for CPU 72:
  qemu-system-ppc R  running task    10560 20917  20908 0x00040004

And then backtraces on other CPUs, such as:

  Task dump for CPU 48:
  ksmd            R  running task    10032  1519      2 0x00000804
  Call Trace:
    ...
    --- interrupt: 901 at smp_call_function_many+0x3c8/0x460
        LR = smp_call_function_many+0x37c/0x460
    pmdp_invalidate+0x100/0x1b0
    __split_huge_pmd+0x52c/0xdb0
    try_to_unmap_one+0x764/0x8b0
    rmap_walk_anon+0x15c/0x370
    try_to_unmap+0xb4/0x170
    split_huge_page_to_list+0x148/0xa30
    try_to_merge_one_page+0xc8/0x990
    try_to_merge_with_ksm_page+0x74/0xf0
    ksm_scan_thread+0x10ec/0x1ac0
    kthread+0x160/0x1a0
    ret_from_kernel_thread+0x5c/0x78

This is caused by commit 8c1c7fb0b5 ("powerpc/64s/idle: avoid sync
for KVM state when waking from idle"), which added a check in
pnv_powersave_wakeup() to see if the kvm_hstate.hwthread_state is
already set to KVM_HWTHREAD_IN_KERNEL, and if so to skip the store and
test of kvm_hstate.hwthread_req.

The problem is that the primary does not set KVM_HWTHREAD_IN_KVM when
entering the guest, so it can then come out to cede with
KVM_HWTHREAD_IN_KERNEL set. It can then go idle in kvm_do_nap after
setting hwthread_req to 1, but because hwthread_state is still
KVM_HWTHREAD_IN_KERNEL we will skip the test of hwthread_req when we
wake up from idle and won't go to kvm_start_guest. From there the
thread will return somewhere garbage and crash.

Fix it by skipping the store of hwthread_state, but not the test of
hwthread_req, when coming out of idle. It's OK to skip the sync in
that case because hwthread_req will have been set on the same thread,
so there is no synchronisation required.

Fixes: 8c1c7fb0b5 ("powerpc/64s/idle: avoid sync for KVM state when waking from idle")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19 16:22:20 +10:00
Michael Neuling 13a83eac37 powerpc/eeh: Fix enabling bridge MMIO windows
On boot we save the configuration space of PCIe bridges. We do this so
when we get an EEH event and everything gets reset that we can restore
them.

Unfortunately we save this state before we've enabled the MMIO space
on the bridges. Hence if we have to reset the bridge when we come back
MMIO is not enabled and we end up taking an PE freeze when the driver
starts accessing again.

This patch forces the memory/MMIO and bus mastering on when restoring
bridges on EEH. Ideally we'd do this correctly by saving the
configuration space writes later, but that will have to come later in
a larger EEH rewrite. For now we have this simple fix.

The original bug can be triggered on a boston machine by doing:
  echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
On boston, this PHB has a PCIe switch on it.  Without this patch,
you'll see two EEH events, 1 expected and 1 the failure we are fixing
here. The second EEH event causes the anything under the PHB to
disappear (i.e. the i40e eth).

With this patch, only 1 EEH event occurs and devices properly recover.

Fixes: 652defed48 ("powerpc/eeh: Check PCIe link after reset")
Cc: stable@vger.kernel.org # v3.11+
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19 13:02:38 +10:00
Matt Redfearn b3d7e55c3f
MIPS: uaccess: Add micromips clobbers to bzero invocation
The micromips implementation of bzero additionally clobbers registers t7
& t8. Specify this in the clobbers list when invoking bzero.

Fixes: 26c5e07d14 ("MIPS: microMIPS: Optimise 'memset' core library function.")
Reported-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: <stable@vger.kernel.org> # 3.10+
Patchwork: https://patchwork.linux-mips.org/patch/19110/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-18 22:02:29 +01:00
Matt Redfearn c96eebf076
MIPS: memset.S: Fix clobber of v1 in last_fixup
The label .Llast_fixup\@ is jumped to on page fault within the final
byte set loop of memset (on < MIPSR6 architectures). For some reason, in
this fault handler, the v1 register is randomly set to a2 & STORMASK.
This clobbers v1 for the calling function. This can be observed with the
following test code:

static int __init __attribute__((optimize("O0"))) test_clear_user(void)
{
  register int t asm("v1");
  char *test;
  int j, k;

  pr_info("\n\n\nTesting clear_user\n");
  test = vmalloc(PAGE_SIZE);

  for (j = 256; j < 512; j++) {
    t = 0xa5a5a5a5;
    if ((k = clear_user(test + PAGE_SIZE - 256, j)) != j - 256) {
        pr_err("clear_user (%px %d) returned %d\n", test + PAGE_SIZE - 256, j, k);
    }
    if (t != 0xa5a5a5a5) {
       pr_err("v1 was clobbered to 0x%x!\n", t);
    }
  }

  return 0;
}
late_initcall(test_clear_user);

Which demonstrates that v1 is indeed clobbered (MIPS64):

Testing clear_user
v1 was clobbered to 0x1!
v1 was clobbered to 0x2!
v1 was clobbered to 0x3!
v1 was clobbered to 0x4!
v1 was clobbered to 0x5!
v1 was clobbered to 0x6!
v1 was clobbered to 0x7!

Since the number of bytes that could not be set is already contained in
a2, the andi placing a value in v1 is not necessary and actively
harmful in clobbering v1.

Reported-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19109/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-18 21:57:29 +01:00
Srinath Mannam 4555a5021f arm64: dts: correct SATA addresses for Stingray
Correct all SATA ahci and phy controller register
addresses and interrupt lines to proper values.

Fixes: 344a2e5141 ("arm64: dts: Add SATA DT nodes for Stingray SoC")

Signed-off-by: Srinath Mannam <srinath.mannam@broadcom.com>
Reviewed-by: Ray Jui <ray.jui@broadcom.com>
Reviewed-by: Scott Branden <scott.branden@broadcom.com>
Reviewed-by: Andrew Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
2018-04-18 11:31:16 -07:00
Martin Blumenstingl 4b7b0d7b25 ARM64: dts: meson-gxm-khadas-vim2: enable the USB controller
The Khadas VIM2 board connects the dwc3 controller to an internal 4-port
USB hub which. Two of these ports are accessible directly soldered to
the board, while the other two are accessible through the 40-pin "GPIO"
header.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2018-04-18 10:24:34 -07:00
Martin Blumenstingl 55ef32249b ARM64: dts: meson-gxl-nexbox-a95x: enable the USB controller
The Nexbox A95X provides two USB ports. Enable the SoC's USB controller
on this board to make these USB ports usable.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2018-04-18 10:24:34 -07:00
Martin Blumenstingl b83687f359 ARM64: dts: meson-gxl-s905x-libretech-cc: enable the USB controller
The LibreTech CC ("Le Potato") board provides four USB connectors.
These are provided by a hub which is connected to the SoC's USB
controller.
Enable the SoC's USB controller to make the USB ports usable. Also turn
on the HDMI_5V regulator when powering on the PHY because (even though
it's not shown in the schematics) HDMI_5V also supplies the USB VBUS.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2018-04-18 10:24:34 -07:00
Martin Blumenstingl 972cd12a02 ARM64: dts: meson-gx-p23x-q20x: enable the USB controller
All S905D (GXL) and S912 (GXM) reference boards (namely these are
P230, P231, Q200 and Q201) provide USB connectors.
This enables the USB controller on these boards to make the USB ports
actually usable.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2018-04-18 10:24:34 -07:00
Martin Blumenstingl b9f07cb4f4 ARM64: dts: meson-gxl-s905x-p212: enable the USB controller
All boards based on the P212 reference design (the P212 reference board
itself and the Khadas VIM) have USB connectors (in case of the Khadas
VIM the first port is exposed through the USB Type-C connector, the
second port is connected to a 4-port USB hub).
This enables the USB controller on these boards to make the USB ports
actually usable.

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2018-04-18 10:24:34 -07:00
Martin Blumenstingl 458baa95c8 ARM64: dts: meson-gxm: add GXM specific USB host configuration
The USB configuration on GXM is slightly different than on GXL. The dwc3
controller's internal hub has three USB2 ports (instead of 2 on GXL)
along with a dedicated USB2 PHY for this port. However, it seems that
there are no pins on GXM which would allow connecting the third port to
a physical USB port.
Passing the third PHY is required though, because without it none of the
other USB ports is working (this seems to be a limitation of how the
internal USB hub works, if one PHY is disabled then no USB port works).

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2018-04-18 10:24:34 -07:00
Martin Blumenstingl 8aec5fc1d4 ARM64: dts: meson-gxl: add USB host support
This adds USB host support to the Meson GXL SoC. A dwc3 controller is
used for host-mode, while a dwc2 controller (not added in this patch
because I could not get it working) is used for device-mode only.

The dwc3 controller's internal roothub has two USB2 ports enabled but no
USB3 port. Each of the ports is supplied by a separate PHY. The USB pins
are connected to the SoC's USBHOST_A and USBOTG_B pins.
Due to the way the roothub works internally the USB PHYs are left
enabled. When the dwc3 controller is disabled the PHY is never powered on
so it does not draw any extra power. However, when the dwc3 host
controller is enabled then all PHYs also have to be enabled, otherwise
USB devices will not be detected (regardless of whether they are plugged
into an enabled port or not). This means that only the dwc3 controller
has to be enabled on boards with USB support (instead of requiring all
boards to enable the PHYs additionally with the chance of forgetting to
enable one and breaking all other ports with that as well).

This also adds the USB3 PHY which currently only does some basic
initialization. That however is required because without it high-speed
devices (like USB thumb drives) do not work on some devices (probably
because the bootloader does not configure the USB3 PHY registers).

Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Signed-off-by: Kevin Hilman <khilman@baylibre.com>
2018-04-18 10:24:34 -07:00
Dave Gerlach 5692fceebe ARM: OMAP2+: Fix build when using split object directories
The sleep33xx and sleep43xx files should not depend on a header file
generated in drivers/memory. Remove this dependency and instead allow
both drivers/memory and arch/arm/mach-omap2 to generate all macros
needed in headers local to their own paths.

This fixes an issue where the build fail will when using O= to set a
split object directory and arch/arm/mach-omap2 is built before
drivers/memory with the following error:

.../drivers/memory/emif-asm-offsets.c:1:0: fatal error: can't open
drivers/memory/emif-asm-offsets.s for writing: No such file or directory
compilation terminated.

Fixes: 41d9d44d72 ("ARM: OMAP2+: pm33xx-core: Add platform code needed for PM")
Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
Acked-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2018-04-18 10:07:13 -07:00
Benjamin Herrenschmidt b32e56e5a8 powerpc/xive: Fix trying to "push" an already active pool VP
When setting up a CPU, we "push" (activate) a pool VP for it.

However it's an error to do so if it already has an active
pool VP.

This happens when doing soft CPU hotplug on powernv since we
don't tear down the CPU on unplug. The HW flags the error which
gets captured by the diagnostics.

Fix this by making sure to "pull" out any already active pool
first.

Fixes: 243e25112d ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-19 00:49:45 +10:00
Helge Deller 41dbee81c8 parisc: Document rules regarding checksum of HPMC handler
Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-18 16:17:13 +02:00
Mark Rutland b2d71b3cda arm64: signal: don't force known signals to SIGKILL
Since commit:

  a7e6f1ca90 ("arm64: signal: Force SIGKILL for unknown signals in force_signal_inject")

... any signal which is not SIGKILL will be upgraded to a SIGKILL be
force_signal_inject(). This includes signals we do expect, such as
SIGILL triggered by do_undefinstr().

Fix the check to use a logical AND rather than a logical OR, permitting
signals whose layout is SIL_FAULT.

Fixes: a7e6f1ca90 ("arm64: signal: Force SIGKILL for unknown signals in force_signal_inject")
Cc: Will Deacon <will.deacon@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-04-18 15:13:27 +01:00
Helge Deller 89e050c87d parisc: Make bzImage default build target
Debian uses "make all" to build the Linux kernel, thus to be able to use
the self-decompressing kernel as default debian kernel we need to make
bzImage the default build target.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-18 15:44:06 +02:00
Matt Redfearn daf70d89f8
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
The __clear_user function is defined to return the number of bytes that
could not be cleared. From the underlying memset / bzero implementation
this means setting register a2 to that number on return. Currently if a
page fault is triggered within the memset_partial block, the value
loaded into a2 on return is meaningless.

The label .Lpartial_fixup\@ is jumped to on page fault. In order to work
out how many bytes failed to copy, the exception handler should find how
many bytes left in the partial block (andi a2, STORMASK), add that to
the partial block end address (a2), and subtract the faulting address to
get the remainder. Currently it incorrectly subtracts the partial block
start address (t1), which has additionally been clobbered to generate a
jump target in memset_partial. Fix this by adding the block end address
instead.

This issue was found with the following test code:
      int j, k;
      for (j = 0; j < 512; j++) {
        if ((k = clear_user(NULL, j)) != j) {
           pr_err("clear_user (NULL %d) returned %d\n", j, k);
        }
      }
Which now passes on Creator Ci40 (MIPS32) and Cavium Octeon II (MIPS64).

Suggested-by: James Hogan <jhogan@kernel.org>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19108/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-17 16:17:23 +01:00
Mark Rutland 800cb2e553 arm64: kasan: avoid pfn_to_nid() before page array is initialized
In arm64's kasan_init(), we use pfn_to_nid() to find the NUMA node a
span of memory is in, hoping to allocate shadow from the same NUMA node.
However, at this point, the page array has not been initialized, and
thus this is bogus.

Since commit:

  f165b378bb ("mm: uninitialized struct page poisoning sanity")

... accessing fields of the page array results in a boot time Oops(),
highlighting this problem:

[    0.000000] Unable to handle kernel paging request at virtual address dfff200000000000
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x96000004
[    0.000000]   Exception class = DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000004
[    0.000000]   CM = 0, WnR = 0
[    0.000000] [dfff200000000000] address between user and kernel address ranges
[    0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.16.0-07317-gf165b378bbdf #42
[    0.000000] Hardware name: ARM Juno development board (r1) (DT)
[    0.000000] pstate: 80000085 (Nzcv daIf -PAN -UAO)
[    0.000000] pc : __asan_load8+0x8c/0xa8
[    0.000000] lr : __dump_page+0x3c/0x3b8
[    0.000000] sp : ffff2000099b7ca0
[    0.000000] x29: ffff2000099b7ca0 x28: ffff20000a1762c0
[    0.000000] x27: ffff7e0000000000 x26: ffff2000099dd000
[    0.000000] x25: ffff200009a3f960 x24: ffff200008f9c38c
[    0.000000] x23: ffff20000a9d3000 x22: ffff200009735430
[    0.000000] x21: fffffffffffffffe x20: ffff7e0001e50420
[    0.000000] x19: ffff7e0001e50400 x18: 0000000000001840
[    0.000000] x17: ffffffffffff8270 x16: 0000000000001840
[    0.000000] x15: 0000000000001920 x14: 0000000000000004
[    0.000000] x13: 0000000000000000 x12: 0000000000000800
[    0.000000] x11: 1ffff0012d0f89ff x10: ffff10012d0f89ff
[    0.000000] x9 : 0000000000000000 x8 : ffff8009687c5000
[    0.000000] x7 : 0000000000000000 x6 : ffff10000f282000
[    0.000000] x5 : 0000000000000040 x4 : fffffffffffffffe
[    0.000000] x3 : 0000000000000000 x2 : dfff200000000000
[    0.000000] x1 : 0000000000000005 x0 : 0000000000000000
[    0.000000] Process swapper (pid: 0, stack limit = 0x        (ptrval))
[    0.000000] Call trace:
[    0.000000]  __asan_load8+0x8c/0xa8
[    0.000000]  __dump_page+0x3c/0x3b8
[    0.000000]  dump_page+0xc/0x18
[    0.000000]  kasan_init+0x2e8/0x5a8
[    0.000000]  setup_arch+0x294/0x71c
[    0.000000]  start_kernel+0xdc/0x500
[    0.000000] Code: aa0403e0 9400063c 17ffffee d343fc00 (38e26800)
[    0.000000] ---[ end trace 67064f0e9c0cc338 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Let's fix this by using early_pfn_to_nid(), as other architectures do in
their kasan init code. Note that early_pfn_to_nid acquires the nid from
the memblock array, which we iterate over in kasan_init(), so this
should be fine.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 39d114ddc6 ("arm64: add KASAN support")
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-04-17 16:16:59 +01:00
Joerg Roedel d6ef1f194b x86/mm: Prevent kernel Oops in PTDUMP code with HIGHPTE=y
The walk_pte_level() function just uses __va to get the virtual address of
the PTE page, but that breaks when the PTE page is not in the direct
mapping with HIGHPTE=y.

The result is an unhandled kernel paging request at some random address
when accessing the current_kernel or current_user file.

Use the correct API to access PTE pages.

Fixes: fe770bf031 ('x86: clean up the page table dumper and add 32-bit support')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: jgross@suse.com
Cc: JBeulich@suse.com
Cc: hpa@zytor.com
Cc: aryabinin@virtuozzo.com
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/1523971636-4137-1-git-send-email-joro@8bytes.org
2018-04-17 15:43:01 +02:00
Alison Schofield 1340ccfa9a x86,sched: Allow topologies where NUMA nodes share an LLC
Intel's Skylake Server CPUs have a different LLC topology than previous
generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided
into two "slices", each containing half the cores, half the LLC, and one
memory controller and each slice is enumerated to Linux as a NUMA
node. This is similar to how the cores and LLC were arranged for the
Cluster-On-Die (CoD) feature.

CoD allowed the same cache line to be present in each half of the LLC.
But, with SNC, each line is only ever present in *one* slice. This means
that the portion of the LLC *available* to a CPU depends on the data being
accessed:

    Remote socket: entire package LLC is shared
    Local socket->local slice: data goes into local slice LLC
    Local socket->remote slice: data goes into remote-slice LLC. Slightly
                    		higher latency than local slice LLC.

The biggest implication from this is that a process accessing all
NUMA-local memory only sees half the LLC capacity.

The CPU describes its cache hierarchy with the CPUID instruction. One of
the CPUID leaves enumerates the "logical processors sharing this
cache". This information is used for scheduling decisions so that tasks
move more freely between CPUs sharing the cache.

But, the CPUID for the SNC configuration discussed above enumerates the LLC
as being shared by the entire package. This is not 100% precise because the
entire cache is not usable by all accesses. But, it *is* the way the
hardware enumerates itself, and this is not likely to change.

The userspace visible impact of all the above is that the sysfs info
reports the entire LLC as being available to the entire package. As noted
above, this is not true for local socket accesses. This patch does not
correct the sysfs info. It is the same, pre and post patch.

The current code emits the following warning:

 sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.

The warning is coming from the topology_sane() check in smpboot.c because
the topology is not matching the expectations of the model for obvious
reasons.

To fix this, add a vendor and model specific check to never call
topology_sane() for these systems. Also, just like "Cluster-on-Die" disable
the "coregroup" sched_domain_topology_level and use NUMA information from
the SRAT alone.

This is OK at least on the hardware we are immediately concerned about
because the LLC sharing happens at both the slice and at the package level,
which are also NUMA boundaries.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: brice.goglin@gmail.com
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
2018-04-17 15:39:55 +02:00
Marc Zyngier 165d102905 arm64: KVM: Demote SVE and LORegion warnings to debug only
While generating a message about guests probing for SVE/LORegions
is a useful debugging tool, considering it an error is slightly
over the top, as this is the only way the guest can find out
about the presence of the feature.

Let's turn these message into kvm_debug so that they can only
be seen if CONFIG_DYNAMIC_DEBUG, and kept quiet otherwise.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
2018-04-17 12:56:36 +01:00
Dou Liyang 451cf3ca7d x86/processor: Remove two unused function declarations
early_trap_init() and cpu_set_gdt() have been removed, so remove the stale
declarations as well.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: keescook@chromium.org
Cc: luto@kernel.org
Cc: hpa@zytor.com
Cc: bp@suse.de
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/20180404064527.10562-1-douly.fnst@cn.fujitsu.com
2018-04-17 11:56:32 +02:00
Dou Liyang 10daf10ab1 x86/acpi: Prevent X2APIC id 0xffffffff from being accounted
RongQing reported that there are some X2APIC id 0xffffffff in his machine's
ACPI MADT table, which makes the number of possible CPU inaccurate.

The reason is that the ACPI X2APIC parser has no sanity check for APIC ID
0xffffffff, which is an invalid id in all APIC types. See "Intel® 64
Architecture x2APIC Specification", Chapter 2.4.1.

Add a sanity check to acpi_parse_x2apic() which ignores the invalid id.

Reported-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: len.brown@intel.com
Cc: rjw@rjwysocki.net
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/20180412014052.25186-1-douly.fnst@cn.fujitsu.com
2018-04-17 11:56:31 +02:00
Xiaoming Gao d3878e164d x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
The TSC calibration code uses HPET as reference. The conversion normalizes
the delta of two HPET timestamps:

    hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6

and then divides the normalized delta of the corresponding TSC timestamps
by the result to calulate the TSC frequency.

    tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref

This uses do_div() which takes an u32 as the divisor, which worked so far
because the HPET frequency was low enough that 'hpetref' never exceeded
32bit.

On Skylake machines the HPET frequency increased so 'hpetref' can exceed
32bit. do_div() truncates the divisor, which causes the calibration to
fail.

Use div64_u64() to avoid the problem.

[ tglx: Fixes whitespace mangled patch and rewrote changelog ]

Signed-off-by: Xiaoming Gao <newtongao@tencent.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: peterz@infradead.org
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.com
2018-04-17 11:50:42 +02:00
Christoph Hellwig ef97837db9 x86: Remove pci-nommu.c
The commit that switched x86 to dma_direct_ops stopped using and building
this file, but accidentally left it in the tree.  Remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: iommu@lists.infradead.org
Link: https://lkml.kernel.org/r/20180416124442.13831-1-hch@lst.de
2018-04-17 11:48:06 +02:00
Madhavan Srinivasan 9dfbf78e41 powerpc/64s: Default l1d_size to 64K in RFI fallback flush
If there is no d-cache-size property in the device tree, l1d_size could
be zero. We don't actually expect that to happen, it's only been seen
on mambo (simulator) in some configurations.

A zero-size l1d_size leads to the loop in the asm wrapping around to
2^64-1, and then walking off the end of the fallback area and
eventually causing a page fault which is fatal.

Just default to 64K which is correct on some CPUs, and sane enough to
not cause a crash on others.

Fixes: aa8a5e0062 ('powerpc/64s: Add support for RFI flush of L1-D cache')
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
[mpe: Rewrite comment and change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-17 19:29:04 +10:00
Martin Schwidefsky fae7649121 s390/signal: cleanup uapi struct sigaction
The struct sigaction for user space in arch/s390/include/uapi/asm/signal.h
is ill defined. The kernel uses two structures 'struct sigaction' and
'struct old_sigaction', the correlation in the kernel for both 31 and
64 bit is as follows

    sys_sigaction -> struct old_sigaction
    sys_rt_sigaction -> struct sigaction

The correlation of the (single) uapi definition for 'struct sigaction'
under '#ifndef __KERNEL__':

    31-bit: sys_sigaction -> uapi struct sigaction
    31-bit: sys_rt_sigaction -> no structure available

    64-bit: sys_sigaction -> no structure available
    64-bit: sys_rt_sigaction -> uapi struct sigaction

This is quite confusing. To make it a bit less confusing make the
uapi definition of 'struct sigaction' usable for sys_rt_sigaction for
both 31-bit and 64-bit.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-17 10:36:12 +02:00
Linus Torvalds a27fc14219 Merge branch 'parisc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc build fix from Helge Deller:
 "Fix build error because of missing binfmt_elf32.o file which is still
  mentioned in the Makefile"

* 'parisc-4.17-3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Fix missing binfmt_elf32.o build error
2018-04-16 14:07:39 -07:00
Matt Redfearn 8a8158c85e
MIPS: memset.S: EVA & fault support for small_memset
The MIPS kernel memset / bzero implementation includes a small_memset
branch which is used when the region to be set is smaller than a long (4
bytes on 32bit, 8 bytes on 64bit). The current small_memset
implementation uses a simple store byte loop to write the destination.
There are 2 issues with this implementation:

1. When EVA mode is active, user and kernel address spaces may overlap.
Currently the use of the sb instruction means kernel mode addressing is
always used and an intended write to userspace may actually overwrite
some critical kernel data.

2. If the write triggers a page fault, for example by calling
__clear_user(NULL, 2), instead of gracefully handling the fault, an OOPS
is triggered.

Fix these issues by replacing the sb instruction with the EX() macro,
which will emit EVA compatible instuctions as required. Additionally
implement a fault fixup for small_memset which sets a2 to the number of
bytes that could not be cleared (as defined by __clear_user).

Reported-by: Chuanhua Lei <chuanhua.lei@intel.com>
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/18975/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-16 21:31:26 +01:00
Linus Torvalds e6d9bfdeb4 Bug fixes, plus a new test case and the associated infrastructure for
writing nested virtualization tests.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJa1MZMAAoJEL/70l94x66DupgH/jIRQ6wsZ9Hq5qBJ39sLFXNe
 cAIAbaCUAck4tl5YNDgv/SOQ644ClmDVP/4CgezqosoY29eLY0+P71GQZEIQ7aB5
 Taa7UI5qYnIctBmxFwD1+iV717Vyb+QLpRnMb8zjLkfT/3S8HsQvpcYJlQrrN3PP
 w4VIvhZjPx11wvXDCuY6ire7sBEb/vSQQewGWg9dLt4hnDz1tRFMtAg/7GVT+rG9
 SjuH57NrXAKWiNVlQvYfLSfaTyPf5J41i49nwFJJVPY1kMaXvOSDDOfejTD/SjTs
 pYye7o8TGbrsY9O8H85gxdppHz4K0+sP9xNunUqk1wQ+zo9lWTejIaDoN2rzyuA=
 =GKBC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "Bug fixes, plus a new test case and the associated infrastructure for
  writing nested virtualization tests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: selftests: add vmx_tsc_adjust_test
  kvm: x86: move MSR_IA32_TSC handling to x86.c
  X86/KVM: Properly update 'tsc_offset' to represent the running guest
  kvm: selftests: add -std=gnu99 cflags
  x86: Add check for APIC access address for vmentry of L2 guests
  KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update
  X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available
  kvm: selftests: fix spelling mistake: "divisable" and "divisible"
  X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
2018-04-16 11:24:28 -07:00
Joerg Roedel e6f39e87b6 x86/ldt: Fix support_pte_mask filtering in map_ldt_struct()
The |= operator will let us end up with an invalid PTE. Use
the correct &= instead.

[ The bug was also independently reported by Shuah Khan ]

Fixes: fb43d6cb91 ('x86/mm: Do not auto-massage page protections')
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-16 11:20:34 -07:00
Tony Lindgren bc8a3ef194 ARM: dts: Fix cm2 and prm sizes for omap4
The size of these modules is 0x2000, not 0x3000. The extra 0x1000
after 0x2000 is for the interconnect target agent which is a separate
device.

Fixes: 7415b0b4c6 ("ARM: dts: omap4: add minimal l4 bus layout with
control module support")
Cc: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
2018-04-16 10:01:04 -07:00
Paolo Bonzini dd259935e4 kvm: x86: move MSR_IA32_TSC handling to x86.c
This is not specific to Intel/AMD anymore.  The TSC offset is available
in vcpu->arch.tsc_offset.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-16 17:50:22 +02:00
KarimAllah Ahmed e79f245dde X86/KVM: Properly update 'tsc_offset' to represent the running guest
Update 'tsc_offset' on vmentry/vmexit of L2 guests to ensure that it always
captures the TSC_OFFSET of the running guest whether it is the L1 or L2
guest.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
[AMD changes, fix update_ia32_tsc_adjust_msr. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-16 17:50:11 +02:00
Michael Ellerman b8858581fe powerpc/lib: Fix off-by-one in alternate feature patching
When we patch an alternate feature section, we have to adjust any
relative branches that branch out of the alternate section.

But currently we have a bug if we have a branch that points to past
the last instruction of the alternate section, eg:

  FTR_SECTION_ELSE
  1:     b       2f
         or      6,6,6
  2:
  ALT_FTR_SECTION_END(...)
         nop

This will result in a relative branch at 1 with a target that equals
the end of the alternate section.

That branch does not need adjusting when it's moved to the non-else
location. Currently we do adjust it, resulting in a branch that goes
off into the link-time location of the else section, which is junk.

The fix is to not patch branches that have a target == end of the
alternate section.

Fixes: d20fe50a7b ("KVM: PPC: Book3S HV: Branch inside feature section")
Fixes: 9b1a735de6 ("powerpc: Add logic to patch alternative feature sections")
Cc: stable@vger.kernel.org # v2.6.27+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-17 00:37:48 +10:00
Thor Thayer 6e8fe39989 ARM: socfpga_defconfig: Remove QSPI Sector 4K size force
Remove QSPI Sector 4K size force which is causing QSPI boot
problems with the JFFS2 root filesystem.

Fixes the following error:
     "Magic bitmask 0x1985 not found at ..."

Cc: stable@vger.kernel.org
Signed-off-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2018-04-16 09:14:30 -05:00
Matt Redfearn 2c2bf522ed
MIPS: dts: Boston: Fix PCI bus dtc warnings:
dtc recently (v1.4.4-8-g756ffc4f52f6) added PCI bus checks. Fix the
warnings now emitted:

arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@10000000: missing bus-range for PCI bridge
arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@12000000: missing bus-range for PCI bridge
arch/mips/boot/dts/img/boston.dtb: Warning (pci_bridge): /pci@14000000: missing bus-range for PCI bridge

Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-mips@linux-mips.org
Cc: devicetree@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/19070/
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-16 10:37:49 +01:00
Sudeep Holla c326599b29 arm64: dts: juno: drop unnecessary address-cells and size-cells properties
/smb@8000000/motherboard/gpio_keys node doesn't have "ranges" or "reg"
property in child nodes. So it's unnecessary to have address-cells
as well as size-cells properties which results in below warning.

Warning (avoid_unnecessary_addr_size):
/smb@8000000/motherboard/gpio_keys:
	unnecessary #address-cells/#size-cells without "ranges" or child "reg"
	property

This patch drops the unnecessary address+size-cell properties.

Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2018-04-16 10:15:45 +01:00
Heiko Carstens 49d23a851d s390: rename default_defconfig to debug_defconfig
The name debug_defconfig reflects what the config is actually good
for and should be less confusing.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 10:29:35 +02:00
Heiko Carstens cd7cf57f18 s390: remove gcov defconfig
This config is not needed anymore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 10:29:35 +02:00
Martin Schwidefsky de2011197d s390: update defconfig
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 10:29:34 +02:00
Heiko Carstens 451239eb3d s390: add support for IBM z14 Model ZR1
Just add the new machine type number to the two places that matter.

Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:24 +02:00
Vasily Gorbik c65bbb51c6 s390/boot: remove unused COMPILE_VERSION and ccflags-y
ccflags-y has no effect (no code is built in that directory,
arch/s390/boot/compressed/Makefile defines its own KBUILD_CFLAGS).
Removing ccflags-y together with COMPILE_VERSION.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:24 +02:00
Sebastian Ott 232acdff21 s390/nospec: include cpu.h
Fix the following sparse warnings:
symbol 'cpu_show_spectre_v1' was not declared. Should it be static?
symbol 'cpu_show_spectre_v2' was not declared. Should it be static?

Signed-off-by: Sebastian Ott <sebott@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:23 +02:00
Thomas Richter 701e188c65 s390/decompressor: Ignore file vmlinux.bin.full
Commit 81796a3c6a ("s390/decompressor: trim uncompressed
image head during the build") introduced a new
file named vmlinux.bin.full in directory
arch/s390/boot/compressed.

Add this file to the list of ignored files so it does
not show up on git status.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:23 +02:00
Heiko Carstens de66b24291 s390/kexec_file: add generated files to .gitignore
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:23 +02:00
Philipp Rudo bdea9f6f7a s390/Kconfig: Move kexec config options to "Processor type and features"
The config options for kexec are currently not under any menu directory. Up
until now this was not a problem as standard kexec is always compiled in
and thus does not create a menu entry. This changed when kexec_file_load
was enabled. Its config option requires a menu entry which, when added
beneath standard kexec option, appears on the main directory above "General
Setup". Thus move the whole block further down such that the entry in now
in "Processor type and features".

While at it also update the help text for kexec file.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:23 +02:00
Philipp Rudo 8be0188271 s390/kexec_file: Add ELF loader
Add an ELF loader for kexec_file. The main task here is to do proper sanity
checks on the ELF file. Basically all other functionality was already
implemented for the image loader.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:23 +02:00
Philipp Rudo ee337f5469 s390/kexec_file: Add crash support to image loader
Add support to load a crash kernel to the image loader. This requires
extending the purgatory.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:22 +02:00
Philipp Rudo e49bb0a27f s390/kexec_file: Add image loader
Add an image loader for kexec_file_load. For simplicity first skip crash
support. The functions defined in machine_kexec_file will later be shared
with the ELF loader.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:22 +02:00
Philipp Rudo 71406883fd s390/kexec_file: Add kexec_file_load system call
This patch adds the kexec_file_load system call to s390 as well as the arch
specific functions common code requires to work. Loaders for the different
file types will be added later.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:22 +02:00
Philipp Rudo 840798a1f5 s390/kexec_file: Add purgatory
The common code expects the architecture to have a purgatory that runs
between the two kernels. Add it now. For simplicity first skip crash
support.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:22 +02:00
Philipp Rudo 15ceb8c936 s390/kexec_file: Prepare setup.h for kexec_file_load
kexec_file_load needs to prepare the new kernels before they are loaded.
For that it has to know the offsets in head.S, e.g. to register the new
command line. Unfortunately there are no macros right now defining those
offsets. Define them now.

Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-16 09:10:21 +02:00
Ingo Molnar aacd188a2d perf/core improvements and fixes:
perf annotate:
 
 - Allow showing offsets in more than just jump targets, use the new
   'O' hotkey in the TUI, config ~/.perfconfig annotate.offset_level
   for it and for --stdio2 (Arnaldo Carvalho de Melo)
 
 - Use the resolved variable names from objdump disassembled lines to
   make them more compact, just like was already done for some instructions,
   like "mov", this eventually will be done more generally, but lets now add
   some more to the existing mechanism (Arnaldo Carvalho de Melo)
 
 perf record:
 
 - Change warning for missing topology sysfs entry to debug, as not all
   architectures have those files, s390 being one of those (Thomas Richter)
 
 perf sched:
 
 - Fix -g/--call-graph documentation (Takuya Yamamoto)
 
 perf stat:
 
 - Enable 1ms interval for printing event counters values in (Alexey Budankov)
 
 perf test:
 
 - Run dwarf unwind  on arm32 (Kim Phillips)
 
 - Remove unused ptrace.h include from LLVM test, sidesteping older
   clang's lack of support for some asm constructs (Arnaldo Carvalho de Melo)
 
 perf version:
 
 - Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version --build-options'
   when HAVE_SYSCALL_TABLE_SUPPORT is true, as libaudit won't be used in that
   case, print info about syscall_table support instead (Jin Yao)
 
 Build system:
 
 - Use HAVE_..._SUPPORT used consistently (Jin Yao)
 
 - Restore READ_ONCE() C++ compatibility in tools/include (Mark Rutland)
 
 - Give hints about package names needed to build jvmti (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEELb9bqkb7Te0zijNb1lAW81NSqkAFAlrQq1MACgkQ1lAW81NS
 qkCt3A/+N3Tq41g6zxvO5kIH/mnjCdZ6D1n+7qPOkBnmEPZhsyo6QCiYld3gHxaq
 kmOecRqKzdMx/4xDArdCXizw0iNWecAEa0vCk+A8qfEeBS9ZiditU+vqrzLhzbxr
 wHR1YA3oJSUeQzGmTbXgjjc2ySmfK7EJcBdP+diESXQIRkO6DfpPsxeR6UBoyGT+
 gWM5GvTRxa4P6hlVv+uEsdWDvziPIL7Uk/ykKJA6s2BVScvXog2uJqfzroYlqJWG
 TIEobGfU7zoLVuZtuj/8E8tncQwyNSX+BgFRiZxX5gHxv4VGG1q7D4ug302Q/ctU
 Dkted9+lL7W1cFcNOgsFJAm5TkaczGKFezRvVMVv6T9LCJbJIddRxG5LjKWPb1Gk
 ok242hlFzH2a1Sas7MQKRXnhaHxjjVUTKO6Vgq24AXoWgFWVyTdNtFL8D7FBkQfZ
 eL0S10mgUSG5n3WnfKeomgt2BqwTMURXEIwRnMv+er2hkmeBl80K9BbKBc8KyAAX
 CLn2bS35S/NXyX4Cin44gBYLPdbDjg7r8WDdtstIQsmF6SvofDtTCKlVhQL9++BH
 lOH+hPkwuai1eOXnhUgCd0pUbO5PmcKfpd2Tv5hOR3Xr6ATL7HEZ8k24Tl6r1T7T
 MAfz4lv/wUoX+Gx1xpxt/6+hAau+yNs2QW5i7Szk/NCMUSx/rlw=
 =mR6l
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-4.17-20180413' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull tooling improvements and fixes from Arnaldo Carvalho de Melo:

perf annotate fixes and improvements:

- Allow showing offsets in more than just jump targets, use the new
  'O' hotkey in the TUI, config ~/.perfconfig annotate.offset_level
  for it and for --stdio2 (Arnaldo Carvalho de Melo)

- Use the resolved variable names from objdump disassembled lines to
  make them more compact, just like was already done for some instructions,
  like "mov", this eventually will be done more generally, but lets now add
  some more to the existing mechanism (Arnaldo Carvalho de Melo)

perf record fixes:

- Change warning for missing topology sysfs entry to debug, as not all
  architectures have those files, s390 being one of those (Thomas Richter)

perf sched fixes:

- Fix -g/--call-graph documentation (Takuya Yamamoto)

perf stat:

- Enable 1ms interval for printing event counters values in (Alexey Budankov)

perf test fixes:

- Run dwarf unwind  on arm32 (Kim Phillips)

- Remove unused ptrace.h include from LLVM test, sidesteping older
  clang's lack of support for some asm constructs (Arnaldo Carvalho de Melo)

perf version fixes:

- Do not print info about HAVE_LIBAUDIT_SUPPORT in 'perf version --build-options'
  when HAVE_SYSCALL_TABLE_SUPPORT is true, as libaudit won't be used in that
  case, print info about syscall_table support instead (Jin Yao)

Build system fixes:

- Use HAVE_..._SUPPORT used consistently (Jin Yao)

- Restore READ_ONCE() C++ compatibility in tools/include (Mark Rutland)

- Give hints about package names needed to build jvmti (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-16 08:15:22 +02:00
Al Viro a24cd49073 hypfs_kill_super(): deal with failed allocations
hypfs_fill_super() might fail to allocate sbi; hypfs_kill_super()
should not oops on that.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-04-15 23:48:04 -04:00
Linus Torvalds ca71b3ba4c Kbuild updates for v4.17 (2nd)
- pass HOSTLDFLAGS when compiling single .c host programs
 
 - build genksyms lexer and parser files instead of using shipped
   versions
 
 - rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency
 
 - let the top .gitignore globally ignore artifacts generated by
   flex, bison, and asn1_compiler
 
 - let the top Makefile globally clean artifacts generated by
   flex, bison, and asn1_compiler
 
 - use safer .SECONDARY marker instead of .PRECIOUS to prevent
   intermediate files from being removed
 
 - support -fmacro-prefix-map option to make __FILE__ a relative path
 
 - fix # escaping to prepare for the future GNU Make release
 
 - clean up deb-pkg by using debian tools instead of handrolled
   source/changes generation
 
 - improve rpm-pkg portability by supporting kernel-install as a
   fallback of new-kernel-pkg
 
 - extend Kconfig listnewconfig target to provide more information
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJa0krLAAoJED2LAQed4NsGyCAP/3Vsb8A4sea7sE3LV6/aFUJp
 WcAm6PXcip1MXy7GI5yxFciwen3Z3ghQUer7fJKDcHR5c4mRSfKaqWp+TLHd6uux
 7I4pV0FNx2PapcPu5T7wNZHN96p3xZC0Z66sq9BCZ/+gNyYmZLIDcBUSIOEk0nzJ
 IsvD46zy6R6KtEnycShKVscg4JyPXJIw1UBqsPDEFHg5l16ARkghND7e5zTW62Fi
 2MqQxNXAksIKpxxoxPH/fIcNp1kFKVxYBH2CW4LQtOjC3GmrozdeV5PUc7yTezPc
 dpqOuEcIAbMH91bkvhhF+ZBi34YrxRoT4S8B3G9iCXRz+2LRZZaitqO4dAH8Kjbn
 0KjkqzNc5TosJXQ8RPTcQlRBi+JmE1bHxICvTx3XNJcqJMqIH0vs3ez/LJKOwhB4
 DbAROoxQNfVcOdouHcx2EuCSdHn24BEyzaGFhi04LACpbRLxr8IJS7hSGXRloBYp
 K3ydRvG/dCZjFRTS+xWWSi3Nzjih2mCctQlH3D4nf4M3vtCX+/k5B9IMEYFfHlvL
 KoNlK4/1vP/dAJZj0iOqd2ksCA1G6iLoHrFp3E5pdtmb4sVe2Ez3gMt+pxz3htR9
 XvjuHOzkWE9eiihs1NsFgQuyP/o3UmNKpDDW0irQ06IFEPXkA/y1mVmeTU3qtrII
 ZDiwGozIkMMEy/MLkcjE
 =tD6R
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull more Kbuild updates from Masahiro Yamada:

 - pass HOSTLDFLAGS when compiling single .c host programs

 - build genksyms lexer and parser files instead of using shipped
   versions

 - rename *-asn1.[ch] to *.asn1.[ch] for suffix consistency

 - let the top .gitignore globally ignore artifacts generated by flex,
   bison, and asn1_compiler

 - let the top Makefile globally clean artifacts generated by flex,
   bison, and asn1_compiler

 - use safer .SECONDARY marker instead of .PRECIOUS to prevent
   intermediate files from being removed

 - support -fmacro-prefix-map option to make __FILE__ a relative path

 - fix # escaping to prepare for the future GNU Make release

 - clean up deb-pkg by using debian tools instead of handrolled
   source/changes generation

 - improve rpm-pkg portability by supporting kernel-install as a
   fallback of new-kernel-pkg

 - extend Kconfig listnewconfig target to provide more information

* tag 'kbuild-v4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: extend output of 'listnewconfig'
  kbuild: rpm-pkg: use kernel-install as a fallback for new-kernel-pkg
  Kbuild: fix # escaping in .cmd files for future Make
  kbuild: deb-pkg: split generating packaging and build
  kbuild: use -fmacro-prefix-map to make __FILE__ a relative path
  kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers
  kbuild: rename *-asn1.[ch] to *.asn1.[ch]
  kbuild: clean up *-asn1.[ch] patterns from top-level Makefile
  .gitignore: move *-asn1.[ch] patterns to the top-level .gitignore
  kbuild: add %.dtb.S and %.dtb to 'targets' automatically
  kbuild: add %.lex.c and %.tab.[ch] to 'targets' automatically
  genksyms: generate lexer and parser during build instead of shipping
  kbuild: clean up *.lex.c and *.tab.[ch] patterns from top-level Makefile
  .gitignore: move *.lex.c *.tab.[ch] patterns to the top-level .gitignore
  kbuild: use HOSTLDFLAGS for single .c executables
2018-04-15 17:21:30 -07:00
Linus Torvalds 9fb71c2f23 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of fixes and updates for x86:

   - Address a swiotlb regression which was caused by the recent DMA
     rework and made driver fail because dma_direct_supported() returned
     false

   - Fix a signedness bug in the APIC ID validation which caused invalid
     APIC IDs to be detected as valid thereby bloating the CPU possible
     space.

   - Fix inconsisten config dependcy/select magic for the MFD_CS5535
     driver.

   - Fix a corruption of the physical address space bits when encryption
     has reduced the address space and late cpuinfo updates overwrite
     the reduced bit information with the original value.

   - Dominiks syscall rework which consolidates the architecture
     specific syscall functions so all syscalls can be wrapped with the
     same macros. This allows to switch x86/64 to struct pt_regs based
     syscalls. Extend the clearing of user space controlled registers in
     the entry patch to the lower registers"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apic: Fix signedness bug in APIC ID validity checks
  x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
  x86/olpc: Fix inconsistent MFD_CS5535 configuration
  swiotlb: Use dma_direct_supported() for swiotlb_ops
  syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
  syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
  syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
  syscalls/core, syscalls/x86: Clean up syscall stub naming convention
  syscalls/x86: Extend register clearing on syscall entry to lower registers
  syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64
  syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32
  syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls
  syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls
  syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y
  x86/syscalls: Don't pointlessly reload the system call number
  x86/mm: Fix documentation of module mapping range with 4-level paging
  x86/cpuid: Switch to 'static const' specifier
2018-04-15 16:12:35 -07:00
Linus Torvalds 6b0a02e86c Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 pti updates from Thomas Gleixner:
 "Another series of PTI related changes:

   - Remove the manual stack switch for user entries from the idtentry
     code. This debloats entry by 5k+ bytes of text.

   - Use the proper types for the asm/bootparam.h defines to prevent
     user space compile errors.

   - Use PAGE_GLOBAL for !PCID systems to gain back performance

   - Prevent setting of huge PUD/PMD entries when the entries are not
     leaf entries otherwise the entries to which the PUD/PMD points to
     and are populated get lost"

* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
  x86/pti: Leave kernel text global for !PCID
  x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image
  x86/pti: Enable global pages for shared areas
  x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
  x86/mm: Comment _PAGE_GLOBAL mystery
  x86/mm: Remove extra filtering in pageattr code
  x86/mm: Do not auto-massage page protections
  x86/espfix: Document use of _PAGE_GLOBAL
  x86/mm: Introduce "default" kernel PTE mask
  x86/mm: Undo double _PAGE_PSE clearing
  x86/mm: Factor out pageattr _PAGE_GLOBAL setting
  x86/entry/64: Drop idtentry's manual stack switch for user entries
  x86/uapi: Fix asm/bootparam.h userspace compilation errors
2018-04-15 13:35:29 -07:00
Linus Torvalds 174e719439 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Thomas Gleixner:
 "A rather large set of perf updates:

  Kernel:

   - Fix various initialization issues

   - Prevent creating [ku]probes for not CAP_SYS_ADMIN users

  Tooling:

   - Show only failing syscalls with 'perf trace --failure' (Arnaldo
     Carvalho de Melo)

            e.g: See what 'openat' syscalls are failing:

        # perf trace --failure -e openat
         762.323 ( 0.007 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video2) = -1 ENOENT No such file or directory
         <SNIP N /dev/videoN open attempts... sigh, where is that improvised camera lid?!? >
         790.228 ( 0.008 ms): VideoCapture/4566 openat(dfd: CWD, filename: /dev/video63) = -1 ENOENT No such file or directory
        ^C#

   - Show information about the event (freq, nr_samples, total
     period/nr_events) in the annotate --tui and --stdio2 'perf
     annotate' output, similar to the first line in the 'perf report
     --tui', but just for the samples for a the annotated symbol
     (Arnaldo Carvalho de Melo)

   - Introduce 'perf version --build-options' to show what features were
     linked, aliased as well as a shorter 'perf -vv' (Jin Yao)

   - Add a "dso_size" sort order (Kim Phillips)

   - Remove redundant ')' in the tracepoint output in 'perf trace'
     (Changbin Du)

   - Synchronize x86's cpufeatures.h, no effect on toolss (Arnaldo
     Carvalho de Melo)

   - Show group details on the title line in the annotate browser and
     'perf annotate --stdio2' output, so that the per-event columns can
     have headers (Arnaldo Carvalho de Melo)

   - Fixup vertical line separating metrics from instructions and
     cleaning unused lines at the bottom, both in the annotate TUI
     browser (Arnaldo Carvalho de Melo)

   - Remove duplicated 'samples' in lost samples warning in
     'perf report' (Arnaldo Carvalho de Melo)

   - Synchronize i915_drm.h, silencing the perf build process,
     automagically adding support for the new DRM_I915_QUERY ioctl
     (Arnaldo Carvalho de Melo)

   - Make auxtrace_queues__add_buffer() allocate struct buffer, from a
     patchkit already applied (Adrian Hunter)

   - Fix the --stdio2/TUI annotate output to include group details, be
     it for a recorded '{a,b,f}' explicit event group or when forcing
     group display using 'perf report --group' for a set of events not
     recorded as a group (Arnaldo Carvalho de Melo)

   - Fix display artifacts in the ui browser (base class for the
     annotate and main report/top TUI browser) related to the extra
     title lines work (Arnaldo Carvalho de Melo)

   - perf auxtrace refactorings, leftovers from a previously partially
     processed patchset (Adrian Hunter)

   - Fix the builtin clang build (Sandipan Das, Arnaldo Carvalho de
     Melo)

   - Synchronize i915_drm.h, silencing a perf build warning and in the
     process automagically adding support for a new ioctl command
     (Arnaldo Carvalho de Melo)

   - Fix a strncpy issue in uprobe tracing"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  perf/core: Need CAP_SYS_ADMIN to create k/uprobe with perf_event_open()
  tracing/uprobe_event: Fix strncpy corner case
  perf/core: Fix perf_uprobe_init()
  perf/core: Fix perf_kprobe_init()
  perf/core: Fix use-after-free in uprobe_perf_close()
  perf tests clang: Fix function name for clang IR test
  perf clang: Add support for recent clang versions
  perf tools: Fix perf builds with clang support
  perf tools: No need to include namespaces.h in util.h
  perf hists browser: Remove leftover from row returned from refresh
  perf hists browser: Show extra_title_lines in the 'D' debug hotkey
  perf auxtrace: Make auxtrace_queues__add_buffer() do CPU filtering
  tools headers uapi: Synchronize i915_drm.h
  perf report: Remove duplicated 'samples' in lost samples warning
  perf ui browser: Fixup cleaning unused lines at the bottom
  perf annotate browser: Fixup vertical line separating metrics from instructions
  perf annotate: Show group details on the title line
  perf auxtrace: Make auxtrace_queues__add_buffer() allocate struct buffer
  perf/x86/intel: Move regs->flags EXACT bit init
  perf trace: Remove redundant ')'
  ...
2018-04-15 12:36:31 -07:00
Linus Torvalds 19ca90de49 Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 EFI bootup fixlet from Thomas Gleixner:
 "A single fix for an early boot warning caused by invoking
  this_cpu_has() before SMP initialization"

* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Fix bogus warning during EFI bootup, use boot_cpu_has() instead of this_cpu_has() in build_cr3_noflush()
2018-04-15 12:32:06 -07:00
Linus Torvalds 9dceab89d8 OpenRISC updates for v4.17
Just one small thing here, it came in a while back but I didnt have
 anything in my 4.16 queue, still its the only thing for 4.17 so sending
 it alone.
 
 Small cleanup:
  - remove unused __ARCH_HAVE_MMU define
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJayP1LAAoJEMOzHC1eZifkoCcQAK5PZ5awA3+DVAPNWN3I9hMF
 0tk3f+R+GrF+wBQ8IwXOok4QaXmJrDwgxHnNRrhBFqHROz1OjEDYNnOk4Two3rs6
 +yTw7TC9GMyowxWcyf+VHzo4/54XIk/Cwg8thEN5W2JSYC/kn1f0HIqfF6AYfCcw
 fLqhDbuM09cslvdhfgsdRQ/ZPDcQjrr9XDP8CUZIeHqCxbPwPWikOUmey4itZKTi
 VQO60SAjNjqSYWcIEcH8EfCvvpEndREvzOH8w7um282USKfiQv3/Qkur3ihTT+0e
 /7JF2LZFLBVPU3dqZVlZDfukamfU+aRsXMh4foEhM6l6uddiKYnu0wjh25trJ6P2
 AlM8BsGL3kjWi7mqR7nxuOgRsLqNcTGPcJaU5X/TGTObDLADATDf7xrxcGxPFMY/
 Z3xkBmKHxv7tqSEo5pfUqvNV7mzT3uep5t4Ql9wq6v0dj+feB6sNx1n/mjSLefOf
 6g9mTarcVR6QajJZYggVNdY1XGOLh88MKpbV/oYIYqMNC34Q/2JTVvKKqsBmyi3r
 Rjk14gZ/fQeKT28XzUgRabDwevCSBzHknDBCQRoQJOjwC/q6Pm7xlOx8XCgEavBn
 OgqXEzq4aWrsAOL2Ffi24+dMvnJe6IOGkU0xDQD1elc1FxJedsMPJUgS9FEnPkGN
 TC3WReB0EdSJ8QI8/oy7
 =edKX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://github.com/openrisc/linux

Pull OpenRISC fixlet from Stafford Horne:
 "Just one small thing here, it came in a while back but I didnt have
  anything in my 4.16 queue, still its the only thing for 4.17 so
  sending it alone.

  Small cleanup: remove unused __ARCH_HAVE_MMU define"

* tag 'for-linus' of git://github.com/openrisc/linux:
  openrisc: remove unused __ARCH_HAVE_MMU define
2018-04-15 12:27:58 -07:00
Linus Torvalds b1cb4f93b5 powerpc fixes for 4.17 #2
- Fix crashes when loading modules built with a different CONFIG_RELOCATABLE
    value by adding CONFIG_RELOCATABLE to vermagic.
 
  - Fix busy loops in the OPAL NVRAM driver if we get certain error conditions
    from firmware.
 
  - Remove tlbie trace points from KVM code that's called in real mode, because
    it causes crashes.
 
  - Fix checkstops caused by invalid tlbiel on Power9 Radix.
 
  - Ensure the set of CPU features we "know" are always enabled is actually the
    minimal set when we build with support for firmware supplied CPU features.
 
 Thanks to:
   Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJa0oWBExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAV
 EA//UQB7n33HlroMBL3841VswUrIgY36gSi9+QZPjxHiTYGVStI+u0FHq5hm8OMm
 1FkronD0sfbN7t8LJ9FS6vCoAlW15vMBj95pWJXAL7NuQneO7cM5JvRbowVKbNbq
 GESn4EkUj9bAXl7aTzX2yA7jruzMJIwE/2bgl1J6YkpByHx5x/fgzKTuoCRggQqY
 Xd656ZZyWR/uIuWhAjCPFdrPnekVvo7/Gy5Yo5kcR6LbX4MO0JFNOHpiT3wPGGv/
 LJedJtdpH1biMk7SrqLfWD7mpMsVJeMelpkftEbe8zUXf17wBd1rl4JkybLTDwZD
 HznZ0nbnh0j5Q/KRHwOh0sLDSisJF6pQHH/Bnc4Xqrsn4OERwEk40/Ym/O0kDsjH
 n2F3X5a5QLWoBWRsdV3BMyEzc0bgnb++tXeBWOV4jJBEOhoqxwimCU+3fmLiy95A
 urJYf8Sljwve9I5kJyXKpruopgeoJ1aumRwopE8YCG9lfBgzvU/90u6+uKqAdkOv
 kpZxS5FDT2lNIwbCP9WjEelAOGrtA6JQNWU0iPGwsVAvAxX/p1RwwbQeWVlWs3Ek
 UdVF8eGezClpLPa/FQFdDyXfGE6WIf1vK9YnG7k3rUwwkSqoYxKgVQ3o1Vetp0Lg
 fzPsDdDi2V2I3KDYNav8s6kohAIIS/a9XYk4BGvT/htRnUg=
 =0e7c
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Fix crashes when loading modules built with a different
   CONFIG_RELOCATABLE value by adding CONFIG_RELOCATABLE to vermagic.

 - Fix busy loops in the OPAL NVRAM driver if we get certain error
   conditions from firmware.

 - Remove tlbie trace points from KVM code that's called in real mode,
   because it causes crashes.

 - Fix checkstops caused by invalid tlbiel on Power9 Radix.

 - Ensure the set of CPU features we "know" are always enabled is
   actually the minimal set when we build with support for firmware
   supplied CPU features.

Thanks to: Aneesh Kumar K.V, Anshuman Khandual, Nicholas Piggin.

* tag 'powerpc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
  powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
  KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
  powerpc/8xx: Fix build with hugetlbfs enabled
  powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
  powerpc/fscr: Enable interrupts earlier before calling get_user()
  powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
  powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
2018-04-15 11:57:12 -07:00
Linus Torvalds 18b7fd1c93 Merge branch 'akpm' (patches from Andrew)
Merge yet more updates from Andrew Morton:

 - various hotfixes

 - kexec_file updates and feature work

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (27 commits)
  kernel/kexec_file.c: move purgatories sha256 to common code
  kernel/kexec_file.c: allow archs to set purgatory load address
  kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
  kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs
  kernel/kexec_file.c: split up __kexec_load_puragory
  kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
  kernel/kexec_file.c: search symbols in read-only kexec_purgatory
  kernel/kexec_file.c: make purgatory_info->ehdr const
  kernel/kexec_file.c: remove checks in kexec_purgatory_load
  include/linux/kexec.h: silence compile warnings
  kexec_file, x86: move re-factored code to generic side
  x86: kexec_file: clean up prepare_elf64_headers()
  x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
  x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
  x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
  kexec_file,x86,powerpc: factor out kexec_file_ops functions
  kexec_file: make use of purgatory optional
  proc: revalidate misc dentries
  mm, slab: reschedule cache_reap() on the same CPU
  ...
2018-04-14 08:50:50 -07:00
Helge Deller c7cd882469 parisc: Fix missing binfmt_elf32.o build error
Commit 71d577db01 ("parisc: Switch to generic COMPAT_BINFMT_ELF")
removed the binfmt_elf32.c source file, but missed to drop the object
file from the list of object files the Makefile, which then results in a
build error.

Fixes: 71d577db01 ("parisc: Switch to generic COMPAT_BINFMT_ELF")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-14 11:17:59 +02:00
Philipp Rudo df6f2801f5 kernel/kexec_file.c: move purgatories sha256 to common code
The code to verify the new kernels sha digest is applicable for all
architectures.  Move it to common code.

One problem is the string.c implementation on x86.  Currently sha256
includes x86/boot/string.h which defines memcpy and memset to be gcc
builtins.  By moving the sha256 implementation to common code and
changing the include to linux/string.h both functions are no longer
defined.  Thus definitions have to be provided in x86/purgatory/string.c

Link: http://lkml.kernel.org/r/20180321112751.22196-12-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:28 -07:00
Philipp Rudo 3be3f61d25 kernel/kexec_file.c: allow archs to set purgatory load address
For s390 new kernels are loaded to fixed addresses in memory before they
are booted.  With the current code this is a problem as it assumes the
kernel will be loaded to an 'arbitrary' address.  In particular,
kexec_locate_mem_hole searches for a large enough memory region and sets
the load address (kexec_bufer->mem) to it.

Luckily there is a simple workaround for this problem.  By returning 1
in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off.  This
allows the architecture to set kbuf->mem by hand.  While the trick works
fine for the kernel it does not for the purgatory as here the
architectures don't have access to its kexec_buffer.

Give architectures access to the purgatories kexec_buffer by changing
kexec_load_purgatory to take a pointer to it.  With this change
architectures have access to the buffer and can edit it as they need.

A nice side effect of this change is that we can get rid of the
purgatory_info->purgatory_load_address field.  As now the information
stored there can directly be accessed from kbuf->mem.

Link: http://lkml.kernel.org/r/20180321112751.22196-11-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:28 -07:00
Philipp Rudo 8da0b72495 kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load
The current code uses the sh_offset field in purgatory_info->sechdrs to
store a pointer to the current load address of the section.  Depending
whether the section will be loaded or not this is either a pointer into
purgatory_info->purgatory_buf or kexec_purgatory.  This is not only a
violation of the ELF standard but also makes the code very hard to
understand as you cannot tell if the memory you are using is read-only
or not.

Remove this misuse and store the offset of the section in
pugaroty_info->purgatory_buf in sh_offset.

Link: http://lkml.kernel.org/r/20180321112751.22196-10-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:28 -07:00
Philipp Rudo 8aec395b84 kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*
When the relocations are applied to the purgatory only the section the
relocations are applied to is writable.  The other sections, i.e.  the
symtab and .rel/.rela, are in read-only kexec_purgatory.  Highlight this
by marking the corresponding variables as 'const'.

While at it also change the signatures of arch_kexec_apply_relocations* to
take section pointers instead of just the index of the relocation section.
This removes the second lookup and sanity check of the sections in arch
code.

Link: http://lkml.kernel.org/r/20180321112751.22196-6-prudo@linux.vnet.ibm.com
Signed-off-by: Philipp Rudo <prudo@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:28 -07:00
AKASHI Takahiro babac4a84a kexec_file, x86: move re-factored code to generic side
In the previous patches, commonly-used routines, exclude_mem_range() and
prepare_elf64_headers(), were carved out.  Now place them in kexec
common code.  A prefix "crash_" is given to each of their names to avoid
possible name collisions.

Link: http://lkml.kernel.org/r/20180306102303.9063-8-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro eb7dae947e x86: kexec_file: clean up prepare_elf64_headers()
Removing bufp variable in prepare_elf64_headers() makes the code simpler
and more understandable.

Link: http://lkml.kernel.org/r/20180306102303.9063-7-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro 8d5f894a31 x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer
While CRASH_MAX_RANGES (== 16) seems to be good enough, fixed-number
array is not a good idea in general.

In this patch, size of crash_mem buffer is calculated as before and the
buffer is now dynamically allocated.  This change also allows removing
crash_elf_data structure.

Link: http://lkml.kernel.org/r/20180306102303.9063-6-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro c72c7e6709 x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers()
The code guarded by CONFIG_X86_64 is necessary on some architectures
which have a dedicated kernel mapping outside of linear memory mapping.
(arm64 is among those.)

In this patch, an additional argument, kernel_map, is added to enable/
disable the code removing #ifdef.

Link: http://lkml.kernel.org/r/20180306102303.9063-5-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro cbe6601617 x86: kexec_file: purge system-ram walking from prepare_elf64_headers()
While prepare_elf64_headers() in x86 looks pretty generic for other
architectures' use, it contains some code which tries to list crash
memory regions by walking through system resources, which is not always
architecture agnostic.  To make this function more generic, the related
code should be purged.

In this patch, prepare_elf64_headers() simply scans crash_mem buffer
passed and add all the listed regions to elf header as a PT_LOAD
segment.  So walk_system_ram_res(prepare_elf64_headers_callback) have
been moved forward before prepare_elf64_headers() where the callback,
prepare_elf64_headers_callback(), is now responsible for filling up
crash_mem buffer.

Meanwhile exclude_elf_header_ranges() used to be called every time in
this callback it is rather redundant and now called only once in
prepare_elf_headers() as well.

Link: http://lkml.kernel.org/r/20180306102303.9063-4-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro 9ec4ecef0a kexec_file,x86,powerpc: factor out kexec_file_ops functions
As arch_kexec_kernel_image_{probe,load}(),
arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig()
are almost duplicated among architectures, they can be commonalized with
an architecture-defined kexec_file_ops array.  So let's factor them out.

Link: http://lkml.kernel.org/r/20180306102303.9063-3-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
AKASHI Takahiro b799a09f63 kexec_file: make use of purgatory optional
Patch series "kexec_file, x86, powerpc: refactoring for other
architecutres", v2.

This is a preparatory patchset for adding kexec_file support on arm64.

It was originally included in a arm64 patch set[1], but Philipp is also
working on their kexec_file support on s390[2] and some changes are now
conflicting.

So these common parts were extracted and put into a separate patch set
for better integration.  What's more, my original patch#4 was split into
a few small chunks for easier review after Dave's comment.

As such, the resulting code is basically identical with my original, and
the only *visible* differences are:

 - renaming of _kexec_kernel_image_probe() and  _kimage_file_post_load_cleanup()

 - change one of types of arguments at prepare_elf64_headers()

Those, unfortunately, require a couple of trivial changes on the rest
(#1, #6 to #13) of my arm64 kexec_file patch set[1].

Patch #1 allows making a use of purgatory optional, particularly useful
for arm64.

Patch #2 commonalizes arch_kexec_kernel_{image_probe, image_load,
verify_sig}() and arch_kimage_file_post_load_cleanup() across
architectures.

Patches #3-#7 are also intended to generalize parse_elf64_headers(),
along with exclude_mem_range(), to be made best re-use of.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-February/561182.html
[2] http://lkml.iu.edu//hypermail/linux/kernel/1802.1/02596.html

This patch (of 7):

On arm64, crash dump kernel's usable memory is protected by *unmapping*
it from kernel virtual space unlike other architectures where the region
is just made read-only.  It is highly unlikely that the region is
accidentally corrupted and this observation rationalizes that digest
check code can also be dropped from purgatory.  The resulting code is so
simple as it doesn't require a bit ugly re-linking/relocation stuff,
i.e.  arch_kexec_apply_relocations_add().

Please see:

   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545428.html

All that the purgatory does is to shuffle arguments and jump into a new
kernel, while we still need to have some space for a hash value
(purgatory_sha256_digest) which is never checked against.

As such, it doesn't make sense to have trampline code between old kernel
and new kernel on arm64.

This patch introduces a new configuration, ARCH_HAS_KEXEC_PURGATORY, and
allows related code to be compiled in only if necessary.

[takahiro.akashi@linaro.org: fix trivial screwup]
  Link: http://lkml.kernel.org/r/20180309093346.GF25863@linaro.org
Link: http://lkml.kernel.org/r/20180306102303.9063-2-takahiro.akashi@linaro.org
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Acked-by: Dave Young <dyoung@redhat.com>
Tested-by: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
Michael S. Tsirkin d081107867 mm/gup.c: document return value
__get_user_pages_fast handles errors differently from
get_user_pages_fast: the former always returns the number of pages
pinned, the later might return a negative error code.

Link: http://lkml.kernel.org/r/1522962072-182137-6-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thorsten Leemhuis <regressions@leemhuis.info>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-13 17:10:27 -07:00
Sinan Kaya a1cc7034e3
MIPS: io: Add barrier after register read in readX()
While a barrier is present in the writeX() functions before the register
write, a similar barrier is missing in the readX() functions after the
register read. This could allow memory accesses following readX() to
observe stale data.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/19069/
[jhogan@kernel.org: Tidy commit message]
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-14 00:46:53 +01:00
Linus Torvalds 1bad9ce155 Fixes for bugs in futex, device tree, and userspace breakpoint traps,
and for PCI issues on SH7786.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJa0M3WAAoJELcQ+SIFb8HasUEH/2Jb1rTgmeU/95Xg9GPpQDBc
 IWjNLmtZ3ZoXdpFZUN5Ezr1lcKu+3s4wUOEJkpznsL1VTnO0hYyY+UTfxg2ASRPC
 r/bUAPhkVVl77k7v9jrxLyBa93PxIjZua/r2PooY7nuA9EQugTRxVgCWNAkJnTrJ
 YVCy5lEXl4lyxGQNIhIOEcV1NGkwPwotEwkydjNFHtq1rUiUjwP3TfhdmshRI+7r
 HQCzhs9NaYxRb6PV4xaqiTBT5AfXQnkofpFUp7OgpOZTby8/OAwvL7+xNEfI3bt8
 cYUaEw9BRnAtutE2drSZu+4sSdGlZxCgX6TyWoHDnFREQqeg/DVSWNeJuDC9kmY=
 =feGF
 -----END PGP SIGNATURE-----

Merge tag 'sh-for-4.17' of git://git.libc.org/linux-sh

Pull arch/sh updates from Rich Felker:
 "Fixes for bugs in futex, device tree, and userspace breakpoint traps,
  and for PCI issues on SH7786"

* tag 'sh-for-4.17' of git://git.libc.org/linux-sh:
  arch/sh: pcie-sh7786: handle non-zero DMA offset
  arch/sh: pcie-sh7786: adjust the memory mapping
  arch/sh: pcie-sh7786: adjust PCI MEM and IO regions
  arch/sh: pcie-sh7786: exclude unusable PCI MEM areas
  arch/sh: pcie-sh7786: mark unavailable PCI resource as disabled
  arch/sh: pci: don't use disabled resources
  arch/sh: make the DMA mapping operations observe dev->dma_pfn_offset
  arch/sh: add sh7786_mm_sel() function
  sh: fix debug trap failure to process signals before return to user
  sh: fix memory corruption of unflattened device tree
  sh: fix futex FUTEX_OP_SET op on userspace addresses
2018-04-13 12:27:11 -07:00
Linus Torvalds e4e57f20fa Additional arm64 updates for 4.17
A few late updates to address some issues arising from conflicts with
 other trees:
 
 - Removal of Qualcomm-specific Spectre-v2 mitigation in favour of the
   generic SMCCC-based firmware call
 
 - Fix EL2 hardening capability checking, which was bodged to reduce
   conflicts with the KVM tree
 
 - Add some currently unused assembler macros for managing SIMD registers
   which will be used by some crypto code in the next merge window
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABCgAGBQJa0H0mAAoJELescNyEwWM0YYcH/3OMP4qJYT7bKtJvuxSkR8j6
 z8QP9ujdZ3hJL5y2dddvJ1xxSOUxZ9MHMvM5PUQRI/TYj2OkXnnXoDFTtjzXrRiL
 +uDrdyMvkQSz0klAi9qsoVaPzR9LqEiqcglMbuZKUGEd5gzcdzLCrBcY2jRYpGQ8
 w5Kxdw5Am4n97yqHDoGO1tLRmz9D0K3ucMmFE319ocql+j6W0XbqEnhgVfgHvBW/
 DmaAe3VoUbABh+K4JGM7PGk+BUiMEttZpAnjNuasL0+UAnZVgSYSR2lgrex9WaxF
 1K8Aat4ftknozUrZ+H4ZTnBdwTTFkfTzsh9XOTKY7dX4dKd4m6P44r50AwGWsQM=
 =10by
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull more arm64 updates from Will Deacon:
 "A few late updates to address some issues arising from conflicts with
  other trees:

   - Removal of Qualcomm-specific Spectre-v2 mitigation in favour of the
     generic SMCCC-based firmware call

   - Fix EL2 hardening capability checking, which was bodged to reduce
     conflicts with the KVM tree

   - Add some currently unused assembler macros for managing SIMD
     registers which will be used by some crypto code in the next merge
     window"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: assembler: add macros to conditionally yield the NEON under PREEMPT
  arm64: assembler: add utility macros to push/pop stack frames
  arm64: Move the content of bpi.S to hyp-entry.S
  arm64: Get rid of __smccc_workaround_1_hvc_*
  arm64: capabilities: Rework EL2 vector hardening entry
  arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening
2018-04-13 11:24:18 -07:00
Linus Torvalds 6c21e4334a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull  more s390 updates from Martin Schwidefsky:
 "Three notable larger changes next to the usual bug fixing:

   - update the email addresses in MAINTAINERS for the s390 folks to use
     the simpler linux.ibm.com domain instead of the old
     linux.vnet.ibm.com

   - an update for the zcrypt device driver that removes some old and
     obsolete interfaces and add support for up to 256 crypto adapters

   - a rework of the IPL aka boot code"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (23 commits)
  s390: correct nospec auto detection init order
  s390/zcrypt: Support up to 256 crypto adapters.
  s390/zcrypt: Remove deprecated zcrypt proc interface.
  s390/zcrypt: Remove deprecated ioctls.
  s390/zcrypt: Make ap init functions static.
  MAINTAINERS: update s390 maintainers email addresses
  s390/ipl: remove reipl_method and dump_method
  s390/ipl: correct kdump reipl block checksum calculation
  s390/ipl: remove non-existing functions declaration
  s390: assume diag308 set always works
  s390/ipl: avoid adding scpdata to cmdline during ftp/dvd boot
  s390/ipl: correct ipl parmblock valid checks
  s390/ipl: rely on diag308 store to get ipl info
  s390/ipl: move ipl_flags to ipl.c
  s390/ipl: get rid of ipl_ssid and ipl_devno
  s390/ipl: unite diag308 and scsi boot ipl blocks
  s390/ipl: ensure loadparm valid flag is set
  s390/qdio: lock device while installing IRQ handler
  s390/qdio: clear intparm during shutdown
  s390/ccwgroup: require at least one ccw device
  ...
2018-04-13 09:43:20 -07:00
Michael Ellerman 81b654c273 powerpc/64s: Fix CPU_FTRS_ALWAYS vs DT CPU features
The cpu_has_feature() mechanism has an optimisation where at build
time we construct a mask of the CPU feature bits that will always be
true for the given .config, based on the platform/bitness/etc. that we
are building for.

That is incompatible with DT CPU features, where the set of CPU
features is dependent on feature flags that are given to us by
firmware.

The result is that some feature bits can not be *disabled* by DT CPU
features. Or more accurately, they can be disabled but they will still
appear in the ALWAYS mask, meaning cpu_has_feature() will always
return true for them.

In the past this hasn't really been a problem because on Book3S
64 (where we support DT CPU features), the set of ALWAYS bits has been
very small. That was because we always built for POWER4 and later,
meaning the set of common bits was small.

The only bit that could be cleared by DT CPU features that was also in
the ALWAYS mask was CPU_FTR_NODSISRALIGN, and that was only used in
the alignment handler to create a fake DSISR. That code was itself
deleted in 31bfdb036f ("powerpc: Use instruction emulation
infrastructure to handle alignment faults") (Sep 2017).

However the set of ALWAYS features changed with the recent commit
db5ae1c155 ("powerpc/64s: Refine feature sets for little endian
builds") which restricted the set of feature flags when building
little endian to Power7 or later. That caused the ALWAYS mask to
become much larger for little endian builds.

The result is that the following feature bits can currently not
be *disabled* by DT CPU features:

  CPU_FTR_REAL_LE, CPU_FTR_MMCRA, CPU_FTR_CTRL, CPU_FTR_SMT,
  CPU_FTR_PURR, CPU_FTR_SPURR, CPU_FTR_DSCR, CPU_FTR_PKEY,
  CPU_FTR_VMX_COPY, CPU_FTR_CFAR, CPU_FTR_HAS_PPR.

To fix it we need to mask the set of ALWAYS features with the base set
of DT CPU features, ie. the features that are always enabled by DT CPU
features. That way there are no bits in the ALWAYS mask that are not
also always set by DT CPU features.

Fixes: db5ae1c155 ("powerpc/64s: Refine feature sets for little endian builds")
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-13 23:51:44 +10:00
Linus Torvalds 681857ef0d Merge branch 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:

 - fix panic when halting system via "shutdown -h now"

 - drop own coding in favour of generic CONFIG_COMPAT_BINFMT_ELF
   implementation

 - add FPE_CONDTRAP constant: last outstanding parisc-specific cleanup
   for Eric Biedermans siginfo patches

 - move some functions to .init and some to .text.hot linker sections

* 'parisc-4.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: Prevent panic at system halt
  parisc: Switch to generic COMPAT_BINFMT_ELF
  parisc: Move cache flush functions into .text.hot section
  parisc/signal: Add FPE_CONDTRAP for conditional trap handling
2018-04-12 17:07:04 -07:00
Thomas Petazzoni bf9c7e3d79 arch/sh: pcie-sh7786: handle non-zero DMA offset
On SuperH, the base of the physical memory might be different from
zero. In this case, PCI address zero will map to a non-zero physical
address. In order to make sure that the DMA mapping API takes care of
this DMA offset, we must fill in the dev->dma_pfn_offset field for PCI
devices. This gets done in the pcibios_bus_add_device() hook, called
for each new PCI device detected.

The dma_pfn_offset global variable is re-calculated for every PCI
controller available on the platform, but that's not an issue because
its value will each time be exactly the same, as it only depends on
the memory start address and memory size.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:58 -04:00
Thomas Petazzoni 79e1c5e70b arch/sh: pcie-sh7786: adjust the memory mapping
The code setting up the PCI -> SuperHighway mapping doesn't take into
account the fact that the address stored in PCIELARx must be aligned
with the size stored in PCIELAMRx.

For example, when your physical memory starts at 0x0800_0000 (128 MB),
a size of 64 MB or 128 MB is fine. However, if you have 256 MB of
memory, it doesn't work because the base address is not aligned on the
size.

In such situation, we have to round down the base address to make sure
it is aligned on the size of the area. For for a 0x0800_0000 base
address with 256 MB of memory, we will round down to 0x0, and extend
the size of the mapping to 512 MB.

This allows the mapping to work on platforms that have 256 MB of
RAM. The current setup would only work with 128 MB of RAM or less.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:58 -04:00
Thomas Petazzoni 5da1bb96dc arch/sh: pcie-sh7786: adjust PCI MEM and IO regions
The current definition of the PCIe IO and MEM resources for SH7786
doesn't match what the datasheet says. For example, for PCIe0
0xfe100000 is advertised by the datasheet as a PCI IO region, while
0xfd000000 is advertised as a PCI MEM region. The code currently
inverts the two.

The SH4A_PCIEPARL and SH4A_PCIEPTCTLR registers allow to define the
base address and role of the different regions (including whether it's
a MEM or IO region). However, practical experience on a SH7786 shows
that if 0xfe100000 is used for LEL and 0xfd000000 for IO, a PCIe
device using two MEM BARs cannot be accessed at all. Simply using
0xfe100000 for IO and 0xfd000000 for MEM makes the PCIe device
accessible.

It is very likely that this was never seen because there are two other
PCI MEM region listed in the resources. However, for different
reasons, none of the two other MEM regions are usable on the specific
SH7786 platform the problem was encountered. Therefore, the last MEM
region at 0xfe100000 was used to place the BARs, making the device
non-functional.

This commit therefore adjusts those PCI MEM and IO resources
definitions so that they match what the datasheet says. They have only
been tested with PCIe 0.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:57 -04:00
Thomas Petazzoni d62e9bf5dd arch/sh: pcie-sh7786: exclude unusable PCI MEM areas
Depending on the physical memory layout, some PCI MEM areas are not
usable. According to the SH7786 datasheet, the PCI MEM area from
1000_0000 to 13FF_FFFF is only usable if the physical memory layout
(in MMSELR) is 1, 2, 5 or 6. In all other configurations, this PCI MEM
area is not usable (because it overlaps with DRAM).

Therefore, this commit adjusts the PCI SH7786 initialization to mark
the relevant PCI resource as IORESOURCE_DISABLED if we can't use it.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:56 -04:00
Thomas Petazzoni 7dd7f69809 arch/sh: pcie-sh7786: mark unavailable PCI resource as disabled
Some PCI MEM resources are marked as IORESOURCE_MEM_32BIT, which means
they are only usable when the SH core runs in 32-bit mode. In 29-bit
mode, such memory regions are not usable.

The existing code for SH7786 properly skips such regions when
configuring the PCIe controller registers. However, because such
regions are still described in the resource array, the
pcibios_scanbus() function in the SuperH pci.c will register them to
the PCI core. Due to this, the PCI core will allocate MEM areas from
this resource, and assign BARs pointing to this area, even though it's
unusable.

In order to prevent this from happening, we mark such regions as
IORESOURCE_DISABLED, which tells the SuperH pci.c pcibios_scanbus()
function to skip them.

Note that we separate marking the region as disabled from skipping it,
because other regions will be marked as disabled in follow-up patches.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:55 -04:00
Thomas Petazzoni 3aeb93a014 arch/sh: pci: don't use disabled resources
In pcibios_scanbus(), we provide to the PCI core the usable MEM and IO
regions using pci_add_resource_offset(). We travel through all
resources available in the "struct pci_channel".

Also, in register_pci_controller(), we travel through all resources to
request them, making sure they don't conflict with already requested
resources.

However, some resources may be disabled, in which case they should not
be requested nor provided to the PCI core.

In the current situation, none of the resources are disabled. However,
follow-up patches in this series will make some resources disabled,
making this preliminary change necessary.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:54 -04:00
Thomas Petazzoni ce88313069 arch/sh: make the DMA mapping operations observe dev->dma_pfn_offset
Some devices may have a non-zero DMA offset, i.e an offset between the
DMA address and the physical address. Such an offset can be encoded
into the dma_pfn_offset field of "struct device", but the SuperH
implementation of the DMA mapping API does not observe this
information.

This commit fixes that by ensuring the DMA address is properly
calculated depending on this DMA offset.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:53 -04:00
Thomas Petazzoni bc05aa6e13 arch/sh: add sh7786_mm_sel() function
The SH7786 has different physical memory layout configurations,
configurable through the MMSELR register. The configuration is
typically defined by the bootloader, so Linux generally doesn't care.

Except that depending on the configuration, some PCI MEM areas may or
may not be available. This commit adds a helper function that allows
to retrieve the current physical memory layout configuration. It will
be used in a following patch to exclude unusable PCI MEM areas during
the PCI initialization.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:53 -04:00
Rich Felker 96a598996f sh: fix debug trap failure to process signals before return to user
When responding to a debug trap (breakpoint) in userspace, the
kernel's trap handler raised SIGTRAP but returned from the trap via a
code path that ignored pending signals, resulting in an infinite loop
re-executing the trapping instruction.

Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:52 -04:00
Rich Felker eb6b6930a7 sh: fix memory corruption of unflattened device tree
unflatten_device_tree() makes use of memblock allocation, and
therefore must be called before paging_init() migrates the memblock
allocation data to the bootmem framework. Otherwise the record of the
allocation for the expanded device tree will be lost, and will
eventually be clobbered when allocated for another use.

Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:51 -04:00
Aurelien Jarno 9b7e30ab97 sh: fix futex FUTEX_OP_SET op on userspace addresses
Commit 00b73d8d1b ("sh: add working futex atomic ops on userspace
addresses for smp") changed the futex_atomic_op_inuser function to
use a loop. In case of the FUTEX_OP_SET op with a userspace address
containing a value different of 0, this loop is an endless loop.

Fix that by loading the value of oldval from the userspace before doing
the cmpxchg op, also for the FUTEX_OP_SET case.

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Rich Felker <dalias@libc.org>
2018-04-12 19:47:50 -04:00
Sinan Kaya f6b7aeee8f
MIPS: io: Prevent compiler reordering writeX()
writeX() has strong ordering semantics with respect to memory updates.
In the absence of a write barrier or a compiler barrier, the compiler
can reorder register and memory update instructions. This breaks the
writeX() API.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/18997/
[jhogan@kernel.org: Tidy commit message]
Signed-off-by: James Hogan <jhogan@kernel.org>
2018-04-12 23:01:58 +01:00
Linus Torvalds 67a7a8fff8 xen: fixes for 4.17-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEhRJncuj2BJSl0Jf3sN6d1ii/Ey8FAlrPnM8ACgkQsN6d1ii/
 Ey9Kzwf/eQVb6zzn7FDHAb6pLaZ5i2xi2xohsKmhAVQIEa94rZ3mLoRegtnIfyjO
 RcjjSAzHSZO9NQgNA2ALdu6bBdzu4/ywQEQCnY2Gqxp0ocG/+k3p/FqLHZGdcqPo
 e3gpcVxHSFWUCCGm1t3umI25driqrUq4xa6UFi2IB4djDvTrK/JsSygKx6GiVujL
 2eV7v7rgqaaVZQyo8iOd+LlWuKZewKLfnALUDC21X5J2HmvfoyTdn85kldzbiIsG
 YR7mcfgAtAVTyCfgXI3eqAGpRFEyqR4ga87oahdV3/iW+4wreh4hm2Xd/IETXklv
 Epxyet8IlMB9886PuZhZqgnW6o1RDA==
 =z3bP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "A few fixes of Xen related core code and drivers"

* tag 'for-linus-4.17-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen
  xen/acpi: off by one in read_acpi_id()
  xen/acpi: upload _PSD info for non Dom0 CPUs too
  x86/xen: Delay get_cpu_cap until stack canary is established
  xen: xenbus_dev_frontend: Verify body of XS_TRANSACTION_END
  xen: xenbus: Catch closing of non existent transactions
  xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling
2018-04-12 11:04:35 -07:00
Linus Torvalds 07820c3bf1 Microblaze patches for 4.17-rc1
- Use generic pci_mmap_resoruce_range()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iEYEABECAAYFAlrMWo4ACgkQykllyylKDCGrrQCfZHss5ank6e1H+EApm0KqEQFu
 kbwAoIj6TdCVMH44kJwqtraIhBXV9dhX
 =vYT3
 -----END PGP SIGNATURE-----

Merge tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze

Pull microblaze updates from Michal Simek:
 "Use generic pci_mmap_resource_range()"

* tag 'microblaze-4.17-rc1' of git://git.monstr.eu/linux-2.6-microblaze:
  microblaze: Use generic pci_mmap_resource_range()
  microblaze: Provide pgprot_device/writecombine macros for nommu
2018-04-12 10:18:02 -07:00
Krish Sadhukhan f0f4cf5b30 x86: Add check for APIC access address for vmentry of L2 guests
According to the sub-section titled 'VM-Execution Control Fields' in the
section titled 'Basic VM-Entry Checks' in Intel SDM vol. 3C, the following
vmentry check must be enforced:

    If the 'virtualize APIC-accesses' VM-execution control is 1, the
    APIC-access address must satisfy the following checks:

	- Bits 11:0 of the address must be 0.
	- The address should not set any bits beyond the processor's
	  physical-address width.

This patch adds the necessary check to conform to this rule. If the check
fails, we cause the L2 VMENTRY to fail which is what the associated unit
test (following patch) expects.

Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-12 18:36:28 +02:00
Michael Ellerman 2675c13b29 powerpc/mm/radix: Fix checkstops caused by invalid tlbiel
In tlbiel_radix_set_isa300() we use the PPC_TLBIEL() macro to
construct tlbiel instructions. The instruction takes 5 fields, two of
which are registers, and the others are constants. But because it's
constructed with inline asm the compiler doesn't know that.

We got the constraint wrong on the 'r' field, using "r" tells the
compiler to put the value in a register. The value we then get in the
macro is the *register number*, not the value of the field.

That means when we mask the register number with 0x1 we get 0 or 1
depending on which register the compiler happens to put the constant
in, eg:

  li      r10,1
  tlbiel  r8,r9,2,0,0

  li      r7,1
  tlbiel  r10,r6,0,0,1

If we're unlucky we might generate an invalid instruction form, for
example RIC=0, PRS=1 and R=0, tlbiel r8,r7,0,1,0, this has been
observed to cause machine checks:

  Oops: Machine check, sig: 7 [#1]
  CPU: 24 PID: 0 Comm: swapper
  NIP:  00000000000385f4 LR: 000000000100ed00 CTR: 000000000000007f
  REGS: c00000000110bb40 TRAP: 0200
  MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48002222  XER: 20040000
  CFAR: 00000000000385d0 DAR: 0000000000001c00 DSISR: 00000200 SOFTE: 1

If the machine check happens early in boot while we have MSR_ME=0 it
will escalate into a checkstop and kill the box entirely.

To fix it we could change the inline asm constraint to "i" which
tells the compiler the value is a constant. But a better fix is to just
pass a literal 1 into the macro, which bypasses any problems with inline
asm constraints.

Fixes: d4748276ae ("powerpc/64s: Improve local TLB flush for boot and MCE on POWER9")
Cc: stable@vger.kernel.org # v4.16+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-04-12 23:49:55 +10:00
Arnaldo Carvalho de Melo fd97d39b0a Revert "x86/asm: Allow again using asm.h when building for the 'bpf' clang target"
This reverts commit ca26cffa4e.

Newer clang versions accept that asm(_ASM_SP) construct, and now that
the bpf-script-test-kbuild.c script, used in one of the 'perf test LLVM'
subtests doesn't include ptrace.h, which ended up including
arch/x86/include/asm/asm.h, we can revert this patch.

Suggested-by: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/r/613f0a0d-c433-8f4d-dcc1-c9889deae39e@fb.com
Acked-by: Yonghong Song <yhs@fb.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://lkml.kernel.org/n/tip-nqozcv8loq40tkqpfw997993@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2018-04-12 10:33:27 -03:00
Ingo Molnar ef389b7346 Merge branch 'WIP.x86/asm' into x86/urgent, because the topic is ready
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:42:34 +02:00
Joerg Roedel e3e2881214 x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
The pmd_set_huge() and pud_set_huge() functions are used from
the generic ioremap() code to establish large mappings where this
is possible.

But the generic ioremap() code does not check whether the
PMD/PUD entries are already populated with a non-leaf entry,
so that any page-table pages these entries point to will be
lost.

Further, on x86-32 with SHARED_KERNEL_PMD=0, this causes a
BUG_ON() in vmalloc_sync_one() when PMD entries are synced
from swapper_pg_dir to the current page-table. This happens
because the PMD entry from swapper_pg_dir was promoted to a
huge-page entry while the current PGD still contains the
non-leaf entry. Because both entries are present and point
to a different page, the BUG_ON() triggers.

This was actually triggered with pti-x32 enabled in a KVM
virtual machine by the graphics driver.

A real and better fix for that would be to improve the
page-table handling in the generic ioremap() code. But that is
out-of-scope for this patch-set and left for later work.

Reported-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman Long <llong@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180411152437.GC15462@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:41:41 +02:00
Dave Hansen 8c06c7740d x86/pti: Leave kernel text global for !PCID
Global pages are bad for hardening because they potentially let an
exploit read the kernel image via a Meltdown-style attack which
makes it easier to find gadgets.

But, global pages are good for performance because they reduce TLB
misses when making user/kernel transitions, especially when PCIDs
are not available, such as on older hardware, or where a hypervisor
has disabled them for some reason.

This patch implements a basic, sane policy: If you have PCIDs, you
only map a minimal amount of kernel text global.  If you do not have
PCIDs, you map all kernel text global.

This policy effectively makes PCIDs something that not only adds
performance but a little bit of hardening as well.

I ran a simple "lseek" microbenchmark[1] to test the benefit on
a modern Atom microserver.  Most of the benefit comes from applying
the series before this patch ("entry only"), but there is still a
signifiant benefit from this patch.

  No Global Lines (baseline  ): 6077741 lseeks/sec
  88 Global Lines (entry only): 7528609 lseeks/sec (+23.9%)
  94 Global Lines (this patch): 8433111 lseeks/sec (+38.8%)

[1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205518.E3D989EB@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:06:00 +02:00
Dave Hansen 39114b7a74 x86/pti: Never implicitly clear _PAGE_GLOBAL for kernel image
Summary:

In current kernels, with PTI enabled, no pages are marked Global. This
potentially increases TLB misses.  But, the mechanism by which the Global
bit is set and cleared is rather haphazard.  This patch makes the process
more explicit.  In the end, it leaves us with Global entries in the page
tables for the areas truly shared by userspace and kernel and increases
TLB hit rates.

The place this patch really shines in on systems without PCIDs.  In this
case, we are using an lseek microbenchmark[1] to see how a reasonably
non-trivial syscall behaves.  Higher is better:

  No Global pages (baseline): 6077741 lseeks/sec
  88 Global Pages (this set): 7528609 lseeks/sec (+23.9%)

On a modern Skylake desktop with PCIDs, the benefits are tangible, but not
huge for a kernel compile (lower is better):

  No Global pages (baseline): 186.951 seconds time elapsed  ( +-  0.35% )
  28 Global pages (this set): 185.756 seconds time elapsed  ( +-  0.09% )
                               -1.195 seconds (-0.64%)

I also re-checked everything using the lseek1 test[1]:

  No Global pages (baseline): 15783951 lseeks/sec
  28 Global pages (this set): 16054688 lseeks/sec
			     +270737 lseeks/sec (+1.71%)

The effect is more visible, but still modest.

Details:

The kernel page tables are inherited from head_64.S which rudely marks
them as _PAGE_GLOBAL.  For PTI, we have been relying on the grace of
$DEITY and some insane behavior in pageattr.c to clear _PAGE_GLOBAL.
This patch tries to do better.

First, stop filtering out "unsupported" bits from being cleared in the
pageattr code.  It's fine to filter out *setting* these bits but it
is insane to keep us from clearing them.

Then, *explicitly* go clear _PAGE_GLOBAL from the kernel identity map.
Do not rely on pageattr to do it magically.

After this patch, we can see that "GLB" shows up in each copy of the
page tables, that we have the same number of global entries in each
and that they are the *same* entries.

  /sys/kernel/debug/page_tables/current_kernel:11
  /sys/kernel/debug/page_tables/current_user:11
  /sys/kernel/debug/page_tables/kernel:11

  9caae8ad6a1fb53aca2407ec037f612d  current_kernel.GLB
  9caae8ad6a1fb53aca2407ec037f612d  current_user.GLB
  9caae8ad6a1fb53aca2407ec037f612d  kernel.GLB

A quick visual audit also shows that all the entries make sense.
0xfffffe0000000000 is the cpu_entry_area and 0xffffffff81c00000
is the entry/exit text:

  0xfffffe0000000000-0xfffffe0000002000           8K     ro                 GLB NX pte
  0xfffffe0000002000-0xfffffe0000003000           4K     RW                 GLB NX pte
  0xfffffe0000003000-0xfffffe0000006000          12K     ro                 GLB NX pte
  0xfffffe0000006000-0xfffffe0000007000           4K     ro                 GLB x  pte
  0xfffffe0000007000-0xfffffe000000d000          24K     RW                 GLB NX pte
  0xfffffe000002d000-0xfffffe000002e000           4K     ro                 GLB NX pte
  0xfffffe000002e000-0xfffffe000002f000           4K     RW                 GLB NX pte
  0xfffffe000002f000-0xfffffe0000032000          12K     ro                 GLB NX pte
  0xfffffe0000032000-0xfffffe0000033000           4K     ro                 GLB x  pte
  0xfffffe0000033000-0xfffffe0000039000          24K     RW                 GLB NX pte
  0xffffffff81c00000-0xffffffff81e00000           2M     ro         PSE     GLB x  pmd

[1.] https://github.com/antonblanchard/will-it-scale/blob/master/tests/lseek1.c

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205517.C80FBE05@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:06:00 +02:00
Dave Hansen 0f561fce4d x86/pti: Enable global pages for shared areas
The entry/exit text and cpu_entry_area are mapped into userspace and
the kernel.  But, they are not _PAGE_GLOBAL.  This creates unnecessary
TLB misses.

Add the _PAGE_GLOBAL flag for these areas.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205515.2977EE7D@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:05:59 +02:00
Dave Hansen 639d6aafe4 x86/mm: Do not forbid _PAGE_RW before init for __ro_after_init
__ro_after_init data gets stuck in the .rodata section.  That's normally
fine because the kernel itself manages the R/W properties.

But, if we run __change_page_attr() on an area which is __ro_after_init,
the .rodata checks will trigger and force the area to be immediately
read-only, even if it is early-ish in boot.  This caused problems when
trying to clear the _PAGE_GLOBAL bit for these area in the PTI code:
it cleared _PAGE_GLOBAL like I asked, but also took it up on itself
to clear _PAGE_RW.  The kernel then oopses the next time it wrote to
a __ro_after_init data structure.

To fix this, add the kernel_set_to_readonly check, just like we have
for kernel text, just a few lines below in this function.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205514.8D898241@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:05:59 +02:00
Dave Hansen 430d4005b8 x86/mm: Comment _PAGE_GLOBAL mystery
I was mystified as to where the _PAGE_GLOBAL in the kernel page tables
for kernel text came from.  I audited all the places I could find, but
I missed one: head_64.S.

The page tables that we create in here live for a long time, and they
also have _PAGE_GLOBAL set, despite whether the processor supports it
or not.  It's harmless, and we got *lucky* that the pageattr code
accidentally clears it when we wipe it out of __supported_pte_mask and
then later try to mark kernel text read-only.

Comment some of these properties to make it easier to find and
understand in the future.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205513.079BB265@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:05:58 +02:00
Dave Hansen 1a54420aeb x86/mm: Remove extra filtering in pageattr code
The pageattr code has a mode where it can set or clear PTE bits in
existing PTEs, so the page protections of the *new* PTEs come from
one of two places:

  1. The set/clear masks: cpa->mask_clr / cpa->mask_set
  2. The existing PTE

We filter ->mask_set/clr for supported PTE bits at entry to
__change_page_attr() so we never need to filter them again.

The only other place permissions can come from is an existing PTE
and those already presumably have good bits.  We do not need to filter
them again.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205511.BC072352@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:05:58 +02:00
Dave Hansen fb43d6cb91 x86/mm: Do not auto-massage page protections
A PTE is constructed from a physical address and a pgprotval_t.
__PAGE_KERNEL, for instance, is a pgprot_t and must be converted
into a pgprotval_t before it can be used to create a PTE.  This is
done implicitly within functions like pfn_pte() by massage_pgprot().

However, this makes it very challenging to set bits (and keep them
set) if your bit is being filtered out by massage_pgprot().

This moves the bit filtering out of pfn_pte() and friends.  For
users of PAGE_KERNEL*, filtering will be done automatically inside
those macros but for users of __PAGE_KERNEL*, they need to do their
own filtering now.

Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
instead of massage_pgprot().  This way, we still *look* for
unsupported bits and properly warn about them if we find them.  This
might happen if an unfiltered __PAGE_KERNEL* value was passed in,
for instance.

- printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
- boot crash fix from:            Tom Lendacky <thomas.lendacky@amd.com>
- crash bisected by:              Mike Galbraith <efault@gmx.de>

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
Bisected-by: Mike Galbraith <efault@gmx.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-12 09:04:22 +02:00
Linus Torvalds 1fe43114ea More power management updates for 4.17-rc1
- Rework the idle loop in order to prevent CPUs from spending too
    much time in shallow idle states by making it stop the scheduler
    tick before putting the CPU into an idle state only if the idle
    duration predicted by the idle governor is long enough.  That
    required the code to be reordered to invoke the idle governor
    before stopping the tick, among other things (Rafael Wysocki,
    Frederic Weisbecker, Arnd Bergmann).
 
  - Add the missing description of the residency sysfs attribute to
    the cpuidle documentation (Prashanth Prakash).
 
  - Finalize the cpufreq cleanup moving frequency table validation
    from drivers to the core (Viresh Kumar).
 
  - Fix a clock leak regression in the armada-37xx cpufreq driver
    (Gregory Clement).
 
  - Fix the initialization of the CPU performance data structures
    for shared policies in the CPPC cpufreq driver (Shunyong Yang).
 
  - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers
    a bit (Viresh Kumar, Rafael Wysocki).
 
  - Mark the expected switch fall-throughs in the PM QoS core (Gustavo
    Silva).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJazfv7AAoJEILEb/54YlRx/kYP+gPOX5O5cFF22Y2xvDHPMWjm
 D/3Nc2aRo+5DuHHECSIJ3ZVQzVoamN5zQ1KbsBRV0bJgwim4fw4M199Jr/0I2nES
 1pkByuxLrAtwb83uX3uBIQnwgKOAwRftOTeVaFaMoXgIbyUqK7ZFkGq0xQTnKqor
 6+J+78O7wMaIZ0YXQP98BC6g96vs/f+ICrh7qqY85r4NtO/thTA1IKevBmlFeIWR
 yVhEYgwSFBaWehKK8KgbshmBBEk3qzDOYfwZF/JprPhiN/6madgHgYjHC8Seok5c
 QUUTRlyO1ULTQe4JulyJUKobx7HE9u/FXC0RjbBiKPnYR4tb9Hd8OpajPRZo96AT
 8IQCdzL2Iw/ZyQsmQZsWeO1HwPTwVlF/TO2gf6VdQtH221izuHG025p8/RcZe6zb
 fTTFhh6/tmBvmOlbKMwxaLbGbwcj/5W5GvQXlXAtaElLobwwNEcEyVfF4jo4Zx/U
 DQc7agaAps67lcgFAqNDy0PoU6bxV7yoiAIlTJHO9uyPkDNyIfb0ZPlmdIi3xYZd
 tUD7C+VBezrNCkw7JWL1xXLFfJ5X7K6x5bi9I7TBj1l928Hak0dwzs7KlcNBtF1Y
 SwnJsNa3kxunGsPajya8dy5gdO0aFeB9Bse0G429+ugk2IJO/Q9M9nQUArJiC9Xl
 Gw1bw5Ynv6lx+r5EqxHa
 =Pnk4
 -----END PGP SIGNATURE-----

Merge tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more power management updates from Rafael Wysocki:
 "These include one big-ticket item which is the rework of the idle loop
  in order to prevent CPUs from spending too much time in shallow idle
  states. It reduces idle power on some systems by 10% or more and may
  improve performance of workloads in which the idle loop overhead
  matters. This has been in the works for several weeks and it has been
  tested and reviewed quite thoroughly.

  Also included are changes that finalize the cpufreq cleanup moving
  frequency table validation from drivers to the core, a few fixes and
  cleanups of cpufreq drivers, a cpuidle documentation update and a PM
  QoS core update to mark the expected switch fall-throughs in it.

  Specifics:

   - Rework the idle loop in order to prevent CPUs from spending too
     much time in shallow idle states by making it stop the scheduler
     tick before putting the CPU into an idle state only if the idle
     duration predicted by the idle governor is long enough.

     That required the code to be reordered to invoke the idle governor
     before stopping the tick, among other things (Rafael Wysocki,
     Frederic Weisbecker, Arnd Bergmann).

   - Add the missing description of the residency sysfs attribute to the
     cpuidle documentation (Prashanth Prakash).

   - Finalize the cpufreq cleanup moving frequency table validation from
     drivers to the core (Viresh Kumar).

   - Fix a clock leak regression in the armada-37xx cpufreq driver
     (Gregory Clement).

   - Fix the initialization of the CPU performance data structures for
     shared policies in the CPPC cpufreq driver (Shunyong Yang).

   - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a
     bit (Viresh Kumar, Rafael Wysocki).

   - Mark the expected switch fall-throughs in the PM QoS core (Gustavo
     Silva)"

* tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits)
  tick-sched: avoid a maybe-uninitialized warning
  cpufreq: Drop cpufreq_table_validate_and_show()
  cpufreq: SCMI: Don't validate the frequency table twice
  cpufreq: CPPC: Initialize shared perf capabilities of CPUs
  cpufreq: armada-37xx: Fix clock leak
  cpufreq: CPPC: Don't set transition_latency
  cpufreq: ti-cpufreq: Use builtin_platform_driver()
  cpufreq: intel_pstate: Do not include debugfs.h
  PM / QoS: mark expected switch fall-throughs
  cpuidle: Add definition of residency to sysfs documentation
  time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
  nohz: Avoid duplication of code related to got_idle_tick
  nohz: Gather tick_sched booleans under a common flag field
  cpuidle: menu: Avoid selecting shallow states with stopped tick
  cpuidle: menu: Refine idle state selection for running tick
  sched: idle: Select idle state before stopping the tick
  time: hrtimer: Introduce hrtimer_next_event_without()
  time: tick-sched: Split tick_nohz_stop_sched_tick()
  cpuidle: Return nohz hint from cpuidle_select()
  jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  ...
2018-04-11 17:03:20 -07:00
Linus Torvalds 375479c386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:

 - a new and faster epoll based IRQ controller and NIC driver

 - misc fixes and janitorial updates

* git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
  Fix vector raw inintialization logic
  Migrate vector timers to new timer API
  um: Compile with modern headers
  um: vector: Fix an error handling path in 'vector_parse()'
  um: vector: Fix a memory allocation check
  um: vector: fix missing unlock on error in vector_net_open()
  um: Add missing EXPORT for free_irq_by_fd()
  High Performance UML Vector Network Driver
  Epoll based IRQ controller
  um: Use POSIX ucontext_t instead of struct ucontext
  um: time: Use timespec64 for persistent clock
  um: Restore symbol versions for __memcpy and memcpy
2018-04-11 16:36:47 -07:00
Linus Torvalds 45df60cd2c ARM: SoC fixes for 4.17
Here is a very small set of fixes for inclusion in linux-4.17-rc1:
 Two changes for the maintainer file, and one more fix for the newly
 added npcm platform, to enable the level 2 cache controller.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJazib3AAoJEGCrR//JCVIndE8P/1vRE7HcR+DPVwjDWGdfDhQ+
 LuF86+eZRwxHewykA+RQpsdQ8c2N2g+5oL20EvtJpInS/2iE36evZicFIMHQKfnL
 cbij5fi2/hlCRGWfscr2g5Zq1KNPSLqyfRms5z27TRddYoZMYqMg67TwwkQ0tdmf
 WeQ+2Dg17SaZecGFLWmr8cJlo+bx6U32KYHOMo7X8mhW91GPEHLqh/u+LhVQPnnt
 LdZ4IcMH4lC/uXl39qojT8aWmzm2fcKkYBJAMbm3cL3tu76Jyy8rEy5AAdqjHsin
 Xb+wcnCfk6q+dO3OjFhA3bvgDw4LUIyntMiFQvANWWMiGrmIynw9VYIMiQ2Y0S15
 Kv9e2uwkUHiuWJZCDbFCa9pa0kDmxMpoC45wDBR2ktuAa4HB/BP6PKTFrLC0Fxag
 KJvRRDklxHDWbsA+mZTser1mEKG9kslnGNfF3LCALWnFkMgWKiV1NuLr8HucaO7d
 7b7ppofmjva+qcHjDd8I9r4qFnLcWDJHZg3cZus+4WjTHt5ws+ueb0pYPNIKvnE1
 dhvjY9dSmeEZrM8rqLcUD9Npnm4GA+ySFpFty1jPP4hb0Q+Ho1p8BssP1Ktxw8mX
 FQ59x/RYj/5+yPj7st0ERZsxir4D39H3PlYsTWeqjZCCG8VMgBX+UND0ozh7ytYM
 1+1r1us28shsL5Xp6HRo
 =kiZ6
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Here is a very small set of fixes for inclusion in linux-4.17-rc1: Two
  changes for the maintainer file, and one more fix for the newly added
  npcm platform, to enable the level 2 cache controller"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  MAINTAINERS: Update ASPEED entry with details
  MAINTAINERS: Migrate oxnas list to groups.io
  arm: npcm: enable L2 cache in NPCM7xx architecture
2018-04-11 16:12:21 -07:00
Linus Torvalds b82b6813ff nios2 update for v4.17-rc1
nios2: Use read_persistent_clock64() instead of read_persistent_clock()
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJawyvYAAoJEFWoEK+e3syCrUoP/j/0Tdqhsh4WULaFh7InUJ/x
 Ww0C83PuOvzfmqgv8e9a7tciWjdd+lhs7jAROd2FLxa+WYwr1wdnvWZfie2XEscN
 p4clq28tTcNql3pRGFnjd//qDFziZf0XuKkdi2J6AnAu1nP3tY+y9KZD8mpgF0/9
 K4v7Xb51Grj4ltk10t1wZhsuTbkEw+fysZQFEiEwBrir11vVZi+gWpYdRetN8GrQ
 xw7MDxj1rK6tGDL+0vIsmmOKp0CLZKNrkUItBhb2aQUEadhJzZn7CMIhmkRGzLLF
 6eouYZ0rdS8rGWjn1otolXsnKaqPlHoLKR8oG/b4zpQzxNd55Aa11kEKzSNXySXH
 stFJj5lG4DGho2UJmEWk54NHRtbtNEWo0VXBMHgGtg9qoaNQBVVN+4wvV2EKfD3u
 jrGCH1V9ZEHsOsAGbip8mKSELZfxnWF9UBinxw252ZnZoWpLXISaxY/Z45ubK6jv
 t5hNWOwHQO2zvcSHEMn6Q+19CYumFpD7IAYN4sMuIYPrxOJcZfMhdkcJyQwXQ/sR
 OWjCbLneDuSpZ3G/MxG76ilS+K9JlRF0/l3fr0MJW186RloVwlD/AN1Zk4d+asVR
 +341gNiDsSxAXdEnpcBItXFHj7hLUE1mOUXCMdN+y1spPK8YGf68vCeinRMf8lXy
 DRVlYdx5Wdlrd7r6tKQD
 =l1lE
 -----END PGP SIGNATURE-----

Merge tag 'nios2-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2

Pull nios2 update from Ley Foon Tan:
 "Use read_persistent_clock64() instead of read_persistent_clock()"

* tag 'nios2-v4.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2:
  nios2: Use read_persistent_clock64() instead of read_persistent_clock()
2018-04-11 16:02:18 -07:00
Helge Deller 6769828703 parisc: Prevent panic at system halt
When issuing a "shutdown -h now", the reboot syscall calls kernel_halt()
which shouldn't return, otherwise one gets this panic:

reboot: System halted
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000
CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 4.16.0-32bit+ #560
Backtrace:
 [<1018a694>] show_stack+0x18/0x28
 [<106e68a8>] dump_stack+0x80/0x10c
 [<101a4df8>] panic+0xfc/0x290
 [<101a90b8>] do_exit+0x73c/0x914
 [<101c7e38>] SyS_reboot+0x190/0x1d4
 [<1017e444>] syscall_exit+0x0/0x14

Fix it by letting machine_halt() call machine_power_off() which doesn't
return.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-11 22:28:41 +02:00
Ard Biesheuvel 24534b3511 arm64: assembler: add macros to conditionally yield the NEON under PREEMPT
Add support macros to conditionally yield the NEON (and thus the CPU)
that may be called from the assembler code.

In some cases, yielding the NEON involves saving and restoring a non
trivial amount of context (especially in the CRC folding algorithms),
and so the macro is split into three, and the code in between is only
executed when the yield path is taken, allowing the context to be preserved.
The third macro takes an optional label argument that marks the resume
path after a yield has been performed.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-11 18:50:34 +01:00
Ard Biesheuvel 0f468e221c arm64: assembler: add utility macros to push/pop stack frames
We are going to add code to all the NEON crypto routines that will
turn them into non-leaf functions, so we need to manage the stack
frames. To make this less tedious and error prone, add some macros
that take the number of callee saved registers to preserve and the
extra size to allocate in the stack frame (for locals) and emit
the ldp/stp sequences.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-11 18:50:34 +01:00
Marc Zyngier e8b22d0f45 arm64: Move the content of bpi.S to hyp-entry.S
bpi.S was introduced as we were starting to build the Spectre v2
mitigation framework, and it was rather unclear that it would
become strictly KVM specific.

Now that the picture is a lot clearer, let's move the content
of that file to hyp-entry.S, where it actually belong.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-11 18:49:30 +01:00
Marc Zyngier 22765f30db arm64: Get rid of __smccc_workaround_1_hvc_*
The very existence of __smccc_workaround_1_hvc_* is a thinko, as
KVM will never use a HVC call to perform the branch prediction
invalidation. Even as a nested hypervisor, it would use an SMC
instruction.

Let's get rid of it.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-11 18:49:30 +01:00
Marc Zyngier 8892b71885 arm64: capabilities: Rework EL2 vector hardening entry
Since 5e7951ce19 ("arm64: capabilities: Clean up midr range helpers"),
capabilities must be represented with a single entry. If multiple
CPU types can use the same capability, then they need to be enumerated
in a list.

The EL2 hardening stuff (which affects both A57 and A72) managed to
escape the conversion in the above patch thanks to the 4.17 merge
window. Let's fix it now.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-11 18:49:30 +01:00
Shanker Donthineni 4bc352ffb3 arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardening
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC
V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses
the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead
of Silicon provider service ID 0xC2001700.

Cc: <stable@vger.kernel.org> # 4.14+
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
[maz: reworked errata framework integration]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2018-04-11 18:49:30 +01:00
Matthew Wilcox b93b016313 page cache: use xa_lock
Remove the address_space ->tree_lock and use the xa_lock newly added to
the radix_tree_root.  Rename the address_space ->page_tree to ->i_pages,
since we don't really care that it's a tree.

[willy@infradead.org: fix nds32, fs/dax.c]
  Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox d339d705f7 unicore32: turn flush_dcache_mmap_lock into a no-op
Unicore doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Link: http://lkml.kernel.org/r/20180313132639.17387-5-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Matthew Wilcox 427c896f26 arm64: turn flush_dcache_mmap_lock into a no-op
ARM64 doesn't walk the VMA tree in its flush_dcache_page()
implementation, so has no need to take the tree_lock.

Link: http://lkml.kernel.org/r/20180313132639.17387-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:39 -07:00
Masahiro Yamada 2dd8a62c64 linux/const.h: move UL() macro to include/linux/const.h
ARM, ARM64 and UniCore32 duplicate the definition of UL():

  #define UL(x) _AC(x, UL)

This is not actually arch-specific, so it will be useful to move it to a
common header.  Currently, we only have the uapi variant for
linux/const.h, so I am creating include/linux/const.h.

I also added _UL(), _ULL() and ULL() because _AC() is mostly used in
the form either _AC(..., UL) or _AC(..., ULL).  I expect they will be
replaced in follow-up cleanups.  The underscore-prefixed ones should
be used for exported headers.

Link: http://lkml.kernel.org/r/1519301715-31798-4-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Pavel Tatashin 6f84f8d158 xen, mm: allow deferred page initialization for xen pv domains
Juergen Gross noticed that commit f7f99100d8 ("mm: stop zeroing memory
during allocation in vmemmap") broke XEN PV domains when deferred struct
page initialization is enabled.

This is because the xen's PagePinned() flag is getting erased from
struct pages when they are initialized later in boot.

Juergen fixed this problem by disabling deferred pages on xen pv
domains.  It is desirable, however, to have this feature available as it
reduces boot time.  This fix re-enables the feature for pv-dmains, and
fixes the problem the following way:

The fix is to delay setting PagePinned flag until struct pages for all
allocated memory are initialized, i.e.  until after free_all_bootmem().

A new x86_init.hyper op init_after_bootmem() is called to let xen know
that boot allocator is done, and hence struct pages for all the
allocated memory are now initialized.  If deferred page initialization
is enabled, the rest of struct pages are going to be initialized later
in boot once page_alloc_init_late() is called.

xen_after_bootmem() walks page table's pages and marks them pinned.

Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: Juergen Gross <jgross@suse.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alok Kataria <akataria@vmware.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Mathias Krause <minipli@googlemail.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Michal Hocko a4ff8e8620 mm: introduce MAP_FIXED_NOREPLACE
Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2.

This has started as a follow up discussion [3][4] resulting in the
runtime failure caused by hardening patch [5] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber an existing underlying mapping (e.g.  stack).
The reason for the failure is that some architectures enforce an
alignment for the given address hint without MAP_FIXED used (e.g.  for
shared or file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [6].  The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution.  We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_NOREPLACE which enforces the given
address but unlike MAP_FIXED it fails with EEXIST if the given range
conflicts with an existing one.  The flag is introduced as a completely
new one rather than a MAP_FIXED extension because of the backward
compatibility.  We really want a never-clobber semantic even on older
kernels which do not recognize the flag.  Unfortunately mmap sucks
wrt flags evaluation because we do not EINVAL on unknown flags.  On
those kernels we would simply use the traditional hint based semantic so
the caller can still get a different address (which sucks) but at least
not silently corrupt an existing mapping.  I do not see a good way
around that.  Except we won't export expose the new semantic to the
userspace at all.

It seems there are users who would like to have something like that.
Jemalloc has been mentioned by Michael Ellerman [7]

Florian Weimer has mentioned the following:
: glibc ld.so currently maps DSOs without hints.  This means that the kernel
: will map right next to each other, and the offsets between them a completely
: predictable.  We would like to change that and supply a random address in a
: window of the address space.  If there is a conflict, we do not want the
: kernel to pick a non-random address. Instead, we would try again with a
: random address.

John Hubbard has mentioned CUDA example
: a) Searches /proc/<pid>/maps for a "suitable" region of available
: VA space.  "Suitable" generally means it has to have a base address
: within a certain limited range (a particular device model might
: have odd limitations, for example), it has to be large enough, and
: alignment has to be large enough (again, various devices may have
: constraints that lead us to do this).
:
: This is of course subject to races with other threads in the process.
:
: Let's say it finds a region starting at va.
:
: b) Next it does:
:     p = mmap(va, ...)
:
: *without* setting MAP_FIXED, of course (so va is just a hint), to
: attempt to safely reserve that region. If p != va, then in most cases,
: this is a failure (almost certainly due to another thread getting a
: mapping from that region before we did), and so this layer now has to
: call munmap(), before returning a "failure: retry" to upper layers.
:
:     IMPROVEMENT: --> if instead, we could call this:
:
:             p = mmap(va, ... MAP_FIXED_NOREPLACE ...)
:
:         , then we could skip the munmap() call upon failure. This
:         is a small thing, but it is useful here. (Thanks to Piotr
:         Jaroszynski and Mark Hairgrove for helping me get that detail
:         exactly right, btw.)
:
: c) After that, CUDA suballocates from p, via:
:
:      q = mmap(sub_region_start, ... MAP_FIXED ...)
:
: Interestingly enough, "freeing" is also done via MAP_FIXED, and
: setting PROT_NONE to the subregion. Anyway, I just included (c) for
: general interest.

Atomic address range probing in the multithreaded programs in general
sounds like an interesting thing to me.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_NOREPLACE.  I believe other places which rely on MAP_FIXED
should follow.  Actually real MAP_FIXED usages should be docummented
properly and they should be more of an exception.

[1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
[3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

This patch (of 2):

MAP_FIXED is used quite often to enforce mapping at the particular range.
The main problem of this flag is, however, that it is inherently dangerous
because it unmaps existing mappings covered by the requested range.  This
can cause silent memory corruptions.  Some of them even with serious
security implications.  While the current semantic might be really
desiderable in many cases there are others which would want to enforce the
given range but rather see a failure than a silent memory corruption on a
clashing range.  Please note that there is no guarantee that a given range
is obeyed by the mmap even when it is free - e.g.  arch specific code is
allowed to apply an alignment.

Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this
behavior.  It has the same semantic as MAP_FIXED wrt.  the given address
request with a single exception that it fails with EEXIST if the requested
address is already covered by an existing mapping.  We still do rely on
get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

The flag is introduced as a completely new one rather than a MAP_FIXED
extension because of the backward compatibility.  We really want a
never-clobber semantic even on older kernels which do not recognize the
flag.  Unfortunately mmap sucks wrt.  flags evaluation because we do not
EINVAL on unknown flags.  On those kernels we would simply use the
traditional hint based semantic so the caller can still get a different
address (which sucks) but at least not silently corrupt an existing
mapping.  I do not see a good way around that.

[mpe@ellerman.id.au: fix whitespace]
[fail on clashing range with EEXIST as per Florian Weimer]
[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Link: http://lkml.kernel.org/r/20171213092550.2774-2-mhocko@kernel.org
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Russell King - ARM Linux <linux@armlinux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Jason Evans <jasone@google.com>
Cc: David Goldblatt <davidtgoldblatt@gmail.com>
Cc: Edward Tomasz Napierała <trasz@FreeBSD.org>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:38 -07:00
Kees Cook 8f2af155b5 exec: pass stack rlimit into mm layout functions
Patch series "exec: Pin stack limit during exec".

Attempts to solve problems with the stack limit changing during exec
continue to be frustrated[1][2].  In addition to the specific issues
around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
other places during exec where the stack limit is used and is assumed to
be unchanging.  Given the many places it gets used and the fact that it
can be manipulated/raced via setrlimit() and prlimit(), I think the only
way to handle this is to move away from the "current" view of the stack
limit and instead attach it to the bprm, and plumb this down into the
functions that need to know the stack limits.  This series implements
the approach.

[1] 04e35f4495 ("exec: avoid RLIMIT_STACK races with prlimit()")
[2] 779f4e1c6c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
[3] to security@kernel.org, "Subject: existing rlimit races?"

This patch (of 3):

Since it is possible that the stack rlimit can change externally during
exec (either via another thread calling setrlimit() or another process
calling prlimit()), provide a way to pass the rlimit down into the
per-architecture mm layout functions so that the rlimit can stay in the
bprm structure instead of sitting in the signal structure until exec is
finalized.

Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Greg KH <greg@kroah.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:37 -07:00
Joonsoo Kim 3d2054ad8c ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y
CMA area is now managed by the separate zone, ZONE_MOVABLE, to fix many
MM related problems.  In this implementation, if CONFIG_HIGHMEM = y,
then ZONE_MOVABLE is considered as HIGHMEM and the memory of the CMA
area is also considered as HIGHMEM.  That means that they are considered
as the page without direct mapping.  However, CMA area could be in a
lowmem and the memory could have direct mapping.

In ARM, when establishing a new mapping for DMA, direct mapping should
be cleared since two mapping with different cache policy could cause
unknown problem.  With this patch, PageHighmem() for the CMA memory
located in lowmem returns true so that the function for DMA mapping
cannot notice whether it needs to clear direct mapping or not,
correctly.  To handle this situation, this patch always clears direct
mapping for such CMA memory.

Link: http://lkml.kernel.org/r/1512114786-5085-4-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:32 -07:00
Michal Hocko 666feb21a0 mm, migrate: remove reason argument from new_page_t
No allocation callback is using this argument anymore.  new_page_node
used to use this parameter to convey node_id resp.  migration error up
to move_pages code (do_move_page_to_node_array).  The error status never
made it into the final status field and we have a better way to
communicate node id to the status field now.  All other allocation
callbacks simply ignored the argument so we can drop it finally.

[mhocko@suse.com: fix migration callback]
  Link: http://lkml.kernel.org/r/20180105085259.GH2801@dhcp22.suse.cz
[akpm@linux-foundation.org: fix alloc_misplaced_dst_page()]
[mhocko@kernel.org: fix build]
  Link: http://lkml.kernel.org/r/20180103091134.GB11319@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20180103082555.14592-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: Andrea Reale <ar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11 10:28:32 -07:00
Martin Schwidefsky 6a3d1e81a4 s390: correct nospec auto detection init order
With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via
early_initcall is done *after* the early_param functions. This
overwrites any settings done with the nobp/no_spectre_v2/spectre_v2
parameters. The code patching for the kernel is done after the
evaluation of the early parameters but before the early_initcall
is done. The end result is a kernel image that is patched correctly
but the kernel modules are not.

Make sure that the nospec auto detection function is called before the
early parameters are evaluated and before the code patching is done.

Fixes: 6e179d6412 ("s390: add automatic detection of the spectre defense")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-11 17:46:00 +02:00
hu huajun 2698d82e51 KVM: X86: fix incorrect reference of trace_kvm_pi_irte_update
In arch/x86/kvm/trace.h, this function is declared as host_irq the
first input, and vcpu_id the second, instead of otherwise.

Signed-off-by: hu huajun <huhuajun@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-11 13:34:48 +02:00
Rafael J. Wysocki 51798deaff Merge branches 'pm-cpuidle' and 'pm-qos'
* pm-cpuidle:
  tick-sched: avoid a maybe-uninitialized warning
  cpuidle: Add definition of residency to sysfs documentation
  time: hrtimer: Use timerqueue_iterate_next() to get to the next timer
  nohz: Avoid duplication of code related to got_idle_tick
  nohz: Gather tick_sched booleans under a common flag field
  cpuidle: menu: Avoid selecting shallow states with stopped tick
  cpuidle: menu: Refine idle state selection for running tick
  sched: idle: Select idle state before stopping the tick
  time: hrtimer: Introduce hrtimer_next_event_without()
  time: tick-sched: Split tick_nohz_stop_sched_tick()
  cpuidle: Return nohz hint from cpuidle_select()
  jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC
  sched: idle: Do not stop the tick before cpuidle_idle_call()
  sched: idle: Do not stop the tick upfront in the idle loop
  time: tick-sched: Reorganize idle tick management code

* pm-qos:
  PM / QoS: mark expected switch fall-throughs
2018-04-11 13:22:46 +02:00
Helge Deller 71d577db01 parisc: Switch to generic COMPAT_BINFMT_ELF
Drop our own compat binfmt implementation in
arch/parisc/kernel/binfmt_elf32.c in favour of the generic
implementation with CONFIG_COMPAT_BINFMT_ELF.

While cleaning up the dependencies, I noticed that ELF_PLATFORM was strangely
defined: On a 32-bit kernel, it was defined to "PARISC", while when running in
compat mode on a 64-bit kernel it was defined to "PARISC32". Since it doesn't
seem to be used in glibc yet, it's now defined in both cases to "PARISC". In
any case, it can be distinguished because it's either a 32-bit or a 64-bit ELF
file.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-11 11:40:35 +02:00
Helge Deller 2a03bb9e7a parisc: Move cache flush functions into .text.hot section
and move the disable_sr_hashing() C and assembly functions into the
.init section.

Signed-off-by: Helge Deller <deller@gmx.de>
2018-04-11 11:40:35 +02:00
Helge Deller 75abf64287 parisc/signal: Add FPE_CONDTRAP for conditional trap handling
Posix and common sense requires that SI_USER not be a signal specific
si_code. Thus add a new FPE_CONDTRAP si_code for conditional traps.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
2018-04-11 11:40:35 +02:00
KarimAllah Ahmed 8e9b29b618 X86/KVM: Do not allow DISABLE_EXITS_MWAIT when LAPIC ARAT is not available
If the processor does not have an "Always Running APIC Timer" (aka ARAT),
we should not give guests direct access to MWAIT. The LAPIC timer would
stop ticking in deep C-states, so any host deadlines would not wakeup the
host kernel.

The host kernel intel_idle driver handles this by switching to broadcast
mode when ARAT is not available and MWAIT is issued with a deep C-state
that would stop the LAPIC timer. When MWAIT is passed through, we can not
tell when MWAIT is issued.

So just disable this capability when LAPIC ARAT is not available. I am not
even sure if there are any CPUs with VMX support but no LAPIC ARAT or not.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Reported-by: Wanpeng Li <kernellwp@gmail.com>
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-11 11:34:16 +02:00
Harald Freudenberger af4a72276d s390/zcrypt: Support up to 256 crypto adapters.
There was an artificial restriction on the card/adapter id
to only 6 bits but all the AP commands do support adapter
ids with 8 bit. This patch removes this restriction to 64
adapters and now up to 256 adapter can get addressed.

Some of the ioctl calls work on the max number of cards
possible (which was 64). These ioctls are now deprecated
but still supported. All the defines, structs and ioctl
interface declarations have been kept for compabibility.
There are now new ioctls (and defines for these) with an
additional '2' appended which provide the extended versions
with 256 cards supported.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-11 10:36:27 +02:00
Nicholas Piggin 19ce7909ed KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
This crashes with a "Bad real address for load" attempting to load
from the vmalloc region in realmode (faulting address is in DAR).

  Oops: Bad interrupt in KVM entry/exit code, sig: 6 [#1]
  LE SMP NR_CPUS=2048 NUMA PowerNV
  CPU: 53 PID: 6582 Comm: qemu-system-ppc Not tainted 4.16.0-01530-g43d1859f0994
  NIP:  c0000000000155ac LR: c0000000000c2430 CTR: c000000000015580
  REGS: c000000fff76dd80 TRAP: 0200   Not tainted  (4.16.0-01530-g43d1859f0994)
  MSR:  9000000000201003 <SF,HV,ME,RI,LE>  CR: 48082222  XER: 00000000
  CFAR: 0000000102900ef0 DAR: d00017fffd941a28 DSISR: 00000040 SOFTE: 3
  NIP [c0000000000155ac] perf_trace_tlbie+0x2c/0x1a0
  LR [c0000000000c2430] do_tlbies+0x230/0x2f0

I suspect the reason is the per-cpu data is not in the linear chunk.
This could be restored if that was able to be fixed, but for now,
just remove the tracepoints.

Fixes: 0428491cba ("powerpc/mm: Trace tlbie(l) instructions")
Cc: stable@vger.kernel.org # v4.13+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11 12:35:33 +10:00
Aneesh Kumar K.V 032900e62c powerpc/8xx: Fix build with hugetlbfs enabled
8xx uses the slice code when hugetlbfs is enabled. We missed a header
include on 8xx which resulted in the below build failure:

  config: mpc885_ads_defconfig + CONFIG_HUGETLBFS

  arch/powerpc/mm/slice.c: In function 'slice_get_unmapped_area':
  arch/powerpc/mm/slice.c:655:2: error: implicit declaration of function 'need_extra_context'
  arch/powerpc/mm/slice.c:656:3: error: implicit declaration of function 'alloc_extended_context'

on PPC64 the mmu_context.h was included via linux/pkeys.h

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11 12:00:23 +10:00
Nicholas Piggin 3b8070335f powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
The OPAL NVRAM driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware, which causes large scheduling
latencies, and various lockup errors to trigger (again, BMC reboot
can cause it).

Fix this by converting it to the standard form OPAL_BUSY loop that
sleeps.

Fixes: 628daa8d5a ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Depends-on: 34dd25de9f ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
Cc: stable@vger.kernel.org # v3.2+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11 11:58:43 +10:00
Linus Torvalds f77cfbe645 c6x changes 4.17
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJazMUIAAoJEOiN4VijXeFPVtQP/3JlMhI6l1GVRBFBSQRCf6+g
 bM11aPrm1Df8wPagI1FzF823PBeHREnOLsgqo9694bFaK8UVx1Fic8DrMdPZulPx
 7NkAV9nUf51HJzM5IgqZIGFa9ene3QafsKSBwC/HPkYLeXqusjtMYDzdUkzZIXij
 BuhiAWFADV/geScibaJ4fn4DmSzXlcG0grUI5whTHdO8heXv0XxPHKRbwMPCx1of
 U8vvvVCQdCbxTEbsm151qMLAfpa4FSOFDAr5WO8q6aXiKM4Zw/WCUONYAjiQLPY+
 nr+azs7+DPWMdeLbr4BqKXirch0FhHelqHbGOYrxQJCg9t+EHZ3sGoOTlPYR56Id
 Hl6giyzwyEDizyHbXMFUT5Y39s0/aCru73K9a51mVgCKYDIGw2rT0onc+Rv3vkXR
 sGyTJWXQJQbt3wsEnBnpn5Zk24fYLP0wBXU0M6J8n21sYQyCD5GWzN6W2FT2iusr
 qhfhsaL6/zPvD1leHFMHY8Q89w4a32WqopKsQRslbFSea/bH5M4b3bq4XmFcauj0
 1Z1D2q8imFAFHhhHFBSZFLzORnfwgm66ntzPIC/P9yI4Zw7YINrSb17u+INlgrWy
 CVHQKku83diZ7lYWmYnTpxX5oNb8GuFw5GUitCxEimbqXnlT2DRIKCGSc24mcmo8
 XNrMNyhKJ6c3ELxJn4W3
 =BKqi
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming

Pull c6x updates from Mark Salter.

* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
  c6x: pass endianness info to sparse
  c6x: fix platforms/plldata.c get_coreid build error
  c6x: remove unused KTHREAD_SIZE definition
2018-04-10 11:50:14 -07:00
Linus Torvalds 948869fa9f MIPS changes for 4.17
These are the main MIPS changes for 4.17. Rough overview:
 
  (1) generic platform: Add support for Microsemi Ocelot SoCs
 
  (2) crypto: Add CRC32 and CRC32C HW acceleration module
 
  (3) Various cleanups and misc improvements
 
 Miscellaneous:
 
  - Hang more efficiently on halt/powerdown/restart
 
  - pm-cps: Block system suspend when a JTAG probe is present
 
  - Expand make help text for generic defconfigs
    - Refactor handling of legacy defconfigs
 
  - Determine the entry point from the ELF file header to fix microMIPS
    for certain toolchains
 
  - Introduce isa-rev.h for MIPS_ISA_REV and use to simplify other code
 
 Minor cleanups:
 
  - DTS: boston/ci20: Unit name cleanups and correction
 
  - kdump: Make the default for PHYSICAL_START always 64-bit
 
  - Constify gpio_led in Alchemy, AR7, and TXX9
 
  - Silence a couple of W=1 warnings
 
  - Remove duplicate includes
 
 Platform support:
 
 ath79:
 
  - Fix AR724X_PLL_REG_PCIE_CONFIG offset
 
 BCM47xx:
 
  - FIRMWARE: Use mac_pton() for MAC address parsing
 
  - Add Luxul XAP1500/XWR1750 WiFi LEDs
 
  - Use standard reset button for Luxul XWR-1750
 
 BMIPS:
 
  - Enable CONFIG_BRCMSTB_PM in bmips_stb_defconfig for build coverage
 
  - Add STB PM, wake-up timer, watchdog DT nodes
 
 Generic platform:
 
  - Add support for Microsemi Ocelot
    - dt-bindings: Add vendor prefix for Microsemi Corporation
    - dt-bindings: Add bindings for Microsemi SoCs
    - Add ocelot SoC & PCB123 board DTS files
    - MAINTAINERS: Add entry for Microsemi MIPS SoCs
 
  - Enable crc32-mips on r6 configs
 
 Octeon:
 
  - Drop '.' after newlines in printk calls
 
 ralink:
 
  - pci-mt7621: Enable PCIe on MT7688
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAlrL1tMACgkQbAtpk944
 dnpkhg//UnKOmj4xiQf5ik4XjSDHo2dcUzQKPLq/grzpc9yC2I6bgJ32gcQnD/rw
 x2f64DAjmbIJEY/hCGLEOH82/SqU5vF5gI+BtSKE6Ti28dyQH68pvyHRYIZBFx62
 WRYAf50Spa/q/bOJYv4/eTqjEc7FPMaUll3kH5NMiBB3X5bIeq3w+x78YBKjAqPB
 IJgHF/JnlpD8UvQ0RCl1fIXhkRYeNuTyiAseVI3ygAqbMFMGkTevePWl82D+pWuo
 W9BFaliNt/9TNgq5/L91a/uhSuo6INRls1tFQT9cPa6n4GJxLpUtO7a3v4Xgaukj
 TeXTkXZl4PEUfeTA3cVaXrENTN4sCwThkxvrKwefbUBiIa+/TDi/Wgxi3n+rKwi2
 Vu4V36uxnO/vNU7Sgp/jHdRJyxw06GfRy55MvToFlk1S25Gwt6zW2O0o8nAHwCDW
 zvoRfSkasElKwnM3zKR/mrdDn7qPulFfiyWyKuuY3bIXsaTnItO1JJ/rbgKuIXr7
 lo0eVZ7GhuWfuLCEHZ99rWsGz0SjuQQIb4c6faNELp6gTwIpz9bbsOHRuajvfoSF
 8jmPjUocbaRR0odyFBqsejwBmeebDoeIZljzD50ImSoYftEMgRoQ448mIm0EuWVU
 KY+85MIeK5zwWXKd4VBexgLvPggGURaOEEXKisgvoE3NjCuO2nw=
 =4vUc
 -----END PGP SIGNATURE-----

Merge tag 'mips_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips

Pull MIPS updates from James Hogan:
 "These are the main MIPS changes for 4.17. Rough overview:

   (1) generic platform: Add support for Microsemi Ocelot SoCs

   (2) crypto: Add CRC32 and CRC32C HW acceleration module

   (3) Various cleanups and misc improvements

  More detailed summary:

  Miscellaneous:
   - hang more efficiently on halt/powerdown/restart
   - pm-cps: Block system suspend when a JTAG probe is present
   - expand make help text for generic defconfigs
   - refactor handling of legacy defconfigs
   - determine the entry point from the ELF file header to fix microMIPS
     for certain toolchains
   - introduce isa-rev.h for MIPS_ISA_REV and use to simplify other code

  Minor cleanups:
   - DTS: boston/ci20: Unit name cleanups and correction
   - kdump: Make the default for PHYSICAL_START always 64-bit
   - constify gpio_led in Alchemy, AR7, and TXX9
   - silence a couple of W=1 warnings
   - remove duplicate includes

  Platform support:
  Generic platform:
   - add support for Microsemi Ocelot
   - dt-bindings: Add vendor prefix for Microsemi Corporation
   - dt-bindings: Add bindings for Microsemi SoCs
   - add ocelot SoC & PCB123 board DTS files
   - MAINTAINERS: Add entry for Microsemi MIPS SoCs
   - enable crc32-mips on r6 configs

  ath79:
   - fix AR724X_PLL_REG_PCIE_CONFIG offset

  BCM47xx:
   - firmware: Use mac_pton() for MAC address parsing
   - add Luxul XAP1500/XWR1750 WiFi LEDs
   - use standard reset button for Luxul XWR-1750

  BMIPS:
   - enable CONFIG_BRCMSTB_PM in bmips_stb_defconfig for build coverage
   - add STB PM, wake-up timer, watchdog DT nodes

  Octeon:
   - drop '.' after newlines in printk calls

  ralink:
   - pci-mt7621: Enable PCIe on MT7688"

* tag 'mips_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: (37 commits)
  MIPS: BCM47XX: Use standard reset button for Luxul XWR-1750
  MIPS: BCM47XX: Add Luxul XAP1500/XWR1750 WiFi LEDs
  MIPS: Make the default for PHYSICAL_START always 64-bit
  MIPS: Use the entry point from the ELF file header
  MAINTAINERS: Add entry for Microsemi MIPS SoCs
  MIPS: generic: Add support for Microsemi Ocelot
  MIPS: mscc: Add ocelot PCB123 device tree
  MIPS: mscc: Add ocelot dtsi
  dt-bindings: mips: Add bindings for Microsemi SoCs
  dt-bindings: Add vendor prefix for Microsemi Corporation
  MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
  MIPS: pci-mt7620: Enable PCIe on MT7688
  MIPS: pm-cps: Block system suspend when a JTAG probe is present
  MIPS: VDSO: Replace __mips_isa_rev with MIPS_ISA_REV
  MIPS: BPF: Replace __mips_isa_rev with MIPS_ISA_REV
  MIPS: cpu-features.h: Replace __mips_isa_rev with MIPS_ISA_REV
  MIPS: Introduce isa-rev.h to define MIPS_ISA_REV
  MIPS: Hang more efficiently on halt/powerdown/restart
  FIRMWARE: bcm47xx_nvram: Replace mac address parsing
  MIPS: BMIPS: Add Broadcom STB watchdog nodes
  ...
2018-04-10 11:39:22 -07:00
Linus Torvalds 9f3a0941fb libnvdimm for 4.17
* A rework of the filesytem-dax implementation provides for detection of
   unmap operations (truncate / hole punch) colliding with in-progress
   device-DMA. A fix for these collisions remains a work-in-progress
   pending resolution of truncate latency and starvation regressions.
 
 * The of_pmem driver expands the users of libnvdimm outside of x86 and
   ACPI to describe an implementation of persistent memory on PowerPC with
   Open Firmware / Device tree.
 
 * Address Range Scrub (ARS) handling is completely rewritten to account for
   the fact that ARS may run for 100s of seconds and there is no platform
   defined way to cancel it. ARS will now no longer block namespace
   initialization.
 
 * The NVDIMM Namespace Label implementation is updated to handle label
   areas as small as 1K, down from 128K.
 
 * Miscellaneous cleanups and updates to unit test infrastructure.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B
 lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc
 1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL
 /NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr
 EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg
 xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga
 9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57
 qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf
 l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA
 ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ
 3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m
 9eGJG1H+JspcbtLF9iHn
 =acZQ
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm updates from Dan Williams:
 "This cycle was was not something I ever want to repeat as there were
  several late changes that have only now just settled.

  Half of the branch up to commit d2c997c0f1 ("fs, dax: use
  page->mapping to warn...") have been in -next for several releases.
  The of_pmem driver and the address range scrub rework were late
  arrivals, and the dax work was scaled back at the last moment.

  The of_pmem driver missed a previous merge window due to an oversight.
  A sense of obligation to rectify that miss is why it is included for
  4.17. It has acks from PowerPC folks. Stephen reported a build failure
  that only occurs when merging it with your latest tree, for now I have
  fixed that up by disabling modular builds of of_pmem. A test merge
  with your tree has received a build success report from the 0day robot
  over 156 configs.

  An initial version of the ARS rework was submitted before the merge
  window. It is self contained to libnvdimm, a net code reduction, and
  passing all unit tests.

  The filesystem-dax changes are based on the wait_var_event()
  functionality from tip/sched/core. However, late review feedback
  showed that those changes regressed truncate performance to a large
  degree. The branch was rewound to drop the truncate behavior change
  and now only includes preparation patches and cleanups (with full acks
  and reviews). The finalization of this dax-dma-vs-trnucate work will
  need to wait for 4.18.

  Summary:

   - A rework of the filesytem-dax implementation provides for detection
     of unmap operations (truncate / hole punch) colliding with
     in-progress device-DMA. A fix for these collisions remains a
     work-in-progress pending resolution of truncate latency and
     starvation regressions.

   - The of_pmem driver expands the users of libnvdimm outside of x86
     and ACPI to describe an implementation of persistent memory on
     PowerPC with Open Firmware / Device tree.

   - Address Range Scrub (ARS) handling is completely rewritten to
     account for the fact that ARS may run for 100s of seconds and there
     is no platform defined way to cancel it. ARS will now no longer
     block namespace initialization.

   - The NVDIMM Namespace Label implementation is updated to handle
     label areas as small as 1K, down from 128K.

   - Miscellaneous cleanups and updates to unit test infrastructure"

* tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits)
  libnvdimm, of_pmem: workaround OF_NUMA=n build error
  nfit, address-range-scrub: add module option to skip initial ars
  nfit, address-range-scrub: rework and simplify ARS state machine
  nfit, address-range-scrub: determine one platform max_ars value
  powerpc/powernv: Create platform devs for nvdimm buses
  doc/devicetree: Persistent memory region bindings
  libnvdimm: Add device-tree based driver
  libnvdimm: Add of_node to region and bus descriptors
  libnvdimm, region: quiet region probe
  libnvdimm, namespace: use a safe lookup for dimm device name
  libnvdimm, dimm: fix dpa reservation vs uninitialized label area
  libnvdimm, testing: update the default smart ctrl_temperature
  libnvdimm, testing: Add emulation for smart injection commands
  nfit, address-range-scrub: introduce nfit_spa->ars_state
  libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device'
  nfit, address-range-scrub: fix scrub in-progress reporting
  dax, dm: allow device-mapper to operate without dax support
  dax: introduce CONFIG_DAX_DRIVER
  fs, dax: use page->mapping to warn if truncate collides with a busy page
  ext2, dax: introduce ext2_dax_aops
  ...
2018-04-10 10:25:57 -07:00
Linus Torvalds fbe173e3ff RTC for 4.17
Subsystem:
  - Add tracepoints
  - Rework of the RTC/nvmem API to allow drivers to discard struct nvmem_config
    after registration
  - New range API, drivers can now expose the useful range of the RTC
  - New offset API the core is now able to add an offset to the RTC time,
    modifying the supported range.
  - Multiple rtc_time64_to_tm fixes
  - Handle time_t overflow on 32 bit platforms in the core instead of letting
    drivers do crazy things.
  - remove rtc_control API
 
 New driver:
  - Intersil ISL12026
 
 Drivers:
  - Drivers exposing the RTC non volatile memory have been converted to use nvmem
  - Removed useless time and date validation
  - Removed an indirection pattern that was a cargo cult from ancient drivers
  - Removed VLA usage
  - Fixed a possible race condition in probe functions
  - AB8540 support is dropped from ab8500
  - pcf85363 now has alarm support
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEXx9Viay1+e7J/aM4AyWl4gNJNJIFAlrL3s4ACgkQAyWl4gNJ
 NJI+MxAAgc56UEXi5chqKpHE6GHL2aPWan9duHB6FmGrKWt/FJmpJ8UGylLFlvaW
 dpRGvW0Dynz45UztegcrbHMU6+B2O0hboL4/GVZpYAkcNAcu7Lf0ULho2rsQSDmW
 WJpemmdRxyQY1IkWmw7z7KAkMzhAfYZiVmWmVwMRZfMcKJ3DLEldfgRtkN+g0UdB
 tayWQY3mS02ki16e2figsgwZRmUUhQslDfpKlesInXOzUMmLgVWhf1QxJSEUcfs0
 AMp75vD2YvVJ/RHy/6BilQbqP9EVnaG4NHqJGFSOddazA7u+3HGubEFboI8NuPXb
 2fCvfrNux7pgtQsBF9dnpCqWlukE7sF5aDyIUvjYnr0vUm2D/CwdXglGvQSQVWea
 5GxPTWBdaCL0V7GD5OSZcfUGyz1TN/7NSUItdLSr9YK13dL+tqhcYYm5ytXJLzvO
 Z4GyUEoCOMprMJ9j5KU/TXSjauDmPDl8YZ5B93lPcNOh7y+b/2r3umBaInyZrFzX
 1WJ6FWtuhbRfEwuQtgQHBsobt9eTwZfo8C2y22HBo/fdWfBv45feiiHNXt3OVrA+
 aws6pwfVf1H0UsvpQkXtlPthAx3sbOTndDKAUhntb/U/zA9c7fTrfUyVOQ1bArJy
 p6tl/cmlpq68AjZB+0d3zUXQkQH4Syu+EqZrWItkE6XxZm+khDE=
 =l9i8
 -----END PGP SIGNATURE-----

Merge tag 'rtc-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:
 "This contains a few series that have been in preparation for a while
  and that will help systems with RTCs that will fail in 2038, 2069 or
  2100.

  Subsystem:
   - Add tracepoints
   - Rework of the RTC/nvmem API to allow drivers to discard struct
     nvmem_config after registration
   - New range API, drivers can now expose the useful range of the RTC
   - New offset API the core is now able to add an offset to the RTC
     time, modifying the supported range.
   - Multiple rtc_time64_to_tm fixes
   - Handle time_t overflow on 32 bit platforms in the core instead of
     letting drivers do crazy things.
   - remove rtc_control API

  New driver:
   - Intersil ISL12026

  Drivers:
   - Drivers exposing the RTC non volatile memory have been converted to
     use nvmem
   - Removed useless time and date validation
   - Removed an indirection pattern that was a cargo cult from ancient
     drivers
   - Removed VLA usage
   - Fixed a possible race condition in probe functions
   - AB8540 support is dropped from ab8500
   - pcf85363 now has alarm support"

* tag 'rtc-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (128 commits)
  rtc: snvs: Fix usage of snvs_rtc_enable
  rtc: mt7622: fix module autoloading for OF platform drivers
  rtc: isl12022: use true and false for boolean values
  rtc: ab8500: Drop AB8540 support
  rtc: remove a warning during scripts/kernel-doc step
  rtc: 88pm860x: remove artificial limitation
  rtc: 88pm80x: remove artificial limitation
  rtc: st-lpc: remove artificial limitation
  rtc: mrst: remove artificial limitation
  rtc: mv: remove artificial limitation
  rtc: hctosys: Ensure system time doesn't overflow time_t
  parisc: time: stop validating rtc_time in .read_time
  rtc: pcf85063: fix clearing bits in pcf85063_start_clock
  rtc: at91sam9: Set name of regmap_config
  rtc: s5m: Remove VLA usage
  rtc: s5m: Move enum from rtc.h to rtc-s5m.c
  rtc: remove VLA usage
  rtc: Add useful timestamp definitions
  rtc: Add one offset seconds to expand RTC range
  rtc: Factor out the RTC range validation into rtc_valid_range()
  ...
2018-04-10 10:22:27 -07:00
KarimAllah Ahmed 386c6ddbda X86/VMX: Disable VMX preemption timer if MWAIT is not intercepted
The VMX-preemption timer is used by KVM as a way to set deadlines for the
guest (i.e. timer emulation). That was safe till very recently when
capability KVM_X86_DISABLE_EXITS_MWAIT to disable intercepting MWAIT was
introduced. According to Intel SDM 25.5.1:

"""
The VMX-preemption timer operates in the C-states C0, C1, and C2; it also
operates in the shutdown and wait-for-SIPI states. If the timer counts down
to zero in any state other than the wait-for SIPI state, the logical
processor transitions to the C0 C-state and causes a VM exit; the timer
does not cause a VM exit if it counts down to zero in the wait-for-SIPI
state. The timer is not decremented in C-states deeper than C2.
"""

Now once the guest issues the MWAIT with a c-state deeper than
C2 the preemption timer will never wake it up again since it stopped
ticking! Usually this is compensated by other activities in the system that
would wake the core from the deep C-state (and cause a VMExit). For
example, if the host itself is ticking or it received interrupts, etc!

So disable the VMX-preemption timer if MWAIT is exposed to the guest!

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
Fixes: 4d5422cea3
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-04-10 17:19:44 +02:00
Li RongQing a774635db5 x86/apic: Fix signedness bug in APIC ID validity checks
The APIC ID as parsed from ACPI MADT is validity checked with the
apic->apic_id_valid() callback, which depends on the selected APIC type.

For non X2APIC types APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF
are detected as valid. This happens because the 'apicid' argument of the
apic_id_valid() callback is type 'int'. So the resulting comparison

   apicid < 0xFF

evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed
to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC
mode are considered valid and accounted as possible CPUs.

Change the apicid argument type of the apic_id_valid() callback to u32 so
the evaluation is unsigned and returns the correct result.

[ tglx: Massaged changelog ]

Signed-off-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: jgross@suse.com
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/1523322966-10296-1-git-send-email-lirongqing@baidu.com
2018-04-10 16:46:39 +02:00
Tomer Maimon d893c4de01 arm: npcm: enable L2 cache in NPCM7xx architecture
This patch Enable ARM L2 cache module in Nuvoton NPCM7xx BMC
by adding L2 cache parameters into NPCM7xx DT machine start structure.

At patch V7 arm: npcm: add basic support for Nuvoton BMCs we got comments
regarding the flags use in L2 cache module.
- https://www.spinics.net/lists/arm-kernel/msg613212.html

After checking again the L2 cache use in the NPCM7xx,
the only L2 cache flag we need to set is L2C_AUX_CTRL_SHARED_OVERRIDE
and it is done in the device tree:
https://patchwork.kernel.org/patch/10063497/

L2 cache flag mask allowed all the flag option.

Signed-off-by: Tomer Maimon <tmaimon77@gmail.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2018-04-10 16:40:03 +02:00
Kirill A. Shutemov d94a155c59 x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption
Some features (Intel MKTME, AMD SME) reduce the number of effectively
available physical address bits. cpuinfo_x86::x86_phys_bits is adjusted
accordingly during the early cpu feature detection.

Though if get_cpu_cap() is called later again then this adjustement is
overwritten. That happens in setup_pku(), which is called after
detect_tme().

To address this, extract the address sizes enumeration into a separate
function, which is only called only from early_identify_cpu() and from
generic_identify().

This makes get_cpu_cap() safe to be called later during boot proccess
without overwriting cpuinfo_x86::x86_phys_bits.

[ tglx: Massaged changelog ]

Fixes: cb06d8e3d0 ("x86/tme: Detect if TME and MKTME is activated by BIOS")
Reported-by: Kai Huang <kai.huang@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: linux-mm@kvack.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180410092704.41106-1-kirill.shutemov@linux.intel.com
2018-04-10 16:33:21 +02:00
Nicholas Piggin 34dd25de9f powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
This is the start of an effort to tidy up and standardise all the
delays. Existing loops have a range of delay/sleep periods from 1ms
to 20ms, and some have no delay. They all loop forever except rtc,
which times out after 10 retries, and that uses 10ms delays. So use
10ms as our standard delay. The OPAL maintainer agrees 10ms is a
reasonable starting point.

The idea is to use the same recipe everywhere, once this is proven to
work then it will be documented as an OPAL API standard. Then both
firmware and OS can agree, and if a particular call needs something
else, then that can be documented with reasoning.

This is not the end-all of this effort, it's just a relatively easy
change that fixes some existing high latency delays. There should be
provision for standardising timeouts and/or interruptible loops where
possible, so non-fatal firmware errors don't cause hangs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-11 00:05:23 +10:00
Luc Van Oostenryck 85fa2cc511 c6x: pass endianness info to sparse
c6x depends on the macro '_BIG_ENDIAN' being defined or not
to correctly select or define endian-specific macros, structures
or pieces of code.

This macro is predefined by the compiler but sparse knows nothing
about it and thus may pre-process files differently from what
gcc would.

Fix this by passing '-D_BIG_ENDIAN' when compiling a big-endian
kernel, like GCC would have done.

To: Mark Salter <msalter@redhat.com>
To: Aurelien Jacquiot <a-jacquiot@ti.com>
CC: linux-c6x-dev@linux-c6x.org
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Mark Salter <msalter@redhat.com>
2018-04-10 09:58:58 -04:00
Randy Dunlap 319938bd6f c6x: fix platforms/plldata.c get_coreid build error
Fix build error reported by the 0day bot by including the header
file for that macro.

Fixes this build error: (should fix; not tested)
arch/c6x/platforms/plldata.c: In function 'c6472_setup_clocks':
arch/c6x/platforms/plldata.c:279:33: error: implicit declaration of function 'get_coreid'; did you mean 'get_order'? [-Werror=implicit-function-declaration]
      c6x_core_clk.parent = &sysclks[get_coreid() + 1];

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <jacquiot.aurelien@gmail.com>
Cc: linux-c6x-dev@linux-c6x.org
Cc: Ingo Molnar <mingo@kernel.org>

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mark Salter <msalter@redhat.com>
2018-04-10 09:58:38 -04:00
Jérémy Lefaure f5ad907e31 c6x: remove unused KTHREAD_SIZE definition
KTHREAD_SIZE has never been used since it has been defined for c6x arch.
Let's remove this useless definition.

Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
Signed-off-by: Mark Salter <msalter@redhat.com>
2018-04-10 09:58:38 -04:00
Boris Ostrovsky a5a18ae73b xen/pvh: Indicate XENFEAT_linux_rsdp_unrestricted to Xen
Pre-4.17 kernels ignored start_info's rsdp_paddr pointer and instead
relied on finding RSDP in standard location in BIOS RO memory. This
has worked since that's where Xen used to place it.

However, with recent Xen change (commit 4a5733771e6f ("libxl: put RSDP
for PVH guest near 4GB")) it prefers to keep RSDP at a "non-standard"
address. Even though as of commit b17d9d1df3 ("x86/xen: Add pvh
specific rsdp address retrieval function") Linux is able to find RSDP,
for back-compatibility reasons we need to indicate to Xen that we can
handle this, an we do so by setting XENFEAT_linux_rsdp_unrestricted
flag in ELF notes.

(Also take this opportunity and sync features.h header file with Xen)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
2018-04-10 09:22:22 -04:00
Harald Freudenberger 2a80786d47 s390/zcrypt: Remove deprecated ioctls.
This patch removes the old status calls which have been marked
as deprecated since at least 2 years now. There is no known
application or library relying on these ioctls any more.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:39:01 +02:00
Vasily Gorbik 96c0cdbc7c s390/ipl: remove reipl_method and dump_method
reipl_method and dump_method have been used in addition to reipl_type
and dump_type, because a single reipl_type could be achieved with
multiple reipl_method (same for dump_type/method). After dropping
non-diag308_set based reipl methods, there is a single method per
reipl_type/dump_type and reipl_method and dump_method could be simply
removed.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:39:00 +02:00
Vasily Gorbik 3b9678472b s390/ipl: correct kdump reipl block checksum calculation
s390 kdump reipl implementation relies on os_info kernel structure
residing in old memory being dumped. os_info contains reipl block,
which is used (if valid) by the kdump kernel for reipl parameters.

The problem is that the reipl block and its checksum inside
os_info is updated only when /sys/firmware/reipl/reipl_type is
written. This sets an offset of a reipl block for "reipl_type" and
re-calculates reipl block checksum. Any further alteration of values
under /sys/firmware/reipl/{reipl_type}/ without subsequent write to
/sys/firmware/reipl/reipl_type lead to incorrect os_info reipl block
checksum. In such a case kdump kernel ignores it and reboots using
default logic.

To fix this, os_info reipl block update is moved right before kdump
execution.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:39:00 +02:00
Vasily Gorbik 0649685073 s390/ipl: remove non-existing functions declaration
do_reipl, do_halt and do_poff are not defined anywhere. Cleaning up
functions declaration.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:39:00 +02:00
Vasily Gorbik d485235b00 s390: assume diag308 set always works
diag308 set has been available for many machine generations, and
alternative reipl code paths has not been exercised and seems to be
broken without noticing for a while now. So, cleaning up all obsolete
reipl methods except currently used ones, assuming that diag308 set
always works.

Also removing not longer needed reset callbacks.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:59 +02:00
Vasily Gorbik ecc0df0f23 s390/ipl: avoid adding scpdata to cmdline during ftp/dvd boot
Add missing ipl parmblock validity check to append_ipl_scpdata.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:59 +02:00
Vasily Gorbik a0832b3aef s390/ipl: correct ipl parmblock valid checks
In some cases diag308_set_works used to be misused as "we have valid ipl
parmblock", which is not the case when diag308 set works, but there is
no ipl parmblock (diag308 store returns DIAG308_RC_NOCONFIG). Such checks
are adjusted to reuse ipl_block_valid instead of diag308_set_works.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:59 +02:00
Vasily Gorbik d08091ac96 s390/ipl: rely on diag308 store to get ipl info
For both ccw and fcp boot retrieve ipl info from ipl block received via
diag308 store. Old scsi ipl parm block handling and cio_get_iplinfo are
removed. Ipl type is deducted from ipl block (if valid).

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:59 +02:00
Vasily Gorbik 283abedb1b s390/ipl: move ipl_flags to ipl.c
ipl_flags and corresponding enum are not used outside of ipl.c and will
be reworked in later commits.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:59 +02:00
Vasily Gorbik e9627da113 s390/ipl: get rid of ipl_ssid and ipl_devno
ipl_ssid and ipl_devno used to be used during ccw boot when diag308 store
was not available. Reuse ipl_block to store those values.

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:58 +02:00
Vasily Gorbik bdbfe18595 s390/ipl: unite diag308 and scsi boot ipl blocks
Ipl parm blocks received via "diag308 store" and during scsi boot at
IPL_PARMBLOCK_ORIGIN are merged into the "ipl_block".

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:58 +02:00
Vasily Gorbik 15deb080a6 s390/ipl: ensure loadparm valid flag is set
When loadparm is set in reipl parm block, the kernel should also set
DIAG308_FLAGS_LP_VALID flag.

This fixes loadparm ignoring during z/VM fcp -> ccw reipl and kvm direct
boot -> ccw reipl.

Cc: <stable@vger.kernel.org>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:58 +02:00
Heiko Carstens 8b09ca746a s390/compat: fix setup_frame32
Git commit c60a03fee0 ("s390: switch to {get,put}_compat_sigset()")
contains a typo and now copies the wrong pointer to user space.
Use the correct pointer instead.

Reported-and-tested-by: Stefan Liebler <stli@linux.vnet.ibm.com>
Fixes: c60a03fee0 ("s390: switch to {get,put}_compat_sigset()")
Cc: <stable@vger.kernel.org> # v4.15+
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:54 +02:00
Harald Freudenberger aff304e7a0 s390/crypto: Adjust s390 aes and paes cipher priorities
Tests with paes-xts and debugging investigations showed
that the ciphers are not always correctly resolved.
The rules for cipher priorities seem to be:
 - Ecb-aes should have a prio greater than the
   generic ecb-aes.
 - The mode specialized ciphers (like cbc-aes-s390)
   should have a prio greater than the sum of the
   more generic combinations (like cbs(aes)).

This patch adjusts the cipher priorities for the
s390 aes and paes in kernel crypto implementations.

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-04-10 07:38:54 +02:00
Anshuman Khandual 709b973c84 powerpc/fscr: Enable interrupts earlier before calling get_user()
The function get_user() can sleep while trying to fetch instruction
from user address space and causes the following warning from the
scheduler.

BUG: sleeping function called from invalid context

Though interrupts get enabled back but it happens bit later after
get_user() is called. This change moves enabling these interrupts
earlier covering the function get_user(). While at this, lets check
for kernel mode and crash as this interrupt should not have been
triggered from the kernel context.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-10 11:23:23 +10:00
Michael Ellerman 501a78cbc1 powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
The recent LPM changes to setup_rfi_flush() are causing some section
mismatch warnings because we removed the __init annotation on
setup_rfi_flush():

  The function setup_rfi_flush() references
  the function __init ppc64_bolted_size().
  the function __init memblock_alloc_base().

The references are actually in init_fallback_flush(), but that is
inlined into setup_rfi_flush().

These references are safe because:
 - only pseries calls setup_rfi_flush() at runtime
 - pseries always passes L1D_FLUSH_FALLBACK at boot
 - so the fallback flush area will always be allocated
 - so the check in init_fallback_flush() will always return early:
   /* Only allocate the fallback flush area once (at boot time). */
   if (l1d_flush_fallback_area)
   	return;

 - and therefore we won't actually call the freed init routines.

We should rework the code to make it safer by default rather than
relying on the above, but for now as a quick-fix just add a __ref
annotation to squash the warning.

Fixes: abf110f3e1 ("powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-10 11:23:10 +10:00
Linus Torvalds c18bb396d3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) The sockmap code has to free socket memory on close if there is
    corked data, from John Fastabend.

 2) Tunnel names coming from userspace need to be length validated. From
    Eric Dumazet.

 3) arp_filter() has to take VRFs properly into account, from Miguel
    Fadon Perlines.

 4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.

 5) Missing idr_remove() in u32_delete_key(), from Cong Wang.

 6) More syzbot stuff. Several use of uninitialized value fixes all
    over, from Eric Dumazet.

 7) Do not leak kernel memory to userspace in sctp, also from Eric
    Dumazet.

 8) Discard frames from unused ports in DSA, from Andrew Lunn.

 9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
    Falcon.

10) Do not access dp83640 PHY registers prematurely after reset, from
    Esben Haabendal.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
  vhost-net: set packet weight of tx polling to 2 * vq size
  net: thunderx: rework mac addresses list to u64 array
  inetpeer: fix uninit-value in inet_getpeer
  dp83640: Ensure against premature access to PHY registers after reset
  devlink: convert occ_get op to separate registration
  ARM: dts: ls1021a: Specify TBIPA register address
  net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
  ibmvnic: Do not reset CRQ for Mobility driver resets
  ibmvnic: Fix failover case for non-redundant configuration
  ibmvnic: Fix reset scheduler error handling
  ibmvnic: Zero used TX descriptor counter on reset
  ibmvnic: Fix DMA mapping mistakes
  tipc: use the right skb in tipc_sk_fill_sock_diag()
  sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
  net: dsa: Discard frames from unused ports
  sctp: do not leak kernel memory to user space
  soreuseport: initialise timewait reuseport field
  ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
  dccp: initialize ireq->ir_mark
  net: fix uninit-value in __hw_addr_add_ex()
  ...
2018-04-09 17:04:10 -07:00
Linus Torvalds d8312a3f61 ARM:
- VHE optimizations
 - EL2 address space randomization
 - speculative execution mitigations ("variant 3a", aka execution past invalid
 privilege register access)
 - bugfixes and cleanups
 
 PPC:
 - improvements for the radix page fault handler for HV KVM on POWER9
 
 s390:
 - more kvm stat counters
 - virtio gpu plumbing
 - documentation
 - facilities improvements
 
 x86:
 - support for VMware magic I/O port and pseudo-PMCs
 - AMD pause loop exiting
 - support for AMD core performance extensions
 - support for synchronous register access
 - expose nVMX capabilities to userspace
 - support for Hyper-V signaling via eventfd
 - use Enlightened VMCS when running on Hyper-V
 - allow userspace to disable MWAIT/HLT/PAUSE vmexits
 - usual roundup of optimizations and nested virtualization bugfixes
 
 Generic:
 - API selftest infrastructure (though the only tests are for x86 as of now)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJay19UAAoJEL/70l94x66DGKYIAIu9PTHAEwaX0et15fPW5y2x
 rrtS355lSAmMrPJ1nePRQ+rProD/1B0Kizj3/9O+B9OTKKRsorRYNa4CSu9neO2k
 N3rdE46M1wHAPwuJPcYvh3iBVXtgbMayk1EK5aVoSXaMXEHh+PWZextkl+F+G853
 kC27yDy30jj9pStwnEFSBszO9ua/URdKNKBATNx8WUP6d9U/dlfm5xv3Dc3WtKt2
 UMGmog2wh0i7ecXo7hRkMK4R7OYP3ZxAexq5aa9BOPuFp+ZdzC/MVpN+jsjq2J/M
 Zq6RNyA2HFyQeP0E9QgFsYS2BNOPeLZnT5Jg1z4jyiD32lAZ/iC51zwm4oNKcDM=
 =bPlD
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
 "ARM:
   - VHE optimizations

   - EL2 address space randomization

   - speculative execution mitigations ("variant 3a", aka execution past
     invalid privilege register access)

   - bugfixes and cleanups

  PPC:
   - improvements for the radix page fault handler for HV KVM on POWER9

  s390:
   - more kvm stat counters

   - virtio gpu plumbing

   - documentation

   - facilities improvements

  x86:
   - support for VMware magic I/O port and pseudo-PMCs

   - AMD pause loop exiting

   - support for AMD core performance extensions

   - support for synchronous register access

   - expose nVMX capabilities to userspace

   - support for Hyper-V signaling via eventfd

   - use Enlightened VMCS when running on Hyper-V

   - allow userspace to disable MWAIT/HLT/PAUSE vmexits

   - usual roundup of optimizations and nested virtualization bugfixes

  Generic:
   - API selftest infrastructure (though the only tests are for x86 as
     of now)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
  kvm: x86: fix a prototype warning
  kvm: selftests: add sync_regs_test
  kvm: selftests: add API testing infrastructure
  kvm: x86: fix a compile warning
  KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
  KVM: X86: Introduce handle_ud()
  KVM: vmx: unify adjacent #ifdefs
  x86: kvm: hide the unused 'cpu' variable
  KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
  Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
  kvm: Add emulation for movups/movupd
  KVM: VMX: raise internal error for exception during invalid protected mode state
  KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
  KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
  KVM: x86: Fix misleading comments on handling pending exceptions
  KVM: x86: Rename interrupt.pending to interrupt.injected
  KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
  x86/kvm: use Enlightened VMCS when running on Hyper-V
  x86/hyper-v: detect nested features
  x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
  ...
2018-04-09 11:42:31 -07:00
Dan Williams 1ed41b5696 Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-next 2018-04-09 10:50:08 -07:00
Dave Hansen 6baf4bec02 x86/espfix: Document use of _PAGE_GLOBAL
The "normal" kernel page table creation mechanisms using
PAGE_KERNEL_* page protections will never set _PAGE_GLOBAL with PTI.
The few places in the kernel that always want _PAGE_GLOBAL must
avoid using PAGE_KERNEL_*.

Document that we want it here and its use is not accidental.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205507.BCF4D4F0@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 18:27:33 +02:00
Dave Hansen 8a57f4849f x86/mm: Introduce "default" kernel PTE mask
The __PAGE_KERNEL_* page permissions are "raw".  They contain bits
that may or may not be supported on the current processor.  They need
to be filtered by a mask (currently __supported_pte_mask) to turn them
into a value that we can actually set in a PTE.

These __PAGE_KERNEL_* values all contain _PAGE_GLOBAL.  But, with PTI,
we want to be able to support _PAGE_GLOBAL (have the bit set in
__supported_pte_mask) but not have it appear in any of these masks by
default.

This patch creates a new mask, __default_kernel_pte_mask, and applies
it when creating all of the PAGE_KERNEL_* masks.  This makes
PAGE_KERNEL_* safe to use anywhere (they only contain supported bits).
It also ensures that PAGE_KERNEL_* contains _PAGE_GLOBAL on PTI=n
kernels but clears _PAGE_GLOBAL when PTI=y.

We also make __default_kernel_pte_mask a non-GPL exported symbol
because there are plenty of driver-available interfaces that take
PAGE_KERNEL_* permissions.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205506.030DB6B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 18:27:32 +02:00
Dave Hansen 606c7193d5 x86/mm: Undo double _PAGE_PSE clearing
When clearing _PAGE_PRESENT on a huge page, we need to be careful
to also clear _PAGE_PSE, otherwise it might still get confused
for a valid large page table entry.

We do that near the spot where we *set* _PAGE_PSE.  That's fine,
but it's unnecessary.  pgprot_large_2_4k() already did it.

BTW, I also noticed that pgprot_large_2_4k() and
pgprot_4k_2_large() are not symmetric.  pgprot_large_2_4k() clears
_PAGE_PSE (because it is aliased to _PAGE_PAT) but
pgprot_4k_2_large() does not put _PAGE_PSE back.  Bummer.

Also, add some comments and change "promote" to "move".  "Promote"
seems an odd word to move when we are logically moving a bit to a
lower bit position.  Also add an extra line return to make it clear
to which line the comment applies.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205504.9B0F44A9@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 18:27:32 +02:00
Dave Hansen d1440b23c9 x86/mm: Factor out pageattr _PAGE_GLOBAL setting
The pageattr code has a pattern repeated where it sets _PAGE_GLOBAL
for present PTEs but clears it for non-present PTEs.  The intention
is to keep _PAGE_GLOBAL from getting confused with _PAGE_PROTNONE
since _PAGE_GLOBAL is for present PTEs and _PAGE_PROTNONE is for
non-present

But, this pattern makes no sense.  Effectively, it says, if you use
the pageattr code, always set _PAGE_GLOBAL when _PAGE_PRESENT.
canon_pgprot() will clear it if unsupported (because it masks the
value with __supported_pte_mask) but we *always* set it. Even if
canon_pgprot() did not filter _PAGE_GLOBAL, it would be OK.
_PAGE_GLOBAL is ignored when CR4.PGE=0 by the hardware.

This unconditional setting of _PAGE_GLOBAL is a problem when we have
PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some
not.

This updated version of the code says:
1. Clear _PAGE_GLOBAL when !_PAGE_PRESENT
2. Never set _PAGE_GLOBAL implicitly
3. Allow _PAGE_GLOBAL to be in cpa.set_mask
4. Allow _PAGE_GLOBAL to be inherited from previous PTE

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180406205502.86E199DA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 18:27:32 +02:00
Linus Torvalds 7886e8aa7f Merge branch 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM SA1100 updates from Russell King:
 "We have support for arbitary MMIO registers providing platform GPIOs,
  which allows us to abstract some of the SA11x0 CF support.

  This set of updates makes that change"

* 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: sa1100/simpad: switch simpad CF to use gpiod APIs
  ARM: sa1100/shannon: convert to generic CF sockets
  ARM: sa1100/nanoengine: convert to generic CF sockets
  ARM: sa1100/h3xxx: switch h3xxx PCMCIA to use gpiod APIs
  ARM: sa1100/cerf: convert to generic CF sockets
  ARM: sa1100/assabet: convert to generic CF sockets
  ARM: sa1100: provide infrastructure to support generic CF sockets
  pcmcia: sa1100: provide generic CF support
2018-04-09 09:26:36 -07:00
Ingo Molnar ee1400dda3 Merge branch 'linus' into x86/pti to pick up upstream changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 18:24:58 +02:00
Andy Lutomirski 071ccc966b x86/entry/64: Drop idtentry's manual stack switch for user entries
For non-paranoid entries, idtentry knows how to switch from the
kernel stack to the user stack, as does error_entry.  This results
in pointless duplication and code bloat.  Make idtentry stop
thinking about stacks for non-paranoid entries.

This reduces text size by 5377 bytes.

This goes back to the following commit:

  7f2590a110 ("x86/entry/64: Use a per-CPU trampoline stack for IDT entries")

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/90aab80c1f906e70742eaa4512e3c9b5e62d59d4.1522794757.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 18:23:50 +02:00
Arnd Bergmann 92e830f25f x86/olpc: Fix inconsistent MFD_CS5535 configuration
This Kconfig warning appeared after a fix to the Kconfig validation.
The GPIO_CS5535 driver depends on the MFD_CS5535 driver, but the former
is selected in places where the latter is not:

WARNING: unmet direct dependencies detected for GPIO_CS5535
  Depends on [m]: GPIOLIB [=y] && (X86 [=y] || MIPS || COMPILE_TEST [=y]) && MFD_CS5535 [=m]
  Selected by [y]:
  - OLPC_XO1_SCI [=y] && X86_32 [=y] && OLPC [=y] && OLPC_XO1_PM [=y] && INPUT [=y]=y

The warning does seem appropriate, since the GPIO_CS5535 driver won't
work unless MFD_CS5535 is also present. However, there is no link time
dependency between the two, so this caused no problems during randconfig
testing before.

This changes the 'select GPIO_CS5535' to 'depends on GPIO_CS5535' to
avoid the issue, at the expense of making it harder to configure the
driver (one now has to select the dependencies first).

The 'select MFD_CORE' part is completely redundant, since we already
depend on MFD_CS5535 here, so I'm removing that as well.

Ideally, the private symbols exported by that cs5535 gpio driver would
just be converted to gpiolib interfaces so we could expletely avoid
this dependency.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kbuild@vger.kernel.org
Fixes: f622f82795 ("kconfig: warn unmet direct dependency of tristate symbols selected by y")
Link: http://lkml.kernel.org/r/20180404124539.3817101-1-arnd@arndb.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 18:22:34 +02:00
Linus Torvalds 4a1e00524c Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
 "A number of core ARM changes:

   - Refactoring linker script by Nicolas Pitre

   - Enable source fortification

   - Add support for Cortex R8"

* 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: decompressor: fix warning introduced in fortify patch
  ARM: 8751/1: Add support for Cortex-R8 processor
  ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE
  ARM: simplify and fix linker script for TCM
  ARM: linker script: factor out TCM bits
  ARM: linker script: factor out vectors and stubs
  ARM: linker script: factor out unwinding table sections
  ARM: linker script: factor out stuff for the .text section
  ARM: linker script: factor out stuff for the DISCARD section
  ARM: linker script: factor out some common definitions between XIP and non-XIP
2018-04-09 09:19:30 -07:00
Linus Torvalds 2025fef0ca Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu
Pull m68knommu update from Greg Ungerer:
 "Only a single fix to set the DMA masks in the ColdFire FEC platform
  data structure.

  This stops the warning from dma-mapping.h at boot time"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68k: set dma and coherent masks for platform FEC ethernets
2018-04-09 09:15:46 -07:00
Linus Torvalds 5148408a51 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Matt Turner:
 "A few small changes for alpha"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
  alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering
  alpha: Implement CPU vulnerabilities sysfs functions.
  alpha: rtc: stop validating rtc_time in .read_time
  alpha: rtc: remove unused set_mmss ops
2018-04-09 09:11:32 -07:00
Linus Torvalds becdce1c66 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - Improvements for the spectre defense:
    * The spectre related code is consolidated to a single file
      nospec-branch.c
    * Automatic enable/disable for the spectre v2 defenses (expoline vs.
      nobp)
    * Syslog messages for specve v2 are added
    * Enable CONFIG_GENERIC_CPU_VULNERABILITIES and define the attribute
      functions for spectre v1 and v2

 - Add helper macros for assembler alternatives and use them to shorten
   the code in entry.S.

 - Add support for persistent configuration data via the SCLP Store Data
   interface. The H/W interface requires a page table that uses 4K pages
   only, the code to setup such an address space is added as well.

 - Enable virtio GPU emulation in QEMU. To do this the depends
   statements for a few common Kconfig options are modified.

 - Add support for format-3 channel path descriptors and add a binary
   sysfs interface to export the associated utility strings.

 - Add a sysfs attribute to control the IFCC handling in case of
   constant channel errors.

 - The vfio-ccw changes from Cornelia.

 - Bug fixes and cleanups.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (40 commits)
  s390/kvm: improve stack frame constants in entry.S
  s390/lpp: use assembler alternatives for the LPP instruction
  s390/entry.S: use assembler alternatives
  s390: add assembler macros for CPU alternatives
  s390: add sysfs attributes for spectre
  s390: report spectre mitigation via syslog
  s390: add automatic detection of the spectre defense
  s390: move nobp parameter functions to nospec-branch.c
  s390/cio: add util_string sysfs attribute
  s390/chsc: query utility strings via fmt3 channel path descriptor
  s390/cio: rename struct channel_path_desc
  s390/cio: fix unbind of io_subchannel_driver
  s390/qdio: split up CCQ handling for EQBS / SQBS
  s390/qdio: don't retry EQBS after CCQ 96
  s390/qdio: restrict buffer merging to eligible devices
  s390/qdio: don't merge ERROR output buffers
  s390/qdio: simplify math in get_*_buffer_frontier()
  s390/decompressor: trim uncompressed image head during the build
  s390/crypto: Fix kernel crash on aes_s390 module remove.
  s390/defkeymap: fix global init to zero
  ...
2018-04-09 09:04:10 -07:00
Dominik Brodowski c76fc98260 syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention
Make the code in syscall_wrapper.h more readable by naming the stub macros
similar to the stub they provide. While at it, fix a stray newline at the
end of the __IA32_COMPAT_SYS_STUBx macro.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-5-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 16:47:28 +02:00
Dominik Brodowski d5a00528b5 syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*()
This rename allows us to have a coherent syscall stub naming convention on
64-bit x86 (0xffffffff prefix removed):

 810f0af0 t            kernel_waitid	# common (32/64) kernel helper

 <inline>            __do_sys_waitid	# inlined helper doing actual work
 810f0be0 t          __se_sys_waitid	# C func calling inlined helper

 <inline>     __do_compat_sys_waitid	# inlined helper doing actual work
 810f0d80 t   __se_compat_sys_waitid	# compat C func calling inlined helper

 810f2080 T         __x64_sys_waitid	# x64 64-bit-ptregs -> C stub
 810f20b0 T        __ia32_sys_waitid	# ia32 32-bit-ptregs -> C stub[*]
 810f2470 T __ia32_compat_sys_waitid	# ia32 32-bit-ptregs -> compat C stub
 810f2490 T  __x32_compat_sys_waitid	# x32 64-bit-ptregs -> compat C stub

    [*] This stub is unused, as the syscall table links
	__ia32_compat_sys_waitid instead of __ia32_sys_waitid as we need
	a compat variant here.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-4-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 16:47:28 +02:00
Dominik Brodowski 5ac9efa3c5 syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention
Tidy the naming convention for compat syscall subs. Hints which describe
the purpose of the stub go in front and receive a double underscore to
denote that they are generated on-the-fly by the COMPAT_SYSCALL_DEFINEx()
macro.

For the generic case, this means:

t            kernel_waitid	# common C function (see kernel/exit.c)

    __do_compat_sys_waitid	# inlined helper doing the actual work
				# (takes original parameters as declared)

T   __se_compat_sys_waitid	# sign-extending C function calling inlined
				# helper (takes parameters of type long,
				# casts them to unsigned long and then to
				# the declared type)

T        compat_sys_waitid      # alias to __se_compat_sys_waitid()
				# (taking parameters as declared), to
				# be included in syscall table

For x86, the naming is as follows:

t            kernel_waitid	# common C function (see kernel/exit.c)

    __do_compat_sys_waitid	# inlined helper doing the actual work
				# (takes original parameters as declared)

t   __se_compat_sys_waitid      # sign-extending C function calling inlined
				# helper (takes parameters of type long,
				# casts them to unsigned long and then to
				# the declared type)

T __ia32_compat_sys_waitid	# IA32_EMULATION 32-bit-ptregs -> C stub,
				# calls __se_compat_sys_waitid(); to be
				# included in syscall table

T  __x32_compat_sys_waitid	# x32 64-bit-ptregs -> C stub, calls
				# __se_compat_sys_waitid(); to be included
				# in syscall table

If only one of IA32_EMULATION and x32 is enabled, __se_compat_sys_waitid()
may be inlined into the stub __{ia32,x32}_compat_sys_waitid().

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-3-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 16:47:28 +02:00
Dominik Brodowski e145242ea0 syscalls/core, syscalls/x86: Clean up syscall stub naming convention
Tidy the naming convention for compat syscall subs. Hints which describe
the purpose of the stub go in front and receive a double underscore to
denote that they are generated on-the-fly by the SYSCALL_DEFINEx() macro.

For the generic case, this means (0xffffffff prefix removed):

 810f08d0 t     kernel_waitid	# common C function (see kernel/exit.c)

 <inline>     __do_sys_waitid	# inlined helper doing the actual work
				# (takes original parameters as declared)

 810f1aa0 T   __se_sys_waitid	# sign-extending C function calling inlined
				# helper (takes parameters of type long;
				# casts them to the declared type)

 810f1aa0 T        sys_waitid	# alias to __se_sys_waitid() (taking
				# parameters as declared), to be included
				# in syscall table

For x86, the naming is as follows:

 810efc70 t     kernel_waitid	# common C function (see kernel/exit.c)

 <inline>     __do_sys_waitid	# inlined helper doing the actual work
				# (takes original parameters as declared)

 810efd60 t   __se_sys_waitid	# sign-extending C function calling inlined
				# helper (takes parameters of type long;
				# casts them to the declared type)

 810f1140 T __ia32_sys_waitid	# IA32_EMULATION 32-bit-ptregs -> C stub,
				# calls __se_sys_waitid(); to be included
				# in syscall table

 810f1110 T        sys_waitid	# x86 64-bit-ptregs -> C stub, calls
				# __se_sys_waitid(); to be included in
				# syscall table

For x86, sys_waitid() will be re-named to __x64_sys_waitid in a follow-up
patch.

Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180409105145.5364-2-linux@dominikbrodowski.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-04-09 16:47:27 +02:00
Russell King 9178caf964 Merge branches 'devel-stable' and 'misc' into for-linus 2018-04-09 10:08:51 +01:00
Michael Ellerman 73aca179d7 powerpc/modules: Fix crashes by adding CONFIG_RELOCATABLE to vermagic
If you build the kernel with CONFIG_RELOCATABLE=n, then install the
modules, rebuild the kernel with CONFIG_RELOCATABLE=y and leave the
old modules installed, we crash something like:

  Unable to handle kernel paging request for data at address 0xd000000018d66cef
  Faulting instruction address: 0xc0000000021ddd08
  Oops: Kernel access of bad area, sig: 11 [#1]
  Modules linked in: x_tables autofs4
  CPU: 2 PID: 1 Comm: systemd Not tainted 4.16.0-rc6-gcc_ubuntu_le-g99fec39 #1
  ...
  NIP check_version.isra.22+0x118/0x170
  Call Trace:
    __ksymtab_xt_unregister_table+0x58/0xfffffffffffffcb8 [x_tables] (unreliable)
    resolve_symbol+0xb4/0x150
    load_module+0x10e8/0x29a0
    SyS_finit_module+0x110/0x140
    system_call+0x58/0x6c

This happens because since commit 71810db27c ("modversions: treat
symbol CRCs as 32 bit quantities"), a relocatable kernel encodes and
handles symbol CRCs differently from a non-relocatable kernel.

Although it's possible we could try and detect this situation and
handle it, it's much more robust to simply make the state of
CONFIG_RELOCATABLE part of the module vermagic.

Fixes: 71810db27c ("modversions: treat symbol CRCs as 32 bit quantities")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-04-09 13:21:02 +10:00