Commit Graph

148133 Commits

Author SHA1 Message Date
Ram Pai eabdb8ca86 powerpc/pkeys: Detach execute_only key on !PROT_EXEC
Disassociate the exec_key from a VMA if the VMA permission is not
PROT_EXEC anymore. Otherwise the exec_only key continues to be
associated with the vma, causing unexpected behavior.

The problem was reported on x86 by Shakeel Butt, which is also
applicable on powerpc.

Fixes: 5586cf61e1 ("powerpc: introduce execute-only pkey")
Cc: stable@vger.kernel.org # v4.16+
Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-05 11:03:25 +10:00
Linus Torvalds d9b446e294 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Kernel side changes:

   - x86 Intel uncore driver cleanups and enhancements (Kan Liang)

   - group scheduling and other fixes (Song Liu

   - store frame pointer in the sample traces for better profiling
     (Alexey Budankov)

   - compat fixes/enhancements (Eugene Syromiatnikov)

  Tooling side changes, which you can build and install in a single step
  via:

      make -C tools/perf clean install

  perf annotate:

   - Support 'perf annotate --group' for non-explicit recorded event
     "groups", showing multiple columns, one for each event, just like
     when dealing with explicit event groups (those enclosed with {})
     (Jin Yao)

   - Record min/max LBR cycles (>= Skylake) and add 'perf annotate' TUI
     hotkey to show it (c) (Jin Yao)

  perf bpf:

   - Add infrastructure to help in writing eBPF C programs to be used
     with '-e name.c' type events in tools such as 'record' and 'trace',
     with headers for common constructs and an examples directory that
     will get populated as we add more such helpers and the 'perf bpf'
     (Arnaldo Carvalho de Melo)

  perf stat:

   - Display time in precision based on std deviation (Jiri Olsa)

   - Add --table option to display time of each run (Jiri Olsa)

   - Display length strings of each run for --table option (Jiri Olsa)

  perf buildid-cache:

   - Add --list and --purge-all options (Ravi Bangoria)

  perf test:

   - Let 'perf test list' display subtests (Hendrik Brueckner)

  perf pti:

   - Create extra kernel maps to help in decoding samples in x86 PTI
     entry trampolines (Adrian Hunter)

   - Copy x86 PTI entry trampoline sections in the kcore copy used for
     annotation and intel_pt CPU traces decoding (Adrian Hunter)

  ... and a lot of other fixes, enhancements and cleanups I did not
  list, see the shortlog and git log for details"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
  perf/x86/intel/uncore: Clean up client IMC uncore
  perf/x86/intel/uncore: Expose uncore_pmu_event*() functions
  perf/x86/intel/uncore: Support IIO free-running counters on SKX
  perf/x86/intel/uncore: Add infrastructure for free running counters
  perf/x86/intel/uncore: Add new data structures for free running counters
  perf/x86/intel/uncore: Correct fixed counter index check in generic code
  perf/x86/intel/uncore: Correct fixed counter index check for NHM
  perf/x86/intel/uncore: Introduce customized event_read() for client IMC uncore
  perf/x86: Store user space frame-pointer value on a sample
  perf/core: Wire up compat PERF_EVENT_IOC_QUERY_BPF, PERF_EVENT_IOC_MODIFY_ATTRIBUTES
  perf/core: Fix bad use of igrab()
  perf/core: Fix group scheduling with mixed hw and sw events
  perf kcore_copy: Amend the offset of sections that remap kernel text
  perf kcore_copy: Copy x86 PTI entry trampoline sections
  perf kcore_copy: Get rid of kernel_map
  perf kcore_copy: Iterate phdrs
  perf kcore_copy: Layout sections
  perf kcore_copy: Calculate offset from phnum
  perf kcore_copy: Keep a count of phdrs
  perf kcore_copy: Keep phdr data in a list
  ...
2018-06-04 17:14:22 -07:00
Linus Torvalds 92400b8c8b Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:

 - Lots of tidying up changes all across the map for Linux's formal
   memory/locking-model tooling, by Alan Stern, Akira Yokosawa, Andrea
   Parri, Paul E. McKenney and SeongJae Park.

   Notable changes beyond an overall update in the tooling itself is the
   tidying up of spin_is_locked() semantics, which spills over into the
   kernel proper as well.

 - qspinlock improvements: the locking algorithm now guarantees forward
   progress whereas the previous implementation in mainline could starve
   threads indefinitely in cmpxchg() loops. Also other related cleanups
   to the qspinlock code (Will Deacon)

 - misc smaller improvements, cleanups and fixes all across the locking
   subsystem

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  locking/rwsem: Simplify the is-owner-spinnable checks
  tools/memory-model: Add reference for 'Simplifying ARM concurrency'
  tools/memory-model: Update ASPLOS information
  MAINTAINERS, tools/memory-model: Update e-mail address for Andrea Parri
  tools/memory-model: Fix coding style in 'lock.cat'
  tools/memory-model: Remove out-of-date comments and code from lock.cat
  tools/memory-model: Improve mixed-access checking in lock.cat
  tools/memory-model: Improve comments in lock.cat
  tools/memory-model: Remove duplicated code from lock.cat
  tools/memory-model: Flag "cumulativity" and "propagation" tests
  tools/memory-model: Add model support for spin_is_locked()
  tools/memory-model: Add scripts to test memory model
  tools/memory-model: Fix coding style in 'linux-kernel.def'
  tools/memory-model: Model 'smp_store_mb()'
  tools/memory-order: Update the cheat-sheet to show that smp_mb__after_atomic() orders later RMW operations
  tools/memory-order: Improve key for SELF and SV
  tools/memory-model: Fix cheat sheet typo
  tools/memory-model: Update required version of herdtools7
  tools/memory-model: Redefine rb in terms of rcu-fence
  tools/memory-model: Rename link and rcu-path to rcu-link and rb
  ...
2018-06-04 16:40:11 -07:00
Linus Torvalds 31a85cb35c Merge branch 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:

 - decode x86 CPER data (Yazen Ghannam)

 - ignore unrealistically large option ROMs (Hans de Goede)

 - initialize UEFI secure boot state during Xen dom0 boot (Daniel Kiper)

 - additional minor tweaks and fixes.

* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi/capsule-loader: Don't output reset log when reset flags are not set
  efi/x86: Ignore unrealistically large option ROMs
  efi/x86: Fold __setup_efi_pci32() and __setup_efi_pci64() into one function
  efi: Align efi_pci_io_protocol typedefs to type naming convention
  efi/libstub/tpm: Make function efi_retrieve_tpm2_eventlog_1_2() static
  efi: Decode IA32/X64 Context Info structure
  efi: Decode IA32/X64 MS Check structure
  efi: Decode additional IA32/X64 Bus Check fields
  efi: Decode IA32/X64 Cache, TLB, and Bus Check structures
  efi: Decode UEFI-defined IA32/X64 Error Structure GUIDs
  efi: Decode IA32/X64 Processor Error Info Structure
  efi: Decode IA32/X64 Processor Error Section
  efi: Fix IA32/X64 Processor Error Record definition
  efi/cper: Remove the INDENT_SP silliness
  x86/xen/efi: Initialize UEFI secure boot state during dom0 boot
2018-06-04 16:31:06 -07:00
Linus Torvalds 4057adafb3 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:

 - updates to the handling of expedited grace periods

 - updates to reduce lock contention in the rcu_node combining tree

   [ These are in preparation for the consolidation of RCU-bh,
     RCU-preempt, and RCU-sched into a single flavor, which was
     requested by Linus in response to a security flaw whose root cause
     included confusion between the multiple flavors of RCU ]

 - torture-test updates that save their users some time and effort

 - miscellaneous fixes

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits)
  rcu/x86: Provide early rcu_cpu_starting() callback
  torture: Make kvm-find-errors.sh find build warnings
  rcutorture: Abbreviate kvm.sh summary lines
  rcutorture: Print end-of-test state in kvm.sh summary
  rcutorture: Print end-of-test state
  torture: Fold parse-torture.sh into parse-console.sh
  torture: Add a script to edit output from failed runs
  rcu: Update list of rcu_future_grace_period() trace events
  rcu: Drop early GP request check from rcu_gp_kthread()
  rcu: Simplify and inline cpu_needs_another_gp()
  rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp()
  rcu: Make rcu_start_this_gp() check for out-of-range requests
  rcu: Add funnel locking to rcu_start_this_gp()
  rcu: Make rcu_start_future_gp() caller select grace period
  rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp()
  rcu: Clear request other than RCU_GP_FLAG_INIT at GP end
  rcu: Cleanup, don't put ->completed into an int
  rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs()
  rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition
  rcu: Make rcu_migrate_callbacks wake GP kthread when needed
  ...
2018-06-04 15:54:04 -07:00
Linus Torvalds 137f5ae4da m68k updates for 4.18
- A few time-related fixes:
       - off-by-one calendar month on some classes of machines,
       - Y2038 preparation,
   - Build fix for ndelay() being called with a 64-bit type,
   - Revive 64-bit get_user(), which is used by some Android code,
   - Defconfig updates,
   - Fix for a long-standing fatal bug in iounmap() on '020/030, which
     was actually fixed in 2.4.23, but never in 2.5.x and later,
   - Default DMA mask to avoid warning splats,
   - Minor fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iF4EABYIAAYFAlsU83oACgkQisJQ/WRJ8XC+tAD/TC++gQlSUbz5ZSjZ95aGZl1O
 nHjOHQpnae/R9pNFjogA/imtboTg7ukyx9Qnv7q47G8w7wGyxS+QHVTa6Hr2tEEF
 =8PWk
 -----END PGP SIGNATURE-----

Merge tag 'm68k-for-v4.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k

Pull m68k updates from Geert Uytterhoeven:

 - a few time-related fixes:
     - off-by-one calendar month on some classes of machines
     - Y2038 preparation

 - build fix for ndelay() being called with a 64-bit type

 - revive 64-bit get_user(), which is used by some Android code

 - defconfig updates

 - fix for a long-standing fatal bug in iounmap() on '020/030, which was
   actually fixed in 2.4.23, but never in 2.5.x and later

 - default DMA mask to avoid warning splats

 - minor fixes and cleanups

* tag 'm68k-for-v4.18-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k: Set default dma mask for platform devices
  m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap()
  m68k/defconfig: Update defconfigs for v4.17-rc3
  m68k/uaccess: Revive 64-bit get_user()
  m68k: Implement ndelay() as an inline function to force type checking/casting
  zorro: Add a blank line after declarations
  m68k: Use read_persistent_clock64() consistently
  m68k: Fix off-by-one calendar month
  m68k: Fix style, spelling, and grammar in siginfo_build_tests()
  m68k/mac: Fix SWIM memory resource end address
2018-06-04 15:50:22 -07:00
Linus Torvalds 93e95fa574 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
 "This set of changes close the known issues with setting si_code to an
  invalid value, and with not fully initializing struct siginfo. There
  remains work to do on nds32, arc, unicore32, powerpc, arm, arm64, ia64
  and x86 to get the code that generates siginfo into a simpler and more
  maintainable state. Most of that work involves refactoring the signal
  handling code and thus careful code review.

  Also not included is the work to shrink the in kernel version of
  struct siginfo. That depends on getting the number of places that
  directly manipulate struct siginfo under control, as it requires the
  introduction of struct kernel_siginfo for the in kernel things.

  Overall this set of changes looks like it is making good progress, and
  with a little luck I will be wrapping up the siginfo work next
  development cycle"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
  signal/sh: Stop gcc warning about an impossible case in do_divide_error
  signal/mips: Report FPE_FLTUNK for undiagnosed floating point exceptions
  signal/um: More carefully relay signals in relay_signal.
  signal: Extend siginfo_layout with SIL_FAULT_{MCEERR|BNDERR|PKUERR}
  signal: Remove unncessary #ifdef SEGV_PKUERR in 32bit compat code
  signal/signalfd: Add support for SIGSYS
  signal/signalfd: Remove __put_user from signalfd_copyinfo
  signal/xtensa: Use force_sig_fault where appropriate
  signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
  signal/um: Use force_sig_fault where appropriate
  signal/sparc: Use force_sig_fault where appropriate
  signal/sparc: Use send_sig_fault where appropriate
  signal/sh: Use force_sig_fault where appropriate
  signal/s390: Use force_sig_fault where appropriate
  signal/riscv: Replace do_trap_siginfo with force_sig_fault
  signal/riscv: Use force_sig_fault where appropriate
  signal/parisc: Use force_sig_fault where appropriate
  signal/parisc: Use force_sig_mceerr where appropriate
  signal/openrisc: Use force_sig_fault where appropriate
  signal/nios2: Use force_sig_fault where appropriate
  ...
2018-06-04 15:23:48 -07:00
Palmer Dabbelt 32c81bced3
RISC-V: Preliminary Perf Support
The RISC-V ISA defines a core set of performance counters that must
exist on all processors along with a standard way to add more
performance counters.

This patch set adds preliminary perf support for RISC-V systems.  Long
term we'll move to model where all PMUs can be built into the kernel at
the same time, detected at runtime (possibly via device tree), and
provided to userspace.  Since we currently only support the ISA-mandated
performance counters there's no need to detect anything right now.

Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-06-04 14:03:18 -07:00
Alan Kao 178e9fc47a
perf: riscv: preliminary RISC-V support
This patch provide a basic PMU, riscv_base_pmu, which supports two
general hardware event, instructions and cycles.  Furthermore, this
PMU serves as a reference implementation to ease the portings in
the future.

riscv_base_pmu should be able to run on any RISC-V machine that
conforms to the Priv-Spec.  Note that the latest qemu model hasn't
fully support a proper behavior of Priv-Spec 1.10 yet, but work
around should be easy with very small fixes.  Please check
https://github.com/riscv/riscv-qemu/pull/115 for future updates.

Cc: Nick Hu <nickhu@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Signed-off-by: Alan Kao <alankao@andestech.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-06-04 14:02:01 -07:00
Linus Torvalds 408afb8d78 Merge branch 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio updates from Al Viro:
 "Majority of AIO stuff this cycle. aio-fsync and aio-poll, mostly.

  The only thing I'm holding back for a day or so is Adam's aio ioprio -
  his last-minute fixup is trivial (missing stub in !CONFIG_BLOCK case),
  but let it sit in -next for decency sake..."

* 'work.aio-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
  aio: sanitize the limit checking in io_submit(2)
  aio: fold do_io_submit() into callers
  aio: shift copyin of iocb into io_submit_one()
  aio_read_events_ring(): make a bit more readable
  aio: all callers of aio_{read,write,fsync,poll} treat 0 and -EIOCBQUEUED the same way
  aio: take list removal to (some) callers of aio_complete()
  aio: add missing break for the IOCB_CMD_FDSYNC case
  random: convert to ->poll_mask
  timerfd: convert to ->poll_mask
  eventfd: switch to ->poll_mask
  pipe: convert to ->poll_mask
  crypto: af_alg: convert to ->poll_mask
  net/rxrpc: convert to ->poll_mask
  net/iucv: convert to ->poll_mask
  net/phonet: convert to ->poll_mask
  net/nfc: convert to ->poll_mask
  net/caif: convert to ->poll_mask
  net/bluetooth: convert to ->poll_mask
  net/sctp: convert to ->poll_mask
  net/tipc: convert to ->poll_mask
  ...
2018-06-04 13:57:43 -07:00
Alan Kao ebcbd75e39
riscv: Fix the bug in memory access fixup code
A piece of fixup code is currently shared by __copy_user and
__clear_user.  It first disables the access to user-space memory
and then returns the "n" argument, which represents #(bytes not processed).
However,__copy_user's "n" is in register a2, while __clear_user's in a1,
and thus it causes errors for programs like setdomainname02 testcase in LTP.

This patch fixes this issue by separating their fixup code and returning
the right value for the kernel to handle a relative fault properly.

Signed-off-by: Alan Kao <alankao@andestech.com>
Cc: Greentime Hu <greentime@andestech.com>
Cc: Zong Li <zong@andestech.com>
Cc: Vincent Chen <vincentc@andestech.com>
Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-06-04 13:33:31 -07:00
Linus Torvalds eeee3149aa There's been a fair amount of work in the docs tree this time around,
including:
 
  - Extensive RST conversions and organizational work in the
    memory-management docs thanks to Mike Rapoport.
 
  - An update of Documentation/features from Andrea Parri and a script to
    keep it updated.
 
  - Various LICENSES updates from Thomas, along with a script to check SPDX
    tags.
 
  - Work to fix dangling references to documentation files; this involved a
    fair number of one-liner comment changes outside of Documentation/
 
 ...and the usual list of documentation improvements, typo fixes, etc.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJbFTkKAAoJEI3ONVYwIuV6t24P/0K9qltHkLwsBo2fbGu/emem
 mb1QrZCFZGebKVrCIvET3YcT0q0xPW+ZldwMQYEUeCcu/vD3cGHGXlDbVJCa1fFD
 2OS10W/sEObPnREtlHO/zAzpapKP9DO1/f6NhO55iBJLGOCgoLL5xvSqgsI8MTGd
 vcJDXLitkh4CJEcfNLkQt8dEZzq9Tb6wdSFIvZBBXRNon2ItVN92D5xoQ0wtB+qt
 KmcGYofajK9bjtZpnC4iNg3i+zdwkd80bGTEN9f0hJTRZK5emCILk8fip8CMhRuB
 iwmcqb2RnMLydNLyK9RSs6OS5z3G4fYu9llRtLlZBAupcjRVpalWaBGxLOVO6jBG
 mvkqdKPMtxV4c7NvwKwFQL9dcjtxsxO4RDRYVWN82dS1L6WKKk8UvTuJUBLH0YA5
 af7ZKn7mJVhJ1cxPblaEBOBM3oQuk57LLkjmcpMOXyJ/IOkTIuV1Ezht+XzFyFQv
 VWSyekiKo+8D6WHACPTaWiizjW15e8CyP+WIhKzJyn7VQQrZwhsOS+R//ITsuvQ0
 vRdZ20lwUeBhR+mnXd5NfIo2w7G+OiqiREVAgxjgRrS0PnkzWG7lzzcSVU8HTfT4
 S7VXqval2a9Xg+N8aU2JUe49W858J8hKvIa98hBxGoZa84wxOGtEo7pIKhnMwMSe
 Uhkh/1/bQMxsK3fBEF74
 =I6FG
 -----END PGP SIGNATURE-----

Merge tag 'docs-4.18' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "There's been a fair amount of work in the docs tree this time around,
  including:

   - Extensive RST conversions and organizational work in the
     memory-management docs thanks to Mike Rapoport.

   - An update of Documentation/features from Andrea Parri and a script
     to keep it updated.

   - Various LICENSES updates from Thomas, along with a script to check
     SPDX tags.

   - Work to fix dangling references to documentation files; this
     involved a fair number of one-liner comment changes outside of
     Documentation/

  ... and the usual list of documentation improvements, typo fixes, etc"

* tag 'docs-4.18' of git://git.lwn.net/linux: (103 commits)
  Documentation: document hung_task_panic kernel parameter
  docs/admin-guide/mm: add high level concepts overview
  docs/vm: move ksm and transhuge from "user" to "internals" section.
  docs: Use the kerneldoc comments for memalloc_no*()
  doc: document scope NOFS, NOIO APIs
  docs: update kernel versions and dates in tables
  docs/vm: transhuge: split userspace bits to admin-guide/mm/transhuge
  docs/vm: transhuge: minor updates
  docs/vm: transhuge: change sections order
  Documentation: arm: clean up Marvell Berlin family info
  Documentation: gpio: driver: Fix a typo and some odd grammar
  docs: ranoops.rst: fix location of ramoops.txt
  scripts/documentation-file-ref-check: rewrite it in perl with auto-fix mode
  docs: uio-howto.rst: use a code block to solve a warning
  mm, THP, doc: Add document for thp_swpout/thp_swpout_fallback
  w1: w1_io.c: fix a kernel-doc warning
  Documentation/process/posting: wrap text at 80 cols
  docs: admin-guide: add cgroup-v2 documentation
  Revert "Documentation/features/vm: Remove arch support status file for 'pte_special'"
  Documentation: refcount-vs-atomic: Update reference to LKMM doc.
  ...
2018-06-04 12:34:27 -07:00
Linus Torvalds e5a594643a dma-mapping updates for 4.18:
- replaceme the force_dma flag with a dma_configure bus method.
    (Nipun Gupta, although one patch is іncorrectly attributed to me
     due to a git rebase bug)
  - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)
  - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
    right thing for bounce buffering.
  - move dma-debug initialization to common code, and apply a few cleanups
    to the dma-debug code.
  - cleanup the Kconfig mess around swiotlb selection
  - swiotlb comment fixup (Yisheng Xie)
  - a trivial swiotlb fix. (Dan Carpenter)
  - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)
  - add a new generic dma-noncoherent dma_map_ops implementation and use
    it for arc, c6x and nds32.
  - improve scatterlist validity checking in dma-debug. (Robin Murphy)
  - add a struct device quirk to limit the dma-mask to 32-bit due to
    bridge/system issues, and switch x86 to use it instead of a local
    hack for VIA bridges.
  - handle devices without a dma_mask more gracefully in the dma-direct
    code.
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlsU1hwLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYPraxAAocC7JiFKW133/VugCtGA1x9uE8DPHealtsWTAeEq
 KOOB3GxWMU2hKqQ4km5tcfdWoGJvvab6hmDXcitzZGi2JajO7Ae0FwIy3yvxSIKm
 iH/ON7c4sJt8gKrXYsLVylmwDaimNs4a6xfODoCRgnWuovI2QrrZzupnlzPNsiOC
 lv8ezzcW+Ay/gvDD/r72psO+w3QELETif/OzR/qTOtvLrVabM06eHmPQ8Wb98smu
 /UPMMv6/3XwQnxpxpdyqN+p/gUdneXithzT261wTeZ+8gDXmcWBwHGcMBCimcoBi
 FklW52moazIPIsTysqoNlVFsLGJTeS4p2D3BLAp5NwWYsLv+zHUVZsI1JY/8u5Ox
 mM11LIfvu9JtUzaqD9SvxlxIeLhhYZZGnUoV3bQAkpHSQhN/xp2YXd5NWSo5ac2O
 dch83+laZkZgd6ryw6USpt/YTPM/UHBYy7IeGGHX/PbmAke0ZlvA6Rae7kA5DG59
 7GaLdwQyrHp8uGFgwze8P+R4POSk1ly73HHLBT/pFKnDD7niWCPAnBzuuEQGJs00
 0zuyWLQyzOj1l6HCAcMNyGnYSsMp8Fx0fvEmKR/EYs8O83eJKXi6L9aizMZx4v1J
 0wTolUWH6SIIdz474YmewhG5YOLY7mfe9E8aNr8zJFdwRZqwaALKoteRGUxa3f6e
 zUE=
 =6Acj
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping updates from Christoph Hellwig:

 - replace the force_dma flag with a dma_configure bus method. (Nipun
   Gupta, although one patch is іncorrectly attributed to me due to a
   git rebase bug)

 - use GFP_DMA32 more agressively in dma-direct. (Takashi Iwai)

 - remove PCI_DMA_BUS_IS_PHYS and rely on the dma-mapping API to do the
   right thing for bounce buffering.

 - move dma-debug initialization to common code, and apply a few
   cleanups to the dma-debug code.

 - cleanup the Kconfig mess around swiotlb selection

 - swiotlb comment fixup (Yisheng Xie)

 - a trivial swiotlb fix. (Dan Carpenter)

 - support swiotlb on RISC-V. (based on a patch from Palmer Dabbelt)

 - add a new generic dma-noncoherent dma_map_ops implementation and use
   it for arc, c6x and nds32.

 - improve scatterlist validity checking in dma-debug. (Robin Murphy)

 - add a struct device quirk to limit the dma-mask to 32-bit due to
   bridge/system issues, and switch x86 to use it instead of a local
   hack for VIA bridges.

 - handle devices without a dma_mask more gracefully in the dma-direct
   code.

* tag 'dma-mapping-4.18' of git://git.infradead.org/users/hch/dma-mapping: (48 commits)
  dma-direct: don't crash on device without dma_mask
  nds32: use generic dma_noncoherent_ops
  nds32: implement the unmap_sg DMA operation
  nds32: consolidate DMA cache maintainance routines
  x86/pci-dma: switch the VIA 32-bit DMA quirk to use the struct device flag
  x86/pci-dma: remove the explicit nodac and allowdac option
  x86/pci-dma: remove the experimental forcesac boot option
  Documentation/x86: remove a stray reference to pci-nommu.c
  core, dma-direct: add a flag 32-bit dma limits
  dma-mapping: remove unused gfp_t parameter to arch_dma_alloc_attrs
  dma-debug: check scatterlist segments
  c6x: use generic dma_noncoherent_ops
  arc: use generic dma_noncoherent_ops
  arc: fix arc_dma_{map,unmap}_page
  arc: fix arc_dma_sync_sg_for_{cpu,device}
  arc: simplify arc_dma_sync_single_for_{cpu,device}
  dma-mapping: provide a generic dma-noncoherent implementation
  dma-mapping: simplify Kconfig dependencies
  riscv: add swiotlb support
  riscv: only enable ZONE_DMA32 for 64-bit
  ...
2018-06-04 10:58:12 -07:00
Linus Torvalds cf626b0da7 Merge branch 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull procfs updates from Al Viro:
 "Christoph's proc_create_... cleanups series"

* 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
  xfs, proc: hide unused xfs procfs helpers
  isdn/gigaset: add back gigaset_procinfo assignment
  proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
  tty: replace ->proc_fops with ->proc_show
  ide: replace ->proc_fops with ->proc_show
  ide: remove ide_driver_proc_write
  isdn: replace ->proc_fops with ->proc_show
  atm: switch to proc_create_seq_private
  atm: simplify procfs code
  bluetooth: switch to proc_create_seq_data
  netfilter/x_tables: switch to proc_create_seq_private
  netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
  neigh: switch to proc_create_seq_data
  hostap: switch to proc_create_{seq,single}_data
  bonding: switch to proc_create_seq_data
  rtc/proc: switch to proc_create_single_data
  drbd: switch to proc_create_single
  resource: switch to proc_create_seq_data
  staging/rtl8192u: simplify procfs code
  jfs: simplify procfs code
  ...
2018-06-04 10:00:01 -07:00
Ingo Molnar 24dd064d5b Merge branches 'x86/dma', 'x86/microcode', 'x86/mm' and 'x86/vdso' into x86/urgent
Merge these small and simple 1-2 commit branches into the urgent branch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-04 18:50:32 +02:00
Jim Mattson f4160e459c kvm: nVMX: Add support for "VMWRITE to any supported field"
Add support for "VMWRITE to any supported field in the VMCS" and
enable this feature by default in L1's IA32_VMX_MISC MSR. If userspace
clears the VMX capability bit, the old behavior will be restored.

Note that this feature is a prerequisite for kvm in L1 to use VMCS
shadowing, once that feature is available.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-04 17:52:51 +02:00
Jim Mattson a943ac50d1 kvm: nVMX: Restrict VMX capability MSR changes
Disallow changes to the VMX capability MSRs while the vCPU is in VMX
operation. Although this does break the existing API, it helps to
avoid some potentially tricky situations for which there is no
architected behavior.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-04 17:52:51 +02:00
Wanpeng Li c5ce8235cf KVM: VMX: Optimize tscdeadline timer latency
'Commit d0659d946b ("KVM: x86: add option to advance tscdeadline
hrtimer expiration")' advances the tscdeadline (the timer is emulated
by hrtimer) expiration in order that the latency which is incurred
by hypervisor (apic_timer_fn -> vmentry) can be avoided. This patch
adds the advance tscdeadline expiration support to which the tscdeadline
timer is emulated by VMX preemption timer to reduce the hypervisor
lantency (handle_preemption_timer -> vmentry). The guest can also
set an expiration that is very small (for example in Linux if an
hrtimer feeds a expiration in the past); in that case we set delta_tsc
to 0, leading to an immediately vmexit when delta_tsc is not bigger than
advance ns.

This patch can reduce ~63% latency (~4450 cycles to ~1660 cycles on
a haswell desktop) for kvm-unit-tests/tscdeadline_latency when testing
busy waits.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-04 17:51:59 +02:00
Linus Torvalds f459c34538 for-4.18/block-20180603
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJbFIrHAAoJEPfTWPspceCm2+kQAKo7o7HL30aRxJYu+gYafkuW
 PV47zr3e4vhMDEzDaMsh1+V7I7bm3uS+NZu6cFbcV+N9KXFpeb4V4Hvvm5cs+OC3
 WCOBi4eC1h4qnDQ3ZyySrCMN+KHYJ16pZqddEjqw+fhVudx8i+F+jz3Y4ZMDDc3q
 pArKZvjKh2wEuYXUMFTjaXY46IgPt+er94OwvrhyHk+4AcA+Q/oqSfSdDahUC8jb
 BVR3FV4I3NOHUaru0RbrUko13sVZSboWPCIFrlTDz8xXcJOnVHzdVS1WLFDXLHnB
 O8q9cADCfa4K08kz68RxykcJiNxNvz5ChDaG0KloCFO+q1tzYRoXLsfaxyuUDg57
 Zd93OFZC6hAzXdhclDFIuPET9OQIjDzwphodfKKmDsm3wtyOtydpA0o7JUEongp0
 O1gQsEfYOXmQsXlo8Ot+Z7Ne/HvtGZ91JahUa/59edxQbcKaMrktoyQsQ/d1nOEL
 4kXID18wPcFHWRQHYXyVuw6kbpRtQnh/U2m1eenSZ7tVQHwoe6mF3cfSf5MMseak
 k8nAnmsfEvOL4Ar9ftg61GOrImaQlidxOC2A8fmY5r0Sq/ZldvIFIZizsdTTCcni
 8SOTxcQowyqPf5NvMNQ8cKqqCJap3ppj4m7anZNhbypDIF2TmOWsEcXcMDn4y9on
 fax14DPLo59gBRiPCn5f
 =nga/
 -----END PGP SIGNATURE-----

Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - clean up how we pass around gfp_t and
   blk_mq_req_flags_t (Christoph)

 - prepare us to defer scheduler attach (Christoph)

 - clean up drivers handling of bounce buffers (Christoph)

 - fix timeout handling corner cases (Christoph/Bart/Keith)

 - bcache fixes (Coly)

 - prep work for bcachefs and some block layer optimizations (Kent).

 - convert users of bio_sets to using embedded structs (Kent).

 - fixes for the BFQ io scheduler (Paolo/Davide/Filippo)

 - lightnvm fixes and improvements (Matias, with contributions from Hans
   and Javier)

 - adding discard throttling to blk-wbt (me)

 - sbitmap blk-mq-tag handling (me/Omar/Ming).

 - remove the sparc jsflash block driver, acked by DaveM.

 - Kyber scheduler improvement from Jianchao, making it more friendly
   wrt merging.

 - conversion of symbolic proc permissions to octal, from Joe Perches.
   Previously the block parts were a mix of both.

 - nbd fixes (Josef and Kevin Vigor)

 - unify how we handle the various kinds of timestamps that the block
   core and utility code uses (Omar)

 - three NVMe pull requests from Keith and Christoph, bringing AEN to
   feature completeness, file backed namespaces, cq/sq lock split, and
   various fixes

 - various little fixes and improvements all over the map

* tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits)
  blk-mq: update nr_requests when switching to 'none' scheduler
  block: don't use blocking queue entered for recursive bio submits
  dm-crypt: fix warning in shutdown path
  lightnvm: pblk: take bitmap alloc. out of critical section
  lightnvm: pblk: kick writer on new flush points
  lightnvm: pblk: only try to recover lines with written smeta
  lightnvm: pblk: remove unnecessary bio_get/put
  lightnvm: pblk: add possibility to set write buffer size manually
  lightnvm: fix partial read error path
  lightnvm: proper error handling for pblk_bio_add_pages
  lightnvm: pblk: fix smeta write error path
  lightnvm: pblk: garbage collect lines with failed writes
  lightnvm: pblk: rework write error recovery path
  lightnvm: pblk: remove dead function
  lightnvm: pass flag on graceful teardown to targets
  lightnvm: pblk: check for chunk size before allocating it
  lightnvm: pblk: remove unnecessary argument
  lightnvm: pblk: remove unnecessary indirection
  lightnvm: pblk: return NVM_ error on failed submission
  lightnvm: pblk: warn in case of corrupted write buffer
  ...
2018-06-04 07:58:06 -07:00
Haren Myneni 7574364906 powerpc/powernv: copy/paste - Mask SO bit in CR
NX can set the 3rd bit in CR register for XER[SO] (Summary overflow)
which is not related to paste request. The current paste function
returns failure for a successful request when this bit is set. So mask
this bit and check the proper return status.

Fixes: 2392c8c8c0 ("powerpc/powernv/vas: Define copy/paste interfaces")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 22:58:41 +10:00
Boris Ostrovsky 7f47e1c52d xen/PVH: Make GDT selectors PVH-specific
We don't need to share PVH GDT layout with other GDTs, especially
since we now have a PVH-speciific entry (for stack canary segment).

Define PVH's own selectors.

(As a side effect of this change we are also fixing improper
reference to __KERNEL_CS)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2018-06-04 10:06:08 +02:00
Boris Ostrovsky 9801406832 xen/PVH: Set up GS segment for stack canary
We are making calls to C code (e.g. xen_prepare_pvh()) which may use
stack canary (stored in GS segment).

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2018-06-04 10:05:37 +02:00
Mark Greer 04debf21fa powerpc: Remove core support for Marvell mv64x60 hostbridges
There are no longer any platforms that use Marvell's mv64x60
hostbridges so remove the supporting kernel code.

CC: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:23 +10:00
Mark Greer 297f831401 powerpc/boot: Remove core support for Marvell mv64x60 hostbridges
There are no longer any platforms that use Marvell's mv64x60
hostbridges so remove the supporting boot code.

Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:23 +10:00
Mark Greer 6bf17a83e6 powerpc/boot: Remove support for Marvell mv64x60 i2c controller
There are no longer any platforms that use Marvell's mv64x60's i2c
controller so remove its driver.

Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Acked-by: Dale Farnsworth <dale@farnsworth.org>
Signed-off-by: Mark Greer &lt;<a href="mailto:mgreer@animalcreek.com">mgreer@animalcreek.com</a>&gt;<br>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:22 +10:00
Mark Greer 30f4bbe047 powerpc/boot: Remove support for Marvell MPSC serial controller
There are no longer any platforms that use Marvell's MPSC serial
controller so remove its driver.

Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:22 +10:00
Mark Greer 92c8c16f34 powerpc/embedded6xx: Remove C2K board support
The C2K platform appears to be orphaned so remove code supporting it.

CC: Remi Machet <rmachet@nvidia.com>
Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Acked-by: Remi Machet <remi@machet.us>
Signed-off-by: Mark Greer <mgreer@animalcreek.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:22 +10:00
Christophe Leroy 2676b89eb8 powerpc/lib: optimise PPC32 memcmp
At the time being, memcmp() compares two chunks of memory
byte per byte.

This patch optimises the comparison by comparing word by word.

On the same way as commit 15c2d45d17 ("powerpc: Add 64bit
optimised memcmp"), this patch moves memcmp() into a dedicated
file named memcmp_32.S

A small benchmark performed on an 8xx comparing two chuncks
of 512 bytes performed 100000 times gives:

Before : 5852274 TB ticks
After:   1488638 TB ticks

This is almost 4 times faster

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:21 +10:00
Christophe Leroy f36bbf21e8 powerpc/lib: optimise 32 bits __clear_user()
Rewrite clear_user() on the same principle as memset(0), making use
of dcbz to clear complete cache lines.

This code is a copy/paste of memset(), with some modifications
in order to retrieve remaining number of bytes to be cleared,
as it needs to be returned in case of error.

On the same way as done on PPC64 in commit 17968fbbd1
("powerpc: 64bit optimised __clear_user"), the patch moves
__clear_user() into a dedicated file string_32.S

On a MPC885, throughput is almost doubled:

Before:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 18.990779 seconds, 52.7MB/s

After:
~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 9.611468 seconds, 104.0MB/s

On a MPC8321, throughput is multiplied by 2.12:

Before:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 6.844352 seconds, 146.1MB/s

After:
root@vgoippro:~# dd if=/dev/zero of=/dev/null bs=1M count=1000
1048576000 bytes (1000.0MB) copied, 3.218854 seconds, 310.7MB/s

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:21 +10:00
Christophe Leroy 60f1d2893e powerpc/time: inline arch_vtime_task_switch()
arch_vtime_task_switch() is a small function which is called
only from vtime_common_task_switch(), so it is worth inlining

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:20 +10:00
Christophe Leroy 1c38976334 powerpc/Makefile: set -mcpu=860 flag for the 8xx
When compiled with GCC 8.1, vmlinux is significantly bigger than
with GCC 4.8.

When looking at the generated code with objdump, we notice that
all functions and loops when a 16 bytes alignment. This significantly
increases the size of the kernel. It is pointless and even
counterproductive as on the 8xx 'nop' also consumes one clock cycle.

Size of vmlinux with GCC 4.8:
   text	   data	    bss	    dec	    hex	filename
5801948	1626076	 457796	7885820	 7853fc	vmlinux

Size of vmlinux with GCC 8.1:
   text	   data	    bss	    dec	    hex	filename
6764592	1630652	 456476	8851720	 871108	vmlinux

Size of vmlinux with GCC 8.1 and this patch:
   text	   data	    bss	    dec	    hex	filename
6331544	1631756	 456476	8419776	 8079c0	vmlinux

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:20 +10:00
Christophe Leroy e9c4943a10 powerpc: Implement csum_ipv6_magic in assembly
The generic csum_ipv6_magic() generates a pretty bad result

00000000 <csum_ipv6_magic>: (PPC32)
   0:	81 23 00 00 	lwz     r9,0(r3)
   4:	81 03 00 04 	lwz     r8,4(r3)
   8:	7c e7 4a 14 	add     r7,r7,r9
   c:	7d 29 38 10 	subfc   r9,r9,r7
  10:	7d 4a 51 10 	subfe   r10,r10,r10
  14:	7d 27 42 14 	add     r9,r7,r8
  18:	7d 2a 48 50 	subf    r9,r10,r9
  1c:	80 e3 00 08 	lwz     r7,8(r3)
  20:	7d 08 48 10 	subfc   r8,r8,r9
  24:	7d 4a 51 10 	subfe   r10,r10,r10
  28:	7d 29 3a 14 	add     r9,r9,r7
  2c:	81 03 00 0c 	lwz     r8,12(r3)
  30:	7d 2a 48 50 	subf    r9,r10,r9
  34:	7c e7 48 10 	subfc   r7,r7,r9
  38:	7d 4a 51 10 	subfe   r10,r10,r10
  3c:	7d 29 42 14 	add     r9,r9,r8
  40:	7d 2a 48 50 	subf    r9,r10,r9
  44:	80 e4 00 00 	lwz     r7,0(r4)
  48:	7d 08 48 10 	subfc   r8,r8,r9
  4c:	7d 4a 51 10 	subfe   r10,r10,r10
  50:	7d 29 3a 14 	add     r9,r9,r7
  54:	7d 2a 48 50 	subf    r9,r10,r9
  58:	81 04 00 04 	lwz     r8,4(r4)
  5c:	7c e7 48 10 	subfc   r7,r7,r9
  60:	7d 4a 51 10 	subfe   r10,r10,r10
  64:	7d 29 42 14 	add     r9,r9,r8
  68:	7d 2a 48 50 	subf    r9,r10,r9
  6c:	80 e4 00 08 	lwz     r7,8(r4)
  70:	7d 08 48 10 	subfc   r8,r8,r9
  74:	7d 4a 51 10 	subfe   r10,r10,r10
  78:	7d 29 3a 14 	add     r9,r9,r7
  7c:	7d 2a 48 50 	subf    r9,r10,r9
  80:	81 04 00 0c 	lwz     r8,12(r4)
  84:	7c e7 48 10 	subfc   r7,r7,r9
  88:	7d 4a 51 10 	subfe   r10,r10,r10
  8c:	7d 29 42 14 	add     r9,r9,r8
  90:	7d 2a 48 50 	subf    r9,r10,r9
  94:	7d 08 48 10 	subfc   r8,r8,r9
  98:	7d 4a 51 10 	subfe   r10,r10,r10
  9c:	7d 29 2a 14 	add     r9,r9,r5
  a0:	7d 2a 48 50 	subf    r9,r10,r9
  a4:	7c a5 48 10 	subfc   r5,r5,r9
  a8:	7c 63 19 10 	subfe   r3,r3,r3
  ac:	7d 29 32 14 	add     r9,r9,r6
  b0:	7d 23 48 50 	subf    r9,r3,r9
  b4:	7c c6 48 10 	subfc   r6,r6,r9
  b8:	7c 63 19 10 	subfe   r3,r3,r3
  bc:	7c 63 48 50 	subf    r3,r3,r9
  c0:	54 6a 80 3e 	rotlwi  r10,r3,16
  c4:	7c 63 52 14 	add     r3,r3,r10
  c8:	7c 63 18 f8 	not     r3,r3
  cc:	54 63 84 3e 	rlwinm  r3,r3,16,16,31
  d0:	4e 80 00 20 	blr

0000000000000000 <.csum_ipv6_magic>: (PPC64)
   0:	81 23 00 00 	lwz     r9,0(r3)
   4:	80 03 00 04 	lwz     r0,4(r3)
   8:	81 63 00 08 	lwz     r11,8(r3)
   c:	7c e7 4a 14 	add     r7,r7,r9
  10:	7f 89 38 40 	cmplw   cr7,r9,r7
  14:	7d 47 02 14 	add     r10,r7,r0
  18:	7d 30 10 26 	mfocrf  r9,1
  1c:	55 29 f7 fe 	rlwinm  r9,r9,30,31,31
  20:	7d 4a 4a 14 	add     r10,r10,r9
  24:	7f 80 50 40 	cmplw   cr7,r0,r10
  28:	7d 2a 5a 14 	add     r9,r10,r11
  2c:	80 03 00 0c 	lwz     r0,12(r3)
  30:	81 44 00 00 	lwz     r10,0(r4)
  34:	7d 10 10 26 	mfocrf  r8,1
  38:	55 08 f7 fe 	rlwinm  r8,r8,30,31,31
  3c:	7d 29 42 14 	add     r9,r9,r8
  40:	81 04 00 04 	lwz     r8,4(r4)
  44:	7f 8b 48 40 	cmplw   cr7,r11,r9
  48:	7d 29 02 14 	add     r9,r9,r0
  4c:	7d 70 10 26 	mfocrf  r11,1
  50:	55 6b f7 fe 	rlwinm  r11,r11,30,31,31
  54:	7d 29 5a 14 	add     r9,r9,r11
  58:	7f 80 48 40 	cmplw   cr7,r0,r9
  5c:	7d 29 52 14 	add     r9,r9,r10
  60:	7c 10 10 26 	mfocrf  r0,1
  64:	54 00 f7 fe 	rlwinm  r0,r0,30,31,31
  68:	7d 69 02 14 	add     r11,r9,r0
  6c:	7f 8a 58 40 	cmplw   cr7,r10,r11
  70:	7c 0b 42 14 	add     r0,r11,r8
  74:	81 44 00 08 	lwz     r10,8(r4)
  78:	7c f0 10 26 	mfocrf  r7,1
  7c:	54 e7 f7 fe 	rlwinm  r7,r7,30,31,31
  80:	7c 00 3a 14 	add     r0,r0,r7
  84:	7f 88 00 40 	cmplw   cr7,r8,r0
  88:	7d 20 52 14 	add     r9,r0,r10
  8c:	80 04 00 0c 	lwz     r0,12(r4)
  90:	7d 70 10 26 	mfocrf  r11,1
  94:	55 6b f7 fe 	rlwinm  r11,r11,30,31,31
  98:	7d 29 5a 14 	add     r9,r9,r11
  9c:	7f 8a 48 40 	cmplw   cr7,r10,r9
  a0:	7d 29 02 14 	add     r9,r9,r0
  a4:	7d 70 10 26 	mfocrf  r11,1
  a8:	55 6b f7 fe 	rlwinm  r11,r11,30,31,31
  ac:	7d 29 5a 14 	add     r9,r9,r11
  b0:	7f 80 48 40 	cmplw   cr7,r0,r9
  b4:	7d 29 2a 14 	add     r9,r9,r5
  b8:	7c 10 10 26 	mfocrf  r0,1
  bc:	54 00 f7 fe 	rlwinm  r0,r0,30,31,31
  c0:	7d 29 02 14 	add     r9,r9,r0
  c4:	7f 85 48 40 	cmplw   cr7,r5,r9
  c8:	7c 09 32 14 	add     r0,r9,r6
  cc:	7d 50 10 26 	mfocrf  r10,1
  d0:	55 4a f7 fe 	rlwinm  r10,r10,30,31,31
  d4:	7c 00 52 14 	add     r0,r0,r10
  d8:	7f 80 30 40 	cmplw   cr7,r0,r6
  dc:	7d 30 10 26 	mfocrf  r9,1
  e0:	55 29 ef fe 	rlwinm  r9,r9,29,31,31
  e4:	7c 09 02 14 	add     r0,r9,r0
  e8:	54 03 80 3e 	rotlwi  r3,r0,16
  ec:	7c 03 02 14 	add     r0,r3,r0
  f0:	7c 03 00 f8 	not     r3,r0
  f4:	78 63 84 22 	rldicl  r3,r3,48,48
  f8:	4e 80 00 20 	blr

This patch implements it in assembly for both PPC32 and PPC64

Link: https://github.com/linuxppc/linux/issues/9
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:19 +10:00
Christophe Leroy 373e098e1e powerpc/32: Optimise __csum_partial()
Improve __csum_partial by interleaving loads and adds.

On a 8xx, it brings neither improvement nor degradation.
On a 83xx, it brings a 25% improvement.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:19 +10:00
Christophe Leroy 1128bb7813 powerpc/lib: Adjust .balign inside string functions for PPC32
commit 87a156fb18 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless
nops

A simple benchmark on an 8xx calling 100000x a memchr() that
matches the first byte runs in 41668 TB ticks before this patch
and in 35986 TB ticks after this patch. So this gives an
improvement of approx 10%

Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch, so regardless on the number of loops, removing
those useless nops improves the test by 5683 TB ticks.

Fixes: 87a156fb18 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:19 +10:00
Christophe Leroy 56b04d568f powerpc/signal32: Use fault_in_pages_readable() to prefault user context
Use fault_in_pages_readable() to prefault user context
instead of open coding

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Reviewed-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:18 +10:00
Christophe Leroy d04f11d271 powerpc/8xx: Remove RTC clock on 88x
The 885 familly processors don't have the Real Time Clock

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:18 +10:00
Christophe Leroy 169f438a7e powerpc/boot: remove unused variable in mpc8xx
Variable div is set but never used. Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:17 +10:00
Christophe Leroy 0cc377d16e powerpc/misc: merge reloc_offset() and add_reloc_offset()
reloc_offset() is the same as add_reloc_offset(0)

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:17 +10:00
Christophe Leroy 55a0edf083 powerpc/64: optimises from64to32()
The current implementation of from64to32() gives a poor result:

0000000000000270 <.from64to32>:
 270:	38 00 ff ff 	li      r0,-1
 274:	78 69 00 22 	rldicl  r9,r3,32,32
 278:	78 00 00 20 	clrldi  r0,r0,32
 27c:	7c 60 00 38 	and     r0,r3,r0
 280:	7c 09 02 14 	add     r0,r9,r0
 284:	78 09 00 22 	rldicl  r9,r0,32,32
 288:	7c 00 4a 14 	add     r0,r0,r9
 28c:	78 03 00 20 	clrldi  r3,r0,32
 290:	4e 80 00 20 	blr

This patch modifies from64to32() to operate in the same
spirit as csum_fold()

It swaps the two 32-bit halves of sum then it adds it with the
unswapped sum. If there is a carry from adding the two 32-bit halves,
it will carry from the lower half into the upper half, giving us the
correct sum in the upper half.

The resulting code is:

0000000000000260 <.from64to32>:
 260:	78 60 00 02 	rotldi  r0,r3,32
 264:	7c 60 1a 14 	add     r3,r0,r3
 268:	78 63 00 22 	rldicl  r3,r3,32,32
 26c:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:16 +10:00
Christophe Leroy c865c95587 powerpc/mm: Remove stale_map[] handling on non SMP processors
stale_map[] bits are only set in steal_context_smp() so
on UP processors this map is useless. Only manage it for SMP
processors.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:16 +10:00
Christophe Leroy 9e4f02e24e powerpc/mm: constify LAST_CONTEXT in mmu_context_nohash
last_context is 16 on the 8xx, 65535 on the 47x and 255 on other ones.

The kernel is exclusively built for the 8xx, for the 47x or for
another processor so the last context can be defined as a constant
depending on the processor.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Reformat old comment]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:16 +10:00
Christophe Leroy 48bdcace59 powerpc/mm: Avoid unnecessary test and reduce code size
no_selective_tlbil hence the use of either steal_all_contexts()
or steal_context_up() depends on the subarch, it won't change
during run. Only the 8xx uses steal_all_contexts and CONFIG_PPC_8xx
is exclusive of other processors.

This patch replaces the test of no_selective_tlbil global var by
a test of CONFIG_PPC_8xx selection. It avoids the test and
removes unnecessary code.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:15 +10:00
Christophe Leroy 8820a44738 powerpc/mm: constify FIRST_CONTEXT in mmu_context_nohash
First context is now 1 for all supported platforms, so it
can be made a constant.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:15 +10:00
Christophe Leroy 9887334b80 powerpc/dma: remove unnecessary BUG()
Direction is already checked in all calling functions in
include/linux/dma-mapping.h and also in called function __dma_sync()

So really no need to check it once more here.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:14 +10:00
Ravi Bangoria 5a61640e32 powerpc/sstep: Fix emulate_step test if VSX not present
emulate_step() tests are failing if VSX is not supported or disabled.

  emulate_step_test: lxvd2x         : FAIL
  emulate_step_test: stxvd2x        : FAIL

If !CPU_FTR_VSX, emulate_step() failure is expected and testcase should
PASS with a valid justification. After patch:

  emulate_step_test: lxvd2x         : PASS (!CPU_FTR_VSX)
  emulate_step_test: stxvd2x        : PASS (!CPU_FTR_VSX)

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:14 +10:00
Ravi Bangoria 83afab4ce0 powerpc/sstep: Fix kernel crash if VSX is not present
emulate_step() is not checking runtime VSX feature flag before
emulating an instruction. This is causing kernel crash when kernel
is compiled with CONFIG_VSX=y but running on a machine where VSX
is not supported or disabled. Ex, while running emulate_step tests
on P6 machine:

  Oops: Exception in kernel mode, sig: 4 [#1]
  NIP [c000000000095c24] .load_vsrn+0x28/0x54
  LR [c000000000094bdc] .emulate_loadstore+0x167c/0x17b0
  Call Trace:
    0x40fe240c7ae147ae (unreliable)
    .emulate_loadstore+0x167c/0x17b0
    .emulate_step+0x25c/0x5bc
    .test_lxvd2x_stxvd2x+0x64/0x154
    .test_emulate_step+0x38/0x4c
    .do_one_initcall+0x5c/0x2c0
    .kernel_init_freeable+0x314/0x4cc
    .kernel_init+0x24/0x160
    .ret_from_kernel_thread+0x58/0xb4

With fix:
  emulate_step_test: lxvd2x         : FAIL
  emulate_step_test: stxvd2x        : FAIL

Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-04 00:39:08 +10:00
David S. Miller 9c54aeb03a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Filling in the padding slot in the bpf structure as a bug fix in 'ne'
overlapped with actually using that padding area for something in
'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-03 09:31:58 -04:00
Ravi Bangoria e6684d07e4 powerpc/sstep: Introduce GETTYPE macro
Replace 'op->type & INSTR_TYPE_MASK' expression with GETTYPE(op->type)
macro.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 21:19:40 +10:00
Michal Suchanek a377514519 powerpc/64s: Enhance the information in cpu_show_spectre_v1()
We now have barrier_nospec as mitigation so print it in
cpu_show_spectre_v1() when enabled.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:46 +10:00
Michael Ellerman 51973a815c powerpc/64: Use barrier_nospec in syscall entry
Our syscall entry is done in assembly so patch in an explicit
barrier_nospec.

Based on a patch by Michal Suchanek.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:45 +10:00
Michael Ellerman ddf35cf376 powerpc: Use barrier_nospec in copy_from_user()
Based on the x86 commit doing the same.

See commit 304ec1b050 ("x86/uaccess: Use __uaccess_begin_nospec()
and uaccess_try_nospec") and b3bbfb3fb5 ("x86: Introduce
__uaccess_begin_nospec() and uaccess_try_nospec") for more detail.

In all cases we are ordering the load from the potentially
user-controlled pointer vs a previous branch based on an access_ok()
check or similar.

Base on a patch from Michal Suchanek.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:45 +10:00
Michal Suchanek cb3d6759a9 powerpc/64s: Enable barrier_nospec based on firmware settings
Check what firmware told us and enable/disable the barrier_nospec as
appropriate.

We err on the side of enabling the barrier, as it's no-op on older
systems, see the comment for more detail.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:45 +10:00
Michal Suchanek 815069ca57 powerpc/64s: Patch barrier_nospec in modules
Note that unlike RFI which is patched only in kernel the nospec state
reflects settings at the time the module was loaded.

Iterating all modules and re-patching every time the settings change
is not implemented.

Based on lwsync patching.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:44 +10:00
Michal Suchanek 2eea7f067f powerpc/64s: Add support for ori barrier_nospec patching
Based on the RFI patching. This is required to be able to disable the
speculation barrier.

Only one barrier type is supported and it does nothing when the
firmware does not enable it. Also re-patching modules is not supported
So the only meaningful thing that can be done is patching out the
speculation barrier at boot when the user says it is not wanted.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:44 +10:00
Michal Suchanek a6b3964ad7 powerpc/64s: Add barrier_nospec
A no-op form of ori (or immediate of 0 into r31 and the result stored
in r31) has been re-tasked as a speculation barrier. The instruction
only acts as a barrier on newer machines with appropriate firmware
support. On older CPUs it remains a harmless no-op.

Implement barrier_nospec using this instruction.

mpe: The semantics of the instruction are believed to be that it
prevents execution of subsequent instructions until preceding branches
have been fully resolved and are no longer executing speculatively.
There is no further documentation available at this time.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:44 +10:00
Michael Ellerman 7af76c5f23 powerpc/stacktrace: Update copyright
This now has new code in it written by Nick and I, and switch to a
SPDX tag.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-06-03 20:43:43 +10:00
Michael Ellerman 5cc05910f2 powerpc/64s: Wire up arch_trigger_cpumask_backtrace()
This allows eg. the RCU stall detector, or the soft/hardlockup
detectors to trigger a backtrace on all CPUs.

We implement this by sending a "safe" NMI, which will actually only
send an IPI. Unfortunately the generic code prints "NMI", so that's a
little confusing but we can probably live with it.

If one of the CPUs doesn't respond to the IPI, we then print some info
from it's paca and do a backtrace based on its saved_r1.

Example output:

  INFO: rcu_sched detected stalls on CPUs/tasks:
  	2-...0: (0 ticks this GP) idle=1be/1/4611686018427387904 softirq=1055/1055 fqs=25735
  	(detected by 4, t=58847 jiffies, g=58, c=57, q=1258)
  Sending NMI from CPU 4 to CPUs 2:
  CPU 2 didn't respond to backtrace IPI, inspecting paca.
  irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 3623 (bash)
  Back trace of paca->saved_r1 (0xc0000000e1c83ba0) (possibly stale):
  Call Trace:
  [c0000000e1c83ba0] [0000000000000014] 0x14 (unreliable)
  [c0000000e1c83bc0] [c000000000765798] lkdtm_do_action+0x48/0x80
  [c0000000e1c83bf0] [c000000000765a40] direct_entry+0x110/0x1b0
  [c0000000e1c83c90] [c00000000058e650] full_proxy_write+0x90/0xe0
  [c0000000e1c83ce0] [c0000000003aae3c] __vfs_write+0x6c/0x1f0
  [c0000000e1c83d80] [c0000000003ab214] vfs_write+0xd4/0x240
  [c0000000e1c83dd0] [c0000000003ab5cc] ksys_write+0x6c/0x110
  [c0000000e1c83e30] [c00000000000b860] system_call+0x58/0x6c

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-06-03 20:43:43 +10:00
Michael Ellerman 6ba55716a2 powerpc/nmi: Add an API for sending "safe" NMIs
Currently the options we have for sending NMIs are not necessarily
safe, that is they can potentially interrupt a CPU in a
non-recoverable region of code, meaning the kernel must then panic().

But we'd like to use smp_send_nmi_ipi() to do cross-CPU calls in
situations where we don't want to risk a panic(), because it doesn't
have the requirement that interrupts must be enabled like
smp_call_function().

So add an API for the caller to indicate that it wants to use the NMI
infrastructure, but doesn't want to do anything "unsafe".

Currently that is implemented by not actually calling cause_nmi_ipi(),
instead falling back to an IPI. In future we can pass the safe
parameter down to cause_nmi_ipi() and the individual backends can
potentially take it into account before deciding what to do.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-06-03 20:43:43 +10:00
Michael Ellerman 7b08729cb2 powerpc/64: Save stack pointer when we hard disable interrupts
A CPU that gets stuck with interrupts hard disable can be difficult to
debug, as on some platforms we have no way to interrupt the CPU to
find out what it's doing.

A stop-gap is to have the CPU save it's stack pointer (r1) in its paca
when it hard disables interrupts. That way if we can't interrupt it,
we can at least trace the stack based on where it last disabled
interrupts.

In some cases that will be total junk, but the stack trace code should
handle that. In the simple case of a CPU that disable interrupts and
then gets stuck in a loop, the stack trace should be informative.

We could clear the saved stack pointer when we enable interrupts, but
that loses information which could be useful if we have nothing else
to go on.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
2018-06-03 20:43:42 +10:00
Michael Ellerman 3e3786801b powerpc: Check address limit on user-mode return (TIF_FSCHECK)
set_fs() sets the addr_limit, which is used in access_ok() to
determine if an address is a user or kernel address.

Some code paths use set_fs() to temporarily elevate the addr_limit so
that kernel code can read/write kernel memory as if it were user
memory. That is fine as long as the code can't ever return to
userspace with the addr_limit still elevated.

If that did happen, then userspace can read/write kernel memory as if
it were user memory, eg. just with write(2). In case it's not clear,
that is very bad. It has also happened in the past due to bugs.

Commit 5ea0727b16 ("x86/syscalls: Check address limit on user-mode
return") added a mechanism to check the addr_limit value before
returning to userspace. Any call to set_fs() sets a thread flag,
TIF_FSCHECK, and if we see that on the return to userspace we go out
of line to check that the addr_limit value is not elevated.

For further info see the above commit, as well as:
  https://lwn.net/Articles/722267/
  https://bugs.chromium.org/p/project-zero/issues/detail?id=990

Verified to work on 64-bit Book3S using a POC that objdumps the system
call handler, and a modified lkdtm_CORRUPT_USER_DS() that doesn't kill
the caller.

Before:
  $ sudo ./test-tif-fscheck
  ...
  0000000000000000 <.data>:
         0:       e1 f7 8a 79     rldicl. r10,r12,30,63
         4:       80 03 82 40     bne     0x384
         8:       00 40 8a 71     andi.   r10,r12,16384
         c:       78 0b 2a 7c     mr      r10,r1
        10:       10 fd 21 38     addi    r1,r1,-752
        14:       08 00 c2 41     beq-    0x1c
        18:       58 09 2d e8     ld      r1,2392(r13)
        1c:       00 00 41 f9     std     r10,0(r1)
        20:       70 01 61 f9     std     r11,368(r1)
        24:       78 01 81 f9     std     r12,376(r1)
        28:       70 00 01 f8     std     r0,112(r1)
        2c:       78 00 41 f9     std     r10,120(r1)
        30:       20 00 82 41     beq     0x50
        34:       a6 42 4c 7d     mftb    r10

After:

  $ sudo ./test-tif-fscheck
  Killed

And in dmesg:
  Invalid address limit on user-mode return
  WARNING: CPU: 1 PID: 3689 at ../include/linux/syscalls.h:260 do_notify_resume+0x140/0x170
  ...
  NIP [c00000000001ee50] do_notify_resume+0x140/0x170
  LR [c00000000001ee4c] do_notify_resume+0x13c/0x170
  Call Trace:
    do_notify_resume+0x13c/0x170 (unreliable)
    ret_from_except_lite+0x70/0x74

Performance overhead is essentially zero in the usual case, because
the bit is checked as part of the existing _TIF_USER_WORK_MASK check.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:42 +10:00
Michael Ellerman ba0635fcbe powerpc: Rename thread_struct.fs to addr_limit
It's called 'fs' for historical reasons, it's named after the x86 'FS'
register. But we don't have to use that name for the member of
thread_struct, and in fact arch/x86 doesn't even call it 'fs' anymore.

So rename it to 'addr_limit', which better reflects what it's used
for, and is also the name used on other arches.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:42 +10:00
Al Viro 6bcdd2972b powerpc/ptrace: Use copy_{from, to}_user() rather than open-coding
In PPC_PTRACE_GETHWDBGINFO and PPC_PTRACE_SETHWDEBUG we do an
access_ok() check and then __copy_{from,to}_user().

Instead we should just use copy_{from,to}_user() which does all that
for us and is less error prone.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:41 +10:00
Sam Bobroff 20b3449714 powerpc/eeh: Refactor report functions
The EEH report functions now share a fair bit of code around the start
and end of each function.

So factor out as much as possible, and move the traversal into a
custom function. This also allows accurate debug to be generated more
easily.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
[mpe: Format with clang-format]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:41 +10:00
Sam Bobroff 665012c573 powerpc/eeh: Cleaner handling of EEH_DEV_NO_HANDLER
If a device without a driver is recovered via EEH, the flag
EEH_DEV_NO_HANDLER is incorrectly left set on the device after
recovery, because the test in eeh_report_resume() for the existence of
a bound driver is done before the flag is cleared. If a driver is
later bound, and EEH experienced again, some of the drivers EEH
handers are not called.

To correct this, clear the flag unconditionally after EEH processing
is complete.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:41 +10:00
Sam Bobroff 010acfa1a7 powerpc/eeh: Introduce eeh_set_irq_state()
To ease future refactoring, extract calls to eeh_enable_irq() and
eeh_disable_irq() from the various report functions. This makes
the report functions initial sequences more similar, as well as making
the IRQ changes visible when reading eeh_handle_normal_event().

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:40 +10:00
Sam Bobroff 47cc8c1cc2 powerpc/eeh: Introduce eeh_set_channel_state()
To ease future refactoring, extract setting of the channel state
from the report functions out into their own functions. This increases
the amount of code that is identical across all of the report
functions.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:40 +10:00
Sam Bobroff e2b810d51b powerpc/eeh: Introduce eeh_edev_actionable()
The same test is done in every EEH report function, so factor it out.

Since eeh_dev_removed() needs to be moved higher up in the file,
simplify it a little while we're at it.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:40 +10:00
Sam Bobroff 309ed3a715 powerpc/eeh: Introduce eeh_for_each_pe()
Add a for_each-style macro for iterating through PEs without the
boilerplate required by a traversal function. eeh_pe_next() is now
exported, as it is now used directly in place.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:39 +10:00
Sam Bobroff 30424e386a powerpc/eeh: Clean up pci_ers_result handling
As EEH event handling progresses, a cumulative result of type
pci_ers_result is built up by (some of) the eeh_report_*() functions
using either:
	if (rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
	if (*res == PCI_ERS_RESULT_NONE) *res = rc;
or:
	if ((*res == PCI_ERS_RESULT_NONE) ||
	    (*res == PCI_ERS_RESULT_RECOVERED)) *res = rc;
	if (*res == PCI_ERS_RESULT_DISCONNECT &&
	    rc == PCI_ERS_RESULT_NEED_RESET) *res = rc;
(Where *res is the accumulator.)

However, the intent is not immediately clear and the result in some
situations is order dependent.

Address this by assigning a priority to each result value, and always
merging to the highest priority. This renders the intent clear, and
provides a stable value for all orderings.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
[mpe: Minor formatting (clang-format)]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:39 +10:00
Sam Bobroff 2eae39f29b powerpc/eeh: Add message when PE processing at parent
To aid debugging, add a message to show when EEH processing for a PE
will be done at the device's parent, rather than directly at the
device.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:39 +10:00
Sam Bobroff d6c4932fbf powerpc/eeh: Strengthen types of eeh traversal functions
The traversal functions eeh_pe_traverse() and eeh_pe_dev_traverse()
both provide their first argument as void * but every single user casts
it to the expected type.

Change the type of the first parameter from void * to the appropriate
type, and clean up all uses.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:38 +10:00
Sam Bobroff a0bd54641b powerpc/eeh: Remove unused eeh_pcid_name()
Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:38 +10:00
Sam Bobroff 46d4be41b9 powerpc/eeh: Fix use-after-release of EEH driver
Correct two cases where eeh_pcid_get() is used to reference the driver's
module but the reference is dropped before the driver pointer is used.

In eeh_rmv_device() also refactor a little so that only two calls to
eeh_pcid_put() are needed, rather than three and the reference isn't
taken at all if it wasn't needed.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:38 +10:00
Sam Bobroff 796b9f5b31 powerpc/eeh: Add final message for successful recovery
Add a single log line at the end of successful EEH recovery, so that
it's clear that event processing has finished.

Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:37 +10:00
Anju T Sudhakar 25af86b2ae powerpc/perf: Unregister thread-imc if core-imc not supported
Since thread-imc internally use the core-imc hardware infrastructure
and is depended on it, having thread-imc in the kernel in the
absence of core-imc is trivial. Patch disables thread-imc, if
core-imc is not registered.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:37 +10:00
Anju T Sudhakar e7a8ac4338 powerpc/perf: Return appropriate value for unknown domain
Return proper error code for unknown domain during IMC initialization.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:37 +10:00
Anju T Sudhakar b41bb28b9e powerpc/perf: Replace the direct return with goto statement
Replace the direct return statement in imc_mem_init() with goto, to adhere
to the kernel coding style.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:36 +10:00
Anju T Sudhakar cb094fa5af powerpc/perf: Rearrange memory freeing in imc init
When any of the IMC (In-Memory Collection counter) devices fail
to initialize, imc_common_mem_free() frees set of memory. In doing so,
pmu_ptr pointer is also freed. But pmu_ptr pointer is used in subsequent
function (imc_common_cpuhp_mem_free()) which is wrong. Patch here reorders
the code to avoid such access.

Also free the memory which is dynamically allocated during imc
initialization, wherever required.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:36 +10:00
YueHaibing 589b1f7e4b powerpc/xics: Add missing of_node_put() in error path
The device node obtained with of_find_compatible_node() should be
released by calling of_node_put().  But it was not released when
of_get_property() failed.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
[mpe: Invert the sense of the if so we only need one return path]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:35 +10:00
Fabio Estevam c5cbde2df3 powerpc: cpm_gpio: Remove owner assignment from platform_driver
Structure platform_driver does not need to set the owner field, as this
will be populated by the driver core.

Generated by scripts/coccinelle/api/platform_no_drv_owner.cocci.

Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:35 +10:00
Russell Currey 8a792262f3 powerpc/xive: Remove (almost) unused macros
The GETFIELD and SETFIELD macros in xive-regs.h aren't used except for
a single instance of GETFIELD, so replace that and remove them.

These macros are also defined in vas.h, so either those should be
eventually replaced or the macros moved into bitops.h.

Signed-off-by: Russell Currey <ruscur@russell.cc>
[mpe: Rewrite the assignment to 'he' to avoid ffs() etc.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:35 +10:00
Arnd Bergmann 34efabe418 powerpc: remove unused to_tm() helper
to_tm() is now completely unused, the only reference being in the
_dump_time() helper that is also unused. This removes both, leaving
the rest of the powerpc RTC code y2038 safe to as far as the hardware
supports.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:34 +10:00
Arnd Bergmann 5235afa89a powerpc: use time64_t in update_persistent_clock
update_persistent_clock() is deprecated because it suffers from overflow
in 2038 on 32-bit architectures. This changes powerpc to use the
update_persistent_clock64() replacement, and to pass down 64-bit
timestamps consistently.

This is now simpler, as we no longer have to worry about the offset
numbers in tm_year and tm_mon that are different between the Linux
conventions and RTAS.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:34 +10:00
Arnd Bergmann 5bfd643583 powerpc: use time64_t in read_persistent_clock
Looking through the remaining users of the deprecated mktime()
function, I found the powerpc rtc handlers, which use it in
place of rtc_tm_to_time64().

To clean this up, I'm changing over the read_persistent_clock()
function to the read_persistent_clock64() variant, and change
all the platform specific handlers along with it.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:33 +10:00
Arnd Bergmann 2dc20f454d powerpc: rtas: clean up time handling
The to_tm() helper function operates on a signed integer for the time,
so it will suffer from overflow in 2038, even on 64-bit kernels.

Rather than fix that function, this replaces its use in the rtas
procfs implementation with the standard rtc_time64_to_tm() helper
that is very similar but is not affected by the overflow.

In order to actually support long times, the parser function gets
changed to 64-bit user input and output as well. Note that the tm_mon
and tm_year representation is slightly different, so we have to manually
add an offset here.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:33 +10:00
Arnd Bergmann 6e8cef384a powerpc: always enable RTC_LIB
In order to use the rtc_tm_to_time64() and rtc_time64_to_tm()
helper functions in later patches, we have to ensure that
CONFIG_RTC_LIB is always built-in.

Note that this symbol only controls a couple of helper functions,
not the actual RTC subsystem, which remains optional and is
enabled with CONFIG_RTC_CLASS.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:33 +10:00
Olof Johansson eff06ef089 powerpc/pasemi: Set PCI_SCAN_ALL_PCI_DEVS
Needed on Amiga X1000 with SB600.

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:43:32 +10:00
Aneesh Kumar K.V a5db5060e0 powerpc/mm/hash: hard disable irq in the SLB insert path
When inserting SLB entries for EA above 512TB, we need to hard disable irq.
This will make sure we don't take a PMU interrupt that can possibly touch
user space address via a stack dump. To prevent this, we need to hard disable
the interrupt.

Also add a comment explaining why we don't need context synchronizing isync
with slbmte.

Fixes: f384796c4 ("powerpc/mm: Add support for handling > 512TB address in SLB miss")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:38 +10:00
Aneesh Kumar K.V ed515b6898 powerpc/mm/hugetlb: Update hugetlb related locks
With split pmd page table lock enabled, we don't use mm->page_table_lock when
updating pmd entries. This patch update hugetlb path to use the right lock
when inserting huge page directory entries into page table.

ex: if we are using hugepd and inserting hugepd entry at the pmd level, we
use pmd_lockptr, which based on config can be split pmd lock.

For update huge page directory entries itself we use mm->page_table_lock. We
do have a helper huge_pte_lockptr() for that.

Fixes: 675d99529 ("powerpc/book3s64: Enable split pmd ptlock")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:37 +10:00
Aneesh Kumar K.V 91d0697188 powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch
Currently we do not have an isync, or any other context synchronizing
instruction prior to the slbie/slbmte in _switch() that updates the
SLB entry for the kernel stack.

However that is not correct as outlined in the ISA.

From Power ISA Version 3.0B, Book III, Chapter 11, page 1133:

  "Changing the contents of ... the contents of SLB entries ... can
   have the side effect of altering the context in which data
   addresses and instruction addresses are interpreted, and in which
   instructions are executed and data accesses are performed.
   ...
   These side effects need not occur in program order, and therefore
   may require explicit synchronization by software.
   ...
   The synchronizing instruction before the context-altering
   instruction ensures that all instructions up to and including that
   synchronizing instruction are fetched and executed in the context
   that existed before the alteration."

And page 1136:

  "For data accesses, the context synchronizing instruction before the
   slbie, slbieg, slbia, slbmte, tlbie, or tlbiel instruction ensures
   that all preceding instructions that access data storage have
   completed to a point at which they have reported all exceptions
   they will cause."

We're not aware of any bugs caused by this, but it should be fixed
regardless.

Add the missing isync when updating kernel stack SLB entry.

Cc: stable@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
[mpe: Flesh out change log with more ISA text & explanation]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:37 +10:00
Nicholas Piggin 926bc2f100 powerpc/64s: Fix compiler store ordering to SLB shadow area
The stores to update the SLB shadow area must be made as they appear
in the C code, so that the hypervisor does not see an entry with
mismatched vsid and esid. Use WRITE_ONCE for this.

GCC has been observed to elide the first store to esid in the update,
which means that if the hypervisor interrupts the guest after storing
to vsid, it could see an entry with old esid and new vsid, which may
possibly result in memory corruption.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:37 +10:00
Nicholas Piggin 0cef77c779 powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask
When a single-threaded process has a non-local mm_cpumask, try to use
that point to flush the TLBs out of other CPUs in the cpumask.

An IPI is used for clearing remote CPUs for a few reasons:
- An IPI can end lazy TLB use of the mm, which is required to prevent
  TLB entries being created on the remote CPU. The alternative is to
  drop lazy TLB switching completely, which costs 7.5% in a context
  switch ping-pong test betwee a process and kernel idle thread.
- An IPI can have remote CPUs flush the entire PID, but the local CPU
  can flush a specific VA. tlbie would require over-flushing of the
  local CPU (where the process is running).
- A single threaded process that is migrated to a different CPU is
  likely to have a relatively small mm_cpumask, so IPI is reasonable.

No other thread can concurrently switch to this mm, because it must
have been given a reference to mm_users by the current thread before it
can use_mm. mm_users can be asynchronously incremented (by
mm_activate or mmget_not_zero), but those users must use remote mm
access and can't use_mm or access user address space. Existing code
makes the this assumption already, for example sparc64 has reset
mm_cpumask using this condition since the start of history, see
arch/sparc/kernel/smp_64.c.

This reduces tlbies for a kernel compile workload from 0.90M to 0.12M,
tlbiels are increased significantly due to the PID flushing for the
cleaning up remote CPUs, and increased local flushes (PID flushes take
128 tlbiels vs 1 tlbie).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:36 +10:00
Nicholas Piggin 85bcfaf69c powerpc/64s/radix: optimise pte_update
Implementing pte_update with pte_xchg (which uses cmpxchg) is
inefficient. A single larx/stcx. works fine, no need for the less
efficient cmpxchg sequence.

Then remove the memory barriers from the operation. There is a
requirement for TLB flushing to load mm_cpumask after the store
that reduces pte permissions, which is moved into the TLB flush
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:36 +10:00
Nicholas Piggin f1cb8f9beb powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags
The ISA suggests ptesync after setting a pte, to prevent a table walk
initiated by a subsequent access from missing that store and causing a
spurious fault. This is an architectual allowance that allows an
implementation's page table walker to be incoherent with the store
queue.

However there is no correctness problem in taking a spurious fault in
userspace -- the kernel copes with these at any time, so the updated
pte will be found eventually. Spurious kernel faults on vmap memory
must be avoided, so a ptesync is put into flush_cache_vmap.

On POWER9 so far I have not found a measurable window where this can
result in more minor faults, so as an optimisation, remove the costly
ptesync from pte updates. If an implementation benefits from ptesync,
it would be better to add it back in update_mmu_cache, so it's not
done for things like fork(2).

fork --fork --exec benchmark improved 5.2% (12400->13100).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:36 +10:00
Nicholas Piggin 68662f85f3 powerpc/64s/radix: prefetch user address in update_mmu_cache
Prefetch the faulting address in update_mmu_cache to give the page
table walker perhaps 100 cycles head start as locks are dropped and
the interrupt completed.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Nicholas Piggin f569bd94ef powerpc/64s/radix: make ptep_get_and_clear_full non-atomic for the full case
This matches other architectures, when we know there will be no
further accesses to the address (e.g., for teardown), page table
entries can be cleared non-atomically.

The comments about NMMU are bogus: all MMU notifiers (including NMMU)
are released at this point, with their TLBs flushed. An NMMU access at
this point would be a bug.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Nicholas Piggin 6d8278c414 powerpc/64s/radix: do not flush TLB on spurious fault
In the case of a spurious fault (which can happen due to a race with
another thread that changes the page table), the default Linux mm code
calls flush_tlb_page for that address. This is not required because
the pte will be re-fetched. Hash does not wire this up to a hardware
TLB flush for this reason. This patch avoids the flush for radix.

>From Power ISA v3.0B, p.1090:

    Setting a Reference or Change Bit or Upgrading Access Authority
    (PTE Subject to Atomic Hardware Updates)

    If the only change being made to a valid PTE that is subject to
    atomic hardware updates is to set the Refer- ence or Change bit to
    1 or to add access authorities, a simpler sequence suffices
    because the translation hardware will refetch the PTE if an access
    is attempted for which the only problems were reference and/or
    change bits needing to be set or insufficient access authority.

The nest MMU on POWER9 does not re-fetch the PTE after such an access
attempt before faulting, so address spaces with a coprocessor
attached will continue to flush in these cases.

This reduces tlbies for a kernel compile workload from 0.95M to 0.90M.

fork --fork --exec benchmark improved 0.5% (12300->12400).

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:35 +10:00
Nicholas Piggin e5f7cb58c2 powerpc/64s/radix: do not flush TLB when relaxing access
Radix flushes the TLB when updating ptes to increase permissiveness
of protection (increase access authority). Book3S does not require
TLB flushing in this case, and it is not done on hash. This patch
avoids the flush for radix.

>From Power ISA v3.0B, p.1090:

    Setting a Reference or Change Bit or Upgrading Access Authority
    (PTE Subject to Atomic Hardware Updates)

    If the only change being made to a valid PTE that is subject to
    atomic hardware updates is to set the Reference or Change bit to 1
    or to add access authorities, a simpler sequence suffices because
    the translation hardware will refetch the PTE if an access is
    attempted for which the only problems were reference and/or change
    bits needing to be set or insufficient access authority.

The nest MMU on POWER9 does not re-fetch the PTE after such an access
attempt before faulting, so address spaces with a coprocessor
attached will continue to flush in these cases.

This reduces tlbies for a kernel compile workload from 1.28M to 0.95M,
tlbiels from 20.17M 19.68M.

fork --fork --exec benchmark improved 2.77% (12000->12300).

Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V bd5050e38a powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang
When relaxing access (read -> read_write update), pte needs to be marked invalid
to handle a nest MMU bug. We also need to do a tlb flush after the pte is
marked invalid before updating the pte with new access bits.

We also move tlb flush to platform specific __ptep_set_access_flags. This will
help us to gerid of unnecessary tlb flush on BOOK3S 64 later. We don't do that
in this patch. This also helps in avoiding multiple tlbies with coprocessor
attached.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V e4c1112c3f powerpc/mm: Change function prototype
In later patch, we use the vma and psize to do tlb flush. Do the prototype
update in separate patch to make the review easy.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V 044003b52a powerpc/mm/radix: Move function from radix.h to pgtable-radix.c
In later patch we will update them which require them to be moved
to pgtable-radix.c. Keeping the function in radix.h results in
compile warning as below.

./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
  struct mm_struct *mm = vma->vm_mm;
                            ^~
./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
      atomic_read(&mm->context.copros) > 0) {
      ^~~~~~~~~~~
      __atomic_load
./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
      atomic_read(&mm->context.copros) > 0) {

Instead of fixing header dependencies, we move the function to pgtable-radix.c
Also the function is now large to be a static inline . Doing the
move in separate patch helps in review.

No functional change in this patch. Only code movement.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:34 +10:00
Aneesh Kumar K.V f069ff396d powerpc/mm/hugetlb: Update huge_ptep_set_access_flags to call __ptep_set_access_flags directly
In a later patch, we want to update __ptep_set_access_flags take page size
arg. This makes ptep_set_access_flags only work with mmu_virtual_psize.
To simplify the code make huge_ptep_set_access_flags directly call
__ptep_set_access_flags so that we can compute the hugetlb page size in
hugetlb function.

Now that ptep_set_access_flags won't be called for hugetlb remove
the is_vm_hugetlb_page() check and add the assert of pte lock
unconditionally.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:33 +10:00
Alastair D'Silva 19df39581c ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
The function removes the process element from NPU cache.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:32 +10:00
Alastair D'Silva 71cc64a85d powerpc: use task_pid_nr() for TID allocation
The current implementation of TID allocation, using a global IDR, may
result in an errant process starving the system of available TIDs.
Instead, use task_pid_nr(), as mentioned by the original author. The
scenario described which prevented it's use is not applicable, as
set_thread_tidr can only be called after the task struct has been
populated.

In the unlikely event that 2 threads share the TID and are waiting,
all potential outcomes have been determined safe.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Alastair D'Silva 3449f191ca powerpc: Use TIDR CPU feature to control TIDR allocation
Switch the use of TIDR on it's CPU feature, rather than assuming it
is available based on architecture.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Alastair D'Silva 819844285e powerpc: Add TIDR CPU feature for POWER9
This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.

Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
Reviewed-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:31 +10:00
Nicholas Piggin 56c0b48b1e powerpc/powernv: process all OPAL event interrupts with kopald
Using irq_work for processing OPAL event interrupts is not necessary.
irq_work is typically used to schedule work from NMI context, a
softirq may be more appropriate. However OPAL events are not
particularly performance or latency critical, so they can all be
invoked by kopald.

This patch removes the irq_work queueing, and instead wakes up
kopald when there is an event to be processed. kopald processes
interrupts individually, enabling irqs and calling cond_resched
between each one to minimise latencies.

Event handlers themselves should still use threaded handlers,
workqueues, etc. as necessary to avoid high interrupts-off latencies
within any single interrupt.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Nicholas Piggin ee03b9b447 powerpc/powernv: call OPAL_QUIESCE before OPAL_SIGNAL_SYSTEM_RESET
Although it is often possible to recover a CPU that was interrupted
from OPAL with a system reset NMI, it's undesirable to interrupt them
for a few reasons. Firstly because dump/debug code itself needs to
call firmware, so it could hang on a lock or possibly corrupt a
per-cpu data structure if it or another CPU was interrupted from
OPAL. Secondly, the kexec crash dump code will not return from
interrupt to unwind the OPAL call.

Call OPAL_QUIESCE with QUIESCE_HOLD before sending an NMI IPI to
another CPU, which wait for it to leave firmware (or time out) to
avoid this problem in normal conditions. Firmware bugs may still
result in a timeout and interrupting OPAL, but that is the best
option (stops the CPU, and possibly allows firmware to be debugged).

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Nicholas Piggin 3130a7bb6e powerpc/64: change softe to irqmask in show_regs and xmon
When the soft enabled flag was changed to a soft disable mask, xmon
and register dump code was not updated to reflect that, which is
confusing ('SOFTE: 1' previously meant interrupts were soft enabled,
currently it means the opposite, the general interrupt type has been
disabled).

Fix this by using the name irqmask, and printing it in hex.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:30 +10:00
Nicholas Piggin 81ea11d3af powerpc/pmu/fsl: fix is_nmi test for irq mask change
When soft enabled was changed to irq disabled mask, this test missed
being converted (although the equivalent book3s test was converted).

The PMU drivers consider it an NMI when they take a PMI while general
interrupts are disabled. This change restores that behaviour.

Fixes: 01417c6cc7 ("powerpc/64: Change soft_enabled from flag to bitmask")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:29 +10:00
Nicholas Piggin e360cd37f0 powerpc/time: account broadcast timer event interrupts separately
These are not local timer interrupts but IPIs. It's good to be able
to see how timer offloading is behaving, so split these out into
their own category.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:29 +10:00
Nicholas Piggin 21bfd6a8e9 powerpc: move a stray NMI IPI case under NMI_IPI ifdef
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:29 +10:00
Nicholas Piggin bc90711331 powerpc: move timer broadcast code under GENERIC_CLOCKEVENTS_BROADCAST ifdef
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:28 +10:00
Nicholas Piggin a7cba02dec powerpc: allow soft-NMI watchdog to cover timer interrupts with large decrementers
Large decrementers (e.g., POWER9) can take a very long time to wrap,
so when the timer iterrupt handler sets the decrementer to max so as
to avoid taking another decrementer interrupt when hard enabling
interrupts before running timers, it effectively disables the soft
NMI coverage for timer interrupts.

Fix this by using the traditional 31-bit value instead, which wraps
after a few seconds. masked interrupt code does the same thing, and
in normal operation neither of these paths would ever wrap even the
31 bit value.

Note: the SMP watchdog should catch timer interrupt lockups, but it
is preferable for the local soft-NMI to catch them, mainly to avoid
the IPI.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:28 +10:00
Nicholas Piggin 3f984620f9 powerpc: generic clockevents broadcast receiver call tick_receive_broadcast
The broadcast tick recipient can call tick_receive_broadcast rather
than re-running the full timer interrupt.

It does not have to check for the next event time, because the sender
already determined the timer has expired. It does not have to test
irq_work_pending, because that's a direct decrementer interrupt and
does not go through the clock events subsystem. And it does not have
to read PURR because that was removed with the previous patch.

This results in no code size change, but both the decrementer and
broadcast path lengths are reduced.

Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:27 +10:00
Nicholas Piggin 3d3a6021dd powerpc/pseries: lparcfg calculate PURR on demand
For SPLPAR, lparcfg provides a sum of PURR registers for all CPUs.
Currently this is done by reading PURR in context switch and timer
interrupt, and storing that into a per-CPU variable. These are summed
to provide the value.

This does not work with all timer schemes (e.g., NO_HZ_FULL), and it
is sub-optimal for performance because it reads the PURR register on
every context switch, although that's been difficult to distinguish
from noise in the contxt_switch microbenchmark.

This patch implements the sum by calling a function on each CPU, to
read and add PURR values of each CPU.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:27 +10:00
Nicholas Piggin 36d632ea83 powerpc/64: remove start_tb and accum_tb from thread_struct
These fields are only written to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:26 +10:00
Nicholas Piggin 54071e4176 powerpc/64s: micro-optimise __hard_irq_enable() for mtmsrd L=1 support
Book3S minimum supported ISA version now requires mtmsrd L=1. This
instruction does not require bits other than RI and EE to be supplied,
so __hard_irq_enable() and __hard_irq_disable() does not have to read
the kernel_msr from paca.

Interrupt entry code already relies on L=1 support.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:26 +10:00
Nicholas Piggin 9f4b61b277 powerpc/pseries: put cede MSR[EE] check under IRQ_SOFT_MASK_DEBUG
This check does not catch IRQ soft mask bugs, but this option is
slightly more suitable than TRACE_IRQFLAGS.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Nicholas Piggin ebb37cf3ff powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled
irq_work_raise should not cause a decrementer exception unless it is
called from NMI context. Doing so often just results in an immediate
masked decrementer interrupt:

   <...>-550    90d...    4us : update_curr_rt <-dequeue_task_rt
   <...>-550    90d...    5us : dbs_update_util_handler <-update_curr_rt
   <...>-550    90d...    6us : arch_irq_work_raise <-irq_work_queue
   <...>-550    90d...    7us : soft_nmi_interrupt <-soft_nmi_common
   <...>-550    90d...    7us : printk_nmi_enter <-soft_nmi_interrupt
   <...>-550    90d.Z.    8us : rcu_nmi_enter <-soft_nmi_interrupt
   <...>-550    90d.Z.    9us : rcu_nmi_exit <-soft_nmi_interrupt
   <...>-550    90d...    9us : printk_nmi_exit <-soft_nmi_interrupt
   <...>-550    90d...   10us : cpuacct_charge <-update_curr_rt

The soft_nmi_interrupt here is the call into the watchdog, due to the
decrementer interrupt firing with irqs soft-disabled. This is
harmless, but sub-optimal.

When it's not called from NMI context or with interrupts enabled, mark
the decrementer pending in the irq_happened mask directly, rather than
having the masked decrementer interupt handler do it. This will be
replayed at the next local_irq_enable. See the comment for details.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Alexey Kardashevskiy 98fd72fe82 powerpc/powernv/ioda2: Remove redundant free of TCE pages
When IODA2 creates a PE, it creates an IOMMU table with it_ops::free
set to pnv_ioda2_table_free() which calls pnv_pci_ioda2_table_free_pages().

Since iommu_tce_table_put() calls it_ops::free when the last reference
to the table is released, explicit call to pnv_pci_ioda2_table_free_pages()
is not needed so let's remove it.

This should fix double free in the case of PCI hotuplug as
pnv_pci_ioda2_table_free_pages() does not reset neither
iommu_table::it_base nor ::it_size.

This was not exposed by SRIOV as it uses different code path via
pnv_pcibios_sriov_disable().

IODA1 does not inialize it_ops::free so it does not have this issue.

Fixes: c5f7700bbd ("powerpc/powernv: Dynamically release PE")
Cc: stable@vger.kernel.org # v4.8+
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:25 +10:00
Yisheng Xie 0abbf2bfdc powerpc/xmon: use match_string() helper
match_string() returns the index of an array for a matching string,
which can be used instead of open coded variant.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:24 +10:00
Christophe Leroy 2479bfc9bc powerpc: Fix build by disabling attribute-alias warning for SYSCALL_DEFINEx
GCC 8.1 emits warnings such as the following. As arch/powerpc code is
built with -Werror, this breaks the build with GCC 8.1.

  In file included from arch/powerpc/kernel/pci_64.c:23:
  ./include/linux/syscalls.h:233:18: error: 'sys_pciconfig_iobase' alias
  between functions of incompatible types 'long int(long int, long
  unsigned int, long unsigned int)' and 'long int(long int, long int,
  long int)' [-Werror=attribute-alias]
    asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
                    ^~~
  ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
    __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)

This patch inhibits those warnings.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Trim change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:24 +10:00
Christophe Leroy c959988118 powerpc/64: Fix strncpy() related build failures with GCC 8.1
GCC 8.1 warns about possible string truncation:

  arch/powerpc/kernel/nvram_64.c:1042:2: error: 'strncpy' specified
  bound 12 equals destination size [-Werror=stringop-truncation]
    strncpy(new_part->header.name, name, 12);

  arch/powerpc/platforms/ps3/repository.c:106:2: error: 'strncpy'
  output truncated before terminating nul copying 8 bytes from a
  string of the same length [-Werror=stringop-truncation]
    strncpy((char *)&n, text, 8);

Fix it by using memcpy(). To make that safe we need to ensure the
destination is pre-zeroed. Use kzalloc() in the nvram code and
initialise the u64 to zero in the ps3 code.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
[mpe: Use kzalloc() in the nvram code, flesh out change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-03 20:40:24 +10:00
Michael Ellerman 2135a6ec3e Merge branch 'topic/pkey' into next
This is a branch with a mixture of mm, x86 and powerpc commits all
relating to some minor cross-arch pkeys consolidation. The x86/mm
changes have been reviewed by Ingo & Dave Hansen and the tree has been
in linux-next for some weeks without issue.
2018-06-03 20:35:27 +10:00
Michael Ellerman f1079d3a3d Merge branch 'fixes' into next
We ended up with an ugly conflict between fixes and next in ftrace.h
involving multiple nested ifdefs, and the automatic resolution is
wrong. So merge fixes into next so we can fix it up.
2018-06-03 20:32:02 +10:00
Michael Ellerman b5240b1439 Merge branch 'topic/kbuild' into next
Merge in some commits we're sharing with the kbuild tree.
2018-06-03 20:24:15 +10:00
Michael Ellerman 481c63acba Merge branch 'topic/ppc-kvm' into next
Merge in some commits we're sharing with the kvm-ppc tree.
2018-06-03 20:23:54 +10:00
Linus Torvalds 4277e6b9fd Final MIPS fixes for 4.17
A final few MIPS fixes for 4.17:
 
  - Drop Lantiq gphy reboot/remove reset (4.14)
 
  - prctl(PR_SET_FP_MODE): Disallow PRE without FR (4.0)
 
  - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQS7lRNBWUYtqfDOVL41zuSGKxAj8gUCWxHFOwAKCRA1zuSGKxAj
 8i39AQCX9phkffpRDnA1e/MiGGeRZ5+f9FBOzuS1x2nzdUagoQD9FzWxdcx57Syj
 ye0kUtmc/wm8U6kz3qC3OInSeVuIEQg=
 =3NGM
 -----END PGP SIGNATURE-----

Merge tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from James Hogan:
 "A final few MIPS fixes for 4.17:

   - drop Lantiq gphy reboot/remove reset (4.14)

   - prctl(PR_SET_FP_MODE): Disallow PRE without FR (4.0)

   - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)"

* tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs
  MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests
  MIPS: lantiq: gphy: Drop reboot/remove reset asserts
2018-06-02 10:12:23 -07:00
Catalin Marinas 38500be10e arm64: KVM: Move VCPU_WORKAROUND_2_FLAG macros to the top of the file
This is to avoid potential merging conflicts between commit 55e3748e89
("arm64: KVM: Add ARCH_WORKAROUND_2 support for guests") and the KVM
tree.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-06-02 10:42:54 +01:00
Olof Johansson 14321604c8 Renesas ARM Based SoC DT Updates for v4.18
* R-Mobile A1 (r8a7740) SoC
   - Describe CEU, IRQC, SYS-DMAC and USB devices
 
   - Cleanup for consistency with other Renesas SoCs and enhanced maintainability
     + Stop grouping clocks under a "clocks" subnode
     + Add soc node
     + Sort subnodes of root and soc nodes
 
 * RZ/A1H (r7s72100) SoC
   - Describe CEU device
 
 * R-Car Gen2, RZ/G1 and RZ/A1H SoCs
   - Add PMU device nodes
 
     Geert Uytterhoeven says: "This patch series enables support for the ARM
     Performance Monitor Units in Cortex-A7, Cortex-A9, and Cortex-A15 CPU
     cores on Renesas RZ/A1, R-Car Gen2, and RZ/G1 SoCs.  This allows for
     better performance analysis using the "perf" tool."
 
 * RZ/A1H (r7s72100) SoC
   - Correct interrupt types
 
     Geert Uytterhoeven says "RZ/A1H peripherals use a mix of level and edge
     interrupts.
 
     This patch series corrects the interrupt types for watchdog and RTC from
     edge to level, to match the datasheet."
 
 * R-Mobile APE6 (r8a73a4) APE4EVM board and SH-Mobile AG5 (sh73a0) SoC
   - Use generic disable-wp instead of now deprecated
     toshiba,mmc-wrprotect-disable property
 
 * EMMA Mobile EV2 (emev2) and SH-Mobile AG5 (sh73a0) SoCs
   - Add missing interrupt-affinity to PMU
 
     Geert Uytterhoeven says "The Cortex-A9 PMU nodes on SH-Mobile AG5 and
     Emma Mobile EV2 reference two interrupts, but lack interrupt-affinity
     properties, leading to:
 
     hw perfevents: no interrupt-affinity property for /pmu, guessing.
 
     This series adds the missing properties to fix this."
 
 * R-Car H2 (r8a7790) and R-Mobile APE6 (r8a73a4) SoCs
   - Correct mask for GIC PPI interrupts
 
     Geert Uytterhoeven says "R-Car H2 and R-Mobile APE6 contain four
     Cortex-A15 and four Cortex-A7 cores, hence the second interrupt
     specifier cell for Private Peripheral Interrupts should use
     "GIC_CPU_MASK_SIMPLE(8)", to make sure interrupts can be delivered to
     all 8 processor cores.
 
     This brings the predecessors of R-Car Gen3 in line with what we're
     doing on other big.LITTLE SoCs, like R-Car H3 and M3-W."
 
 * Alt board for R-Car E2 (r8a7794) SoC
 
 * RBoards for -Car Gen2 SoCs and kzm9d board for EMMA Mobile EV2 (emev2) SoC
   - Drop unnecessary address properties from VIN port nodes
 
     These are unnecessary as the nodes to not have bus addresses.
 
 * R-Car H2 (r8a7790), M2-W (r8a7791), M2-N (r8a7793) and E2 (r8a7794) SoCs
   - Describe FDP1 instances
 
 * iW-RainboW-G23S board for RZ/G1C (r8a77470) SoC
   - Initial SoC and board support
 
   - Enable EtherAVB
 
   - Describe all SCIF devices
 
 * Boards for R-Car Gen2 SoCs
   - Enable watchdog support
 
     Geert Uytterhoeven says "This patch series enables the builtin watchdog
     timer on R-Car Gen2 SoCs on all supported boards, and builds on top of
     Fabrizio's "[RFC v4 00/26] Fix watchdog on Renesas R-Car Gen2 and
     RZ/G1"."
 
 * R-Car Gen2 and RZ/G1 SoCs
   - Describe watchdog devices
 
   - For R-Car Gen2 this involves updating the SMP routine side as
     it is changed by a driver updated to allow watchdog device support
 
 * Wheat board for V2H (r8a7792) SoC
   - Correct ADV7513 address usage
 
     Kieran Bingham says "The r8a7792 Wheat board has two ADV7513 devices
     sharing a single I2C bus, however in low power mode the ADV7513 will
     reset it's slave maps to use the hardware defined default addresses.
 
     The ADV7511 driver was adapted to allow the two devices to be
     registered correctly - but it did not take into account the fault
     whereby the devices reset the addresses.
 
     This results in an address conflict between the device using the
     default addresses, and the other device if it is in low-power-mode.
 
     Repair this issue by moving both devices away from the default address
     definitions."
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE4nzZofWswv9L/nKF189kaWo3T74FAlr+s/0ACgkQ189kaWo3
 T76NqQ/+NFpjAj5GW248b9cClqkxC1OtZOC91Yz1g0kEjH1CHBc+iNRAGFKbVlFR
 k5KjRl7cp7zcJ058A47/gl0PoiRr6pXGGPqcxJwq5j5SyYb9I7bZ/wgEsd1U/cUM
 vYVj+ibmwVORfDuKxsTZU70/GXBi5vOo+T/U5csmKe2Z75emltgSH6DiMHjc/nyX
 uHIsj4IYIX07ZrTUFR6olFJfUr9YujdsD4/UtMOR32ifH8BHuPJDl5+jXngrBiFs
 pY/rvYUVctoe2/uMWB88v0NPhkWjIbY21GMluaBBDeVfM5GN34I7gqDZVQUMOx01
 Eb5L2N+BKPgRd7GrIlzsMvD/o6449p4BQ7QPqdwuY6q/5q2sZb4gA11qOGfWqnSW
 +Nm/BriJuGiNcjVMn91dv5V0cdlMur4orzbOXZ7AQnSSBmlktABCOA7tbnoosVoh
 ZVqQsmxks6RUCOTppi8tGpqtZhLdRXzzIPEmcuoDZOxtp55LipEwCrX+2QDSop75
 DhjZqq8bTdoMdd8D05bDLFFIuamf/zBW4AGE+cLBvPrRLgA4I42/2sa8KVG6H8rM
 TDAVRILt8YQmFzpW5zBRgokiE/UCZlY+g5h5K8kXEQNcl0iMiTd1seaQMjKbRP0A
 bcSJ6Ln90uDo7LZ8XxuKVDKT+HTo3unGzuOaQd0Z6BSM7S6qAfk=
 =LV2P
 -----END PGP SIGNATURE-----

Merge tag 'renesas-dt-for-v4.18' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/late

Renesas ARM Based SoC DT Updates for v4.18

* R-Mobile A1 (r8a7740) SoC
  - Describe CEU, IRQC, SYS-DMAC and USB devices
  - Cleanup for consistency with other Renesas SoCs and enhanced maintainability
* RZ/A1H (r7s72100) SoC
  - Describe CEU device
* R-Car Gen2, RZ/G1 and RZ/A1H SoCs
  - Add PMU device nodes
* RZ/A1H (r7s72100) SoC
  - Correct interrupt types
* R-Mobile APE6 (r8a73a4) APE4EVM board and SH-Mobile AG5 (sh73a0) SoC
  - Use generic disable-wp instead of now deprecated
    toshiba,mmc-wrprotect-disable property
* EMMA Mobile EV2 (emev2) and SH-Mobile AG5 (sh73a0) SoCs
  - Add missing interrupt-affinity to PMU
* R-Car H2 (r8a7790) and R-Mobile APE6 (r8a73a4) SoCs
  - Correct mask for GIC PPI interrupts
* R-Car H2 (r8a7790), M2-W (r8a7791), M2-N (r8a7793) and E2 (r8a7794) SoCs
  - Describe FDP1 instances
* R-Car Gen2 and RZ/G1 SoCs
  - Describe watchdog devices
  - For R-Car Gen2 this involves updating the SMP routine side as
    it is changed by a driver updated to allow watchdog device support

* Alt board for R-Car E2 (r8a7794) SoC
* RBoards for -Car Gen2 SoCs and kzm9d board for EMMA Mobile EV2 (emev2) SoC
* iW-RainboW-G23S board for RZ/G1C (r8a77470) SoC
  - Initial SoC and board support
  - Enable EtherAVB
  - Describe all SCIF devices
* Boards for R-Car Gen2 SoCs
  - Enable watchdog support
* Wheat board for V2H (r8a7792) SoC
  - Correct ADV7513 address usage

* tag 'renesas-dt-for-v4.18' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: (69 commits)
  ARM: dts: r8a7740: Add CEU1
  ARM: dts: r8a7740: Add CEU0
  ARM: dts: r8a7745: Add PMU device node
  ARM: dts: r8a7743: Add PMU device node
  ARM: dts: r8a7794: Add PMU device node
  ARM: dts: r8a7793: Add PMU device node
  ARM: dts: r8a7792: Add PMU device node
  ARM: dts: r8a7791: Add PMU device node
  ARM: dts: r8a7790: Add PMU device nodes
  ARM: dts: r7s72100: Add PMU device node
  ARM: dts: r7s72100: Correct RTC interrupt types
  ARM: dts: r7s72100: Correct watchdog timer interrupt type
  ARM: dts: emev2: Add missing interrupt-affinity to PMU node
  ARM: dts: sh73a0: Add missing interrupt-affinity to PMU node
  ARM: dts: r8a73a4: Correct mask for GIC PPI interrupts
  ARM: dts: r8a7790: Correct mask for GIC PPI interrupts
  ARM: shmobile: r8a7794: alt: add EEPROM to DTS
  ARM: dts: kzm9d: Drop unnecessary address properties from gpio_keys node
  ARM: dts: silk: Drop unnecessary address properties from vin port node
  ARM: dts: alt: Drop unnecessary address properties from vin port node
  ...

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-02 01:32:38 -07:00
Olof Johansson 171d429a1d Renesas ARM64 Based SoC DT Updates for v4.18
* Cleanups:
   - Corresct whitespace
   - sort subnodes of the root and soc nodes
 
 * R-Car M3-N (r8a77965) SoC
   - Describe MSIOF SPI, PWM, SDHI and I2C devices in DT
 
   - Add thermal support
 
 * R-Car H3 (r8a7795) and R-Car M3-W (r8a7796) SoCs
   - Decrease temperature hysteresis
 
     Niklas Söderlund says "... decrease the hysteresis from 2C to 1C for
     the two boards we have described upstream. They have no dependencies
     and are ready to be accepted if the review is in favor of them."
 
 * R-Car H3 (r8a7795), M3-W (r8a7796) and M3-N (r8a77965) SoCs
   - Add address properties to rcar_sound port nodes
 
     The rcar_sound port nodes have unit names and thus should have register
     properties.
 
 * R-Car H3 (r8a7795), M3-W (r8a7796) V3M (r8a77970) and D3 (r8a77995) SoCs
   - Enable IPMMU devices
 
     Magnus Damm says "Following the policy of using DT to describe the
     hardware and not software support state, this series makes sure all
     IPMMU devices are enabled in DT for SoCs such as r8a7795, r8a7796,
     r8a77970 and r8a77995."
 
 * R-Car M3-N (r8a77965) and V3H (r8a77980) SoCs
   - Use sysc binding macros
 
     These can be used now that they are present in Linus's tree.
     This is a simple replacement of numeric values with symbolic ones.
 
   - Describe USB2 and USB3 devices in DT
 
 * R-Car V3M (r8a77970) SoC
   - Add SMP Support
 
     Geert Uytterhoeven says "This patch series enables SMP support on the
     R-Car V3M SoC, by adding the second Cortex-A53 CPU core.  It also adds
     the performance monitor unit, and links it to both CPU cores."
 
   - Correct IPMMU DS1 bit number
 
     Magnus Damm says "Judging by "R-Car-Gen3-rev0.80" IPMMU IMSSTR register
     documentation for [R-Car V3M] the DS1 bit field should be bit 0."
 
 * R-Car V3H (r8a77980) SoC
   - Use CPG clock binding macros
 
     These can be used now that they are present in Linus's tree.
     This is a simple replacement of numeric values with symbolic ones.
 
 * R-Car V3H (r8a77980) and V3M (r8a77970) SoCs
   - Disable EtherAVB
 
     Sergei Shtylov says "I'm fixing the issue in the EtherAVB device nodes
     in the R8A779{7|8}0 device trees that missed the "status" prop, usually
     disabling the SoC devices in anticipation that the board device trees
     enable the devices according to their needs. There should be no issues
     with the current R8A779{7|8}0 board device trees, as all of them use
     EtherAVB anyway, so I'm sending the patches generated against the
     'devel' branch..."
 
 * R-Car D3 (r8a77995) SoC
   - Describe VIN4 in DT
 
 * Ebisu board with R-Car E3 (r8a77990) SoC
   - Initial support:
     + PSCI
     + CPU (single)
     + Cache controller
     + Main clocks and controller
     + Interrupt controller
     + Timer
     + PMU
     + Reset controller
     + Product register
     + System controller
     + UART for console
 
 * Salvator-XS boards with R-Car H3 (r8a7795) SoC
   - Enable USB2.0 channel 3
 
 * Salvator-X and Salvator-XS boards with M3-N (r8a77965) SoC
   - Enable DU
 
     Kieran Bingham "This series enables the DU for the M3-N R8A77965 SoC,
     and provides output on the VGA and HDMI connectors.
 
     LVDS is not yet supported or tested."
 
 * Salvator-X and Salvator-XS boards with
   R-Car H3 (r8a7795), M3-W (r8a7796) and M3-N (r8a77965) SoCs
   - Enable nable VIN, CSI-2 and ADV7482
 
     Niklas Söderlund says "This series enable capture for H3, M3-W, M3-N
     Salvator-X and Salvator-XS boards. It also adds the VIN and CSI-2 nodes
     for V3M, but as the ADV7482 is on the V3M expansion boards I have
     chosen not include that enablement in this series."
 
   - Add PMIC DDR Backup Power config
 
     Geert Uytterhoeven says " The ROHM BD9571MWV PMIC on the Renesas
     Salvator-X(S) and ULCB development boards supports DDR Backup Power,
     which means that the DDR power rails can be kept powered while the main
     SoC is powered down.
 
     For this to function correctly, the DDR Backup Power configuration
     must be described in DT, which is the topic of this series:
       - The first patch adds the missing device node for the BD9571 PMIC on
         the ULCB boards,
       - The last two patches add DDR Backup Mode configuration for
         Salvator-X(S) and ULCB."
 
   - Add EEPROM
 
     Wolfram Sang says "Add the EEPROM found on Salvator-X and -XS boards
     for H3, M3-W, and M3-N on the IIC_DVFS bus."
 
   - Enable HDMI Sound
 
 * Salvator-X and Salvator-XS boards with
   R-Car H3 (r8a7795), M3-W (r8a7796) and M3-N (r8a77965) SoCs, and
   Draak board with R-Car D3 (r8a77995) SoC
   - Consistently name EtherAVB mdio pin group
 
     Geert Uytterhoeven says "When initial support was added for R-Car H3,
     the MDIO pin was forgotten, and the MDC pin got its own group named
     "mdc".  During the addition of support for R-Car M3-W, this mistake was
     noticed.  But as R-Car H3 and M3-W are pin compatible, and can be
     mounted on the same boards, the decision was made to just add the MDIO
     pin to the existing "mdc" group.  Later this was extended to R-Car H3
     ES2.0, and M3-N, because of pin compatibility, and to R-Car D3, in the
     name of consistency among R-Car Gen3 SoCs.
 
     However, this decision keeps on being questioned when adding new SoC
     support.  Hence bite the bullet and admit our mistake, and rename the
     pin group from "mdc" to "mdio", like on R-Car Gen2 SoCs."
 
 * Ebisu board with R-Car E3 (r8a77990) SoC
   - Initial support: Memory, Main crystal, Serial console
 
   - Enable Ethernet
 
   - Revise PSCI node
 
     Yoshihiro Shimoda says "The basic support patch 2d2dbadba421 ("arm64:
     dts: renesas: Add Renesas R8A77990 SoC support") lacks the compatible
     "arm,psci-1.0" in the psci node."
 
   - Revise cache controller node
 
     Yoshihiro Shimoda says "The cache controller node should not have
     unit-addresses and reg properties."
 
 * V3HSK board with R-Car V3H (r8a77980) SoC
   - Initial board device tree
 
     Sergei Shtylov says "Add the initial device  tree for  the V3H Starter
     Kit board.  The board has 1 debug serial port (SCIF0); include support
     for it, so that the serial console can work."
 
   - Enable PFC support and use for EtherAVB
 
 * V3MSK board with R-Car V3M (r8a77970) SoC
   - Add DU/LVDS/HDMI support
 
     Sergei Shtylyov says "Define the V3M Starter Kit board dependent part
     of the DU and LVDS device nodes. Also add the device nodes for Thine
     THC63LVD1024 LVDS decoder and Analog Devices ADV7511W HDMI
     transmitter..."
 
   - Enable PFC for EtherAVB
 
 * Condor board with R-Car V3H (r8a77980) SoC
   - Enable eMMC
 
     Sergei Shtylyov says "We're adding the R8A77980 MMC (SDHI)
     device nodes and then enable eMMC support on the Condor board."
 
   - Enable PFC support and use for EtherAVB and SCIF0
 
 * Eagle board with R-Car V3M (r8a77970) SoC
   - Enable HDMI output
 
 * Eagle board with R-Car V3M (r8a77970) SoC and
   Condor board with R-Car V3H (r8a77980) SoC
   - Enable CAN-FD
 
     Adds the CAN-FD device nodes so the DT SoC files and enables
     single channel CAN-FD support in DT board files.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE4nzZofWswv9L/nKF189kaWo3T74FAlr+pZkACgkQ189kaWo3
 T77FOhAAs+F25gyyQZ13/JE38Q+ku2gSrpJLwJJYjztez1EA9Jh15k7xlgU2uEaa
 +YtOodVHqL85LeHp8FMyahOmG6bh+pvqXjAc47ZVDootNm+518gdPvhZ0nWbWuYu
 SKskMN+W5pxkbDNUfIij3TCApAWEAQH4ExJxAMjTJc9ddTPmYKvOyWZTyQvm3qQA
 XsH0uOeDCmasRiJ9rUf6nPBCS4vn7U5w8cbRYQkDymrUmpEO8f+K6nMj+2yTvJJ/
 G0sA8R7Q/9ULRGs4XGadIcQ71xmwpBr13FKMC9xtlhI1NI6HIGo2RkHBRwD4mGay
 PgBhAQR4oekdyEoVwKu3KlTa3Fpohu/NbiwJMTu1QCgAtdd4MCv1RXq3QZIZtTQn
 V3FNVwaH2u4DaX814C2Dmj11zkcehc02uURzRU/pk5UmnIJ4utghu2RmpwO7YQX7
 LqBrzRFHd+Qt03Thp/fwbsdWPYvgrRDSL5v9VV/87RD0zJ9yQR/mjr4OyI5DT6g5
 bEcj535n9X3FY2SFjFWKl4BjuwOcTtiY15oFKdnFcBF2wobaiKO68YgvaYgFpHMD
 T5e+bOgq2IPKW8Z6qMboyj2z9a3oHa6xkEJMEJGKxotViQQC7JUHu3/7cknUelSs
 LP1cJy7IP/62Qb9lvUqJ70Py+L430cuHw6+/Ssf98sRaTx0VD2I=
 =4IoI
 -----END PGP SIGNATURE-----

Merge tag 'renesas-arm64-dt-for-v4.18' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into next/late

Renesas ARM64 Based SoC DT Updates for v4.18

* Cleanups:
  - Correct whitespace
  - sort subnodes of the root and soc nodes
* R-Car M3-N (r8a77965) SoC
  - Describe MSIOF SPI, PWM, SDHI and I2C devices in DT
  - Add thermal support
* R-Car H3 (r8a7795) and R-Car M3-W (r8a7796) SoCs
  - Decrease temperature hysteresis
* R-Car H3 (r8a7795), M3-W (r8a7796) and M3-N (r8a77965) SoCs
  - Add address properties to rcar_sound port nodes
* R-Car H3 (r8a7795), M3-W (r8a7796) V3M (r8a77970) and D3 (r8a77995) SoCs
  - Enable IPMMU devices
* R-Car M3-N (r8a77965) and V3H (r8a77980) SoCs
  - Use sysc binding macros
  - Describe USB2 and USB3 devices in DT
* R-Car V3M (r8a77970) SoC
  - Add SMP Support
  - Correct IPMMU DS1 bit number
* R-Car V3H (r8a77980) SoC
  - Use CPG clock binding macros
* R-Car V3H (r8a77980) and V3M (r8a77970) SoCs
  - Disable EtherAVB
* R-Car D3 (r8a77995) SoC
  - Describe VIN4 in DT

* Ebisu board with R-Car E3 (r8a77990) SoC
  - Initial support
* Salvator-XS boards with R-Car H3 (r8a7795) SoC
  - Enable USB2.0 channel 3
* Salvator-X and Salvator-XS boards with M3-N (r8a77965) SoC
  - Enable DU
* Salvator-X and Salvator-XS boards with
  R-Car H3 (r8a7795), M3-W (r8a7796) and M3-N (r8a77965) SoCs
  - Enable nable VIN, CSI-2 and ADV7482
  - Add PMIC DDR Backup Power config
  - Add EEPROM
  - Enable HDMI Sound
* Salvator-X and Salvator-XS boards with
  R-Car H3 (r8a7795), M3-W (r8a7796) and M3-N (r8a77965) SoCs, and
  Draak board with R-Car D3 (r8a77995) SoC
  - Consistently name EtherAVB mdio pin group
* Ebisu board with R-Car E3 (r8a77990) SoC
  - Initial support: Memory, Main crystal, Serial console
  - Enable Ethernet
  - Revise PSCI node
  - Revise cache controller node
* V3HSK board with R-Car V3H (r8a77980) SoC
  - Initial board device tree
  - Enable PFC support and use for EtherAVB
* V3MSK board with R-Car V3M (r8a77970) SoC
  - Add DU/LVDS/HDMI support
  - Enable PFC for EtherAVB
* Condor board with R-Car V3H (r8a77980) SoC
  - Enable eMMC
  - Enable PFC support and use for EtherAVB and SCIF0
* Eagle board with R-Car V3M (r8a77970) SoC
  - Enable HDMI output
* Eagle board with R-Car V3M (r8a77970) SoC and
  Condor board with R-Car V3H (r8a77980) SoC
  - Enable CAN-FD

* tag 'renesas-arm64-dt-for-v4.18' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas: (102 commits)
  arm64: dts: renesas: salvator-common: Add ADV7482 support
  arm64: dts: renesas: salvator-common: enable VIN
  arm64: dts: renesas: r8a77970: add VIN and CSI-2 nodes
  arm64: dts: renesas: r8a77965: add VIN and CSI-2 nodes
  arm64: dts: renesas: r8a7796: add VIN and CSI-2 nodes
  arm64: dts: renesas: r8a7795-es1: add CSI-2 node
  arm64: dts: renesas: r8a7795: add VIN and CSI-2 nodes
  arm64: dts: renesas: r8a77965: add I2C support
  arm64: dts: renesas: r8a77990: ebisu: Enable EthernetAVB
  arm64: dts: renesas: r8a77990: Add EthernetAVB device nodes
  arm64: dts: renesas: r8a77990: Add GPIO device nodes
  arm64: dts: renesas: r8a77990: Add PFC device node
  arm64: dts: renesas: initial V3HSK board device tree
  arm64: dts: renesas: r8a77980: disable EtherAVB
  arm64: dts: renesas: r8a77970: disable EtherAVB
  arm64: dts: renesas: r8a77995: Add VIN4
  arm64: dts: renesas: r8a77980: add resets property to CAN-FD node
  arm64: dts: renesas: r8a77970: Add Cortex-A53 PMU node
  arm64: dts: renesas: r8a77970: Add secondary CA53 CPU core
  arm64: dts: renesas: r8a77965: Add SDHI device nodes
  ...

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-02 01:30:07 -07:00
Olof Johansson 87815dda55 This device-tree pxa update brings :
- fix pxa3xx MMC bindings
  - add pxa3xx GPIOs
  - add pxa3xx missing pin controller
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCAA1FiEExgkueSa0u8n4Ls+HA/Z63R24yxIFAlsIfykXHHJvYmVydC5q
 YXJ6bWlrQGZyZWUuZnIACgkQA/Z63R24yxLcVg//WRbgX91qcpqmXpzgysAU7YT6
 F0mQwqt3IOWa6XUcckZKdPvl7pdLSj13yCBp/uvNi2rqSX2AydV+7p9Lz3HiXclT
 v/caYVIG1p8/idhpJy3vmiWmIcRArJm8pbDlzx1ywXEk3X6hX+j5vXsNFA6KW2qX
 YKOQT223mCrEAWL6rVuAT3/EIJ3NpuYdt6ChyzOsg7Kip9tJSplTlX6qIyBoQDKZ
 u5cakdX2818p9rd1fIX0Di7aGmS53euWUmAuQ0j3R2XgKKPPMMadWPDgH0w20FML
 Xjovvi1GWzFZYN1teJtqS7wUuyysKsQ0cReXzt2xAByTWraGVwm+g6I0TlnmI5c7
 86JztT+/pd/YCppc+QBxtMogSjzmu79ax6A4bXIEh7Xpz/HBG1qL82ukE9nkW2AF
 EnTNKuA6sMR3EsKMwPf0lP1eFQIouUCmpKdjBSs3s4vAP1FSQoVRoOyNIuPFoaA/
 KhAxBw1GPMSwAhCFQYS5xsLdv8H7xOt1ZYK3xBuyG2J4UzgCrAPxIltAg2LOHncz
 UKcC+sOBjeirC1rjyGcPXSAdz1lz+8oppJF87/ckVebcGhgZCvlng5JtLXtocPAG
 NEeG0FRLm5myCeiOuyXeBo1s06JOFkr1gS89tHHOUFYyt73w00qCcagNOg8/ODA9
 /gh2mhBEcRj46SE8tSQ=
 =xfWX
 -----END PGP SIGNATURE-----

Merge tag 'pxa-dt-4.18' of https://github.com/rjarzmik/linux into next/dt

This device-tree pxa update brings :
 - fix pxa3xx MMC bindings
 - add pxa3xx GPIOs
 - add pxa3xx missing pin controller

* tag 'pxa-dt-4.18' of https://github.com/rjarzmik/linux:
  ARM: dts: pxa3xx: fix MMC clocks
  ARM: pxa: dts: add pin definitions for extended GPIOs
  ARM: pxa: dts: add gpio-ranges to gpio controller

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-02 01:20:34 -07:00
Olof Johansson 30a04c1e32 This is is the pxa changes for v4.18 cycle :
- change to phase out at24 eeprom platform data
  - add a missing wakeup pin on pxa320 SoCs
 -----BEGIN PGP SIGNATURE-----
 
 iQJLBAABCAA1FiEExgkueSa0u8n4Ls+HA/Z63R24yxIFAlsIfowXHHJvYmVydC5q
 YXJ6bWlrQGZyZWUuZnIACgkQA/Z63R24yxI4gQ//S6EG4IeieVD8FGt7jcqp8dW/
 RDZUjK42yjTW6IRYDU5WMsTwELdYb7QwSlPe4PDxijbJC1PbeoMqLWHrZAFm+Ni5
 tDiFArgEegDbJ+nHf12fBHgUrUgoRJgnOEMKC/VGbGrfmYbfooyRsDu4Rw/pfAv0
 N3k+3+F27Ip7nzQmjZ0Uut+L0fibQz1Eoz49CB8xJ8mNZ79B/tsse3iXqqCksVRG
 0v2rGcJherHN751R02SY3sB2vg6puIGRnUiIBzMrNNT5GgD0aW3PE5yqjm0CCk61
 jTynydbrQr1A9zlAUgJqWrUKhW2Gt+C0uAGJkFh0ssRNGN6vZth8O69UaqBPXljA
 deUMOLHYhp255BKfBwi3EEvCGJPw8QHj3s0sqrjjN3aeYJG5wGBSz7An1VrLyJtR
 waGU3Ux6SAvwU2hVvXDAfICudBuKJcuUOblP5HZrVpRJyIQA5hrhAcMfyxU8ZCmY
 nXCDNZX0LUenj5NnPXFmi4+4kg1osmDU9TGmg5Zc1AicxiUbWQ8rmUJzj4geqpG/
 hOXMSr2D1wvu/bcM7cOT8WbUNBrXlLNXAfy83+CW5GSTsI1nY7i8X1c+wjbIEFd+
 5OvALK9B1R4xlqIqovnoXzYbd6h8thrfrXIlHMpoG9REB9gAqQVa4jz+nrnAFnVN
 s8l2UC4bpxKMnoCbzrQ=
 =/4US
 -----END PGP SIGNATURE-----

Merge tag 'pxa-for-4.18' of https://github.com/rjarzmik/linux into next/soc

This is is the pxa changes for v4.18 cycle :
 - change to phase out at24 eeprom platform data
 - add a missing wakeup pin on pxa320 SoCs

* tag 'pxa-for-4.18' of https://github.com/rjarzmik/linux:
  ARM: pxa3xx: enable external wakeup pins
  ARM: pxa: stargate2: use device properties for at24 eeprom

Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-02 01:19:38 -07:00
Joel Stanley 927c2fc2db ARM: dts: aspeed: Fix hwrng register address
The register address should be the full address of the rng, not the
offset from the start of the SCU.

Fixes: 5daa8212c0 ("ARM: dts: aspeed: Describe random number device")
Reviewed-by: Andrew Jeffery <andrew@aj.id.au>
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Olof Johansson <olof@lixom.net>
2018-06-02 01:18:53 -07:00
Matt Turner a00072a24a x86: msr-index.h: Correct SNB_C1/C3_AUTO_UNDEMOTE defines
According to the Intel Software Developers' Manual, Vol. 4, Order No.
335592, these macros have been reversed since they were added in the
initial turbostat commit. The reversed definitions were presumably
copied from turbostat.c to this file.

Fixes: 9c63a650bb ("tools/power/x86/turbostat: share kernel MSR #defines")
Signed-off-by: Matt Turner <mattst88@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Len Brown <len.brown@intel.com>
2018-06-01 23:12:45 -04:00
Anson Huang 64f929d824 ARM: dts: imx7: correct enet ipg clock
ENET "ipg" clock should be IMX7D_ENETx_IPG_ROOT_CLK
rather than IMX7D_ENET_AXI_ROOT_CLK which is for ENET bus
clock.

Based on Andy Duan's patch from the NXP kernel tree.

Signed-off-by: Anson Huang <Anson.Huang@nxp.com>
Reviewed-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Stephen Boyd <sboyd@kernel.org>
2018-06-01 12:15:39 -07:00
Greg Kroah-Hartman 929f45e324 kvm: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

This cleans up the error handling a lot, as this code will never get
hit.

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: "Radim KrÄmář" <rkrcmar@redhat.com>
Cc: Arvind Yadav <arvind.yadav.cs@gmail.com>
Cc: Eric Auger <eric.auger@redhat.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: kvm-ppc@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvm@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01 19:18:27 +02:00
Marc Orr d1e5b0e98e kvm: Make VM ioctl do valloc for some archs
The kvm struct has been bloating. For example, it's tens of kilo-bytes
for x86, which turns out to be a large amount of memory to allocate
contiguously via kzalloc. Thus, this patch does the following:
1. Uses architecture-specific routines to allocate the kvm struct via
   vzalloc for x86.
2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc
   when has_vhe() is true.

Other architectures continue to default to kalloc, as they have a
dependency on kalloc or have a small-enough struct kvm.

Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01 19:18:26 +02:00
Souptick Joarder 1499fa809e kvm: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler. For
now, this is just documenting that the function returns
a VM_FAULT value rather than an errno. Once all instances
are converted, vm_fault_t will become a distinct type.

commit 1c8f422059 ("mm: change return type to vm_fault_t")

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-01 19:18:25 +02:00
Paolo Bonzini d7a8bea363 KVM: s390: Cleanups for 4.18
- cleanups for nested, clock handling, crypto, storage keys and
 control register bits
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJa/rXhAAoJEBF7vIC1phx8+FMQAJ2hYlCKB2g52MnwRuT/QUNV
 cckX+RWfSgUT/3mBuWvTGGEepBXw/osVa8ocuwAN28E4rEwooXyyDQXUYuT6Uyyt
 3Gg1gkplCwmksZtilPMKpEIFpQ+kDEk1jx62986IRQUmPbQhBvIpxvY2gt6CXlEL
 VsHC0hNc4Gd1scelfUVh/0W8CN2DephzFEDU1qM33Z5e1j6vn/YQvdIsg4Y6OsCO
 IdnV5moNosQEPZTKBKnzi/iFsKWtlyW03llPWLqJ7wzJZUdPSpjCR35X2nheqPUl
 HOx3FfNHpdtq+UTQE+w++OvNyRsuvaOD7ueawmq+NG9B8EQuhu1zNmltdsi0s1jx
 cnOjiE3Lkyw4VPDpSDPR8tpchNoSslr2qsxGXjFF2PjcPInqueFFQjK3MNdo9w3d
 bn1zxChtJE6F1d5CorpGkW+q+dGC/vx+j0r8az/WHMS9HXpnVrNhhoLCAW8KJ9Py
 HcZmkNGGXMtl+1X/Of8TM3IrvNmvItyupuQ66LsL8Vc9oYLOcLHaNoImTAqlBbw0
 ZUWEIEChjkgL4Q6KZEnR7HHy/qtPXVvB39SEz5DAlkDJOW0eq8dJp5BPidTEFQcx
 aXMn2gkKrXS8q42+K0cZuy/90VxPTQegK2AshbFHkNHA6HqV/dce6tL71GBTzZMx
 BT1xMgTcXUjFoL3/0W/Q
 =Vdg5
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Cleanups for 4.18

- cleanups for nested, clock handling, crypto, storage keys and
control register bits
2018-06-01 19:17:41 +02:00
Paolo Bonzini 5eec43a1fa KVM/ARM updates for 4.18
- Lazy context-switching of FPSIMD registers on arm64
 - Allow virtual redistributors to be part of two or more MMIO ranges
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlsRY14VHG1hcmMuenlu
 Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDMGwP/A3FDrzGSjgC65m037/dsQj/Eniv
 NkpueEVO3Z8UN44j0TNdeUzj6vQD376GVDwnW3mFlQ416A4ZwwHkk8cQhbpP2UvQ
 EqKKUgujvLueZeuAwYG/DtrR9VZ6fh7QLD7Mv8DW/0AaNdBN2LyHEkW0qx7cSXqu
 PijTsImj9B8TSykYc0SlJz7Q7Y5QUOYbWrJqqa1cskOdmpN2ATInnA2haXeO7j8v
 lkb+WZ9R6xiJSzMCeLEzFV6tUvTiaSw5lVL64jpJhbkBNWPIVAza0erm9TSlQaTw
 d3uJlAy0W9UkXSSqvbmtXvBFqCyEOzZ0hwi2MF6RoVuFt1yXwLgHGps6OUkho4Kq
 pXWImaRHwxyQGrOY0qm0cxr+6TjYnjn8rIOzmzBOrKKq+aCIQ+Sl+CtNYzczQYeE
 rOFBQFsMlzSRJWyabUjhBGFNfDmZZaVFKnUekEqXXETtLxzLZtx+W9i4tzoA1stv
 y0+4yAjEyOQoRsAAE3GmzpDsu7Eu2sae6+lTo7DX1y+A7Wi94HKmy47sVjrS+evV
 2SLyVZ4mhwMzaQ7ngrjHLD1GXDlBxxk2X+NSmBVe5z4AsuWeoqy81f0rgjyCQNxo
 swEqs0k7mMDo8GQNjawwzhdDuHYm4gTX5iGs/Nxx6K4OoJ0bgv83yb/goArp+LEU
 /QWT4T37A/pEEECe
 =DUmC
 -----END PGP SIGNATURE-----

Merge tag 'kvmarm-for-v4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/ARM updates for 4.18

- Lazy context-switching of FPSIMD registers on arm64
- Allow virtual redistributors to be part of two or more MMIO ranges
2018-06-01 19:17:22 +02:00
Aneesh Kumar K.V 667416f385 powerpc/mm: Fix kernel crash on page table free
Fix the below crash on Book3E 64. pgtable_page_dtor expects struct
page *arg.

Also call the destructor on non book3s platforms correctly. This frees
up the split PTL locks correctly if we had allocated them before.

Call Trace:
  .kmem_cache_free+0x9c/0x44c (unreliable)
  .ptlock_free+0x1c/0x30
  .tlb_remove_table+0xdc/0x224
  .free_pgd_range+0x298/0x500
  .shift_arg_pages+0x10c/0x1e0
  .setup_arg_pages+0x200/0x25c
  .load_elf_binary+0x450/0x16c8
  .search_binary_handler.part.11+0x9c/0x248
  .do_execveat_common.isra.13+0x868/0xc18
  .run_init_process+0x34/0x4c
  .try_to_run_init_process+0x1c/0x68
  .kernel_init+0xdc/0x130
  .ret_from_kernel_thread+0x58/0x7c

Fixes: 702346768 ("powerpc/mm/nohash: Remove pte fragment dependency from nohash")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-02 01:48:11 +10:00
Mathieu Malaterre 8af1da4066 powerpc/prom: Fix %u/%llx usage since prom_printf() change
In commit eae5f709a4 ("powerpc: Add __printf verification to
prom_printf") __printf attribute was added to prom_printf(), which
means GCC started warning about type/format mismatches. As part of
that commit we changed some "%lx" formats to "%llx" where the type is
actually unsigned long long.

Unfortunately prom_printf() doesn't know how to print "%llx", it just
prints a literal "lx", eg:

  reserved memory map:
    lx - lx
    lx - lx

prom_printf() also doesn't know how to print "%u" (only "%lu"), it
just prints a literal "u", eg:

  Max number of cores passed to firmware: u (NR_CPUS = 2048)

Instead of:

  Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)

This commit adds support for the missing formatters.

Fixes: eae5f709a4 ("powerpc: Add __printf verification to prom_printf")
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-02 01:48:11 +10:00
Dave Martin 94b07c1f8c arm64: signal: Report signal frame size to userspace via auxv
Stateful CPU architecture extensions may require the signal frame
to grow to a size that exceeds the arch's MINSIGSTKSZ #define.
However, changing this #define is an ABI break.

To allow userspace the option of determining the signal frame size
in a more forwards-compatible way, this patch adds a new auxv entry
tagged with AT_MINSIGSTKSZ, which provides the maximum signal frame
size that the process can observe during its lifetime.

If AT_MINSIGSTKSZ is absent from the aux vector, the caller can
assume that the MINSIGSTKSZ #define is sufficient.  This allows for
a consistent interface with older kernels that do not provide
AT_MINSIGSTKSZ.

The idea is that libc could expose this via sysconf() or some
similar mechanism.

There is deliberately no AT_SIGSTKSZ.  The kernel knows nothing
about userspace's own stack overheads and should not pretend to
know.

For arm64:

The primary motivation for this interface is the Scalable Vector
Extension, which can require at least 4KB or so of extra space
in the signal frame for the largest hardware implementations.

To determine the correct value, a "Christmas tree" mode (via the
add_all argument) is added to setup_sigframe_layout(), to simulate
addition of all possible records to the signal frame at maximum
possible size.

If this procedure goes wrong somehow, resulting in a stupidly large
frame layout and hence failure of sigframe_alloc() to allocate a
record to the frame, then this is indicative of a kernel bug.  In
this case, we WARN() and no attempt is made to populate
AT_MINSIGSTKSZ for userspace.

For arm64 SVE:

The SVE context block in the signal frame needs to be considered
too when computing the maximum possible signal frame size.

Because the size of this block depends on the vector length, this
patch computes the size based not on the thread's current vector
length but instead on the maximum possible vector length: this
determines the maximum size of SVE context block that can be
observed in any signal frame for the lifetime of the process.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-06-01 15:53:10 +01:00
Dave Martin 87c021a814 arm64/sve: Thin out initialisation sanity-checks for sve_max_vl
Now that the kernel SVE support is reasonably mature, it is
excessive to default sve_max_vl to the invalid value -1 and then
sprinkle WARN_ON()s around the place to make sure it has been
initialised before use.  The cpufeatures code already runs pretty
early, and will ensure sve_max_vl gets initialised.

This patch initialises sve_max_vl to something sane that will be
supported by every SVE implementation, and removes most of the
sanity checks.

The checks in find_supported_vector_length() are retained for now.
If anything goes horribly wrong, we are likely to trip a check here
sooner or later.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-06-01 15:53:07 +01:00
Nicholas Piggin 1421dc6d48 powerpc/kbuild: Use flags variables rather than overriding LD/CC/AS
The powerpc toolchain can compile combinations of 32/64 bit and
big/little endian, so it's convenient to consider, e.g.,

  `CC -m64 -mbig-endian`

To be the C compiler for the purpose of invoking it to build target
artifacts. So overriding the CC variable to include these flags works
for this purpose.

Unfortunately that is not compatible with the way the proposed new
Kconfig macro language will work.

After previous patches in this series, these flags can be carefully
passed in using flags instead.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-01 23:08:09 +10:00
Nicholas Piggin af3901cbbd powerpc/kbuild: Remove CROSS32 defines from top level powerpc Makefile
Switch VDSO32 build over to use CROSS32_COMPILE directly, and have
it pass in -m32 after the standard c_flags. This allows endianness
overrides to be removed and the endian and bitness flags moved into
standard flags variables.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-01 23:08:06 +10:00
Nicholas Piggin 4bf4f42a2f powerpc/kbuild: Set default generic machine type for 32-bit compile
Some 64-bit toolchains uses the wrong ISA variant for compiling 32-bit
kernels, even with -m32. Debian's powerpc64le is one such case, and
that is because it is built with --with-cpu=power8.

So when cross compiling a 32-bit kernel with a 64-bit toolchain, set
-mcpu=powerpc initially, which is the generic 32-bit powerpc machine
type and scheduling model. CPU and platform code can override this
with subsequent -mcpu flags if necessary.

This is not done for 32-bit toolchains otherwise it would override
their defaults, which are presumably set appropriately for the
environment (moreso than a 64-bit cross compiler).

This fixes a lot of build failures due to incompatible assembly when
compiling 32-bit kernel with the Debian powerpc64le 64-bit toolchain.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2018-06-01 23:08:03 +10:00
Luc Van Oostenryck 1f2f01b122 kbuild: add machine size to CHECKFLAGS
By default, sparse assumes a 64bit machine when compiled on x86-64
and 32bit when compiled on anything else.

This can of course create all sort of problems for the other archs, like
issuing false warnings ('shift too big (32) for type unsigned long'), or
worse, failing to emit legitimate warnings.

Fix this by adding the -m32/-m64 flag, depending on CONFIG_64BIT,
to CHECKFLAGS in the main Makefile (and so for all archs).
Also, remove the now unneeded -m32/-m64 in arch specific Makefiles.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-06-01 11:36:58 +09:00
Simon Guo deeb879de9 KVM: PPC: Book3S PR: Enable kvmppc_get/set_one_reg_pr() for HTM registers
We need to migrate PR KVM during transaction and userspace will use
kvmppc_get_one_reg_pr()/kvmppc_set_one_reg_pr() APIs to get/set
transaction checkpoint state. This patch adds support for that.

So far, QEMU on PR KVM doesn't fully function for migration but the
savevm/loadvm can be done against a RHEL72 guest. During savevm/
loadvm procedure, the kvm ioctls will be invoked as well.

Test has been performed to savevm/loadvm for a guest running
a HTM test program:
https://github.com/justdoitqd/publicFiles/blob/master/test-tm-mig.c

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:31:08 +10:00
Simon Guo 8f04412699 KVM: PPC: Book3S: Remove load/put vcpu for KVM_GET_REGS/KVM_SET_REGS
In both HV and PR KVM, the KVM_SET_REGS/KVM_GET_REGS ioctl should
be able to perform without the vcpu loaded.

Since the vcpu mutex locking/unlock has been moved out of vcpu_load()
/vcpu_put(), KVM_SET_REGS/KVM_GET_REGS don't need to do ioctl with
the vcpu loaded anymore. This patch removes vcpu_load()/vcpu_put()
from KVM_SET_REGS/KVM_GET_REGS ioctl.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:31:03 +10:00
Simon Guo c8235c2891 KVM: PPC: Remove load/put vcpu for KVM_GET/SET_ONE_REG ioctl
Since the vcpu mutex locking/unlock has been moved out of vcpu_load()
/vcpu_put(), KVM_GET_ONE_REG and KVM_SET_ONE_REG doesn't need to do
ioctl with loading vcpu anymore. This patch removes vcpu_load()/vcpu_put()
from KVM_GET_ONE_REG and KVM_SET_ONE_REG ioctl.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:59 +10:00
Simon Guo b3cebfe8c1 KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl
Although we already have kvm_arch_vcpu_async_ioctl() which doesn't require
ioctl to load vcpu, the sync ioctl code need to be cleaned up when
CONFIG_HAVE_KVM_VCPU_ASYNC_IOCTL is not configured.

This patch moves vcpu_load/vcpu_put down to each ioctl switch case so that
each ioctl can decide to do vcpu_load/vcpu_put or not independently.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:53 +10:00
Simon Guo d234d68eb7 KVM: PPC: Book3S PR: Enable HTM for PR KVM for KVM_CHECK_EXTENSION ioctl
With current patch set, PR KVM now supports HTM. So this patch turns it
on for PR KVM.

Tested with:
https://github.com/justdoitqd/publicFiles/blob/master/test_kvm_htm_cap.c

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:48 +10:00
Simon Guo 7284ca8a5e KVM: PPC: Book3S PR: Support TAR handling for PR KVM HTM
Currently guest kernel doesn't handle TAR facility unavailable and it
always runs with TAR bit on. PR KVM will lazily enable TAR. TAR is not
a frequent-use register and it is not included in SVCPU struct.

Due to the above, the checkpointed TAR val might be a bogus TAR val.
To solve this issue, we will make vcpu->arch.fscr tar bit consistent
with shadow_fscr when TM is enabled.

At the end of emulating treclaim., the correct TAR val need to be loaded
into the register if FSCR_TAR bit is on.

At the beginning of emulating trechkpt., TAR needs to be flushed so that
the right tar val can be copied into tar_tm.

Tested with:
tools/testing/selftests/powerpc/tm/tm-tar
tools/testing/selftests/powerpc/ptrace/ptrace-tm-tar (remove DSCR/PPR
related testing).

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:43 +10:00
Simon Guo 68ab07b985 KVM: PPC: Book3S PR: Add guard code to prevent returning to guest with PR=0 and Transactional state
Currently PR KVM doesn't support transaction memory in guest privileged
state.

This patch adds a check at setting guest msr, so that we can never return
to guest with PR=0 and TS=0b10. A tabort will be emulated to indicate
this and fail transaction immediately.

[paulus@ozlabs.org - don't change the TM_CAUSE_MISC definition, instead
 use TM_CAUSE_KVM_FAC_UNAV.]

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:39 +10:00
Simon Guo 26798f88d5 KVM: PPC: Book3S PR: Add emulation for tabort. in privileged state
Currently privileged-state guest will be run with TM disabled.

Although the privileged-state guest cannot initiate a new transaction,
it can use tabort to terminate its problem state's transaction.
So it is still necessary to emulate tabort. for privileged-state guest.

Tested with:
https://github.com/justdoitqd/publicFiles/blob/master/test_tabort.c

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:31 +10:00
Simon Guo e32c53d1cf KVM: PPC: Book3S PR: Add emulation for trechkpt.
This patch adds host emulation when guest PR KVM executes "trechkpt.",
which is a privileged instruction and will trap into host.

We firstly copy vcpu ongoing content into vcpu tm checkpoint
content, then perform kvmppc_restore_tm_pr() to do trechkpt.
with updated vcpu tm checkpoint values.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:23 +10:00
Simon Guo 03c81682a9 KVM: PPC: Book3S PR: Add emulation for treclaim.
This patch adds support for "treclaim." emulation when PR KVM guest
executes treclaim. and traps to host.

We will firstly do treclaim. and save TM checkpoint. Then it is
necessary to update vcpu current reg content with checkpointed vals.
When rfid into guest again, those vcpu current reg content (now the
checkpoint vals) will be loaded into regs.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:19 +10:00
Simon Guo 19c585eb45 KVM: PPC: Book3S PR: Restore NV regs after emulating mfspr from TM SPRs
Currently kvmppc_handle_fac() will not update NV GPRs and thus it can
return with GUEST_RESUME.

However PR KVM guest always disables MSR_TM bit in privileged state.
If PR privileged-state guest is trying to read TM SPRs, it will
trigger TM facility unavailable exception and fall into
kvmppc_handle_fac().  Then the emulation will be done by
kvmppc_core_emulate_mfspr_pr().  The mfspr instruction can include a
RT with NV reg. So it is necessary to restore NV GPRs at this case, to
reflect the update to NV RT.

This patch make kvmppc_handle_fac() return GUEST_RESUME_NV for TM
facility unavailable exceptions in guest privileged state.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:14 +10:00
Simon Guo 5706340a33 KVM: PPC: Book3S PR: Always fail transactions in guest privileged state
Currently the kernel doesn't use transaction memory.
And there is an issue for privileged state in the guest that:
tbegin/tsuspend/tresume/tabort TM instructions can impact MSR TM bits
without trapping into the PR host. So following code will lead to a
false mfmsr result:
	tbegin	<- MSR bits update to Transaction active.
	beq 	<- failover handler branch
	mfmsr	<- still read MSR bits from magic page with
		transaction inactive.

It is not an issue for non-privileged guest state since its mfmsr is
not patched with magic page and will always trap into the PR host.

This patch will always fail tbegin attempt for privileged state in the
guest, so that the above issue is prevented. It is benign since
currently (guest) kernel doesn't initiate a transaction.

Test case:
https://github.com/justdoitqd/publicFiles/blob/master/test_tbegin_pr.c

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:10 +10:00
Simon Guo 533082ae86 KVM: PPC: Book3S PR: Emulate mtspr/mfspr using active TM SPRs
The mfspr/mtspr on TM SPRs(TEXASR/TFIAR/TFHAR) are non-privileged
instructions and can be executed by PR KVM guest in problem state
without trapping into the host. We only emulate mtspr/mfspr
texasr/tfiar/tfhar in guest PR=0 state.

When we are emulating mtspr tm sprs in guest PR=0 state, the emulation
result needs to be visible to guest PR=1 state. That is, the actual TM
SPR val should be loaded into actual registers.

We already flush TM SPRs into vcpu when switching out of CPU, and load
TM SPRs when switching back.

This patch corrects mfspr()/mtspr() emulation for TM SPRs to make the
actual source/dest be the actual TM SPRs.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:05 +10:00
Simon Guo 13989b65eb KVM: PPC: Book3S PR: Add math support for PR KVM HTM
The math registers will be saved into vcpu->arch.fp/vr and corresponding
vcpu->arch.fp_tm/vr_tm area.

We flush or giveup the math regs into vcpu->arch.fp/vr before saving
transaction. After transaction is restored, the math regs will be loaded
back into regs.

If there is a FP/VEC/VSX unavailable exception during transaction active
state, the math checkpoint content might be incorrect and we need to do
treclaim./load the correct checkpoint val/trechkpt. sequence to retry the
transaction. That will make our solution complicated. To solve this issue,
we always make the hardware guest MSR math bits (shadow_msr) consistent
with the MSR val which guest sees (kvmppc_get_msr()) when guest msr is
with tm enabled. Then all FP/VEC/VSX unavailable exception can be delivered
to guest and guest handles the exception by itself.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:30:00 +10:00
Simon Guo 8d2e2fc5e0 KVM: PPC: Book3S PR: Add transaction memory save/restore skeleton
The transaction memory checkpoint area save/restore behavior is
triggered when VCPU qemu process is switching out/into CPU, i.e.
at kvmppc_core_vcpu_put_pr() and kvmppc_core_vcpu_load_pr().

MSR TM active state is determined by TS bits:
    active: 10(transactional) or 01 (suspended)
    inactive: 00 (non-transactional)
We don't "fake" TM functionality for guest. We "sync" guest virtual
MSR TM active state(10 or 01) with shadow MSR. That is to say,
we don't emulate a transactional guest with a TM inactive MSR.

TM SPR support(TFIAR/TFAR/TEXASR) has already been supported by
commit 9916d57e64 ("KVM: PPC: Book3S PR: Expose TM registers").
Math register support (FPR/VMX/VSX) will be done at subsequent
patch.

Whether TM context need to be saved/restored can be determined
by kvmppc_get_msr() TM active state:
	* TM active - save/restore TM context
	* TM inactive - no need to do so and only save/restore
TM SPRs.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Suggested-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:55 +10:00
Simon Guo 66c33e796c KVM: PPC: Book3S PR: Add kvmppc_save/restore_tm_sprs() APIs
This patch adds 2 new APIs, kvmppc_save_tm_sprs() and
kvmppc_restore_tm_sprs(), for the purpose of TEXASR/TFIAR/TFHAR
save/restore.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:51 +10:00
Simon Guo de7ad93219 KVM: PPC: Book3S PR: Add new kvmppc_copyto/from_vcpu_tm APIs
This patch adds 2 new APIs: kvmppc_copyto_vcpu_tm() and
kvmppc_copyfrom_vcpu_tm().  These 2 APIs will be used to copy from/to TM
data between VCPU_TM/VCPU area.

PR KVM will use these APIs for treclaim. or trechkpt. emulation.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:47 +10:00
Simon Guo 36383a0862 KVM: PPC: Book3S PR: Avoid changing TS bits when exiting guest
PR KVM host usually runs with TM enabled in its host MSR value, and
with non-transactional TS value.

When a guest with TM active traps into PR KVM host, the rfid at the
tail of kvmppc_interrupt_pr() will try to switch TS bits from
S0 (Suspended & TM disabled) to N1 (Non-transactional & TM enabled).

That will leads to TM Bad Thing interrupt.

This patch manually sets target TS bits unchanged to avoid this
exception.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:42 +10:00
Simon Guo 401a89e937 KVM: PPC: Book3S PR: Implement RFID TM behavior to suppress change from S0 to N0
According to ISA specification for RFID, in MSR TM disabled and TS
suspended state (S0), if the target MSR is TM disabled and TS state is
inactive (N0), rfid should suppress this update.

This patch makes the RFID emulation of PR KVM consistent with this.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:38 +10:00
Simon Guo 95757bfc72 KVM: PPC: Book3S PR: Sync TM bits to shadow msr for problem state guest
MSR TS bits can be modified with non-privileged instruction such as
tbegin./tend.  That means guest can change MSR value "silently" without
notifying host.

It is necessary to sync the TM bits to host so that host can calculate
shadow msr correctly.

Note, privileged mode in the guest will always fail transactions so we
only take care of problem state mode in the guest.

The logic is put into kvmppc_copy_from_svcpu() so that
kvmppc_handle_exit_pr() can use correct MSR TM bits even when preemption
occurs.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:33 +10:00
Simon Guo 901938add3 KVM: PPC: Book3S PR: Pass through MSR TM and TS bits to shadow_msr
PowerPC TM functionality needs MSR TM/TS bits support in hardware level.
Guest TM functionality can not be emulated with "fake" MSR (msr in magic
page) TS bits.

This patch syncs TM/TS bits in shadow_msr with the MSR value in magic
page, so that the MSR TS value which guest sees is consistent with actual
MSR bits running in guest.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:28 +10:00
Simon Guo 25ddda07b6 KVM: PPC: Book3S PR: Transition to Suspended state when injecting interrupt
This patch simulates interrupt behavior per Power ISA while injecting
interrupt in PR KVM:
- When interrupt happens, transactional state should be suspended.

kvmppc_mmu_book3s_64_reset_msr() will be invoked when injecting an
interrupt. This patch performs this ISA logic in
kvmppc_mmu_book3s_64_reset_msr().

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:22 +10:00
Simon Guo caa3be92be KVM: PPC: Book3S PR: Add C function wrapper for _kvmppc_save/restore_tm()
Currently __kvmppc_save/restore_tm() APIs can only be invoked from
assembly function. This patch adds C function wrappers for them so
that they can be safely called from C function.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:17 +10:00
Simon Guo 7f386af7bd KVM: PPC: Book3S PR: Turn on FP/VSX/VMX MSR bits in kvmppc_save_tm()
kvmppc_save_tm() invokes store_fp_state/store_vr_state(). So it is
mandatory to turn on FP/VSX/VMX MSR bits for its execution, just
like what kvmppc_restore_tm() did.

Previously HV KVM has turned the bits on outside of function
kvmppc_save_tm().  Now we include this bit change in kvmppc_save_tm()
so that the logic is cleaner. And PR KVM can reuse it later.

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:29:12 +10:00
Simon Guo 6f597c6b63 KVM: PPC: Book3S PR: Add guest MSR parameter for kvmppc_save_tm()/kvmppc_restore_tm()
HV KVM and PR KVM need different MSR source to indicate whether
treclaim. or trecheckpoint. is necessary.

This patch add new parameter (guest MSR) for these kvmppc_save_tm/
kvmppc_restore_tm() APIs:
- For HV KVM, it is VCPU_MSR
- For PR KVM, it is current host MSR or VCPU_SHADOW_SRR1

This enhancement enables these 2 APIs to be reused by PR KVM later.
And the patch keeps HV KVM logic unchanged.

This patch also reworks kvmppc_save_tm()/kvmppc_restore_tm() to
have a clean ABI: r3 for vcpu and r4 for guest_msr.

During kvmppc_save_tm/kvmppc_restore_tm(), the R1 need to be saved
or restored. Currently the R1 is saved into HSTATE_HOST_R1. In PR
KVM, we are going to add a C function wrapper for
kvmppc_save_tm/kvmppc_restore_tm() where the R1 will be incremented
with added stackframe and save into HSTATE_HOST_R1. There are several
places in HV KVM to load HSTATE_HOST_R1 as R1, and we don't want to
bring risk or confusion by TM code.

This patch will use HSTATE_SCRATCH2 to save/restore R1 in
kvmppc_save_tm/kvmppc_restore_tm() to avoid future confusion, since
the r1 is actually a temporary/scratch value to be saved/stored.

[paulus@ozlabs.org - rebased on top of 7b0e827c69 ("KVM: PPC: Book3S HV:
 Factor fake-suspend handling out of kvmppc_save/restore_tm", 2018-05-30)]

Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
2018-06-01 10:27:59 +10:00
Russell King 10573ae547 ARM: spectre-v1: fix syscall entry
Prevent speculation at the syscall table decoding by clamping the index
used to zero on invalid system call numbers, and using the csdb
speculative barrier.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
2018-05-31 23:27:26 +01:00
Russell King 1d4238c56f ARM: spectre-v1: add array_index_mask_nospec() implementation
Add an implementation of the array_index_mask_nospec() function for
mitigating Spectre variant 1 throughout the kernel.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
2018-05-31 23:27:21 +01:00
Russell King a78d156587 ARM: spectre-v1: add speculation barrier (csdb) macros
Add assembly and C macros for the new CSDB instruction.

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Boot-tested-by: Tony Lindgren <tony@atomide.com>
Reviewed-by: Tony Lindgren <tony@atomide.com>
2018-05-31 23:27:16 +01:00
Catalin Marinas cb877710e5 Merge branch 'for-next/perf' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux
- perf/arm-cci: allow building as module

- perf/arm-ccn: demote dev_warn() to dev_dbg() in event_init()

- miscellaneous perf/arm cleanups

* 'for-next/perf' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
  ARM: mcpm, perf/arm-cci: export mcpm_is_available
  drivers/bus: arm-cci: fix build warnings
  drivers/perf: Remove ARM_SPE_PMU explicit PERF_EVENTS dependency
  drivers/perf: arm-ccn: don't log to dmesg in event_init
  perf/arm-cci: Allow building as a module
  perf/arm-cci: Remove pointless PMU disabling
  perf/arm-cc*: Fix MODULE_LICENSE() tags
  arm_pmu: simplify arm_pmu::handle_irq
  perf/arm-cci: Remove unnecessary period adjustment
  perf: simplify getting .drvdata
2018-05-31 18:09:38 +01:00
Marc Zyngier 5d81f7dc9b arm64: KVM: Add ARCH_WORKAROUND_2 discovery through ARCH_FEATURES_FUNC_ID
Now that all our infrastructure is in place, let's expose the
availability of ARCH_WORKAROUND_2 to guests. We take this opportunity
to tidy up a couple of SMCCC constants.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 18:00:59 +01:00
Marc Zyngier b4f18c063a arm64: KVM: Handle guest's ARCH_WORKAROUND_2 requests
In order to forward the guest's ARCH_WORKAROUND_2 calls to EL3,
add a small(-ish) sequence to handle it at EL2. Special care must
be taken to track the state of the guest itself by updating the
workaround flags. We also rely on patching to enable calls into
the firmware.

Note that since we need to execute branches, this always executes
after the Spectre-v2 mitigation has been applied.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 18:00:57 +01:00
Marc Zyngier 55e3748e89 arm64: KVM: Add ARCH_WORKAROUND_2 support for guests
In order to offer ARCH_WORKAROUND_2 support to guests, we need
a bit of infrastructure.

Let's add a flag indicating whether or not the guest uses
SSBD mitigation. Depending on the state of this flag, allow
KVM to disable ARCH_WORKAROUND_2 before entering the guest,
and enable it when exiting it.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 18:00:55 +01:00
Marc Zyngier 85478bab40 arm64: KVM: Add HYP per-cpu accessors
As we're going to require to access per-cpu variables at EL2,
let's craft the minimum set of accessors required to implement
reading a per-cpu variable, relying on tpidr_el2 to contain the
per-cpu offset.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 18:00:53 +01:00
Marc Zyngier 9cdc0108ba arm64: ssbd: Add prctl interface for per-thread mitigation
If running on a system that performs dynamic SSBD mitigation, allow
userspace to request the mitigation for itself. This is implemented
as a prctl call, allowing the mitigation to be enabled or disabled at
will for this particular thread.

Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 18:00:52 +01:00
Marc Zyngier 9dd9614f54 arm64: ssbd: Introduce thread flag to control userspace mitigation
In order to allow userspace to be mitigated on demand, let's
introduce a new thread flag that prevents the mitigation from
being turned off when exiting to userspace, and doesn't turn
it on on entry into the kernel (with the assumption that the
mitigation is always enabled in the kernel itself).

This will be used by a prctl interface introduced in a later
patch.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:35:32 +01:00
Marc Zyngier 647d0519b5 arm64: ssbd: Restore mitigation status on CPU resume
On a system where firmware can dynamically change the state of the
mitigation, the CPU will always come up with the mitigation enabled,
including when coming back from suspend.

If the user has requested "no mitigation" via a command line option,
let's enforce it by calling into the firmware again to disable it.

Similarily, for a resume from hibernate, the mitigation could have
been disabled by the boot kernel. Let's ensure that it is set
back on in that case.

Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:35:19 +01:00
Marc Zyngier 986372c436 arm64: ssbd: Skip apply_ssbd if not using dynamic mitigation
In order to avoid checking arm64_ssbd_callback_required on each
kernel entry/exit even if no mitigation is required, let's
add yet another alternative that by default jumps over the mitigation,
and that gets nop'ed out if we're doing dynamic mitigation.

Think of it as a poor man's static key...

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:35:06 +01:00
Marc Zyngier c32e1736ca arm64: ssbd: Add global mitigation state accessor
We're about to need the mitigation state in various parts of the
kernel in order to do the right thing for userspace and guests.

Let's expose an accessor that will let other subsystems know
about the state.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:34:57 +01:00
Marc Zyngier a43ae4dfe5 arm64: Add 'ssbd' command-line option
On a system where the firmware implements ARCH_WORKAROUND_2,
it may be useful to either permanently enable or disable the
workaround for cases where the user decides that they'd rather
not get a trap overhead, and keep the mitigation permanently
on or off instead of switching it on exception entry/exit.

In any case, default to the mitigation being enabled.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:34:49 +01:00
Marc Zyngier a725e3dda1 arm64: Add ARCH_WORKAROUND_2 probing
As for Spectre variant-2, we rely on SMCCC 1.1 to provide the
discovery mechanism for detecting the SSBD mitigation.

A new capability is also allocated for that purpose, and a
config option.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:34:38 +01:00
Marc Zyngier 5cf9ce6e5e arm64: Add per-cpu infrastructure to call ARCH_WORKAROUND_2
In a heterogeneous system, we can end up with both affected and
unaffected CPUs. Let's check their status before calling into the
firmware.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:34:27 +01:00
Marc Zyngier 8e2906245f arm64: Call ARCH_WORKAROUND_2 on transitions between EL0 and EL1
In order for the kernel to protect itself, let's call the SSBD mitigation
implemented by the higher exception level (either hypervisor or firmware)
on each transition between userspace and kernel.

We must take the PSCI conduit into account in order to target the
right exception level, hence the introduction of a runtime patching
callback.

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2018-05-31 17:34:01 +01:00
Kan Liang 9aae1780e7 perf/x86/intel/uncore: Clean up client IMC uncore
The counters in client IMC uncore are free running counters, not fixed
counters. It should be corrected. The new infrastructure for free
running counter should be applied.

Introducing a new type SNB_PCI_UNCORE_IMC_DATA for client IMC free
running counters.

Keeping the customized event_init() function to be compatible with old
event encoding.

Clean up other customized event_*() functions.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-8-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:29 +02:00
Kan Liang 5a6c9d94e9 perf/x86/intel/uncore: Expose uncore_pmu_event*() functions
Some uncores have customized PMU. For customized PMU, it does not need
to customize everything. For example, it only needs to customize init()
function for client IMC uncore. Other functions like
add()/del()/start()/stop()/read() can use generic code.

Expose the uncore_pmu_event_add/del/start/stop() functions.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-7-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:28 +02:00
Kan Liang 0f519f0352 perf/x86/intel/uncore: Support IIO free-running counters on SKX
As of Skylake Server, there are a number of free running counters in
each IIO Box that collect counts of per-box IO clocks and per-port
Input/Output x BW/Utilization.

The free running counters cannot be part of the existing IIO BOX,
because, quoting from Peter Zijlstra:

  "This will result in some (probably) unexpected scheduling artifacts.
   Probably the only way to really cure that is to have the free running
   counters in their own PMU and not share with the GP counters of this
   box."

So let's add a new PMU for the free running counters, as suggested.

The free-running counter is read-only and always active. Counting will
be suspended only when the IIO Box is powered down.

There are three types of IIO free-running counters on Skylake server, IO
CLOCKS counter, BANDWIDTH counters and UTILIZATION counters.
IO CLOCKS counter is a clock of IIO box.
BANDWIDTH counters are to count inbound(PCIe->CPU)/outbound(CPU->PCIe)
bandwidth.
UTILIZATION counters are to count input/output utilization.

The bit width of the free-running counters is 36-bits.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-6-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:28 +02:00
Kan Liang 0e0162dfcd perf/x86/intel/uncore: Add infrastructure for free running counters
There are a number of free running counters introduced for uncore, which
provide highly valuable information to a wide array of customers.
However, the generic uncore code doesn't support them yet.

The free running counters will be specially handled based on their
unique attributes:

 - They are read-only. They cannot be enabled/disabled.

 - The event and the counter are always 1:1 mapped. It doesn't need to
   be assigned nor tracked by event_list.

 - They are always active. It doesn't need to check the availability.

 - They have different bit width.

Also, using inline helpers to replace the check for fixed counter and
free running counter.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-5-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:28 +02:00
Kan Liang 927b2deb06 perf/x86/intel/uncore: Add new data structures for free running counters
There are a number of free running counters introduced for uncore, which
provide highly valuable information to a wide array of customers.
For example, Skylake Server has IIO free running counters to collect
Input/Output x BW/Utilization.

There is NO event available on the general purpose counters, that is
exactly the same as the free running counters. The generic uncore code
needs to be enhanced to support the new counters.

In the uncore document, there is no event-code assigned to free running
counters. Some events need to be defined to indicate the free running
counters. The events are encoded as event-code + umask-code.

The event-code for all free running counters is 0xff, which is the same
as the fixed counters:

- It has not been decided what code will be used for common events on
  future platforms. 0xff is the only one which will definitely not be
  used as any common event-code.
- Cannot re-use current events on the general purpose counters. Because
  there is NO event available, that is exactly the same as the free
  running counters.
- Even in the existing codes, the fixed counters for core, that have the
  same event-code, may count different things. Hence, it should not
  surprise the users if the free running counters that share the same
  event-code also count different things.
  Umask will be used to distinguish the counters.

The umask-code is used to distinguish a fixed counter and a free running
counter, and different types of free running counters.

For fixed counters, the umask-code is 0x0X, where X indicates the index
of the fixed counter, which starts from 0.

 - Compatible with the old event encoding.

 - Currently, there is only one fixed counter. There are still 15
   reserved spaces for extension.

For free running counters, the umask-code uses the rest of the space.
It would follow the format of 0xXY:

 - X stands for the type of free running counters, which starts from 1.

 - Y stands for the index of free running counters of same type, which
   starts from 0.

- The free running counters do different thing. It can be categorized to
  several types, according to the MSR location, bit width and
  definition. E.g. there are three types of IIO free running counters on
  Skylake server to monitor IO CLOCKS, BANDWIDTH and UTILIZATION  on
  different ports. It makes it easy to locate the free running counter
  of a specific type.

- So far, there are at most 8 counters of each type.  There are still 8
  reserved spaces for extension.

Introducing a new index to indicate the free running counters. Only one
index is enough for all free running counters. Because the free running
counters are always active, and the event and free running counter are
always 1:1 mapped, it does not need extra index to indicate the assigned
counter.

Introducing a new data structure to store free running counters related
information for each type. It includes the number of counters, bit
width, base address, offset between counters and offset between boxes.

Introducing several inline helpers to check index for fixed counter and
free running counter, validate free running counter event, and retrieve
the free running counter information according to box and event.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-4-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:28 +02:00
Kan Liang 4749f81964 perf/x86/intel/uncore: Correct fixed counter index check in generic code
There is no index which is bigger than UNCORE_PMC_IDX_FIXED. The only
exception is client IMC uncore, which has been specially handled.
For generic code, it is not correct to use >= to check fixed counter.
The code quality issue will bring problem when a new counter index is
introduced.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:28 +02:00
Kan Liang d71f11c076 perf/x86/intel/uncore: Correct fixed counter index check for NHM
For Nehalem and Westmere, there is only one fixed counter for W-Box.
There is no index which is bigger than UNCORE_PMC_IDX_FIXED.
It is not correct to use >= to check fixed counter.
The code quality issue will bring problem when new counter index is
introduced.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-2-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:28 +02:00
Kan Liang 2da331465f perf/x86/intel/uncore: Introduce customized event_read() for client IMC uncore
There are two free-running counters for client IMC uncore. The
customized event_init() function hard codes their index to
'UNCORE_PMC_IDX_FIXED' and 'UNCORE_PMC_IDX_FIXED + 1'.
To support the index 'UNCORE_PMC_IDX_FIXED + 1', the generic
uncore_perf_event_update is obscurely hacked.
The code quality issue will bring problems when a new counter index is
introduced into the generic code, for example, a new index for
free-running counter.

Introducing a customized event_read() function for client IMC uncore.
The customized function is copied from previous generic
uncore_pmu_event_read().
The index 'UNCORE_PMC_IDX_FIXED + 1' will be isolated for client IMC
uncore only.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: acme@kernel.org
Cc: eranian@google.com
Link: http://lkml.kernel.org/r/1525371913-10597-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:36:27 +02:00