- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.
ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
s390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixes
x86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizations
Generic:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements
There is a small conflict in arch/s390 due to an arch-wide field rename.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJZW4XTAAoJEL/70l94x66DkhMH/izpk54KI17PtyQ9VYI2sYeZ
BWK6Kl886g3ij4pFi3pECqjDJzWaa3ai+vFfzzpJJ8OkCJT5Rv4LxC5ERltVVmR8
A3T1I/MRktSC0VJLv34daPC2z4Lco/6SPipUpPnL4bE2HATKed4vzoOjQ3tOeGTy
dwi7TFjKwoVDiM7kPPDRnTHqCe5G5n13sZ49dBe9WeJ7ttJauWqoxhlYosCGNPEj
g8ZX8+cvcAhVnz5uFL8roqZ8ygNEQq2mgkU18W8ZZKuiuwR0gdsG0gSBFNTdwIMK
NoreRKMrw0+oLXTIB8SZsoieU6Qi7w3xMAMabe8AJsvYtoersugbOmdxGCr1lsA=
=OD7H
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"PPC:
- Better machine check handling for HV KVM
- Ability to support guests with threads=2, 4 or 8 on POWER9
- Fix for a race that could cause delayed recognition of signals
- Fix for a bug where POWER9 guests could sleep with interrupts pending.
ARM:
- VCPU request overhaul
- allow timer and PMU to have their interrupt number selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
s390:
- initial machine check forwarding
- migration support for the CMMA page hinting information
- cleanups and fixes
x86:
- nested VMX bugfixes and improvements
- more reliable NMI window detection on AMD
- APIC timer optimizations
Generic:
- VCPU request overhaul + documentation of common code patterns
- kvm_stat improvements"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
Update my email address
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
x86: kvm: mmu: use ept a/d in vmcs02 iff used in vmcs12
kvm: x86: mmu: allow A/D bits to be disabled in an mmu
x86: kvm: mmu: make spte mmio mask more explicit
x86: kvm: mmu: dead code thanks to access tracking
KVM: PPC: Book3S: Fix typo in XICS-on-XIVE state saving code
KVM: PPC: Book3S HV: Close race with testing for signals on guest entry
KVM: PPC: Book3S HV: Simplify dynamic micro-threading code
KVM: x86: remove ignored type attribute
KVM: LAPIC: Fix lapic timer injection delay
KVM: lapic: reorganize restart_apic_timer
KVM: lapic: reorganize start_hv_timer
kvm: nVMX: Check memory operand to INVVPID
KVM: s390: Inject machine check into the nested guest
KVM: s390: Inject machine check into the guest
tools/kvm_stat: add new interactive command 'b'
tools/kvm_stat: add new command line switch '-i'
tools/kvm_stat: fix error on interactive command 'g'
KVM: SVM: suppress unnecessary NMI singlestep on GIF=0 and nested exit
...
Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWVpX4A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymobgCfd0d13IfpZoq1N41wc6z2Z0xD7cwAnRMeH1/p
kEeISGpHPYP9f8PBh9FO
=Hfqt
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the big driver core update for 4.13-rc1.
The large majority of this is a lot of cleanup of old fields in the
driver core structures and their remaining usages in random drivers.
All of those fixes have been reviewed by the various subsystem
maintainers. There's also some small firmware updates in here, a new
kobject uevent api interface that makes userspace interaction easier,
and a few other minor things.
All of these have been in linux-next for a long while with no reported
issues"
* tag 'driver-core-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (56 commits)
arm: mach-rpc: ecard: fix build error
zram: convert remaining CLASS_ATTR() to CLASS_ATTR_RO()
driver-core: remove struct bus_type.dev_attrs
powerpc: vio_cmo: use dev_groups and not dev_attrs for bus_type
powerpc: vio: use dev_groups and not dev_attrs for bus_type
USB: usbip: convert to use DRIVER_ATTR_RW
s390: drivers: convert to use DRIVER_ATTR_RO/WO
platform: thinkpad_acpi: convert to use DRIVER_ATTR_RO/RW
pcmcia: ds: convert to use DRIVER_ATTR_RO
wireless: ipw2x00: convert to use DRIVER_ATTR_RW
net: ehea: convert to use DRIVER_ATTR_RO
net: caif: convert to use DRIVER_ATTR_RO
TTY: hvc: convert to use DRIVER_ATTR_RW
PCI: pci-driver: convert to use DRIVER_ATTR_WO
IB: nes: convert to use DRIVER_ATTR_RW
HID: hid-core: convert to use DRIVER_ATTR_RO and drv_groups
arm: ecard: fix dev_groups patch typo
tty: serdev: use dev_groups and not dev_attrs for bus_type
sparc: vio: use dev_groups and not dev_attrs for bus_type
hid: intel-ish-hid: use dev_groups and not dev_attrs for bus_type
...
Pull SMP hotplug updates from Thomas Gleixner:
"This update is primarily a cleanup of the CPU hotplug locking code.
The hotplug locking mechanism is an open coded RWSEM, which allows
recursive locking. The main problem with that is the recursive nature
as it evades the full lockdep coverage and hides potential deadlocks.
The rework replaces the open coded RWSEM with a percpu RWSEM and
establishes full lockdep coverage that way.
The bulk of the changes fix up recursive locking issues and address
the now fully reported potential deadlocks all over the place. Some of
these deadlocks have been observed in the RT tree, but on mainline the
probability was low enough to hide them away."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
cpu/hotplug: Constify attribute_group structures
powerpc: Only obtain cpu_hotplug_lock if called by rtasd
ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init
cpu/hotplug: Remove unused check_for_tasks() function
perf/core: Don't release cred_guard_mutex if not taken
cpuhotplug: Link lock stacks for hotplug callbacks
acpi/processor: Prevent cpu hotplug deadlock
sched: Provide is_percpu_thread() helper
cpu/hotplug: Convert hotplug locking to percpu rwsem
s390: Prevent hotplug rwsem recursion
arm: Prevent hotplug rwsem recursion
arm64: Prevent cpu hotplug rwsem recursion
kprobes: Cure hotplug lock ordering issues
jump_label: Reorder hotplug lock and jump_label_lock
perf/tracing/cpuhotplug: Fix locking order
ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus()
PCI: Replace the racy recursion prevention
PCI: Use cpu_hotplug_disable() instead of get_online_cpus()
perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode()
x86/perf: Drop EXPORT of perf_check_microcode
...
Pull timer updates from Thomas Gleixner:
"A rather large update for timers/timekeeping:
- compat syscall consolidation (Al Viro)
- Posix timer consolidation (Christoph Helwig / Thomas Gleixner)
- Cleanup of the device tree based initialization for clockevents and
clocksources (Daniel Lezcano)
- Consolidation of the FTTMR010 clocksource/event driver (Linus
Walleij)
- The usual set of small fixes and updates all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (93 commits)
timers: Make the cpu base lock raw
clocksource/drivers/mips-gic-timer: Fix an error code in 'gic_clocksource_of_init()'
clocksource/drivers/fsl_ftm_timer: Unmap region obtained by of_iomap
clocksource/drivers/tcb_clksrc: Make IO endian agnostic
clocksource/drivers/sun4i: Switch to the timer-of common init
clocksource/drivers/timer-of: Fix invalid iomap check
Revert "ktime: Simplify ktime_compare implementation"
clocksource/drivers: Fix uninitialized variable use in timer_of_init
kselftests: timers: Add test for frequency step
kselftests: timers: Fix inconsistency-check to not ignore first timestamp
time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
time: Clean up CLOCK_MONOTONIC_RAW time handling
posix-cpu-timers: Make timespec to nsec conversion safe
itimer: Make timeval to nsec conversion range limited
timers: Fix parameter description of try_to_del_timer_sync()
ktime: Simplify ktime_compare implementation
clocksource/drivers/fttmr010: Factor out clock read code
clocksource/drivers/fttmr010: Implement delay timer
clocksource/drivers: Add timer-of common init routine
clocksource/drivers/tcb_clksrc: Save timer context on suspend/resume
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle were:
- Add the SYSTEM_SCHEDULING bootup state to move various scheduler
debug checks earlier into the bootup. This turns silent and
sporadically deadly bugs into nice, deterministic splats. Fix some
of the splats that triggered. (Thomas Gleixner)
- A round of restructuring and refactoring of the load-balancing and
topology code (Peter Zijlstra)
- Another round of consolidating ~20 of incremental scheduler code
history: this time in terms of wait-queue nomenclature. (I didn't
get much feedback on these renaming patches, and we can still
easily change any names I might have misplaced, so if anyone hates
a new name, please holler and I'll fix it.) (Ingo Molnar)
- sched/numa improvements, fixes and updates (Rik van Riel)
- Another round of x86/tsc scheduler clock code improvements, in hope
of making it more robust (Peter Zijlstra)
- Improve NOHZ behavior (Frederic Weisbecker)
- Deadline scheduler improvements and fixes (Luca Abeni, Daniel
Bristot de Oliveira)
- Simplify and optimize the topology setup code (Lauro Ramos
Venancio)
- Debloat and decouple scheduler code some more (Nicolas Pitre)
- Simplify code by making better use of llist primitives (Byungchul
Park)
- ... plus other fixes and improvements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits)
sched/cputime: Refactor the cputime_adjust() code
sched/debug: Expose the number of RT/DL tasks that can migrate
sched/numa: Hide numa_wake_affine() from UP build
sched/fair: Remove effective_load()
sched/numa: Implement NUMA node level wake_affine()
sched/fair: Simplify wake_affine() for the single socket case
sched/numa: Override part of migrate_degrades_locality() when idle balancing
sched/rt: Move RT related code from sched/core.c to sched/rt.c
sched/deadline: Move DL related code from sched/core.c to sched/deadline.c
sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled
sched/fair: Spare idle load balancing on nohz_full CPUs
nohz: Move idle balancer registration to the idle path
sched/loadavg: Generalize "_idle" naming to "_nohz"
sched/core: Drop the unused try_get_task_struct() helper function
sched/fair: WARN() and refuse to set buddy when !se->on_rq
sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well
sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming
sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c
sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h>
sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h>
...
Pull EFI updates from Ingo Molnar:
"The main changes in this cycle were:
- Rework the EFI capsule loader to allow for workarounds for
non-compliant firmware (Ard Biesheuvel)
- Implement a capsule loader quirk for Quark X102x (Jan Kiszka)
- Enable SMBIOS/DMI support for the ARM architecture (Ard Biesheuvel)
- Add CONFIG_EFI_PGT_DUMP=y support for x86-32 and kexec (Sai
Praneeth)
- Fixes for EFI support for Xen dom0 guests running under x86-64
hosts (Daniel Kiper)"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen/efi: Initialize only the EFI struct members used by Xen
efi: Process the MEMATTR table only if EFI_MEMMAP is enabled
efi/arm: Enable DMI/SMBIOS
x86/efi: Extend CONFIG_EFI_PGT_DUMP support to x86_32 and kexec as well
efi/efi_test: Use memdup_user() helper
efi/capsule: Add support for Quark security header
efi/capsule-loader: Use page addresses rather than struct page pointers
efi/capsule-loader: Redirect calls to efi_capsule_setup_info() via weak alias
efi/capsule: Remove NULL test on kmap()
efi/capsule-loader: Use a cached copy of the capsule header
efi/capsule: Adjust return type of efi_capsule_setup_info()
efi/capsule: Clean up pr_err/_info() messages
efi/capsule: Remove pr_debug() on ENOMEM or EFAULT
efi/capsule: Fix return code on failing kmap/vmap
With the introduction of struct pci_host_bridge.map_irq pointer it is
possible to assign IRQs for all devices originating from a PCI host bridge
at probe time; this is implemented through pci_assign_irq() that relies on
the struct pci_host_bridge.map_irq pointer to map IRQ for a given device.
The benefits this brings are twofold:
- the IRQ for a device is assigned once at probe time
- the IRQ assignment works also for hotplugged devices
With all DT based PCI host bridges converted to the struct
pci_host_bridge.{map/swizzle}_irq hooks mechanism the DT IRQ allocation in
ARM64 pcibios_alloc_irq() is now redundant and can be removed.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Will Deacon <will.deacon@arm.com>
attribute_groups are not supposed to change at runtime. All functions
working with attribute_groups provided by <linux/sysfs.h> work with const
attribute_group. So mark the non-const structs as const.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- vcpu request overhaul
- allow timer and PMU to have their interrupt number
selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAllWCM0VHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDjJ0QAI16x6+trKhH31lTSYekYfqm4hZ2
Fp7IbALW9KNCaY35tZov2Zuh99qGRduxTh7ewqhKpON8kkU+UKj0F7zH22+vfN4m
yas/+uNr8R9VLyvea4ysPsgx8Q8v1Ix9setohHYNZIL9/klVqtaHpYvArHVF/mzq
p2j/NxRS2dlp9r2TtoMRMhA05u6r0wolhUuh+z9v2ipib0gfOBIG24jsqCTEcD9n
5A/cVd+ztYshkrV95h3y9peahwt3zOA4QBGzrQ2K25jp0s54nqhmC7JTNSa8dtar
YGW2MuAMoIFTwCFAlpwCzrwpOJFzF3Q6A8bOxei2fjclzjPMgT1xQxuhOoe4ntFa
lTPxSHalm5W6dFTW90YSo2DBcPe+N7sQkhjR0cCeY3GYsOFhXMLTlOl5Pt1YK1or
+3FAI74tFRKvVmb9mhZeGTvuzhDgRvtf3Qq5rjwlGzKc2BBOEgtMyj/Wgwo4N6Dz
IjOnoRaUGELoBCWoTorMxLpsPBdPVSUxNyJTdAhqZ/ZtT1xqjhFNLZcrVWmOTzDM
1cav+jZkla4sLmJSNDD54aCSvvtPHis0nZn9PRlh12xgOyYiAVx4K++MNuWP0P37
hbh1gbPT+FcoVxPurUsX/pjNlTucPZcBwFytZDQlpwtPBpEFzJiImLYe/PldRb0f
9WQOH1Y1+q14MF+N
=6hNK
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-for-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM updates for 4.13
- vcpu request overhaul
- allow timer and PMU to have their interrupt number
selected from userspace
- workaround for Cavium erratum 30115
- handling of memory poisonning
- the usual crop of fixes and cleanups
Conflicts:
arch/s390/include/asm/kvm_host.h
Now that compat_vfp_get() uses the regset API to copy the FPSCR
value out to userspace, compat_vfp_set() looks inconsistent. In
particular, compat_vfp_set() will fail if called with kbuf != NULL
&& ubuf == NULL (which is valid usage according to the regset API).
This patch fixes compat_vfp_set() to use user_regset_copyin(),
similarly to compat_vfp_get().
This also squashes a sparse warning triggered by the cast that
drops __user when calling get_user().
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
compat_vfp_set() checks for userspace trying to write an excessive
amount of data to the regset. However this check is conspicuous
for its absence from every other _set() in the arm64 ptrace
implementation. In fact, the core ptrace_regset() already clamps
userspace's iov_len to the regset size before the individual regset
.{get,set}() methods get called.
This patch removes the redundant check.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
If get_user() fails when reading the new FPSCR value from userspace
in compat_vfp_get(), then garbage* will be written to the task's
FPSR and FPCR registers.
This patch prevents this by checking the return from get_user()
first.
[*] Actually, zero, due to the behaviour of get_user() on error, but
that's still not what userspace expects.
Fixes: 478fcb2cdb ("arm64: Debugging support")
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
get_alt_insn() is used to read and create ARM instructions, which
are always stored in memory in little-endian order. These values
are thus correctly converted to/from native order when processed
but the pointers used to hold the address of these instructions
are declared as for native order values.
Fix this by declaring the pointers as __le32* instead of u32* and
make the few appropriate needed changes like removing the unneeded
cast '(u32*)' in front of __ALT_PTR()'s definition.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In the flattened device tree format, all integer properties are
in big-endian order.
Here the property "kaslr-seed" is read from the fdt and then
correctly converted to native order (via fdt64_to_cpu()) but the
pointer used for this is not annotated as being for big-endian.
Fix this by declaring the pointer as fdt64_t instead of u64
(fdt64_t being itself typedefed to __be64).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Here both variables 'cpu_id' and 'entry_point' are read via
read[lq]_relaxed(), from a little-endian annotated pointer
and then used as a native endian value.
This is correct since the read[lq]() family of function
internally do a little-to-native endian conversion.
But in this case, it is wrong to declare these variable as
little-endian since there are native ones.
Fix this by changing the declaration of these variables
as 'u32' or 'u64' instead of '__le32' / '__le64'.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Here the entrypoint, declared as a 64 bit integer, is read from
a pointer to 64bit integer but the read is done via readl_relaxed()
which is for 32bit quantities.
All the high bits will thus be lost which change the meaning
of the test against zero done later.
Fix this by using readq_relaxed() instead as it should be for
64bit quantities.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Here the functions reloc_insn_movw() & reloc_insn_imm() are used
to read, modify and write back ARM instructions, which are always
stored in memory in little-endian order. These values are thus
correctly converted to/from native order but the pointers used to
hold their addresses are declared as for native order values.
Fix this by declaring the pointers as __le32* and remove the
casts that are now unneeded.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
aarch64_insn_write() is used to write an instruction.
As on ARM64 in-memory instructions are always stored
in little-endian order, this function, taking the instruction
opcode in native order, correctly convert it to little-endian
before sending it to an helper function __aarch64_insn_write()
which will do the effective write.
This is all good, but the variable and argument holding the
converted value are not annotated for a little-endian value
but left for native values.
Fix this by adjusting the prototype of the helper and
directly using the result of cpu_to_le32() without passing
by an intermediate variable (which was not a distinct one
but the same as the one holding the native value).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The function arch64_insn_read() is used to read an instruction.
On AM64 instructions are always stored in little-endian order
and thus the function correctly do a little-to-native endian
conversion to the value just read.
However, the variable used to hold the value before the conversion
is not declared for a little-endian value but for a native one.
Fix this by using the correct type for the declaration: __le32
Note: This only works because the function reading the value,
probe_kernel_read((), takes a void pointer and void pointers
are endian-agnostic. Otherwise probe_kernel_read() should
also be properly annotated (or worse, need to be specialized).
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Here we're reading thumb or ARM instructions, which are always
stored in memory in little-endian order. These values are thus
correctly converted to native order but the intermediate value
should be annotated as for little-endian values.
Fix this by declaring the intermediate var as __le32 or __le16.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Here we're reading thumb or ARM instructions, which are always
stored in memory in little-endian order. These values are thus
correctly converted to native order but the intermediate value
should be annotated as for little-endian values.
Fix this by declaring the intermediate var as __le32 or __le16.
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When a kernel is built without CONFIG_ARM64_MODULE_PLTS, we don't
generate the expected branch instruction in ftrace_make_nop(). This
means we pass zero (rather than a valid branch) to ftrace_modify_code()
as the expected instruction to validate. This causes us to return
-EINVAL to the core ftrace code for a valid case, resulting in a splat
at boot time.
This was an unintended effect of commit:
687644209a ("arm64: ftrace: fix building without CONFIG_MODULES")
... which incorrectly moved the generation of the branch instruction
into the ifdef for CONFIG_ARM64_MODULE_PLTS.
This patch fixes the issue by moving the ifdef inside of the relevant
if-else case, and always checking that the branch is in range,
regardless of CONFIG_ARM64_MODULE_PLTS. This ensures that we generate
the expected branch instruction, and also improves our sanity checks.
For consistency, both ftrace_make_nop() and ftrace_make_call() are
updated with this pattern.
Fixes: 687644209a ("arm64: ftrace: fix building without CONFIG_MODULES")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch defines an extra_context signal frame record that can be
used to describe an expanded signal frame, and modifies the context
block allocator and signal frame setup and parsing code to create,
populate, parse and decode this block as necessary.
To avoid abuse by userspace, parse_user_sigframe() attempts to
ensure that:
* no more than one extra_context is accepted;
* the extra context data is a sensible size, and properly placed
and aligned.
The extra_context data is required to start at the first 16-byte
aligned address immediately after the dummy terminator record
following extra_context in rt_sigframe.__reserved[] (as ensured
during signal delivery). This serves as a sanity-check that the
signal frame has not been moved or copied without taking the extra
data into account.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
[will: add __force annotation when casting extra_datap to __user pointer]
Signed-off-by: Will Deacon <will.deacon@arm.com>
When debugging a kernel panic(), it can be useful to know which CPU
features have been detected by the kernel, as some code paths can depend
on these (and may have been patched at runtime).
This patch adds a notifier to dump the detected CPU caps (as a hex
string) at panic(), when we log other information useful for debugging.
On a Juno R1 system running v4.12-rc5, this looks like:
[ 615.431249] Kernel panic - not syncing: Fatal exception in interrupt
[ 615.437609] SMP: stopping secondary CPUs
[ 615.441872] Kernel Offset: disabled
[ 615.445372] CPU features: 0x02086
[ 615.448522] Memory Limit: none
A developer can decode this by looking at the corresponding
<asm/cpucaps.h> bits. For example, the above decodes as:
* bit 1: ARM64_WORKAROUND_DEVICE_LOAD_ACQUIRE
* bit 2: ARM64_WORKAROUND_845719
* bit 7: ARM64_WORKAROUND_834220
* bit 13: ARM64_HAS_32BIT_EL0
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When reading current's user-writable TLS register (which occurs
when dumping core for native tasks), it is possible that userspace
has modified it since the time the task was last scheduled out.
The new TLS register value is not guaranteed to have been written
immediately back to thread_struct in this case.
As a result, a coredump can capture stale data for this register.
Reading the register for a stopped task via ptrace is unaffected.
For native tasks, this patch explicitly flushes the TPIDR_EL0
register back to thread_struct before dumping when operating on
current, thus ensuring that coredump contents are up to date. For
compat tasks, the TLS register is not user-writable and so cannot
be out of sync, so no flush is required in compat_tls_get().
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
When reading the FPSIMD state of current (which occurs when dumping
core), it is possible that userspace has modified the FPSIMD
registers since the time the task was last scheduled out. Such
changes are not guaranteed to be reflected immedately in
thread_struct.
As a result, a coredump can contain stale values for these
registers. Reading the registers of a stopped task via ptrace is
unaffected.
This patch explicitly flushes the CPU state back to thread_struct
before dumping when operating on current, thus ensuring that
coredump contents are up to date.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, VFP registers are omitted from coredumps for compat
processes, due to a bug in the REGSET_COMPAT_VFP regset
implementation.
compat_vfp_get() needs to transfer non-contiguous data from
thread_struct.fpsimd_state, and uses put_user() to handle the
offending trailing word (FPSCR). This fails when copying to a
kernel address (i.e., kbuf && !ubuf), which is what happens when
dumping core. As a result, the ELF coredump core code silently
omits the NT_ARM_VFP note from the dump.
It would be possible to work around this with additional special
case code for the put_user(), but since user_regset_copyout() is
explicitly designed to handle this scenario it is cleaner to port
the put_user() to a user_regset_copyout() call, which this patch
does.
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Merge time(keeping) updates from John Stultz:
"Just a small set of changes, the biggest changes being the MONOTONIC_RAW
handling cleanup, and a new kselftest from Miroslav. Also a a clear
warning deprecating CONFIG_GENERIC_TIME_VSYSCALL_OLD, which affects ppc
and ia64."
Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
remove the duplicitive tk->raw_time.tv_nsec, which can be
stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
for monotonic time).
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Tested-by: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
This patch factors out the allocator for signal frame optional
records into a separate function, to ensure consistency and
facilitate later expansion.
No overrun checking is currently done, because the allocation is in
user memory and anyway the kernel never tries to allocate enough
space in the signal frame yet for an overrun to occur. This
behaviour will be refined in future patches.
The approach taken in this patch to allocation of the terminator
record is not very clean: this will also be replaced in subsequent
patches.
For future extension, a comment is added in sigcontext.h
documenting the current static allocations in __reserved[]. This
will be important for determining under what circumstances
userspace may or may not see an expanded signal frame.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In preparation for expanding the signal frame, this patch refactors
the signal frame setup code in setup_sigframe() into two separate
passes.
The first pass, setup_sigframe_layout(), determines the size of the
signal frame and its internal layout, including the presence and
location of optional records. The resulting knowledge is used to
allocate and locate the user stack space required for the signal
frame and to determine which optional records to include.
The second pass, setup_sigframe(), is called once the stack frame
is allocated in order to populate it with the necessary context
information.
As a result of these changes, it becomes more natural to represent
locations in the signal frame by a base pointer and an offset,
since the absolute address of each location is not known during the
layout pass. To be more consistent with this logic,
parse_user_sigframe() is refactored to describe signal frame
locations in a similar way.
This change has no effect on the signal ABI, but will make it
easier to expand the signal frame in future patches.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, rt_sigreturn does very limited checking on the
sigcontext coming from userspace.
Future additions to the sigcontext data will increase the potential
for surprises. Also, it is not clear whether the sigcontext
extension records are supposed to occur in a particular order.
To allow the parsing code to be extended more easily, this patch
factors out the sigcontext parsing into a separate function, and
adds extra checks to validate the well-formedness of the sigcontext
structure.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
In order to be able to increase the amount of the data currently
written to the __reserved[] array in the signal frame, it is
necessary to overwrite the locations currently occupied by the
{fp,lr} frame link record pushed at the top of the signal stack.
In order for this to work, this patch detaches the frame link
record from struct rt_sigframe and places it separately at the top
of the signal stack. This will allow subsequent patches to insert
data between it and __reserved[].
This change relies on the non-ABI status of the placement of the
frame record with respect to struct sigframe: this status is
undocumented, but the placement is not declared or described in the
user headers, and known unwinder implementations (libgcc,
libunwind, gdb) appear not to rely on it.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Conflicts:
kernel/sched/Makefile
Pick up the waitqueue related renames - it didn't get much feedback,
so it appears to be uncontroversial. Famous last words? ;-)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Recently vDSO support for CLOCK_MONOTONIC_RAW was added in
49eea433b3 ("arm64: Add support for CLOCK_MONOTONIC_RAW in
clock_gettime() vDSO"). Noticing that the core timekeeping code
never set tkr_raw.xtime_nsec, the vDSO implementation didn't
bother exposing it via the data page and instead took the
unshifted tk->raw_time.tv_nsec value which was then immediately
shifted left in the vDSO code.
Unfortunately, by accellerating the MONOTONIC_RAW clockid, it
uncovered potential 1ns time inconsistencies caused by the
timekeeping core not handing sub-ns resolution.
Now that the core code has been fixed and is actually setting
tkr_raw.xtime_nsec, we need to take that into account in the
vDSO by adding it to the shifted raw_time value, in order to
fix the user-visible inconsistency. Rather than do that at each
use (and expand the data page in the process), instead perform
the shift/addition operation when populating the data page and
remove the shift from the vDSO code entirely.
[jstultz: minor whitespace tweak, tried to improve commit
message to make it more clear this fixes a regression]
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Tested-by: Daniel Mentz <danielmentz@google.com>
Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: "stable #4 . 8+" <stable@vger.kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Link: http://lkml.kernel.org/r/1496965462-20003-4-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel watchdog is a great debugging tool for finding tasks that
consume a disproportionate amount of CPU time in contiguous chunks. One
can imagine building a similar watchdog for arbitrary driver threads
using save_stack_trace_tsk() and print_stack_trace(). However, this is
not viable for dynamically loaded driver modules on ARM platforms
because save_stack_trace_tsk() is not exported for those architectures.
Export save_stack_trace_tsk() for the ARM64 architecture to align with
x86 and support various debugging use cases such as arbitrary driver
thread watchdog timers.
Signed-off-by: Dustin Brown <dustinb@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Some Cavium Thunder CPUs suffer a problem where a KVM guest may
inadvertently cause the host kernel to quit receiving interrupts.
Use the Group-0/1 trapping in order to deal with it.
[maz]: Adapted patch to the Group-0/1 trapping, reworked commit log
Tested-by: Alexander Graf <agraf@suse.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <cdall@linaro.org>
The function name is now renamed to 'timer_probe' for consistency with
the CLOCKSOURCE_OF_DECLARE => TIMER_OF_DECLARE change.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
When CONFIG_MODULES is disabled, we cannot dereference a module pointer:
arch/arm64/kernel/ftrace.c: In function 'ftrace_make_call':
arch/arm64/kernel/ftrace.c:107:36: error: dereferencing pointer to incomplete type 'struct module'
trampoline = (unsigned long *)mod->arch.ftrace_trampoline;
Also, the within_module() function is not defined:
arch/arm64/kernel/ftrace.c: In function 'ftrace_make_nop':
arch/arm64/kernel/ftrace.c:171:8: error: implicit declaration of function 'within_module'; did you mean 'init_module'? [-Werror=implicit-function-declaration]
This addresses both by adding replacing the IS_ENABLED(CONFIG_ARM64_MODULE_PLTS)
checks with #ifdef versions.
Fixes: e71a4e1beb ("arm64: ftrace: add support for far branches to dynamic ftrace")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, dynamic ftrace support in the arm64 kernel assumes that all
core kernel code is within range of ordinary branch instructions that
occur in module code, which is usually the case, but is no longer
guaranteed now that we have support for module PLTs and address space
randomization.
Since on arm64, all patching of branch instructions involves function
calls to the same entry point [ftrace_caller()], we can emit the modules
with a trampoline that has unlimited range, and patch both the trampoline
itself and the branch instruction to redirect the call via the trampoline.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: minor clarification to smp_wmb() comment]
Signed-off-by: Will Deacon <will.deacon@arm.com>
When turning branch instructions into NOPs, we attempt to validate the
action by comparing the old value at the call site with the opcode of
a direct relative branch instruction pointing at the old target.
However, these call sites are statically initialized to call _mcount(),
and may be redirected via a PLT entry if the module is loaded far away
from the kernel text, leading to false negatives and spurious errors.
So skip the validation if CONFIG_ARM64_MODULE_PLTS is configured.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Wire up the existing arm64 support for SMBIOS tables (aka DMI) for ARM as
well, by moving the arm64 init code to drivers/firmware/efi/arm-runtime.c
(which is shared between ARM and arm64), and adding a asm/dmi.h header to
ARM that defines the mapping routines for the firmware tables.
This allows userspace to access these tables to discover system information
exposed by the firmware. It also sets the hardware name used in crash
dumps, e.g.:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = ed3c0000
[00000000] *pgd=bf1f3835
Internal error: Oops: 817 [#1] SMP THUMB2
Modules linked in:
CPU: 0 PID: 759 Comm: bash Not tainted 4.10.0-09601-g0e8f38792120-dirty #112
Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
^^^
NOTE: This does *NOT* enable or encourage the use of DMI quirks, i.e., the
the practice of identifying the platform via DMI to decide whether
certain workarounds for buggy hardware and/or firmware need to be
enabled. This would require the DMI subsystem to be enabled much
earlier than we do on ARM, which is non-trivial.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20170602135207.21708-14-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Commit 3fde2999fa ("arm64: cpufeature: Don't dump useless backtrace on
CPU_OUT_OF_SPEC") changed the cpufeature detection code to use add_taint
instead of WARN_TAINT_ONCE when detecting a heterogeneous system with
mismatched feature support. Unfortunately, this resulted in all systems
getting the taint, regardless of any feature mismatch.
This patch fixes the problem by conditionalising the taint on detecting
a feature mismatch.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that some functions that deal with arch topology information live
under drivers, there is a clash of naming that might create confusion.
Tidy things up by creating a topology namespace for interfaces used by
arch code; achieve this by prepending a 'topology_' prefix to driver
interfaces.
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>