If a vm with no VCPUs is created, the injection of a floating irq
leads to an endless loop in the kernel.
Let's skip the search for a destination VCPU for a floating irq if no
VCPUs were created.
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch introduces a new module parameter for the KVM module; when it
is present, KVM attempts a bit of polling on every HLT before scheduling
itself out via kvm_vcpu_block.
This parameter helps a lot for latency-bound workloads---in particular
I tested it with O_DSYNC writes with a battery-backed disk in the host.
In this case, writes are fast (because the data doesn't have to go all
the way to the platters) but they cannot be merged by either the host or
the guest. KVM's performance here is usually around 30% of bare metal,
or 50% if you use cache=directsync or cache=writethrough (these
parameters avoid that the guest sends pointless flush requests, and
at the same time they are not slow because of the battery-backed cache).
The bad performance happens because on every halt the host CPU decides
to halt itself too. When the interrupt comes, the vCPU thread is then
migrated to a new physical CPU, and in general the latency is horrible
because the vCPU thread has to be scheduled back in.
With this patch performance reaches 60-65% of bare metal and, more
important, 99% of what you get if you use idle=poll in the guest. This
means that the tunable gets rid of this particular bottleneck, and more
work can be done to improve performance in the kernel or QEMU.
Of course there is some price to pay; every time an otherwise idle vCPUs
is interrupted by an interrupt, it will poll unnecessarily and thus
impose a little load on the host. The above results were obtained with
a mostly random value of the parameter (500000), and the load was around
1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
that can be used to tune the parameter. It counts how many HLT
instructions received an interrupt during the polling period; each
successful poll avoids that Linux schedules the VCPU thread out and back
in, and may also avoid a likely trip to C1 and back for the physical CPU.
While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
Of these halts, almost all are failed polls. During the benchmark,
instead, basically all halts end within the polling period, except a more
or less constant stream of 50 per second coming from vCPUs that are not
running the benchmark. The wasted time is thus very low. Things may
be slightly different for Windows VMs, which have a ~10 ms timer tick.
The effect is also visible on Marcelo's recently-introduced latency
test for the TSC deadline timer. Though of course a non-RT kernel has
awful latency bounds, the latency of the timer is around 8000-10000 clock
cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC
deadline timer, thus, the effect is both a smaller average latency and
a smaller variance.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Kim Phillips reported following build failure.
LD init/built-in.o
mm/built-in.o: In function `free_pages_prepare':
mm/page_alloc.c:770: undefined reference to `.kernel_map_pages'
mm/built-in.o: In function `prep_new_page':
mm/page_alloc.c:933: undefined reference to `.kernel_map_pages'
mm/built-in.o: In function `map_pages':
mm/compaction.c:61: undefined reference to `.kernel_map_pages'
make: *** [vmlinux] Error 1
Reason for this problem is that commit 031bc5743f
("mm/debug-pagealloc: make debug-pagealloc boottime configurable")
forgot to remove the old declaration of kernel_map_pages() for some
architectures. This patch removes them to fix build failure.
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With SMT we can have more than 256 CPUs. Let's make them available.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use a brcl 0,2 instruction for jump label nops during compile time,
so we don't mix up the different nops during mcount/hotpatch call
site detection.
The initial jump label code instruction replacement will exchange
these instructions with either a branch or a brcl 0,0 instruction.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add sanity checks to verify that only expected code will be replaced.
If the code patterns do not match print the code patterns and panic,
since something went terribly wrong.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make use of gcc's hotpatch support to generate better code for ftrace
function tracing.
The generated code now contains only a six byte nop in each function
prologue instead of a 24 byte code block which will be runtime patched to
support function tracing.
With the new code generation the runtime overhead for supporting function
tracing is close to zero, while the original code did show a significant
performance impact.
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Christian Borntraeger reported that the now missing diag 44 calls (voluntary
time slice end) does cause a performance regression for stop_machine() calls
if a machine has more virtual cpus than the host has physical cpus.
This patch mainly reverts 57f2ffe14f ("s390: remove diag 44 calls from
cpu_relax()") with the exception that we still do not issue diag 44 calls if
running with smt enabled. Due to group scheduling algorithms when running in
LPAR this would lead to significant latencies.
However, when running in LPAR we do not have more virtual than physical cpus.
Reported-and-tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add the compare-and-delay instruction to the spin-lock and rw-lock
retry loops. A CPU executing the compare-and-delay instruction stops
until the lock value has changed. This is done to make the locking
code for contended locks to behave better in regard to the multi-
hreading facility. A thread of a core executing a compare-and-delay
will allow the other threads of a core to get a larger share of the
core resources.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When we convert interrupt data from struct kvm_s390_interrupt to
struct kvm_s390_irq we need to check the data in the input parameter
not the output parameter.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We have to delete the allocated interrupt info if __inject_vm() fails.
Otherwise user space can keep flooding kvm with floating interrupts and
provoke more and more memory leaks.
Reported-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org # v3.15+
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Created new KVM device attributes for indicating whether the AES and
DES/TDES protected key functions are available for programs running
on the KVM guest. The attributes are used to set up the controls in
the guest SIE block that specify whether programs running on the
guest will be given access to the protected key functions available
on the s390 hardware.
Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[split MSA4/protected key into two patches]
Provide controls for setting/getting the guest TOD clock based on the VM
attribute interface.
Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of
Day clock value.
TOD_HIGH is presently always set to 0. In the future it will contain a high
order expansion of the tod clock value after it overflows the 64-bits of
the TOD.
Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
When injecting SIGP set prefix or a machine check, we trace
the values in our per-vcpu local_int data structure instead
of the parameters passed to the function.
Fix this by changing the trace statement to use the correct values.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Currently we are always setting the wrong bit in the
bitmap for pending emergency signals. Instead of using
emerg.code from the passed in irq parameter, we use the
value in our per-vcpu local_int structure, which is always zero.
That means all emergency signals will have address 0 as parameter.
If two CPUs send a SIGP to the same target, one might be lost.
Let's fix this by using the value from the parameter and
also trace the correct value.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The handler for MVPG partial execution interception does not take
the current CPU addressing mode into account yet, so addresses are
always treated as 64-bit addresses. For correct behaviour, we should
properly handle 24-bit and 31-bit addresses, too.
Since MVPG is defined to work with logical addresses, we can simply
use guest_translate_address() to achieve the required behaviour
(since DAT is disabled here, guest_translate_address() skips the MMU
translation and only translates the address via kvm_s390_logical_to_effective()
and kvm_s390_real_to_abs(), which is exactly what we want here).
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The kvm mutex was (probably) used to protect against cpu hotplug.
The current code no longer needs to protect against that, as we only
rely on CPU data structures that are guaranteed to be available
if we can access the CPU. (e.g. vcpu_create will put the cpu
in the array AFTER the cpu is ready).
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Most SIGP orders are handled partially in kernel and partially in
user space. In order to:
- Get a correct SIGP SET PREFIX handler that informs user space
- Avoid race conditions between concurrently executed SIGP orders
- Serialize SIGP orders per VCPU
We need to handle all "slow" SIGP orders in user space. The remaining
ones to be handled completely in kernel are:
- SENSE
- SENSE RUNNING
- EXTERNAL CALL
- EMERGENCY SIGNAL
- CONDITIONAL EMERGENCY SIGNAL
According to the PoP, they have to be fast. They can be executed
without conflicting to the actions of other pending/concurrently
executing orders (e.g. STOP vs. START).
This patch introduces a new capability that will - when enabled -
forward all but the mentioned SIGP orders to user space. The
instruction counters in the kernel are still updated.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We need a way to clear the async pfault queue from user space (e.g.
for resets and SIGP SET ARCHITECTURE).
This patch simply clears the queue as soon as user space sets the
invalid pfault token. The definition of the invalid token is moved
to uapi.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Only one external call may be pending at a vcpu at a time. For this
reason, we have to detect whether the SIGP externcal call interpretation
facility is available. If so, all external calls have to be injected
using this mechanism.
SIGP EXTERNAL CALL orders have to return whether another external
call is already pending. This check was missing until now.
SIGP SENSE hasn't returned yet in all conditions whether an external
call was pending.
If a SIGP EXTERNAL CALL irq is to be injected and one is already
pending, -EBUSY is returned.
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch introduces the infrastructure to check whether the SIGP
Interpretation Facility is installed on all VCPUs in the configuration.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch cleanes up the the SIGP SET PREFIX code.
A SIGP SET PREFIX irq may only be injected if the target vcpu is
stopped. Let's move the checking code into the injection code and
return -EBUSY if the target vcpu is not stopped.
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
As a SIGP STOP is an interrupt with the least priority, it may only result
in stop of the vcpu when no other interrupts are left pending.
To detect whether a non-stop irq is pending, we need a way to mask out
stop irqs from the general kvm_cpu_has_interrupt() function. For this
reason, the existing function (with an outdated name) is replaced by
kvm_s390_vcpu_has_irq() which allows to mask out pending stop irqs.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
This patch removes the famous action_bits and moves the handling of
SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt.
The new local interrupt infrastructure is used to track pending stop
requests.
STOP irqs are the only irqs that don't get actively delivered. They
remain pending until the stop function is executed (=stop intercept).
If another STOP irq is already pending, -EBUSY will now be returned
(needed for the SIGP handling code).
Migration of pending SIGP STOP (AND STORE STATUS) orders should now
be supported out of the box.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
In order to get rid of the action_flags and to properly migrate pending SIGP
STOP irqs triggered e.g. by SIGP STOP AND STORE STATUS, we need to remember
whether to store the status when stopping.
For this reason, a new parameter (flags) for the SIGP STOP irq is introduced.
These flags further define details of the requested STOP and can be easily
migrated.
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Patch 0759d0681c ("KVM: s390: cleanup handle_wait by reusing
kvm_vcpu_block") changed the way pending guest clock comparator
interrupts are detected. It was assumed that as soon as the hrtimer
wakes up, the condition for the guest ckc is satisfied.
This is however only true as long as adjclock() doesn't speed
up the monotonic clock. Reason is that the hrtimer is based on
CLOCK_MONOTONIC, the guest clock comparator detection is based
on the raw TOD clock. If CLOCK_MONOTONIC runs faster than the
TOD clock, the hrtimer wakes the target VCPU up too early and
the target VCPU will not detect any pending interrupts, therefore
going back to sleep. It will never be woken up again because the
hrtimer has finished. The VCPU is stuck.
As a quick fix, we have to forward the hrtimer until the guest
clock comparator is really due, to guarantee properly timed wake
ups.
As the hrtimer callback might be triggered on another cpu, we
have to make sure that the timer is really stopped and not currently
executing the callback on another cpu. This can happen if the vcpu
thread is scheduled onto another physical cpu, but the timer base
is not migrated. So lets use hrtimer_cancel instead of try_to_cancel.
A proper fix might be to introduce a RAW based hrtimer.
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The hrtimer that handles the wait with enabled timer interrupts
should not be disturbed by changes of the host time.
This patch changes our hrtimer to be based on a monotonic clock.
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We sometimes get an underflow for the sleep duration, which most
likely won't result in the short sleep time we wanted.
So let's check for sleep duration underflows and directly continue
to run the guest if we get one.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
With commit c6c956b80b ("KVM: s390/mm: support gmap page tables with less
than 5 levels") we are able to define a limit for the guest memory size.
As we round up the guest size in respect to the levels of page tables
we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB.
We currently limit the guest size to 16 TB, which means we end up
creating a page table structure supporting guest sizes up to 8192 TB.
This patch introduces an interface that allows userspace to tune
this limit. This may bring performance improvements for small guests.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
As we will allow in a later patch to recreate gmaps with new limits,
we need to make sure that vcpus get their reference for that gmap
after they increased the online_vcpu counter, so there is no possible race.
While we are doing this, we also can simplify the vcpu_init function, by
moving ucontrol specifics to an own function.
That way we also start now setting the kvm_valid_regs for the ucontrol path.
Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
sparse rightfully complains about
warning: symbol '__inject_extcall' was not declared. Should it be static?
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
The return value of kvm_arch_vcpu_postcreate is not checked in its
caller. This is okay, because only x86 provides vcpu_postcreate right
now and it could only fail if vcpu_load failed. But that is not
possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so
just get rid of the unchecked return value.
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Pull s390 fixes from Martin Schwidefsky:
"Five more bug fixes from Michael for the s390 BPF jit"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/bpf: Zero extend parameters before calling C function
s390/bpf: Fix sk_load_byte_msh()
s390/bpf: Fix offset parameter for skb_copy_bits()
s390/bpf: Fix skb_copy_bits() parameter passing
s390/bpf: Fix JMP_JGE_K (A >= K) and JMP_JGT_K (A > K)
First two are minor fallout from the param rework which went in this merge
window.
Next three are a series which fixes a longstanding (but never previously
reported and unlikely , so no CC stable) race between kallsyms and freeing
the init section.
Finally, a minor cleanup as our module refcount will now be -1 during
unload.
Thanks,
Rusty.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJUwEmwAAoJENkgDmzRrbjx77kP/1cNQR2eG2sBwokg3q0tvHnQ
IKqEXErW7NvxRa+RAMEmy2uQoGt6+uNklAbtyJEYM9oR1NieFbPi2yrt9Xn5SAXS
Brp1S8WYBMilA3W3o6I0trFDRWHdpdtkKIQwLWgJNSEWjbTXh8bSwp/2X1rlOPyI
ZmphCMOQMU2/uFEyJhTz1WMEV8eVXiRLN8OxSkPxToxdZoGln2U8IBCCCJC9OG+f
Cf3eMgEcNdEXNcPKqr11NIcHkAx6M6qI/eMDOqk151PslHa8lbis6di9Z87aE0ps
i8PyrkJGTmgM9cCjXwE8deNseeCmuKYlbPIF+NoxcqtvZstfaMrISwTIEuzV4JHi
p13YhDxy4XiC3H6pKHub/jo7UCl+wWtFh9SqpqGgduFX/p6FtUHQJm0S0X/DFFZt
C+2MFVSe6HRHE8B7bFz86+619Qd/rU7+806CLCE+NbYlYAKIBYKzWt/bml6VH3RJ
OjwXhQqmznWhJjsfD3BUUUpZpHijmylI9gAe2F1oErb8YjRU6gIm7P8hlkOzD7AS
TfGHPFq2raQcfAiGdVmvkbvvhvYZXnB3WVsAexrYoqrT9I8eEfRI+7SkL75MLR2E
ikzhJS3SHkAUAd7fUVMt7xMwh0jmhsPjWCCqc13m6UUFoXhTaDgKgPGftltN0bI2
g85+enZ3/eca6xh/KxvW
=Kf9b
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module and param fixes from Rusty Russell:
"Surprising number of fixes this merge window :(
The first two are minor fallout from the param rework which went in
this merge window.
The next three are a series which fixes a longstanding (but never
previously reported and unlikely , so no CC stable) race between
kallsyms and freeing the init section.
Finally, a minor cleanup as our module refcount will now be -1 during
unload"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: make module_refcount() a signed integer.
module: fix race in kallsyms resolution during module load success.
module: remove mod arg from module_free, rename module_memfree().
module_arch_freeing_init(): new hook for archs before module->module_init freed.
param: fix uninitialized read with CONFIG_DEBUG_LOCK_ALLOC
param: initialize store function to NULL if not available.
Commit 725908110a1f ("s390: add SMT support") added a check for
CONFIG_ZFCPDUMP. But the Kconfig symbol ZFCPDUMP was removed in v3.16
through commit bf28a5970d ("s390/dump: Remove CONFIG_ZFCPDUMP"). So
this check will always evaluate to false. No one noticed probably
because the code also checks for CONFIG_CRASH_DUMP which "also enables
s390 zfcpdump".
Dump the unneeded check.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The multi-threading facility is introduced with the z13 processor family.
This patch adds code to detect the multi-threading facility. With the
facility enabled each core will surface multiple hardware threads to the
system. Each hardware threads looks like a normal CPU to the operating
system with all its registers and properties.
The SCLP interface reports the SMT topology indirectly via the maximum
thread id. Each reported CPU in the result of a read-scp-information
is a core representing a number of hardware threads.
To reflect the reduced CPU capacity if two hardware threads run on a
single core the MT utilization counter set is used to normalize the
raw cputime obtained by the CPU timer deltas. This scaled cputime is
reported via the taskstats interface. The normal /proc/stat numbers
are based on the raw cputime and are not affected by the normalization.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Avoid cache aliasing on z13 by aligning shared objects to multiples
of 512K. The virtual addresses of a page from a shared file needs
to have identical bits in the range 2^12 to 2^18.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Virtio drivers should map the part of the range they need, not
necessarily all of it.
To this end, support mapping ranges within BAR on s390.
Since multiple ranges can now be mapped within a BAR, we keep track of
the number of mappings created, and only clear out the mapping for a BAR
when this number reaches 0.
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Tested-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Pull RCU updates from Paul E. McKenney:
- Documentation updates.
- Miscellaneous fixes.
- Preemptible-RCU fixes, including fixing an old bug in the
interaction of RCU priority boosting and CPU hotplug.
- SRCU updates.
- RCU CPU stall-warning updates.
- RCU torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Archs have been abusing module_free() to clean up their arch-specific
allocations. Since module_free() is also (ab)used by BPF and trace code,
let's keep it to simple allocations, and provide a hook called before
that.
This means that avr32, ia64, parisc and s390 no longer need to implement
their own module_free() at all. avr32 doesn't need module_finalize()
either.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-parisc@vger.kernel.org
Cc: linux-s390@vger.kernel.org
The s390x ABI requires to zero extend parameters before functions
are called.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In sk_load_byte_msh() sk_load_byte_slow() is called instead of
sk_load_byte_msh_slow(). Fix this and call the correct function.
Besides of this load only one byte instead of two and fix the comment.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the offset parameter for skb_copy_bits is changed in
sk_load_word() and sk_load_half(). Therefore it is not correct when
calling skb_copy_bits(). Fix this and use the original offset
for the function call.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The skb_copy_bits() function has the following signature:
int skb_copy_bits(const struct sk_buff *skb, int offset, void *to, int len)
Currently in bpf_jit.S the "to" and "len" parameters have been
exchanged. So fix this and call the function with the correct
parameters.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the signed COMPARE HALFWORD IMMEDIATE (chi) and COMPARE (c)
instructions are used to compare "A" with "K". This is not correct
because "A" and "K" are both unsigned. To fix this remove the
chi instruction (no unsigned analogon available) and use the
unsigned COMPARE LOGICAL (cl) instruction instead of COMPARE (c).
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 fixes from Martin Schwidefsky:
"Two small performance tweaks, the plumbing for the execveat system
call and a couple of bug fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/uprobes: fix user space PER events
s390/bpf: Fix JMP_JGE_X (A > X) and JMP_JGT_X (A >= X)
s390/bpf: Fix ALU_NEG (A = -A)
s390/mm: avoid using pmd_to_page for !USE_SPLIT_PMD_PTLOCKS
s390/timex: fix get_tod_clock_ext() inline assembly
s390: wire up execveat syscall
s390/kernel: use stnsm 255 instead of stosm 0
s390/vtime: Get rid of redundant WARN_ON
s390/zcrypt: kernel oops at insmod of the z90crypt device driver
If uprobes are single stepped for example with gdb, the behavior should
now be correct. Before this patch, when gdb was single stepping a uprobe,
the result was a SIGILL.
When PER is active for any storage alteration and a uprobe is hit, a storage
alteration event is indicated. These over indications are filterd out by gdb,
if no change has happened within the observed area.
Signed-off-by: Jan Willeke <willeke@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the signed COMPARE (cr) instruction is used to compare "A"
with "X". This is not correct because "A" and "X" are both unsigned.
To fix this use the unsigned COMPARE LOGICAL (clr) instruction instead.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the LOAD NEGATIVE (lnr) instruction is used for ALU_NEG. This
instruction always loads the negative value. Therefore, if A is already
negative, it remains unchanged. To fix this use LOAD COMPLEMENT (lcr)
instead.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch removes the redundant sysfs cacheinfo code by reusing
the newly introduced generic cacheinfo infrastructure through the
commit 246246cbde ("drivers: base: support cpu cache information
interface to userspace via sysfs")
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
In aes_encrypt() and aes_decrypt(), need let 'sctx->key' be modified,
so remove 'const' for it. The related warnings:
CC [M] arch/s390/crypto/aes_s390.o
arch/s390/crypto/aes_s390.c: In function 'aes_encrypt':
arch/s390/crypto/aes_s390.c:146:37: warning: passing argument 2 of 'crypt_s390_km' discards 'const' qualifier from pointer target type [-Wdiscarded-array-qualifiers]
crypt_s390_km(KM_AES_128_ENCRYPT, &sctx->key, out, in,
^
...
In file included from arch/s390/crypto/aes_s390.c:29:0:
arch/s390/crypto/crypt_s390.h:154:19: note: expected 'void *' but argument is of type 'const u8 (*)[32] {aka const unsigned char (*)[32]}'
static inline int crypt_s390_km(long func, void *param,
^
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This has no effect as KERN_CONT is an empty string,
It's probably just a missing conversion artifact as the
other pr_cont uses in the same file don't have this prefix.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
_sclp_print_early() has a return value, but misses to sign extend it
if called from 64 bit code.
This is not really a bug, since currently no caller cares what the
return value is.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
_sclp_print_early() has return value: at present, return 0 for OK, 1 for
failure. It returns '%r2', so use 'long' as return value (upper caller
can check '%r2' directly). The related warning:
CC arch/s390/boot/compressed/misc.o
arch/s390/boot/compressed/misc.c:66:8: warning: type defaults to 'int' in declaration of '_sclp_print_early' [-Wimplicit-int]
extern _sclp_print_early(const char *);
^
At present, _sclp_print_early() is only used by puts(), so can still
remain its declaration in 'misc.c' file.
[heiko.carstens@de.ibm.com]: move declaration to sclp.h header file
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Get rid of sparse warnings like this one:
arch/s390/kernel/signal.c:244:1:
warning: symbol 'sys_sigreturn' was not declared. Should it be static?
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Get rid of warnings like this one:
warning: constant 0xffe0000000000000 is so big it is unsigned long
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Always verify that the to be replaced code matches what we expect to see.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
pmd_to_page() is only available if USE_SPLIT_PMD_PTLOCKS is defined.
The use of pmd_to_page in the gmap code can cause compile errors if
NR_CPUS is smaller than SPLIT_PTLOCK_CPUS. Do not use pmd_to_page
outside of USE_SPLIT_PMD_PTLOCKS sections.
Reported-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
For C language, it treats array parameter as a pointer, so sizeof for an
array parameter is equal to sizeof for a pointer, which causes compiler
warning (with allmodconfig by gcc 5):
./arch/s390/include/asm/timex.h: In function 'get_tod_clock_ext':
./arch/s390/include/asm/timex.h:76:32: warning: 'sizeof' on array function parameter 'clk' will return size of 'char *' [-Wsizeof-array-argument]
typedef struct { char _[sizeof(clk)]; } addrtype;
^
Can use macro CLOCK_STORE_SIZE instead of all related hard code numbers,
which also can avoid this warning. And also add a tab to CLOCK_TICK_RATE
definition to match coding styles.
[heiko.carstens@de.ibm.com]:
Chen's patch actually fixes a bug within the get_tod_clock_ext() inline assembly
where we incorrectly tell the compiler that only 8 bytes of memory get changed
instead of 16 bytes.
This would allow gcc to generate incorrect code. Right now this doesn't seem to
be the case.
Also slightly changed the patch a bit.
- renamed CLOCK_STORE_SIZE to STORE_CLOCK_EXT_SIZE
- changed get_tod_clock_ext() to receive a char pointer parameter
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
SRCU is not necessary to be compiled by default in all cases. For tinification
efforts not compiling SRCU unless necessary is desirable.
The current patch tries to make compiling SRCU optional by introducing a new
Kconfig option CONFIG_SRCU which is selected when any of the components making
use of SRCU are selected.
If we do not select CONFIG_SRCU, srcu.o will not be compiled at all.
text data bss dec hex filename
2007 0 0 2007 7d7 kernel/rcu/srcu.o
Size of arch/powerpc/boot/zImage changes from
text data bss dec hex filename
831552 64180 23944 919676 e087c arch/powerpc/boot/zImage : before
829504 64180 23952 917636 e0084 arch/powerpc/boot/zImage : after
so the savings are about ~2000 bytes.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Josh Triplett <josh@joshtriplett.org>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: resolve conflict due to removal of arch/ia64/kvm/Kconfig. ]
As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
ACCESS_ONCE might fail with specific compilers for non-scalar accesses.
Here is a set of patches to tackle that problem.
The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data structure
is larger than the machine word size memcpy is used and a warning is emitted.
The next patches fix up several in-tree users of ACCESS_ONCE on non-scalar
types.
This merge does not yet contain a patch that forces ACCESS_ONCE to work only
on scalar types. This is targetted for the next merge window as Linux next
already contains new offenders regarding ACCESS_ONCE vs. non-scalar types.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJUkrVGAAoJEBF7vIC1phx8stkP/2LmN5y6LOseoEW06xa5MX4m
cbIKsZNtsGHl7EDcTzzuWs6Sq5/Cj7V3yzeBF7QGbUKOqvFWU3jvpUBCCfjMg37C
77/Vf0ZPrxTXXxeJ4Ykdy2CGvuMtuYY9TWkrRNKmLU0xex7lGblEzCt9z6+mZviw
26/DN8ctjkHRvIUAi+7RfQBBc3oSMYAC1mzxYKBAsAFLV+LyFmsGU/4iofZMAsdt
XFyVXlrLn0Bjx/MeceGkOlMDiVx4FnfccfFaD4hhuTLBJXWitkUK/MRa4JBiXWzH
agY8942A8/j9wkI2DFp/pqZYqA/sTXLndyOWlhE//ZSti0n0BSJaOx3S27rTLkAc
5VmZEVyIrS3hyOpyyAi0sSoPkDnjeCHmQg9Rqn34/poKLd7JDrW2UkERNCf/T3eh
GI2rbhAlZz3v5mIShn8RrxzslWYmOObpMr3HYNUdRk8YUfTf6d6aZ3txHp2nP4mD
VBAEzsvP9rcVT2caVhU2dnBzeaZAj3zeDxBtjcb3X2osY9tI7qgLc9Fa/fWKgILk
2evkLcctsae2mlLNGHyaK3Dm/ZmYJv+57MyaQQEZNfZZgeB1y4k0DkxH4w1CFmCi
s8XlH5voEHgnyjSQXXgc/PNVlkPAKr78ZyTiAfiKmh8rpe41/W4hGcgao7L9Lgiu
SI0uSwKibuZt4dHGxQuG
=IQ5o
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux
Pull ACCESS_ONCE cleanup preparation from Christian Borntraeger:
"kernel: Provide READ_ONCE and ASSIGN_ONCE
As discussed on LKML http://marc.info/?i=54611D86.4040306%40de.ibm.com
ACCESS_ONCE might fail with specific compilers for non-scalar
accesses.
Here is a set of patches to tackle that problem.
The first patch introduce READ_ONCE and ASSIGN_ONCE. If the data
structure is larger than the machine word size memcpy is used and a
warning is emitted. The next patches fix up several in-tree users of
ACCESS_ONCE on non-scalar types.
This does not yet contain a patch that forces ACCESS_ONCE to work only
on scalar types. This is targetted for the next merge window as Linux
next already contains new offenders regarding ACCESS_ONCE vs.
non-scalar types"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/borntraeger/linux:
s390/kvm: REPLACE barrier fixup with READ_ONCE
arm/spinlock: Replace ACCESS_ONCE with READ_ONCE
arm64/spinlock: Replace ACCESS_ONCE READ_ONCE
mips/gup: Replace ACCESS_ONCE with READ_ONCE
x86/gup: Replace ACCESS_ONCE with READ_ONCE
x86/spinlock: Replace ACCESS_ONCE with READ_ONCE
mm: replace ACCESS_ONCE with READ_ONCE or barriers
kernel: Provide READ_ONCE and ASSIGN_ONCE
- spring cleaning: removed support for IA64, and for hardware-assisted
virtualization on the PPC970
- ARM, PPC, s390 all had only small fixes
For x86:
- small performance improvements (though only on weird guests)
- usual round of hardware-compliancy fixes from Nadav
- APICv fixes
- XSAVES support for hosts and guests. XSAVES hosts were broken because
the (non-KVM) XSAVES patches inadvertently changed the KVM userspace
ABI whenever XSAVES was enabled; hence, this part is going to stable.
Guest support is just a matter of exposing the feature and CPUID leaves
support.
Right now KVM is broken for PPC BookE in your tree (doesn't compile).
I'll reply to the pull request with a patch, please apply it either
before the pull request or in the merge commit, in order to preserve
bisectability somewhat.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJUkpg+AAoJEL/70l94x66DUmoH/jzXYkptSW9NGgm79KqxGJlD
lzLnLBkitVvx++Mz5YBhdJEhKKLUlCtifFT1zPJQ/pthQhIRSaaAwZyNGgUs5w5x
yMGKHiPQFyZRbmQtZhCInW0BftJoYHHciO3nUfHCZnp34My9MP2D55W7/z+fYFfQ
DuqBSE9ThyZJtZ4zh8NRA9fCOeuqwVYRyoBs820Wbsh4cpIBoIK63Dg7k+CLE+ZV
MZa/mRL6bAfsn9W5bnOUAgHJ3SPznnWbO3/g0aV+roL/5pffblprJx9lKNR08xUM
6hDFLop2gDehDJesDkY/o8Ckp1hEouvfsVpSShry4vcgtn0hgh2O5/6Orbmj6vE=
=Zwq1
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM update from Paolo Bonzini:
"3.19 changes for KVM:
- spring cleaning: removed support for IA64, and for hardware-
assisted virtualization on the PPC970
- ARM, PPC, s390 all had only small fixes
For x86:
- small performance improvements (though only on weird guests)
- usual round of hardware-compliancy fixes from Nadav
- APICv fixes
- XSAVES support for hosts and guests. XSAVES hosts were broken
because the (non-KVM) XSAVES patches inadvertently changed the KVM
userspace ABI whenever XSAVES was enabled; hence, this part is
going to stable. Guest support is just a matter of exposing the
feature and CPUID leaves support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
KVM: move APIC types to arch/x86/
KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
KVM: PPC: Book3S HV: Improve H_CONFER implementation
KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
KVM: PPC: Book3S HV: Remove code for PPC970 processors
KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
arch: powerpc: kvm: book3s_pr.c: Remove unused function
arch: powerpc: kvm: book3s.c: Remove some unused functions
arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
KVM: PPC: Book3S HV: ptes are big endian
KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
KVM: PPC: Book3S HV: Fix KSM memory corruption
KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
KVM: PPC: Book3S HV: Fix computation of tlbie operand
KVM: PPC: Book3S HV: Add missing HPTE unlock
KVM: PPC: BookE: Improve irq inject tracepoint
arm/arm64: KVM: Require in-kernel vgic for the arch timers
...
On some models, stnsm 255 might be slightly faster than stosm 0.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
in the cpu time accounting function vtime_account_irq_enter
(vtime_account_system) we use a WARN_ON_ONCE(!irqs_disabled()).
This is redundant as the function virt_timer_forward is always
called and has a BUG_ON(!irqs_disabled()).
This saves several nanoseconds in my specific testcase (KVM entry/exit)
and probably all other callers like (soft)irq entry/exit.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)
Commit 1365039d0c ("KVM: s390: Fix ipte locking") replace
ACCESS_ONCE with barriers. Lets use READ_ONCE instead.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Pull user namespace related fixes from Eric Biederman:
"As these are bug fixes almost all of thes changes are marked for
backporting to stable.
The first change (implicitly adding MNT_NODEV on remount) addresses a
regression that was created when security issues with unprivileged
remount were closed. I go on to update the remount test to make it
easy to detect if this issue reoccurs.
Then there are a handful of mount and umount related fixes.
Then half of the changes deal with the a recently discovered design
bug in the permission checks of gid_map. Unix since the beginning has
allowed setting group permissions on files to less than the user and
other permissions (aka ---rwx---rwx). As the unix permission checks
stop as soon as a group matches, and setgroups allows setting groups
that can not later be dropped, results in a situtation where it is
possible to legitimately use a group to assign fewer privileges to a
process. Which means dropping a group can increase a processes
privileges.
The fix I have adopted is that gid_map is now no longer writable
without privilege unless the new file /proc/self/setgroups has been
set to permanently disable setgroups.
The bulk of user namespace using applications even the applications
using applications using user namespaces without privilege remain
unaffected by this change. Unfortunately this ix breaks a couple user
space applications, that were relying on the problematic behavior (one
of which was tools/selftests/mount/unprivileged-remount-test.c).
To hopefully prevent needing a regression fix on top of my security
fix I rounded folks who work with the container implementations mostly
like to be affected and encouraged them to test the changes.
> So far nothing broke on my libvirt-lxc test bed. :-)
> Tested with openSUSE 13.2 and libvirt 1.2.9.
> Tested-by: Richard Weinberger <richard@nod.at>
> Tested on Fedora20 with libvirt 1.2.11, works fine.
> Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
> Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
> Just to be sure I was testing the right thing I also tested using
> my unprivileged nsexec testcases, and they failed on setgroup/setgid
> as now expected, and succeeded there without your patches.
> Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
> I tested this with Sandstorm. It breaks as is and it works if I add
> the setgroups thing.
> Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :("
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Unbreak the unprivileged remount tests
userns; Correct the comment in map_write
userns: Allow setting gid_maps without privilege when setgroups is disabled
userns: Add a knob to disable setgroups on a per user namespace basis
userns: Rename id_map_mutex to userns_state_mutex
userns: Only allow the creator of the userns unprivileged mappings
userns: Check euid no fsuid when establishing an unprivileged uid mapping
userns: Don't allow unprivileged creation of gid mappings
userns: Don't allow setgroups until a gid mapping has been setablished
userns: Document what the invariant required for safe unprivileged mappings.
groups: Consolidate the setgroups permission checks
mnt: Clear mnt_expire during pivot_root
mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
umount: Do not allow unmounting rootfs.
umount: Disallow unprivileged mount force
mnt: Update unprivileged remount test
mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes, just
removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There are
some ath9k patches coming in through this tree that have been acked by
the wireless maintainers as they relied on the debugfs changes.
Everything has been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlSOD20ACgkQMUfUDdst+ylLPACg2QrW1oHhdTMT9WI8jihlHVRM
53kAoLeteByQ3iVwWurwwseRPiWa8+MI
=OVRS
-----END PGP SIGNATURE-----
Merge tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core update from Greg KH:
"Here's the set of driver core patches for 3.19-rc1.
They are dominated by the removal of the .owner field in platform
drivers. They touch a lot of files, but they are "simple" changes,
just removing a line in a structure.
Other than that, a few minor driver core and debugfs changes. There
are some ath9k patches coming in through this tree that have been
acked by the wireless maintainers as they relied on the debugfs
changes.
Everything has been in linux-next for a while"
* tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
fs: debugfs: add forward declaration for struct device type
firmware class: Deletion of an unnecessary check before the function call "vunmap"
firmware loader: fix hung task warning dump
devcoredump: provide a one-way disable function
device: Add dev_<level>_once variants
ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
ath: use seq_file api for ath9k debugfs files
debugfs: add helper function to create device related seq_file
drivers/base: cacheinfo: remove noisy error boot message
Revert "core: platform: add warning if driver has no owner"
drivers: base: support cpu cache information interface to userspace via sysfs
drivers: base: add cpu_device_create to support per-cpu devices
topology: replace custom attribute macros with standard DEVICE_ATTR*
cpumask: factor out show_cpumap into separate helper function
driver core: Fix unbalanced device reference in drivers_probe
driver core: fix race with userland in device_add()
sysfs/kernfs: make read requests on pre-alloc files use the buffer.
sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
fs: sysfs: return EGBIG on write if offset is larger than file size
...
Pull crypto update from Herbert Xu:
- The crypto API is now documented :)
- Disallow arbitrary module loading through crypto API.
- Allow get request with empty driver name through crypto_user.
- Allow speed testing of arbitrary hash functions.
- Add caam support for ctr(aes), gcm(aes) and their derivatives.
- nx now supports concurrent hashing properly.
- Add sahara support for SHA1/256.
- Add ARM64 version of CRC32.
- Misc fixes.
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
crypto: tcrypt - Allow speed testing of arbitrary hash functions
crypto: af_alg - add user space interface for AEAD
crypto: qat - fix problem with coalescing enable logic
crypto: sahara - add support for SHA1/256
crypto: sahara - replace tasklets with kthread
crypto: sahara - add support for i.MX53
crypto: sahara - fix spinlock initialization
crypto: arm - replace memset by memzero_explicit
crypto: powerpc - replace memset by memzero_explicit
crypto: sha - replace memset by memzero_explicit
crypto: sparc - replace memset by memzero_explicit
crypto: algif_skcipher - initialize upon init request
crypto: algif_skcipher - removed unneeded code
crypto: algif_skcipher - Fixed blocking recvmsg
crypto: drbg - use memzero_explicit() for clearing sensitive data
crypto: drbg - use MODULE_ALIAS_CRYPTO
crypto: include crypto- module prefix in template
crypto: user - add MODULE_ALIAS
crypto: sha-mb - remove a bogus NULL check
crytpo: qat - Fix 64 bytes requests
...
Merge second patchbomb from Andrew Morton:
- the rest of MM
- misc fs fixes
- add execveat() syscall
- new ratelimit feature for fault-injection
- decompressor updates
- ipc/ updates
- fallocate feature creep
- fsnotify cleanups
- a few other misc things
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (99 commits)
cgroups: Documentation: fix trivial typos and wrong paragraph numberings
parisc: percpu: update comments referring to __get_cpu_var
percpu: update local_ops.txt to reflect this_cpu operations
percpu: remove __get_cpu_var and __raw_get_cpu_var macros
fsnotify: remove destroy_list from fsnotify_mark
fsnotify: unify inode and mount marks handling
fallocate: create FAN_MODIFY and IN_MODIFY events
mm/cma: make kmemleak ignore CMA regions
slub: fix cpuset check in get_any_partial
slab: fix cpuset check in fallback_alloc
shmdt: use i_size_read() instead of ->i_size
ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
ipc/msg: increase MSGMNI, remove scaling
ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
lib/decompress.c: consistency of compress formats for kernel image
decompress_bunzip2: off by one in get_next_block()
usr/Kconfig: make initrd compression algorithm selection not expert
fault-inject: add ratelimit option
ratelimit: add initialization macro
...
Following the suggestions from Andrew Morton and Stephen Rothwell,
Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
can enable.
set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
previously allowed + ARM64 which I tested.
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, we have prepared to avoid using debug-pagealloc in boottime. So
introduce new kernel-parameter to disable debug-pagealloc in boottime, and
makes related functions to be disabled in this case.
Only non-intuitive part is change of guard page functions. Because guard
page is effective only if debug-pagealloc is enabled, turning off
according to debug-pagealloc is reasonable thing to do.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull another networking update from David Miller:
"Small follow-up to the main merge pull from the other day:
1) Alexander Duyck's DMA memory barrier patch set.
2) cxgb4 driver fixes from Karen Xie.
3) Add missing export of fixed_phy_register() to modules, from Mark
Salter.
4) DSA bug fixes from Florian Fainelli"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
net/macb: add TX multiqueue support for gem
linux/interrupt.h: remove the definition of unused tasklet_hi_enable
jme: replace calls to redundant function
net: ethernet: davicom: Allow to select DM9000 for nios2
net: ethernet: smsc: Allow to select SMC91X for nios2
cxgb4: Add support for QSA modules
libcxgbi: fix freeing skb prematurely
cxgb4i: use set_wr_txq() to set tx queues
cxgb4i: handle non-pdu-aligned rx data
cxgb4i: additional types of negative advice
cxgb4/cxgb4i: set the max. pdu length in firmware
cxgb4i: fix credit check for tx_data_wr
cxgb4i: fix tx immediate data credit check
net: phy: export fixed_phy_register()
fib_trie: Fix trie balancing issue if new node pushes down existing node
vlan: Add ability to always enable TSO/UFO
r8169:update rtl8168g pcie ephy parameter
net: dsa: bcm_sf2: force link for all fixed PHY devices
fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
...
There are a number of situations where the mandatory barriers rmb() and
wmb() are used to order memory/memory operations in the device drivers
and those barriers are much heavier than they actually need to be. For
example in the case of PowerPC wmb() calls the heavy-weight sync
instruction when for coherent memory operations all that is really needed
is an lsync or eieio instruction.
This commit adds a coherent only version of the mandatory memory barriers
rmb() and wmb(). In most cases this should result in the barrier being the
same as the SMP barriers for the SMP case, however in some cases we use a
barrier that is somewhere in between rmb() and smp_rmb(). For example on
ARM the rmb barriers break down as follows:
Barrier Call Explanation
--------- -------- ----------------------------------
rmb() dsb() Data synchronization barrier - system
dma_rmb() dmb(osh) data memory barrier - outer sharable
smp_rmb() dmb(ish) data memory barrier - inner sharable
These new barriers are not as safe as the standard rmb() and wmb().
Specifically they do not guarantee ordering between coherent and incoherent
memories. The primary use case for these would be to enforce ordering of
reads and writes when accessing coherent memory that is shared between the
CPU and a device.
It may also be noted that there is no dma_mb(). Most architectures don't
provide a good mechanism for performing a coherent only full barrier without
resorting to the same mechanism used in mb(). As such there isn't much to
be gained in trying to define such a function.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is meant to cleanup the handling of read_barrier_depends and
smp_read_barrier_depends. In multiple spots in the kernel headers
read_barrier_depends is defined as "do {} while (0)", however we then go
into the SMP vs non-SMP sections and have the SMP version reference
read_barrier_depends, and the non-SMP define it as yet another empty
do/while.
With this commit I went through and cleaned out the duplicate definitions
and reduced the number of definitions down to 2 per header. In addition I
moved the 50 line comments for the macro from the x86 and mips headers that
defined it as an empty do/while to those that were actually defining the
macro, alpha and blackfin.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull s390 updates from Martin Schwidefsky:
"The most notable change for this pull request is the ftrace rework
from Heiko. It brings a small performance improvement and the ground
work to support a new gcc option to replace the mcount blocks with a
single nop.
Two new s390 specific system calls are added to emulate user space
mmio for PCI, an artifact of the how PCI memory is accessed.
Two patches for the memory management with changes to common code.
For KVM mm_forbids_zeropage is added which disables the empty zero
page for an mm that is used by a KVM process. And an optimization,
pmdp_get_and_clear_full is added analog to ptep_get_and_clear_full.
Some micro optimization for the cmpxchg and the spinlock code.
And as usual bug fixes and cleanups"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
s390/cputime: fix 31-bit compile
s390/scm_block: make the number of reqs per HW req configurable
s390/scm_block: handle multiple requests in one HW request
s390/scm_block: allocate aidaw pages only when necessary
s390/scm_block: use mempool to manage aidaw requests
s390/eadm: change timeout value
s390/mm: fix memory leak of ptlock in pmd_free_tlb
s390: use local symbol names in entry[64].S
s390/ptrace: always include vector registers in core files
s390/simd: clear vector register pointer on fork/clone
s390: translate cputime magic constants to macros
s390/idle: convert open coded idle time seqcount
s390/idle: add missing irq off lockdep annotation
s390/debug: avoid function call for debug_sprintf_*
s390/kprobes: fix instruction copy for out of line execution
s390: remove diag 44 calls from cpu_relax()
s390/dasd: retry partition detection
s390/dasd: fix list corruption for sleep_on requests
s390/dasd: fix infinite term I/O loop
s390/dasd: remove unused code
...
Pull networking updates from David Miller:
1) New offloading infrastructure and example 'rocker' driver for
offloading of switching and routing to hardware.
This work was done by a large group of dedicated individuals, not
limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu
2) Start making the networking operate on IOV iterators instead of
modifying iov objects in-situ during transfers. Thanks to Al Viro
and Herbert Xu.
3) A set of new netlink interfaces for the TIPC stack, from Richard
Alpe.
4) Remove unnecessary looping during ipv6 routing lookups, from Martin
KaFai Lau.
5) Add PAUSE frame generation support to gianfar driver, from Matei
Pavaluca.
6) Allow for larger reordering levels in TCP, which are easily
achievable in the real world right now, from Eric Dumazet.
7) Add a variable of napi_schedule that doesn't need to disable cpu
interrupts, from Eric Dumazet.
8) Use a doubly linked list to optimize neigh_parms_release(), from
Nicolas Dichtel.
9) Various enhancements to the kernel BPF verifier, and allow eBPF
programs to actually be attached to sockets. From Alexei
Starovoitov.
10) Support TSO/LSO in sunvnet driver, from David L Stevens.
11) Allow controlling ECN usage via routing metrics, from Florian
Westphal.
12) Remote checksum offload, from Tom Herbert.
13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
driver, from Thomas Lendacky.
14) Add MPLS support to openvswitch, from Simon Horman.
15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
Klassert.
16) Do gro flushes on a per-device basis using a timer, from Eric
Dumazet. This tries to resolve the conflicting goals between the
desired handling of bulk vs. RPC-like traffic.
17) Allow userspace to ask for the CPU upon what a packet was
received/steered, via SO_INCOMING_CPU. From Eric Dumazet.
18) Limit GSO packets to half the current congestion window, from Eric
Dumazet.
19) Add a generic helper so that all drivers set their RSS keys in a
consistent way, from Eric Dumazet.
20) Add xmit_more support to enic driver, from Govindarajulu
Varadarajan.
21) Add VLAN packet scheduler action, from Jiri Pirko.
22) Support configurable RSS hash functions via ethtool, from Eyal
Perry.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
Fix race condition between vxlan_sock_add and vxlan_sock_release
net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
net/mlx4: Add support for A0 steering
net/mlx4: Refactor QUERY_PORT
net/mlx4_core: Add explicit error message when rule doesn't meet configuration
net/mlx4: Add A0 hybrid steering
net/mlx4: Add mlx4_bitmap zone allocator
net/mlx4: Add a check if there are too many reserved QPs
net/mlx4: Change QP allocation scheme
net/mlx4_core: Use tasklet for user-space CQ completion events
net/mlx4_core: Mask out host side virtualization features for guests
net/mlx4_en: Set csum level for encapsulated packets
be2net: Export tunnel offloads only when a VxLAN tunnel is created
gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
net: fec: only enable mdio interrupt before phy device link up
net: fec: clear all interrupt events to support i.MX6SX
net: fec: reset fep link status in suspend function
net: sock: fix access via invalid file descriptor
net: introduce helper macro for_each_cmsghdr
...
Pull VFS changes from Al Viro:
"First pile out of several (there _definitely_ will be more). Stuff in
this one:
- unification of d_splice_alias()/d_materialize_unique()
- iov_iter rewrite
- killing a bunch of ->f_path.dentry users (and f_dentry macro).
Getting that completed will make life much simpler for
unionmount/overlayfs, since then we'll be able to limit the places
sensitive to file _dentry_ to reasonably few. Which allows to have
file_inode(file) pointing to inode in a covered layer, with dentry
pointing to (negative) dentry in union one.
Still not complete, but much closer now.
- crapectomy in lustre (dead code removal, mostly)
- "let's make seq_printf return nothing" preparations
- assorted cleanups and fixes
There _definitely_ will be more piles"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
copy_from_iter_nocache()
new helper: iov_iter_kvec()
csum_and_copy_..._iter()
iov_iter.c: handle ITER_KVEC directly
iov_iter.c: convert copy_to_iter() to iterate_and_advance
iov_iter.c: convert copy_from_iter() to iterate_and_advance
iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
iov_iter.c: convert iov_iter_zero() to iterate_and_advance
iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
iov_iter.c: iterate_and_advance
iov_iter.c: macros for iterating over iov_iter
kill f_dentry macro
dcache: fix kmemcheck warning in switch_names
new helper: audit_file()
nfsd_vfs_write(): use file_inode()
ncpfs: use file_inode()
kill f_dentry uses
lockd: get rid of ->f_path.dentry->d_sb
...
Conflicts:
drivers/net/ethernet/amd/xgbe/xgbe-desc.c
drivers/net/ethernet/renesas/sh_eth.c
Overlapping changes in both conflict cases.
Signed-off-by: David S. Miller <davem@davemloft.net>
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.
This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull x86 MPX support from Thomas Gleixner:
"This enables support for x86 MPX.
MPX is a new debug feature for bound checking in user space. It
requires kernel support to handle the bound tables and decode the
bound violating instruction in the trap handler"
* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
asm-generic: Remove asm-generic arch_bprm_mm_init()
mm: Make arch_unmap()/bprm_mm_init() available to all architectures
x86: Cleanly separate use of asm-generic/mm_hooks.h
x86 mpx: Change return type of get_reg_offset()
fs: Do not include mpx.h in exec.c
x86, mpx: Add documentation on Intel MPX
x86, mpx: Cleanup unused bound tables
x86, mpx: On-demand kernel allocation of bounds tables
x86, mpx: Decode MPX instruction to get bound violation information
x86, mpx: Add MPX-specific mmap interface
x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
x86, mpx: Add MPX to disabled features
ia64: Sync struct siginfo with general version
mips: Sync struct siginfo with general version
mpx: Extend siginfo structure to include bound violation information
x86, mpx: Rename cfg_reg_u and status_reg
x86: mpx: Give bndX registers actual names
x86: Remove arbitrary instruction size limit in instruction decoder
Pull irq domain updates from Thomas Gleixner:
"The real interesting irq updates:
- Support for hierarchical irq domains:
For complex interrupt routing scenarios where more than one
interrupt related chip is involved we had no proper representation
in the generic interrupt infrastructure so far. That made people
implement rather ugly constructs in their nested irq chip
implementations. The main offenders are x86 and arm/gic.
To distangle that mess we have now hierarchical irqdomains which
seperate the various interrupt chips and connect them via the
hierarchical domains. That keeps the domain specific details
internal to the particular hierarchy level and removes the
criss/cross referencing of chip internals. The resulting hierarchy
for a complex x86 system will look like this:
vector mapped: 74
msi-0 mapped: 2
dmar-ir-1 mapped: 69
ioapic-1 mapped: 4
ioapic-0 mapped: 20
pci-msi-2 mapped: 45
dmar-ir-0 mapped: 3
ioapic-2 mapped: 1
pci-msi-1 mapped: 2
htirq mapped: 0
Neither ioapic nor pci-msi know about the dmar interrupt remapping
between themself and the vector domain. If interrupt remapping is
disabled ioapic and pci-msi become direct childs of the vector
domain.
In hindsight we should have done that years ago, but in hindsight
we always know better :)
- Support for generic MSI interrupt domain handling
We have more and more non PCI related MSI interrupts, so providing
a generic infrastructure for this is better than having all
affected architectures implementing their own private hacks.
- Support for PCI-MSI interrupt domain handling, based on the generic
MSI support.
This part carries the pci/msi branch from Bjorn Helgaas pci tree to
avoid a massive conflict. The PCI/MSI parts are acked by Bjorn.
I have two more branches on top of this. The full conversion of x86
to hierarchical domains and a partial conversion of arm/gic"
* 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
genirq: Move irq_chip_write_msi_msg() helper to core
PCI/MSI: Allow an msi_controller to be associated to an irq domain
PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
PCI/MSI: Enhance core to support hierarchy irqdomain
PCI/MSI: Move cached entry functions to irq core
genirq: Provide default callbacks for msi_domain_ops
genirq: Introduce msi_domain_alloc/free_irqs()
asm-generic: Add msi.h
genirq: Add generic msi irq domain support
genirq: Introduce callback irq_chip.irq_write_msi_msg
genirq: Work around __irq_set_handler vs stacked domains ordering issues
irqdomain: Introduce helper function irq_domain_add_hierarchy()
irqdomain: Implement a method to automatically call parent domains alloc/free
genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
genirq: Split out flow handler typedefs into seperate header file
genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
genirq: Add more helper functions to support stacked irq_chip
genirq: Introduce helper functions to support stacked irq_chip
irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
...
While there normally is no reason to have a pull request for asm-generic
but have all changes get merged through whichever tree needs them, I do
have a series for 3.19. There are two sets of patches that change
significant portions of asm/io.h, and this branch contains both in order
to resolve the conflicts:
- Will Deacon has done a set of patches to ensure that all architectures
define {read,write}{b,w,l,q}_relaxed() functions or get them by
including asm-generic/io.h. These functions are commonly used on ARM
specific drivers to avoid expensive L2 cache synchronization implied by
the normal {read,write}{b,w,l,q}, but we need to define them on all
architectures in order to share the drivers across architectures and
to enable CONFIG_COMPILE_TEST configurations for them
- Thierry Reding has done an unrelated set of patches that extends
the asm-generic/io.h file to the degree necessary to make it useful
on ARM64 and potentially other architectures.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIVAwUAVIdwNmCrR//JCVInAQJWuw/9FHt2ThMnI1J1Jqy4CVwtyjWTSa6Y/uVj
xSytS7AOvmU/nw1quSoba5mN9fcUQUtK9kqjqNcq71WsQcDE6BF9SFpi9cWtjWcI
ZfWsC+5kqry/mbnuHefENipem9RqBrLbOBJ3LARf5M8rZJuTz1KbdZs9r9+1QsCX
ou8jeqVvNKUn9J1WyekJBFSrPOtZ4bCUpeyh23JHRfPtJeAHNOuPuymj6WceAz98
uMV1icRaCBMySsf9HgsHRYW5HwuCm3MrrYj6ukyPpgxYz7FRq4hJLDs6GnlFtAGb
71g87NpFdB32qbW+y1ntfYaJyUryMHMVHBWcV5H9m0btdHTRHYZjoOGOPuyLHHO8
+l4/FaOQhnDL8cNDj0HKfhdlyaFylcWgs1wzj68nv31c1dGjcJcQiyCDwry9mJhr
erh4EewcerUvWzbBMQ4JP1f8syKMsKwbo1bVU61a1RQJxEqVCzJMLweGSOFmqMX2
6E4ZJVWv81UFLoFTzYx+7+M45K4NWywKNQdzwKmqKHc4OQyvq4ALJI0A7SGFJdDR
HJ7VqDiLaSdBitgJcJUxNzKcyXij6wE9jE1fBe3YDFE4LrnZXFVLN+MX6hs7AIFJ
vJM1UpxRxQUMGIH2m7rbDNazOAsvQGxINOjNor23cNLuf6qLY1LrpHVPQDAfJVvA
6tROM77bwIQ=
=xUv6
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic asm/io.h rewrite from Arnd Bergmann:
"While there normally is no reason to have a pull request for
asm-generic but have all changes get merged through whichever tree
needs them, I do have a series for 3.19.
There are two sets of patches that change significant portions of
asm/io.h, and this branch contains both in order to resolve the
conflicts:
- Will Deacon has done a set of patches to ensure that all
architectures define {read,write}{b,w,l,q}_relaxed() functions or
get them by including asm-generic/io.h.
These functions are commonly used on ARM specific drivers to avoid
expensive L2 cache synchronization implied by the normal
{read,write}{b,w,l,q}, but we need to define them on all
architectures in order to share the drivers across architectures
and to enable CONFIG_COMPILE_TEST configurations for them
- Thierry Reding has done an unrelated set of patches that extends
the asm-generic/io.h file to the degree necessary to make it useful
on ARM64 and potentially other architectures"
* tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits)
ARM64: use GENERIC_PCI_IOMAP
sparc: io: remove duplicate relaxed accessors on sparc32
ARM: sa11x0: Use void __iomem * in MMIO accessors
arm64: Use include/asm-generic/io.h
ARM: Use include/asm-generic/io.h
asm-generic/io.h: Implement generic {read,write}s*()
asm-generic/io.h: Reconcile I/O accessor overrides
/dev/mem: Use more consistent data types
Change xlate_dev_{kmem,mem}_ptr() prototypes
ARM: ixp4xx: Properly override I/O accessors
ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI
ARM: ebsa110: Properly override I/O accessors
ARC: Remove redundant PCI_IOBASE declaration
documentation: memory-barriers: clarify relaxed io accessor semantics
x86: io: implement dummy relaxed accessor macros for writes
tile: io: implement dummy relaxed accessor macros for writes
sparc: io: implement dummy relaxed accessor macros for writes
powerpc: io: implement dummy relaxed accessor macros for writes
parisc: io: implement dummy relaxed accessor macros for writes
mn10300: io: implement dummy relaxed accessor macros for writes
...
git commit 8461b63ca0
"s390: translate cputime magic constants to macros"
introduce a built error for 31-bit:
kernel/built-in.o: In function `posix_cpu_timer_set':
posix-cpu-timers.c:(.text+0x2a8cc): undefined reference to `__udivdi3'
The original code is actually broken for 31-bit and has been
corrected by the above commit by forcing the compiler to use
64-bit arithmetic through the CPUTIME_PER_USEC define.
To fix the compile error replace the 64-bit division with
a call to __div().
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The pmd_free_tlb function fails to call pgtable_pmd_page_dtor.
Without the call the ptlock for the pmd tables will not be freed.
Add the missing call.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
To improve the output of the perf tool hide most of the symbols
from entry[64].S by using the '.L' prefix.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On machines with support for vector registers the signal frame includes
an area for the vector registers and the ptrace regset interface allow
read and write. This is true even if the task never used any vector
instruction. Only elf core dumps do not include the vector registers,
to make things consistent always include the vector register note in
core dumps create on a machine with vector register support.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The copy_thread function fails to reset the p->thread.vxrs pointer.
This causes the child to use the same vector register save area,
causing both data corruptions and multiple frees of the memory for
the save area after the tasks sharing the save area terminate.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
s390 uses open coded seqcount to synchronize idle time accounting.
Lets consolidate it with the standard API.
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
psw_idle() returns with interrupts disabled, so we should add the
missing annotation.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>