Commit Graph

4747 Commits

Author SHA1 Message Date
David Hildenbrand 3218f7094b s390/mm: support real-space for gmap shadows
We can easily support real-space designation just like EDAT1 and EDAT2.
So guest2 can provide for guest3 an asce with the real-space control being
set.

We simply have to allocate the biggest page table possible and fake all
levels.

There is no protection to consider. If we exceed guest memory, vsie code
will inject an addressing exception (via program intercept). In the future,
we could limit the fake table level to the gmap page table.

As the top level page table can never go away, such gmap shadows will never
get unshadowed, we'll have to come up with another way to limit the number
of kept gmap shadows.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:02 +02:00
David Hildenbrand 1c65781b56 s390/mm: push rte protection down to shadow pte
Just like we already do with ste protection, let's take rte protection
into account. This way, the host pte doesn't have to be mapped writable.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:55:00 +02:00
David Hildenbrand 18b8980988 s390/mm: support EDAT2 for gmap shadows
If the guest is enabled for EDAT2, we can easily create shadows for
guest2 -> guest3 provided tables that make use of EDAT2.

If guest2 references a 2GB page, this memory looks consecutive for guest2,
but it does not have to be so for us. Therefore we have to create fake
segment and page tables.

This works just like EDAT1 support, so page tables are removed when the
parent table (r3t table entry) is changed.

We don't hve to care about:
- ACCF-Validity Control in RTTE
- Access-Control Bits in RTTE
- Fetch-Protection Bit in RTTE
- Common-Region Bit in RTTE

Just like for EDAT1, all bits might be dropped and there is no guaranteed
that they are active.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:56 +02:00
David Hildenbrand fd8d4e3ab6 s390/mm: support EDAT1 for gmap shadows
If the guest is enabled for EDAT1, we can easily create shadows for
guest2 -> guest3 provided tables that make use of EDAT1.

If guest2 references a 1MB page, this memory looks consecutive for guest2,
but it might not be so for us. Therefore we have to create fake page tables.

We can easily add that to our existing infrastructure. The invalidation
mechanism will make sure that fake page tables are removed when the parent
table (sgt table entry) is changed.

As EDAT1 also introduced protection on all page table levels, we have to
also shadow these correctly.

We don't have to care about:
- ACCF-Validity Control in STE
- Access-Control Bits in STE
- Fetch-Protection Bit in STE
- Common-Segment Bit in STE

As all bits might be dropped and there is no guaranteed that they are
active ("unpredictable whether the CPU uses these bits", "may be used").
Without using EDAT1 in the shadow ourselfes (STE-format control == 0),
simply shadowing these bits would not be enough. They would be ignored.

Please note that we are using the "fake" flag to make this look consistent
with further changes (EDAT2, real-space designation support) and don't let
the shadow functions handle fc=1 stes.

In the future, with huge pages in the host, gmap_shadow_pgt() could simply
try to map a huge host page if "fake" is set to one and indicate via return
value that no lower fake tables / shadow ptes are required.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:51 +02:00
David Hildenbrand 5b062bd494 s390/mm: prepare for EDAT1/EDAT2 support in gmap shadow
In preparation for EDAT1/EDAT2 support for gmap shadows, we have to store
the requested edat level in the gmap shadow.

The edat level used during shadow translation is a property of the gmap
shadow. Depending on that level, the gmap shadow will look differently for
the same guest tables. We have to store it internally in order to support
it later.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:47 +02:00
David Hildenbrand 00fc062d53 s390/mm: push ste protection down to shadow pte
If a guest ste is read-only, it doesn't make sense to force the ptes in as
writable in the host. If the source page is read-only in the host, it won't
have to be made writable. Please note that if the source page is not
available, it will still be faulted in writable. This can be changed
internally later on.

If ste protection is removed, underlying shadow tables are also removed,
therefore this change does not affect the guest.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:45 +02:00
David Hildenbrand f4debb4090 s390/mm: take ipte_lock during shadow faults
Let's take the ipte_lock while working on guest 2 provided page table, just
like the other gaccess functions.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:40 +02:00
David Hildenbrand 7a6741576b s390/mm: protection exceptions are corrrectly shadowed
As gmap shadows contains correct protection permissions, protection
exceptons can directly be forwarded to guest 3. If we would encounter
a protection exception while faulting, the next guest 3 run will
automatically handle that for us.

Keep the dat_protection logic in place, as it will be helpful later.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:34 +02:00
David Hildenbrand e52f8b6112 s390/mm: take the mmap_sem in kvm_s390_shadow_fault()
Instead of doing it in the caller, let's just take the mmap_sem
in kvm_s390_shadow_fault(). By taking it as read, we allow parallel
faulting on shadow page tables, gmap shadow code is prepared for that.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:33 +02:00
David Hildenbrand 0f7f848915 s390/mm: fix races on gmap_shadow creation
Before any thread is allowed to use a gmap_shadow, it has to be fully
initialized. However, for invalidation to work properly, we have to
register the new gmap_shadow before we protect the parent gmap table.

Because locking is tricky, and we have to avoid duplicate gmaps, let's
introduce an initialized field, that signalizes other threads if that
gmap_shadow can already be used or if they have to retry.

Let's properly return errors using ERR_PTR() instead of simply returning
NULL, so a caller can properly react on the error.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:28 +02:00
David Hildenbrand 998f637cc4 s390/mm: avoid races on region/segment/page table shadowing
We have to unlock sg->guest_table_lock in order to call
gmap_protect_rmap(). If we sleep just before that call, another VCPU
might pick up that shadowed page table (while it is not protected yet)
and use it.

In order to avoid these races, we have to introduce a third state -
"origin set but still invalid" for an entry. This way, we can avoid
another thread already using the entry before the table is fully protected.
As soon as everything is set up, we can clear the invalid bit - if we
had no race with the unshadowing code.

Suggested-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:27 +02:00
David Hildenbrand a9d23e71d7 s390/mm: shadow pages with real guest requested protection
We really want to avoid manually handling protection for nested
virtualization. By shadowing pages with the protection the guest asked us
for, the SIE can handle most protection-related actions for us (e.g.
special handling for MVPG) and we can directly forward protection
exceptions to the guest.

PTEs will now always be shadowed with the correct _PAGE_PROTECT flag.
Unshadowing will take care of any guest changes to the parent PTE and
any host changes to the host PTE. If the host PTE doesn't have the
fitting access rights or is not available, we have to fix it up.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:19 +02:00
David Hildenbrand eea3678d43 s390/mm: flush tlb of shadows in all situations
For now, the tlb of shadow gmap is only flushed when the parent is removed,
not when it is removed upfront. Therefore other shadow gmaps can reuse the
tables without the tlb getting flushed.

Fix this by simply flushing the tlb
1. Before the shadow tables are removed (analogouos to other unshadow functions)
2. When the gmap is freed and therefore the top level pages are freed.

Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:18 +02:00
Martin Schwidefsky aa17aa57cf s390/mm: add kvm shadow fault function
This patch introduces function kvm_s390_shadow_fault() used to resolve a
fault on a shadow gmap. This function will do validity checking and
build up the shadow page table hierarchy in order to fault in the
requested page into the shadow page table structure.

If an exception occurs while shadowing, guest 2 has to be notified about
it using either an exception or a program interrupt intercept. If
concurrent unshadowing occurres, this function will simply return with
-EAGAIN and the caller has to retry.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:12 +02:00
Martin Schwidefsky 4be130a084 s390/mm: add shadow gmap support
For a nested KVM guest the outer KVM host needs to create shadow
page tables for the nested guest. This patch adds the basic support
to the guest address space (gmap) code.

For each guest address space the inner KVM host creates, the first
outer KVM host needs to create shadow page tables. The address space
is identified by the ASCE loaded into the control register 1 at the
time the inner SIE instruction for the second nested KVM guest is
executed. The outer KVM host creates the shadow tables starting with
the table identified by the ASCE on a on-demand basis. The outer KVM
host will get repeated faults for all the shadow tables needed to
run the second KVM guest.

While a shadow page table for the second KVM guest is active the access
to the origin region, segment and page tables needs to be restricted
for the first KVM guest. For region and segment and page tables the first
KVM guest may read the memory, but write attempt has to lead to an
unshadow.  This is done using the page invalid and read-only bits in the
page table of the first KVM guest. If the first guest re-accesses one of
the origin pages of a shadow, it gets a fault and the affected parts of
the shadow page table hierarchy needs to be removed again.

PGSTE tables don't have to be shadowed, as all interpretation assist can't
deal with the invalid bits in the shadow pte being set differently than
the original ones provided by the first KVM guest.

Many bug fixes and improvements by David Hildenbrand.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:54:04 +02:00
Martin Schwidefsky 6ea427bbbd s390/mm: add reference counter to gmap structure
Let's use a reference counter mechanism to control the lifetime of
gmap structures. This will be needed for further changes related to
gmap shadows.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:53:59 +02:00
Martin Schwidefsky b2d73b2a0a s390/mm: extended gmap pte notifier
The current gmap pte notifier forces a pte into to a read-write state.
If the pte is invalidated the gmap notifier is called to inform KVM
that the mapping will go away.

Extend this approach to allow read-write, read-only and no-access
as possible target states and call the pte notifier for any change
to the pte.

This mechanism is used to temporarily set specific access rights for
a pte without doing the heavy work of a true mprotect call.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:46:49 +02:00
Martin Schwidefsky 8ecb1a59d6 s390/mm: use RCU for gmap notifier list and the per-mm gmap list
The gmap notifier list and the gmap list in the mm_struct change rarely.
Use RCU to optimize the reader of these lists.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:46:49 +02:00
Martin Schwidefsky 414d3b0749 s390/kvm: page table invalidation notifier
Pass an address range to the page table invalidation notifier
for KVM. This allows to notify changes that affect a larger
virtual memory area, e.g. for 1MB pages.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-20 09:46:48 +02:00
Hendrik Brueckner 9254e70c4e s390/cpum_cf: use perf software context for hardware counters
On s390, there are two different hardware PMUs for counting and
sampling.  Previously, both PMUs have shared the perf_hw_context
which is not correct and, recently, results in this warning:

    ------------[ cut here ]------------
    WARNING: CPU: 5 PID: 1 at kernel/events/core.c:8485 perf_pmu_register+0x420/0x428
    Modules linked in:
    CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.7.0-rc1+ #2
    task: 00000009c5240000 ti: 00000009c5234000 task.ti: 00000009c5234000
    Krnl PSW : 0704c00180000000 0000000000220c50 (perf_pmu_register+0x420/0x428)
               R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
    Krnl GPRS: ffffffffffffffff 0000000000b15ac6 0000000000000000 00000009cb440000
               000000000022087a 0000000000000000 0000000000b78fa0 0000000000000000
               0000000000a9aa90 0000000000000084 0000000000000005 000000000088a97a
               0000000000000004 0000000000749dd0 000000000022087a 00000009c5237cc0
    Krnl Code: 0000000000220c44: a7f4ff54            brc     15,220aec
               0000000000220c48: 92011000           mvi     0(%r1),1
              #0000000000220c4c: a7f40001           brc     15,220c4e
              >0000000000220c50: a7f4ff12           brc     15,220a74
               0000000000220c54: 0707               bcr     0,%r7
               0000000000220c56: 0707               bcr     0,%r7
               0000000000220c58: ebdff0800024       stmg    %r13,%r15,128(%r15)
               0000000000220c5e: a7f13fe0           tmll    %r15,16352
    Call Trace:
    ([<000000000022087a>] perf_pmu_register+0x4a/0x428)
    ([<0000000000b2c25c>] init_cpum_sampling_pmu+0x14c/0x1f8)
    ([<0000000000100248>] do_one_initcall+0x48/0x140)
    ([<0000000000b25d26>] kernel_init_freeable+0x1e6/0x2a0)
    ([<000000000072bda4>] kernel_init+0x24/0x138)
    ([<000000000073495e>] kernel_thread_starter+0x6/0xc)
    ([<0000000000734958>] kernel_thread_starter+0x0/0xc)
    Last Breaking-Event-Address:
     [<0000000000220c4c>] perf_pmu_register+0x41c/0x428
    ---[ end trace 0c6ef9f5b771ad97 ]---

Using the perf_sw_context is an option because the cpum_cf PMU does
not use interrupts.  To make this more clear, initialize the
capabilities in the PMU structure.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-16 12:08:49 +02:00
Peter Zijlstra b53d6bedbe locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
Since all architectures have this implemented now natively, remove this
dead code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-16 10:48:32 +02:00
Peter Zijlstra 56fefbbc3f locking/atomic, arch/s390: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
Implement FETCH-OP atomic primitives, these are very similar to the
existing OP-RETURN primitives we already have, except they return the
value of the atomic variable _before_ modification.

This is especially useful for irreversible operations -- such as
bitops (because it becomes impossible to reconstruct the state prior
to modification).

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-16 10:48:29 +02:00
Paolo Bonzini a03825bbd0 KVM: s390: use kvm->created_vcpus
The new created_vcpus field avoids possible races between enabling
capabilities and creating VCPUs.

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-16 10:07:37 +02:00
Heiko Carstens 11a7752e01 s390: remove math emulation code
The last in-kernel user is gone so we can finally remove this code.

Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:37:11 +02:00
Heiko Carstens 9c203239c5 s390: calculate loops_per_jiffies with fp instructions
Implement calculation of loops_per_jiffies with fp instructions which
are available on all 64 bit machines.
To save and restore floating point register context use the new vx support
functions.

Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:37:10 +02:00
Hendrik Brueckner a086171ad8 s390: Updated kernel config files
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:37:07 +02:00
Hendrik Brueckner f848dbd3bc s390/crc32-vx: add crypto API module for optimized CRC-32 algorithms
Add a crypto API module to access the vector extension based CRC-32
implementations.  Users can request the optimized implementation through
the shash crypto API interface.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-15 16:36:34 +02:00
Paolo Bonzini f26ed98326 KVM: s390: Features and fixes for 4.8 part1
Four bigger things:
 1. The implementation of the STHYI opcode in the kernel. This is used
    in libraries like qclib [1] to provide enough information for a
    capacity and usage based software licence pricing. The STHYI content
    is defined by the related z/VM documentation [2]. Its data can be
    composed by accessing several other interfaces provided by LPAR or
    the machine. This information is partially sensitive or root-only
    so the kernel does the necessary filtering.
 2. Preparation for nested virtualization (VSIE). KVM should query the
    proper sclp interfaces for the availability of some features before
    using it. In the past we have been sloppy and simply assumed that
    several features are available. With this we should be able to handle
    most cases of a missing feature.
 3. CPU model interfaces extended by some additional features that are
    not covered by a facility bit in STFLE. For example all the crypto
    instructions of the coprocessor provide a query function. As reality
    tends to be more complex (e.g. export regulations might block some
    algorithms) we have to provide additional interfaces to query or
    set these non-stfle features.
 4. Several fixes and changes detected and fixed when doing 1-3.
 
 All features change base s390 code. All relevant patches have an ACK
 from the s390 or component maintainers.
 
 The next pull request for 4.8 (part2) will contain the implementation
 of VSIE.
 
 [1] http://www.ibm.com/developerworks/linux/linux390/qclib.html
 [2] https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb4/hcpb4sth.htm
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJXX+A6AAoJEBF7vIC1phx8SBoQAIkFTMxoGvY9lFkkreUXIyeX
 XL0grybhsaKd4tT80FlobTl2ejpo/feRl5RfD5Oi75UCR4oMuk3Eb8bIyQjcKJvS
 7sYFz+zP9TZ5S/rxvc3EanXpcNnfowKDuLUyOTaq0Hq8XQHaSwzYGGbtPgTdMDAp
 DyhwNhYK8cPvmBS3KHX70ZOMfl9J4s0xvgs42BRJyyDGYrJOZcN1NLsG2l1dAb0L
 au/Svb05PxhgQvqoUId3VSrmRKLm9tSk5DJdIRcmj1+4Mlhfw14LTV+wGuTLTgSZ
 GOyEdum2E/b4QABWca7sxmgqo+Wo5voOW+WKOGLMiN2sK+JwvSnu4qmiRG/qgFCJ
 EQDZer+OEQTu+YgZzjm/r5wbIkV/gqUenjjepk5iWrxK6EB7CmlQuZyyEKm3wO7i
 LrEDqRU7SY+PuUu+Ov6/PHxmMy5DJuK+AedRe8uzuDSmYpSekYFLD44gctkPe56q
 uq4Fhx3g3EIkPMcHnAae92vHLp/INCHCGoPb4Xh6CnaP4Xm+RntCv2hWxw30rHgc
 IIYVy4fSyJuTeHpFcNgeBrbcx4jwvkfJ9kxezM864DA9hBBfcS3ZZDhLM5PPEaLr
 usu7Gt6nHeFtwvXxZn/Y+SsYWCWpmbt6An/m+lqf05aAqyndhbwJ8Kftz3OAxKDw
 b7o59x2wvV9dfakAHxNx
 =fdBQ
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Features and fixes for 4.8 part1

Four bigger things:
1. The implementation of the STHYI opcode in the kernel. This is used
   in libraries like qclib [1] to provide enough information for a
   capacity and usage based software licence pricing. The STHYI content
   is defined by the related z/VM documentation [2]. Its data can be
   composed by accessing several other interfaces provided by LPAR or
   the machine. This information is partially sensitive or root-only
   so the kernel does the necessary filtering.
2. Preparation for nested virtualization (VSIE). KVM should query the
   proper sclp interfaces for the availability of some features before
   using it. In the past we have been sloppy and simply assumed that
   several features are available. With this we should be able to handle
   most cases of a missing feature.
3. CPU model interfaces extended by some additional features that are
   not covered by a facility bit in STFLE. For example all the crypto
   instructions of the coprocessor provide a query function. As reality
   tends to be more complex (e.g. export regulations might block some
   algorithms) we have to provide additional interfaces to query or
   set these non-stfle features.
4. Several fixes and changes detected and fixed when doing 1-3.

All features change base s390 code. All relevant patches have an ACK
from the s390 or component maintainers.

The next pull request for 4.8 (part2) will contain the implementation
of VSIE.

[1] http://www.ibm.com/developerworks/linux/linux390/qclib.html
[2] https://www.ibm.com/support/knowledgecenter/SSB27U_6.3.0/com.ibm.zvm.v630.hcpb4/hcpb4sth.htm
2016-06-15 09:21:46 +02:00
Kees Cook 0208b9445b s390/ptrace: run seccomp after ptrace
Close the hole where ptrace can change a syscall out from under seccomp.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-s390@vger.kernel.org
2016-06-14 10:54:45 -07:00
Andy Lutomirski 2f275de5d1 seccomp: Add a seccomp_data parameter secure_computing()
Currently, if arch code wants to supply seccomp_data directly to
seccomp (which is generally much faster than having seccomp do it
using the syscall_get_xyz() API), it has to use the two-phase
seccomp hooks. Add it to the easy hooks, too.

Cc: linux-arch@vger.kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-06-14 10:54:39 -07:00
Hendrik Brueckner 19c93787f5 s390/crc32-vx: use vector instructions to optimize CRC-32 computation
Use vector instructions to optimize the computation of CRC-32 checksums.
An optimized version is provided for CRC-32 (IEEE 802.3 Ethernet) in
normal and bitreflected domain, as well as, for bitreflected CRC-32C
(Castagnoli).

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-14 16:54:16 +02:00
Hendrik Brueckner 0486480802 s390/vx: add support functions for in-kernel FPU use
Introduce the kernel_fpu_begin() and kernel_fpu_end() function
to enclose any in-kernel use of FPU instructions and registers.
In enclosed sections, you can perform floating-point or vector
(SIMD) computations.  The functions take care of saving and
restoring FPU register contents and controls.

For usage details, see the guidelines in arch/s390/include/asm/fpu/api.h

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-14 16:54:11 +02:00
Heiko Carstens de3fa841e4 s390/mm: fix compile for PAGE_DEFAULT_KEY != 0
The usual problem for code that is ifdef'ed out is that it doesn't
compile after a while. That's also the case for the storage key
initialisation code, if it would be used (set PAGE_DEFAULT_KEY to
something not zero):

./arch/s390/include/asm/page.h: In function 'storage_key_init_range':
./arch/s390/include/asm/page.h:36:2: error: implicit declaration of function '__storage_key_init_range'

Since the code itself has been useful for debugging purposes several
times, remove the ifdefs and make sure the code gets compiler
coverage. The cost for this is eight bytes.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-14 16:54:05 +02:00
Peter Zijlstra 726328d92a locking/spinlock, arch: Update and fix spin_unlock_wait() implementations
This patch updates/fixes all spin_unlock_wait() implementations.

The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.

This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.

I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.

Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.

I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.

Note: PPC32 will be fixed independently

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-14 11:55:15 +02:00
Andrea Gelmini 960cb306e6 KVM: S390: Fix typo
Signed-off-by: Andrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-06-14 11:16:27 +02:00
Heiko Carstens 86d18a55dd s390/topology: remove z10 special handling
I don't have a z10 to test this anymore, so I have no idea if the code
works at all or even crashes. I can try to emulate, but it is just
guess work.

Nor do we know if the z10 special handling is performance wise still
better than the generic handling. There have been a lot of changes to
the scheduler.

Therefore let's play safe and remove the special handling.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:27 +02:00
Heiko Carstens adac0f1e8c s390/topology: add drawer scheduling domain level
The z13 machine added a fourth level to the cpu topology
information. The new top level is called drawer.

A drawer contains two books, which used to be the top level.

Adding this additional scheduling domain did show performance
improvements for some workloads of up to 8%, while there don't
seem to be any workloads impacted in a negative way.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:27 +02:00
Heiko Carstens 0599eead58 s390/ipl: rename diagnose enums
Rename DIAG308_IPL and DIAG308_DUMP to DIAG308_LOAD_CLEAR and
DIAG308_LOAD_NORMAL_DUMP to better reflect the associated IPL
functions.

Suggested-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Suggested-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:26 +02:00
Heiko Carstens 0f7451ff3a s390/ipl: use load normal for LPAR re-ipl
Avoid clearing memory for CCW-type re-ipl within a logical
partition. This can save a significant amount of time if a logical
partition contains a lot of memory.

On the other hand we still clear memory if running within a second
level hypervisor, since the hypervisor can simply free all memory that
was used for the guest.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:26 +02:00
Heiko Carstens 6c22c98637 s390: avoid extable collisions
We have some inline assemblies where the extable entry points to a
label at the end of an inline assembly which is not followed by an
instruction.

On the other hand we have also inline assemblies where the extable
entry points to the first instruction of an inline assembly.

If a first type inline asm (extable point to empty label at the end)
would be directly followed by a second type inline asm (extable points
to first instruction) then we would have two different extable entries
that point to the same instruction but would have a different target
address.

This can lead to quite random behaviour, depending on sorting order.

I verified that we currently do not have such collisions within the
kernel. However to avoid such subtle bugs add a couple of nop
instructions to those inline assemblies which contain an extable that
points to an empty label.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:26 +02:00
Heiko Carstens ee64baf4ea s390/uaccess: use __builtin_expect for get_user/put_user
We always expect that get_user and put_user return with zero. Give the
compiler a hint so it can slightly optimize the code and avoid
branches.
This is the same what x86 got with commit a76cf66e94 ("x86/uaccess:
Tell the compiler that uaccess is unlikely to fault").

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:25 +02:00
Heiko Carstens b8ac5e2f4d s390/uaccess: fix whitespace damage
Fix some whitespace damage that was introduced by me with a
query-replace when removing 31 bit support.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:25 +02:00
Sebastian Ott 8ee2db3cf1 s390/pci: ensure to not cross a dma segment boundary
When we use the iommu_area_alloc helper to get dma addresses
we specify the boundary_size parameter but not the offset (called
shift in this context).

As long as the offset (start_dma) is a multiple of the boundary
we're ok (on current machines start_dma always seems to be 4GB).

Don't leave this to chance and specify the offset for iommu_area_alloc.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:24 +02:00
Sebastian Ott 53b1bc9aba s390/pci: ensure page aligned dma start address
We don't have an architectural guarantee on the value of
the dma offset but rely on it to be at least page aligned.
Enforce page alignemt of start_dma.

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:24 +02:00
Sebastian Ott bb98f396f1 s390: use SPARSE_IRQ
Use dynamically allocated irq descriptors on s390 which allows
us to get rid of the s390 specific config option PCI_NR_MSI and
exploit more MSI interrupts. Also the size of the kernel image
is reduced by 131K (using performance_defconfig).

Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:24 +02:00
Heiko Carstens 72a9b02d3b s390: use __section macro everywhere
Small cleanup patch to use the shorter __section macro everywhere.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:23 +02:00
Heiko Carstens d07a980c1b s390: add proper __ro_after_init support
On s390 __ro_after_init is currently mapped to __read_mostly which
means that data marked as __ro_after_init will not be protected.

Reason for this is that the common code __ro_after_init implementation
is x86 centric: the ro_after_init data section was added to rodata,
since x86 enables write protection to kernel text and rodata very
late. On s390 we have write protection for these sections enabled with
the initial page tables. So adding the ro_after_init data section to
rodata does not work on s390.

In order to make __ro_after_init work properly on s390 move the
ro_after_init data, right behind rodata. Unlike the rodata section it
will be marked read-only later after all init calls happened.

This s390 specific implementation adds new __start_ro_after_init and
__end_ro_after_init labels. Everything in between will be marked
read-only after the init calls happened. In addition to the
__ro_after_init data move also the exception table there, since from a
practical point of view it fits the __ro_after_init requirements.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:23 +02:00
Martin Schwidefsky 64f31d5802 s390/mm: simplify the TLB flushing code
ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to
decide between a lazy TLB flush vs an immediate TLB flush. The field
contains two 16-bit counters, the number of CPUs that have the mm
attached and can create TLB entries for it and the number of CPUs in
the middle of a page table update.

The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions
use the attach counter and a mask check with mm_cpumask(mm) to decide
between a local flush local of the current CPU and a global flush.

For all these functions the decision between lazy vs immediate and
local vs global TLB flush can be based on CPU masks. There are two
masks:  the mm->context.cpu_attach_mask with the CPUs that are actively
using the mm, and the mm_cpumask(mm) with the CPUs that have used the
mm since the last full flush. The decision between lazy vs immediate
flush is based on the mm->context.cpu_attach_mask, to decide between
local vs global flush the mm_cpumask(mm) is used.

With this patch all checks will use the CPU masks, the old counter
mm->context.attach_count with its two 16-bit values is turned into a
single counter mm->context.flush_count that keeps track of the number
of CPUs with incomplete page table updates. The sole user of this
counter is finish_arch_post_lock_switch() which waits for the end of
all page table updates.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:22 +02:00
Martin Schwidefsky a9809407f6 s390/mm: fix vunmap vs finish_arch_post_lock_switch
The vunmap_pte_range() function calls ptep_get_and_clear() without any
locking. ptep_get_and_clear() uses ptep_xchg_lazy()/ptep_flush_direct()
for the page table update. ptep_flush_direct requires that preemption
is disabled, but without any locking this is not the case. If the kernel
preempts the task while the attach_counter is increased an endless loop
in finish_arch_post_lock_switch() will occur the next time the task is
scheduled.

Add explicit preempt_disable()/preempt_enable() calls to the relevant
functions in arch/s390/mm/pgtable.c.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:21 +02:00
Martin Schwidefsky fd5ada0403 s390/time: remove ETR support
The External-Time-Reference (ETR) clock synchronization interface has
been superseded by Server-Time-Protocol (STP). Remove the outdated
ETR interface.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:21 +02:00
Martin Schwidefsky 936cc855ff s390/time: add leap seconds to initial system time
The PTFF instruction can be used to retrieve information about UTC
including the current number of leap seconds. Use this value to
convert the coordinated server time value of the TOD clock to a
proper UTC timestamp to initialize the system time. Without this
correction the system time will be off by the number of leap seonds
until it has been corrected via NTP.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:20 +02:00
Martin Schwidefsky 4027789192 s390/time: LPAR offset handling
It is possible to specify a user offset for the TOD clock, e.g. +2 hours.
The TOD clock will carry this offset even if the clock is synchronized
with STP. This makes the time stamps acquired with get_sync_clock()
useless as another LPAR migth use a different TOD offset.

Use the PTFF instrution to get the TOD epoch difference and subtract
it from the TOD clock value to get a physical timestamp. As the epoch
difference contains the sync check delta as well the LPAR offset value
to the physical clock needs to be refreshed after each clock
synchronization.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:20 +02:00
Martin Schwidefsky 9dc06ccf46 s390/time: move PTFF definitions
The PTFF instruction is not a function of ETR, rename and move the
PTFF definitions from etr.h to timex.h.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:20 +02:00
Martin Schwidefsky 2f82f57763 s390/time: STP sync clock correction
The sync clock operation of the channel subsystem call for STP delivers
the TOD clock difference as a result. Use this TOD clock difference
instead of the difference between the TOD timestamps before and after
the sync clock operation.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:19 +02:00
Heiko Carstens 4e042af463 s390/kexec: fix crash on resize of reserved memory
Reducing the size of reserved memory for the crash kernel will result
in an immediate crash on s390. Reason for that is that we do not
create struct pages for memory that is reserved. If that memory is
freed any access to struct pages which correspond to this memory will
result in invalid memory accesses and a kernel panic.

Fix this by properly creating struct pages when the system gets
initialized. Change the code also to make use of set_memory_ro() and
set_memory_rw() so page tables will be split if required.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:19 +02:00
Heiko Carstens 2d0af22479 s390/kexec: fix update of os_info crash kernel size
Implement an s390 version of the weak crash_free_reserved_phys_range
function. This allows us to update the size of the reserved crash
kernel memory if it will be resized.

This was previously done with a call to crash_unmap_reserved_pages
from crash_shrink_memory which was removed with ("s390/kexec:
consolidate crash_map/unmap_reserved_pages() and
arch_kexec_protect(unprotect)_crashkres()")

Fixes: 7a0058ec78 ("s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()")
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:19 +02:00
Heiko Carstens 0ccb32c983 s390/mm: align swapper_pg_dir to 16k
The segment/region table that is part of the kernel image must be
properly aligned to 16k in order to make the crdte inline assembly
work.
Otherwise it will calculate a wrong segment/region table start address
and access incorrect memory locations if the swapper_pg_dir is not
aligned to 16k.

Therefore define BSS_FIRST_SECTIONS in order to put the swapper_pg_dir
at the beginning of the bss section and also align the bss section to
16k just like other architectures did.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:18 +02:00
Christian Borntraeger 4b8fe77ace s390: dump_stack: fill in arch description
Lets provide the basic machine information for dump_stack on
s390. This enables the "Hardware name:" line and results in
output like

[...]
Oops: 0004 ilc:2 [#1] SMP
Modules linked in:
CPU: 1 PID: 74 Comm: sh Not tainted 4.5.0+ #205
Hardware name: IBM              2964 NC9              704	(KVM)
[...]

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:18 +02:00
Daniel van Gerpen 99ec1112da s390: use canonical include guard style
Signed-off-by: Daniel van Gerpen <daniel@vangerpen.de>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:17 +02:00
Heiko Carstens 097a116c7e s390/cpuinfo: show dynamic and static cpu mhz
Show the dynamic and static cpu mhz of each cpu. Since these values
are per cpu this requires a fundamental extension of the format of
/proc/cpuinfo.

Historically we had only a single line per cpu and a summary at the
top of the file. This format is hardly extendible if we want to add
more per cpu information.

Therefore this patch adds per cpu blocks at the end of /proc/cpuinfo:

cpu             : 0
cpu Mhz dynamic : 5504
cpu Mhz static  : 5504

cpu             : 1
cpu Mhz dynamic : 5504
cpu Mhz static  : 5504

cpu             : 2
cpu Mhz dynamic : 5504
cpu Mhz static  : 5504

cpu             : 3
cpu Mhz dynamic : 5504
cpu Mhz static  : 5504

Right now each block contains only the dynamic and static cpu mhz,
but it can be easily extended like on every other architecture.

This extension is supposed to be compatible with the old format.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Sascha Silbe <silbe@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:17 +02:00
Heiko Carstens 219a21b3b0 s390/cpuinfo: print cache info and all single cpu lines on first iteration
Change the code to print all the current output during the first
iteration. This is a preparation patch for the upcoming per cpu block
extension to /proc/cpuinfo.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:17 +02:00
Jason Baron ac31418445 s390: add explicit <linux/stringify.h> for jump label
Ensure that we always have __stringify().

Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:16 +02:00
Heiko Carstens 37cd944c8d s390/pgtable: add mapping statistics
Add statistics that show how memory is mapped within the kernel
identity mapping. This is more or less the same like git
commit ce0c0e50f9 ("x86, generic: CPA add statistics about state
of direct mapping v4") for x86.

I also intentionally copied the lower case "k" within DirectMap4k vs
the upper case "M" and "G" within the two other lines. Let's have
consistent inconsistencies across architectures.

The output of /proc/meminfo now contains these additional lines:

DirectMap4k:        2048 kB
DirectMap1M:     3991552 kB
DirectMap2G:     4194304 kB

The implementation on s390 is lockless unlike the x86 version, since I
assume changes to the kernel mapping are a very rare event. Therefore
it really doesn't matter if these statistics could potentially be
inconsistent if read while kernel pages tables are being changed.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:16 +02:00
Heiko Carstens bab247ff5f s390/vmem: simplify vmem code for read-only mappings
For the kernel identity mapping map everything read-writeable and
subsequently call set_memory_ro() to make the ro section read-only.
This simplifies the code a lot.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:16 +02:00
Heiko Carstens e8a97e42dc s390/pageattr: allow kernel page table splitting
set_memory_ro() and set_memory_rw() currently only work on 4k
mappings, which is good enough for module code aka the vmalloc area.

However we stumbled already twice into the need to make this also work
on larger mappings:
- the ro after init patch set
- the crash kernel resize code

Therefore this patch implements automatic kernel page table splitting
if e.g. set_memory_ro() would be called on parts of a 2G mapping.
This works quite the same as the x86 code, but is much simpler.

In order to make this work and to be architecturally compliant we now
always use the csp, cspg or crdte instructions to replace valid page
table entries. This means that set_memory_ro() and set_memory_rw()
will be much more expensive than before. In order to avoid huge
latencies the code contains a couple of cond_resched() calls.

The current code only splits page tables, but does not merge them if
it would be possible.  The reason for this is that currently there is
no real life scenarion where this would really happen. All current use
cases that I know of only change access rights once during the life
time. If that should change we can still implement kernel page table
merging at a later time.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:15 +02:00
Heiko Carstens 9e20b4dac1 s390/pgtable: make pmd and pud helper functions available
Make pmd_wrprotect() and pmd_mkwrite() available independently from
CONFIG_TRANSPARENT_HUGEPAGE and CONFIG_HUGETLB_PAGE so these can be
used on the kernel mapping.

Also introduce a couple of pud helper functions, namely pud_pfn(),
pud_wrprotect(), pud_mkwrite(), pud_mkdirty() and pud_mkclean()
which only work on the kernel mapping.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:15 +02:00
Heiko Carstens 3e76ee99b0 s390/mm: always use PAGE_KERNEL when mapping pages
Always use PAGE_KERNEL when re-enabling pages within the kernel
mapping due to debug pagealloc. Without using this pgprot value
pte_mkwrite() and pte_wrprotect() won't work on such mappings after an
unmap -> map cycle anymore.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:14 +02:00
Heiko Carstens 5aa29975e8 s390/vmem: make use of pte_clear()
Use pte_clear() instead of open-coding it.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:14 +02:00
Heiko Carstens c126aa83e2 s390/pgtable: get rid of _REGION3_ENTRY_RO
_REGION3_ENTRY_RO is a duplicate of _REGION_ENTRY_PROTECT.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:14 +02:00
Heiko Carstens 2dffdcbac9 s390/vmem: introduce and use SEGMENT_KERNEL and REGION3_KERNEL
Instead of open-coded SEGMENT_KERNEL and REGION3_KERNEL assignments use
defines.  Also to make e.g. pmd_wrprotect() work on the kernel mapping
a couple more flags must be set. Therefore add the missing flags also.

In order to make everything symmetrical this patch also adds software
dirty, young, read and write bits for region 3 table entries.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:13 +02:00
Heiko Carstens 2e9996fcf8 s390/vmem: align segment and region tables to 16k
Usually segment and region tables are 16k aligned due to the way the
buddy allocator works.  This is not true for the vmem code which only
asks for a 4k alignment. In order to be consistent enforce a 16k
alignment here as well.

This alignment will be assumed and therefore is required by the
pageattr code.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:13 +02:00
Heiko Carstens 4ccccc522b s390/pgtable: introduce and use generic csp inline asm
We have already two inline assemblies which make use of the csp
instruction. Since I need a third instance let's introduce a generic
inline assmebly which can be used by everyone.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:13 +02:00
Christian Borntraeger 1c343f7b0e KVM: s390/mm: Fix CMMA reset during reboot
commit 1e133ab296 ("s390/mm: split arch/s390/mm/pgtable.c") factored
out the page table handling code from __gmap_zap and  __s390_reset_cmma
into ptep_zap_unused and added a simple flag that tells which one of the
function (reset or not) is to be made. This also changed the behaviour,
as it also zaps unused page table entries on reset.
Turns out that this is wrong as s390_reset_cmma uses the page walker,
which DOES NOT take the ptl lock.

The most simple fix is to not do the zapping part on reset (which uses
the walker)

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 1e133ab296 ("s390/mm: split arch/s390/mm/pgtable.c")
Cc: stable@vger.kernel.org # 4.6+
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-06-13 15:58:09 +02:00
David Hildenbrand a7e19ab55f KVM: s390: handle missing storage-key facility
Without the storage-key facility, SIE won't interpret SSKE, ISKE and
RRBE for us. So let's add proper interception handlers that will be called
if lazy sske cannot be enabled.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:31 +02:00
David Hildenbrand 11ddcd41bc KVM: s390: trace and count all skey intercepts
Let's trace and count all skey handling operations, even if lazy skey
handling was already activated. Also, don't enable lazy skey handling if
anything went wrong while enabling skey handling for the SIE.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:31 +02:00
David Hildenbrand 2386145152 s390/sclp: detect storage-key facility
Let's correctly detect that facility.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:30 +02:00
David Hildenbrand 695be0e7a2 KVM: s390: pfmf: handle address overflows
In theory, end could always end up being < start, if overflowing to 0.
Although very unlikely for now, let's just fix it.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:30 +02:00
David Hildenbrand 1824c723ac KVM: s390: pfmf: support conditional-sske facility
We already indicate that facility but don't implement it in our pfmf
interception handler. Let's add a new storage key handling function for
conditionally setting the guest storage key.

As we will reuse this function later on, let's directly implement returning
the old key via parameter and indicating if any change happened via rc.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:30 +02:00
David Hildenbrand 2c26d1d23a KVM: s390: pfmf: take care of amode when setting reg2
Depending on the addressing mode, we must not overwrite bit 0-31 of the
register. In addition, 24 bit and 31 bit have to set certain bits to 0,
which is guaranteed by converting the end address to an effective
address.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:29 +02:00
David Hildenbrand 9a68f0af8c KVM: s390: pfmf: MR and MC are ignored without CSSKE
These two bits are simply ignored when the conditional-SSKE facility is
not installed.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:29 +02:00
David Hildenbrand 6164a2e90a KVM: s390: pfmf: fix end address calculation
The current calculation is wrong if absolute != real address. Let's just
calculate the start address for 4k frames upfront. Otherwise, the
calculated end address will be wrong, resulting in wrong memory
location/storage keys getting touched.

To keep low-address protection working (using the effective address),
we have to move the check.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:28 +02:00
David Hildenbrand fe69eabf8d KVM: s390: storage keys fit into a char
No need to convert the storage key into an unsigned long, the target
function expects a char as argument.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:28 +02:00
David Hildenbrand 154c8c19c3 s390/mm: return key via pointer in get_guest_storage_key
Let's just split returning the key and reporting errors. This makes calling
code easier and avoids bugs as happened already.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:28 +02:00
David Hildenbrand 8d6037a7b4 s390/mm: simplify get_guest_storage_key
We can safe a few LOC and make that function easier to understand
by rewriting existing code.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:27 +02:00
Martin Schwidefsky d3ed1ceeac s390/mm: set and get guest storage key mmap locking
Move the mmap semaphore locking out of set_guest_storage_key
and get_guest_storage_key. This makes the two functions more
like the other ptep_xxx operations and allows to avoid repeated
semaphore operations if multiple keys are read or written.

Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:27 +02:00
David Hildenbrand c427c42cd6 s390/mm: don't drop errors in get_guest_storage_key
Commit 1e133ab296 ("s390/mm: split arch/s390/mm/pgtable.c") changed
the return value of get_guest_storage_key to an unsigned char, resulting
in -EFAULT getting interpreted as a valid storage key.

Cc: stable@vger.kernel.org # 4.6+
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:26 +02:00
Christian Borntraeger dcc98ea614 KVM: s390: fixup I/O interrupt traces
We currently have two issues with the I/O  interrupt injection logging:
1. All QEMU versions up to 2.6 have a wrong encoding of device numbers
etc for the I/O interrupt type, so the inject VM_EVENT will have wrong
data. Let's fix this by using the interrupt parameters and not the
interrupt type number.
2. We only log in kvm_s390_inject_vm, but not when coming from
kvm_s390_reinject_io_int or from flic. Let's move the logging to the
common __inject_io function.

We also enhance the logging for delivery to match the data.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-06-10 12:07:26 +02:00
Christian Borntraeger 1bb78d161f KVM: s390: provide logging for diagnose 0x500
We might need to debug some virtio things, so better have diagnose 500
logged.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2016-06-10 12:07:26 +02:00
David Hildenbrand f597d24eee KVM: s390: turn on tx even without ctx
Constrained transactional execution is an addon of transactional execution.

Let's enable the assist also if only TX is enabled for the guest.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:25 +02:00
David Hildenbrand bdab09f3d8 KVM: s390: enable host-protection-interruption only with ESOP
host-protection-interruption control was introduced with ESOP. So let's
enable it only if we have ESOP and add an explanatory comment why
we can live without it.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:25 +02:00
David Hildenbrand 09a400e78e KVM: s390: enable ibs only if available
Let's enable interlock-and-broadcast suppression only if the facility is
actually available.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:24 +02:00
David Hildenbrand 9c375490fc s390/sclp: detect interlock-and-broadcast-suppression facility
Let's detect that facility.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:24 +02:00
David Hildenbrand 873b425e4c KVM: s390: enable PFMFI only if available
Let's enable interpretation of PFMFI only if the facility is
actually available. Emulation code still works in case the guest is
offered EDAT-1.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:23 +02:00
David Hildenbrand a0eb55e631 s390/sclp: detect PFMF interpretation facility
Let's detect that facility.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:23 +02:00
David Hildenbrand 48ee7d3a7f KVM: s390: enable cei only if available
Let's only enable conditional-external-interruption if the facility is
actually available.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:23 +02:00
David Hildenbrand 4a5c3e0827 s390/sclp: detect conditional-external-interception facility
Let's detect if we have that facility.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:22 +02:00
David Hildenbrand 11ad65b79e KVM: s390: enable ib only if available
Let's enable intervention bypass only if the facility is acutally
available.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:22 +02:00
David Hildenbrand 72cd82b9e9 s390/sclp: detect intervention bypass facility
Let's detect if we have the intervention bypass facility installed.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:21 +02:00
David Hildenbrand efed110446 KVM: s390: handle missing guest-storage-limit-suppression
If guest-storage-limit-suppression is not available, we would for now
have a valid guest address space with size 0. So let's simply set the
origin to 0 and the limit to hamax.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:21 +02:00
David Hildenbrand 5236c751da s390/sclp: detect guest-storage-limit-suppression
Let's detect that facility.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:20 +02:00
David Hildenbrand f9cbd9b025 KVM: s390: provide CMMA attributes only if available
Let's not provide the device attribute for cmma enabling and clearing
if the hardware doesn't support it.

This also helps getting rid of the undocumented return value "-EINVAL"
in case CMMA is not available when trying to enable it.

Also properly document the meaning of -EINVAL for CMMA clearing.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:20 +02:00
David Hildenbrand c24cc9c8a6 KVM: s390: enable CMMA if the interpration is available
Now that we can detect if collaborative-memory-management interpretation
is available, replace the heuristic by a real hardware detection.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:19 +02:00
David Hildenbrand 09be9cb92b s390/sclp: detect cmma
Let's detect the Collaborative-memory-management-interpretation facility,
aka CMM assist, so we can correctly enable cmma later.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:19 +02:00
David Hildenbrand 89b5b4de33 KVM: s390: guestdbg: signal missing hardware support
Without guest-PER enhancement, we can't provide any debugging support.
Therefore act like kernel support is missing.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:18 +02:00
David Hildenbrand b9e28897e6 s390/sclp: detect guest-PER enhancement
Let's detect that facility, so we can correctly handle its abscence.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:18 +02:00
David Hildenbrand 76a6dd7241 KVM: s390: handle missing 64-bit-SCAO facility
Without that facility, we may only use scaol. So fallback
to DMA allocation in that case, so we won't overwrite random memory
via the SIE.

Also disallow ESCA, so we don't have to handle that allocation case.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:18 +02:00
David Hildenbrand 4013ade3fb s390/sclp: detect 64-bit-SCAO facility
Let's correctly detect that facility, so we can correctly handle its
abscence later on.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:17 +02:00
David Hildenbrand 0a763c780b KVM: s390: interface to query and configure cpu subfunctions
We have certain instructions that indicate available subfunctions via
a query subfunction (crypto functions and ptff), or via a test bit
function (plo).

By exposing these "subfunction blocks" to user space, we allow user space
to
1) query available subfunctions and make sure subfunctions won't get lost
   during migration - e.g. properly indicate them via a CPU model
2) change the subfunctions to be reported to the guest (even adding
   unavailable ones)

This mechanism works just like the way we indicate the stfl(e) list to
user space.

This way, user space could even emulate some subfunctions in QEMU in the
future. If this is ever applicable, we have to make sure later on, that
unsupported subfunctions result in an intercept to QEMU.

Please note that support to indicate them to the guest is still missing
and requires hardware support. Usually, the IBC takes already care of these
subfunctions for migration safety. QEMU should make sure to always set
these bits properly according to the machine generation to be emulated.

Available subfunctions are only valid in combination with STFLE bits
retrieved via KVM_S390_VM_CPU_MACHINE and enabled via
KVM_S390_VM_CPU_PROCESSOR. If the applicable bits are available, the
indicated subfunctions are guaranteed to be correct.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:17 +02:00
David Hildenbrand 1afd43e0fb s390/crypto: allow to query all known cpacf functions
KVM will have to query these functions, let's add at least the query
capabilities.

PCKMO has RRE format, as bit 16-31 are ignored, we can still use the
existing function. As PCKMO won't touch the cc, let's force it to 0
upfront.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:16 +02:00
David Hildenbrand bcfa01d787 KVM: s390: gaccess: convert get_vcpu_asce()
Let's use our new function for preparing translation exceptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:16 +02:00
David Hildenbrand cde0dcfb5d KVM: s390: gaccess: convert guest_page_range()
Let's use our new function for preparing translation exceptions. As we will
need the correct ar, let's pass that to guest_page_range().

This will also make sure that the guest address is stored in the tec
for applicable excptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:15 +02:00
David Hildenbrand fbcb7d5157 KVM: s390: gaccess: convert guest_translate_address()
Let's use our new function for preparing translation exceptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:15 +02:00
David Hildenbrand 3e3c67f6a3 KVM: s390: gaccess: convert kvm_s390_check_low_addr_prot_real()
Let's use our new function for preparing translation exceptions.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:15 +02:00
David Hildenbrand d03193de30 KVM: s390: gaccess: function for preparing translation exceptions
Let's provide a function trans_exc() that can be used for handling
preparation of translation exceptions on a central basis. We will use
that function to replace existing code in gaccess.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:14 +02:00
David Hildenbrand 6167375b55 KVM: s390: gaccess: store guest address on ALC prot exceptions
Let's pass the effective guest address to get_vcpu_asce(), so we
can properly set the guest address in case we inject an ALC protection
exception.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:14 +02:00
David Hildenbrand 22be5a1331 KVM: s390: forward ESOP if available
ESOP guarantees that during a protection exception, bit 61 of real location
168-175 will only be set to 1 if it was because of ALCP or DATP. If the
exception is due to LAP or KCP, the bit will always be set to 0.

The old SOP definition allowed bit 61 to be unpredictable in case of LAP
or KCP in some conditions. So ESOP replaces this unpredictability by
a guarantee.

Therefore, we can directly forward ESOP if it is available on our machine.
We don't have to do anything when ESOP is disabled - the guest will simply
expect unpredictable values. Our guest access functions are already
handling ESOP properly.

Please note that future functionality in KVM will require knowledge about
ESOP being enabled for a guest or not.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:13 +02:00
David Hildenbrand 15c9705f0c KVM: s390: interface to query and configure cpu features
For now, we only have an interface to query and configure facilities
indicated via STFL(E). However, we also have features indicated via
SCLP, that have to be indicated to the guest by user space and usually
require KVM support.

This patch allows user space to query and configure available cpu features
for the guest.

Please note that disabling a feature doesn't necessarily mean that it is
completely disabled (e.g. ESOP is mostly handled by the SIE). We will try
our best to disable it.

Most features (e.g. SCLP) can't directly be forwarded, as most of them need
in addition to hardware support, support in KVM. As we later on want to
turn these features in KVM explicitly on/off (to simulate different
behavior), we have to filter all features provided by the hardware and
make them configurable.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:13 +02:00
Alexander Yarygin c1778e5157 KVM: s390: Add mnemonic print to kvm_s390_intercept_prog
We have a table of mnemonic names for intercepted program
interruptions, let's print readable name of the interruption in the
kvm_s390_intercept_prog trace event.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:13 +02:00
Janosch Frank 7d0a5e6241 KVM: s390: Limit sthyi execution
Store hypervisor information is a valid instruction not only in
supervisor state but also in problem state, i.e. the guest's
userspace. Its execution is not only computational and memory
intensive, but also has to get hold of the ipte lock to write to the
guest's memory.

This lock is not intended to be held often and long, especially not
from the untrusted guest userspace. Therefore we apply rate limiting
of sthyi executions per VM.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:12 +02:00
Janosch Frank 95ca2cb579 KVM: s390: Add sthyi emulation
Store Hypervisor Information is an emulated z/VM instruction that
provides a guest with basic information about the layers it is running
on. This includes information about the cpu configuration of both the
machine and the lpar, as well as their names, machine model and
machine type. This information enables an application to determine the
maximum capacity of CPs and IFLs available to software.

The instruction is available whenever the facility bit 74 is set,
otherwise executing it results in an operation exception.

It is important to check the validity flags in the sections before
using data from any structure member. It is not guaranteed that all
members will be valid on all machines / machine configurations.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:12 +02:00
Janosch Frank a2d57b35c0 KVM: s390: Extend diag 204 fields
The new store hypervisor information instruction, which we are going
to introduce, needs previously unused fields in diag 204 structures.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:11 +02:00
Janosch Frank a011eeb2a3 KVM: s390: Add operation exception interception handler
This commit introduces code that handles operation exception
interceptions. With this handler we can emulate instructions by using
illegal opcodes.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:11 +02:00
Janosch Frank 022bd2d11c s390: Make diag224 public
Diag204's cpu structures only contain the cpu type by means of an
index in the diag224 name table. Hence, to be able to use diag204 in
any meaningful way, we also need a usable diag224 interface.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:10 +02:00
Janosch Frank e435dc3139 s390: Make cpc_name accessible
sclp_ocf.c is the only way to get the cpc name, as it registers the
sole event handler for the ocf event. By creating a new global
function that copies that name, we make it accessible to the world
which longs to retrieve it.

Additionally we now also store the cpc name as EBCDIC, so we don't
have to convert it to and from ASCII if it is requested in native
encoding.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:10 +02:00
Janosch Frank e65f30e0cb s390: hypfs: Move diag implementation and data definitions
Diag 204 data and function definitions currently live in the hypfs
files. As KVM will be a consumer of this data, we need to make it
publicly available and move it to the appropriate diag.{c,h} files.

__attribute__ ((packed)) occurences were replaced with __packed for
all moved structs.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 12:07:09 +02:00
Alexander Yarygin 9ec6de1923 KVM: s390: Add stats for PEI events
Add partial execution intercepted events in kvm_stats_debugfs.

Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 10:24:24 +02:00
David Hildenbrand 0487c44d1e KVM: s390: ignore IBC if zero
Looks like we forgot about the special IBC value of 0 meaning "no IBC".
Let's fix that, otherwise it gets rounded up and suddenly an IBC is active
with the lowest possible machine.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Fixes: commit 053dd2308d ("KVM: s390: force ibc into valid range")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2016-06-10 10:21:38 +02:00
Jason Low d157bd860f locking/rwsem: Remove rwsem_atomic_add() and rwsem_atomic_update()
The rwsem-xadd count has been converted to an atomic variable and the
rwsem code now directly uses atomic_long_add() and
atomic_long_add_return(), so we can remove the arch implementations of
rwsem_atomic_add() and rwsem_atomic_update().

Signed-off-by: Jason Low <jason.low2@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Terry Rudd <terry.rudd@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-06-08 15:16:59 +02:00
Linus Torvalds 367d3fd505 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Three bugs fixes and an update for the default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: fix info leak in do_sigsegv
  s390/config: update default configuration
  s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
  s390/bpf: reduce maximum program size to 64 KB
2016-05-31 09:43:24 -07:00
Linus Torvalds 5b26fc8824 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and
   unexports symbols which are not used in the current config [Nicolas
   Pitre]

 - several kbuild rule cleanups [Masahiro Yamada]

 - warning option adjustments for gcov etc [Arnd Bergmann]

 - a few more small fixes

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
  kbuild: move -Wunused-const-variable to W=1 warning level
  kbuild: fix if_change and friends to consider argument order
  kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
  kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
  gcov: disable -Wmaybe-uninitialized warning
  gcov: disable tree-loop-im to reduce stack usage
  gcov: disable for COMPILE_TEST
  Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
  Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
  kbuild: forbid kernel directory to contain spaces and colons
  kbuild: adjust ksym_dep_filter for some cmd_* renames
  kbuild: Fix dependencies for final vmlinux link
  kbuild: better abstract vmlinux sequential prerequisites
  kbuild: fix call to adjust_autoksyms.sh when output directory specified
  kbuild: Get rid of KBUILD_STR
  kbuild: rename cmd_as_s_S to cmd_cpp_s_S
  kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
  kbuild: drop redundant "PHONY += FORCE"
  kbuild: delete unnecessary "@:"
  kbuild: mark help target as PHONY
  ...
2016-05-26 22:01:22 -07:00
Linus Torvalds bdc6b758e4 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar:
 "Mostly tooling and PMU driver fixes, but also a number of late updates
  such as the reworking of the call-chain size limiting logic to make
  call-graph recording more robust, plus tooling side changes for the
  new 'backwards ring-buffer' extension to the perf ring-buffer"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  perf record: Read from backward ring buffer
  perf record: Rename variable to make code clear
  perf record: Prevent reading invalid data in record__mmap_read
  perf evlist: Add API to pause/resume
  perf trace: Use the ptr->name beautifier as default for "filename" args
  perf trace: Use the fd->name beautifier as default for "fd" args
  perf report: Add srcline_from/to branch sort keys
  perf evsel: Record fd into perf_mmap
  perf evsel: Add overwrite attribute and check write_backward
  perf tools: Set buildid dir under symfs when --symfs is provided
  perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
  perf annotate: Sort list of recognised instructions
  perf annotate: Fix identification of ARM blt and bls instructions
  perf tools: Fix usage of max_stack sysctl
  perf callchain: Stop validating callchains by the max_stack sysctl
  perf trace: Fix exit_group() formatting
  perf top: Use machine->kptr_restrict_warned
  perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
  perf machine: Do not bail out if not managing to read ref reloc symbol
  perf/x86/intel/p4: Trival indentation fix, remove space
  ...
2016-05-25 17:05:40 -07:00
Michal Hocko 6904817607 vdso: make arch_setup_additional_pages wait for mmap_sem for write killable
most architectures are relying on mmap_sem for write in their
arch_setup_additional_pages.  If the waiting task gets killed by the oom
killer it would block oom_reaper from asynchronous address space reclaim
and reduce the chances of timely OOM resolving.  Wait for the lock in
the killable mode and return with EINTR if the task got killed while
waiting.

Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Andy Lutomirski <luto@amacapital.net>	[x86 vdso]
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Xunlei Pang 7a0058ec78 s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()
Commit 3f625002581b ("kexec: introduce a protection mechanism for the
crashkernel reserved memory") is a similar mechanism for protecting the
crash kernel reserved memory to previous crash_map/unmap_reserved_pages()
implementation, the new one is more generic in name and cleaner in code
(besides, some arch may not be allowed to unmap the pgtable).

Therefore, this patch consolidates them, and uses the new
arch_kexec_protect(unprotect)_crashkres() to replace former
crash_map/unmap_reserved_pages() which by now has been only used by
S390.

The consolidation work needs the crash memory to be mapped initially,
this is done in machine_kdump_pm_init() which is after
reserve_crashkernel().  Once kdump kernel is loaded, the new
arch_kexec_protect_crashkres() implemented for S390 will actually
unmap the pgtable like before.

Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Minfei Huang <mhuang@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-23 17:04:14 -07:00
Michal Hocko cf0d44d513 s390: fix info leak in do_sigsegv
Aleksa has reported incorrect si_errno value when stracing task which
received SIGSEGV:
[pid 20799] --- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_errno=2510266, si_addr=0x100000000000000}

The reason seems to be that do_sigsegv is not initializing siginfo
structure defined on the stack completely so it will leak 4B of
the previous stack content. Fix it simply by initializing si_errno
to 0 (same as do_sigbus does already).

Cc: stable # introduced pre-git times
Reported-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-23 16:45:25 +02:00
Heiko Carstens e9bc15f28e s390/config: update default configuration
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-23 09:11:37 +02:00
Zhaoxiu Zeng fff7fb0b2d lib/GCD.c: use binary GCD algorithm instead of Euclidean
The binary GCD algorithm is based on the following facts:
	1. If a and b are all evens, then gcd(a,b) = 2 * gcd(a/2, b/2)
	2. If a is even and b is odd, then gcd(a,b) = gcd(a/2, b)
	3. If a and b are all odds, then gcd(a,b) = gcd((a-b)/2, b) = gcd((a+b)/2, b)

Even on x86 machines with reasonable division hardware, the binary
algorithm runs about 25% faster (80% the execution time) than the
division-based Euclidian algorithm.

On platforms like Alpha and ARMv6 where division is a function call to
emulation code, it's even more significant.

There are two variants of the code here, depending on whether a fast
__ffs (find least significant set bit) instruction is available.  This
allows the unpredictable branches in the bit-at-a-time shifting loop to
be eliminated.

If fast __ffs is not available, the "even/odd" GCD variant is used.

I use the following code to benchmark:

	#include <stdio.h>
	#include <stdlib.h>
	#include <stdint.h>
	#include <string.h>
	#include <time.h>
	#include <unistd.h>

	#define swap(a, b) \
		do { \
			a ^= b; \
			b ^= a; \
			a ^= b; \
		} while (0)

	unsigned long gcd0(unsigned long a, unsigned long b)
	{
		unsigned long r;

		if (a < b) {
			swap(a, b);
		}

		if (b == 0)
			return a;

		while ((r = a % b) != 0) {
			a = b;
			b = r;
		}

		return b;
	}

	unsigned long gcd1(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b >>= __builtin_ctzl(b);

		for (;;) {
			a >>= __builtin_ctzl(a);
			if (a == b)
				return a << __builtin_ctzl(r);

			if (a < b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd2(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &= -r;

		while (!(b & r))
			b >>= 1;

		for (;;) {
			while (!(a & r))
				a >>= 1;
			if (a == b)
				return a;

			if (a < b)
				swap(a, b);
			a -= b;
			a >>= 1;
			if (a & r)
				a += b;
			a >>= 1;
		}
	}

	unsigned long gcd3(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		b >>= __builtin_ctzl(b);
		if (b == 1)
			return r & -r;

		for (;;) {
			a >>= __builtin_ctzl(a);
			if (a == 1)
				return r & -r;
			if (a == b)
				return a << __builtin_ctzl(r);

			if (a < b)
				swap(a, b);
			a -= b;
		}
	}

	unsigned long gcd4(unsigned long a, unsigned long b)
	{
		unsigned long r = a | b;

		if (!a || !b)
			return r;

		r &= -r;

		while (!(b & r))
			b >>= 1;
		if (b == r)
			return r;

		for (;;) {
			while (!(a & r))
				a >>= 1;
			if (a == r)
				return r;
			if (a == b)
				return a;

			if (a < b)
				swap(a, b);
			a -= b;
			a >>= 1;
			if (a & r)
				a += b;
			a >>= 1;
		}
	}

	static unsigned long (*gcd_func[])(unsigned long a, unsigned long b) = {
		gcd0, gcd1, gcd2, gcd3, gcd4,
	};

	#define TEST_ENTRIES (sizeof(gcd_func) / sizeof(gcd_func[0]))

	#if defined(__x86_64__)

	#define rdtscll(val) do { \
		unsigned long __a,__d; \
		__asm__ __volatile__("rdtsc" : "=a" (__a), "=d" (__d)); \
		(val) = ((unsigned long long)__a) | (((unsigned long long)__d)<<32); \
	} while(0)

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		unsigned long long start, end;
		unsigned long long ret;
		unsigned long gcd_res;

		rdtscll(start);
		gcd_res = gcd(a, b);
		rdtscll(end);

		if (end >= start)
			ret = end - start;
		else
			ret = ~0ULL - start + 1 + end;

		*res = gcd_res;
		return ret;
	}

	#else

	static inline struct timespec read_time(void)
	{
		struct timespec time;
		clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time);
		return time;
	}

	static inline unsigned long long diff_time(struct timespec start, struct timespec end)
	{
		struct timespec temp;

		if ((end.tv_nsec - start.tv_nsec) < 0) {
			temp.tv_sec = end.tv_sec - start.tv_sec - 1;
			temp.tv_nsec = 1000000000ULL + end.tv_nsec - start.tv_nsec;
		} else {
			temp.tv_sec = end.tv_sec - start.tv_sec;
			temp.tv_nsec = end.tv_nsec - start.tv_nsec;
		}

		return temp.tv_sec * 1000000000ULL + temp.tv_nsec;
	}

	static unsigned long long benchmark_gcd_func(unsigned long (*gcd)(unsigned long, unsigned long),
								unsigned long a, unsigned long b, unsigned long *res)
	{
		struct timespec start, end;
		unsigned long gcd_res;

		start = read_time();
		gcd_res = gcd(a, b);
		end = read_time();

		*res = gcd_res;
		return diff_time(start, end);
	}

	#endif

	static inline unsigned long get_rand()
	{
		if (sizeof(long) == 8)
			return (unsigned long)rand() << 32 | rand();
		else
			return rand();
	}

	int main(int argc, char **argv)
	{
		unsigned int seed = time(0);
		int loops = 100;
		int repeats = 1000;
		unsigned long (*res)[TEST_ENTRIES];
		unsigned long long elapsed[TEST_ENTRIES];
		int i, j, k;

		for (;;) {
			int opt = getopt(argc, argv, "n:r:s:");
			/* End condition always first */
			if (opt == -1)
				break;

			switch (opt) {
			case 'n':
				loops = atoi(optarg);
				break;
			case 'r':
				repeats = atoi(optarg);
				break;
			case 's':
				seed = strtoul(optarg, NULL, 10);
				break;
			default:
				/* You won't actually get here. */
				break;
			}
		}

		res = malloc(sizeof(unsigned long) * TEST_ENTRIES * loops);
		memset(elapsed, 0, sizeof(elapsed));

		srand(seed);
		for (j = 0; j < loops; j++) {
			unsigned long a = get_rand();
			/* Do we have args? */
			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			unsigned long long min_elapsed[TEST_ENTRIES];
			for (k = 0; k < repeats; k++) {
				for (i = 0; i < TEST_ENTRIES; i++) {
					unsigned long long tmp = benchmark_gcd_func(gcd_func[i], a, b, &res[j][i]);
					if (k == 0 || min_elapsed[i] > tmp)
						min_elapsed[i] = tmp;
				}
			}
			for (i = 0; i < TEST_ENTRIES; i++)
				elapsed[i] += min_elapsed[i];
		}

		for (i = 0; i < TEST_ENTRIES; i++)
			printf("gcd%d: elapsed %llu\n", i, elapsed[i]);

		k = 0;
		srand(seed);
		for (j = 0; j < loops; j++) {
			unsigned long a = get_rand();
			unsigned long b = argc > optind ? strtoul(argv[optind], NULL, 10) : get_rand();
			for (i = 1; i < TEST_ENTRIES; i++) {
				if (res[j][i] != res[j][0])
					break;
			}
			if (i < TEST_ENTRIES) {
				if (k == 0) {
					k = 1;
					fprintf(stderr, "Error:\n");
				}
				fprintf(stderr, "gcd(%lu, %lu): ", a, b);
				for (i = 0; i < TEST_ENTRIES; i++)
					fprintf(stderr, "%ld%s", res[j][i], i < TEST_ENTRIES - 1 ? ", " : "\n");
			}
		}

		if (k == 0)
			fprintf(stderr, "PASS\n");

		free(res);

		return 0;
	}

Compiled with "-O2", on "VirtualBox 4.4.0-22-generic #38-Ubuntu x86_64" got:

  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 10174
  gcd1: elapsed 2120
  gcd2: elapsed 2902
  gcd3: elapsed 2039
  gcd4: elapsed 2812
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9309
  gcd1: elapsed 2280
  gcd2: elapsed 2822
  gcd3: elapsed 2217
  gcd4: elapsed 2710
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9589
  gcd1: elapsed 2098
  gcd2: elapsed 2815
  gcd3: elapsed 2030
  gcd4: elapsed 2718
  PASS
  zhaoxiuzeng@zhaoxiuzeng-VirtualBox:~/develop$ ./gcd -r 500000 -n 10
  gcd0: elapsed 9914
  gcd1: elapsed 2309
  gcd2: elapsed 2779
  gcd3: elapsed 2228
  gcd4: elapsed 2709
  PASS

[akpm@linux-foundation.org: avoid #defining a CONFIG_ variable]
Signed-off-by: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
Signed-off-by: George Spelvin <linux@horizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Petr Mladek 42a0bb3f71 printk/nmi: generic solution for safe printk in NMI
printk() takes some locks and could not be used a safe way in NMI
context.

The chance of a deadlock is real especially when printing stacks from
all CPUs.  This particular problem has been addressed on x86 by the
commit a9edc88093 ("x86/nmi: Perform a safe NMI stack trace on all
CPUs").

The patchset brings two big advantages.  First, it makes the NMI
backtraces safe on all architectures for free.  Second, it makes all NMI
messages almost safe on all architectures (the temporary buffer is
limited.  We still should keep the number of messages in NMI context at
minimum).

Note that there already are several messages printed in NMI context:
WARN_ON(in_nmi()), BUG_ON(in_nmi()), anything being printed out from MCE
handlers.  These are not easy to avoid.

This patch reuses most of the code and makes it generic.  It is useful
for all messages and architectures that support NMI.

The alternative printk_func is set when entering and is reseted when
leaving NMI context.  It queues IRQ work to copy the messages into the
main ring buffer in a safe context.

__printk_nmi_flush() copies all available messages and reset the buffer.
Then we could use a simple cmpxchg operations to get synchronized with
writers.  There is also used a spinlock to get synchronized with other
flushers.

We do not longer use seq_buf because it depends on external lock.  It
would be hard to make all supported operations safe for a lockless use.
It would be confusing and error prone to make only some operations safe.

The code is put into separate printk/nmi.c as suggested by Steven
Rostedt.  It needs a per-CPU buffer and is compiled only on
architectures that call nmi_enter().  This is achieved by the new
HAVE_NMI Kconfig flag.

The are MN10300 and Xtensa architectures.  We need to clean up NMI
handling there first.  Let's do it separately.

The patch is heavily based on the draft from Peter Zijlstra, see

  https://lkml.org/lkml/2015/6/10/327

[arnd@arndb.de: printk-nmi: use %zu format string for size_t]
[akpm@linux-foundation.org: min_t->min - all types are size_t here]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>	[arm part]
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Jiri Kosina <jkosina@suse.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: David Miller <davem@davemloft.net>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Jiri Slaby e64646946e exit_thread: accept a task parameter to be exited
We need to call exit_thread from copy_process in a fail path.  So make it
accept task_struct as a parameter.

[v2]
* s390: exit_thread_runtime_instr doesn't make sense to be called for
  non-current tasks.
* arm: fix the comment in vfp_thread_copy
* change 'me' to 'tsk' for task_struct
* now we can change only archs that actually have exit_thread

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Jiri Slaby 5f56a5dfdb exit_thread: remove empty bodies
Define HAVE_EXIT_THREAD for archs which want to do something in
exit_thread. For others, let's define exit_thread as an empty inline.

This is a cleanup before we change the prototype of exit_thread to
accept a task parameter.

[akpm@linux-foundation.org: fix mips]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Howells <dhowells@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-20 17:58:30 -07:00
Ingo Molnar 21f77d231f perf/core improvements and fixes:
User visible:
 
 - Honour the kernel.perf_event_max_stack knob more precisely by not counting
   PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
   the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)
 
 - Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)
 
 - Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)
 
 - Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)
 
 - Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)
 
 - Store vdso buildid unconditionally, as it appears in callchains and
   we're not checking those when creating the build-id table, so we
   end up not being able to resolve VDSO symbols when doing analysis
   on a different machine than the one where recording was done, possibly
   of a different arch even (arm -> x86_64) (He Kuang)
 
 Infrastructure:
 
 - Generalize max_stack sysctl handler, will be used for configuring
   multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)
 
 Cleanups:
 
 - Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
   open coded strings (Masami Hiramatsu)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXOn7eAAoJENZQFvNTUqpAsOAP/3f/XJekPQAnMcKRBp2noCuj
 nRu1kBltVJyP8iOU5PKSJwel4F9ykNNMl+/rzzxHDo13IM8uc+HnZOJZ6e9mJIJ1
 xqjdqM4EDlYYoFApJzCjTK6CMlevCazosdQT1bbmMDYVPc2uQR/GnutFrzqf/Plg
 hEougIGtfrdy85g95CRdxpy2yMwDK4EwsiDRm9ib1hnuamQZl97buWemBVqSJmLY
 p82E2aMU5Fv5+B8AO4I7V88ZmgpmryjxpM+LjffgNUDSKsSHrlG4NiQ3znV1bgst
 Rc++w78+qxoIozOu6/IX8eSI2L/1eyM/yQ6Qre0KuvYXCl+NopTAYSSJlaA4tyHF
 c55z7HucuyATN3PrFRHlbWUT/RMIVC0j0lnZOc7SJLl90hJQ+nv0iZcbYwMbeHu1
 3LGlcd9jDwQYiClbaT9ATxZJ8B9An0/k/HJdatbAHN0wRomP2Ozz/qD2nmEbUwpV
 sCyLOo/LJkvVkuUjSg6ZiOArNIk4iTSPSAUV+SAL6YOEOZMAX5ISUJQ174+zFC9a
 gqtVsCXvwLIsndXb8ys1r9/fit/MUci0OzKX3SG1K765+E4Bk23KcAgMNbM/a7lp
 ZmHDXMC+yBYcnYNnaxkp7c55CWUlKGOeR4e+KmB99KoeIleYgPhD2UM5beo61TmN
 yUEPtiiFiZmTRkiAu83R
 =7OdF
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo-20160516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

- Honour the kernel.perf_event_max_stack knob more precisely by not counting
  PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to
  the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo)

- Fix identation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim)

- Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim)

- Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim)

- Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen)

- Store vdso buildid unconditionally, as it appears in callchains and
  we're not checking those when creating the build-id table, so we
  end up not being able to resolve VDSO symbols when doing analysis
  on a different machine than the one where recording was done, possibly
  of a different arch even (arm -> x86_64) (He Kuang)

Infrastructure changes:

- Generalize max_stack sysctl handler, will be used for configuring
  multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo)

Cleanups:

- Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using
  open coded strings (Masami Hiramatsu)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-05-20 08:20:14 +02:00
Linus Torvalds a05a70db34 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - fsnotify fix

 - poll() timeout fix

 - a few scripts/ tweaks

 - debugobjects updates

 - the (small) ocfs2 queue

 - Minor fixes to kernel/padata.c

 - Maybe half of the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
  mm, page_alloc: restore the original nodemask if the fast path allocation failed
  mm, page_alloc: uninline the bad page part of check_new_page()
  mm, page_alloc: don't duplicate code in free_pcp_prepare
  mm, page_alloc: defer debugging checks of pages allocated from the PCP
  mm, page_alloc: defer debugging checks of freed pages until a PCP drain
  cpuset: use static key better and convert to new API
  mm, page_alloc: inline pageblock lookup in page free fast paths
  mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
  mm, page_alloc: pull out side effects from free_pages_check
  mm, page_alloc: un-inline the bad part of free_pages_check
  mm, page_alloc: check multiple page fields with a single branch
  mm, page_alloc: remove field from alloc_context
  mm, page_alloc: avoid looking up the first zone in a zonelist twice
  mm, page_alloc: shortcut watermark checks for order-0 pages
  mm, page_alloc: reduce cost of fair zone allocation policy retry
  mm, page_alloc: shorten the page allocator fast path
  mm, page_alloc: check once if a zone has isolated pageblocks
  mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
  mm, page_alloc: simplify last cpupid reset
  mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
  ...
2016-05-19 20:00:06 -07:00
Hugh Dickins fd8cfd3000 arch: fix has_transparent_hugepage()
I've just discovered that the useful-sounding has_transparent_hugepage()
is actually an architecture-dependent minefield: on some arches it only
builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
not, but on some of those (arm and arm64) it then gives the wrong
answer; and on mips alone it's marked __init, which would crash if
called later (but so far it has not been called later).

Straighten this out: make it available to all configs, with a sensible
default in asm-generic/pgtable.h, removing its definitions from those
arches (arc, arm, arm64, sparc, tile) which are served by the default,
adding #define has_transparent_hugepage has_transparent_hugepage to
those (mips, powerpc, s390, x86) which need to override the default at
runtime, and removing the __init from mips (but maybe that kind of code
should be avoided after init: set a static variable the first time it's
called).

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Ning Qu <quning@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com>		[arch/arc]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[arch/s390]
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-19 19:12:14 -07:00
Linus Torvalds 7beaa24ba4 Small release overall.
- x86: miscellaneous fixes, AVIC support (local APIC virtualization,
 AMD version)
 
 - s390: polling for interrupts after a VCPU goes to halted state is
 now enabled for s390; use hardware provided information about facility
 bits that do not need any hypervisor activity, and other fixes for
 cpu models and facilities; improve perf output; floating interrupt
 controller improvements.
 
 - MIPS: miscellaneous fixes
 
 - PPC: bugfixes only
 
 - ARM: 16K page size support, generic firmware probing layer for
 timer and GIC
 
 Christoffer Dall (KVM-ARM maintainer) says:
 "There are a few changes in this pull request touching things outside
  KVM, but they should all carry the necessary acks and it made the
  merge process much easier to do it this way."
 
 though actually the irqchip maintainers' acks didn't make it into the
 patches.  Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
 later acked at http://mid.gmane.org/573351D1.4060303@arm.com
 "more formally and for documentation purposes".
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJXPJjyAAoJEL/70l94x66DhioH/j4fwQ0FmfPSM9PArzaFHQdx
 LNE3tU4+bobbsy1BJr4DiAaOUQn3DAgwUvGLWXdeLiOXtoWXBiFHKaxlqEsCA6iQ
 xcTH1TgfxsVoqGQ6bT9X/2GCx70heYpcWG3f+zqBy7ZfFmQykLAC/HwOr52VQL8f
 hUFi3YmTHcnorp0n5Xg+9r3+RBS4D/kTbtdn6+KCLnPJ0RcgNkI3/NcafTemoofw
 Tkv8+YYFNvKV13qlIfVqxMa0GwWI3pP6YaNKhaS5XO8Pu16HuuF1JthJsUBDzwBa
 RInp8R9MoXgsBYhLpz3jc9vWG7G9yDl5LehsD9KOUGOaFYJ7sQN+QZOusa6jFgA=
 =llO5
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
 "Small release overall.

  x86:
   - miscellaneous fixes
   - AVIC support (local APIC virtualization, AMD version)

  s390:
   - polling for interrupts after a VCPU goes to halted state is now
     enabled for s390
   - use hardware provided information about facility bits that do not
     need any hypervisor activity, and other fixes for cpu models and
     facilities
   - improve perf output
   - floating interrupt controller improvements.

  MIPS:
   - miscellaneous fixes

  PPC:
   - bugfixes only

  ARM:
   - 16K page size support
   - generic firmware probing layer for timer and GIC

  Christoffer Dall (KVM-ARM maintainer) says:
    "There are a few changes in this pull request touching things
     outside KVM, but they should all carry the necessary acks and it
     made the merge process much easier to do it this way."

  though actually the irqchip maintainers' acks didn't make it into the
  patches.  Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
  later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more
  formally and for documentation purposes')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits)
  KVM: MTRR: remove MSR 0x2f8
  KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same
  svm: Manage vcpu load/unload when enable AVIC
  svm: Do not intercept CR8 when enable AVIC
  svm: Do not expose x2APIC when enable AVIC
  KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore
  svm: Add VMEXIT handlers for AVIC
  svm: Add interrupt injection via AVIC
  KVM: x86: Detect and Initialize AVIC support
  svm: Introduce new AVIC VMCB registers
  KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
  KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
  KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks
  KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg
  KVM: x86: Misc LAPIC changes to expose helper functions
  KVM: shrink halt polling even more for invalid wakeups
  KVM: s390: set halt polling to 80 microseconds
  KVM: halt_polling: provide a way to qualify wakeups during poll
  KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts
  kvm: Conditionally register IRQ bypass consumer
  ...
2016-05-19 11:27:09 -07:00
Michael Holzheu 6edf0aa4f8 s390/bpf: fix recache skb->data/hlen for skb_vlan_push/pop
In case of usage of skb_vlan_push/pop, in the prologue we store
the SKB pointer on the stack and restore it after BPF_JMP_CALL
to skb_vlan_push/pop.

Unfortunately currently there are two bugs in the code:

 1) The wrong stack slot (offset 170 instead of 176) is used
 2) The wrong register (W1 instead of B1) is saved

So fix this and use correct stack slot and register.

Fixes: 9db7f2b818 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop")
Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-19 09:14:27 +02:00
Michael Holzheu 0fa963553a s390/bpf: reduce maximum program size to 64 KB
The s390 BFP compiler currently uses relative branch instructions
that only support jumps up to 64 KB. Examples are "j", "jnz", "cgrj",
etc.  Currently the maximum size of s390 BPF programs is set
to 0x7ffff.  If branches over 64 KB are generated the, kernel can
crash due to incorrect code.

So fix this an reduce the maximum size to 64 KB. Programs larger than
that will be interpreted.

Fixes: ce2b6ad9c1 ("s390/bpf: increase BPF_SIZE_MAX")

Cc: stable@vger.kernel.org # 4.3+
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2016-05-19 09:14:27 +02:00
Linus Torvalds f61a657fdf Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "The s390 patches for the 4.7 merge window have the usual bug fixes and
  cleanups, and the following new features:

   - An interface for dasd driver to query if a volume is online to
     another operating system

   - A new ioctl for the dasd driver to verify the format for a range of
     tracks

   - Following the example of x86 the struct fpu is now allocated with
     the task_struct

   - The 'report_error' interface for the PCI bus to send an
     adapter-error notification from user space to the service element
     of the machine"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
  s390/vmem: remove unused function parameter
  s390/vmem: fix identity mapping
  s390: add missing include statements
  s390: add missing declarations
  s390: make couple of variables and functions static
  s390/cache: remove superfluous locking
  s390/cpuinfo: simplify locking and skip offline cpus early
  s390/3270: hangup the 3270 tty after a disconnect
  s390/3270: handle reconnect of a tty with a different size
  s390/3270: avoid endless I/O loop with disconnected 3270 terminals
  s390/3270: fix garbled output on 3270 tty view
  s390/3270: fix view reference counting
  s390/3270: add missing tty_kref_put
  s390/dumpstack: implement and use return_address()
  s390/cpum_sf: Remove superfluous SMP function call
  s390/cpum_cf: Remove superfluous SMP function call
  s390/Kconfig: make z196 the default processor type
  s390/sclp: avoid compile warning in sclp_pci_report
  s390/fpu: allocate 'struct fpu' with the task_struct
  s390/crypto: cleanup and move the header with the cpacf definitions
  ...
2016-05-18 12:17:16 -07:00
Linus Torvalds 0b86c75db6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina:

 - remove of our own implementation of architecture-specific relocation
   code and leveraging existing code in the module loader to perform
   arch-dependent work, from Jessica Yu.

   The relevant patches have been acked by Rusty (for module.c) and
   Heiko (for s390).

 - live patching support for ppc64le, which is a joint work of Michael
   Ellerman and Torsten Duwe.  This is coming from topic branch that is
   share between livepatching.git and ppc tree.

 - addition of livepatching documentation from Petr Mladek

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: make object/func-walking helpers more robust
  livepatch: Add some basic livepatch documentation
  powerpc/livepatch: Add live patching support on ppc64le
  powerpc/livepatch: Add livepatch stack to struct thread_info
  powerpc/livepatch: Add livepatch header
  livepatch: Allow architectures to specify an alternate ftrace location
  ftrace: Make ftrace_location_range() global
  livepatch: robustify klp_register_patch() API error checking
  Documentation: livepatch: outline Elf format and requirements for patch modules
  livepatch: reuse module loader code to write relocations
  module: s390: keep mod_arch_specific for livepatch modules
  module: preserve Elf information for livepatch modules
  Elf: add livepatch-specific Elf constants
2016-05-17 17:11:27 -07:00
Linus Torvalds a7fd20d1c4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support SPI based w5100 devices, from Akinobu Mita.

   2) Partial Segmentation Offload, from Alexander Duyck.

   3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE.

   4) Allow cls_flower stats offload, from Amir Vadai.

   5) Implement bpf blinding, from Daniel Borkmann.

   6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is
      actually using FASYNC these atomics are superfluous.  From Eric
      Dumazet.

   7) Run TCP more preemptibly, also from Eric Dumazet.

   8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e
      driver, from Gal Pressman.

   9) Allow creating ppp devices via rtnetlink, from Guillaume Nault.

  10) Improve BPF usage documentation, from Jesper Dangaard Brouer.

  11) Support tunneling offloads in qed, from Manish Chopra.

  12) aRFS offloading in mlx5e, from Maor Gottlieb.

  13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo
      Leitner.

  14) Add MSG_EOR support to TCP, this allows controlling packet
      coalescing on application record boundaries for more accurate
      socket timestamp sampling.  From Martin KaFai Lau.

  15) Fix alignment of 64-bit netlink attributes across the board, from
      Nicolas Dichtel.

  16) Per-vlan stats in bridging, from Nikolay Aleksandrov.

  17) Several conversions of drivers to ethtool ksettings, from Philippe
      Reynes.

  18) Checksum neutral ILA in ipv6, from Tom Herbert.

  19) Factorize all of the various marvell dsa drivers into one, from
      Vivien Didelot

  20) Add VF support to qed driver, from Yuval Mintz"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits)
  Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m"
  Revert "phy dp83867: Make rgmii parameters optional"
  r8169: default to 64-bit DMA on recent PCIe chips
  phy dp83867: Make rgmii parameters optional
  phy dp83867: Fix compilation with CONFIG_OF_MDIO=m
  bpf: arm64: remove callee-save registers use for tmp registers
  asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions
  switchdev: pass pointer to fib_info instead of copy
  net_sched: close another race condition in tcf_mirred_release()
  tipc: fix nametable publication field in nl compat
  drivers: net: Don't print unpopulated net_device name
  qed: add support for dcbx.
  ravb: Add missing free_irq() calls to ravb_close()
  qed: Remove a stray tab
  net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fec-mpc52xx: use phydev from struct net_device
  bpf, doc: fix typo on bpf_asm descriptions
  stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
  net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings
  net: ethernet: fs-enet: use phydev from struct net_device
  ...
2016-05-17 16:26:30 -07:00
Arnaldo Carvalho de Melo cfbcf46845 perf core: Pass max stack as a perf_callchain_entry context
This makes perf_callchain_{user,kernel}() receive the max stack
as context for the perf_callchain_entry, instead of accessing
the global sysctl_perf_event_max_stack.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2016-05-16 23:11:50 -03:00
Linus Torvalds 825a3b2605 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:

 - massive CPU hotplug rework (Thomas Gleixner)

 - improve migration fairness (Peter Zijlstra)

 - CPU load calculation updates/cleanups (Yuyang Du)

 - cpufreq updates (Steve Muckle)

 - nohz optimizations (Frederic Weisbecker)

 - switch_mm() micro-optimization on x86 (Andy Lutomirski)

 - ... lots of other enhancements, fixes and cleanups.

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (66 commits)
  ARM: Hide finish_arch_post_lock_switch() from modules
  sched/core: Provide a tsk_nr_cpus_allowed() helper
  sched/core: Use tsk_cpus_allowed() instead of accessing ->cpus_allowed
  sched/loadavg: Fix loadavg artifacts on fully idle and on fully loaded systems
  sched/fair: Correct unit of load_above_capacity
  sched/fair: Clean up scale confusion
  sched/nohz: Fix affine unpinned timers mess
  sched/fair: Fix fairness issue on migration
  sched/core: Kill sched_class::task_waking to clean up the migration logic
  sched/fair: Prepare to fix fairness problems on migration
  sched/fair: Move record_wakee()
  sched/core: Fix comment typo in wake_q_add()
  sched/core: Remove unused variable
  sched: Make hrtick_notifier an explicit call
  sched/fair: Make ilb_notifier an explicit call
  sched/hotplug: Make activate() the last hotplug step
  sched/hotplug: Move migration CPU_DYING to sched_cpu_dying()
  sched/migration: Move CPU_ONLINE into scheduler state
  sched/migration: Move calc_load_migrate() into CPU_DYING
  sched/migration: Move prepare transition to SCHED_STARTING state
  ...
2016-05-16 14:47:16 -07:00