Merge updates from Andrew Morton:
- fsnotify fix
- poll() timeout fix
- a few scripts/ tweaks
- debugobjects updates
- the (small) ocfs2 queue
- Minor fixes to kernel/padata.c
- Maybe half of the MM queue
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits)
mm, page_alloc: restore the original nodemask if the fast path allocation failed
mm, page_alloc: uninline the bad page part of check_new_page()
mm, page_alloc: don't duplicate code in free_pcp_prepare
mm, page_alloc: defer debugging checks of pages allocated from the PCP
mm, page_alloc: defer debugging checks of freed pages until a PCP drain
cpuset: use static key better and convert to new API
mm, page_alloc: inline pageblock lookup in page free fast paths
mm, page_alloc: remove unnecessary variable from free_pcppages_bulk
mm, page_alloc: pull out side effects from free_pages_check
mm, page_alloc: un-inline the bad part of free_pages_check
mm, page_alloc: check multiple page fields with a single branch
mm, page_alloc: remove field from alloc_context
mm, page_alloc: avoid looking up the first zone in a zonelist twice
mm, page_alloc: shortcut watermark checks for order-0 pages
mm, page_alloc: reduce cost of fair zone allocation policy retry
mm, page_alloc: shorten the page allocator fast path
mm, page_alloc: check once if a zone has isolated pageblocks
mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath
mm, page_alloc: simplify last cpupid reset
mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask()
...
I've just discovered that the useful-sounding has_transparent_hugepage()
is actually an architecture-dependent minefield: on some arches it only
builds if CONFIG_TRANSPARENT_HUGEPAGE=y, on others it's also there when
not, but on some of those (arm and arm64) it then gives the wrong
answer; and on mips alone it's marked __init, which would crash if
called later (but so far it has not been called later).
Straighten this out: make it available to all configs, with a sensible
default in asm-generic/pgtable.h, removing its definitions from those
arches (arc, arm, arm64, sparc, tile) which are served by the default,
adding #define has_transparent_hugepage has_transparent_hugepage to
those (mips, powerpc, s390, x86) which need to override the default at
runtime, and removing the __init from mips (but maybe that kind of code
should be avoided after init: set a static variable the first time it's
called).
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Yang Shi <yang.shi@linaro.org>
Cc: Ning Qu <quning@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arch/arc]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [arch/s390]
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- x86: miscellaneous fixes, AVIC support (local APIC virtualization,
AMD version)
- s390: polling for interrupts after a VCPU goes to halted state is
now enabled for s390; use hardware provided information about facility
bits that do not need any hypervisor activity, and other fixes for
cpu models and facilities; improve perf output; floating interrupt
controller improvements.
- MIPS: miscellaneous fixes
- PPC: bugfixes only
- ARM: 16K page size support, generic firmware probing layer for
timer and GIC
Christoffer Dall (KVM-ARM maintainer) says:
"There are a few changes in this pull request touching things outside
KVM, but they should all carry the necessary acks and it made the
merge process much easier to do it this way."
though actually the irqchip maintainers' acks didn't make it into the
patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
later acked at http://mid.gmane.org/573351D1.4060303@arm.com
"more formally and for documentation purposes".
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJXPJjyAAoJEL/70l94x66DhioH/j4fwQ0FmfPSM9PArzaFHQdx
LNE3tU4+bobbsy1BJr4DiAaOUQn3DAgwUvGLWXdeLiOXtoWXBiFHKaxlqEsCA6iQ
xcTH1TgfxsVoqGQ6bT9X/2GCx70heYpcWG3f+zqBy7ZfFmQykLAC/HwOr52VQL8f
hUFi3YmTHcnorp0n5Xg+9r3+RBS4D/kTbtdn6+KCLnPJ0RcgNkI3/NcafTemoofw
Tkv8+YYFNvKV13qlIfVqxMa0GwWI3pP6YaNKhaS5XO8Pu16HuuF1JthJsUBDzwBa
RInp8R9MoXgsBYhLpz3jc9vWG7G9yDl5LehsD9KOUGOaFYJ7sQN+QZOusa6jFgA=
=llO5
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"Small release overall.
x86:
- miscellaneous fixes
- AVIC support (local APIC virtualization, AMD version)
s390:
- polling for interrupts after a VCPU goes to halted state is now
enabled for s390
- use hardware provided information about facility bits that do not
need any hypervisor activity, and other fixes for cpu models and
facilities
- improve perf output
- floating interrupt controller improvements.
MIPS:
- miscellaneous fixes
PPC:
- bugfixes only
ARM:
- 16K page size support
- generic firmware probing layer for timer and GIC
Christoffer Dall (KVM-ARM maintainer) says:
"There are a few changes in this pull request touching things
outside KVM, but they should all carry the necessary acks and it
made the merge process much easier to do it this way."
though actually the irqchip maintainers' acks didn't make it into the
patches. Marc Zyngier, who is both irqchip and KVM-ARM maintainer,
later acked at http://mid.gmane.org/573351D1.4060303@arm.com ('more
formally and for documentation purposes')"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (82 commits)
KVM: MTRR: remove MSR 0x2f8
KVM: x86: make hwapic_isr_update and hwapic_irr_update look the same
svm: Manage vcpu load/unload when enable AVIC
svm: Do not intercept CR8 when enable AVIC
svm: Do not expose x2APIC when enable AVIC
KVM: x86: Introducing kvm_x86_ops.apicv_post_state_restore
svm: Add VMEXIT handlers for AVIC
svm: Add interrupt injection via AVIC
KVM: x86: Detect and Initialize AVIC support
svm: Introduce new AVIC VMCB registers
KVM: split kvm_vcpu_wake_up from kvm_vcpu_kick
KVM: x86: Introducing kvm_x86_ops VCPU blocking/unblocking hooks
KVM: x86: Introducing kvm_x86_ops VM init/destroy hooks
KVM: x86: Rename kvm_apic_get_reg to kvm_lapic_get_reg
KVM: x86: Misc LAPIC changes to expose helper functions
KVM: shrink halt polling even more for invalid wakeups
KVM: s390: set halt polling to 80 microseconds
KVM: halt_polling: provide a way to qualify wakeups during poll
KVM: PPC: Book3S HV: Re-enable XICS fast path for irqfd-generated interrupts
kvm: Conditionally register IRQ bypass consumer
...
Pull s390 updates from Martin Schwidefsky:
"The s390 patches for the 4.7 merge window have the usual bug fixes and
cleanups, and the following new features:
- An interface for dasd driver to query if a volume is online to
another operating system
- A new ioctl for the dasd driver to verify the format for a range of
tracks
- Following the example of x86 the struct fpu is now allocated with
the task_struct
- The 'report_error' interface for the PCI bus to send an
adapter-error notification from user space to the service element
of the machine"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (29 commits)
s390/vmem: remove unused function parameter
s390/vmem: fix identity mapping
s390: add missing include statements
s390: add missing declarations
s390: make couple of variables and functions static
s390/cache: remove superfluous locking
s390/cpuinfo: simplify locking and skip offline cpus early
s390/3270: hangup the 3270 tty after a disconnect
s390/3270: handle reconnect of a tty with a different size
s390/3270: avoid endless I/O loop with disconnected 3270 terminals
s390/3270: fix garbled output on 3270 tty view
s390/3270: fix view reference counting
s390/3270: add missing tty_kref_put
s390/dumpstack: implement and use return_address()
s390/cpum_sf: Remove superfluous SMP function call
s390/cpum_cf: Remove superfluous SMP function call
s390/Kconfig: make z196 the default processor type
s390/sclp: avoid compile warning in sclp_pci_report
s390/fpu: allocate 'struct fpu' with the task_struct
s390/crypto: cleanup and move the header with the cpacf definitions
...
Pull livepatching updates from Jiri Kosina:
- remove of our own implementation of architecture-specific relocation
code and leveraging existing code in the module loader to perform
arch-dependent work, from Jessica Yu.
The relevant patches have been acked by Rusty (for module.c) and
Heiko (for s390).
- live patching support for ppc64le, which is a joint work of Michael
Ellerman and Torsten Duwe. This is coming from topic branch that is
share between livepatching.git and ppc tree.
- addition of livepatching documentation from Petr Mladek
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: make object/func-walking helpers more robust
livepatch: Add some basic livepatch documentation
powerpc/livepatch: Add live patching support on ppc64le
powerpc/livepatch: Add livepatch stack to struct thread_info
powerpc/livepatch: Add livepatch header
livepatch: Allow architectures to specify an alternate ftrace location
ftrace: Make ftrace_location_range() global
livepatch: robustify klp_register_patch() API error checking
Documentation: livepatch: outline Elf format and requirements for patch modules
livepatch: reuse module loader code to write relocations
module: s390: keep mod_arch_specific for livepatch modules
module: preserve Elf information for livepatch modules
Elf: add livepatch-specific Elf constants
Pull support for killable rwsems from Ingo Molnar:
"This, by Michal Hocko, implements down_write_killable().
The main usecase will be to update mm_sem usage sites to use this new
API, to allow the mm-reaper introduced in commit aac4536355 ("mm,
oom: introduce oom reaper") to tear down oom victim address spaces
asynchronously with minimum latencies and without deadlock worries"
[ The vfs will want it too as the inode lock is changed from a mutex to
a rwsem due to the parallel lookup and readdir updates ]
* 'locking-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Fix comment on register clobbering
locking/rwsem: Fix down_write_killable()
locking/rwsem, x86: Add frame annotation for call_rwsem_down_write_failed_killable()
locking/rwsem: Provide down_write_killable()
locking/rwsem, x86: Provide __down_write_killable()
locking/rwsem, s390: Provide __down_write_killable()
locking/rwsem, ia64: Provide __down_write_killable()
locking/rwsem, alpha: Provide __down_write_killable()
locking/rwsem: Introduce basis for down_write_killable()
locking/rwsem, sparc: Drop superfluous arch specific implementation
locking/rwsem, sh: Drop superfluous arch specific implementation
locking/rwsem, xtensa: Drop superfluous arch specific implementation
locking/rwsem: Drop explicit memory barriers
locking/rwsem: Get rid of __down_write_nested()
on s390 we disabled the halt polling with commit 920552b213
("KVM: disable halt_poll_ns as default for s390x"), as floating
interrupts would let all CPUs have a successful poll, resulting
in much higher CPU usage (on otherwise idle systems).
With the improved selection of polls we can now retry halt polling.
Performance measurements with different choices like 25,50,80,100,200
microseconds showed that 80 microseconds seems to improve several cases
without increasing the CPU costs too much. Higher values would improve
the performance even more but increased the cpu time as well.
So let's start small and use this value of 80 microseconds on s390 until
we have a better understanding of cost/benefit of higher values.
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some wakeups should not be considered a sucessful poll. For example on
s390 I/O interrupts are usually floating, which means that _ALL_ CPUs
would be considered runnable - letting all vCPUs poll all the time for
transactional like workload, even if one vCPU would be enough.
This can result in huge CPU usage for large guests.
This patch lets architectures provide a way to qualify wakeups if they
should be considered a good/bad wakeups in regard to polls.
For s390 the implementation will fence of halt polling for anything but
known good, single vCPU events. The s390 implementation for floating
interrupts does a wakeup for one vCPU, but the interrupt will be delivered
by whatever CPU checks first for a pending interrupt. We prefer the
woken up CPU by marking the poll of this CPU as "good" poll.
This code will also mark several other wakeup reasons like IPI or
expired timers as "good". This will of course also mark some events as
not sucessful. As KVM on z runs always as a 2nd level hypervisor,
we prefer to not poll, unless we are really sure, though.
This patch successfully limits the CPU usage for cases like uperf 1byte
transactional ping pong workload or wakeup heavy workload like OLTP
while still providing a proper speedup.
This also introduced a new vcpu stat "halt_poll_no_tuning" that marks
wakeups that are considered not good for polling.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
Cc: David Matlack <dmatlack@google.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
[Rename config symbol. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
arch_dup_task_struct and the per cpu variable mt_cycles are globally
visible, but do not have any header file with a declaration.
Therefore add it so we have proper type checking in place.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Let's add hypervisor-managed facility-apportionment indications field to
SCLP structs. KVM will use it to reduce maintenance cost of
Non-Hypervisor-Managed facility bits.
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
We only have one cpuid for all VCPUs, so let's directly use the one in the
cpu model. Also always store it directly as u64, no need for struct cpuid.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
If we don't have SIGP SENSE RUNNING STATUS enabled for the guest, let's
not enable interpretation so we can correctly report an invalid order.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Implement return_address() and use it instead of __builtin_return_address(n).
__builtin_return_address(n) is not guaranteed to work for n > 0,
therefore implement a private return_address() function which walks
the stack frames and returns the proper return address.
This way we get also rid of a compile warning which gcc 6.1 emits and
look like all other architectures.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Analog to git commit 0c8c0f03e3
"x86/fpu, sched: Dynamically allocate 'struct fpu'"
move the struct fpu to the end of the struct thread_struct,
set CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT and add the
setup_task_size() function to calculate the correct size
fo the task struct.
For the performance_defconfig this increases the size of
struct task_struct from 7424 bytes to 7936 bytes (MACHINE_HAS_VX==1)
or 7552 bytes (MACHINE_HAS_VX==0). The dynamic allocation of the
struct fpu is removed. The slab cache uses an 8KB block for the
task struct in all cases, there is enough room for the struct fpu.
For MACHINE_HAS_VX==1 each task now needs 512 bytes less memory.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
There is a race with multi-threaded applications between context switch and
pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a
pagetable upgrade on another thread in crst_table_upgrade() could already
have set new asce_bits, but not yet the new mm->pgd. This would result in a
corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
translation exception.
Fix this by storing the complete asce instead of just the asce_bits, which
can then be read atomically from switch_mm(), so that it either sees the
old value or the new value, but no mixture. Both cases are OK. Having the
old value would result in a page fault on access to the higher level memory,
but the fault handler would see the new mm->pgd, if it was a valid access
after the mmap on the other thread has completed. So as worst-case scenario
we would have a page fault loop for the racing thread until the next time
slice.
Also remove dead code and simplify the upgrade/downgrade path, there are no
upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
There are also no concurrent upgrades, because the mmap_sem is held with
down_write() in do_mmap, so the flush and table checks during upgrade can
be removed.
Reported-by: Michael Munday <munday@ca.ibm.com>
Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce a FLIC operation for clearing I/O interrupts for a subchannel.
Rationale: According to the platform specification, pending I/O
interruption requests have to be revoked in certain situations. For
instance, according to the Principles of Operation (page 17-27), a
subchannel put into the installed parameters initialized state is in the
same state as after an I/O system reset (just parameters possibly changed).
This implies that any I/O interrupts for that subchannel are no longer
pending (as I/O system resets clear I/O interrupts). Therefore, we need an
interface to clear pending I/O interrupts.
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
perf kvm stat can decode sigp events, let's make
the list complete by adding the missing ones.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
The CPACF instructions are going be used in KVM as well, move the
defines and the inline functions from arch/s390/crypt/crypt_s390.h
to arch/s390/include/asm. Rename the header to cpacf.h and replace
the crypt_s390_xxx names with cpacf_xxx.
While we are at it, cleanup the header as well. The encoding for
the CPACF operations is odd, there is an enum for each of the CPACF
instructions with the hardware function code in the lower 8 bits of
each entry and a software defined number for the CPACF instruction
in the upper 8 bits. Remove the superfluous software number and
replace the enums with simple defines.
The crypt_s390_func_available() function tests for the presence
of a specific CPACF operations. The new name of the function is
cpacf_query and it works slightly different than before. It gets
passed an opcode of an CPACF instruction and a function code for
this instruction. The facility_mask parameter is gone, the opcode
is used to find the correct MSA facility bit to check if the CPACF
instruction itself is available. If it is the query function of the
given instruction is used to test if the requested CPACF operation
is present.
Acked-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implement the function type specific function measurement block used
in new machines.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Implement new DASD IOCTL BIODASDCHECKFMT to check a range of tracks on a
DASD volume for correct formatting. The following characteristics are
checked:
- Block size
- ECKD key length
- ECKD record ID
- Number of records per track
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Add SCLP event 24 "Adapter-error notification".
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Introduce ___down_write() for the fast path and reuse it for __down_write()
resp. __down_write_killable() each using the respective generic slow path
(rwsem_down_write_failed() resp. rwsem_down_write_failed_killable()).
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-10-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This is no longer used anywhere and all callers (__down_write()) use
0 as a subclass. Ditch __down_write_nested() to make the code easier
to follow.
This shouldn't introduce any functional change.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: David S. Miller <davem@davemloft.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-s390@vger.kernel.org
Cc: linux-sh@vger.kernel.org
Cc: linux-xtensa@linux-xtensa.org
Cc: sparclinux@vger.kernel.org
Link: http://lkml.kernel.org/r/1460041951-22347-2-git-send-email-mhocko@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Fixes this build error on linux-next:
kernel/seccomp.c: In function '__secure_computing_strict':
kernel/seccomp.c:526:3: error: implicit declaration of function
'get_compat_mode1_syscalls'
The retrieval of compat syscall numbers were moved into inline function
defined in asm-generic header but the asm-generic header is not being
used by s390.
[heiko.carstens@de.ibm.com]: even though the build error will trigger
only in the next merge window it makes sense to include the generic
header file already now.
Fixes: ("seccomp: Get compat syscalls from asm-generic header")
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Newer machines might use a different (larger) format for function
measurement blocks. To ensure that we comply with the alignment
requirement on these machines and prevent memory corruption (when
firmware writes more data than we expect) add 16 padding bytes
at the end of the fmb.
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reuse module loader code to write relocations, thereby eliminating the need
for architecture specific relocation code in livepatch. Specifically, reuse
the apply_relocate_add() function in the module loader to write relocations
instead of duplicating functionality in livepatch's arch-dependent
klp_write_module_reloc() function.
In order to accomplish this, livepatch modules manage their own relocation
sections (marked with the SHF_RELA_LIVEPATCH section flag) and
livepatch-specific symbols (marked with SHN_LIVEPATCH symbol section
index). To apply livepatch relocation sections, livepatch symbols
referenced by relocs are resolved and then apply_relocate_add() is called
to apply those relocations.
In addition, remove x86 livepatch relocation code and the s390
klp_write_module_reloc() function stub. They are no longer needed since
relocation work has been offloaded to module loader.
Lastly, mark the module as a livepatch module so that the module loader
canappropriately identify and initialize it.
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # for s390 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Pull s390 fixes from Martin Schwidefsky:
- A proper fix for the locking issue in the dasd driver
- Wire up the new preadv2 nad pwritev2 system calls
- Add the mark_rodata_ro function and set DEBUG_RODATA=y
- A few more bug fixes.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390: wire up preadv2/pwritev2 syscalls
s390/pci: PCI function group 0 is valid for clp_query_pci_fn
s390/crypto: provide correct file mode at device register.
s390/mm: handle PTE-mapped tail pages in fast gup
s390: add DEBUG_RODATA support
s390: disable postinit-readonly for now
s390/dasd: reorder lcu and device lock
s390/cpum_sf: Fix cpu hotplug notifier transitions
s390/cpum_cf: Fix missing cpu hotplug notifier transition
Replace the arch specific versions of search_extable() and
sort_extable() with calls to the generic ones, which now support
relative exception tables as well.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 protection key support from Ingo Molnar:
"This tree adds support for a new memory protection hardware feature
that is available in upcoming Intel CPUs: 'protection keys' (pkeys).
There's a background article at LWN.net:
https://lwn.net/Articles/643797/
The gist is that protection keys allow the encoding of
user-controllable permission masks in the pte. So instead of having a
fixed protection mask in the pte (which needs a system call to change
and works on a per page basis), the user can map a (handful of)
protection mask variants and can change the masks runtime relatively
cheaply, without having to change every single page in the affected
virtual memory range.
This allows the dynamic switching of the protection bits of large
amounts of virtual memory, via user-space instructions. It also
allows more precise control of MMU permission bits: for example the
executable bit is separate from the read bit (see more about that
below).
This tree adds the MM infrastructure and low level x86 glue needed for
that, plus it adds a high level API to make use of protection keys -
if a user-space application calls:
mmap(..., PROT_EXEC);
or
mprotect(ptr, sz, PROT_EXEC);
(note PROT_EXEC-only, without PROT_READ/WRITE), the kernel will notice
this special case, and will set a special protection key on this
memory range. It also sets the appropriate bits in the Protection
Keys User Rights (PKRU) register so that the memory becomes unreadable
and unwritable.
So using protection keys the kernel is able to implement 'true'
PROT_EXEC on x86 CPUs: without protection keys PROT_EXEC implies
PROT_READ as well. Unreadable executable mappings have security
advantages: they cannot be read via information leaks to figure out
ASLR details, nor can they be scanned for ROP gadgets - and they
cannot be used by exploits for data purposes either.
We know about no user-space code that relies on pure PROT_EXEC
mappings today, but binary loaders could start making use of this new
feature to map binaries and libraries in a more secure fashion.
There is other pending pkeys work that offers more high level system
call APIs to manage protection keys - but those are not part of this
pull request.
Right now there's a Kconfig that controls this feature
(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) that is default enabled
(like most x86 CPU feature enablement code that has no runtime
overhead), but it's not user-configurable at the moment. If there's
any serious problem with this then we can make it configurable and/or
flip the default"
* 'mm-pkeys-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
x86/mm/pkeys: Fix mismerge of protection keys CPUID bits
mm/pkeys: Fix siginfo ABI breakage caused by new u64 field
x86/mm/pkeys: Fix access_error() denial of writes to write-only VMA
mm/core, x86/mm/pkeys: Add execute-only protection keys support
x86/mm/pkeys: Create an x86 arch_calc_vm_prot_bits() for VMA flags
x86/mm/pkeys: Allow kernel to modify user pkey rights register
x86/fpu: Allow setting of XSAVE state
x86/mm: Factor out LDT init from context init
mm/core, x86/mm/pkeys: Add arch_validate_pkey()
mm/core, arch, powerpc: Pass a protection key in to calc_vm_flag_bits()
x86/mm/pkeys: Actually enable Memory Protection Keys in the CPU
x86/mm/pkeys: Add Kconfig prompt to existing config option
x86/mm/pkeys: Dump pkey from VMA in /proc/pid/smaps
x86/mm/pkeys: Dump PKRU with other kernel registers
mm/core, x86/mm/pkeys: Differentiate instruction fetches
x86/mm/pkeys: Optimize fault handling in access_error()
mm/core: Do not enforce PKEY permissions on remote mm access
um, pkeys: Add UML arch_*_access_permitted() methods
mm/gup, x86/mm/pkeys: Check VMAs and PTEs for protection keys
x86/mm/gup: Simplify get_user_pages() PTE bit handling
...
This adds basic polling support for vhost.
Reworks virtio to optionally use DMA API, fixing it on Xen.
Balloon stats gained a new entry.
Using the new napi_alloc_skb speeds up virtio net.
virtio blk stats can now be read while another VCPU
us busy inflating or deflating the balloon.
Plus misc cleanups in various places.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJW7qRJAAoJECgfDbjSjVRpVNoH/A7z+lZ6nooSJ9fUBtAAlwit
mE1VKi8g0G6naV1NVLFVe7hPAejExGiHfR3ZUrVoenJKj2yeW/DFojFC10YR/KTe
ac7Imuc+owA3UOE/QpeGBs59+EEWKTZUYt6r8HSJVwoodeosw9v2ecP/Iwhbax8H
a4V3HqOADjKnHg73R9o3u+bAgA1GrGYHeK0AfhCBSTNwlPdxkvf0463HgfOpM4nl
/sNoFWO3vOyekk+loIk+jpmWVIoIfG2NFzW4lPwEPkfqUBX7r0ei/NR23hIqHL7r
QZ6vMj1Ew9qctUONbJu4kXjuV2Vk9NhxwbDjoJtm8plKL2hz2prJynUEogkHh2g=
=VMD0
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio/vhost updates from Michael Tsirkin:
"New features, performance improvements, cleanups:
- basic polling support for vhost
- rework virtio to optionally use DMA API, fixing it on Xen
- balloon stats gained a new entry
- using the new napi_alloc_skb speeds up virtio net
- virtio blk stats can now be read while another VCPU is busy
inflating or deflating the balloon
plus misc cleanups in various places"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()
vhost_net: basic polling support
vhost: introduce vhost_vq_avail_empty()
vhost: introduce vhost_has_work()
virtio_balloon: Allow to resize and update the balloon stats in parallel
virtio_balloon: Use a workqueue instead of "vballoon" kthread
virtio/s390: size of SET_IND payload
virtio/s390: use dev_to_virtio
vhost: rename vhost_init_used()
vhost: rename cross-endian helpers
virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH
vring: Use the DMA API on Xen
virtio_pci: Use the DMA API if enabled
virtio_mmio: Use the DMA API if enabled
virtio: Add improved queue allocation API
virtio_ring: Support DMA APIs
vring: Introduce vring_use_dma_api()
s390/dma: Allow per device dma ops
alpha/dma: use common noop dma ops
dma: Provide simple noop dma ops
Pull networking updates from David Miller:
"Highlights:
1) Support more Realtek wireless chips, from Jes Sorenson.
2) New BPF types for per-cpu hash and arrap maps, from Alexei
Starovoitov.
3) Make several TCP sysctls per-namespace, from Nikolay Borisov.
4) Allow the use of SO_REUSEPORT in order to do per-thread processing
of incoming TCP/UDP connections. The muxing can be done using a
BPF program which hashes the incoming packet. From Craig Gallek.
5) Add a multiplexer for TCP streams, to provide a messaged based
interface. BPF programs can be used to determine the message
boundaries. From Tom Herbert.
6) Add 802.1AE MACSEC support, from Sabrina Dubroca.
7) Avoid factorial complexity when taking down an inetdev interface
with lots of configured addresses. We were doing things like
traversing the entire address less for each address removed, and
flushing the entire netfilter conntrack table for every address as
well.
8) Add and use SKB bulk free infrastructure, from Jesper Brouer.
9) Allow offloading u32 classifiers to hardware, and implement for
ixgbe, from John Fastabend.
10) Allow configuring IRQ coalescing parameters on a per-queue basis,
from Kan Liang.
11) Extend ethtool so that larger link mode masks can be supported.
From David Decotigny.
12) Introduce devlink, which can be used to configure port link types
(ethernet vs Infiniband, etc.), port splitting, and switch device
level attributes as a whole. From Jiri Pirko.
13) Hardware offload support for flower classifiers, from Amir Vadai.
14) Add "Local Checksum Offload". Basically, for a tunneled packet
the checksum of the outer header is 'constant' (because with the
checksum field filled into the inner protocol header, the payload
of the outer frame checksums to 'zero'), and we can take advantage
of that in various ways. From Edward Cree"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
bonding: fix bond_get_stats()
net: bcmgenet: fix dma api length mismatch
net/mlx4_core: Fix backward compatibility on VFs
phy: mdio-thunder: Fix some Kconfig typos
lan78xx: add ndo_get_stats64
lan78xx: handle statistics counter rollover
RDS: TCP: Remove unused constant
RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
net: smc911x: convert pxa dma to dmaengine
team: remove duplicate set of flag IFF_MULTICAST
bonding: remove duplicate set of flag IFF_MULTICAST
net: fix a comment typo
ethernet: micrel: fix some error codes
ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
bpf, dst: add and use dst_tclassid helper
bpf: make skb->tc_classid also readable
net: mvneta: bm: clarify dependencies
cls_bpf: reset class and reuse major in da
ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
ldmvsw: Add ldmvsw.c driver code
...
Pull livepatching update from Jiri Kosina:
- cleanup of module notifiers; this depends on a module.c cleanup which
has been acked by Rusty; from Jessica Yu
- small assorted fixes and MAINTAINERS update
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch/module: remove livepatch module notifier
modules: split part of complete_formation() into prepare_coming_module()
livepatch: Update maintainers
livepatch: Fix the error message about unresolvable ambiguity
klp: remove CONFIG_LIVEPATCH dependency from klp headers
klp: remove superfluous errors in asm/livepatch.h
This is a temporary fix to let lkdtm run again on s390, though it'll
still fail the ro_after_init tests. Until rodata and ro_after_init
sections can be split on s390, disable special handling of ro_after_init.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reported-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 updates from Martin Schwidefsky:
- Add the CPU id for the new z13s machine
- Add a s390 specific XOR template for RAID-5 checksumming based on the
XC instruction. Remove all other alternatives, XC is always faster
- The merge of our four different stack tracers into a single one
- Tidy up the code related to page tables, several large inline
functions are now out-of-line. Bloat-o-meter reports ~11K text size
reduction
- A binary interface for the priviledged CLP instruction to retrieve
the hardware view of the installed PCI functions
- Improvements for the dasd format code
- Bug fixes and cleanups
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (31 commits)
s390/pci: enforce fmb page boundary rule
s390: fix floating pointer register corruption (again)
s390/cpumf: add missing lpp magic initialization
s390: Fix misspellings in comments
s390/mm: split arch/s390/mm/pgtable.c
s390/mm: uninline pmdp_xxx functions from pgtable.h
s390/mm: uninline ptep_xxx functions from pgtable.h
s390/pci: add ioctl interface for CLP
s390: Use pr_warn instead of pr_warning
s390/dasd: remove casts to dasd_*_private
s390/dasd: Refactor dasd format functions
s390/dasd: Simplify code in format logic
s390/dasd: Improve dasd format code
s390/percpu: remove this_cpu_cmpxchg_double_4
s390/cpumf: Improve guest detection heuristics
s390/fault: merge report_user_fault implementations
s390/dis: use correct escape sequence for '%' character
s390/kvm: simplify set_guest_storage_key
s390/oprofile: add z13/z13s model numbers
s390: add z13s model number to z13 elf platform
...
but lots of architecture-specific changes.
* ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.
* PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).
* s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.
* x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using vector
hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest memory---currently
its only use is to speedup the legacy shadow paging (pre-EPT) case, but
in the future it will be used for virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQEcBAABAgAGBQJW5r3BAAoJEL/70l94x66D2pMH/jTSWWwdTUJMctrDjPVzKzG0
yOzHW5vSLFoFlwEOY2VpslnXzn5TUVmCAfrdmFNmQcSw6hGb3K/xA/ZX/KLwWhyb
oZpr123ycahga+3q/ht/dFUBCCyWeIVMdsLSFwpobEBzPL0pMgc9joLgdUC6UpWX
tmN0LoCAeS7spC4TTiTTpw3gZ/L+aB0B6CXhOMjldb9q/2CsgaGyoVvKA199nk9o
Ngu7ImDt7l/x1VJX4/6E/17VHuwqAdUrrnbqerB/2oJ5ixsZsHMGzxQ3sHCmvyJx
WG5L00ubB1oAJAs9fBg58Y/MdiWX99XqFhdEfxq4foZEiQuCyxygVvq3JwZTxII=
=OUZZ
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini:
"One of the largest releases for KVM... Hardly any generic
changes, but lots of architecture-specific updates.
ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.
PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).
s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.
x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using
vector hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualizations
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest
memory - currently its only use is to speedup the legacy shadow
paging (pre-EPT) case, but in the future it will be used for
virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
KVM: x86: disable MPX if host did not enable MPX XSAVE features
arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
arm64: KVM: vgic-v3: Reset LRs at boot time
arm64: KVM: vgic-v3: Do not save an LR known to be empty
arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
arm64: KVM: vgic-v3: Avoid accessing ICH registers
KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
KVM: arm/arm64: vgic-v2: Reset LRs at boot time
KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
KVM: s390: allocate only one DMA page per VM
KVM: s390: enable STFLE interpretation only if enabled for the guest
KVM: s390: wake up when the VCPU cpu timer expires
KVM: s390: step the VCPU timer while in enabled wait
KVM: s390: protect VCPU cpu timer with a seqcount
KVM: s390: step VCPU cpu timer during kvm_run ioctl
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- Make schedstats a runtime tunable (disabled by default) and
optimize it via static keys.
As most distributions enable CONFIG_SCHEDSTATS=y due to its
instrumentation value, this is a nice performance enhancement.
(Mel Gorman)
- Implement 'simple waitqueues' (swait): these are just pure
waitqueues without any of the more complex features of full-blown
waitqueues (callbacks, wake flags, wake keys, etc.). Simple
waitqueues have less memory overhead and are faster.
Use simple waitqueues in the RCU code (in 4 different places) and
for handling KVM vCPU wakeups.
(Peter Zijlstra, Daniel Wagner, Thomas Gleixner, Paul Gortmaker,
Marcelo Tosatti)
- sched/numa enhancements (Rik van Riel)
- NOHZ performance enhancements (Rik van Riel)
- Various sched/deadline enhancements (Steven Rostedt)
- Various fixes (Peter Zijlstra)
- ... and a number of other fixes, cleanups and smaller enhancements"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (29 commits)
sched/cputime: Fix steal_account_process_tick() to always return jiffies
sched/deadline: Remove dl_new from struct sched_dl_entity
Revert "kbuild: Add option to turn incompatible pointer check into error"
sched/deadline: Remove superfluous call to switched_to_dl()
sched/debug: Fix preempt_disable_ip recording for preempt_disable()
sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity
time, acct: Drop irq save & restore from __acct_update_integrals()
acct, time: Change indentation in __acct_update_integrals()
sched, time: Remove non-power-of-two divides from __acct_update_integrals()
sched/rt: Kick RT bandwidth timer immediately on start up
sched/debug: Add deadline scheduler bandwidth ratio to /proc/sched_debug
sched/debug: Move sched_domain_sysctl to debug.c
sched/debug: Move the /sys/kernel/debug/sched_features file setup into debug.c
sched/rt: Fix PI handling vs. sched_setscheduler()
sched/core: Remove duplicated sched_group_set_shares() prototype
sched/fair: Consolidate nohz CPU load update code
sched/fair: Avoid using decay_load_missed() with a negative value
sched/deadline: Always calculate end of period on sched_yield()
sched/cgroup: Fix cgroup entity load tracking tear-down
rcu: Use simple wait queues where possible in rcutree
...
The function measurement block must not cross a page boundary. Ensure
that by raising the alignment requirement to the smallest power of 2
larger than the size of the fmb.
Fixes: d0b088531 ("s390/pci: performance statistics and debug infrastructure")
Cc: stable@vger.kernel.org # v3.8+
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This patch updates all instances of csum_tcpudp_magic and
csum_tcpudp_nofold to reflect the types that are usually used as the source
inputs. For example the protocol field is populated based on nexthdr which
is actually an unsigned 8 bit value. The length is usually populated based
on skb->len which is an unsigned integer.
This addresses an issue in which the IPv6 function csum_ipv6_magic was
generating a checksum using the full 32b of skb->len while
csum_tcpudp_magic was only using the lower 16 bits. As a result we could
run into issues when attempting to adjust the checksum as there was no
protocol agnostic way to update it.
With this change the value is still truncated as many architectures use
"(len + proto) << 8", however this truncation only occurs for values
greater than 16776960 in length and as such is unlikely to occur as we stop
the inner headers at ~64K in size.
I did have to make a few minor changes in the arm, mn10300, nios2, and
score versions of the function in order to support these changes as they
were either using things such as an OR to combine the protocol and length,
or were using ntohs to convert the length which would have truncated the
value.
I also updated a few spots in terms of whitespace and type differences for
the addresses. Most of this was just to make sure all of the definitions
were in sync going forward.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- add watchdog diagnose to trace event decoder
- better handle the cpu timer when not inside the guest
- only provide STFLE if the CPU model has STFLE
- reduce DMA page usage
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)
iQIcBAABAgAGBQJW3/XrAAoJEBF7vIC1phx8D7cP/RGZqjVQF8S0tOpzGl2EHRYd
8MyTA2b53wUQx1cR35U1Tx1mKoTBoAcwADVJljCsCzCuGp/Bae9kT+T1kz4/aWEv
vlQqE7+63VkPsgQxFCcKA6CKagFLz+2QBmNFMdYStjHjN/MNBDwfo58I9Y9g2DdB
r7cjdExKC1WhCUguDv8eREtNWGRGC8IDV8ID02xTABTSk7LYcIGok0YIRjsZaV5b
W1gchW5GicOTp/mktkCsNPhnFg28owsG2LVcb5zeGQuq0TowvrSvRzxCOrHM1lKS
WU28FDvFn64bPvSAu1w894YKMsMxvm9SX5SWQ7nA5JkDko5lQ49H77M2YipMLUXY
F+KN9TlS352fzGr5i7TaPUOLAtFDhdtM0NAicUf5+ydR73DLetQOvMqqenmmP1Mw
K9hgAviFtKfnScIomCPrmeel5ddIl0OTshiZMGIbS6/lAGRZ5aJuXPxUgMhFiz95
YgjH2tb16eR0eYAs2PIt2bsJVis4x2MfFi+PExYzWP5vERxzMthE879Ra42NBfT1
tswxtvvQtXgXrkakwYkCTXqnja0JLNOhfKvIL73/am9DbwUmikWfVYOoq/yEe/w9
Gf5ZHzACrokHDqeKbMT6xzGEk4C1QEn7VRc4H4MX47xseVVXb2GxPYFNASgL2SSU
Qe6+IILH9uDhDkXVHIJD
=n2MS
-----END PGP SIGNATURE-----
Merge tag 'kvm-s390-next-4.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: Fixes and features for kvm/next (4.6) part 2
- add watchdog diagnose to trace event decoder
- better handle the cpu timer when not inside the guest
- only provide STFLE if the CPU model has STFLE
- reduce DMA page usage
The fork of a process with four page table levels is broken since
git commit 6252d702c5 "[S390] dynamic page tables."
All new mm contexts are created with three page table levels and
an asce limit of 4TB. If the parent has four levels dup_mmap will
add vmas to the new context which are outside of the asce limit.
The subsequent call to copy_page_range will walk the three level
page table structure of the new process with non-zero pgd and pud
indexes. This leads to memory clobbers as the pgd_index *and* the
pud_index is added to the mm->pgd pointer without a pgd_deref
in between.
The init_new_context() function is selecting the number of page
table levels for a new context. The function is used by mm_init()
which in turn is called by dup_mm() and mm_alloc(). These two are
used by fork() and exec(). The init_new_context() function can
distinguish the two cases by looking at mm->context.asce_limit,
for fork() the mm struct has been copied and the number of page
table levels may not change. For exec() the mm_alloc() function
set the new mm structure to zero, in this case a three-level page
table is created as the temporary stack space is located at
STACK_TOP_MAX = 4TB.
This fixes CVE-2016-2143.
Reported-by: Marcin Kościelnicki <koriakin@0x04.net>
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore code
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJW36xjAAoJECPQ0LrRPXpDGQkQAMDppzcTOixT3e8VPdHAX09a
Z5PO0gyTMVV7Jyz5Ul3pedPJA2GSK9mxOCwqvIFbdxLAR6ZB00juO5FrTHkSdI91
1XLPj4bKoMWcVvhL/g5A4Glp/pVMW1k/9Yq8zZAtYlsLRlqG5rLOutSadcqHcYaJ
cTD/pFf7b2oPtkTPyoFml75KgHBT/8uvAvFDOWA66Id2z6T11+PsBT/6XnGDiwKg
tpGTNzx3kPIKIzOAOHqVW6UBxFOeabebXLT8wUz3VwNn/UbG6gkumMNApMAyF2q1
zU0nAh8+7Ek6Dr4OFWE6BfW6sgg/l7i1lA8XoAmqG7ZTrSptCc59fvaZJxPruG+Q
dMsU6QgR77JJjbZTinf9a1jReZ/liZrx2gZXedVKdILrjmDSq0UnGcxjUOEDZOGy
2/dbrlJhv+LhpcJtuPpxPCfoqbW5L0ynzmuYuXRdRz3lTHiOWIRx5gugrhO+wH4D
4gvZhbw3XCiYfpYHYhl8A1EH5kanKgdXDocz9yIm7mZm89gngufF/HkeXS3ZU25T
yThyBGulGjqN4FCdgf1HolkTfFjnfSx4qJovJ58eHga+HNLXRkTecZZcbFy2OOHv
8Bx0PIlwj4RgSaRLWQUudAhdhKS2g22DKDDljxFwhkMPNghvqkYMJCRDKLu6GBXQ
4YsLKM+TaShHFjSpx+ao
=rpvb
-----END PGP SIGNATURE-----
Merge tag 'kvm-arm-for-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/ARM updates for 4.6
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- Various optimizations to the vgic save/restore code
Conflicts:
include/uapi/linux/kvm.h
Several cases of overlapping changes, as well as one instance
(vxlan) of a bug fix in 'net' overlapping with code movement
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
The pgtable.c file is quite big, before it grows any larger split it
into pgtable.c, pgalloc.c and gmap.c. In addition move the gmap related
header definitions into the new gmap.h header and all of the pgste
helpers from pgtable.h to pgtable.c.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The pmdp_xxx function are smaller than their ptep_xxx counterparts
but to keep things symmetrical unline them as well.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The code in the various ptep_xxx functions has grown quite large,
consolidate them to four out-of-line functions:
ptep_xchg_direct to exchange a pte with another with immediate flushing
ptep_xchg_lazy to exchange a pte with another in a batched update
ptep_modify_prot_start to begin a protection flags update
ptep_modify_prot_commit to commit a protection flags update
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We can fit the 2k for the STFLE interpretation and the crypto
control block into one DMA page. As we now only have to allocate
one DMA page, we can clean up the code a bit.
As a nice side effect, this also fixes a problem with crycbd alignment in
case special allocation debug options are enabled, debugged by Sascha
Silbe.
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>