In commit:
21cbc2822aa1 ("x86/mm/cpa: Unbreak populate_pgd(): stop trying to deallocate failed PUDs")
I intended to add this comment, but I failed at using git.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/242baf8612394f4e31216f96d13c4d2e9b90d1b7.1469293159.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Valdis Kletnieks bisected a boot failure back to this recent commit:
360cb4d155 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated")
I broke the case where a PUD table got allocated -- populate_pud()
would wander off a pgd_none entry and get lost. I'm not sure how
this survived my testing.
Fix the original issue in a much simpler way. The problem
was that, if we allocated a PUD table, failed to populate it, and
freed it, another CPU could potentially keep using the PGD entry we
installed (either by copying it via vmalloc_fault or by speculatively
caching it). There's a straightforward fix: simply leave the
top-level entry in place if this happens. This can't waste any
significant amount of memory -- there are at most 256 entries like
this systemwide and, as a practical matter, if we hit this failure
path repeatedly, we're likely to reuse the same page anyway.
For context, this is a reversion with this hunk added in:
if (ret < 0) {
+ /*
+ * Leave the PUD page in place in case some other CPU or thread
+ * already found it, but remove any useless entries we just
+ * added to it.
+ */
- unmap_pgd_range(cpa->pgd, addr,
+ unmap_pud_range(pgd_entry, addr,
addr + (cpa->numpages << PAGE_SHIFT));
return ret;
}
This effectively open-codes what the now-deleted unmap_pgd_range()
function used to do except that unmap_pgd_range() used to try to
free the page as well.
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Mike Krinkin <krinkin.m.u@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don't use the same syscall numbers for 2 different syscalls:
534 x32 preadv compat_sys_preadv64
535 x32 pwritev compat_sys_pwritev64
534 x32 preadv2 compat_sys_preadv2
535 x32 pwritev2 compat_sys_pwritev2
Add compat_sys_preadv64v2() and compat_sys_pwritev64v2() so that 64-bit offset
is passed in one 64-bit register on x32, similar to compat_sys_preadv64()
and compat_sys_pwritev64().
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/CAMe9rOovCMf-RQfx_n1U_Tu_DX1BYkjtFr%3DQ4-_PFVSj9BCzUA@mail.gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's statically initialized to zero -- no need to dynamically
initialize it to zero as well.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/6cf6314dce3051371a913ee19d1b88e29c68c560.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It serves no purpose -- raw_smp_processor_id() works fine. This
change will be needed to move thread_info off the stack.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/a2bf4f07fbc30fb32f9f7f3f8f94ad3580823847.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
struct thread_info is a legacy mess. To prepare for its partial removal,
move thread_info::addr_limit out.
As an added benefit, this way is simpler.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/15bee834d09402b47ac86f2feccdf6529f9bc5b0.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Rename it to match the thread_struct::uaccess_err pattern and also
because it was too long.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
struct thread_info is a legacy mess. To prepare for its partial removal,
move the uaccess control fields out -- they're straightforward.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d0ac4d01c8e4d4d756264604e47445d5acc7900e.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we call do_exit() with a clean stack, we greatly reduce the risk of
recursive oopses due to stack overflow in do_exit, and we allow
do_exit to work even if we OOPS from an IST stack. The latter gives
us a much better chance of surviving long enough after we detect a
stack overflow to write out our logs.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/32f73ceb372ec61889598da5e5b145889b9f2e19.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we get a vmalloc fault while current->active_mm->pgd doesn't
match CR3, we'll crash without this change. I've seen this failure
mode on heavily instrumented kernels with virtually mapped stacks.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4650d7674185f165ed8fdf9ac4c5c35c5c179ba8.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we overflow the stack into a guard page, we'll recursively fault
when trying to dump the contents of the guard page. Use
probe_kernel_address() so we can recover if this happens.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e626d47a55d7b04dcb1b4d33faa95e8505b217c8.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If we overflow the stack, print_context_stack() will abort. Detect
this case and rewind back into the valid part of the stack so that
we can trace it.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/ee1690eb2715ccc5dc187fde94effa4ca0ccbbcd.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
kernel_unmap_pages_in_pgd() is dangerous: if a PGD entry in
init_mm.pgd were to be cleared, callers would need to ensure that
the pgd entry hadn't been propagated to any other pgd.
Its only caller was efi_cleanup_page_tables(), and that, in turn,
was unused, so just delete both functions. This leaves a couple of
other helpers unused, so delete them, too.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/77ff20fdde3b75cd393be5559ad8218870520248.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This avoids pointless races in which another CPU or task might see a
partially populated global PGD entry. These races should normally
be harmless, but, if another CPU propagates the entry via
vmalloc_fault() and then populate_pgd() fails (due to memory allocation
failure, for example), this prevents a use-after-free of the PGD
entry.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/bf99df27eac6835f687005364bd1fbd89130946c.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So when memory hotplug removes a piece of physical memory from pagetable
mappings, it also frees the underlying PGD entry.
This complicates PGD management, so don't do this. We can keep the
PGD mapped and the PUD table all clear - it's only a single 4K page
per 512 GB of memory hotplugged.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/064ff6c7275734537f969e876f6cd0baa954d2cc.1468527351.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The page table manipulation code seems to have grown a couple of
sites that are looking for empty PTEs. Just in case one of these
entries got a stray bit set, use pte_none() instead of checking
for a zero pte_val().
The use pte_same() makes me a bit nervous. If we were doing a
pte_same() check against two cleared entries and one of them had
a stray bit set, it might fail the pte_same() check. But, I
don't think we ever _do_ pte_same() for cleared entries. It is
almost entirely used for checking for races in fault-in paths.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001915.813703D9@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
Landing) has an erratum where a processor thread setting the Accessed
or Dirty bits may not do so atomically against its checks for the
Present bit. This may cause a thread (which is about to page fault)
to set A and/or D, even though the Present bit had already been
atomically cleared.
These bits are truly "stray". In the case of the Dirty bit, the
thread associated with the stray set was *not* allowed to write to
the page. This means that we do not have to launder the bit(s); we
can simply ignore them.
If the PTE is used for storing a swap index or a NUMA migration index,
the A bit could be misinterpreted as part of the swap type. The stray
bits being set cause a software-cleared PTE to be interpreted as a
swap entry. In some cases (like when the swap index ends up being
for a non-existent swapfile), the kernel detects the stray value
and WARN()s about it, but there is no guarantee that the kernel can
always detect it.
When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
to move the swap PTE format around to avoid these troublesome bits.
But, 32-bit non-PAE is tight on bits. So, disallow it from running
on this hardware. I can't imagine anyone wanting to run 32-bit
non-highmem kernels on this hardware, but disallowing them from
running entirely is surely the safe thing to do.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The erratum we are fixing here can lead to stray setting of the
A and D bits. That means that a pte that we cleared might
suddenly have A/D set. So, stop considering those bits when
determining if a pte is pte_none(). The same goes for the
other pmd_none() and pud_none(). pgd_none() can be skipped
because it is not affected; we do not use PGD entries for
anything other than pagetables on affected configurations.
This adds a tiny amount of overhead to all pte_none() checks.
I doubt we'll be able to measure it anywhere.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001912.5216F89C@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This erratum can result in Accessed/Dirty getting set by the hardware
when we do not expect them to be (on !Present PTEs).
Instead of trying to fix them up after this happens, we just
allow the bits to get set and try to ignore them. We do this by
shifting the layout of the bits we use for swap offset/type in
our 64-bit PTEs.
It looks like this:
bitnrs: | ... | 11| 10| 9|8|7|6|5| 4| 3|2|1|0|
names: | ... |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
before: | OFFSET (9-63) |0|X|X| TYPE(1-5) |0|
after: | OFFSET (14-63) | TYPE (9-13) |0|X|X|X| X| X|X|X|0|
Note that D was already a don't care (X) even before. We just
move TYPE up and turn its old spot (which could be hit by the
A bit) into all don't cares.
We take 5 bits away from the offset, but that still leaves us
with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
I think that's probably fine for the moment. We could
theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
doesn't gain us anything.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: dave.hansen@intel.com
Cc: linux-mm@kvack.org
Cc: mhocko@suse.com
Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This matches what is already done for prepare_exit_to_usermode(),
and saves about 60 clock cycles (4% speedup) with the benchmark
in the previous commit message.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1466434712-31440-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Thanks to all the work that was done by Andy Lutomirski and others,
enter_from_user_mode() and prepare_exit_to_usermode() are now called only with
interrupts disabled. Let's provide them a version of user_enter()/user_exit()
that skips saving and restoring the interrupt flag.
On an AMD-based machine I tested this patch on, with force-enabled
context tracking, the speed-up in system calls was 90 clock cycles or 6%,
measured with the following simple benchmark:
#include <sys/signal.h>
#include <time.h>
#include <unistd.h>
#include <stdio.h>
unsigned long rdtsc()
{
unsigned long result;
asm volatile("rdtsc; shl $32, %%rdx; mov %%eax, %%eax\n"
"or %%rdx, %%rax" : "=a" (result) : : "rdx");
return result;
}
int main()
{
unsigned long tsc1, tsc2;
int pid = getpid();
int i;
tsc1 = rdtsc();
for (i = 0; i < 100000000; i++)
kill(pid, SIGWINCH);
tsc2 = rdtsc();
printf("%ld\n", tsc2 - tsc1);
}
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1466434712-31440-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Three fixes. One is the qla24xx MSI regression, one is a theoretical
problem over blacklist matching, which would bite USB badly if it ever
triggered and one is a system hang with a particular type of IPR
device.
Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJXgAriAAoJEAVr7HOZEZN40VcP/R65ZYezkobZDVR+xQfCwDnn
cCaFSl4a+FtsGnzs2hb88ZZWA2hn4xWDAdjwoPxRwD7yq8KB1Oxb2wEOiJMjIgJH
N/NU6zEIjMvVei8jDuAJJ13c1Xa1/y59ptD1nykm1dJlLGjwAE1jBiAuDs1OOm27
SnvIj+mOSRiesGqp2Srg6+w+QmOJ0biXiSf5EUhj7rFKayIaM3yLrQZ2E4a5oUfc
EZYzCBagObQoYwABw/6Nd1HhFdpptRnOvIr8xk+ch5w2m+AEiw1xAf+/B9ws25fB
EB4O7fQcNzIEbjRRIehGhtp83Q7ST77kq5mE7CHDLeu/v93ZzKBmEZXIM2Bdg2x5
TBEvFrkIFe1ECITHInegvG4/V/mNLFYcad9Ygdbdt6ndXWoJ5l1a8BRcNMCX0smQ
g7jXqMz0vYKW+JSZvP1fHtqsJR6t6CHAFOKtwwb5xlhRvbZRMB311pr3UfbMqlTx
qZOfGMqF/ta51RWkenNlRZHvg8WeeTxGioNexS8qU9j+9CUvvj5eIemEpBEBiblN
8BnbEAAnjSTSMGPFuOMQ8Njh3umC4ozzc8WcaXNjHnUMcIXaOY3PkcjUf65pi5S+
fPjV18350LAhiIlneHSCcGBO+Z+D5OjPJdQGywMb3fs9HICNfi41QsJrJVAkA3e5
vc2XZhSAfEQuwniuGJU3
=0CCg
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes. One is the qla24xx MSI regression, one is a theoretical
problem over blacklist matching, which would bite USB badly if it ever
triggered and one is a system hang with a particular type of IPR
device"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
qla2xxx: Fix NULL pointer deref in QLA interrupt
SCSI: fix new bug in scsi_dev_info_list string matching
ipr: Clear interrupt on croc/crocodile when running with LSI
- Provide a more concise fix for CVE-2016-1583
+ Additionally fixes linux-stable regressions caused by the cherry-picking of
the original fix
- Some very minor changes that have queued up
+ Fix typos in code comments
+ Remove unnecessary check for NULL before destroying kmem_cache
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCgAGBQJXf8nnAAoJENaSAD2qAscKwXgP/0awhY1z40dL/igP6fPv2ack
HbqrOjUVO2DzxinvKB3vRLNy93zwESxe8UpwPsl84IJ85zOQjkUkJ8PYk1oyBf0N
dVWqO11g6AKNZ+VQFspconvMhZATwSrsv8z3BzvwNGLsPhPuUQ+JmbBe8xMdrsZ5
qVaWswsMtMlhM3p/zFh57vWO64fT1xiabpxSkKpG2LHJN6h6QAQxkfBfa2FuXCsN
hZIw+ULcUJfdawXGq8lAfcYzbDmFpNt70fFquJgfJHrXFrOuensYfLcWUvhrSNbc
HZ6imRK9LCG4IKjJTBNmCmBR8ho71yGzdKuup81Eap+2zx2kC7twokS1d5fha8iL
Kzkx0NMDriY2N+tIfufHYk2IIenFzWG6Yuj0STswtJX4YhQGBc0H6VxcgrxE0PgW
k1iKUV7jnJGxxN+d6lmV4+fX0vKGgBMsQq1Q76CkYLN1BAvdwz6GnWSfqP8hWz3o
sNVyNtYh+/TXY8JMWKDBlps7Ib8W88qDW3K7YcAf2VPYAqIWm5Va1MR5m5s+UIeR
QiCD32X/0PfDp13QRiKAHJ6C9CInyu0r+fF/g8ZMqLuWgLxoahxpr6ML/CnHoGl5
IcDydJO3/bLBq9If8WxYsOQvVKCa4e7N7o7ZHPKd8U7O39mCGNfbQx7/FlMjtvf6
+4HAxamUC1ogpLTkpWxI
=Bt4P
-----END PGP SIGNATURE-----
Merge tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
Pull eCryptfs fixes from Tyler Hicks:
"Provide a more concise fix for CVE-2016-1583:
- Additionally fixes linux-stable regressions caused by the
cherry-picking of the original fix
Some very minor changes that have queued up:
- Fix typos in code comments
- Remove unnecessary check for NULL before destroying kmem_cache"
* tag 'ecryptfs-4.7-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
ecryptfs: don't allow mmap when the lower fs doesn't support it
Revert "ecryptfs: forbid opening files without mmap handler"
ecryptfs: fix spelling mistakes
eCryptfs: fix typos in comment
ecryptfs: drop null test before destroy functions
Two Fixes:
* Intel VT-d fix for a suspend/resume issue, introduced with the
scalability improvements in this cycle.
* AMD IOMMU fix for systems that have unity mappings defined. There was
a race where translation got enabled before the unity mappings were
in place. This issue was seen on some HP servers.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABAgAGBQJXf7bkAAoJECvwRC2XARrjzYoP/AvvkL955TW0pSdMWZqQ0o9j
kL3jSl75Te1EwaSyJxR9nOK8azSsfYcG4NdzMxnFUCK6ydDA6Ogz0cMtcd6s2skd
unQuCSpjOJiMQU426ssQrsW+KoTxi5utpapMY3Ayh94/BbEDuVtv/iUIBE1Ca9eE
H/SQUl4l8c3Cwe//1Kwrzm6O0pslrLBoOABTpFhl4GZ5pWCcv7bwbSEjVfSY5X8x
//trqhgsACdh+Qlp5dA56BeMVyEblOvnY5XO2ODeWOG2vlEbqH9MiUeuDBNoXxtM
MqhLTSKijZ1wp2a1iaCuyBLKFlaW7zChU41YyQmYJ8B60u6CwVuFwGs+XltnkOha
iYBK56xZ8npFKOs4iauBmsVZZnNrAY0mxCYg28KtNh2uUT4sQdqUCnT/SxA9Z0RC
bRtudpoX5I3NZrlDlt6OzWqCjxFj8Zq57RwrhDuQm+ijqAAor4C7NhH7cT0QTC4k
l1srxnLtSM/iiqWgs2hNIC2GzNbdhix8YU98kIxpT2iNB99qlqPXK9ve0bJIb03N
oECjKBu+d0I7ApdfOlL6N/RHgZJ01O+6gK1neArtVnviBgN7tmTxgWvcioJd+JuL
bmjytsExfP/Z2p7iy9yQCiIDicAo0+wpMOzZRG8jAXvHHCj8laZ8PhL74voWc2wH
Mn+iaXAXUysLblaXzpEx
=axsu
-----END PGP SIGNATURE-----
Merge tag 'iommu-fixes-v4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
"Two Fixes:
- Intel VT-d fix for a suspend/resume issue, introduced with the
scalability improvements in this cycle.
- AMD IOMMU fix for systems that have unity mappings defined. There
was a race where translation got enabled before the unity mappings
were in place. This issue was seen on some HP servers"
* tag 'iommu-fixes-v4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: Fix unity mapping initialization race
iommu/vt-d: Fix infinite loop in free_all_cpu_cached_iovas
- Fix two bugs in the handling of xenbus transactions.
- Make the xen acpi driver compatible with Xen 4.7.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJXf7GHAAoJEFxbo/MsZsTRYksH/0F+xZZQQiO3WPSzL0muu5Fn
wLmmSyBv6Ak76vZ6z7+ku095OagA0LgS1eISnKlP86HTaRl8eQ6AyChjKux3cX9T
+X1hHwBN39rfF6mZO4pXhu/7SKVmcOvVY7SHvKca8Lx31Y58eLB4+6ycnrGI+XQ7
oon7KrmqSAg/3r1/CLvwTE6/PPxj/T38g0QoegN6ua26O79OFY5GWmdc+ucfR76i
NIOubaVX93s8dF0YcvVBL1HIs64AkUkk6i5DiyJ1r05kCTy2sYlZ3e6abCFhqMj+
jcf4aCTI4sCzbZRHID5mEMxfiGAHFo5MPuoRpo08orMbGZu/0+ytnkJ/hYb+H7c=
=YMOM
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.7b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- Fix two bugs in the handling of xenbus transactions.
- Make the xen acpi driver compatible with Xen 4.7.
* tag 'for-linus-4.7b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7
xenbus: simplify xenbus_dev_request_and_reply()
xenbus: don't bail early from xenbus_dev_request_and_reply()
xenbus: don't BUG() on user mode induced condition
- Enforce USER_DS on exception entry from EL1
- Apply workaround for Cavium errata #27456 on Thunderx-81xx parts
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABCgAGBQJXf5pkAAoJELescNyEwWM0CrwH/RTFmTDzlvwJbcmVKLeabfSb
8AUphL7+D8gRLBRy1l+pdjqHli4EuxA34peaIHs91ziPl85wI+l37juTZ08MqYUM
W3lLbKPmJGa39WKYq5rtKqaohCGHRA0SwLSq78kbRFb3GgWUvNbrUaC5oBoEOBkc
x2vEpsVVhAWezly1CaX0zf8yfBuGp5O8rkw2yFqPuD7MKh3D0DLK4F8UCmZ9OqQM
nI10nq9GBdbus8yA/2kIHSvtkGC9l0Cyiu8iJ/Gf4HQnSqVopPAzvP0FdNs5cj9o
5m/BOJUED/pEdps7+PZMlJHYrHpB+VTqrZ/HdFFI4M5EsIltw3OSKp/lA6cA/Xc=
=iKFx
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"A couple of late fixes here, but one that we've been sitting on for a
few weeks while the details were worked out. Specifically, we now
enforce USER_DS on taking exceptions whilst in the kernel, which
avoids leaking kernel data to userspace through things like perf. The
other patch is an update to a workaround for a hardware erratum on
some Cavium SoCs.
Summary:
- Enforce USER_DS on exception entry from EL1
- Apply workaround for Cavium errata #27456 on Thunderx-81xx parts"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Enable workaround for Cavium erratum 27456 on thunderx-81xx
arm64: kernel: Save and restore UAO and addr_limit on exception entry
Pull x86 fixes from Ingo Molnar:
"Three fixes:
- A boot crash fix with certain configs
- a MAINTAINERS entry update
- Documentation typo fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Documentation: Fix various typos in Documentation/x86/ files
x86/amd_nb: Fix boot crash on non-AMD systems
MAINTAINERS: Update the Calgary IOMMU entry
Pull perf fixes from Ingo Molnar:
"Various fixes:
- 32-bit callgraph bug fix
- suboptimal event group scheduling bug fix
- event constraint fixes for Broadwell/Skylake
- RAPL module name collision fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix pmu::filter_match for SW-led groups
x86/perf/intel/rapl: Fix module name collision with powercap intel-rapl
perf/x86: Fix 32-bit perf user callgraph collection
perf/x86/intel: Update event constraints when HT is off
Pull irq fixes from Ingo Molnar:
"Two MIPS-GIC irqchip driver fixes to unbreak certain MIPS boards"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/mips-gic: Match IPI IRQ domain by bus token only
irqchip/mips-gic: Map to VPs using HW VPNum
- Fix an oops on the Asus Eee PC 1201
- Revert a patch trying to split GPIO parsing and GPIO configuration
- Revert a too liberal compile testing thing
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXf460AAoJEEEQszewGV1zelEQAKGzur1ClwQx4PzF5W+6hAkJ
a7DDMOA2jBWgh+fnGRGq1yKNpeF1famLB4EVYvr4fzsTZDMFnFmHLAomc42wzfPM
ZDgPBcpNN9/603vXrW32c84SbNKUt3liC0VGBiQns0ixRnl/z07HcFIcQgLWjSjc
02VtYQVXlBtEvgpCSovfcly2hvoSB8onDKhF7/gaI5d+k0Ngq4MtIMueMEYaJY/z
16NZl5Sapn58gAUhElUy+9gBeIA+xPx/VXhBlP+g6pSAHiRudaHNAMEFEDiAWEOQ
Iedzy/vXtLdRV6w3ZZJ99OGSTYrGtBBMbIKOWA98IW9s2h98B3r7+267VzFMd5f2
Avcdg8emkQP6iN1hWCtT3YKKgK6L0Iwi7FIqgaHzFyg32tIc42Ej8EoD+HyzT4MU
NzAXg/0E3b0vgtxzXcWUNEJ3K4P7xlgcFRiuwGAE1C2c1RxxKLSTyvDBANRSKSLt
ADNYioWr4vAQEE+id3SZFkbxOOlTSD+nAjcE1NzXCJcGpwinpPWADfQxwn6TEvp/
KodH7eItpJsWs8z7mqnCFC6lox133phXqPvs2ECRMFi6p4OYaTRX+MSeoLv9t9w/
e/KVl9Wpnx9ImvFurr3TsPaAtjsAPlnlUxrNrY3mngAfKEm+YKoJBgen6KrHiGPI
fewQGrZr4xhtDn0AOKNm
=9+1I
-----END PGP SIGNATURE-----
Merge tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO fixes from Linus Walleij:
"I don't like to toss in last minute patches, but these are all for
things that are broken, and have bitten people for real. Two of them
go into stable. Maybe all of them if the compile test problem is a
pain in the ass also for stable folks.
Final (hopefully) GPIO fixes for v4.7:
- Fix an oops on the Asus Eee PC 1201
- Revert a patch trying to split GPIO parsing and GPIO configuration
- Revert a too liberal compile testing thing"
* tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
Revert "gpio: gpiolib-of: Allow compile testing"
Revert "gpiolib: Split GPIO flags parsing and GPIO configuration"
gpio: sch: Fix Oops on module load on Asus Eee PC 1201
Pull drm fixes from Dave Airlie:
"One nouveau fix, and a few AMD Polaris fixes and some Allwinner fixes.
I've got some vmware fixes that I might send separate over the
weekend, they fix some black screens, but I'm still debating them"
* tag 'drm-fixes-for-v4.7-rc7' of git://people.freedesktop.org/~airlied/linux:
drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation.
drm/amd/powerplay: fix bug that get wrong polaris evv voltage.
drm/amd/powerplay: incorrectly use of the function return value
drm/amd/powerplay: fix incorrect voltage table value for tonga
drm/amd/powerplay: fix incorrect voltage table value for polaris10
drm/nouveau/disp/sor/gf119: select correct sor when poking training pattern
gpu: drm: sun4i_drv: add missing of_node_put after calling of_parse_phandle
drm/sun4i: Send vblank event when the CRTC is disabled
drm/sun4i: Report proper vblank
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@vger.kernel.org
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
No need to have it appear in objdump output.
No functionality change.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160708141016.GH3808@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
both Intel and AMD systems. Doing any kind of hardware capability
checks in the driver as a prerequisite was wrong anyway: With the
hypervisor being in charge, all such checking should be done by it. If
ACPI data gets uploaded despite some missing capability, the hypervisor
is free to ignore part or all of that data.
Ditch the entire check_prereq() function, and do the only valid check
(xen_initial_domain()) in the caller in its place.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Should print this on vDSO remapping success (on new kernels):
[root@localhost ~]# ./test_mremap_vdso_32
AT_SYSINFO_EHDR is 0xf773f000
[NOTE] Moving vDSO: [f773f000, f7740000] -> [a000000, a001000]
[OK]
Or print that mremap() for vDSOs is unsupported:
[root@localhost ~]# ./test_mremap_vdso_32
AT_SYSINFO_EHDR is 0xf773c000
[NOTE] Moving vDSO: [0xf773c000, 0xf773d000] -> [0xf7737000, 0xf7738000]
[FAIL] mremap() of the vDSO does not work on this kernel!
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kselftest@vger.kernel.org
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160628113539.13606-3-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add possibility for 32-bit user-space applications to move
the vDSO mapping.
Previously, when a user-space app called mremap() for the vDSO
address, in the syscall return path it would land on the previous
address of the vDSOpage, resulting in segmentation violation.
Now it lands fine and returns to userspace with a remapped vDSO.
This will also fix the context.vdso pointer for 64-bit, which does
not affect the user of vDSO after mremap() currently, but this
may change in the future.
As suggested by Andy, return -EINVAL for mremap() that would
split the vDSO image: that operation cannot possibly result in
a working system so reject it.
Renamed and moved the text_mapping structure declaration inside
map_vdso(), as it used only there and now it complements the
vvar_mapping variable.
There is still a problem for remapping the vDSO in glibc
applications: the linker relocates addresses for syscalls
on the vDSO page, so you need to relink with the new
addresses.
Without that the next syscall through glibc may fail:
Program received signal SIGSEGV, Segmentation fault.
#0 0xf7fd9b80 in __kernel_vsyscall ()
#1 0xf7ec8238 in _exit () from /usr/lib32/libc.so.6
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Acked-by: Andy Lutomirski <luto@kernel.org>
Cc: 0x7f454c46@gmail.com
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No need to retain a local copy of the full request message, only the
type is really needed.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
xenbus_dev_request_and_reply() needs to track whether a transaction is
open. For XS_TRANSACTION_START messages it calls transaction_start()
and for XS_TRANSACTION_END messages it calls transaction_end().
If sending an XS_TRANSACTION_START message fails or responds with an
an error, the transaction is not open and transaction_end() must be
called.
If sending an XS_TRANSACTION_END message fails, the transaction is
still open, but if an error response is returned the transaction is
closed.
Commit 027bd7e899 ("xen/xenbus: Avoid synchronous wait on XenBus
stalling shutdown/restart") introduced a regression where failed
XS_TRANSACTION_START messages were leaving the transaction open. This
can cause problems with suspend (and migration) as all transactions
must be closed before suspending.
It appears that the problematic change was added accidentally, so just
remove it.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Currently it's possible for broken (or malicious) userspace to flood a
kernel log indefinitely with messages a-la
Program dmidecode tried to access /dev/mem between f0000->100000
because range_is_allowed() is case of CONFIG_STRICT_DEVMEM being turned on
dumps this information each and every time devmem_is_allowed() fails.
Reportedly userspace that is able to trigger contignuous flow of these
messages exists.
It would be possible to rate limit this message, but that'd have a
questionable value; the administrator wouldn't get information about all
the failing accessess, so then the information would be both superfluous
and incomplete at the same time :)
Returning EPERM (which is what is actually happening) is enough indication
for userspace what has happened; no need to log this particular error as
some sort of special condition.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1607081137020.24757@cbobk.fhfr.pm
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull apparmor fix from James Morris.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
apparmor: fix oops, validate buffer size in apparmor_setprocattr()
- Fix a lock ordering issue in ACPICA introduced by a recent commit
that attempted to fix a deadlock in the dynamic table loading code
which in turn appeared after changes related to the handling of
module-level AML also made in this cycle (Lv Zheng).
- Fix a recent regression in the ACPI IRQ management code that may
cause PCI drivers to be unable to register an IRQ if that IRQ
happens to be shared with a device on the ISA bus, like the
parallel port, by reverting one commit entirely and restoring the
previous behavior in two other places (Sinan Kaya).
- Fix a recent regression in the ACPI AML debugger introduced by
the commit that removed incorrect usage of IS_ERR_VALUE() from
multiple places (Lv Zheng).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXfuIeAAoJEILEb/54YlRxWbIP/iLT8J15RhFOoRYX4HXyULym
UxeqEHVySKaqQLYhkBRz0opclW6sJQ7GaKYpVR31V4eeoll1+iQeYBYWdyuezjwp
em3LvtLXTiLWWfRVNHg9AUOv4wc4q41m98MZ5ZScgTsUcexv2R1tt0KzLE/HNy1T
NU/7JyWBEF4AiFsfYBuqtknWudV7JF3/siJO+Q1o+HSA9DW5cdqR0K9oM+Sl9pTD
3yLpVY8DNJGLSA/7MyJlyLmZ6eJmTQbxcLO7oCQliqJ9jjBKkC4r3fgfQcOcoltT
S6O7ODcJwvqCA1sv271VyOckkhJmyT7zJi/L/yJOgiGRU1//0Vfyg6iRVou9OZmE
VcQT79W+6W6saWucoNDX1CLm6GzLIHP2e6leooL710nNmtTUgrSc6C4FT1KGGs0R
UTNPYKYtiaK54krsa48XonSCdGFIszeK8sa3DIjf6C/5AYX/eGBuGFm5dZlU/qzs
BNv+GyT/Vt/Cu3cNJUrsMugwDL5sp31u44mfM1c99LMXmnwzkPylRGR1aaO4TYOy
jE5LDwMryi5gpAF8/Es0AiU4Xa8O3p+AlD7PjZIUSEyMd9zr3fdpvF6dZwaiHP26
FX6paRbXi10mkXYGnRCbTOBV8i3PuNiplaZUT+XotHlEXOvAyMhVCVSM2XCAo1Wm
3IWSGzc5gwq1Pl4fihWi
=vKn9
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"All of these fix recent regressions in ACPICA, in the ACPI PCI IRQ
management code and in the ACPI AML debugger.
Specifics:
- Fix a lock ordering issue in ACPICA introduced by a recent commit
that attempted to fix a deadlock in the dynamic table loading code
which in turn appeared after changes related to the handling of
module-level AML also made in this cycle (Lv Zheng).
- Fix a recent regression in the ACPI IRQ management code that may
cause PCI drivers to be unable to register an IRQ if that IRQ
happens to be shared with a device on the ISA bus, like the
parallel port, by reverting one commit entirely and restoring the
previous behavior in two other places (Sinan Kaya).
- Fix a recent regression in the ACPI AML debugger introduced by the
commit that removed incorrect usage of IS_ERR_VALUE() from multiple
places (Lv Zheng)"
* tag 'acpi-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / debugger: Fix regression introduced by IS_ERR_VALUE() removal
ACPICA: Namespace: Fix namespace/interpreter lock ordering
ACPI,PCI,IRQ: separate ISA penalty calculation
Revert "ACPI, PCI, IRQ: remove redundant code in acpi_irq_penalty_init()"
ACPI,PCI,IRQ: factor in PCI possible
- Fix a recent performance regression on Power systems (powernv
and pseries) introduced by a core cpuidle commit that decreased
the precision of the last_residency conversion from nano- to
microseconds, which should not matter in theory, but turned out
to play not-so-well with the special "snooze" idle state on Power
(Shreyas B Prabhu).
- Fix a crash during resume from hibernation on x86-64 caused by
possible corruption of the kernel text part of page tables in the
last phase of image restoration exposed by a security-related
change during the 4.3 development cycle (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXfuVmAAoJEILEb/54YlRx/TUQALIrIgVMtNpPhC0Mu0DQp+1y
S2t/vWfLnMqfl1lzH/scDi4OC3cqEbEsYB5CUJPRysc1KB0fjJbW+BEuQuJyEhgf
ac95mv1IU7DKiJ1Rp4qWvvBoovY3fjb1rhHMzvo9uh6HmKrSlWLL4PmnbGw/BaFL
PgqEazsy/a9FgGT7VnvamVL6bGXgVMEf8VqoBeZFd0krH4qvBZHrS97OLqU685vT
VW0IDZr0sI3u0pG9ropJ5b8ETtnyZ4lmTO5D4TzD9P8kZyjl2TBnSZ/ls/1YLh4N
WdBZXbsfmFFt4jyDnOjilWOBA8R3IDteCDmG1l8XhK4Smbl7TRh2XVuRAHHEjG6b
oMPHTeqIil3ExBS2gdzVEFNzYfHsyg3N9cXSochqnp3vboyZOiKaxIPCiVeBOZ+t
bDPwtbovIly1aEnsJKUbHvBxJ4A9uPUqw99YbBtMK1VpQXZFg+s/0mBVNynkuu/k
EUOuU6KcY6Br4JJPsqfb26NmARgLrrI+gTJbC3pFDHARtALnCbkkbmNZRRVRS3Zs
7CzGaYqgZ0ejMv/46jFPsHYKXU52AJ5HDhfXfELd8X3AvdhtmIszfqTkBgKx4gto
fyTVzi2c6+RJJr5Jc07msYawaz86nEUjeXFsM59g6jC5dguSWnRxUuWV2zUM/ENb
GDkzNdzif0t2DheJ4bKc
=XL6s
-----END PGP SIGNATURE-----
Merge tag 'pm-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"One fix for a recent cpuidle core change that, against all odds,
introduced a functional regression on Power systems and the fix for
the crash during resume from hibernation on x86-64 that has been in
the works for the last few weeks (it actually was ready last week, but
I wanted to allow the reporters to test if for some more time).
Specifics:
- Fix a recent performance regression on Power systems (powernv and
pseries) introduced by a core cpuidle commit that decreased the
precision of the last_residency conversion from nano- to
microseconds, which should not matter in theory, but turned out to
play not-so-well with the special "snooze" idle state on Power
(Shreyas B Prabhu).
- Fix a crash during resume from hibernation on x86-64 caused by
possible corruption of the kernel text part of page tables in the
last phase of image restoration exposed by a security-related
change during the 4.3 development cycle (Rafael Wysocki)"
* tag 'pm-4.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: Fix last_residency division
x86/power/64: Fix kernel text mapping corruption during image restoration
A new set of fixes for the sun4i driver, mostly related to vblank handling,
and a minor fix to release a reference on the device tree nodes we're
parsing in the probe logic.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXfBEjAAoJEBx+YmzsjxAg614P/Ak3zQiSfzdr3W95cttFN3Vj
uI5kSFJjW96KzmS3EaqJYSrVZ0gtLkofB5lS0ElsrulYKcKBmYnjCvw+f0a8gM2K
OY+y3pu17s5o3TJKWle2Sf8bW+rhA4dLaKHKBolrT2J9wFiQGTL0sOvq5ZvsCKqC
j5Vp2sEEeGopO6ZOp6lKcRT4FVB7y9A7/7UVnizsqn3ZrUqERa9ofWp9OS8p+P2H
cftyVxDgfA5Ve6bS7C0iqjfSVsSqbc/NexKHvbp3ZOLiuLUehZoFQb9fvrzZYoPK
uFTl29BEhxHEgqwAF7/iLjggBOmzzxKSaQP4/1567r+uYW+xF857yCl82JbilYiO
/laUdl4GK2B+yeJ+617THaCi5V+zdRQID0FPFcGNbDzjpw9WFN8ok7MY9iuoPfF7
30hEDwZYcGlbiNoCTs+1TKO8NQO85gNPgQ6MxfcJ+9nxR6OugkRGjltBrKJdGbSP
MafUG8FEDXnHbpNEQ7ruA+EebJZAHxjjMUvsCJlGKTIIUuw+vMfrKhexd0JKx2s2
el5Xmb2kKtl6+88xSTglPxugt0w6lM2VA1n52ZNEVkcgG4jdOPj/NVoGMmKL9/sJ
oUFIlzua06MA3fjoYMiLGmpNhls008pzs9nDPTrF7cuot5xCNnW/P/CkI7ODTB1P
khj8gD2QtIgNDGnQg/Mh
=oFqB
-----END PGP SIGNATURE-----
Merge tag 'sunxi-drm-fixes-for-4.7-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into drm-fixes
Allwinner DRM driver fixes for 4.7, take 2
A new set of fixes for the sun4i driver, mostly related to vblank handling,
and a minor fix to release a reference on the device tree nodes we're
parsing in the probe logic.
* tag 'sunxi-drm-fixes-for-4.7-2' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
gpu: drm: sun4i_drv: add missing of_node_put after calling of_parse_phandle
drm/sun4i: Send vblank event when the CRTC is disabled
drm/sun4i: Report proper vblank
When proc_pid_attr_write() was changed to use memdup_user apparmor's
(interface violating) assumption that the setprocattr buffer was always
a single page was violated.
The size test is not strictly speaking needed as proc_pid_attr_write()
will reject anything larger, but for the sake of robustness we can keep
it in.
SMACK and SELinux look safe to me, but somebody else should probably
have a look just in case.
Based on original patch from Vegard Nossum <vegard.nossum@oracle.com>
modified for the case that apparmor provides null termination.
Fixes: bb646cdb12
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: John Johansen <john.johansen@canonical.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: stable@kernel.org
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
This reverts commit 2f36db7100.
It fixed a local root exploit but also introduced a dependency on
the lower file system implementing an mmap operation just to open a file,
which is a bit of a heavy hammer. The right fix is to have mmap depend
on the existence of the mmap handler instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Pull block IO fixes from Jens Axboe:
"Three small fixes that have been queued up and tested for this series:
- A bug fix for xen-blkfront from Bob Liu, fixing an issue with
incomplete requests during migration.
- A fix for an ancient issue in retrieving the IO priority of a
different PID than self, preventing that task from going away while
we access it. From Omar.
- A writeback fix from Tahsin, fixing a case where we'd call ihold()
with a zero ref count inode"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: fix use-after-free in sys_ioprio_get()
writeback: inode cgroup wb switch should not call ihold()
xen-blkfront: save uncompleted reqs in blkfront_resume()