Commit Graph

215 Commits

Author SHA1 Message Date
Masami Hiramatsu 540adea380 error-injection: Separate error-injection from kprobe
Since error-injection framework is not limited to be used
by kprobes, nor bpf. Other kernel subsystems can use it
freely for checking safeness of error-injection, e.g.
livepatch, ftrace etc.
So this separate error-injection framework from kprobes.

Some differences has been made:

- "kprobe" word is removed from any APIs/structures.
- BPF_ALLOW_ERROR_INJECTION() is renamed to
  ALLOW_ERROR_INJECTION() since it is not limited for BPF too.
- CONFIG_FUNCTION_ERROR_INJECTION is the config item of this
  feature. It is automatically enabled if the arch supports
  error injection feature for kprobe or ftrace etc.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-12 17:33:38 -08:00
Josef Bacik 9802d86585 bpf: add a bpf_override_function helper
Error injection is sloppy and very ad-hoc.  BPF could fill this niche
perfectly with it's kprobe functionality.  We could make sure errors are
only triggered in specific call chains that we care about with very
specific situations.  Accomplish this with the bpf_override_funciton
helper.  This will modify the probe'd callers return value to the
specified value and set the PC to an override function that simply
returns, bypassing the originally probed function.  This gives us a nice
clean way to implement systematic error injection for all of our code
paths.

Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2017-12-12 09:02:34 -08:00
Ingo Molnar 15bcdc9477 Merge branch 'linus' into perf/core, to fix conflicts
Conflicts:
	tools/perf/arch/arm/annotate/instructions.c
	tools/perf/arch/arm64/annotate/instructions.c
	tools/perf/arch/powerpc/annotate/instructions.c
	tools/perf/arch/s390/annotate/instructions.c
	tools/perf/arch/x86/tests/intel-cqm.c
	tools/perf/ui/tui/progress.c
	tools/perf/util/zlib.c

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 10:30:18 +01:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Ingo Molnar ca4b9c3b74 Merge branch 'perf/urgent' into perf/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 11:02:05 +02:00
Masami Hiramatsu a30b85df7d kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y
We want to wait for all potentially preempted kprobes trampoline
execution to have completed. This guarantees that any freed
trampoline memory is not in use by any task in the system anymore.
synchronize_rcu_tasks() gives such a guarantee, so use it.

Also, this guarantees to wait for all potentially preempted tasks
on the instructions which will be replaced with a jump.

Since this becomes a problem only when CONFIG_PREEMPT=y, enable
CONFIG_TASKS_RCU=y for synchronize_rcu_tasks() in that case.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Naveen N . Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/150845661962.5443.17724352636247312231.stgit@devbox
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-20 09:45:15 +02:00
Ding Tianhong f4986d250a Revert commit 1a8b6d76dc ("net:add one common config...")
The new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING has been added
to indicate that Relaxed Ordering Attributes (RO) should not
be used for Transaction Layer Packets (TLP) targeted toward
these affected Root Port, it will clear the bit4 in the PCIe
Device Control register, so the PCIe device drivers could
query PCIe configuration space to determine if it can send
TLPs to Root Port with the Relaxed Ordering Attributes set.

With this new flag  we don't need the config ARCH_WANT_RELAX_ORDER
to control the Relaxed Ordering Attributes for the ixgbe drivers
just like the commit 1a8b6d76dc ("net:add one common config...") did,
so revert this commit.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09 07:43:06 -07:00
Linus Torvalds 44ccba3f7b - For the randstruct plugin, enable automatic randomization of structures
that are entirely function pointers (along with a couple designated
   initializer fixes).
 - For the structleak plugin, provide an option to perform zeroing
   initialization of all otherwise uninitialized stack variables that are
   passed by reference (Ard Biesheuvel).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJZrwHlAAoJEIly9N/cbcAmJR0QAKsTL0B6iBJlzrcAj6HkloMu
 QTTx+qrdpuhEJ+mH10JpOJnFctVI3vt7tUXGhBb0eBXuvnXPACjy3jx2X1tcnKf4
 v2HLf2GuCb95HqDVgrzn+HNPiAPb0dEM7qJPV+VfZA0K2nb6dVmS9fDYQWCLGJI+
 aazpmJDAOhXuKtUsbONaomoygBbS2kYrYCzqYB4M0FmZvbKw4CUdvVonkxhAITtl
 Zj3cl++jgHnVSNmyk92n3LTbIOv/o+pAMWv3/K6KDUIsNtVyk4znaghQJ6VKZhoR
 ua1gGzd0vrKMm960y8sDve+w5JSwaHVq6Y4jeqQynZywDpB998IhQiLmWfdSoN0O
 BPzAkxdNjCGNe+Ro6fQWYAXvnBZN2Gw8RiIjJP5DEz8EXe2BgGAFn3C6xbIS+F+A
 mXcn3Chorc1ZEfwMrbQ24vTfHRNmwMYQbZYZ9XftzixJU8XXhAf135DS+Enrc09X
 eSWEWaAJuF4en8A+1CsxO7vMh3U8tcS2lldbEUgXCJlNExzYFxBHwB2GImYXUt9D
 1i74n0PSz3EA8zfVr3qsGdraJq+7Ubq2NRWoudtQPYbHIh+VZcQ2VQEFtWOkmlgB
 T4foN7s17MrZzxn8krlYy8yODFJkisRJi/A5ox7hERwZjAhMQdwbTEr8HhKTui6X
 rm73yglE4ebfidp4Iyq4
 =3jxS
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc plugins update from Kees Cook:
 "This finishes the porting work on randstruct, and introduces a new
  option to structleak, both noted below:

   - For the randstruct plugin, enable automatic randomization of
     structures that are entirely function pointers (along with a couple
     designated initializer fixes).

   - For the structleak plugin, provide an option to perform zeroing
     initialization of all otherwise uninitialized stack variables that
     are passed by reference (Ard Biesheuvel)"

* tag 'gcc-plugins-v4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: structleak: add option to init all vars used as byref args
  randstruct: Enable function pointer struct detection
  drivers/net/wan/z85230.c: Use designated initializers
  drm/amd/powerplay: rv: Use designated initializers
2017-09-07 20:30:19 -07:00
Kees Cook 7a46ec0e2f locking/refcounts, x86/asm: Implement fast refcount overflow protection
This implements refcount_t overflow protection on x86 without a noticeable
performance impact, though without the fuller checking of REFCOUNT_FULL.

This is done by duplicating the existing atomic_t refcount implementation
but with normally a single instruction added to detect if the refcount
has gone negative (e.g. wrapped past INT_MAX or below zero). When detected,
the handler saturates the refcount_t to INT_MIN / 2. With this overflow
protection, the erroneous reference release that would follow a wrap back
to zero is blocked from happening, avoiding the class of refcount-overflow
use-after-free vulnerabilities entirely.

Only the overflow case of refcounting can be perfectly protected, since
it can be detected and stopped before the reference is freed and left to
be abused by an attacker. There isn't a way to block early decrements,
and while REFCOUNT_FULL stops increment-from-zero cases (which would
be the state _after_ an early decrement and stops potential double-free
conditions), this fast implementation does not, since it would require
the more expensive cmpxchg loops. Since the overflow case is much more
common (e.g. missing a "put" during an error path), this protection
provides real-world protection. For example, the two public refcount
overflow use-after-free exploits published in 2016 would have been
rendered unexploitable:

  http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/

  http://cyseclabs.com/page?n=02012016

This implementation does, however, notice an unchecked decrement to zero
(i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
resulted in a zero). Decrements under zero are noticed (since they will
have resulted in a negative value), though this only indicates that a
use-after-free may have already happened. Such notifications are likely
avoidable by an attacker that has already exploited a use-after-free
vulnerability, but it's better to have them reported than allow such
conditions to remain universally silent.

On first overflow detection, the refcount value is reset to INT_MIN / 2
(which serves as a saturation value) and a report and stack trace are
produced. When operations detect only negative value results (such as
changing an already saturated value), saturation still happens but no
notification is performed (since the value was already saturated).

On the matter of races, since the entire range beyond INT_MAX but before
0 is negative, every operation at INT_MIN / 2 will trap, leaving no
overflow-only race condition.

As for performance, this implementation adds a single "js" instruction
to the regular execution flow of a copy of the standard atomic_t refcount
operations. (The non-"and_test" refcount_dec() function, which is uncommon
in regular refcount design patterns, has an additional "jz" instruction
to detect reaching exactly zero.) Since this is a forward jump, it is by
default the non-predicted path, which will be reinforced by dynamic branch
prediction. The result is this protection having virtually no measurable
change in performance over standard atomic_t operations. The error path,
located in .text.unlikely, saves the refcount location and then uses UD0
to fire a refcount exception handler, which resets the refcount, handles
reporting, and returns to regular execution. This keeps the changes to
.text size minimal, avoiding return jumps and open-coded calls to the
error reporting routine.

Example assembly comparison:

refcount_inc() before:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)

refcount_inc() after:

  .text:
  ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
  ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
  ...
  .text.unlikely:
  ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
  ffffffff816c36d7:       0f ff                   (bad)

These are the cycle counts comparing a loop of refcount_inc() from 1
to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
(refcount_t-full), and this overflow-protected refcount (refcount_t-fast):

  2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
		    cycles		protections
  atomic_t           82249267387	none
  refcount_t-fast    82211446892	overflow, untested dec-to-zero
  refcount_t-full   144814735193	overflow, untested dec-to-zero, inc-from-zero

This code is a modified version of the x86 PAX_REFCOUNT atomic_t
overflow defense from the last public patch of PaX/grsecurity, based
on my understanding of the code. Changes or omissions from the original
code are mine and don't reflect the original grsecurity/PaX code. Thanks
to PaX Team for various suggestions for improvement for repurposing this
code to be a refcount-only protection.

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: kernel-hardening@lists.openwall.com
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 10:40:26 +02:00
Kees Cook ad05e6ca7b Merge branch 'for-next/gcc-plugin/structleak' into for-next/gcc-plugins 2017-08-07 13:29:04 -07:00
Ard Biesheuvel f7dd250789 gcc-plugins: structleak: add option to init all vars used as byref args
In the Linux kernel, struct type variables are rarely passed by-value,
and so functions that initialize such variables typically take an input
reference to the variable rather than returning a value that can
subsequently be used in an assignment.

If the initalization function is not part of the same compilation unit,
the lack of an assignment operation defeats any analysis the compiler
can perform as to whether the variable may be used before having been
initialized. This means we may end up passing on such variables
uninitialized, resulting in potential information leaks.

So extend the existing structleak GCC plugin so it will [optionally]
apply to all struct type variables that have their address taken at any
point, rather than only to variables of struct types that have a __user
annotation.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-08-07 11:20:57 -07:00
Kees Cook 9225331b31 randstruct: Enable function pointer struct detection
This enables the automatic structure selection logic in the randstruct
GCC plugin. The selection logic randomizes all structures that contain
only function pointers, unless marked with __no_randomize_layout.

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-08-01 17:04:48 -07:00
Daniel Micay 6974f0c455 include/linux/string.h: add the option of fortified string.h functions
This adds support for compiling with a rough equivalent to the glibc
_FORTIFY_SOURCE=1 feature, providing compile-time and runtime buffer
overflow checks for string.h functions when the compiler determines the
size of the source or destination buffer at compile-time.  Unlike glibc,
it covers buffer reads in addition to writes.

GNU C __builtin_*_chk intrinsics are avoided because they would force a
much more complex implementation.  They aren't designed to detect read
overflows and offer no real benefit when using an implementation based
on inline checks.  Inline checks don't add up to much code size and
allow full use of the regular string intrinsics while avoiding the need
for a bunch of _chk functions and per-arch assembly to avoid wrapper
overhead.

This detects various overflows at compile-time in various drivers and
some non-x86 core kernel code.  There will likely be issues caught in
regular use at runtime too.

Future improvements left out of initial implementation for simplicity,
as it's all quite optional and can be done incrementally:

* Some of the fortified string functions (strncpy, strcat), don't yet
  place a limit on reads from the source based on __builtin_object_size of
  the source buffer.

* Extending coverage to more string functions like strlcat.

* It should be possible to optionally use __builtin_object_size(x, 1) for
  some functions (C strings) to detect intra-object overflows (like
  glibc's _FORTIFY_SOURCE=2), but for now this takes the conservative
  approach to avoid likely compatibility issues.

* The compile-time checks should be made available via a separate config
  option which can be enabled by default (or always enabled) once enough
  time has passed to get the issues it catches fixed.

Kees said:
 "This is great to have. While it was out-of-tree code, it would have
  blocked at least CVE-2016-3858 from being exploitable (improper size
  argument to strlcpy()). I've sent a number of fixes for
  out-of-bounds-reads that this detected upstream already"

[arnd@arndb.de: x86: fix fortified memcpy]
  Link: http://lkml.kernel.org/r/20170627150047.660360-1-arnd@arndb.de
[keescook@chromium.org: avoid panic() in favor of BUG()]
  Link: http://lkml.kernel.org/r/20170626235122.GA25261@beast
[keescook@chromium.org: move from -mm, add ARCH_HAS_FORTIFY_SOURCE, tweak Kconfig help]
Link: http://lkml.kernel.org/r/20170526095404.20439-1-danielmicay@gmail.com
Link: http://lkml.kernel.org/r/1497903987-21002-8-git-send-email-keescook@chromium.org
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Nicholas Piggin 05a4a95279 kernel/watchdog: split up config options
Split SOFTLOCKUP_DETECTOR from LOCKUP_DETECTOR, and split
HARDLOCKUP_DETECTOR_PERF from HARDLOCKUP_DETECTOR.

LOCKUP_DETECTOR implies the general boot, sysctl, and programming
interfaces for the lockup detectors.

An architecture that wants to use a hard lockup detector must define
HAVE_HARDLOCKUP_DETECTOR_PERF or HAVE_HARDLOCKUP_DETECTOR_ARCH.

Alternatively an arch can define HAVE_NMI_WATCHDOG, which provides the
minimum arch_touch_nmi_watchdog, and it otherwise does its own thing and
does not implement the LOCKUP_DETECTOR interfaces.

sparc is unusual in that it has started to implement some of the
interfaces, but not fully yet.  It should probably be converted to a full
HAVE_HARDLOCKUP_DETECTOR_ARCH.

[npiggin@gmail.com: fix]
  Link: http://lkml.kernel.org/r/20170617223522.66c0ad88@roar.ozlabs.ibm.com
Link: http://lkml.kernel.org/r/20170616065715.18390-4-npiggin@gmail.com
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Babu Moger <babu.moger@oracle.com>
Tested-by: Babu Moger <babu.moger@oracle.com>	[sparc]
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:02 -07:00
Linus Torvalds 98ced886dd Kbuild thin archives updates for v4.13
Thin archives migration by Nicholas Piggin.
 
 THIN_ARCHIVES has been available for a while as an optional feature
 only for PowerPC architecture, but we do not need two different
 intermediate-artifact schemes.
 
 Using thin archives instead of conventional incremental linking has
 various advantages:
  - save disk space for builds
  - speed-up building a little
  - fix some link issues (for example, allyesconfig on ARM) due to
    more flexibility for the final linking
  - work better with dead code elimination we are planning
 
 As discussed before, this migration has been done unconditionally
 so that any problems caused by this will show up with "git bisect".
 
 With testing with 0-day and linux-next, some architectures actually
 showed up problems, but they were trivial and all fixed now.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZXsiSAAoJED2LAQed4NsGfqUQAIxbR4JcFCeGNNqgOV1q7Ban
 CaMzVZWPum0Mq+JWzknHrCJQzBE+4BPLbOtZH4Y0YhjXVfc2/M8QkzEzSWyEPm03
 FyaQ6WTq479mv7Ot2nAwaRSUYNSOuvlCx5KUOxITMJ/VmxwXXc9fCuT3ORu9opdK
 4iyh0P2D+IeABQlrS5k1Rj+y4u/BtpiGY9U5RDssn7u8sjEgBHWFXFfE2fQ0No+0
 1lzwa5EVyPHuq0XTBeZkPSDNxtou4iZzQC9QeNIYlyiod1G9deE4lzB55s+Qtkk0
 h6rN9WF+Rvy7/hjFUJy0TDPNx0io2kdJxMaMKp2HaES49w5fHv7NAgxuipFC91vE
 5UKs1sXxBe8dpPjfZWY7QSQ/JQv6NuG7NWcSGM29BWy3yFefSAXCggM+nn5IWzLH
 pSutfOBGeceJdyKMcdn3AgcHCj0wddFxX8AXst+ZebnqVoNxR/Nu6HGmyaucwyp3
 6fFTkbZ6DvOlu9MKbK0HSqrsT3DlAas2YWZKZ4Cc20wM99Z0OtFZlmpMCRIdiYtx
 hZBwze/ElheUbZu6igH6UX2lpOlat0V6nT5vKHGGeOJlwkxduKi3Kj6zVSkCHic5
 w3NLXr5FDWdkrMiC6/Z0Uae5mtAWOYyt6z1CwjgVmFrAkqlL8aWNagOcDCSFc1qR
 +3Cv7pZQSRWy2TaaLMzo
 =PAWi
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-thinar-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild thin archives updates from Masahiro Yamada:
 "Thin archives migration by Nicholas Piggin.

  THIN_ARCHIVES has been available for a while as an optional feature
  only for PowerPC architecture, but we do not need two different
  intermediate-artifact schemes.

  Using thin archives instead of conventional incremental linking has
  various advantages:

   - save disk space for builds

   - speed-up building a little

   - fix some link issues (for example, allyesconfig on ARM) due to more
     flexibility for the final linking

   - work better with dead code elimination we are planning

  As discussed before, this migration has been done unconditionally so
  that any problems caused by this will show up with "git bisect".

  With testing with 0-day and linux-next, some architectures actually
  showed up problems, but they were trivial and all fixed now"

* tag 'kbuild-thinar-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  tile: remove unneeded extra-y in Makefile
  kbuild: thin archives make default for all archs
  x86/um: thin archives build fix
  tile: thin archives fix linking
  ia64: thin archives fix linking
  sh: thin archives fix linking
  kbuild: handle libs-y archives separately from built-in.o archives
  kbuild: thin archives use P option to ar
  kbuild: thin archives final link close --whole-archives option
  ia64: remove unneeded extra-y in Makefile.gate
  tile: fix dependency and .*.cmd inclusion for incremental build
  sparc64: Use indirect calls in hamming weight stubs
2017-07-07 15:11:12 -07:00
Linus Torvalds 59005b0c59 GCC plugin updates:
- typo fix in Kconfig (Jean Delvare)
 - randstruct infrastructure
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJZXG6JAAoJEIly9N/cbcAmoO4P/jgF32XpC/HYGxcLARpcXUFr
 Dct/KJa6LdSIkeiMlmJD2DaLVQqeIyqQd8Aq/6jv4OMC3KtlquAygx4DoGh2zYYP
 HbSBiHz/czL1FCQpbXma2UUff1EDwuNM+wBJp80MgXy6J5KiKjB7yQAp9g0QS4o9
 3WSSitr9VcPEoxF7J9zySobd41IClFYnf1yi/gms2T/uvOHWEqDTUl06Dl3AEXPo
 0C/nMC4sNFggfTcsseAP7HGKiFyGErz2iER5wM0KXmU5eo4wgBK+mNN+n+oz1Doq
 BvkXraAyeor3YsKdu1oOkyeNK8iRscfeiqWUv86kBtfP3vNKUmWmpo77O3qGz5ra
 BwqcPF7nCtejs+QRVgeCrq3M/TUP1USN6shYS1uRVV5EPSy5NAsMO11Nzft7jaax
 LHQxJrCUeO2fHs2vTlzmwoxFq/9882LFRmOzuKqXAnhMQyuySdtbK4rs7ap4gjIt
 Zg6m0xDZWxPdIIrtoZGRuTcMSwV5QT4oTFQ125dgPO6zX9pwUWwN4Sg2zwn6aMx5
 BuHiJmfZsz48TRv1ui7wWjMNrMs8XnUPEOQUJpNHlDbuZbK+WRoIIUjVvtffSclu
 InpFCEq7OSov45ASYZ0SLNJO3N5L1zWjjjrJ3BQjCTxBNLUniBp6w2byWq0XObPD
 BnkZ3MA9xvkvrDsucAkm
 =rtdH
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull GCC plugin updates from Kees Cook:
 "The big part is the randstruct plugin infrastructure.

  This is the first of two expected pull requests for randstruct since
  there are dependencies in other trees that would be easier to merge
  once those have landed. Notably, the IPC allocation refactoring in
  -mm, and many trivial merge conflicts across several trees when
  applying the __randomize_layout annotation.

  As a result, it seemed like I should send this now since it is
  relatively self-contained, and once the rest of the trees have landed,
  send the annotation patches. I'm expecting the final phase of
  randstruct (automatic struct selection) will land for v4.14, but if
  its other tree dependencies actually make it for v4.13, I can send
  that merge request too.

  Summary:

  - typo fix in Kconfig (Jean Delvare)

  - randstruct infrastructure"

* tag 'gcc-plugins-v4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  ARM: Prepare for randomized task_struct
  randstruct: Whitelist NIU struct page overloading
  randstruct: Whitelist big_key path struct overloading
  randstruct: Whitelist UNIXCB cast
  randstruct: Whitelist struct security_hook_heads cast
  gcc-plugins: Add the randstruct plugin
  Fix English in description of GCC_PLUGIN_STRUCTLEAK
  compiler: Add __designated_init annotation
  gcc-plugins: Detail c-common.h location for GCC 4.6
2017-07-05 11:46:59 -07:00
Kees Cook 03232e0dde Merge branch 'for-next/gcc-plugin-infrastructure' into merge/randstruct 2017-07-04 21:40:47 -07:00
Nicholas Piggin 799c434154 kbuild: thin archives make default for all archs
Make thin archives build the default, but keep the config option
to allow exemptions if any breakage can't be quickly solved.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2017-06-30 09:03:05 +09:00
Kees Cook fd25d19f6b locking/refcount: Create unchecked atomic_t implementation
Many subsystems will not use refcount_t unless there is a way to build the
kernel so that there is no regression in speed compared to atomic_t. This
adds CONFIG_REFCOUNT_FULL to enable the full refcount_t implementation
which has the validation but is slightly slower. When not enabled,
refcount_t uses the basic unchecked atomic_t routines, which results in
no code changes compared to just using atomic_t directly.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: David Windsor <dwindsor@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Elena Reshetova <elena.reshetova@intel.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Hans Liljestrand <ishkamiel@gmail.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Jann Horn <jannh@google.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Serge E. Hallyn <serge@hallyn.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arozansk@redhat.com
Cc: axboe@kernel.dk
Cc: linux-arch <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/20170621200026.GA115679@beast
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-28 18:54:46 +02:00
Kees Cook 313dd1b629 gcc-plugins: Add the randstruct plugin
This randstruct plugin is modified from Brad Spengler/PaX Team's code
in the last public patch of grsecurity/PaX based on my understanding
of the code. Changes or omissions from the original code are mine and
don't reflect the original grsecurity/PaX code.

The randstruct GCC plugin randomizes the layout of selected structures
at compile time, as a probabilistic defense against attacks that need to
know the layout of structures within the kernel. This is most useful for
"in-house" kernel builds where neither the randomization seed nor other
build artifacts are made available to an attacker. While less useful for
distribution kernels (where the randomization seed must be exposed for
third party kernel module builds), it still has some value there since now
all kernel builds would need to be tracked by an attacker.

In more performance sensitive scenarios, GCC_PLUGIN_RANDSTRUCT_PERFORMANCE
can be selected to make a best effort to restrict randomization to
cacheline-sized groups of elements, and will not randomize bitfields. This
comes at the cost of reduced randomization.

Two annotations are defined,__randomize_layout and __no_randomize_layout,
which respectively tell the plugin to either randomize or not to
randomize instances of the struct in question. Follow-on patches enable
the auto-detection logic for selecting structures for randomization
that contain only function pointers. It is disabled here to assist with
bisection.

Since any randomized structs must be initialized using designated
initializers, __randomize_layout includes the __designated_init annotation
even when the plugin is disabled so that all builds will require
the needed initialization. (With the plugin enabled, annotations for
automatically chosen structures are marked as well.)

The main differences between this implemenation and grsecurity are:
- disable automatic struct selection (to be enabled in follow-up patch)
- add designated_init attribute at runtime and for manual marking
- clarify debugging output to differentiate bad cast warnings
- add whitelisting infrastructure
- support gcc 7's DECL_ALIGN and DECL_MODE changes (Laura Abbott)
- raise minimum required GCC version to 4.7

Earlier versions of this patch series were ported by Michael Leibowitz.

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-06-22 16:15:45 -07:00
Jean Delvare f136e090c7 Fix English in description of GCC_PLUGIN_STRUCTLEAK
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: c61f13eaa1 ("gcc-plugins: Add structleak for more stack initialization")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-06-19 12:37:32 -07:00
Linus Torvalds de4d195308 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main changes are:

   - Debloat RCU headers

   - Parallelize SRCU callback handling (plus overlapping patches)

   - Improve the performance of Tree SRCU on a CPU-hotplug stress test

   - Documentation updates

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
  rcu: Open-code the rcu_cblist_n_lazy_cbs() function
  rcu: Open-code the rcu_cblist_n_cbs() function
  rcu: Open-code the rcu_cblist_empty() function
  rcu: Separately compile large rcu_segcblist functions
  srcu: Debloat the <linux/rcu_segcblist.h> header
  srcu: Adjust default auto-expediting holdoff
  srcu: Specify auto-expedite holdoff time
  srcu: Expedite first synchronize_srcu() when idle
  srcu: Expedited grace periods with reduced memory contention
  srcu: Make rcutorture writer stalls print SRCU GP state
  srcu: Exact tracking of srcu_data structures containing callbacks
  srcu: Make SRCU be built by default
  srcu: Fix Kconfig botch when SRCU not selected
  rcu: Make non-preemptive schedule be Tasks RCU quiescent state
  srcu: Expedite srcu_schedule_cbs_snp() callback invocation
  srcu: Parallelize callback handling
  kvm: Move srcu_struct fields to end of struct kvm
  rcu: Fix typo in PER_RCU_NODE_PERIOD header comment
  rcu: Use true/false in assignment to bool
  rcu: Use bool value directly
  ...
2017-05-10 10:30:46 -07:00
Hari Bathini 692f66f26a crash: move crashkernel parsing and vmcore related code under CONFIG_CRASH_CORE
Patch series "kexec/fadump: remove dependency with CONFIG_KEXEC and
reuse crashkernel parameter for fadump", v4.

Traditionally, kdump is used to save vmcore in case of a crash.  Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism.  Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
crashkernel parameter can be a reused, for memory reservation, by such
architecture specific infrastructure.

This patchset removes dependency with CONFIG_KEXEC for crashkernel
parameter and vmcoreinfo related code as it can be reused without kexec
support.  Also, crashkernel parameter is reused instead of
fadump_reserve_mem to reserve memory for fadump.

The first patch moves crashkernel parameter parsing and vmcoreinfo
related code under CONFIG_CRASH_CORE instead of CONFIG_KEXEC_CORE.  The
second patch reuses the definitions of append_elf_note() & final_note()
functions under CONFIG_CRASH_CORE in IA64 arch code.  The third patch
removes dependency on CONFIG_KEXEC for firmware-assisted dump (fadump)
in powerpc.  The next patch reuses crashkernel parameter for reserving
memory for fadump, instead of the fadump_reserve_mem parameter.  This
has the advantage of using all syntaxes crashkernel parameter supports,
for fadump as well.  The last patch updates fadump kernel documentation
about use of crashkernel parameter.

This patch (of 5):

Traditionally, kdump is used to save vmcore in case of a crash.  Some
architectures like powerpc can save vmcore using architecture specific
support instead of kexec/kdump mechanism.  Such architecture specific
support also needs to reserve memory, to be used by dump capture kernel.
crashkernel parameter can be a reused, for memory reservation, by such
architecture specific infrastructure.

But currently, code related to vmcoreinfo and parsing of crashkernel
parameter is built under CONFIG_KEXEC_CORE.  This patch introduces
CONFIG_CRASH_CORE and moves the above mentioned code under this config,
allowing code reuse without dependency on CONFIG_KEXEC.  There is no
functional change with this patch.

Link: http://lkml.kernel.org/r/149035338104.6881.4550894432615189948.stgit@hbathini.in.ibm.com
Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Acked-by: Dave Young <dyoung@redhat.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-05-08 17:15:11 -07:00
Linus Torvalds 76f1948a79 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatch updates from Jiri Kosina:

 - a per-task consistency model is being added for architectures that
   support reliable stack dumping (extending this, currently rather
   trivial set, is currently in the works).

   This extends the nature of the types of patches that can be applied
   by live patching infrastructure. The code stems from the design
   proposal made [1] back in November 2014. It's a hybrid of SUSE's
   kGraft and RH's kpatch, combining advantages of both: it uses
   kGraft's per-task consistency and syscall barrier switching combined
   with kpatch's stack trace switching. There are also a number of
   fallback options which make it quite flexible.

   Most of the heavy lifting done by Josh Poimboeuf with help from
   Miroslav Benes and Petr Mladek

   [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

 - module load time patch optimization from Zhou Chengming

 - a few assorted small fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: add missing printk newlines
  livepatch: Cancel transition a safe way for immediate patches
  livepatch: Reduce the time of finding module symbols
  livepatch: make klp_mutex proper part of API
  livepatch: allow removal of a disabled patch
  livepatch: add /proc/<pid>/patch_state
  livepatch: change to a per-task consistency model
  livepatch: store function sizes
  livepatch: use kstrtobool() in enabled_store()
  livepatch: move patching functions into patch.c
  livepatch: remove unnecessary object loaded check
  livepatch: separate enabled and patched states
  livepatch/s390: add TIF_PATCH_PENDING thread flag
  livepatch/s390: reorganize TIF thread flag bits
  livepatch/powerpc: add TIF_PATCH_PENDING thread flag
  livepatch/x86: add TIF_PATCH_PENDING thread flag
  livepatch: create temporary klp_update_patch_state() stub
  x86/entry: define _TIF_ALLWORK_MASK flags explicitly
  stacktrace/x86: add function for detecting reliable stack traces
2017-05-02 18:24:16 -07:00
Paul E. McKenney 77e5849688 rcu: Make arch select smp_mb__after_unlock_lock() strength
The definition of smp_mb__after_unlock_lock() is currently smp_mb()
for CONFIG_PPC and a no-op otherwise.  It would be better to instead
provide an architecture-selectable Kconfig option, and select the
strength of smp_mb__after_unlock_lock() based on that option.  This
commit therefore creates ARCH_WEAK_RELEASE_ACQUIRE, has PPC select it,
and bases the definition of smp_mb__after_unlock_lock() on this new
ARCH_WEAK_RELEASE_ACQUIRE Kconfig option.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Boqun Feng <boqun.feng@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: <linuxppc-dev@lists.ozlabs.org>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-04-18 11:20:15 -07:00
Dmitry Safonov 1b028f784e x86/mm: Introduce mmap_compat_base() for 32-bit mmap()
mmap() uses a base address, from which it starts to look for a free space
for allocation.

The base address is stored in mm->mmap_base, which is calculated during
exec(). The address depends on task's size, set rlimit for stack, ASLR
randomization. The base depends on the task size and the number of random
bits which are different for 64-bit and 32bit applications.

Due to the fact, that the base address is fixed, its mmap() from a compat
(32bit) syscall issued by a 64bit task will return a address which is based
on the 64bit base address and does not fit into the 32bit address space
(4GB). The returned pointer is truncated to 32bit, which results in an
invalid address.

To solve store a seperate compat address base plus a compat legacy address
base in mm_struct. These bases are calculated at exec() time and can be
used later to address the 32bit compat mmap() issued by 64 bit
applications.

As a consequence of this change 32-bit applications issuing a 64-bit
syscall (after doing a long jump) will get a 64-bit mapping now. Before
this change 32-bit applications always got a 32bit mapping.

[ tglx: Massaged changelog and added a comment ]

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: 0x7f454c46@gmail.com
Cc: linux-mm@kvack.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20170306141721.9188-4-dsafonov@virtuozzo.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-03-13 14:59:22 +01:00
Josh Poimboeuf af085d9084 stacktrace/x86: add function for detecting reliable stack traces
For live patching and possibly other use cases, a stack trace is only
useful if it can be assured that it's completely reliable.  Add a new
save_stack_trace_tsk_reliable() function to achieve that.

Note that if the target task isn't the current task, and the target task
is allowed to run, then it could be writing the stack while the unwinder
is reading it, resulting in possible corruption.  So the caller of
save_stack_trace_tsk_reliable() must ensure that the task is either
'current' or inactive.

save_stack_trace_tsk_reliable() relies on the x86 unwinder's detection
of pt_regs on the stack.  If the pt_regs are not user-mode registers
from a syscall, then they indicate an in-kernel interrupt or exception
(e.g. preemption or a page fault), in which case the stack is considered
unreliable due to the nature of frame pointers.

It also relies on the x86 unwinder's detection of other issues, such as:

- corrupted stack data
- stack grows the wrong way
- stack walk doesn't reach the bottom
- user didn't provide a large enough entries array

Such issues are reported by checking unwind_error() and !unwind_done().

Also add CONFIG_HAVE_RELIABLE_STACKTRACE so arch-independent code can
determine at build time whether the function is implemented.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Acked-by: Ingo Molnar <mingo@kernel.org>	# for the x86 changes
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2017-03-08 09:18:02 +01:00
Masahiro Yamada 9332ef9dbd scripts/spelling.txt: add "an user" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  an user||a user
  an userspace||a userspace

I also added "userspace" to the list since it is a common word in Linux.
I found some instances for "an userfaultfd", but I did not add it to the
list.  I felt it is endless to find words that start with "user" such as
"userland" etc., so must draw a line somewhere.

Link: http://lkml.kernel.org/r/1481573103-11329-4-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Matthew Wilcox a00cc7d9dd mm, x86: add support for PUD-sized transparent hugepages
The current transparent hugepage code only supports PMDs.  This patch
adds support for transparent use of PUDs with DAX.  It does not include
support for anonymous pages.  x86 support code also added.

Most of this patch simply parallels the work that was done for huge
PMDs.  The only major difference is how the new ->pud_entry method in
mm_walk works.  The ->pmd_entry method replaces the ->pte_entry method,
whereas the ->pud_entry method works along with either ->pmd_entry or
->pte_entry.  The pagewalk code takes care of locking the PUD before
calling ->pud_walk, so handlers do not need to worry whether the PUD is
stable.

[dave.jiang@intel.com: fix SMP x86 32bit build for native_pud_clear()]
  Link: http://lkml.kernel.org/r/148719066814.31111.3239231168815337012.stgit@djiang5-desk3.ch.intel.com
[dave.jiang@intel.com: native_pud_clear missing on i386 build]
  Link: http://lkml.kernel.org/r/148640375195.69754.3315433724330910314.stgit@djiang5-desk3.ch.intel.com
Link: http://lkml.kernel.org/r/148545059381.17912.8602162635537598445.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nilesh Choudhury <nilesh.choudhury@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Linus Torvalds 3051bf36c2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Support TX_RING in AF_PACKET TPACKET_V3 mode, from Sowmini
      Varadhan.

   2) Simplify classifier state on sk_buff in order to shrink it a bit.
      From Willem de Bruijn.

   3) Introduce SIPHASH and it's usage for secure sequence numbers and
      syncookies. From Jason A. Donenfeld.

   4) Reduce CPU usage for ICMP replies we are going to limit or
      suppress, from Jesper Dangaard Brouer.

   5) Introduce Shared Memory Communications socket layer, from Ursula
      Braun.

   6) Add RACK loss detection and allow it to actually trigger fast
      recovery instead of just assisting after other algorithms have
      triggered it. From Yuchung Cheng.

   7) Add xmit_more and BQL support to mvneta driver, from Simon Guinot.

   8) skb_cow_data avoidance in esp4 and esp6, from Steffen Klassert.

   9) Export MPLS packet stats via netlink, from Robert Shearman.

  10) Significantly improve inet port bind conflict handling, especially
      when an application is restarted and changes it's setting of
      reuseport. From Josef Bacik.

  11) Implement TX batching in vhost_net, from Jason Wang.

  12) Extend the dummy device so that VF (virtual function) features,
      such as configuration, can be more easily tested. From Phil
      Sutter.

  13) Avoid two atomic ops per page on x86 in bnx2x driver, from Eric
      Dumazet.

  14) Add new bpf MAP, implementing a longest prefix match trie. From
      Daniel Mack.

  15) Packet sample offloading support in mlxsw driver, from Yotam Gigi.

  16) Add new aquantia driver, from David VomLehn.

  17) Add bpf tracepoints, from Daniel Borkmann.

  18) Add support for port mirroring to b53 and bcm_sf2 drivers, from
      Florian Fainelli.

  19) Remove custom busy polling in many drivers, it is done in the core
      networking since 4.5 times. From Eric Dumazet.

  20) Support XDP adjust_head in virtio_net, from John Fastabend.

  21) Fix several major holes in neighbour entry confirmation, from
      Julian Anastasov.

  22) Add XDP support to bnxt_en driver, from Michael Chan.

  23) VXLAN offloads for enic driver, from Govindarajulu Varadarajan.

  24) Add IPVTAP driver (IP-VLAN based tap driver) from Sainath Grandhi.

  25) Support GRO in IPSEC protocols, from Steffen Klassert"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1764 commits)
  Revert "ath10k: Search SMBIOS for OEM board file extension"
  net: socket: fix recvmmsg not returning error from sock_error
  bnxt_en: use eth_hw_addr_random()
  bpf: fix unlocking of jited image when module ronx not set
  arch: add ARCH_HAS_SET_MEMORY config
  net: napi_watchdog() can use napi_schedule_irqoff()
  tcp: Revert "tcp: tcp_probe: use spin_lock_bh()"
  net/hsr: use eth_hw_addr_random()
  net: mvpp2: enable building on 64-bit platforms
  net: mvpp2: switch to build_skb() in the RX path
  net: mvpp2: simplify MVPP2_PRS_RI_* definitions
  net: mvpp2: fix indentation of MVPP2_EXT_GLOBAL_CTRL_DEFAULT
  net: mvpp2: remove unused register definitions
  net: mvpp2: simplify mvpp2_bm_bufs_add()
  net: mvpp2: drop useless fields in mvpp2_bm_pool and related code
  net: mvpp2: remove unused 'tx_skb' field of 'struct mvpp2_tx_queue'
  net: mvpp2: release reference to txq_cpu[] entry after unmapping
  net: mvpp2: handle too large value in mvpp2_rx_time_coal_set()
  net: mvpp2: handle too large value handling in mvpp2_rx_pkts_coal_set()
  net: mvpp2: remove useless arguments in mvpp2_rx_{pkts, time}_coal_set
  ...
2017-02-22 10:15:09 -08:00
Linus Torvalds 1e74a2eb1f Updates to the gcc-plugins:
- infrastructure updates (gcc-common.h)
 - introduce structleak plugin for forced initialization of some structures
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJYrR4wAAoJEIly9N/cbcAm4Z0P/36LnCI5ji26A2RF6F12C8jG
 I/g7FXtVP3H3rgn8yXOKetdGT8TggWseUs/+TjJ7LmeDYWcVPB9Cf1SEQO1uT+d6
 7Y5teuc+1sqYIx5lYMUXfpZ4z34M3lRMtqfZE56UxDe5B57NqXLqsjv+t1LssxIu
 4wuFy5GIfou360jdYzH5bORYhV9FX/b108f+N0TZCXubu8nV5+DqTO/amgZn/7BU
 8kWl3VzX28/PwE6RCvi/olzjA+v4chHFvvl0LKA3WbB52Jcp9JXIg8jMZJd47fOv
 kx7WO7YgCPY2T/SwXC3vKNxvOr8P5MEfZ15cSVccUfPLdhx3+lNEaDXSr0On9zcl
 6dR0ozKCOAgrIojzocSpIAuSP2eMw0czXl+pXYRMRwSqGlWpjWmdvEHhkwodhFwL
 +O0Rjw2Joj0mdCP+lSb1H9u29UOC7jqfxqxCV1NCUewBEwm1JN4RDkv+9ZdRnMQX
 Fh4dkMo4xHvzREpdkf2h+ymFFdMiWNIwEm8ZciUnMop8ydhJ/B8akcLzEzS+0aOB
 cw5Vzb2Vh5eDsR10EVZu5ONJ9SVJEQSGTplyIY6TmteLqj7qNMAZoeiZPI3QjnDn
 9bVqBvWEkyGTiTAC0azCnaeBIqCpwfYTJa7q+Hs7rdxuy29pxsaXpH9WIblYYAPU
 2KArvOGQ28Fjf5xwocRI
 =fGQr
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc-plugins updates from Kees Cook:
 "This includes infrastructure updates and the structleak plugin, which
  performs forced initialization of certain structures to avoid possible
  information exposures to userspace.

  Summary:

   - infrastructure updates (gcc-common.h)

   - introduce structleak plugin for forced initialization of some
     structures"

* tag 'gcc-plugins-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: Add structleak for more stack initialization
  gcc-plugins: consolidate on PASS_INFO macro
  gcc-plugins: add PASS_INFO and build_const_char_string()
2017-02-22 09:05:47 -08:00
Daniel Borkmann d2852a2240 arch: add ARCH_HAS_SET_MEMORY config
Currently, there's no good way to test for the presence of
set_memory_ro/rw/x/nx() helpers implemented by archs such as
x86, arm, arm64 and s390.

There's DEBUG_SET_MODULE_RONX and DEBUG_RODATA, however both
don't really reflect that: set_memory_*() are also available
even when DEBUG_SET_MODULE_RONX is turned off, and DEBUG_RODATA
is set by parisc, but doesn't implement above functions. Thus,
add ARCH_HAS_SET_MEMORY that is selected by mentioned archs,
where generic code can test against this.

This also allows later on to move DEBUG_SET_MODULE_RONX out of
the arch specific Kconfig to define it only once depending on
ARCH_HAS_SET_MEMORY.

Suggested-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-21 13:30:13 -05:00
Laura Abbott 0f5bf6d0af arch: Rename CONFIG_DEBUG_RODATA and CONFIG_DEBUG_MODULE_RONX
Both of these options are poorly named. The features they provide are
necessary for system security and should not be considered debug only.
Change the names to CONFIG_STRICT_KERNEL_RWX and
CONFIG_STRICT_MODULE_RWX to better describe what these options do.

Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Jessica Yu <jeyu@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-07 12:32:52 -08:00
Laura Abbott ad21fc4faa arch: Move CONFIG_DEBUG_RODATA and CONFIG_SET_MODULE_RONX to be common
There are multiple architectures that support CONFIG_DEBUG_RODATA and
CONFIG_SET_MODULE_RONX. These options also now have the ability to be
turned off at runtime. Move these to an architecture independent
location and make these options def_bool y for almost all of those
arches.

Signed-off-by: Laura Abbott <labbott@redhat.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-02-07 12:32:52 -08:00
Mao Wenan 1a8b6d76dc net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering
Relax ordering(RO) is one feature of 82599 NIC, to enable this feature can
enhance the performance for some cpu architecure, such as SPARC and so on.
Currently it only supports one special cpu architecture(SPARC) in 82599
driver to enable RO feature, this is not very common for other cpu architecture
which really needs RO feature.
This patch add one common config CONFIG_ARCH_WANT_RELAX_ORDER to set RO feature,
and should define CONFIG_ARCH_WANT_RELAX_ORDER in sparc Kconfig firstly.

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Reviewed-by: Alexander Duyck <alexander.duyck@gmail.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-01-18 16:33:00 -05:00
Kees Cook c61f13eaa1 gcc-plugins: Add structleak for more stack initialization
This plugin detects any structures that contain __user attributes and
makes sure it is being fully initialized so that a specific class of
information exposure is eliminated. (This plugin was originally designed
to block the exposure of siginfo in CVE-2013-2141.)

Ported from grsecurity/PaX. This version adds a verbose option to the
plugin and the Kconfig.

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-01-18 12:02:35 -08:00
Thiago Jung Bauermann 467d278249 powerpc: ima: get the kexec buffer passed by the previous kernel
Patch series "ima: carry the measurement list across kexec", v8.

The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg.  kexec -e), the IMA measurement
list of the running kernel must be saved and then restored on the
subsequent boot, possibly of a different architecture.

The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list.  This patch
set serializes the measurement list in this format and restores it.

Up to now, the binary_runtime_measurements was defined as architecture
native format.  The assumption being that userspace could and would
handle any architecture conversions.  With the ability of carrying the
measurement list across kexec, possibly from one architecture to a
different one, the per boot architecture information is lost and with it
the ability of recalculating the template digest hash.  To resolve this
problem, without breaking the existing ABI, this patch set introduces
the boot command line option "ima_canonical_fmt", which is arbitrarily
defined as little endian.

The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg.  TPM 2.0
support for larger digests).

A simplified method of Thiago Bauermann's "kexec buffer handover" patch
series for carrying the IMA measurement list across kexec is included in
this patch set.  The simplified method requires all file measurements be
taken prior to executing the kexec load, as subsequent measurements will
not be carried across the kexec and restored.

This patch (of 10):

The IMA kexec buffer allows the currently running kernel to pass the
measurement list via a kexec segment to the kernel that will be kexec'd.
The second kernel can check whether the previous kernel sent the buffer
and retrieve it.

This is the architecture-specific part which enables IMA to receive the
measurement list passed by the previous kernel.  It will be used in the
next patch.

The change in machine_kexec_64.c is to factor out the logic of removing
an FDT memory reservation so that it can be used by remove_ima_buffer.

Link: http://lkml.kernel.org/r/1480554346-29071-2-git-send-email-zohar@linux.vnet.ibm.com
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andreas Steffen <andreas.steffen@strongswan.org>
Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
Cc: Josh Sklar <sklar@linux.vnet.ibm.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:40 -08:00
Linus Torvalds 22d8262c33 Minor changes to the gcc plugins:
- Add the gcc plugins Makefile to MAINTAINERS to route things correctly
 
 - Hide cyc_complexity behind !CONFIG_TEST for the future unhiding of
   plugins generally.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJYSKINAAoJEIly9N/cbcAmbPYP/3rdXMDtMyUCqUNz/EljLsqw
 f1oVed7HPMGC63qlMOFMjWXvsNbtcWUVvk892VZH6Tt3zp3AfwubqcEOg3xp6LzQ
 nuJpPgufYEtgErra4cZRvxwDfX+YlaYMDqhmfezGLMKB4mYnuTLpSJOo1V0yXBPW
 H8Isb9wYTeh8TlwAtfnKolexvbdB7lTkiXyPSanVvgsQRimi32/AXR7sqD6u+OfH
 /K7gUv0xo/X9tf4eva9WQDDOyznE0b36OEI1if0nrIhC8i0Uy7MMIMoE2PNPVlCy
 /kPFrK1QQpyjqhihqbjbGgYzQFHz6vSpW/ByVBVYiiV7qXKtMl9v1nDsZE0Qgyqe
 yWQwxhZCBISWJIOm2s95rH5LWNiMOVe3UsgBZ8PENv09CWzqJv7P1gkme15MSexD
 pA0EpjUUnPpWi7GJjLS7NhZYtYfn+kel2JjTI/zvTtQ/8KBoyLTJC/poNWpDHtqf
 elM/YUtFwsyu5xXBuayAv9Gbbm7OAxToBBzz6PkNBUNraWsnZZyuXCQSDNYyjKKU
 3SrB6h2e5mSjDwnQeWed7AQbeClqkt6Flza8EBw0XppVUYPtNbyIPz6sJk52JnLA
 3nRO9F2IPfQ2bn7UaGK9UQPHxiiP1OeN1OtHqr5RX/PXdj9ugF6eccticubhUC/U
 7HS4junHivL69cOmF8A2
 =FEZG
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc plugins updates from Kees Cook:
 "Minor changes to the gcc plugins:

   - add the gcc plugins Makefile to MAINTAINERS to route things
     correctly

   - hide cyc_complexity behind !CONFIG_TEST for the future unhiding of
     plugins generally"

* tag 'gcc-plugins-v4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  gcc-plugins: Adjust Kconfig to avoid cyc_complexity
  MAINTAINERS: add GCC plugins Makefile
2016-12-13 09:22:21 -08:00
Linus Torvalds 92c020d08d Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main scheduler changes in this cycle were:

   - support Intel Turbo Boost Max Technology 3.0 (TBM3) by introducig a
     notion of 'better cores', which the scheduler will prefer to
     schedule single threaded workloads on. (Tim Chen, Srinivas
     Pandruvada)

   - enhance the handling of asymmetric capacity CPUs further (Morten
     Rasmussen)

   - improve/fix load handling when moving tasks between task groups
     (Vincent Guittot)

   - simplify and clean up the cputime code (Stanislaw Gruszka)

   - improve mass fork()ed task spread a.k.a. hackbench speedup (Vincent
     Guittot)

   - make struct kthread kmalloc()ed and related fixes (Oleg Nesterov)

   - add uaccess atomicity debugging (when using access_ok() in the
     wrong context), under CONFIG_DEBUG_ATOMIC_SLEEP=y (Peter Zijlstra)

   - implement various fixes, cleanups and other enhancements (Daniel
     Bristot de Oliveira, Martin Schwidefsky, Rafael J. Wysocki)"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
  sched/core: Use load_avg for selecting idlest group
  sched/core: Fix find_idlest_group() for fork
  kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker()
  kthread: Don't use to_live_kthread() in kthread_[un]park()
  kthread: Don't use to_live_kthread() in kthread_stop()
  Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function"
  kthread: Make struct kthread kmalloc'ed
  x86/uaccess, sched/preempt: Verify access_ok() context
  sched/x86: Make CONFIG_SCHED_MC_PRIO=y easier to enable
  sched/x86: Change CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO
  x86/sched: Use #include <linux/mutex.h> instead of #include <asm/mutex.h>
  cpufreq/intel_pstate: Use CPPC to get max performance
  acpi/bus: Set _OSC for diverse core support
  acpi/bus: Enable HWP CPPC objects
  x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  x86/sysctl: Add sysctl for ITMT scheduling feature
  x86: Enable Intel Turbo Boost Max Technology 3.0
  x86/topology: Define x86's arch_update_cpu_topology
  sched: Extend scheduler's asym packing
  sched/fair: Clean up the tunable parameter definitions
  ...
2016-12-12 12:15:10 -08:00
Allen Pais e8f4aa6087 sparc64:Support User Probes for sparc
Signed-off-by: Eric Saint Etienne <eric.saint.etienne@oracle.com>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-11 18:01:51 -08:00
Stanislaw Gruszka 40565b5aed sched/cputime, powerpc, s390: Make scaled cputime arch specific
Only s390 and powerpc have hardware facilities allowing to measure
cputimes scaled by frequency. On all other architectures
utimescaled/stimescaled are equal to utime/stime (however they are
accounted separately).

Remove {u,s}timescaled accounting on all architectures except
powerpc and s390, where those values are explicitly accounted
in the proper places.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161031162143.GB12646@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-11-15 09:51:05 +01:00
Kees Cook 215e2aa6c0 gcc-plugins: Adjust Kconfig to avoid cyc_complexity
In preparation for removing "depends on !COMPILE_TEST" from GCC_PLUGINS,
the GCC_PLUGIN_CYC_COMPLEXITY plugin needs to gain the restriction,
since it is mainly an example, and produces (intended) voluminous stderr
reporting, which is generally undesirable for allyesconfig-style build
tests. This additionally puts the plugin behind EXPERT and improves the
help text.

Signed-off-by: Kees Cook <keescook@chromium.org>
2016-11-08 16:27:03 -08:00
Linus Torvalds 9ffc66941d This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot time as
 possible, hoping to capitalize on any possible variation in CPU operation
 (due to runtime data differences, hardware differences, SMP ordering,
 thermal timing variation, cache behavior, etc).
 
 At the very least, this plugin is a much more comprehensive example for
 how to manipulate kernel code using the gcc plugin internals.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJX/BAFAAoJEIly9N/cbcAmzW8QALFbCs7EFFkML+M/M/9d8zEk
 1QbUs/z8covJTTT1PjSdw7JUrAMulI3S00owpcQVd/PcWjRPU80QwfsXBgIB0tvC
 Kub2qxn6Oaf+kTB646zwjFgjdCecw/USJP+90nfcu2+LCnE8ReclKd1aUee+Bnhm
 iDEUyH2ONIoWq6ta2Z9sA7+E4y2ZgOlmW0iga3Mnf+OcPtLE70fWPoe5E4g9DpYk
 B+kiPDrD9ql5zsHaEnKG1ldjiAZ1L6Grk8rGgLEXmbOWtTOFmnUhR+raK5NA/RCw
 MXNuyPay5aYPpqDHFm+OuaWQAiPWfPNWM3Ett4k0d9ZWLixTcD1z68AciExwk7aW
 SEA8b1Jwbg05ZNYM7NJB6t6suKC4dGPxWzKFOhmBicsh2Ni5f+Az0BQL6q8/V8/4
 8UEqDLuFlPJBB50A3z5ngCVeYJKZe8Bg/Swb4zXl6mIzZ9darLzXDEV6ystfPXxJ
 e1AdBb41WC+O2SAI4l64yyeswkGo3Iw2oMbXG5jmFl6wY/xGp7dWxw7gfnhC6oOh
 afOT54p2OUDfSAbJaO0IHliWoIdmE5ZYdVYVU9Ek+uWyaIwcXhNmqRg+Uqmo32jf
 cP5J9x2kF3RdOcbSHXmFp++fU+wkhBtEcjkNpvkjpi4xyA47IWS7lrVBBebrCq9R
 pa/A7CNQwibIV6YD8+/p
 =1dUK
 -----END PGP SIGNATURE-----

Merge tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull gcc plugins update from Kees Cook:
 "This adds a new gcc plugin named "latent_entropy". It is designed to
  extract as much possible uncertainty from a running system at boot
  time as possible, hoping to capitalize on any possible variation in
  CPU operation (due to runtime data differences, hardware differences,
  SMP ordering, thermal timing variation, cache behavior, etc).

  At the very least, this plugin is a much more comprehensive example
  for how to manipulate kernel code using the gcc plugin internals"

* tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  latent_entropy: Mark functions with __latent_entropy
  gcc-plugins: Add latent_entropy plugin
2016-10-15 10:03:15 -07:00
Linus Torvalds 84d69848c9 Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:

 - EXPORT_SYMBOL for asm source by Al Viro.

   This does bring a regression, because genksyms no longer generates
   checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
   working on a patch to fix this.

   Plus, we are talking about functions like strcpy(), which rarely
   change prototypes.

 - Fixes for PPC fallout of the above by Stephen Rothwell and Nick
   Piggin

 - fixdep speedup by Alexey Dobriyan.

 - preparatory work by Nick Piggin to allow architectures to build with
   -ffunction-sections, -fdata-sections and --gc-sections

 - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

 - fix for filenames with colons in the initramfs source by me.

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
  initramfs: Escape colons in depfile
  ppc: there is no clear_pages to export
  powerpc/64: whitelist unresolved modversions CRCs
  kbuild: -ffunction-sections fix for archs with conflicting sections
  kbuild: add arch specific post-link Makefile
  kbuild: allow archs to select link dead code/data elimination
  kbuild: allow architectures to use thin archives instead of ld -r
  kbuild: Regenerate genksyms lexer
  kbuild: genksyms fix for typeof handling
  fixdep: faster CONFIG_ search
  ia64: move exports to definitions
  sparc32: debride memcpy.S a bit
  [sparc] unify 32bit and 64bit string.h
  sparc: move exports to definitions
  ppc: move exports to definitions
  arm: move exports to definitions
  s390: move exports to definitions
  m68k: move exports to definitions
  alpha: move exports to actual definitions
  x86: move exports to actual definitions
  ...
2016-10-14 14:26:58 -07:00
Emese Revfy 38addce8b6 gcc-plugins: Add latent_entropy plugin
This adds a new gcc plugin named "latent_entropy". It is designed to
extract as much possible uncertainty from a running system at boot time as
possible, hoping to capitalize on any possible variation in CPU operation
(due to runtime data differences, hardware differences, SMP ordering,
thermal timing variation, cache behavior, etc).

At the very least, this plugin is a much more comprehensive example for
how to manipulate kernel code using the gcc plugin internals.

The need for very-early boot entropy tends to be very architecture or
system design specific, so this plugin is more suited for those sorts
of special cases. The existing kernel RNG already attempts to extract
entropy from reliable runtime variation, but this plugin takes the idea to
a logical extreme by permuting a global variable based on any variation
in code execution (e.g. a different value (and permutation function)
is used to permute the global based on loop count, case statement,
if/then/else branching, etc).

To do this, the plugin starts by inserting a local variable in every
marked function. The plugin then adds logic so that the value of this
variable is modified by randomly chosen operations (add, xor and rol) and
random values (gcc generates separate static values for each location at
compile time and also injects the stack pointer at runtime). The resulting
value depends on the control flow path (e.g., loops and branches taken).

Before the function returns, the plugin mixes this local variable into
the latent_entropy global variable. The value of this global variable
is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
though it does not credit any bytes of entropy to the pool; the contents
of the global are just used to mix the pool.

Additionally, the plugin can pre-initialize arrays with build-time
random contents, so that two different kernel builds running on identical
hardware will not have the same starting values.

Signed-off-by: Emese Revfy <re.emese@gmail.com>
[kees: expanded commit message and code comments]
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-10-10 14:51:44 -07:00
Nicholas Piggin 0f4c4af06e kbuild: -ffunction-sections fix for archs with conflicting sections
Enabling -ffunction-sections modified the generic linker script to
pull .text.* sections into regular TEXT_TEXT section, conflicting
with some architectures. Revert that change and require archs that
enable the option to ensure they have no conflicting section names,
and do the appropriate merging.

Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: b67067f117 ("kbuild: allow archs to select link dead code/data elimination")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-09-22 14:37:14 +02:00
Ingo Molnar d4b80afbba Merge branch 'linus' into x86/asm, to pick up recent fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-09-15 08:24:53 +02:00
Nicholas Piggin b67067f117 kbuild: allow archs to select link dead code/data elimination
Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
select to build with -ffunction-sections, -fdata-sections, and link
with --gc-sections. It requires some work (documented) to ensure all
unreferenced entrypoints are live, and requires toolchain and build
verification, so it is made a per-arch option for now.

On a random powerpc64le build, this yelds a significant size saving,
it boots and runs fine, but there is a lot I haven't tested as yet, so
these savings may be reduced if there are bugs in the link.

    text      data        bss        dec   filename
11169741   1180744    1923176	14273661   vmlinux
10445269   1004127    1919707	13369103   vmlinux.dce

~700K text, ~170K data, 6% removed from kernel image size.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-09-09 10:47:00 +02:00
Stephen Rothwell a5967db9af kbuild: allow architectures to use thin archives instead of ld -r
ld -r is an incremental link used to create built-in.o files in build
subdirectories. It produces relocatable object files containing all
its input files, and these are are then pulled together and relocated
in the final link. Aside from the bloat, this constrains the final
link relocations, which has bitten large powerpc builds with
unresolvable relocations in the final link.

Alan Modra has recommended the kernel use thin archives for linking.
This is an alternative and means that the linker has more information
available to it when it links the kernel.

This patch enables a config option architectures can select, which
causes all built-in.o files to be built as thin archives. built-in.o
files in subdirectories do not get symbol table or index attached,
which improves speed and size. The final link pass creates a
built-in.o archive in the root output directory which includes the
symbol table and index. The linker then uses takes this file to link.

The --whole-archive linker option is required, because the linker now
has visibility to every individual object file, and it will otherwise
just completely avoid including those without external references
(consider a file with EXPORT_SYMBOL or initcall or hardware exceptions
as its only entry points). The traditional built works "by luck" as
built-in.o files are large enough that they're going to get external
references. However this optimisation is unpredictable for the kernel
(due to above external references), ineffective at culling unused, and
costly because the .o files have to be searched for references.
Superior alternatives for link-time culling should be used instead.

Build characteristics for inclink vs thinarc, on a small powerpc64le
pseries VM with a modest .config:

                                  inclink       thinarc
sizes
vmlinux                        15 618 680    15 625 028
sum of all built-in.o          56 091 808     1 054 334
sum excluding root built-in.o                   151 430

find -name built-in.o | xargs rm ; time make vmlinux
real                              22.772s       21.143s
user                              13.280s       13.430s
sys                                4.310s        2.750s

- Final kernel pulled in only about 6K more, which shows how
  ineffective the object file culling is.
- Build performance looks improved due to less pagecache activity.
  On IO constrained systems it could be a bigger win.
- Build size saving is significant.

Side note, the toochain understands archives, so there's some tricks,
$ ar t built-in.o          # list all files you linked with
$ size built-in.o          # and their sizes
$ objdump -d built-in.o    # disassembly (unrelocated) with filenames

Implementation by sfr, minor tweaks by npiggin.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
2016-09-09 10:31:19 +02:00
Mickaël Salaün 4fadd04d50 seccomp: Remove 2-phase API documentation
Fixes: 8112c4f140 ("seccomp: remove 2-phase API")

Signed-off-by: Mickaël Salaün <mic@digikod.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2016-09-07 09:25:05 -07:00