Commit Graph

17514 Commits

Author SHA1 Message Date
Linus Torvalds 1ce235faa8 - KGDB support for arm64
- PCI I/O space extended to 16M (in preparation of PCIe support patches)
 - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the
   time being), together with swiotlb late initialisation to correctly
   setup the bounce buffer
 - DMA API cache maintenance support (not all ARMv8 platforms have
   hardware cache coherency)
 - Crypto extensions advertising via ELF_HWCAP2 for compat user space
 - Perf support for dwarf unwinding in compat mode
 - asm/tlb.h converted to the generic mmu_gather code
 - asm-generic rwsem implementation
 - Code clean-up
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.9 (GNU/Linux)
 
 iQIcBAABAgAGBQJTOaqsAAoJEGvWsS0AyF7xYNUP/3/IPySIB+/6pyUG6q7kvIpF
 Di93M+VdmnLEOKhhx/tjkiEmEQMp0hFPeOlQRWf/Ugg4ksulP6gRejdDEjIfkmsk
 LrRXLjvH79NDJbN0pTUXqGDvLLZ9Qnib+HEOuKABIYUrwhNKySBk+5omGfXFtwLR
 Mb5JxPX0kbBXOqbOX4RgANQoRlE8GxJR3V245zlGxA4klcN4IiaDy/99kj+kaeaa
 Cl8X9K2I550IZ2YUAWPOut2aee2qRFQtAhIDgVthTYlGRx7Y/rDLM16B8fFY/T0H
 7azIpSO5hk5lp8J3giJHYajlJlXNla5FeHQb8XAVnlyqFBmCUn0vvd2VbPvWREJp
 UD8t1vZZt/s2he6CVAQIfQghwLyzrpPa19KbnyI+3HtsZ+NS/puBJmcVKZ2PBY/L
 28BsRzB7BKAPEVhNmyPwFHNdZTvjaqYUCLhQ0uTp1sSHMcLeSs7+vyMR99f/0u9E
 doSYAeF41ZkxHXL5xEevdj4sFkCEY1XFxER1Y8VM1rqHTeGEoeYbdS/u9tEeBgit
 jBelvHAlNTBgbur2nW4E9fQpAF2CsvWnRq6lSmDRTkyjzcLUQqA8bsQJ3aUyJtZt
 j17kUIzSH1q7x3zAaWQcvMVeawdkv2+HanjuTOdeO2ehvyG71vvxA3RkCv8o5Jhh
 da+jAMhkpYQxk8mSKkWm
 =8+cB
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull ARM64 updates from Catalin Marinas:
 - KGDB support for arm64
 - PCI I/O space extended to 16M (in preparation of PCIe support
   patches)
 - Dropping ZONE_DMA32 in favour of ZONE_DMA (we only need one for the
   time being), together with swiotlb late initialisation to correctly
   setup the bounce buffer
 - DMA API cache maintenance support (not all ARMv8 platforms have
   hardware cache coherency)
 - Crypto extensions advertising via ELF_HWCAP2 for compat user space
 - Perf support for dwarf unwinding in compat mode
 - asm/tlb.h converted to the generic mmu_gather code
 - asm-generic rwsem implementation
 - Code clean-up

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
  arm64: Remove pgprot_dmacoherent()
  arm64: Support DMA_ATTR_WRITE_COMBINE
  arm64: Implement custom mmap functions for dma mapping
  arm64: Fix __range_ok macro
  arm64: Fix duplicated Kconfig entries
  arm64: mm: Route pmd thp functions through pte equivalents
  arm64: rwsem: use asm-generic rwsem implementation
  asm-generic: rwsem: de-PPCify rwsem.h
  arm64: enable generic CPU feature modalias matching for this architecture
  arm64: smp: make local symbol static
  arm64: debug: make local symbols static
  ARM64: perf: support dwarf unwinding in compat mode
  ARM64: perf: add support for frame pointer unwinding in compat mode
  ARM64: perf: add support for perf registers API
  arm64: Add boot time configuration of Intermediate Physical Address size
  arm64: Do not synchronise I and D caches for special ptes
  arm64: Make DMA coherent and strongly ordered mappings not executable
  arm64: barriers: add dmb barrier
  arm64: topology: Implement basic CPU topology support
  arm64: advertise ARMv8 extensions to 32-bit compat ELF binaries
  ...
2014-03-31 15:01:45 -07:00
Linus Torvalds 1f8c538ed6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:
 "There are two memory management related changes, the CMMA support for
  KVM to avoid swap-in of freed pages and the split page table lock for
  the PMD level.  These two come with common code changes in mm/.

  A fix for the long standing theoretical TLB flush problem, this one
  comes with a common code change in kernel/sched/.

  Another set of changes is Heikos uaccess work, included is the initial
  set of patches with more to come.

  And fixes and cleanups as usual"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (36 commits)
  s390/con3270: optionally disable auto update
  s390/mm: remove unecessary parameter from pgste_ipte_notify
  s390/mm: remove unnecessary parameter from gmap_do_ipte_notify
  s390/mm: fixing comment so that parameter name match
  s390/smp: limit number of cpus in possible cpu mask
  hypfs: Add clarification for "weight_min" attribute
  s390: update defconfigs
  s390/ptrace: add support for PTRACE_SINGLEBLOCK
  s390/perf: make print_debug_cf() static
  s390/topology: Remove call to update_cpu_masks()
  s390/compat: remove compat exec domain
  s390: select CONFIG_TTY for use of tty in unconditional keyboard driver
  s390/appldata_os: fix cpu array size calculation
  s390/checksum: remove memset() within csum_partial_copy_from_user()
  s390/uaccess: remove copy_from_user_real()
  s390/sclp_early: Return correct HSA block count also for zero
  s390: add some drivers/subsystems to the MAINTAINERS file
  s390: improve debug feature usage
  s390/airq: add support for irq ranges
  s390/mm: enable split page table lock for PMD level
  ...
2014-03-31 14:35:30 -07:00
Linus Torvalds 190f918660 Merge branch 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 compat wrapper rework from Heiko Carstens:
 "S390 compat system call wrapper simplification work.

  The intention of this work is to get rid of all hand written assembly
  compat system call wrappers on s390, which perform proper sign or zero
  extension, or pointer conversion of compat system call parameters.
  Instead all of this should be done with C code eg by using Al's
  COMPAT_SYSCALL_DEFINEx() macro.

  Therefore all common code and s390 specific compat system calls have
  been converted to the COMPAT_SYSCALL_DEFINEx() macro.

  In order to generate correct code all compat system calls may only
  have eg compat_ulong_t parameters, but no unsigned long parameters.
  Those patches which change parameter types from unsigned long to
  compat_ulong_t parameters are separate in this series, but shouldn't
  cause any harm.

  The only compat system calls which intentionally have 64 bit
  parameters (preadv64 and pwritev64) in support of the x86/32 ABI
  haven't been changed, but are now only available if an architecture
  defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64.

  System calls which do not have a compat variant but still need proper
  zero extension on s390, like eg "long sys_brk(unsigned long brk)" will
  get a proper wrapper function with the new s390 specific
  COMPAT_SYSCALL_WRAPx() macro:

     COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk);

  which generates the following code (simplified):

     asmlinkage long sys_brk(unsigned long brk);
     asmlinkage long compat_sys_brk(long brk)
     {
         return sys_brk((u32)brk);
     }

  Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines
  includes both linux/syscall.h and linux/compat.h, it will generate
  build errors, if the declaration of sys_brk() doesn't match, or if
  there exists a non-matching compat_sys_brk() declaration.

  In addition this will intentionally result in a link error if
  somewhere else a compat_sys_brk() function exists, which probably
  should have been used instead.  Two more BUILD_BUG_ONs make sure the
  size and type of each compat syscall parameter can be handled
  correctly with the s390 specific macros.

  I converted the compat system calls step by step to verify the
  generated code is correct and matches the previous code.  In fact it
  did not always match, however that was always a bug in the hand
  written asm code.

  In result we get less code, less bugs, and much more sanity checking"

* 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
  s390/compat: add copyright statement
  compat: include linux/unistd.h within linux/compat.h
  s390/compat: get rid of compat wrapper assembly code
  s390/compat: build error for large compat syscall args
  mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
  ipc/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: convert to COMPAT_SYSCALL_DEFINE
  security/compat: convert to COMPAT_SYSCALL_DEFINE
  mm/compat: convert to COMPAT_SYSCALL_DEFINE
  net/compat: convert to COMPAT_SYSCALL_DEFINE
  kernel/compat: convert to COMPAT_SYSCALL_DEFINE
  fs/compat: optional preadv64/pwrite64 compat system calls
  ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t
  s390/compat: partial parameter conversion within syscall wrappers
  s390/compat: automatic zero, sign and pointer conversion of syscalls
  s390/compat: add sync_file_range and fallocate compat syscalls
  ...
2014-03-31 14:32:17 -07:00
Linus Torvalds 176ab02d49 Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 LTO changes from Peter Anvin:
 "More infrastructure work in preparation for link-time optimization
  (LTO).  Most of these changes is to make sure symbols accessed from
  assembly code are properly marked as visible so the linker doesn't
  remove them.

  My understanding is that the changes to support LTO are still not
  upstream in binutils, but are on the way there.  This patchset should
  conclude the x86-specific changes, and remaining patches to actually
  enable LTO will be fed through the Kbuild tree (other than keeping up
  with changes to the x86 code base, of course), although not
  necessarily in this merge window"

* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  Kbuild, lto: Handle basic LTO in modpost
  Kbuild, lto: Disable LTO for asm-offsets.c
  Kbuild, lto: Add a gcc-ld script to let run gcc as ld
  Kbuild, lto: add ld-version and ld-ifversion macros
  Kbuild, lto: Drop .number postfixes in modpost
  Kbuild, lto, workaround: Don't warn for initcall_reference in modpost
  lto: Disable LTO for sys_ni
  lto: Handle LTO common symbols in module loader
  lto, workaround: Add workaround for initcall reordering
  lto: Make asmlinkage __visible
  x86, lto: Disable LTO for the x86 VDSO
  initconst, x86: Fix initconst mistake in ts5500 code
  initconst: Fix initconst mistake in dcdbas
  asmlinkage: Make trace_hardirqs_on/off_caller visible
  asmlinkage, x86: Fix 32bit memcpy for LTO
  asmlinkage Make __stack_chk_failed and memcmp visible
  asmlinkage: Mark rwsem functions that can be called from assembler asmlinkage
  asmlinkage: Make main_extable_sort_needed visible
  asmlinkage, mutex: Mark __visible
  asmlinkage: Make trace_hardirq visible
  ...
2014-03-31 14:13:25 -07:00
Linus Torvalds 918d80a136 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cpu handling changes from Ingo Molnar:
 "Bigger changes:

   - Intel CPU hardware-enablement: new vector instructions support
     (AVX-512), by Fenghua Yu.

   - Support the clflushopt instruction and use it in appropriate
     places.  clflushopt is similar to clflush but with more relaxed
     ordering, by Ross Zwisler.

   - MSR accessor cleanups, by Borislav Petkov.

   - 'forcepae' boot flag for those who have way too much time to spend
     on way too old Pentium-M systems and want to live way too
     dangerously, by Chris Bainbridge"

* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, cpu: Add forcepae parameter for booting PAE kernels on PAE-disabled Pentium M
  Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC
  x86, intel: Make MSR_IA32_MISC_ENABLE bit constants systematic
  x86, Intel: Convert to the new bit access MSR accessors
  x86, AMD: Convert to the new bit access MSR accessors
  x86: Add another set of MSR accessor functions
  x86: Use clflushopt in drm_clflush_virt_range
  x86: Use clflushopt in drm_clflush_page
  x86: Use clflushopt in clflush_cache_range
  x86: Add support for the clflushopt instruction
  x86, AVX-512: Enable AVX-512 States Context Switch
  x86, AVX-512: AVX-512 Feature Detection
2014-03-31 12:00:45 -07:00
Linus Torvalds 971eae7c99 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar:
 "Bigger changes:

   - sched/idle restructuring: they are WIP preparation for deeper
     integration between the scheduler and idle state selection, by
     Nicolas Pitre.

   - add NUMA scheduling pseudo-interleaving, by Rik van Riel.

   - optimize cgroup context switches, by Peter Zijlstra.

   - RT scheduling enhancements, by Thomas Gleixner.

  The rest is smaller changes, non-urgnt fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
  sched: Clean up the task_hot() function
  sched: Remove double calculation in fix_small_imbalance()
  sched: Fix broken setscheduler()
  sparc64, sched: Remove unused sparc64_multi_core
  sched: Remove unused mc_capable() and smt_capable()
  sched/numa: Move task_numa_free() to __put_task_struct()
  sched/fair: Fix endless loop in idle_balance()
  sched/core: Fix endless loop in pick_next_task()
  sched/fair: Push down check for high priority class task into idle_balance()
  sched/rt: Fix picking RT and DL tasks from empty queue
  trace: Replace hardcoding of 19 with MAX_NICE
  sched: Guarantee task priority in pick_next_task()
  sched/idle: Remove stale old file
  sched: Put rq's sched_avg under CONFIG_FAIR_GROUP_SCHED
  cpuidle/arm64: Remove redundant cpuidle_idle_call()
  cpuidle/powernv: Remove redundant cpuidle_idle_call()
  sched, nohz: Exclude isolated cores from load balancing
  sched: Fix select_task_rq_fair() description comments
  workqueue: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  sys: Replace hardcoding of -20 and 19 with MIN_NICE and MAX_NICE
  ...
2014-03-31 11:21:19 -07:00
Linus Torvalds 8c292f1174 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar:
 "Main changes:

  Kernel side changes:

   - Add SNB/IVB/HSW client uncore memory controller support (Stephane
     Eranian)

   - Fix various x86/P4 PMU driver bugs (Don Zickus)

  Tooling, user visible changes:

   - Add several futex 'perf bench' microbenchmarks (Davidlohr Bueso)

   - Speed up thread map generation (Don Zickus)

   - Introduce 'perf kvm --list-cmds' command line option for use by
     scripts (Ramkumar Ramachandra)

   - Print the evsel name in the annotate stdio output, prep to fix
     support outputting annotation for multiple events, not just for the
     first one (Arnaldo Carvalho de Melo)

   - Allow setting preferred callchain method in .perfconfig (Jiri Olsa)

   - Show in what binaries/modules 'perf probe's are set (Masami
     Hiramatsu)

   - Support distro-style debuginfo for uprobe in 'perf probe' (Masami
     Hiramatsu)

  Tooling, internal changes and fixes:

   - Use tid in mmap/mmap2 events to find maps (Don Zickus)

   - Record the reason for filtering an address_location (Namhyung Kim)

   - Apply all filters to an addr_location (Namhyung Kim)

   - Merge al->filtered with hist_entry->filtered in report/hists
     (Namhyung Kim)

   - Fix memory leak when synthesizing thread records (Namhyung Kim)

   - Use ui__has_annotation() in 'report' (Namhyung Kim)

   - hists browser refactorings to reuse code accross UIs (Namhyung Kim)

   - Add support for the new DWARF unwinder library in elfutils (Jiri
     Olsa)

   - Fix build race in the generation of bison files (Jiri Olsa)

   - Further streamline the feature detection display, trimming it a bit
     to show just the libraries detected, using VF=1 gets a more verbose
     output, showing the less interesting feature checks as well (Jiri
     Olsa).

   - Check compatible symtab type before loading dso (Namhyung Kim)

   - Check return value of filename__read_debuglink() (Stephane Eranian)

   - Move some hashing and fs related code from tools/perf/util/ to
     tools/lib/ so that it can be used by more tools/ living utilities
     (Borislav Petkov)

   - Prepare DWARF unwinding code for using an elfutils alternative
     unwinding library (Jiri Olsa)

   - Fix DWARF unwind max_stack processing (Jiri Olsa)

   - Add dwarf unwind 'perf test' entry (Jiri Olsa)

   - 'perf probe' improvements including memory leak fixes, sharing the
     intlist class with other tools, uprobes/kprobes code sharing and
     use of ref_reloc_sym (Masami Hiramatsu)

   - Shorten sample symbol resolving by adding cpumode to struct
     addr_location (Arnaldo Carvalho de Melo)

   - Fix synthesizing mmaps for threads (Don Zickus)

   - Fix invalid output on event group stdio report (Namhyung Kim)

   - Fixup header alignment in 'perf sched latency' output (Ramkumar
     Ramachandra)

   - Fix off-by-one error in 'perf timechart record' argv handling
     (Ramkumar Ramachandra)

  Tooling, cleanups:

   - Remove unused thread__find_map function (Jiri Olsa)

   - Remove unused simple_strtoul() function (Ramkumar Ramachandra)

  Tooling, documentation updates:

   - Update function names in debug messages (Ramkumar Ramachandra)

   - Update some code references in design.txt (Ramkumar Ramachandra)

   - Clarify load-latency information in the 'perf mem' docs (Andi
     Kleen)

   - Clarify x86 register naming in 'perf probe' docs (Andi Kleen)"

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (96 commits)
  perf tools: Remove unused simple_strtoul() function
  perf tools: Update some code references in design.txt
  perf evsel: Update function names in debug messages
  perf tools: Remove thread__find_map function
  perf annotate: Print the evsel name in the stdio output
  perf report: Use ui__has_annotation()
  perf tools: Fix memory leak when synthesizing thread records
  perf tools: Use tid in mmap/mmap2 events to find maps
  perf report: Merge al->filtered with hist_entry->filtered
  perf symbols: Apply all filters to an addr_location
  perf symbols: Record the reason for filtering an address_location
  perf sched: Fixup header alignment in 'latency' output
  perf timechart: Fix off-by-one error in 'record' argv handling
  perf machine: Factor machine__find_thread to take tid argument
  perf tools: Speed up thread map generation
  perf kvm: introduce --list-cmds for use by scripts
  perf ui hists: Pass evsel to hpp->header/width functions explicitly
  perf symbols: Introduce thread__find_cpumode_addr_location
  perf session: Change header.misc dump from decimal to hex
  perf ui/tui: Reuse generic __hpp__fmt() code
  ...
2014-03-31 11:13:25 -07:00
Linus Torvalds b3fd4ea9df Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "Main changes:

   - Torture-test changes, including refactoring of rcutorture and
     introduction of a vestigial locktorture.

   - Real-time latency fixes.

   - Documentation updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (77 commits)
  rcu: Provide grace-period piggybacking API
  rcu: Ensure kernel/rcu/rcu.h can be sourced/used stand-alone
  rcu: Fix sparse warning for rcu_expedited from kernel/ksysfs.c
  notifier: Substitute rcu_access_pointer() for rcu_dereference_raw()
  Documentation/memory-barriers.txt: Clarify release/acquire ordering
  rcutorture: Save kvm.sh output to log
  rcutorture: Add a lock_busted to test the test
  rcutorture: Place kvm-test-1-run.sh output into res directory
  rcutorture: Rename TREE_RCU-Kconfig.txt
  locktorture: Add kvm-recheck.sh plug-in for locktorture
  rcutorture: Gracefully handle NULL cleanup hooks
  locktorture: Add vestigial locktorture configuration
  rcutorture: Introduce "rcu" directory level underneath configs
  rcutorture: Rename kvm-test-1-rcu.sh
  rcutorture: Remove RCU dependencies from ver_functions.sh API
  rcutorture: Create CFcommon file for common Kconfig parameters
  rcutorture: Create config files for scripted test-the-test testing
  rcutorture: Add an rcu_busted to test the test
  locktorture: Add a lock-torture kernel module
  rcutorture: Abstract kvm-recheck.sh
  ...
2014-03-31 11:05:24 -07:00
Linus Torvalds 462bf234a8 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The biggest change is the MCS spinlock generalization changes from Tim
  Chen, Peter Zijlstra, Jason Low et al.  There's also lockdep
  fixes/enhancements from Oleg Nesterov, in particular a false negative
  fix related to lockdep_set_novalidate_class() usage"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  locking/mutex: Fix debug checks
  locking/mutexes: Add extra reschedule point
  locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
  locking/mutexes: Unlock the mutex without the wait_lock
  locking/mutexes: Modify the way optimistic spinners are queued
  locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
  locking: Move mcs_spinlock.h into kernel/locking/
  m68k: Skip futex_atomic_cmpxchg_inatomic() test
  futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
  Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
  lockdep: Change lockdep_set_novalidate_class() to use _and_name
  lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
  lockdep: Don't create the wrong dependency on hlock->check == 0
  lockdep: Make held_lock->check and "int check" argument bool
  locking/mcs: Allow architecture specific asm files to be used for contended case
  locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
  sched/wait: Suppress Sparse 'variable shadowing' warning
  hung_task/Documentation: Fix hung_task_warnings description
  locking/mcs: Allow architectures to hook in to contended paths
  locking/mcs: Micro-optimize the MCS code, add extra comments
  ...
2014-03-31 10:59:39 -07:00
Eric Paris aa4af831bb AUDIT: Allow login in non-init namespaces
It its possible to configure your PAM stack to refuse login if audit
messages (about the login) were unable to be sent.  This is common in
many distros and thus normal configuration of many containers.  The PAM
modules determine if audit is enabled/disabled in the kernel based on
the return value from sending an audit message on the netlink socket.
If userspace gets back ECONNREFUSED it believes audit is disabled in the
kernel.  If it gets any other error else it refuses to let the login
proceed.

Just about ever since the introduction of namespaces the kernel audit
subsystem has returned EPERM if the task sending a message was not in
the init user or pid namespace.  So many forms of containers have never
worked if audit was enabled in the kernel.

BUT if the container was not in net_init then the kernel network code
would send ECONNREFUSED (instead of the audit code sending EPERM).  Thus
by pure accident/dumb luck/bug if an admin configured the PAM stack to
reject all logins that didn't talk to audit, but then ran the login
untility in the non-init_net namespace, it would work!! Clearly this was
a bug, but it is a bug some people expected.

With the introduction of network namespace support in 3.14-rc1 the two
bugs stopped cancelling each other out.  Now, containers in the
non-init_net namespace refused to let users log in (just like PAM was
configfured!) Obviously some people were not happy that what used to let
users log in, now didn't!

This fix is kinda hacky.  We return ECONNREFUSED for all non-init
relevant namespaces.  That means that not only will the old broken
non-init_net setups continue to work, now the broken non-init_pid or
non-init_user setups will 'work'.  They don't really work, since audit
isn't logging things.  But it's what most users want.

In 3.15 we should have patches to support not only the non-init_net
(3.14) namespace but also the non-init_pid and non-init_user namespace.
So all will be right in the world.  This just opens the doors wide open
on 3.14 and hopefully makes users happy, if not the audit system...

Reported-by: Andre Tomt <andre@tomt.net>
Reported-by: Adam Richter <adam_richter2004@yahoo.com>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-30 17:02:53 -07:00
John Stultz cab5e127ee time: Revert to calling clock_was_set_delayed() while in irq context
In commit 47a1b79630 ("tick/timekeeping: Call
update_wall_time outside the jiffies lock"), we moved to calling
clock_was_set() due to the fact that we were no longer holding
the timekeeping or jiffies lock.

However, there is still the problem that clock_was_set()
triggers an IPI, which cannot be done from the timer's hard irq
context, and will generate WARN_ON warnings.

Apparently in my earlier testing, I'm guessing I didn't bump the
dmesg log level, so I somehow missed the WARN_ONs.

Thus we need to revert back to calling clock_was_set_delayed().

Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-28 08:07:07 +01:00
Linus Torvalds f217c44ebd While on my flight to Linux Collaboration Summit, I was working on
my slides for the event trigger tutorial. I booted a 3.14-rc7 kernel
 to perform what I wanted to teach and cut and paste it into my slides.
 When I tried the traceon event trigger with a condition attached to it
 (turns tracing on only if a field of the trigger event matches a condition
 set by the user), nothing happened. Tracing would not turn on. I stopped
 working on my presentation in order to find what was wrong.
 
 It ended up being the way trace event triggers work when they have
 conditions. Instead of copying the fields, the condition code just
 looks at the fields that were copied into the ring buffer. This works
 great, unless tracing is off. That's because when the event is reserved
 on the ring buffer, the ring buffer returns a NULL pointer, this tells
 the tracing code that the ring buffer is disabled. This ends up being
 a problem for the traceon trigger if it is using this information to
 check its condition.
 
 Luckily the code that checks if tracing is on returns the ring buffer
 to use (because the ring buffer is determined by the event file
 also passed to that field). I was able to easily solve this bug by
 checking in that helper function if the returned ring buffer entry
 is NULL, and if so, also check the file flag if it has a trace event
 trigger condition, and if so, to pass back a temp ring buffer to use.
 This will allow the trace event trigger condition to still test the
 event fields, but nothing will be recorded.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTMtD2AAoJEKQekfcNnQGuwrUH/juk/qKY7T/nq98iMleOD+LR
 iTQyWh9zvHdku/3ZlK5eJqla2TJUctM5Sf9WhvV/iXJ2/9m5qnekZ6gCoraGVpG4
 qfKZAiHdGU0sFdp2NIt1uIHcqjYc8dg2KGajEKGNr7E2F3fvSTfNZV+WKAnmGqlE
 8ICsoYJ/Uv4kh18QQJexbyd7jXWE8tzxxS9UzPvCtCiSd1xf2yCAS5OlwPYafgiv
 Egbjqq7IUCJfVb0h/YfgPbEW69sjlxOulzn42Nii+UOx9uv1RTdtXdivyCXDWfX3
 Que0Do757oTvpW/22s9T3mpUtYLCB+6B2GV+ljqbLCZ5pRywc/GDCU//D1AXKEI=
 =dKWH
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "While on my flight to Linux Collaboration Summit, I was working on my
  slides for the event trigger tutorial.  I booted a 3.14-rc7 kernel to
  perform what I wanted to teach and cut and paste it into my slides.
  When I tried the traceon event trigger with a condition attached to it
  (turns tracing on only if a field of the trigger event matches a
  condition set by the user), nothing happened.  Tracing would not turn
  on.  I stopped working on my presentation in order to find what was
  wrong.

  It ended up being the way trace event triggers work when they have
  conditions.  Instead of copying the fields, the condition code just
  looks at the fields that were copied into the ring buffer.  This works
  great, unless tracing is off.  That's because when the event is
  reserved on the ring buffer, the ring buffer returns a NULL pointer,
  this tells the tracing code that the ring buffer is disabled.  This
  ends up being a problem for the traceon trigger if it is using this
  information to check its condition.

  Luckily the code that checks if tracing is on returns the ring buffer
  to use (because the ring buffer is determined by the event file also
  passed to that field).  I was able to easily solve this bug by
  checking in that helper function if the returned ring buffer entry is
  NULL, and if so, also check the file flag if it has a trace event
  trigger condition, and if so, to pass back a temp ring buffer to use.
  This will allow the trace event trigger condition to still test the
  event fields, but nothing will be recorded"

* tag 'trace-fixes-v3.14-rc7-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix traceon trigger condition to actually turn tracing on
2014-03-26 09:09:18 -07:00
Steven Rostedt (Red Hat) 2c4a33aba5 tracing: Fix traceon trigger condition to actually turn tracing on
While working on my tutorial for 2014 Linux Collaboration Summit
I found that the traceon trigger did not work when conditions were
used. The other triggers worked fine though. Looking into it, it
is because of the way the triggers use the ring buffer to store
the fields it will use for the condition. But if tracing is off, nothing
is stored in the buffer, and the tracepoint exits before calling the
trigger to test the condition. This is fine for all the triggers that
only work when tracing is on, but for traceon trigger that is to
work when tracing is off, nothing happens.

The fix is simple, just use a temp ring buffer to record the event
if tracing is off and the event has a trace event conditional trigger
enabled. The rest of the tracepoint code will work just fine, but
the tracepoint wont be recorded in the other buffers.

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-25 23:39:41 -04:00
Ingo Molnar 7de700e680 Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU update from Paul E. McKenney:

" [...] one late-breaking commit.  This one was requested for 3.15 by Peter Zijlstra.
  It is low risk because it adds a new in-kernel API with minimal changes to the
  existing code.  Those minimal changes are the addition of memory barriers and
  ACCESS_ONCE() macro calls, neither of which should be able to break things.
  This commit has passed significant rcutorture testing, with these additional
  additions to rcutorture slated for 3.16.  This commit has also been exposed to
  -next testing. "

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-25 06:45:39 +01:00
Linus Torvalds 11d4616bd0 futex: revert back to the explicit waiter counting code
Srikar Dronamraju reports that commit b0c29f79ec ("futexes: Avoid
taking the hb->lock if there's nothing to wake up") causes java threads
getting stuck on futexes when runing specjbb on a power7 numa box.

The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn result in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.

So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.

Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Original-code-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-20 22:11:17 -07:00
Linus Torvalds 477cc48484 Vaibhav Nagarnaik discovered that since 3.10 a clean up patch made the
array index in the trace event format bogus. He supplied an elegant solution
 that uses __stringify() and also removes the need for the event_storage
 and event_storage_mutex and also cuts off a few K of overhead from
 the trace events.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTK6cxAAoJEKQekfcNnQGu+3oIAIPGwsevXcVNlmqLwtsoUhu2
 d0uMJYBpi+aQ/stenhkAThzY/5/O0SN2o+uyacSJKDo3WayF9fxGTvOHVbJhvmLF
 YsX6oQLVmzqPrq7BGJTvglv4+RYf+HkV1MAb/1iacA7sFtd7jVpUiPvLlQ3CEwph
 kNqdmoFT16iyE1snUviE0GmZmrdOqZBjwC1Ys+oSbaycRSFnvmDjYAGYot5tJfU5
 gYgpkeJ8J3bxHOGzNCRgmpLQNR3P1HzailPQVi51We14FlzSwOTwuKDtpf8WwzXV
 0fIEkdTU3+K62XxVw5/YQ5o/PpFKO/J5dSPjFe7PF2e6hCTTOABcK5foSSP1KSU=
 =559y
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull trace fix from Steven Rostedt:
 "Vaibhav Nagarnaik discovered that since 3.10 a clean-up patch made the
  array index in the trace event format bogus.

  He supplied an elegant solution that uses __stringify() and also
  removes the need for the event_storage and event_storage_mutex and
  also cuts off a few K of overhead from the trace events"

* tag 'trace-fixes-v3.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix array size mismatch in format string
2014-03-20 22:09:30 -07:00
Paul E. McKenney 765a3f4fed rcu: Provide grace-period piggybacking API
The following pattern is currently not well supported by RCU:

1.	Make data element inaccessible to RCU readers.

2.	Do work that probably lasts for more than one grace period.

3.	Do something to make sure RCU readers in flight before #1 above
	have completed.

Here are some things that could currently be done:

a.	Do a synchronize_rcu() unconditionally at either #1 or #3 above.
	This works, but imposes needless work and latency.

b.	Post an RCU callback at #1 above that does a wakeup, then
	wait for the wakeup at #3.  This works well, but likely results
	in an extra unneeded grace period.  Open-coding this is also
	a bit more semi-tricky code than would be good.

This commit therefore adds get_state_synchronize_rcu() and
cond_synchronize_rcu() APIs.  Call get_state_synchronize_rcu() at #1
above and pass its return value to cond_synchronize_rcu() at #3 above.
This results in a call to synchronize_rcu() if no grace period has
elapsed between #1 and #3, but requires only a load, comparison, and
memory barrier if a full grace period did elapse.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
2014-03-20 17:12:25 -07:00
Dave Jones 8c90487cdc Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC
Rename TAINT_UNSAFE_SMP to TAINT_CPU_OUT_OF_SPEC, so we can repurpose
the flag to encompass a wider range of pushing the CPU beyond its
warrany.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20140226154949.GA770@redhat.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-03-20 16:28:09 -07:00
Vaibhav Nagarnaik 87291347c4 tracing: Fix array size mismatch in format string
In event format strings, the array size is reported in two locations.
One in array subscript and then via the "size:" attribute. The values
reported there have a mismatch.

For e.g., in sched:sched_switch the prev_comm and next_comm character
arrays have subscript values as [32] where as the actual field size is
16.

name: sched_switch
ID: 301
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1;signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;

        field:char prev_comm[32];       offset:8;       size:16;        signed:1;
        field:pid_t prev_pid;   offset:24;      size:4; signed:1;
        field:int prev_prio;    offset:28;      size:4; signed:1;
        field:long prev_state;  offset:32;      size:8; signed:1;
        field:char next_comm[32];       offset:40;      size:16;        signed:1;
        field:pid_t next_pid;   offset:56;      size:4; signed:1;
        field:int next_prio;    offset:60;      size:4; signed:1;

After bisection, the following commit was blamed:
92edca0 tracing: Use direct field, type and system names

This commit removes the duplication of strings for field->name and
field->type assuming that all the strings passed in
__trace_define_field() are immutable. This is not true for arrays, where
the type string is created in event_storage variable and field->type for
all array fields points to event_storage.

Use __stringify() to create a string constant for the type string.

Also, get rid of event_storage and event_storage_mutex that are not
needed anymore.

also, an added benefit is that this reduces the overhead of events a bit more:

   text    data     bss     dec     hex filename
8424787 2036472 1302528 11763787         b3804b vmlinux
8420814 2036408 1302528 11759750         b37086 vmlinux.patched

Link: http://lkml.kernel.org/r/1392349908-29685-1-git-send-email-vnagarnaik@google.com

Cc: Laurent Chavey <chavey@google.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-20 13:21:05 -04:00
Linus Torvalds 7c3895383f Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
 "One really late cgroup patch to fix error path in create_css().
  Hitting this bug would be pretty rare but still possible and it gets
  delayed we'd need to backport it through -stable anyway.  It only
  updates error path in create_css() and has low chance of new
  breakages"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix a failure path in create_css()
2014-03-19 16:13:40 -07:00
Li Zefan 3eb59ec64f cgroup: fix a failure path in create_css()
If online_css() fails, we should remove cgroup files belonging
to css->ss.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-03-18 17:15:36 -04:00
Linus Torvalds 59bf6c3c6c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
 "Three small fixes"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/clock: Prevent tracing recursion in sched_clock_cpu()
  stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
  sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
2014-03-16 10:42:07 -07:00
Peter Zijlstra 6f008e72cd locking/mutex: Fix debug checks
OK, so commit:

  1d8fe7dc80 ("locking/mutexes: Unlock the mutex without the wait_lock")

generates this boot warning when CONFIG_DEBUG_MUTEXES=y:

  WARNING: CPU: 0 PID: 139 at /usr/src/linux-2.6/kernel/locking/mutex-debug.c:82 debug_mutex_unlock+0x155/0x180() DEBUG_LOCKS_WARN_ON(lock->owner != current)

And that makes sense, because as soon as we release the lock a
new owner can come in...

One would think that !__mutex_slowpath_needs_to_unlock()
implementations suffer the same, but for DEBUG we fall back to
mutex-null.h which has an unconditional 1 for that.

The mutex debug code requires the mutex to be unlocked after
doing the debug checks, otherwise it can find inconsistent
state.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: jason.low2@hp.com
Link: http://lkml.kernel.org/r/20140312122442.GB27965@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12 13:49:47 +01:00
Alex Shi 6037dd1a49 sched: Clean up the task_hot() function
task_hot() doesn't need the 'sched_domain' parameter, so remove it.

Signed-off-by: Alex Shi <alex.shi@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394607111-1904-1-git-send-email-alex.shi@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12 10:49:01 +01:00
Vincent Guittot a2cd42601b sched: Remove double calculation in fix_small_imbalance()
The tmp value has been already calculated in:

  scaled_busy_load_per_task =
		(busiest->load_per_task * SCHED_POWER_SCALE) /
		busiest->group_power;

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394555166-22894-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12 10:49:00 +01:00
Steven Rostedt 383afd0971 sched: Fix broken setscheduler()
I decided to run my tests on linux-next, and my wakeup_rt tracer was
broken. After running a bisect, I found that the problem commit was:

   linux-next commit c365c292d0
   "sched: Consider pi boosting in setscheduler()"

And the reason the wake_rt tracer test was failing, was because it had
no RT task to trace. I first noticed this when running with
sched_switch event and saw that my RT task still had normal SCHED_OTHER
priority. Looking at the problem commit, I found:

 -       p->normal_prio = normal_prio(p);
 -       p->prio = rt_mutex_getprio(p);

With no

 +       p->normal_prio = normal_prio(p);
 +       p->prio = rt_mutex_getprio(p);

Reading what the commit is suppose to do, I realize that the p->prio
can't be set if the task is boosted with a higher prio, but the
p->normal_prio still needs to be set regardless, otherwise, when the
task is deboosted, it wont get the new priority.

The p->prio has to be set before "check_class_changed()" is called,
otherwise the class wont be changed.

Also added fix to newprio to include a check for deadline policy that
was missing. This change was suggested by Juri Lelli.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: SebastianAndrzej Siewior <bigeasy@linutronix.de>
Cc: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306120438.638bfe94@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12 10:48:59 +01:00
Linus Torvalds adf961d7e8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull audit namespace fixes from Eric Biederman:
 "Starting with 3.14-rc1 the audit code is faulty (think oopses and
  races) with respect to how it computes the network namespace of which
  socket to reply to, and I happened to notice by chance when reading
  through the code.

  My testing and the automated build bots don't find any problems with
  these fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  audit: Update kdoc for audit_send_reply and audit_list_rules_send
  audit: Send replies in the proper network namespace.
  audit: Use struct net not pid_t to remember the network namespce to reply in
2014-03-11 10:17:50 -07:00
Peter Zijlstra 34c6bc2c91 locking/mutexes: Add extra reschedule point
Add in an extra reschedule in an attempt to avoid getting reschedule
the moment we've acquired the lock.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-zah5eyn9gu7qlgwh9r6n2anc@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:59 +01:00
Peter Zijlstra fb0527bd5e locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
Since we want a task waiting for a mutex_lock() to go to sleep and
reschedule on need_resched() we must be able to abort the
mcs_spin_lock() around the adaptive spin.

Therefore implement a cancelable mcs lock.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: Jason Low <jason.low2@hp.com>
Link: http://lkml.kernel.org/n/tip-62hcl5wxydmjzd182zhvk89m@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:56 +01:00
Jason Low 1d8fe7dc80 locking/mutexes: Unlock the mutex without the wait_lock
When running workloads that have high contention in mutexes on an 8 socket
machine, mutex spinners would often spin for a long time with no lock owner.

The main reason why this is occuring is in __mutex_unlock_common_slowpath(),
if __mutex_slowpath_needs_to_unlock(), then the owner needs to acquire the
mutex->wait_lock before releasing the mutex (setting lock->count to 1). When
the wait_lock is contended, this delays the mutex from being released.
We should be able to release the mutex without holding the wait_lock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-4-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:54 +01:00
Jason Low 47667fa150 locking/mutexes: Modify the way optimistic spinners are queued
The mutex->spin_mlock was introduced in order to ensure that only 1 thread
spins for lock acquisition at a time to reduce cache line contention. When
lock->owner is NULL and the lock->count is still not 1, the spinner(s) will
continually release and obtain the lock->spin_mlock. This can generate
quite a bit of overhead/contention, and also might just delay the spinner
from getting the lock.

This patch modifies the way optimistic spinners are queued by queuing before
entering the optimistic spinning loop as oppose to acquiring before every
call to mutex_spin_on_owner(). So in situations where the spinner requires
a few extra spins before obtaining the lock, then there will only be 1 spinner
trying to get the lock and it will avoid the overhead from unnecessarily
unlocking and locking the spin_mlock.

Signed-off-by: Jason Low <jason.low2@hp.com>
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: Waiman.Long@hp.com
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390936396-3962-3-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:53 +01:00
Jason Low 46af29e479 locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
The mutex_can_spin_on_owner() function should also return false if the
task needs to be rescheduled to avoid entering the MCS queue when it
needs to reschedule.

Signed-off-by: Jason Low <jason.low2@hp.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Waiman.Long@hp.com
Cc: torvalds@linux-foundation.org
Cc: tglx@linutronix.de
Cc: riel@redhat.com
Cc: akpm@linux-foundation.org
Cc: davidlohr@hp.com
Cc: hpa@zytor.com
Cc: andi@firstfloor.org
Cc: aswin@hp.com
Cc: scott.norton@hp.com
Cc: chegu_vinod@hp.com
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1390936396-3962-2-git-send-email-jason.low2@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:52 +01:00
Peter Zijlstra c9122da1e2 locking: Move mcs_spinlock.h into kernel/locking/
The mcs_spinlock code is not meant (or suitable) as a generic locking
primitive, therefore take it away from the normal includes and place
it in kernel/locking/.

This way the locking primitives implemented there can use it as part
of their implementation but we do not risk it getting used
inapropriately.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-byirmpamgr7h25m5kyavwpzx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:14:52 +01:00
Mike Galbraith 156654f491 sched/numa: Move task_numa_free() to __put_task_struct()
Bad idea on -rt:

[  908.026136]  [<ffffffff8150ad6a>] rt_spin_lock_slowlock+0xaa/0x2c0
[  908.026145]  [<ffffffff8108f701>] task_numa_free+0x31/0x130
[  908.026151]  [<ffffffff8108121e>] finish_task_switch+0xce/0x100
[  908.026156]  [<ffffffff81509c0a>] thread_return+0x48/0x4ae
[  908.026160]  [<ffffffff8150a095>] schedule+0x25/0xa0
[  908.026163]  [<ffffffff8150ad95>] rt_spin_lock_slowlock+0xd5/0x2c0
[  908.026170]  [<ffffffff810658cf>] get_signal_to_deliver+0xaf/0x680
[  908.026175]  [<ffffffff8100242d>] do_signal+0x3d/0x5b0
[  908.026179]  [<ffffffff81002a30>] do_notify_resume+0x90/0xe0
[  908.026186]  [<ffffffff81513176>] int_signal+0x12/0x17
[  908.026193]  [<00007ff2a388b1d0>] 0x7ff2a388b1cf

and since upstream does not mind where we do this, be a bit nicer ...

Signed-off-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1393568591.6018.27.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:05:43 +01:00
Kirill Tkhai 35805ff8f4 sched/fair: Fix endless loop in idle_balance()
Check for fair tasks number to decide, that we've pulled a task.
rq's nr_running may contain throttled RT tasks.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394118975.19290.104.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:05:41 +01:00
Kirill Tkhai 4c6c4e38c4 sched/core: Fix endless loop in pick_next_task()
1) Single cpu machine case.

When rq has only RT tasks, but no one of them can be picked
because of throttling, we enter in endless loop.

pick_next_task_{dl,rt} return NULL.

In pick_next_task_fair() we permanently go to retry

	if (rq->nr_running != rq->cfs.h_nr_running)
		return RETRY_TASK;

(rq->nr_running is not being decremented when rt_rq becomes
throttled).

No chances to unthrottle any rt_rq or to wake fair here,
because of rq is locked permanently and interrupts are
disabled.

2) In case of SMP this can cause a hang too. Although we unlock
   rq in idle_balance(), interrupts are still disabled.

The solution is to check for available tasks in DL and RT
classes instead of checking for sum.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1394098321.19290.11.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:05:39 +01:00
Kirill Tkhai e4aa358b6c sched/fair: Push down check for high priority class task into idle_balance()
We close idle_exit_fair() bracket in case of we've pulled something or we've received
task of high priority class.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/1394098315.19290.10.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:05:37 +01:00
Kirill Tkhai 734ff2a71f sched/rt: Fix picking RT and DL tasks from empty queue
The problems:

1) We check for rt_nr_running before call of put_prev_task().
   If previous task is RT, its rt_rq may become throttled
   and dequeued after this call.

In case of p is from rt->rq this just causes picking a task
from throttled queue, but in case of its rt_rq is child
we are guaranteed catch BUG_ON.

2) The same with deadline class. The only difference we operate
   on only dl_rq.

This patch fixes all the above problems and it adds a small skip in the
DL update like we've already done for RT class:

	if (unlikely((s64)delta_exec <= 0))
		return;

This will optimize sequential update_curr_dl() calls a little.

Signed-off-by: Kirill Tkhai <ktkhai@parallels.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Link: http://lkml.kernel.org/r/1393946746.3643.3.camel@tkhai
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:05:35 +01:00
Jiri Olsa 63c45f4ba5 perf: Disallow user-space stack dumps for function trace events
Recent issues with user space callchains processing within
page fault handler tracing showed as Peter said 'there's
just too much fail surface'.

The user space stack dump is just another source of the this issue.

Related list discussions:
  http://marc.info/?t=139302086500001&r=1&w=2
  http://marc.info/?t=139301437300003&r=1&w=2

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1393775800-13524-3-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:57:58 +01:00
Jiri Olsa cfa77bc4af perf: Disallow user-space callchains for function trace events
Recent issues with user space callchains processing within
page fault handler tracing showed as Peter said 'there's
just too much fail surface'.

Related list discussions:

  http://marc.info/?t=139302086500001&r=1&w=2
  http://marc.info/?t=139301437300003&r=1&w=2

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1393775800-13524-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:57:57 +01:00
Ingo Molnar 0066f3b93e Merge branch 'perf/urgent' into perf/core
Merge the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:53:50 +01:00
Ingo Molnar a02ed5e3e0 Merge branch 'sched/urgent' into sched/core
Pick up fixes before queueing up new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:34:27 +01:00
Fernando Luis Vazquez Cao 96b3d28bf4 sched/clock: Prevent tracing recursion in sched_clock_cpu()
Prevent tracing of preempt_disable/enable() in sched_clock_cpu().
When CONFIG_DEBUG_PREEMPT is enabled, preempt_disable/enable() are
traced and this causes trace_clock() users (and probably others) to
go into an infinite recursion. Systems with a stable sched_clock()
are not affected.

This problem is similar to that fixed by upstream commit 95ef1e5292
("KVM guest: prevent tracing recursion with kvmclock").

Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1394083528.4524.3.camel@nexus
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:33:48 +01:00
Peter Zijlstra 177c53d943 stop_machine: Fix^2 race between stop_two_cpus() and stop_cpus()
We must use smp_call_function_single(.wait=1) for the
irq_cpu_stop_queue_work() to ensure the queueing is actually done under
stop_cpus_lock. Without this we could have dropped the lock by the time
we do the queueing and get the race we tried to fix.

Fixes: 7053ea1a34 ("stop_machine: Fix race between stop_two_cpus() and stop_cpus()")

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140228123905.GK3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:33:47 +01:00
Juri Lelli d44753b843 sched/deadline: Deny unprivileged users to set/change SCHED_DEADLINE policy
Deny the use of SCHED_DEADLINE policy to unprivileged users.
Even if root users can set the policy for normal users, we
don't want the latter to be able to change their parameters
(safest behavior).

Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1393844961-18097-1-git-send-email-juri.lelli@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:33:46 +01:00
Johannes Weiner e97ca8e5b8 mm: fix GFP_THISNODE callers and clarify
GFP_THISNODE is for callers that implement their own clever fallback to
remote nodes.  It restricts the allocation to the specified node and
does not invoke reclaim, assuming that the caller will take care of it
when the fallback fails, e.g.  through a subsequent allocation request
without GFP_THISNODE set.

However, many current GFP_THISNODE users only want the node exclusive
aspect of the flag, without actually implementing their own fallback or
triggering reclaim if necessary.  This results in things like page
migration failing prematurely even when there is easily reclaimable
memory available, unless kswapd happens to be running already or a
concurrent allocation attempt triggers the necessary reclaim.

Convert all callsites that don't implement their own fallback strategy
to __GFP_THISNODE.  This restricts the allocation a single node too, but
at the same time allows the allocator to enter the slowpath, wake
kswapd, and invoke direct reclaim if necessary, to make the allocation
happen when memory is full.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Jan Stancek <jstancek@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-10 17:26:19 -07:00
Eric W. Biederman d211f177b2 audit: Update kdoc for audit_send_reply and audit_list_rules_send
The kbuild test robot reported:
> tree:   git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git for-next
> head:   6f285b19d0
> commit: 6f285b19d0 [2/2] audit: Send replies in the proper network namespace.
> reproduce: make htmldocs
>
> >> Warning(kernel/audit.c:575): No description found for parameter 'request_skb'
> >> Warning(kernel/audit.c:575): Excess function parameter 'portid' description in 'audit_send_reply'
> >> Warning(kernel/auditfilter.c:1074): No description found for parameter 'request_skb'
> >> Warning(kernel/auditfilter.c:1074): Excess function parameter 'portid' description in 'audit_list_rules_s

Which was caused by my failure to update the kdoc annotations when I
updated the functions.  Fix that small oversight now.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2014-03-08 15:31:54 -08:00
Linus Torvalds ca62eec4e5 Merge branch 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "Two cpuset locking fixes from Li.  Both tagged for -stable"

* 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cpuset: fix a race condition in __cpuset_node_allowed_softwall()
  cpuset: fix a locking issue in cpuset_migrate_mm()
2014-03-08 11:57:38 -08:00
Linus Torvalds 721f0c1260 In the past, I've had lots of reports about trace events not working.
Developers would say they put a trace_printk() before and after the trace
 event but when they enable it (and the trace event said it was enabled) they
 would see the trace_printks but not the trace event.
 
 I was not able to reproduce this, but that's because I wasn't looking at
 the right location. Recently, another bug came up that showed the issue.
 
 If your kernel supports signed modules but allows for non-signed modules
 to be loaded, then when one is, the kernel will silently set the
 MODULE_FORCED taint on the module. Although, this taint happens without
 the need for insmod --force or anything of the kind, it labels the
 module with that taint anyway.
 
 If this tainted module has tracepoints, the tracepoints will be ignored
 because of the MODULE_FORCED taint. But no error message will be
 displayed. Worse yet, the event infrastructure will still be created
 letting users enable the trace event represented by the tracepoint,
 although that event will never actually be enabled. This is because
 the tracepoint infrastructure allows for non-existing tracepoints to
 be enabled for new modules to arrive and have their tracepoints set.
 
 Although there are several things wrong with the above, this change
 only addresses the creation of the trace event files for tracepoints
 that are not created when a module is loaded and is tainted. This change
 will print an error message about the module being tainted and not the
 trace events will not be created, and it does not create the trace event
 infrastructure.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJTFnMPAAoJEKQekfcNnQGuPPwH/Rtwy/siM+ltvlLnEbRjS4RL
 9aF5mfJUazmfCaOBMSaMUo92uCbciIVif6icX843JmCdCOR5Hk5SZryBbt2A/dF9
 TcMloKNbIn/ad7yZ0O75BJlPnRJ5RZ42edQfW1lkdeWo644C8Kj399fVPt7KU5SH
 1KTWyShT05E2fYjp2lMrb+FOFfKerlzkXtgGwJKXnd/7hrbdmKEH/OO8YkMrlVZp
 SURPyzNMMVKoUFY797b6FrFRqV04C210BtNcNrd4S3/V9VE4IPS/8YSLfvVaGkD0
 e2kVAvIOkwPnYzMZg70jf2R8NlGS2mwaVC+NenBHz3KlpFdaeRu1hFw7/n8h2/s=
 =YbJd
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "In the past, I've had lots of reports about trace events not working.
  Developers would say they put a trace_printk() before and after the
  trace event but when they enable it (and the trace event said it was
  enabled) they would see the trace_printks but not the trace event.

  I was not able to reproduce this, but that's because I wasn't looking
  at the right location.  Recently, another bug came up that showed the
  issue.

  If your kernel supports signed modules but allows for non-signed
  modules to be loaded, then when one is, the kernel will silently set
  the MODULE_FORCED taint on the module.  Although, this taint happens
  without the need for insmod --force or anything of the kind, it labels
  the module with that taint anyway.

  If this tainted module has tracepoints, the tracepoints will be
  ignored because of the MODULE_FORCED taint.  But no error message will
  be displayed.  Worse yet, the event infrastructure will still be
  created letting users enable the trace event represented by the
  tracepoint, although that event will never actually be enabled.  This
  is because the tracepoint infrastructure allows for non-existing
  tracepoints to be enabled for new modules to arrive and have their
  tracepoints set.

  Although there are several things wrong with the above, this change
  only addresses the creation of the trace event files for tracepoints
  that are not created when a module is loaded and is tainted.  This
  change will print an error message about the module being tainted and
  not the trace events will not be created, and it does not create the
  trace event infrastructure"

* tag 'trace-fixes-v3.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Do not add event files for modules that fail tracepoints
2014-03-07 16:32:40 -08:00
Linus Torvalds 27ea0f7811 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
 - a bugfix for a long standing waitqueue race
 - a trivial fix for a missing include

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Include missing header file in irqdomain.c
  genirq: Remove racy waitqueue_active check
2014-03-07 16:31:41 -08:00