Commit Graph

4088 Commits

Author SHA1 Message Date
Linus Torvalds 42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
Linus Torvalds f7458a5d63 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The RCU changes in this cycle are:

   - Dynticks updates, consolidating open-coded counter accesses into a
     well-defined API

   - SRCU updates: Simplify algorithm, add formal verification

   - Documentation updates

   - Miscellaneous fixes

   - Torture-test updates

  Most of the diffstat comes from the relatively large documentation
  update"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits)
  srcu: Reduce probability of SRCU ->unlock_count[] counter overflow
  rcutorture: Add CBMC-based formal verification for SRCU
  srcu: Force full grace-period ordering
  srcu: Implement more-efficient reader counts
  rcu: Adjust FQS offline checks for exact online-CPU detection
  rcu: Check cond_resched_rcu_qs() state less often to reduce GP overhead
  rcu: Abstract extended quiescent state determination
  rcu: Abstract dynticks extended quiescent state enter/exit operations
  rcu: Add lockdep checks to synchronous expedited primitives
  rcu: Eliminate unused expedited_normal counter
  llist: Clarify comments about when locking is needed
  rcu: Fix comment in rcu_organize_nocb_kthreads()
  rcu: Enable RCU tracepoints by default to aid in debugging
  rcu: Make rcu_cpu_starting() use its "cpu" argument
  rcu: Add comment headers to expedited-grace-period counter functions
  rcu: Don't wake rcuc/X kthreads on NOCB CPUs
  rcu: Re-enable TASKS_RCU for User Mode Linux
  rcu: Once again use NMI-based stack traces in stall warnings
  rcu: Remove short-term CPU kicking
  rcu: Add long-term CPU kicking
  ...
2017-02-20 11:21:17 -08:00
Linus Torvalds 575260e3f8 Merge branch 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull debugobjects updates from Ingo Molnar:
 "A number of scalability improvements by Waimang Long"

* 'core-debugobjects-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Improve variable naming
  debugobjects: Reduce contention on the global pool_lock
  debugobjects: Scale thresholds with # of CPUs
  debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
2017-02-20 11:19:09 -08:00
Linus Torvalds 20dcfe1b7d Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Nothing exciting, just the usual pile of fixes, updates and cleanups:

   - A bunch of clocksource driver updates

   - Removal of CONFIG_TIMER_STATS and the related /proc file

   - More posix timer slim down work

   - A scalability enhancement in the tick broadcast code

   - Math cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
  hrtimer: Catch invalid clockids again
  math64, tile: Fix build failure
  clocksource/drivers/arm_arch_timer:: Mark cyclecounter __ro_after_init
  timerfd: Protect the might cancel mechanism proper
  timer_list: Remove useless cast when printing
  time: Remove CONFIG_TIMER_STATS
  clocksource/drivers/arm_arch_timer: Work around Hisilicon erratum 161010101
  clocksource/drivers/arm_arch_timer: Introduce generic errata handling infrastructure
  clocksource/drivers/arm_arch_timer: Remove fsl-a008585 parameter
  clocksource/drivers/arm_arch_timer: Add dt binding for hisilicon-161010101 erratum
  clocksource/drivers/ostm: Add renesas-ostm timer driver
  clocksource/drivers/ostm: Document renesas-ostm timer DT bindings
  clocksource/drivers/tcb_clksrc: Use 32 bit tcb as sched_clock
  clocksource/drivers/gemini: Add driver for the Cortina Gemini
  clocksource: add DT bindings for Cortina Gemini
  clockevents: Add a clkevt-of mechanism like clksrc-of
  tick/broadcast: Reduce lock cacheline contention
  timers: Omit POSIX timer stuff from task_struct when disabled
  x86/timer: Make delay() work during early bootup
  delay: Add explanation of udelay() inaccuracy
  ...
2017-02-20 10:06:32 -08:00
Kees Cook dfb4357da6 time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:

kernel/time/timer_list.c print_timer():

        SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

/proc/timer_list:

 #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570

Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10 11:15:08 +01:00
Waiman Long 0cad93c345 debugobjects: Improve variable naming
As suggested by Ingo, the debug_objects_alloc counter is now renamed to
debug_objects_allocated with minor twist in comment and debug output.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1486503630-1501-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10 09:53:04 +01:00
Peter Zijlstra f405df5de3 refcount_t: Introduce a special purpose refcount type
Provide refcount_t, an atomic_t like primitive built just for
refcounting.

It provides saturation semantics such that overflow becomes impossible
and thereby 'spurious' use-after-free is avoided.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-02-10 09:04:19 +01:00
Waiman Long 858274b6a1 debugobjects: Reduce contention on the global pool_lock
On a large SMP system with many CPUs, the global pool_lock may become
a performance bottleneck as all the CPUs that need to allocate or
free debug objects have to take the lock. That can sometimes cause
soft lockups like:

 NMI watchdog: BUG: soft lockup - CPU#35 stuck for 22s! [rcuos/1:21]
 ...
 RIP: 0010:[<ffffffff817c216b>]  [<ffffffff817c216b>]
	_raw_spin_unlock_irqrestore+0x3b/0x60
 ...
 Call Trace:
  [<ffffffff813f40d1>] free_object+0x81/0xb0
  [<ffffffff813f4f33>] debug_check_no_obj_freed+0x193/0x220
  [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
  [<ffffffff81284996>] ? file_free_rcu+0x36/0x60
  [<ffffffff81251712>] kmem_cache_free+0xd2/0x380
  [<ffffffff81284960>] ? fput+0x90/0x90
  [<ffffffff81284996>] file_free_rcu+0x36/0x60
  [<ffffffff81124c23>] rcu_nocb_kthread+0x1b3/0x550
  [<ffffffff81124b71>] ? rcu_nocb_kthread+0x101/0x550
  [<ffffffff81124a70>] ? sync_exp_work_done.constprop.63+0x50/0x50
  [<ffffffff810c59d1>] kthread+0x101/0x120
  [<ffffffff81101a59>] ? trace_hardirqs_on_caller+0xf9/0x1c0
  [<ffffffff817c2d32>] ret_from_fork+0x22/0x50

To reduce the amount of contention on the pool_lock, the actual
kmem_cache_free() of the debug objects will be delayed if the pool_lock
is busy. This will temporarily increase the amount of free objects
available at the free pool when the system is busy. As a result,
the number of kmem_cache allocation and freeing is reduced.

To further reduce the lock operations free debug objects in batches of
four.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-4-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-05 17:09:32 +01:00
Waiman Long 97dd552eb2 debugobjects: Scale thresholds with # of CPUs
On a large SMP systems with hundreds of CPUs, the current thresholds
for allocating and freeing debug objects (256 and 1024 respectively)
may not work well. This can cause a lot of needless calls to
kmem_aloc() and kmem_free() on those systems.

To alleviate this thrashing problem, the object freeing threshold
is now increased to "1024 + # of CPUs * 32". Whereas the object
allocation threshold is increased to "256 + # of CPUs * 4". That
should make the debug objects subsystem scale better with the number
of CPUs available in the system.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-3-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04 09:01:55 +01:00
Waiman Long c4b73aabd0 debugobjects: Track number of kmem_cache_alloc/kmem_cache_free done
New debugfs stat counters are added to track the numbers of
kmem_cache_alloc() and kmem_cache_free() function calls to get a
sense of how the internal debug objects cache management is performing.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: "Du Changbin" <changbin.du@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: http://lkml.kernel.org/r/1483647425-4135-2-git-send-email-longman@redhat.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-04 09:01:54 +01:00
Ingo Molnar a8709fa4a0 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU changes from Paul E. McKenney:

 - Dynticks updates, consolidating open-coded counter accesses into a well-defined API

 - SRCU updates: Simplify algorithm, add formal verification

 - Documentation updates

 - Miscellaneous fixes

 - Torture-test updates

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-31 07:45:42 +01:00
zhong jiang 3277953de2 mm: do not export ioremap_page_range symbol for external module
Recently, I've found cases in which ioremap_page_range was used
incorrectly, in external modules, leading to crashes.  This can be
partly attributed to the fact that ioremap_page_range is lower-level,
with fewer protections, as compared to the other functions that an
external module would typically call.  Those include:

     ioremap_cache
     ioremap_nocache
     ioremap_prot
     ioremap_uc
     ioremap_wc
     ioremap_wt

...each of which wraps __ioremap_caller, which in turn provides a safer
way to achieve the mapping.

Therefore, stop EXPORT-ing ioremap_page_range.

Link: http://lkml.kernel.org/r/1485173220-29010-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Suggested-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Matthew Wilcox dd040b6f6d radix-tree: fix private list warnings
The newly introduced warning in radix_tree_free_nodes() was testing the
wrong variable; it should have been 'old' instead of 'node'.

Fixes: ea07b862ac ("mm: workingset: fix use-after-free in shadow node shrinker")
Link: http://lkml.kernel.org/r/20170118163746.GA32495@cmpxchg.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-24 16:26:14 -08:00
Matt Fleming 961518259b rcu: Enable RCU tracepoints by default to aid in debugging
While debugging a performance issue I needed to understand why
RCU sofitrqs were firing so frequently.

Unfortunately, the RCU callback tracepoints are hidden behind
CONFIG_RCU_TRACE which defaults to off in the upstream kernel and is
likely to also be disabled in enterprise distribution configs.

Enable it by default for CONFIG_TREE_RCU. However, we must keep it
disabled for tiny RCU, because it would otherwise pull in a large
amount of code that would make tiny RCU less than tiny.

I ran some file system metadata intensive workloads (git checkout,
FS-Mark) on a variety of machines with this patch and saw no
detectable change in performance.

Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2017-01-23 11:37:13 -08:00
Geliang Tang d852d39432 timerqueue: Use rb_entry_safe() instead of open-coding it
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/0d5cf199ac43792df0b6f7e2145545c30fa1dbbe.1482222135.git.geliangtang@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-01-20 08:03:42 +01:00
Linus Torvalds 203f80f1c4 Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fix from Konrad Rzeszutek Wilk:
 "A tiny fix to make sure that page-sized mappings are page-aligned (and
  not say straddle two pages). This is important for some drivers (such
  as NVME)"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: ensure that page-sized mappings are page-aligned
2017-01-17 09:27:50 -08:00
Nikita Yushchenko 602d9858f0 swiotlb: ensure that page-sized mappings are page-aligned
Some drivers do depend on page mappings to be page aligned.

Swiotlb already enforces such alignment for mappings greater than page,
extend that to page-sized mappings as well.

Without this fix, nvme hits BUG() in nvme_setup_prps(), because that routine
assumes page-aligned mappings.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
2017-01-15 12:37:24 -05:00
Linus Torvalds f4d3935e4f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro.

The most notable fix here is probably the fix for a splice regression
("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix a fencepost error in pipe_advance()
  coredump: Ensure proper size of sparse core files
  aio: fix lock dep warning
  tmpfs: clear S_ISGID when setting posix ACLs
2017-01-14 17:13:28 -08:00
Al Viro b9dc6f65bc fix a fencepost error in pipe_advance()
The logics in pipe_advance() used to release all buffers past the new
position failed in cases when the number of buffers to release was equal
to pipe->buffers.  If that happened, none of them had been released,
leaving pipe full.  Worse, it was trivial to trigger and we end up with
pipe full of uninitialized pages.  IOW, it's an infoleak.

Cc: stable@vger.kernel.org # v4.9
Reported-by: "Alan J. Wylie" <alan@wylie.me.uk>
Tested-by: "Alan J. Wylie" <alan@wylie.me.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-01-14 19:50:41 -05:00
Chris Wilson f2a5fec173 locking/ww_mutex: Begin kselftests for ww_mutex
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <dev@mblankhorst.nl>
Cc: Nicolai Hähnle <nhaehnle@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20161201114711.28697-4-chris@chris-wilson.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:37:14 +01:00
Sudip Mukherjee da0510c475 lib/Kconfig.debug: fix frv build failure
The build of frv allmodconfig was failing with the errors like:

  /tmp/cc0JSPc3.s: Assembler messages:
  /tmp/cc0JSPc3.s:1839: Error: symbol `.LSLT0' is already defined
  /tmp/cc0JSPc3.s:1842: Error: symbol `.LASLTP0' is already defined
  /tmp/cc0JSPc3.s:1969: Error: symbol `.LELTP0' is already defined
  /tmp/cc0JSPc3.s:1970: Error: symbol `.LELT0' is already defined

Commit 866ced950b ("kbuild: Support split debug info v4") introduced
splitting the debug info and keeping that in a separate file.  Somehow,
the frv-linux gcc did not like that and I am guessing that instead of
splitting it started copying.  The first report about this is at:

  https://lists.01.org/pipermail/kbuild-all/2015-July/010527.html.

I will try and see if this can work with frv and if still fails I will
open a bug report with gcc.  But meanwhile this is the easiest option to
solve build failure of frv.

Fixes: 866ced950b ("kbuild: Support split debug info v4")
Link: http://lkml.kernel.org/r/1482062348-5352-1-git-send-email-sudipm.mukherjee@gmail.com
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-10 18:31:55 -08:00
Johannes Weiner ea07b862ac mm: workingset: fix use-after-free in shadow node shrinker
Several people report seeing warnings about inconsistent radix tree
nodes followed by crashes in the workingset code, which all looked like
use-after-free access from the shadow node shrinker.

Dave Jones managed to reproduce the issue with a debug patch applied,
which confirmed that the radix tree shrinking indeed frees shadow nodes
while they are still linked to the shadow LRU:

  WARNING: CPU: 2 PID: 53 at lib/radix-tree.c:643 delete_node+0x1e4/0x200
  CPU: 2 PID: 53 Comm: kswapd0 Not tainted 4.10.0-rc2-think+ #3
  Call Trace:
     delete_node+0x1e4/0x200
     __radix_tree_delete_node+0xd/0x10
     shadow_lru_isolate+0xe6/0x220
     __list_lru_walk_one.isra.4+0x9b/0x190
     list_lru_walk_one+0x23/0x30
     scan_shadow_nodes+0x2e/0x40
     shrink_slab.part.44+0x23d/0x5d0
     shrink_node+0x22c/0x330
     kswapd+0x392/0x8f0

This is the WARN_ON_ONCE(!list_empty(&node->private_list)) placed in the
inlined radix_tree_shrink().

The problem is with 14b468791f ("mm: workingset: move shadow entry
tracking to radix tree exceptional tracking"), which passes an update
callback into the radix tree to link and unlink shadow leaf nodes when
tree entries change, but forgot to pass the callback when reclaiming a
shadow node.

While the reclaimed shadow node itself is unlinked by the shrinker, its
deletion from the tree can cause the left-most leaf node in the tree to
be shrunk.  If that happens to be a shadow node as well, we don't unlink
it from the LRU as we should.

Consider this tree, where the s are shadow entries:

       root->rnode
            |
       [0       n]
        |       |
     [s    ] [sssss]

Now the shadow node shrinker reclaims the rightmost leaf node through
the shadow node LRU:

       root->rnode
            |
       [0        ]
        |
    [s     ]

Because the parent of the deleted node is the first level below the
root and has only one child in the left-most slot, the intermediate
level is shrunk and the node containing the single shadow is put in
its place:

       root->rnode
            |
       [s        ]

The shrinker again sees a single left-most slot in a first level node
and thus decides to store the shadow in root->rnode directly and free
the node - which is a leaf node on the shadow node LRU.

  root->rnode
       |
       s

Without the update callback, the freed node remains on the shadow LRU,
where it causes later shrinker runs to crash.

Pass the node updater callback into __radix_tree_delete_node() in case
the deletion causes the left-most branch in the tree to collapse too.

Also add warnings when linked nodes are freed right away, rather than
wait for the use-after-free when the list is scanned much later.

Fixes: 14b468791f ("mm: workingset: move shadow entry tracking to radix tree exceptional tracking")
Reported-by: Dave Chinner <david@fromorbit.com>
Reported-by: Hugh Dickins <hughd@google.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-and-tested-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chris Leech <cleech@redhat.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@linuxonhyperv.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-01-07 18:22:40 -08:00
Linus Torvalds 2fd8774c79 Merge branch 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb
Pull swiotlb fixes from Konrad Rzeszutek Wilk:
 "This has one fix to make i915 work when using Xen SWIOTLB, and a
  feature from Geert to aid in debugging of devices that can't do DMA
  outside the 32-bit address space.

  The feature from Geert is on top of v4.10 merge window commit
  (specifically you pulling my previous branch), as his changes were
  dependent on the Documentation/ movement patches.

  I figured it would just easier than me trying than to cherry-pick the
  Documentation patches to satisfy git.

  The patches have been soaking since 12/20, albeit I updated the last
  patch due to linux-next catching an compiler error and adding an
  Tested-and-Reported-by tag"

* 'stable/for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: Export swiotlb_max_segment to users
  swiotlb: Add swiotlb=noforce debug option
  swiotlb: Convert swiotlb_force from int to enum
  x86, swiotlb: Simplify pci_swiotlb_detect_override()
2017-01-06 10:53:21 -08:00
Konrad Rzeszutek Wilk 7453c549f5 swiotlb: Export swiotlb_max_segment to users
So they can figure out what is the optimal number of pages
that can be contingously stitched together without fear of
bounce buffer.

We also expose an mechanism for sub-users of SWIOTLB API, such
as Xen-SWIOTLB to set the max segment value. And lastly
if swiotlb=force is set (which mandates we bounce buffer everything)
we set max_segment so at least we can bounce buffer one 4K page
instead of a giant 512KB one for which we may not have space.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reported-and-Tested-by: Juergen Gross <jgross@suse.com>
2017-01-06 13:00:01 -05:00
Linus Torvalds 3ddc76dfc7 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer type cleanups from Thomas Gleixner:
 "This series does a tree wide cleanup of types related to
  timers/timekeeping.

   - Get rid of cycles_t and use a plain u64. The type is not really
     helpful and caused more confusion than clarity

   - Get rid of the ktime union. The union has become useless as we use
     the scalar nanoseconds storage unconditionally now. The 32bit
     timespec alike storage got removed due to the Y2038 limitations
     some time ago.

     That leaves the odd union access around for no reason. Clean it up.

  Both changes have been done with coccinelle and a small amount of
  manual mopping up"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  ktime: Get rid of ktime_equal()
  ktime: Cleanup ktime_set() usage
  ktime: Get rid of the union
  clocksource: Use a plain u64 instead of cycle_t
2016-12-25 14:30:04 -08:00
Linus Torvalds b272f732f8 Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug notifier removal from Thomas Gleixner:
 "This is the final cleanup of the hotplug notifier infrastructure. The
  series has been reintgrated in the last two days because there came a
  new driver using the old infrastructure via the SCSI tree.

  Summary:

   - convert the last leftover drivers utilizing notifiers

   - fixup for a completely broken hotplug user

   - prevent setup of already used states

   - removal of the notifiers

   - treewide cleanup of hotplug state names

   - consolidation of state space

  There is a sphinx based documentation pending, but that needs review
  from the documentation folks"

* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/armada-xp: Consolidate hotplug state space
  irqchip/gic: Consolidate hotplug state space
  coresight/etm3/4x: Consolidate hotplug state space
  cpu/hotplug: Cleanup state names
  cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
  staging/lustre/libcfs: Convert to hotplug state machine
  scsi/bnx2i: Convert to hotplug state machine
  scsi/bnx2fc: Convert to hotplug state machine
  cpu/hotplug: Prevent overwriting of callbacks
  x86/msr: Remove bogus cleanup from the error path
  bus: arm-ccn: Prevent hotplug callback leak
  perf/x86/intel/cstate: Prevent hotplug callback leak
  ARM/imx/mmcd: Fix broken cpu hotplug handling
  scsi: qedi: Convert to hotplug state machine
2016-12-25 14:05:56 -08:00
Thomas Gleixner 2456e85535 ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
become completely pointless.

Get rid of the union and just keep ktime_t as simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
2016-12-25 17:21:22 +01:00
Thomas Gleixner 530e9b76ae cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(),
register_hotcpu_notifier(), register_cpu_notifier(),
__register_hotcpu_notifier(), __register_cpu_notifier(),
unregister_hotcpu_notifier(), unregister_cpu_notifier(),
__unregister_hotcpu_notifier(), __unregister_cpu_notifier()

are unused now. Remove them and all related code.

Remove also the now pointless cpu notifier error injection mechanism. The
states can be executed step by step and error rollback is the same as cpu
down, so any state transition can be tested w/o requiring the notifier
error injection.

Some CPU hotplug states are kept as they are (ab)used for hotplug state
tracking.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2016-12-25 10:47:43 +01:00
Linus Torvalds 7c0f6ba682 Replace <asm/uaccess.h> with <linux/uaccess.h> globally
This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-24 11:46:01 -08:00
Linus Torvalds a307d0a007 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull final vfs updates from Al Viro:
 "Assorted cleanups and fixes all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sg_write()/bsg_write() is not fit to be called under KERNEL_DS
  ufs: fix function declaration for ufs_truncate_blocks
  fs: exec: apply CLOEXEC before changing dumpable task flags
  seq_file: reset iterator to first record for zero offset
  vfs: fix isize/pos/len checks for reflink & dedupe
  [iov_iter] fix iterate_all_kinds() on empty iterators
  move aio compat to fs/aio.c
  reorganize do_make_slave()
  clone_private_mount() doesn't need to touch namespace_sem
  remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()
2016-12-23 10:52:43 -08:00
Al Viro 33844e6651 [iov_iter] fix iterate_all_kinds() on empty iterators
Problem similar to ones dealt with in "fold checks into iterate_and_advance()"
and followups, except that in this case we really want to do nothing when
asked for zero-length operation - unlike zero-length iterate_and_advance(),
zero-length iterate_all_kinds() has no side effects, and callers are simpler
that way.

That got exposed when copy_from_iter_full() had been used by tipc, which
builds an msghdr with zero payload and (now) feeds it to a primitive
based on iterate_all_kinds() instead of iterate_and_advance().

Reported-by: Jon Maloy <jon.maloy@ericsson.com>
Tested-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22 23:00:22 -05:00
Borislav Petkov 50f4d9bda9 printk: fix typo in CONSOLE_LOGLEVEL_DEFAULT help text
s/prink/printk/

Link: http://lkml.kernel.org/r/20161215170111.19075-1-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Olof Johansson <olof@lixom.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-20 09:48:47 -08:00
Geert Uytterhoeven fff5d99225 swiotlb: Add swiotlb=noforce debug option
On architectures like arm64, swiotlb is tied intimately to the core
architecture DMA support. In addition, ZONE_DMA cannot be disabled.

To aid debugging and catch devices not supporting DMA to memory outside
the 32-bit address space, add a kernel command line option
"swiotlb=noforce", which disables the use of bounce buffers.
If specified, trying to map memory that cannot be used with DMA will
fail, and a rate-limited warning will be printed.

Note that io_tlb_nslabs is set to 1, which is the minimal supported
value.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-12-19 09:05:20 -05:00
Geert Uytterhoeven ae7871be18 swiotlb: Convert swiotlb_force from int to enum
Convert the flag swiotlb_force from an int to an enum, to prepare for
the advent of more possible values.

Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2016-12-19 09:05:20 -05:00
Linus Torvalds 9a19a6db37 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs updates from Al Viro:

 - more ->d_init() stuff (work.dcache)

 - pathname resolution cleanups (work.namei)

 - a few missing iov_iter primitives - copy_from_iter_full() and
   friends. Either copy the full requested amount, advance the iterator
   and return true, or fail, return false and do _not_ advance the
   iterator. Quite a few open-coded callers converted (and became more
   readable and harder to fuck up that way) (work.iov_iter)

 - several assorted patches, the big one being logfs removal

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  logfs: remove from tree
  vfs: fix put_compat_statfs64() does not handle errors
  namei: fold should_follow_link() with the step into not-followed link
  namei: pass both WALK_GET and WALK_MORE to should_follow_link()
  namei: invert WALK_PUT logics
  namei: shift interpretation of LOOKUP_FOLLOW inside should_follow_link()
  namei: saner calling conventions for mountpoint_last()
  namei.c: get rid of user_path_parent()
  switch getfrag callbacks to ..._full() primitives
  make skb_add_data,{_nocache}() and skb_copy_to_page_nocache() advance only on success
  [iov_iter] new primitives - copy_from_iter_full() and friends
  don't open-code file_inode()
  ceph: switch to use of ->d_init()
  ceph: unify dentry_operations instances
  lustre: switch to use of ->d_init()
2016-12-16 10:24:44 -08:00
Matthew Wilcox b9a0deb96b redo: radix tree test suite: fix compilation
[ This resurrects commit 53855d10f4, which was reverted in
  2b41226b39.  It depended on commit d544abd5ff ("lib/radix-tree:
  Convert to hotplug state machine") so now it is correct to apply ]

Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test
suite as it adds a call to cpuhp_setup_state_nocalls() which is not
currently emulated in the test suite.  Add it, and delete the emulation
of the old CPU hotplug mechanism.

Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-15 11:04:20 -08:00
Matthew Wilcox e8de434076 radix-tree: ensure counts are initialised
radix_tree_join() was freeing nodes with a non-zero ->exceptional count,
and radix_tree_split() wasn't zeroing ->exceptional when it allocated
the new node.  Fix this by making all callers of radix_tree_node_alloc()
pass in the new counts (and some other always-initialised fields), which
will prevent the problem recurring if in future we decide to do
something similar.

Link: http://lkml.kernel.org/r/1481667692-14500-3-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox a90eb3a2a4 radix-tree: fix replacement for multiorder entries
When replacing an entry with NULL, we need to delete any sibling
entries.  Also account deleting exceptional entries properly.  Also fix
a bug with radix_tree_iter_replace() where we would fail to remove
entirely freed nodes.  Also fix accounting bug when switching between
normal and exceptional entries with replace_slot.  Also add testcases
for all these bugs.

Link: http://lkml.kernel.org/r/1480369871-5271-61-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 2791653a68 radix-tree: add radix_tree_split_preload()
Calculate how many nodes we need to allocate to split an old_order entry
into multiple entries, each of size new_order.  The test suite checks
that we allocated exactly the right number of nodes; neither too many
(checked by rtp->nr == 0), nor too few (checked by comparing
nr_allocated before and after the call to radix_tree_split()).

Link: http://lkml.kernel.org/r/1480369871-5271-60-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox e157b55594 radix-tree: add radix_tree_split
This new function splits a larger multiorder entry into smaller entries
(potentially multi-order entries).  These entries are initialised to
RADIX_TREE_RETRY to ensure that RCU walkers who see this state aren't
confused.  The caller should then call radix_tree_for_each_slot() and
radix_tree_replace_slot() in order to turn these retry entries into the
intended new entries.  Tags are replicated from the original multiorder
entry into each new entry.

Link: http://lkml.kernel.org/r/1480369871-5271-59-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 175542f575 radix-tree: add radix_tree_join
This new function allows for the replacement of many smaller entries in
the radix tree with one larger multiorder entry.  From the point of view
of an RCU walker, they may see a mixture of the smaller entries and the
large entry during the same walk, but they will never see NULL for an
index which was populated before the join.

Link: http://lkml.kernel.org/r/1480369871-5271-58-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 268f42de71 radix-tree: delete radix_tree_range_tag_if_tagged()
This is an exceptionally complicated function with just one caller
(tag_pages_for_writeback).  We devote a large portion of the runtime of
the test suite to testing this one function which has one caller.  By
introducing the new function radix_tree_iter_tag_set(), we can eliminate
all of the complexity while keeping the performance.  The caller can now
use a fairly standard radix_tree_for_each() loop, and it doesn't need to
worry about tricksy things like 'start' wrapping.

The test suite continues to spend a large amount of time investigating
this function, but now it's testing the underlying primitives such as
radix_tree_iter_resume() and the radix_tree_for_each_tagged() iterator
which are also used by other parts of the kernel.

Link: http://lkml.kernel.org/r/1480369871-5271-57-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 478922e2b0 radix-tree: delete radix_tree_locate_item()
This rather complicated function can be better implemented as an
iterator.  It has only one caller, so move the functionality to the only
place that needs it.  Update the test suite to follow the same pattern.

Link: http://lkml.kernel.org/r/1480369871-5271-56-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 148deab223 radix-tree: improve multiorder iterators
This fixes several interlinked problems with the iterators in the
presence of multiorder entries.

1. radix_tree_iter_next() would only advance by one slot, which would
   result in the iterators returning the same entry more than once if
   there were sibling entries.

2. radix_tree_next_slot() could return an internal pointer instead of
   a user pointer if a tagged multiorder entry was immediately followed by
   an entry of lower order.

3. radix_tree_next_slot() expanded to a lot more code than it used to
   when multiorder support was compiled in.  And I wasn't comfortable with
   entry_to_node() being in a header file.

Fixing radix_tree_iter_next() for the presence of sibling entries
necessarily involves examining the contents of the radix tree, so we now
need to pass 'slot' to radix_tree_iter_next(), and we need to change the
calling convention so it is called *before* dropping the lock which
protects the tree.  Also rename it to radix_tree_iter_resume(), as some
people thought it was necessary to call radix_tree_iter_next() each time
around the loop.

radix_tree_next_slot() becomes closer to how it looked before multiorder
support was introduced.  It only checks to see if the next entry in the
chunk is a sibling entry or a pointer to a node; this should be rare
enough that handling this case out of line is not a performance impact
(and such impact is amortised by the fact that the entry we just
processed was a multiorder entry).  Also, radix_tree_next_slot() used to
force a new chunk lookup for untagged entries, which is more expensive
than the out of line sibling entry skipping.

Link: http://lkml.kernel.org/r/1480369871-5271-55-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 218ed7503a radix-tree: improve dump output
Print the indices of the entries as unsigned (instead of signed)
integers and print the parent node of each entry to help navigate around
larger trees where the layout is not quite so obvious.  Print the
indices covered by a node.  Rearrange the order of fields printed so the
indices and parents line up for each type of entry.

Link: http://lkml.kernel.org/r/1480369871-5271-53-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox bc412fca6e radix-tree: make radix_tree_find_next_bit more useful
Since this function is specialised to the radix tree, pass in the node
and tag to calculate the address of the bitmap in
radix_tree_find_next_bit() instead of the caller.  Likewise, there is no
need to pass in the size of the bitmap.

Link: http://lkml.kernel.org/r/1480369871-5271-52-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 9498d2bb34 radix-tree: create node_tag_set()
Similar to node_tag_clear(), factor node_tag_set() out of
radix_tree_range_tag_if_tagged().

Link: http://lkml.kernel.org/r/1480369871-5271-51-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 91d9c05ac6 radix-tree: move rcu_head into a union with private_list
I want to be able to reference node->parent after freeing node.

Currently node->parent is in a union with rcu_head, so it is overwritten
when the node is put on the RCU list.  We know that private_list is not
referenced after the node is freed, so it is safe for these two members
to share space.

Link: http://lkml.kernel.org/r/1480369871-5271-50-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Matthew Wilcox 91b9677c4c radix-tree: fix typo
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:10 -08:00
Andreas Platschek 0462554707 Kconfig: lib/Kconfig.ubsan fix reference to ubsan documentation
Documenation/ubsan.txt was moved to Documentation/dev-tools/ubsan.rst,
this fixes the reference.

Link: http://lkml.kernel.org/r/1476698152-29340-3-git-send-email-andreas.platschek@opentech.at
Signed-off-by: Andreas Platschek <andreas.platschek@opentech.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-14 16:04:08 -08:00