Commit Graph

692708 Commits

Author SHA1 Message Date
Byungchul Park b09be676e0 locking/lockdep: Implement the 'crossrelease' feature
Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it does not report just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context in
which they were acquired. However, synchronization primitives like page
locks or completions, which are allowed to be released in any context,
also create dependencies and can cause a deadlock.

So lockdep should track these locks to do a better job. The 'crossrelease'
implementation makes these primitives also be tracked.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:07 +02:00
Byungchul Park ce07a9415f locking/lockdep: Make check_prev_add() able to handle external stack_trace
Currently, a space for stack_trace is pinned in check_prev_add(), that
makes us not able to use external stack_trace. The simplest way to
achieve it is to pass an external stack_trace as an argument.

A more suitable solution is to pass a callback additionally along with
a stack_trace so that callers can decide the way to save or whether to
save. Actually crossrelease needs to do other than saving a stack_trace.
So pass a stack_trace and callback to handle it, to check_prev_add().

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-5-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:06 +02:00
Byungchul Park 70911fdc95 locking/lockdep: Change the meaning of check_prev_add()'s return value
Firstly, return 1 instead of 2 when 'prev -> next' dependency already
exists. Since the value 2 is not referenced anywhere, just return 1
indicating success in this case.

Secondly, return 2 instead of 1 when successfully added a lock_list
entry with saving stack_trace. With that, a caller can decide whether
to avoid redundant save_trace() on the caller site.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-4-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:06 +02:00
Byungchul Park 49347a986a locking/lockdep: Add a function building a chain between two classes
Crossrelease needs to build a chain between two classes regardless of
their contexts. However, add_chain_cache() cannot be used for that
purpose since it assumes that it's called in the acquisition context
of the hlock. So this patch introduces a new function doing it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-3-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:05 +02:00
Byungchul Park 545c23f2e9 locking/lockdep: Refactor lookup_chain_cache()
Currently, lookup_chain_cache() provides both 'lookup' and 'add'
functionalities in a function. However, each is useful. So this
patch makes lookup_chain_cache() only do 'lookup' functionality and
makes add_chain_cahce() only do 'add' functionality. And it's more
readable than before.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:05 +02:00
Peter Zijlstra ae813308f4 locking/lockdep: Avoid creating redundant links
Two boots + a make defconfig, the first didn't have the redundant bit
in, the second did:

 lock-classes:                         1168       1169 [max: 8191]
 direct dependencies:                  7688       5812 [max: 32768]
 indirect dependencies:               25492      25937
 all direct dependencies:            220113     217512
 dependency chains:                    9005       9008 [max: 65536]
 dependency chain hlocks:             34450      34366 [max: 327680]
 in-hardirq chains:                      55         51
 in-softirq chains:                     371        378
 in-process chains:                    8579       8579
 stack-trace entries:                108073      88474 [max: 524288]
 combined max dependencies:       178738560  169094640

 max locking depth:                      15         15
 max bfs queue depth:                   320        329

 cyclic checks:                        9123       9190

 redundant checks:                                5046
 redundant links:                                 1828

 find-mask forwards checks:            2564       2599
 find-mask backwards checks:          39521      39789

So it saves nearly 2k links and a fair chunk of stack-trace entries, but
as expected, makes no real difference on the indirect dependencies.

At the same time, you see the max BFS depth increase, which is also
expected, although it could easily be boot variance -- these numbers are
not entirely stable between boots.

The down side is that the cycles in the graph become larger and thus
the reports harder to read.

XXX: do we want this as a CONFIG variable, implied by LOCKDEP_SMALL?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: iamjoonsoo.kim@lge.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Link: http://lkml.kernel.org/r/20170303091338.GH6536@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:04 +02:00
Peter Zijlstra d92a8cfcb3 locking/lockdep: Rework FS_RECLAIM annotation
A while ago someone, and I cannot find the email just now, asked if we
could not implement the RECLAIM_FS inversion stuff with a 'fake' lock
like we use for other things like workqueues etc. I think this should
be possible which allows reducing the 'irq' states and will reduce the
amount of __bfs() lookups we do.

Removing the 1 IRQ state results in 4 less __bfs() walks per
dependency, improving lockdep performance. And by moving this
annotation out of the lockdep code it becomes easier for the mm people
to extend.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nikolay Borisov <nborisov@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: iamjoonsoo.kim@lge.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:03 +02:00
Peter Zijlstra a9668cd6ee locking: Remove smp_mb__before_spinlock()
Now that there are no users of smp_mb__before_spinlock() left, remove
it entirely.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:03 +02:00
Peter Zijlstra d89e588ca4 locking: Introduce smp_mb__after_spinlock()
Since its inception, our understanding of ACQUIRE, esp. as applied to
spinlocks, has changed somewhat. Also, I wonder if, with a simple
change, we cannot make it provide more.

The problem with the comment is that the STORE done by spin_lock isn't
itself ordered by the ACQUIRE, and therefore a later LOAD can pass over
it and cross with any prior STORE, rendering the default WMB
insufficient (pointed out by Alan).

Now, this is only really a problem on PowerPC and ARM64, both of
which already defined smp_mb__before_spinlock() as a smp_mb().

At the same time, we can get a much stronger construct if we place
that same barrier _inside_ the spin_lock(). In that case we upgrade
the RCpc spinlock to an RCsc.  That would make all schedule() calls
fully transitive against one another.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:02 +02:00
Peter Zijlstra ff7a5fb0f1 overlayfs, locking: Remove smp_mb__before_spinlock() usage
While we could replace the smp_mb__before_spinlock() with the new
smp_mb__after_spinlock(), the normal pattern is to use
smp_store_release() to publish an object that is used for
lockless_dereference() -- and mirrors the regular rcu_assign_pointer()
/ rcu_dereference() patterns.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:02 +02:00
Peter Zijlstra 8b1b436dd1 mm, locking: Rework {set,clear,mm}_tlb_flush_pending()
Commit:

  af2c1401e6 ("mm: numa: guarantee that tlb_flush_pending updates are visible before page table updates")

added smp_mb__before_spinlock() to set_tlb_flush_pending(). I think we
can solve the same problem without this barrier.

If instead we mandate that mm_tlb_flush_pending() is used while
holding the PTL we're guaranteed to observe prior
set_tlb_flush_pending() instances.

For this to work we need to rework migrate_misplaced_transhuge_page()
a little and move the test up into do_huge_pmd_numa_page().

NOTE: this relies on flush_tlb_range() to guarantee:

   (1) it ensures that prior page table updates are visible to the
       page table walker and
   (2) it ensures that subsequent memory accesses are only made
       visible after the invalidation has completed

This is required for architectures that implement TRANSPARENT_HUGEPAGE
(arc, arm, arm64, mips, powerpc, s390, sparc, x86) or otherwise use
mm_tlb_flush_pending() in their page-table operations (arm, arm64,
x86).

This appears true for:

 - arm (DSB ISB before and after),
 - arm64 (DSB ISHST before, and DSB ISH after),
 - powerpc (PTESYNC before and after),
 - s390 and x86 TLB invalidate are serializing instructions

But I failed to understand the situation for:

 - arc, mips, sparc

Now SPARC64 is a wee bit special in that flush_tlb_range() is a no-op
and it flushes the TLBs using arch_{enter,leave}_lazy_mmu_mode()
inside the PTL. It still needs to guarantee the PTL unlock happens
_after_ the invalidate completes.

Vineet, Ralf and Dave could you guys please have a look?

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:01 +02:00
Peter Zijlstra 706eeb3e9c Documentation/locking/atomic: Add documents for new atomic_t APIs
Since we've vastly expanded the atomic_t interface in recent years the
existing documentation is woefully out of date and people seem to get
confused a bit.

Start a new document to hopefully better explain the current state of
affairs.

The old atomic_ops.txt also covers bitmaps and a few more details so
this is not a full replacement and we'll therefore keep that document
around until such a time that we've managed to write more text to cover
its entire.

Also please, ReST people, go away.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:00 +02:00
Marc Zyngier 450f9689f2 clocksource/arm_arch_timer: Use static_branch_enable_cpuslocked()
Use the new static_branch_enable_cpuslocked() function to switch
the workaround static key on the CPU hotplug path.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20170801080257.5056-5-marc.zyngier@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:00 +02:00
Marc Zyngier 5a40527f8f jump_label: Provide hotplug context variants
As using the normal static key API under the hotplug lock is
pretty much impossible, let's provide a variant of some of them
that require the hotplug lock to have already been taken.

These function are only meant to be used in CPU hotplug callbacks.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20170801080257.5056-4-marc.zyngier@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:59 +02:00
Marc Zyngier 8b7b412807 jump_label: Split out code under the hotplug lock
In order to later introduce an "already locked" version of some
of the static key funcions, let's split the code into the core stuff
(the *_cpuslocked functions) and the usual helpers, which now
take/release the hotplug lock and call into the _cpuslocked
versions.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20170801080257.5056-3-marc.zyngier@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:58 +02:00
Marc Zyngier b70cecf4b6 jump_label: Move CPU hotplug locking
As we're about to rework the locking, let's move the taking and
release of the CPU hotplug lock to locations that will make its
reworking completely obvious.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/20170801080257.5056-2-marc.zyngier@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:58 +02:00
Peter Zijlstra d0646a6f55 jump_label: Add RELEASE barrier after text changes
In the unlikely case text modification does not fully order things,
add some extra ordering of our own to ensure we only enabled the fast
path after all text is visible.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:57 +02:00
Paolo Bonzini be040bea90 cpuset: Make nr_cpusets private
Any use of key->enabled (that is static_key_enabled and static_key_count)
outside jump_label_lock should handle its own serialization.  In the case
of cpusets_enabled_key, the key is always incremented/decremented under
cpuset_mutex, and hence the same rule applies to nr_cpusets.  The rule
*is* respected currently, but the mutex is static so nr_cpusets should
be static too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1501601046-35683-4-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:57 +02:00
Paolo Bonzini 7a34bcb8b2 jump_label: Do not use unserialized static_key_enabled()
Any use of key->enabled (that is static_key_enabled and static_key_count)
outside jump_label_lock should handle its own serialization.  The only
two that are not doing so are the UDP encapsulation static keys.  Change
them to use static_key_enable, which now correctly tests key->enabled under
the jump label lock.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1501601046-35683-3-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:56 +02:00
Paolo Bonzini 1dbb6704de jump_label: Fix concurrent static_key_enable/disable()
static_key_enable/disable are trying to cap the static key count to
0/1.  However, their use of key->enabled is outside jump_label_lock
so they do not really ensure that.

Rewrite them to do a quick check for an already enabled (respectively,
already disabled), and then recheck under the jump label lock.  Unlike
static_key_slow_inc/dec, a failed check under the jump label lock does
not modify key->enabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1501601046-35683-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:56 +02:00
Kirill Tkhai 83ced169d9 locking/rwsem-xadd: Add killable versions of rwsem_down_read_failed()
Rename rwsem_down_read_failed() in __rwsem_down_read_failed_common()
and teach it to abort waiting in case of pending signals and killable
state argument passed.

Note, that we shouldn't wake anybody up in EINTR path, as:

We check for (waiter.task) under spinlock before we go to out_nolock
path. Current task wasn't able to be woken up, so there are
a writer, owning the sem, or a writer, which is the first waiter.
In the both cases we shouldn't wake anybody. If there is a writer,
owning the sem, and we were the only waiter, remove RWSEM_WAITING_BIAS,
as there are no waiters anymore.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/149789534632.9059.2901382369609922565.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:55 +02:00
Kirill Tkhai 0aa1125fa8 locking/rwsem-spinlock: Add killable versions of __down_read()
Rename __down_read() in __down_read_common() and teach it
to abort waiting in case of pending signals and killable
state argument passed.

Note, that we shouldn't wake anybody up in EINTR path, as:

We check for signal_pending_state() after (!waiter.task)
test and under spinlock. So, current task wasn't able to
be woken up. It may be in two cases: a writer is owner
of the sem, or a writer is a first waiter of the sem.

If a writer is owner of the sem, no one else may work
with it in parallel. It will wake somebody, when it
call up_write() or downgrade_write().

If a writer is the first waiter, it will be woken up,
when the last active reader releases the sem, and
sem->count became 0.

Also note, that set_current_state() may be moved down
to schedule() (after !waiter.task check), as all
assignments in this type of semaphore (including wake_up),
occur under spinlock, so we can't miss anything.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: arnd@arndb.de
Cc: avagin@virtuozzo.com
Cc: davem@davemloft.net
Cc: fenghua.yu@intel.com
Cc: gorcunov@virtuozzo.com
Cc: heiko.carstens@de.ibm.com
Cc: hpa@zytor.com
Cc: ink@jurassic.park.msu.ru
Cc: mattst88@gmail.com
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/149789533283.9059.9829416940494747182.stgit@localhost.localdomain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:55 +02:00
Prateek Sood 50972fe78f locking/osq_lock: Fix osq_lock queue corruption
Fix ordering of link creation between node->prev and prev->next in
osq_lock(). A case in which the status of optimistic spin queue is
CPU6->CPU2 in which CPU6 has acquired the lock.

        tail
          v
  ,-. <- ,-.
  |6|    |2|
  `-' -> `-'

At this point if CPU0 comes in to acquire osq_lock, it will update the
tail count.

  CPU2			CPU0
  ----------------------------------

				       tail
				         v
			  ,-. <- ,-.    ,-.
			  |6|    |2|    |0|
			  `-' -> `-'    `-'

After tail count update if CPU2 starts to unqueue itself from
optimistic spin queue, it will find an updated tail count with CPU0 and
update CPU2 node->next to NULL in osq_wait_next().

  unqueue-A

	       tail
	         v
  ,-. <- ,-.    ,-.
  |6|    |2|    |0|
  `-'    `-'    `-'

  unqueue-B

  ->tail != curr && !node->next

If reordering of following stores happen then prev->next where prev
being CPU2 would be updated to point to CPU0 node:

				       tail
				         v
			  ,-. <- ,-.    ,-.
			  |6|    |2|    |0|
			  `-'    `-' -> `-'

  osq_wait_next()
    node->next <- 0
    xchg(node->next, NULL)

	       tail
	         v
  ,-. <- ,-.    ,-.
  |6|    |2|    |0|
  `-'    `-'    `-'

  unqueue-C

At this point if next instruction
	WRITE_ONCE(next->prev, prev);
in CPU2 path is committed before the update of CPU0 node->prev = prev then
CPU0 node->prev will point to CPU6 node.

	       tail
    v----------. v
  ,-. <- ,-.    ,-.
  |6|    |2|    |0|
  `-'    `-'    `-'
     `----------^

At this point if CPU0 path's node->prev = prev is committed resulting
in change of CPU0 prev back to CPU2 node. CPU2 node->next is NULL
currently,

				       tail
			                 v
			  ,-. <- ,-. <- ,-.
			  |6|    |2|    |0|
			  `-'    `-'    `-'
			     `----------^

so if CPU0 gets into unqueue path of osq_lock it will keep spinning
in infinite loop as condition prev->next == node will never be true.

Signed-off-by: Prateek Sood <prsood@codeaurora.org>
[ Added pictures, rewrote comments. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sramana@codeaurora.org
Link: http://lkml.kernel.org/r/1500040076-27626-1-git-send-email-prsood@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:54 +02:00
Peter Zijlstra 9d664c0aec locking/atomic: Fix atomic_set_release() for 'funny' architectures
Those architectures that have a special atomic_set implementation also
need a special atomic_set_release(), because for the very same reason
WRITE_ONCE() is broken for them, smp_store_release() is too.

The vast majority is architectures that have spinlock hash based atomic
implementation except hexagon which seems to have a hardware 'feature'.

The spinlock based atomics should be SC, that is, none of them appear to
place extra barriers in atomic_cmpxchg() or any of the other SC atomic
primitives and therefore seem to rely on their spinlock implementation
being SC (I did not fully validate all that).

Therefore, the normal atomic_set() is SC and can be used at
atomic_set_release().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: davem@davemloft.net
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: rkuo@codeaurora.org
Cc: vgupta@synopsys.com
Link: http://lkml.kernel.org/r/20170609110506.yod47flaav3wgoj5@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:54 +02:00
Boqun Feng 35a2897c2a sched/wait: Remove the lockless swait_active() check in swake_up*()
Steven Rostedt reported a potential race in RCU core because of
swake_up():

        CPU0                            CPU1
        ----                            ----
                                __call_rcu_core() {

                                 spin_lock(rnp_root)
                                 need_wake = __rcu_start_gp() {
                                  rcu_start_gp_advanced() {
                                   gp_flags = FLAG_INIT
                                  }
                                 }

 rcu_gp_kthread() {
   swait_event_interruptible(wq,
        gp_flags & FLAG_INIT) {
   spin_lock(q->lock)

                                *fetch wq->task_list here! *

   list_add(wq->task_list, q->task_list)
   spin_unlock(q->lock);

   *fetch old value of gp_flags here *

                                 spin_unlock(rnp_root)

                                 rcu_gp_kthread_wake() {
                                  swake_up(wq) {
                                   swait_active(wq) {
                                    list_empty(wq->task_list)

                                   } * return false *

  if (condition) * false *
    schedule();

In this case, a wakeup is missed, which could cause the rcu_gp_kthread
waits for a long time.

The reason of this is that we do a lockless swait_active() check in
swake_up(). To fix this, we can either 1) add a smp_mb() in swake_up()
before swait_active() to provide the proper order or 2) simply remove
the swait_active() in swake_up().

The solution 2 not only fixes this problem but also keeps the swait and
wait API as close as possible, as wake_up() doesn't provide a full
barrier and doesn't do a lockless check of the wait queue either.
Moreover, there are users already using swait_active() to do their quick
checks for the wait queues, so it make less sense that swake_up() and
swake_up_all() do this on their own.

This patch then removes the lockless swait_active() check in swake_up()
and swake_up_all().

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170615041828.zk3a3sfyudm5p6nl@tardis
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:28:53 +02:00
Ingo Molnar 388f8e1273 Merge branch 'linus' into locking/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:20:53 +02:00
Linus Torvalds 8d31f80eb3 Pin control fixes for the v4.13 cycle:
- Fix the documentation build as the docs were moved.
 
 - Correct the UART pin list on the Intell Merrifield.
 
 - Fix pin assignment and number of pins on the Marvell Armada
   37xx pin controller.
 
 - Cover the Setzer models in the Chromebook DMI quirk in the
   Intel cheryview driver so they start working.
 
 - Add the missing "sim" function to the sunxi driver.
 
 - Fix USB pin definitions on Uniphier Pro4.
 
 - Smatch fix for invalid reference in the zx pin control driver.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZi3joAAoJEEEQszewGV1zVgoQAKejQg9435pCxXVqdKW2319t
 RHj4w3qpA8+nIx5qT2GGnxD68/O3RnApB88XBo7ySKQnOVnY2aSQmRWRgJc6jKol
 9NYaBHO5h7DUZFNe3GQWq9xiCYryRoaDOyPbuOq99q0fhJLDUnLxxPs7xtEr4+1i
 ryni3fujFYDczWDthugaGdTaHsjapl2BM+mL5ZkBDAoS3cNNt3nlsmYy17DHPN6Q
 CLfTBVWgbEI37YxjGu3XJdyArYRijEHei8RmRJnWvezrRzHLMkq1NZnpOlkajail
 8iW7uvX3af3tQNaAeXdD2WcF9+UmIfFUwY1SDDBMWk+vAzljRiSh5Ug+HuHJO2fq
 eQQ/l/nTVFQPKEztFHWWfe6y/ovw49dqhC8E12mkYVrp8+ciaknUUC7xLLNTMVBM
 HRawsHtuHxEV6ziK0JHfOwQQdhu/zD3CbIE7s8CEcd+juFrStR95TGHNsd03dH9q
 D7/GBrmrHs06Pd8IZ/xdpe3yGhdrd4qknnl5vkDYE5p7EdjtGmYAXR7LARdaDji2
 zzs8Hd8Rs0Dam28+5nWVl+aL2vAXMXmNSj28Pp52r4oPl6+61IFxlYJAsVmIJQwd
 o6+mMmb2euS664QYG3KvORmlKj+Bw0ii0NTxFmlDI+r8uu1vM4JM3HGh5zmJ2FQD
 umL8zl1KUPEUgejAYqug
 =Dyh5
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:
 "These are the pin control fixes I have gathered since the return from
  my vacation. They boiled in -next a while so let's get them in.

  Apart from the documentation build it is purely driver fixes. Which is
  nice. The Intel fixes seem kind of important.

   - Fix the documentation build as the docs were moved

   - Correct the UART pin list on the Intel Merrifield

   - Fix pin assignment and number of pins on the Marvell Armada 37xx
     pin controller

   - Cover the Setzer models in the Chromebook DMI quirk in the Intel
     cheryview driver so they start working

   - Add the missing "sim" function to the sunxi driver

   - Fix USB pin definitions on Uniphier Pro4

   - Smatch fix for invalid reference in the zx pin control driver"

* tag 'pinctrl-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: generic: update references to Documentation/pinctrl.txt
  pinctrl: intel: merrifield: Correct UART pin lists
  pinctrl: armada-37xx: Fix number of pin in south bridge
  pinctrl: armada-37xx: Fix the pin 23 on south bridge
  pinctrl: cherryview: Add Setzer models to the Chromebook DMI quirk
  pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
  pinctrl: uniphier: fix USB3 pin assignment for Pro4
  pinctrl: zte: fix dereference of 'data' in zx_set_mux()
2017-08-09 14:30:34 -07:00
Mel Gorman 48fb6f4db9 futex: Remove unnecessary warning from get_futex_key
Commit 65d8fc777f ("futex: Remove requirement for lock_page() in
get_futex_key()") removed an unnecessary lock_page() with the
side-effect that page->mapping needed to be treated very carefully.

Two defensive warnings were added in case any assumption was missed and
the first warning assumed a correct application would not alter a
mapping backing a futex key.  Since merging, it has not triggered for
any unexpected case but Mark Rutland reported the following bug
triggering due to the first warning.

  kernel BUG at kernel/futex.c:679!
  Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
  Modules linked in:
  CPU: 0 PID: 3695 Comm: syz-executor1 Not tainted 4.13.0-rc3-00020-g307fec773ba3 #3
  Hardware name: linux,dummy-virt (DT)
  task: ffff80001e271780 task.stack: ffff000010908000
  PC is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
  LR is at get_futex_key+0x6a4/0xcf0 kernel/futex.c:679
  pc : [<ffff00000821ac14>] lr : [<ffff00000821ac14>] pstate: 80000145

The fact that it's a bug instead of a warning was due to an unrelated
arm64 problem, but the warning itself triggered because the underlying
mapping changed.

This is an application issue but from a kernel perspective it's a
recoverable situation and the warning is unnecessary so this patch
removes the warning.  The warning may potentially be triggered with the
following test program from Mark although it may be necessary to adjust
NR_FUTEX_THREADS to be a value smaller than the number of CPUs in the
system.

    #include <linux/futex.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define NR_FUTEX_THREADS 16
    pthread_t threads[NR_FUTEX_THREADS];

    void *mem;

    #define MEM_PROT  (PROT_READ | PROT_WRITE)
    #define MEM_SIZE  65536

    static int futex_wrapper(int *uaddr, int op, int val,
                             const struct timespec *timeout,
                             int *uaddr2, int val3)
    {
        syscall(SYS_futex, uaddr, op, val, timeout, uaddr2, val3);
    }

    void *poll_futex(void *unused)
    {
        for (;;) {
            futex_wrapper(mem, FUTEX_CMP_REQUEUE_PI, 1, NULL, mem + 4, 1);
        }
    }

    int main(int argc, char *argv[])
    {
        int i;

        mem = mmap(NULL, MEM_SIZE, MEM_PROT,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        printf("Mapping @ %p\n", mem);

        printf("Creating futex threads...\n");

        for (i = 0; i < NR_FUTEX_THREADS; i++)
            pthread_create(&threads[i], NULL, poll_futex, NULL);

        printf("Flipping mapping...\n");
        for (;;) {
            mmap(mem, MEM_SIZE, MEM_PROT,
                 MAP_FIXED | MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        }

        return 0;
    }

Reported-and-tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-09 14:00:54 -07:00
Linus Torvalds 358f8c26b1 Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
 "The main thing is to allow empty id_tables for ACPI to make some
  drivers get probed again. It looks a bit bigger than usual because it
  needs some internal renaming, too.

  Other than that, there is a fix for broken DSTDs, a super simple
  enablement for ARM MPS, and two documentation fixes which I'd like to
  see in v4.13 already"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: rephrase explanation of I2C_CLASS_DEPRECATED
  i2c: allow i2c-versatile for ARM MPS platforms
  i2c: designware: Some broken DSTDs use 1MiHz instead of 1MHz
  i2c: designware: Print clock freq on invalid clock freq error
  i2c: core: Allow empty id_table in ACPI case as well
  i2c: mux: pinctrl: mention correct module name in Kconfig help text
2017-08-09 13:21:28 -07:00
Linus Torvalds 31cf92f3dd Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Three patches that should go into this release.

  Two of them are from Paolo and fix up some corner cases with BFQ, and
  the last patch is from Ming and fixes up a potential usage count
  imbalance regression due to the recent NOWAIT work"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: don't leak preempt counter/q_usage_counter when allocating rq failed
  block, bfq: consider also in_service_entity to state whether an entity is active
  block, bfq: reset in_service_entity if it becomes idle
2017-08-09 10:37:35 -07:00
Linus Torvalds d555eb6b36 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "Fix two regressions in the inside-secure driver with respect to
  hmac(sha1)"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: inside-secure - fix the sha state length in hmac_sha1_setkey
  crypto: inside-secure - fix invalidation check in hmac_sha1_setkey
2017-08-09 10:33:49 -07:00
Linus Torvalds 4530cca198 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "The pull requests are getting smaller, that's progress I suppose :-)

   1) Fix infinite loop in CIPSO option parsing, from Yujuan Qi.

   2) Fix remote checksum handling in VXLAN and GUE tunneling drivers,
      from Koichiro Den.

   3) Missing u64_stats_init() calls in several drivers, from Florian
      Fainelli.

   4) TCP can set the congestion window to an invalid ssthresh value
      after congestion window reductions, from Yuchung Cheng.

   5) Fix BPF jit branch generation on s390, from Daniel Borkmann.

   6) Correct MIPS ebpf JIT merge, from David Daney.

   7) Correct byte order test in BPF test_verifier.c, from Daniel
      Borkmann.

   8) Fix various crashes and leaks in ASIX driver, from Dean Jenkins.

   9) Handle SCTP checksums properly in mlx4 driver, from Davide
      Caratti.

  10) We can potentially enter tcp_connect() with a cached route
      already, due to fastopen, so we have to explicitly invalidate it.

  11) skb_warn_bad_offload() can bark in legitimate situations, fix from
      Willem de Bruijn"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  net: avoid skb_warn_bad_offload false positives on UFO
  qmi_wwan: fix NULL deref on disconnect
  ppp: fix xmit recursion detection on ppp channels
  rds: Reintroduce statistics counting
  tcp: fastopen: tcp_connect() must refresh the route
  net: sched: set xt_tgchk_param par.net properly in ipt_init_target
  net: dsa: mediatek: add adjust link support for user ports
  net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
  qed: Fix a memory allocation failure test in 'qed_mcp_cmd_init()'
  hysdn: fix to a race condition in put_log_buffer
  s390/qeth: fix L3 next-hop in xmit qeth hdr
  asix: Fix small memory leak in ax88772_unbind()
  asix: Ensure asix_rx_fixup_info members are all reset
  asix: Add rx->ax_skb = NULL after usbnet_skb_return()
  bpf: fix selftest/bpf/test_pkt_md_access on s390x
  netvsc: fix race on sub channel creation
  bpf: fix byte order test in test_verifier
  xgene: Always get clk source, but ignore if it's missing for SGMII ports
  MIPS: Add missing file for eBPF JIT.
  bpf, s390: fix build for libbpf and selftest suite
  ...
2017-08-09 10:14:04 -07:00
Willem de Bruijn 8d63bee643 net: avoid skb_warn_bad_offload false positives on UFO
skb_warn_bad_offload triggers a warning when an skb enters the GSO
stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
checksum offload set.

Commit b2504a5dbe ("net: reduce skb_warn_bad_offload() noise")
observed that SKB_GSO_DODGY producers can trigger the check and
that passing those packets through the GSO handlers will fix it
up. But, the software UFO handler will set ip_summed to
CHECKSUM_NONE.

When __skb_gso_segment is called from the receive path, this
triggers the warning again.

Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
Tx these two are equivalent. On Rx, this better matches the
skb state (checksum computed), as CHECKSUM_NONE here means no
checksum computed.

See also this thread for context:
http://patchwork.ozlabs.org/patch/799015/

Fixes: b2504a5dbe ("net: reduce skb_warn_bad_offload() noise")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 21:39:01 -07:00
Bjørn Mork bbae08e592 qmi_wwan: fix NULL deref on disconnect
qmi_wwan_disconnect is called twice when disconnecting devices with
separate control and data interfaces.  The first invocation will set
the interface data to NULL for both interfaces to flag that the
disconnect has been handled.  But the matching NULL check was left
out when qmi_wwan_disconnect was added, resulting in this oops:

  usb 2-1.4: USB disconnect, device number 4
  qmi_wwan 2-1.4:1.6 wwp0s29u1u4i6: unregister 'qmi_wwan' usb-0000:00:1d.0-1.4, WWAN/QMI device
  BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0
  IP: qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
  PGD 0
  P4D 0
  Oops: 0000 [#1] SMP
  Modules linked in: <stripped irrelevant module list>
  CPU: 2 PID: 33 Comm: kworker/2:1 Tainted: G            E   4.12.3-nr44-normandy-r1500619820+ #1
  Hardware name: LENOVO 4291LR7/4291LR7, BIOS CBET4000 4.6-810-g50522254fb 07/21/2017
  Workqueue: usb_hub_wq hub_event [usbcore]
  task: ffff8c882b716040 task.stack: ffffb8e800d84000
  RIP: 0010:qmi_wwan_disconnect+0x25/0xc0 [qmi_wwan]
  RSP: 0018:ffffb8e800d87b38 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
  RDX: 0000000000000001 RSI: ffff8c8824f3f1d0 RDI: ffff8c8824ef6400
  RBP: ffff8c8824ef6400 R08: 0000000000000000 R09: 0000000000000000
  R10: ffffb8e800d87780 R11: 0000000000000011 R12: ffffffffc07ea0e8
  R13: ffff8c8824e2e000 R14: ffff8c8824e2e098 R15: 0000000000000000
  FS:  0000000000000000(0000) GS:ffff8c8835300000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000000000000e0 CR3: 0000000229ca5000 CR4: 00000000000406e0
  Call Trace:
   ? usb_unbind_interface+0x71/0x270 [usbcore]
   ? device_release_driver_internal+0x154/0x210
   ? qmi_wwan_unbind+0x6d/0xc0 [qmi_wwan]
   ? usbnet_disconnect+0x6c/0xf0 [usbnet]
   ? qmi_wwan_disconnect+0x87/0xc0 [qmi_wwan]
   ? usb_unbind_interface+0x71/0x270 [usbcore]
   ? device_release_driver_internal+0x154/0x210

Reported-and-tested-by: Nathaniel Roach <nroach44@gmail.com>
Fixes: c6adf77953 ("net: usb: qmi_wwan: add qmap mux protocol support")
Cc: Daniele Palmas <dnlplm@gmail.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 21:14:16 -07:00
Guillaume Nault 0a0e1a85c8 ppp: fix xmit recursion detection on ppp channels
Commit e5dadc65f9 ("ppp: Fix false xmit recursion detect with two ppp
devices") dropped the xmit_recursion counter incrementation in
ppp_channel_push() and relied on ppp_xmit_process() for this task.
But __ppp_channel_push() can also send packets directly (using the
.start_xmit() channel callback), in which case the xmit_recursion
counter isn't incremented anymore. If such packets get routed back to
the parent ppp unit, ppp_xmit_process() won't notice the recursion and
will call ppp_channel_push() on the same channel, effectively creating
the deadlock situation that the xmit_recursion mechanism was supposed
to prevent.

This patch re-introduces the xmit_recursion counter incrementation in
ppp_channel_push(). Since the xmit_recursion variable is now part of
the parent ppp unit, incrementation is skipped if the channel doesn't
have any. This is fine because only packets routed through the parent
unit may enter the channel recursively.

Finally, we have to ensure that pch->ppp is not going to be modified
while executing ppp_channel_push(). Instead of taking this lock only
while calling ppp_xmit_process(), we now have to hold it for the full
ppp_channel_push() execution. This respects the ppp locks ordering
which requires locking ->upl before ->downl.

Fixes: e5dadc65f9 ("ppp: Fix false xmit recursion detect with two ppp devices")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 21:06:11 -07:00
Håkon Bugge 05bfd7dbb5 rds: Reintroduce statistics counting
In commit 7e3f2952ee ("rds: don't let RDS shutdown a connection
while senders are present"), refilling the receive queue was removed
from rds_ib_recv(), along with the increment of
s_ib_rx_refill_from_thread.

Commit 73ce4317bf ("RDS: make sure we post recv buffers")
re-introduces filling the receive queue from rds_ib_recv(), but does
not add the statistics counter. rds_ib_recv() was later renamed to
rds_ib_recv_path().

This commit reintroduces the statistics counting of
s_ib_rx_refill_from_thread and s_ib_rx_refill_from_cq.

Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Knut Omang <knut.omang@oracle.com>
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Reviewed-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 21:03:47 -07:00
Eric Dumazet 8ba6092471 tcp: fastopen: tcp_connect() must refresh the route
With new TCP_FASTOPEN_CONNECT socket option, there is a possibility
to call tcp_connect() while socket sk_dst_cache is either NULL
or invalid.

 +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 4
 +0 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
 +0 setsockopt(4, SOL_TCP, TCP_FASTOPEN_CONNECT, [1], 4) = 0
 +0 connect(4, ..., ...) = 0

<< sk->sk_dst_cache becomes obsolete, or even set to NULL >>

 +1 sendto(4, ..., 1000, MSG_FASTOPEN, ..., ...) = 1000

We need to refresh the route otherwise bad things can happen,
especially when syzkaller is running on the host :/

Fixes: 19f6d3f3c8 ("net/tcp-fastopen: Add new API support")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 20:39:52 -07:00
Xin Long ec0acb0931 net: sched: set xt_tgchk_param par.net properly in ipt_init_target
Now xt_tgchk_param par in ipt_init_target is a local varibale,
par.net is not initialized there. Later when xt_check_target
calls target's checkentry in which it may access par.net, it
would cause kernel panic.

Jaroslav found this panic when running:

  # ip link add TestIface type dummy
  # tc qd add dev TestIface ingress handle ffff:
  # tc filter add dev TestIface parent ffff: u32 match u32 0 0 \
    action xt -j CONNMARK --set-mark 4

This patch is to pass net param into ipt_init_target and set
par.net with it properly in there.

v1->v2:
  As Wang Cong pointed, I missed ipt_net_id != xt_net_id, so fix
  it by also passing net_id to __tcf_ipt_init.
v2->v3:
  Missed the fixes tag, so add it.

Fixes: ecb2421b5d ("netfilter: add and use nf_ct_netns_get/put")
Reported-by: Jaroslav Aster <jaster@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 20:38:00 -07:00
John Crispin 8e6f1521ec net: dsa: mediatek: add adjust link support for user ports
Manually adjust the port settings of user ports once PHY polling has
completed. This patch extends the adjust_link callback to configure the
per port PMCR register, applying the proper values polled from the PHY.
Without this patch flow control was not always getting setup properly.

Signed-off-by: Shashidhar Lakkavalli <shashidhar.lakkavalli@openmesh.com>
Signed-off-by: Muciri Gatimu <muciri@openmesh.com>
Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 18:01:25 -07:00
Davide Caratti e718fe450e net/mlx4_en: don't set CHECKSUM_COMPLETE on SCTP packets
if the NIC fails to validate the checksum on TCP/UDP, and validation of IP
checksum is successful, the driver subtracts the pseudo-header checksum
from the value obtained by the hardware and sets CHECKSUM_COMPLETE. Don't
do that if protocol is IPPROTO_SCTP, otherwise CRC32c validation fails.

V2: don't test MLX4_CQE_STATUS_IPV6 if MLX4_CQE_STATUS_IPV4 is set

Reported-by: Shuang Li <shuali@redhat.com>
Fixes: f8c6455bb0 ("net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE")
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-08 17:59:57 -07:00
Linus Torvalds bfa738cf3d Third set of -rc fixes for 4.13 cycle
- small set of miscellanous fixes
 - a reasonably sizable set of IPoIB fixes that deal with multiple long
   standing issues
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZiKRfAAoJELgmozMOVy/dU6cP/00pqRgSkywN+6Duclqofdvs
 D2vK1Mp+ehrzhbq0tdlkBizG+cM6EUHkFOcUmUMhsbTCxsiSjWqqHQvz/6EmPK/m
 mSbFkiYOsEzVOuZ+65VQRLhwXe3WnQ0GLJZ58+RLML7NkSK2/AElzrlDmVzfm2Ve
 yQfYM4QGGymhHyHjpKzA955Q9Tt/TS62dUTLAJZbpJfPcR7eErLaLZLsoUlcpDeP
 zyybbP+YCPu03w9eC8O3Jd8r9s0o4w8qWPo8ROaIzIkJGbou5yrt7mQNP2gSASta
 rHbLF96p9myIv4u60+UdsZt2GAthUHxqyEDJpQNfB1tAjb8JtZ7cy4xpXYAelU8P
 PYs86PUZiCj2kPhOzhaI6OooNZANDK+4IBu8pVE68V0aWVnu84qA3ZjAfRBtaUgA
 eNRXa/M+ivjNcJ4bGDqQVAnulTVpSD7PNPWlmGcZahv+kXHT1Bs4dgjUW/RWtFkS
 A2XP23gyOHMrK9aEaiAhcE2Vky+0RuMiMbuyNzzSuB6XgUP+kgO9z6OkZWhpEOXu
 SLgVf0fgqo9rlmoQCwaSh/saYSHg4F6XQit9Gk37tpOlnx2OQbzFBHXLJ9zAf9UU
 CALh9QzroalR0140vfIqOGCcDKFKputZNMIBpKnsYio+JLMUwdUZA2UwtNsKLBYr
 Ih7p6tC9jMqYusAWAyFg
 =2fQS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull rdma fixes from Doug Ledford:
 "Third set of -rc fixes for 4.13 cycle

   - small set of miscellanous fixes

   - a reasonably sizable set of IPoIB fixes that deal with multiple
     long standing issues"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/hns: checking for IS_ERR() instead of NULL
  RDMA/mlx5: Fix existence check for extended address vector
  IB/uverbs: Fix device cleanup
  RDMA/uverbs: Prevent leak of reserved field
  IB/core: Fix race condition in resolving IP to MAC
  IB/ipoib: Notify on modify QP failure only when relevant
  Revert "IB/core: Allow QP state transition from reset to error"
  IB/ipoib: Remove double pointer assigning
  IB/ipoib: Clean error paths in add port
  IB/ipoib: Add get statistics support to SRIOV VF
  IB/ipoib: Add multicast packets statistics
  IB/ipoib: Set IPOIB_NEIGH_TBL_FLUSH after flushed completion initialization
  IB/ipoib: Prevent setting negative values to max_nonsrq_conn_qp
  IB/ipoib: Make sure no in-flight joins while leaving that mcast
  IB/ipoib: Use cancel_delayed_work_sync when needed
  IB/ipoib: Fix race between light events and interface restart
2017-08-08 11:42:33 -07:00
Joe Perches b95c29a20f parse-maintainers: Move matching sections from MAINTAINERS
Allow any number of command line arguments to match either the
section header or the section contents and create new files.

Create MAINTAINERS.new and SECTION.new.

This allows scripting of the movement of various sections from
MAINTAINERS.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-08 11:16:14 -07:00
Joe Perches fe9090301f parse-maintainers: Use perl hash references and specific filenames
Instead of reading STDIN and writing STDOUT, use specific filenames of
MAINTAINERS and MAINTAINERS.new.

Use hash references instead of global hash %hash so future modifications
can read and write specific hashes to split up MAINTAINERS into multiple
files using a script.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-08 11:16:14 -07:00
Joe Perches 61f741645a parse-maintainers: Add section pattern sorting
Section [A-Z]: patterns are not currently in any required sorting order.
Add a specific sorting sequence to MAINTAINERS entries.
Sort F: and X: patterns in alphabetic order.

The preferred section ordering is:

  SECTION HEADER
  M:	Maintainers
  R:	Reviewers
  P:	Named persons without email addresses
  L:	Mailing list addresses
  S:	Status of this section (Supported, Maintained, Orphan, etc...)
  W:	Any relevant URLs
  T:	Source code control type (git, quilt, etc)
  Q:	Patchwork patch acceptance queue site
  B:	Bug tracking URIs
  C:	Chat URIs
  F:	Files with wildcard patterns (alphabetic ordered)
  X:	Excluded files with wildcard patterns (alphabetic ordered)
  N:	Files with regex patterns
  K:	Keyword regexes in source code for maintainership identification

Miscellaneous perl neatening:

 - Rename %map to %hash, map has a different meaning in perl
 - Avoid using \& and local variables for function indirection
 - Use return for a little c like clarity
 - Use c-like function call style instead of &function

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-08 11:16:13 -07:00
Joe Perches 6f7d98ec44 get_maintainer: Prepare for separate MAINTAINERS files
Allow for MAINTAINERS to become a directory and if it is,
read all the files in the directory for maintained sections.

Optionally look for all files named MAINTAINERS in directories
excluding the .git directory by using --find-maintainer-files.

This optional feature adds ~.3 seconds of CPU on an Intel
i5-6200 with an SSD.

Miscellanea:

 - Create a read_maintainer_file subroutine from the existing code
 - Test only the existence of MAINTAINERS, not whether it's a file

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-08 11:09:31 -07:00
Randy Dunlap 6209ef6788 MAINTAINERS: openbmc mailing list is moderated
The openbmc mailing list is moderated for non-subscribers.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Joel Stanley <joel@jms.id.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-08 11:09:31 -07:00
Sedat Dilek a1ffc2d25a MAINTAINERS: greybus: Fix typo s/LOOBACK/LOOPBACK
Fixes: f47e07bc5f ("Fix up MAINTAINERS file problems")
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-08 11:09:31 -07:00
Linus Torvalds de70be0ae3 SCSI fixes on 20170808
Two small fixes, one re-fix of a previous fix and five patches sorting
 out hotplug in the bnx2X class of drivers.  The latter is rather
 involved, but necessary because these drivers have started dropping
 lockdep recursion warnings on the hotplug lock because of its
 conversion to a percpu rwsem.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJZidgyAAoJEAVr7HOZEZN4tzEQAJsrO6rW6ng/LGIbf+oYdNFY
 lPcdC3wCJHpPJLzvd0yPu0Davu+m1bCtke8XctWKRcj2QO9YvDmwCrJ6nVNMPO8p
 LFz0rYSma5VDXyPVcUn+rtMDY3eRVQG8oukVdMzFRtAAlXMDpDl4gaITp4X2HX78
 7hSt/SdmpKAgidrvgln7hHvVJKE+Z3Tzp2asS5ygfle+1/u0ByMskiyo6tR/zz8I
 uf8Y2s5GeiHBe5K6bi4j+xSmC7evxqnDZpQAHeeT/weJQ8l96SIrPN66W9IjJWnD
 D0P10ctFYS5WfQAB6Fd3UHilVDYOeGOXHOIdWBqEbpi2OtsstihsQy95tQmRxgN7
 uBBkCanPbcLm1J9Vg/K08wY8TUdsjg+zJbq+Q+u4q37xJUjz6SYDSMzQyUNT4W1u
 3+dB9ImjBZjisg1stokuUTM5lbd9V/SvZngs7nzzi0siRXQI4oTKbGkS1PZUYnly
 rBfSpnakuraKnQqs0s0c1y0+bcGT4AVMGkzaZ0ivtk4JGWmmkXSoAurByfu9Eq87
 7feVyl9QxTBQAD1bl5IaRA/cxBhqe31FwvIpM8ik9zUF5hqt8vAZszB0Yi9MqyvM
 3ZCR0poMk8e2ZE+vMcH304uQ9CW3jG7cgWwZgfvWBmd/NgMOKgFhY3PAScmGkFex
 EgeDmxx+/uctEHQYYV3g
 =Lg/x
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Two small fixes, one re-fix of a previous fix and five patches sorting
  out hotplug in the bnx2X class of drivers. The latter is rather
  involved, but necessary because these drivers have started dropping
  lockdep recursion warnings on the hotplug lock because of its
  conversion to a percpu rwsem"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sg: only check for dxfer_len greater than 256M
  scsi: aacraid: reading out of bounds
  scsi: qedf: Limit number of CQs
  scsi: bnx2i: Simplify cpu hotplug code
  scsi: bnx2fc: Simplify CPU hotplug code
  scsi: bnx2i: Prevent recursive cpuhotplug locking
  scsi: bnx2fc: Prevent recursive cpuhotplug locking
  scsi: bnx2fc: Plug CPU hotplug race
2017-08-08 09:38:41 -07:00
Helge Deller 51d96dc2e2 random: fix warning message on ia64 and parisc
Fix the warning message on the parisc and IA64 architectures to show the
correct function name of the caller by using %pS instead of %pF. The
message is printed with the value of _RET_IP_ which calls
__builtin_return_address(0) and as such returns the IP address caller
instead of pointer to a function descriptor of the caller.

The effect of this patch is visible on the parisc and ia64 architectures
only since those are the ones which use function descriptors while on
all others %pS and %pF will behave the same.

Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: eecabf5674 ("random: suppress spammy warnings about unseeded randomness")
Fixes: d06bfd1989 ("random: warn when kernel uses unseeded randomness")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-08 09:36:46 -07:00
Linus Torvalds 623ce34566 Xtensa fixes for v4.13-rc4
- use asm-generic instances of asm/param.h and asm/device.h instead of exact
   copies in arch/xtensa/include/asm;
 - fix build error for xtensa cores with aliasing WT cache: define cache
   flushing functions and copy_{to,from}_user_page;
 - add missing EXPORT_SYMBOLs for clear_user_highpage, copy_user_highpage,
   flush_dcache_page, local_flush_cache_range, local_flush_cache_page,
   csum_partial and csum_partial_copy_generic.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJZiOctAAoJEFH5zJH4P6BEqsMQAIgyKC3QvZpJWzKRe4ET0sdr
 cKH5/XOsk1C0TiBkG0O3+BrvfYSnK+qdksByI/KjuP2BCPqH8iwGiY8zvSHP+m7S
 xRqivIp6J+N9jZGn4buxdAQ6zgqS1MtKwZkp3H1LtfIPh7jXTn9HvSrG+4jqhXNl
 Jjejd4so/m7XSrbCgMEt98WoFSPJ1QUVmPmTdOdEbv4Q+4Vf15mqnoMQ/L8uBu0o
 dqmSRYp3zUKNU2kr1DC56x56OVD6WRr9jxaaW0MtuUxG01YBfy3PELxbHmEFJi0X
 Po1TRF06B42Z1+6MVQR8R1um7cHM8PyHZh+WBCEpvDhfUUslKtKpZkKfSkLdskTv
 Cyf6h2hazCYA1Tmtirk2IKkJuzXvLh3639X/jXsC27t5NGX43gJ/3mm+PfEX5h0r
 Lm+dMsghFXLR3v+hNv1xv8w3lSZcRnrAaTTBnehgUNCwHMyToUqEQgHpIhH0LrjY
 +z813FeLaCwekhbFXoYCJLngwJ6RKPeDNyhebcvYwC66btIr/SYLcZS4ysFA5HQw
 qTAHghdaMlGjkqXYYc6OLHDHbEKIhhfk3qVHe/fYPtxrb4QxZId0aiLMehRrKWCD
 440PE4TmkkPFU4Rfn2/um90lQS/69RouWVbyethHhYrLrq68qmpH9/ukB76phroU
 TN/MK50EjZ+v5ZHUSUVH
 =I9hz
 -----END PGP SIGNATURE-----

Merge tag 'xtensa-20170807' of git://github.com/jcmvbkbc/linux-xtensa

Pull Xtensa fixes from Max Filippov:

 - use asm-generic instances of asm/param.h and asm/device.h instead of
   exact copies in arch/xtensa/include/asm;

 - fix build error for xtensa cores with aliasing WT cache: define cache
   flushing functions and copy_{to,from}_user_page;

 - add missing EXPORT_SYMBOLs for clear_user_highpage, copy_user_highpage,
   flush_dcache_page, local_flush_cache_range, local_flush_cache_page,
   csum_partial and csum_partial_copy_generic.

* tag 'xtensa-20170807' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: mm/cache: add missing EXPORT_SYMBOLs
  xtensa: don't limit csum_partial export by CONFIG_NET
  xtensa: fix cache aliasing handling code for WT cache
  xtensa: remove wrapper header for asm/param.h
  xtensa: remove wrapper header for asm/device.h
2017-08-07 18:58:10 -07:00