Commit Graph

607 Commits

Author SHA1 Message Date
Frederic Weisbecker 1bda3f8087 sched/isolation: Isolate workqueues when "nohz_full=" is set
As we prepare for offloading the residual 1hz scheduler ticks to
workqueue, let's affine those to housekeepers so that they don't
interrupt the CPUs that don't want to be disturbed.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-5-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-21 09:49:08 +01:00
Linus Torvalds 5d8515bc23 Staging/IIO patches for 4.16-rc1
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO updates from Greg KH:
 "Here is the big Staging and IIO driver patches for 4.16-rc1.

  There is the normal amount of new IIO drivers added, like all
  releases.

  The networking IPX and the ncpfs filesystem are moved into the staging
  tree, as they are on their way out of the kernel due to lack of use
  anymore.

  The visorbus subsystem finally has started moving out of the staging
  tree to the "real" part of the kernel, and the most and fsl-mc
  codebases are almost ready to move out, that will probably happen for
  4.17-rc1 if all goes well.

  Other than that, there is a bunch of license header cleanups in the
  tree, along with the normal amount of coding style churn that we all
  know and love for this codebase. I also got frustrated at the
  Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting
  huge chunks of it that were never even being used.

  Full details of everything is in the shortlog.

  All of these patches have been in linux-next for a while with no
  reported issues"

* tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits)
  staging: rtlwifi: remove redundant initialization of 'cfg_cmd'
  staging: rtl8723bs: remove a couple of redundant initializations
  staging: comedi: reformat lines to 80 chars or less
  staging: lustre: separate a connection destroy from free struct kib_conn
  Staging: rtl8723bs: Use !x instead of NULL comparison
  Staging: rtl8723bs: Remove dead code
  Staging: rtl8723bs: Change names to conform to the kernel code
  staging: ccree: Fix missing blank line after declaration
  staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd'
  staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST
  staging: fbtft: remove unused FB_TFT_SSD1325 kconfig
  staging: comedi: dt2811: remove redundant initialization of 'ns'
  staging: wilc1000: fix alignments to match open parenthesis
  staging: wilc1000: removed unnecessary defined enums typedef
  staging: wilc1000: remove unnecessary use of parentheses
  staging: rtl8192u: remove redundant initialization of 'timeout'
  staging: sm750fb: fix CamelCase for dispSet var
  staging: lustre: lnet/selftest: fix compile error on UP build
  staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons
  staging: rts5208: Fix "seg_no" calculation in reset_ms_card()
  ...
2018-02-01 09:51:57 -08:00
Linus Torvalds f8cc87b6c1 Merge branch 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Workqueue has an early init trick where workqueues can be created and
  work items queued on them before the workqueue subsystem is online.
  This helps simplify early init and operation of low level
  subsystems which use workqueues for managerial things which aren't
  depended upon early during boot.

  Out of laziness, the early init didn't cover workqueues with
  WQ_MEM_RECLAIM, which is inconsistent and confusing because adding the
  flag simply makes the system fail to boot. Cover WQ_MEM_RECLAIM too.

  This was originally brought up for RCU but RCU didn't actually need
  this. I still think it's a good idea to cover it"

* 'for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: allow WQ_MEM_RECLAIM on early init workqueues
  workqueue: separate out init_rescuer()
2018-01-30 14:45:39 -08:00
Linus Torvalds d772794637 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle were:

   - Updates to use cond_resched() instead of cond_resched_rcu_qs()
     where feasible (currently everywhere except in kernel/rcu and in
     kernel/torture.c). Also a couple of fixes to avoid sending IPIs to
     offline CPUs.

   - Updates to simplify RCU's dyntick-idle handling.

   - Updates to remove almost all uses of smp_read_barrier_depends() and
     read_barrier_depends().

   - Torture-test updates.

   - Miscellaneous fixes"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
  torture: Save a line in stutter_wait(): while -> for
  torture: Eliminate torture_runnable and perf_runnable
  torture: Make stutter less vulnerable to compilers and races
  locking/locktorture: Fix num reader/writer corner cases
  locking/locktorture: Fix rwsem reader_delay
  torture: Place all torture-test modules in one MAINTAINERS group
  rcutorture/kvm-build.sh: Skip build directory check
  rcutorture: Simplify functions.sh include path
  rcutorture: Simplify logging
  rcutorture/kvm-recheck-*: Improve result directory readability check
  rcutorture/kvm.sh: Support execution from any directory
  rcutorture/kvm.sh: Use consistent help text for --qemu-args
  rcutorture/kvm.sh: Remove unused variable, `alldone`
  rcutorture: Remove unused script, config2frag.sh
  rcutorture/configinit: Fix build directory error message
  rcutorture: Preempt RCU-preempt readers more vigorously
  torture: Reduce #ifdefs for preempt_schedule()
  rcu: Remove have_rcu_nocb_mask from tree_plugin.h
  rcu: Add comment giving debug strategy for double call_rcu()
  tracing, rcu: Hide trace event rcu_nocb_wake when not used
  ...
2018-01-30 10:15:30 -08:00
NeilBrown 6106c0f824 staging: lustre: lnet: convert selftest to use workqueues
Instead of the cfs workitem library, use workqueues.

As lnet wants to provide a cpu mask of allowed cpus, it
needs to be a WQ_UNBOUND work queue so that tasks can
run on cpus other than where they were submitted.

This patch also exports apply_workqueue_attrs(), which is
a documented part of the workqueue API that isn't currently
exported.  lustre needs it to allow workqueue threads to be limited
to a subset of CPUs.

Acked-by: Tejun Heo <tj@kernel.org> (for export of apply_workqueue_attrs)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-15 15:44:08 +01:00
Sergey Senozhatsky 62635ea8c1 workqueue: avoid hard lockups in show_workqueue_state()
show_workqueue_state() can print out a lot of messages while being in
atomic context, e.g. sysrq-t -> show_workqueue_state(). If the console
device is slow it may end up triggering NMI hard lockup watchdog.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # v4.5+
2018-01-12 11:39:49 -08:00
Tejun Heo 40c17f75df workqueue: allow WQ_MEM_RECLAIM on early init workqueues
Workqueues can be created early during boot before workqueue subsystem
is fully online - work items are queued waiting for later full
initialization.  However, early init wasn't supported for
WQ_MEM_RECLAIM workqueues causing unnecessary annoyances for a subset
of users.  Expand early init support to include WQ_MEM_RECLAIM
workqueues.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
2018-01-08 05:38:37 -08:00
Tejun Heo 983c751532 workqueue: separate out init_rescuer()
Separate out init_rescuer() from __alloc_workqueue_key() to prepare
for early init support for WQ_MEM_RECLAIM.  This patch doesn't
introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
2018-01-08 05:38:32 -08:00
Ingo Molnar 475c5ee193 Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

- Updates to use cond_resched() instead of cond_resched_rcu_qs()
  where feasible (currently everywhere except in kernel/rcu and
  in kernel/torture.c).  Also a couple of fixes to avoid sending
  IPIs to offline CPUs.

- Updates to simplify RCU's dyntick-idle handling.

- Updates to remove almost all uses of smp_read_barrier_depends()
  and read_barrier_depends().

- Miscellaneous fixes.

- Torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-01-03 14:14:18 +01:00
Sergey Senozhatsky 01dfee9582 workqueue: remove unneeded kallsyms include
The file was converted from print_symbol() to %pf some time
ago (044c782ce3 "workqueue: fix checkpatch issues").
kallsyms does not seem to be needed anymore.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-12-11 07:15:43 -08:00
Lai Jiangshan 62408c1ef0 workqueue/hotplug: remove the workaround in rebind_workers()
Since the cpu/hotplug refactoring, DOWN_FAILED is never called without
preceding DOWN_PREPARE making the workaround unnecessary.  Remove it.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-12-04 14:46:09 -08:00
Lai Jiangshan e8b3f8db7a workqueue/hotplug: simplify workqueue_offline_cpu()
Since the recent cpu/hotplug refactoring, workqueue_offline_cpu() is
guaranteed to run on the local cpu which is going offline.

This also fixes the following deadlock by removing work item
scheduling and flushing from CPU hotplug path.

 http://lkml.kernel.org/r/1504764252-29091-1-git-send-email-prsood@codeaurora.org

tj: Description update.

Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-12-04 14:44:11 -08:00
Paul E. McKenney a7e6425ea5 workqueue: Eliminate cond_resched_rcu_qs() in favor of cond_resched()
Now that cond_resched() also provides RCU quiescent states when
needed, it can be used in place of cond_resched_rcu_qs().  This
commit therefore makes this change.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
2017-12-04 10:28:10 -08:00
Tal Shorer c98a980509 workqueue: respect isolated cpus when queueing an unbound work
Initialize wq_unbound_cpumask to exclude cpus that were isolated by
the cmdline's isolcpus parameter.

Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-27 08:57:00 -08:00
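
The mask computation described in the commit above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `cpu_set64` and `unbound_mask()` are hypothetical names, a 64-bit integer stands in for the kernel's cpumask type, and nothing here is the actual workqueue code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the set. */
typedef uint64_t cpu_set64;

/*
 * Model of the idea only: the CPUs eligible for unbound work are the
 * possible CPUs with the isolated ones (e.g. from isolcpus=) removed.
 */
static cpu_set64 unbound_mask(cpu_set64 possible, cpu_set64 isolated)
{
	return possible & ~isolated;
}
```

For example, with eight possible CPUs (mask 0xFF) and CPUs 2 and 3 isolated (mask 0x0C), the model leaves 0xF3, so unbound work would stay off CPUs 2 and 3.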
Kees Cook 841b86f328 treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:

    perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
        $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)

    perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
        $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)

The now unused macros are also dropped from include/linux/timer.h.

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 16:35:54 -08:00
Linus Torvalds 0be500363c Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "There was a commit to make unbound kworkers respect cpu isolation but
  it conflicted with the restructuring of cpu isolation and got
  reverted, so the only thing left is the trivial comment fix.

  Will retry the cpu isolation change after this merge window"

* 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix comment for unbound workqueue's attrbutes
  Revert "workqueue: respect isolated cpus when queueing an unbound work"
  workqueue: respect isolated cpus when queueing an unbound work
2017-11-15 14:15:21 -08:00
Linus Torvalds 2bcc673101 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Yet another big pile of changes:

   - More year 2038 work from Arnd slowly reaching the point where we
     need to think about the syscalls themselves.

   - A new timer function which allows to conditionally (re)arm a timer
     only when it's either not running or the new expiry time is sooner
     than the armed expiry time. This allows to use a single timer for
     multiple timeout requirements w/o caring about the first expiry
     time at the call site.

   - A new NMI safe accessor to clock real time for the printk timestamp
     work. Can be used by tracing, perf as well if required.

   - A large number of timer setup conversions from Kees which got
     collected here because either maintainers requested so or they
     simply got ignored. As Kees pointed out already there are a few
     trivial merge conflicts and some redundant commits which were
     unavoidable due to the size of this conversion effort.

   - Avoid a redundant iteration in the timer wheel softirq processing.

   - Provide a mechanism to treat RTC implementations depending on their
     hardware properties, i.e. don't inflict the write at the 0.5
     seconds boundary which originates from the PC CMOS RTC to all RTCs.
     No functional change as drivers need to be updated separately.

   - The usual small updates to core code clocksource drivers. Nothing
     really exciting"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
  timers: Add a function to start/reduce a timer
  pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
  timer: Prepare to change all DEFINE_TIMER() callbacks
  netfilter: ipvs: Convert timers to use timer_setup()
  scsi: qla2xxx: Convert timers to use timer_setup()
  block/aoe: discover_timer: Convert timers to use timer_setup()
  ide: Convert timers to use timer_setup()
  drbd: Convert timers to use timer_setup()
  mailbox: Convert timers to use timer_setup()
  crypto: Convert timers to use timer_setup()
  drivers/pcmcia: omap1: Fix error in automated timer conversion
  ARM: footbridge: Fix typo in timer conversion
  drivers/sgi-xp: Convert timers to use timer_setup()
  drivers/pcmcia: Convert timers to use timer_setup()
  drivers/memstick: Convert timers to use timer_setup()
  drivers/macintosh: Convert timers to use timer_setup()
  hwrng/xgene-rng: Convert timers to use timer_setup()
  auxdisplay: Convert timers to use timer_setup()
  sparc/led: Convert timers to use timer_setup()
  mips: ip22/32: Convert timers to use timer_setup()
  ...
2017-11-13 17:56:58 -08:00
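
The "start/reduce" timer behaviour mentioned in the pull message above (arm the timer only when it is not running or the new expiry is sooner) can be modelled in a few lines. This is a simplified sketch: `struct model_timer` and `model_timer_reduce()` are hypothetical names, and the model ignores jiffies wraparound and locking, both of which real timer code has to handle.

```c
#include <stdbool.h>

/* Hypothetical, minimal model of a timer: armed or not, plus an expiry. */
struct model_timer {
	bool pending;
	unsigned long expires;
};

/*
 * (Re)arm the timer only if it is idle or the requested expiry is
 * strictly sooner than the armed one. Returns true if it was (re)armed.
 */
static bool model_timer_reduce(struct model_timer *t, unsigned long expires)
{
	if (t->pending && t->expires <= expires)
		return false;	/* already set to fire soon enough */

	t->pending = true;
	t->expires = expires;
	return true;
}
```

With this shape, several callers with different timeout requirements can share one timer: each calls the reduce helper with its own deadline, and the timer ends up armed for the earliest one.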
Frederic Weisbecker 8e8eb73075 workqueue: Use lockdep to assert IRQs are disabled/enabled
Use lockdep to check that IRQs are enabled or disabled as expected. This
way the sanity check only shows overhead when concurrency correctness
debug code is enabled.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: David S . Miller <davem@davemloft.net>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1509980490-4285-4-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-08 11:13:48 +01:00
Ingo Molnar 8c5db92a70 Merge branch 'linus' into locking/core, to resolve conflicts
Conflicts:
	include/linux/compiler-clang.h
	include/linux/compiler-gcc.h
	include/linux/compiler-intel.h
	include/uapi/linux/stddef.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 10:32:44 +01:00
Wang Long 9a19b46386 workqueue: Fix comment for unbound workqueue's attrbutes
Signed-off-by: Wang Long <wanglong19@meituan.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-06 07:04:47 -08:00
Tejun Heo edbfd9112f Revert "workqueue: respect isolated cpus when queueing an unbound work"
This reverts commit b5149873a0.

It conflicts with the following isolcpus change from the sched branch.

 edb9382175 ("sched/isolation: Move isolcpus= handling to the housekeeping code")

Let's revert for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
2017-11-03 07:02:15 -07:00
Byungchul Park fd1a5b04df workqueue: Remove now redundant lock acquisitions wrt. workqueue flushes
The workqueue code added manual lock acquisition annotations to catch
deadlocks.

After lockdep crossrelease was introduced, some of those became redundant,
since wait_for_completion() already does the acquisition and tracking.

Remove the duplicate annotations.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: amir73il@gmail.com
Cc: axboe@kernel.dk
Cc: darrick.wong@oracle.com
Cc: david@fromorbit.com
Cc: hch@infradead.org
Cc: idryomov@gmail.com
Cc: johan@kernel.org
Cc: johannes.berg@intel.com
Cc: kernel-team@lge.com
Cc: linux-block@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/1508921765-15396-9-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 12:19:03 +02:00
Mark Rutland c95491ed6d locking/atomics, workqueue: Convert ACCESS_ONCE() to READ_ONCE()/WRITE_ONCE()
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't currently harmful.

However, for some features it is necessary to instrument reads and
writes separately, which is not possible with ACCESS_ONCE(). This
distinction is critical to correct operation.

It's possible to transform the bulk of kernel code using the Coccinelle
script below. However, this doesn't handle comments, leaving references
to ACCESS_ONCE() instances which have been removed. As a preparatory
step, this patch converts the workqueue code and comments to use
{READ,WRITE}_ONCE() consistently.

----
virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-12-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:03 +02:00
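
To illustrate why separate read and write accessors matter, here is a userspace sketch of the pattern the commit above converts to. The two macros are simplified stand-ins written for this example (they rely on the GNU `__typeof__` extension) and are not the kernel's definitions; `shared_flag`, `publish()` and `observe()` are hypothetical names.

```c
/*
 * Simplified stand-ins, NOT the kernel's definitions: each forces a
 * single volatile access and keeps reads and writes distinguishable.
 */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;

/* Previously this would have been: ACCESS_ONCE(shared_flag) = v; */
static void publish(int v)
{
	WRITE_ONCE(shared_flag, v);
}

/* Previously this would have been: return ACCESS_ONCE(shared_flag); */
static int observe(void)
{
	return READ_ONCE(shared_flag);
}
```

Because the store and the load now go through different macros, a tool that wants to instrument only writes, or only reads, has a distinct hook for each, which a single ACCESS_ONCE() form cannot offer.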
Tal Shorer b5149873a0 workqueue: respect isolated cpus when queueing an unbound work
Initialize wq_unbound_cpumask to exclude cpus that were isolated by
the cmdline's isolcpus parameter.

Signed-off-by: Tal Shorer <tal.shorer@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-10-21 09:32:15 -07:00
Kees Cook 32a6c7233c workqueue: Convert timers to use timer_setup() (part 2)
In preparation for unconditionally passing the struct timer_list pointer
to all timer callbacks, switch to using the new timer_setup() and
from_timer() to pass the timer pointer explicitly. (The prior workqueue
patch missed a few timers.)

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Link: https://lkml.kernel.org/r/20171016225825.GA99101@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-10-18 17:04:25 +02:00
Tejun Heo 692b48258d workqueue: replace pool->manager_arb mutex with a flag
Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:

 [ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
 [ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
 [ 1270.473240] -----------------------------------------------------
 [ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
 [ 1270.474239]  (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
 [ 1270.474994]
 [ 1270.474994] and this task is already holding:
 [ 1270.475440]  (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
 [ 1270.476046] which would create a new lock dependency:
 [ 1270.476436]  (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
 [ 1270.476949]
 [ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
 [ 1270.477553]  (&pool->lock/1){-.-.}
 ...
 [ 1270.488900] to a HARDIRQ-irq-unsafe lock:
 [ 1270.489327]  (&(&lock->wait_lock)->rlock){+.+.}
 ...
 [ 1270.494735]  Possible interrupt unsafe locking scenario:
 [ 1270.494735]
 [ 1270.495250]        CPU0                    CPU1
 [ 1270.495600]        ----                    ----
 [ 1270.495947]   lock(&(&lock->wait_lock)->rlock);
 [ 1270.496295]                                local_irq_disable();
 [ 1270.496753]                                lock(&pool->lock/1);
 [ 1270.497205]                                lock(&(&lock->wait_lock)->rlock);
 [ 1270.497744]   <Interrupt>
 [ 1270.497948]     lock(&pool->lock/1);

, which will cause an irq inversion deadlock if the above lock scenario
happens.

The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.

Unlocking a mutex while holding an irq spinlock was never safe. The
problem has been around forever but it never got noticed because the
mutex is usually only trylocked while holding the irq lock, making
actual failures very unlikely, and the lockdep annotation missed the
condition until the recent b9c16a0e1f ("locking/mutex: Fix
lockdep_assert_held() fail").

Using a mutex for pool->manager_arb has always been a bit of a stretch.
It primarily is a mechanism to arbitrate managership between workers
which can easily be done with a pool flag.  The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.

This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and makes the destruction path wait for the current manager on a wait
queue.

v2: Drop unnecessary flag clearing before pool destruction as
    suggested by Boqun.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: stable@vger.kernel.org
2017-10-10 07:13:57 -07:00
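
The flag-based arbitration described in the commit above can be sketched as a tiny single-threaded model. Everything except the flag name POOL_MANAGER_ACTIVE is an assumption made for illustration: `struct model_pool`, the two helper names and the bit value are hypothetical, and the model leaves out both the pool lock that would have to be held around the flag accesses and the wait queue used by the destruction path.

```c
#include <stdbool.h>

#define POOL_MANAGER_ACTIVE (1u << 0)	/* bit position assumed for the model */

/* Hypothetical stand-in for a worker pool; only the flags word is modelled. */
struct model_pool {
	unsigned int flags;
};

/*
 * Try to become the manager. In real code the check and the set would
 * need to happen under the pool lock so that they are atomic together.
 */
static bool model_claim_manager(struct model_pool *pool)
{
	if (pool->flags & POOL_MANAGER_ACTIVE)
		return false;	/* another worker is already managing */

	pool->flags |= POOL_MANAGER_ACTIVE;
	return true;
}

/* Give up managership; a waiter (e.g. pool destruction) could now proceed. */
static void model_release_manager(struct model_pool *pool)
{
	pool->flags &= ~POOL_MANAGER_ACTIVE;
}
```

The point of the design is that claiming and releasing never touch a sleeping lock, so nothing mutex-related has to be unlocked while an irq-safe spinlock is held.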
Kees Cook 8c20feb606 workqueue: Convert callback to use from_timer()
In preparation for unconditionally passing the struct timer_list pointer
to all timer callbacks, switch workqueue to use from_timer() and pass the
timer pointer explicitly.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: Petr Mladek <pmladek@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux1394-devel@lists.sourceforge.net
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-s390@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Harish Patil <harish.patil@cavium.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Manish Chopra <manish.chopra@cavium.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-pm@vger.kernel.org
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Gross <mark.gross@intel.com>
Cc: linux-watchdog@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Michael Reed <mdr@sgi.com>
Cc: netdev@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://lkml.kernel.org/r/1507159627-127660-14-git-send-email-keescook@chromium.org
2017-10-05 15:01:22 +02:00
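
Passing the timer pointer to the callback works because the callback can recover the structure that embeds the timer. The sketch below shows that recovery in userspace C; it assumes a container_of-style pointer adjustment, and every name in it (`timer_list_model`, `container_of_model`, `delayed_job`, `job_id_from_timer`) is hypothetical rather than taken from the kernel.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct timer_list. */
struct timer_list_model {
	unsigned long expires;
};

/* Step back from a member pointer to the structure that contains it. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* An object that embeds its timer, as converted callbacks expect. */
struct delayed_job {
	int id;
	struct timer_list_model timer;
};

/* Callback-style helper: it receives only the timer, yet finds its owner. */
static int job_id_from_timer(struct timer_list_model *t)
{
	struct delayed_job *job =
		container_of_model(t, struct delayed_job, timer);

	return job->id;
}
```

With this pattern the callback no longer needs a separate opaque data argument, since the timer's address alone identifies the enclosing object.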
Kees Cook 5cd79d6abd timer: Remove users of TIMER_DEFERRED_INITIALIZER
This removes uses of TIMER_DEFERRED_INITIALIZER and chooses a location
to call timer_setup() from before add_timer() or mod_timer() is called.
Adjusts callbacks to use from_timer() as needed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mips@linux-mips.org
Cc: Petr Mladek <pmladek@suse.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: linux1394-devel@lists.sourceforge.net
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: linux-s390@vger.kernel.org
Cc: linux-wireless@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: Wim Van Sebroeck <wim@iguana.be>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ursula Braun <ubraun@linux.vnet.ibm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Harish Patil <harish.patil@cavium.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Manish Chopra <manish.chopra@cavium.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-pm@vger.kernel.org
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Mark Gross <mark.gross@intel.com>
Cc: linux-watchdog@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Michael Reed <mdr@sgi.com>
Cc: netdev@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Link: https://lkml.kernel.org/r/1507159627-127660-7-git-send-email-keescook@chromium.org
2017-10-05 15:01:18 +02:00
Linus Torvalds 9954d4892a Merge branch 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue updates from Tejun Heo:
 "Nothing major. I introduced a flag collision bug during v4.13 cycle
  which is fixed in this pull request. Fortunately, the flag is for
  debugging / verification and the bug isn't critical"

* 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Fix flag collision
  workqueue: Use TASK_IDLE
  workqueue: fix path to documentation
  workqueue: doc change for ST behavior on NUMA systems
2017-09-06 21:59:31 -07:00
Tejun Heo 058fc47ee2 Merge branch 'for-4.13-fixes' into for-4.14
2017-09-05 06:33:41 -07:00
Peter Zijlstra f52be57080 locking/lockdep: Untangle xhlock history save/restore from task independence
Where XHLOCK_{SOFT,HARD} are save/restore points in the xhlocks[] to
ensure the temporal IRQ events don't interact with task state, the
XHLOCK_PROC is a fundamentally different beast that just happens to share
the interface.

The purpose of XHLOCK_PROC is to annotate independent execution inside
one task. For example workqueues, each work should appear to run in its
own 'pristine' 'task'.

Remove XHLOCK_PROC in favour of its own interface to avoid confusion.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: kernel-team@lge.com
Cc: oleg@redhat.com
Cc: tj@kernel.org
Link: http://lkml.kernel.org/r/20170829085939.ggmb6xiohw67micb@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-29 15:14:38 +02:00
Peter Zijlstra e6f3faa734 locking/lockdep: Fix workqueue crossrelease annotation
The new completion/crossrelease annotations interact unfavourably with
the extant flush_work()/flush_workqueue() annotations.

The problem is that when a single work class does:

  wait_for_completion(&C)

and

  complete(&C)

in different executions, we'll build dependencies like:

  lock_map_acquire(W)
  complete_acquire(C)

and

  lock_map_acquire(W)
  complete_release(C)

which results in the dependency chain: W->C->W, which lockdep thinks
spells deadlock, even though there is no deadlock potential since
works are run concurrently.

One possibility would be to change the work 'lock' to recursive-read,
but that would mean hitting a lockdep limitation on recursive locks.
Also, unconditionally switching to recursive-read here would fail to
detect the actual deadlock on single-threaded workqueues, which do
have a problem with this.

For now, forcefully disregard these locks for crossrelease.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: byungchul.park@lge.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25 11:06:33 +02:00
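The W->C->W chain described in "Fix workqueue crossrelease annotation" above can be illustrated with a toy dependency graph. This is only a sketch of the idea of cycle detection, not lockdep's actual algorithm; `add_dependency()`, `reachable()`, `would_close_cycle()` and the class numbering are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8

/* Hypothetical lock class ids for the work 'lock' W and the completion C. */
enum { CLASS_W, CLASS_C };

/* edge[a][b] is true once "b was acquired while a was held" has been seen. */
bool edge[MAX_CLASSES][MAX_CLASSES];

void add_dependency(int from, int to)
{
	assert(from >= 0 && from < MAX_CLASSES && to >= 0 && to < MAX_CLASSES);
	edge[from][to] = true;
}

/* Depth-first search: is @target reachable from @node along recorded edges? */
bool reachable(int node, int target, bool *seen)
{
	if (node == target)
		return true;
	seen[node] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (edge[node][next] && !seen[next] && reachable(next, target, seen))
			return true;
	return false;
}

/* Adding from->to closes a cycle if @from is already reachable from @to. */
bool would_close_cycle(int from, int to)
{
	bool seen[MAX_CLASSES] = { false };

	return reachable(to, from, seen);
}
```

In this model, recording W->C from the first execution and then asking about C->W from the second reports a cycle, even though the two executions of the work run concurrently and cannot deadlock on each other. That is the false positive the commit avoids by disregarding these locks for crossrelease.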
Peter Zijlstra a1d14934ea workqueue/lockdep: 'Fix' flush_work() annotation
The flush_work() annotation as introduced by commit:

  e159489baa ("workqueue: relax lockdep annotation on flush_work()")

hits on the lockdep problem with recursive read locks.

The situation as described is:

Work W1:                Work W2:        Task:

ARR(Q)                  ARR(Q)          flush_workqueue(Q)
A(W1)                   A(W2)             A(Q)
  flush_work(W2)                          R(Q)
    A(W2)
    R(W2)
    if (special)
      A(Q)
    else
      ARR(Q)
    R(Q)

where: A - acquire, ARR - acquire-read-recursive, R - release.

Where under 'special' conditions we want to trigger a lock recursion
deadlock, but otherwise allow the flush_work(). The allowing is done
by using recursive read locks (ARR), but lockdep is broken for
recursive stuff.

However, there appears to be no need to acquire the lock if we're not
'special', so if we remove the 'else' clause things become much
simpler and no longer need the recursion thing at all.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: boqun.feng@gmail.com
Cc: byungchul.park@lge.com
Cc: david@fromorbit.com
Cc: johannes@sipsolutions.net
Cc: oleg@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-25 11:06:32 +02:00
Peter Zijlstra c5a94a618e workqueue: Use TASK_IDLE
Workqueues don't use signals; they (ab)use TASK_INTERRUPTIBLE to avoid
increasing the loadavg numbers. We've 'recently' introduced TASK_IDLE
for this case:

  80ed87c8a9 ("sched/wait: Introduce TASK_NOLOAD and TASK_IDLE")

use it.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-23 06:30:35 -07:00
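For the "workqueue: Use TASK_IDLE" commit above, the difference between the sleep states can be sketched in user space as follows. The numeric bit values and the two helper functions are assumptions made for illustration; only the composition of TASK_IDLE from TASK_UNINTERRUPTIBLE and TASK_NOLOAD follows the commit it references.

```c
#include <assert.h>
#include <stdbool.h>

/* Task state bits; the numeric values are assumptions made for this sketch. */
#define TASK_INTERRUPTIBLE   0x0001u
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_NOLOAD          0x0400u
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* An idle sleep must not carry the signal-wakeable bit. */
static_assert((TASK_IDLE & TASK_INTERRUPTIBLE) == 0,
	      "TASK_IDLE must not be interruptible");

/* Hypothetical helper: does a task sleeping in @state count toward loadavg? */
bool contributes_to_load(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) != 0 && (state & TASK_NOLOAD) == 0;
}

/* Hypothetical helper: can a signal wake a task sleeping in @state? */
bool signal_can_wake(unsigned int state)
{
	return (state & TASK_INTERRUPTIBLE) != 0;
}
```

Under these assumptions a TASK_IDLE sleeper neither raises the load average nor reacts to signals, which is what an idle worker wants; TASK_INTERRUPTIBLE only achieved the first property as a side effect.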
Boqun Feng 52fa5bc5cb locking/lockdep: Explicitly initialize wq_barrier::done::map
With the new lockdep crossrelease feature, which checks completions usage,
a false positive is reported in the workqueue code:

> Worker A : acquired of wfc.work -> wait for cpu_hotplug_lock to be released
> Task   B : acquired of cpu_hotplug_lock -> wait for lock#3 to be released
> Task   C : acquired of lock#3 -> wait for completion of barr->done
> (Task C is in lru_add_drain_all_cpuslocked())
> Worker D : wait for wfc.work to be released -> will complete barr->done

Such a deadlock cannot happen because Task C's barr->done and Worker D's
barr->done cannot be the same instance.

The reason for this false positive is that we initialize all wq_barrier::done
at insert_wq_barrier() via init_completion(), which makes them belong to
the same lock class; therefore, impossible cycles are reported.

To fix this, explicitly initialize the lockdep map for wq_barrier::done
in insert_wq_barrier(), so that the lock class key of wq_barrier::done
is a subkey of the corresponding work_struct. As a result we won't build
a dependency between a wq_barrier and an unrelated work, and we can
differentiate wq barriers based on the related works, so the false
positive above is avoided.

Also define an empty lockdep_init_map_crosslock() for !CROSSRELEASE
to keep the code simple and free of unnecessary #ifdefs.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170817094622.12915-1-boqun.feng@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-17 12:12:33 +02:00
Byungchul Park b09be676e0 locking/lockdep: Implement the 'crossrelease' feature
Lockdep is a runtime locking correctness validator that detects and
reports a deadlock or its possibility by checking dependencies between
locks. It's useful since it does not report just an actual deadlock but
also the possibility of a deadlock that has not actually happened yet.
That enables problems to be fixed before they affect real systems.

However, this facility is only applicable to typical locks, such as
spinlocks and mutexes, which are normally released within the context in
which they were acquired. But synchronization primitives like page
locks or completions, which are allowed to be released in any context,
also create dependencies and can cause a deadlock.

So lockdep should track these locks to do a better job. The 'crossrelease'
implementation makes these primitives also be tracked.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akpm@linux-foundation.org
Cc: boqun.feng@gmail.com
Cc: kernel-team@lge.com
Cc: kirill@shutemov.name
Cc: npiggin@gmail.com
Cc: walken@google.com
Cc: willy@infradead.org
Link: http://lkml.kernel.org/r/1502089981-21272-6-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-08-10 12:29:07 +02:00
Benjamin Peterson 9a2614916a workqueue: fix path to documentation
Signed-off-by: Benjamin Peterson <bp@benjamin.pe>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-08-07 08:03:24 -07:00
Michael Bringmann 1ad0f0a7aa workqueue: Work around edge cases for calc of pool's cpumask
There is an underlying assumption/trade-off in many layers of the Linux
system that CPU <-> node mapping is static.  This is despite the presence
of features like NUMA and 'hotplug' that support the dynamic addition/
removal of fundamental system resources like CPUs and memory.  PowerPC
systems, however, do provide extensive features for the dynamic change
of resources available to a system.

Currently, there is little or no synchronization protection around the
updating of the CPU <-> node mapping, and the export/update of this
information for other layers / modules.  In systems which can change
this mapping during 'hotplug', like PowerPC, the information is changing
underneath all layers that might reference it.

This patch attempts to ensure that a valid, usable cpumask attribute
is used by the workqueue infrastructure when setting up new resource
pools.  It prevents a crash that has been observed when an 'empty'
cpumask is passed along to the worker/task scheduling code.  It is
intended as a temporary workaround until a more fundamental review and
correction of the issue can be done.

[With additions to the patch provided by Tejun Heo <tj@kernel.org>]

Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-07-28 11:05:52 -04:00
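The idea behind "Work around edge cases for calc of pool's cpumask" above, never handing an empty cpumask to the scheduler, can be sketched with plain bitmasks. This is a user-space illustration under stated assumptions, not the patch itself: `cpumask_model_t` and `pool_cpumask()` are hypothetical, and the fallback to a default mask is the assumed recovery strategy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpumask_model_t;

/*
 * Pick the CPUs a node-specific pool may use: the CPUs the attributes ask
 * for (@wanted) that also belong to the node (@node_cpus). If the CPU <->
 * node mapping is stale and the intersection is empty, fall back to the
 * default mask @dfl instead of returning an unusable empty mask.
 */
cpumask_model_t pool_cpumask(cpumask_model_t wanted,
			     cpumask_model_t node_cpus,
			     cpumask_model_t dfl)
{
	cpumask_model_t mask = wanted & node_cpus;

	assert(dfl != 0);	/* the fallback itself must be usable */
	return mask ? mask : dfl;
}
```

With `wanted = 0x0f` and a node that currently reports no CPUs, the function returns the default mask rather than zero, so a worker created for that pool always has somewhere to run.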
Tejun Heo 0a94efb5ac workqueue: implicit ordered attribute should be overridable
5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
ordered") automatically enabled ordered attribute for unbound
workqueues w/ max_active == 1.  Because ordered workqueues reject
max_active and some attribute changes, this implicit ordered mode
broke cases where the user creates an unbound workqueue w/ max_active
== 1 and later explicitly changes the related attributes.

This patch distinguishes explicit and implicit ordered setting and
overrides from attribute changes if implicit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: 5c0338c687 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
2017-07-25 13:28:56 -04:00
Tejun Heo 5c0338c687 workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
The combination of WQ_UNBOUND and max_active == 1 used to imply
ordered execution.  After NUMA affinity 4c16bd327c ("workqueue:
implement NUMA affinity for unbound workqueues"), this is no longer
true due to per-node worker pools.

While the right way to create an ordered workqueue is
alloc_ordered_workqueue(), the documentation has been misleading for a
long time and people do use WQ_UNBOUND and max_active == 1 for ordered
workqueues which can lead to subtle bugs which are very difficult to
trigger.

It's unlikely that we'd see noticeable performance impact by enforcing
ordering on WQ_UNBOUND / max_active == 1 workqueues.  Let's
automatically set __WQ_ORDERED for those workqueues.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reported-by: Alexei Potashnik <alexei@purestorage.com>
Fixes: 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues")
Cc: stable@vger.kernel.org # v3.10+
2017-07-19 11:24:19 -04:00
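Taken together, the two ordered-workqueue commits above describe a small piece of flag logic, which might be modelled as follows. This is a user-space sketch, not the kernel code: `struct wq_model`, the `F_*` flag names and their values, and all three functions are hypothetical. `F_ORDERED` stands in for `__WQ_ORDERED`, and `F_ORDERED_EXPLICIT` is an assumed marker for workqueues that asked for ordering explicitly.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits chosen for this sketch. */
#define F_UNBOUND          (1u << 0)
#define F_ORDERED          (1u << 1)	/* models __WQ_ORDERED */
#define F_ORDERED_EXPLICIT (1u << 2)	/* assumed "explicitly ordered" marker */

struct wq_model {
	unsigned int flags;
	int max_active;
};

/* Creation: unbound with max_active == 1 implies ordering (5c0338c687). */
struct wq_model wq_create(unsigned int flags, int max_active)
{
	struct wq_model wq = { flags, max_active };

	assert(max_active >= 1);
	if ((flags & F_UNBOUND) && max_active == 1)
		wq.flags |= F_ORDERED;
	return wq;
}

/* Explicitly ordered creation, in the spirit of alloc_ordered_workqueue(). */
struct wq_model wq_create_ordered(unsigned int flags)
{
	return wq_create(flags | F_UNBOUND | F_ORDERED | F_ORDERED_EXPLICIT, 1);
}

/*
 * Attribute change (0a94efb5ac): rejected for explicitly ordered queues;
 * otherwise the implicit ordering is dropped and the change is applied.
 * Returns true if the change was applied.
 */
bool wq_set_max_active(struct wq_model *wq, int max_active)
{
	if (wq->flags & F_ORDERED_EXPLICIT)
		return false;
	wq->flags &= ~F_ORDERED;
	wq->max_active = max_active;
	return true;
}
```

A queue created with `wq_create(F_UNBOUND, 1)` starts out ordered but accepts a later `wq_set_max_active()`, while one created with `wq_create_ordered()` refuses it, which is the distinction between implicit and explicit ordering that the later commit introduces.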
Ingo Molnar ac6424b981 sched/wait: Rename wait_queue_t => wait_queue_entry_t
Rename:

	wait_queue_t		=>	wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-06-20 12:18:27 +02:00
Linus Torvalds 3527d3e951 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - another round of rq-clock handling debugging, robustization and
     fixes

   - PELT accounting improvements

   - CPU hotplug related ->cpus_allowed affinity handling fixes all
     around the tree

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  sched/x86: Update reschedule warning text
  crypto: N2 - Replace racy task affinity logic
  cpufreq/sparc-us2e: Replace racy task affinity logic
  cpufreq/sparc-us3: Replace racy task affinity logic
  cpufreq/sh: Replace racy task affinity logic
  cpufreq/ia64: Replace racy task affinity logic
  ACPI/processor: Replace racy task affinity logic
  ACPI/processor: Fix error handling in __acpi_processor_start()
  sparc/sysfs: Replace racy task affinity logic
  powerpc/smp: Replace open coded task affinity logic
  ia64/sn/hwperf: Replace racy task affinity logic
  ia64/salinfo: Replace racy task affinity logic
  workqueue: Provide work_on_cpu_safe()
  ia64/topology: Remove cpus_allowed manipulation
  sched/fair: Move the PELT constants into a generated header
  sched/fair: Increase PELT accuracy for small tasks
  sched/fair: Fix comments
  sched/Documentation: Add 'sched-pelt' tool
  sched/fair: Fix corner case in __accumulate_sum()
  sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
  ...
2017-05-01 19:12:53 -07:00
Linus Torvalds ad1490bcd2 Merge branch 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue update from Tejun Heo:
 "One trivial patch to use setup_deferrable_timer() instead of
  open-coding the initialization"

* 'for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: use setup_deferrable_timer
2017-05-01 13:49:27 -07:00
Thomas Gleixner 0e8d6a9336 workqueue: Provide work_on_cpu_safe()
work_on_cpu() is not protected against CPU hotplug. Code which must either
execute on an online CPU or fail if the CPU is not available would have to
protect against CPU hotplug at the callsite.

Provide a function which does get/put_online_cpus() around the call to
work_on_cpu() and fails the call with -ENODEV if the target CPU is not
online.

Preparatory patch to convert several racy task affinity manipulations.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Len Brown <lenb@kernel.org>
Link: http://lkml.kernel.org/r/20170412201042.262610721@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-04-15 12:20:53 +02:00
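The control flow that "Provide work_on_cpu_safe()" above describes (take the hotplug lock, check that the CPU is online, run the function, drop the lock) can be sketched in user space. Everything suffixed `_model`, the `cpu_online_map` array and the reader counter are stand-ins invented for this example; only the overall shape and the -ENODEV result come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

/* Stand-ins for kernel state: the online map and a hotplug lock depth. */
bool cpu_online_map[NR_CPUS_MODEL] = { true, true, false, false };
int hotplug_readers;

void get_online_cpus_model(void) { hotplug_readers++; }
void put_online_cpus_model(void) { hotplug_readers--; }

bool cpu_online_model(int cpu)
{
	return cpu >= 0 && cpu < NR_CPUS_MODEL && cpu_online_map[cpu];
}

/* Stand-in for work_on_cpu(): simply calls fn(arg) on the current thread. */
long work_on_cpu_model(int cpu, long (*fn)(void *), void *arg)
{
	(void)cpu;
	return fn(arg);
}

/* Run @fn for @cpu only while it is guaranteed online, else fail. */
long work_on_cpu_safe_model(int cpu, long (*fn)(void *), void *arg)
{
	long ret = -ENODEV;

	assert(fn != NULL);
	get_online_cpus_model();
	if (cpu_online_model(cpu))
		ret = work_on_cpu_model(cpu, fn, arg);
	put_online_cpus_model();
	return ret;
}

/* Example callback: returns the pointed-to value plus one. */
long add_one(void *arg)
{
	return *(long *)arg + 1;
}
```

Calling `work_on_cpu_safe_model(0, add_one, &v)` runs the callback, whereas asking for CPU 2 (offline in this model) returns -ENODEV without invoking it. In both cases the lock depth returns to zero.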
Geliang Tang c30fb26b11 workqueue: use setup_deferrable_timer
Use setup_deferrable_timer() instead of init_timer_deferrable() to
simplify the code.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-06 15:42:20 -05:00
Tejun Heo 637fdbae60 workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
If queue_delayed_work() gets called with NULL @wq, the kernel will
oops asynchronously on timer expiration which isn't too helpful in
tracking down the offender.  This actually happened with smc.

__queue_delayed_work() already does several input sanity checks
synchronously.  Add NULL @wq check.

Reported-by: Dave Jones <davej@codemonkey.org.uk>
Link: http://lkml.kernel.org/r/20170227171439.jshx3qplflyrgcv7@codemonkey.org.uk
Signed-off-by: Tejun Heo <tj@kernel.org>
2017-03-06 15:33:42 -05:00
Kees Cook dfb4357da6 time: Remove CONFIG_TIMER_STATS
Currently CONFIG_TIMER_STATS exposes process information across namespaces:

kernel/time/timer_list.c print_timer():

        SEQ_printf(m, ", %s/%d", tmp, timer->start_pid);

/proc/timer_list:

 #11: <0000000000000000>, hrtimer_wakeup, S:01, do_nanosleep, cron/2570

Given that the tracer can give the same information, this patch entirely
removes CONFIG_TIMER_STATS.

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: John Stultz <john.stultz@linaro.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: linux-doc@vger.kernel.org
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Xing Gao <xgao01@email.wm.edu>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jessica Frazelle <me@jessfraz.com>
Cc: kernel-hardening@lists.openwall.com
Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-api@vger.kernel.org
Cc: Arjan van de Ven <arjan@linux.intel.com>
Link: http://lkml.kernel.org/r/20170208192659.GA32582@beast
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2017-02-10 11:15:08 +01:00
Tejun Heo 8bc4a04455 Merge branch 'for-4.9' into for-4.10 2016-10-19 12:12:40 -04:00
Tejun Heo 2186d9f940 workqueue: move wq_numa_init() to workqueue_init()
While splitting up workqueue initialization into two parts,
ac8f73400782 ("workqueue: make workqueue available early during boot")
put wq_numa_init() into workqueue_init_early().  Unfortunately, on
some archs including power and arm64, cpu to node mapping isn't yet
established by the time the early init is called leading to incorrect
NUMA initialization and subsequently the following oops due to zero
cpumask on node-specific unbound pools.

  Unable to handle kernel paging request for data at address 0x00000038
  Faulting instruction address: 0xc0000000000fc0cc
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=2048 NUMA PowerNV
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161005 #94
  task: c0000007f5400000 task.stack: c000001ffc084000
  NIP: c0000000000fc0cc LR: c0000000000ed928 CTR: c0000000000fbfd0
  REGS: c000001ffc087780 TRAP: 0300   Not tainted  (4.8.0-compiler_gcc-6.2.0-next-20161005)
  MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 48000424  XER: 00000000
  CFAR: c0000000000089dc DAR: 0000000000000038 DSISR: 40000000 SOFTE: 0
  GPR00: c0000000000ed928 c000001ffc087a00 c000000000e63200 c000000010d6d600
  GPR04: c0000007f5409200 0000000000000021 000000000748e08c 000000000000001f
  GPR08: 0000000000000000 0000000000000021 000000000748f1f8 0000000000000000
  GPR12: 0000000028000422 c00000000fb80000 c00000000000e0c8 0000000000000000
  GPR16: 0000000000000000 0000000000000000 0000000000000021 0000000000000001
  GPR20: ffffffffafb50401 0000000000000000 c000000010d6d600 000000000000ba7e
  GPR24: 000000000000ba7e c000000000d8bc58 afb504000afb5041 0000000000000001
  GPR28: 0000000000000000 0000000000000004 c0000007f5409280 0000000000000000
  NIP [c0000000000fc0cc] enqueue_task_fair+0xfc/0x18b0
  LR [c0000000000ed928] activate_task+0x78/0xe0
  Call Trace:
  [c000001ffc087a00] [c0000007f5409200] 0xc0000007f5409200 (unreliable)
  [c000001ffc087b10] [c0000000000ed928] activate_task+0x78/0xe0
  [c000001ffc087b50] [c0000000000ede58] ttwu_do_activate+0x68/0xc0
  [c000001ffc087b90] [c0000000000ef1b8] try_to_wake_up+0x208/0x4f0
  [c000001ffc087c10] [c0000000000d3484] create_worker+0x144/0x250
  [c000001ffc087cb0] [c000000000cd72d0] workqueue_init+0x124/0x150
  [c000001ffc087d00] [c000000000cc0e74] kernel_init_freeable+0x158/0x360
  [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
  [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
  Instruction dump:
  62940401 3b800000 3aa00000 7f17c378 3a600001 3b600001 60000000 60000000
  60420000 72490021 ebfe0150 2f890001 <ebbf0038> 419e0de0 7fbee840 419e0e58
  ---[ end trace 0000000000000000 ]---

Fix it by moving wq_numa_init() to workqueue_init().  As this means
that the early initialization may not have full NUMA info for per-cpu
pools and ignores NUMA affinity for unbound pools, fix them up from
workqueue_init() after wq_numa_init().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/87twck5wqo.fsf@concordia.ellerman.id.au
Fixes: ac8f73400782 ("workqueue: make workqueue available early during boot")
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-10-19 12:12:26 -04:00
Petr Mladek e700591ae0 kthread: rename probe_kthread_data() to kthread_probe_data()
Patch series "kthread: Kthread worker API improvements"

The intention of this patchset is to make it easier to manipulate and
maintain kthreads.  Especially, I want to replace all the custom main
cycles with a generic one.  Also I want to make the kthreads sleep in a
consistent state in a common place when there is no work.

This patch (of 11):

A good practice is to prefix the names of functions with the name of the
subsystem.

This patch fixes the name of probe_kthread_data().  The other wrongly
named functions are part of the kthread worker API and will be fixed
separately.

Link: http://lkml.kernel.org/r/1470754545-17632-2-git-send-email-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Borislav Petkov <bp@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-10-11 15:06:33 -07:00