Merge branch 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq

Pull workqueue changes from Tejun Heo:
 "There are three major changes.

   - WQ_HIGHPRI has been reimplemented so that high priority work items
     are served by worker threads with -20 nice value from dedicated
     highpri worker pools.

   - CPU hotplug support has been reimplemented such that idle workers
     are kept across CPU hotplug events. This makes CPU hotplug cheaper
     (for PM) and makes the code simpler.

   - flush_kthread_work() has been reimplemented so that a work item can
     be freed while executing. This removes an annoying behavior
     difference between kthread_worker and workqueue."

* 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: fix spurious CPU locality WARN from process_one_work()
  kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed
  kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation
  workqueue: simplify CPU hotplug code
  workqueue: remove CPU offline trustee
  workqueue: don't butcher idle workers on an offline CPU
  workqueue: reimplement CPU online rebinding to handle idle workers
  workqueue: drop @bind from create_worker()
  workqueue: use mutex for global_cwq manager exclusion
  workqueue: ROGUE workers are UNBOUND workers
  workqueue: drop CPU_DYING notifier operation
  workqueue: perform cpu down operations from low priority cpu_notifier()
  workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
  workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool()
  workqueue: separate out worker_pool flags
  workqueue: use @pool instead of @gcwq or @cpu where applicable
  workqueue: factor out worker_pool from global_cwq
  workqueue: don't use WQ_HIGHPRI for unbound workqueues
commit a08489c569
Documentation/workqueue.txt

@@ -89,25 +89,28 @@ called thread-pools.
 
 The cmwq design differentiates between the user-facing workqueues that
 subsystems and drivers queue work items on and the backend mechanism
-which manages thread-pool and processes the queued work items.
+which manages thread-pools and processes the queued work items.
 
 The backend is called gcwq. There is one gcwq for each possible CPU
-and one gcwq to serve work items queued on unbound workqueues.
+and one gcwq to serve work items queued on unbound workqueues. Each
+gcwq has two thread-pools - one for normal work items and the other
+for high priority ones.
 
 Subsystems and drivers can create and queue work items through special
 workqueue API functions as they see fit. They can influence some
 aspects of the way the work items are executed by setting flags on the
 workqueue they are putting the work item on. These flags include
-things like CPU locality, reentrancy, concurrency limits and more. To
-get a detailed overview refer to the API description of
+things like CPU locality, reentrancy, concurrency limits, priority and
+more. To get a detailed overview refer to the API description of
 alloc_workqueue() below.
 
-When a work item is queued to a workqueue, the target gcwq is
-determined according to the queue parameters and workqueue attributes
-and appended on the shared worklist of the gcwq. For example, unless
-specifically overridden, a work item of a bound workqueue will be
-queued on the worklist of exactly that gcwq that is associated to the
-CPU the issuer is running on.
+When a work item is queued to a workqueue, the target gcwq and
+thread-pool is determined according to the queue parameters and
+workqueue attributes and appended on the shared worklist of the
+thread-pool. For example, unless specifically overridden, a work item
+of a bound workqueue will be queued on the worklist of either normal
+or highpri thread-pool of the gcwq that is associated to the CPU the
+issuer is running on.
 
 For any worker pool implementation, managing the concurrency level
 (how many execution contexts are active) is an important issue. cmwq
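To make the user-facing side described above concrete, here is a minimal, hypothetical sketch of allocating a highpri workqueue and queueing a work item on it. The names example_wq, example_work, example_fn and example_init are invented for illustration and error handling is reduced to the bare minimum; this is a sketch, not code from the series.

#include <linux/workqueue.h>

static void example_fn(struct work_struct *work)
{
        /* runs in process context on a worker thread of the chosen pool */
}

static DECLARE_WORK(example_work, example_fn);
static struct workqueue_struct *example_wq;

static int __init example_init(void)
{
        /* WQ_HIGHPRI selects the highpri thread-pool of the target gcwq */
        example_wq = alloc_workqueue("example", WQ_HIGHPRI, 0);
        if (!example_wq)
                return -ENOMEM;

        /* queued on the gcwq of the CPU the issuer is running on */
        queue_work(example_wq, &example_work);
        return 0;
}

With WQ_HIGHPRI set, the work item lands on the highpri thread-pool and is executed by one of its nice -20 workers, per the documentation changes above.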
@@ -115,26 +118,26 @@ tries to keep the concurrency at a minimal but sufficient level.
 Minimal to save resources and sufficient in that the system is used at
 its full capacity.
 
-Each gcwq bound to an actual CPU implements concurrency management by
-hooking into the scheduler. The gcwq is notified whenever an active
-worker wakes up or sleeps and keeps track of the number of the
-currently runnable workers. Generally, work items are not expected to
-hog a CPU and consume many cycles. That means maintaining just enough
-concurrency to prevent work processing from stalling should be
-optimal. As long as there are one or more runnable workers on the
-CPU, the gcwq doesn't start execution of a new work, but, when the
-last running worker goes to sleep, it immediately schedules a new
-worker so that the CPU doesn't sit idle while there are pending work
-items. This allows using a minimal number of workers without losing
-execution bandwidth.
+Each thread-pool bound to an actual CPU implements concurrency
+management by hooking into the scheduler. The thread-pool is notified
+whenever an active worker wakes up or sleeps and keeps track of the
+number of the currently runnable workers. Generally, work items are
+not expected to hog a CPU and consume many cycles. That means
+maintaining just enough concurrency to prevent work processing from
+stalling should be optimal. As long as there are one or more runnable
+workers on the CPU, the thread-pool doesn't start execution of a new
+work, but, when the last running worker goes to sleep, it immediately
+schedules a new worker so that the CPU doesn't sit idle while there
+are pending work items. This allows using a minimal number of workers
+without losing execution bandwidth.
 
 Keeping idle workers around doesn't cost other than the memory space
 for kthreads, so cmwq holds onto idle ones for a while before killing
 them.
 
 For an unbound wq, the above concurrency management doesn't apply and
-the gcwq for the pseudo unbound CPU tries to start executing all work
-items as soon as possible. The responsibility of regulating
+the thread-pools for the pseudo unbound CPU try to start executing all
+work items as soon as possible. The responsibility of regulating
 concurrency level is on the users. There is also a flag to mark a
 bound wq to ignore the concurrency management. Please refer to the
 API section for details.
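The "hooking into the scheduler" mentioned above is a pair of callbacks the scheduler invokes when a worker task wakes up or goes to sleep. The following is a condensed paraphrase of those hooks as they look after this series (wq_worker_waking_up() and wq_worker_sleeping() in kernel/workqueue.c); sanity checks and edge cases are omitted, so treat it as a sketch rather than the exact mainline code.

/*
 * Condensed sketch of the scheduler hooks. The per-pool nr_running counter
 * is the "number of currently runnable workers" the text above refers to.
 */
void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
{
        struct worker *worker = kthread_data(task);

        if (!(worker->flags & WORKER_NOT_RUNNING))
                atomic_inc(get_pool_nr_running(worker->pool));
}

struct task_struct *wq_worker_sleeping(struct task_struct *task,
                                       unsigned int cpu)
{
        struct worker *worker = kthread_data(task), *to_wakeup = NULL;
        atomic_t *nr_running = get_pool_nr_running(worker->pool);

        if (worker->flags & WORKER_NOT_RUNNING)
                return NULL;

        /*
         * Last runnable worker going to sleep: kick an idle worker if the
         * pool still has pending work items, so the CPU doesn't sit idle.
         */
        if (atomic_dec_and_test(nr_running) &&
            !list_empty(&worker->pool->worklist))
                to_wakeup = first_worker(worker->pool);

        return to_wakeup ? to_wakeup->task : NULL;
}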
@@ -205,31 +208,22 @@ resources, scheduled and executed.
 
   WQ_HIGHPRI
 
-	Work items of a highpri wq are queued at the head of the
-	worklist of the target gcwq and start execution regardless of
-	the current concurrency level. In other words, highpri work
-	items will always start execution as soon as execution
-	resource is available.
+	Work items of a highpri wq are queued to the highpri
+	thread-pool of the target gcwq. Highpri thread-pools are
+	served by worker threads with elevated nice level.
 
-	Ordering among highpri work items is preserved - a highpri
-	work item queued after another highpri work item will start
-	execution after the earlier highpri work item starts.
-
-	Although highpri work items are not held back by other
-	runnable work items, they still contribute to the concurrency
-	level. Highpri work items in runnable state will prevent
-	non-highpri work items from starting execution.
-
-	This flag is meaningless for unbound wq.
+	Note that normal and highpri thread-pools don't interact with
+	each other. Each maintain its separate pool of workers and
+	implements concurrency management among its workers.
 
   WQ_CPU_INTENSIVE
 
 	Work items of a CPU intensive wq do not contribute to the
 	concurrency level. In other words, runnable CPU intensive
-	work items will not prevent other work items from starting
-	execution. This is useful for bound work items which are
-	expected to hog CPU cycles so that their execution is
-	regulated by the system scheduler.
+	work items will not prevent other work items in the same
+	thread-pool from starting execution. This is useful for bound
+	work items which are expected to hog CPU cycles so that their
+	execution is regulated by the system scheduler.
 
 	Although CPU intensive work items don't contribute to the
 	concurrency level, start of their executions is still

@@ -239,14 +233,6 @@ resources, scheduled and executed.
 
 	This flag is meaningless for unbound wq.
 
-  WQ_HIGHPRI | WQ_CPU_INTENSIVE
-
-	This combination makes the wq avoid interaction with
-	concurrency management completely and behave as a simple
-	per-CPU execution context provider. Work items queued on a
-	highpri CPU-intensive wq start execution as soon as resources
-	are available and don't affect execution of other work items.
-
 @max_active:
 
 @max_active determines the maximum number of execution contexts per
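The "elevated nice level" in the new WQ_HIGHPRI description is applied when a worker is created for the highpri pool. The fragment below paraphrases the relevant part of create_worker() in kernel/workqueue.c after this series (ID allocation, the unbound case, locking and error handling are omitted); per the series, the only per-pool differences are the "H" suffix in the thread name and the nice value.

        /* inside create_worker(), paraphrased; pri is "H" for the highpri pool */
        const char *pri = worker_pool_pri(pool) ? "H" : "";

        worker->task = kthread_create_on_node(worker_thread, worker,
                                               cpu_to_node(gcwq->cpu),
                                               "kworker/%u:%d%s",
                                               gcwq->cpu, id, pri);
        if (worker_pool_pri(pool))
                set_user_nice(worker->task, HIGHPRI_NICE_LEVEL);   /* -20 */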
@@ -328,20 +314,7 @@ If @max_active == 2,
  35		w2 wakes up and finishes
 
 Now, let's assume w1 and w2 are queued to a different wq q1 which has
-WQ_HIGHPRI set,
-
- TIME IN MSECS	EVENT
- 0		w1 and w2 start and burn CPU
- 5		w1 sleeps
- 10		w2 sleeps
- 10		w0 starts and burns CPU
- 15		w0 sleeps
- 15		w1 wakes up and finishes
- 20		w2 wakes up and finishes
- 25		w0 wakes up and burns CPU
- 30		w0 finishes
-
-If q1 has WQ_CPU_INTENSIVE set,
+WQ_CPU_INTENSIVE set,
 
  TIME IN MSECS	EVENT
  0		w0 starts and burns CPU
include/linux/cpu.h

@@ -73,8 +73,9 @@ enum {
 	/* migration should happen before other stuff but after perf */
 	CPU_PRI_PERF		= 20,
 	CPU_PRI_MIGRATION	= 10,
-	/* prepare workqueues for other notifiers */
-	CPU_PRI_WORKQUEUE	= 5,
+	/* bring up workqueues before normal notifiers and down after */
+	CPU_PRI_WORKQUEUE_UP	= 5,
+	CPU_PRI_WORKQUEUE_DOWN	= -5,
 };
 
 #define CPU_ONLINE		0x0002 /* CPU (unsigned)v is up */
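Roughly how these two priorities are consumed, paraphrased from the series ("workqueue: perform cpu down operations from low priority cpu_notifier()"): workqueues register two hotplug notifiers so they come up early on CPU-up (positive priority) and go down late on CPU-down (negative priority), letting other notifiers rely on working per-CPU workqueues in both directions. Treat the snippet as a sketch of the registration, not the full init function.

static int __init init_workqueues(void)
{
        /* high priority: run before most notifiers on CPU_UP_* events */
        cpu_notifier(workqueue_cpu_up_callback, CPU_PRI_WORKQUEUE_UP);
        /* low priority: run after most notifiers on CPU_DOWN_* events */
        hotcpu_notifier(workqueue_cpu_down_callback, CPU_PRI_WORKQUEUE_DOWN);

        /* ... gcwq and worker-pool initialization follows ... */
        return 0;
}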
include/linux/kthread.h

@@ -49,8 +49,6 @@ extern int tsk_fork_get_node(struct task_struct *tsk);
  * can be queued and flushed using queue/flush_kthread_work()
  * respectively. Queued kthread_works are processed by a kthread
  * running kthread_worker_fn().
- *
- * A kthread_work can't be freed while it is executing.
  */
 struct kthread_work;
 typedef void (*kthread_work_func_t)(struct kthread_work *work);

@@ -59,15 +57,14 @@ struct kthread_worker {
 	spinlock_t		lock;
 	struct list_head	work_list;
 	struct task_struct	*task;
+	struct kthread_work	*current_work;
 };
 
 struct kthread_work {
 	struct list_head	node;
 	kthread_work_func_t	func;
 	wait_queue_head_t	done;
-	atomic_t		flushing;
-	int			queue_seq;
-	int			done_seq;
+	struct kthread_worker	*worker;
 };
 
 #define KTHREAD_WORKER_INIT(worker)	{				\

@@ -79,7 +76,6 @@ struct kthread_work {
 	.node = LIST_HEAD_INIT((work).node),				\
 	.func = (fn),							\
 	.done = __WAIT_QUEUE_HEAD_INITIALIZER((work).done),		\
-	.flushing = ATOMIC_INIT(0),					\
 	}
 
 #define DEFINE_KTHREAD_WORKER(worker)					\
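For context, these structures back the kthread_worker API, which is typically used like the hypothetical sketch below (the example_* names are invented). kthread_worker_fn() runs in a dedicated kthread and processes queued works in order; after this series the worker records ->current_work, which is what lets flush_kthread_work() wait on a work item that is already executing.

#include <linux/kthread.h>
#include <linux/err.h>

static void example_work_fn(struct kthread_work *work)
{
        /* runs in the dedicated kthread, in queueing order */
}

static DEFINE_KTHREAD_WORKER(example_worker);
static DEFINE_KTHREAD_WORK(example_work, example_work_fn);
static struct task_struct *example_thread;

static int example_start(void)
{
        /* the kthread that serves example_worker */
        example_thread = kthread_run(kthread_worker_fn, &example_worker,
                                     "example_worker");
        if (IS_ERR(example_thread))
                return PTR_ERR(example_thread);

        queue_kthread_work(&example_worker, &example_work);
        flush_kthread_work(&example_work);      /* wait for it to finish */
        return 0;
}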
include/trace/events/workqueue.h

@@ -54,7 +54,7 @@ TRACE_EVENT(workqueue_queue_work,
 		__entry->function	= work->func;
 		__entry->workqueue	= cwq->wq;
 		__entry->req_cpu	= req_cpu;
-		__entry->cpu		= cwq->gcwq->cpu;
+		__entry->cpu		= cwq->pool->gcwq->cpu;
 	),
 
 	TP_printk("work struct=%p function=%pf workqueue=%p req_cpu=%u cpu=%u",
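Behind this one-line trace change is the worker_pool split introduced by the series. An abridged sketch of the resulting relationships (fields trimmed, names as used in kernel/workqueue.c of this era): each cpu_workqueue_struct now points at a worker_pool, and each gcwq carries a normal and a highpri pool, hence cwq->pool->gcwq->cpu instead of cwq->gcwq->cpu.

/* abridged sketch, not the full definitions */
struct worker_pool {
        struct global_cwq       *gcwq;          /* owning gcwq */
        struct list_head        worklist;       /* pending work items */
        /* ... worker and idle-list bookkeeping ... */
};

struct global_cwq {
        unsigned int            cpu;            /* the associated CPU */
        struct worker_pool      pools[2];       /* normal and highpri pools */
        /* ... */
};

struct cpu_workqueue_struct {
        struct worker_pool      *pool;          /* the pool this cwq feeds */
        struct workqueue_struct *wq;            /* the owning workqueue */
        /* ... */
};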
kernel/kthread.c

@@ -360,16 +360,12 @@ repeat:
 					struct kthread_work, node);
 		list_del_init(&work->node);
 	}
+	worker->current_work = work;
 	spin_unlock_irq(&worker->lock);
 
 	if (work) {
 		__set_current_state(TASK_RUNNING);
 		work->func(work);
-		smp_wmb();	/* wmb worker-b0 paired with flush-b1 */
-		work->done_seq = work->queue_seq;
-		smp_mb();	/* mb worker-b1 paired with flush-b0 */
-		if (atomic_read(&work->flushing))
-			wake_up_all(&work->done);
 	} else if (!freezing(current))
 		schedule();
 

@@ -378,6 +374,19 @@ repeat:
 }
 EXPORT_SYMBOL_GPL(kthread_worker_fn);
 
+/* insert @work before @pos in @worker */
+static void insert_kthread_work(struct kthread_worker *worker,
+				struct kthread_work *work,
+				struct list_head *pos)
+{
+	lockdep_assert_held(&worker->lock);
+
+	list_add_tail(&work->node, pos);
+	work->worker = worker;
+	if (likely(worker->task))
+		wake_up_process(worker->task);
+}
+
 /**
  * queue_kthread_work - queue a kthread_work
  * @worker: target kthread_worker

@@ -395,10 +404,7 @@ bool queue_kthread_work(struct kthread_worker *worker,
 
 	spin_lock_irqsave(&worker->lock, flags);
 	if (list_empty(&work->node)) {
-		list_add_tail(&work->node, &worker->work_list);
-		work->queue_seq++;
-		if (likely(worker->task))
-			wake_up_process(worker->task);
+		insert_kthread_work(worker, work, &worker->work_list);
 		ret = true;
 	}
 	spin_unlock_irqrestore(&worker->lock, flags);

@@ -406,36 +412,6 @@ bool queue_kthread_work(struct kthread_worker *worker,
 }
 EXPORT_SYMBOL_GPL(queue_kthread_work);
 
-/**
- * flush_kthread_work - flush a kthread_work
- * @work: work to flush
- *
- * If @work is queued or executing, wait for it to finish execution.
- */
-void flush_kthread_work(struct kthread_work *work)
-{
-	int seq = work->queue_seq;
-
-	atomic_inc(&work->flushing);
-
-	/*
-	 * mb flush-b0 paired with worker-b1, to make sure either
-	 * worker sees the above increment or we see done_seq update.
-	 */
-	smp_mb__after_atomic_inc();
-
-	/* A - B <= 0 tests whether B is in front of A regardless of overflow */
-	wait_event(work->done, seq - work->done_seq <= 0);
-	atomic_dec(&work->flushing);
-
-	/*
-	 * rmb flush-b1 paired with worker-b0, to make sure our caller
-	 * sees every change made by work->func().
-	 */
-	smp_mb__after_atomic_dec();
-}
-EXPORT_SYMBOL_GPL(flush_kthread_work);
-
 struct kthread_flush_work {
 	struct kthread_work	work;
 	struct completion	done;

@@ -448,6 +424,46 @@ static void kthread_flush_work_fn(struct kthread_work *work)
 	complete(&fwork->done);
 }
 
+/**
+ * flush_kthread_work - flush a kthread_work
+ * @work: work to flush
+ *
+ * If @work is queued or executing, wait for it to finish execution.
+ */
+void flush_kthread_work(struct kthread_work *work)
+{
+	struct kthread_flush_work fwork = {
+		KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
+		COMPLETION_INITIALIZER_ONSTACK(fwork.done),
+	};
+	struct kthread_worker *worker;
+	bool noop = false;
+
+retry:
+	worker = work->worker;
+	if (!worker)
+		return;
+
+	spin_lock_irq(&worker->lock);
+	if (work->worker != worker) {
+		spin_unlock_irq(&worker->lock);
+		goto retry;
+	}
+
+	if (!list_empty(&work->node))
+		insert_kthread_work(worker, &fwork.work, work->node.next);
+	else if (worker->current_work == work)
+		insert_kthread_work(worker, &fwork.work, worker->work_list.next);
+	else
+		noop = true;
+
+	spin_unlock_irq(&worker->lock);
+
+	if (!noop)
+		wait_for_completion(&fwork.done);
+}
+EXPORT_SYMBOL_GPL(flush_kthread_work);
+
 /**
  * flush_kthread_worker - flush all current works on a kthread_worker
  * @worker: worker to flush
kernel/workqueue.c
1154
kernel/workqueue.c
File diff suppressed because it is too large
Load Diff
Loading…
Reference in New Issue