Merge branch 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue changes from Tejun Heo: "There are three major changes. - WQ_HIGHPRI has been reimplemented so that high priority work items are served by worker threads with -20 nice value from dedicated highpri worker pools. - CPU hotplug support has been reimplemented such that idle workers are kept across CPU hotplug events. This makes CPU hotplug cheaper (for PM) and makes the code simpler. - flush_kthread_work() has been reimplemented so that a work item can be freed while executing. This removes an annoying behavior difference between kthread_worker and workqueue." * 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix spurious CPU locality WARN from process_one_work() kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation workqueue: simplify CPU hotplug code workqueue: remove CPU offline trustee workqueue: don't butcher idle workers on an offline CPU workqueue: reimplement CPU online rebinding to handle idle workers workqueue: drop @bind from create_worker() workqueue: use mutex for global_cwq manager exclusion workqueue: ROGUE workers are UNBOUND workers workqueue: drop CPU_DYING notifier operation workqueue: perform cpu down operations from low priority cpu_notifier() workqueue: reimplement WQ_HIGHPRI using a separate worker_pool workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool() workqueue: separate out worker_pool flags workqueue: use @pool instead of @gcwq or @cpu where applicable workqueue: factor out worker_pool from global_cwq workqueue: don't use WQ_HIGHPRI for unbound workqueues
This commit is contained in:
commit
a08489c569
|
@ -89,25 +89,28 @@ called thread-pools.
|
|||
|
||||
The cmwq design differentiates between the user-facing workqueues that
|
||||
subsystems and drivers queue work items on and the backend mechanism
|
||||
which manages thread-pool and processes the queued work items.
|
||||
which manages thread-pools and processes the queued work items.
|
||||
|
||||
The backend is called gcwq. There is one gcwq for each possible CPU
|
||||
and one gcwq to serve work items queued on unbound workqueues.
|
||||
and one gcwq to serve work items queued on unbound workqueues. Each
|
||||
gcwq has two thread-pools - one for normal work items and the other
|
||||
for high priority ones.
|
||||
|
||||
Subsystems and drivers can create and queue work items through special
|
||||
workqueue API functions as they see fit. They can influence some
|
||||
aspects of the way the work items are executed by setting flags on the
|
||||
workqueue they are putting the work item on. These flags include
|
||||
things like CPU locality, reentrancy, concurrency limits and more. To
|
||||
get a detailed overview refer to the API description of
|
||||
things like CPU locality, reentrancy, concurrency limits, priority and
|
||||
more. To get a detailed overview refer to the API description of
|
||||
alloc_workqueue() below.
|
||||
|
||||
When a work item is queued to a workqueue, the target gcwq is
|
||||
determined according to the queue parameters and workqueue attributes
|
||||
and appended on the shared worklist of the gcwq. For example, unless
|
||||
specifically overridden, a work item of a bound workqueue will be
|
||||
queued on the worklist of exactly that gcwq that is associated to the
|
||||
CPU the issuer is running on.
|
||||
When a work item is queued to a workqueue, the target gcwq and
|
||||
thread-pool is determined according to the queue parameters and
|
||||
workqueue attributes and appended on the shared worklist of the
|
||||
thread-pool. For example, unless specifically overridden, a work item
|
||||
of a bound workqueue will be queued on the worklist of either normal
|
||||
or highpri thread-pool of the gcwq that is associated to the CPU the
|
||||
issuer is running on.
|
||||
|
||||
For any worker pool implementation, managing the concurrency level
|
||||
(how many execution contexts are active) is an important issue. cmwq
|
||||
|
@ -115,26 +118,26 @@ tries to keep the concurrency at a minimal but sufficient level.
|
|||
Minimal to save resources and sufficient in that the system is used at
|
||||
its full capacity.
|
||||
|
||||
Each gcwq bound to an actual CPU implements concurrency management by
|
||||
hooking into the scheduler. The gcwq is notified whenever an active
|
||||
worker wakes up or sleeps and keeps track of the number of the
|
||||
currently runnable workers. Generally, work items are not expected to
|
||||
hog a CPU and consume many cycles. That means maintaining just enough
|
||||
concurrency to prevent work processing from stalling should be
|
||||
optimal. As long as there are one or more runnable workers on the
|
||||
CPU, the gcwq doesn't start execution of a new work, but, when the
|
||||
last running worker goes to sleep, it immediately schedules a new
|
||||
worker so that the CPU doesn't sit idle while there are pending work
|
||||
items. This allows using a minimal number of workers without losing
|
||||
execution bandwidth.
|
||||
Each thread-pool bound to an actual CPU implements concurrency
|
||||
management by hooking into the scheduler. The thread-pool is notified
|
||||
whenever an active worker wakes up or sleeps and keeps track of the
|
||||
number of the currently runnable workers. Generally, work items are
|
||||
not expected to hog a CPU and consume many cycles. That means
|
||||
maintaining just enough concurrency to prevent work processing from
|
||||
stalling should be optimal. As long as there are one or more runnable
|
||||
workers on the CPU, the thread-pool doesn't start execution of a new
|
||||
work, but, when the last running worker goes to sleep, it immediately
|
||||
schedules a new worker so that the CPU doesn't sit idle while there
|
||||
are pending work items. This allows using a minimal number of workers
|
||||
without losing execution bandwidth.
|
||||
|
||||
Keeping idle workers around doesn't cost other than the memory space
|
||||
for kthreads, so cmwq holds onto idle ones for a while before killing
|
||||
them.
|
||||
|
||||
For an unbound wq, the above concurrency management doesn't apply and
|
||||
the gcwq for the pseudo unbound CPU tries to start executing all work
|
||||
items as soon as possible. The responsibility of regulating
|
||||
the thread-pools for the pseudo unbound CPU try to start executing all
|
||||
work items as soon as possible. The responsibility of regulating
|
||||
concurrency level is on the users. There is also a flag to mark a
|
||||
bound wq to ignore the concurrency management. Please refer to the
|
||||
API section for details.
|
||||
|
@ -205,31 +208,22 @@ resources, scheduled and executed.
|
|||
|
||||
WQ_HIGHPRI
|
||||
|
||||
Work items of a highpri wq are queued at the head of the
|
||||
worklist of the target gcwq and start execution regardless of
|
||||
the current concurrency level. In other words, highpri work
|
||||
items will always start execution as soon as execution
|
||||
resource is available.
|
||||
Work items of a highpri wq are queued to the highpri
|
||||
thread-pool of the target gcwq. Highpri thread-pools are
|
||||
served by worker threads with elevated nice level.
|
||||
|
||||
Ordering among highpri work items is preserved - a highpri
|
||||
work item queued after another highpri work item will start
|
||||
execution after the earlier highpri work item starts.
|
||||
|
||||
Although highpri work items are not held back by other
|
||||
runnable work items, they still contribute to the concurrency
|
||||
level. Highpri work items in runnable state will prevent
|
||||
non-highpri work items from starting execution.
|
||||
|
||||
This flag is meaningless for unbound wq.
|
||||
Note that normal and highpri thread-pools don't interact with
|
||||
each other. Each maintain its separate pool of workers and
|
||||
implements concurrency management among its workers.
|
||||
|
||||
WQ_CPU_INTENSIVE
|
||||
|
||||
Work items of a CPU intensive wq do not contribute to the
|
||||
concurrency level. In other words, runnable CPU intensive
|
||||
work items will not prevent other work items from starting
|
||||
execution. This is useful for bound work items which are
|
||||
expected to hog CPU cycles so that their execution is
|
||||
regulated by the system scheduler.
|
||||
work items will not prevent other work items in the same
|
||||
thread-pool from starting execution. This is useful for bound
|
||||
work items which are expected to hog CPU cycles so that their
|
||||
execution is regulated by the system scheduler.
|
||||
|
||||
Although CPU intensive work items don't contribute to the
|
||||
concurrency level, start of their executions is still
|
||||
|
@ -239,14 +233,6 @@ resources, scheduled and executed.
|
|||
|
||||
This flag is meaningless for unbound wq.
|
||||
|
||||
WQ_HIGHPRI | WQ_CPU_INTENSIVE
|
||||
|
||||
This combination makes the wq avoid interaction with
|
||||
concurrency management completely and behave as a simple
|
||||
per-CPU execution context provider. Work items queued on a
|
||||
highpri CPU-intensive wq start execution as soon as resources
|
||||
are available and don't affect execution of other work items.
|
||||
|
||||
@max_active:
|
||||
|
||||
@max_active determines the maximum number of execution contexts per
|
||||
|
@ -328,20 +314,7 @@ If @max_active == 2,
|
|||
35 w2 wakes up and finishes
|
||||
|
||||
Now, let's assume w1 and w2 are queued to a different wq q1 which has
|
||||
WQ_HIGHPRI set,
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w1 and w2 start and burn CPU
|
||||
5 w1 sleeps
|
||||
10 w2 sleeps
|
||||
10 w0 starts and burns CPU
|
||||
15 w0 sleeps
|
||||
15 w1 wakes up and finishes
|
||||
20 w2 wakes up and finishes
|
||||
25 w0 wakes up and burns CPU
|
||||
30 w0 finishes
|
||||
|
||||
If q1 has WQ_CPU_INTENSIVE set,
|
||||
WQ_CPU_INTENSIVE set,
|
||||
|
||||
TIME IN MSECS EVENT
|
||||
0 w0 starts and burns CPU
|
||||
|
|
|
@ -73,8 +73,9 @@ enum {
|
|||
/* migration should happen before other stuff but after perf */
|
||||
CPU_PRI_PERF = 20,
|
||||
CPU_PRI_MIGRATION = 10,
|
||||
/* prepare workqueues for other notifiers */
|
||||
CPU_PRI_WORKQUEUE = 5,
|
||||
/* bring up workqueues before normal notifiers and down after */
|
||||
CPU_PRI_WORKQUEUE_UP = 5,
|
||||
CPU_PRI_WORKQUEUE_DOWN = -5,
|
||||
};
|
||||
|
||||
#define CPU_ONLINE 0x0002 /* CPU (unsigned)v is up */
|
||||
|
|
|
@ -49,8 +49,6 @@ extern int tsk_fork_get_node(struct task_struct *tsk);
|
|||
* can be queued and flushed using queue/flush_kthread_work()
|
||||
* respectively. Queued kthread_works are processed by a kthread
|
||||
* running kthread_worker_fn().
|
||||
*
|
||||
* A kthread_work can't be freed while it is executing.
|
||||
*/
|
||||
struct kthread_work;
|
||||
typedef void (*kthread_work_func_t)(struct kthread_work *work);
|
||||
|
@ -59,15 +57,14 @@ struct kthread_worker {
|
|||
spinlock_t lock;
|
||||
struct list_head work_list;
|
||||
struct task_struct *task;
|
||||
struct kthread_work *current_work;
|
||||
};
|
||||
|
||||
struct kthread_work {
|
||||
struct list_head node;
|
||||
kthread_work_func_t func;
|
||||
wait_queue_head_t done;
|
||||
atomic_t flushing;
|
||||
int queue_seq;
|
||||
int done_seq;
|
||||
struct kthread_worker *worker;
|
||||
};
|
||||
|
||||
#define KTHREAD_WORKER_INIT(worker) { \
|
||||
|
@ -79,7 +76,6 @@ struct kthread_work {
|
|||
.node = LIST_HEAD_INIT((work).node), \
|
||||
.func = (fn), \
|
||||
.done = __WAIT_QUEUE_HEAD_INITIALIZER((work).done), \
|
||||
.flushing = ATOMIC_INIT(0), \
|
||||
}
|
||||
|
||||
#define DEFINE_KTHREAD_WORKER(worker) \
|
||||
|
|
|
@ -54,7 +54,7 @@ TRACE_EVENT(workqueue_queue_work,
|
|||
__entry->function = work->func;
|
||||
__entry->workqueue = cwq->wq;
|
||||
__entry->req_cpu = req_cpu;
|
||||
__entry->cpu = cwq->gcwq->cpu;
|
||||
__entry->cpu = cwq->pool->gcwq->cpu;
|
||||
),
|
||||
|
||||
TP_printk("work struct=%p function=%pf workqueue=%p req_cpu=%u cpu=%u",
|
||||
|
|
|
@ -360,16 +360,12 @@ repeat:
|
|||
struct kthread_work, node);
|
||||
list_del_init(&work->node);
|
||||
}
|
||||
worker->current_work = work;
|
||||
spin_unlock_irq(&worker->lock);
|
||||
|
||||
if (work) {
|
||||
__set_current_state(TASK_RUNNING);
|
||||
work->func(work);
|
||||
smp_wmb(); /* wmb worker-b0 paired with flush-b1 */
|
||||
work->done_seq = work->queue_seq;
|
||||
smp_mb(); /* mb worker-b1 paired with flush-b0 */
|
||||
if (atomic_read(&work->flushing))
|
||||
wake_up_all(&work->done);
|
||||
} else if (!freezing(current))
|
||||
schedule();
|
||||
|
||||
|
@ -378,6 +374,19 @@ repeat:
|
|||
}
|
||||
EXPORT_SYMBOL_GPL(kthread_worker_fn);
|
||||
|
||||
/* insert @work before @pos in @worker */
|
||||
static void insert_kthread_work(struct kthread_worker *worker,
|
||||
struct kthread_work *work,
|
||||
struct list_head *pos)
|
||||
{
|
||||
lockdep_assert_held(&worker->lock);
|
||||
|
||||
list_add_tail(&work->node, pos);
|
||||
work->worker = worker;
|
||||
if (likely(worker->task))
|
||||
wake_up_process(worker->task);
|
||||
}
|
||||
|
||||
/**
|
||||
* queue_kthread_work - queue a kthread_work
|
||||
* @worker: target kthread_worker
|
||||
|
@ -395,10 +404,7 @@ bool queue_kthread_work(struct kthread_worker *worker,
|
|||
|
||||
spin_lock_irqsave(&worker->lock, flags);
|
||||
if (list_empty(&work->node)) {
|
||||
list_add_tail(&work->node, &worker->work_list);
|
||||
work->queue_seq++;
|
||||
if (likely(worker->task))
|
||||
wake_up_process(worker->task);
|
||||
insert_kthread_work(worker, work, &worker->work_list);
|
||||
ret = true;
|
||||
}
|
||||
spin_unlock_irqrestore(&worker->lock, flags);
|
||||
|
@ -406,36 +412,6 @@ bool queue_kthread_work(struct kthread_worker *worker,
|
|||
}
|
||||
EXPORT_SYMBOL_GPL(queue_kthread_work);
|
||||
|
||||
/**
|
||||
* flush_kthread_work - flush a kthread_work
|
||||
* @work: work to flush
|
||||
*
|
||||
* If @work is queued or executing, wait for it to finish execution.
|
||||
*/
|
||||
void flush_kthread_work(struct kthread_work *work)
|
||||
{
|
||||
int seq = work->queue_seq;
|
||||
|
||||
atomic_inc(&work->flushing);
|
||||
|
||||
/*
|
||||
* mb flush-b0 paired with worker-b1, to make sure either
|
||||
* worker sees the above increment or we see done_seq update.
|
||||
*/
|
||||
smp_mb__after_atomic_inc();
|
||||
|
||||
/* A - B <= 0 tests whether B is in front of A regardless of overflow */
|
||||
wait_event(work->done, seq - work->done_seq <= 0);
|
||||
atomic_dec(&work->flushing);
|
||||
|
||||
/*
|
||||
* rmb flush-b1 paired with worker-b0, to make sure our caller
|
||||
* sees every change made by work->func().
|
||||
*/
|
||||
smp_mb__after_atomic_dec();
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(flush_kthread_work);
|
||||
|
||||
struct kthread_flush_work {
|
||||
struct kthread_work work;
|
||||
struct completion done;
|
||||
|
@ -448,6 +424,46 @@ static void kthread_flush_work_fn(struct kthread_work *work)
|
|||
complete(&fwork->done);
|
||||
}
|
||||
|
||||
/**
|
||||
* flush_kthread_work - flush a kthread_work
|
||||
* @work: work to flush
|
||||
*
|
||||
* If @work is queued or executing, wait for it to finish execution.
|
||||
*/
|
||||
void flush_kthread_work(struct kthread_work *work)
|
||||
{
|
||||
struct kthread_flush_work fwork = {
|
||||
KTHREAD_WORK_INIT(fwork.work, kthread_flush_work_fn),
|
||||
COMPLETION_INITIALIZER_ONSTACK(fwork.done),
|
||||
};
|
||||
struct kthread_worker *worker;
|
||||
bool noop = false;
|
||||
|
||||
retry:
|
||||
worker = work->worker;
|
||||
if (!worker)
|
||||
return;
|
||||
|
||||
spin_lock_irq(&worker->lock);
|
||||
if (work->worker != worker) {
|
||||
spin_unlock_irq(&worker->lock);
|
||||
goto retry;
|
||||
}
|
||||
|
||||
if (!list_empty(&work->node))
|
||||
insert_kthread_work(worker, &fwork.work, work->node.next);
|
||||
else if (worker->current_work == work)
|
||||
insert_kthread_work(worker, &fwork.work, worker->work_list.next);
|
||||
else
|
||||
noop = true;
|
||||
|
||||
spin_unlock_irq(&worker->lock);
|
||||
|
||||
if (!noop)
|
||||
wait_for_completion(&fwork.done);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(flush_kthread_work);
|
||||
|
||||
/**
|
||||
* flush_kthread_worker - flush all current works on a kthread_worker
|
||||
* @worker: worker to flush
|
||||
|
|
1154
kernel/workqueue.c
1154
kernel/workqueue.c
File diff suppressed because it is too large
Load Diff
Loading…
Reference in New Issue