Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler changes from Ingo Molnar: "The main changes in this cycle are: - (much) improved CONFIG_NUMA_BALANCING support from Mel Gorman, Rik van Riel, Peter Zijlstra et al. Yay! - optimize preemption counter handling: merge the NEED_RESCHED flag into the preempt_count variable, by Peter Zijlstra. - wait.h fixes and code reorganization from Peter Zijlstra - cfs_bandwidth fixes from Ben Segall - SMP load-balancer cleanups from Peter Zijstra - idle balancer improvements from Jason Low - other fixes and cleanups" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits) ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED stop_machine: Fix race between stop_two_cpus() and stop_cpus() sched: Remove unnecessary iteration over sched domains to update nr_busy_cpus sched: Fix asymmetric scheduling for POWER7 sched: Move completion code from core.c to completion.c sched: Move wait code from core.c to wait.c sched: Move wait.c into kernel/sched/ sched/wait: Fix __wait_event_interruptible_lock_irq_timeout() sched: Avoid throttle_cfs_rq() racing with period_timer stopping sched: Guarantee new group-entities always have weight sched: Fix hrtimer_cancel()/rq->lock deadlock sched: Fix cfs_bandwidth misuse of hrtimer_expires_remaining sched: Fix race on toggling cfs_bandwidth_used sched: Remove extra put_online_cpus() inside sched_setaffinity() sched/rt: Fix task_tick_rt() comment sched/wait: Fix build breakage sched/wait: Introduce prepare_to_wait_event() sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too sched: Remove get_online_cpus() usage sched: Fix race in migrate_swap_stop() ...
This commit is contained in:
commit
39cf275a1a
|
@ -355,6 +355,82 @@ utilize.
|
|||
|
||||
==============================================================
|
||||
|
||||
numa_balancing
|
||||
|
||||
Enables/disables automatic page fault based NUMA memory
|
||||
balancing. Memory is moved automatically to nodes
|
||||
that access it often.
|
||||
|
||||
Enables/disables automatic NUMA memory balancing. On NUMA machines, there
|
||||
is a performance penalty if remote memory is accessed by a CPU. When this
|
||||
feature is enabled the kernel samples what task thread is accessing memory
|
||||
by periodically unmapping pages and later trapping a page fault. At the
|
||||
time of the page fault, it is determined if the data being accessed should
|
||||
be migrated to a local memory node.
|
||||
|
||||
The unmapping of pages and trapping faults incur additional overhead that
|
||||
ideally is offset by improved memory locality but there is no universal
|
||||
guarantee. If the target workload is already bound to NUMA nodes then this
|
||||
feature should be disabled. Otherwise, if the system overhead from the
|
||||
feature is too high then the rate the kernel samples for NUMA hinting
|
||||
faults may be controlled by the numa_balancing_scan_period_min_ms,
|
||||
numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
|
||||
numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and
|
||||
numa_balancing_migrate_deferred.
|
||||
|
||||
==============================================================
|
||||
|
||||
numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
|
||||
numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
|
||||
|
||||
Automatic NUMA balancing scans tasks address space and unmaps pages to
|
||||
detect if pages are properly placed or if the data should be migrated to a
|
||||
memory node local to where the task is running. Every "scan delay" the task
|
||||
scans the next "scan size" number of pages in its address space. When the
|
||||
end of the address space is reached the scanner restarts from the beginning.
|
||||
|
||||
In combination, the "scan delay" and "scan size" determine the scan rate.
|
||||
When "scan delay" decreases, the scan rate increases. The scan delay and
|
||||
hence the scan rate of every task is adaptive and depends on historical
|
||||
behaviour. If pages are properly placed then the scan delay increases,
|
||||
otherwise the scan delay decreases. The "scan size" is not adaptive but
|
||||
the higher the "scan size", the higher the scan rate.
|
||||
|
||||
Higher scan rates incur higher system overhead as page faults must be
|
||||
trapped and potentially data must be migrated. However, the higher the scan
|
||||
rate, the more quickly a tasks memory is migrated to a local node if the
|
||||
workload pattern changes and minimises performance impact due to remote
|
||||
memory accesses. These sysctls control the thresholds for scan delays and
|
||||
the number of pages scanned.
|
||||
|
||||
numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
|
||||
scan a tasks virtual memory. It effectively controls the maximum scanning
|
||||
rate for each task.
|
||||
|
||||
numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
|
||||
when it initially forks.
|
||||
|
||||
numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
|
||||
scan a tasks virtual memory. It effectively controls the minimum scanning
|
||||
rate for each task.
|
||||
|
||||
numa_balancing_scan_size_mb is how many megabytes worth of pages are
|
||||
scanned for a given scan.
|
||||
|
||||
numa_balancing_settle_count is how many scan periods must complete before
|
||||
the schedule balancer stops pushing the task towards a preferred node. This
|
||||
gives the scheduler a chance to place the task on an alternative node if the
|
||||
preferred node is overloaded.
|
||||
|
||||
numa_balancing_migrate_deferred is how many page migrations get skipped
|
||||
unconditionally, after a page migration is skipped because a page is shared
|
||||
with other tasks. This reduces page migration overhead, and determines
|
||||
how much stronger the "move task near its memory" policy scheduler becomes,
|
||||
versus the "move memory near its task" memory management policy, for workloads
|
||||
with shared memory.
|
||||
|
||||
==============================================================
|
||||
|
||||
osrelease, ostype & version:
|
||||
|
||||
# cat osrelease
|
||||
|
|
|
@ -655,7 +655,11 @@ explains which is which.
|
|||
read the irq flags variable, an 'X' will always
|
||||
be printed here.
|
||||
|
||||
need-resched: 'N' task need_resched is set, '.' otherwise.
|
||||
need-resched:
|
||||
'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED is set,
|
||||
'n' only TIF_NEED_RESCHED is set,
|
||||
'p' only PREEMPT_NEED_RESCHED is set,
|
||||
'.' otherwise.
|
||||
|
||||
hardirq/softirq:
|
||||
'H' - hard irq occurred inside a softirq.
|
||||
|
|
|
@ -7326,6 +7326,8 @@ S: Maintained
|
|||
F: kernel/sched/
|
||||
F: include/linux/sched.h
|
||||
F: include/uapi/linux/sched.h
|
||||
F: kernel/wait.c
|
||||
F: include/linux/wait.h
|
||||
|
||||
SCORE ARCHITECTURE
|
||||
M: Chen Liqin <liqin.linux@gmail.com>
|
||||
|
|
|
@ -3,3 +3,4 @@ generic-y += clkdev.h
|
|||
|
||||
generic-y += exec.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -46,3 +46,4 @@ generic-y += ucontext.h
|
|||
generic-y += user.h
|
||||
generic-y += vga.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -32,3 +32,4 @@ generic-y += termios.h
|
|||
generic-y += timex.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += unaligned.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -50,3 +50,4 @@ generic-y += unaligned.h
|
|||
generic-y += user.h
|
||||
generic-y += vga.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -7,6 +7,7 @@ generic-y += div64.h
|
|||
generic-y += emergency-restart.h
|
||||
generic-y += exec.h
|
||||
generic-y += futex.h
|
||||
generic-y += preempt.h
|
||||
generic-y += irq_regs.h
|
||||
generic-y += param.h
|
||||
generic-y += local.h
|
||||
|
|
|
@ -44,3 +44,4 @@ generic-y += ucontext.h
|
|||
generic-y += unaligned.h
|
||||
generic-y += user.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -56,3 +56,4 @@ generic-y += ucontext.h
|
|||
generic-y += user.h
|
||||
generic-y += vga.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -11,3 +11,4 @@ generic-y += module.h
|
|||
generic-y += trace_clock.h
|
||||
generic-y += vga.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -2,3 +2,4 @@
|
|||
generic-y += clkdev.h
|
||||
generic-y += exec.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -6,3 +6,4 @@ generic-y += mmu.h
|
|||
generic-y += module.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -53,3 +53,4 @@ generic-y += types.h
|
|||
generic-y += ucontext.h
|
||||
generic-y += unaligned.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -3,4 +3,5 @@ generic-y += clkdev.h
|
|||
generic-y += exec.h
|
||||
generic-y += kvm_para.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
generic-y += vtime.h
|
|
@ -3,3 +3,4 @@ generic-y += clkdev.h
|
|||
generic-y += exec.h
|
||||
generic-y += module.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -31,3 +31,4 @@ generic-y += trace_clock.h
|
|||
generic-y += types.h
|
||||
generic-y += word-at-a-time.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -52,3 +52,4 @@ generic-y += unaligned.h
|
|||
generic-y += user.h
|
||||
generic-y += vga.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -26,6 +26,8 @@
|
|||
.last_balance = jiffies, \
|
||||
.balance_interval = 1, \
|
||||
.nr_balance_failed = 0, \
|
||||
.max_newidle_lb_cost = 0, \
|
||||
.next_decay_max_lb_cost = jiffies, \
|
||||
}
|
||||
|
||||
#define cpu_to_node(cpu) ((void)(cpu), 0)
|
||||
|
|
|
@ -3,3 +3,4 @@ generic-y += clkdev.h
|
|||
generic-y += exec.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += syscalls.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -11,5 +11,6 @@ generic-y += sections.h
|
|||
generic-y += segment.h
|
||||
generic-y += serial.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
generic-y += ucontext.h
|
||||
generic-y += xor.h
|
||||
|
|
|
@ -172,8 +172,9 @@ int rtlx_open(int index, int can_sleep)
|
|||
if (rtlx == NULL) {
|
||||
if( (p = vpe_get_shared(tclimit)) == NULL) {
|
||||
if (can_sleep) {
|
||||
__wait_event_interruptible(channel_wqs[index].lx_queue,
|
||||
(p = vpe_get_shared(tclimit)), ret);
|
||||
ret = __wait_event_interruptible(
|
||||
channel_wqs[index].lx_queue,
|
||||
(p = vpe_get_shared(tclimit)));
|
||||
if (ret)
|
||||
goto out_fail;
|
||||
} else {
|
||||
|
@ -263,11 +264,10 @@ unsigned int rtlx_read_poll(int index, int can_sleep)
|
|||
/* data available to read? */
|
||||
if (chan->lx_read == chan->lx_write) {
|
||||
if (can_sleep) {
|
||||
int ret = 0;
|
||||
|
||||
__wait_event_interruptible(channel_wqs[index].lx_queue,
|
||||
int ret = __wait_event_interruptible(
|
||||
channel_wqs[index].lx_queue,
|
||||
(chan->lx_read != chan->lx_write) ||
|
||||
sp_stopping, ret);
|
||||
sp_stopping);
|
||||
if (ret)
|
||||
return ret;
|
||||
|
||||
|
@ -440,14 +440,13 @@ static ssize_t file_write(struct file *file, const char __user * buffer,
|
|||
|
||||
/* any space left... */
|
||||
if (!rtlx_write_poll(minor)) {
|
||||
int ret = 0;
|
||||
int ret;
|
||||
|
||||
if (file->f_flags & O_NONBLOCK)
|
||||
return -EAGAIN;
|
||||
|
||||
__wait_event_interruptible(channel_wqs[minor].rt_queue,
|
||||
rtlx_write_poll(minor),
|
||||
ret);
|
||||
ret = __wait_event_interruptible(channel_wqs[minor].rt_queue,
|
||||
rtlx_write_poll(minor));
|
||||
if (ret)
|
||||
return ret;
|
||||
}
|
||||
|
|
|
@ -124,7 +124,7 @@ void *kmap_coherent(struct page *page, unsigned long addr)
|
|||
|
||||
BUG_ON(Page_dcache_dirty(page));
|
||||
|
||||
inc_preempt_count();
|
||||
pagefault_disable();
|
||||
idx = (addr >> PAGE_SHIFT) & (FIX_N_COLOURS - 1);
|
||||
#ifdef CONFIG_MIPS_MT_SMTC
|
||||
idx += FIX_N_COLOURS * smp_processor_id() +
|
||||
|
@ -193,8 +193,7 @@ void kunmap_coherent(void)
|
|||
write_c0_entryhi(old_ctx);
|
||||
EXIT_CRITICAL(flags);
|
||||
#endif
|
||||
dec_preempt_count();
|
||||
preempt_check_resched();
|
||||
pagefault_enable();
|
||||
}
|
||||
|
||||
void copy_user_highpage(struct page *to, struct page *from,
|
||||
|
|
|
@ -2,3 +2,4 @@
|
|||
generic-y += clkdev.h
|
||||
generic-y += exec.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -67,3 +67,4 @@ generic-y += ucontext.h
|
|||
generic-y += user.h
|
||||
generic-y += word-at-a-time.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -4,3 +4,4 @@ generic-y += word-at-a-time.h auxvec.h user.h cputime.h emergency-restart.h \
|
|||
div64.h irq_regs.h kdebug.h kvm_para.h local64.h local.h param.h \
|
||||
poll.h xor.h clkdev.h exec.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -2,4 +2,5 @@
|
|||
generic-y += clkdev.h
|
||||
generic-y += rwsem.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
generic-y += vtime.h
|
|
@ -2,3 +2,4 @@
|
|||
|
||||
generic-y += clkdev.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -4,3 +4,4 @@ header-y +=
|
|||
generic-y += clkdev.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -34,3 +34,4 @@ generic-y += termios.h
|
|||
generic-y += trace_clock.h
|
||||
generic-y += ucontext.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -16,3 +16,4 @@ generic-y += serial.h
|
|||
generic-y += trace_clock.h
|
||||
generic-y += types.h
|
||||
generic-y += word-at-a-time.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -38,3 +38,4 @@ generic-y += termios.h
|
|||
generic-y += trace_clock.h
|
||||
generic-y += types.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -3,3 +3,4 @@ generic-y += hw_irq.h irq_regs.h kdebug.h percpu.h sections.h topology.h xor.h
|
|||
generic-y += ftrace.h pci.h io.h param.h delay.h mutex.h current.h exec.h
|
||||
generic-y += switch_to.h clkdev.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -60,3 +60,4 @@ generic-y += unaligned.h
|
|||
generic-y += user.h
|
||||
generic-y += vga.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -6,6 +6,7 @@
|
|||
#include <asm/processor.h>
|
||||
#include <asm/alternative.h>
|
||||
#include <asm/cmpxchg.h>
|
||||
#include <asm/rmwcc.h>
|
||||
|
||||
/*
|
||||
* Atomic operations that C can't guarantee us. Useful for
|
||||
|
@ -76,12 +77,7 @@ static inline void atomic_sub(int i, atomic_t *v)
|
|||
*/
|
||||
static inline int atomic_sub_and_test(int i, atomic_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "subl %2,%0; sete %1"
|
||||
: "+m" (v->counter), "=qm" (c)
|
||||
: "ir" (i) : "memory");
|
||||
return c;
|
||||
GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, i, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -118,12 +114,7 @@ static inline void atomic_dec(atomic_t *v)
|
|||
*/
|
||||
static inline int atomic_dec_and_test(atomic_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "decl %0; sete %1"
|
||||
: "+m" (v->counter), "=qm" (c)
|
||||
: : "memory");
|
||||
return c != 0;
|
||||
GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -136,12 +127,7 @@ static inline int atomic_dec_and_test(atomic_t *v)
|
|||
*/
|
||||
static inline int atomic_inc_and_test(atomic_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "incl %0; sete %1"
|
||||
: "+m" (v->counter), "=qm" (c)
|
||||
: : "memory");
|
||||
return c != 0;
|
||||
GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -155,12 +141,7 @@ static inline int atomic_inc_and_test(atomic_t *v)
|
|||
*/
|
||||
static inline int atomic_add_negative(int i, atomic_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "addl %2,%0; sets %1"
|
||||
: "+m" (v->counter), "=qm" (c)
|
||||
: "ir" (i) : "memory");
|
||||
return c;
|
||||
GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, i, "%0", "s");
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
|
@ -72,12 +72,7 @@ static inline void atomic64_sub(long i, atomic64_t *v)
|
|||
*/
|
||||
static inline int atomic64_sub_and_test(long i, atomic64_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "subq %2,%0; sete %1"
|
||||
: "=m" (v->counter), "=qm" (c)
|
||||
: "er" (i), "m" (v->counter) : "memory");
|
||||
return c;
|
||||
GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, i, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -116,12 +111,7 @@ static inline void atomic64_dec(atomic64_t *v)
|
|||
*/
|
||||
static inline int atomic64_dec_and_test(atomic64_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "decq %0; sete %1"
|
||||
: "=m" (v->counter), "=qm" (c)
|
||||
: "m" (v->counter) : "memory");
|
||||
return c != 0;
|
||||
GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -134,12 +124,7 @@ static inline int atomic64_dec_and_test(atomic64_t *v)
|
|||
*/
|
||||
static inline int atomic64_inc_and_test(atomic64_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "incq %0; sete %1"
|
||||
: "=m" (v->counter), "=qm" (c)
|
||||
: "m" (v->counter) : "memory");
|
||||
return c != 0;
|
||||
GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -153,12 +138,7 @@ static inline int atomic64_inc_and_test(atomic64_t *v)
|
|||
*/
|
||||
static inline int atomic64_add_negative(long i, atomic64_t *v)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(LOCK_PREFIX "addq %2,%0; sets %1"
|
||||
: "=m" (v->counter), "=qm" (c)
|
||||
: "er" (i), "m" (v->counter) : "memory");
|
||||
return c;
|
||||
GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, i, "%0", "s");
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
|
@ -14,6 +14,7 @@
|
|||
|
||||
#include <linux/compiler.h>
|
||||
#include <asm/alternative.h>
|
||||
#include <asm/rmwcc.h>
|
||||
|
||||
#if BITS_PER_LONG == 32
|
||||
# define _BITOPS_LONG_SHIFT 5
|
||||
|
@ -204,12 +205,7 @@ static inline void change_bit(long nr, volatile unsigned long *addr)
|
|||
*/
|
||||
static inline int test_and_set_bit(long nr, volatile unsigned long *addr)
|
||||
{
|
||||
int oldbit;
|
||||
|
||||
asm volatile(LOCK_PREFIX "bts %2,%1\n\t"
|
||||
"sbb %0,%0" : "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
|
||||
|
||||
return oldbit;
|
||||
GEN_BINARY_RMWcc(LOCK_PREFIX "bts", *addr, nr, "%0", "c");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -255,13 +251,7 @@ static inline int __test_and_set_bit(long nr, volatile unsigned long *addr)
|
|||
*/
|
||||
static inline int test_and_clear_bit(long nr, volatile unsigned long *addr)
|
||||
{
|
||||
int oldbit;
|
||||
|
||||
asm volatile(LOCK_PREFIX "btr %2,%1\n\t"
|
||||
"sbb %0,%0"
|
||||
: "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
|
||||
|
||||
return oldbit;
|
||||
GEN_BINARY_RMWcc(LOCK_PREFIX "btr", *addr, nr, "%0", "c");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -314,13 +304,7 @@ static inline int __test_and_change_bit(long nr, volatile unsigned long *addr)
|
|||
*/
|
||||
static inline int test_and_change_bit(long nr, volatile unsigned long *addr)
|
||||
{
|
||||
int oldbit;
|
||||
|
||||
asm volatile(LOCK_PREFIX "btc %2,%1\n\t"
|
||||
"sbb %0,%0"
|
||||
: "=r" (oldbit), ADDR : "Ir" (nr) : "memory");
|
||||
|
||||
return oldbit;
|
||||
GEN_BINARY_RMWcc(LOCK_PREFIX "btc", *addr, nr, "%0", "c");
|
||||
}
|
||||
|
||||
static __always_inline int constant_test_bit(long nr, const volatile unsigned long *addr)
|
||||
|
|
|
@ -48,6 +48,8 @@ For 32-bit we have the following conventions - kernel is built with
|
|||
|
||||
#include <asm/dwarf2.h>
|
||||
|
||||
#ifdef CONFIG_X86_64
|
||||
|
||||
/*
|
||||
* 64-bit system call stack frame layout defines and helpers,
|
||||
* for assembly code:
|
||||
|
@ -192,3 +194,51 @@ For 32-bit we have the following conventions - kernel is built with
|
|||
.macro icebp
|
||||
.byte 0xf1
|
||||
.endm
|
||||
|
||||
#else /* CONFIG_X86_64 */
|
||||
|
||||
/*
|
||||
* For 32bit only simplified versions of SAVE_ALL/RESTORE_ALL. These
|
||||
* are different from the entry_32.S versions in not changing the segment
|
||||
* registers. So only suitable for in kernel use, not when transitioning
|
||||
* from or to user space. The resulting stack frame is not a standard
|
||||
* pt_regs frame. The main use case is calling C code from assembler
|
||||
* when all the registers need to be preserved.
|
||||
*/
|
||||
|
||||
.macro SAVE_ALL
|
||||
pushl_cfi %eax
|
||||
CFI_REL_OFFSET eax, 0
|
||||
pushl_cfi %ebp
|
||||
CFI_REL_OFFSET ebp, 0
|
||||
pushl_cfi %edi
|
||||
CFI_REL_OFFSET edi, 0
|
||||
pushl_cfi %esi
|
||||
CFI_REL_OFFSET esi, 0
|
||||
pushl_cfi %edx
|
||||
CFI_REL_OFFSET edx, 0
|
||||
pushl_cfi %ecx
|
||||
CFI_REL_OFFSET ecx, 0
|
||||
pushl_cfi %ebx
|
||||
CFI_REL_OFFSET ebx, 0
|
||||
.endm
|
||||
|
||||
.macro RESTORE_ALL
|
||||
popl_cfi %ebx
|
||||
CFI_RESTORE ebx
|
||||
popl_cfi %ecx
|
||||
CFI_RESTORE ecx
|
||||
popl_cfi %edx
|
||||
CFI_RESTORE edx
|
||||
popl_cfi %esi
|
||||
CFI_RESTORE esi
|
||||
popl_cfi %edi
|
||||
CFI_RESTORE edi
|
||||
popl_cfi %ebp
|
||||
CFI_RESTORE ebp
|
||||
popl_cfi %eax
|
||||
CFI_RESTORE eax
|
||||
.endm
|
||||
|
||||
#endif /* CONFIG_X86_64 */
|
||||
|
||||
|
|
|
@ -52,12 +52,7 @@ static inline void local_sub(long i, local_t *l)
|
|||
*/
|
||||
static inline int local_sub_and_test(long i, local_t *l)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(_ASM_SUB "%2,%0; sete %1"
|
||||
: "+m" (l->a.counter), "=qm" (c)
|
||||
: "ir" (i) : "memory");
|
||||
return c;
|
||||
GEN_BINARY_RMWcc(_ASM_SUB, l->a.counter, i, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -70,12 +65,7 @@ static inline int local_sub_and_test(long i, local_t *l)
|
|||
*/
|
||||
static inline int local_dec_and_test(local_t *l)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(_ASM_DEC "%0; sete %1"
|
||||
: "+m" (l->a.counter), "=qm" (c)
|
||||
: : "memory");
|
||||
return c != 0;
|
||||
GEN_UNARY_RMWcc(_ASM_DEC, l->a.counter, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -88,12 +78,7 @@ static inline int local_dec_and_test(local_t *l)
|
|||
*/
|
||||
static inline int local_inc_and_test(local_t *l)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(_ASM_INC "%0; sete %1"
|
||||
: "+m" (l->a.counter), "=qm" (c)
|
||||
: : "memory");
|
||||
return c != 0;
|
||||
GEN_UNARY_RMWcc(_ASM_INC, l->a.counter, "%0", "e");
|
||||
}
|
||||
|
||||
/**
|
||||
|
@ -107,12 +92,7 @@ static inline int local_inc_and_test(local_t *l)
|
|||
*/
|
||||
static inline int local_add_negative(long i, local_t *l)
|
||||
{
|
||||
unsigned char c;
|
||||
|
||||
asm volatile(_ASM_ADD "%2,%0; sets %1"
|
||||
: "+m" (l->a.counter), "=qm" (c)
|
||||
: "ir" (i) : "memory");
|
||||
return c;
|
||||
GEN_BINARY_RMWcc(_ASM_ADD, l->a.counter, i, "%0", "s");
|
||||
}
|
||||
|
||||
/**
|
||||
|
|
|
@ -0,0 +1,100 @@
|
|||
#ifndef __ASM_PREEMPT_H
|
||||
#define __ASM_PREEMPT_H
|
||||
|
||||
#include <asm/rmwcc.h>
|
||||
#include <asm/percpu.h>
|
||||
#include <linux/thread_info.h>
|
||||
|
||||
DECLARE_PER_CPU(int, __preempt_count);
|
||||
|
||||
/*
|
||||
* We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
|
||||
* that think a non-zero value indicates we cannot preempt.
|
||||
*/
|
||||
static __always_inline int preempt_count(void)
|
||||
{
|
||||
return __this_cpu_read_4(__preempt_count) & ~PREEMPT_NEED_RESCHED;
|
||||
}
|
||||
|
||||
static __always_inline void preempt_count_set(int pc)
|
||||
{
|
||||
__this_cpu_write_4(__preempt_count, pc);
|
||||
}
|
||||
|
||||
/*
|
||||
* must be macros to avoid header recursion hell
|
||||
*/
|
||||
#define task_preempt_count(p) \
|
||||
(task_thread_info(p)->saved_preempt_count & ~PREEMPT_NEED_RESCHED)
|
||||
|
||||
#define init_task_preempt_count(p) do { \
|
||||
task_thread_info(p)->saved_preempt_count = PREEMPT_DISABLED; \
|
||||
} while (0)
|
||||
|
||||
#define init_idle_preempt_count(p, cpu) do { \
|
||||
task_thread_info(p)->saved_preempt_count = PREEMPT_ENABLED; \
|
||||
per_cpu(__preempt_count, (cpu)) = PREEMPT_ENABLED; \
|
||||
} while (0)
|
||||
|
||||
/*
|
||||
* We fold the NEED_RESCHED bit into the preempt count such that
|
||||
* preempt_enable() can decrement and test for needing to reschedule with a
|
||||
* single instruction.
|
||||
*
|
||||
* We invert the actual bit, so that when the decrement hits 0 we know we both
|
||||
* need to resched (the bit is cleared) and can resched (no preempt count).
|
||||
*/
|
||||
|
||||
static __always_inline void set_preempt_need_resched(void)
|
||||
{
|
||||
__this_cpu_and_4(__preempt_count, ~PREEMPT_NEED_RESCHED);
|
||||
}
|
||||
|
||||
static __always_inline void clear_preempt_need_resched(void)
|
||||
{
|
||||
__this_cpu_or_4(__preempt_count, PREEMPT_NEED_RESCHED);
|
||||
}
|
||||
|
||||
static __always_inline bool test_preempt_need_resched(void)
|
||||
{
|
||||
return !(__this_cpu_read_4(__preempt_count) & PREEMPT_NEED_RESCHED);
|
||||
}
|
||||
|
||||
/*
|
||||
* The various preempt_count add/sub methods
|
||||
*/
|
||||
|
||||
static __always_inline void __preempt_count_add(int val)
|
||||
{
|
||||
__this_cpu_add_4(__preempt_count, val);
|
||||
}
|
||||
|
||||
static __always_inline void __preempt_count_sub(int val)
|
||||
{
|
||||
__this_cpu_add_4(__preempt_count, -val);
|
||||
}
|
||||
|
||||
static __always_inline bool __preempt_count_dec_and_test(void)
|
||||
{
|
||||
GEN_UNARY_RMWcc("decl", __preempt_count, __percpu_arg(0), "e");
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns true when we need to resched and can (barring IRQ state).
|
||||
*/
|
||||
static __always_inline bool should_resched(void)
|
||||
{
|
||||
return unlikely(!__this_cpu_read_4(__preempt_count));
|
||||
}
|
||||
|
||||
#ifdef CONFIG_PREEMPT
|
||||
extern asmlinkage void ___preempt_schedule(void);
|
||||
# define __preempt_schedule() asm ("call ___preempt_schedule")
|
||||
extern asmlinkage void preempt_schedule(void);
|
||||
# ifdef CONFIG_CONTEXT_TRACKING
|
||||
extern asmlinkage void ___preempt_schedule_context(void);
|
||||
# define __preempt_schedule_context() asm ("call ___preempt_schedule_context")
|
||||
# endif
|
||||
#endif
|
||||
|
||||
#endif /* __ASM_PREEMPT_H */
|
|
@ -0,0 +1,41 @@
|
|||
#ifndef _ASM_X86_RMWcc
|
||||
#define _ASM_X86_RMWcc
|
||||
|
||||
#ifdef CC_HAVE_ASM_GOTO
|
||||
|
||||
#define __GEN_RMWcc(fullop, var, cc, ...) \
|
||||
do { \
|
||||
asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \
|
||||
: : "m" (var), ## __VA_ARGS__ \
|
||||
: "memory" : cc_label); \
|
||||
return 0; \
|
||||
cc_label: \
|
||||
return 1; \
|
||||
} while (0)
|
||||
|
||||
#define GEN_UNARY_RMWcc(op, var, arg0, cc) \
|
||||
__GEN_RMWcc(op " " arg0, var, cc)
|
||||
|
||||
#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \
|
||||
__GEN_RMWcc(op " %1, " arg0, var, cc, "er" (val))
|
||||
|
||||
#else /* !CC_HAVE_ASM_GOTO */
|
||||
|
||||
#define __GEN_RMWcc(fullop, var, cc, ...) \
|
||||
do { \
|
||||
char c; \
|
||||
asm volatile (fullop "; set" cc " %1" \
|
||||
: "+m" (var), "=qm" (c) \
|
||||
: __VA_ARGS__ : "memory"); \
|
||||
return c != 0; \
|
||||
} while (0)
|
||||
|
||||
#define GEN_UNARY_RMWcc(op, var, arg0, cc) \
|
||||
__GEN_RMWcc(op " " arg0, var, cc)
|
||||
|
||||
#define GEN_BINARY_RMWcc(op, var, val, arg0, cc) \
|
||||
__GEN_RMWcc(op " %2, " arg0, var, cc, "er" (val))
|
||||
|
||||
#endif /* CC_HAVE_ASM_GOTO */
|
||||
|
||||
#endif /* _ASM_X86_RMWcc */
|
|
@ -28,8 +28,7 @@ struct thread_info {
|
|||
__u32 flags; /* low level flags */
|
||||
__u32 status; /* thread synchronous flags */
|
||||
__u32 cpu; /* current CPU */
|
||||
int preempt_count; /* 0 => preemptable,
|
||||
<0 => BUG */
|
||||
int saved_preempt_count;
|
||||
mm_segment_t addr_limit;
|
||||
struct restart_block restart_block;
|
||||
void __user *sysenter_return;
|
||||
|
@ -49,7 +48,7 @@ struct thread_info {
|
|||
.exec_domain = &default_exec_domain, \
|
||||
.flags = 0, \
|
||||
.cpu = 0, \
|
||||
.preempt_count = INIT_PREEMPT_COUNT, \
|
||||
.saved_preempt_count = INIT_PREEMPT_COUNT, \
|
||||
.addr_limit = KERNEL_DS, \
|
||||
.restart_block = { \
|
||||
.fn = do_no_restart_syscall, \
|
||||
|
|
|
@ -36,6 +36,8 @@ obj-y += tsc.o io_delay.o rtc.o
|
|||
obj-y += pci-iommu_table.o
|
||||
obj-y += resource.o
|
||||
|
||||
obj-$(CONFIG_PREEMPT) += preempt.o
|
||||
|
||||
obj-y += process.o
|
||||
obj-y += i387.o xsave.o
|
||||
obj-y += ptrace.o
|
||||
|
|
|
@ -32,7 +32,6 @@ void common(void) {
|
|||
OFFSET(TI_flags, thread_info, flags);
|
||||
OFFSET(TI_status, thread_info, status);
|
||||
OFFSET(TI_addr_limit, thread_info, addr_limit);
|
||||
OFFSET(TI_preempt_count, thread_info, preempt_count);
|
||||
|
||||
BLANK();
|
||||
OFFSET(crypto_tfm_ctx_offset, crypto_tfm, __crt_ctx);
|
||||
|
|
|
@ -1095,6 +1095,9 @@ DEFINE_PER_CPU(char *, irq_stack_ptr) =
|
|||
|
||||
DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1;
|
||||
|
||||
DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
|
||||
EXPORT_PER_CPU_SYMBOL(__preempt_count);
|
||||
|
||||
DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
|
||||
|
||||
/*
|
||||
|
@ -1169,6 +1172,8 @@ void debug_stack_reset(void)
|
|||
|
||||
DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task;
|
||||
EXPORT_PER_CPU_SYMBOL(current_task);
|
||||
DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT;
|
||||
EXPORT_PER_CPU_SYMBOL(__preempt_count);
|
||||
DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
|
||||
|
||||
#ifdef CONFIG_CC_STACKPROTECTOR
|
||||
|
|
|
@ -362,12 +362,9 @@ END(ret_from_exception)
|
|||
#ifdef CONFIG_PREEMPT
|
||||
ENTRY(resume_kernel)
|
||||
DISABLE_INTERRUPTS(CLBR_ANY)
|
||||
cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ?
|
||||
jnz restore_all
|
||||
need_resched:
|
||||
movl TI_flags(%ebp), %ecx # need_resched set ?
|
||||
testb $_TIF_NEED_RESCHED, %cl
|
||||
jz restore_all
|
||||
cmpl $0,PER_CPU_VAR(__preempt_count)
|
||||
jnz restore_all
|
||||
testl $X86_EFLAGS_IF,PT_EFLAGS(%esp) # interrupts off (exception path) ?
|
||||
jz restore_all
|
||||
call preempt_schedule_irq
|
||||
|
|
|
@ -1103,10 +1103,8 @@ retint_signal:
|
|||
/* Returning to kernel space. Check if we need preemption */
|
||||
/* rcx: threadinfo. interrupts off. */
|
||||
ENTRY(retint_kernel)
|
||||
cmpl $0,TI_preempt_count(%rcx)
|
||||
cmpl $0,PER_CPU_VAR(__preempt_count)
|
||||
jnz retint_restore_args
|
||||
bt $TIF_NEED_RESCHED,TI_flags(%rcx)
|
||||
jnc retint_restore_args
|
||||
bt $9,EFLAGS-ARGOFFSET(%rsp) /* interrupts off? */
|
||||
jnc retint_restore_args
|
||||
call preempt_schedule_irq
|
||||
|
|
|
@ -37,3 +37,10 @@ EXPORT_SYMBOL(strstr);
|
|||
|
||||
EXPORT_SYMBOL(csum_partial);
|
||||
EXPORT_SYMBOL(empty_zero_page);
|
||||
|
||||
#ifdef CONFIG_PREEMPT
|
||||
EXPORT_SYMBOL(___preempt_schedule);
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
EXPORT_SYMBOL(___preempt_schedule_context);
|
||||
#endif
|
||||
#endif
|
||||
|
|
|
@ -100,9 +100,6 @@ execute_on_irq_stack(int overflow, struct irq_desc *desc, int irq)
|
|||
irqctx->tinfo.task = curctx->tinfo.task;
|
||||
irqctx->tinfo.previous_esp = current_stack_pointer;
|
||||
|
||||
/* Copy the preempt_count so that the [soft]irq checks work. */
|
||||
irqctx->tinfo.preempt_count = curctx->tinfo.preempt_count;
|
||||
|
||||
if (unlikely(overflow))
|
||||
call_on_stack(print_stack_overflow, isp);
|
||||
|
||||
|
@ -131,7 +128,6 @@ void irq_ctx_init(int cpu)
|
|||
THREAD_SIZE_ORDER));
|
||||
memset(&irqctx->tinfo, 0, sizeof(struct thread_info));
|
||||
irqctx->tinfo.cpu = cpu;
|
||||
irqctx->tinfo.preempt_count = HARDIRQ_OFFSET;
|
||||
irqctx->tinfo.addr_limit = MAKE_MM_SEG(0);
|
||||
|
||||
per_cpu(hardirq_ctx, cpu) = irqctx;
|
||||
|
|
|
@ -0,0 +1,25 @@
|
|||
|
||||
#include <linux/linkage.h>
|
||||
#include <asm/dwarf2.h>
|
||||
#include <asm/asm.h>
|
||||
#include <asm/calling.h>
|
||||
|
||||
ENTRY(___preempt_schedule)
|
||||
CFI_STARTPROC
|
||||
SAVE_ALL
|
||||
call preempt_schedule
|
||||
RESTORE_ALL
|
||||
ret
|
||||
CFI_ENDPROC
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
|
||||
ENTRY(___preempt_schedule_context)
|
||||
CFI_STARTPROC
|
||||
SAVE_ALL
|
||||
call preempt_schedule_context
|
||||
RESTORE_ALL
|
||||
ret
|
||||
CFI_ENDPROC
|
||||
|
||||
#endif
|
|
@ -291,6 +291,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
|
|||
if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
|
||||
set_iopl_mask(next->iopl);
|
||||
|
||||
/*
|
||||
* If it were not for PREEMPT_ACTIVE we could guarantee that the
|
||||
* preempt_count of all tasks was equal here and this would not be
|
||||
* needed.
|
||||
*/
|
||||
task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
|
||||
this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
|
||||
|
||||
/*
|
||||
* Now maybe handle debug registers and/or IO bitmaps
|
||||
*/
|
||||
|
|
|
@ -363,6 +363,14 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
|
|||
this_cpu_write(old_rsp, next->usersp);
|
||||
this_cpu_write(current_task, next_p);
|
||||
|
||||
/*
|
||||
* If it were not for PREEMPT_ACTIVE we could guarantee that the
|
||||
* preempt_count of all tasks was equal here and this would not be
|
||||
* needed.
|
||||
*/
|
||||
task_thread_info(prev_p)->saved_preempt_count = this_cpu_read(__preempt_count);
|
||||
this_cpu_write(__preempt_count, task_thread_info(next_p)->saved_preempt_count);
|
||||
|
||||
this_cpu_write(kernel_stack,
|
||||
(unsigned long)task_stack_page(next_p) +
|
||||
THREAD_SIZE - KERNEL_STACK_OFFSET);
|
||||
|
|
|
@ -88,7 +88,7 @@ static inline void conditional_sti(struct pt_regs *regs)
|
|||
|
||||
static inline void preempt_conditional_sti(struct pt_regs *regs)
|
||||
{
|
||||
inc_preempt_count();
|
||||
preempt_count_inc();
|
||||
if (regs->flags & X86_EFLAGS_IF)
|
||||
local_irq_enable();
|
||||
}
|
||||
|
@ -103,7 +103,7 @@ static inline void preempt_conditional_cli(struct pt_regs *regs)
|
|||
{
|
||||
if (regs->flags & X86_EFLAGS_IF)
|
||||
local_irq_disable();
|
||||
dec_preempt_count();
|
||||
preempt_count_dec();
|
||||
}
|
||||
|
||||
static int __kprobes
|
||||
|
|
|
@ -66,3 +66,10 @@ EXPORT_SYMBOL(empty_zero_page);
|
|||
#ifndef CONFIG_PARAVIRT
|
||||
EXPORT_SYMBOL(native_load_gs_index);
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_PREEMPT
|
||||
EXPORT_SYMBOL(___preempt_schedule);
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
EXPORT_SYMBOL(___preempt_schedule_context);
|
||||
#endif
|
||||
#endif
|
||||
|
|
|
@ -28,3 +28,4 @@ generic-y += termios.h
|
|||
generic-y += topology.h
|
||||
generic-y += trace_clock.h
|
||||
generic-y += xor.h
|
||||
generic-y += preempt.h
|
||||
|
|
|
@ -119,17 +119,10 @@ static struct dmi_system_id processor_power_dmi_table[] = {
|
|||
*/
|
||||
static void acpi_safe_halt(void)
|
||||
{
|
||||
current_thread_info()->status &= ~TS_POLLING;
|
||||
/*
|
||||
* TS_POLLING-cleared state must be visible before we
|
||||
* test NEED_RESCHED:
|
||||
*/
|
||||
smp_mb();
|
||||
if (!need_resched()) {
|
||||
if (!tif_need_resched()) {
|
||||
safe_halt();
|
||||
local_irq_disable();
|
||||
}
|
||||
current_thread_info()->status |= TS_POLLING;
|
||||
}
|
||||
|
||||
#ifdef ARCH_APICTIMER_STOPS_ON_C3
|
||||
|
@ -737,6 +730,11 @@ static int acpi_idle_enter_c1(struct cpuidle_device *dev,
|
|||
if (unlikely(!pr))
|
||||
return -EINVAL;
|
||||
|
||||
if (cx->entry_method == ACPI_CSTATE_FFH) {
|
||||
if (current_set_polling_and_test())
|
||||
return -EINVAL;
|
||||
}
|
||||
|
||||
lapic_timer_state_broadcast(pr, cx, 1);
|
||||
acpi_idle_do_entry(cx);
|
||||
|
||||
|
@ -790,19 +788,10 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
|
|||
if (unlikely(!pr))
|
||||
return -EINVAL;
|
||||
|
||||
if (cx->entry_method != ACPI_CSTATE_FFH) {
|
||||
current_thread_info()->status &= ~TS_POLLING;
|
||||
/*
|
||||
* TS_POLLING-cleared state must be visible before we test
|
||||
* NEED_RESCHED:
|
||||
*/
|
||||
smp_mb();
|
||||
|
||||
if (unlikely(need_resched())) {
|
||||
current_thread_info()->status |= TS_POLLING;
|
||||
if (cx->entry_method == ACPI_CSTATE_FFH) {
|
||||
if (current_set_polling_and_test())
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
/*
|
||||
* Must be done before busmaster disable as we might need to
|
||||
|
@ -819,9 +808,6 @@ static int acpi_idle_enter_simple(struct cpuidle_device *dev,
|
|||
|
||||
sched_clock_idle_wakeup_event(0);
|
||||
|
||||
if (cx->entry_method != ACPI_CSTATE_FFH)
|
||||
current_thread_info()->status |= TS_POLLING;
|
||||
|
||||
lapic_timer_state_broadcast(pr, cx, 0);
|
||||
return index;
|
||||
}
|
||||
|
@ -858,19 +844,10 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
|
|||
}
|
||||
}
|
||||
|
||||
if (cx->entry_method != ACPI_CSTATE_FFH) {
|
||||
current_thread_info()->status &= ~TS_POLLING;
|
||||
/*
|
||||
* TS_POLLING-cleared state must be visible before we test
|
||||
* NEED_RESCHED:
|
||||
*/
|
||||
smp_mb();
|
||||
|
||||
if (unlikely(need_resched())) {
|
||||
current_thread_info()->status |= TS_POLLING;
|
||||
if (cx->entry_method == ACPI_CSTATE_FFH) {
|
||||
if (current_set_polling_and_test())
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
|
||||
acpi_unlazy_tlb(smp_processor_id());
|
||||
|
||||
|
@ -915,9 +892,6 @@ static int acpi_idle_enter_bm(struct cpuidle_device *dev,
|
|||
|
||||
sched_clock_idle_wakeup_event(0);
|
||||
|
||||
if (cx->entry_method != ACPI_CSTATE_FFH)
|
||||
current_thread_info()->status |= TS_POLLING;
|
||||
|
||||
lapic_timer_state_broadcast(pr, cx, 0);
|
||||
return index;
|
||||
}
|
||||
|
|
|
@ -359,7 +359,7 @@ static int intel_idle(struct cpuidle_device *dev,
|
|||
if (!(lapic_timer_reliable_states & (1 << (cstate))))
|
||||
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
|
||||
|
||||
if (!need_resched()) {
|
||||
if (!current_set_polling_and_test()) {
|
||||
|
||||
__monitor((void *)¤t_thread_info()->flags, 0, 0);
|
||||
smp_mb();
|
||||
|
|
|
@ -1547,6 +1547,7 @@ static int do_execve_common(const char *filename,
|
|||
current->fs->in_exec = 0;
|
||||
current->in_execve = 0;
|
||||
acct_update_integrals(current);
|
||||
task_numa_free(current);
|
||||
free_bprm(bprm);
|
||||
if (displaced)
|
||||
put_files_struct(displaced);
|
||||
|
|
|
@ -183,6 +183,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
|
|||
seq_printf(m,
|
||||
"State:\t%s\n"
|
||||
"Tgid:\t%d\n"
|
||||
"Ngid:\t%d\n"
|
||||
"Pid:\t%d\n"
|
||||
"PPid:\t%d\n"
|
||||
"TracerPid:\t%d\n"
|
||||
|
@ -190,6 +191,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns,
|
|||
"Gid:\t%d\t%d\t%d\t%d\n",
|
||||
get_task_state(p),
|
||||
task_tgid_nr_ns(p, ns),
|
||||
task_numa_group_id(p),
|
||||
pid_nr_ns(pid, ns),
|
||||
ppid, tpid,
|
||||
from_kuid_munged(user_ns, cred->uid),
|
||||
|
|
|
@ -0,0 +1,105 @@
|
|||
#ifndef __ASM_PREEMPT_H
|
||||
#define __ASM_PREEMPT_H
|
||||
|
||||
#include <linux/thread_info.h>
|
||||
|
||||
/*
|
||||
* We mask the PREEMPT_NEED_RESCHED bit so as not to confuse all current users
|
||||
* that think a non-zero value indicates we cannot preempt.
|
||||
*/
|
||||
static __always_inline int preempt_count(void)
|
||||
{
|
||||
return current_thread_info()->preempt_count & ~PREEMPT_NEED_RESCHED;
|
||||
}
|
||||
|
||||
static __always_inline int *preempt_count_ptr(void)
|
||||
{
|
||||
return ¤t_thread_info()->preempt_count;
|
||||
}
|
||||
|
||||
/*
|
||||
* We now loose PREEMPT_NEED_RESCHED and cause an extra reschedule; however the
|
||||
* alternative is loosing a reschedule. Better schedule too often -- also this
|
||||
* should be a very rare operation.
|
||||
*/
|
||||
static __always_inline void preempt_count_set(int pc)
|
||||
{
|
||||
*preempt_count_ptr() = pc;
|
||||
}
|
||||
|
||||
/*
|
||||
* must be macros to avoid header recursion hell
|
||||
*/
|
||||
#define task_preempt_count(p) \
|
||||
(task_thread_info(p)->preempt_count & ~PREEMPT_NEED_RESCHED)
|
||||
|
||||
#define init_task_preempt_count(p) do { \
|
||||
task_thread_info(p)->preempt_count = PREEMPT_DISABLED; \
|
||||
} while (0)
|
||||
|
||||
#define init_idle_preempt_count(p, cpu) do { \
|
||||
task_thread_info(p)->preempt_count = PREEMPT_ENABLED; \
|
||||
} while (0)
|
||||
|
||||
/*
|
||||
* We fold the NEED_RESCHED bit into the preempt count such that
|
||||
* preempt_enable() can decrement and test for needing to reschedule with a
|
||||
* single instruction.
|
||||
*
|
||||
* We invert the actual bit, so that when the decrement hits 0 we know we both
|
||||
* need to resched (the bit is cleared) and can resched (no preempt count).
|
||||
*/
|
||||
|
||||
static __always_inline void set_preempt_need_resched(void)
|
||||
{
|
||||
*preempt_count_ptr() &= ~PREEMPT_NEED_RESCHED;
|
||||
}
|
||||
|
||||
static __always_inline void clear_preempt_need_resched(void)
|
||||
{
|
||||
*preempt_count_ptr() |= PREEMPT_NEED_RESCHED;
|
||||
}
|
||||
|
||||
static __always_inline bool test_preempt_need_resched(void)
|
||||
{
|
||||
return !(*preempt_count_ptr() & PREEMPT_NEED_RESCHED);
|
||||
}
|
||||
|
||||
/*
|
||||
* The various preempt_count add/sub methods
|
||||
*/
|
||||
|
||||
static __always_inline void __preempt_count_add(int val)
|
||||
{
|
||||
*preempt_count_ptr() += val;
|
||||
}
|
||||
|
||||
static __always_inline void __preempt_count_sub(int val)
|
||||
{
|
||||
*preempt_count_ptr() -= val;
|
||||
}
|
||||
|
||||
static __always_inline bool __preempt_count_dec_and_test(void)
|
||||
{
|
||||
return !--*preempt_count_ptr();
|
||||
}
|
||||
|
||||
/*
|
||||
* Returns true when we need to resched and can (barring IRQ state).
|
||||
*/
|
||||
static __always_inline bool should_resched(void)
|
||||
{
|
||||
return unlikely(!*preempt_count_ptr());
|
||||
}
|
||||
|
||||
#ifdef CONFIG_PREEMPT
|
||||
extern asmlinkage void preempt_schedule(void);
|
||||
#define __preempt_schedule() preempt_schedule()
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
extern asmlinkage void preempt_schedule_context(void);
|
||||
#define __preempt_schedule_context() preempt_schedule_context()
|
||||
#endif
|
||||
#endif /* CONFIG_PREEMPT */
|
||||
|
||||
#endif /* __ASM_PREEMPT_H */
|
|
@ -5,7 +5,7 @@
|
|||
* (C) Copyright 2001 Linus Torvalds
|
||||
*
|
||||
* Atomic wait-for-completion handler data structures.
|
||||
* See kernel/sched/core.c for details.
|
||||
* See kernel/sched/completion.c for details.
|
||||
*/
|
||||
|
||||
#include <linux/wait.h>
|
||||
|
|
|
@ -33,7 +33,7 @@ extern void rcu_nmi_exit(void);
|
|||
#define __irq_enter() \
|
||||
do { \
|
||||
account_irq_enter_time(current); \
|
||||
add_preempt_count(HARDIRQ_OFFSET); \
|
||||
preempt_count_add(HARDIRQ_OFFSET); \
|
||||
trace_hardirq_enter(); \
|
||||
} while (0)
|
||||
|
||||
|
@ -49,7 +49,7 @@ extern void irq_enter(void);
|
|||
do { \
|
||||
trace_hardirq_exit(); \
|
||||
account_irq_exit_time(current); \
|
||||
sub_preempt_count(HARDIRQ_OFFSET); \
|
||||
preempt_count_sub(HARDIRQ_OFFSET); \
|
||||
} while (0)
|
||||
|
||||
/*
|
||||
|
@ -62,7 +62,7 @@ extern void irq_exit(void);
|
|||
lockdep_off(); \
|
||||
ftrace_nmi_enter(); \
|
||||
BUG_ON(in_nmi()); \
|
||||
add_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \
|
||||
preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET); \
|
||||
rcu_nmi_enter(); \
|
||||
trace_hardirq_enter(); \
|
||||
} while (0)
|
||||
|
@ -72,7 +72,7 @@ extern void irq_exit(void);
|
|||
trace_hardirq_exit(); \
|
||||
rcu_nmi_exit(); \
|
||||
BUG_ON(!in_nmi()); \
|
||||
sub_preempt_count(NMI_OFFSET + HARDIRQ_OFFSET); \
|
||||
preempt_count_sub(NMI_OFFSET + HARDIRQ_OFFSET); \
|
||||
ftrace_nmi_exit(); \
|
||||
lockdep_on(); \
|
||||
} while (0)
|
||||
|
|
|
@ -136,6 +136,7 @@ struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
|
|||
|
||||
struct mempolicy *get_vma_policy(struct task_struct *tsk,
|
||||
struct vm_area_struct *vma, unsigned long addr);
|
||||
bool vma_policy_mof(struct task_struct *task, struct vm_area_struct *vma);
|
||||
|
||||
extern void numa_default_policy(void);
|
||||
extern void numa_policy_init(void);
|
||||
|
|
|
@ -90,11 +90,12 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping,
|
|||
#endif /* CONFIG_MIGRATION */
|
||||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
extern int migrate_misplaced_page(struct page *page, int node);
|
||||
extern int migrate_misplaced_page(struct page *page, int node);
|
||||
extern int migrate_misplaced_page(struct page *page,
|
||||
struct vm_area_struct *vma, int node);
|
||||
extern bool migrate_ratelimited(int node);
|
||||
#else
|
||||
static inline int migrate_misplaced_page(struct page *page, int node)
|
||||
static inline int migrate_misplaced_page(struct page *page,
|
||||
struct vm_area_struct *vma, int node)
|
||||
{
|
||||
return -EAGAIN; /* can't migrate now */
|
||||
}
|
||||
|
|
|
@ -581,11 +581,11 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
|
|||
* sets it, so none of the operations on it need to be atomic.
|
||||
*/
|
||||
|
||||
/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_NID] | ... | FLAGS | */
|
||||
/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_CPUPID] | ... | FLAGS | */
|
||||
#define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH)
|
||||
#define NODES_PGOFF (SECTIONS_PGOFF - NODES_WIDTH)
|
||||
#define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH)
|
||||
#define LAST_NID_PGOFF (ZONES_PGOFF - LAST_NID_WIDTH)
|
||||
#define LAST_CPUPID_PGOFF (ZONES_PGOFF - LAST_CPUPID_WIDTH)
|
||||
|
||||
/*
|
||||
* Define the bit shifts to access each section. For non-existent
|
||||
|
@ -595,7 +595,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
|
|||
#define SECTIONS_PGSHIFT (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0))
|
||||
#define NODES_PGSHIFT (NODES_PGOFF * (NODES_WIDTH != 0))
|
||||
#define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0))
|
||||
#define LAST_NID_PGSHIFT (LAST_NID_PGOFF * (LAST_NID_WIDTH != 0))
|
||||
#define LAST_CPUPID_PGSHIFT (LAST_CPUPID_PGOFF * (LAST_CPUPID_WIDTH != 0))
|
||||
|
||||
/* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */
|
||||
#ifdef NODE_NOT_IN_PAGE_FLAGS
|
||||
|
@ -617,7 +617,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
|
|||
#define ZONES_MASK ((1UL << ZONES_WIDTH) - 1)
|
||||
#define NODES_MASK ((1UL << NODES_WIDTH) - 1)
|
||||
#define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1)
|
||||
#define LAST_NID_MASK ((1UL << LAST_NID_WIDTH) - 1)
|
||||
#define LAST_CPUPID_MASK ((1UL << LAST_CPUPID_WIDTH) - 1)
|
||||
#define ZONEID_MASK ((1UL << ZONEID_SHIFT) - 1)
|
||||
|
||||
static inline enum zone_type page_zonenum(const struct page *page)
|
||||
|
@ -661,51 +661,117 @@ static inline int page_to_nid(const struct page *page)
|
|||
#endif
|
||||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
#ifdef LAST_NID_NOT_IN_PAGE_FLAGS
|
||||
static inline int page_nid_xchg_last(struct page *page, int nid)
|
||||
static inline int cpu_pid_to_cpupid(int cpu, int pid)
|
||||
{
|
||||
return xchg(&page->_last_nid, nid);
|
||||
return ((cpu & LAST__CPU_MASK) << LAST__PID_SHIFT) | (pid & LAST__PID_MASK);
|
||||
}
|
||||
|
||||
static inline int page_nid_last(struct page *page)
|
||||
static inline int cpupid_to_pid(int cpupid)
|
||||
{
|
||||
return page->_last_nid;
|
||||
return cpupid & LAST__PID_MASK;
|
||||
}
|
||||
static inline void page_nid_reset_last(struct page *page)
|
||||
|
||||
static inline int cpupid_to_cpu(int cpupid)
|
||||
{
|
||||
page->_last_nid = -1;
|
||||
return (cpupid >> LAST__PID_SHIFT) & LAST__CPU_MASK;
|
||||
}
|
||||
|
||||
static inline int cpupid_to_nid(int cpupid)
|
||||
{
|
||||
return cpu_to_node(cpupid_to_cpu(cpupid));
|
||||
}
|
||||
|
||||
static inline bool cpupid_pid_unset(int cpupid)
|
||||
{
|
||||
return cpupid_to_pid(cpupid) == (-1 & LAST__PID_MASK);
|
||||
}
|
||||
|
||||
static inline bool cpupid_cpu_unset(int cpupid)
|
||||
{
|
||||
return cpupid_to_cpu(cpupid) == (-1 & LAST__CPU_MASK);
|
||||
}
|
||||
|
||||
static inline bool __cpupid_match_pid(pid_t task_pid, int cpupid)
|
||||
{
|
||||
return (task_pid & LAST__PID_MASK) == cpupid_to_pid(cpupid);
|
||||
}
|
||||
|
||||
#define cpupid_match_pid(task, cpupid) __cpupid_match_pid(task->pid, cpupid)
|
||||
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
|
||||
static inline int page_cpupid_xchg_last(struct page *page, int cpupid)
|
||||
{
|
||||
return xchg(&page->_last_cpupid, cpupid);
|
||||
}
|
||||
|
||||
static inline int page_cpupid_last(struct page *page)
|
||||
{
|
||||
return page->_last_cpupid;
|
||||
}
|
||||
static inline void page_cpupid_reset_last(struct page *page)
|
||||
{
|
||||
page->_last_cpupid = -1;
|
||||
}
|
||||
#else
|
||||
static inline int page_nid_last(struct page *page)
|
||||
static inline int page_cpupid_last(struct page *page)
|
||||
{
|
||||
return (page->flags >> LAST_NID_PGSHIFT) & LAST_NID_MASK;
|
||||
return (page->flags >> LAST_CPUPID_PGSHIFT) & LAST_CPUPID_MASK;
|
||||
}
|
||||
|
||||
extern int page_nid_xchg_last(struct page *page, int nid);
|
||||
extern int page_cpupid_xchg_last(struct page *page, int cpupid);
|
||||
|
||||
static inline void page_nid_reset_last(struct page *page)
|
||||
static inline void page_cpupid_reset_last(struct page *page)
|
||||
{
|
||||
int nid = (1 << LAST_NID_SHIFT) - 1;
|
||||
int cpupid = (1 << LAST_CPUPID_SHIFT) - 1;
|
||||
|
||||
page->flags &= ~(LAST_NID_MASK << LAST_NID_PGSHIFT);
|
||||
page->flags |= (nid & LAST_NID_MASK) << LAST_NID_PGSHIFT;
|
||||
page->flags &= ~(LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT);
|
||||
page->flags |= (cpupid & LAST_CPUPID_MASK) << LAST_CPUPID_PGSHIFT;
|
||||
}
|
||||
#endif /* LAST_NID_NOT_IN_PAGE_FLAGS */
|
||||
#else
|
||||
static inline int page_nid_xchg_last(struct page *page, int nid)
|
||||
#endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */
|
||||
#else /* !CONFIG_NUMA_BALANCING */
|
||||
static inline int page_cpupid_xchg_last(struct page *page, int cpupid)
|
||||
{
|
||||
return page_to_nid(page);
|
||||
return page_to_nid(page); /* XXX */
|
||||
}
|
||||
|
||||
static inline int page_nid_last(struct page *page)
|
||||
static inline int page_cpupid_last(struct page *page)
|
||||
{
|
||||
return page_to_nid(page);
|
||||
return page_to_nid(page); /* XXX */
|
||||
}
|
||||
|
||||
static inline void page_nid_reset_last(struct page *page)
|
||||
static inline int cpupid_to_nid(int cpupid)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
static inline int cpupid_to_pid(int cpupid)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
static inline int cpupid_to_cpu(int cpupid)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
static inline int cpu_pid_to_cpupid(int nid, int pid)
|
||||
{
|
||||
return -1;
|
||||
}
|
||||
|
||||
static inline bool cpupid_pid_unset(int cpupid)
|
||||
{
|
||||
return 1;
|
||||
}
|
||||
|
||||
static inline void page_cpupid_reset_last(struct page *page)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
static inline bool cpupid_match_pid(struct task_struct *task, int cpupid)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
#endif /* CONFIG_NUMA_BALANCING */
|
||||
|
||||
static inline struct zone *page_zone(const struct page *page)
|
||||
{
|
||||
|
|
|
@ -174,8 +174,8 @@ struct page {
|
|||
void *shadow;
|
||||
#endif
|
||||
|
||||
#ifdef LAST_NID_NOT_IN_PAGE_FLAGS
|
||||
int _last_nid;
|
||||
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
|
||||
int _last_cpupid;
|
||||
#endif
|
||||
}
|
||||
/*
|
||||
|
@ -420,28 +420,15 @@ struct mm_struct {
|
|||
*/
|
||||
unsigned long numa_next_scan;
|
||||
|
||||
/* numa_next_reset is when the PTE scanner period will be reset */
|
||||
unsigned long numa_next_reset;
|
||||
|
||||
/* Restart point for scanning and setting pte_numa */
|
||||
unsigned long numa_scan_offset;
|
||||
|
||||
/* numa_scan_seq prevents two threads setting pte_numa */
|
||||
int numa_scan_seq;
|
||||
|
||||
/*
|
||||
* The first node a task was scheduled on. If a task runs on
|
||||
* a different node than Make PTE Scan Go Now.
|
||||
*/
|
||||
int first_nid;
|
||||
#endif
|
||||
struct uprobes_state uprobes_state;
|
||||
};
|
||||
|
||||
/* first nid will either be a valid NID or one of these values */
|
||||
#define NUMA_PTE_SCAN_INIT -1
|
||||
#define NUMA_PTE_SCAN_ACTIVE -2
|
||||
|
||||
static inline void mm_init_cpumask(struct mm_struct *mm)
|
||||
{
|
||||
#ifdef CONFIG_CPUMASK_OFFSTACK
|
||||
|
|
|
@ -39,9 +39,9 @@
|
|||
* lookup is necessary.
|
||||
*
|
||||
* No sparsemem or sparsemem vmemmap: | NODE | ZONE | ... | FLAGS |
|
||||
* " plus space for last_nid: | NODE | ZONE | LAST_NID ... | FLAGS |
|
||||
* " plus space for last_cpupid: | NODE | ZONE | LAST_CPUPID ... | FLAGS |
|
||||
* classic sparse with space for node:| SECTION | NODE | ZONE | ... | FLAGS |
|
||||
* " plus space for last_nid: | SECTION | NODE | ZONE | LAST_NID ... | FLAGS |
|
||||
* " plus space for last_cpupid: | SECTION | NODE | ZONE | LAST_CPUPID ... | FLAGS |
|
||||
* classic sparse no space for node: | SECTION | ZONE | ... | FLAGS |
|
||||
*/
|
||||
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
|
||||
|
@ -62,15 +62,21 @@
|
|||
#endif
|
||||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
#define LAST_NID_SHIFT NODES_SHIFT
|
||||
#define LAST__PID_SHIFT 8
|
||||
#define LAST__PID_MASK ((1 << LAST__PID_SHIFT)-1)
|
||||
|
||||
#define LAST__CPU_SHIFT NR_CPUS_BITS
|
||||
#define LAST__CPU_MASK ((1 << LAST__CPU_SHIFT)-1)
|
||||
|
||||
#define LAST_CPUPID_SHIFT (LAST__PID_SHIFT+LAST__CPU_SHIFT)
|
||||
#else
|
||||
#define LAST_NID_SHIFT 0
|
||||
#define LAST_CPUPID_SHIFT 0
|
||||
#endif
|
||||
|
||||
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_NID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
|
||||
#define LAST_NID_WIDTH LAST_NID_SHIFT
|
||||
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
|
||||
#define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
|
||||
#else
|
||||
#define LAST_NID_WIDTH 0
|
||||
#define LAST_CPUPID_WIDTH 0
|
||||
#endif
|
||||
|
||||
/*
|
||||
|
@ -81,8 +87,8 @@
|
|||
#define NODE_NOT_IN_PAGE_FLAGS
|
||||
#endif
|
||||
|
||||
#if defined(CONFIG_NUMA_BALANCING) && LAST_NID_WIDTH == 0
|
||||
#define LAST_NID_NOT_IN_PAGE_FLAGS
|
||||
#if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
|
||||
#define LAST_CPUPID_NOT_IN_PAGE_FLAGS
|
||||
#endif
|
||||
|
||||
#endif /* _LINUX_PAGE_FLAGS_LAYOUT */
|
||||
|
|
|
@ -6,106 +6,95 @@
|
|||
* preempt_count (used for kernel preemption, interrupt count, etc.)
|
||||
*/
|
||||
|
||||
#include <linux/thread_info.h>
|
||||
#include <linux/linkage.h>
|
||||
#include <linux/list.h>
|
||||
|
||||
/*
|
||||
* We use the MSB mostly because its available; see <linux/preempt_mask.h> for
|
||||
* the other bits -- can't include that header due to inclusion hell.
|
||||
*/
|
||||
#define PREEMPT_NEED_RESCHED 0x80000000
|
||||
|
||||
#include <asm/preempt.h>
|
||||
|
||||
#if defined(CONFIG_DEBUG_PREEMPT) || defined(CONFIG_PREEMPT_TRACER)
|
||||
extern void add_preempt_count(int val);
|
||||
extern void sub_preempt_count(int val);
|
||||
extern void preempt_count_add(int val);
|
||||
extern void preempt_count_sub(int val);
|
||||
#define preempt_count_dec_and_test() ({ preempt_count_sub(1); should_resched(); })
|
||||
#else
|
||||
# define add_preempt_count(val) do { preempt_count() += (val); } while (0)
|
||||
# define sub_preempt_count(val) do { preempt_count() -= (val); } while (0)
|
||||
#define preempt_count_add(val) __preempt_count_add(val)
|
||||
#define preempt_count_sub(val) __preempt_count_sub(val)
|
||||
#define preempt_count_dec_and_test() __preempt_count_dec_and_test()
|
||||
#endif
|
||||
|
||||
#define inc_preempt_count() add_preempt_count(1)
|
||||
#define dec_preempt_count() sub_preempt_count(1)
|
||||
|
||||
#define preempt_count() (current_thread_info()->preempt_count)
|
||||
|
||||
#ifdef CONFIG_PREEMPT
|
||||
|
||||
asmlinkage void preempt_schedule(void);
|
||||
|
||||
#define preempt_check_resched() \
|
||||
do { \
|
||||
if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
|
||||
preempt_schedule(); \
|
||||
} while (0)
|
||||
|
||||
#ifdef CONFIG_CONTEXT_TRACKING
|
||||
|
||||
void preempt_schedule_context(void);
|
||||
|
||||
#define preempt_check_resched_context() \
|
||||
do { \
|
||||
if (unlikely(test_thread_flag(TIF_NEED_RESCHED))) \
|
||||
preempt_schedule_context(); \
|
||||
} while (0)
|
||||
#else
|
||||
|
||||
#define preempt_check_resched_context() preempt_check_resched()
|
||||
|
||||
#endif /* CONFIG_CONTEXT_TRACKING */
|
||||
|
||||
#else /* !CONFIG_PREEMPT */
|
||||
|
||||
#define preempt_check_resched() do { } while (0)
|
||||
#define preempt_check_resched_context() do { } while (0)
|
||||
|
||||
#endif /* CONFIG_PREEMPT */
|
||||
#define __preempt_count_inc() __preempt_count_add(1)
|
||||
#define __preempt_count_dec() __preempt_count_sub(1)
|
||||
|
||||
#define preempt_count_inc() preempt_count_add(1)
|
||||
#define preempt_count_dec() preempt_count_sub(1)
|
||||
|
||||
#ifdef CONFIG_PREEMPT_COUNT
|
||||
|
||||
#define preempt_disable() \
|
||||
do { \
|
||||
inc_preempt_count(); \
|
||||
preempt_count_inc(); \
|
||||
barrier(); \
|
||||
} while (0)
|
||||
|
||||
#define sched_preempt_enable_no_resched() \
|
||||
do { \
|
||||
barrier(); \
|
||||
dec_preempt_count(); \
|
||||
preempt_count_dec(); \
|
||||
} while (0)
|
||||
|
||||
#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
|
||||
|
||||
#ifdef CONFIG_PREEMPT
|
||||
#define preempt_enable() \
|
||||
do { \
|
||||
preempt_enable_no_resched(); \
|
||||
barrier(); \
|
||||
preempt_check_resched(); \
|
||||
if (unlikely(preempt_count_dec_and_test())) \
|
||||
__preempt_schedule(); \
|
||||
} while (0)
|
||||
|
||||
/* For debugging and tracer internals only! */
|
||||
#define add_preempt_count_notrace(val) \
|
||||
do { preempt_count() += (val); } while (0)
|
||||
#define sub_preempt_count_notrace(val) \
|
||||
do { preempt_count() -= (val); } while (0)
|
||||
#define inc_preempt_count_notrace() add_preempt_count_notrace(1)
|
||||
#define dec_preempt_count_notrace() sub_preempt_count_notrace(1)
|
||||
#define preempt_check_resched() \
|
||||
do { \
|
||||
if (should_resched()) \
|
||||
__preempt_schedule(); \
|
||||
} while (0)
|
||||
|
||||
#else
|
||||
#define preempt_enable() preempt_enable_no_resched()
|
||||
#define preempt_check_resched() do { } while (0)
|
||||
#endif
|
||||
|
||||
#define preempt_disable_notrace() \
|
||||
do { \
|
||||
inc_preempt_count_notrace(); \
|
||||
__preempt_count_inc(); \
|
||||
barrier(); \
|
||||
} while (0)
|
||||
|
||||
#define preempt_enable_no_resched_notrace() \
|
||||
do { \
|
||||
barrier(); \
|
||||
dec_preempt_count_notrace(); \
|
||||
__preempt_count_dec(); \
|
||||
} while (0)
|
||||
|
||||
/* preempt_check_resched is OK to trace */
|
||||
#ifdef CONFIG_PREEMPT
|
||||
|
||||
#ifndef CONFIG_CONTEXT_TRACKING
|
||||
#define __preempt_schedule_context() __preempt_schedule()
|
||||
#endif
|
||||
|
||||
#define preempt_enable_notrace() \
|
||||
do { \
|
||||
preempt_enable_no_resched_notrace(); \
|
||||
barrier(); \
|
||||
preempt_check_resched_context(); \
|
||||
if (unlikely(__preempt_count_dec_and_test())) \
|
||||
__preempt_schedule_context(); \
|
||||
} while (0)
|
||||
#else
|
||||
#define preempt_enable_notrace() preempt_enable_no_resched_notrace()
|
||||
#endif
|
||||
|
||||
#else /* !CONFIG_PREEMPT_COUNT */
|
||||
|
||||
|
@ -119,6 +108,7 @@ do { \
|
|||
#define sched_preempt_enable_no_resched() barrier()
|
||||
#define preempt_enable_no_resched() barrier()
|
||||
#define preempt_enable() barrier()
|
||||
#define preempt_check_resched() do { } while (0)
|
||||
|
||||
#define preempt_disable_notrace() barrier()
|
||||
#define preempt_enable_no_resched_notrace() barrier()
|
||||
|
|
|
@ -22,6 +22,7 @@ struct sched_param {
|
|||
#include <linux/errno.h>
|
||||
#include <linux/nodemask.h>
|
||||
#include <linux/mm_types.h>
|
||||
#include <linux/preempt.h>
|
||||
|
||||
#include <asm/page.h>
|
||||
#include <asm/ptrace.h>
|
||||
|
@ -427,6 +428,14 @@ struct task_cputime {
|
|||
.sum_exec_runtime = 0, \
|
||||
}
|
||||
|
||||
#define PREEMPT_ENABLED (PREEMPT_NEED_RESCHED)
|
||||
|
||||
#ifdef CONFIG_PREEMPT_COUNT
|
||||
#define PREEMPT_DISABLED (1 + PREEMPT_ENABLED)
|
||||
#else
|
||||
#define PREEMPT_DISABLED PREEMPT_ENABLED
|
||||
#endif
|
||||
|
||||
/*
|
||||
* Disable preemption until the scheduler is running.
|
||||
* Reset by start_kernel()->sched_init()->init_idle().
|
||||
|
@ -434,7 +443,7 @@ struct task_cputime {
|
|||
* We include PREEMPT_ACTIVE to avoid cond_resched() from working
|
||||
* before the scheduler is active -- see should_resched().
|
||||
*/
|
||||
#define INIT_PREEMPT_COUNT (1 + PREEMPT_ACTIVE)
|
||||
#define INIT_PREEMPT_COUNT (PREEMPT_DISABLED + PREEMPT_ACTIVE)
|
||||
|
||||
/**
|
||||
* struct thread_group_cputimer - thread group interval timer counts
|
||||
|
@ -768,6 +777,7 @@ enum cpu_idle_type {
|
|||
#define SD_ASYM_PACKING 0x0800 /* Place busy groups earlier in the domain */
|
||||
#define SD_PREFER_SIBLING 0x1000 /* Prefer to place tasks in a sibling domain */
|
||||
#define SD_OVERLAP 0x2000 /* sched_domains of this level overlap */
|
||||
#define SD_NUMA 0x4000 /* cross-node balancing */
|
||||
|
||||
extern int __weak arch_sd_sibiling_asym_packing(void);
|
||||
|
||||
|
@ -811,6 +821,10 @@ struct sched_domain {
|
|||
|
||||
u64 last_update;
|
||||
|
||||
/* idle_balance() stats */
|
||||
u64 max_newidle_lb_cost;
|
||||
unsigned long next_decay_max_lb_cost;
|
||||
|
||||
#ifdef CONFIG_SCHEDSTATS
|
||||
/* load_balance() stats */
|
||||
unsigned int lb_count[CPU_MAX_IDLE_TYPES];
|
||||
|
@ -1029,6 +1043,8 @@ struct task_struct {
|
|||
struct task_struct *last_wakee;
|
||||
unsigned long wakee_flips;
|
||||
unsigned long wakee_flip_decay_ts;
|
||||
|
||||
int wake_cpu;
|
||||
#endif
|
||||
int on_rq;
|
||||
|
||||
|
@ -1324,10 +1340,41 @@ struct task_struct {
|
|||
#endif
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
int numa_scan_seq;
|
||||
int numa_migrate_seq;
|
||||
unsigned int numa_scan_period;
|
||||
unsigned int numa_scan_period_max;
|
||||
int numa_preferred_nid;
|
||||
int numa_migrate_deferred;
|
||||
unsigned long numa_migrate_retry;
|
||||
u64 node_stamp; /* migration stamp */
|
||||
struct callback_head numa_work;
|
||||
|
||||
struct list_head numa_entry;
|
||||
struct numa_group *numa_group;
|
||||
|
||||
/*
|
||||
* Exponential decaying average of faults on a per-node basis.
|
||||
* Scheduling placement decisions are made based on the these counts.
|
||||
* The values remain static for the duration of a PTE scan
|
||||
*/
|
||||
unsigned long *numa_faults;
|
||||
unsigned long total_numa_faults;
|
||||
|
||||
/*
|
||||
* numa_faults_buffer records faults per node during the current
|
||||
* scan window. When the scan completes, the counts in numa_faults
|
||||
* decay and these values are copied.
|
||||
*/
|
||||
unsigned long *numa_faults_buffer;
|
||||
|
||||
/*
|
||||
* numa_faults_locality tracks if faults recorded during the last
|
||||
* scan window were remote/local. The task scan period is adapted
|
||||
* based on the locality of the faults with different weights
|
||||
* depending on whether they were shared or private faults
|
||||
*/
|
||||
unsigned long numa_faults_locality[2];
|
||||
|
||||
unsigned long numa_pages_migrated;
|
||||
#endif /* CONFIG_NUMA_BALANCING */
|
||||
|
||||
struct rcu_head rcu;
|
||||
|
@ -1412,16 +1459,33 @@ struct task_struct {
|
|||
/* Future-safe accessor for struct task_struct's cpus_allowed. */
|
||||
#define tsk_cpus_allowed(tsk) (&(tsk)->cpus_allowed)
|
||||
|
||||
#define TNF_MIGRATED 0x01
|
||||
#define TNF_NO_GROUP 0x02
|
||||
#define TNF_SHARED 0x04
|
||||
#define TNF_FAULT_LOCAL 0x08
|
||||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
extern void task_numa_fault(int node, int pages, bool migrated);
|
||||
extern void task_numa_fault(int last_node, int node, int pages, int flags);
|
||||
extern pid_t task_numa_group_id(struct task_struct *p);
|
||||
extern void set_numabalancing_state(bool enabled);
|
||||
extern void task_numa_free(struct task_struct *p);
|
||||
|
||||
extern unsigned int sysctl_numa_balancing_migrate_deferred;
|
||||
#else
|
||||
static inline void task_numa_fault(int node, int pages, bool migrated)
|
||||
static inline void task_numa_fault(int last_node, int node, int pages,
|
||||
int flags)
|
||||
{
|
||||
}
|
||||
static inline pid_t task_numa_group_id(struct task_struct *p)
|
||||
{
|
||||
return 0;
|
||||
}
|
||||
static inline void set_numabalancing_state(bool enabled)
|
||||
{
|
||||
}
|
||||
static inline void task_numa_free(struct task_struct *p)
|
||||
{
|
||||
}
|
||||
#endif
|
||||
|
||||
static inline struct pid *task_pid(struct task_struct *task)
|
||||
|
@ -1974,7 +2038,7 @@ extern void wake_up_new_task(struct task_struct *tsk);
|
|||
#else
|
||||
static inline void kick_process(struct task_struct *tsk) { }
|
||||
#endif
|
||||
extern void sched_fork(struct task_struct *p);
|
||||
extern void sched_fork(unsigned long clone_flags, struct task_struct *p);
|
||||
extern void sched_dead(struct task_struct *p);
|
||||
|
||||
extern void proc_caches_init(void);
|
||||
|
@ -2401,11 +2465,6 @@ static inline int signal_pending_state(long state, struct task_struct *p)
|
|||
return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
|
||||
}
|
||||
|
||||
static inline int need_resched(void)
|
||||
{
|
||||
return unlikely(test_thread_flag(TIF_NEED_RESCHED));
|
||||
}
|
||||
|
||||
/*
|
||||
* cond_resched() and cond_resched_lock(): latency reduction via
|
||||
* explicit rescheduling in places that are safe. The return
|
||||
|
@ -2474,36 +2533,105 @@ static inline int tsk_is_polling(struct task_struct *p)
|
|||
{
|
||||
return task_thread_info(p)->status & TS_POLLING;
|
||||
}
|
||||
static inline void current_set_polling(void)
|
||||
static inline void __current_set_polling(void)
|
||||
{
|
||||
current_thread_info()->status |= TS_POLLING;
|
||||
}
|
||||
|
||||
static inline void current_clr_polling(void)
|
||||
static inline bool __must_check current_set_polling_and_test(void)
|
||||
{
|
||||
__current_set_polling();
|
||||
|
||||
/*
|
||||
* Polling state must be visible before we test NEED_RESCHED,
|
||||
* paired by resched_task()
|
||||
*/
|
||||
smp_mb();
|
||||
|
||||
return unlikely(tif_need_resched());
|
||||
}
|
||||
|
||||
static inline void __current_clr_polling(void)
|
||||
{
|
||||
current_thread_info()->status &= ~TS_POLLING;
|
||||
smp_mb__after_clear_bit();
|
||||
}
|
||||
|
||||
static inline bool __must_check current_clr_polling_and_test(void)
|
||||
{
|
||||
__current_clr_polling();
|
||||
|
||||
/*
|
||||
* Polling state must be visible before we test NEED_RESCHED,
|
||||
* paired by resched_task()
|
||||
*/
|
||||
smp_mb();
|
||||
|
||||
return unlikely(tif_need_resched());
|
||||
}
|
||||
#elif defined(TIF_POLLING_NRFLAG)
|
||||
static inline int tsk_is_polling(struct task_struct *p)
|
||||
{
|
||||
return test_tsk_thread_flag(p, TIF_POLLING_NRFLAG);
|
||||
}
|
||||
static inline void current_set_polling(void)
|
||||
|
||||
static inline void __current_set_polling(void)
|
||||
{
|
||||
set_thread_flag(TIF_POLLING_NRFLAG);
|
||||
}
|
||||
|
||||
static inline void current_clr_polling(void)
|
||||
static inline bool __must_check current_set_polling_and_test(void)
|
||||
{
|
||||
__current_set_polling();
|
||||
|
||||
/*
|
||||
* Polling state must be visible before we test NEED_RESCHED,
|
||||
* paired by resched_task()
|
||||
*
|
||||
* XXX: assumes set/clear bit are identical barrier wise.
|
||||
*/
|
||||
smp_mb__after_clear_bit();
|
||||
|
||||
return unlikely(tif_need_resched());
|
||||
}
|
||||
|
||||
static inline void __current_clr_polling(void)
|
||||
{
|
||||
clear_thread_flag(TIF_POLLING_NRFLAG);
|
||||
}
|
||||
|
||||
static inline bool __must_check current_clr_polling_and_test(void)
|
||||
{
|
||||
__current_clr_polling();
|
||||
|
||||
/*
|
||||
* Polling state must be visible before we test NEED_RESCHED,
|
||||
* paired by resched_task()
|
||||
*/
|
||||
smp_mb__after_clear_bit();
|
||||
|
||||
return unlikely(tif_need_resched());
|
||||
}
|
||||
|
||||
#else
|
||||
static inline int tsk_is_polling(struct task_struct *p) { return 0; }
|
||||
static inline void current_set_polling(void) { }
|
||||
static inline void current_clr_polling(void) { }
|
||||
static inline void __current_set_polling(void) { }
|
||||
static inline void __current_clr_polling(void) { }
|
||||
|
||||
static inline bool __must_check current_set_polling_and_test(void)
|
||||
{
|
||||
return unlikely(tif_need_resched());
|
||||
}
|
||||
static inline bool __must_check current_clr_polling_and_test(void)
|
||||
{
|
||||
return unlikely(tif_need_resched());
|
||||
}
|
||||
#endif
|
||||
|
||||
static __always_inline bool need_resched(void)
|
||||
{
|
||||
return unlikely(tif_need_resched());
|
||||
}
|
||||
|
||||
/*
|
||||
* Thread group CPU time accounting.
|
||||
*/
|
||||
|
@ -2545,6 +2673,11 @@ static inline unsigned int task_cpu(const struct task_struct *p)
|
|||
return task_thread_info(p)->cpu;
|
||||
}
|
||||
|
||||
static inline int task_node(const struct task_struct *p)
|
||||
{
|
||||
return cpu_to_node(task_cpu(p));
|
||||
}
|
||||
|
||||
extern void set_task_cpu(struct task_struct *p, unsigned int cpu);
|
||||
|
||||
#else
|
||||
|
|
|
@ -47,7 +47,6 @@ extern enum sched_tunable_scaling sysctl_sched_tunable_scaling;
|
|||
extern unsigned int sysctl_numa_balancing_scan_delay;
|
||||
extern unsigned int sysctl_numa_balancing_scan_period_min;
|
||||
extern unsigned int sysctl_numa_balancing_scan_period_max;
|
||||
extern unsigned int sysctl_numa_balancing_scan_period_reset;
|
||||
extern unsigned int sysctl_numa_balancing_scan_size;
|
||||
extern unsigned int sysctl_numa_balancing_settle_count;
|
||||
|
||||
|
|
|
@ -28,6 +28,7 @@ struct cpu_stop_work {
|
|||
};
|
||||
|
||||
int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
|
||||
int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg);
|
||||
void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
|
||||
struct cpu_stop_work *work_buf);
|
||||
int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
|
||||
|
|
|
@ -104,8 +104,21 @@ static inline int test_ti_thread_flag(struct thread_info *ti, int flag)
|
|||
#define test_thread_flag(flag) \
|
||||
test_ti_thread_flag(current_thread_info(), flag)
|
||||
|
||||
#define set_need_resched() set_thread_flag(TIF_NEED_RESCHED)
|
||||
#define clear_need_resched() clear_thread_flag(TIF_NEED_RESCHED)
|
||||
static inline __deprecated void set_need_resched(void)
|
||||
{
|
||||
/*
|
||||
* Use of this function in deprecated.
|
||||
*
|
||||
* As of this writing there are only a few users in the DRM tree left
|
||||
* all of which are wrong and can be removed without causing too much
|
||||
* grief.
|
||||
*
|
||||
* The DRM people are aware and are working on removing the last few
|
||||
* instances.
|
||||
*/
|
||||
}
|
||||
|
||||
#define tif_need_resched() test_thread_flag(TIF_NEED_RESCHED)
|
||||
|
||||
#if defined TIF_RESTORE_SIGMASK && !defined HAVE_SET_RESTORE_SIGMASK
|
||||
/*
|
||||
|
|
|
@ -106,6 +106,8 @@ int arch_update_cpu_topology(void);
|
|||
.last_balance = jiffies, \
|
||||
.balance_interval = 1, \
|
||||
.smt_gain = 1178, /* 15% */ \
|
||||
.max_newidle_lb_cost = 0, \
|
||||
.next_decay_max_lb_cost = jiffies, \
|
||||
}
|
||||
#endif
|
||||
#endif /* CONFIG_SCHED_SMT */
|
||||
|
@ -135,6 +137,8 @@ int arch_update_cpu_topology(void);
|
|||
, \
|
||||
.last_balance = jiffies, \
|
||||
.balance_interval = 1, \
|
||||
.max_newidle_lb_cost = 0, \
|
||||
.next_decay_max_lb_cost = jiffies, \
|
||||
}
|
||||
#endif
|
||||
#endif /* CONFIG_SCHED_MC */
|
||||
|
@ -166,6 +170,8 @@ int arch_update_cpu_topology(void);
|
|||
, \
|
||||
.last_balance = jiffies, \
|
||||
.balance_interval = 1, \
|
||||
.max_newidle_lb_cost = 0, \
|
||||
.next_decay_max_lb_cost = jiffies, \
|
||||
}
|
||||
#endif
|
||||
|
||||
|
|
|
@ -671,31 +671,17 @@ static inline void tty_wait_until_sent_from_close(struct tty_struct *tty,
|
|||
#define wait_event_interruptible_tty(tty, wq, condition) \
|
||||
({ \
|
||||
int __ret = 0; \
|
||||
if (!(condition)) { \
|
||||
__wait_event_interruptible_tty(tty, wq, condition, __ret); \
|
||||
} \
|
||||
if (!(condition)) \
|
||||
__ret = __wait_event_interruptible_tty(tty, wq, \
|
||||
condition); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
#define __wait_event_interruptible_tty(tty, wq, condition, ret) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
if (!signal_pending(current)) { \
|
||||
#define __wait_event_interruptible_tty(tty, wq, condition) \
|
||||
___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \
|
||||
tty_unlock(tty); \
|
||||
schedule(); \
|
||||
tty_lock(tty); \
|
||||
continue; \
|
||||
} \
|
||||
ret = -ERESTARTSYS; \
|
||||
break; \
|
||||
} \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
tty_lock(tty))
|
||||
|
||||
#ifdef CONFIG_PROC_FS
|
||||
extern void proc_tty_register_driver(struct tty_driver *);
|
||||
|
|
|
@ -15,7 +15,7 @@
|
|||
*/
|
||||
static inline void pagefault_disable(void)
|
||||
{
|
||||
inc_preempt_count();
|
||||
preempt_count_inc();
|
||||
/*
|
||||
* make sure to have issued the store before a pagefault
|
||||
* can hit.
|
||||
|
@ -30,11 +30,7 @@ static inline void pagefault_enable(void)
|
|||
* the pagefault handler again.
|
||||
*/
|
||||
barrier();
|
||||
dec_preempt_count();
|
||||
/*
|
||||
* make sure we do..
|
||||
*/
|
||||
barrier();
|
||||
preempt_count_dec();
|
||||
preempt_check_resched();
|
||||
}
|
||||
|
||||
|
|
|
@ -1,7 +1,8 @@
|
|||
#ifndef _LINUX_WAIT_H
|
||||
#define _LINUX_WAIT_H
|
||||
|
||||
|
||||
/*
|
||||
* Linux wait queue related types and methods
|
||||
*/
|
||||
#include <linux/list.h>
|
||||
#include <linux/stddef.h>
|
||||
#include <linux/spinlock.h>
|
||||
|
@ -89,8 +90,8 @@ static inline void init_waitqueue_entry(wait_queue_t *q, struct task_struct *p)
|
|||
q->func = default_wake_function;
|
||||
}
|
||||
|
||||
static inline void init_waitqueue_func_entry(wait_queue_t *q,
|
||||
wait_queue_func_t func)
|
||||
static inline void
|
||||
init_waitqueue_func_entry(wait_queue_t *q, wait_queue_func_t func)
|
||||
{
|
||||
q->flags = 0;
|
||||
q->private = NULL;
|
||||
|
@ -114,8 +115,8 @@ static inline void __add_wait_queue(wait_queue_head_t *head, wait_queue_t *new)
|
|||
/*
|
||||
* Used for wake-one threads:
|
||||
*/
|
||||
static inline void __add_wait_queue_exclusive(wait_queue_head_t *q,
|
||||
wait_queue_t *wait)
|
||||
static inline void
|
||||
__add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
|
||||
{
|
||||
wait->flags |= WQ_FLAG_EXCLUSIVE;
|
||||
__add_wait_queue(q, wait);
|
||||
|
@ -127,23 +128,22 @@ static inline void __add_wait_queue_tail(wait_queue_head_t *head,
|
|||
list_add_tail(&new->task_list, &head->task_list);
|
||||
}
|
||||
|
||||
static inline void __add_wait_queue_tail_exclusive(wait_queue_head_t *q,
|
||||
wait_queue_t *wait)
|
||||
static inline void
|
||||
__add_wait_queue_tail_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
|
||||
{
|
||||
wait->flags |= WQ_FLAG_EXCLUSIVE;
|
||||
__add_wait_queue_tail(q, wait);
|
||||
}
|
||||
|
||||
static inline void __remove_wait_queue(wait_queue_head_t *head,
|
||||
wait_queue_t *old)
|
||||
static inline void
|
||||
__remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
|
||||
{
|
||||
list_del(&old->task_list);
|
||||
}
|
||||
|
||||
void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
|
||||
void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key);
|
||||
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr,
|
||||
void *key);
|
||||
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
|
||||
void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr);
|
||||
void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr);
|
||||
void __wake_up_bit(wait_queue_head_t *, void *, int);
|
||||
|
@ -179,18 +179,55 @@ wait_queue_head_t *bit_waitqueue(void *, int);
|
|||
#define wake_up_interruptible_sync_poll(x, m) \
|
||||
__wake_up_sync_key((x), TASK_INTERRUPTIBLE, 1, (void *) (m))
|
||||
|
||||
#define __wait_event(wq, condition) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
#define ___wait_cond_timeout(condition) \
|
||||
({ \
|
||||
bool __cond = (condition); \
|
||||
if (__cond && !__ret) \
|
||||
__ret = 1; \
|
||||
__cond || !__ret; \
|
||||
})
|
||||
|
||||
#define ___wait_is_interruptible(state) \
|
||||
(!__builtin_constant_p(state) || \
|
||||
state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \
|
||||
|
||||
#define ___wait_event(wq, condition, state, exclusive, ret, cmd) \
|
||||
({ \
|
||||
__label__ __out; \
|
||||
wait_queue_t __wait; \
|
||||
long __ret = ret; \
|
||||
\
|
||||
INIT_LIST_HEAD(&__wait.task_list); \
|
||||
if (exclusive) \
|
||||
__wait.flags = WQ_FLAG_EXCLUSIVE; \
|
||||
else \
|
||||
__wait.flags = 0; \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
|
||||
long __int = prepare_to_wait_event(&wq, &__wait, state);\
|
||||
\
|
||||
if (condition) \
|
||||
break; \
|
||||
schedule(); \
|
||||
\
|
||||
if (___wait_is_interruptible(state) && __int) { \
|
||||
__ret = __int; \
|
||||
if (exclusive) { \
|
||||
abort_exclusive_wait(&wq, &__wait, \
|
||||
state, NULL); \
|
||||
goto __out; \
|
||||
} \
|
||||
break; \
|
||||
} \
|
||||
\
|
||||
cmd; \
|
||||
} \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
__out: __ret; \
|
||||
})
|
||||
|
||||
#define __wait_event(wq, condition) \
|
||||
(void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
|
||||
schedule())
|
||||
|
||||
/**
|
||||
* wait_event - sleep until a condition gets true
|
||||
|
@ -211,22 +248,10 @@ do { \
|
|||
__wait_event(wq, condition); \
|
||||
} while (0)
|
||||
|
||||
#define __wait_event_timeout(wq, condition, ret) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
ret = schedule_timeout(ret); \
|
||||
if (!ret) \
|
||||
break; \
|
||||
} \
|
||||
if (!ret && (condition)) \
|
||||
ret = 1; \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
#define __wait_event_timeout(wq, condition, timeout) \
|
||||
___wait_event(wq, ___wait_cond_timeout(condition), \
|
||||
TASK_UNINTERRUPTIBLE, 0, timeout, \
|
||||
__ret = schedule_timeout(__ret))
|
||||
|
||||
/**
|
||||
* wait_event_timeout - sleep until a condition gets true or a timeout elapses
|
||||
|
@ -248,28 +273,14 @@ do { \
|
|||
#define wait_event_timeout(wq, condition, timeout) \
|
||||
({ \
|
||||
long __ret = timeout; \
|
||||
if (!(condition)) \
|
||||
__wait_event_timeout(wq, condition, __ret); \
|
||||
if (!___wait_cond_timeout(condition)) \
|
||||
__ret = __wait_event_timeout(wq, condition, timeout); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
#define __wait_event_interruptible(wq, condition, ret) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
if (!signal_pending(current)) { \
|
||||
schedule(); \
|
||||
continue; \
|
||||
} \
|
||||
ret = -ERESTARTSYS; \
|
||||
break; \
|
||||
} \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
#define __wait_event_interruptible(wq, condition) \
|
||||
___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \
|
||||
schedule())
|
||||
|
||||
/**
|
||||
* wait_event_interruptible - sleep until a condition gets true
|
||||
|
@ -290,31 +301,14 @@ do { \
|
|||
({ \
|
||||
int __ret = 0; \
|
||||
if (!(condition)) \
|
||||
__wait_event_interruptible(wq, condition, __ret); \
|
||||
__ret = __wait_event_interruptible(wq, condition); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
#define __wait_event_interruptible_timeout(wq, condition, ret) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
if (!signal_pending(current)) { \
|
||||
ret = schedule_timeout(ret); \
|
||||
if (!ret) \
|
||||
break; \
|
||||
continue; \
|
||||
} \
|
||||
ret = -ERESTARTSYS; \
|
||||
break; \
|
||||
} \
|
||||
if (!ret && (condition)) \
|
||||
ret = 1; \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
#define __wait_event_interruptible_timeout(wq, condition, timeout) \
|
||||
___wait_event(wq, ___wait_cond_timeout(condition), \
|
||||
TASK_INTERRUPTIBLE, 0, timeout, \
|
||||
__ret = schedule_timeout(__ret))
|
||||
|
||||
/**
|
||||
* wait_event_interruptible_timeout - sleep until a condition gets true or a timeout elapses
|
||||
|
@ -337,15 +331,15 @@ do { \
|
|||
#define wait_event_interruptible_timeout(wq, condition, timeout) \
|
||||
({ \
|
||||
long __ret = timeout; \
|
||||
if (!(condition)) \
|
||||
__wait_event_interruptible_timeout(wq, condition, __ret); \
|
||||
if (!___wait_cond_timeout(condition)) \
|
||||
__ret = __wait_event_interruptible_timeout(wq, \
|
||||
condition, timeout); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
#define __wait_event_hrtimeout(wq, condition, timeout, state) \
|
||||
({ \
|
||||
int __ret = 0; \
|
||||
DEFINE_WAIT(__wait); \
|
||||
struct hrtimer_sleeper __t; \
|
||||
\
|
||||
hrtimer_init_on_stack(&__t.timer, CLOCK_MONOTONIC, \
|
||||
|
@ -356,25 +350,15 @@ do { \
|
|||
current->timer_slack_ns, \
|
||||
HRTIMER_MODE_REL); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, state); \
|
||||
if (condition) \
|
||||
break; \
|
||||
if (state == TASK_INTERRUPTIBLE && \
|
||||
signal_pending(current)) { \
|
||||
__ret = -ERESTARTSYS; \
|
||||
break; \
|
||||
} \
|
||||
__ret = ___wait_event(wq, condition, state, 0, 0, \
|
||||
if (!__t.task) { \
|
||||
__ret = -ETIME; \
|
||||
break; \
|
||||
} \
|
||||
schedule(); \
|
||||
} \
|
||||
schedule()); \
|
||||
\
|
||||
hrtimer_cancel(&__t.timer); \
|
||||
destroy_hrtimer_on_stack(&__t.timer); \
|
||||
finish_wait(&wq, &__wait); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
|
@ -428,33 +412,15 @@ do { \
|
|||
__ret; \
|
||||
})
|
||||
|
||||
#define __wait_event_interruptible_exclusive(wq, condition, ret) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait_exclusive(&wq, &__wait, \
|
||||
TASK_INTERRUPTIBLE); \
|
||||
if (condition) { \
|
||||
finish_wait(&wq, &__wait); \
|
||||
break; \
|
||||
} \
|
||||
if (!signal_pending(current)) { \
|
||||
schedule(); \
|
||||
continue; \
|
||||
} \
|
||||
ret = -ERESTARTSYS; \
|
||||
abort_exclusive_wait(&wq, &__wait, \
|
||||
TASK_INTERRUPTIBLE, NULL); \
|
||||
break; \
|
||||
} \
|
||||
} while (0)
|
||||
#define __wait_event_interruptible_exclusive(wq, condition) \
|
||||
___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \
|
||||
schedule())
|
||||
|
||||
#define wait_event_interruptible_exclusive(wq, condition) \
|
||||
({ \
|
||||
int __ret = 0; \
|
||||
if (!(condition)) \
|
||||
__wait_event_interruptible_exclusive(wq, condition, __ret);\
|
||||
__ret = __wait_event_interruptible_exclusive(wq, condition);\
|
||||
__ret; \
|
||||
})
|
||||
|
||||
|
@ -606,24 +572,8 @@ do { \
|
|||
? 0 : __wait_event_interruptible_locked(wq, condition, 1, 1))
|
||||
|
||||
|
||||
|
||||
#define __wait_event_killable(wq, condition, ret) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_KILLABLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
if (!fatal_signal_pending(current)) { \
|
||||
schedule(); \
|
||||
continue; \
|
||||
} \
|
||||
ret = -ERESTARTSYS; \
|
||||
break; \
|
||||
} \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
#define __wait_event_killable(wq, condition) \
|
||||
___wait_event(wq, condition, TASK_KILLABLE, 0, 0, schedule())
|
||||
|
||||
/**
|
||||
* wait_event_killable - sleep until a condition gets true
|
||||
|
@ -644,26 +594,17 @@ do { \
|
|||
({ \
|
||||
int __ret = 0; \
|
||||
if (!(condition)) \
|
||||
__wait_event_killable(wq, condition, __ret); \
|
||||
__ret = __wait_event_killable(wq, condition); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
|
||||
#define __wait_event_lock_irq(wq, condition, lock, cmd) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
(void)___wait_event(wq, condition, TASK_UNINTERRUPTIBLE, 0, 0, \
|
||||
spin_unlock_irq(&lock); \
|
||||
cmd; \
|
||||
schedule(); \
|
||||
spin_lock_irq(&lock); \
|
||||
} \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
spin_lock_irq(&lock))
|
||||
|
||||
/**
|
||||
* wait_event_lock_irq_cmd - sleep until a condition gets true. The
|
||||
|
@ -723,26 +664,12 @@ do { \
|
|||
} while (0)
|
||||
|
||||
|
||||
#define __wait_event_interruptible_lock_irq(wq, condition, \
|
||||
lock, ret, cmd) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
if (signal_pending(current)) { \
|
||||
ret = -ERESTARTSYS; \
|
||||
break; \
|
||||
} \
|
||||
#define __wait_event_interruptible_lock_irq(wq, condition, lock, cmd) \
|
||||
___wait_event(wq, condition, TASK_INTERRUPTIBLE, 0, 0, \
|
||||
spin_unlock_irq(&lock); \
|
||||
cmd; \
|
||||
schedule(); \
|
||||
spin_lock_irq(&lock); \
|
||||
} \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
spin_lock_irq(&lock))
|
||||
|
||||
/**
|
||||
* wait_event_interruptible_lock_irq_cmd - sleep until a condition gets true.
|
||||
|
@ -772,10 +699,9 @@ do { \
|
|||
#define wait_event_interruptible_lock_irq_cmd(wq, condition, lock, cmd) \
|
||||
({ \
|
||||
int __ret = 0; \
|
||||
\
|
||||
if (!(condition)) \
|
||||
__wait_event_interruptible_lock_irq(wq, condition, \
|
||||
lock, __ret, cmd); \
|
||||
__ret = __wait_event_interruptible_lock_irq(wq, \
|
||||
condition, lock, cmd); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
|
@ -804,39 +730,24 @@ do { \
|
|||
#define wait_event_interruptible_lock_irq(wq, condition, lock) \
|
||||
({ \
|
||||
int __ret = 0; \
|
||||
\
|
||||
if (!(condition)) \
|
||||
__wait_event_interruptible_lock_irq(wq, condition, \
|
||||
lock, __ret, ); \
|
||||
__ret = __wait_event_interruptible_lock_irq(wq, \
|
||||
condition, lock,); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
#define __wait_event_interruptible_lock_irq_timeout(wq, condition, \
|
||||
lock, ret) \
|
||||
do { \
|
||||
DEFINE_WAIT(__wait); \
|
||||
\
|
||||
for (;;) { \
|
||||
prepare_to_wait(&wq, &__wait, TASK_INTERRUPTIBLE); \
|
||||
if (condition) \
|
||||
break; \
|
||||
if (signal_pending(current)) { \
|
||||
ret = -ERESTARTSYS; \
|
||||
break; \
|
||||
} \
|
||||
lock, timeout) \
|
||||
___wait_event(wq, ___wait_cond_timeout(condition), \
|
||||
TASK_INTERRUPTIBLE, 0, timeout, \
|
||||
spin_unlock_irq(&lock); \
|
||||
ret = schedule_timeout(ret); \
|
||||
spin_lock_irq(&lock); \
|
||||
if (!ret) \
|
||||
break; \
|
||||
} \
|
||||
finish_wait(&wq, &__wait); \
|
||||
} while (0)
|
||||
__ret = schedule_timeout(__ret); \
|
||||
spin_lock_irq(&lock));
|
||||
|
||||
/**
|
||||
* wait_event_interruptible_lock_irq_timeout - sleep until a condition gets true or a timeout elapses.
|
||||
* The condition is checked under the lock. This is expected
|
||||
* to be called with the lock taken.
|
||||
* wait_event_interruptible_lock_irq_timeout - sleep until a condition gets
|
||||
* true or a timeout elapses. The condition is checked under
|
||||
* the lock. This is expected to be called with the lock taken.
|
||||
* @wq: the waitqueue to wait on
|
||||
* @condition: a C expression for the event to wait for
|
||||
* @lock: a locked spinlock_t, which will be released before schedule()
|
||||
|
@ -860,11 +771,10 @@ do { \
|
|||
#define wait_event_interruptible_lock_irq_timeout(wq, condition, lock, \
|
||||
timeout) \
|
||||
({ \
|
||||
int __ret = timeout; \
|
||||
\
|
||||
if (!(condition)) \
|
||||
__wait_event_interruptible_lock_irq_timeout( \
|
||||
wq, condition, lock, __ret); \
|
||||
long __ret = timeout; \
|
||||
if (!___wait_cond_timeout(condition)) \
|
||||
__ret = __wait_event_interruptible_lock_irq_timeout( \
|
||||
wq, condition, lock, timeout); \
|
||||
__ret; \
|
||||
})
|
||||
|
||||
|
@ -875,20 +785,18 @@ do { \
|
|||
* We plan to remove these interfaces.
|
||||
*/
|
||||
extern void sleep_on(wait_queue_head_t *q);
|
||||
extern long sleep_on_timeout(wait_queue_head_t *q,
|
||||
signed long timeout);
|
||||
extern long sleep_on_timeout(wait_queue_head_t *q, signed long timeout);
|
||||
extern void interruptible_sleep_on(wait_queue_head_t *q);
|
||||
extern long interruptible_sleep_on_timeout(wait_queue_head_t *q,
|
||||
signed long timeout);
|
||||
extern long interruptible_sleep_on_timeout(wait_queue_head_t *q, signed long timeout);
|
||||
|
||||
/*
|
||||
* Waitqueues which are removed from the waitqueue_head at wakeup time
|
||||
*/
|
||||
void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state);
|
||||
void prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state);
|
||||
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state);
|
||||
void finish_wait(wait_queue_head_t *q, wait_queue_t *wait);
|
||||
void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
|
||||
unsigned int mode, void *key);
|
||||
void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, unsigned int mode, void *key);
|
||||
int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
|
||||
int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
|
||||
|
||||
|
@ -934,8 +842,8 @@ int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
|
|||
* One uses wait_on_bit() where one is waiting for the bit to clear,
|
||||
* but has no intention of setting it.
|
||||
*/
|
||||
static inline int wait_on_bit(void *word, int bit,
|
||||
int (*action)(void *), unsigned mode)
|
||||
static inline int
|
||||
wait_on_bit(void *word, int bit, int (*action)(void *), unsigned mode)
|
||||
{
|
||||
if (!test_bit(bit, word))
|
||||
return 0;
|
||||
|
@ -958,8 +866,8 @@ static inline int wait_on_bit(void *word, int bit,
|
|||
* One uses wait_on_bit_lock() where one is waiting for the bit to
|
||||
* clear with the intention of setting it, and when done, clearing it.
|
||||
*/
|
||||
static inline int wait_on_bit_lock(void *word, int bit,
|
||||
int (*action)(void *), unsigned mode)
|
||||
static inline int
|
||||
wait_on_bit_lock(void *word, int bit, int (*action)(void *), unsigned mode)
|
||||
{
|
||||
if (!test_and_set_bit(bit, word))
|
||||
return 0;
|
||||
|
@ -984,4 +892,4 @@ int wait_on_atomic_t(atomic_t *val, int (*action)(atomic_t *), unsigned mode)
|
|||
return out_of_line_wait_on_atomic_t(val, action, mode);
|
||||
}
|
||||
|
||||
#endif
|
||||
#endif /* _LINUX_WAIT_H */
|
||||
|
|
|
@ -100,7 +100,7 @@ static inline long __trace_sched_switch_state(struct task_struct *p)
|
|||
/*
|
||||
* For all intents and purposes a preempted task is a running task.
|
||||
*/
|
||||
if (task_thread_info(p)->preempt_count & PREEMPT_ACTIVE)
|
||||
if (task_preempt_count(p) & PREEMPT_ACTIVE)
|
||||
state = TASK_RUNNING | TASK_STATE_MAX;
|
||||
#endif
|
||||
|
||||
|
|
|
@ -693,7 +693,7 @@ int __init_or_module do_one_initcall(initcall_t fn)
|
|||
|
||||
if (preempt_count() != count) {
|
||||
sprintf(msgbuf, "preemption imbalance ");
|
||||
preempt_count() = count;
|
||||
preempt_count_set(count);
|
||||
}
|
||||
if (irqs_disabled()) {
|
||||
strlcat(msgbuf, "disabled interrupts ", sizeof(msgbuf));
|
||||
|
|
|
@ -7,7 +7,7 @@ obj-y = fork.o exec_domain.o panic.o \
|
|||
sysctl.o sysctl_binary.o capability.o ptrace.o timer.o user.o \
|
||||
signal.o sys.o kmod.o workqueue.o pid.o task_work.o \
|
||||
extable.o params.o posix-timers.o \
|
||||
kthread.o wait.o sys_ni.o posix-cpu-timers.o mutex.o \
|
||||
kthread.o sys_ni.o posix-cpu-timers.o mutex.o \
|
||||
hrtimer.o rwsem.o nsproxy.o semaphore.o \
|
||||
notifier.o ksysfs.o cred.o reboot.o \
|
||||
async.o range.o groups.o lglock.o smpboot.o
|
||||
|
|
|
@ -10,6 +10,7 @@
|
|||
#include <linux/mmzone.h>
|
||||
#include <linux/kbuild.h>
|
||||
#include <linux/page_cgroup.h>
|
||||
#include <linux/log2.h>
|
||||
|
||||
void foo(void)
|
||||
{
|
||||
|
@ -17,5 +18,8 @@ void foo(void)
|
|||
DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS);
|
||||
DEFINE(MAX_NR_ZONES, __MAX_NR_ZONES);
|
||||
DEFINE(NR_PCG_FLAGS, __NR_PCG_FLAGS);
|
||||
#ifdef CONFIG_SMP
|
||||
DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
|
||||
#endif
|
||||
/* End of constants */
|
||||
}
|
||||
|
|
|
@ -120,7 +120,7 @@ void context_tracking_user_enter(void)
|
|||
* instead of preempt_schedule() to exit user context if needed before
|
||||
* calling the scheduler.
|
||||
*/
|
||||
void __sched notrace preempt_schedule_context(void)
|
||||
asmlinkage void __sched notrace preempt_schedule_context(void)
|
||||
{
|
||||
enum ctx_state prev_ctx;
|
||||
|
||||
|
|
17
kernel/cpu.c
17
kernel/cpu.c
|
@ -308,6 +308,23 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen)
|
|||
}
|
||||
smpboot_park_threads(cpu);
|
||||
|
||||
/*
|
||||
* By now we've cleared cpu_active_mask, wait for all preempt-disabled
|
||||
* and RCU users of this state to go away such that all new such users
|
||||
* will observe it.
|
||||
*
|
||||
* For CONFIG_PREEMPT we have preemptible RCU and its sync_rcu() might
|
||||
* not imply sync_sched(), so explicitly call both.
|
||||
*/
|
||||
#ifdef CONFIG_PREEMPT
|
||||
synchronize_sched();
|
||||
#endif
|
||||
synchronize_rcu();
|
||||
|
||||
/*
|
||||
* So now all preempt/rcu users must observe !cpu_active().
|
||||
*/
|
||||
|
||||
err = __stop_machine(take_cpu_down, &tcd_param, cpumask_of(cpu));
|
||||
if (err) {
|
||||
/* CPU didn't die: tell everyone. Can't complain. */
|
||||
|
|
|
@ -44,7 +44,7 @@ static inline int cpu_idle_poll(void)
|
|||
rcu_idle_enter();
|
||||
trace_cpu_idle_rcuidle(0, smp_processor_id());
|
||||
local_irq_enable();
|
||||
while (!need_resched())
|
||||
while (!tif_need_resched())
|
||||
cpu_relax();
|
||||
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
|
||||
rcu_idle_exit();
|
||||
|
@ -92,8 +92,7 @@ static void cpu_idle_loop(void)
|
|||
if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
|
||||
cpu_idle_poll();
|
||||
} else {
|
||||
current_clr_polling();
|
||||
if (!need_resched()) {
|
||||
if (!current_clr_polling_and_test()) {
|
||||
stop_critical_timings();
|
||||
rcu_idle_enter();
|
||||
arch_cpu_idle();
|
||||
|
@ -103,9 +102,16 @@ static void cpu_idle_loop(void)
|
|||
} else {
|
||||
local_irq_enable();
|
||||
}
|
||||
current_set_polling();
|
||||
__current_set_polling();
|
||||
}
|
||||
arch_cpu_idle_exit();
|
||||
/*
|
||||
* We need to test and propagate the TIF_NEED_RESCHED
|
||||
* bit here because we might not have send the
|
||||
* reschedule IPI to idle tasks.
|
||||
*/
|
||||
if (tif_need_resched())
|
||||
set_preempt_need_resched();
|
||||
}
|
||||
tick_nohz_idle_exit();
|
||||
schedule_preempt_disabled();
|
||||
|
@ -129,7 +135,7 @@ void cpu_startup_entry(enum cpuhp_state state)
|
|||
*/
|
||||
boot_init_stack_canary();
|
||||
#endif
|
||||
current_set_polling();
|
||||
__current_set_polling();
|
||||
arch_cpu_idle_prepare();
|
||||
cpu_idle_loop();
|
||||
}
|
||||
|
|
|
@ -816,9 +816,6 @@ struct mm_struct *dup_mm(struct task_struct *tsk)
|
|||
|
||||
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
|
||||
mm->pmd_huge_pte = NULL;
|
||||
#endif
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
mm->first_nid = NUMA_PTE_SCAN_INIT;
|
||||
#endif
|
||||
if (!mm_init(mm, tsk))
|
||||
goto fail_nomem;
|
||||
|
@ -1313,7 +1310,7 @@ static struct task_struct *copy_process(unsigned long clone_flags,
|
|||
#endif
|
||||
|
||||
/* Perform scheduler related setup. Assign this task to a CPU. */
|
||||
sched_fork(p);
|
||||
sched_fork(clone_flags, p);
|
||||
|
||||
retval = perf_event_init_task(p);
|
||||
if (retval)
|
||||
|
|
|
@ -916,6 +916,12 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
|
|||
force_quiescent_state(rsp); /* Kick them all. */
|
||||
}
|
||||
|
||||
/*
|
||||
* This function really isn't for public consumption, but RCU is special in
|
||||
* that context switches can allow the state machine to make progress.
|
||||
*/
|
||||
extern void resched_cpu(int cpu);
|
||||
|
||||
static void print_cpu_stall(struct rcu_state *rsp)
|
||||
{
|
||||
int cpu;
|
||||
|
@ -945,7 +951,14 @@ static void print_cpu_stall(struct rcu_state *rsp)
|
|||
3 * rcu_jiffies_till_stall_check() + 3;
|
||||
raw_spin_unlock_irqrestore(&rnp->lock, flags);
|
||||
|
||||
set_need_resched(); /* kick ourselves to get things going. */
|
||||
/*
|
||||
* Attempt to revive the RCU machinery by forcing a context switch.
|
||||
*
|
||||
* A context switch would normally allow the RCU state machine to make
|
||||
* progress and it could be we're stuck in kernel space without context
|
||||
* switches for an entirely unreasonable amount of time.
|
||||
*/
|
||||
resched_cpu(smp_processor_id());
|
||||
}
|
||||
|
||||
static void check_cpu_stall(struct rcu_state *rsp, struct rcu_data *rdp)
|
||||
|
|
|
@ -12,6 +12,7 @@ CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
|
|||
endif
|
||||
|
||||
obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
|
||||
obj-y += wait.o completion.o
|
||||
obj-$(CONFIG_SMP) += cpupri.o
|
||||
obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
|
||||
obj-$(CONFIG_SCHEDSTATS) += stats.o
|
||||
|
|
|
@ -0,0 +1,299 @@
|
|||
/*
|
||||
* Generic wait-for-completion handler;
|
||||
*
|
||||
* It differs from semaphores in that their default case is the opposite,
|
||||
* wait_for_completion default blocks whereas semaphore default non-block. The
|
||||
* interface also makes it easy to 'complete' multiple waiting threads,
|
||||
* something which isn't entirely natural for semaphores.
|
||||
*
|
||||
* But more importantly, the primitive documents the usage. Semaphores would
|
||||
* typically be used for exclusion which gives rise to priority inversion.
|
||||
* Waiting for completion is a typically sync point, but not an exclusion point.
|
||||
*/
|
||||
|
||||
#include <linux/sched.h>
|
||||
#include <linux/completion.h>
|
||||
|
||||
/**
|
||||
* complete: - signals a single thread waiting on this completion
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This will wake up a single thread waiting on this completion. Threads will be
|
||||
* awakened in the same order in which they were queued.
|
||||
*
|
||||
* See also complete_all(), wait_for_completion() and related routines.
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void complete(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
x->done++;
|
||||
__wake_up_locked(&x->wait, TASK_NORMAL, 1);
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL(complete);
|
||||
|
||||
/**
|
||||
* complete_all: - signals all threads waiting on this completion
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This will wake up all threads waiting on this particular completion event.
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void complete_all(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
x->done += UINT_MAX/2;
|
||||
__wake_up_locked(&x->wait, TASK_NORMAL, 0);
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL(complete_all);
|
||||
|
||||
static inline long __sched
|
||||
do_wait_for_common(struct completion *x,
|
||||
long (*action)(long), long timeout, int state)
|
||||
{
|
||||
if (!x->done) {
|
||||
DECLARE_WAITQUEUE(wait, current);
|
||||
|
||||
__add_wait_queue_tail_exclusive(&x->wait, &wait);
|
||||
do {
|
||||
if (signal_pending_state(state, current)) {
|
||||
timeout = -ERESTARTSYS;
|
||||
break;
|
||||
}
|
||||
__set_current_state(state);
|
||||
spin_unlock_irq(&x->wait.lock);
|
||||
timeout = action(timeout);
|
||||
spin_lock_irq(&x->wait.lock);
|
||||
} while (!x->done && timeout);
|
||||
__remove_wait_queue(&x->wait, &wait);
|
||||
if (!x->done)
|
||||
return timeout;
|
||||
}
|
||||
x->done--;
|
||||
return timeout ?: 1;
|
||||
}
|
||||
|
||||
static inline long __sched
|
||||
__wait_for_common(struct completion *x,
|
||||
long (*action)(long), long timeout, int state)
|
||||
{
|
||||
might_sleep();
|
||||
|
||||
spin_lock_irq(&x->wait.lock);
|
||||
timeout = do_wait_for_common(x, action, timeout, state);
|
||||
spin_unlock_irq(&x->wait.lock);
|
||||
return timeout;
|
||||
}
|
||||
|
||||
static long __sched
|
||||
wait_for_common(struct completion *x, long timeout, int state)
|
||||
{
|
||||
return __wait_for_common(x, schedule_timeout, timeout, state);
|
||||
}
|
||||
|
||||
static long __sched
|
||||
wait_for_common_io(struct completion *x, long timeout, int state)
|
||||
{
|
||||
return __wait_for_common(x, io_schedule_timeout, timeout, state);
|
||||
}
|
||||
|
||||
/**
|
||||
* wait_for_completion: - waits for completion of a task
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits to be signaled for completion of a specific task. It is NOT
|
||||
* interruptible and there is no timeout.
|
||||
*
|
||||
* See also similar routines (i.e. wait_for_completion_timeout()) with timeout
|
||||
* and interrupt capability. Also see complete().
|
||||
*/
|
||||
void __sched wait_for_completion(struct completion *x)
|
||||
{
|
||||
wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion);
|
||||
|
||||
/**
|
||||
* wait_for_completion_timeout: - waits for completion of a task (w/timeout)
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be signaled or for a
|
||||
* specified timeout to expire. The timeout is in jiffies. It is not
|
||||
* interruptible.
|
||||
*
|
||||
* Return: 0 if timed out, and positive (at least 1, or number of jiffies left
|
||||
* till timeout) if completed.
|
||||
*/
|
||||
unsigned long __sched
|
||||
wait_for_completion_timeout(struct completion *x, unsigned long timeout)
|
||||
{
|
||||
return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_timeout);
|
||||
|
||||
/**
|
||||
* wait_for_completion_io: - waits for completion of a task
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits to be signaled for completion of a specific task. It is NOT
|
||||
* interruptible and there is no timeout. The caller is accounted as waiting
|
||||
* for IO.
|
||||
*/
|
||||
void __sched wait_for_completion_io(struct completion *x)
|
||||
{
|
||||
wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_io);
|
||||
|
||||
/**
|
||||
* wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be signaled or for a
|
||||
* specified timeout to expire. The timeout is in jiffies. It is not
|
||||
* interruptible. The caller is accounted as waiting for IO.
|
||||
*
|
||||
* Return: 0 if timed out, and positive (at least 1, or number of jiffies left
|
||||
* till timeout) if completed.
|
||||
*/
|
||||
unsigned long __sched
|
||||
wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
|
||||
{
|
||||
return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_io_timeout);
|
||||
|
||||
/**
|
||||
* wait_for_completion_interruptible: - waits for completion of a task (w/intr)
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits for completion of a specific task to be signaled. It is
|
||||
* interruptible.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if completed.
|
||||
*/
|
||||
int __sched wait_for_completion_interruptible(struct completion *x)
|
||||
{
|
||||
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
|
||||
if (t == -ERESTARTSYS)
|
||||
return t;
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_interruptible);
|
||||
|
||||
/**
|
||||
* wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be signaled or for a
|
||||
* specified timeout to expire. It is interruptible. The timeout is in jiffies.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if timed out, positive (at least 1,
|
||||
* or number of jiffies left till timeout) if completed.
|
||||
*/
|
||||
long __sched
|
||||
wait_for_completion_interruptible_timeout(struct completion *x,
|
||||
unsigned long timeout)
|
||||
{
|
||||
return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
|
||||
|
||||
/**
|
||||
* wait_for_completion_killable: - waits for completion of a task (killable)
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits to be signaled for completion of a specific task. It can be
|
||||
* interrupted by a kill signal.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if completed.
|
||||
*/
|
||||
int __sched wait_for_completion_killable(struct completion *x)
|
||||
{
|
||||
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
|
||||
if (t == -ERESTARTSYS)
|
||||
return t;
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_killable);
|
||||
|
||||
/**
|
||||
* wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be
|
||||
* signaled or for a specified timeout to expire. It can be
|
||||
* interrupted by a kill signal. The timeout is in jiffies.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if timed out, positive (at least 1,
|
||||
* or number of jiffies left till timeout) if completed.
|
||||
*/
|
||||
long __sched
|
||||
wait_for_completion_killable_timeout(struct completion *x,
|
||||
unsigned long timeout)
|
||||
{
|
||||
return wait_for_common(x, timeout, TASK_KILLABLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_killable_timeout);
|
||||
|
||||
/**
|
||||
* try_wait_for_completion - try to decrement a completion without blocking
|
||||
* @x: completion structure
|
||||
*
|
||||
* Return: 0 if a decrement cannot be done without blocking
|
||||
* 1 if a decrement succeeded.
|
||||
*
|
||||
* If a completion is being used as a counting completion,
|
||||
* attempt to decrement the counter without blocking. This
|
||||
* enables us to avoid waiting if the resource the completion
|
||||
* is protecting is not available.
|
||||
*/
|
||||
bool try_wait_for_completion(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
int ret = 1;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
if (!x->done)
|
||||
ret = 0;
|
||||
else
|
||||
x->done--;
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL(try_wait_for_completion);
|
||||
|
||||
/**
|
||||
* completion_done - Test to see if a completion has any waiters
|
||||
* @x: completion structure
|
||||
*
|
||||
* Return: 0 if there are waiters (wait_for_completion() in progress)
|
||||
* 1 if there are no waiters.
|
||||
*
|
||||
*/
|
||||
bool completion_done(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
int ret = 1;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
if (!x->done)
|
||||
ret = 0;
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL(completion_done);
|
|
@ -513,12 +513,11 @@ static inline void init_hrtick(void)
|
|||
* might also involve a cross-CPU call to trigger the scheduler on
|
||||
* the target CPU.
|
||||
*/
|
||||
#ifdef CONFIG_SMP
|
||||
void resched_task(struct task_struct *p)
|
||||
{
|
||||
int cpu;
|
||||
|
||||
assert_raw_spin_locked(&task_rq(p)->lock);
|
||||
lockdep_assert_held(&task_rq(p)->lock);
|
||||
|
||||
if (test_tsk_need_resched(p))
|
||||
return;
|
||||
|
@ -526,8 +525,10 @@ void resched_task(struct task_struct *p)
|
|||
set_tsk_need_resched(p);
|
||||
|
||||
cpu = task_cpu(p);
|
||||
if (cpu == smp_processor_id())
|
||||
if (cpu == smp_processor_id()) {
|
||||
set_preempt_need_resched();
|
||||
return;
|
||||
}
|
||||
|
||||
/* NEED_RESCHED must be visible before we test polling */
|
||||
smp_mb();
|
||||
|
@ -546,6 +547,7 @@ void resched_cpu(int cpu)
|
|||
raw_spin_unlock_irqrestore(&rq->lock, flags);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
#ifdef CONFIG_NO_HZ_COMMON
|
||||
/*
|
||||
* In the semi idle case, use the nearest busy cpu for migrating timers
|
||||
|
@ -693,12 +695,6 @@ void sched_avg_update(struct rq *rq)
|
|||
}
|
||||
}
|
||||
|
||||
#else /* !CONFIG_SMP */
|
||||
void resched_task(struct task_struct *p)
|
||||
{
|
||||
assert_raw_spin_locked(&task_rq(p)->lock);
|
||||
set_tsk_need_resched(p);
|
||||
}
|
||||
#endif /* CONFIG_SMP */
|
||||
|
||||
#if defined(CONFIG_RT_GROUP_SCHED) || (defined(CONFIG_FAIR_GROUP_SCHED) && \
|
||||
|
@ -767,14 +763,14 @@ static void set_load_weight(struct task_struct *p)
|
|||
static void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
{
|
||||
update_rq_clock(rq);
|
||||
sched_info_queued(p);
|
||||
sched_info_queued(rq, p);
|
||||
p->sched_class->enqueue_task(rq, p, flags);
|
||||
}
|
||||
|
||||
static void dequeue_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
{
|
||||
update_rq_clock(rq);
|
||||
sched_info_dequeued(p);
|
||||
sched_info_dequeued(rq, p);
|
||||
p->sched_class->dequeue_task(rq, p, flags);
|
||||
}
|
||||
|
||||
|
@ -987,7 +983,7 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
|
|||
* ttwu() will sort out the placement.
|
||||
*/
|
||||
WARN_ON_ONCE(p->state != TASK_RUNNING && p->state != TASK_WAKING &&
|
||||
!(task_thread_info(p)->preempt_count & PREEMPT_ACTIVE));
|
||||
!(task_preempt_count(p) & PREEMPT_ACTIVE));
|
||||
|
||||
#ifdef CONFIG_LOCKDEP
|
||||
/*
|
||||
|
@ -1017,6 +1013,107 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
|
|||
__set_task_cpu(p, new_cpu);
|
||||
}
|
||||
|
||||
static void __migrate_swap_task(struct task_struct *p, int cpu)
|
||||
{
|
||||
if (p->on_rq) {
|
||||
struct rq *src_rq, *dst_rq;
|
||||
|
||||
src_rq = task_rq(p);
|
||||
dst_rq = cpu_rq(cpu);
|
||||
|
||||
deactivate_task(src_rq, p, 0);
|
||||
set_task_cpu(p, cpu);
|
||||
activate_task(dst_rq, p, 0);
|
||||
check_preempt_curr(dst_rq, p, 0);
|
||||
} else {
|
||||
/*
|
||||
* Task isn't running anymore; make it appear like we migrated
|
||||
* it before it went to sleep. This means on wakeup we make the
|
||||
* previous cpu our targer instead of where it really is.
|
||||
*/
|
||||
p->wake_cpu = cpu;
|
||||
}
|
||||
}
|
||||
|
||||
struct migration_swap_arg {
|
||||
struct task_struct *src_task, *dst_task;
|
||||
int src_cpu, dst_cpu;
|
||||
};
|
||||
|
||||
static int migrate_swap_stop(void *data)
|
||||
{
|
||||
struct migration_swap_arg *arg = data;
|
||||
struct rq *src_rq, *dst_rq;
|
||||
int ret = -EAGAIN;
|
||||
|
||||
src_rq = cpu_rq(arg->src_cpu);
|
||||
dst_rq = cpu_rq(arg->dst_cpu);
|
||||
|
||||
double_raw_lock(&arg->src_task->pi_lock,
|
||||
&arg->dst_task->pi_lock);
|
||||
double_rq_lock(src_rq, dst_rq);
|
||||
if (task_cpu(arg->dst_task) != arg->dst_cpu)
|
||||
goto unlock;
|
||||
|
||||
if (task_cpu(arg->src_task) != arg->src_cpu)
|
||||
goto unlock;
|
||||
|
||||
if (!cpumask_test_cpu(arg->dst_cpu, tsk_cpus_allowed(arg->src_task)))
|
||||
goto unlock;
|
||||
|
||||
if (!cpumask_test_cpu(arg->src_cpu, tsk_cpus_allowed(arg->dst_task)))
|
||||
goto unlock;
|
||||
|
||||
__migrate_swap_task(arg->src_task, arg->dst_cpu);
|
||||
__migrate_swap_task(arg->dst_task, arg->src_cpu);
|
||||
|
||||
ret = 0;
|
||||
|
||||
unlock:
|
||||
double_rq_unlock(src_rq, dst_rq);
|
||||
raw_spin_unlock(&arg->dst_task->pi_lock);
|
||||
raw_spin_unlock(&arg->src_task->pi_lock);
|
||||
|
||||
return ret;
|
||||
}
|
||||
|
||||
/*
|
||||
* Cross migrate two tasks
|
||||
*/
|
||||
int migrate_swap(struct task_struct *cur, struct task_struct *p)
|
||||
{
|
||||
struct migration_swap_arg arg;
|
||||
int ret = -EINVAL;
|
||||
|
||||
arg = (struct migration_swap_arg){
|
||||
.src_task = cur,
|
||||
.src_cpu = task_cpu(cur),
|
||||
.dst_task = p,
|
||||
.dst_cpu = task_cpu(p),
|
||||
};
|
||||
|
||||
if (arg.src_cpu == arg.dst_cpu)
|
||||
goto out;
|
||||
|
||||
/*
|
||||
* These three tests are all lockless; this is OK since all of them
|
||||
* will be re-checked with proper locks held further down the line.
|
||||
*/
|
||||
if (!cpu_active(arg.src_cpu) || !cpu_active(arg.dst_cpu))
|
||||
goto out;
|
||||
|
||||
if (!cpumask_test_cpu(arg.dst_cpu, tsk_cpus_allowed(arg.src_task)))
|
||||
goto out;
|
||||
|
||||
if (!cpumask_test_cpu(arg.src_cpu, tsk_cpus_allowed(arg.dst_task)))
|
||||
goto out;
|
||||
|
||||
ret = stop_two_cpus(arg.dst_cpu, arg.src_cpu, migrate_swap_stop, &arg);
|
||||
|
||||
out:
|
||||
return ret;
|
||||
}
|
||||
|
||||
struct migration_arg {
|
||||
struct task_struct *task;
|
||||
int dest_cpu;
|
||||
|
@ -1236,9 +1333,9 @@ out:
|
|||
* The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
|
||||
*/
|
||||
static inline
|
||||
int select_task_rq(struct task_struct *p, int sd_flags, int wake_flags)
|
||||
int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
|
||||
{
|
||||
int cpu = p->sched_class->select_task_rq(p, sd_flags, wake_flags);
|
||||
cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);
|
||||
|
||||
/*
|
||||
* In order not to call set_task_cpu() on a blocking task we need
|
||||
|
@ -1330,12 +1427,13 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
|
|||
|
||||
if (rq->idle_stamp) {
|
||||
u64 delta = rq_clock(rq) - rq->idle_stamp;
|
||||
u64 max = 2*sysctl_sched_migration_cost;
|
||||
u64 max = 2*rq->max_idle_balance_cost;
|
||||
|
||||
if (delta > max)
|
||||
rq->avg_idle = max;
|
||||
else
|
||||
update_avg(&rq->avg_idle, delta);
|
||||
|
||||
if (rq->avg_idle > max)
|
||||
rq->avg_idle = max;
|
||||
|
||||
rq->idle_stamp = 0;
|
||||
}
|
||||
#endif
|
||||
|
@ -1396,6 +1494,14 @@ static void sched_ttwu_pending(void)
|
|||
|
||||
void scheduler_ipi(void)
|
||||
{
|
||||
/*
|
||||
* Fold TIF_NEED_RESCHED into the preempt_count; anybody setting
|
||||
* TIF_NEED_RESCHED remotely (for the first time) will also send
|
||||
* this IPI.
|
||||
*/
|
||||
if (tif_need_resched())
|
||||
set_preempt_need_resched();
|
||||
|
||||
if (llist_empty(&this_rq()->wake_list)
|
||||
&& !tick_nohz_full_cpu(smp_processor_id())
|
||||
&& !got_nohz_idle_kick())
|
||||
|
@ -1513,7 +1619,7 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
|
|||
if (p->sched_class->task_waking)
|
||||
p->sched_class->task_waking(p);
|
||||
|
||||
cpu = select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
|
||||
cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
|
||||
if (task_cpu(p) != cpu) {
|
||||
wake_flags |= WF_MIGRATED;
|
||||
set_task_cpu(p, cpu);
|
||||
|
@ -1595,7 +1701,7 @@ int wake_up_state(struct task_struct *p, unsigned int state)
|
|||
*
|
||||
* __sched_fork() is basic setup used by init_idle() too:
|
||||
*/
|
||||
static void __sched_fork(struct task_struct *p)
|
||||
static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
|
||||
{
|
||||
p->on_rq = 0;
|
||||
|
||||
|
@ -1619,16 +1725,24 @@ static void __sched_fork(struct task_struct *p)
|
|||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
if (p->mm && atomic_read(&p->mm->mm_users) == 1) {
|
||||
p->mm->numa_next_scan = jiffies;
|
||||
p->mm->numa_next_reset = jiffies;
|
||||
p->mm->numa_next_scan = jiffies + msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
|
||||
p->mm->numa_scan_seq = 0;
|
||||
}
|
||||
|
||||
if (clone_flags & CLONE_VM)
|
||||
p->numa_preferred_nid = current->numa_preferred_nid;
|
||||
else
|
||||
p->numa_preferred_nid = -1;
|
||||
|
||||
p->node_stamp = 0ULL;
|
||||
p->numa_scan_seq = p->mm ? p->mm->numa_scan_seq : 0;
|
||||
p->numa_migrate_seq = p->mm ? p->mm->numa_scan_seq - 1 : 0;
|
||||
p->numa_scan_period = sysctl_numa_balancing_scan_delay;
|
||||
p->numa_work.next = &p->numa_work;
|
||||
p->numa_faults = NULL;
|
||||
p->numa_faults_buffer = NULL;
|
||||
|
||||
INIT_LIST_HEAD(&p->numa_entry);
|
||||
p->numa_group = NULL;
|
||||
#endif /* CONFIG_NUMA_BALANCING */
|
||||
}
|
||||
|
||||
|
@ -1654,12 +1768,12 @@ void set_numabalancing_state(bool enabled)
|
|||
/*
|
||||
* fork()/clone()-time setup:
|
||||
*/
|
||||
void sched_fork(struct task_struct *p)
|
||||
void sched_fork(unsigned long clone_flags, struct task_struct *p)
|
||||
{
|
||||
unsigned long flags;
|
||||
int cpu = get_cpu();
|
||||
|
||||
__sched_fork(p);
|
||||
__sched_fork(clone_flags, p);
|
||||
/*
|
||||
* We mark the process as running here. This guarantees that
|
||||
* nobody will actually run it, and a signal or other external
|
||||
|
@ -1717,10 +1831,7 @@ void sched_fork(struct task_struct *p)
|
|||
#if defined(CONFIG_SMP)
|
||||
p->on_cpu = 0;
|
||||
#endif
|
||||
#ifdef CONFIG_PREEMPT_COUNT
|
||||
/* Want to start with kernel preemption disabled. */
|
||||
task_thread_info(p)->preempt_count = 1;
|
||||
#endif
|
||||
init_task_preempt_count(p);
|
||||
#ifdef CONFIG_SMP
|
||||
plist_node_init(&p->pushable_tasks, MAX_PRIO);
|
||||
#endif
|
||||
|
@ -1747,7 +1858,7 @@ void wake_up_new_task(struct task_struct *p)
|
|||
* - cpus_allowed can change in the fork path
|
||||
* - any previously selected cpu might disappear through hotplug
|
||||
*/
|
||||
set_task_cpu(p, select_task_rq(p, SD_BALANCE_FORK, 0));
|
||||
set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
|
||||
#endif
|
||||
|
||||
/* Initialize new task's runnable average */
|
||||
|
@ -1838,7 +1949,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
|
|||
struct task_struct *next)
|
||||
{
|
||||
trace_sched_switch(prev, next);
|
||||
sched_info_switch(prev, next);
|
||||
sched_info_switch(rq, prev, next);
|
||||
perf_event_task_sched_out(prev, next);
|
||||
fire_sched_out_preempt_notifiers(prev, next);
|
||||
prepare_lock_switch(rq, next);
|
||||
|
@ -1890,6 +2001,8 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
|
|||
if (mm)
|
||||
mmdrop(mm);
|
||||
if (unlikely(prev_state == TASK_DEAD)) {
|
||||
task_numa_free(prev);
|
||||
|
||||
/*
|
||||
* Remove function-return probe instances associated with this
|
||||
* task and put them back on the free list.
|
||||
|
@ -2073,7 +2186,7 @@ void sched_exec(void)
|
|||
int dest_cpu;
|
||||
|
||||
raw_spin_lock_irqsave(&p->pi_lock, flags);
|
||||
dest_cpu = p->sched_class->select_task_rq(p, SD_BALANCE_EXEC, 0);
|
||||
dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), SD_BALANCE_EXEC, 0);
|
||||
if (dest_cpu == smp_processor_id())
|
||||
goto unlock;
|
||||
|
||||
|
@ -2215,7 +2328,7 @@ notrace unsigned long get_parent_ip(unsigned long addr)
|
|||
#if defined(CONFIG_PREEMPT) && (defined(CONFIG_DEBUG_PREEMPT) || \
|
||||
defined(CONFIG_PREEMPT_TRACER))
|
||||
|
||||
void __kprobes add_preempt_count(int val)
|
||||
void __kprobes preempt_count_add(int val)
|
||||
{
|
||||
#ifdef CONFIG_DEBUG_PREEMPT
|
||||
/*
|
||||
|
@ -2224,7 +2337,7 @@ void __kprobes add_preempt_count(int val)
|
|||
if (DEBUG_LOCKS_WARN_ON((preempt_count() < 0)))
|
||||
return;
|
||||
#endif
|
||||
preempt_count() += val;
|
||||
__preempt_count_add(val);
|
||||
#ifdef CONFIG_DEBUG_PREEMPT
|
||||
/*
|
||||
* Spinlock count overflowing soon?
|
||||
|
@ -2235,9 +2348,9 @@ void __kprobes add_preempt_count(int val)
|
|||
if (preempt_count() == val)
|
||||
trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
|
||||
}
|
||||
EXPORT_SYMBOL(add_preempt_count);
|
||||
EXPORT_SYMBOL(preempt_count_add);
|
||||
|
||||
void __kprobes sub_preempt_count(int val)
|
||||
void __kprobes preempt_count_sub(int val)
|
||||
{
|
||||
#ifdef CONFIG_DEBUG_PREEMPT
|
||||
/*
|
||||
|
@ -2255,9 +2368,9 @@ void __kprobes sub_preempt_count(int val)
|
|||
|
||||
if (preempt_count() == val)
|
||||
trace_preempt_on(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
|
||||
preempt_count() -= val;
|
||||
__preempt_count_sub(val);
|
||||
}
|
||||
EXPORT_SYMBOL(sub_preempt_count);
|
||||
EXPORT_SYMBOL(preempt_count_sub);
|
||||
|
||||
#endif
|
||||
|
||||
|
@ -2430,6 +2543,7 @@ need_resched:
|
|||
put_prev_task(rq, prev);
|
||||
next = pick_next_task(rq);
|
||||
clear_tsk_need_resched(prev);
|
||||
clear_preempt_need_resched();
|
||||
rq->skip_clock_update = 0;
|
||||
|
||||
if (likely(prev != next)) {
|
||||
|
@ -2520,9 +2634,9 @@ asmlinkage void __sched notrace preempt_schedule(void)
|
|||
return;
|
||||
|
||||
do {
|
||||
add_preempt_count_notrace(PREEMPT_ACTIVE);
|
||||
__preempt_count_add(PREEMPT_ACTIVE);
|
||||
__schedule();
|
||||
sub_preempt_count_notrace(PREEMPT_ACTIVE);
|
||||
__preempt_count_sub(PREEMPT_ACTIVE);
|
||||
|
||||
/*
|
||||
* Check again in case we missed a preemption opportunity
|
||||
|
@ -2541,20 +2655,19 @@ EXPORT_SYMBOL(preempt_schedule);
|
|||
*/
|
||||
asmlinkage void __sched preempt_schedule_irq(void)
|
||||
{
|
||||
struct thread_info *ti = current_thread_info();
|
||||
enum ctx_state prev_state;
|
||||
|
||||
/* Catch callers which need to be fixed */
|
||||
BUG_ON(ti->preempt_count || !irqs_disabled());
|
||||
BUG_ON(preempt_count() || !irqs_disabled());
|
||||
|
||||
prev_state = exception_enter();
|
||||
|
||||
do {
|
||||
add_preempt_count(PREEMPT_ACTIVE);
|
||||
__preempt_count_add(PREEMPT_ACTIVE);
|
||||
local_irq_enable();
|
||||
__schedule();
|
||||
local_irq_disable();
|
||||
sub_preempt_count(PREEMPT_ACTIVE);
|
||||
__preempt_count_sub(PREEMPT_ACTIVE);
|
||||
|
||||
/*
|
||||
* Check again in case we missed a preemption opportunity
|
||||
|
@ -2575,393 +2688,6 @@ int default_wake_function(wait_queue_t *curr, unsigned mode, int wake_flags,
|
|||
}
|
||||
EXPORT_SYMBOL(default_wake_function);
|
||||
|
||||
/*
|
||||
* The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
|
||||
* wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
|
||||
* number) then we wake all the non-exclusive tasks and one exclusive task.
|
||||
*
|
||||
* There are circumstances in which we can try to wake a task which has already
|
||||
* started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
|
||||
* zero in this (rare) case, and we handle it by continuing to scan the queue.
|
||||
*/
|
||||
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
|
||||
int nr_exclusive, int wake_flags, void *key)
|
||||
{
|
||||
wait_queue_t *curr, *next;
|
||||
|
||||
list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
|
||||
unsigned flags = curr->flags;
|
||||
|
||||
if (curr->func(curr, mode, wake_flags, key) &&
|
||||
(flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* __wake_up - wake up threads blocked on a waitqueue.
|
||||
* @q: the waitqueue
|
||||
* @mode: which threads
|
||||
* @nr_exclusive: how many wake-one or wake-many threads to wake up
|
||||
* @key: is directly passed to the wakeup function
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void __wake_up(wait_queue_head_t *q, unsigned int mode,
|
||||
int nr_exclusive, void *key)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&q->lock, flags);
|
||||
__wake_up_common(q, mode, nr_exclusive, 0, key);
|
||||
spin_unlock_irqrestore(&q->lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL(__wake_up);
|
||||
|
||||
/*
|
||||
* Same as __wake_up but called with the spinlock in wait_queue_head_t held.
|
||||
*/
|
||||
void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr)
|
||||
{
|
||||
__wake_up_common(q, mode, nr, 0, NULL);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_locked);
|
||||
|
||||
void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key)
|
||||
{
|
||||
__wake_up_common(q, mode, 1, 0, key);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_locked_key);
|
||||
|
||||
/**
|
||||
* __wake_up_sync_key - wake up threads blocked on a waitqueue.
|
||||
* @q: the waitqueue
|
||||
* @mode: which threads
|
||||
* @nr_exclusive: how many wake-one or wake-many threads to wake up
|
||||
* @key: opaque value to be passed to wakeup targets
|
||||
*
|
||||
* The sync wakeup differs that the waker knows that it will schedule
|
||||
* away soon, so while the target thread will be woken up, it will not
|
||||
* be migrated to another CPU - ie. the two threads are 'synchronized'
|
||||
* with each other. This can prevent needless bouncing between CPUs.
|
||||
*
|
||||
* On UP it can prevent extra preemption.
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
|
||||
int nr_exclusive, void *key)
|
||||
{
|
||||
unsigned long flags;
|
||||
int wake_flags = WF_SYNC;
|
||||
|
||||
if (unlikely(!q))
|
||||
return;
|
||||
|
||||
if (unlikely(nr_exclusive != 1))
|
||||
wake_flags = 0;
|
||||
|
||||
spin_lock_irqsave(&q->lock, flags);
|
||||
__wake_up_common(q, mode, nr_exclusive, wake_flags, key);
|
||||
spin_unlock_irqrestore(&q->lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_sync_key);
|
||||
|
||||
/*
|
||||
* __wake_up_sync - see __wake_up_sync_key()
|
||||
*/
|
||||
void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
|
||||
{
|
||||
__wake_up_sync_key(q, mode, nr_exclusive, NULL);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_sync); /* For internal use only */
|
||||
|
||||
/**
|
||||
* complete: - signals a single thread waiting on this completion
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This will wake up a single thread waiting on this completion. Threads will be
|
||||
* awakened in the same order in which they were queued.
|
||||
*
|
||||
* See also complete_all(), wait_for_completion() and related routines.
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void complete(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
x->done++;
|
||||
__wake_up_common(&x->wait, TASK_NORMAL, 1, 0, NULL);
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL(complete);
|
||||
|
||||
/**
|
||||
* complete_all: - signals all threads waiting on this completion
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This will wake up all threads waiting on this particular completion event.
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void complete_all(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
x->done += UINT_MAX/2;
|
||||
__wake_up_common(&x->wait, TASK_NORMAL, 0, 0, NULL);
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL(complete_all);
|
||||
|
||||
static inline long __sched
|
||||
do_wait_for_common(struct completion *x,
|
||||
long (*action)(long), long timeout, int state)
|
||||
{
|
||||
if (!x->done) {
|
||||
DECLARE_WAITQUEUE(wait, current);
|
||||
|
||||
__add_wait_queue_tail_exclusive(&x->wait, &wait);
|
||||
do {
|
||||
if (signal_pending_state(state, current)) {
|
||||
timeout = -ERESTARTSYS;
|
||||
break;
|
||||
}
|
||||
__set_current_state(state);
|
||||
spin_unlock_irq(&x->wait.lock);
|
||||
timeout = action(timeout);
|
||||
spin_lock_irq(&x->wait.lock);
|
||||
} while (!x->done && timeout);
|
||||
__remove_wait_queue(&x->wait, &wait);
|
||||
if (!x->done)
|
||||
return timeout;
|
||||
}
|
||||
x->done--;
|
||||
return timeout ?: 1;
|
||||
}
|
||||
|
||||
static inline long __sched
|
||||
__wait_for_common(struct completion *x,
|
||||
long (*action)(long), long timeout, int state)
|
||||
{
|
||||
might_sleep();
|
||||
|
||||
spin_lock_irq(&x->wait.lock);
|
||||
timeout = do_wait_for_common(x, action, timeout, state);
|
||||
spin_unlock_irq(&x->wait.lock);
|
||||
return timeout;
|
||||
}
|
||||
|
||||
static long __sched
|
||||
wait_for_common(struct completion *x, long timeout, int state)
|
||||
{
|
||||
return __wait_for_common(x, schedule_timeout, timeout, state);
|
||||
}
|
||||
|
||||
static long __sched
|
||||
wait_for_common_io(struct completion *x, long timeout, int state)
|
||||
{
|
||||
return __wait_for_common(x, io_schedule_timeout, timeout, state);
|
||||
}
|
||||
|
||||
/**
|
||||
* wait_for_completion: - waits for completion of a task
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits to be signaled for completion of a specific task. It is NOT
|
||||
* interruptible and there is no timeout.
|
||||
*
|
||||
* See also similar routines (i.e. wait_for_completion_timeout()) with timeout
|
||||
* and interrupt capability. Also see complete().
|
||||
*/
|
||||
void __sched wait_for_completion(struct completion *x)
|
||||
{
|
||||
wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion);
|
||||
|
||||
/**
|
||||
* wait_for_completion_timeout: - waits for completion of a task (w/timeout)
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be signaled or for a
|
||||
* specified timeout to expire. The timeout is in jiffies. It is not
|
||||
* interruptible.
|
||||
*
|
||||
* Return: 0 if timed out, and positive (at least 1, or number of jiffies left
|
||||
* till timeout) if completed.
|
||||
*/
|
||||
unsigned long __sched
|
||||
wait_for_completion_timeout(struct completion *x, unsigned long timeout)
|
||||
{
|
||||
return wait_for_common(x, timeout, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_timeout);
|
||||
|
||||
/**
|
||||
* wait_for_completion_io: - waits for completion of a task
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits to be signaled for completion of a specific task. It is NOT
|
||||
* interruptible and there is no timeout. The caller is accounted as waiting
|
||||
* for IO.
|
||||
*/
|
||||
void __sched wait_for_completion_io(struct completion *x)
|
||||
{
|
||||
wait_for_common_io(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_io);
|
||||
|
||||
/**
|
||||
* wait_for_completion_io_timeout: - waits for completion of a task (w/timeout)
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be signaled or for a
|
||||
* specified timeout to expire. The timeout is in jiffies. It is not
|
||||
* interruptible. The caller is accounted as waiting for IO.
|
||||
*
|
||||
* Return: 0 if timed out, and positive (at least 1, or number of jiffies left
|
||||
* till timeout) if completed.
|
||||
*/
|
||||
unsigned long __sched
|
||||
wait_for_completion_io_timeout(struct completion *x, unsigned long timeout)
|
||||
{
|
||||
return wait_for_common_io(x, timeout, TASK_UNINTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_io_timeout);
|
||||
|
||||
/**
|
||||
* wait_for_completion_interruptible: - waits for completion of a task (w/intr)
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits for completion of a specific task to be signaled. It is
|
||||
* interruptible.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if completed.
|
||||
*/
|
||||
int __sched wait_for_completion_interruptible(struct completion *x)
|
||||
{
|
||||
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
|
||||
if (t == -ERESTARTSYS)
|
||||
return t;
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_interruptible);
|
||||
|
||||
/**
|
||||
* wait_for_completion_interruptible_timeout: - waits for completion (w/(to,intr))
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be signaled or for a
|
||||
* specified timeout to expire. It is interruptible. The timeout is in jiffies.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if timed out, positive (at least 1,
|
||||
* or number of jiffies left till timeout) if completed.
|
||||
*/
|
||||
long __sched
|
||||
wait_for_completion_interruptible_timeout(struct completion *x,
|
||||
unsigned long timeout)
|
||||
{
|
||||
return wait_for_common(x, timeout, TASK_INTERRUPTIBLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_interruptible_timeout);
|
||||
|
||||
/**
|
||||
* wait_for_completion_killable: - waits for completion of a task (killable)
|
||||
* @x: holds the state of this particular completion
|
||||
*
|
||||
* This waits to be signaled for completion of a specific task. It can be
|
||||
* interrupted by a kill signal.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if completed.
|
||||
*/
|
||||
int __sched wait_for_completion_killable(struct completion *x)
|
||||
{
|
||||
long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
|
||||
if (t == -ERESTARTSYS)
|
||||
return t;
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_killable);
|
||||
|
||||
/**
|
||||
* wait_for_completion_killable_timeout: - waits for completion of a task (w/(to,killable))
|
||||
* @x: holds the state of this particular completion
|
||||
* @timeout: timeout value in jiffies
|
||||
*
|
||||
* This waits for either a completion of a specific task to be
|
||||
* signaled or for a specified timeout to expire. It can be
|
||||
* interrupted by a kill signal. The timeout is in jiffies.
|
||||
*
|
||||
* Return: -ERESTARTSYS if interrupted, 0 if timed out, positive (at least 1,
|
||||
* or number of jiffies left till timeout) if completed.
|
||||
*/
|
||||
long __sched
|
||||
wait_for_completion_killable_timeout(struct completion *x,
|
||||
unsigned long timeout)
|
||||
{
|
||||
return wait_for_common(x, timeout, TASK_KILLABLE);
|
||||
}
|
||||
EXPORT_SYMBOL(wait_for_completion_killable_timeout);
|
||||
|
||||
/**
|
||||
* try_wait_for_completion - try to decrement a completion without blocking
|
||||
* @x: completion structure
|
||||
*
|
||||
* Return: 0 if a decrement cannot be done without blocking
|
||||
* 1 if a decrement succeeded.
|
||||
*
|
||||
* If a completion is being used as a counting completion,
|
||||
* attempt to decrement the counter without blocking. This
|
||||
* enables us to avoid waiting if the resource the completion
|
||||
* is protecting is not available.
|
||||
*/
|
||||
bool try_wait_for_completion(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
int ret = 1;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
if (!x->done)
|
||||
ret = 0;
|
||||
else
|
||||
x->done--;
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL(try_wait_for_completion);
|
||||
|
||||
/**
|
||||
* completion_done - Test to see if a completion has any waiters
|
||||
* @x: completion structure
|
||||
*
|
||||
* Return: 0 if there are waiters (wait_for_completion() in progress)
|
||||
* 1 if there are no waiters.
|
||||
*
|
||||
*/
|
||||
bool completion_done(struct completion *x)
|
||||
{
|
||||
unsigned long flags;
|
||||
int ret = 1;
|
||||
|
||||
spin_lock_irqsave(&x->wait.lock, flags);
|
||||
if (!x->done)
|
||||
ret = 0;
|
||||
spin_unlock_irqrestore(&x->wait.lock, flags);
|
||||
return ret;
|
||||
}
|
||||
EXPORT_SYMBOL(completion_done);
|
||||
|
||||
static long __sched
|
||||
sleep_on_common(wait_queue_head_t *q, int state, long timeout)
|
||||
{
|
||||
|
@ -3598,13 +3324,11 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
|
|||
struct task_struct *p;
|
||||
int retval;
|
||||
|
||||
get_online_cpus();
|
||||
rcu_read_lock();
|
||||
|
||||
p = find_process_by_pid(pid);
|
||||
if (!p) {
|
||||
rcu_read_unlock();
|
||||
put_online_cpus();
|
||||
return -ESRCH;
|
||||
}
|
||||
|
||||
|
@ -3661,7 +3385,6 @@ out_free_cpus_allowed:
|
|||
free_cpumask_var(cpus_allowed);
|
||||
out_put_task:
|
||||
put_task_struct(p);
|
||||
put_online_cpus();
|
||||
return retval;
|
||||
}
|
||||
|
||||
|
@ -3706,7 +3429,6 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
|
|||
unsigned long flags;
|
||||
int retval;
|
||||
|
||||
get_online_cpus();
|
||||
rcu_read_lock();
|
||||
|
||||
retval = -ESRCH;
|
||||
|
@ -3719,12 +3441,11 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask)
|
|||
goto out_unlock;
|
||||
|
||||
raw_spin_lock_irqsave(&p->pi_lock, flags);
|
||||
cpumask_and(mask, &p->cpus_allowed, cpu_online_mask);
|
||||
cpumask_and(mask, &p->cpus_allowed, cpu_active_mask);
|
||||
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
|
||||
|
||||
out_unlock:
|
||||
rcu_read_unlock();
|
||||
put_online_cpus();
|
||||
|
||||
return retval;
|
||||
}
|
||||
|
@ -3794,16 +3515,11 @@ SYSCALL_DEFINE0(sched_yield)
|
|||
return 0;
|
||||
}
|
||||
|
||||
static inline int should_resched(void)
|
||||
{
|
||||
return need_resched() && !(preempt_count() & PREEMPT_ACTIVE);
|
||||
}
|
||||
|
||||
static void __cond_resched(void)
|
||||
{
|
||||
add_preempt_count(PREEMPT_ACTIVE);
|
||||
__preempt_count_add(PREEMPT_ACTIVE);
|
||||
__schedule();
|
||||
sub_preempt_count(PREEMPT_ACTIVE);
|
||||
__preempt_count_sub(PREEMPT_ACTIVE);
|
||||
}
|
||||
|
||||
int __sched _cond_resched(void)
|
||||
|
@ -4186,7 +3902,7 @@ void init_idle(struct task_struct *idle, int cpu)
|
|||
|
||||
raw_spin_lock_irqsave(&rq->lock, flags);
|
||||
|
||||
__sched_fork(idle);
|
||||
__sched_fork(0, idle);
|
||||
idle->state = TASK_RUNNING;
|
||||
idle->se.exec_start = sched_clock();
|
||||
|
||||
|
@ -4212,7 +3928,7 @@ void init_idle(struct task_struct *idle, int cpu)
|
|||
raw_spin_unlock_irqrestore(&rq->lock, flags);
|
||||
|
||||
/* Set the preempt count _outside_ the spinlocks! */
|
||||
task_thread_info(idle)->preempt_count = 0;
|
||||
init_idle_preempt_count(idle, cpu);
|
||||
|
||||
/*
|
||||
* The idle tasks have their own, simple scheduling class:
|
||||
|
@ -4346,6 +4062,53 @@ fail:
|
|||
return ret;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
/* Migrate current task p to target_cpu */
|
||||
int migrate_task_to(struct task_struct *p, int target_cpu)
|
||||
{
|
||||
struct migration_arg arg = { p, target_cpu };
|
||||
int curr_cpu = task_cpu(p);
|
||||
|
||||
if (curr_cpu == target_cpu)
|
||||
return 0;
|
||||
|
||||
if (!cpumask_test_cpu(target_cpu, tsk_cpus_allowed(p)))
|
||||
return -EINVAL;
|
||||
|
||||
/* TODO: This is not properly updating schedstats */
|
||||
|
||||
return stop_one_cpu(curr_cpu, migration_cpu_stop, &arg);
|
||||
}
|
||||
|
||||
/*
|
||||
* Requeue a task on a given node and accurately track the number of NUMA
|
||||
* tasks on the runqueues
|
||||
*/
|
||||
void sched_setnuma(struct task_struct *p, int nid)
|
||||
{
|
||||
struct rq *rq;
|
||||
unsigned long flags;
|
||||
bool on_rq, running;
|
||||
|
||||
rq = task_rq_lock(p, &flags);
|
||||
on_rq = p->on_rq;
|
||||
running = task_current(rq, p);
|
||||
|
||||
if (on_rq)
|
||||
dequeue_task(rq, p, 0);
|
||||
if (running)
|
||||
p->sched_class->put_prev_task(rq, p);
|
||||
|
||||
p->numa_preferred_nid = nid;
|
||||
|
||||
if (running)
|
||||
p->sched_class->set_curr_task(rq);
|
||||
if (on_rq)
|
||||
enqueue_task(rq, p, 0);
|
||||
task_rq_unlock(rq, p, &flags);
|
||||
}
|
||||
#endif
|
||||
|
||||
/*
|
||||
* migration_cpu_stop - this will be executed by a highprio stopper thread
|
||||
* and performs thread migration by bumping thread off CPU then
|
||||
|
@ -5119,6 +4882,9 @@ static void destroy_sched_domains(struct sched_domain *sd, int cpu)
|
|||
DEFINE_PER_CPU(struct sched_domain *, sd_llc);
|
||||
DEFINE_PER_CPU(int, sd_llc_size);
|
||||
DEFINE_PER_CPU(int, sd_llc_id);
|
||||
DEFINE_PER_CPU(struct sched_domain *, sd_numa);
|
||||
DEFINE_PER_CPU(struct sched_domain *, sd_busy);
|
||||
DEFINE_PER_CPU(struct sched_domain *, sd_asym);
|
||||
|
||||
static void update_top_cache_domain(int cpu)
|
||||
{
|
||||
|
@ -5130,11 +4896,18 @@ static void update_top_cache_domain(int cpu)
|
|||
if (sd) {
|
||||
id = cpumask_first(sched_domain_span(sd));
|
||||
size = cpumask_weight(sched_domain_span(sd));
|
||||
rcu_assign_pointer(per_cpu(sd_busy, cpu), sd->parent);
|
||||
}
|
||||
|
||||
rcu_assign_pointer(per_cpu(sd_llc, cpu), sd);
|
||||
per_cpu(sd_llc_size, cpu) = size;
|
||||
per_cpu(sd_llc_id, cpu) = id;
|
||||
|
||||
sd = lowest_flag_domain(cpu, SD_NUMA);
|
||||
rcu_assign_pointer(per_cpu(sd_numa, cpu), sd);
|
||||
|
||||
sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
|
||||
rcu_assign_pointer(per_cpu(sd_asym, cpu), sd);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -5654,6 +5427,7 @@ sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
|
|||
| 0*SD_SHARE_PKG_RESOURCES
|
||||
| 1*SD_SERIALIZE
|
||||
| 0*SD_PREFER_SIBLING
|
||||
| 1*SD_NUMA
|
||||
| sd_local_flags(level)
|
||||
,
|
||||
.last_balance = jiffies,
|
||||
|
@ -6335,14 +6109,17 @@ void __init sched_init_smp(void)
|
|||
|
||||
sched_init_numa();
|
||||
|
||||
get_online_cpus();
|
||||
/*
|
||||
* There's no userspace yet to cause hotplug operations; hence all the
|
||||
* cpu masks are stable and all blatant races in the below code cannot
|
||||
* happen.
|
||||
*/
|
||||
mutex_lock(&sched_domains_mutex);
|
||||
init_sched_domains(cpu_active_mask);
|
||||
cpumask_andnot(non_isolated_cpus, cpu_possible_mask, cpu_isolated_map);
|
||||
if (cpumask_empty(non_isolated_cpus))
|
||||
cpumask_set_cpu(smp_processor_id(), non_isolated_cpus);
|
||||
mutex_unlock(&sched_domains_mutex);
|
||||
put_online_cpus();
|
||||
|
||||
hotcpu_notifier(sched_domains_numa_masks_update, CPU_PRI_SCHED_ACTIVE);
|
||||
hotcpu_notifier(cpuset_cpu_active, CPU_PRI_CPUSET_ACTIVE);
|
||||
|
@ -6505,6 +6282,7 @@ void __init sched_init(void)
|
|||
rq->online = 0;
|
||||
rq->idle_stamp = 0;
|
||||
rq->avg_idle = 2*sysctl_sched_migration_cost;
|
||||
rq->max_idle_balance_cost = sysctl_sched_migration_cost;
|
||||
|
||||
INIT_LIST_HEAD(&rq->cfs_tasks);
|
||||
|
||||
|
@ -7277,7 +7055,12 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
|
|||
|
||||
runtime_enabled = quota != RUNTIME_INF;
|
||||
runtime_was_enabled = cfs_b->quota != RUNTIME_INF;
|
||||
account_cfs_bandwidth_used(runtime_enabled, runtime_was_enabled);
|
||||
/*
|
||||
* If we need to toggle cfs_bandwidth_used, off->on must occur
|
||||
* before making related changes, and on->off must occur afterwards
|
||||
*/
|
||||
if (runtime_enabled && !runtime_was_enabled)
|
||||
cfs_bandwidth_usage_inc();
|
||||
raw_spin_lock_irq(&cfs_b->lock);
|
||||
cfs_b->period = ns_to_ktime(period);
|
||||
cfs_b->quota = quota;
|
||||
|
@ -7303,6 +7086,8 @@ static int tg_set_cfs_bandwidth(struct task_group *tg, u64 period, u64 quota)
|
|||
unthrottle_cfs_rq(cfs_rq);
|
||||
raw_spin_unlock_irq(&rq->lock);
|
||||
}
|
||||
if (runtime_was_enabled && !runtime_enabled)
|
||||
cfs_bandwidth_usage_dec();
|
||||
out_unlock:
|
||||
mutex_unlock(&cfs_constraints_mutex);
|
||||
|
||||
|
|
|
@ -15,6 +15,7 @@
|
|||
#include <linux/seq_file.h>
|
||||
#include <linux/kallsyms.h>
|
||||
#include <linux/utsname.h>
|
||||
#include <linux/mempolicy.h>
|
||||
|
||||
#include "sched.h"
|
||||
|
||||
|
@ -137,6 +138,9 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
|
|||
SEQ_printf(m, "%15Ld %15Ld %15Ld.%06ld %15Ld.%06ld %15Ld.%06ld",
|
||||
0LL, 0LL, 0LL, 0L, 0LL, 0L, 0LL, 0L);
|
||||
#endif
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
SEQ_printf(m, " %d", cpu_to_node(task_cpu(p)));
|
||||
#endif
|
||||
#ifdef CONFIG_CGROUP_SCHED
|
||||
SEQ_printf(m, " %s", task_group_path(task_group(p)));
|
||||
#endif
|
||||
|
@ -159,7 +163,7 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
|
|||
read_lock_irqsave(&tasklist_lock, flags);
|
||||
|
||||
do_each_thread(g, p) {
|
||||
if (!p->on_rq || task_cpu(p) != rq_cpu)
|
||||
if (task_cpu(p) != rq_cpu)
|
||||
continue;
|
||||
|
||||
print_task(m, rq, p);
|
||||
|
@ -225,6 +229,14 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
|
|||
atomic_read(&cfs_rq->tg->runnable_avg));
|
||||
#endif
|
||||
#endif
|
||||
#ifdef CONFIG_CFS_BANDWIDTH
|
||||
SEQ_printf(m, " .%-30s: %d\n", "tg->cfs_bandwidth.timer_active",
|
||||
cfs_rq->tg->cfs_bandwidth.timer_active);
|
||||
SEQ_printf(m, " .%-30s: %d\n", "throttled",
|
||||
cfs_rq->throttled);
|
||||
SEQ_printf(m, " .%-30s: %d\n", "throttle_count",
|
||||
cfs_rq->throttle_count);
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_FAIR_GROUP_SCHED
|
||||
print_cfs_group_stats(m, cpu, cfs_rq->tg);
|
||||
|
@ -345,7 +357,7 @@ static void sched_debug_header(struct seq_file *m)
|
|||
cpu_clk = local_clock();
|
||||
local_irq_restore(flags);
|
||||
|
||||
SEQ_printf(m, "Sched Debug Version: v0.10, %s %.*s\n",
|
||||
SEQ_printf(m, "Sched Debug Version: v0.11, %s %.*s\n",
|
||||
init_utsname()->release,
|
||||
(int)strcspn(init_utsname()->version, " "),
|
||||
init_utsname()->version);
|
||||
|
@ -488,6 +500,56 @@ static int __init init_sched_debug_procfs(void)
|
|||
|
||||
__initcall(init_sched_debug_procfs);
|
||||
|
||||
#define __P(F) \
|
||||
SEQ_printf(m, "%-45s:%21Ld\n", #F, (long long)F)
|
||||
#define P(F) \
|
||||
SEQ_printf(m, "%-45s:%21Ld\n", #F, (long long)p->F)
|
||||
#define __PN(F) \
|
||||
SEQ_printf(m, "%-45s:%14Ld.%06ld\n", #F, SPLIT_NS((long long)F))
|
||||
#define PN(F) \
|
||||
SEQ_printf(m, "%-45s:%14Ld.%06ld\n", #F, SPLIT_NS((long long)p->F))
|
||||
|
||||
|
||||
static void sched_show_numa(struct task_struct *p, struct seq_file *m)
|
||||
{
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
struct mempolicy *pol;
|
||||
int node, i;
|
||||
|
||||
if (p->mm)
|
||||
P(mm->numa_scan_seq);
|
||||
|
||||
task_lock(p);
|
||||
pol = p->mempolicy;
|
||||
if (pol && !(pol->flags & MPOL_F_MORON))
|
||||
pol = NULL;
|
||||
mpol_get(pol);
|
||||
task_unlock(p);
|
||||
|
||||
SEQ_printf(m, "numa_migrations, %ld\n", xchg(&p->numa_pages_migrated, 0));
|
||||
|
||||
for_each_online_node(node) {
|
||||
for (i = 0; i < 2; i++) {
|
||||
unsigned long nr_faults = -1;
|
||||
int cpu_current, home_node;
|
||||
|
||||
if (p->numa_faults)
|
||||
nr_faults = p->numa_faults[2*node + i];
|
||||
|
||||
cpu_current = !i ? (task_node(p) == node) :
|
||||
(pol && node_isset(node, pol->v.nodes));
|
||||
|
||||
home_node = (p->numa_preferred_nid == node);
|
||||
|
||||
SEQ_printf(m, "numa_faults, %d, %d, %d, %d, %ld\n",
|
||||
i, node, cpu_current, home_node, nr_faults);
|
||||
}
|
||||
}
|
||||
|
||||
mpol_put(pol);
|
||||
#endif
|
||||
}
|
||||
|
||||
void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
|
||||
{
|
||||
unsigned long nr_switches;
|
||||
|
@ -591,6 +653,8 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m)
|
|||
SEQ_printf(m, "%-45s:%21Ld\n",
|
||||
"clock-delta", (long long)(t1-t0));
|
||||
}
|
||||
|
||||
sched_show_numa(p, m);
|
||||
}
|
||||
|
||||
void proc_sched_set_task(struct task_struct *p)
|
||||
|
|
1393
kernel/sched/fair.c
1393
kernel/sched/fair.c
File diff suppressed because it is too large
Load Diff
|
@ -63,10 +63,23 @@ SCHED_FEAT(LB_MIN, false)
|
|||
/*
|
||||
* Apply the automatic NUMA scheduling policy. Enabled automatically
|
||||
* at runtime if running on a NUMA machine. Can be controlled via
|
||||
* numa_balancing=. Allow PTE scanning to be forced on UMA machines
|
||||
* for debugging the core machinery.
|
||||
* numa_balancing=
|
||||
*/
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
SCHED_FEAT(NUMA, false)
|
||||
SCHED_FEAT(NUMA_FORCE, false)
|
||||
|
||||
/*
|
||||
* NUMA_FAVOUR_HIGHER will favor moving tasks towards nodes where a
|
||||
* higher number of hinting faults are recorded during active load
|
||||
* balancing.
|
||||
*/
|
||||
SCHED_FEAT(NUMA_FAVOUR_HIGHER, true)
|
||||
|
||||
/*
|
||||
* NUMA_RESIST_LOWER will resist moving tasks towards nodes where a
|
||||
* lower number of hinting faults have been recorded. As this has
|
||||
* the potential to prevent a task ever migrating to a new node
|
||||
* due to CPU overload it is disabled by default.
|
||||
*/
|
||||
SCHED_FEAT(NUMA_RESIST_LOWER, false)
|
||||
#endif
|
||||
|
|
|
@ -9,7 +9,7 @@
|
|||
|
||||
#ifdef CONFIG_SMP
|
||||
static int
|
||||
select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
|
||||
select_task_rq_idle(struct task_struct *p, int cpu, int sd_flag, int flags)
|
||||
{
|
||||
return task_cpu(p); /* IDLE tasks as never migrated */
|
||||
}
|
||||
|
|
|
@ -246,8 +246,10 @@ static inline void rt_set_overload(struct rq *rq)
|
|||
* if we should look at the mask. It would be a shame
|
||||
* if we looked at the mask, but the mask was not
|
||||
* updated yet.
|
||||
*
|
||||
* Matched by the barrier in pull_rt_task().
|
||||
*/
|
||||
wmb();
|
||||
smp_wmb();
|
||||
atomic_inc(&rq->rd->rto_count);
|
||||
}
|
||||
|
||||
|
@ -1169,13 +1171,10 @@ static void yield_task_rt(struct rq *rq)
|
|||
static int find_lowest_rq(struct task_struct *task);
|
||||
|
||||
static int
|
||||
select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
|
||||
select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
|
||||
{
|
||||
struct task_struct *curr;
|
||||
struct rq *rq;
|
||||
int cpu;
|
||||
|
||||
cpu = task_cpu(p);
|
||||
|
||||
if (p->nr_cpus_allowed == 1)
|
||||
goto out;
|
||||
|
@ -1213,8 +1212,7 @@ select_task_rq_rt(struct task_struct *p, int sd_flag, int flags)
|
|||
*/
|
||||
if (curr && unlikely(rt_task(curr)) &&
|
||||
(curr->nr_cpus_allowed < 2 ||
|
||||
curr->prio <= p->prio) &&
|
||||
(p->nr_cpus_allowed > 1)) {
|
||||
curr->prio <= p->prio)) {
|
||||
int target = find_lowest_rq(p);
|
||||
|
||||
if (target != -1)
|
||||
|
@ -1630,6 +1628,12 @@ static int pull_rt_task(struct rq *this_rq)
|
|||
if (likely(!rt_overloaded(this_rq)))
|
||||
return 0;
|
||||
|
||||
/*
|
||||
* Match the barrier from rt_set_overloaded; this guarantees that if we
|
||||
* see overloaded we must also see the rto_mask bit.
|
||||
*/
|
||||
smp_rmb();
|
||||
|
||||
for_each_cpu(cpu, this_rq->rd->rto_mask) {
|
||||
if (this_cpu == cpu)
|
||||
continue;
|
||||
|
@ -1931,8 +1935,8 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
|
|||
p->rt.time_slice = sched_rr_timeslice;
|
||||
|
||||
/*
|
||||
* Requeue to the end of queue if we (and all of our ancestors) are the
|
||||
* only element on the queue
|
||||
* Requeue to the end of queue if we (and all of our ancestors) are not
|
||||
* the only element on the queue
|
||||
*/
|
||||
for_each_sched_rt_entity(rt_se) {
|
||||
if (rt_se->run_list.prev != rt_se->run_list.next) {
|
||||
|
|
|
@ -6,6 +6,7 @@
|
|||
#include <linux/spinlock.h>
|
||||
#include <linux/stop_machine.h>
|
||||
#include <linux/tick.h>
|
||||
#include <linux/slab.h>
|
||||
|
||||
#include "cpupri.h"
|
||||
#include "cpuacct.h"
|
||||
|
@ -408,6 +409,10 @@ struct rq {
|
|||
* remote CPUs use both these fields when doing load calculation.
|
||||
*/
|
||||
unsigned int nr_running;
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
unsigned int nr_numa_running;
|
||||
unsigned int nr_preferred_running;
|
||||
#endif
|
||||
#define CPU_LOAD_IDX_MAX 5
|
||||
unsigned long cpu_load[CPU_LOAD_IDX_MAX];
|
||||
unsigned long last_load_update_tick;
|
||||
|
@ -476,6 +481,9 @@ struct rq {
|
|||
u64 age_stamp;
|
||||
u64 idle_stamp;
|
||||
u64 avg_idle;
|
||||
|
||||
/* This is used to determine avg_idle's max value */
|
||||
u64 max_idle_balance_cost;
|
||||
#endif
|
||||
|
||||
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
|
||||
|
@ -552,6 +560,12 @@ static inline u64 rq_clock_task(struct rq *rq)
|
|||
return rq->clock_task;
|
||||
}
|
||||
|
||||
#ifdef CONFIG_NUMA_BALANCING
|
||||
extern void sched_setnuma(struct task_struct *p, int node);
|
||||
extern int migrate_task_to(struct task_struct *p, int cpu);
|
||||
extern int migrate_swap(struct task_struct *, struct task_struct *);
|
||||
#endif /* CONFIG_NUMA_BALANCING */
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
|
||||
#define rcu_dereference_check_sched_domain(p) \
|
||||
|
@ -593,9 +607,24 @@ static inline struct sched_domain *highest_flag_domain(int cpu, int flag)
|
|||
return hsd;
|
||||
}
|
||||
|
||||
static inline struct sched_domain *lowest_flag_domain(int cpu, int flag)
|
||||
{
|
||||
struct sched_domain *sd;
|
||||
|
||||
for_each_domain(cpu, sd) {
|
||||
if (sd->flags & flag)
|
||||
break;
|
||||
}
|
||||
|
||||
return sd;
|
||||
}
|
||||
|
||||
DECLARE_PER_CPU(struct sched_domain *, sd_llc);
|
||||
DECLARE_PER_CPU(int, sd_llc_size);
|
||||
DECLARE_PER_CPU(int, sd_llc_id);
|
||||
DECLARE_PER_CPU(struct sched_domain *, sd_numa);
|
||||
DECLARE_PER_CPU(struct sched_domain *, sd_busy);
|
||||
DECLARE_PER_CPU(struct sched_domain *, sd_asym);
|
||||
|
||||
struct sched_group_power {
|
||||
atomic_t ref;
|
||||
|
@ -605,6 +634,7 @@ struct sched_group_power {
|
|||
*/
|
||||
unsigned int power, power_orig;
|
||||
unsigned long next_update;
|
||||
int imbalance; /* XXX unrelated to power but shared group state */
|
||||
/*
|
||||
* Number of busy cpus in this group.
|
||||
*/
|
||||
|
@ -719,6 +749,7 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
|
|||
*/
|
||||
smp_wmb();
|
||||
task_thread_info(p)->cpu = cpu;
|
||||
p->wake_cpu = cpu;
|
||||
#endif
|
||||
}
|
||||
|
||||
|
@ -974,7 +1005,7 @@ struct sched_class {
|
|||
void (*put_prev_task) (struct rq *rq, struct task_struct *p);
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
int (*select_task_rq)(struct task_struct *p, int sd_flag, int flags);
|
||||
int (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
|
||||
void (*migrate_task_rq)(struct task_struct *p, int next_cpu);
|
||||
|
||||
void (*pre_schedule) (struct rq *this_rq, struct task_struct *task);
|
||||
|
@ -1220,6 +1251,24 @@ static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
|
|||
lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
|
||||
}
|
||||
|
||||
static inline void double_lock(spinlock_t *l1, spinlock_t *l2)
|
||||
{
|
||||
if (l1 > l2)
|
||||
swap(l1, l2);
|
||||
|
||||
spin_lock(l1);
|
||||
spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
|
||||
}
|
||||
|
||||
static inline void double_raw_lock(raw_spinlock_t *l1, raw_spinlock_t *l2)
|
||||
{
|
||||
if (l1 > l2)
|
||||
swap(l1, l2);
|
||||
|
||||
raw_spin_lock(l1);
|
||||
raw_spin_lock_nested(l2, SINGLE_DEPTH_NESTING);
|
||||
}
|
||||
|
||||
/*
|
||||
* double_rq_lock - safely lock two runqueues
|
||||
*
|
||||
|
@ -1305,7 +1354,8 @@ extern void print_rt_stats(struct seq_file *m, int cpu);
|
|||
extern void init_cfs_rq(struct cfs_rq *cfs_rq);
|
||||
extern void init_rt_rq(struct rt_rq *rt_rq, struct rq *rq);
|
||||
|
||||
extern void account_cfs_bandwidth_used(int enabled, int was_enabled);
|
||||
extern void cfs_bandwidth_usage_inc(void);
|
||||
extern void cfs_bandwidth_usage_dec(void);
|
||||
|
||||
#ifdef CONFIG_NO_HZ_COMMON
|
||||
enum rq_nohz_flag_bits {
|
||||
|
|
|
@ -59,9 +59,9 @@ static inline void sched_info_reset_dequeued(struct task_struct *t)
|
|||
* from dequeue_task() to account for possible rq->clock skew across cpus. The
|
||||
* delta taken on each cpu would annul the skew.
|
||||
*/
|
||||
static inline void sched_info_dequeued(struct task_struct *t)
|
||||
static inline void sched_info_dequeued(struct rq *rq, struct task_struct *t)
|
||||
{
|
||||
unsigned long long now = rq_clock(task_rq(t)), delta = 0;
|
||||
unsigned long long now = rq_clock(rq), delta = 0;
|
||||
|
||||
if (unlikely(sched_info_on()))
|
||||
if (t->sched_info.last_queued)
|
||||
|
@ -69,7 +69,7 @@ static inline void sched_info_dequeued(struct task_struct *t)
|
|||
sched_info_reset_dequeued(t);
|
||||
t->sched_info.run_delay += delta;
|
||||
|
||||
rq_sched_info_dequeued(task_rq(t), delta);
|
||||
rq_sched_info_dequeued(rq, delta);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -77,9 +77,9 @@ static inline void sched_info_dequeued(struct task_struct *t)
|
|||
* long it was waiting to run. We also note when it began so that we
|
||||
* can keep stats on how long its timeslice is.
|
||||
*/
|
||||
static void sched_info_arrive(struct task_struct *t)
|
||||
static void sched_info_arrive(struct rq *rq, struct task_struct *t)
|
||||
{
|
||||
unsigned long long now = rq_clock(task_rq(t)), delta = 0;
|
||||
unsigned long long now = rq_clock(rq), delta = 0;
|
||||
|
||||
if (t->sched_info.last_queued)
|
||||
delta = now - t->sched_info.last_queued;
|
||||
|
@ -88,7 +88,7 @@ static void sched_info_arrive(struct task_struct *t)
|
|||
t->sched_info.last_arrival = now;
|
||||
t->sched_info.pcount++;
|
||||
|
||||
rq_sched_info_arrive(task_rq(t), delta);
|
||||
rq_sched_info_arrive(rq, delta);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -96,11 +96,11 @@ static void sched_info_arrive(struct task_struct *t)
|
|||
* the timestamp if it is already not set. It's assumed that
|
||||
* sched_info_dequeued() will clear that stamp when appropriate.
|
||||
*/
|
||||
static inline void sched_info_queued(struct task_struct *t)
|
||||
static inline void sched_info_queued(struct rq *rq, struct task_struct *t)
|
||||
{
|
||||
if (unlikely(sched_info_on()))
|
||||
if (!t->sched_info.last_queued)
|
||||
t->sched_info.last_queued = rq_clock(task_rq(t));
|
||||
t->sched_info.last_queued = rq_clock(rq);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -111,15 +111,15 @@ static inline void sched_info_queued(struct task_struct *t)
|
|||
* sched_info_queued() to mark that it has now again started waiting on
|
||||
* the runqueue.
|
||||
*/
|
||||
static inline void sched_info_depart(struct task_struct *t)
|
||||
static inline void sched_info_depart(struct rq *rq, struct task_struct *t)
|
||||
{
|
||||
unsigned long long delta = rq_clock(task_rq(t)) -
|
||||
unsigned long long delta = rq_clock(rq) -
|
||||
t->sched_info.last_arrival;
|
||||
|
||||
rq_sched_info_depart(task_rq(t), delta);
|
||||
rq_sched_info_depart(rq, delta);
|
||||
|
||||
if (t->state == TASK_RUNNING)
|
||||
sched_info_queued(t);
|
||||
sched_info_queued(rq, t);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -128,32 +128,34 @@ static inline void sched_info_depart(struct task_struct *t)
|
|||
* the idle task.) We are only called when prev != next.
|
||||
*/
|
||||
static inline void
|
||||
__sched_info_switch(struct task_struct *prev, struct task_struct *next)
|
||||
__sched_info_switch(struct rq *rq,
|
||||
struct task_struct *prev, struct task_struct *next)
|
||||
{
|
||||
struct rq *rq = task_rq(prev);
|
||||
|
||||
/*
|
||||
* prev now departs the cpu. It's not interesting to record
|
||||
* stats about how efficient we were at scheduling the idle
|
||||
* process, however.
|
||||
*/
|
||||
if (prev != rq->idle)
|
||||
sched_info_depart(prev);
|
||||
sched_info_depart(rq, prev);
|
||||
|
||||
if (next != rq->idle)
|
||||
sched_info_arrive(next);
|
||||
sched_info_arrive(rq, next);
|
||||
}
|
||||
static inline void
|
||||
sched_info_switch(struct task_struct *prev, struct task_struct *next)
|
||||
sched_info_switch(struct rq *rq,
|
||||
struct task_struct *prev, struct task_struct *next)
|
||||
{
|
||||
if (unlikely(sched_info_on()))
|
||||
__sched_info_switch(prev, next);
|
||||
__sched_info_switch(rq, prev, next);
|
||||
}
|
||||
#else
|
||||
#define sched_info_queued(t) do { } while (0)
|
||||
#define sched_info_queued(rq, t) do { } while (0)
|
||||
#define sched_info_reset_dequeued(t) do { } while (0)
|
||||
#define sched_info_dequeued(t) do { } while (0)
|
||||
#define sched_info_switch(t, next) do { } while (0)
|
||||
#define sched_info_dequeued(rq, t) do { } while (0)
|
||||
#define sched_info_depart(rq, t) do { } while (0)
|
||||
#define sched_info_arrive(rq, next) do { } while (0)
|
||||
#define sched_info_switch(rq, t, next) do { } while (0)
|
||||
#endif /* CONFIG_SCHEDSTATS || CONFIG_TASK_DELAY_ACCT */
|
||||
|
||||
/*
|
||||
|
|
|
@ -11,7 +11,7 @@
|
|||
|
||||
#ifdef CONFIG_SMP
|
||||
static int
|
||||
select_task_rq_stop(struct task_struct *p, int sd_flag, int flags)
|
||||
select_task_rq_stop(struct task_struct *p, int cpu, int sd_flag, int flags)
|
||||
{
|
||||
return task_cpu(p); /* stop tasks as never migrate */
|
||||
}
|
||||
|
|
|
@ -52,6 +52,109 @@ void remove_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
|
|||
EXPORT_SYMBOL(remove_wait_queue);
|
||||
|
||||
|
||||
/*
|
||||
* The core wakeup function. Non-exclusive wakeups (nr_exclusive == 0) just
|
||||
* wake everything up. If it's an exclusive wakeup (nr_exclusive == small +ve
|
||||
* number) then we wake all the non-exclusive tasks and one exclusive task.
|
||||
*
|
||||
* There are circumstances in which we can try to wake a task which has already
|
||||
* started to run but is not in state TASK_RUNNING. try_to_wake_up() returns
|
||||
* zero in this (rare) case, and we handle it by continuing to scan the queue.
|
||||
*/
|
||||
static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
|
||||
int nr_exclusive, int wake_flags, void *key)
|
||||
{
|
||||
wait_queue_t *curr, *next;
|
||||
|
||||
list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
|
||||
unsigned flags = curr->flags;
|
||||
|
||||
if (curr->func(curr, mode, wake_flags, key) &&
|
||||
(flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* __wake_up - wake up threads blocked on a waitqueue.
|
||||
* @q: the waitqueue
|
||||
* @mode: which threads
|
||||
* @nr_exclusive: how many wake-one or wake-many threads to wake up
|
||||
* @key: is directly passed to the wakeup function
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void __wake_up(wait_queue_head_t *q, unsigned int mode,
|
||||
int nr_exclusive, void *key)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
spin_lock_irqsave(&q->lock, flags);
|
||||
__wake_up_common(q, mode, nr_exclusive, 0, key);
|
||||
spin_unlock_irqrestore(&q->lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL(__wake_up);
|
||||
|
||||
/*
|
||||
* Same as __wake_up but called with the spinlock in wait_queue_head_t held.
|
||||
*/
|
||||
void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr)
|
||||
{
|
||||
__wake_up_common(q, mode, nr, 0, NULL);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_locked);
|
||||
|
||||
void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key)
|
||||
{
|
||||
__wake_up_common(q, mode, 1, 0, key);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_locked_key);
|
||||
|
||||
/**
|
||||
* __wake_up_sync_key - wake up threads blocked on a waitqueue.
|
||||
* @q: the waitqueue
|
||||
* @mode: which threads
|
||||
* @nr_exclusive: how many wake-one or wake-many threads to wake up
|
||||
* @key: opaque value to be passed to wakeup targets
|
||||
*
|
||||
* The sync wakeup differs that the waker knows that it will schedule
|
||||
* away soon, so while the target thread will be woken up, it will not
|
||||
* be migrated to another CPU - ie. the two threads are 'synchronized'
|
||||
* with each other. This can prevent needless bouncing between CPUs.
|
||||
*
|
||||
* On UP it can prevent extra preemption.
|
||||
*
|
||||
* It may be assumed that this function implies a write memory barrier before
|
||||
* changing the task state if and only if any tasks are woken up.
|
||||
*/
|
||||
void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode,
|
||||
int nr_exclusive, void *key)
|
||||
{
|
||||
unsigned long flags;
|
||||
int wake_flags = 1; /* XXX WF_SYNC */
|
||||
|
||||
if (unlikely(!q))
|
||||
return;
|
||||
|
||||
if (unlikely(nr_exclusive != 1))
|
||||
wake_flags = 0;
|
||||
|
||||
spin_lock_irqsave(&q->lock, flags);
|
||||
__wake_up_common(q, mode, nr_exclusive, wake_flags, key);
|
||||
spin_unlock_irqrestore(&q->lock, flags);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_sync_key);
|
||||
|
||||
/*
|
||||
* __wake_up_sync - see __wake_up_sync_key()
|
||||
*/
|
||||
void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
|
||||
{
|
||||
__wake_up_sync_key(q, mode, nr_exclusive, NULL);
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(__wake_up_sync); /* For internal use only */
|
||||
|
||||
/*
|
||||
* Note: we use "set_current_state()" _after_ the wait-queue add,
|
||||
* because we need a memory barrier there on SMP, so that any
|
||||
|
@ -92,6 +195,30 @@ prepare_to_wait_exclusive(wait_queue_head_t *q, wait_queue_t *wait, int state)
|
|||
}
|
||||
EXPORT_SYMBOL(prepare_to_wait_exclusive);
|
||||
|
||||
long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
if (signal_pending_state(state, current))
|
||||
return -ERESTARTSYS;
|
||||
|
||||
wait->private = current;
|
||||
wait->func = autoremove_wake_function;
|
||||
|
||||
spin_lock_irqsave(&q->lock, flags);
|
||||
if (list_empty(&wait->task_list)) {
|
||||
if (wait->flags & WQ_FLAG_EXCLUSIVE)
|
||||
__add_wait_queue_tail(q, wait);
|
||||
else
|
||||
__add_wait_queue(q, wait);
|
||||
}
|
||||
set_current_state(state);
|
||||
spin_unlock_irqrestore(&q->lock, flags);
|
||||
|
||||
return 0;
|
||||
}
|
||||
EXPORT_SYMBOL(prepare_to_wait_event);
|
||||
|
||||
/**
|
||||
* finish_wait - clean up after waiting in a queue
|
||||
* @q: waitqueue waited on
|
|
@ -99,13 +99,13 @@ static void __local_bh_disable(unsigned long ip, unsigned int cnt)
|
|||
|
||||
raw_local_irq_save(flags);
|
||||
/*
|
||||
* The preempt tracer hooks into add_preempt_count and will break
|
||||
* The preempt tracer hooks into preempt_count_add and will break
|
||||
* lockdep because it calls back into lockdep after SOFTIRQ_OFFSET
|
||||
* is set and before current->softirq_enabled is cleared.
|
||||
* We must manually increment preempt_count here and manually
|
||||
* call the trace_preempt_off later.
|
||||
*/
|
||||
preempt_count() += cnt;
|
||||
__preempt_count_add(cnt);
|
||||
/*
|
||||
* Were softirqs turned off above:
|
||||
*/
|
||||
|
@ -119,7 +119,7 @@ static void __local_bh_disable(unsigned long ip, unsigned int cnt)
|
|||
#else /* !CONFIG_TRACE_IRQFLAGS */
|
||||
static inline void __local_bh_disable(unsigned long ip, unsigned int cnt)
|
||||
{
|
||||
add_preempt_count(cnt);
|
||||
preempt_count_add(cnt);
|
||||
barrier();
|
||||
}
|
||||
#endif /* CONFIG_TRACE_IRQFLAGS */
|
||||
|
@ -137,7 +137,7 @@ static void __local_bh_enable(unsigned int cnt)
|
|||
|
||||
if (softirq_count() == cnt)
|
||||
trace_softirqs_on(_RET_IP_);
|
||||
sub_preempt_count(cnt);
|
||||
preempt_count_sub(cnt);
|
||||
}
|
||||
|
||||
/*
|
||||
|
@ -168,7 +168,7 @@ static inline void _local_bh_enable_ip(unsigned long ip)
|
|||
* Keep preemption disabled until we are done with
|
||||
* softirq processing:
|
||||
*/
|
||||
sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1);
|
||||
preempt_count_sub(SOFTIRQ_DISABLE_OFFSET - 1);
|
||||
|
||||
if (unlikely(!in_interrupt() && local_softirq_pending())) {
|
||||
/*
|
||||
|
@ -178,7 +178,7 @@ static inline void _local_bh_enable_ip(unsigned long ip)
|
|||
do_softirq();
|
||||
}
|
||||
|
||||
dec_preempt_count();
|
||||
preempt_count_dec();
|
||||
#ifdef CONFIG_TRACE_IRQFLAGS
|
||||
local_irq_enable();
|
||||
#endif
|
||||
|
@ -260,7 +260,7 @@ restart:
|
|||
" exited with %08x?\n", vec_nr,
|
||||
softirq_to_name[vec_nr], h->action,
|
||||
prev_count, preempt_count());
|
||||
preempt_count() = prev_count;
|
||||
preempt_count_set(prev_count);
|
||||
}
|
||||
|
||||
rcu_bh_qs(cpu);
|
||||
|
@ -378,7 +378,7 @@ void irq_exit(void)
|
|||
|
||||
account_irq_exit_time(current);
|
||||
trace_hardirq_exit();
|
||||
sub_preempt_count(HARDIRQ_OFFSET);
|
||||
preempt_count_sub(HARDIRQ_OFFSET);
|
||||
if (!in_interrupt() && local_softirq_pending())
|
||||
invoke_softirq();
|
||||
|
||||
|
|
|
@ -20,6 +20,7 @@
|
|||
#include <linux/kallsyms.h>
|
||||
#include <linux/smpboot.h>
|
||||
#include <linux/atomic.h>
|
||||
#include <linux/lglock.h>
|
||||
|
||||
/*
|
||||
* Structure to determine completion condition and record errors. May
|
||||
|
@ -43,6 +44,14 @@ static DEFINE_PER_CPU(struct cpu_stopper, cpu_stopper);
|
|||
static DEFINE_PER_CPU(struct task_struct *, cpu_stopper_task);
|
||||
static bool stop_machine_initialized = false;
|
||||
|
||||
/*
|
||||
* Avoids a race between stop_two_cpus and global stop_cpus, where
|
||||
* the stoppers could get queued up in reverse order, leading to
|
||||
* system deadlock. Using an lglock means stop_two_cpus remains
|
||||
* relatively cheap.
|
||||
*/
|
||||
DEFINE_STATIC_LGLOCK(stop_cpus_lock);
|
||||
|
||||
static void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
|
||||
{
|
||||
memset(done, 0, sizeof(*done));
|
||||
|
@ -115,6 +124,184 @@ int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
|
|||
return done.executed ? done.ret : -ENOENT;
|
||||
}
|
||||
|
||||
/* This controls the threads on each CPU. */
|
||||
enum multi_stop_state {
|
||||
/* Dummy starting state for thread. */
|
||||
MULTI_STOP_NONE,
|
||||
/* Awaiting everyone to be scheduled. */
|
||||
MULTI_STOP_PREPARE,
|
||||
/* Disable interrupts. */
|
||||
MULTI_STOP_DISABLE_IRQ,
|
||||
/* Run the function */
|
||||
MULTI_STOP_RUN,
|
||||
/* Exit */
|
||||
MULTI_STOP_EXIT,
|
||||
};
|
||||
|
||||
struct multi_stop_data {
|
||||
int (*fn)(void *);
|
||||
void *data;
|
||||
/* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
|
||||
unsigned int num_threads;
|
||||
const struct cpumask *active_cpus;
|
||||
|
||||
enum multi_stop_state state;
|
||||
atomic_t thread_ack;
|
||||
};
|
||||
|
||||
static void set_state(struct multi_stop_data *msdata,
|
||||
enum multi_stop_state newstate)
|
||||
{
|
||||
/* Reset ack counter. */
|
||||
atomic_set(&msdata->thread_ack, msdata->num_threads);
|
||||
smp_wmb();
|
||||
msdata->state = newstate;
|
||||
}
|
||||
|
||||
/* Last one to ack a state moves to the next state. */
|
||||
static void ack_state(struct multi_stop_data *msdata)
|
||||
{
|
||||
if (atomic_dec_and_test(&msdata->thread_ack))
|
||||
set_state(msdata, msdata->state + 1);
|
||||
}
|
||||
|
||||
/* This is the cpu_stop function which stops the CPU. */
|
||||
static int multi_cpu_stop(void *data)
|
||||
{
|
||||
struct multi_stop_data *msdata = data;
|
||||
enum multi_stop_state curstate = MULTI_STOP_NONE;
|
||||
int cpu = smp_processor_id(), err = 0;
|
||||
unsigned long flags;
|
||||
bool is_active;
|
||||
|
||||
/*
|
||||
* When called from stop_machine_from_inactive_cpu(), irq might
|
||||
* already be disabled. Save the state and restore it on exit.
|
||||
*/
|
||||
local_save_flags(flags);
|
||||
|
||||
if (!msdata->active_cpus)
|
||||
is_active = cpu == cpumask_first(cpu_online_mask);
|
||||
else
|
||||
is_active = cpumask_test_cpu(cpu, msdata->active_cpus);
|
||||
|
||||
/* Simple state machine */
|
||||
do {
|
||||
/* Chill out and ensure we re-read multi_stop_state. */
|
||||
cpu_relax();
|
||||
if (msdata->state != curstate) {
|
||||
curstate = msdata->state;
|
||||
switch (curstate) {
|
||||
case MULTI_STOP_DISABLE_IRQ:
|
||||
local_irq_disable();
|
||||
hard_irq_disable();
|
||||
break;
|
||||
case MULTI_STOP_RUN:
|
||||
if (is_active)
|
||||
err = msdata->fn(msdata->data);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
ack_state(msdata);
|
||||
}
|
||||
} while (curstate != MULTI_STOP_EXIT);
|
||||
|
||||
local_irq_restore(flags);
|
||||
return err;
|
||||
}
|
||||
|
||||
struct irq_cpu_stop_queue_work_info {
|
||||
int cpu1;
|
||||
int cpu2;
|
||||
struct cpu_stop_work *work1;
|
||||
struct cpu_stop_work *work2;
|
||||
};
|
||||
|
||||
/*
|
||||
* This function is always run with irqs and preemption disabled.
|
||||
* This guarantees that both work1 and work2 get queued, before
|
||||
* our local migrate thread gets the chance to preempt us.
|
||||
*/
|
||||
static void irq_cpu_stop_queue_work(void *arg)
|
||||
{
|
||||
struct irq_cpu_stop_queue_work_info *info = arg;
|
||||
cpu_stop_queue_work(info->cpu1, info->work1);
|
||||
cpu_stop_queue_work(info->cpu2, info->work2);
|
||||
}
|
||||
|
||||
/**
|
||||
* stop_two_cpus - stops two cpus
|
||||
* @cpu1: the cpu to stop
|
||||
* @cpu2: the other cpu to stop
|
||||
* @fn: function to execute
|
||||
* @arg: argument to @fn
|
||||
*
|
||||
* Stops both the current and specified CPU and runs @fn on one of them.
|
||||
*
|
||||
* returns when both are completed.
|
||||
*/
|
||||
int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *arg)
|
||||
{
|
||||
struct cpu_stop_done done;
|
||||
struct cpu_stop_work work1, work2;
|
||||
struct irq_cpu_stop_queue_work_info call_args;
|
||||
struct multi_stop_data msdata;
|
||||
|
||||
preempt_disable();
|
||||
msdata = (struct multi_stop_data){
|
||||
.fn = fn,
|
||||
.data = arg,
|
||||
.num_threads = 2,
|
||||
.active_cpus = cpumask_of(cpu1),
|
||||
};
|
||||
|
||||
work1 = work2 = (struct cpu_stop_work){
|
||||
.fn = multi_cpu_stop,
|
||||
.arg = &msdata,
|
||||
.done = &done
|
||||
};
|
||||
|
||||
call_args = (struct irq_cpu_stop_queue_work_info){
|
||||
.cpu1 = cpu1,
|
||||
.cpu2 = cpu2,
|
||||
.work1 = &work1,
|
||||
.work2 = &work2,
|
||||
};
|
||||
|
||||
cpu_stop_init_done(&done, 2);
|
||||
set_state(&msdata, MULTI_STOP_PREPARE);
|
||||
|
||||
/*
|
||||
* If we observe both CPUs active we know _cpu_down() cannot yet have
|
||||
* queued its stop_machine works and therefore ours will get executed
|
||||
* first. Or its not either one of our CPUs that's getting unplugged,
|
||||
* in which case we don't care.
|
||||
*
|
||||
* This relies on the stopper workqueues to be FIFO.
|
||||
*/
|
||||
if (!cpu_active(cpu1) || !cpu_active(cpu2)) {
|
||||
preempt_enable();
|
||||
return -ENOENT;
|
||||
}
|
||||
|
||||
lg_local_lock(&stop_cpus_lock);
|
||||
/*
|
||||
* Queuing needs to be done by the lowest numbered CPU, to ensure
|
||||
* that works are always queued in the same order on every CPU.
|
||||
* This prevents deadlocks.
|
||||
*/
|
||||
smp_call_function_single(min(cpu1, cpu2),
|
||||
&irq_cpu_stop_queue_work,
|
||||
&call_args, 0);
|
||||
lg_local_unlock(&stop_cpus_lock);
|
||||
preempt_enable();
|
||||
|
||||
wait_for_completion(&done.completion);
|
||||
|
||||
return done.executed ? done.ret : -ENOENT;
|
||||
}
|
||||
|
||||
/**
|
||||
* stop_one_cpu_nowait - stop a cpu but don't wait for completion
|
||||
* @cpu: cpu to stop
|
||||
|
@ -159,10 +346,10 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask,
|
|||
* preempted by a stopper which might wait for other stoppers
|
||||
* to enter @fn which can lead to deadlock.
|
||||
*/
|
||||
preempt_disable();
|
||||
lg_global_lock(&stop_cpus_lock);
|
||||
for_each_cpu(cpu, cpumask)
|
||||
cpu_stop_queue_work(cpu, &per_cpu(stop_cpus_work, cpu));
|
||||
preempt_enable();
|
||||
lg_global_unlock(&stop_cpus_lock);
|
||||
}
|
||||
|
||||
static int __stop_cpus(const struct cpumask *cpumask,
|
||||
|
@ -359,98 +546,14 @@ early_initcall(cpu_stop_init);
|
|||
|
||||
#ifdef CONFIG_STOP_MACHINE
|
||||
|
||||
/* This controls the threads on each CPU. */
|
||||
enum stopmachine_state {
|
||||
/* Dummy starting state for thread. */
|
||||
STOPMACHINE_NONE,
|
||||
/* Awaiting everyone to be scheduled. */
|
||||
STOPMACHINE_PREPARE,
|
||||
/* Disable interrupts. */
|
||||
STOPMACHINE_DISABLE_IRQ,
|
||||
/* Run the function */
|
||||
STOPMACHINE_RUN,
|
||||
/* Exit */
|
||||
STOPMACHINE_EXIT,
|
||||
};
|
||||
|
||||
struct stop_machine_data {
|
||||
int (*fn)(void *);
|
||||
void *data;
|
||||
/* Like num_online_cpus(), but hotplug cpu uses us, so we need this. */
|
||||
unsigned int num_threads;
|
||||
const struct cpumask *active_cpus;
|
||||
|
||||
enum stopmachine_state state;
|
||||
atomic_t thread_ack;
|
||||
};
|
||||
|
||||
static void set_state(struct stop_machine_data *smdata,
|
||||
enum stopmachine_state newstate)
|
||||
{
|
||||
/* Reset ack counter. */
|
||||
atomic_set(&smdata->thread_ack, smdata->num_threads);
|
||||
smp_wmb();
|
||||
smdata->state = newstate;
|
||||
}
|
||||
|
||||
/* Last one to ack a state moves to the next state. */
|
||||
static void ack_state(struct stop_machine_data *smdata)
|
||||
{
|
||||
if (atomic_dec_and_test(&smdata->thread_ack))
|
||||
set_state(smdata, smdata->state + 1);
|
||||
}
|
||||
|
||||
/* This is the cpu_stop function which stops the CPU. */
|
||||
static int stop_machine_cpu_stop(void *data)
|
||||
{
|
||||
struct stop_machine_data *smdata = data;
|
||||
enum stopmachine_state curstate = STOPMACHINE_NONE;
|
||||
int cpu = smp_processor_id(), err = 0;
|
||||
unsigned long flags;
|
||||
bool is_active;
|
||||
|
||||
/*
|
||||
* When called from stop_machine_from_inactive_cpu(), irq might
|
||||
* already be disabled. Save the state and restore it on exit.
|
||||
*/
|
||||
local_save_flags(flags);
|
||||
|
||||
if (!smdata->active_cpus)
|
||||
is_active = cpu == cpumask_first(cpu_online_mask);
|
||||
else
|
||||
is_active = cpumask_test_cpu(cpu, smdata->active_cpus);
|
||||
|
||||
/* Simple state machine */
|
||||
do {
|
||||
/* Chill out and ensure we re-read stopmachine_state. */
|
||||
cpu_relax();
|
||||
if (smdata->state != curstate) {
|
||||
curstate = smdata->state;
|
||||
switch (curstate) {
|
||||
case STOPMACHINE_DISABLE_IRQ:
|
||||
local_irq_disable();
|
||||
hard_irq_disable();
|
||||
break;
|
||||
case STOPMACHINE_RUN:
|
||||
if (is_active)
|
||||
err = smdata->fn(smdata->data);
|
||||
break;
|
||||
default:
|
||||
break;
|
||||
}
|
||||
ack_state(smdata);
|
||||
}
|
||||
} while (curstate != STOPMACHINE_EXIT);
|
||||
|
||||
local_irq_restore(flags);
|
||||
return err;
|
||||
}
|
||||
|
||||
int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
|
||||
{
|
||||
struct stop_machine_data smdata = { .fn = fn, .data = data,
|
||||
struct multi_stop_data msdata = {
|
||||
.fn = fn,
|
||||
.data = data,
|
||||
.num_threads = num_online_cpus(),
|
||||
.active_cpus = cpus };
|
||||
.active_cpus = cpus,
|
||||
};
|
||||
|
||||
if (!stop_machine_initialized) {
|
||||
/*
|
||||
|
@ -461,7 +564,7 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
|
|||
unsigned long flags;
|
||||
int ret;
|
||||
|
||||
WARN_ON_ONCE(smdata.num_threads != 1);
|
||||
WARN_ON_ONCE(msdata.num_threads != 1);
|
||||
|
||||
local_irq_save(flags);
|
||||
hard_irq_disable();
|
||||
|
@ -472,8 +575,8 @@ int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
|
|||
}
|
||||
|
||||
/* Set the initial state and stop all online cpus. */
|
||||
set_state(&smdata, STOPMACHINE_PREPARE);
|
||||
return stop_cpus(cpu_online_mask, stop_machine_cpu_stop, &smdata);
|
||||
set_state(&msdata, MULTI_STOP_PREPARE);
|
||||
return stop_cpus(cpu_online_mask, multi_cpu_stop, &msdata);
|
||||
}
|
||||
|
||||
int stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
|
||||
|
@ -513,25 +616,25 @@ EXPORT_SYMBOL_GPL(stop_machine);
|
|||
int stop_machine_from_inactive_cpu(int (*fn)(void *), void *data,
|
||||
const struct cpumask *cpus)
|
||||
{
|
||||
struct stop_machine_data smdata = { .fn = fn, .data = data,
|
||||
struct multi_stop_data msdata = { .fn = fn, .data = data,
|
||||
.active_cpus = cpus };
|
||||
struct cpu_stop_done done;
|
||||
int ret;
|
||||
|
||||
/* Local CPU must be inactive and CPU hotplug in progress. */
|
||||
BUG_ON(cpu_active(raw_smp_processor_id()));
|
||||
smdata.num_threads = num_active_cpus() + 1; /* +1 for local */
|
||||
msdata.num_threads = num_active_cpus() + 1; /* +1 for local */
|
||||
|
||||
/* No proper task established and can't sleep - busy wait for lock. */
|
||||
while (!mutex_trylock(&stop_cpus_mutex))
|
||||
cpu_relax();
|
||||
|
||||
/* Schedule work on other CPUs and execute directly for local CPU */
|
||||
set_state(&smdata, STOPMACHINE_PREPARE);
|
||||
set_state(&msdata, MULTI_STOP_PREPARE);
|
||||
cpu_stop_init_done(&done, num_active_cpus());
|
||||
queue_stop_cpus_work(cpu_active_mask, stop_machine_cpu_stop, &smdata,
|
||||
queue_stop_cpus_work(cpu_active_mask, multi_cpu_stop, &msdata,
|
||||
&done);
|
||||
ret = stop_machine_cpu_stop(&smdata);
|
||||
ret = multi_cpu_stop(&msdata);
|
||||
|
||||
/* Busy wait for completion. */
|
||||
while (!completion_done(&done.completion))
|
||||
|
|
|
@ -370,13 +370,6 @@ static struct ctl_table kern_table[] = {
|
|||
.mode = 0644,
|
||||
.proc_handler = proc_dointvec,
|
||||
},
|
||||
{
|
||||
.procname = "numa_balancing_scan_period_reset",
|
||||
.data = &sysctl_numa_balancing_scan_period_reset,
|
||||
.maxlen = sizeof(unsigned int),
|
||||
.mode = 0644,
|
||||
.proc_handler = proc_dointvec,
|
||||
},
|
||||
{
|
||||
.procname = "numa_balancing_scan_period_max_ms",
|
||||
.data = &sysctl_numa_balancing_scan_period_max,
|
||||
|
@ -391,6 +384,20 @@ static struct ctl_table kern_table[] = {
|
|||
.mode = 0644,
|
||||
.proc_handler = proc_dointvec,
|
||||
},
|
||||
{
|
||||
.procname = "numa_balancing_settle_count",
|
||||
.data = &sysctl_numa_balancing_settle_count,
|
||||
.maxlen = sizeof(unsigned int),
|
||||
.mode = 0644,
|
||||
.proc_handler = proc_dointvec,
|
||||
},
|
||||
{
|
||||
.procname = "numa_balancing_migrate_deferred",
|
||||
.data = &sysctl_numa_balancing_migrate_deferred,
|
||||
.maxlen = sizeof(unsigned int),
|
||||
.mode = 0644,
|
||||
.proc_handler = proc_dointvec,
|
||||
},
|
||||
#endif /* CONFIG_NUMA_BALANCING */
|
||||
#endif /* CONFIG_SCHED_DEBUG */
|
||||
{
|
||||
|
|
Some files were not shown because too many files have changed in this diff Show More
Loading…
Reference in New Issue