linux-sg2042/kernel
Paul Burton 7af443ee16 sched/core: Require cpu_active() in select_task_rq(), for user tasks
select_task_rq() is used in a few paths to select the CPU upon which a
thread should be run - for example it is used by try_to_wake_up() & by
fork or exec balancing. As-is it allows use of any online CPU that is
present in the task's cpus_allowed mask.

This presents a problem because there is a period whilst CPUs are
brought online where a CPU is marked online, but is not yet fully
initialized, i.e. the period where CPUHP_AP_ONLINE_IDLE <= state <
CPUHP_ONLINE. Usually we don't run any user tasks during this window,
but there are corner cases where this can happen. An example observed
is:

  - Some user task A, running on CPU X, forks to create task B.

  - sched_fork() calls __set_task_cpu() with cpu=X, setting task B's
    task_struct::cpu field to X.

  - CPU X is offlined.

  - Task A, currently somewhere between the __set_task_cpu() in
    copy_process() and the call to wake_up_new_task(), is migrated to
    CPU Y by migrate_tasks() when CPU X is offlined.

  - CPU X is onlined, but still in the CPUHP_AP_ONLINE_IDLE state. The
    scheduler is now active on CPU X, but there are no user tasks on
    the runqueue.

  - Task A runs on CPU Y & reaches wake_up_new_task(). This calls
    select_task_rq() with cpu=X, taken from task B's task_struct,
    and select_task_rq() allows CPU X to be returned.

  - Task A enqueues task B on CPU X's runqueue, via activate_task() &
    enqueue_task().

  - CPU X now has a user task on its runqueue before it has reached the
    CPUHP_ONLINE state.
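
The sequence above can be replayed in a small user-space model. This is
only a sketch: the enum, struct model, old_select_cpu() and
replay_fork_hotplug_race() are hypothetical names invented for
illustration, the state list is reduced to four values, and nothing here
is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_MODEL_CPUS 4 /* arbitrary size for the model */

/* Reduced set of hotplug states, in the order the commit message uses. */
enum hp_state { HP_OFFLINE, HP_AP_ONLINE_IDLE, HP_AP_ACTIVE, HP_ONLINE };

struct model {
	enum hp_state state[NR_MODEL_CPUS];
	int nr_user_tasks[NR_MODEL_CPUS]; /* user tasks queued per CPU */
};

/* In this model a CPU counts as online from HP_AP_ONLINE_IDLE onwards. */
static bool model_cpu_online(const struct model *m, int cpu)
{
	return m->state[cpu] >= HP_AP_ONLINE_IDLE;
}

/* Pre-patch rule: any online CPU in the allowed mask is accepted. */
static int old_select_cpu(const struct model *m, int cpu,
			  unsigned int allowed, int fallback)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	if ((allowed & (1u << cpu)) && model_cpu_online(m, cpu))
		return cpu;
	return fallback;
}

/*
 * Replays the steps listed above. Returns true if task B ends up queued
 * on CPU X while X has not yet reached the fully online state.
 */
bool replay_fork_hotplug_race(void)
{
	enum { Y = 0, X = 1 };
	struct model m = {
		.state = { HP_ONLINE, HP_ONLINE, HP_ONLINE, HP_ONLINE },
	};
	unsigned int allowed = (1u << NR_MODEL_CPUS) - 1; /* all CPUs */
	int task_b_cpu;

	task_b_cpu = X;                 /* fork: B records cpu = X       */
	m.state[X] = HP_OFFLINE;        /* X goes down, A moves to Y     */
	m.state[X] = HP_AP_ONLINE_IDLE; /* X comes back, not yet active  */

	/* A, now on Y, wakes B using the CPU stored in B. */
	task_b_cpu = old_select_cpu(&m, task_b_cpu, allowed, Y);
	m.nr_user_tasks[task_b_cpu]++;  /* B is enqueued */

	return m.nr_user_tasks[X] > 0 && m.state[X] < HP_ONLINE;
}
```

In this model replay_fork_hotplug_race() returns true, because the
pre-patch rule only asks whether CPU X is online and allowed.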

In most cases, the user tasks that schedule on the newly onlined CPU
have no idea that anything went wrong, but one case observed to be
problematic is if the task goes on to invoke the sched_setaffinity
syscall. The newly onlined CPU reaches the CPUHP_AP_ONLINE_IDLE state
before the CPU that brought it online calls stop_machine_unpark(). This
means that for a portion of the window of time between
CPUHP_AP_ONLINE_IDLE & CPUHP_ONLINE the newly onlined CPU's struct
cpu_stopper has its enabled field set to false. If a user thread is
executed on the CPU during this window and it invokes sched_setaffinity
with a CPU mask that does not include the CPU it's running on, then
__set_cpus_allowed_ptr() calls stop_one_cpu() intending to invoke
migration_cpu_stop() and perform the actual migration away from the CPU,
but stop_one_cpu() simply returns -ENOENT rather than calling
migration_cpu_stop().
We then return from the sched_setaffinity syscall back to the user task
that is now running on a CPU which it just asked not to run on, and
which is not present in its cpus_allowed mask.
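
One way to look for this symptom from user space is to compare
sched_getcpu() against the affinity mask right after sched_setaffinity()
returns. The sketch below assumes Linux with glibc and a system with at
most CPU_SETSIZE CPUs; running_on_allowed_cpu() and pin_and_check() are
hypothetical helper names, not part of this patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Returns 1 if the calling thread is currently on a CPU that is present
 * in its affinity mask, 0 if it is not, and -1 on error.
 */
int running_on_allowed_cpu(void)
{
	cpu_set_t mask;
	int cpu;

	CPU_ZERO(&mask);
	if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	cpu = sched_getcpu();
	if (cpu < 0)
		return -1;

	return CPU_ISSET(cpu, &mask) ? 1 : 0;
}

/*
 * Restricts the calling thread to target_cpu, then reports whether the
 * thread really is on an allowed CPU once the syscall has returned.
 * Returns -1 if the affinity change itself fails.
 */
int pin_and_check(int target_cpu)
{
	cpu_set_t mask;

	assert(target_cpu >= 0 && target_cpu < CPU_SETSIZE);

	CPU_ZERO(&mask);
	CPU_SET(target_cpu, &mask);
	if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
		return -1;

	return running_on_allowed_cpu();
}
```

On an unaffected kernel pin_and_check() should return 1 for any CPU the
thread is allowed to use. A return value of 0 would correspond to the
behaviour described above, although hitting the window in practice
depends on racing against CPU hotplug.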

This patch resolves the problem by having select_task_rq() enforce that
user tasks run on CPUs that are active - the same requirement that
select_fallback_rq() already enforces. This should ensure that newly
onlined CPUs reach the CPUHP_AP_ACTIVE state before being able to
schedule user tasks, and also implies that bringup_wait_for_ap() will
have called stop_machine_unpark() which resolves the sched_setaffinity
issue above.
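
The change in rule can be illustrated with a user-space model that uses
plain bitmasks in place of the kernel's cpumasks. This is a sketch under
stated assumptions, not the patch itself: struct cpu_masks,
cpu_ok_before(), cpu_ok_after() and pick_cpu_for_user_task() are
hypothetical names, and the fallback loop is a much simplified stand-in
for select_fallback_rq().

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 8 /* arbitrary size for the model */

/* One bit per CPU; stands in for the online and active cpumasks. */
struct cpu_masks {
	unsigned int online;
	unsigned int active;
};

static bool mask_test(unsigned int mask, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (mask >> cpu) & 1u;
}

/* Rule described as the old behaviour: allowed and online. */
bool cpu_ok_before(const struct cpu_masks *m, unsigned int allowed, int cpu)
{
	return mask_test(allowed, cpu) && mask_test(m->online, cpu);
}

/* Rule described for user tasks after the patch: allowed and active. */
bool cpu_ok_after(const struct cpu_masks *m, unsigned int allowed, int cpu)
{
	return mask_test(allowed, cpu) && mask_test(m->active, cpu);
}

/*
 * Keeps the requested CPU if it passes the new rule, otherwise falls
 * back to the first allowed and active CPU. Returns -1 if none exists.
 */
int pick_cpu_for_user_task(const struct cpu_masks *m, unsigned int allowed,
			   int cpu)
{
	if (cpu_ok_after(m, allowed, cpu))
		return cpu;

	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		if (cpu_ok_after(m, allowed, i))
			return i;
	}
	return -1;
}
```

With CPU 1 online but not yet active (online = 0x3, active = 0x1),
cpu_ok_before() accepts CPU 1 while cpu_ok_after() rejects it, so a user
task allowed on both CPUs is placed on CPU 0 instead.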

I haven't yet investigated them, but it may be of interest to review
whether any of the actions performed by hotplug states between
CPUHP_AP_ONLINE_IDLE & CPUHP_AP_ACTIVE could have similar unintended
effects on user tasks that might schedule before they are reached, which
might widen the scope of the problem from just affecting the behaviour
of sched_setaffinity.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180526154648.11635-2-paul.burton@mips.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-31 12:24:25 +02:00
bpf Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-05-25 19:54:42 -07:00
cgroup Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2018-04-03 18:00:13 -07:00
configs KVM changes for 4.16 2018-02-10 13:16:35 -08:00
debug * Fix 2032 time access issues and new compiler warnings 2018-04-12 10:21:19 -07:00
events perf/core: Fix possible Spectre-v1 indexing for ->aux_pages[] 2018-05-05 08:37:27 +02:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq/affinity: Spread irq vectors among present CPUs as far as possible 2018-04-06 12:19:51 +02:00
livepatch livepatch: Allow to call a custom callback when freeing shadow variables 2018-04-17 13:42:48 +02:00
locking locking/percpu-rwsem: Annotate rwsem ownership transfer by setting RWSEM_OWNER_UNKNOWN 2018-05-16 11:45:16 +02:00
power PM / QoS: mark expected switch fall-throughs 2018-04-09 13:49:40 +02:00
printk New features: 2018-04-10 11:27:30 -07:00
rcu Merge branches 'fixes.2018.02.23a', 'srcu.2018.02.20a' and 'torture.2018.02.20a' into HEAD 2018-02-23 15:15:41 -08:00
sched sched/core: Require cpu_active() in select_task_rq(), for user tasks 2018-05-31 12:24:25 +02:00
time tick/broadcast: Use for_each_cpu() specially on UP kernels 2018-05-15 22:45:54 +02:00
trace tracing: Fix regex_match_front() to not over compare the test string 2018-05-11 10:56:42 -04:00
.gitignore
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
Makefile error-injection: Support fault injection framework 2018-01-12 17:33:38 -08:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-04 16:45:09 -08:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit.c audit/stable-4.17 PR 20180403 2018-04-06 15:01:25 -07:00
audit.h audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
audit_fsnotify.c
audit_tree.c audit: track the owner of the command mutex ourselves 2018-02-23 11:22:22 -05:00
audit_watch.c audit/stable-4.13 PR 20170816 2017-08-16 16:48:34 -07:00
auditfilter.c audit: deprecate the AUDIT_FILTER_ENTRY filter 2018-02-15 14:36:29 -05:00
auditsc.c audit: bail before bug check if audit disabled 2018-02-15 14:40:25 -05:00
backtracetest.c
bounds.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c compat: fix 4-byte infoleak via uninitialized struct field 2018-05-10 17:51:58 -07:00
configs.c
context_tracking.c
cpu.c cpu/hotplug: Fix unused function warning 2018-03-15 20:34:40 +01:00
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
crash_core.c kexec: export PG_swapbacked to VMCOREINFO 2018-04-13 17:10:27 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: Account blkio completion on the correct task 2018-01-16 03:29:36 +01:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exec_domain.c get rid of pointless includes of fs_struct.h 2018-02-22 14:28:50 -05:00
exit.c kernel: use kernel_wait4() instead of sys_wait4() 2018-04-02 20:14:51 +02:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c error-injection: Fix to prohibit jump optimization 2018-03-12 16:16:00 +01:00
fork.c fork: unconditionally clear stack on fork 2018-04-20 17:18:35 -07:00
freezer.c
futex.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c
irq_work.c irq/work: Improve the flag definitions 2018-01-08 19:43:15 +01:00
jump_label.c jump_label: Disable jump labels in __exit code 2018-03-20 08:57:17 +01:00
kallsyms.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-02-01 13:36:15 -08:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kcov.c kcov: detect double association with a single task 2018-02-06 18:32:46 -08:00
kexec.c kexec: call do_kexec_load() in compat syscall directly 2018-04-02 20:15:01 +02:00
kexec_core.c x86/mm, kexec: Allow kexec to be used with SME 2017-07-18 11:38:04 +02:00
kexec_file.c kernel/kexec_file.c: allow archs to set purgatory load address 2018-04-13 17:10:28 -07:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Fix random address output of blacklist file 2018-04-25 10:27:56 -04:00
ksysfs.c kexec: move vmcoreinfo out of the kernel's .bss section 2017-07-12 16:25:59 -07:00
kthread.c kthread: Allow kthread_park() on a parked kthread 2018-05-25 08:03:51 +02:00
latencytop.c
memremap.c kernel/memremap: Remove stale devres_free() call 2018-03-06 10:58:54 -08:00
module-internal.h
module.c init: fix false positives in W+X checking 2018-05-11 17:28:45 -07:00
module_signing.c
notifier.c
nsproxy.c
padata.c padata: add SPDX identifier 2018-01-05 18:43:00 +11:00
panic.c taint: add taint for randstruct 2018-04-11 10:28:35 -07:00
params.c kernel/params.c: downgrade warning for unsafe parameters 2018-04-11 10:28:37 -07:00
pid.c xarray: add the xa_lock to the radix_tree_root 2018-04-11 10:28:39 -07:00
pid_namespace.c Merge branch 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2018-04-03 19:15:32 -07:00
profile.c
ptrace.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c kernel/reboot.c: add devm_register_reboot_notifier() 2017-11-17 16:10:04 -08:00
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-02-21 15:35:43 -08:00
resource.c resource: fix integer overflow at reallocation 2018-04-13 17:10:27 -07:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-05-05 00:51:44 +02:00
signal.c sched/core: Introduce set_special_state() 2018-05-04 07:54:54 +02:00
smp.c smp/core: Use lockdep to assert IRQs are disabled/enabled 2017-11-08 11:13:50 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c softirq: Consolidate common code in tasklet_[hi]_action() 2018-03-09 11:50:55 +01:00
stacktrace.c
stop_machine.c stop_machine, sched: Fix migrate_swap() vs. active_balance() deadlock 2018-05-03 07:38:03 +02:00
sys.c kernel/sys.c: fix potential Spectre v1 issue 2018-05-25 18:12:11 -07:00
sys_ni.c syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls 2018-04-05 16:59:38 +02:00
sysctl.c kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param 2018-04-11 10:28:38 -07:00
sysctl_binary.c staging: irda: remove remaining remants of irda code removal 2018-04-16 11:26:49 +02:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
test_kprobes.c kprobes: Disable the jprobes test code 2017-10-20 11:02:54 +02:00
torture.c torture: Save a line in stutter_wait(): while -> for 2017-12-11 09:18:30 -08:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-04-30 12:09:56 -04:00
tsacct.c
ucount.c headers: untangle kmemleak.h from mm.h 2018-04-05 21:36:27 -07:00
uid16.c fs: add do_fchownat(), ksys_fchown() helpers and ksys_{,l}chown() wrappers 2018-04-02 20:15:59 +02:00
uid16.h kernel: provide ksys_*() wrappers for syscalls called by kernel/uid16.c 2018-04-02 20:15:30 +02:00
umh.c kernel: use kernel_wait4() instead of sys_wait4() 2018-04-02 20:14:51 +02:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user-return-notifier.c
user.c efivarfs: Limit the rate for non-root to read files 2018-02-22 10:21:02 -08:00
user_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2017-11-16 12:20:15 -08:00
utsname.c uts: create "struct uts_namespace" from kmem_cache 2018-04-11 10:28:35 -07:00
utsname_sysctl.c
watchdog.c Merge branch 'linus' into sched/core, to pick up fixes 2017-11-08 10:17:15 +01:00
watchdog_hld.c Merge branch 'linus' into core/urgent, to pick up dependent commits 2017-11-04 08:53:04 +01:00
workqueue.c Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2018-04-03 18:00:13 -07:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00