linux-sg2042/include/linux/sched
Christian Brauner ef2c41cf38 clone3: allow spawning processes into cgroups
This adds support for creating a process in a different cgroup than its
parent. Callers can limit and account processes and threads right from
the moment they are spawned:
- A service manager can directly spawn new services into dedicated
  cgroups.
- A process can be directly created in a frozen cgroup and will be
  frozen as well.
- The initial accounting jitter experienced by process supervisors and
  daemons is eliminated with this.
- Threaded applications or even thread implementations can choose to
  create a specific cgroup layout where each thread is spawned
  directly into a dedicated cgroup.

This feature is limited to the unified hierarchy. Callers need to pass
a directory file descriptor for the target cgroup. The caller can
choose to pass an O_PATH file descriptor. All usual migration
restrictions apply, i.e. there can be no processes in inner nodes. In
general, creating a process directly in a target cgroup adheres to all
migration restrictions.
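A minimal userspace sketch of spawning a child straight into a cgroup v2 directory. It assumes Linux 5.7 or later and no libc wrapper for clone3(), so it issues the raw syscall.

The struct name `clone_args_v2` and the helpers `fill_clone_args` and `spawn_into_cgroup` are hypothetical. The struct mirrors the UAPI `struct clone_args` up to and including the `cgroup` field (88 bytes). The fallback values for `__NR_clone3` (435) and `CLONE_INTO_CGROUP` (0x200000000) are the upstream UAPI values, used only when the installed headers lack them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef __NR_clone3
#define __NR_clone3 435            /* assumption: generic syscall number */
#endif

#ifndef CLONE_INTO_CGROUP
#define CLONE_INTO_CGROUP 0x200000000ULL
#endif

/* Hypothetical local mirror of UAPI struct clone_args through the
 * cgroup field (CLONE_ARGS_SIZE_VER2, 88 bytes). */
struct clone_args_v2 {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
	uint64_t set_tid;
	uint64_t set_tid_size;
	uint64_t cgroup;           /* directory fd of the target cgroup */
};

/* Fill arguments for a fork-like clone3() whose child starts life in
 * the cgroup referred to by cgroup_fd. */
static void fill_clone_args(struct clone_args_v2 *args, int cgroup_fd)
{
	memset(args, 0, sizeof(*args));
	args->flags = CLONE_INTO_CGROUP;
	args->exit_signal = SIGCHLD;   /* parent can reap with waitpid() */
	args->cgroup = (uint64_t)cgroup_fd;
}

/* Spawn a child directly into the cgroup v2 directory at cgroup_path.
 * Returns the child's PID in the parent, 0 in the child, -1 on error. */
static pid_t spawn_into_cgroup(const char *cgroup_path)
{
	struct clone_args_v2 args;
	int saved_errno;
	long ret;
	/* An O_PATH descriptor is enough to name the target cgroup. */
	int fd = open(cgroup_path, O_PATH | O_DIRECTORY | O_CLOEXEC);

	if (fd < 0)
		return -1;

	fill_clone_args(&args, fd);
	ret = syscall(__NR_clone3, &args, sizeof(args));
	saved_errno = errno;
	close(fd);                     /* parent and child both hold a copy */
	errno = saved_errno;
	return (pid_t)ret;
}
```

A caller would typically `exec` in the child branch (return value 0) and `waitpid()` the returned PID in the parent. Because the child is created inside the target cgroup, it is never accounted to the parent's cgroup, not even briefly. If the target is an inner node that already has controllers enabled for its children, the call should fail like any other migration that violates the no-internal-process rule.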

One of the biggest advantages of this feature is that CLONE_INTO_CGROUP does
not need to grab the write side of cgroup_threadgroup_rwsem.
This global lock makes moving tasks/threads around super expensive. With
clone3() this lock is avoided.
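For contrast, here is a sketch of the traditional two-step approach: `fork()`, then move the task by writing its PID to the target's `cgroup.procs` file. That write is a migration, and in the kernel it takes the write side of `cgroup_threadgroup_rwsem`. Between the fork and the write, the task is also accounted to the parent's cgroup.

The helper name `move_into_cgroup` is hypothetical. Writing `0` to `cgroup.procs` migrates the writing process itself.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Move the task with the given PID (0 = the calling process) into an
 * existing cgroup v2 directory by writing the PID to cgroup.procs.
 * Returns 0 on success, -1 on error with errno set. */
static int move_into_cgroup(const char *cgroup_dir, pid_t pid)
{
	char path[4096];
	char buf[32];
	int fd, len, saved_errno;
	ssize_t n;

	if (snprintf(path, sizeof(path), "%s/cgroup.procs", cgroup_dir) >=
	    (int)sizeof(path)) {
		errno = ENAMETOOLONG;
		return -1;
	}

	fd = open(path, O_WRONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* One PID per write(); the kernel migrates the whole thread group. */
	len = snprintf(buf, sizeof(buf), "%d\n", (int)pid);
	n = write(fd, buf, (size_t)len);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return n == len ? 0 : -1;
}
```

In this model, a supervisor's child would call `move_into_cgroup(dir, 0)` right after `fork()` and before `exec`. Every such call serializes on the global rwsem, which is the overhead that CLONE_INTO_CGROUP avoids.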

Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: cgroups@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2020-02-12 17:57:51 -05:00
autogroup.h
clock.h
coredump.h oom, oom_reaper: do not enqueue same task twice 2019-02-01 15:46:23 -08:00
cpufreq.h sched/fair: Remove redundant call to cpufreq_update_util() 2020-01-17 10:19:22 +01:00
cputime.h posix-cpu-timers: Move state tracking to struct posix_cputimers 2019-08-28 11:50:42 +02:00
deadline.h cpusets: Rebuild root domain deadline accounting information 2019-07-25 15:55:01 +02:00
debug.h
hotplug.h
idle.h
init.h
isolation.h genirq, sched/isolation: Isolate from handling managed interrupts 2020-01-22 16:29:49 +01:00
jobctl.h cgroup: cgroup v2 freezer 2019-04-19 11:26:48 -07:00
loadavg.h sched: loadavg: make calc_load_n() public 2018-10-26 16:26:32 -07:00
mm.h exit/exec: Seperate mm_release() 2019-11-20 09:40:08 +01:00
nohz.h sched/fair: Remove the rq->cpu_load[] update code 2019-06-03 11:49:38 +02:00
numa_balancing.h sched/fair: Don't free p->numa_faults with concurrent readers 2019-07-25 15:37:04 +02:00
prio.h
rt.h
signal.h posix-cpu-timers: Move state tracking to struct posix_cputimers 2019-08-28 11:50:42 +02:00
smt.h x86/speculation: Rework SMT state change 2018-11-28 11:57:07 +01:00
stat.h sched: Fix various typos in comments 2018-12-03 11:55:42 +01:00
sysctl.h sched/uclamp: Add system default clamps 2019-06-24 19:23:45 +02:00
task.h clone3: allow spawning processes into cgroups 2020-02-12 17:57:51 -05:00
task_stack.h sched/core: Convert task_struct.stack_refcount to refcount_t 2019-02-04 08:53:56 +01:00
topology.h sched/topology: Add partition_sched_domains_locked() 2019-07-25 15:51:57 +02:00
types.h posix-cpu-timers: Provide array based access to expiry cache 2019-08-28 11:50:35 +02:00
user.h keys: Move the user and user-session keyrings to the user_namespace 2019-06-26 21:02:32 +01:00
wake_q.h locking/rwsem: Always release wait_lock before waking up tasks 2019-06-17 12:28:00 +02:00
xacct.h