OpenCloudOS-Kernel/tools/sched_ext/scx_simple.c

rue/scx/sched_ext: Add scx_simple and scx_example_qmap example schedulers

Upstream: no

Add two simple example BPF schedulers - simple and qmap.

* simple: In terms of scheduling, it behaves identically to not having any operation implemented at all. The two operations it implements are only to improve visibility and exit handling. On certain homogeneous configurations, this can actually perform pretty well.

* qmap: A fixed five-level priority scheduler to demonstrate queueing PIDs on BPF maps for scheduling. While not very practical, this is useful as a simple example and will be used to demonstrate different features.

v5:
* Improve Makefile. Build artifacts are now collected into a separate dir which can be changed. Install and help targets are added and clean actually cleans everything.
* MEMBER_VPTR() improved to improve access to structs. ARRAY_ELEM_PTR() and RESIZEABLE_ARRAY() are added to support resizable arrays in .bss.
* Add scx_common.h which provides common utilities to user code such as SCX_BUG[_ON]() and RESIZE_ARRAY().
* Use SCX_BUG[_ON]() to simplify error handling.

v4:
* Dropped _example prefix from scheduler names.

v3:
* Rename scx_example_dummy to scx_example_simple and restructure a bit to ease later additions. Comment updates.
* Added declarations for BPF inline iterators. In the future, hopefully, these will be consolidated into a generic BPF header so that they don't need to be replicated here.

v2:
* Updated with the generic BPF cpumask helpers.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Reviewed-by: David Vernet <dvernet@meta.com>
Acked-by: Josh Don <joshdon@google.com>
Acked-by: Hao Luo <haoluo@google.com>
Acked-by: Barret Rhoden <brho@google.com>
2023-11-15 03:19:47 +08:00
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright (c) 2022 Meta Platforms, Inc. and affiliates.
 * Copyright (c) 2022 Tejun Heo <tj@kernel.org>
 * Copyright (c) 2022 David Vernet <dvernet@meta.com>
 */
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
#include <libgen.h>
#include <bpf/bpf.h>
#include "scx_common.h"
#include "scx_simple.skel.h"
const char help_fmt[] =
"A simple sched_ext scheduler.\n"
"\n"
"See the top-level comment in .bpf.c for more details.\n"
"\n"
rue/scx/sched_ext: Add vtime-ordered priority queue to dispatch_q's

Upstream: no

Currently, a dsq is always a FIFO. A task which is dispatched earlier gets consumed or executed earlier. While this is sufficient when dsq's are used as simple staging areas for tasks which are ready to execute, it'd make dsq's a lot more useful if they can implement custom ordering.

This patch adds a vtime-ordered priority queue to dsq's. When the BPF scheduler dispatches a task with the new scx_bpf_dispatch_vtime() helper, it can specify the vtime that the task should be inserted at, and the task is inserted into the priority queue in the dsq, which is ordered according to time_before64() comparison of the vtime values. When executing or consuming the dsq, the FIFO is always processed first and the priority queue is processed iff the FIFO is empty.

The design decision was made to allow both FIFO and priority queue to be available at the same time for all dsq's for three reasons. First, the new priority queue is useful for the local dsq's too, but they also need the FIFO when consuming tasks from other dsq's as the vtimes may not be comparable across them. Second, the interface surface is smaller this way - the only additional interface necessary is scx_bpf_dispatch_vtime(). Third, the overhead isn't meaningfully different whether they're available at the same time or not.

This makes it easy and efficient for the BPF schedulers to implement proper vtime-based scheduling within each dsq, at a negligible cost in terms of code complexity and overhead.

scx_simple and scx_example_flatcg are updated to default to weighted vtime scheduling (the latter within each cgroup). FIFO scheduling can be selected with the -f option.

v3:
* SCX_TASK_DSQ_ON_PRIQ flag is moved from p->scx.flags into its own p->scx.dsq_flags. The flag is protected with the dsq lock unlike other flags in p->scx.flags. This led to flag corruption in some cases.
* Add comments explaining the interaction between using consumption of p->scx.slice to determine vtime progress and yielding.

v2:
* p->scx.dsq_vtime was not initialized on load or across cgroup migrations, leading to some tasks being stalled for an extended period of time depending on how saturated the machine is. Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Hongbo Li <herberthbli@tencent.com>
Reviewed-by: David Vernet <dvernet@meta.com>
2023-11-15 03:19:50 +08:00
"Usage: %s [-f] [-p]\n"
"\n"
"  -f            Use FIFO scheduling instead of weighted vtime scheduling\n"
"  -p            Switch only tasks on SCHED_EXT policy instead of all\n"
"  -h            Display this help and exit\n";

static volatile int exit_req;

/* Ask the main loop to exit; installed for SIGINT and SIGTERM. */
static void sigint_handler(int simple)
{
	exit_req = 1;
}

/*
 * Sum the per-CPU values of both entries in the BPF "stats" map into
 * stats[0] and stats[1].
 */
static void read_stats(struct scx_simple *skel, __u64 *stats)
{
	int nr_cpus = libbpf_num_possible_cpus();
	__u64 cnts[2][nr_cpus];
	__u32 idx;

	memset(stats, 0, sizeof(stats[0]) * 2);

	for (idx = 0; idx < 2; idx++) {
		int ret, cpu;

		ret = bpf_map_lookup_elem(bpf_map__fd(skel->maps.stats),
					  &idx, cnts[idx]);
		if (ret < 0)
			continue;
		for (cpu = 0; cpu < nr_cpus; cpu++)
			stats[idx] += cnts[idx][cpu];
	}
}

int main(int argc, char **argv)
{
	struct scx_simple *skel;
	struct bpf_link *link;
	int opt;	/* getopt() returns int; -1 marks the end of options */

	signal(SIGINT, sigint_handler);
	signal(SIGTERM, sigint_handler);

	libbpf_set_strict_mode(LIBBPF_STRICT_ALL);

	skel = scx_simple__open();
	SCX_BUG_ON(!skel, "Failed to open skel");

	while ((opt = getopt(argc, argv, "fph")) != -1) {
		switch (opt) {
		case 'f':
			skel->rodata->fifo_sched = true;
			break;
		case 'p':
			skel->rodata->switch_partial = true;
			break;
		default:
			fprintf(stderr, help_fmt, basename(argv[0]));
			return opt != 'h';
		}
	}

	SCX_BUG_ON(scx_simple__load(skel), "Failed to load skel");

	link = bpf_map__attach_struct_ops(skel->maps.simple_ops);
	SCX_BUG_ON(!link, "Failed to attach struct_ops");

	while (!exit_req && !uei_exited(&skel->bss->uei)) {
		__u64 stats[2];

		read_stats(skel, stats);
		printf("local=%llu global=%llu\n", stats[0], stats[1]);
		fflush(stdout);
		sleep(1);
	}

	bpf_link__destroy(link);
	uei_print(&skel->bss->uei);
	scx_simple__destroy(skel);
	return 0;
}
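
The commit annotation for the vtime-ordered priority queue says that a dsq serves its FIFO first and falls back to the vtime-ordered priority queue only when the FIFO is empty, with ordering decided by a time_before64() comparison. The sketch below is a user-space toy model of that rule only, not the kernel implementation. Every name in it (`toy_dsq`, `toy_dispatch`, `toy_dispatch_vtime`, `toy_consume`, `vtime_before`) is hypothetical, and the fixed-capacity arrays are an assumption made to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DSQ_CAP 16	/* arbitrary capacity for the sketch */

/* Wraparound-safe "a is before b", in the spirit of time_before64(). */
static bool vtime_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

struct toy_task {
	int id;
	uint64_t vtime;
};

/* Hypothetical model of a dsq: a FIFO plus a vtime-sorted array. */
struct toy_dsq {
	struct toy_task fifo[TOY_DSQ_CAP];
	size_t fifo_head, fifo_tail;	/* FIFO holds [head, tail) */
	struct toy_task priq[TOY_DSQ_CAP];
	size_t priq_len;		/* sorted, earliest vtime first */
};

/* FIFO dispatch: append in arrival order. */
static bool toy_dispatch(struct toy_dsq *q, int id)
{
	if (q->fifo_tail == TOY_DSQ_CAP)
		return false;
	q->fifo[q->fifo_tail++] = (struct toy_task){ .id = id };
	return true;
}

/* vtime dispatch: insertion sort; equal vtimes keep insertion order. */
static bool toy_dispatch_vtime(struct toy_dsq *q, int id, uint64_t vtime)
{
	size_t pos;

	if (q->priq_len == TOY_DSQ_CAP)
		return false;
	for (pos = q->priq_len;
	     pos > 0 && vtime_before(vtime, q->priq[pos - 1].vtime); pos--)
		q->priq[pos] = q->priq[pos - 1];
	q->priq[pos] = (struct toy_task){ .id = id, .vtime = vtime };
	q->priq_len++;
	return true;
}

/* Consume: FIFO first, priority queue only if the FIFO is empty. */
static int toy_consume(struct toy_dsq *q)
{
	size_t i;
	int id;

	if (q->fifo_head < q->fifo_tail)
		return q->fifo[q->fifo_head++].id;
	if (!q->priq_len)
		return -1;	/* nothing queued */
	id = q->priq[0].id;
	for (i = 1; i < q->priq_len; i++)
		q->priq[i - 1] = q->priq[i];
	q->priq_len--;
	return id;
}
```

With a zero-initialized `struct toy_dsq`, dispatching task 1 at vtime 300, task 2 at vtime 100, and task 3 through the FIFO makes `toy_consume()` return 3, then 2, then 1. The signed-difference comparison keeps the ordering correct across a 64-bit counter wraparound.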