OpenCloudOS-Kernel

Linus Torvalds 7fef099702 x86/resctl: fix scheduler confusion with 'current'

The implementation of 'current' on x86 is very intentionally special: it is a very common thing to look up, and it uses 'this_cpu_read_stable()' to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never changes as far as a single thread is concerned. Even when a thread is preempted, or moved to another CPU, or even across an explicit call to 'schedule()', that thread will still have the same value for 'current'. It is, after all, the kernel base pointer to thread-local storage. That's why it's stable to begin with, but it's also why it's important enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat 'current' as a value that never visibly changes, so that the compiler can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the currently running thread does actually change: inside the scheduler itself.

So the scheduler code paths are special, and do not have a 'current' thread at all. Instead there are _two_ threads: the previous and the next thread - typically called 'prev' and 'next' (or prev_p/next_p) internally.

So this is all actually quite straightforward and simple, and not all that complicated.

Except for when you then have special code that is run in scheduler context, that code then has to be aware that 'current' isn't really a valid thing. Did you mean 'prev'? Did you mean 'next'?

In fact, even if you then look at the code, and you use 'current' after the new value has been assigned to the percpu variable, we have explicitly told the compiler that 'current' is magical and always stable. So the compiler is quite free to use an older (or newer) value of 'current', and the actual assignment to the percpu storage is not relevant even if it might look that way.

Which is exactly what happened in the resctl code, that blithely used 'current' in '__resctrl_sched_in()' when it really wanted the new process state (as implied by the name: we're scheduling 'into' that new resctl state). And clang would end up just using the old thread pointer value at least in some configurations.

This could have happened with gcc too, and purely depends on random compiler details. Clang just seems to have been more aggressive about moving the read of the per-cpu current_task pointer around.

The fix is trivial: just make the resctl code adhere to the scheduler rules of using the prev/next thread pointer explicitly, instead of using 'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when a thread resctl state is explicitly changed), and then we will just pass in 'current' as that pointer, of course. There is no ambiguity in that case.

The fix may be trivial, but noticing and figuring out what went wrong was not. The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Stephane Eranian <eranian@google.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2023-03-08 11:48:11 -08:00
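The failure mode described in the commit message can be modelled in plain userspace C. The sketch below is an illustration only, not the kernel code: `struct task`, `current_task`, `programmed_closid`, `switch_buggy()` and `switch_fixed()` are hypothetical stand-ins, and the one-argument `resctrl_sched_in()` is an assumed shape for the fixed hook. The "cached" read in `switch_buggy()` stands for what a compiler is permitted to do once 'current' has been declared stable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's task structure. */
struct task {
    const char *name;
    unsigned int closid;   /* per-task resource-control class ID */
};

/* Stand-in for the per-CPU current_task pointer. */
struct task *current_task;

/* Stand-in for the hardware register the resctrl hook programs. */
unsigned int programmed_closid;

/*
 * Buggy shape: the hook implicitly reads 'current' in scheduler context.
 * Because 'current' is declared stable, the compiler may reuse a read
 * made before the per-CPU pointer was updated; the explicit snapshot
 * below models that permitted reordering.
 */
unsigned int switch_buggy(struct task *prev, struct task *next)
{
    struct task *cached_current = current_task; /* early, CSE'd read */

    (void)prev;
    current_task = next;                          /* the actual switch */
    programmed_closid = cached_current->closid;   /* stale: prev's state */
    return programmed_closid;
}

/* Fixed shape (assumed signature): the task is passed in explicitly. */
void resctrl_sched_in(struct task *tsk)
{
    assert(tsk != NULL);
    programmed_closid = tsk->closid;
}

unsigned int switch_fixed(struct task *prev, struct task *next)
{
    (void)prev;
    current_task = next;
    resctrl_sched_in(next);   /* unambiguous: we are scheduling into 'next' */
    return programmed_closid;
}
```

Outside scheduler context, a caller in this model would simply invoke `resctrl_sched_in(current_task)`, matching the commit's note that passing 'current' is unambiguous there.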
..
boot	Kbuild updates for v6.3	2023-02-26 11:53:25 -08:00
coco	- Fixup comment typo	2023-02-25 09:11:30 -08:00
configs	x86/defconfig: Enable CONFIG_DEBUG_WX=y	2022-09-02 10:41:42 +02:00
crypto	crypto: x86/aria-avx - Do not use avx2 instructions	2023-02-14 13:39:33 +08:00
entry	Changes in this cycle were:	2023-03-02 09:45:34 -08:00
events	ARM:	2023-02-25 11:30:21 -08:00
hyperv	x86/hyperv: Remove unregister syscore call from Hyper-V cleanup	2022-11-29 17:55:29 +00:00
ia32	x86/signal/32: Merge native and compat 32-bit signal code	2022-10-19 09:58:49 +02:00
include	x86/resctl: fix scheduler confusion with 'current'	2023-03-08 11:48:11 -08:00
kernel	x86/resctl: fix scheduler confusion with 'current'	2023-03-08 11:48:11 -08:00
kvm	ARM:	2023-02-25 11:30:21 -08:00
lib	- Cache the AMD debug registers in per-CPU variables to avoid MSR writes	2023-02-21 14:51:40 -08:00
math-emu	…
mm	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add	2023-02-23 17:09:35 -08:00
net	bpf, x86: Simplify the parsing logic of structure parameters	2023-01-10 15:53:22 -08:00
pci	x86/pci/xen: Fixup fallout from the PCI/MSI overhaul	2023-01-16 20:40:44 +01:00
platform	A healthy mix of EFI contributions this time:	2023-02-23 14:41:48 -08:00
power	- Add the call depth tracking mitigation for Retbleed which has	2022-12-14 15:03:00 -08:00
purgatory	x86/purgatory: disable KMSAN instrumentation	2022-10-28 13:37:23 -07:00
ras	…
realmode	x86/boot: Skip realmode init code when running as Xen PV guest	2022-11-25 12:05:22 +01:00
tools	kbuild: allow to combine multiple V= levels	2023-01-22 23:43:32 +09:00
um	This pull request contains the following changes for UML:	2023-03-01 09:13:00 -08:00
video	…
virt/vmx/tdx	…
xen	xen: branch for v6.3-rc1	2023-02-21 17:07:39 -08:00
.gitignore	x86/purgatory: Omit use of bin2c	2022-07-25 10:32:32 +02:00
Kbuild	…
Kconfig	x86/Kconfig: Fix spellos & punctuation	2023-01-25 12:21:04 +01:00
Kconfig.assembler	crypto: x86/aria-avx - fix build failure with old binutils	2023-01-20 18:29:31 +08:00
Kconfig.cpu	…
Kconfig.debug	arch: make TRACE_IRQFLAGS_NMI_SUPPORT generic	2022-06-23 15:39:21 +01:00
Makefile	x86/build: Make 64-bit defconfig the default	2023-02-15 14:20:17 +01:00
Makefile.um	This pull request contains the following changes for UML:	2023-03-01 09:13:00 -08:00
Makefile_32.cpu	…