Add the span to the year of the development.
Link: https://lkml.kernel.org/r/20221025173709.2718725-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When libelf was not installed in the standard location, it cannot be
located by the current building config.
Use pkg-config to help locate libelf in such cases.
Signed-off-by: Shen Jiamin <shen_jiamin@comp.nus.edu.sg>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20221215044703.400139-1-shen_jiamin@comp.nus.edu.sg
Fix bug in btf_dump's logic of determining if a given struct type is
packed or not. The notion of "natural alignment" is not needed and is
even harmful in this case, so drop it altogether. The biggest difference
in btf_is_struct_packed() compared to its original implementation is
that we don't really use btf__align_of() to determine overall alignment
of a struct type (because it could be 1 for both packed and non-packed
struct, depending on specifci field definitions), and just use field's
actual alignment to calculate whether any field is requiring packing or
struct's size overall necessitates packing.
Add two simple test cases that demonstrate the difference this change
would make.
Fixes: ea2ce1ba99 ("libbpf: Fix BTF-to-C converter's padding logic")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20221215183605.4149488-1-andrii@kernel.org
* Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
* Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
* Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping option,
which multi-process VMMs such as crosvm rely on (see merge commit 382b5b87a97d:
"Fix a number of issues with MTE, such as races on the tags being
initialised vs the PG_mte_tagged flag as well as the lack of support
for VM_SHARED when KVM is involved. Patches from Catalin Marinas and
Peter Collingbourne").
* Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
* Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
* Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
* Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
* Second batch of the lazy destroy patches
* First batch of KVM changes for kernel virtual != physical address support
* Removal of a unused function
x86:
* Allow compiling out SMM support
* Cleanup and documentation of SMM state save area format
* Preserve interrupt shadow in SMM state save area
* Respond to generic signals during slow page faults
* Fixes and optimizations for the non-executable huge page errata fix.
* Reprogram all performance counters on PMU filter change
* Cleanups to Hyper-V emulation and tests
* Process Hyper-V TLB flushes from a nested guest (i.e. from a L2 guest
running on top of a L1 Hyper-V hypervisor)
* Advertise several new Intel features
* x86 Xen-for-KVM:
** Allow the Xen runstate information to cross a page boundary
** Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
** Add support for 32-bit guests in SCHEDOP_poll
* Notable x86 fixes and cleanups:
** One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
** Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
** Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
** Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
** Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
** Advertise (on AMD) that the SMM_CTL MSR is not supported
** Remove unnecessary exports
Generic:
* Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
* Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
* Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
* Introduce actual atomics for clear/set_bit() in selftests
* Add support for pinning vCPUs in dirty_log_perf_test.
* Rename the so called "perf_util" framework to "memstress".
* Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress tests.
* Add a common ucall implementation; code dedup and pre-work for running
SEV (and beyond) guests in selftests.
* Provide a common constructor and arch hook, which will eventually be
used by x86 to automatically select the right hypercall (AMD vs. Intel).
* A bunch of added/enabled/fixed selftests for ARM64, covering memslots,
breakpoints, stage-2 faults and access tracking.
* x86-specific selftest changes:
** Clean up x86's page table management.
** Clean up and enhance the "smaller maxphyaddr" test, and add a related
test to cover generic emulation failure.
** Clean up the nEPT support checks.
** Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
** Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
Documentation:
* Remove deleted ioctls from documentation
* Clean up the docs for the x86 MSR filter.
* Various fixes
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOaFrcUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPemQgAq49excg2Cc+EsHnZw3vu/QWdA0Rt
KhL3OgKxuHNjCbD2O9n2t5di7eJOTQ7F7T0eDm3xPTr4FS8LQ2327/mQePU/H2CF
mWOpq9RBWLzFsSTeVA2Mz9TUTkYSnDHYuRsBvHyw/n9cL76BWVzjImldFtjYjjex
yAwl8c5itKH6bc7KO+5ydswbvBzODkeYKUSBNdbn6m0JGQST7XppNwIAJvpiHsii
Qgpk0e4Xx9q4PXG/r5DedI6BlufBsLhv0aE9SHPzyKH3JbbUFhJYI8ZD5OhBQuYW
MwxK2KlM5Jm5ud2NZDDlsMmmvd1lnYCFDyqNozaKEWC1Y5rq1AbMa51fXA==
=QAYX
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM64:
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on (see merge
commit 382b5b87a97d: "Fix a number of issues with MTE, such as
races on the tags being initialised vs the PG_mte_tagged flag as
well as the lack of support for VM_SHARED when KVM is involved.
Patches from Catalin Marinas and Peter Collingbourne").
- Merge the pKVM shadow vcpu state tracking that allows the
hypervisor to have its own view of a vcpu, keeping that state
private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB
pages only) as a prefix of the oncoming support for 4kB and 16kB
pages.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
s390:
- Second batch of the lazy destroy patches
- First batch of KVM changes for kernel virtual != physical address
support
- Removal of a unused function
x86:
- Allow compiling out SMM support
- Cleanup and documentation of SMM state save area format
- Preserve interrupt shadow in SMM state save area
- Respond to generic signals during slow page faults
- Fixes and optimizations for the non-executable huge page errata
fix.
- Reprogram all performance counters on PMU filter change
- Cleanups to Hyper-V emulation and tests
- Process Hyper-V TLB flushes from a nested guest (i.e. from a L2
guest running on top of a L1 Hyper-V hypervisor)
- Advertise several new Intel features
- x86 Xen-for-KVM:
- Allow the Xen runstate information to cross a page boundary
- Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
- Add support for 32-bit guests in SCHEDOP_poll
- Notable x86 fixes and cleanups:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped
a few years back when eliminating unnecessary barriers when
switching between vmcs01 and vmcs02.
- Clean up vmread_error_trampoline() to make it more obvious that
params must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL
irrespective of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM
incorrectly thinking a guest needs TSC scaling when running on a
CPU with a constant TSC, but no hardware-enumerated TSC
frequency.
- Advertise (on AMD) that the SMM_CTL MSR is not supported
- Remove unnecessary exports
Generic:
- Support for responding to signals during page faults; introduces
new FOLL_INTERRUPTIBLE flag that was reviewed by mm folks
Selftests:
- Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
- Fix build errors that occur in certain setups (unsure exactly what
is unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
- Introduce actual atomics for clear/set_bit() in selftests
- Add support for pinning vCPUs in dirty_log_perf_test.
- Rename the so called "perf_util" framework to "memstress".
- Add a lightweight psuedo RNG for guest use, and use it to randomize
the access pattern and write vs. read percentage in the memstress
tests.
- Add a common ucall implementation; code dedup and pre-work for
running SEV (and beyond) guests in selftests.
- Provide a common constructor and arch hook, which will eventually
be used by x86 to automatically select the right hypercall (AMD vs.
Intel).
- A bunch of added/enabled/fixed selftests for ARM64, covering
memslots, breakpoints, stage-2 faults and access tracking.
- x86-specific selftest changes:
- Clean up x86's page table management.
- Clean up and enhance the "smaller maxphyaddr" test, and add a
related test to cover generic emulation failure.
- Clean up the nEPT support checks.
- Add X86_PROPERTY_* framework to retrieve multi-bit CPUID values.
- Fix an ordering issue in the AMX test introduced by recent
conversions to use kvm_cpu_has(), and harden the code to guard
against similar bugs in the future. Anything that tiggers
caching of KVM's supported CPUID, kvm_cpu_has() in this case,
effectively hides opt-in XSAVE features if the caching occurs
before the test opts in via prctl().
Documentation:
- Remove deleted ioctls from documentation
- Clean up the docs for the x86 MSR filter.
- Various fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (361 commits)
KVM: x86: Add proper ReST tables for userspace MSR exits/flags
KVM: selftests: Allocate ucall pool from MEM_REGION_DATA
KVM: arm64: selftests: Align VA space allocator with TTBR0
KVM: arm64: Fix benign bug with incorrect use of VA_BITS
KVM: arm64: PMU: Fix period computation for 64bit counters with 32bit overflow
KVM: x86: Advertise that the SMM_CTL MSR is not supported
KVM: x86: remove unnecessary exports
KVM: selftests: Fix spelling mistake "probabalistic" -> "probabilistic"
tools: KVM: selftests: Convert clear/set_bit() to actual atomics
tools: Drop "atomic_" prefix from atomic test_and_set_bit()
tools: Drop conflicting non-atomic test_and_{clear,set}_bit() helpers
KVM: selftests: Use non-atomic clear/set bit helpers in KVM tests
perf tools: Use dedicated non-atomic clear/set bit helpers
tools: Take @bit as an "unsigned long" in {clear,set}_bit() helpers
KVM: arm64: selftests: Enable single-step without a "full" ucall()
KVM: x86: fix APICv/x2AVIC disabled when vm reboot by itself
KVM: Remove stale comment about KVM_REQ_UNHALT
KVM: Add missing arch for KVM_CREATE_DEVICE and KVM_{SET,GET}_DEVICE_ATTR
KVM: Reference to kvm_userspace_memory_region in doc and comments
KVM: Delete all references to removed KVM_SET_MEMORY_ALIAS ioctl
...
This adds a simple test for inserting an XDP program into a cpumap that is
"owned" by an XDP program that was loaded as PROG_TYPE_EXT (as libxdp
does). Prior to the kernel fix this would fail because the map type
ownership would be set to PROG_TYPE_EXT instead of being resolved to
PROG_TYPE_XDP.
v5:
- Fix a few nits from Andrii, add his ACK
v4:
- Use skeletons for selftest
v3:
- Update comment to better explain the cause
- Add Yonghong's ACK
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221214230254.790066-2-toke@redhat.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Since uprobe's argument must contain the user-space data, that
should not be converted to kernel symbols. Reject if user
specifies these types on uprobe events. e.g.
/sys/kernel/debug/tracing # echo 'p /bin/sh:10 %ax:symbol' >> uprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # echo 'p /bin/sh:10 %ax:symstr' >> uprobe_events
sh: write error: Invalid argument
/sys/kernel/debug/tracing # cat error_log
[ 1783.134883] trace_uprobe: error: Unknown type is specified
Command: p /bin/sh:10 %ax:symbol
^
[ 1792.201120] trace_uprobe: error: Unknown type is specified
Command: p /bin/sh:10 %ax:symstr
^
Link: https://lore.kernel.org/all/166679931679.1528100.15540755370726009882.stgit@devnote3/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Turns out that btf_dump API doesn't handle a bunch of tricky corner
cases, as reported by Per, and further discovered using his testing
Python script ([0]).
This patch revamps btf_dump's padding logic significantly, making it
more correct and also avoiding unnecessary explicit padding, where
compiler would pad naturally. This overall topic turned out to be very
tricky and subtle, there are lots of subtle corner cases. The comments
in the code tries to give some clues, but comments themselves are
supposed to be paired with good understanding of C alignment and padding
rules. Plus some experimentation to figure out subtle things like
whether `long :0;` means that struct is now forced to be long-aligned
(no, it's not, turns out).
Anyways, Per's script, while not completely correct in some known
situations, doesn't show any obvious cases where this logic breaks, so
this is a nice improvement over the previous state of this logic.
Some selftests had to be adjusted to accommodate better use of natural
alignment rules, eliminating some unnecessary padding, or changing it to
`type: 0;` alignment markers.
Note also that for when we are in between bitfields, we emit explicit
bit size, while otherwise we use `: 0`, this feels much more natural in
practice.
Next patch will add few more test cases, found through randomized Per's
script.
[0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/
Reported-by: Per Sundström XP <per.xp.sundstrom@ericsson.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-6-andrii@kernel.org
btf__align_of() is supposed to be return alignment requirement of
a requested BTF type. For STRUCT/UNION it doesn't always return correct
value, because it calculates alignment only based on field types. But
for packed structs this is not enough, we need to also check field
offsets and struct size. If field offset isn't aligned according to
field type's natural alignment, then struct must be packed. Similarly,
if struct size is not a multiple of struct's natural alignment, then
struct must be packed as well.
This patch fixes this issue precisely by additionally checking these
conditions.
Fixes: 3d208f4ca1 ("libbpf: Expose btf__align_of() API")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-5-andrii@kernel.org
Turns out C allows to force enum to be 1-byte or 8-byte explicitly using
mode(byte) or mode(word), respecticely. Linux sources are using this in
some cases. This is imporant to handle correctly, as enum size
determines corresponding fields in a struct that use that enum type. And
if enum size is incorrect, this will lead to invalid struct layout. So
add mode(byte) and mode(word) attribute support to btf_dump APIs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-3-andrii@kernel.org
btf_dump APIs emit unnecessary tabs when emitting struct/union
definition that fits on the single line. Before this patch we'd get:
struct blah {<tab>};
This patch fixes this and makes sure that we get more natural:
struct blah {};
Fixes: 44a726c3f2 ("bpftool: Print newline before '}' for struct with padding only fields")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221212211505.558851-2-andrii@kernel.org
been long in the making. It is a lighterweight software-only fix for
Skylake-based cores where enabling IBRS is a big hammer and causes a
significant performance impact.
What it basically does is, it aligns all kernel functions to 16 bytes
boundary and adds a 16-byte padding before the function, objtool
collects all functions' locations and when the mitigation gets applied,
it patches a call accounting thunk which is used to track the call depth
of the stack at any time.
When that call depth reaches a magical, microarchitecture-specific value
for the Return Stack Buffer, the code stuffs that RSB and avoids its
underflow which could otherwise lead to the Intel variant of Retbleed.
This software-only solution brings a lot of the lost performance back,
as benchmarks suggest:
https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
That page above also contains a lot more detailed explanation of the
whole mechanism
- Implement a new control flow integrity scheme called FineIBT which is
based on the software kCFI implementation and uses hardware IBT support
where present to annotate and track indirect branches using a hash to
validate them
- Other misc fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe
VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb
VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN
gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A
iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y
Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A
ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh
ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd
73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP
i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n
Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/
zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs=
=cRy1
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:
- Add the call depth tracking mitigation for Retbleed which has been
long in the making. It is a lighterweight software-only fix for
Skylake-based cores where enabling IBRS is a big hammer and causes a
significant performance impact.
What it basically does is, it aligns all kernel functions to 16 bytes
boundary and adds a 16-byte padding before the function, objtool
collects all functions' locations and when the mitigation gets
applied, it patches a call accounting thunk which is used to track
the call depth of the stack at any time.
When that call depth reaches a magical, microarchitecture-specific
value for the Return Stack Buffer, the code stuffs that RSB and
avoids its underflow which could otherwise lead to the Intel variant
of Retbleed.
This software-only solution brings a lot of the lost performance
back, as benchmarks suggest:
https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
That page above also contains a lot more detailed explanation of the
whole mechanism
- Implement a new control flow integrity scheme called FineIBT which is
based on the software kCFI implementation and uses hardware IBT
support where present to annotate and track indirect branches using a
hash to validate them
- Other misc fixes and cleanups
* tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
x86/paravirt: Use common macro for creating simple asm paravirt functions
x86/paravirt: Remove clobber bitmask from .parainstructions
x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al
x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit
x86/Kconfig: Enable kernel IBT by default
x86,pm: Force out-of-line memcpy()
objtool: Fix weak hole vs prefix symbol
objtool: Optimize elf_dirty_reloc_sym()
x86/cfi: Add boot time hash randomization
x86/cfi: Boot time selection of CFI scheme
x86/ibt: Implement FineIBT
objtool: Add --cfi to generate the .cfi_sites section
x86: Add prefix symbols for function padding
objtool: Add option to generate prefix symbols
objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf
objtool: Slice up elf_create_section_symbol()
kallsyms: Revert "Take callthunks into account"
x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces
x86/retpoline: Fix crash printing warning
x86/paravirt: Fix a !PARAVIRT build warning
...
* add tests that trigger reallocation of memblock structures from
memblock itself via memblock_double_array()
* add tests for memblock_alloc_exact_nid_raw() that verify that requested
node and memory range constraints are respected.
-----BEGIN PGP SIGNATURE-----
iQFMBAABCAA2FiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmOYL14YHG1pa2UucmFw
b3BvcnRAZ21haWwuY29tAAoJEDkDhibLDv2RZdcH/2AE447oXzVO2lzOgkqQH1EX
xJdaa7hu00h2Euzv2lgcOHroHGXDP8wYjUV2cEyNZMP0WOMiO8i6rwIKmrzWufcm
R+ZoKPQV/Nc+7rIycpW455yLxcgsVIpUILK2BQEkDCGYugSHKb7IYdcA9KDJwtmR
xIG9j8nsuwWJtmtAuQqNOBmsc5FzKNYFa/RtDiJoMFmQNK3UqB8G8VCASdP0DYvH
7MXPcyRmlwpmOsKoNKi2/wQBsiag8/PLgcZv5vYg+E6no1tMG6u7pgDS12Sn6ZvA
I8gThJ8HNAo0d1O2SnbkicMx2CqrPFSub3QXaEFjCZF5mdBcirxHc/VBKj50TXU=
=iXEA
-----END PGP SIGNATURE-----
Merge tag 'memblock-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Extend test coverage:
- add tests that trigger reallocation of memblock structures from
memblock itself via memblock_double_array()
- add tests for memblock_alloc_exact_nid_raw() that verify that
requested node and memory range constraints are respected"
* tag 'memblock-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
memblock tests: remove completed TODO item
memblock tests: add generic NUMA tests for memblock_alloc_exact_nid_raw
memblock tests: add bottom-up NUMA tests for memblock_alloc_exact_nid_raw
memblock tests: add top-down NUMA tests for memblock_alloc_exact_nid_raw
memblock tests: introduce range tests for memblock_alloc_exact_nid_raw
memblock test: Update TODO list
memblock test: Add test to memblock_reserve() 129th region
memblock test: Add test to memblock_add() 129th region
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/perf`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1668762999-9297-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -D/--delay option is to delay the measure after the program starts.
But the current code goes to sleep before starting the program so the
program is delayed too. This is not the intention, let's fix it.
Before:
$ time sudo ./perf stat -a -e cycles -D 3000 sleep 4
Events disabled
Events enabled
Performance counter stats for 'system wide':
4,326,949,337 cycles
4.007494118 seconds time elapsed
real 0m7.474s
user 0m0.356s
sys 0m0.120s
It ran the workload for 4 seconds and gave the 3 second delay. So it
should skip the first 3 second and measure the last 1 second only. But
as you can see, it delays 3 seconds and ran the workload after that for
4 seconds. So the total time (real) was 7 seconds.
After:
$ time sudo ./perf stat -a -e cycles -D 3000 sleep 4
Events disabled
Events enabled
Performance counter stats for 'system wide':
1,063,551,013 cycles
1.002769510 seconds time elapsed
real 0m4.484s
user 0m0.385s
sys 0m0.086s
The bug was introduced when it changed enablement of system-wide events
with a command line workload. But it should've considered the initial
delay case. The code was reworked since then (in bb8bc52e75) so I'm
afraid it won't be applied cleanly.
Fixes: d0a0a51149 ("perf stat: Fix forked applications enablement of counters")
Reported-by: Kevin Nomura <nomurak@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Link: https://lore.kernel.org/r/20221212230820.901382-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The group option predates grouping events using curly braces added in
commit 89efb02950 ("perf tools: Add support to parse event group
syntax").
The --group option was retained for legacy support (in August
2012) but keeping it adds complexity.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shaomin Deng <dengshaomin@cdjrlc.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Timothy Hayes <timothy.hayes@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221213232651.1269909-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a small improvement in libbpf_strerror. When libbpf_strerror
is used to obtain the system error description, if the length of the
buf is insufficient, libbpf_sterror returns ERANGE and sets errno to
ERANGE.
However, this processing is not performed when the error code
customized by libbpf is obtained. Make some minor improvements here,
return -ERANGE and set errno to ERANGE when buf is not enough for
custom description.
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221210082045.233697-1-liuxin350@huawei.com
BPF selftests require CONFIG_FUNCTION_ERROR_INJECTION to work. However,
CONFIG_FUNCTION_ERROR_INJECTION is no longer 'y' by default after recent
changes. As a result, we are seeing errors like the following from BPF CI:
bpf_testmod_test_read() is not modifiable
__x64_sys_setdomainname is not sleepable
__x64_sys_getpgid is not sleepable
Fix this by explicitly selecting CONFIG_FUNCTION_ERROR_INJECTION in the
selftest config.
Fixes: a4412fdd49 ("error-injection: Add prompt for function error injection")
Reported-by: Daniel Müller <deso@posteo.net>
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Müller <deso@posteo.net>
Link: https://lore.kernel.org/bpf/20221213220500.3427947-1-song@kernel.org
Kernel test robot reported bpf selftest build failure when CONFIG_SMP
is not set. The error message looks below:
>> progs/rcu_read_lock.c:256:34: error: no member named 'last_wakee' in 'struct task_struct'
last_wakee = task->real_parent->last_wakee;
~~~~~~~~~~~~~~~~~ ^
1 error generated.
When CONFIG_SMP is not set, the field 'last_wakee' is not available in struct
'task_struct'. Hence the above compilation failure. To fix the issue, let us
choose another field 'group_leader' which is available regardless of
CONFIG_SMP set or not.
Fixes: fe147956fc ("bpf/selftests: Add selftests for new task kfuncs")
Fixes: 48671232fc ("selftests/bpf: Add tests for bpf_rcu_read_lock()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20221213012224.379581-1-yhs@fb.com
iommufd is the user API to control the IOMMU subsystem as it relates to
managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU device
specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance the
combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a
guest. Dirty tracking enables VM live migration with SRIOV devices and
PASID support allow creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs, which
is currently VFIO and VDPA.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCY5ct7wAKCRCFwuHvBreF
YZZ5AQDciXfcgXLt0UBEmWupNb0f/asT6tk717pdsKm8kAZMNAEAsIyLiKT5HqGl
s7fAu+CQ1pr9+9NKGevD+frw8Solsw4=
=jJkd
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd implementation from Jason Gunthorpe:
"iommufd is the user API to control the IOMMU subsystem as it relates
to managing IO page tables that point at user space memory.
It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
container) which is the VFIO specific interface for a similar idea.
We see a broad need for extended features, some being highly IOMMU
device specific:
- Binding iommu_domain's to PASID/SSID
- Userspace IO page tables, for ARM, x86 and S390
- Kernel bypassed invalidation of user page tables
- Re-use of the KVM page table in the IOMMU
- Dirty page tracking in the IOMMU
- Runtime Increase/Decrease of IOPTE size
- PRI support with faults resolved in userspace
Many of these HW features exist to support VM use cases - for instance
the combination of PASID, PRI and Userspace IO Page Tables allows an
implementation of DMA Shared Virtual Addressing (vSVA) within a guest.
Dirty tracking enables VM live migration with SRIOV devices and PASID
support allow creating "scalable IOV" devices, among other things.
As these features are fundamental to a VM platform they need to be
uniformly exposed to all the driver families that do DMA into VMs,
which is currently VFIO and VDPA"
For more background, see the extended explanations in Jason's pull request:
https://lore.kernel.org/lkml/Y5dzTU8dlmXTbzoJ@nvidia.com/
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (62 commits)
iommufd: Change the order of MSI setup
iommufd: Improve a few unclear bits of code
iommufd: Fix comment typos
vfio: Move vfio group specific code into group.c
vfio: Refactor dma APIs for emulated devices
vfio: Wrap vfio group module init/clean code into helpers
vfio: Refactor vfio_device open and close
vfio: Make vfio_device_open() truly device specific
vfio: Swap order of vfio_device_container_register() and open_device()
vfio: Set device->group in helper function
vfio: Create wrappers for group register/unregister
vfio: Move the sanity check of the group to vfio_create_group()
vfio: Simplify vfio_create_group()
iommufd: Allow iommufd to supply /dev/vfio/vfio
vfio: Make vfio_container optionally compiled
vfio: Move container related MODULE_ALIAS statements into container.c
vfio-iommufd: Support iommufd for emulated VFIO devices
vfio-iommufd: Support iommufd for physical VFIO devices
vfio-iommufd: Allow iommufd to be used in place of a container fd
vfio: Use IOMMU_CAP_ENFORCE_CACHE_COHERENCY for vfio_file_enforced_coherent()
...
Since Python 3.3 extensions have a suffix encoding platform and
version information. For example, the perf extension was previously
perf.so but now maybe perf.cpython-310-x86_64-linux-gnu.so. Compute
the extension using Python and then use this in the target name. Doing
this avoids the "perf.so" target always being rebuilt.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Eelco Chaudron <echaudro@redhat.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kim Phillips <kim.phillips@amd.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Shaomin Deng <dengshaomin@cdjrlc.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Timothy Hayes <timothy.hayes@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221213232651.1269909-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ensure that the availability of the VG register behaves as expected
depending on the kernel version and SVE support.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221213114739.2312862-5-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The first two version numbers are used since that is where the ABI
changes happen, so seems to be the most useful for now.
'Until' is exclusive and 'since' is inclusive so that the same version
number can be used to mark a point where the change comes into effect.
This allows keeping the tests in a state where new tests will also pass
on older kernels if the existence of a new feature isn't explicitly
broadcast by the kernel. For example extended user regs are currently
discovered by trial and error calls to perf_event_open.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221213114739.2312862-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This can be used to skip tests or provide different test values on
different platforms. For example to run a test only where Arm SVE is
present add this to the config section:
auxv = auxv["AT_HWCAP"] & 0x200000 == 0x200000
The value is a freeform Python expression that is evaled in the context
of a map called "auxv" that contains the decoded auxiliary vector.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221213114739.2312862-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the return value is used to skip the test, but sometimes it
can be useful to test if a certain command should return a certain exit
code.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221213114739.2312862-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Provide task-analyzer test cases for all possible arguments and a subset of possible
combinations.
12 Tests in total.
test_basic:
- cmd:"perf script report task-analyzer"
- Fundamental test of script without arguments.
- Check for standard output.
test_ns_rename:
- cmd:"perf script report task-analyzer --ns --rename-comms-by-tids 0:random"
- Standard task with timestamps in nanoseconds and comm renamed.
- Check for standard output.
test_ms_filtertasks_highlight:
- cmd:"perf script report task-analyzer --ms --filter-tasks perf --highlight-tasks perf"
- Standard task with timestamps in milliseconds, task filtered out and highlighted.
- Check for standard output.
test_extended_times_timelimit_limittasks:
- cmd "perf script report task-analyzer --extended-times --time-limit :99999"
- Standard task with additional schedule out/in info and timlimit active at 99999.
- Check for extended table output.
test_summary:
- cmd:"perf script report task-analyzer --summary"
- Standard task with additional summary output.
- Check for summary print.
test_summary_extended:
- cmd:"perf script report task-analyzer --summary-extended"
- Standard task with summary and additional schedule in/out info.
- Chceck for extended table print.
test_summaryonly:
- cmd:"perf script report task-analyzer --summary-only"
- Only summary should be printed.
- Check for summary print.
test_extended_times_summary_ns:
- cmd:"perf script report task-analyzer --extended-times --summary --ns"
- Standard task with extended schedule in/out information and summary in ns.
- Check for extended table and summary.
test_csv:
- cmd:"perf script report task-analyzer --csv csv"
- Print standard task to csv file in csv format.
- Check for csv format.
test_csv_extended_times:
- cmd:"perf script report task-analyzer --csv csv --extended-times"
- Print standard task to csv file in csv format with additional schedule in/out
information.
- Check for additional information and csv format.
test_csvsummary:
- cmd:"perf script report task-analyzer --csv-summary csvsummary"
- Print summary to csvsummary file in csv format.
- Check for csv format.
test_csvsummary_extended:
- cmd:"perf script report task-analyzer --csv-summary csvsummary --summary-extended"
- Print summary to csvsummary file in csv format with additional schedule in/out
information.
- Check for additional information and csv format.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Petar Gligoric <petar.gligoric@rohde-schwarz.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221206154406.41941-4-petar.gligor@gmail.com
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds the possibility to write the trace and the summary as csv files
to a user specified file. A format as such simplifies further data processing.
This is achieved by having ";" as separators instead of spaces and solely one
header per file.
Additional parameters are being considered, like in the normal usage of the
script. Colors are turned off in the case of a csv output, thus the highlight
option is also being ignored.
Usage:
Write standard task to csv file:
$ perf script report tasks-analyzer --csv <file>
write limited output to csv file in nanoseconds:
$ perf script report tasks-analyzer --csv <file> --ns --limit-to-tasks 1337
Write summary to a csv file:
$ perf script report tasks-analyzer --csv-summary <file>
Write summary to csv file with additional schedule information:
$ perf script report tasks-analyzer --csv-summary <file> --summary-extended
Write both summary and standard task to a csv file:
$ perf script report tasks-analyzer --csv --csv-summary
The following examples illustrate what is possible with the CSV output. The
first command sequence will record all scheduler switch events for 10 seconds,
the task-analyzer calculates task information like runtimes as CSV. A small
python snippet using pandas and matplotlib will visualize the most frequent
task (e.g. kworker/1:1) runtimes - each runtime as a bar in a bar chart:
$ perf record -e sched:sched_switch -a -- sleep 10
$ perf script report tasks-analyzer --ns --csv tasks.csv
$ cat << EOF > /tmp/freq-comm-runtimes-bar.py
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("tasks.csv", sep=';')
most_freq_comm = df["COMM"].value_counts().idxmax()
most_freq_runtimes = df[df["COMM"]==most_freq_comm]["Runtime"]
plt.title(f"Runtimes for Task {most_freq_comm} in Nanoseconds")
plt.bar(range(len(most_freq_runtimes)), most_freq_runtimes)
plt.show()
$ python3 /tmp/freq-comm-runtimes-bar.py
As a seconds example, the subsequent script generates a pie chart of all
accumulated tasks runtimes for 10 seconds of system recordings:
$ perf record -e sched:sched_switch -a -- sleep 10
$ perf script report tasks-analyzer --csv-summary task-summary.csv
$ cat << EOF > /tmp/accumulated-task-pie.py
import pandas as pd
from matplotlib.pyplot import pie, axis, show
df = pd.read_csv("task-summary.csv", sep=';')
sums = df.groupby(df["Comm"])["Accumulated"].sum()
axis("equal")
pie(sums, labels=sums.index);
show()
EOF
$ python3 /tmp/accumulated-task-pie.py
A variety of other visualizations are possible in matplotlib and other
environments. Of course, pandas, numpy and co. also allow easy
statistical analysis of the data!
Signed-off-by: Petar Gligoric <petar.gligoric@rohde-schwarz.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221206154406.41941-3-petar.gligor@gmail.com
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce a new 'perf script' to analyze task scheduling behavior.
During the task analysis, some data is always needed - which goes beyond
the simple time of switching on and off a task (process/thread). This
concerns for example the runtime of a process or the frequency with
which the process was called. This script serves to simplify this
recurring analyze process. It immediately provides the user with helpful
task characteristic information about the tasks runtimes.
Usage:
Recorded can be in two ways:
$ perf script record tasks-analyzer -- sleep 10
$ perf record -e sched:sched_switch -a -- sleep 10
The script can parse all perf.data files, most important: sched:sched_switch
events are mandatory, other events will be ignored.
Most simple report use case is to just call the script without arguments:
$ perf script report tasks-analyzer
Switched-In Switched-Out CPU PID TID Comm Runtime Time Out-In
15576.658891407 15576.659156086 4 2412 2428 gdbus 265 1949
15576.659111320 15576.659455410 0 2412 2412 gnome-shell 344 2267
15576.659491326 15576.659506173 2 74 74 kworker/2:1 15 13145
15576.659506173 15576.659825748 2 2858 2858 gnome-terminal- 320 63263
15576.659871270 15576.659902872 6 20932 20932 kworker/u16:0 32 2314582
15576.659909951 15576.659945501 3 27264 27264 sh 36 -1
15576.659853285 15576.659971052 7 27265 27265 perf 118 5050741
[...]
What is not shown here are the ASCII color sequences. For example, if
the task consists of only one thread, the TID is grayed out.
Runtime is the time the task was running on the CPU, Time Out-In is the
time between the process being scheduled *out* and scheduled back *in*.
So the last time span between two executions. If -1 is printed, then the
task simply ran the first time in the measurements - a Out-In delta
could not be calculated.
In addition to the chronological representation, there is a summary on
task level. This output can be additionally switched on via the
--summary option and provides information such as max, min & average
runtime per process. The maximum runtime is often important for
debugging. The call looks like this:
$ perf script report tasks-analyzer --summary
Summary
Task Information Runtime Information
PID TID Comm Runs Accumulated Mean Median Min Max Max At
14 14 ksoftirqd/0 13 334 26 15 9 127 15571.621211956
15 15 rcu_preempt 133 1778 13 13 2 33 15572.581176024
16 16 migration/0 3 49 16 13 12 24 15571.608915425
20 20 migration/1 3 34 11 13 8 13 15571.639101555
25 25 migration/2 3 32 11 12 9 12 15575.639239896
[...]
Besides these two options, there are a number of other options that change the
output and behavior. This can be queried via --help. Options worth mentioning include:
- filter-tasks - filter out unneeded tasks, --filter-task 1337,/sbin/init
- highlight-tasks - more pleasant focusing, --highlight-tasks 1:red,mutt:yellow
- extended-times - show combinations of elapsed times between schedule in/schedule out
- summary-extended - summary with additional information, like maximum delta time statistics
- rename-comms-by-tids - handy for inexpressive processnames like python, --rename 1337:my-python-app
- ms - show timestamps in milliseconds, nanoseconds is also possible (--ns)
- time-limit - limit the analyzer to a time range, --time-limit 15576.0:15576.1
Script is tested and prime time ready for python2 & python3:
- make PYTHON=python3 prefix=/usr/local install
- make PYTHON=python2 prefix=/usr/local install
Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20221206154406.41941-2-petar.gligor@gmail.com
Signed-off-by: Petar Gligoric <petar.gligoric@rohde-schwarz.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Printing the info doesn't have any dependency on OpenCSD, and neither
does recording Coresight data. Because it's sometimes useful to look at
the info for debugging, it makes sense to be able to see it on the same
platform that the recording was made on.
So pull the auxtrace info printing parts into a new file that is always
compiled into Perf.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20221212155513.2259623-6-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
hdr is a copy of 3 values of ptr and doesn't need to be long lived. So
just use ptr instead which means the malloc and the extra error path can
be removed to simplify things.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20221212155513.2259623-5-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
cs_etm__print_auxtrace_info() is called twice in case there is an error
somewhere in cs_etm__process_auxtrace_info(), but all the info is
already available at the beginning so just print it there instead.
Also use u64 and the already cast ptr variable to make it more
consistent with the rest of the etm code.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20221212155513.2259623-4-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These aren't used outside of cs-etm so don't need stubs. Leave
cs_etm__process_auxtrace_info() which is used externally, and add an
error message so that it's obvious to users why it causes errors.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20221212155513.2259623-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is an error rather than just for the raw trace dump so always print
it as an error. Also remove the duplicate header version check.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Al Grant <Al.Grant@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20221212155513.2259623-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add test cases for the task and addr aggregation modes.
$ sudo ./perf test -v contention
86: kernel lock contention analysis test :
--- start ---
test child forked, pid 680006
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
test child finished with 0
---- end ----
kernel lock contention analysis test: Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -l/--lock-addr option is to implement per-lock-instance contention
stat using LOCK_AGGR_ADDR. It displays lock address and optionally
symbol name if exists.
$ sudo ./perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
1 36.28 us 36.28 us 36.28 us ffff92615d6448b8
9 10.91 us 1.84 us 1.21 us ffffffffbaed50c0 rcu_state
1 10.49 us 10.49 us 10.49 us ffff9262ac4f0c80
8 4.68 us 1.67 us 585 ns ffffffffbae07a40 jiffies_lock
3 3.03 us 1.45 us 1.01 us ffff9262277861e0
1 924 ns 924 ns 924 ns ffff926095ba9d20
1 436 ns 436 ns 436 ns ffff9260bfda4f60
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The BPF didn't show the per-thread stat properly. Use task's thread id (PID)
as a key instead of stack_id and add a task_data map to save task comm names.
$ sudo ./perf lock con -abt -E 5 sleep 1
contended total wait max wait avg wait pid comm
1 740.66 ms 740.66 ms 740.66 ms 1950 nv_queue
3 305.50 ms 298.19 ms 101.83 ms 1884 nvidia-modeset/
1 25.14 us 25.14 us 25.14 us 2725038 EventManager_De
12 23.09 us 9.30 us 1.92 us 0 swapper
1 20.18 us 20.18 us 20.18 us 2725033 EventManager_De
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Accessing BPF maps should use the same data types. Add bpf_skel/lock_data.h
to define the common data structures. No functional changes.
Committer notes:
Fixed contention_key.stack_id missing rename to contention_key.stack_or_task_id.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes build systems may append options e.g. --sysroot etc. to CC
variable especially in cross-compile environments like yocto project
where CC varable is composed of cross-compiler name and some needed
options for it to work in a relocatable environment.
Therefore separate out the compiler name from rest of the options in CC,
then add the options via second argument to Popen() API
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Khem Raj <raj.khem@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Link: https://lore.kernel.org/r/20221205025534.150006-1-raj.khem@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In BTF, tracepoint definitions have the "btf_trace_" prefix. The
off-cpu profiler needs to check the signature of the sched_switch event
using that definition. But there's a typo (s/bpf/btf/) so it failed
always.
Fixes: b36888f71c ("perf record: Handle argument change in sched_switch")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: bpf@vger.kernel.org
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20221208182636.524139-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event group test checks group creation for combinations of hw, sw
and uncore PMU events. Some of the uncore pmus may require additional
permission to access the counters.
For example, in case of hv_24x7, partition need to have permissions to
access hv_24x7 pmu counters. If not, event_open will fail. Hence add a
sanity check to see if event_open succeeds before proceeding with the
test.
Fixes: 9d9b22beda ("perf test: Add event group test for events in multiple PMUs")
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Disha Goel <disgoel@linux.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20221207165815.774-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some distros have older versions of libtraceevent where
TEP_FIELD_IS_RELATIVE and its associated semantics are not present, so
we need to check if the version has it, it was introduced in
libtraceevent 1.5.0.
Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
libtraceevent is now out-of-date and it is better to depend on the
system version. Remove this code that is no longer depended upon by
any builds.
Committer notes:
Removed the removed tools/lib/traceevent/ from tools/perf/MANIFEST, so
that 'make perf-tar-src-pkg' works.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20221130062935.2219247-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove the LIBTRACEEVENT_DYNAMIC and LIBTRACEFS_DYNAMIC make command
line variables.
If libtraceevent isn't installed or NO_LIBTRACEEVENT=1 is passed to the
build, don't compile in libtraceevent and libtracefs support.
This also disables CONFIG_TRACE that controls "perf trace".
CONFIG_LIBTRACEEVENT is used to control enablement in Build/Makefiles,
HAVE_LIBTRACEEVENT is used in C code.
Without HAVE_LIBTRACEEVENT tracepoints are disabled and as such the
commands kmem, kwork, lock, sched and timechart are removed. The
majority of commands continue to work including "perf test".
Committer notes:
Fixed up a tools/perf/util/Build reject and added:
#include <traceevent/event-parse.h>
to tools/perf/util/scripting-engines/trace-event-perl.c.
Committer testing:
$ rpm -qi libtraceevent-devel
Name : libtraceevent-devel
Version : 1.5.3
Release : 2.fc36
Architecture: x86_64
Install Date: Mon 25 Jul 2022 03:20:19 PM -03
Group : Unspecified
Size : 27728
License : LGPLv2+ and GPLv2+
Signature : RSA/SHA256, Fri 15 Apr 2022 02:11:58 PM -03, Key ID 999f7cbf38ab71f4
Source RPM : libtraceevent-1.5.3-2.fc36.src.rpm
Build Date : Fri 15 Apr 2022 10:57:01 AM -03
Build Host : buildvm-x86-05.iad2.fedoraproject.org
Packager : Fedora Project
Vendor : Fedora Project
URL : https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/
Bug URL : https://bugz.fedoraproject.org/libtraceevent
Summary : Development headers of libtraceevent
Description :
Development headers of libtraceevent-libs
$
Default build:
$ ldd ~/bin/perf | grep tracee
libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007f1dcaf8f000)
$
# perf trace -e sched:* --max-events 10
0.000 migration/0/17 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, dest_cpu: 1)
0.005 migration/0/17 sched:sched_wake_idle_without_ipi(cpu: 1)
0.011 migration/0/17 sched:sched_switch(prev_comm: "", prev_pid: 17 (migration/0), prev_state: 1, next_comm: "", next_prio: 120)
1.173 :0/0 sched:sched_wakeup(comm: "", pid: 3138 (gnome-terminal-), prio: 120)
1.180 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 3138 (gnome-terminal-), next_prio: 120)
0.156 migration/1/21 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, orig_cpu: 1, dest_cpu: 2)
0.160 migration/1/21 sched:sched_wake_idle_without_ipi(cpu: 2)
0.166 migration/1/21 sched:sched_switch(prev_comm: "", prev_pid: 21 (migration/1), prev_state: 1, next_comm: "", next_prio: 120)
1.183 :0/0 sched:sched_wakeup(comm: "", pid: 1602985 (kworker/u16:0-f), prio: 120, target_cpu: 1)
1.186 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 1602985 (kworker/u16:0-f), next_prio: 120)
#
Had to tweak tools/perf/util/setup.py to make sure the python binding
shared object links with libtraceevent if -DHAVE_LIBTRACEEVENT is
present in CFLAGS.
Building with NO_LIBTRACEEVENT=1 uncovered some more build failures:
- Make building of data-convert-bt.c to CONFIG_LIBTRACEEVENT=y
- perf-$(CONFIG_LIBTRACEEVENT) += scripts/
- bpf_kwork.o needs also to be dependent on CONFIG_LIBTRACEEVENT=y
- The python binding needed some fixups and util/trace-event.c can't be
built and linked with the python binding shared object, so remove it
in tools/perf/util/setup.py and exclude it from the list of
dependencies in the python/perf.so Makefile.perf target.
Building without libtraceevent-devel installed uncovered more build
failures:
- The python binding tools/perf/util/python.c was assuming that
traceevent/parse-events.h was always available, which was the case
when we defaulted to using the in-kernel tools/lib/traceevent/ files,
now we need to enclose it under ifdef HAVE_LIBTRACEEVENT, just like
the other parts of it that deal with tracepoints.
- We have to ifdef the rules in the Build files with
CONFIG_LIBTRACEEVENT=y to build builtin-trace.c and
tools/perf/trace/beauty/ as we only ifdef setting CONFIG_TRACE=y when
setting NO_LIBTRACEEVENT=1 in the make command line, not when we don't
detect libtraceevent-devel installed in the system. Simplification here
to avoid these two ways of disabling builtin-trace.c and not having
CONFIG_TRACE=y when libtraceevent-devel isn't installed is the clean
way.
From Athira:
<quote>
tools/perf/arch/powerpc/util/Build
-perf-y += kvm-stat.o
+perf-$(CONFIG_LIBTRACEEVENT) += kvm-stat.o
</quote>
Then, ditto for arm64 and s390, detected by container cross build tests.
- s/390 uses test__checkevent_tracepoint() that is now only available if
HAVE_LIBTRACEEVENT is defined, enclose the callsite with ifder HAVE_LIBTRACEEVENT.
Also from Athira:
<quote>
With this change, I could successfully compile in these environment:
- Without libtraceevent-devel installed
- With libtraceevent-devel installed
- With “make NO_LIBTRACEEVENT=1”
</quote>
Then, finally rename CONFIG_TRACEEVENT to CONFIG_LIBTRACEEVENT for
consistency with other libraries detected in tools/perf/.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20221205225940.3079667-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the 'MetricExpr' json value is passed from the json
file to the pmu-events.c. This change introduces an expression
tree that is parsed into. The parsing is done largely by using
operator overloading and python's 'eval' function. Two advantages
in doing this are:
1) Broken metrics fail at compile time rather than relying on
`perf test` to detect. `perf test` remains relevant for checking
event encoding and actual metric use.
2) The conversion to a string from the tree can minimize the metric's
string size, for example, preferring 1e6 over 1000000, avoiding
multiplication by 1 and removing unnecessary whitespace. On x86
this reduces the string size by 2,930bytes (0.07%).
In future changes it would be possible to programmatically
generate the json expressions (a single line of text and so a
pain to write manually) for an architecture using the expression
tree. This could avoid copy-pasting metrics for all architecture
variants.
v4. Doesn't simplify "0*SLOTS" to 0, as the pattern is used to fix
Intel metrics with topdown events.
v3. Avoids generic types on standard types like set that aren't
supported until Python 3.9, fixing an issue with Python 3.6
reported-by John Garry. v3 also fixes minor pylint issues and adds
a call to Simplify on the read expression tree.
v2. Improvements to type information.
Committer notes:
Added one-line fixer from Ian, see first Link: tag below.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/CAP-5=fWa=zNK_ecpWGoGggHCQx7z-oW0eGMQf19Maywg0QK=4g@mail.gmail.com
Link: https://lore.kernel.org/r/20221207055908.1385448-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In print_counter_aggrdata(), it skips some events that has no aggregate
count. It's actually for system-wide per-thread mode and merged uncore
and hybrid events.
Let's update the condition to check them explicitly.
Fixes: 91f85f98da ("perf stat: Display event stats using aggr counts")
Reported-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221206175804.391387-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If LIBTRACEEVENT_DYNAMIC is enabled then avoid the install step for
the plugins. If disabled correct DESTDIR so that the plugins are
installed under <lib>/traceevent/plugins.
Fixes: ef019df01e ("perf build: Install libtraceevent locally when building")
Reported-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20221205225940.3079667-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is used in bpf_lock_contention.c and builtin-lock.c will be made
CONFIG_LIBTRACEEVENT=y conditional, so move it to machine.c, that is
always available.
This makes those 4 global variables for sched and lock text start and
end to move to 'struct machine' too, as conceivably we can have that
info for several machine instances, say some 'perf diff' like tool.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Multiple events in a group can belong to one or more PMUs, however
there are some limitations.
One of the limitations is that perf doesn't allow creating a group of
events from different hw PMUs.
Write a simple test to create various combinations of hw, sw and uncore
PMU events and verify group creation succeeds or fails as expected.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Link: https://lore.kernel.org/r/20221206043237.12159-3-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'pmus' list variable is defined as static variable under pmu.c file.
Introduce a new pmus.c file and migrate this variable to it. Also make
it non static so that it can be accessed from outside.
Suggested-by: Ian Rogers <irogers@google.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: carsten.haitzler@arm.com
Link: https://lore.kernel.org/r/20221206043237.12159-2-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Avoid libtraceevent dependency for tep_is_bigendian or trace-event.h
dependency for bigendian. Add a new host_is_bigendian to util.h, using
the compiler defined __BYTE_ORDER__ when available.
Committer notes:
Added:
#else /* !__BYTE_ORDER__ */
On that nested #ifdef block, as per Namhyung's suggestion.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221130062935.2219247-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Remove git reference by changing GIT_COMPAT_UTIL_H to __PERF_UTIL_H.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221130062935.2219247-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In this context, 'os' is already a pointer so the extra dereference
isn't required. This fixes the following test failure on aarch64:
$ ./perf test "json output" -vvv
92: perf stat JSON output linter :
--- start ---
Checking json output: no args Test failed for input:
...
Fatal error: glibc detected an invalid stdio handle
---- end ----
perf stat JSON output linter: FAILED!
Fixes: e7f4da3122 ("perf stat: Pass struct outstate to printout()")
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221130111521.334152-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a metric produces more than one values, it missed to print the opening
bracket.
Fixes: ab6baaae27 ("perf stat: Fix JSON output in metric-only mode")
Reported-by: Weilin Wang <weilin.wang@intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Weilin Wang <weilin.wang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221202190447.1588680-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Compute the headers to be installed from their source headers and make
each have its own build target to install it. Using dependencies
avoids headers being reinstalled and getting a new timestamp which
then causes files that depend on the header to be rebuilt.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20221202045743.2639466-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Compute the headers to be installed from their source headers and make
each have its own build target to install it. Using dependencies
avoids headers being reinstalled and getting a new timestamp which
then causes files that depend on the header to be rebuilt.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20221202045743.2639466-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Compute the headers to be installed from their source headers and make
each have its own build target to install it. Using dependencies
avoids headers being reinstalled and getting a new timestamp which
then causes files that depend on the header to be rebuilt.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20221202045743.2639466-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Compute the headers to be installed from their source headers and make
each have its own build target to install it. Using dependencies
avoids headers being reinstalled and getting a new timestamp which
then causes files that depend on the header to be rebuilt.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20221202045743.2639466-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In 'perf stat' with CSV output option, number of fields in metrics
output is not matching with number of fields in other event output
lines.
Sample output below after applying patch to fix printing os->prefix.
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,82.11,msec,cpu-clock,82111626,100.00,1.000,CPUs utilized
S0,1,2,,context-switches,82109314,100.00,24.358,/sec
------
====> S0,1,,,,,,,1.71,stalled cycles per insn
The above command line uses field separator as "," via "-x," option and
per-socket option displays socket value as first field. But here the
last line for "stalled cycles per insn" has more separators. Each csv
output line is expected to have 8 field separators (for the 9 fields),
where as last line has 9 "," in the result. Patch fixes this issue.
The counter stats are displayed by function
"perf_stat__print_shadow_stats" in code "util/stat-shadow.c". While
printing the stats info for "stalled cycles per insn", function
"new_line_csv" is used as new_line callback.
The fields printed in each line contains: "Socket_id,aggr
nr,Avg,unit,event_name,run,enable_percent,ratio,unit"
The metric output prints Socket_id, aggr nr, ratio and unit. It has to
skip through remaining five fields ie,
Avg,unit,event_name,run,enable_percent. The csv line callback uses
"os->nfields" to know the number of fields to skip to match with other
lines.
Currently it is set as:
os.nfields = 3 + aggr_fields[config->aggr_mode] + (counter->cgrp ? 1 : 0);
But in case of aggregation modes, csv_sep already gets printed along
with each field (Function "aggr_printout" in util/stat-display.c). So
aggr_fields can be removed from nfields. And fixed number of fields to
skip has to be "4". This is to skip fields for: "avg, unit, event name,
run, enable_percent"
This needs 4 csv separators. Patch removes aggr_fields
and uses 4 as fixed number of os->nfields to skip.
After the patch:
# ./perf stat -x, --per-socket -a -C 1 ls
S0,1,79.08,msec,cpu-clock,79085956,100.00,1.000,CPUs utilized
S0,1,7,,context-switches,79084176,100.00,88.514,/sec
------
====> S0,1,,,,,,0.81,stalled cycles per insn
Fixes: 92a61f6412 ("perf stat: Implement CSV metrics output")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20221205042852.83382-1-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This adds all remaining branch filters i.e "no_cycles", "no_flags" and
"hw_index". While here, also updates the documentation.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20221205064443.533587-1-anshuman.khandual@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We need to check if we have a OS prefix, otherwise we stumble on a
metric segv that I'm now seeing in Arnaldo's tree:
$ gdb --args perf stat -M Backend true
...
Performance counter stats for 'true':
4,712,355 TOPDOWN.SLOTS # 17.3 % tma_core_bound
Program received signal SIGSEGV, Segmentation fault.
__strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
77 ../sysdeps/x86_64/multiarch/strlen-evex.S: No such file or directory.
(gdb) bt
#0 __strlen_evex () at ../sysdeps/x86_64/multiarch/strlen-evex.S:77
#1 0x00007ffff74749a5 in __GI__IO_fputs (str=0x0, fp=0x7ffff75f5680 <_IO_2_1_stderr_>)
#2 0x0000555555779f28 in do_new_line_std (config=0x555555e077c0 <stat_config>, os=0x7fffffffbf10) at util/stat-display.c:356
#3 0x000055555577a081 in print_metric_std (config=0x555555e077c0 <stat_config>, ctx=0x7fffffffbf10, color=0x0, fmt=0x5555558b77b5 "%8.1f", unit=0x7fffffffbb10 "% tma_memory_bound", val=13.165355724442199) at util/stat-display.c:380
#4 0x00005555557768b6 in generic_metric (config=0x555555e077c0 <stat_config>, metric_expr=0x55555593d5b7 "((CYCLE_ACTIVITY.STALLS_MEM_ANY + EXE_ACTIVITY.BOUND_ON_STORES) / (CYCLE_ACTIVITY.STALLS_TOTAL + (EXE_ACTIVITY.1_PORTS_UTIL + tma_retiring * EXE_ACTIVITY.2_PORTS_UTIL) + EXE_ACTIVITY.BOUND_ON_STORES))"..., metric_events=0x555555f334e0, metric_refs=0x555555ec81d0, name=0x555555f32e80 "TOPDOWN.SLOTS", metric_name=0x555555f26c80 "tma_memory_bound", metric_unit=0x55555593d5b1 "100%", runtime=0, map_idx=0, out=0x7fffffffbd90, st=0x555555e9e620 <rt_stat>) at util/stat-shadow.c:934
#5 0x0000555555778cac in perf_stat__print_shadow_stats (config=0x555555e077c0 <stat_config>, evsel=0x555555f289d0, avg=4712355, map_idx=0, out=0x7fffffffbd90, metric_events=0x555555e078e8 <stat_config+296>, st=0x555555e9e620 <rt_stat>) at util/stat-shadow.c:1329
#6 0x000055555577b6a0 in printout (config=0x555555e077c0 <stat_config>, os=0x7fffffffbf10, uval=4712355, run=325322, ena=325322, noise=4712355, map_idx=0) at util/stat-display.c:741
#7 0x000055555577bc74 in print_counter_aggrdata (config=0x555555e077c0 <stat_config>, counter=0x555555f289d0, s=0, os=0x7fffffffbf10) at util/stat-display.c:838
#8 0x000055555577c1d8 in print_counter (config=0x555555e077c0 <stat_config>, counter=0x555555f289d0, os=0x7fffffffbf10) at util/stat-display.c:957
#9 0x000055555577dba0 in evlist__print_counters (evlist=0x555555ec3610, config=0x555555e077c0 <stat_config>, _target=0x555555e01c80 <target>, ts=0x0, argc=1, argv=0x7fffffffe450) at util/stat-display.c:1413
#10 0x00005555555fc821 in print_counters (ts=0x0, argc=1, argv=0x7fffffffe450) at builtin-stat.c:1040
#11 0x000055555560091a in cmd_stat (argc=1, argv=0x7fffffffe450) at builtin-stat.c:2665
#12 0x00005555556b1eea in run_builtin (p=0x555555e11f70 <commands+336>, argc=4, argv=0x7fffffffe450) at perf.c:322
#13 0x00005555556b2181 in handle_internal_command (argc=4, argv=0x7fffffffe450) at perf.c:376
#14 0x00005555556b22d7 in run_argv (argcp=0x7fffffffe27c, argv=0x7fffffffe270) at perf.c:420
#15 0x00005555556b26ef in main (argc=4, argv=0x7fffffffe450) at perf.c:550
(gdb)
Fixes: f123b2d84e ("perf stat: Remove prefix argument in print_metric_headers()")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/CAP-5=fUOjSM5HajU9TCD6prY39LbX4OQbkEbtKPPGRBPBN=_VQ@mail.gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- More userfaultfs work from Peter Xu.
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying.
- Some filemap cleanups from Vishal Moola.
- David Hildenbrand added the ability to selftest anon memory COW handling.
- Some cpuset simplifications from Liu Shixin.
- Addition of vmalloc tracing support by Uladzislau Rezki.
- Some pagecache folioifications and simplifications from Matthew Wilcox.
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use it.
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword. This series shold have been in the
non-MM tree, my bad.
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages.
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages.
- Peter Xu utilized the PTE marker code for handling swapin errors.
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient.
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand.
- zram support for multiple compression streams from Sergey Senozhatsky.
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway.
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations.
- Vishal Moola removed the try_to_release_page() wrapper.
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache.
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking.
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend.
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range().
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen.
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect.
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages().
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting.
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines.
- Many singleton patches, as usual.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5j6ZwAKCRDdBJ7gKXxA
jkDYAP9qNeVqp9iuHjZNTqzMXkfmJPsw2kmy2P+VdzYVuQRcJgEAgoV9d7oMq4ml
CodAgiA51qwzId3GRytIo/tfWZSezgA=
=d19R
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- More userfaultfs work from Peter Xu
- Several convert-to-folios series from Sidhartha Kumar and Huang Ying
- Some filemap cleanups from Vishal Moola
- David Hildenbrand added the ability to selftest anon memory COW
handling
- Some cpuset simplifications from Liu Shixin
- Addition of vmalloc tracing support by Uladzislau Rezki
- Some pagecache folioifications and simplifications from Matthew
Wilcox
- A pagemap cleanup from Kefeng Wang: we have VM_ACCESS_FLAGS, so use
it
- Miguel Ojeda contributed some cleanups for our use of the
__no_sanitize_thread__ gcc keyword.
This series should have been in the non-MM tree, my bad
- Naoya Horiguchi improved the interaction between memory poisoning and
memory section removal for huge pages
- DAMON cleanups and tuneups from SeongJae Park
- Tony Luck fixed the handling of COW faults against poisoned pages
- Peter Xu utilized the PTE marker code for handling swapin errors
- Hugh Dickins reworked compound page mapcount handling, simplifying it
and making it more efficient
- Removal of the autonuma savedwrite infrastructure from Nadav Amit and
David Hildenbrand
- zram support for multiple compression streams from Sergey Senozhatsky
- David Hildenbrand reworked the GUP code's R/O long-term pinning so
that drivers no longer need to use the FOLL_FORCE workaround which
didn't work very well anyway
- Mel Gorman altered the page allocator so that local IRQs can remnain
enabled during per-cpu page allocations
- Vishal Moola removed the try_to_release_page() wrapper
- Stefan Roesch added some per-BDI sysfs tunables which are used to
prevent network block devices from dirtying excessive amounts of
pagecache
- David Hildenbrand did some cleanup and repair work on KSM COW
breaking
- Nhat Pham and Johannes Weiner have implemented writeback in zswap's
zsmalloc backend
- Brian Foster has fixed a longstanding corner-case oddity in
file[map]_write_and_wait_range()
- sparse-vmemmap changes for MIPS, LoongArch and NIOS2 from Feiyang
Chen
- Shiyang Ruan has done some work on fsdax, to make its reflink mode
work better under xfstests. Better, but still not perfect
- Christoph Hellwig has removed the .writepage() method from several
filesystems. They only need .writepages()
- Yosry Ahmed wrote a series which fixes the memcg reclaim target
beancounting
- David Hildenbrand has fixed some of our MM selftests for 32-bit
machines
- Many singleton patches, as usual
* tag 'mm-stable-2022-12-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (313 commits)
mm/hugetlb: set head flag before setting compound_order in __prep_compound_gigantic_folio
mm: mmu_gather: allow more than one batch of delayed rmaps
mm: fix typo in struct pglist_data code comment
kmsan: fix memcpy tests
mm: add cond_resched() in swapin_walk_pmd_entry()
mm: do not show fs mm pc for VM_LOCKONFAULT pages
selftests/vm: ksm_functional_tests: fixes for 32bit
selftests/vm: cow: fix compile warning on 32bit
selftests/vm: madv_populate: fix missing MADV_POPULATE_(READ|WRITE) definitions
mm/gup_test: fix PIN_LONGTERM_TEST_READ with highmem
mm,thp,rmap: fix races between updates of subpages_mapcount
mm: memcg: fix swapcached stat accounting
mm: add nodes= arg to memory.reclaim
mm: disable top-tier fallback to reclaim on proactive reclaim
selftests: cgroup: make sure reclaim target memcg is unprotected
selftests: cgroup: refactor proactive reclaim code to reclaim_until()
mm: memcg: fix stale protection of reclaim target memcg
mm/mmap: properly unaccount memory on mas_preallocate() failure
omfs: remove ->writepage
jfs: remove ->writepage
...
Add a test for bonding prio option. Here is the test result:
]# ./option_prio.sh
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=arp_ip_target and primary_reselect=2) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=1 monitor=miimon and primary_reselect=2) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=5 monitor=miimon and primary_reselect=2) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=0) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=1) [ OK ]
TEST: prio_test (Test bonding option 'prio' with mode=6 monitor=miimon and primary_reselect=2) [ OK ]
Signed-off-by: Liang Li <liali@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Core
----
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations.
- Add inet drop monitor support.
- A few GRO performance improvements.
- Add infrastructure for atomic dev stats, addressing long standing
data races.
- De-duplicate common code between OVS and conntrack offloading
infrastructure.
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements.
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up
the workload with the number of available CPUs.
- Add the helper support for connection-tracking OVS offload.
BPF
---
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF.
- Make cgroup local storage available to non-cgroup attached BPF
programs.
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers.
- A relevant bunch of BPF verifier fixes and improvements.
- Veristat tool improvements to support custom filtering, sorting,
and replay of results.
- Add LLVM disassembler as default library for dumping JITed code.
- Lots of new BPF documentation for various BPF maps.
- Add bpf_rcu_read_{,un}lock() support for sleepable programs.
- Add RCU grace period chaining to BPF to wait for the completion
of access from both sleepable and non-sleepable BPF programs.
- Add support storing struct task_struct objects as kptrs in maps.
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values.
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions.
Protocols
---------
- TCP: implement Protective Load Balancing across switch links.
- TCP: allow dynamically disabling TCP-MD5 static key, reverting
back to fast[er]-path.
- UDP: Introduce optional per-netns hash lookup table.
- IPv6: simplify and cleanup sockets disposal.
- Netlink: support different type policies for each generic
netlink operation.
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support.
- MPTCP: add netlink notification support for listener sockets
events.
- SCTP: add VRF support, allowing sctp sockets binding to VRF
devices.
- Add bridging MAC Authentication Bypass (MAB) support.
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios.
- More work for Wi-Fi 7 support, comprising conversion of all
the existing drivers to internal TX queue usage.
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading.
- IPSec: extended ack support for more descriptive XFRM error
reporting.
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking.
- IEEE 802154: synchronous send frame and extended filtering
support, initial support for scanning available 15.4 networks.
- Tun: bump the link speed from 10Mbps to 10Gbps.
- Tun/VirtioNet: implement UDP segmentation offload support.
Driver API
----------
- PHY/SFP: improve power level switching between standard
level 1 and the higher power levels.
- New API for netdev <-> devlink_port linkage.
- PTP: convert existing drivers to new frequency adjustment
implementation.
- DSA: add support for rx offloading.
- Autoload DSA tagging driver when dynamically changing protocol.
- Add new PCP and APPTRUST attributes to Data Center Bridging.
- Add configuration support for 800Gbps link speed.
- Add devlink port function attribute to enable/disable RoCE and
migratable.
- Extend devlink-rate to support strict prioriry and weighted fair
queuing.
- Add devlink support to directly reading from region memory.
- New device tree helper to fetch MAC address from nvmem.
- New big TCP helper to simplify temporary header stripping.
New hardware / drivers
----------------------
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches.
- Marvel Prestera AC5X Ethernet Switch.
- WangXun 10 Gigabit NIC.
- Motorcomm yt8521 Gigabit Ethernet.
- Microchip ksz9563 Gigabit Ethernet Switch.
- Microsoft Azure Network Adapter.
- Linux Automation 10Base-T1L adapter.
- PHY:
- Aquantia AQR112 and AQR412.
- Motorcomm YT8531S.
- PTP:
- Orolia ART-CARD.
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices.
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices.
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets.
- Realtek RTL8852BE and RTL8723DS.
- Cypress.CYW4373A0 WiFi + Bluetooth combo device.
Drivers
-------
- CAN:
- gs_usb: bus error reporting support.
- kvaser_usb: listen only and bus error reporting support.
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping.
- implement devlink-rate support.
- support direct read from memory.
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate.
- Support for enhanced events compression.
- extend H/W offload packet manipulation capabilities.
- implement IPSec packet offload mode.
- nVidia/Mellanox (mlx4):
- better big TCP support.
- Netronome Ethernet NICs (nfp):
- IPsec offload support.
- add support for multicast filter.
- Broadcom:
- RSS and PTP support improvements.
- AMD/SolarFlare:
- netlink extened ack improvements.
- add basic flower matches to offload, and related stats.
- Virtual NICs:
- ibmvnic: introduce affinity hint support.
- small / embedded:
- FreeScale fec: add initial XDP support.
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood.
- TI am65-cpsw: add suspend/resume support.
- Mediatek MT7986: add RX wireless wthernet dispatch support.
- Realtek 8169: enable GRO software interrupt coalescing per
default.
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP.
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support.
- add ip6gre support.
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support.
- enable flow offload support.
- Renesas:
- add rswitch R-Car Gen4 gPTP support.
- Microchip (lan966x):
- add full XDP support.
- add TC H/W offload via VCAP.
- enable PTP on bridge interfaces.
- Microchip (ksz8):
- add MTU support for KSZ8 series.
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan.
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support.
- add ack signal support.
- enable coredump support.
- remain_on_channel support.
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities.
- 320 MHz channels support.
- RealTek WiFi (rtw89):
- new dynamic header firmware format support.
- wake-over-WLAN support.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmOYXUcSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOk8zQP/R7BZtbJMTPiWkRnSoKHnAyupDVwrz5U
ktukLkwPsCyJuEbAjgxrxf4EEEQ9uq2FFlxNSYuKiiQMqIpFxV6KED7LCUygn4Tc
kxtkp0Q+5XiqisWlQmtfExf2OjuuPqcjV9tWCDBI6GebKUbfNwY/eI44RcMu4BSv
DzIlW5GkX/kZAPqnnuqaLsN3FudDTJHGEAD7NbA++7wJ076RWYSLXlFv0Z+SCSPS
H8/PEG0/ZK/65rIWMAFRClJ9BNIDwGVgp0GrsIvs1gqbRUOlA1hl1rDM21TqtNFf
5QPQT7sIfTcCE/nerxKJD5JE3JyP+XRlRn96PaRw3rt4MgI6I/EOj/HOKQ5tMCNc
oPiqb7N70+hkLZyr42qX+vN9eDPjp2koEQm7EO2Zs+/534/zWDs24Zfk/Aa1ps0I
Fa82oGjAgkBhGe/FZ6i5cYoLcyxqRqZV1Ws9XQMl72qRC7/BwvNbIW6beLpCRyeM
yYIU+0e9dEm+wHQEdh2niJuVtR63hy8tvmPx56lyh+6u0+pondkwbfSiC5aD3kAC
ikKsN5DyEsdXyiBAlytCEBxnaOjQy4RAz+3YXSiS0eBNacXp03UUrNGx4Pzpu/D0
QLFJhBnMFFCgy5to8/DvKnrTPgZdSURwqbIUcZdvU21f1HLR8tUTpaQnYffc/Whm
V8gnt1EL+0cc
=CbJC
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing
data races
- De-duplicate common code between OVS and conntrack offloading
infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the
workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF
programs
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting,
and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of
access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back
to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink
operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the
existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error
reporting
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking
- IEEE 802154: synchronous send frame and extended filtering support,
initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and
the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment
implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and
migratable
- Extend devlink-rate to support strict prioriry and weighted fair
queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches
- Marvel Prestera AC5X Ethernet Switch
- WangXun 10 Gigabit NIC
- Motorcomm yt8521 Gigabit Ethernet
- Microchip ksz9563 Gigabit Ethernet Switch
- Microsoft Azure Network Adapter
- Linux Automation 10Base-T1L adapter
- PHY:
- Aquantia AQR112 and AQR412
- Motorcomm YT8531S
- PTP:
- Orolia ART-CARD
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets
- Realtek RTL8852BE and RTL8723DS
- Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN:
- gs_usb: bus error reporting support
- kvaser_usb: listen only and bus error reporting support
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping
- implement devlink-rate support
- support direct read from memory
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate
- Support for enhanced events compression
- extend H/W offload packet manipulation capabilities
- implement IPSec packet offload mode
- nVidia/Mellanox (mlx4):
- better big TCP support
- Netronome Ethernet NICs (nfp):
- IPsec offload support
- add support for multicast filter
- Broadcom:
- RSS and PTP support improvements
- AMD/SolarFlare:
- netlink extened ack improvements
- add basic flower matches to offload, and related stats
- Virtual NICs:
- ibmvnic: introduce affinity hint support
- small / embedded:
- FreeScale fec: add initial XDP support
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
- TI am65-cpsw: add suspend/resume support
- Mediatek MT7986: add RX wireless wthernet dispatch support
- Realtek 8169: enable GRO software interrupt coalescing per
default
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support
- add ip6gre support
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support
- enable flow offload support
- Renesas:
- add rswitch R-Car Gen4 gPTP support
- Microchip (lan966x):
- add full XDP support
- add TC H/W offload via VCAP
- enable PTP on bridge interfaces
- Microchip (ksz8):
- add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support
- add ack signal support
- enable coredump support
- remain_on_channel support
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
- RealTek WiFi (rtw89):
- new dynamic header firmware format support
- wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
ipvs: fix type warning in do_div() on 32 bit
net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
net: ipa: add IPA v4.7 support
dt-bindings: net: qcom,ipa: Add SM6350 compatible
bnxt: Use generic HBH removal helper in tx path
IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
selftests: forwarding: Add bridge MDB test
selftests: forwarding: Rename bridge_mdb test
bridge: mcast: Support replacement of MDB port group entries
bridge: mcast: Allow user space to specify MDB entry routing protocol
bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
bridge: mcast: Add support for (*, G) with a source list and filter mode
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
bridge: mcast: Add a flag for user installed source entries
bridge: mcast: Expose __br_multicast_del_group_src()
bridge: mcast: Expose br_multicast_new_group_src()
bridge: mcast: Add a centralized error path
bridge: mcast: Place netlink policy before validation functions
bridge: mcast: Split (*, G) and (S, G) addition into different functions
bridge: mcast: Do not derive entry type from its filter mode
...
This looks like a relatively calm development cycle; there have been
only few changes in ALSA and ASoC core sides while we get lots of
device-specific fixes and updates as usual. Most of commits are about
ASoC, including Intel SOF/AVS and many device tree updates.
Below are some highlights:
Core:
- Improvement in memalloc helper for fallback allocations
- More cleanups of ASoC DAPM code
ASoC:
- Factoring out of mapping hw_params onto SoundWire configuration
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740
- Lots of device tree conversions and fixups
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
ALSA:
- Addition of PCM kselftest; still minimalistic but can be extended
in future
- Fixes for corner-case XRUNs with USB-audio implicit feedback mode
- Usual device-specific quirk updates for USB- and HD-audio
- FireWire DICE updates
Also, this PR also contains a few cross-tree updates:
- Some OMAP board file updates for removal of relevant OMAP platforms
- A new I2C API update for I2C probe API adaption
- A DRM update for the further hdmi-codec updates
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmOR6TEOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE9p4w/8CoC/jEVFoOFVeH4/ur3MSGv93iDlPlA9sg1B
BMtUEsa+yUtlPjZfw/ZdnUvWGGkvSTIAA3Tyc+yrx+WYAJeoWsL6vpkjQcoKBFLV
oOo/dLROqeK6kS3cir0Z5VzaTg29XNz7iwe2wMp2q0irjbVZVy0+TUa2bzNOAdbs
Hupu5Vwx2lKINSKjWVbN+3g4LiMW+VyEavNZf7bZNxI5h/4p1oaOj/lJrsHCEX5y
rj1+d4EJntaFHToPf+4YkrMjLji0Yj9qsIWeXWy0Q5aUCyNr4zA3LrSszyM5cYfC
dBPPrFatvXt+N0SVTURX7VnKgYzLlG8TNwXPUJbfnTGzvXHzd5q08MHWm2ZyF2tf
3wDR+Lrw97WLWvGKQjHpg1ZFWmqSTC6D+9ihGCNRq0pMW6EtmxHtkDxhD45WF1Wq
UQJNYHWbpSQye+wwio1/JZCZ55x89utapaRXD9cTZCDoCBKOcaUsr71hNt56HL/3
5dT6fx1pJwyaR+SPJg7DQlnPGnm4J8cJhwi+WuHME9IECjO10b9o5ThcxaNWY3W7
ysVCk2jLJHOZTG4FDun5mEqyWEmnjrUAH9UZtCSQZdhYCk8E8C9B2trKUAh9nb/p
bUCrNdoopY5SpUCadPT7HtDiNXNWYMnpd7ktUun2z7V0u8pZNnhNUVvOuzFc/gT1
ypWJp+0=
=QV3a
-----END PGP SIGNATURE-----
Merge tag 'sound-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"This looks like a relatively calm development cycle; there have been
only few changes in ALSA and ASoC core sides while we get lots of
device-specific fixes and updates as usual. Most of commits are about
ASoC, including Intel SOF/AVS and many device tree updates.
Below are some highlights:
Core:
- Improvement in memalloc helper for fallback allocations
- More cleanups of ASoC DAPM code
ASoC:
- Factoring out of mapping hw_params onto SoundWire configuration
- The ever ongoing overhauls of the Intel DSP code continue,
including support for loading libraries and probes with IPC4 on
SOF.
- Support for more sample formats on JZ4740
- Lots of device tree conversions and fixups
- Support for Allwinner D1, a range of AMD and Intel systems,
Mediatek systems with multiple DMICs, Nuvoton NAU8318, NXP
fsl_rpmsg and i.MX93, Qualcomm AudioReach Enable, MFC and SAL,
RealTek RT1318 and Rockchip RK3588
ALSA:
- Addition of PCM kselftest; still minimalistic but can be extended
in future
- Fixes for corner-case XRUNs with USB-audio implicit feedback mode
- Usual device-specific quirk updates for USB- and HD-audio
- FireWire DICE updates
This also contains a few cross-tree updates:
- Some OMAP board file updates for removal of relevant OMAP platforms
- A new I2C API update for I2C probe API adaption
- A DRM update for the further hdmi-codec updates"
* tag 'sound-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (417 commits)
ALSA: mts64: fix possible null-ptr-defer in snd_mts64_interrupt
ALSA: patch_realtek: Fix Dell Inspiron Plus 16
ALSA: hda/cirrus: Add extra 10 ms delay to allow PLL settle and lock.
ASoC: dt-bindings: Correct Alexandre Belloni email
ASoC: dt-bindings: maxim,max98504: Convert to DT schema
ASoC: dt-bindings: maxim,max98357a: Convert to DT schema
ASoC: dt-bindings: Reference common DAI properties
ASoC: dt-bindings: Extend name-prefix.yaml into common DAI properties
ASoC: rt715: Make read-only arrays capture_reg_H and capture_reg_L static const
ASoC: uniphier: aio-core: Make some read-only arrays static const
ASoC: wcd938x: Make read-only array minCode_param static const
ASoC: qcom: lpass-sc7280: Add maybe_unused tag for system PM ops
ASoC : SOF: amd: Add support for IPC and DSP dumps
ASoC: SOF: amd: Use poll function instead to read ACP_SHA_DSP_FW_QUALIFIER
ALSA: usb-audio: Workaround for XRUN at prepare
ALSA: pcm: Handle XRUN at trigger START
ALSA: pcm: Set missing stop_operating flag at undoing trigger start
drm: tda99x: Don't advertise non-existent capture support
ASoC: hdmi-codec: Allow playback and capture to be disabled
kselftest/alsa: Add more coverage of sample rates and channel counts
...
-----BEGIN PGP SIGNATURE-----
iIYEABYIAC4WIQSVyBthFV4iTW/VU1/l49DojIL20gUCY5b27RAcbWljQGRpZ2lr
b2QubmV0AAoJEOXj0OiMgvbSg9YA/0K10H+VsGt1+qqR4+w9SM7SFzbgszrV3Yw9
rwiPgaPVAP9rxXPr2bD2hAk7/Lv9LeJ2kfM9RzMErP1A6UsC5YVbDA==
=mAG7
-----END PGP SIGNATURE-----
Merge tag 'landlock-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux
Pull landlock updates from Mickaël Salaün:
"This adds file truncation support to Landlock, contributed by Günther
Noack. As described by Günther [1], the goal of these patches is to
work towards a more complete coverage of file system operations that
are restrictable with Landlock.
The known set of currently unsupported file system operations in
Landlock is described at [2]. Out of the operations listed there,
truncate is the only one that modifies file contents, so these patches
should make it possible to prevent the direct modification of file
contents with Landlock.
The new LANDLOCK_ACCESS_FS_TRUNCATE access right covers both the
truncate(2) and ftruncate(2) families of syscalls, as well as open(2)
with the O_TRUNC flag. This includes usages of creat() in the case
where existing regular files are overwritten.
Additionally, this introduces a new Landlock security blob associated
with opened files, to track the available Landlock access rights at
the time of opening the file. This is in line with Unix's general
approach of checking the read and write permissions during open(), and
associating this previously checked authorization with the opened
file. An ongoing patch documents this use case [3].
In order to treat truncate(2) and ftruncate(2) calls differently in an
LSM hook, we split apart the existing security_path_truncate hook into
security_path_truncate (for truncation by path) and
security_file_truncate (for truncation of previously opened files)"
Link: https://lore.kernel.org/r/20221018182216.301684-1-gnoack3000@gmail.com [1]
Link: https://www.kernel.org/doc/html/v6.1/userspace-api/landlock.html#filesystem-flags [2]
Link: https://lore.kernel.org/r/20221209193813.972012-1-mic@digikod.net [3]
* tag 'landlock-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux:
samples/landlock: Document best-effort approach for LANDLOCK_ACCESS_FS_REFER
landlock: Document Landlock's file truncation support
samples/landlock: Extend sample tool to support LANDLOCK_ACCESS_FS_TRUNCATE
selftests/landlock: Test ftruncate on FDs created by memfd_create(2)
selftests/landlock: Test FD passing from restricted to unrestricted processes
selftests/landlock: Locally define __maybe_unused
selftests/landlock: Test open() and ftruncate() in multiple scenarios
selftests/landlock: Test file truncation support
landlock: Support file truncation
landlock: Document init_layer_masks() helper
landlock: Refactor check_access_path_dual() into is_access_to_paths_allowed()
security: Create file_truncate hook from path_truncate hook
- A ptrace API cleanup series from Sergey Shtylyov
- Fixes and cleanups for kexec from ye xingchen
- nilfs2 updates from Ryusuke Konishi
- squashfs feature work from Xiaoming Ni: permit configuration of the
filesystem's compression concurrency from the mount command line.
- A series from Akinobu Mita which addresses bound checking errors when
writing to debugfs files.
- A series from Yang Yingliang to address rapido memory leaks
- A series from Zheng Yejian to address possible overflow errors in
encode_comp_t().
- And a whole shower of singleton patches all over the place.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5efRgAKCRDdBJ7gKXxA
jgvdAP0al6oFDtaSsshIdNhrzcMwfjt6PfVxxHdLmNhF1hX2dwD/SVluS1bPSP7y
0sZp7Ustu3YTb8aFkMl96Y9m9mY1Nwg=
=ga5B
-----END PGP SIGNATURE-----
Merge tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- A ptrace API cleanup series from Sergey Shtylyov
- Fixes and cleanups for kexec from ye xingchen
- nilfs2 updates from Ryusuke Konishi
- squashfs feature work from Xiaoming Ni: permit configuration of the
filesystem's compression concurrency from the mount command line
- A series from Akinobu Mita which addresses bound checking errors when
writing to debugfs files
- A series from Yang Yingliang to address rapidio memory leaks
- A series from Zheng Yejian to address possible overflow errors in
encode_comp_t()
- And a whole shower of singleton patches all over the place
* tag 'mm-nonmm-stable-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (79 commits)
ipc: fix memory leak in init_mqueue_fs()
hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
rapidio: devices: fix missing put_device in mport_cdev_open
kcov: fix spelling typos in comments
hfs: Fix OOB Write in hfs_asc2mac
hfs: fix OOB Read in __hfs_brec_find
relay: fix type mismatch when allocating memory in relay_create_buf()
ocfs2: always read both high and low parts of dinode link count
io-mapping: move some code within the include guarded section
kernel: kcsan: kcsan_test: build without structleak plugin
mailmap: update email for Iskren Chernev
eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD
rapidio: fix possible UAF when kfifo_alloc() fails
relay: use strscpy() is more robust and safer
cpumask: limit visibility of FORCE_NR_CPUS
acct: fix potential integer overflow in encode_comp_t()
acct: fix accuracy loss for input value of encode_comp_t()
linux/init.h: include <linux/build_bug.h> and <linux/stringify.h>
rapidio: rio: fix possible name leak in rio_register_mport()
rapidio: fix possible name leaks when rio_add_device() fails
...
- New tool "rv" for starting and stopping runtime verification:
example: ./rv mon wip -r printk -v
Enables the wake-in-preempt monitor and the printk reactor in verbose mode
- Fix exit status of rtla usage() calls
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5e8IRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qpicAQD+Ov5Exy0H19l35oK0ojvG7LUeQpiY
f50AOTOlA71KqAD/fI9cdCT+BfmD6b2lD5iZc0k+VqU+XtCZj0iHsq2H7go=
=wxpw
-----END PGP SIGNATURE-----
Merge tag 'trace-tools-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing tools updates from Steven Rostedt:
- New tool "rv" for starting and stopping runtime verification.
Example:
./rv mon wip -r printk -v
Enables the wake-in-preempt monitor and the printk reactor in verbose
mode
- Fix exit status of rtla usage() calls
* tag 'trace-tools-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
Documentation/rv: Add verification/rv man pages
tools/rv: Add in-kernel monitor interface
rv: Add rv tool
rtla: Fix exit status when returning from calls to usage()
- Fix minconfig test to unset the config and not relying on
olddefconfig to do it, as some configs are set to default y
- Fix reading grub2 menus for handling submenus
- Add new ${shell <cmd>} to execute shell commands that will be useful
for setting variables like: HOSTNAME := ${shell hostname}
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCY5erBBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qqeoAQDzt97bAEfWfPGoKBWjtVs/TVIrpyVt
WGbrRwJzdgIrigD/SUBHq4irLD85UpGSG3EiHZRcyJxn8Wuv7npNgtpexQA=
=oPSv
-----END PGP SIGNATURE-----
Merge tag 'ktest-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest
Pull ktest updates from Steven Rostedt:
- Fix minconfig test to unset the config and not relying on
olddefconfig to do it, as some configs are set to default y
- Fix reading grub2 menus for handling submenus
- Add new ${shell <cmd>} to execute shell commands that will be useful
for setting variables like: HOSTNAME := ${shell hostname}
* tag 'ktest-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest.pl: Add shell commands to variables
kest.pl: Fix grub2 menu handling for rebooting
ktest.pl minconfig: Unset configs instead of just removing them
This KUnit next update for Linux 6.2-rc1 consists of several enhancements,
fixes, clean-ups, documentation updates, improvements to logging and KTAP
compliance of KUnit test output:
- log numbers in decimal and hex
- parse KTAP compliant test output
- allow conditionally exposing static symbols to tests
when KUNIT is enabled
- make static symbols visible during kunit testing
- clean-ups to remove unused structure definition
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmOXnPYACgkQCwJExA0N
Qxwf9RAAwdBKxgPZuKZ40v69Jm8YhaO3vyKUkyYRH59/HQGFUHMA2f2ONez4krEX
iXPgBFQ+7pB63FdgQi2HSg2z/u3xY02AaGgZGXDuNJDmg2xYjNDfZ0GjN6tuavlN
Liz01DGZkjZoVVXM6oV2xT8woBg/0BbdkKNL1OBO9RBZFHzwDryRzfXmQb8cKlNr
S+tkeZTlCA/s7UW2LNj4VlTzn6wgni4Y9gSk4wbQmSGWn3OX3rHaqAb7GiZ/yPGb
1WjbMeE8FwyydLU40aOZZ8V6AJRiw5VGPJyFzWJyWZ21xOgN9Z95b+I36z8RXraA
i/wnazO/FJsrhzvKL83rQkrSW6bpmVY+jGvk+L6deFM6Ro/vEWHJ4DgyKsIdMiJy
gUM1Q69szptq+ZRHGrZWPlVONBkBXMOL+fePbCbGcMzlaEAS/zsFYW9IBKcvLzwP
uHzzMS/cMmSUq52ZIyl9jhHQFVSoErCpJwQjAaZBQpYXPmE7yLcZItxnCaSUQTay
bRwyps5ph5md0oJTTFJKZ4Zx5FJ2ItjbC4y9BIexb9gYRDdRq723ivDoVENZl/Zk
DFIV95AY+mSxadS5vFagwWwX0ZN0KFKxeM8Tw7VTimal/0Sbglqp+oflsuKFD6JQ
b5HUixYifKMbWxkH5xrUb8NdjmBj561TYa8U4N+j3oOiaPYu5Ss=
=UQNn
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
"Several enhancements, fixes, clean-ups, documentation updates,
improvements to logging and KTAP compliance of KUnit test output:
- log numbers in decimal and hex
- parse KTAP compliant test output
- allow conditionally exposing static symbols to tests when KUNIT is
enabled
- make static symbols visible during kunit testing
- clean-ups to remove unused structure definition"
* tag 'linux-kselftest-kunit-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (29 commits)
Documentation: dev-tools: Clarify requirements for result description
apparmor: test: make static symbols visible during kunit testing
kunit: add macro to allow conditionally exposing static symbols to tests
kunit: tool: make parser preserve whitespace when printing test log
Documentation: kunit: Fix "How Do I Use This" / "Next Steps" sections
kunit: tool: don't include KTAP headers and the like in the test log
kunit: improve KTAP compliance of KUnit test output
kunit: tool: parse KTAP compliant test output
mm: slub: test: Use the kunit_get_current_test() function
kunit: Use the static key when retrieving the current test
kunit: Provide a static key to check if KUnit is actively running tests
kunit: tool: make --json do nothing if --raw_ouput is set
kunit: tool: tweak error message when no KTAP found
kunit: remove KUNIT_INIT_MEM_ASSERTION macro
Documentation: kunit: Remove redundant 'tips.rst' page
Documentation: KUnit: reword description of assertions
Documentation: KUnit: make usage.rst a superset of tips.rst, remove duplication
kunit: eliminate KUNIT_INIT_*_ASSERT_STRUCT macros
kunit: tool: remove redundant file.close() call in unit test
kunit: tool: unit tests all check parser errors, standardize formatting a bit
...
This Kselftest update for Linux 6.2-rc1 consists of several fixes
and enhancements to existing tests and a few new tests:
- adds new amd-pstate and fixes and enhances existing ones
- adds new watchdog tests and enhances existing ones to improve coverage
- fixes to ftrace, splice_read, rtc, and efivars tests
- fixes to handle egrep obsolescence in the latest grep release
- miscellaneous spelling and SPDX fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmOXdT4ACgkQCwJExA0N
Qxy06RAAvQN6kpzCGJjLz5R9Lx3QpG4BiCO7vdZs1QTwzo8kTGQDq3JD6m+ychDx
CgLnH7RrPlIYz4oExnV7JPE2tFEarV/zFh2V8LjKfGePZVtNeDASlC7F3lWYUnM/
n3+6H/JbZ1BgGE9DE5/DAOOAsN0CY2QPJWRDN1wYH7/gLXulPlSt+BV/ZFj3LG/0
Qne2SR7kc+hKPOFNl+BWKOU2a4mNOmoxaROgQraKdeQMQoTAwz/7lfylYZD9nU0r
nyVxHTr0n+/XX3Q93arAS/chOyFBJrAESciUPY4E2oF97uiE0TqHdKA/qfPNRr7N
wSOdWxYSuNaz0tkzO01EzeGGr+mw0WlCNoo6NzsUvqzRXDf0F0cWe32tmIZHJAzS
CqxpKd6I8XPkEeyy5kL12q+akxe30zDGaKdaYGkZ7SjbwG6ygzSSW5MYfojvbtr9
Nfb6OnkPC1aZzC9jtiJO1EHd9f+PdeUVKNQsvzseT4b9xhmpxBqlrzgB5GakDoE6
uo3cXyz5gOzqJD6FT+CqKa/16NaHATw/U7/Y0gXj5ELKEmuYBmnl1T9svDnSIVfF
hgS/3UkFYiw3R2oW35wv988w2JsXrkItOyNdAm47ihvAHF/uCumcSeea5k3+QYrH
7bM4PzJsMMcOhWWQ/04Q+LQCWWem/Vhk22BlIr6IiuGd6L03pc8=
=4cDX
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
"Several fixes and enhancements to existing tests and a few new tests:
- add new amd-pstate tests and fix and enhance existing ones
- add new watchdog tests and enhance existing ones to improve
coverage
- fixes to ftrace, splice_read, rtc, and efivars tests
- fixes to handle egrep obsolescence in the latest grep release
- miscellaneous spelling and SPDX fixes"
* tag 'linux-kselftest-next-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (24 commits)
selftests/ftrace: Use long for synthetic event probe test
selftests/tpm2: Split async tests call to separate shell script runner
selftests: splice_read: Fix sysfs read cases
selftests: ftrace: Use "grep -E" instead of "egrep"
selftests: gpio: Use "grep -E" instead of "egrep"
selftests: kselftest_deps: Use "grep -E" instead of "egrep"
selftests/efivarfs: Add checking of the test return value
cpufreq: amd-pstate: fix spdxcheck warnings for amd-pstate-ut.c
selftests: rtc: skip when RTC is not present
selftests/ftrace: event_triggers: wait longer for test_event_enable
selftests/vDSO: Add riscv getcpu & gettimeofday test
Documentation: amd-pstate: Add tbench and gitsource test introduction
selftests: amd-pstate: Trigger gitsource benchmark and test cpus
selftests: amd-pstate: Trigger tbench benchmark and test cpus
selftests: amd-pstate: Split basic.sh into run.sh and basic.sh.
selftests: amd-pstate: Rename amd-pstate-ut.sh to basic.sh.
selftests/ftrace: Convert tracer tests to use 'requires' to specify program dependency
selftests/ftrace: Add check for ping command for trigger tests
selftests/watchdog: Fix spelling mistake "Temeprature" -> "Temperature"
selftests/watchdog: add test for WDIOC_GETTEMP
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
/ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
=QRhK
-----END PGP SIGNATURE-----
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it,
there is now a new family of functions that uses fast rejection
sampling to choose properly uniformly random numbers within an
interval:
get_random_u32_below(ceil) - [0, ceil)
get_random_u32_above(floor) - (floor, U32_MAX]
get_random_u32_inclusive(floor, ceil) - [floor, ceil]
Coccinelle was used to convert all current users of
prandom_u32_max(), as well as many open-coded patterns, resulting in
improvements throughout the tree.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused
prandom_u32_max() function, just in case any other trees add a new
use case of it that needs to converted. According to linux-next,
there may be two trivial cases of prandom_u32_max() reintroductions
that are fixable with a 's/.../.../'. So I'll have for you a final
conversion patch doing that alongside the removal patch during the
second week.
This is a treewide change that touches many files throughout.
- More consistent use of get_random_canary().
- Updates to comments, documentation, tests, headers, and
simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and
wasn't entirely useful, so this has been replaced by code that works
in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI
variables, refreshing a variable with a fresh seed when the RNG is
initialized. The RNG GUID namespace is then hidden from efivarfs to
prevent accidental leakage.
These changes are split into random.c infrastructure code used in the
EFI subsystem, in this pull request, and related support inside of
EFISTUB, in Ard's EFI tree. These are co-dependent for full
functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for
an improvement to the way vsprintf initializes its siphash key,
replacing an sleep loop wart.
- The hardware RNG framework now always calls its correct random.c
input function, add_hwgenerator_randomness(), rather than sometimes
going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork
handler, but is a no-op when the latent entropy gcc plugin isn't
used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed
in beside the latent entropy variable. So now, if the latent entropy
gcc plugin isn't enabled, add_latent_entropy() will expand to a call
to add_device_randomness(NULL, 0), which adds a cycle counter,
without the absent latent entropy variable.
- The RNG is now reseeded from a delayed worker, rather than on demand
when used. Always running from a worker allows it to make use of the
CPU RNG on platforms like S390x, whose instructions are too slow to
do so from interrupts. It also has the effect of adding in new inputs
more frequently with more regularity, amounting to a long term
transcript of random values. Plus, it helps a bit with the upcoming
vDSO implementation (which isn't yet ready for 6.2).
- The jitter entropy algorithm now tries to execute on many different
CPUs, round-robining, in hopes of hitting even more memory latencies
and other unpredictable effects. It also will mix in a cycle counter
when the entropy timer fires, in addition to being mixed in from the
main loop, to account more explicitly for fluctuations in that timer
firing. And the state it touches is now kept within the same cache
line, so that it's assured that the different execution contexts will
cause latencies.
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
random: include <linux/once.h> in the right header
random: align entropy_timer_state to cache line
random: mix in cycle counter when jitter timer fires
random: spread out jitter callback to different CPUs
random: remove extraneous period and add a missing one in comments
efi: random: refresh non-volatile random seed when RNG is initialized
vsprintf: initialize siphash key using notifier
random: add back async readiness notifier
random: reseed in delayed work rather than on-demand
random: always mix cycle counter in add_latent_entropy()
hw_random: use add_hwgenerator_randomness() for early entropy
random: modernize documentation comment on get_random_bytes()
random: adjust comment to account for removed function
random: remove early archrandom abstraction
random: use random.trust_{bootloader,cpu} command line option only
stackprotector: actually use get_random_canary()
stackprotector: move get_random_canary() into stackprotector.h
treewide: use get_random_u32_inclusive() when possible
treewide: use get_random_u32_{above,below}() instead of manual loop
treewide: use get_random_u32_below() instead of deprecated function
...
Nothing too interesting.
* Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations kprobable.
* A couple cpuset optimizations.
* Other misc changes including doc and test updates.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCY5bHvg4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGcYrAQCfrlzrbWw6gTQ7fmr0Avxjy5FxLjsdzEGPcmGY
ByEMhgD/VdUf3zI/Khr91Gsi5JXQxQf7a5caD369xupRWUWjqA8=
=Nf+E
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
"Nothing too interesting:
- Add CONFIG_DEBUG_GROUP_REF which makes cgroup refcnt operations
kprobable
- A couple cpuset optimizations
- Other misc changes including doc and test updates"
* tag 'cgroup-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: remove rcu_read_lock()/rcu_read_unlock() in critical section of spin_lock_irq()
cgroup/cpuset: Improve cpuset_css_alloc() description
kselftest/cgroup: Add cleanup() to test_cpuset_prs.sh
cgroup/cpuset: Optimize cpuset_attach() on v2
cgroup/cpuset: Skip spread flags update on v2
kselftest/cgroup: Fix gathering number of CPUs
cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REF
cgroup: Implement DEBUG_CGROUP_REF
Add a selftests that includes the following test cases:
1. Configuration tests. Both valid and invalid configurations are
tested across all entry types (e.g., L2, IPv4).
2. Forwarding tests. Both host and port group entries are tested across
all entry types.
3. Interaction between user installed MDB entries and IGMP / MLD control
packets.
Example output:
INFO: # Host entries configuration tests
TEST: Common host entries configuration tests (IPv4) [ OK ]
TEST: Common host entries configuration tests (IPv6) [ OK ]
TEST: Common host entries configuration tests (L2) [ OK ]
INFO: # Port group entries configuration tests - (*, G)
TEST: Common port group entries configuration tests (IPv4 (*, G)) [ OK ]
TEST: Common port group entries configuration tests (IPv6 (*, G)) [ OK ]
TEST: IPv4 (*, G) port group entries configuration tests [ OK ]
TEST: IPv6 (*, G) port group entries configuration tests [ OK ]
INFO: # Port group entries configuration tests - (S, G)
TEST: Common port group entries configuration tests (IPv4 (S, G)) [ OK ]
TEST: Common port group entries configuration tests (IPv6 (S, G)) [ OK ]
TEST: IPv4 (S, G) port group entries configuration tests [ OK ]
TEST: IPv6 (S, G) port group entries configuration tests [ OK ]
INFO: # Port group entries configuration tests - L2
TEST: Common port group entries configuration tests (L2 (*, G)) [ OK ]
TEST: L2 (*, G) port group entries configuration tests [ OK ]
INFO: # Forwarding tests
TEST: IPv4 host entries forwarding tests [ OK ]
TEST: IPv6 host entries forwarding tests [ OK ]
TEST: L2 host entries forwarding tests [ OK ]
TEST: IPv4 port group "exclude" entries forwarding tests [ OK ]
TEST: IPv6 port group "exclude" entries forwarding tests [ OK ]
TEST: IPv4 port group "include" entries forwarding tests [ OK ]
TEST: IPv6 port group "include" entries forwarding tests [ OK ]
TEST: L2 port entries forwarding tests [ OK ]
INFO: # Control packets tests
TEST: IGMPv3 MODE_IS_INCLUE tests [ OK ]
TEST: MLDv2 MODE_IS_INCLUDE tests [ OK ]
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The test is only concerned with host MDB entries and not with MDB
entries as a whole. Rename the test to reflect that.
Subsequent patches will add a more general test that will contain the
test cases for host MDB entries and remove the current test.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
1) Incorrect error check in nft_expr_inner_parse(), from Dan Carpenter.
2) Add DATA_SENT state to SCTP connection tracking helper, from
Sriram Yagnaraman.
3) Consolidate nf_confirm for ipv4 and ipv6, from Florian Westphal.
4) Add bitmask support for ipset, from Vishwanath Pai.
5) Handle icmpv6 redirects as RELATED, from Florian Westphal.
6) Add WARN_ON_ONCE() to impossible case in flowtable datapath,
from Li Qiong.
7) A large batch of IPVS updates to replace timer-based estimators by
kthreads to scale up wrt. CPUs and workload (millions of estimators).
Julian Anastasov says:
This patchset implements stats estimation in kthread context.
It replaces the code that runs on single CPU in timer context every 2
seconds and causing latency splats as shown in reports [1], [2], [3].
The solution targets setups with thousands of IPVS services,
destinations and multi-CPU boxes.
Spread the estimation on multiple (configured) CPUs and multiple
time slots (timer ticks) by using multiple chains organized under RCU
rules. When stats are not needed, it is recommended to use
run_estimation=0 as already implemented before this change.
RCU Locking:
- As stats are now RCU-locked, tot_stats, svc and dest which
hold estimator structures are now always freed from RCU
callback. This ensures RCU grace period after the
ip_vs_stop_estimator() call.
Kthread data:
- every kthread works over its own data structure and all
such structures are attached to array. For now we limit
kthreads depending on the number of CPUs.
- even while there can be a kthread structure, its task
may not be running, eg. before first service is added or
while the sysctl var is set to an empty cpulist or
when run_estimation is set to 0 to disable the estimation.
- the allocated kthread context may grow from 1 to 50
allocated structures for timer ticks which saves memory for
setups with small number of estimators
- a task and its structure may be released if all
estimators are unlinked from its chains, leaving the
slot in the array empty
- every kthread data structure allows limited number
of estimators. Kthread 0 is also used to initially
calculate the max number of estimators to allow in every
chain considering a sub-100 microsecond cond_resched
rate. This number can be from 1 to hundreds.
- kthread 0 has an additional job of optimizing the
adding of estimators: they are first added in
temp list (est_temp_list) and later kthread 0
distributes them to other kthreads. The optimization
is based on the fact that newly added estimator
should be estimated after 2 seconds, so we have the
time to offload the adding to chain from controlling
process to kthread 0.
- to add new estimators we use the last added kthread
context (est_add_ktid). The new estimators are linked to
the chains just before the estimated one, based on add_row.
This ensures their estimation will start after 2 seconds.
If estimators are added in bursts, common case if all
services and dests are initially configured, we may
spread the estimators to more chains and as result,
reducing the initial delay below 2 seconds.
Many thanks to Jiri Wiesner for his valuable comments
and for spending a lot of time reviewing and testing
the changes on different platforms with 48-256 CPUs and
1-8 NUMA nodes under different cpufreq governors.
The new IPVS estimators do not use workqueue infrastructure
because:
- The estimation can take long time when using multiple IPVS rules (eg.
millions estimator structures) and especially when box has multiple
CPUs due to the for_each_possible_cpu usage that expects packets from
any CPU. With est_nice sysctl we have more control how to prioritize the
estimation kthreads compared to other processes/kthreads that have
latency requirements (such as servers). As a benefit, we can see these
kthreads in top and decide if we will need some further control to limit
their CPU usage (max number of structure to estimate per kthread).
- with kthreads we run code that is read-mostly, no write/lock
operations to process the estimators in 2-second intervals.
- work items are one-shot: as estimators are processed every
2 seconds, they need to be re-added every time. This again
loads the timers (add_timer) if we use delayed works, as there are
no kthreads to do the timings.
[1] Report from Yunhong Jiang:
https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2
[3] Report from Dust:
https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
ipvs: run_estimation should control the kthread tasks
ipvs: add est_cpulist and est_nice sysctl vars
ipvs: use kthreads for stats estimation
ipvs: use u64_stats_t for the per-cpu counters
ipvs: use common functions for stats allocation
ipvs: add rcu protection to stats
netfilter: flowtable: add a 'default' case to flowtable datapath
netfilter: conntrack: set icmpv6 redirects as RELATED
netfilter: ipset: Add support for new bitmask parameter
netfilter: conntrack: merge ipv4+ipv6 confirm functions
netfilter: conntrack: add sctp DATA_SENT state
netfilter: nft_inner: fix IS_ERR() vs NULL check
====================
Link: https://lore.kernel.org/r/20221211101204.1751-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Fix up ptrace interface to protection keys register (PKRU)
* Avoid undefined compiler behavior with TYPE_ALIGN
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYisACgkQaDWVMHDJ
krAJkA//QRChRwyKi1syinXt2SGoSa3mTzP23SyV0TunOfKBiBUreFJ2mMFjsX0h
V7SJcu82sCWLHAY6LZRdyiF8zK3Cfzpbgb1QfzBCefE/gU801FhCypqNbQO5Lpdr
PEo+naaDOzwDWDt0A6OkAArgb0zfaOGL+OBhuwT7mcUtBz6gCakFqG2BMgOzqD1z
SAp0RraoSsFnKFl5Gv44+gkThq8/8yL5tyrJtnGv1jAsbhw9zmloaOue6MNMPJhH
3sFQnML3qeNRozquWWeCPu/hxWuFDitPhwdmNRZrnQ3DyRdDhCZPOjv+tQmxI3EO
5c+UIkMIsRh2nZLwHcM+iO5cWE7lyiAWpgqqArB+r2CFXWK5q2lplhXngBodE9Kr
ki/NZ6oEitT3+bLXhCwyc7WKxohl2IlmclJ4AD3Qrp4bzPhfsZebL6nNs/3bxWuF
CxJWIKzjtIcgNSEJaDOzFA5CAImq74r/kCW4e11ZXwmOnx6PX1YG6p0C1yknrZYJ
bvy8WxureO7OJEcVZfwxpXLYbb+7Q/k/l2DkUdVAvKSCB81uWR4JzEp4oooDxf2j
6x9qT5Mi95FhAHOCmlxwkQJTBCB36LkVF/3ESEOqJmun4F5ghPbMX2JzpBa6jPCS
lzkBrzA8MAdmaLHhDO+nd5m8HVY3QBSXDVtRTycmuloeoSeyBno=
=An0n
-----END PGP SIGNATURE-----
Merge tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Dave Hansen:
"There are two little fixes in here, one to give better XSAVE warnings
and another to address some undefined behavior in offsetof().
There is also a collection of patches to fix some issues with ptrace
and the protection keys register (PKRU). PKRU is a real oddity because
it is exposed in the XSAVE-related ABIs, but it is generally managed
without using XSAVE in the kernel. This fix thankfully came with a
selftest to ward off future regressions.
Summary:
- Clarify XSAVE consistency warnings
- Fix up ptrace interface to protection keys register (PKRU)
- Avoid undefined compiler behavior with TYPE_ALIGN"
* tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace
x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not set
x86/fpu: Allow PKRU to be (once again) written by ptrace.
x86/fpu: Add a pkru argument to copy_uabi_to_xstate()
x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate().
x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate()
x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnostics
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYmwACgkQaDWVMHDJ
krD8hg/+J0hUTfljmlCctwGZyqVR3Y2E722wL9oTvbgYiUAtFrARzfPF0WNwvHi5
Ywvod5hQ4unPoluthdVAD/uJqcPVhjIZ7CvNTGrS8J7ED5x5ydGLNWAL3Rn+9s6O
xkz/DsV4zl+cPQ60XLsO+3Mc6RhwVs9DUthpUovl22epmgmRPCovkHWkvQsZajJq
ceF/78ThfrkG4dDouaIXi1gsmKLLzU4KdHeBATMg0bgPQXFJZSGBCLaeJXWmLapq
7N3SznUqDMn4Plr/IuP4XuMA6VTVojrakCcBmw5SGVqhkVWGM1/FMg7jHSQS7Z5V
5uG7CkhTBqh17v9xKwDMPh34D51TLtNifA7jbecyL5155czFkj7BoSwEFINU/wCz
agUO9NvK9j1chUnA2UGqGQigM3nWGZHMwaQjfgBWyq5gqF8HURUUrjx6XuunOfmB
1byyrDu0g48u/zaQ/RpNfewz1ZY+WylDPcqOhYaVWF1PYThStML/VMBKpdsl1Ovw
nytUdQsaBIjFHQdB+snizaF93+/0FG+FTGAlDnHYmey/8plL2LYuzrcDnDYnGEXa
tN3HFd2lAi4JBLmvmgF39gH+BLXuKTLweIhwTXZTn91cfire3yxiXAnLd0tuptMP
aXFddxKMdMpxTqzy2X+8gJjqCr2lZ9gZkxaPsWwrBM+xrJf0p2w=
=JGnq
-----END PGP SIGNATURE-----
Merge tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tdx updates from Dave Hansen:
"This includes a single chunk of new functionality for TDX guests which
allows them to talk to the trusted TDX module software and obtain an
attestation report.
This report can then be used to prove the trustworthiness of the guest
to a third party and get access to things like storage encryption
keys"
* tag 'x86_tdx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftests/tdx: Test TDX attestation GetReport support
virt: Add TDX guest driver
x86/tdx: Add a wrapper to get TDREPORT0 from the TDX Module
- Add the cpu_cache_invalidate_memregion() API for cache flushing in
response to physical memory reconfiguration, or memory-side data
invalidation from operations like secure erase or memory-device unlock.
- Add a facility for the kernel to warn about collisions between kernel
and userspace access to PCI configuration registers
- Add support for Restricted CXL Host (RCH) topologies (formerly CXL 1.1)
- Add handling and reporting of CXL errors reported via the PCIe AER
mechanism
- Add support for CXL Persistent Memory Security commands
- Add support for the "XOR" algorithm for CXL host bridge interleave
- Rework / simplify CXL to NVDIMM interactions
- Miscellaneous cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCY5UpyAAKCRDfioYZHlFs
Z0ttAP4uxCjIibKsFVyexpSgI4vaZqQ9yt9NesmPwonc0XookwD+PlwP6Xc0d0Ox
t0gJ6+pwdh11NRzhcNE1pAaPcJZU4gs=
=HAQk
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull cxl updates from Dan Williams:
"Compute Express Link (CXL) updates for 6.2.
While it may seem backwards, the CXL update this time around includes
some focus on CXL 1.x enabling where the work to date had been with
CXL 2.0 (VH topologies) in mind.
First generation CXL can mostly be supported via BIOS, similar to DDR,
however it became clear there are use cases for OS native CXL error
handling and some CXL 3.0 endpoint features can be deployed on CXL 1.x
hosts (Restricted CXL Host (RCH) topologies). So, this update brings
RCH topologies into the Linux CXL device model.
In support of the ongoing CXL 2.0+ enabling two new core kernel
facilities are added.
One is the ability for the kernel to flag collisions between userspace
access to PCI configuration registers and kernel accesses. This is
brought on by the PCIe Data-Object-Exchange (DOE) facility, a hardware
mailbox over config-cycles.
The other is a cpu_cache_invalidate_memregion() API that maps to
wbinvd_on_all_cpus() on x86. To prevent abuse it is disabled in guest
VMs and architectures that do not support it yet. The CXL paths that
need it, dynamic memory region creation and security commands (erase /
unlock), are disabled when it is not present.
As for the CXL 2.0+ this cycle the subsystem gains support Persistent
Memory Security commands, error handling in response to PCIe AER
notifications, and support for the "XOR" host bridge interleave
algorithm.
Summary:
- Add the cpu_cache_invalidate_memregion() API for cache flushing in
response to physical memory reconfiguration, or memory-side data
invalidation from operations like secure erase or memory-device
unlock.
- Add a facility for the kernel to warn about collisions between
kernel and userspace access to PCI configuration registers
- Add support for Restricted CXL Host (RCH) topologies (formerly CXL
1.1)
- Add handling and reporting of CXL errors reported via the PCIe AER
mechanism
- Add support for CXL Persistent Memory Security commands
- Add support for the "XOR" algorithm for CXL host bridge interleave
- Rework / simplify CXL to NVDIMM interactions
- Miscellaneous cleanups and fixes"
* tag 'cxl-for-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (71 commits)
cxl/region: Fix memdev reuse check
cxl/pci: Remove endian confusion
cxl/pci: Add some type-safety to the AER trace points
cxl/security: Drop security command ioctl uapi
cxl/mbox: Add variable output size validation for internal commands
cxl/mbox: Enable cxl_mbox_send_cmd() users to validate output size
cxl/security: Fix Get Security State output payload endian handling
cxl: update names for interleave ways conversion macros
cxl: update names for interleave granularity conversion macros
cxl/acpi: Warn about an invalid CHBCR in an existing CHBS entry
tools/testing/cxl: Require cache invalidation bypass
cxl/acpi: Fail decoder add if CXIMS for HBIG is missing
cxl/region: Fix spelling mistake "memergion" -> "memregion"
cxl/regs: Fix sparse warning
cxl/acpi: Set ACPI's CXL _OSC to indicate RCD mode support
tools/testing/cxl: Add an RCH topology
cxl/port: Add RCD endpoint port enumeration
cxl/mem: Move devm_cxl_add_endpoint() from cxl_core to cxl_mem
tools/testing/cxl: Add XOR Math support to cxl_test
cxl/acpi: Support CXL XOR Interleave Math (CXIMS)
...
- Update the ACPICA code in the kernel to the 20221020 upstream
version and fix a couple of issues in it:
* Make acpi_ex_load_op() match upstream implementation (Rafael
Wysocki).
* Add support for loong_arch-specific APICs in MADT (Huacai Chen).
* Add support for fixed PCIe wake event (Huacai Chen).
* Add EBDA pointer sanity checks (Vit Kabele).
* Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele).
* Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan).
* Add a couple of new UUIDs to the known UUID list (Bob Moore).
* Add support for FFH Opregion special context data (Sudeep Holla).
* Improve warning message for "invalid ACPI name" (Bob Moore).
* Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT
table (Alison Schofield).
* Prepare IORT support for revision E.e (Robin Murphy).
* Finish support for the CDAT table (Bob Moore).
* Fix error code path in acpi_ds_call_control_method() (Rafael
Wysocki).
* Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li
Zetao).
* Update the version of the ACPICA code in the kernel (Bob Moore).
- Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
enumeration code (Giulio Benetti).
- Change the return type of the ACPI driver remove callback to void and
update its users accordingly (Dawei Li).
- Add general support for FFH address space type and implement the low-
level part of it for ARM64 (Sudeep Holla).
- Fix stale comments in the ACPI tables parsing code and make it print
more messages related to MADT (Hanjun Guo, Huacai Chen).
- Replace invocations of generic library functions with more kernel-
specific counterparts in the ACPI sysfs interface (Christophe JAILLET,
Xu Panda).
- Print full name paths of ACPI power resource objects during
enumeration (Kane Chen).
- Eliminate a compiler warning regarding a missing function prototype
in the ACPI power management code (Sudeep Holla).
- Fix and clean up the ACPI processor driver (Rafael Wysocki, Li Zhong,
Colin Ian King, Sudeep Holla).
- Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC
driver (Mia Kanashi).
- Add some mew ACPI backlight handling quirks and update some existing
ones (Hans de Goede).
- Make the ACPI backlight driver prefer the native backlight control
over vendor backlight control when possible (Hans de Goede).
- Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König).
- Use xchg_release() instead of cmpxchg() for updating new GHES cache
slots (Ard Biesheuvel).
- Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay Lu).
- Add new I2C device enumeration quirks for Medion Lifetab S10346 and
Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede).
- Make the ACPI battery driver notify user space about adding new
battery hooks and removing the existing ones (Armin Wolf).
- Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE()
for freeing acpi_object structures to help diagnostics (Wang ShaoBo).
- Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface
code (ye xingchen).
- Fix the _FIF package extraction failure handling in the ACPI fan
driver (Hanjun Guo).
- Fix the PCC mailbox handling error code path (Huisong Li).
- Avoid using PCC Opregions if there is no platform interrupt allocated
for this purpose (Huisong Li).
- Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and
CPPC library (ye xingchen).
- Fix some kernel-doc issues in the ACPI GSI processing code (Xiongfeng
Wang).
- Fix name memory leak in pnp_alloc_dev() (Yang Yingliang).
- Do not disable PNP devices on suspend when they cannot be re-enabled
on resume (Hans de Goede).
- Clean up the ACPI thermal driver a bit (Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmOXV10SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxuOwP/2zew6val2Jf7I/Yxf1iQLlRyGmhFnaH
wpltJvBjlHjAUKnPQ/kLYK9fjuUY5HVgjOE03WpwhFUpmhftYTrSkhoVkJ1Mw9Zl
RNOAEgCG484ThHiTIVp/dMPxrtfuqpdbamhWX3Q51IfXjGW8Vc/lDxIa3k/JQxyq
ko8GFPCoebJrSCfuwaAf2+xSQaf6dq4jpL/rlIk+nYMMB9mQmXhNEhc+l97NaCe8
MyCIGynyNbhGsIlwdHRvTp04EIe8h0Z1+Dyns7g/TrzHj3Aezy7QVZbn8sKdZWa1
W/Ck9QST5tfpDWyr+hUXxUJjEn4Yy+GXjM2xON0EMx5q+JD9XsOpwWOVwTR7CS5s
FwEd6I89SC8OZM86AgMtnGxygjpK24R/kGzHjhG15IQCsypc8Rvzoxl0L0YVoon/
UTkE57GzNWVzu0pY/oXJc2aT7lVqFXMFZ6ft/zHnBRnQmrcIi+xgDO5ni5KxctFN
TVFwbAMCuwVx6IOcVQCZM2g4aJw426KpUn19fKnXvPwR5UIufBaCzSKWMiYrtdXr
O5BM8ElYuyKCWGYEE0GSMjZygyDpyY6ENLH7s7P1IEmFyigBzaaGBbKm108JJq4V
eCWJYTAx8pAptsU/vfuMvEQ1ErfhZ3TTokA5Lv0uPf53VcAnWDb7EAbW6ZGMwFSI
IaV6cv6ILoqO
=GVzp
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI and PNP updates from Rafael Wysocki:
"These include new code (for instance, support for the FFH address
space type and support for new firmware data structures in ACPICA),
some new quirks (mostly related to backlight handling and I2C
enumeration), a number of fixes and a fair amount of cleanups all
over.
Specifics:
- Update the ACPICA code in the kernel to the 20221020 upstream
version and fix a couple of issues in it:
- Make acpi_ex_load_op() match upstream implementation (Rafael
Wysocki)
- Add support for loong_arch-specific APICs in MADT (Huacai Chen)
- Add support for fixed PCIe wake event (Huacai Chen)
- Add EBDA pointer sanity checks (Vit Kabele)
- Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele)
- Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan)
- Add a couple of new UUIDs to the known UUID list (Bob Moore)
- Add support for FFH Opregion special context data (Sudeep
Holla)
- Improve warning message for "invalid ACPI name" (Bob Moore)
- Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT
table (Alison Schofield)
- Prepare IORT support for revision E.e (Robin Murphy)
- Finish support for the CDAT table (Bob Moore)
- Fix error code path in acpi_ds_call_control_method() (Rafael
Wysocki)
- Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li
Zetao)
- Update the version of the ACPICA code in the kernel (Bob Moore)
- Use ZERO_PAGE(0) instead of empty_zero_page in the ACPI device
enumeration code (Giulio Benetti)
- Change the return type of the ACPI driver remove callback to void
and update its users accordingly (Dawei Li)
- Add general support for FFH address space type and implement the
low- level part of it for ARM64 (Sudeep Holla)
- Fix stale comments in the ACPI tables parsing code and make it
print more messages related to MADT (Hanjun Guo, Huacai Chen)
- Replace invocations of generic library functions with more kernel-
specific counterparts in the ACPI sysfs interface (Christophe
JAILLET, Xu Panda)
- Print full name paths of ACPI power resource objects during
enumeration (Kane Chen)
- Eliminate a compiler warning regarding a missing function prototype
in the ACPI power management code (Sudeep Holla)
- Fix and clean up the ACPI processor driver (Rafael Wysocki, Li
Zhong, Colin Ian King, Sudeep Holla)
- Add quirk for the HP Pavilion Gaming 15-cx0041ur to the ACPI EC
driver (Mia Kanashi)
- Add some mew ACPI backlight handling quirks and update some
existing ones (Hans de Goede)
- Make the ACPI backlight driver prefer the native backlight control
over vendor backlight control when possible (Hans de Goede)
- Drop unsetting ACPI APEI driver data on remove (Uwe Kleine-König)
- Use xchg_release() instead of cmpxchg() for updating new GHES cache
slots (Ard Biesheuvel)
- Clean up the ACPI APEI code (Sudeep Holla, Christophe JAILLET, Jay
Lu)
- Add new I2C device enumeration quirks for Medion Lifetab S10346 and
Lenovo Yoga Tab 3 Pro (YT3-X90F) (Hans de Goede)
- Make the ACPI battery driver notify user space about adding new
battery hooks and removing the existing ones (Armin Wolf)
- Modify the pfr_update and pfr_telemetry drivers to use ACPI_FREE()
for freeing acpi_object structures to help diagnostics (Wang
ShaoBo)
- Make the ACPI fan driver use sysfs_emit_at() in its sysfs interface
code (ye xingchen)
- Fix the _FIF package extraction failure handling in the ACPI fan
driver (Hanjun Guo)
- Fix the PCC mailbox handling error code path (Huisong Li)
- Avoid using PCC Opregions if there is no platform interrupt
allocated for this purpose (Huisong Li)
- Use sysfs_emit() instead of scnprintf() in the ACPI PAD driver and
CPPC library (ye xingchen)
- Fix some kernel-doc issues in the ACPI GSI processing code
(Xiongfeng Wang)
- Fix name memory leak in pnp_alloc_dev() (Yang Yingliang)
- Do not disable PNP devices on suspend when they cannot be
re-enabled on resume (Hans de Goede)
- Clean up the ACPI thermal driver a bit (Rafael Wysocki)"
* tag 'acpi-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (67 commits)
ACPI: x86: Add skip i2c clients quirk for Medion Lifetab S10346
ACPI: APEI: EINJ: Refactor available_error_type_show()
ACPI: APEI: EINJ: Fix formatting errors
ACPI: processor: perflib: Adjust acpi_processor_notify_smm() return value
ACPI: processor: perflib: Rearrange acpi_processor_notify_smm()
ACPI: processor: perflib: Rearrange unregistration routine
ACPI: processor: perflib: Drop redundant parentheses
ACPI: processor: perflib: Adjust white space
ACPI: processor: idle: Drop unnecessary statements and parens
ACPI: thermal: Adjust critical.flags.valid check
ACPI: fan: Convert to use sysfs_emit_at() API
ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage()
ACPI: battery: Call power_supply_changed() when adding hooks
ACPI: use sysfs_emit() instead of scnprintf()
ACPI: x86: Add skip i2c clients quirk for Lenovo Yoga Tab 3 Pro (YT3-X90F)
ACPI: APEI: Remove a useless include
PNP: Do not disable devices on suspend when they cannot be re-enabled on resume
ACPI: processor: Silence missing prototype warnings
ACPI: processor_idle: Silence missing prototype warnings
ACPI: PM: Silence missing prototype warning
...
- Fix nasty and hard to debug race condition introduced by mistake
in the runtime PM core code and clean up that code somewhat on
top of the fix (Rafael Wysocki).
- Generalize of_perf_domain_get_sharing_cpumask phandle format (Hector
Martin).
- Add new cpufreq driver for Apple SoC CPU P-states (Hector Martin).
- Update Qualcomm cpufreq driver, including:
* CPU clock provider support,
* Generic cleanups or reorganization.
* Potential memleak fix.
* Fix of the return value of cpufreq_driver->get().
(Manivannan Sadhasivam, Chen Hui).
- Update Qualcomm cpufreq driver's DT bindings, including:
* Support for CPU clock provider.
* Missing cache-related properties fixes.
* Support for QDU1000/QRU1000.
(Manivannan Sadhasivam, Rob Herring, Melody Olvera).
- Add support for ti,am625 SoC and enable build of ti-cpufreq for
ARCH_K3 (Dave Gerlach, and Vibhore Vardhan).
- Use flexible array to simplify memory allocation in the tegra186
cpufreq driver (Christophe JAILLET).
- Convert cpufreq statistics code to use sysfs_emit_at() (ye xingchen).
- Allow intel_pstate to use no-HWP mode on Sapphire Rapids (Giovanni
Gherdovich).
- Add missing pci_dev_put() to the amd_freq_sensitivity cpufreq driver
(Xiongfeng Wang).
- Initialize the kobj_unregister completion before calling
kobject_init_and_add() in the cpufreq core code (Yongqiang Liu).
- Defer setting boost MSRs in the ACPI cpufreq driver (Stuart Hayes,
Nathan Chancellor).
- Make intel_pstate accept initial EPP value of 0x80 (Srinivas
Pandruvada).
- Make read-only array sys_clk_src in the SPEAr cpufreq driver static
(Colin Ian King).
- Make array speeds in the longhaul cpufreq driver static (Colin Ian
King).
- Use str_enabled_disabled() helper in the ACPI cpufreq driver (Andy
Shevchenko).
- Drop a reference to CVS from cpufreq documentation (Conghui Wang).
- Improve kernel messages printed by the PSCI cpuidle driver (Ulf
Hansson).
- Make the DT cpuidle driver return the correct number of parsed idle
states, clean it up and clarify a comment in it (Ulf Hansson).
- Modify the tasks freezing code to avoid using pr_cont() and refine an
error message printed by it (Rafael Wysocki).
- Make the hibernation core code complain about memory map mismatches
during resume to help diagnostics (Xueqin Luo).
- Fix mistake in a kerneldoc comment in the hibernation code (xiongxin).
- Reverse the order of performance and enabling operations in the
generic power domains code (Abel Vesa).
- Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the
generic power domains code (Abel Vesa).
- Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn
Guo).
- Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo).
- Drop generic power domain status manipulation during hibernate
restore (Shawn Guo).
- Fix compiler warnings with make W=1 in the idle_inject power capping
driver (Srinivas Pandruvada).
- Use kstrtobool() instead of strtobool() in the power capping sysfs
interface (Christophe JAILLET).
- Add SCMI Powercap based power capping driver (Cristian Marussi).
- Add Emerald Rapids support to the intel-uncore-freq driver (Artem
Bityutskiy).
- Repair slips in kernel-doc comments in the generic notifier code
(Lukas Bulwahn).
- Fix several DT issues in the OPP library reorganize code around
opp-microvolt-<named> DT property (Viresh Kumar).
- Allow any of opp-microvolt, opp-microamp, or opp-microwatt properties
to be present without the others present (James Calligeros).
- Fix clock-latency-ns property in DT example (Serge Semin).
- Add a private governor_data for devfreq governors (Kant Fan).
- Reorganize devfreq code to use device_match_of_node() and
devm_platform_get_and_ioremap_resource() instead of open coding
them (ye xingchen, Minghao Chi).
- Make cpupower choose base_cpu to display default cpupower details
instead of picking CPU 0 (Saket Kumar Bhaskar).
- Add Georgian translation to cpupower documentation (Zurab
Kargareteli).
- Introduce powercap intel-rapl library, powercap-info command, and
RAPL monitor into cpupower (Thomas Renninger).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmOXWKsSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxzKsP/jEKIwMSG4KHyXjJSDopFvppA13468ma
ao2G5EnbtPgZOiN66BOcAfPB+pzBM8WBCnpy8sfNzcpQSaGaJr+flQDqQV/1QG/H
GNQ0MUYN6TF/zfz/hKawDtQJihw9OrJgqQfUJIyc7Djo8ntSBu299XAt3X8VB5D1
azU1WwfOnEhr8evkqd8DS81fwm6b5cWvLkfG3Qvk2VxlwC/BCFdygqNjwOXmMNMb
DPYWv1xoVhSKzJsPHbAtzFq6veLsw2Glf2xPDyjf9ZPB0ujrftFoRoeCrC/neBDb
5bB4P5Injg3IB7SAHf97XgGAH2biUKwVnQhVUOTWXdQ7u/xDbH5fOLFJkBOBP6n6
gZiEOqzg5wVXk+ZfKx4fjsf4LvB1r+nM2tmx/bzhxyt9UDLUfB9kY0PMXLRuYqyn
ITvk00CJ/hkwD98pql4pCnc1PYZLUv/CHiaqTjwwOKuue3Jb3OTSPrSWtYIyTyNx
s2eBz/CxGSg4Q25u3loIiNVAaCOul6SZq+Iz6BlVP8sy3q62LWi8mp5b+kb8HFWH
lk8GpavqOLF6brxpPL/n0vav2bCmdwblMjTcowtGbLgiGSZaD97AkPFTN2H7tGPv
iUZDTdK3H24aqY62yKzo2HK3PhwNCg06gF0VTsuvJ7iIQmfeUpLjB/3qGeJNjlEQ
20fQ6YU/NytB
=B9Uu
-----END PGP SIGNATURE-----
Merge tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These include two new drivers (cpufreq driver for Apple SoC CPU
P-states and the SCMI Powercap based power capping driver), other new
hardware support and driver extensions (Qualcomm cpufreq driver and
its DT bindings, TI cpufreq driver, intel_pstate, intel-uncore-freq),
a bunch of fixes and cleanups all over and a cpupower utility update
including new features related to RAPL support.
Specifics:
- Fix nasty and hard to debug race condition introduced by mistake in
the runtime PM core code and clean up that code somewhat on top of
the fix (Rafael Wysocki)
- Generalize of_perf_domain_get_sharing_cpumask phandle format
(Hector Martin)
- Add new cpufreq driver for Apple SoC CPU P-states (Hector Martin)
- Update Qualcomm cpufreq driver (Manivannan Sadhasivam, Chen Hui):
- CPU clock provider support
- Generic cleanups or reorganization
- Potential memleak fix
- Fix of the return value of cpufreq_driver->get()
- Update Qualcomm cpufreq driver's DT bindings (Manivannan
Sadhasivam, Rob Herring, Melody Olvera):
- Support for CPU clock provider
- Missing cache-related properties fixes
- Support for QDU1000/QRU1000
- Add support for ti,am625 SoC and enable build of ti-cpufreq for
ARCH_K3 (Dave Gerlach, and Vibhore Vardhan)
- Use flexible array to simplify memory allocation in the tegra186
cpufreq driver (Christophe JAILLET)
- Convert cpufreq statistics code to use sysfs_emit_at() (ye
xingchen)
- Allow intel_pstate to use no-HWP mode on Sapphire Rapids (Giovanni
Gherdovich)
- Add missing pci_dev_put() to the amd_freq_sensitivity cpufreq
driver (Xiongfeng Wang)
- Initialize the kobj_unregister completion before calling
kobject_init_and_add() in the cpufreq core code (Yongqiang Liu)
- Defer setting boost MSRs in the ACPI cpufreq driver (Stuart Hayes,
Nathan Chancellor)
- Make intel_pstate accept initial EPP value of 0x80 (Srinivas
Pandruvada)
- Make read-only array sys_clk_src in the SPEAr cpufreq driver static
(Colin Ian King)
- Make array speeds in the longhaul cpufreq driver static (Colin Ian
King)
- Use str_enabled_disabled() helper in the ACPI cpufreq driver (Andy
Shevchenko)
- Drop a reference to CVS from cpufreq documentation (Conghui Wang)
- Improve kernel messages printed by the PSCI cpuidle driver (Ulf
Hansson)
- Make the DT cpuidle driver return the correct number of parsed idle
states, clean it up and clarify a comment in it (Ulf Hansson)
- Modify the tasks freezing code to avoid using pr_cont() and refine
an error message printed by it (Rafael Wysocki)
- Make the hibernation core code complain about memory map mismatches
during resume to help diagnostics (Xueqin Luo)
- Fix mistake in a kerneldoc comment in the hibernation code
(xiongxin)
- Reverse the order of performance and enabling operations in the
generic power domains code (Abel Vesa)
- Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in
the generic power domains code (Abel Vesa)
- Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn
Guo)
- Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo)
- Drop generic power domain status manipulation during hibernate
restore (Shawn Guo)
- Fix compiler warnings with make W=1 in the idle_inject power
capping driver (Srinivas Pandruvada)
- Use kstrtobool() instead of strtobool() in the power capping sysfs
interface (Christophe JAILLET)
- Add SCMI Powercap based power capping driver (Cristian Marussi)
- Add Emerald Rapids support to the intel-uncore-freq driver (Artem
Bityutskiy)
- Repair slips in kernel-doc comments in the generic notifier code
(Lukas Bulwahn)
- Fix several DT issues in the OPP library reorganize code around
opp-microvolt-<named> DT property (Viresh Kumar)
- Allow any of opp-microvolt, opp-microamp, or opp-microwatt
properties to be present without the others present (James
Calligeros)
- Fix clock-latency-ns property in DT example (Serge Semin)
- Add a private governor_data for devfreq governors (Kant Fan)
- Reorganize devfreq code to use device_match_of_node() and
devm_platform_get_and_ioremap_resource() instead of open coding
them (ye xingchen, Minghao Chi)
- Make cpupower choose base_cpu to display default cpupower details
instead of picking CPU 0 (Saket Kumar Bhaskar)
- Add Georgian translation to cpupower documentation (Zurab
Kargareteli)
- Introduce powercap intel-rapl library, powercap-info command, and
RAPL monitor into cpupower (Thomas Renninger)"
* tag 'pm-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
PM: runtime: Adjust white space in the core code
cpufreq: Remove CVS version control contents from documentation
cpufreq: stats: Convert to use sysfs_emit_at() API
cpufreq: ACPI: Only set boost MSRs on supported CPUs
PM: sleep: Refine error message in try_to_freeze_tasks()
PM: sleep: Avoid using pr_cont() in the tasks freezing code
PM: runtime: Relocate rpm_callback() right after __rpm_callback()
PM: runtime: Do not call __rpm_callback() from rpm_idle()
PM / devfreq: event: use devm_platform_get_and_ioremap_resource()
PM / devfreq: event: Use device_match_of_node()
PM / devfreq: Use device_match_of_node()
powercap: idle_inject: Fix warnings with make W=1
PM: hibernate: Complain about memory map mismatches during resume
dt-bindings: cpufreq: cpufreq-qcom-hw: Add QDU1000/QRU1000 cpufreq
cpufreq: tegra186: Use flexible array to simplify memory allocation
cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain
cpupower: Introduce powercap intel-rapl library and powercap-info command
cpupower: Add Georgian translation
cpufreq: intel_pstate: Add Sapphire Rapids support in no-HWP mode
cpufreq: amd_freq_sensitivity: Add missing pci_dev_put()
...
Currently, kunit_parser.py is stripping all leading whitespace to make
parsing easier. But this means we can't accurately show kernel output
for failing tests or when the kernel crashes.
Embarassingly, this affects even KUnit's own output, e.g.
[13:40:46] Expected 2 + 1 == 2, but
[13:40:46] 2 + 1 == 3 (0x3)
[13:40:46] not ok 1 example_simple_test
[13:40:46] [FAILED] example_simple_test
After this change, here's what the output in context would look like
[13:40:46] =================== example (4 subtests) ===================
[13:40:46] # example_simple_test: initializing
[13:40:46] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
[13:40:46] Expected 2 + 1 == 2, but
[13:40:46] 2 + 1 == 3 (0x3)
[13:40:46] [FAILED] example_simple_test
[13:40:46] [SKIPPED] example_skip_test
[13:40:46] [SKIPPED] example_mark_skipped_test
[13:40:46] [PASSED] example_all_expect_macros_test
[13:40:46] # example: initializing suite
[13:40:46] # example: pass:1 fail:1 skip:2 total:4
[13:40:46] # Totals: pass:1 fail:1 skip:2 total:4
[13:40:46] ===================== [FAILED] example =====================
This example shows one minor cosmetic defect this approach has.
The test counts lines prevent us from dedenting the suite-level output.
But at the same time, any form of non-KUnit output would do the same
unless it happened to be indented as well.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We print the "test log" on failure.
This is meant to be all the kernel output that happened during the test.
But we also include the special KTAP lines in it, which are often
redundant.
E.g. we include the "not ok" line in the log, right before we print
that the test case failed...
[13:51:48] Expected 2 + 1 == 2, but
[13:51:48] 2 + 1 == 3 (0x3)
[13:51:48] not ok 1 example_simple_test
[13:51:48] [FAILED] example_simple_test
More full example after this patch:
[13:51:48] =================== example (4 subtests) ===================
[13:51:48] # example_simple_test: initializing
[13:51:48] # example_simple_test: EXPECTATION FAILED at lib/kunit/kunit-example-test.c:29
[13:51:48] Expected 2 + 1 == 2, but
[13:51:48] 2 + 1 == 3 (0x3)
[13:51:48] [FAILED] example_simple_test
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Change the KUnit parser to be able to parse test output that complies with
the KTAP version 1 specification format found here:
https://kernel.org/doc/html/latest/dev-tools/ktap.html. Ensure the parser
is able to parse tests with the original KUnit test output format as
well.
KUnit parser now accepts any of the following test output formats:
Original KUnit test output format:
TAP version 14
1..1
# Subtest: kunit-test-suite
1..3
ok 1 - kunit_test_1
ok 2 - kunit_test_2
ok 3 - kunit_test_3
# kunit-test-suite: pass:3 fail:0 skip:0 total:3
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 - kunit-test-suite
KTAP version 1 test output format:
KTAP version 1
1..1
KTAP version 1
1..3
ok 1 kunit_test_1
ok 2 kunit_test_2
ok 3 kunit_test_3
ok 1 kunit-test-suite
New KUnit test output format (changes made in the next patch of
this series):
KTAP version 1
1..1
KTAP version 1
# Subtest: kunit-test-suite
1..3
ok 1 kunit_test_1
ok 2 kunit_test_2
ok 3 kunit_test_3
# kunit-test-suite: pass:3 fail:0 skip:0 total:3
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 kunit-test-suite
Signed-off-by: Rae Moar <rmoar@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When --raw_output is set (to any value), we don't actually parse the
test results. So asking to print the test results as json doesn't make
sense.
We internally create a fake test with one passing subtest, so --json
would actually print out something misleading.
This patch:
* Rewords the flag descriptions so hopefully this is more obvious.
* Also updates --raw_output's description to note the default behavior
is to print out only "KUnit" results (actually any KTAP results)
* also renames and refactors some related logic for clarity (e.g.
test_result => test, it's a kunit_parser.Test object).
Notably, this patch does not make it an error to specify --json and
--raw_output together. This is an edge case, but I know of at least one
wrapper around kunit.py that always sets --json. You'd never be able to
use --raw_output with that wrapper.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We currently tell people we "couldn't find any KTAP output" with no
indication as to what this might mean.
After this patch, we get:
$ ./tools/testing/kunit/kunit.py parse /dev/null
============================================================
[ERROR] Test: <missing>: Could not find any KTAP output. Did any KUnit tests run?
============================================================
Testing complete. Ran 0 tests: errors: 1
Note: we could try and generate a more verbose message like
> Please check .kunit/test.log to see the raw kernel output.
or the like, but we'd need to know what the build dir was to know where
test.log actually lives.
This patch tries to make a more minimal improvement.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We're using a `with` block above, so the file object is already closed.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Let's verify that the parser isn't reporting any errors for valid
inputs.
This change also
* does result.status checking on one line
* makes sure we consistently do it outside of the `with` block
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since we're using Python 3.7+, we can use dataclasses to tersen the
code.
It also lets us create pre-populated TestCounts() objects and compare
them in our unit test. (Before, you could only create empty ones).
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
x86 Xen-for-KVM:
* Allow the Xen runstate information to cross a page boundary
* Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured
* add support for 32-bit guests in SCHEDOP_poll
x86 fixes:
* One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
* Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
* Clean up the MSR filter docs.
* Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
* Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
* Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
* Advertise (on AMD) that the SMM_CTL MSR is not supported
* Remove unnecessary exports
Selftests:
* Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
* Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
* Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
* Introduce actual atomics for clear/set_bit() in selftests
Documentation:
* Remove deleted ioctls from documentation
* Various fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmOWgtsACgkQ6rmadz2v
bTpT2g//WzQRsODtPVVmg87fEo1GSTXvoXq/fhg95OKNZrVKgx1N6EVlFSLSqEjL
TAmOuv5cZT28ZpMPMNjnU/c/lFf/6/UWbbTusA+F3MtSCBSbP5DPsWDD0yvNT9DL
EZbGoQDSyt1M+BakZLzwOV6HPn9oDhj5p/4lMw+gptTY+3IeYUbS50DinM8eLz+Q
067aF01p3ROF6LNUx9Az0cLPdU05oHzL2MvRsj/F7h/sWoSW5B/1Kx/m1vsT9lwn
T2vbm6r4Jo0m0ZvpEMeRyKNZgVKIc64C7NH9CV7V66giJaONmxvLwkc0zWFwbXJ2
V9aPQbbBUx/CZXoC72LEsvVcoAFl7LAL1IALm2HVt1iQjpj1yDlWw3WV0PMQ9Rn7
xRVDOfQNGZ6jnkv6LB2j7V1z7hVENWQQwM48dgO2pAnJwYmUW9wZaAGE5kadUrZf
eCD4c1U+qcZkSk4vwvpr8ubJ0PWPMUZqI0FrHUxfPxqkdy78c1h3qNQufZvAHWff
Ca9NZqraFACTx58ZBsN1V5Xzv7azoK8Zgr9+JwVNahpFxclrbL8xuceThkC4smBl
fiZJC9fClD9ATquIdj177jNMVC8F4B5yrKF/ehJDcNQhcqUdWx9Sbj461enf+3HI
nfTP+77ZzyIJ76iRXJBV/jr9wkaPWhAZVeBGxmw5clTvB9/RBbU=
=fzwv
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2022-12-11
We've added 74 non-merge commits during the last 11 day(s) which contain
a total of 88 files changed, 3362 insertions(+), 789 deletions(-).
The main changes are:
1) Decouple prune and jump points handling in the verifier, from Andrii.
2) Do not rely on ALLOW_ERROR_INJECTION for fmod_ret, from Benjamin.
Merged from hid tree.
3) Do not zero-extend kfunc return values. Necessary fix for 32-bit archs,
from Björn.
4) Don't use rcu_users to refcount in task kfuncs, from David.
5) Three reg_state->id fixes in the verifier, from Eduard.
6) Optimize bpf_mem_alloc by reusing elements from free_by_rcu, from Hou.
7) Refactor dynptr handling in the verifier, from Kumar.
8) Remove the "/sys" mount and umount dance in {open,close}_netns
in bpf selftests, from Martin.
9) Enable sleepable support for cgrp local storage, from Yonghong.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (74 commits)
selftests/bpf: test case for relaxed prunning of active_lock.id
selftests/bpf: Add pruning test case for bpf_spin_lock
bpf: use check_ids() for active_lock comparison
selftests/bpf: verify states_equal() maintains idmap across all frames
bpf: states_equal() must build idmap for all function frames
selftests/bpf: test cases for regsafe() bug skipping check_id()
bpf: regsafe() must not skip check_ids()
docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE
selftests/bpf: Add test for dynptr reinit in user_ringbuf callback
bpf: Use memmove for bpf_dynptr_{read,write}
bpf: Move PTR_TO_STACK alignment check to process_dynptr_func
bpf: Rework check_func_arg_reg_off
bpf: Rework process_dynptr_func
bpf: Propagate errors from process_* checks in check_func_arg
bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func
bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true
bpf: Reuse freed element in free_by_rcu during allocation
selftests/bpf: Bring test_offload.py back to life
bpf: Fix comment error in fixup_kfunc_call function
bpf: Do not zero-extend kfunc return values
...
====================
Link: https://lore.kernel.org/r/20221212024701.73809-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Highlights:
- Intel:
- PMC: Add support for Meteor Lake
- Intel On Demand: various updates
- ideapad-laptop:
- Add support for various Fn keys on new models
- Fix touchpad on/off handling in a generic way to avoid having
to add more and more quirks
- android-x86-tablets: Add support for 2 more X86 Android tablet models
- New Dell WMI DDV driver
- Miscellaneous cleanups and small bugfixes
The following is an automated git shortlog grouped by driver:
ACPI:
- battery: Pass battery hook pointer to hook callbacks
ISST:
- Fix typo in comments
Move existing HP drivers to a new hp subdir:
- Move existing HP drivers to a new hp subdir
dell:
- Add new dell-wmi-ddv driver
dell-ddv:
- Warn if ePPID has a suspicious length
- Improve buffer handling
huawei-wmi:
- remove unnecessary member
- fix return value calculation
- do not hard-code sizes
ideapad-laptop:
- Make touchpad_ctrl_via_ec a module option
- Stop writing VPCCMD_W_TOUCHPAD at probe time
- Send KEY_TOUCHPAD_TOGGLE on some models
- Only toggle ps2 aux port on/off on select models
- Do not send KEY_TOUCHPAD* events on probe / resume
- Refactor ideapad_sync_touchpad_state()
- support for more special keys in WMI
- Add new _CFG bit numbers for future use
- Revert "check for touchpad support in _CFG"
intel/pmc:
- Relocate Alder Lake PCH support
- Relocate Tiger Lake PCH support
- Relocate Ice Lake PCH support
- Relocate Cannon Lake Point PCH support
- Relocate Sunrise Point PCH support
- Move variable declarations and definitions to header and core.c
- Replace all the reg_map with init functions
intel/pmc/core:
- Add Meteor Lake support to pmc core driver
intel_scu_ipc:
- fix possible name leak in __intel_scu_ipc_register()
mxm-wmi:
- fix memleak in mxm_wmi_call_mx[ds|mx]()
platform/mellanox:
- mlxbf-pmc: Fix event typo
- Add BlueField-3 support in the tmfifo driver
platform/x86/amd:
- pmc: Add a workaround for an s0i3 issue on Cezanne
platform/x86/amd/pmf:
- pass the struct by reference
platform/x86/dell:
- alienware-wmi: Use sysfs_emit() instead of scnprintf()
platform/x86/intel:
- pmc: Fix repeated word in comment
platform/x86/intel/hid:
- Add module-params for 5 button array + SW_TABLET_MODE reporting
platform/x86/intel/sdsi:
- Add meter certificate support
- Support different GUIDs
- Hide attributes if hardware doesn't support
- Add Intel On Demand text
sony-laptop:
- Convert to use sysfs_emit_at() API
thinkpad_acpi:
- use strstarts()
- Fix max_brightness of thinklight
tools/arch/x86:
- intel_sdsi: Add support for reading meter certificates
- intel_sdsi: Add support for new GUID
- intel_sdsi: Read more On Demand registers
- intel_sdsi: Add Intel On Demand text
- intel_sdsi: Add support for reading state certificates
uv_sysfs:
- Use sysfs_emit() instead of scnprintf()
wireless-hotkey:
- use ACPI HID as phys
x86-android-tablets:
- Add Advantech MICA-071 extra button
- Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data
- Add Medion Lifetab S10346 data
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmOW+QgUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9yAPwf/dAYLHiC2ox5YlNTLX2DvU+jOpeBv
W+EIx4oHQz1+O9jrWMLyvS9zTwTEAf6ANLiMP3damEvtJnB72ClgFITzlJAaB4zN
yj0SdxoBRMt6zDL2QwMkwitvb5kJonLfO2H7NsMwA6f0KP1X8sio3oVRAMMVwlzz
nwDKM/VBpuxmy+d880wRRoAkgRkTsPIOwBkYdo1525NU7kkTmtrMpgM+SXQsHTJn
TB9uQnyuiq5/znh3k1Qn+OGwXQezmGz2Fb76IcW5RzUQDew6n6b3kzILee5ddynT
Pa7/ibwpV+FtZjm2kS/l4tV+WPdA+s5TSWoq7Hz0jzBX9GdOORcMZmEneg==
=z16d
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
- Intel:
- PMC: Add support for Meteor Lake
- Intel On Demand: various updates
- Ideapad-laptop:
- Add support for various Fn keys on new models
- Fix touchpad on/off handling in a generic way to avoid having to
add more and more quirks
- Android x86 tablets:
- Add support for two more X86 Android tablet models
- New Dell WMI DDV driver
- Miscellaneous cleanups and small bugfixes
* tag 'platform-drivers-x86-v6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (52 commits)
platform/mellanox: mlxbf-pmc: Fix event typo
platform/x86: intel_scu_ipc: fix possible name leak in __intel_scu_ipc_register()
platform/x86: sony-laptop: Convert to use sysfs_emit_at() API
platform/x86/dell: alienware-wmi: Use sysfs_emit() instead of scnprintf()
platform/x86: uv_sysfs: Use sysfs_emit() instead of scnprintf()
platform/x86: mxm-wmi: fix memleak in mxm_wmi_call_mx[ds|mx]()
platform/x86: x86-android-tablets: Add Advantech MICA-071 extra button
platform/x86: x86-android-tablets: Add Lenovo Yoga Tab 3 (YT3-X90F) charger + fuel-gauge data
platform/x86: x86-android-tablets: Add Medion Lifetab S10346 data
platform/x86: wireless-hotkey: use ACPI HID as phys
platform/x86/intel/hid: Add module-params for 5 button array + SW_TABLET_MODE reporting
platform/x86: ideapad-laptop: Make touchpad_ctrl_via_ec a module option
platform/x86: ideapad-laptop: Stop writing VPCCMD_W_TOUCHPAD at probe time
platform/x86: ideapad-laptop: Send KEY_TOUCHPAD_TOGGLE on some models
platform/x86: ideapad-laptop: Only toggle ps2 aux port on/off on select models
platform/x86: ideapad-laptop: Do not send KEY_TOUCHPAD* events on probe / resume
platform/x86: ideapad-laptop: Refactor ideapad_sync_touchpad_state()
tools/arch/x86: intel_sdsi: Add support for reading meter certificates
tools/arch/x86: intel_sdsi: Add support for new GUID
tools/arch/x86: intel_sdsi: Read more On Demand registers
...
ACPI:
* Enable FPDT support for boot-time profiling
* Fix CPU PMU probing to work better with PREEMPT_RT
* Update SMMUv3 MSI DeviceID parsing to latest IORT spec
* APMT support for probing Arm CoreSight PMU devices
CPU features:
* Advertise new SVE instructions (v2.1)
* Advertise range prefetch instruction
* Advertise CSSC ("Common Short Sequence Compression") scalar
instructions, adding things like min, max, abs, popcount
* Enable DIT (Data Independent Timing) when running in the kernel
* More conversion of system register fields over to the generated
header
CPU misfeatures:
* Workaround for Cortex-A715 erratum #2645198
Dynamic SCS:
* Support for dynamic shadow call stacks to allow switching at
runtime between Clang's SCS implementation and the CPU's
pointer authentication feature when it is supported (complete
with scary DWARF parser!)
Tracing and debug:
* Remove static ftrace in favour of, err, dynamic ftrace!
* Seperate 'struct ftrace_regs' from 'struct pt_regs' in core
ftrace and existing arch code
* Introduce and implement FTRACE_WITH_ARGS on arm64 to replace
the old FTRACE_WITH_REGS
* Extend 'crashkernel=' parameter with default value and fallback
to placement above 4G physical if initial (low) allocation
fails
SVE:
* Optimisation to avoid disabling SVE unconditionally on syscall
entry and just zeroing the non-shared state on return instead
Exceptions:
* Rework of undefined instruction handling to avoid serialisation
on global lock (this includes emulation of user accesses to the
ID registers)
Perf and PMU:
* Support for TLP filters in Hisilicon's PCIe PMU device
* Support for the DDR PMU present in Amlogic Meson G12 SoCs
* Support for the terribly-named "CoreSight PMU" architecture
from Arm (and Nvidia's implementation of said architecture)
Misc:
* Tighten up our boot protocol for systems with memory above
52 bits physical
* Const-ify static keys to satisty jump label asm constraints
* Trivial FFA driver cleanups in preparation for v1.1 support
* Export the kernel_neon_* APIs as GPL symbols
* Harden our instruction generation routines against
instrumentation
* A bunch of robustness improvements to our arch-specific selftests
* Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmOPLFAQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNPRcCACLyDTvkimiqfoPxzzgdkx/6QOvw9s3/mXg
UcTORSZBR1VnYkiMYEKVz/tTfG99dnWtD8/0k/rz48NbhBfsF2sN4ukyBBXVf0zR
fjnaVyVC11LUgBgZKPo6maV+jf/JWf9hJtpPl06KTiPb2Hw2JX4DXg+PeF8t2hGx
NLH4ekQOrlDM8mlsN5mc0YsHbiuO7Xe/NRuet8TsgU4bEvLAwO6bzOLVUMqDQZNq
bQe2ENcGVAzAf7iRJb38lj9qB/5hrQTHRXqLXMSnJyyVjQEwYca0PeJMa7x30bXF
ZZ+xQ8Wq0mxiffZraf6SE34yD4gaYS4Fziw7rqvydC15vYhzJBH1
=hV+2
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
"The highlights this time are support for dynamically enabling and
disabling Clang's Shadow Call Stack at boot and a long-awaited
optimisation to the way in which we handle the SVE register state on
system call entry to avoid taking unnecessary traps from userspace.
Summary:
ACPI:
- Enable FPDT support for boot-time profiling
- Fix CPU PMU probing to work better with PREEMPT_RT
- Update SMMUv3 MSI DeviceID parsing to latest IORT spec
- APMT support for probing Arm CoreSight PMU devices
CPU features:
- Advertise new SVE instructions (v2.1)
- Advertise range prefetch instruction
- Advertise CSSC ("Common Short Sequence Compression") scalar
instructions, adding things like min, max, abs, popcount
- Enable DIT (Data Independent Timing) when running in the kernel
- More conversion of system register fields over to the generated
header
CPU misfeatures:
- Workaround for Cortex-A715 erratum #2645198
Dynamic SCS:
- Support for dynamic shadow call stacks to allow switching at
runtime between Clang's SCS implementation and the CPU's pointer
authentication feature when it is supported (complete with scary
DWARF parser!)
Tracing and debug:
- Remove static ftrace in favour of, err, dynamic ftrace!
- Seperate 'struct ftrace_regs' from 'struct pt_regs' in core ftrace
and existing arch code
- Introduce and implement FTRACE_WITH_ARGS on arm64 to replace the
old FTRACE_WITH_REGS
- Extend 'crashkernel=' parameter with default value and fallback to
placement above 4G physical if initial (low) allocation fails
SVE:
- Optimisation to avoid disabling SVE unconditionally on syscall
entry and just zeroing the non-shared state on return instead
Exceptions:
- Rework of undefined instruction handling to avoid serialisation on
global lock (this includes emulation of user accesses to the ID
registers)
Perf and PMU:
- Support for TLP filters in Hisilicon's PCIe PMU device
- Support for the DDR PMU present in Amlogic Meson G12 SoCs
- Support for the terribly-named "CoreSight PMU" architecture from
Arm (and Nvidia's implementation of said architecture)
Misc:
- Tighten up our boot protocol for systems with memory above 52 bits
physical
- Const-ify static keys to satisty jump label asm constraints
- Trivial FFA driver cleanups in preparation for v1.1 support
- Export the kernel_neon_* APIs as GPL symbols
- Harden our instruction generation routines against instrumentation
- A bunch of robustness improvements to our arch-specific selftests
- Minor cleanups and fixes all over (kbuild, kprobes, kfence, PMU, ...)"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (151 commits)
arm64: kprobes: Return DBG_HOOK_ERROR if kprobes can not handle a BRK
arm64: kprobes: Let arch do_page_fault() fix up page fault in user handler
arm64: Prohibit instrumentation on arch_stack_walk()
arm64:uprobe fix the uprobe SWBP_INSN in big-endian
arm64: alternatives: add __init/__initconst to some functions/variables
arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
arm64/sysreg: Remove duplicate definitions from asm/sysreg.h
arm64/sysreg: Convert ID_DFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_DFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_AFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_MMFR5_EL1 to automatic generation
arm64/sysreg: Convert MVFR2_EL1 to automatic generation
arm64/sysreg: Convert MVFR1_EL1 to automatic generation
arm64/sysreg: Convert MVFR0_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR2_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR1_EL1 to automatic generation
arm64/sysreg: Convert ID_PFR0_EL1 to automatic generation
...
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEjUuTAak14xi+SF7M4CHKc/GJqRAFAmOTQvYACgkQ4CHKc/GJ
qRCaqQf/UjCDmj1vYKcsTzp5L4MDXdQPA7dKtytbnZtROtClVNUzB0jODsfeMI7C
SwbDJRoUU1y99GRFYIx9oGji1q7TYOWS/PsZxOGkv8ILommmQ1kJdZdxt9rOqYNg
3mjCZoQmZMIRipLDrN55C096Mi+mI89kkE4Lkyrigpmxvc0KyX6QBerr+VmaBMHw
DjmFC6Gj+ZH2AX6z7AzOF1gZ42gPBQUjWdHFRcY41dShOQZNl2FPT5ITAvotlJlH
9mj6woCqW936UOcpUl+Qqk7mekDJb1hqmYXV2VAlhprBi6Vcd9PU6GmPPb6w51bS
HkSNNYjkbuNxBXY13PUPcR0hEHv9zw==
=AlWx
-----END PGP SIGNATURE-----
Merge tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- SLOB deprecation and SLUB_TINY
The SLOB allocator adds maintenance burden and stands in the way of
API improvements [1]. Deprecate it by renaming the config option (to
make users notice) to CONFIG_SLOB_DEPRECATED with updated help text.
SLUB should be used instead as SLAB will be the next on the removal
list.
Based on reports from a riscv k210 board with 8MB RAM, add a
CONFIG_SLUB_TINY option to minimize SLUB's memory usage at the
expense of scalability. This has resolved the k210 regression [2] so
in case there are no others (that wouldn't be resolvable by further
tweaks to SLUB_TINY) plan is to remove SLOB in a few cycles.
Existing defconfigs with CONFIG_SLOB are converted to
CONFIG_SLUB_TINY.
- kmalloc() slub_debug redzone improvements
A series from Feng Tang that builds on the tracking or requested size
for kmalloc() allocations (for caches with debugging enabled) added
in 6.1, to make redzone checks consider the requested size and not
the rounded up one, in order to catch more subtle buffer overruns.
Includes new slub_kunit test.
- struct slab fields reordering to accomodate larger rcu_head
RCU folks would like to grow rcu_head with debugging options, which
breaks current struct slab layout's assumptions, so reorganize it to
make this possible.
- Miscellaneous improvements/fixes:
- __alloc_size checking compiler workaround (Kees Cook)
- Optimize and cleanup SLUB's sysfs init (Rasmus Villemoes)
- Make SLAB compatible with PROVE_RAW_LOCK_NESTING (Jiri Kosina)
- Correct SLUB's percpu allocation estimates (Baoquan He)
- Re-enableS LUB's run-time failslab sysfs control (Alexander Atanasov)
- Make tools/vm/slabinfo more user friendly when not run as root (Rong Tao)
- Dead code removal in SLUB (Hyeonggon Yoo)
* tag 'slab-for-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (31 commits)
mm, slob: rename CONFIG_SLOB to CONFIG_SLOB_DEPRECATED
mm, slub: don't aggressively inline with CONFIG_SLUB_TINY
mm, slub: remove percpu slabs with CONFIG_SLUB_TINY
mm, slub: split out allocations from pre/post hooks
mm/slub, kunit: Add a test case for kmalloc redzone check
mm/slub, kunit: add SLAB_SKIP_KFENCE flag for cache creation
mm, slub: refactor free debug processing
mm, slab: ignore SLAB_RECLAIM_ACCOUNT with CONFIG_SLUB_TINY
mm, slub: don't create kmalloc-rcl caches with CONFIG_SLUB_TINY
mm, slub: lower the default slub_max_order with CONFIG_SLUB_TINY
mm, slub: retain no free slabs on partial list with CONFIG_SLUB_TINY
mm, slub: disable SYSFS support with CONFIG_SLUB_TINY
mm, slub: add CONFIG_SLUB_TINY
mm, slab: ignore hardened usercopy parameters when disabled
slab: Remove special-casing of const 0 size allocations
slab: Clean up SLOB vs kmalloc() definition
mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head
mm/migrate: make isolate_movable_page() skip slab pages
mm/slab: move and adjust kernel-doc for kmem_cache_alloc
mm/slub, percpu: correct the calculation of early percpu allocation size
...
- Add timens support (when switching mm). This version has survived
in -next for the entire cycle (Andrei Vagin).
- Various small bug fixes, refactoring, and readability improvements
(Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin).
- Remove FOLL_FORCE for stack setup (Kees Cook).
- Whilespace cleanups (Rolf Eike Beer, Kees Cook).
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmOOjsgWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJqwED/9mjtKL2GwHOYKsfhtc0m4HVGBw
gxTEKuyo5mRwaRLg2bfuWe1OQfeGWQd9+IZ83Kr2ijzm4R16Gslv9i69Iwdf2tce
iFf2R+iR7On+zNokHxaNflRH9fMsZLobVFqzLvB73BUF82ybJlTR3WMnQhS6HZQB
Gse8jRfueOnVgKldRLlgdxIucPVsXYSoBS4B0nvIUuQn3aNzDNuuctMe/5NFK0ud
+TWMXtKzS3B9pcLTXy3e0bPk/Ptio18CBUEI+iLMAHswtNCoxx1ZCcuvnEcrd5Qr
h2WGaRvYJ7oSUXeEsqPKuDdhqEJQH2AQoX8FzvD+hyIutQJCJzVYlHvwGCqn/Km6
0Dalng9Pjb6z2LEie/N42LDXEQmLZO2WtJ4otpORJlsJ7ZkrLjB4u+hDU1JA/Q14
YPWvth3fMA5vAFKvGCtpEc7YdHmghmXCW+YGXOBm625fPYnwFSXOarHfow1RKNE5
MOM4l60WwzLIHgmr8AFUaLf8TbutXN+BKvbMRh2ToWzDYXEoywxAedHDyo4LVwEy
mZEca/3izT1ynBcyZg1t8shf4htgLjcPHqM0B+Hq0iNMIrwtecqAcYL/Oj6XssPx
OuQYv341KF9fV/hMy84GM2HMr0ygUmrP7b9x+PEvCwzWf/2Glaw6Z4rtCdYC+TjW
8ZWqPqEY+LRsZsL18Q==
=ZDYk
-----END PGP SIGNATURE-----
Merge tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
"Most are small refactorings and bug fixes, but three things stand out:
switching timens (which got reverted before) looks solid now,
FOLL_FORCE has been removed (no failures seen yet across several weeks
in -next), and some whitespace cleanups (which are long overdue).
- Add timens support (when switching mm). This version has survived
in -next for the entire cycle (Andrei Vagin)
- Various small bug fixes, refactoring, and readability improvements
(Bernd Edlinger, Rolf Eike Beer, Bo Liu, Li Zetao Liu Shixin)
- Remove FOLL_FORCE for stack setup (Kees Cook)
- Whitespace cleanups (Rolf Eike Beer, Kees Cook)"
* tag 'execve-v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
binfmt_misc: fix shift-out-of-bounds in check_special_flags
binfmt: Fix error return code in load_elf_fdpic_binary()
exec: Remove FOLL_FORCE for stack setup
binfmt_elf: replace IS_ERR() with IS_ERR_VALUE()
binfmt_elf: simplify error handling in load_elf_phdrs()
binfmt_elf: fix documented return value for load_elf_phdrs()
exec: simplify initial stack size expansion
binfmt: Fix whitespace issues
exec: Add comments on check_unsafe_exec() fs counting
ELF uapi: add spaces before '{'
selftests/timens: add a test for vfork+exit
fs/exec: switch timens when a task gets a new mm
This branch further improves nolibc testing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKoFQTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jLQwD/sEbXzDOC+CeqKrVXHjYPwM5Y+GDoPZ
hJCaT8RSvgKszaTp2wgUCQxHDbIzxGm29os5GR3+LNoWDFFc/+KHnMzkPyBQq8Kq
id3PBUFOhm2ZwEMCfAGqTeDyopKxl3mMwnOQEQsA+wSxgagc8DwQfESmNUzPQW67
K6kUaLpirn7wGuvh/5co2NzgQ9E/AO02ZGhAWrmJiInlE3Uvdo3zw3r/7OqG0MFO
uAEz+DDggOerHA3cdQNHJxPVorkpYEs5a5pJD2tzPcVfan/o41nMY9EZGAMn2vdy
ZJLXyR8HBRVJS5ovKDwTc1beb4wTxjNsYbxvv9G54FvgWJS8GzIBxEjn+cFwx9LS
frkhvPMZPsl7tHNlMQtxArxJWqomNGPnPxm5fVeaBD8khxureJCADB8xmtP39l8M
fQOPW5uBIU7dL0jMH+V7x++7/ToBZab+SZqvEbtgDV8B7ZcnEynqThkvLkVIXEPR
jWhag9M+IesNzux/nxt9Sgvrx32k38S/n44Y3wQURayTwPKQeu5Hbbu6++p4cuj/
v4pHjjhey6brT4tKtokfsAtTQqEFcjKgsUkAJtbT52grXrvJ27FHbOm5jUjHSXP+
qGvjL+UfUURTgb98W5g6RSo0IPJuBen+blDCJz9K6ThWdz8XbzksCpb7dghwlWgz
Q1grIIH62QObZg==
=lTqG
-----END PGP SIGNATURE-----
Merge tag 'nolibc.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Further improvements to nolibc testing
* tag 'nolibc.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
selftests/nolibc: Always rebuild the sysroot when running a test
selftests/nolibc: Add 7 tests for memcmp()
This series adds instrumentation for memcpy(), memset(), and memmove() for
Clang v16+'s new function names that are used when the -fsanitize=thread
argument is given. It also fixes objtool warnings from KCSAN's volatile
instrumentation, and fixes a pair of typos in a pair of Kconfig options'
help clauses.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKn5gTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jHoyD/9gkMct9vIgKRZbMYW9haoLIJlYiwrT
sps8x2PssOwy99I89BTEovnIyPdQ9y3uLuHWMHAUcSN0JZqe797OIEnFImPiXPQF
q2dEg4zeHXGlD0+EDpr+FUcu1Sc4ppk2AqQiS4YiIfQzunS2RMETB+FkehLDMmgm
sJCd40E+xsCbq8yeCYOP2UkDeeJbVvdekli3GsjCu8vjE2UYaBjjZugXgIge9lOQ
FnMfJfrcQutLgLrm4oz2s2Jt7Km3Bl40IJVYeFGdrKBaIXhCXKsbESfOdudRGRTb
jPNf7s7Ofce8b3DQcT/sr8II49CZ0ekhEsExTfGdQKTz+2tghxGolY7VOZ8Nvd78
fM4SHicN/JREMcLTES0VNR+qPQLoFX1qtIXWQUt6OvxP1EoMandRahaGv+OYJ2Cm
lWcmiZWJIDNhQukgnFn2wSd2pkn+Bqj5S6oUhBdcjvVBvt2vCCJtHfZenCLJvJLq
k7nPvofvxA7oec9kDRcwJz+Np29DT7MR8gcn0kElF/Biq1F/wlKNuXyX9Sexm821
XQWEWUGFOtirK9BtxDI8R1uIpKWvLm66mnoXDWSb9kTZrkZHc2sa5j7ipPfOtWkr
GAPfGrn2o6ckg7M5SKlRo87RdjDzyFxLXn5vqkmwMM8ntRTq8nc7JpJSxX96wVJm
v5+HXwRwB5iT0w==
=/aBz
-----END PGP SIGNATURE-----
Merge tag 'kcsan.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney:
- Add instrumentation for memcpy(), memset(), and memmove() for Clang
v16+'s new function names that are used when the -fsanitize=thread
argument is given
- Fix objtool warnings from KCSAN's volatile instrumentation, and typos
in a pair of Kconfig options' help clauses
* tag 'kcsan.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan: Fix trivial typo in Kconfig help comments
objtool, kcsan: Add volatile read/write instrumentation to whitelist
kcsan: Instrument memcpy/memset/memmove with newer Clang
This series updates LKMM documentation, both in English and in Korean.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKn/cTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jHfAEACPwEaZrIAyytBV5jcoTOK6qxm5spK7
bek9HIGVbzBJYKuS0rGVnvViH+xVkM4LE0DYxxZuU9oxhzMYywOBKDGeBHpB+Gq7
RMUar6ZhjqkSg6NJmUsdGlelnRMKTDM6o5W98mVsLV66msrAO/JLLTi3E2t5RT1Q
TI/4w5tUmJTLQmvUqDDUZbV9HjB6CCts/k7k/g466Cts55u1ItUqoEX3SRDHUjDA
DZn5wiyRLsTLhN57BXY9kWO/V+00kC644kXQnRptdeHz3FxqNwD/vwhiFjM/Ws0i
nGjcdmZlYtjjFAl9Q3ZSr0U1cUFh0gsP8yAL9mXDeyKOX1vK28RTQMfKmdHzH35L
/upAvpH4f2IEGpBaoYzL3CS0VbJGcYSER9ra0fEmEdlOwSSZWMAJQkbHoGfqHqu8
Zh4vgLG7ER0HcEwKrmvAJ5XMZjo3sJs7a5suwQrQh/Q4JMAUOqtjzgoJne9d5iAD
Cgc9BkTYpNS8BgnsShmzweA87KeCVbIoFk/VmFGhmQJwuBaZ1ZQh91Th1PQiTCAn
9L8sDz/j3aPh0HOzWuqNft1aO0R6yn7YlU+ZrCiMFK5xshlCnmepddC63gT1OciF
pmjV6o/cYRGqWCj6xdAHHJxCG4X2/wT4kIItZlcknn3LxRwep5A4IacfkNjRD0xV
wSYtE4jP8WBY7A==
=a3fB
-----END PGP SIGNATURE-----
Merge tag 'lkmm.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull kernel memory model documentation updates from Paul McKenney:
- Update the LKMM documentation, both in English and in Korean
* tag 'lkmm.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
docs/memory-barriers.txt/kokr: Fix confusing name of 'data dependency barrier'
docs/memory-barriers.txt/kokr: Add memory barrier dma_mb()
docs/memory-barriers.txt/kokr: introduce io_stop_wc() and add implementation for ARM64
docs/memory-barriers.txt: Add a missed closing parenthesis
tools/memory-model: Weaken ctrl dependency definition in explanation.txt
This pull request contains the following branches:
doc.2022.10.20a: Documentation updates. This is the second
in a series from an ongoing review of the RCU documentation.
fixes.2022.10.21a: Miscellaneous fixes.
lazy.2022.11.30a: Introduces a default-off Kconfig option that depends
on RCU_NOCB_CPU that, on CPUs mentioned in the nohz_full or
rcu_nocbs boot-argument CPU lists, causes call_rcu() to introduce
delays. These delays result in significant power savings on
nearly idle Android and ChromeOS systems. These savings range
from a few percent to more than ten percent.
This series also includes several commits that change call_rcu()
to a new call_rcu_hurry() function that avoids these delays in
a few cases, for example, where timely wakeups are required.
Several of these are outside of RCU and thus have acks and
reviews from the relevant maintainers.
srcunmisafe.2022.11.09a: Creates an srcu_read_lock_nmisafe() and an
srcu_read_unlock_nmisafe() for architectures that support NMIs,
but which do not provide NMI-safe this_cpu_inc(). These NMI-safe
SRCU functions are required by the upcoming lockless printk()
work by John Ogness et al.
That printk() series depends on these commits, so if you pull
the printk() series before this one, you will have already
pulled in this branch, plus two more SRCU commits:
0cd7e350ab ("rcu: Make SRCU mandatory")
51f5f78a4f ("srcu: Make Tiny synchronize_srcu() check for readers")
These two commits appear to work well, but do not have
sufficient testing exposure over a long enough time for me to
feel comfortable pushing them unless something in mainline is
definitely going to use them immediately, and currently only
the new printk() work uses them.
torture.2022.10.18c: Changes providing minor but important increases
in test coverage for the new RCU polled-grace-period APIs.
torturescript.2022.10.20a: Changes that avoid redundant kernel builds,
thus providing about a 30% speedup for the torture.sh acceptance
test.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmOKnS8THHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jCMiD/4weraRjmcLhZ3tz2vgTI8ZsXdIiCfU
vCln0AOKroVo37S4BhViVfryV2D4VFfEb1UY6EgxNFu7Jd3z0seQShZh/5r8bFMU
p0E6TC8PwyKUpQstTOwOynkw6BWGW1qeL620PpBNRAy4MkxL8AGv40tHRIHEeAzc
cCTax2+xW9ae0ZtAZHDDCUAzpYpcjScIf4OZ3tkSaFCcpWZijg+dN60dnsZ9l7h9
DtqKH61rszXAtxkmN9Fs9OY5MPCXi9Es6LVYq6KN06jqxwJRqmYf+pai3apmNIOf
P8isXOQG58tbhBLpNCG58UBSkjI2GG8Lcq6hYr6d/7Ukm7RF49q8eL7OQlVrJMuQ
Zi2DVTEAu2U3pzdTC14gi3RvqP7dO+psBs+LpGXtj4RxYvAP99e9KSRcG14j/Wwa
L52AetBzBXTCS5nhPOG8RP22d8HRZLxMe9x7T8iVCDuwH4M1zTF5cVzLeEdgPAD7
tdX4eV16PLt1AvhCEuHU/2v520gc2K9oGXLI1A6kzquXh7FflcPWl5WS+sYUbB/p
gBsblz7C3I5GgSoW4aAMnkukZiYgSvVql8ZyRwQuRzvLpYcofMpoanZbcufDjuw9
N5QzAaMmzHnBu3hOJS2WaSZRZ73fed3NO8jo8q8EMfYeWK3NAHybBdaQqSTgsO8i
s+aN+LZ4s5MnRw==
=eMOr
-----END PGP SIGNATURE-----
Merge tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull RCU updates from Paul McKenney:
- Documentation updates. This is the second in a series from an ongoing
review of the RCU documentation.
- Miscellaneous fixes.
- Introduce a default-off Kconfig option that depends on RCU_NOCB_CPU
that, on CPUs mentioned in the nohz_full or rcu_nocbs boot-argument
CPU lists, causes call_rcu() to introduce delays.
These delays result in significant power savings on nearly idle
Android and ChromeOS systems. These savings range from a few percent
to more than ten percent.
This series also includes several commits that change call_rcu() to a
new call_rcu_hurry() function that avoids these delays in a few
cases, for example, where timely wakeups are required. Several of
these are outside of RCU and thus have acks and reviews from the
relevant maintainers.
- Create an srcu_read_lock_nmisafe() and an srcu_read_unlock_nmisafe()
for architectures that support NMIs, but which do not provide
NMI-safe this_cpu_inc(). These NMI-safe SRCU functions are required
by the upcoming lockless printk() work by John Ogness et al.
- Changes providing minor but important increases in torture test
coverage for the new RCU polled-grace-period APIs.
- Changes to torturescript that avoid redundant kernel builds, thus
providing about a 30% speedup for the torture.sh acceptance test.
* tag 'rcu.2022.12.02a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (49 commits)
net: devinet: Reduce refcount before grace period
net: Use call_rcu_hurry() for dst_release()
workqueue: Make queue_rcu_work() use call_rcu_hurry()
percpu-refcount: Use call_rcu_hurry() for atomic switch
scsi/scsi_error: Use call_rcu_hurry() instead of call_rcu()
rcu/rcutorture: Use call_rcu_hurry() where needed
rcu/rcuscale: Use call_rcu_hurry() for async reader test
rcu/sync: Use call_rcu_hurry() instead of call_rcu
rcuscale: Add laziness and kfree tests
rcu: Shrinker for lazy rcu
rcu: Refactor code a bit in rcu_nocb_do_flush_bypass()
rcu: Make call_rcu() lazy to save power
rcu: Implement lockdep_rcu_enabled for !CONFIG_DEBUG_LOCK_ALLOC
srcu: Debug NMI safety even on archs that don't require it
srcu: Explain the reason behind the read side critical section on GP start
srcu: Warn when NMI-unsafe API is used in NMI
arch/s390: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
arch/loongarch: Add ARCH_HAS_NMI_SAFE_THIS_CPU_OPS Kconfig option
rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state()
rcu-tasks: Make grace-period-age message human-readable
...
Merge devfreq updates and cpupower utility updates for 6.2-rc1:
- Add a private governor_data for devfreq governors (Kant Fan).
- Reorganize devfreq code to use device_match_of_node() and
devm_platform_get_and_ioremap_resource() instead of open coding
them (ye xingchen, Minghao Chi).
- Make cpupower choose base_cpu to display default cpupower details
instead of picking CPU 0 (Saket Kumar Bhaskar).
- Add Georgian translation to cpupower documentation (Zurab
Kargareteli).
- Introduce powercap intel-rapl library, powercap-info command, and
RAPL monitor into cpupower (Thomas Renninger).
* pm-devfreq:
PM / devfreq: event: use devm_platform_get_and_ioremap_resource()
PM / devfreq: event: Use device_match_of_node()
PM / devfreq: Use device_match_of_node()
PM/devfreq: governor: Add a private governor_data for governor
* pm-tools:
cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain
cpupower: Introduce powercap intel-rapl library and powercap-info command
cpupower: Add Georgian translation
tools/cpupower: Choose base_cpu to display default cpupower details
Merge cpuidle changes, updates related to system sleep amd generic power
domains code fixes for 6.2-rc1:
- Improve kernel messages printed by the cpuidle PCI driver (Ulf
Hansson).
- Make the DT cpuidle driver return the correct number of parsed idle
states, clean it up and clarify a comment in it (Ulf Hansson).
- Modify the tasks freezing code to avoid using pr_cont() and refine an
error message printed by it (Rafael Wysocki).
- Make the hibernation core code complain about memory map mismatches
during resume to help diagnostics (Xueqin Luo).
- Fix mistake in a kerneldoc comment in the hibernation code (xiongxin).
- Reverse the order of performance and enabling operations in the
generic power domains code (Abel Vesa).
- Power off[on] domains in hibernate .freeze[thaw]_noirq hook of in the
generic power domains code (Abel Vesa).
- Consolidate genpd_restore_noirq() and genpd_resume_noirq() (Shawn
Guo).
- Pass generic PM noirq hooks to genpd_finish_suspend() (Shawn Guo).
- Drop generic power domain status manipulation during hibernate
restore (Shawn Guo).
* pm-cpuidle:
cpuidle: dt: Clarify a comment and simplify code in dt_init_idle_driver()
cpuidle: dt: Return the correct numbers of parsed idle states
cpuidle: psci: Extend information in log about OSI/PC mode
* pm-sleep:
PM: sleep: Refine error message in try_to_freeze_tasks()
PM: sleep: Avoid using pr_cont() in the tasks freezing code
PM: hibernate: Complain about memory map mismatches during resume
PM: hibernate: Fix mistake in kerneldoc comment
* pm-domains:
PM: domains: Reverse the order of performance and enabling ops
PM: domains: Power off[on] domain in hibernate .freeze[thaw]_noirq hook
PM: domains: Consolidate genpd_restore_noirq() and genpd_resume_noirq()
PM: domains: Pass generic PM noirq hooks to genpd_finish_suspend()
PM: domains: Drop genpd status manipulation for hibernate restore
Merge ACPICA changes, including bug fixes and cleanups as well as support
for some recently defined data structures, for 6.2-rc1:
- Make acpi_ex_load_op() match upstream implementation (Rafael Wysocki).
- Add support for loong_arch-specific APICs in MADT (Huacai Chen).
- Add support for fixed PCIe wake event (Huacai Chen).
- Add EBDA pointer sanity checks (Vit Kabele).
- Avoid accessing VGA memory when EBDA < 1KiB (Vit Kabele).
- Add CCEL table support to both compiler/disassembler (Kuppuswamy
Sathyanarayanan).
- Add a couple of new UUIDs to the known UUID list (Bob Moore).
- Add support for FFH Opregion special context data (Sudeep Holla).
- Improve warning message for "invalid ACPI name" (Bob Moore).
- Add support for CXL 3.0 structures (CXIMS & RDPAS) in the CEDT table
(Alison Schofield).
- Prepare IORT support for revision E.e (Robin Murphy).
- Finish support for the CDAT table (Bob Moore).
- Fix error code path in acpi_ds_call_control_method() (Rafael Wysocki).
- Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() (Li Zetao).
- Update the version of the ACPICA code in the kernel (Bob Moore).
* acpica:
ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage()
ACPICA: Fix error code path in acpi_ds_call_control_method()
ACPICA: Update version to 20221020
ACPICA: Add utcksum.o to the acpidump Makefile
Revert "LoongArch: Provisionally add ACPICA data structures"
ACPICA: Finish support for the CDAT table
ACPICA: IORT: Update for revision E.e
ACPICA: Add CXL 3.0 structures (CXIMS & RDPAS) to the CEDT table
ACPICA: Improve warning message for "invalid ACPI name"
ACPICA: Add support for FFH Opregion special context data
ACPICA: Add a couple of new UUIDs to the known UUID list
ACPICA: iASL: Add CCEL table to both compiler/disassembler
ACPICA: Do not touch VGA memory when EBDA < 1ki_b
ACPICA: Check that EBDA pointer is in valid memory
ACPICA: Events: Support fixed PCIe wake event
ACPICA: MADT: Add loong_arch-specific APICs support
ACPICA: Make acpi_ex_load_op() match upstream
The test currently fails on 32bit. Fixing the "-1ull" vs. "-1ul" seems
to make the test pass and the compiler happy.
Note: This test is not in mm-stable yet. This fix should be squashed into
"selftests/vm: add KSM unmerge tests".
Link: https://lkml.kernel.org/r/20221205193716.276024-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The compiler complains about the conversion of a pointer to an int of
different width.
Link: https://lkml.kernel.org/r/20221205193716.276024-4-david@redhat.com
Fixes: 6f1405efc6 ("selftests/vm: anon_cow: add R/O longterm tests via gup_test")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The tests fail to compile in some environments (e.g., Debian 11.5 on x86).
Let's simply conditionally define MADV_POPULATE_(READ|WRITE) if not
already defined, similar to how the khugepaged.c test handles it.
Link: https://lkml.kernel.org/r/20221205193716.276024-3-david@redhat.com
Fixes: 39b2e5cae4 ("selftests/vm: make MADV_POPULATE_(READ|WRITE) use in-tree headers")
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make sure that we ignore protection of a memcg that is the target of memcg
reclaim.
Link: https://lkml.kernel.org/r/20221202031512.1365483-4-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Refactor the code that drives writing to memory.reclaim (retrying, error
handling, etc) from test_memcg_reclaim() to a helper called
reclaim_until(), which proactively reclaims from a memcg until its usage
reaches a certain value.
While we are at it, refactor and simplify the reclaim loop.
This will be used in a following patch in another test.
Link: https://lkml.kernel.org/r/20221202031512.1365483-3-yosryahmed@google.com
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Suggested-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Down <chris@chrisdown.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
A DAMON sysfs user could start DAMON with a scheme, remove the sysfs
directory for the scheme, and then ask stats or schemes tried regions
update. The related logic were not aware of the already removed directory
situation, so it was able to results in invalid memory accesses. The fix
has made with commit 8468b48661 ("mm/damon/sysfs-schemes: skip stats
update if the scheme directory is removed"), though. Add a selftest to
prevent such kinds of bugs from being introduced again.
Link: https://lkml.kernel.org/r/20221201170834.62823-1-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add a test to measure performance of KSM breaking not triggered via
COW, but triggered by disabling KSM on an area filled with KSM pages via
MADV_UNMERGEABLE.
Link: https://lkml.kernel.org/r/20221021101141.84170-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/ksm: break_ksm() cleanups and fixes", v2.
This series cleans up and fixes break_ksm(). In summary, we no longer use
fake write faults to break COW but instead FAULT_FLAG_UNSHARE. Further,
we move away from using follow_page() --- that we can hopefully remove
completely at one point --- and use new walk_page_range_vma() instead.
Fortunately, we can get rid of VM_FAULT_WRITE and FOLL_MIGRATION in common
code now.
Extend the existing ksm tests by an unmerge benchmark, and a some new
unmerge tests.
Also, add a selftest to measure MADV_UNMERGEABLE performance. In my setup
(AMD Ryzen 9 3900X), running the KSM selftest to test unmerge performance
on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this results in a
performance degradation of ~6% -- 7% (old: ~5250 MiB/s, new: ~4900 MiB/s).
I don't think we particularly care for now, but it's good to be aware of
the implication.
This patch (of 9):
Let's add three unmerge tests (MADV_UNMERGEABLE unmerging all pages in the
range).
test_unmerge(): basic unmerge tests
test_unmerge_discarded(): have some pte_none() entries in the range
test_unmerge_uffd_wp(): protect the merged pages using uffd-wp
ksm_tests.c currently contains a mixture of benchmarks and tests, whereby
each test is carried out by executing the ksm_tests binary with specific
parameters. Let's add new ksm_functional_tests.c that performs multiple,
smaller functional tests all at once.
Link: https://lkml.kernel.org/r/20221021101141.84170-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221021101141.84170-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Our memory management kernel CI testing at Red Hat uses the VM
selftests and we have run into two problems:
First, our LTP tests overlap with the VM selftests.
We want to avoid unhelpful redundancy in our testing practices.
Second, we have observed the current run_vmtests.sh to report overall
failure/ambiguous results in the case that a machine lacks the necessary
hardware to perform one or more of the tests. E.g. ksm tests that
require more than one numa node.
We want to be able to run the vm selftests suitable to particular hardware.
Add the ability to run one or more groups of vm tests via run_vmtests.sh
instead of simply all-or-none in order to solve these problems.
Preserve existing default behavior of running all tests when the script
is invoked with no arguments.
Documentation of test groups is included in the patch as follows:
# ./run_vmtests.sh [ -h || --help ]
usage: ./tools/testing/selftests/vm/run_vmtests.sh [ -h | -t "<categories>"]
-t: specify specific categories to tests to run
-h: display this message
The default behavior is to run all tests.
Alternatively, specific groups tests can be run by passing a string
to the -t argument containing one or more of the following categories
separated by spaces:
- mmap
tests for mmap(2)
- gup_test
tests for gup using gup_test interface
- userfaultfd
tests for userfaultfd(2)
- compaction
a test for the patch "Allow compaction of unevictable pages"
- mlock
tests for mlock(2)
- mremap
tests for mremap(2)
- hugevm
tests for very large virtual address space
- vmalloc
vmalloc smoke tests
- hmm
hmm smoke tests
- madv_populate
test memadvise(2) MADV_POPULATE_{READ,WRITE} options
- memfd_secret
test memfd_secret(2)
- process_mrelease
test process_mrelease(2)
- ksm
ksm tests that do not require >=2 NUMA nodes
- ksm_numa
ksm tests that require >=2 NUMA nodes
- pkey
memory protection key tests
- soft_dirty
test soft dirty page bit semantics
- anon_cow
test anonymous copy-on-write semantics
example: ./run_vmtests.sh -t "hmm mmap ksm"
Link: https://lkml.kernel.org/r/20221018231222.1884715-1-jsavitz@redhat.com
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Nico Pache <npache@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
post-6.0 issues.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY5Ur2AAKCRDdBJ7gKXxA
jsGmAQDWSq6z9fVgk30XpMr/X7t5c6NTPw5GocVpdwG8iqch3gEAjEs5/Kcd/mx4
d1dLaJFu1u3syessp8nJrNr1HANIog8=
=L8zu
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Nine hotfixes.
Six for MM, three for other areas. Four of these patches address
post-6.0 issues"
* tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
memcg: fix possible use-after-free in memcg_write_event_control()
MAINTAINERS: update Muchun Song's email
mm/gup: fix gup_pud_range() for dax
mmap: fix do_brk_flags() modifying obviously incorrect VMAs
mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit
tmpfs: fix data loss from failed fallocate
kselftests: cgroup: update kmem test precision tolerance
mm: do not BUG_ON missing brk mapping, because userspace can unmap it
mailmap: update Matti Vaittinen's email address
Check that verifier.c:states_equal() uses check_ids() to match
consistent active_lock/map_value configurations. This allows to prune
states with active spin locks even if numerical values of
active_lock ids do not match across compared states.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-8-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test that when reg->id is not same for the same register of type
PTR_TO_MAP_VALUE between current and old explored state, we currently
return false from regsafe and continue exploring.
Without the fix in prior commit, the test case fails.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-7-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A test case that would erroneously pass verification if
verifier.c:states_equal() maintains separate register ID mappings for
call frames.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Under certain conditions it was possible for verifier.c:regsafe() to
skip check_id() call. This commit adds negative test cases previously
errorneously accepted as safe.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221209135733.28851-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
1813e51eec ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed
the batch size while this test case has been left behind. This has led
to a test failure reported by test bot:
not ok 2 selftests: cgroup: test_kmem # exit=1
Update the tolerance for the pcp charges to reflect the
MEMCG_CHARGE_BATCH change to fix this.
[akpm@linux-foundation.org: update comments, per Roman]
Link: https://lkml.kernel.org/r/Y4m8Unt6FhWKC6IH@dhcp22.suse.cz
Fixes: 1813e51eec ("memcg: increase MEMCG_CHARGE_BATCH to 64")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212010958.c1053bd3-yujie.liu@intel.com
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Michal Koutný" <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add the ability to control and trace in-kernel monitors. This is
a generic interface, it will check for existing monitors and enable
standard setup, like enabling reactors.
For example:
# rv list
wip wakeup in preemptive per-cpu testing monitor. [OFF]
wwnr wakeup while not running per-task testing model. [OFF]
# rv mon wwnr --help
rv version 6.1.0-rc4: help
usage: rv mon wwnr [-h] [-q] [-r reactor] [-s] [-v]
-h/--help: print this menu and the reactor list
-r/--reactor 'reactor': enables the 'reactor'
-s/--self: when tracing (-t), also trace rv command
-t/--trace: trace monitor's event
-v/--verbose: print debug messages
available reactors: nop printk panic
# rv mon wwnr --trace
<TASK>-PID [CPU] TYPE ID STATE x EVENT -> NEXT_STATE FINAL
| | | | | | | | |
rv-3613 [001] event 3613 running x switch_out -> not_running Y
sshd-1248 [005] event 1248 running x switch_out -> not_running Y
<idle>-0 [005] event 71 not_running x wakeup -> not_running Y
<idle>-0 [005] event 71 not_running x switch_in -> running N
kcompactd0-71 [005] event 71 running x switch_out -> not_running Y
<idle>-0 [000] event 860 not_running x wakeup -> not_running Y
<idle>-0 [000] event 860 not_running x switch_in -> running N
systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
<idle>-0 [000] event 860 not_running x wakeup -> not_running Y
<idle>-0 [000] event 860 not_running x switch_in -> running N
systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
<idle>-0 [005] event 71 not_running x wakeup -> not_running Y
<idle>-0 [005] event 71 not_running x switch_in -> running N
kcompactd0-71 [005] event 71 running x switch_out -> not_running Y
<idle>-0 [000] event 860 not_running x wakeup -> not_running Y
<idle>-0 [000] event 860 not_running x switch_in -> running N
systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y
<idle>-0 [001] event 3613 not_running x wakeup -> not_running Y
Link: https://lkml.kernel.org/r/1e57547e3acadda6e23949b2672c89e76ec2ec42.1668180100.git.bristot@kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
This is the (user-space) runtime verification tool, named rv.
This tool aims to be the interface for in-kernel rv monitors, as
well as the home for monitors in user-space (online asynchronous),
and in *eBPF.
The tool receives a command as the first argument, the current
commands are:
list - list all available monitors
mon - run a given monitor
Each monitor is an independent piece of software inside the
tool and can have their own arguments.
There is no monitor implemented in this patch, it only
adds the basic structure of the tool, based on rtla.
# rv --help
rv version 6.1.0-rc4: help
usage: rv command [-h] [command_options]
-h/--help: print this menu
command: run one of the following command:
list: list all available monitors
mon: run a monitor
[command options]: each command has its own set of options
run rv command -h for further information
*dot2bpf is the next patch set, depends on this, doing cleanups.
Link: https://lkml.kernel.org/r/fb51184f3b95aea0d7bfdc33ec09f4153aee84fa.1668180100.git.bristot@kernel.org
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
rtla_usage(), osnoise_usage() and timerlat_usage() all exit with an
error status.
However when these are called from help, they should exit with a
non-error status.
Fix this by passing the exit status to the functions.
Note, although we remove the subsequent call to exit after calling
usage, we leave it in at the end of a function to suppress the compiler
warning "control reaches end of a non-void function".
Link: https://lkml.kernel.org/r/20221107144313.22470-1-jkacur@redhat.com
Signed-off-by: John Kacur <jkacur@redhat.com>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
MEM_REGION_TEST_DATA is meant to hold data explicitly used by a
selftest, not implicit allocations due to the selftests infrastructure.
Allocate the ucall pool from MEM_REGION_DATA much like the rest of the
selftests library allocations.
Fixes: 426729b2cf ("KVM: selftests: Add ucall pool based implementation")
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-5-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
An interesting feature of the Arm architecture is that the stage-1 MMU
supports two distinct VA regions, controlled by TTBR{0,1}_EL1. As KVM
selftests on arm64 only uses TTBR0_EL1, the VA space is constrained to
[0, 2^(va_bits-1)). This is different from other architectures that
allow for addressing low and high regions of the VA space from a single
page table.
KVM selftests' VA space allocator presumes the valid address range is
split between low and high memory based the MSB, which of course is a
poor match for arm64's TTBR0 region.
Allow architectures to override the default VA space layout. Make use of
the override to align vpages_valid with the behavior of TTBR0 on arm64.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20221207214809.489070-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAmOODb0PHG1hekBrZXJu
ZWwub3JnAAoJECPQ0LrRPXpDztsQAInRnsgLl57/SpqhZzExNCllN6AT/bdeB3uz
rnw3ScJOV174uNKp8lnPWoTvu2YUGiVtBp6tFHhDI8le7zHX438ZT8KE5mcs8p5i
KfFKnb8SHV2DDpqkcy24c0Xl/6vsg1qkKrdfJb49yl5ZakRITDpynW/7tn6dXsxX
wASeGFdCYeW4g2xMQzsCbtx6LgeQ8uomBmzRfPrOtZHYYxAn6+4Mj4595EC1sWxM
AQnbp8tW3Vw46saEZAQvUEOGOW9q0Nls7G21YqQ52IA+ZVDK1LmAF2b1XY3edjkk
pX8EsXOURfqdasBxfSfF3SgnUazoz9GHpSzp1cTVTktrPp40rrT7Ldtml0ktq69d
1malPj47KVMDsIq0kNJGnMxciXFgAHw+VaCQX+k4zhIatNwviMbSop2fEoxj22jc
4YGgGOxaGrnvmAJhreCIbr4CkZk5CJ8Zvmtfg+QM6npIp8BY8896nvORx/d4i6tT
H4caadd8AAR56ANUyd3+KqF3x0WrkaU0PLHJLy1tKwOXJUUTjcpvIfahBAAeUlSR
qEFrtb+EEMPgAwLfNOICcNkPZR/yyuYvM+FiUQNVy5cNiwFkpztpIctfOFaHySGF
K07O2/a1F6xKL0OKRUg7hGKknF9ecmux4vHhiUMuIk9VOgNTWobHozBDorLKXMzC
aWa6oGVC
=iIPT
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for 6.2
- Enable the per-vcpu dirty-ring tracking mechanism, together with an
option to keep the good old dirty log around for pages that are
dirtied by something other than a vcpu.
- Switch to the relaxed parallel fault handling, using RCU to delay
page table reclaim and giving better performance under load.
- Relax the MTE ABI, allowing a VMM to use the MAP_SHARED mapping
option, which multi-process VMMs such as crosvm rely on.
- Merge the pKVM shadow vcpu state tracking that allows the hypervisor
to have its own view of a vcpu, keeping that state private.
- Add support for the PMUv3p5 architecture revision, bringing support
for 64bit counters on systems that support it, and fix the
no-quite-compliant CHAIN-ed counter support for the machines that
actually exist out there.
- Fix a handful of minor issues around 52bit VA/PA support (64kB pages
only) as a prefix of the oncoming support for 4kB and 16kB pages.
- Add/Enable/Fix a bunch of selftests covering memslots, breakpoints,
stage-2 faults and access tracking. You name it, we got it, we
probably broke it.
- Pick a small set of documentation and spelling fixes, because no
good merge window would be complete without those.
As a side effect, this tag also drags:
- The 'kvmarm-fixes-6.1-3' tag as a dependency to the dirty-ring
series
- A shared branch with the arm64 tree that repaints all the system
registers to match the ARM ARM's naming, and resulting in
interesting conflicts
Allow variables to execute shell commands. Note, these are processed when
they are first seen while parsing the config file. This is useful if you
have the same config file used for multiple hosts (as they may be in a git
repository).
HOSTNAME := ${shell hostname}
DEFAULTS IF "${HOSTNAME}" == "frodo"
Link: https://lkml.kernel.org/r/20221207212944.277ee850@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The BPF Makefile in net/bpf did incorrect path substitution for O=dir
builds, e.g.
make O=/tmp/kselftest headers
make O=/tmp/kselftest -C tools/testing/selftests
would fail in selftest builds [1] net/ with
clang-16: error: no such file or directory: 'kselftest/net/bpf/nat6to4.c'
clang-16: error: no input files
Add a pattern prerequisite and an order-only-prerequisite (for
creating the directory), to resolve the issue.
[1] https://lore.kernel.org/all/202212060009.34CkQmCN-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Fixes: 837a3d66d6 ("selftests: net: Add cross-compilation support for BPF programs")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20221206102838.272584-1-bjorn@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that Spectrum-1 gained ip6gre support we can move the test out of
the Spectrum-2 directory.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The original support for bpf_user_ringbuf_drain callbacks simply
short-circuited checks for the dynptr state, allowing users to pass
PTR_TO_DYNPTR (now CONST_PTR_TO_DYNPTR) to helpers that initialize a
dynptr. This bug would have also surfaced with other dynptr helpers in
the future that changed dynptr view or modified it in some way.
Include test cases for all cases, i.e. both bpf_dynptr_from_mem and
bpf_ringbuf_reserve_dynptr, and ensure verifier rejects both of them.
Without the fix, both of these programs load and pass verification.
While at it, remove sys_nanosleep target from failure cases' SEC
definition, as there is no such tracepoint.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
While check_func_arg_reg_off is the place which performs generic checks
needed by various candidates of reg->type, there is some handling for
special cases, like ARG_PTR_TO_DYNPTR, OBJ_RELEASE, and
ARG_PTR_TO_RINGBUF_MEM.
This commit aims to streamline these special cases and instead leave
other things up to argument type specific code to handle. The function
will be restrictive by default, and cover all possible cases when
OBJ_RELEASE is set, without having to update the function again (and
missing to do that being a bug).
This is done primarily for two reasons: associating back reg->type to
its argument leaves room for the list getting out of sync when a new
reg->type is supported by an arg_type.
The other case is ARG_PTR_TO_RINGBUF_MEM. The problem there is something
we already handle, whenever a release argument is expected, it should
be passed as the pointer that was received from the acquire function.
Hence zero fixed and variable offset.
There is nothing special about ARG_PTR_TO_RINGBUF_MEM, where technically
its target register type PTR_TO_MEM | MEM_RINGBUF can already be passed
with non-zero offset to other helper functions, which makes sense.
Hence, lift the arg_type_is_release check for reg->off and cover all
possible register types, instead of duplicating the same kind of check
twice for current OBJ_RELEASE arg_types (alloc_mem and ptr_to_btf_id).
For the release argument, arg_type_is_dynptr is the special case, where
we go to actual object being freed through the dynptr, so the offset of
the pointer still needs to allow fixed and variable offset and
process_dynptr_func will verify them later for the release argument case
as well.
This is not specific to ARG_PTR_TO_DYNPTR though, we will need to make
this exception for any future object on the stack that needs to be
released. In this sense, PTR_TO_STACK as a candidate for object on stack
argument is a special case for release offset checks, and they need to
be done by the helper releasing the object on stack.
Since the check has been lifted above all register type checks, remove
the duplicated check that is being done for PTR_TO_BTF_ID.
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Recently, user ringbuf support introduced a PTR_TO_DYNPTR register type
for use in callback state, because in case of user ringbuf helpers,
there is no dynptr on the stack that is passed into the callback. To
reflect such a state, a special register type was created.
However, some checks have been bypassed incorrectly during the addition
of this feature. First, for arg_type with MEM_UNINIT flag which
initialize a dynptr, they must be rejected for such register type.
Secondly, in the future, there are plans to add dynptr helpers that
operate on the dynptr itself and may change its offset and other
properties.
In all of these cases, PTR_TO_DYNPTR shouldn't be allowed to be passed
to such helpers, however the current code simply returns 0.
The rejection for helpers that release the dynptr is already handled.
For fixing this, we take a step back and rework existing code in a way
that will allow fitting in all classes of helpers and have a coherent
model for dealing with the variety of use cases in which dynptr is used.
First, for ARG_PTR_TO_DYNPTR, it can either be set alone or together
with a DYNPTR_TYPE_* constant that denotes the only type it accepts.
Next, helpers which initialize a dynptr use MEM_UNINIT to indicate this
fact. To make the distinction clear, use MEM_RDONLY flag to indicate
that the helper only operates on the memory pointed to by the dynptr,
not the dynptr itself. In C parlance, it would be equivalent to taking
the dynptr as a point to const argument.
When either of these flags are not present, the helper is allowed to
mutate both the dynptr itself and also the memory it points to.
Currently, the read only status of the memory is not tracked in the
dynptr, but it would be trivial to add this support inside dynptr state
of the register.
With these changes and renaming PTR_TO_DYNPTR to CONST_PTR_TO_DYNPTR to
better reflect its usage, it can no longer be passed to helpers that
initialize a dynptr, i.e. bpf_dynptr_from_mem, bpf_ringbuf_reserve_dynptr.
A note to reviewers is that in code that does mark_stack_slots_dynptr,
and unmark_stack_slots_dynptr, we implicitly rely on the fact that
PTR_TO_STACK reg is the only case that can reach that code path, as one
cannot pass CONST_PTR_TO_DYNPTR to helpers that don't set MEM_RDONLY. In
both cases such helpers won't be setting that flag.
The next patch will add a couple of selftest cases to make sure this
doesn't break.
Fixes: 2057156738 ("bpf: Add bpf_user_ringbuf_drain() helper")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
ARG_PTR_TO_DYNPTR is akin to ARG_PTR_TO_TIMER, ARG_PTR_TO_KPTR, where
the underlying register type is subjected to more special checks to
determine the type of object represented by the pointer and its state
consistency.
Move dynptr checks to their own 'process_dynptr_func' function so that
is consistent and in-line with existing code. This also makes it easier
to reuse this code for kfunc handling.
Then, reuse this consolidated function in kfunc dynptr handling too.
Note that for kfuncs, the arg_type constraint of DYNPTR_TYPE_LOCAL has
been lifted.
Acked-by: David Vernet <void@manifault.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207204141.308952-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Current release - new code bugs:
- bonding: ipv6: correct address used in Neighbour Advertisement
parsing (src vs dst typo)
- fec: properly scope IRQ coalesce setup during link up to supported
chips only
Previous releases - regressions:
- Bluetooth fixes for fake CSR clones (knockoffs):
- re-add ERR_DATA_REPORTING quirk
- fix crash when device is replugged
- Bluetooth:
- silence a user-triggerable dmesg error message
- L2CAP: fix u8 overflow, oob access
- correct vendor codec definition
- fix support for Read Local Supported Codecs V2
- ti: am65-cpsw: fix RGMII configuration at SPEED_10
- mana: fix race on per-CQ variable NAPI work_done
Previous releases - always broken:
- af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
avoid null-deref
- af_can: fix NULL pointer dereference in can_rcv_filter
- can: slcan: fix UAF with a freed work
- can: can327: flush TX_work on ldisc .close()
- macsec: add missing attribute validation for offload
- ipv6: avoid use-after-free in ip6_fragment()
- nft_set_pipapo: actually validate intervals in fields
after the first one
- mvneta: prevent oob access in mvneta_config_rss()
- ipv4: fix incorrect route flushing when table ID 0 is used,
or when source address is deleted
- phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmOSS98ACgkQMUZtbf5S
Irssbg//aTPi8bJgyM/yH83QK1+6t23xiW0YscJG3CXr+lTIvXKRoyPjUwNmy1TG
xr5+dkW73nEVr3Uxkunn1qo+74yYpwewd1gamOxmjD+TDIb2xn/ErN61X506eXTy
59thWoAMD0EOQrqDIcD4SEfdNMaASrJSD4t8rXwk/h7LMw+mrIsWNSeFL4HpgHOw
lYwujLwDkkDJNWDWI8NcJiR3i+l7JGOHkcdYfhBZIodBCQ1y/u3AVpK0qZd6eeFs
3Waz7a0q9M0glHhXXMdN/v+XjxKRE1evkMjv842zABTSLy4lKKZPn//pXwA+bZ24
qHw2y1a7ZHiRBQMqNCddxzHhe6kfq8OyrCDAI8qupBTNFBXXKzJyxCrbabPNeern
YW0oggN9dTkES17OeHraA5O8km1L6tfIbm6BLteXAwM4hDoxBSMQYtEDF26P4K1K
UYufBjfL/hPhuoYNFZ//0laLow+6k8Sl7Pdk/fwd8InCb59YrYUpjod7kXMByw7m
++FO3guxnk32ZEWduG3Gc203Vi4acRrQkGqWbhMZtH+ccxexx2Efl6Xn+sq7r9Dv
rl/PIQiuGdoFf7OaW64FS2Nj8eVAPcqbYUMEc3EBrFDs/CvGTkyzRUcQoLdEf2zu
p2ldFTerGtek9F7dUs4GZr/vw29n0Q2dPdQVFIcuXo3Nz3wFNKI=
=+mCl
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bluetooth, can and netfilter.
Current release - new code bugs:
- bonding: ipv6: correct address used in Neighbour Advertisement
parsing (src vs dst typo)
- fec: properly scope IRQ coalesce setup during link up to supported
chips only
Previous releases - regressions:
- Bluetooth fixes for fake CSR clones (knockoffs):
- re-add ERR_DATA_REPORTING quirk
- fix crash when device is replugged
- Bluetooth:
- silence a user-triggerable dmesg error message
- L2CAP: fix u8 overflow, oob access
- correct vendor codec definition
- fix support for Read Local Supported Codecs V2
- ti: am65-cpsw: fix RGMII configuration at SPEED_10
- mana: fix race on per-CQ variable NAPI work_done
Previous releases - always broken:
- af_unix: diag: fetch user_ns from in_skb in unix_diag_get_exact(),
avoid null-deref
- af_can: fix NULL pointer dereference in can_rcv_filter
- can: slcan: fix UAF with a freed work
- can: can327: flush TX_work on ldisc .close()
- macsec: add missing attribute validation for offload
- ipv6: avoid use-after-free in ip6_fragment()
- nft_set_pipapo: actually validate intervals in fields after the
first one
- mvneta: prevent oob access in mvneta_config_rss()
- ipv4: fix incorrect route flushing when table ID 0 is used, or when
source address is deleted
- phy: mxl-gpy: add workaround for IRQ bug on GPY215B and GPY215C"
* tag 'net-6.1-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits)
net: dsa: sja1105: avoid out of bounds access in sja1105_init_l2_policing()
s390/qeth: fix use-after-free in hsci
macsec: add missing attribute validation for offload
net: mvneta: Fix an out of bounds check
net: thunderbolt: fix memory leak in tbnet_open()
ipv6: avoid use-after-free in ip6_fragment()
net: plip: don't call kfree_skb/dev_kfree_skb() under spin_lock_irq()
net: phy: mxl-gpy: add MDINT workaround
net: dsa: mv88e6xxx: accept phy-mode = "internal" for internal PHY ports
xen/netback: don't call kfree_skb() under spin_lock_irqsave()
dpaa2-switch: Fix memory leak in dpaa2_switch_acl_entry_add() and dpaa2_switch_acl_entry_remove()
ethernet: aeroflex: fix potential skb leak in greth_init_rings()
tipc: call tipc_lxc_xmit without holding node_read_lock
can: esd_usb: Allow REC and TEC to return to zero
can: can327: flush TX_work on ldisc .close()
can: slcan: fix freed work crash
can: af_can: fix NULL pointer dereference in can_rcv_filter
net: dsa: sja1105: fix memory leak in sja1105_setup_devlink_regions()
ipv4: Fix incorrect route flushing when table ID 0 is used
ipv4: Fix incorrect route flushing when source address is deleted
...
Bpftool has new extra libbpf_det_bind probing map we need to exclude.
Also skip trying to load netdevsim modules if it's already loaded (builtin).
v2:
- drop iproute2->bpftool changes (Toke)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221206232739.2504890-1-sdf@google.com
grub2 has submenus where to use grub-reboot, it requires:
grub-reboot X>Y
where X is the main index and Y is the submenu. Thus if you have:
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux ...
[...]
}
submenu 'Advanced options for Debian GNU/Linux' $menuentry_id_option ...
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64 (recovery mode)' --class debian --class gnu-linux ...
[...]
}
menuentry 'Debian GNU/Linux, with Linux test' --class debian --class gnu-linux ...
[...]
}
And wanted to boot to the "Linux test" kernel, you need to run:
# grub-reboot 1>2
As 1 is the second top menu (the submenu) and 2 is the third of the sub
menu entries.
Have the grub.cfg parsing for grub2 handle such cases.
Cc: stable@vger.kernel.org
Fixes: a15ba91361 ("ktest: Add support for grub2")
Reviewed-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
After a full run of a make_min_config test, I noticed there were a lot of
CONFIGs still enabled that really should not be. Looking at them, I
noticed they were all defined as "default y". The issue is that the test
simple removes the config and re-runs make oldconfig, which enables it
again because it is set to default 'y'. Instead, explicitly disable the
config with writing "# CONFIG_FOO is not set" to the file to keep it from
being set again.
With this change, one of my box's minconfigs went from 768 configs set,
down to 521 configs set.
Link: https://lkml.kernel.org/r/20221202115936.016fce23@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 0a05c769a9 ("ktest: Added config_bisect test type")
Reviewed-by: John 'Warthog9' Hawley (VMware) <warthog9@eaglescrag.net>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Convert big chunks of dynptr and map_kptr subtests to use generic
verification_tester. They are switched from using manually maintained
tables of test cases, specifying program name and expected error
verifier message, to btf_decl_tag-based annotations directly on
corresponding BPF programs: __failure to specify that BPF program is
expected to fail verification, and __msg() to specify expected log
message.
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221207201648.2990661-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It's become a common pattern to have a collection of small BPF programs
in one BPF object file, each representing one test case. On user-space
side of such tests we maintain a table of program names and expected
failure or success, along with optional expected verifier log message.
This works, but each set of tests reimplement this mundane code over and
over again, which is a waste of time for anyone trying to add a new set
of tests. Furthermore, it's quite error prone as it's way too easy to miss
some entries in these manually maintained test tables (as evidences by
dynptr_fail tests, in which ringbuf_release_uninit_dynptr subtest was
accidentally missed; this is fixed in next patch).
So this patch implements generic test_loader, which accepts skeleton
name and handles the rest of details: opens and loads BPF object file,
making sure each program is tested in isolation. Optionally each test
case can specify expected BPF verifier log message. In case of failure,
tester makes sure to report verifier log, but it also reports verifier
log in verbose mode unconditionally.
Now, the interesting deviation from existing custom implementations is
the use of btf_decl_tag attribute to specify expected-to-fail vs
expected-to-succeed markers and, optionally, expected log message
directly next to BPF program source code, eliminating the need to
manually create and update table of tests.
We define few macros wrapping btf_decl_tag with a convention that all
values of btf_decl_tag start with "comment:" prefix, and then utilizing
a very simple "just_some_text_tag" or "some_key_name=<value>" pattern to
define things like expected success/failure, expected verifier message,
extra verifier log level (if necessary). This approach is demonstrated
by next patch in which two existing sets of failure tests are converted.
Tester supports both expected-to-fail and expected-to-succeed programs,
though this patch set didn't convert any existing expected-to-succeed
programs yet, as existing tests couple BPF program loading with their
further execution through attach or test_prog_run. One way to allow
testing scenarios like this would be ability to specify custom callback,
executed for each successfully loaded BPF program. This is left for
follow up patches, after some more analysis of existing test cases.
This test_loader is, hopefully, a start of a test_verifier-like runner,
but integrated into test_progs infrastructure. It will allow much better
"user experience" of defining low-level verification tests that can take
advantage of all the libbpf-provided nicety features on BPF side: global
variables, declarative maps, etc. All while having a choice of defining
it in C or as BPF assembly (through __attribute__((naked)) functions and
using embedded asm), depending on what makes most sense in each
particular case. This will be explored in follow up patches as well.
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221207201648.2990661-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cited commit added the table ID to the FIB info structure, but did not
properly initialize it when table ID 0 is used. This can lead to a route
in the default VRF with a preferred source address not being flushed
when the address is deleted.
Consider the following example:
# ip address add dev dummy1 192.0.2.1/28
# ip address add dev dummy1 192.0.2.17/28
# ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100
# ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Both routes are installed in the default VRF, but they are using two
different FIB info structures. One with a metric of 100 and table ID of
254 (main) and one with a metric of 200 and table ID of 0. Therefore,
when the preferred source address is deleted from the default VRF,
the second route is not flushed:
# ip address del dev dummy1 192.0.2.17/28
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Fix by storing a table ID of 254 instead of 0 in the route configuration
structure.
Add a test case that fails before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [FAIL]
Tests passed: 8
Tests failed: 1
And passes after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [ OK ]
Tests passed: 9
Tests failed: 0
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Reported-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Cited commit added the table ID to the FIB info structure, but did not
prevent structures with different table IDs from being consolidated.
This can lead to routes being flushed from a VRF when an address is
deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB
info. This is already done for FIB info structures backed by a nexthop
object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [FAIL]
TEST: Route in default VRF not removed [ OK ]
RTNETLINK answers: File exists
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6
Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8
Tests failed: 0
Fixes: 5a56a0b3a4 ("net: Don't delete routes in different VRFs")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A series of prior patches added some kfuncs that allow struct
task_struct * objects to be used as kptrs. These kfuncs leveraged the
'refcount_t rcu_users' field of the task for performing refcounting.
This field was used instead of 'refcount_t usage', as we wanted to
leverage the safety provided by RCU for ensuring a task's lifetime.
A struct task_struct is refcounted by two different refcount_t fields:
1. p->usage: The "true" refcount field which task lifetime. The
task is freed as soon as this refcount drops to 0.
2. p->rcu_users: An "RCU users" refcount field which is statically
initialized to 2, and is co-located in a union with
a struct rcu_head field (p->rcu). p->rcu_users
essentially encapsulates a single p->usage
refcount, and when p->rcu_users goes to 0, an RCU
callback is scheduled on the struct rcu_head which
decrements the p->usage refcount.
Our logic was that by using p->rcu_users, we would be able to use RCU to
safely issue refcount_inc_not_zero() a task's rcu_users field to
determine if a task could still be acquired, or was exiting.
Unfortunately, this does not work due to p->rcu_users and p->rcu sharing
a union. When p->rcu_users goes to 0, an RCU callback is scheduled to
drop a single p->usage refcount, and because the fields share a union,
the refcount immediately becomes nonzero again after the callback is
scheduled.
If we were to split the fields out of the union, this wouldn't be a
problem. Doing so should also be rather non-controversial, as there are
a number of places in struct task_struct that have padding which we
could use to avoid growing the structure by splitting up the fields.
For now, so as to fix the kfuncs to be correct, this patch instead
updates bpf_task_acquire() and bpf_task_release() to use the p->usage
field for refcounting via the get_task_struct() and put_task_struct()
functions. Because we can no longer rely on RCU, the change also guts
the bpf_task_acquire_not_zero() and bpf_task_kptr_get() functions
pending a resolution on the above problem.
In addition, the task fixes the kfunc and rcu_read_lock selftests to
expect this new behavior.
Fixes: 90660309b0 ("bpf: Add kfuncs for storing struct task_struct * as a kptr")
Fixes: fca1aa7551 ("bpf: Handle MEM_RCU type properly")
Reported-by: Matus Jokay <matus.jokay@stuba.sk>
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221206210538.597606-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When installing the selftests using
"make -C tools/testing/selftests install", we need to make sure
all the required files to run the selftests are installed. Let's
make sure this is the case.
Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221205131618.1524337-2-daan.j.demeyer@gmail.com
Parse USDT arguments like "8@(%rsp)" on x86. These are emmited by
SystemTap. The argument syntax is similar to the existing "memory
dereference case" but the offset left out as it's zero (i.e. read
the value from the address in the register). We treat it the same
as the the "memory dereference case", but set the offset to 0.
I've tested that this fixes the "unrecognized arg #N spec: 8@(%rsp).."
error I've run into when attaching to a probe with such an argument.
Attaching and reading the correct argument values works.
Something similar might be needed for the other supported
architectures.
[0] Closes: https://github.com/libbpf/libbpf/issues/559
Signed-off-by: Timo Hunziker <timo.hunziker@gmx.ch>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221203123746.2160-1-timo.hunziker@eclipso.ch
It is useful to use vmlinux.h in the xfrm_info test like other kfunc
tests do. In particular, it is common for kfunc bpf prog that requires
to use other core kernel structures in vmlinux.h
Although vmlinux.h is preferred, it needs a ___local flavor of
struct bpf_xfrm_info in order to build the bpf selftests
when CONFIG_XFRM_INTERFACE=[m|n].
Cc: Eyal Birger <eyal.birger@gmail.com>
Fixes: 90a3a05eb3 ("selftests/bpf: add xfrm_info tests")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20221206193554.1059757-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
strdup() allocates memory for path. We need to release the memory in the
following error path. Add free() to avoid memory leak.
Fixes: 8f184732b6 ("bpftool: Switch to libbpf's hashmap for pinned paths of BPF objects")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221206071906.806384-1-linmq006@gmail.com
* for-next/selftests:
kselftest/arm64: Allow epoll_wait() to return more than one result
kselftest/arm64: Don't drain output while spawning children
kselftest/arm64: Hold fp-stress children until they're all spawned
kselftest/arm64: Set test names prior to starting children
kselftest/arm64: Use preferred form for predicate load/stores
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: fix array_size.cocci warning
kselftest/arm64: Print ASCII version of unknown signal frame magic values
kselftest/arm64: Remove validation of extra_context from TODO
kselftest/arm64: Provide progress messages when signalling children
kselftest/arm64: Check that all children are producing output in fp-stress
A previous change amended try_to_freeze_tasks() with the "what"
variable pointing to a string describing the group of tasks subject to
the freezing which may be used in the error message in there too, so
make that happen.
Accordingly, update sleepgraph.py to catch the modified error message
as appropriate.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
This is a fairly sedate release for the core code, but there's been a
lot of driver work especially around the x86 platforms and device tree
updates:
- More cleanups of the DAPM code from Morimoto-san.
- Factoring out of mapping hw_params onto SoundWire configuration by
Charles Keepax.
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740.
- Lots of device tree conversions and fixups.
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
There's more cross tree updates than usual, though all fairly minor:
- Some OMAP board file updates that were depedencies for removing their
providers in ASoC, as part of a wider effort removing the support for
the relevant OMAP platforms.
- A new I2C API required for updates to the new I2C probe API.
- A DRM update making use of a new API for fixing the capabilities
advertised via hdmi-codec.
Since this is being sent early I might send some more stuff if you've
not yet sent your pull request and there's more come in.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmOOO08ACgkQJNaLcl1U
h9BiIAf+O8zqcxtokSTS3wsoFHDp2zprTTUk3RZbxaYMFTmFrALbQVUx5EdJ179X
Dr9rL9T2iNBN7YGhDvmzoVvLrDncJEerMk7cgbc88a5tkB7ZipI7E9QliZt92QaP
ihrXAyD7ekLC6QLXrHFNmzwUeFspgtsBkT3MakcfcucuMN53UQbFwD1r0ziCJOMZ
gFnkZKBemUDsURxWR+LCveB/WYKNrZ6Bjokdg8HY1yenACMiXaoWedmIFJ2SxId3
NNS8vqUDUSEFhDaeJVBo1Ow9m8fe+dZBUeVk9Ej7bGEXxYptwtu3aODXknySa8Eb
YVlalR9qQPvaI1688WRleRWp+oyL4Q==
=x1AC
-----END PGP SIGNATURE-----
Merge tag 'asoc-v6.2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Updates for v6.2
This is a fairly sedate release for the core code, but there's been a
lot of driver work especially around the x86 platforms and device tree
updates:
- More cleanups of the DAPM code from Morimoto-san.
- Factoring out of mapping hw_params onto SoundWire configuration by
Charles Keepax.
- The ever ongoing overhauls of the Intel DSP code continue, including
support for loading libraries and probes with IPC4 on SOF.
- Support for more sample formats on JZ4740.
- Lots of device tree conversions and fixups.
- Support for Allwinner D1, a range of AMD and Intel systems, Mediatek
systems with multiple DMICs, Nuvoton NAU8318, NXP fsl_rpmsg and
i.MX93, Qualcomm AudioReach Enable, MFC and SAL, RealTek RT1318 and
Rockchip RK3588
There's more cross tree updates than usual, though all fairly minor:
- Some OMAP board file updates that were depedencies for removing their
providers in ASoC, as part of a wider effort removing the support for
the relevant OMAP platforms.
- A new I2C API required for updates to the new I2C probe API.
- A DRM update making use of a new API for fixing the capabilities
advertised via hdmi-codec.
Since this is being sent early I might send some more stuff if you've
not yet sent your pull request and there's more come in.
Test the xfrm_info kfunc helpers.
The test setup creates three name spaces - NS0, NS1, NS2.
XFRM tunnels are setup between NS0 and the two other NSs.
The kfunc helpers are used to steer traffic from NS0 to the other
NSs based on a userspace populated bpf global variable and validate
that the return traffic had arrived from the desired NS.
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20221203084659.1837829-5-eyal.birger@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The typical environment where cxl_test is run, QEMU, does not support
cpu_cache_invalidate_memregion(). Add the 'test' bypass symbols to the
configuration check.
Reported-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/167026948179.3527561.4535373655515827457.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pick up support for "XOR" interleave math when parsing ACPI CFMWS window
structures. Fix up conflicts with the RCH emulation already pending in
cxl/next.
In an RCH topology a CXL host-bridge as Root Complex Integrated Endpoint
the represents the memory expander. Unlike a VH topology there is no
CXL/PCIE Root Port that host the endpoint. The CXL subsystem maps this
as the CXL root object (ACPI0017 on ACPI based systems) targeting the
host-bridge as a dport, per usual, but then that dport directly hosts
the endpoint port.
Mock up that configuration with a 4th host-bridge that has a 'cxl_rcd'
device instance as its immediate child.
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Link: https://lore.kernel.org/r/166993046170.1882361.12460762475782283638.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This cpupower update for Linux 6.2-rc1 consists of:
- enhancement to choose base_cpu to display default cpupower details
instead of picking cpu 0 and failing show information when it is
offline. This change ensure user will see power information on
the cpu the tool runs on.
- adds Georgian translation to cpupower documentation.
- introduces powercap intel-rapl library, powercap-info command, and
rapl monitor. This adds the ability to show the used power consumption
in for each rapl domain
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmOOGQUACgkQCwJExA0N
QxyJrg/9FEQUa5ENT3owBofiJzrI1Lc+OhQlfM7m6glLLL0j59tAavaFQKTLuDkI
klZ/4Ebofu1fS5yarykp2oHAfhbMZij8DVrukHi3xOztV11vBvbUVlnnKO/quBuR
VEIQq8pvR04h3dvZMPHweShfGWrS9UcvnUU3UJS+hkPnre4RtCjLx+TpVpsfDgzn
mq2Nm7iaNKQU0IQYOzpjLnwymEA7wMbhSZ7YYgfLhmgyq0tt8AvVQIbr/O18crdi
RqRqxEeHDhzXtmp9HmKj8R2zfsZxvvXZQFNMCM5Ths+iR95NMWYEZzbSOCFIcgkQ
xqOZK2ZDF33sKRWnoONrvLdr0gK0mwVZ1osJLxfrCODQkB2gsDDSzPagMs5zKmr1
5Tq9BboGXu17kT4V5qLmRZrsXppr7v3DKFDwoSlWEUScxDMgUEtjoRXViUIMdxzi
uioHOS8S8vesI7fLNvcKgvz5gzv7ER3iPNttnFBoLY2xu3HXENY3fwq1o4YtOf3h
ghPg4FV98/noJ0W6KeUgvhOtPLdyxviDD0lhFPiAZdW2zSy24tFIzI7SU9IHYMZZ
MkcYsl0zQVltOR0HpkDoT0QZi7FFAae1OXVRdHvd55yEiN3S4hT43hsdtLi5ptI9
1SWb2MiP3kHubvdQl/ek95j+ozr5A+8edm01MOrtKja9c1ug7gs=
=43fr
-----END PGP SIGNATURE-----
Merge tag 'linux-cpupower-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux
Pull cpupower utility updates for 6.2-rc1 from Shuah Khan:
"This cpupower update for Linux 6.2-rc1 consists of:
- enhancement to choose base_cpu to display default cpupower details
instead of picking cpu 0 and failing show information when it is
offline. This change ensure user will see power information on
the cpu the tool runs on.
- adds Georgian translation to cpupower documentation.
- introduces powercap intel-rapl library, powercap-info command, and
rapl monitor. This adds the ability to show the used power consumption
in for each rapl domain"
* tag 'linux-cpupower-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux:
cpupower: rapl monitor - shows the used power consumption in uj for each rapl domain
cpupower: Introduce powercap intel-rapl library and powercap-info command
cpupower: Add Georgian translation
tools/cpupower: Choose base_cpu to display default cpupower details
* kvm-arm64/dirty-ring:
: .
: Add support for the "per-vcpu dirty-ring tracking with a bitmap
: and sprinkles on top", courtesy of Gavin Shan.
:
: This branch drags the kvmarm-fixes-6.1-3 tag which was already
: merged in 6.1-rc4 so that the branch is in a working state.
: .
KVM: Push dirty information unconditionally to backup bitmap
KVM: selftests: Automate choosing dirty ring size in dirty_log_test
KVM: selftests: Clear dirty ring states between two modes in dirty_log_test
KVM: selftests: Use host page size to map ring buffer in dirty_log_test
KVM: arm64: Enable ring-based dirty memory tracking
KVM: Support dirty ring in conjunction with bitmap
KVM: Move declaration of kvm_cpu_dirty_log_size() to kvm_dirty_ring.h
KVM: x86: Introduce KVM_REQ_DIRTY_RING_SOFT_FULL
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/access-tracking:
: .
: Small series to add support for arm64 to access_tracking_perf_test and
: correct a couple bugs along the way.
:
: Patches courtesy of Oliver Upton.
: .
KVM: selftests: Build access_tracking_perf_test for arm64
KVM: selftests: Have perf_test_util signal when to stop vCPUs
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/s2-faults:
: .
: New KVM/arm64 selftests exercising various sorts of S2 faults, courtesy
: of Ricardo Koller. From the cover letter:
:
: "This series adds a new aarch64 selftest for testing stage 2 fault handling
: for various combinations of guest accesses (e.g., write, S1PTW), backing
: sources (e.g., anon), and types of faults (e.g., read on hugetlbfs with a
: hole, write on a readonly memslot). Each test tries a different combination
: and then checks that the access results in the right behavior (e.g., uffd
: faults with the right address and write/read flag). [...]"
: .
KVM: selftests: aarch64: Add mix of tests into page_fault_test
KVM: selftests: aarch64: Add readonly memslot tests into page_fault_test
KVM: selftests: aarch64: Add dirty logging tests into page_fault_test
KVM: selftests: aarch64: Add userfaultfd tests into page_fault_test
KVM: selftests: aarch64: Add aarch64/page_fault_test
KVM: selftests: Use the right memslot for code, page-tables, and data allocations
KVM: selftests: Fix alignment in virt_arch_pgd_alloc() and vm_vaddr_alloc()
KVM: selftests: Add vm->memslots[] and enum kvm_mem_region_type
KVM: selftests: Stash backing_src_type in struct userspace_mem_region
tools: Copy bitfield.h from the kernel sources
KVM: selftests: aarch64: Construct DEFAULT_MAIR_EL1 using sysreg.h macros
KVM: selftests: Add missing close and munmap in __vm_mem_region_delete()
KVM: selftests: aarch64: Add virt_get_pte_hva() library function
KVM: selftests: Add a userfaultfd library
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/linked-bps:
: .
: Additional selftests for the arm64 breakpoints/watchpoints,
: courtesy of Reiji Watanabe. From the cover letter:
:
: "This series adds test cases for linked {break,watch}points to the
: debug-exceptions test, and expands {break,watch}point tests to
: use non-zero {break,watch}points (the current test always uses
: {break,watch}point#0)."
: .
KVM: arm64: selftests: Test with every breakpoint/watchpoint
KVM: arm64: selftests: Add a test case for a linked watchpoint
KVM: arm64: selftests: Add a test case for a linked breakpoint
KVM: arm64: selftests: Change debug_version() to take ID_AA64DFR0_EL1
KVM: arm64: selftests: Stop unnecessary test stage tracking of debug-exceptions
KVM: arm64: selftests: Add helpers to enable debug exceptions
KVM: arm64: selftests: Remove the hard-coded {b,w}pn#0 from debug-exceptions
KVM: arm64: selftests: Add write_dbg{b,w}{c,v}r helpers in debug-exceptions
KVM: arm64: selftests: Use FIELD_GET() to extract ID register fields
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/selftest/memslot-fixes:
: .
: KVM memslot selftest fixes for non-4kB page sizes, courtesy
: of Gavin Shan. From the cover letter:
:
: "kvm/selftests/memslots_perf_test doesn't work with 64KB-page-size-host
: and 4KB-page-size-guest on aarch64. In the implementation, the host and
: guest page size have been hardcoded to 4KB. It's ovbiously not working
: on aarch64 which supports 4KB, 16KB, 64KB individually on host and guest.
:
: This series tries to fix it. After the series is applied, the test runs
: successfully with 64KB-page-size-host and 4KB-page-size-guest."
: .
KVM: selftests: memslot_perf_test: Report optimal memory slots
KVM: selftests: memslot_perf_test: Consolidate memory
KVM: selftests: memslot_perf_test: Support variable guest page size
KVM: selftests: memslot_perf_test: Probe memory slots for once
KVM: selftests: memslot_perf_test: Consolidate loop conditions in prepare_vm()
KVM: selftests: memslot_perf_test: Use data->nslots in prepare_vm()
Signed-off-by: Marc Zyngier <maz@kernel.org>
This reverts commit c4b41b83c2.
As Ian said, the "cpu-count" is not appropriate for uncore events, also it
caused a perf test failure.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221130193613.1046804-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When using "sort -nu", arm64 syscalls were lost. That is, the io_setup
syscall (number 0) and all but one (typically ftruncate; 64) of the
syscalls that are defined symbolically (like "#define __NR_ftruncate
__NR3264_ftruncate") at the point where "sort" is applied.
This creation-of-syscalls.c-scheme is, judging from comments,
copy-pasted from powerpc, and worked there because at the time, its
tools/arch/powerpc/include/uapi/asm/unistd.h had *literals*, like
"#define __NR_ftruncate 93".
With sort being numeric and the non-numeric key effectively evaluating
to 0, the sort option "-u" means these "duplicates" are removed.
There's no need to remove syscall lines with duplicate numbers for arm64
because there are none, so let's fix that by just losing the "-u".
Having the table numerically sorted on syscall-number for the rest of
the syscalls looks nice, so keep the "-n".
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Hans-Peter Nilsson <hp@axis.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20201228023941.E0DE2203B5@pchp3.se.axis.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 93315e46b0 ("perf/core: Add speculation info to branch
entries") added a new field in between type and new_type. Perf has its
own copy of this struct so update it to match the kernel side.
This doesn't currently cause any issues because new_type is only used by
the Arm BRBE driver which isn't merged yet.
Committer notes:
Is this really an ABI? How are we supposed to deal with old perf.data
files with new tools and vice versa? :-\
Fixes: 93315e46b0 ("perf/core: Add speculation info to branch entries")
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Link: https://lore.kernel.org/r/20221130165158.517385-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without
affecting users that don't want atomic operations.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: alexandru elisei <alexandru.elisei@arm.com>
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Cc: kvmarm@lists.linux.dev
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20221119013450.2643007-6-seanjc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In check_all_cpu_dscr_defaults, opendir() opens the directory stream.
Add missing closedir() in the error path to release it.
In check_cpu_dscr_default, open() creates an open file descriptor.
Add missing close() in the error path to release it.
Fixes: ebd5858c90 ("selftests/powerpc: Add test for all DSCR sysfs interfaces")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221205084429.570654-1-linmq006@gmail.com
Both tolower and toupper are built in c functions, we should not
redefine them as this can result in a build error.
Fixes the following errors:
progs/bpf_iter_ksym.c:10:20: error: conflicting types for built-in function 'tolower'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
10 | static inline char tolower(char c)
| ^~~~~~~
progs/bpf_iter_ksym.c:5:1: note: 'tolower' is declared in header '<ctype.h>'
4 | #include <bpf/bpf_helpers.h>
+++ |+#include <ctype.h>
5 |
progs/bpf_iter_ksym.c:17:20: error: conflicting types for built-in function 'toupper'; expected 'int(int)' [-Werror=builtin-declaration-mismatch]
17 | static inline char toupper(char c)
| ^~~~~~~
progs/bpf_iter_ksym.c:17:20: note: 'toupper' is declared in header '<ctype.h>'
See background on this sort of issue:
https://stackoverflow.com/a/20582607https://gcc.gnu.org/bugzilla/show_bug.cgi?id=12213
(C99, 7.1.3p1) "All identifiers with external linkage in any of the
following subclauses (including the future library directions) are
always reserved for use as identifiers with external linkage."
This is documented behavior in GCC:
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-std-2
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221203010847.2191265-1-james.hilliard1@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add three tests for cgrp local storage support for sleepable progs.
Two tests can load and run properly, one for cgroup_iter, another
for passing current->cgroups->dfl_cgrp to bpf_cgrp_storage_get()
helper. One test has bpf_rcu_read_lock() and failed to load.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221201050449.2785613-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Martin mentioned that the verifier cannot assume arguments from
LSM hook sk_alloc_security being trusted since after the hook
is called, the sk ref_count is set to 1. This will overwrite
the ref_count changed by the bpf program and may cause ref_count
underflow later on.
I then further checked some other hooks. For example,
for bpf_lsm_file_alloc() hook in fs/file_table.c,
f->f_cred = get_cred(cred);
error = security_file_alloc(f);
if (unlikely(error)) {
file_free_rcu(&f->f_rcuhead);
return ERR_PTR(error);
}
atomic_long_set(&f->f_count, 1);
The input parameter 'f' to security_file_alloc() cannot be trusted
as well.
Specifically, I investiaged bpf_map/bpf_prog/file/sk/task alloc/free
lsm hooks. Except bpf_map_alloc and task_alloc, arguments for all other
hooks should not be considered as trusted. This may not be a complete
list, but it covers common usage for sk and task.
Fixes: 3f00c52393 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221203204954.2043348-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add MEM_RCU pointer null checking for related tests. Also
modified task_acquire test so it takes a rcu ptr 'ptr' where
'ptr = rcu_ptr->rcu_field'.
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221203184607.478314-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Expand the cxl_test topology to include CFMWS's that use XOR math
for interleave arithmetic, as defined in the CXL Specification 3.0.
With this expanded topology, cxl_test is useful for testing:
x1,x2,x4 ways with XOR interleave arithmetic.
Define the additional XOR CFMWS entries to appear only with the
module parameter interleave_arithmetic=1. The cxl_test default
continues to be modulo math.
modprobe cxl_test interleave_arithmetic=1
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/54670400cd48ba7fcc6d8ee0d6ae2276d3f51aad.1669847017.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
A downstream port must be connected to a component register block.
For restricted hosts the base address is determined from the RCRB. The
RCRB is provided by the host's CEDT CHBS entry. Rework CEDT parser to
get the RCRB and add code to extract the component register block from
it.
RCRB's BAR[0..1] point to the component block containing CXL subsystem
component registers. MEMBAR extraction follows the PCI base spec here,
esp. 64 bit extraction and memory range alignment (6.0, 7.5.1.2.1). The
RCRB base address is cached in the cxl_dport per-host bridge so that the
upstream port component registers can be retrieved later by an RCD
(RCIEP) associated with the host bridge.
Note: Right now the component register block is used for HDM decoder
capability only which is optional for RCDs. If unsupported by the RCD,
the HDM init will fail. It is future work to bypass it in this case.
Co-developed-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/Y4dsGZ24aJlxSfI1@rric.localdomain
[djbw: introduce devm_cxl_add_rch_dport()]
Link: https://lore.kernel.org/r/166993044524.1882361.2539922887413208807.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Accept any cxl_test topology device as the first argument in
cxl_chbs_context.
This is in preparation for reworking the detection of the component
registers across VH and RCH topologies. Move
mock_acpi_table_parse_cedt() beneath the definition of is_mock_port()
and use is_mock_port() instead of the explicit mock cxl_acpi device
check.
Acked-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Robert Richter <rrichter@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166993043433.1882361.17651413716599606118.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The three objects 'struct cxl_nvdimm_bridge', 'struct cxl_nvdimm', and
'struct cxl_pmem_region' manage CXL persistent memory resources. The
bridge represents base platform resources, the nvdimm represents one or
more endpoints, and the region is a collection of nvdimms that
contribute to an assembled address range.
Their relationship is such that a region is torn down if any component
endpoints are removed. All regions and endpoints are torn down if the
foundational bridge device goes down.
A workqueue was deployed to manage these interdependencies, but it is
difficult to reason about, and fragile. A recent attempt to take the CXL
root device lock in the cxl_mem driver was reported by lockdep as
colliding with the flush_work() in the cxl_pmem flows.
Instead of the workqueue, arrange for all pmem/nvdimm devices to be torn
down immediately and hierarchically. A similar change is made to both
the 'cxl_nvdimm' and 'cxl_pmem_region' objects. For bisect-ability both
changes are made in the same patch which unfortunately makes the patch
bigger than desired.
Arrange for cxl_memdev and cxl_region to register a cxl_nvdimm and
cxl_pmem_region as a devres release action of the bridge device.
Additionally, include a devres release action of the cxl_memdev or
cxl_region device that triggers the bridge's release action if an endpoint
exits before the bridge. I.e. this allows either unplugging the bridge,
or unplugging and endpoint to result in the same cleanup actions.
To keep the patch smaller the cleanup of the now defunct workqueue
infrastructure is saved for a follow-on patch.
Tested-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/166993041773.1882361.16444301376147207609.stgit@dwillia2-xfh.jf.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this using "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/net`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/1669864248-829-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When testing in kci_test_ipsec_offload, srcip is configured as $dstip,
it should add xfrm policy rule in instead of out.
The test result of this patch is as follows:
PASS: ipsec_offload
Fixes: 2766a11161 ("selftests: rtnetlink: add ipsec offload API test")
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Acked-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20221201082246.14131-1-shaozhengchao@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit d2825fa936 ("crypto: sm3,sm4 - move into crypto directory") moves
SM3 and SM4 algorithm implementations from stand-alone library to crypto
API. The corresponding configuration options for the API version (generic)
are CONFIG_CRYPTO_SM3_GENERIC and CONFIG_CRYPTO_SM4_GENERIC, respectively.
Replace option selected in selftests configuration from the library version
to the API version.
Fixes: d2825fa936 ("crypto: sm3,sm4 - move into crypto directory")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20221201131852.38501-1-tianjia.zhang@linux.alibaba.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Current libbpf Makefile does not contain the help command, which
is inconvenient to use. Similar to the Makefile help command of the
perf, a help command is provided to list the commands supported by
libbpf make and the functions of the commands.
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221202081738.128513-1-liuxin350@huawei.com
The bpf_legacy.h header uses llvm specific load functions, add
GCC compatible variants as well to fix tests using these functions
under GCC.
Signed-off-by: James Hilliard <james.hilliard1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221201190939.3230513-1-james.hilliard1@gmail.com
address post-6.0 issues, which is hopefully a sign that things are
converging.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY4pQpQAKCRDdBJ7gKXxA
jquxAP9Lqif7CGDgdq8uWY2hHS/Ujc3k7Ohgyzs37olnCuU8KwEA6/J7SpjsBgtY
OfzvnwxpCTh8Kfzu/oNckIHo/EEiIA8=
=o6qT
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"15 hotfixes, 11 marked cc:stable.
Only three or four of the latter address post-6.0 issues, which is
hopefully a sign that things are converging"
* tag 'mm-hotfixes-stable-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible"
Kconfig.debug: provide a little extra FRAME_WARN leeway when KASAN is enabled
drm/amdgpu: temporarily disable broken Clang builds due to blown stack-frame
mm/khugepaged: invoke MMU notifiers in shmem/file collapse paths
mm/khugepaged: fix GUP-fast interaction by sending IPI
mm/khugepaged: take the right locks for page table retraction
mm: migrate: fix THP's mapcount on isolation
mm: introduce arch_has_hw_nonleaf_pmd_young()
mm: add dummy pmd_young() for architectures not having it
mm/damon/sysfs: fix wrong empty schemes assumption under online tuning in damon_sysfs_set_schemes()
tools/vm/slabinfo-gnuplot: use "grep -E" instead of "egrep"
nilfs2: fix NULL pointer dereference in nilfs_palloc_commit_free_entry()
hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
madvise: use zap_page_range_single for madvise dontneed
mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE
In the "mode_filter_without_nnp" test in seccomp_bpf, there is currently
a TODO which asks to check the capability CAP_SYS_ADMIN instead of euid.
This patch adds support to check if the calling process has the flag
CAP_SYS_ADMIN, and also if this flag has CAP_EFFECTIVE set.
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220731092529.28760-1-gautammenghani201@gmail.com
There is a spelling mistake in some help text. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Message-Id: <20221201091354.1613652-1-colin.i.king@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Convert {clear,set}_bit() to atomics as KVM's ucall implementation relies
on clear_bit() being atomic, they are defined in atomic.h, and the same
helpers in the kernel proper are atomic.
KVM's ucall infrastructure is the only user of clear_bit() in tools/, and
there are no true set_bit() users. tools/testing/nvdimm/ does make heavy
use of set_bit(), but that code builds into a kernel module of sorts, i.e.
pulls in all of the kernel's header and so is already getting the kernel's
atomic set_bit().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop the "atomic_" prefix from tools' atomic_test_and_set_bit() to
match the kernel nomenclature where test_and_set_bit() is atomic,
and __test_and_set_bit() provides the non-atomic variant.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Drop tools' non-atomic test_and_set_bit() and test_and_clear_bit() helpers
now that all users are gone. The names will be claimed in the future for
atomic versions.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without affecting
users that don't want atomic operations.
Leave the usage in ucall_free() as-is, it's the one place in tools/ that
actually wants/needs atomic behavior.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Use the dedicated non-atomic helpers for {clear,set}_bit() and their
test variants, i.e. the double-underscore versions. Depsite being
defined in atomic.h, and despite the kernel versions being atomic in the
kernel, tools' {clear,set}_bit() helpers aren't actually atomic. Move
to the double-underscore versions so that the versions that are expected
to be atomic (for kernel developers) can be made atomic without affecting
users that don't want atomic operations.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Message-Id: <20221119013450.2643007-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Take @bit as an unsigned long instead of a signed int in clear_bit() and
set_bit() so that they match the double-underscore versions, __clear_bit()
and __set_bit(). This will allow converting users that really don't want
atomic operations to the double-underscores without introducing a
functional change, which will in turn allow making {clear,set}_bit()
atomic (as advertised).
Practically speaking, this _should_ have no functional impact. KVM's
selftests usage is either hardcoded (Hyper-V tests) or is artificially
limited (arch_timer test and dirty_log test). In KVM, dirty_log test is
the only mildly interesting case as it's use indirectly restricted to
unsigned 32-bit values, but in theory it could generate a negative value
when cast to a signed int. But in that case, taking an "unsigned long"
is actually a bug fix.
Perf's usage is more difficult to audit, but any code that is affected
by the switch is likely already broken. perf_header__{set,clear}_feat()
and perf_file_header__read() effectively use only hardcoded enums with
small, positive values, atom_new() passes an unsigned long, but its value
is capped at 128 via NR_ATOM_PER_PAGE, etc...
The only real potential for breakage is in the perf flows that take a
"cpu", but it's unlikely perf is subtly relying on a negative index into
bitmaps, e.g. "cpu" can be "-1", but only as "not valid" placeholder.
Note, tools/testing/nvdimm/ makes heavy use of set_bit(), but that code
builds into a kernel module of sorts, i.e. pulls in all of the kernel's
header and so is getting the kernel's atomic set_bit(). The NVDIMM test
usage of atomics is likely unnecessary, e.g. ndtest_dimm_register() sets
bits in a local variable, but that's neither here nor there as far as
this change is concerned.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a new ucall hook, GUEST_UCALL_NONE(), to allow tests to make ucalls
without allocating a ucall struct, and use it to enable single-step
in ARM's debug-exceptions test. Like the disable single-step path, the
enabling path also needs to ensure that no exclusive access sequences are
attempted after enabling single-step, as the exclusive monitor is cleared
on ERET from the debug exception taken to EL2.
The test currently "works" because clear_bit() isn't actually an atomic
operation... yet.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221119013450.2643007-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
- Clean up the MSR filter docs.
- Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmOJQesSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5IR4QAKGPbLRykY/2FohV2HDu5fDPxA2Fe9nu
5W7ZIptQu+tQtCTWKFEjcQdwYoNrLbr0hr1eGubVbIvBqJbwPQfH7G0765eOIcvy
s6Zn2N24IisIoUxdkJGOL3Tt1UR7wCFbwC+ms0i4FQvMcw+TbM0BTHgJDdwR5laX
mGN7ubz5iImwDFFE3Bd8Qy5I+FGL9CI60l+RzK6b7J8HYi1wOBMLU9QueF/dB7gR
g+navZJAAnvN6YIkjP5j8yPBuvhDzni379ue5ATDq1ALvyyI7xlYALsxpUjCnLuo
CkbvgmfmC94Vdm7pzFgsbazUN2oIjwoimjFQHP1bf8Jmd+770R282JpnwiD/ydCV
Tl2ArwXA2zxVxNZm9g/XqPBwWBWWgWfYIQbuuxc065MnXCnHkY5UGGf0JNx41CDl
hdtm9DHkft8+6kyBBmgkdKxd328Znljq02v3nLePUipfpDVaNj4VAUj9IpV6Lpuj
GJjs4Wx7oqFwH1Im/LqZgnOGwgkSj3ObHtkYx2RSrQAQultbjuplFz2qZWP8PF6A
FrJbcddKOmLINrdNOlvTd5WKCAjtV8vycjFkk+/7H67rpZdM8AI1StrzMP6gmwg4
ARozZJ2UF8nTriRYFQbFQyNm9bBTZ7YQ/HajqfhvCuZLi7i1EaImhC0F1xn7IU5S
6XhvQPvjRTgS
=i6OA
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-fixes-6.2-1' of https://github.com/kvm-x86/linux into HEAD
Misc KVM x86 fixes and cleanups for 6.2:
- One-off fixes for various emulation flows (SGX, VMXON, NRIPS=0).
- Reinstate IBPB on emulated VM-Exit that was incorrectly dropped a few
years back when eliminating unnecessary barriers when switching between
vmcs01 and vmcs02.
- Clean up the MSR filter docs.
- Clean up vmread_error_trampoline() to make it more obvious that params
must be passed on the stack, even for x86-64.
- Let userspace set all supported bits in MSR_IA32_FEAT_CTL irrespective
of the current guest CPUID.
- Fudge around a race with TSC refinement that results in KVM incorrectly
thinking a guest needs TSC scaling when running on a CPU with a
constant TSC, but no hardware-enumerated TSC frequency.
- Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
- Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
- Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmOJPQkSHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5v2AQALv4BN7H4JZWCo0Sjl8cPtLJgLyxJRzt
gsMnxMkxLO5sf0wGVppJBVcNbrhCRwz/VktaTnwdTqr74B4oGPozRdCzXxHe5OMo
Ngy3bREEfmDD7fPW6Tjp2qjEmFW3sSHBWl7nrtWmV/h8JqZKj/SntxqZCGF8xjtM
C3g51DRfkY6gBvX8vObh5iUVP/yr3fVKlrjgLQHtWirt3JybfgW51ULN1yA2j0QM
2r9lYwPK9N0MBRQOd2r/9LBmYXc7nkHdZJnnCUIPPX+ZUZuGuNJ5OQVNqVjlZtCZ
ZjQBq31ZoUshUFtuA9CopBJ97vUUVw/esKo0eGVX8oUVlyn/iklrr4gArktSwUPX
aFVVFNMsBNcyVaJUHeFafYvpjiPg76Fh+Ts3PX2B/O9HHKuiYLIeD8c8RDmlsviA
JcL3fETIXpUwcqbCyYWdCOO3Mv6tHllbkbYYcS2wbHuvXmSTPqN0ebE+PdtoU00v
kAkbnHBLVSMBH09ohR7wK6K9jl7V1tRb9Wyq0Hl1pI6Of/B2/q7gs3SNh8zQaWQH
2TNbx5grLN8ElpjXnMZILwCy3HjnOEbyFP3sp70sTo2lK6u0eduVAds89LI06Adi
5Zz57oPxrW22VyvI9uItDvfwef57w+E6kkPucwsDFj65sCZ/rcMz2do6HWCtU8sn
NglwBZ6V616T
=Gkt4
-----END PGP SIGNATURE-----
Merge tag 'kvm-selftests-6.2-2' of https://github.com/kvm-x86/linux into HEAD
KVM selftests fixes for 6.2
- Fix an inverted check in the access tracking perf test, and restore
support for asserting that there aren't too many idle pages when
running on bare metal.
- Fix an ordering issue in the AMX test introduced by recent conversions
to use kvm_cpu_has(), and harden the code to guard against similar bugs
in the future. Anything that tiggers caching of KVM's supported CPUID,
kvm_cpu_has() in this case, effectively hides opt-in XSAVE features if
the caching occurs before the test opts in via prctl().
- Fix build errors that occur in certain setups (unsure exactly what is
unique about the problematic setup) due to glibc overriding
static_assert() to a variant that requires a custom message.
There are still references to the removed kvm_memory_region data structure
but the doc and comments should mention struct kvm_userspace_memory_region
instead, since that is what's used by the ioctl that replaced the old one
and this data structure support the same set of flags.
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-4-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-3-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The documentation says that the ioctl has been deprecated, but it has been
actually removed and the remaining references are just left overs.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Message-Id: <20221202105011.185147-2-javierm@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
On 32bit the trigger-synthetic-eprobe.tc selftest fails with the error:
hist:syscalls:sys_exit_openat: error: Param type doesn't match synthetic event field type
Command: hist:keys=common_pid:filename=$__arg__1,ret=ret:onmatch(syscalls.sys_enter_openat).trace(synth_open,$filename,$ret)
^
This is because the synth_open synthetic event is created with:
echo "$SYNTH u64 filename; s64 ret;" > synthetic_events
Which works fine on 64 bit, as filename is a pointer and the return is
also a long. But for 32 bit architectures, it doesn't work.
Use "unsigned long" and "long" instead so that it works for both 64 bit
and 32 bit architectures.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Resolve conflicts in drivers/vfio/vfio_main.c by using the iommfd version.
The rc fix was done a different way when iommufd patches reworked this
code.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
- malloc() does not zero the buffer,
- fread() does not null-terminate it's output,
- `cat /proc/sys/kernel/core_pattern | hexdump -C` shows the file is
not inherently null-terminated
So using string operations on the buffer is risky. Explicitly add a null
character to the end to make it safer.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-3-bgray@linux.ibm.com
No need to write inline asm for mtspr/mfspr, we have macros for this
in reg.h
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20221128041948.58339-2-bgray@linux.ibm.com
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this using "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/powerpc`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1669862997-31335-1-git-send-email-yangtiezhu@loongson.cn
This test adds a basic HSRv0 network with 3 nodes. In its current shape
it sends and forwards packets, announcements and so merges nodes based
on MAC A/B information.
It is able to detect duplicate packets and packetloss should any occur.
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds test coverage for listening sockets created by the
in-kernel path manager in mptcp_join.sh.
It adds the listener event checking in the existing "remove single
address with port" test. The output looks like this:
003 remove single address with port syn[ ok ] - synack[ ok ] - ack[ ok ]
add[ ok ] - echo [ ok ] - pt [ ok ]
syn[ ok ] - synack[ ok ] - ack[ ok ]
syn[ ok ] - ack [ ok ]
rm [ ok ] - rmsf [ ok ] invert
CREATE_LISTENER 10.0.2.1:10100[ ok ]
CLOSE_LISTENER 10.0.2.1:10100 [ ok ]
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch moves evts_ns1 and evts_ns2 out of do_transfer() as two global
variables in mptcp_join.sh. Init them in init() and remove them in
cleanup().
Add a new helper reset_with_events() to save the outputs of 'pm_nl_ctl
events' command in them. And a new helper kill_events_pids() to kill
pids of 'pm_nl_ctl events' command. Use these helpers in userspace pm
tests.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch adds test coverage for listening sockets created by userspace
processes.
It adds a new test named test_listener() and a new verifying helper
verify_listener_events(). The new output looks like this:
CREATE_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK]
DESTROY_SUBFLOW 10.0.2.2 (ns2) => 10.0.2.1 (ns1) [OK]
MP_PRIO TX [OK]
MP_PRIO RX [OK]
CREATE_LISTENER 10.0.2.2:37106 [OK]
CLOSE_LISTENER 10.0.2.2:37106 [OK]
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This patch makes server_evts and client_evts global in userspace_pm.sh,
then these two variables could be used in test_announce(), test_remove()
and test_subflows(). The local variable 'evts' in these three functions
then could be dropped.
Also move local variable 'file' as a global one.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some userspace pm tests failed since pm listener events have been added.
Now MPTCP_EVENT_LISTENER_CREATED event becomes the first item in the
events list like this:
type:15,family:2,sport:10006,saddr4:0.0.0.0
type:1,token:3701282876,server_side:1,family:2,saddr4:10.0.1.1,...
And no token value in this MPTCP_EVENT_LISTENER_CREATED event.
This patch fixes this by specifying the type 1 item to search for token
values.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Just to avoid classical Bash pitfall where variables are accidentally
overridden by other functions because the proper scope has not been
defined.
That's also what is done in other MPTCP selftests scripts where all non
local variables are defined at the beginning of the script and the
others are defined with the "local" keyword.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
It is clearer to declare these global variables at the beginning of the
file as it is done in other MPTCP selftests rather than in functions in
the middle of the script.
So for uniformity reason, we can do the same here in mptcp_sockopt.sh.
Suggested-by: Geliang Tang <geliang.tang@suse.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The definition of 'rndh' was probably copied from one script to another
but some times, 'sec' was not defined, not used and/or not spelled
properly.
Here all the 'rndh' are now defined the same way.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Some variables were set but never used.
This was not causing any issues except adding some confusion and having
shellcheck complaining about them.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
A new "sandbox" net namespace is available where no other netfilter
rules have been added.
Use this new netns instead of re-using "ns1" and clean it.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Modify list_push_pop_multiple to alloc and insert nodes 2-at-a-time.
Without the previous patch's fix, this block of code:
bpf_spin_lock(lock);
bpf_list_push_front(head, &f[i]->node);
bpf_list_push_front(head, &f[i + 1]->node);
bpf_spin_unlock(lock);
would fail check_reference_leak check as release_on_unlock logic would miss
a ref that should've been released.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221201183406.1203621-2-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Define and use kvm_static_assert() in the common KVM selftests headers to
provide deterministic behavior, and to allow creating static asserts
without dummy messages.
The kernel's static_assert() makes the message param optional, and on the
surface, tools/include/linux/build_bug.h appears to follow suit. However,
glibc may override static_assert() and redefine it as a direct alias of
_Static_assert(), which makes the message parameter mandatory. This leads
to non-deterministic behavior as KVM selftests code that utilizes
static_assert() without a custom message may or not compile depending on
the order of includes. E.g. recently added asserts in
x86_64/processor.h fail on some systems with errors like
In file included from lib/memstress.c:11:0:
include/x86_64/processor.h: In function ‘this_cpu_has_p’:
include/x86_64/processor.h:193:34: error: expected ‘,’ before ‘)’ token
static_assert(low_bit < high_bit); \
^
due to _Static_assert() expecting a comma before a message. The "message
optional" version of static_assert() uses macro magic to strip away the
comma when presented with empty an __VA_ARGS__
#ifndef static_assert
#define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
#define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
#endif // static_assert
and effectively generates "_Static_assert(expr, #expr)".
The incompatible version of static_assert() gets defined by this snippet
in /usr/include/assert.h:
#if defined __USE_ISOC11 && !defined __cplusplus
# undef static_assert
# define static_assert _Static_assert
#endif
which yields "_Static_assert(expr)" and thus fails as above.
KVM selftests don't actually care about using C11, but __USE_ISOC11 gets
defined because of _GNU_SOURCE, which many tests do #define. _GNU_SOURCE
triggers a massive pile of defines in /usr/include/features.h, including
_ISOC11_SOURCE:
/* If _GNU_SOURCE was defined by the user, turn on all the other features. */
#ifdef _GNU_SOURCE
# undef _ISOC95_SOURCE
# define _ISOC95_SOURCE 1
# undef _ISOC99_SOURCE
# define _ISOC99_SOURCE 1
# undef _ISOC11_SOURCE
# define _ISOC11_SOURCE 1
# undef _POSIX_SOURCE
# define _POSIX_SOURCE 1
# undef _POSIX_C_SOURCE
# define _POSIX_C_SOURCE 200809L
# undef _XOPEN_SOURCE
# define _XOPEN_SOURCE 700
# undef _XOPEN_SOURCE_EXTENDED
# define _XOPEN_SOURCE_EXTENDED 1
# undef _LARGEFILE64_SOURCE
# define _LARGEFILE64_SOURCE 1
# undef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1
# undef _ATFILE_SOURCE
# define _ATFILE_SOURCE 1
#endif
which further down in /usr/include/features.h leads to:
/* This is to enable the ISO C11 extension. */
#if (defined _ISOC11_SOURCE \
|| (defined __STDC_VERSION__ && __STDC_VERSION__ >= 201112L))
# define __USE_ISOC11 1
#endif
To make matters worse, /usr/include/assert.h doesn't guard against
multiple inclusion by turning itself into a nop, but instead #undefs a
few macros and continues on. As a result, it's all but impossible to
ensure the "message optional" version of static_assert() will actually be
used, e.g. explicitly including assert.h and #undef'ing static_assert()
doesn't work as a later inclusion of assert.h will again redefine its
version.
#ifdef _ASSERT_H
# undef _ASSERT_H
# undef assert
# undef __ASSERT_VOID_CAST
# ifdef __USE_GNU
# undef assert_perror
# endif
#endif /* assert.h */
#define _ASSERT_H 1
#include <features.h>
Fixes: fcba483e82 ("KVM: selftests: Sanity check input to ioctls() at build time")
Fixes: ee37955366 ("KVM: selftests: Refactor X86_FEATURE_* framework to prep for X86_PROPERTY_*")
Fixes: 53a7dc0f21 ("KVM: selftests: Add X86_PROPERTY_* framework to retrieve CPUID values")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221122013309.1872347-1-seanjc@google.com
Move the AMX test's kvm_cpu_has() checks before creating the VM+vCPU,
there are no dependencies between the two operations. Opportunistically
add a comment to call out that enabling off-by-default XSAVE-managed
features must be done before KVM_GET_SUPPORTED_CPUID is cached.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-5-seanjc@google.com
Disallow using kvm_get_supported_cpuid() and thus caching KVM's supported
CPUID info before enabling XSAVE-managed features that are off-by-default
and must be enabled by ARCH_REQ_XCOMP_GUEST_PERM. Caching the supported
CPUID before all XSAVE features are enabled can result in false negatives
due to testing features that were cached before they were enabled.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-4-seanjc@google.com
Move __vm_xsave_require_permission() below the CPUID helpers so that a
future change can reference the cached result of KVM_GET_SUPPORTED_CPUID
while keeping the definition of the variable close to its intended user,
kvm_get_supported_cpuid().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-3-seanjc@google.com
Move the kvm_cpu_has() check on X86_FEATURE_XFD out of the helper to
enable off-by-default XSAVE-managed features and into the one test that
currenty requires XFD (XFeature Disable) support. kvm_cpu_has() uses
kvm_get_supported_cpuid() and thus caches KVM_GET_SUPPORTED_CPUID, and so
using kvm_cpu_has() before ARCH_REQ_XCOMP_GUEST_PERM effectively results
in the test caching stale values, e.g. subsequent checks on AMX_TILE will
get false negatives.
Although off-by-default features are nonsensical without XFD, checking
for XFD virtualization prior to enabling such features isn't strictly
required.
Signed-off-by: Lei Wang <lei4.wang@intel.com>
Fixes: 7fbb653e01 ("KVM: selftests: Check KVM's supported CPUID, not host CPUID, for XFD")
Link: https://lore.kernel.org/r/20221125023839.315207-1-lei4.wang@intel.com
[sean: add Fixes, reword changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221128225735.3291648-2-seanjc@google.com
Restore the assert (on x86-64) that <10% of pages are still idle when NOT
running as a nested VM in the access tracking test. The original assert
was converted to a "warning" to avoid false failures when running the
test in a VM, but the non-nested case does not suffer from the same
"infinite TLB size" issue.
Using the HYPERVISOR flag isn't infallible as VMMs aren't strictly
required to enumerate the "feature" in CPUID, but practically speaking
anyone that is running KVM selftests in VMs is going to be using a VMM
and hypervisor that sets the HYPERVISOR flag.
Cc: David Matlack <dmatlack@google.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-3-seanjc@google.com
Warn if the number of idle pages is greater than or equal to 10% of the
total number of pages, not if the percentage of idle pages is less than
10%. The original code asserted that less than 10% of pages were still
idle, but the check got inverted when the assert was converted to a
warning.
Opportunistically clean up the warning; selftests are 64-bit only, there
is no need to use "%PRIu64" instead of "%lu".
Fixes: 6336a810db ("KVM: selftests: replace assertion with warning in access_tracking_perf_test")
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221129175300.4052283-2-seanjc@google.com
nfit_test overrode the security_show() sysfs attribute function in nvdimm
dimm_devs in order to allow testing of security unlock. With the
introduction of CXL security commands, the trick to override
security_show() becomes significantly more complicated. By introdcing a
security flag CONFIG_NVDIMM_SECURITY_TEST, libnvdimm can just toggle the
check via a compile option. In addition the original override can can be
removed from tools/testing/nvdimm/.
The flag will also be used to bypass cpu_cache_invalidate_memregion() when
set in a different commit. This allows testing on QEMU with nfit_test or
cxl_test since cpu_cache_has_invalidate_memregion() checks whether
X86_FEATURE_HYPERVISOR cpu feature flag is set on x86.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983618758.2734609.18031639517065867138.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The mock cxl mem devs needs a way to go into "locked" status to simulate
when the platform is rebooted. Add a sysfs mechanism so the device security
state is set to "locked" and the frozen state bits are cleared.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983617602.2734609.7042497620931694717.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add support to emulate a CXL mem device support the "Disable Passphrase"
operation. The operation supports disabling of either a user or a master
passphrase. The emulation will provide support for both user and master
passphrase.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983612447.2734609.2767804273351656413.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add support to emulate a CXL mem device supporting the "Set Passphrase"
operation. The operation supports setting of either a user or a master
passphrase.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983611314.2734609.12996309794483934484.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Add the emulation support for handling "Get Security State" opcode for a
CXL memory device for the cxl_test. The function will copy back device
security state bitmask to the output payload.
The security state data is added as platform_data for the mock mem device.
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983610177.2734609.4953959949148428755.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Now that we can skip unsupported configurations add some more test cases
using that, cover 8kHz, 44.1kHz and 96kHz plus 8kHz mono and 48kHz 6
channel.
44.1kHz is a different clock base to the existing 48kHz tests and may
therefore show problems with the clock configuration if only 8kHz based
rates are really available (or help diagnose if bad clocking is due to
only 44.1kHz based rates being supported). 8kHz mono and 48Hz 6 channel
are real world formats and should show if clocking does not account for
channel count properly.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-7-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Rather than just numbering the tests try to provide semi descriptive names
for what the tests are trying to cover. This also has the advantage of
meaning we can add more tests without having to keep the list of tests
ordered by existing number which should make it easier to understand what
we're testing and why.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-6-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
The values in the one example configuration file we currently have are the
default values for the two tests we have so there's no need to actually set
them. Comment them out as examples, with a rename for the tests so that we
can update the tests in the code more easily.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-5-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If constraint selection gives us a number of channels other than the one
that we asked for that isn't a failure, that is the device implementing
constraints and advertising that it can't support whatever we asked
for. Report such cases as a test skip rather than failure so we don't have
false positives.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-4-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
If constraint selection gives us a sample rate other than the one that we
asked for that isn't a failure, that is the device implementing sample
rate constraints and advertising that it can't support whatever we asked
for. Report such cases as a test skip rather than failure so we don't have
false positives.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-3-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
In order to help make the list of tests a bit easier to maintain refactor
things so we pass the tests around as a struct with the parameters in,
enabling us to add new tests by adding to a table with comments saying
what each of the number are. We could also use named initializers if we get
more parameters.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221201170745.1111236-2-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When everything is starting up we are likely to have a lot of child
processes producing output at once. This means that we can reduce
overhead a bit by allowing epoll_wait() to return more than one
descriptor at once, it cuts down on the number of system calls we need
to do which on virtual platforms where the syscall overhead is a bit
more noticable and we're likely to have a lot more children active can
make a small but noticable difference.
On physical platforms the relatively small number of processes being run
and vastly improved speeds push the effects of this change into the
noise.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-4-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Now we hold execution of the stress test programs until all children are
started there is no need to drain output while that is happening.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
At present fp-stress has a bit of a thundering herd problem since the
children it spawns start running immediately, meaning that they can start
starving the parent process of CPU before it has even started all the
children. This is much more severe on virtual platforms since they tend to
support far more SVE and SME vector lengths, be slower in general and for
some have issues with performance when simulating multiple CPUs.
We can mitigate this problem by having all the child processes block before
starting the test program, meaning that we at least have all the child
processes started before we start heavily using CPU. We still have the same
load issues while waiting for the actual stress test programs to start up
and produce output but they're at least all ready to go before that kicks
in, resulting in substantial reductions in overall runtime on some of the
severely affected systems. One test was showing about 20% improvement.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129215926.442895-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
The test prog dumps a single AF_UNIX socket's UID with and without
unshare(CLONE_NEWUSER) and checks if it matches the result of getuid().
Without the preceding patch, the test prog is killed by a NULL deref
in sk_diag_dump_uid().
# ./diag_uid
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN diag_uid.uid.1 ...
BUG: kernel NULL pointer dereference, address: 0000000000000270
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 105212067 P4D 105212067 PUD 1051fe067 PMD 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
RIP: 0010:sk_diag_fill (./include/net/sock.h:920 net/unix/diag.c:119 net/unix/diag.c:170)
...
# 1: Test terminated unexpectedly by signal 9
# FAIL diag_uid.uid.1
not ok 1 diag_uid.uid.1
# RUN diag_uid.uid_unshare.1 ...
# 1: Test terminated by timeout
# FAIL diag_uid.uid_unshare.1
not ok 2 diag_uid.uid_unshare.1
# FAILED: 0 / 2 tests passed.
# Totals: pass:0 fail:2 xfail:0 xpass:0 skip:0 error:0
With the patch, the test succeeds.
# ./diag_uid
TAP version 13
1..2
# Starting 2 tests from 3 test cases.
# RUN diag_uid.uid.1 ...
# OK diag_uid.uid.1
ok 1 diag_uid.uid.1
# RUN diag_uid.uid_unshare.1 ...
# OK diag_uid.uid_unshare.1
ok 2 diag_uid.uid_unshare.1
# PASSED: 2 / 2 tests passed.
# Totals: pass:2 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Add nvdimm_security_ops support for CXL memory device with the introduction
of the ->get_flags() callback function. This is part of the "Persistent
Memory Data-at-rest Security" command set for CXL memory device support.
The ->get_flags() function provides the security state of the persistent
memory device defined by the CXL 3.0 spec section 8.2.9.8.6.1.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/166983609611.2734609.13231854299523325319.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Verify the KVM allows userspace to set all supported bits in the
IA32_FEATURE_CONTROL MSR irrespective of the current guest CPUID, and
that all unsupported bits are rejected.
Throw the testcase into vmx_msrs_test even though it's not technically a
VMX MSR; it's close enough, and the most frequently feature controlled by
the MSR is VMX.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20220607232353.3375324-4-seanjc@google.com
Cover the essential functionality of the iommufd with a directed test from
userspace. This aims to achieve reasonable functional coverage using the
in-kernel self test framework.
A second test does a failure injection sweep of the success paths to study
error unwind behaviors.
This allows achieving high coverage of the corner cases in pages.c.
The selftest requires CONFIG_IOMMUFD_TEST to be enabled, and several huge
pages which may require:
echo 4 > /proc/sys/vm/nr_hugepages
Link: https://lore.kernel.org/r/19-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Matthew Rosato <mjrosato@linux.ibm.com> # s390
Tested-by: Yi Liu <yi.l.liu@intel.com>
Tested-by: Eric Auger <eric.auger@redhat.com> # aarch64
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This test was overlooked with a hard-coded mntpoint path in test when
we're removing the hugetlb mntpoint in commit 0796c7b8be. Fix it up so
the test can keep running.
Link: https://lkml.kernel.org/r/Y3aojfUC2nSwbCzB@x1n
Fixes: 0796c7b8be ("selftests/vm: drop mnt point for hugetlb in run_vmtests.sh")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reported-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's test whether R/O long-term pinning is reliable for non-anonymous
memory: when R/O long-term pinning a page, the expectation is that we
break COW early before pinning, such that actual write access via the
page tables won't break COW later and end up replacing the R/O-pinned
page in the page table.
Consequently, R/O long-term pinning in private mappings would only target
exclusive anonymous pages.
For now, all tests fail:
# [RUN] R/O longterm GUP pin ... with shared zeropage
not ok 151 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd
not ok 152 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with tmpfile
not ok 153 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with huge zeropage
not ok 154 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (2048 kB)
not ok 155 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP pin ... with memfd hugetlb (1048576 kB)
not ok 156 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with shared zeropage
not ok 157 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd
not ok 158 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with tmpfile
not ok 159 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with huge zeropage
not ok 160 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (2048 kB)
not ok 161 Longterm R/O pin is reliable
# [RUN] R/O longterm GUP-fast pin ... with memfd hugetlb (1048576 kB)
not ok 162 Longterm R/O pin is reliable
Link: https://lkml.kernel.org/r/20221116102659.70287-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add basic tests for COW with non-anonymous pages in private
mappings: write access should properly trigger COW and result in the
private changes not being visible through other page mappings.
Especially, add tests for:
* Zeropage
* Huge zeropage
* Ordinary pagecache pages via memfd and tmpfile()
* Hugetlb pages via memfd
Fortunately, all tests pass.
Link: https://lkml.kernel.org/r/20221116102659.70287-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O
long-term pinning)".
For now, we did not support reliable R/O long-term pinning in COW
mappings. That means, if we would trigger R/O long-term pinning in
MAP_PRIVATE mapping, we could end up pinning the (R/O-mapped) shared
zeropage or a pagecache page.
The next write access would trigger a write fault and replace the pinned
page by an exclusive anonymous page in the process page table; whatever
the process would write to that private page copy would not be visible by
the owner of the previous page pin: for example, RDMA could read stale
data. The end result is essentially an unexpected and hard-to-debug
memory corruption.
Some drivers tried working around that limitation by using
"FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning for now.
FOLL_WRITE would trigger a write fault, if required, and break COW before
pinning the page. FOLL_FORCE is required because the VMA might lack write
permissions, and drivers wanted to make that working as well, just like
one would expect (no write access, but still triggering a write access to
break COW).
However, that is not a practical solution, because
(1) Drivers that don't stick to that undocumented and debatable pattern
would still run into that issue. For example, VFIO only uses
FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning
limitation is unintuitive. FOLL_WRITE would, for example, mark the
page softdirty or trigger uffd-wp, even though, there actually isn't
going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access without lack of
VMA permissions by arbitrarty drivers.
So instead, make R/O long-term pinning work as expected, by breaking COW
in a COW mapping early, such that we can remove any FOLL_FORCE usage from
drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE).
More details in patch #8.
This patch (of 19):
Originally, the plan was to have a separate tests for testing COW of
non-anonymous (e.g., shared zeropage) pages.
Turns out, that we'd need a lot of similar functionality and that there
isn't a really good reason to separate it. So let's prepare for non-anon
tests by renaming to "cow".
Link: https://lkml.kernel.org/r/20221116102659.70287-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221116102659.70287-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Walls <awalls@md.metrocast.net>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bernard Metzler <bmt@zurich.ibm.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Airlie <airlied@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hans Verkuil <hverkuil@xs4all.nl>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Lucas Stach <l.stach@pengutronix.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nelson Escobar <neescoba@cisco.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Moore <paul@paul-moore.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <richard.henderson@linaro.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux+etnaviv@armlinux.org.uk>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When testing overflow and overread, there is no need to keep unnecessary
compilation warnings, we should simply ignore them.
The motivation for this patch is to eliminate the compilation warning,
maybe one day we will compile the kernel with "-Werror -Wall", at which
point this compilation warning will turn into a compilation error, we
should fix this error in advance.
How to reproduce the problem (with gcc-11.3.1):
$ make -C tools/testing/selftests/
...
warning: `write' reading 4294967295 bytes from a region of size 1
[-Wstringop-overread]
warning: `read' writing 4294967295 bytes into a region of size 25
overflows the destination [-Wstringop-overflow=]
"-Wno-stringop-overread" is supported at least in gcc-11.1.0.
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d14c547abd484d3540b692bb8048c4a6efe92c8b
Link: https://lkml.kernel.org/r/tencent_51C4ACA8CB3895C2D7F35178440283602107@qq.com
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's extend the test to cover the possible mprotect() optimization when
removing write-protection. mprotect() must not allow write-access to a
COW-shared page by accident.
Link: https://lkml.kernel.org/r/20221108174652.198904-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 8ebe0a5eaa ("mm,madvise,hugetlb: fix unexpected data loss with
MADV_DONTNEED on hugetlbfs") changed how the passed length was interpreted
for hugetlb mappings. It was changed from align up to align down. The
hugetlb-madvise test explicitly tests this behavior. Change test to
expect new behavior.
Link: https://lkml.kernel.org/r/20221104011632.357049-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/oe-lkp/202211040619.2ec447d7-oliver.sang@intel.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a simple test case for ensuring tried_regions directory existence.
Link: https://lkml.kernel.org/r/20221101220328.95765-7-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This CPU power monitor shows the power consumption
as exposed by the powercap subsystem, cmp with:
Documentation/power/powercap/powercap.rst
cpupower monitor -m RAPL
| RAPL
CPU| pack | core | unco
0|6853926|967832|442381
8|6853926|967832|442381
1|6853926|967832|442381
9|6853926|967832|442381
Unfortunately RAPL domains cannot be directly mapped to the corresponding
CPU socket/package, core it belongs to.
Not sure this is possible at all with the current data exposed from the
kernel.
Still it can be worthful information for developers trying to optimize
power consumption of workloads or their system in general.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Zhang Rui <rui.zhang@intel.com>
CC: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Read out powercap zone information via:
cpupower powercap-info
and show the zone hierarchy to the user:
./cpupower powercap-info
Driver: intel-rapl
Powercap domain hierarchy:
Zone: package-0 (enabled)
Power consumption can be monitored in micro Watts
Zone: core (disabled)
Power consumption can be monitored in micro Watts
Zone: uncore (disabled)
Power consumption can be monitored in micro Watts
Zone: dram (disabled)
Power consumption can be monitored in micro Watts
There is a dummy -a option for powercap-info which can/should be used to
show more detailed info later. Like that other args can be added easily
later as well.
A enable/disable option via powercap-set subcommand is also an enhancement
for later.
Also not all RAPL domains are shown. The func walking through RAPL
subdomains is restricted and hardcoded to: "intel-rapl/intel-rapl:0"
On my system above powercap domains map to:
intel-rapl/intel-rapl:0
-> pack (age-0)
intel-rapl/intel-rapl:0/intel-rapl:0:0
-> core
intel-rapl/intel-rapl:0/intel-rapl:0:1
-> uncore
Missing ones on my system are:
intel-rapl-mmio/intel-rapl-mmio:0
-> pack (age-0)
intel-rapl/intel-rapl:1
-> psys
This could get enhanced in:
struct powercap_zone *powercap_init_zones()
and adopted to walk through all intel-rapl zones, but
also to other powercap drivers like dtpm
(Dynamic Thermal Power Management framework),
cmp with: drivers/powercap/dtpm_*
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The networking programs typically don't require CAP_PERFMON, but through kfuncs
like bpf_cast_to_kern_ctx() they can access memory through PTR_TO_BTF_ID. In
such case enforce CAP_PERFMON.
Also make sure that only GPL programs can access kernel data structures.
All kfuncs require GPL already.
Also remove allow_ptr_to_map_access. It's the same as allow_ptr_leaks and
different name for the same check only causes confusion.
Fixes: fd264ca020 ("bpf: Add a kfunc to type cast from bpf uapi ctx to kernel ctx")
Fixes: 50c6b8a9ae ("selftests/bpf: Add a test for btf_type_tag "percpu"")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221125220617.26846-1-alexei.starovoitov@gmail.com
BPF CI fails for arm64 and s390x each with the following result:
[...]
All error logs:
serial_test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec
serial_test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec
libbpf: prog 'test_kprobe_empty': failed to attach: Operation not supported
serial_test_kprobe_multi_bench_attach:FAIL:bpf_program__attach_kprobe_multi_opts unexpected error: -95
#92 kprobe_multi_bench_attach:FAIL
[...]
Add the test to the deny list.
Fixes: 5b6c7e5c44 ("selftests/bpf: Add attach bench test")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add simple test cases for DAMON_LRU_SORT's 'enabled' parameter. Those
tests are focusing on the synchronous behavior of DAMON_RECLAIM enabling
and disabling.
Link: https://lkml.kernel.org/r/20221025173650.90624-5-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add simple test cases for DAMON_RECLAIM's 'enabled' parameter. Those
tests are focusing on the synchronous behavior of DAMON_RECLAIM enabling
and disabling.
Link: https://lkml.kernel.org/r/20221025173650.90624-3-sj@kernel.org
Signed-off-by: SeongJae Park <sj@kernel.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/vm`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Link: https://lkml.kernel.org/r/1668825419-30584-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
icmp conntrack will set icmp redirects as RELATED, but icmpv6 will not
do this.
For icmpv6, only icmp errors (code <= 128) are examined for RELATED state.
ICMPV6 Redirects are part of neighbour discovery mechanism, those are
handled by marking a selected subset (e.g. neighbour solicitations) as
UNTRACKED, but not REDIRECT -- they will thus be flagged as INVALID.
Add minimal support for REDIRECTs. No parsing of neighbour options is
added for simplicity, so this will only check that we have the embeeded
original header (ND_OPT_REDIRECT_HDR), and then attempt to do a flow
lookup for this tuple.
Also extend the existing test case to cover redirects.
Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Reported-by: Eric Garver <eric@garver.life>
Link: https://github.com/firewalld/firewalld/issues/1046
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Eric Garver <eric@garver.life>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Just a simple test to make sure we don't introduce unwanted compiler
warnings and API still supports passing enums as input argument.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221130200013.2997831-2-andrii@kernel.org
C++ enum forward declarations are fundamentally not compatible with pure
C enum definitions, and so libbpf's use of `enum bpf_stats_type;`
forward declaration in libbpf/bpf.h public API header is causing C++
compilation issues.
More details can be found in [0], but it comes down to C++ supporting
enum forward declaration only with explicitly specified backing type:
enum bpf_stats_type: int;
In C (and I believe it's a GCC extension also), such forward declaration
is simply:
enum bpf_stats_type;
Further, in Linux UAPI this enum is defined in pure C way:
enum bpf_stats_type { BPF_STATS_RUN_TIME = 0; }
And even though in both cases backing type is int, which can be
confirmed by looking at DWARF information, for C++ compiler actual enum
definition and forward declaration are incompatible.
To eliminate this problem, for C++ mode define input argument as int,
which makes enum unnecessary in libbpf public header. This solves the
issue and as demonstrated by next patch doesn't cause any unwanted
compiler warnings, at least with default warnings setting.
[0] https://stackoverflow.com/questions/42766839/c11-enum-forward-causes-underlying-type-mismatch
[1] Closes: https://github.com/libbpf/libbpf/issues/249
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20221130200013.2997831-1-andrii@kernel.org
This patch removes the need to pin prog when attaching to tc ingress
in the btf_skc_cls_ingress test. Instead, directly use the
bpf_tc_hook_create() and bpf_tc_attach(). The qdisc clsact
will go away together with the netns, so no need to
bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-8-martin.lau@linux.dev
After removing the mount/umount dance from {open,close}_netns()
in the pervious patch, "serial_" can be removed from
the tests using {open,close}_netns().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-7-martin.lau@linux.dev
The previous patches have removed the need to do the mount and umount
dance when switching netns. In particular:
* Avoid remounting /sys/fs/bpf to have a clean start
* Avoid remounting /sys to get a ifindex of a particular netns
This patch can finally remove the mount and umount dance in
{open,close}_netns which is unnecessarily complicated and
error-prone.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-6-martin.lau@linux.dev
This patch removes the need to pin prog in the remaining tests in
tc_redirect.c by directly using the bpf_tc_hook_create() and
bpf_tc_attach(). The clsact qdisc will go away together with
the test netns, so no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-5-martin.lau@linux.dev
This patch removes the need to pin prog in the tc_redirect_peer_l3
test by directly using the bpf_tc_hook_create() and bpf_tc_attach().
The clsact qdisc will go away together with the test netns, so
no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-4-martin.lau@linux.dev
This patch removes the need to pin prog in the tc_redirect_dtime
test by directly using the bpf_tc_hook_create() and bpf_tc_attach().
The clsact qdisc will go away together with the test netns, so
no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-3-martin.lau@linux.dev
When switching netns, the setns_by_fd() is doing dances in mount/umounting
the /sys directories. One reason is the tc_redirect.c test is depending
on the /sys/net/class/*/ifindex instead of using the if_nametoindex().
if_nametoindex() uses ioctl() to get the ifindex.
This patch is to move all /sys/net/class/*/ifindex usages to
if_nametoindex(). The current code checks ifindex >= 0 which is
incorrect. ifindex > 0 should be checked instead. This patch also
stores ifindex_veth_src and ifindex_veth_dst since the latter patch
will need them.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-2-martin.lau@linux.dev
Torture test the cases where the runstate crosses a page boundary, and
and especially the case where it's configured in 32-bit mode and doesn't,
but then switching to 64-bit mode makes it go onto the second page.
To simplify this, make the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST ioctl
also update the guest runstate area. It already did so if the actual
runstate changed, as a side-effect of kvm_xen_update_runstate(). So
doing it in the plain adjustment case is making it more consistent, as
well as giving us a nice way to trigger the update without actually
running the vCPU again and changing the values.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Closer inspection of the Xen code shows that we aren't supposed to be
using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
If we randomly set the top bit of ->state_entry_time for a guest that
hasn't asked for it and doesn't expect it, that could make the runtimes
fail to add up and confuse the guest. Without the flag it's perfectly
safe for a vCPU to read its own vcpu_runstate_info; just not for one
vCPU to read *another's*.
I briefly pondered adding a word for the whole set of VMASST_TYPE_*
flags but the only one we care about for HVM guests is this, so it
seemed a bit pointless.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The guest runstate area can be arbitrarily byte-aligned. In fact, even
when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
fields in the structure end up being unaligned due to the fact that the
32-bit ABI only aligns them to 32 bits.
So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
is buggy, because if it's unaligned then we can't update the whole field
atomically; the low bytes might be observable before the _UPDATE bit is.
Xen actually updates the *byte* containing that top bit, on its own. KVM
should do the same.
In addition, we cannot assume that the runstate area fits within a single
page. One option might be to make the gfn_to_pfn cache cope with regions
that cross a page — but getting a contiguous virtual kernel mapping of a
discontiguous set of IOMEM pages is a distinctly non-trivial exercise,
and it seems this is the *only* current use case for the GPC which would
benefit from it.
An earlier version of the runstate code did use a gfn_to_hva cache for
this purpose, but it still had the single-page restriction because it
used the uhva directly — because it needs to be able to do so atomically
when the vCPU is being scheduled out, so it used pagefault_disable()
around the accesses and didn't just use kvm_write_guest_cached() which
has a fallback path.
So... use a pair of GPCs for the first and potential second page covering
the runstate area. We can get away with locking both at once because
nothing else takes more than one GPC lock at a time so we can invent
a trivial ordering rule.
The common case where it's all in the same page is kept as a fast path,
but in both cases, the actual guest structure (compat or not) is built
up from the fields in @vx, following preset pointers to the state and
times fields. The only difference is whether those pointers point to
the kernel stack (in the split case) or to guest memory directly via
the GPC. The fast path is also fixed to use a byte access for the
XEN_RUNSTATE_UPDATE bit, then the only real difference is the dual
memcpy.
Finally, Xen also does write the runstate area immediately when it's
configured. Flip the kvm_xen_update_runstate() and …_guest() functions
and call the latter directly when the runstate area is set. This means
that other ioctls which modify the runstate also write it immediately
to the guest when they do so, which is also intended.
Update the xen_shinfo_test to exercise the pathological case where the
XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is
actually in a different page to the rest of the 64-bit word.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The systemwide perf hardware breakpoint test tries to open a perf event
on each cpu. On large systems, we run out of file descriptors and fail
the test. Instead, have the test set the file descriptor limit to an
arbitraty high value.
Reported-by: Rohan Deshpande <rohan_d@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/187fed5843cecc1e5066677b6296ee88337d7bef.1669096083.git.naveen.n.rao@linux.vnet.ibm.com
This patch first adds TFO support in mptcp_connect.c.
This can be enabled via a new option: -o MPTFO.
Once enabled, the TCP_FASTOPEN socket option is enabled for the server
side and a sendto() with MSG_FASTOPEN is used instead of a connect() for
the client side.
Note that the first SYN has a limit of bytes it can carry. In other
words, it is allowed to send less data than the provided one. We then
need to track more status info to properly allow the next sendmsg()
starting from the next part of the data to send the rest.
Also in TFO scenarios, we need to completely spool the partially xmitted
buffer -- and account for that -- before starting sendfile/mmap xmit,
otherwise the relevant tests will fail.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the async test case was introduced, despite being a completely
independent test case, the command to run it was added to the same shell
script as the smoke test case. Since a shell script implicitly returns
the error code from the last run command, this effectively caused the
script to only return as error code the result from the async test case,
hiding the smoke test result (which could then only be seen from the
python unittest logs).
Move the async test case call to its own shell script runner to avoid
the aforementioned issue. This also makes the output clearer to read,
since each kselftest KTAP result now matches with one python unittest
report.
While at it, also make it so the async test case is skipped if
/dev/tpmrm0 doesn't exist, since commit 8335adb8f9 ("selftests: tpm:
add async space test with noneexisting handle") added a test that relies
on it.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
sysfs now supports splice_* operations with
commit f2d6c2708b ("kernfs: wire up ->splice_read and ->splice_write")
Update the selftests to expect success instead of failure.
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Siddharth Gupta <sidgup@codeaurora.org
Reported-by: Dilip Kota <dilip.kota@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmOGOdYACgkQMUZtbf5S
IrsknQ//SAoOyDOEu15YzOt8hAupLKoF6MM+D0dwwTEQZLf7IVXCjPpkKtVh7Si7
YCBoyrqrDs7vwaUrVoKY19Amwov+EYrHCpdC+c7wdZ7uxTaYfUbJJUGmxYOR179o
lV1+1Aiqg9F9C6CUsmZ5lDN2Yb7/uPDBICIV8LM+VzJAtXjurBVauyMwAxLxPOAr
cgvM+h5xzE7DXMF2z8R/mUq5MSIWoJo9hy2UwbV+f2liMTQuw9rwTbyw3d7+H/6p
xmJcBcVaABjoUEsEhld3NTlYbSEnlFgCQBfDWzf2e4y6jBxO0JepuIc7SZwJFRJY
XBqdsKcGw5RkgKbksKUgxe126XFX0SUUQEp0UkOIqe15k7eC2yO9uj1gRm6OuV4s
J94HKzHX9WNV5OQ790Ed2JyIJScztMZlNFVJ/cz2/+iKR42xJg6kaO6Rt2fobtmL
VC2cH+RfHzLl+2+7xnfzXEDgFePSBlA02Aq1wihU3zB3r7WCFHchEf9T7sGt1QF0
03R+8E3+N2tYqphPAXyDoy6kXQJTPxJHAe1FNHJlwgfieUDEWZi/Pm+uQrKIkDeo
oq9MAV2QBNSD1w4wl7cXfvicO5kBr/OP6YBqwkpsGao2jCSIgkWEX2DRrUaLczXl
5/Z+m/gCO5tAEcVRYfMivxUIon//9EIhbErVpHTlNWpRHk24eS4=
=0Lnw
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and wifi.
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF"
* tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
MAINTAINERS: Update maintainer list for chelsio drivers
ionic: update MAINTAINERS entry
sctp: fix memory leak in sctp_stream_outq_migrate()
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
net/mlx5: Lag, Fix for loop when checking lag
Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path"
net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions
net: tun: Fix use-after-free in tun_detach()
net: mdiobus: fix unbalanced node reference count
net: hsr: Fix potential use-after-free
tipc: re-fetch skb cb after tipc_msg_validate
mptcp: fix sleep in atomic at close time
mptcp: don't orphan ssk in mptcp_close()
dsa: lan9303: Correct stat name
ipv4: Fix route deletion when nexthop info is not specified
net: wwan: iosm: fix incorrect skb length
net: wwan: iosm: fix crash in peek throughput test
net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type
net: wwan: iosm: fix kernel test robot reported error
...
Signal that a test run is complete through perf_test_args instead of
having tests open code a similar solution. Ensure that the field resets
to false at the beginning of a test run as the structure is reused
between test runs, eliminating a couple of bugs:
access_tracking_perf_test hangs indefinitely on a subsequent test run,
as 'done' remains true. The bug doesn't amount to much right now, as x86
supports a single guest mode. However, this is a precondition of
enabling the test for other architectures with >1 guest mode, like
arm64.
memslot_modification_stress_test has the exact opposite problem, where
subsequent test runs complete immediately as 'run_vcpus' remains false.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
[oliver: added commit message, preserve spin_wait_for_next_iteration()]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221118211503.4049023-2-oliver.upton@linux.dev
The minimal alsa-lib configuration code is similar in both mixer
and pcm tests. Move this code to the shared conf.c source file.
Also, fix the build rules inspired by rseq tests. Build libatest.so
which is linked to the both test utilities dynamically.
Also, set the TEST_FILES variable for lib.mk.
Cc: linux-kselftest@vger.kernel.org
Cc: Shuah Khan <shuah@kernel.org>
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129085306.2345763-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY4AC5QAKCRDbK58LschI
g1e0AQCfAqduTy7mYd02jDNCV0wLphNp9FbPiP9OrQT37ABpKAEA1ulj1X59bX3d
HnZdDKuatcPZT9MV5hDLM7MFJ9GjOA4=
=fNmM
-----END PGP SIGNATURE-----
Daniel Borkmann says:
====================
bpf-next 2022-11-25
We've added 101 non-merge commits during the last 11 day(s) which contain
a total of 109 files changed, 8827 insertions(+), 1129 deletions(-).
The main changes are:
1) Support for user defined BPF objects: the use case is to allocate own
objects, build own object hierarchies and use the building blocks to
build own data structures flexibly, for example, linked lists in BPF,
from Kumar Kartikeya Dwivedi.
2) Add bpf_rcu_read_{,un}lock() support for sleepable programs,
from Yonghong Song.
3) Add support storing struct task_struct objects as kptrs in maps,
from David Vernet.
4) Batch of BPF map documentation improvements, from Maryam Tahhan
and Donald Hunter.
5) Improve BPF verifier to propagate nullness information for branches
of register to register comparisons, from Eduard Zingerman.
6) Fix cgroup BPF iter infra to hold reference on the start cgroup,
from Hou Tao.
7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted
given it is not the case for them, from Alexei Starovoitov.
8) Improve BPF verifier's realloc handling to better play along with dynamic
runtime analysis tools like KASAN and friends, from Kees Cook.
9) Remove legacy libbpf mode support from bpftool,
from Sahid Orentino Ferdjaoui.
10) Rework zero-len skb redirection checks to avoid potentially breaking
existing BPF test infra users, from Stanislav Fomichev.
11) Two small refactorings which are independent and have been split out
of the XDP queueing RFC series, from Toke Høiland-Jørgensen.
12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen.
13) Documentation on how to run BPF CI without patch submission,
from Daniel Müller.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Link: https://lore.kernel.org/r/20221125012450.441-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY4ACtQAKCRDbK58LschI
gyIcAP4nE/chC+gYDOleloK2tQlQawM5Sa4kwHjFzBdPD3tRrAEAs6y2fJjb5vZo
OXIFKhXv5Xo3knmynkTSVSVOvB48IAE=
=Cme8
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
bpf 2022-11-25
We've added 10 non-merge commits during the last 8 day(s) which contain
a total of 7 files changed, 48 insertions(+), 30 deletions(-).
The main changes are:
1) Several libbpf ringbuf fixes related to probing for its availability,
size overflows when mmaping a 2G ringbuf and rejection of invalid
reservationsizes, from Hou Tao.
2) Fix a buggy return pointer in libbpf for attach_raw_tp function,
from Jiri Olsa.
3) Fix a local storage BPF map bug where the value's spin lock field
can get initialized incorrectly, from Xu Kuohai.
4) Two follow-up fixes in kprobe_multi BPF selftests for BPF CI,
from Jiri Olsa.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Make test_bench_attach serial
selftests/bpf: Filter out default_idle from kprobe_multi bench
bpf: Set and check spin lock value in sk_storage_map_test
bpf: Do not copy spin lock field from user in bpf_selem_alloc
libbpf: Check the validity of size in user_ring_buffer__reserve()
libbpf: Handle size overflow for user ringbuf mmap
libbpf: Handle size overflow for ringbuf mmap
libbpf: Use page size as max_entries when probing ring buffer map
bpf, perf: Use subprog name when reporting subprog ksymbol
libbpf: Use correct return pointer in attach_raw_tp
====================
Link: https://lore.kernel.org/r/20221125001034.29473-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the kernel receives a route deletion request from user space it
tries to delete a route that matches the route attributes specified in
the request.
If only prefix information is specified in the request, the kernel
should delete the first matching FIB alias regardless of its associated
FIB info. However, an error is currently returned when the FIB info is
backed by a nexthop object:
# ip nexthop add id 1 via 192.0.2.2 dev dummy10
# ip route add 198.51.100.0/24 nhid 1
# ip route del 198.51.100.0/24
RTNETLINK answers: No such process
Fix by matching on such a FIB info when legacy nexthop attributes are
not specified in the request. An earlier check already covers the case
where a nexthop ID is specified in the request.
Add tests that cover these flows. Before the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [FAIL]
Tests passed: 11
Tests failed: 1
After the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [ OK ]
Tests passed: 12
Tests failed: 0
No regressions in other tests:
# ./fib_nexthops.sh
...
Tests passed: 228
Tests failed: 0
# ./fib_tests.sh
...
Tests passed: 186
Tests failed: 0
Cc: stable@vger.kernel.org
Reported-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Jonas Gorski <jonas.gorski@gmail.com>
Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
Fixes: 6bf92d70e6 ("net: ipv4: fix route with nexthop object delete warning")
Fixes: 61b91eb33a ("ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221124210932.2470010-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Fixes for Xen emulation. While nobody should be enabling it in
the kernel (the only public users of the feature are the selftests),
the bug effectively allows userspace to read arbitrary memory.
* Correctness fixes for nested hypervisors that do not intercept INIT
or SHUTDOWN on AMD; the subsequent CPU reset can cause a use-after-free
when it disables virtualization extensions. While downgrading the panic
to a WARN is quite easy, the full fix is a bit more laborious; there
are also tests. This is the bulk of the pull request.
* Fix race condition due to incorrect mmu_lock use around
make_mmu_pages_available().
Generic:
* Obey changes to the kvm.halt_poll_ns module parameter in VMs
not using KVM_CAP_HALT_POLL, restoring behavior from before
the introduction of the capability
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmODI84UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPVJwgAombWOBf549JiHGPtwejuQO20nTSj
Om9pzWQ9dR182P+ju/FdqSPXt/Lc8i+z5zSXDrV3HQ6/a3zIItA+bOAUiMFvHNAQ
w/7pEb1MzVOsEg2SXGOjZvW3WouB4Z4R0PosInYjrFrRGRAaw5iaTOZHGezE44t2
WBWk1PpdMap7J/8sjNT1ble72ig9JdSW4qeJUQ1GWxHCigI5sESCQVqF446KM0jF
gTYPGX5TqpbWiIejF0yNew9yNKMi/yO4Pz8I5j3vtopeHx24DCIqUAGaEg6ykErX
vnzYbVP7NaFrqtje49PsK6i1cu2u7uFPArj0dxo3DviQVZVHV1q6tNmI4A==
=Qgei
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fixes for Xen emulation. While nobody should be enabling it in the
kernel (the only public users of the feature are the selftests),
the bug effectively allows userspace to read arbitrary memory.
- Correctness fixes for nested hypervisors that do not intercept INIT
or SHUTDOWN on AMD; the subsequent CPU reset can cause a
use-after-free when it disables virtualization extensions. While
downgrading the panic to a WARN is quite easy, the full fix is a
bit more laborious; there are also tests. This is the bulk of the
pull request.
- Fix race condition due to incorrect mmu_lock use around
make_mmu_pages_available().
Generic:
- Obey changes to the kvm.halt_poll_ns module parameter in VMs not
using KVM_CAP_HALT_POLL, restoring behavior from before the
introduction of the capability"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Update gfn_to_pfn_cache khva when it moves within the same page
KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0
KVM: x86/xen: Validate port number in SCHEDOP_poll
KVM: x86/mmu: Fix race condition in direct_page_fault
KVM: x86: remove exit_int_info warning in svm_handle_exit
KVM: selftests: add svm part to triple_fault_test
KVM: x86: allow L1 to not intercept triple fault
kvm: selftests: add svm nested shutdown test
KVM: selftests: move idt_entry to header
KVM: x86: forcibly leave nested mode on vCPU reset
KVM: x86: add kvm_leave_nested
KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use
KVM: x86: nSVM: leave nested mode on vCPU free
KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL
KVM: Avoid re-reading kvm->max_halt_poll_ns during halt-polling
KVM: Cap vcpu->halt_poll_ns before halting rather than after
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/ftrace`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/gpio`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The default output of cpupower info utils shows unexpected output
when CPU 0 is disabled.
Considering a case where CPU 0 is disabled, output of cpupower idle-info:
Before change:
cpupower idle-info
CPUidle driver: pseries_idle
CPUidle governor: menu
analyzing CPU 0:
*is offline
After change:
./cpupower idle-info
CPUidle driver: pseries_idle
CPUidle governor: menu
analyzing CPU 50:
Number of idle states: 2
Available idle states: snooze CEDE
snooze:
Flags/Description: snooze
Latency: 0
Usage: 101748
Duration: 2724058
CEDE:
Flags/Description: CEDE
Latency: 12
Usage: 270004
Duration: 283019526849
If -c option is not passed, CPU 0 was chosen as the default chosen CPU to
display details. However when CPU 0 is offline, it results in showing
unexpected output. This commit chooses the base_cpu
instead of CPU 0, hence keeping the output more relevant in all cases.
The base_cpu is the number of CPU on which the calling thread is
currently executing.
Signed-off-by: Saket Kumar Bhaskar <skb99@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The newly added PCM test produces a binary which is not ignored by git
when built in tree, fix that.
Fixes: aba51cd094 ("selftests: alsa - add PCM test")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20221125153654.1037868-1-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since we now flush output immediately on starting children we should ensure
that the child name is set beforehand so that any output that does get
flushed from the newly created child has the name of the child attached.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221124120722.150988-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add a few positive/negative tests to test bpf_rcu_read_lock()
and its corresponding verifier support. The new test will fail
on s390x and aarch64, so an entry is added to each of their
respective deny lists.
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221124053222.2374650-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Missed previously, add libpfm support for 'perf list' callbacks and
thereby JSON support.
Committer notes:
Add __maybe_unused to the args of the new print_libpfm_events() in the
else HAVE_LIBPFM block.
Fixes: e42b0ee61282a2f9 ("perf list: Add JSON output option")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xin Gao <gaoxin@cdjrlc.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221118024607.409083-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use strbuf to make the string under construction's length unlimited. Use
the format %s to mean a literal string copy and %S to signify a need to
escape the string. Add supported for escaping a newline character.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xin Gao <gaoxin@cdjrlc.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221118024607.409083-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rather than a newline starting from column 0, record a newline was
seen and then add the newline and space before the next word.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xin Gao <gaoxin@cdjrlc.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221118024607.409083-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf doesn't provide proper symbol information for specially crafted
.debug files.
Sometimes .debug file may not have similar program header as runtime
ELF file. For example if we generate .debug file using objcopy
--only-keep-debug resulting file will not contain .text, .data and
other runtime sections. That means corresponding program headers will
have zero FileSiz and modified Offset.
Example: program header of text section of libxxx.so:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x00000000003d3000 0x00000000003d3000 0x00000000003d3000
0x000000000055ae80 0x000000000055ae80 R E 0x1000
Same program header after executing:
objcopy --only-keep-debug libxxx.so libxxx.so.debug
LOAD 0x0000000000001000 0x00000000003d3000 0x00000000003d3000
0x0000000000000000 0x000000000055ae80 R E 0x1000
Offset and FileSiz have been changed.
Following formula will not provide correct value, if program header
taken from .debug file (syms_ss):
sym.st_value -= phdr.p_vaddr - phdr.p_offset;
Correct program header information is located inside runtime ELF
file (runtime_ss).
Fixes: 2d86612aac ("perf symbol: Correct address for bss symbols")
Signed-off-by: Ajay Kaher <akaher@vmware.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Makhalov <amakhalov@vmware.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa S. Bhat <srivatsab@vmware.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vasavi Sirnapalli <vsirnapalli@vmware.com>
Link: http://lore.kernel.org/lkml/1669198696-50547-1-git-send-email-akaher@vmware.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update JSON events and metrics for alderlake to perf.
Based on ADL JSON event list v1.16:
https://github.com/intel/perfmon/tree/main/ADL/events
Generate the event list and metrics with the converter scripts:
https://github.com/intel/perfmon/pull/32
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221124031441.110134-4-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add JSON metrics for Alderlake-N to perf.
It only included E-core metrics.
E-core metrics based on E-core TMA v2.2 (E-core_TMA_Metrics.csv)
It is downloaded from:
https://github.com/intel/perfmon/
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221124031441.110134-3-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Alderlake-N only has E-core, it has been moved to non-hybrid code path on
the kernel side, so add the cpuid for Alderlake-N separately.
Add core event list for Alderlake-N, it is based on the
ADL gracemont v1.16 JSON file.
https://github.com/intel/perfmon/tree/main/ADL/events/
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221124031441.110134-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It printed empty strings for each metric. I guess it's needed for CSV
output to match the column number. We could just ignore the empty
metrics in JSON but it ended up with a broken JSON object with a
trailing comma.
So I added a dummy '"metric-value" : "none"' part. To do that, it
needs to pass struct outstate to print_metric_end() to check if any
metric value is printed or not.
Before:
# perf stat -aj --metric-only --per-socket --for-each-cgroup system.slice true
{"socket" : "S0", "cpu-count" : 8, "cgroup" : "system.slice", "" : "", "" : "", "" : "", "" : "", "" : "", "" : "", "" : "", "" : ""}
After:
# perf stat -aj --metric-only --per-socket --for-each-cgroup system.slice true
{"socket" : "S0", "cpu-count" : 8, "cgroup" : "system.slice", "metric-value" : "none"}
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-16-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As the JSON output has been broken for a little while, I guess there are
not many users. Let's rename the field to more intuitive one. :)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-15-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It generated a broken JSON output when aggregation mode or cgroup is
used with --metric-only option. Also get rid of the header line and
make the output single line for each entry.
It needs to know whether the current metric is the first one or not.
So add 'first' field in the outstate and mark it false after printing.
Before:
# perf stat -a -j --metric-only true
{"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"}
{{"metric-value" : "0.797"}{"metric-value" : "1.65"}{"metric-value" : "0.89"}
^
# perf stat -a -j --metric-only --per-socket true
{"unit" : "GHz"}{"unit" : "insn per cycle"}{"unit" : "branch-misses of all branches"}
{"socket" : "S0", "aggregate-number" : 8, {"metric-value" : "0.295"}{"metric-value" : "1.88"}{"metric-value" : "0.64"}
^
After:
# perf stat -a -j --metric-only true
{"GHz" : "0.990", "insn per cycle" : "2.06", "branch-misses of all branches" : "0.59"}
# perf stat -a -j --metric-only --per-socket true
{"socket" : "S0", "aggregate-number" : 8, "GHz" : "0.439", "insn per cycle" : "2.14", "branch-misses of all branches" : "0.51"}
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-14-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Now most of the print functions take a pointer to the struct outstate.
We have one in the evlist__print_counters() and pass it through the
child functions.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-13-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It always passes a pointer to rt_stat as it's the only one. Let's not
pass it and directly refer it in the printout().
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The printout() takes a lot of arguments and sets an outstate with the
value. Instead, we can fill the outstate first and then pass it to
reduce the number of arguments.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It passes prefix and cgroup pointers but the outstate already has them.
Let's pass the outstate pointer instead.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a preparation for the later cleanup. No functional changes
intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a minor cleanup and preparation for the later change.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It already passes the stat_config argument, then it can find the value in the
config. No need to pass it separately.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It always passes a whitespace to the function, thus we can just add it to the
function body. Furthermore, it's only used in the normal output mode.
Well, actually CSV used it but it doesn't need to since we don't care about the
indentation or alignment in the CSV output.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-6-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It should not use sprintf() anymore. Let's pass the buffer size and use the
safer scnprintf() instead.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We don't care about the alignment in the CSV output as it's intended for machine
processing. Let's get rid of it to make the output more compact.
Before:
# perf stat -a --summary -I 1 -x, true
0.001149309,219.20,msec,cpu-clock,219322251,100.00,219.200,CPUs utilized
0.001149309,144,,context-switches,219241902,100.00,656.935,/sec
0.001149309,38,,cpu-migrations,219173705,100.00,173.358,/sec
0.001149309,61,,page-faults,219093635,100.00,278.285,/sec
0.001149309,10679310,,cycles,218746228,100.00,0.049,GHz
0.001149309,6288296,,instructions,218589869,100.00,0.59,insn per cycle
0.001149309,1386904,,branches,218428851,100.00,6.327,M/sec
0.001149309,56863,,branch-misses,218219951,100.00,4.10,of all branches
summary,219.20,msec,cpu-clock,219322251,100.00,20.025,CPUs utilized
summary,144,,context-switches,219241902,100.00,656.935,/sec
summary,38,,cpu-migrations,219173705,100.00,173.358,/sec
summary,61,,page-faults,219093635,100.00,278.285,/sec
summary,10679310,,cycles,218746228,100.00,0.049,GHz
summary,6288296,,instructions,218589869,100.00,0.59,insn per cycle
summary,1386904,,branches,218428851,100.00,6.327,M/sec
summary,56863,,branch-misses,218219951,100.00,4.10,of all branches
After:
0.001148449,224.75,msec,cpu-clock,224870589,100.00,224.747,CPUs utilized
0.001148449,176,,context-switches,224775564,100.00,783.103,/sec
0.001148449,38,,cpu-migrations,224707428,100.00,169.079,/sec
0.001148449,61,,page-faults,224629326,100.00,271.416,/sec
0.001148449,12172071,,cycles,224266368,100.00,0.054,GHz
0.001148449,6901907,,instructions,224108764,100.00,0.57,insn per cycle
0.001148449,1515655,,branches,223946693,100.00,6.744,M/sec
0.001148449,70027,,branch-misses,223735385,100.00,4.62,of all branches
summary,224.75,msec,cpu-clock,224870589,100.00,21.066,CPUs utilized
summary,176,,context-switches,224775564,100.00,783.103,/sec
summary,38,,cpu-migrations,224707428,100.00,169.079,/sec
summary,61,,page-faults,224629326,100.00,271.416,/sec
summary,12172071,,cycles,224266368,100.00,0.054,GHz
summary,6901907,,instructions,224108764,100.00,0.57,insn per cycle
summary,1515655,,branches,223946693,100.00,6.744,M/sec
summary,70027,,branch-misses,223735385,100.00,4.62,of all branches
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It matches to the prefix (interval timestamp), so better to have them together.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221123180208.2068936-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Verify when a bond is configured with {up,down}delay and the link state
of slave members flaps if there are no remaining members up the bond
should immediately select a member to bring up. (from bonding.txt
section 13.1 paragraph 4)
Suggested-by: Liang Li <liali@redhat.com>
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add some selftest testcases that validate the expected behavior of the
bpf_task_from_pid() kfunc that was added in the prior patch.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221122145300.251210-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It caused some troubles when a lock inside kmalloc is contended
because task local storage would allocate memory using kmalloc.
It'd create a recusion and even crash in my system.
There could be a couple of workarounds but I think the simplest
one is to use a pre-allocated hash map. We could fix the task
local storage to use the safe BPF allocator, but it takes time
so let's change this until it happens actually.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Chris Li <chriscli@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221118190109.1512674-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using precise flag with br_inst_retired.near_call causes the test fail
on KVM guests, even when the guests have PMU forwarding enabled and the
event itself is supported.
Remove the precise flag in order to make the test work on KVM guests.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20221122083121.6012-1-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
With perf inject -b, it synthesizes build-id event for DSOs. But it
missed to set the size and resulted in having trailing zeros.
As perf record sets the size in write_build_id(), let's set the size
here as well.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221119002750.1568027-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On IBM Power9, perf watchpoint tests fail since no hardware breakpoints
are available. Detect this by checking the error returned by
perf_event_open() and skip the tests in that case.
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kajol Jain<kjain@linux.ibm.com>
Tested-by: Kajol Jain<kjain@linux.ibm.com>
Link: https://lore.kernel.org/r/20221121102747.208289-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
augmented_raw_syscalls.c defines the bpf map 'syscalls' which is
initialized by perf tool in user space to indicate which system calls
are enabled for tracing, on the other flip eBPF program relies on the
map to filter out the trace events which are not enabled.
The map also includes a field 'string_args_len[6]' which presents the
string length if the corresponding argument is a string type.
Now the map 'syscalls' is not used, bpf program doesn't use it as filter
anymore, this is replaced by using the function bpf_tail_call() and
PROG_ARRAY syscalls map. And we don't need to explicitly set the string
length anymore, bpf_probe_read_str() is smart to copy the string and
return string length.
Therefore, it's safe to remove the bpf map 'syscalls'.
To consolidate the code, this patch removes the definition of map
'syscalls' from augmented_raw_syscalls.c and drops code for using
the map in the perf trace.
Note, since function trace__set_ev_qualifier_bpf_filter() is removed,
calling trace__init_syscall_bpf_progs() from it is also removed. We
don't need to worry it because trace__init_syscall_bpf_progs() is
still invoked from trace__init_syscalls_bpf_prog_array_maps() for
initialization the system call's bpf program callback.
After:
# perf trace -e examples/bpf/augmented_raw_syscalls.c,open* --max-events 10 perf stat --quiet sleep 0.001
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libelf.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libdw.so.1", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libunwind.so.8", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libunwind-aarch64.so.8", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libcrypto.so.3", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libslang.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libperl.so.5.34", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
# perf trace -e examples/bpf/augmented_raw_syscalls.c --max-events 10 perf stat --quiet sleep 0.001
... [continued]: execve()) = 0
brk(NULL) = 0xaaaab1d28000
faccessat(-100, "/etc/ld.so.preload", 4) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
close(3</usr/lib/aarch64-linux-gnu/libcrypto.so.3>) = 0
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3</usr/lib/aarch64-linux-gnu/libcrypto.so.3>, 0xfffff33f70d0, 832) = 832
munmap(0xffffb5519000, 28672) = 0
munmap(0xffffb55b7000, 32880) = 0
mprotect(0xffffb55a6000, 61440, PROT_NONE) = 0
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221121075237.127706-6-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The local variable 'syscall' is not used anymore, remove it.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221121075237.127706-5-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
On Arm64 a case is perf tools fails to find the corresponding trace
point folder for system calls listed in the table 'syscalltbl_arm64',
e.g. the generated system call table contains "lookup_dcookie" but we
cannot find out the matched trace point folder for it.
We need to figure out if there have any issue for the generated system
call table, on the other hand, we need to handle the case when trace
point folder is missed under sysfs, this patch sets the flag
syscall::nonexistent as true and returns the error from
trace__read_syscall_info().
Another problem is for trace__syscall_info(), it returns two different
values if a system call doesn't exist: at the first time calling
trace__syscall_info() it returns NULL when the system call doesn't exist,
later if call trace__syscall_info() again for the same missed system
call, it returns pointer of syscall. trace__syscall_info() checks the
condition 'syscalls.table[id].name == NULL', but the name will be
assigned in the first invoking even the system call is not found.
So checking system call's name in trace__syscall_info() is not the right
thing to do, this patch simply checks flag syscall::nonexistent to make
decision if a system call exists or not, finally trace__syscall_info()
returns the consistent result (NULL) if a system call doesn't existed.
Fixes: b8b1033fca ("perf trace: Mark syscall ids that are not allocated to avoid unnecessary error messages")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221121075237.127706-4-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When a system call is not detected, the reason is either because the
system call ID is out of scope or failure to find the corresponding path
in the sysfs, trace__read_syscall_info() returns zero. Finally, without
returning an error value it introduces confusion for the caller.
This patch lets the function trace__read_syscall_info() to return
-EEXIST when a system call doesn't exist.
Fixes: b8b1033fca ("perf trace: Mark syscall ids that are not allocated to avoid unnecessary error messages")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: bpf@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221121075237.127706-3-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch defines a macro RAW_SYSCALL_ARGS_NUM to replace the open
coded number '6'.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221121075237.127706-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rather than controlling the list output with passed flags, add
callbacks that are called when an event or metric are
encountered. State is passed to the callback so that command line
options can be respected, alternatively the callbacks can be changed.
Fix a few bugs:
- wordwrap to columns metric descriptions and expressions;
- remove unnecessary whitespace after PMU event names;
- the metric filter is a glob but matched using strstr which will
always fail, switch to using a proper globmatch,
- the detail flag gives details for extra kernel PMU events like
branch-instructions.
In metricgroup.c switch from struct mep being a rbtree of metricgroups
containing a list of metrics, to the tree directly containing all the
metrics. In general the alias for a name is passed to the print
routine rather than being contained in the name with OR.
Committer notes:
Check the asprint() return to address this on fedora 36:
util/print-events.c: In function ‘print_sdt_events’:
util/print-events.c:183:33: error: ignoring return value of ‘asprintf’ declared with attribute ‘warn_unused_result’ [-Werror=unused-result]
183 | asprintf(&evt_name, "%s@%s(%.12s)", sdt_name->s, path, bid);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
$ gcc --version | head -1
gcc (GCC) 12.2.1 20220819 (Red Hat 12.2.1-2)
$
Fix ps.pmu_glob setting when dealing with *:* events, it was being left
with a freed pointer that then at the end of cmd_list() would be double
freed.
Check if pmu_name is NULL in default_print_event() before calling
strglobmatch(pmu_name, ...) to avoid a segfault.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xin Gao <gaoxin@cdjrlc.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: http://lore.kernel.org/lkml/20221114210723.2749751-10-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tools/lib includes fixes break LIBTRACEVENT_DYNAMIC as the makefile
erroneously had dependencies on building libtraceevent even when not
linking with it. This change fixes the issues with LIBTRACEEVENT_DYNAMIC
by making the built files optional.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20221116224631.207631-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can get rid of requirement of a compiler.
$ sudo ./perf test -v 109
109: Test data symbol :
--- start ---
test child forked, pid 844526
Recording workload...
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.354 MB /tmp/__perf_test.perf.data.GFeZO (4847 samples) ]
Cleaning up files...
test child finished with 0
---- end ----
Test data symbol: Ok
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221116233854.1596378-13-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The datasym workload is to check if perf mem command gets the data
addresses precisely. This is needed for data symbol test.
$ perf test -w datasym
I had to keep the buf1 in the data section, otherwise it could end
up in the BSS and was mmaped as a separate //anon region, then it
was not symbolized at all. It needs to be fixed separately.
Committer notes:
Add a -U _FORTIFY_SOURCE to the datasym CFLAGS, as the main perf flags
set it and it requires building with optimization, and this new test has
a -O0.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221116233854.1596378-12-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can get rid of requirement of a compiler. Also rename the
symbols to match with the perf test workload.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: James Clark <james.clark@arm.com>
Acked-by: German Gomez <german.gomez@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221116233854.1596378-11-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The brstack is to run different kinds of branches repeatedly. This is
necessary for brstack test case to verify if it has correct branch info.
$ perf test -w brstack
I renamed the internal functions to have brstack_ prefix as it's too
generic name.
Add a -U_FORTIFY_SOURCE to the brstack CFLAGS, as the main perf flags
set it and it requires building with optimization, and this new test has
a -O0.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221116233854.1596378-10-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can get rid of requirement of a compiler. I've also removed
killall as it'll kill perf process now and run the test workload for 10
sec instead.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: James Clark <james.clark@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221116233854.1596378-9-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The sqrtloop creates a child process to run an infinite loop calling
sqrt() with rand(). This is needed for ARM SPE fork test.
$ perf test -w sqrtloop
It can take an optional argument to specify how long it will run in
seconds (default: 1).
Committer notes:
Explicitely ignored the sqrt() return to fix the build on systems where
the compiler complains it isn't being used.
And added a sqrtloop specific CFLAGS to disable optimizations to make
this a bit more robust wrt dead code elimination.
Doing that a -U_FORTIFY_SOURCE needs to be added, as -O0 is incompatible
with it.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221116233854.1596378-8-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can get rid of requirement of a compiler.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhengjun Xing <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20221116233854.1596378-7-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>