Extending the code in previous commits, introduce referenced kptr
support, which needs to be tagged using 'kptr_ref' tag instead. Unlike
unreferenced kptr, referenced kptr have a lot more restrictions. In
addition to the type matching, only a newly introduced bpf_kptr_xchg
helper is allowed to modify the map value at that offset. This transfers
the referenced pointer being stored into the map, releasing the
references state for the program, and returning the old value and
creating new reference state for the returned pointer.
Similar to unreferenced pointer case, return value for this case will
also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer
must either be eventually released by calling the corresponding release
function, otherwise it must be transferred into another map.
It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear
the value, and obtain the old value if any.
BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future
commit will permit using BPF_LDX for such pointers, but attempt at
making it safe, since the lifetime of object won't be guaranteed.
There are valid reasons to enforce the restriction of permitting only
bpf_kptr_xchg to operate on referenced kptr. The pointer value must be
consistent in face of concurrent modification, and any prior values
contained in the map must also be released before a new one is moved
into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg
returns the old value, which the verifier would require the user to
either free or move into another map, and releases the reference held
for the pointer being moved in.
In the future, direct BPF_XCHG instruction may also be permitted to work
like bpf_kptr_xchg helper.
Note that process_kptr_func doesn't have to call
check_helper_mem_access, since we already disallow rdonly/wronly flags
for map, which is what check_map_access_type checks, and we already
ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc,
so check_map_access is also not required.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com
Add a new type flag for bpf_arg_type that when set tells verifier that
for a release function, that argument's register will be the one for
which meta.ref_obj_id will be set, and which will then be released
using release_reference. To capture the regno, introduce a new field
release_regno in bpf_call_arg_meta.
This would be required in the next patch, where we may either pass NULL
or a refcounted pointer as an argument to the release function
bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not
enough, as there is a case where the type of argument needed matches,
but the ref_obj_id is set to 0. Hence, we must enforce that whenever
meta.ref_obj_id is zero, the register that is to be released can only
be NULL for a release function.
Since we now indicate whether an argument is to be released in
bpf_func_proto itself, is_release_function helper has lost its utitlity,
hence refactor code to work without it, and just rely on
meta.release_regno to know when to release state for a ref_obj_id.
Still, the restriction of one release argument and only one ref_obj_id
passed to BPF helper or kfunc remains. This may be lifted in the future.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220424214901.2743946-3-memxor@gmail.com
resctrl_tests can be built or run using kselftests framework.
Add description on how to do so in README.
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In kselftest framework, all tests can be build/run at a time,
and a sub test also can be build/run individually. As follows:
$ make kselftest-all TARGETS=resctrl
$ make -C tools/testing/selftests run_tests
$ make -C tools/testing/selftests TARGETS=resctrl run_tests
However, resctrl_tests cannot be run using kselftest framework,
users have to change directory to tools/testing/selftests/resctrl/,
run "make" to build executable file "resctrl_tests",
and run "sudo ./resctrl_tests" to execute the test.
To build/run resctrl_tests using kselftest framework.
Modify tools/testing/selftests/Makefile
and tools/testing/selftests/resctrl/Makefile.
Even after this change, users can still build/run resctrl_tests
without using framework as before.
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> # resctrl changes
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In kselftest framework, if a sub test can not run by some reasons,
the test result should be marked as SKIP rather than FAIL.
Return KSFT_SKIP(4) instead of KSFT_FAIL(1) if resctrl_tests is not run
as root or it is run on a test environment which does not support resctrl.
- ksft_exit_fail_msg(): returns KSFT_FAIL(1)
- ksft_exit_skip(): returns KSFT_SKIP(4)
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When testing on a Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz the resctrl
selftests fail due to timeout after exceeding the default time limit of
45 seconds. On this system the test takes about 68 seconds.
Since the failing test by default accesses a fixed size of memory, the
execution time should not vary significantly between different environment.
A new default of 120 seconds should be sufficient yet easy to customize
with the introduction of the "settings" file for reference.
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In kselftest framework, a sub test is run using the timeout utility
and it will send SIGTERM to the test upon timeout.
In resctrl_tests, a child process is created by fork() to
run benchmark but SIGTERM is not set in sigaction().
If SIGTERM signal is received, the parent process will be killed,
but the child process still exists.
Kill child process before the parent process terminates
if SIGTERM signal is received.
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
According to "Intel Resource Director Technology (Intel RDT) on
2nd Generation Intel Xeon Scalable Processors Reference Manual",
When the Intel Sub-NUMA Clustering(SNC) feature is enabled,
Intel CMT and MBM counters may not be accurate.
However, there does not seem to be an architectural way to detect
if SNC is enabled.
If the result of MBM&CMT test fails on Intel CPU,
print a message to let users know a possible cause of failure.
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, the resctrl_tests only has a function to detect AMD vendor.
Since when the Intel Sub-NUMA Clustering feature is enabled,
Intel CMT and MBM counters may not be accurate,
the resctrl_tests also need a function to detect Intel vendor.
And in the future, resctrl_tests will need a function to detect different
vendors, such as Arm.
Extend the function to detect Intel vendor as well. Also,
this function can be easily extended to detect other vendors.
Signed-off-by: Shaopeng Tan <tan.shaopeng@jp.fujitsu.com>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
musl nftw implementation does not support FTW_ACTIONRETVAL. There have been
multiple attempts at pushing the feature in musl upstream, but it has been
refused or ignored all the times:
https://www.openwall.com/lists/musl/2021/03/26/1https://www.openwall.com/lists/musl/2022/01/22/1
In this case we only care about /proc/<pid>/fd/<fd>, so it's not too difficult
to reimplement directly instead, and the new implementation makes 'bpftool perf'
slightly faster because it doesn't needlessly stat/readdir unneeded directories
(54ms -> 13ms on my machine).
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220424051022.2619648-4-asmadeus@codewreck.org
kselftest.h makes the __cpuid_count() macro available
to conveniently call the CPUID instruction.
Remove the local CPUID wrapper and use __cpuid_count()
from kselftest.h instead.
__cpuid_count() from kselftest.h is used instead of the
macro provided by the compiler since gcc v4.4 (via cpuid.h)
because the selftest needs to be supported with gcc v3.2,
the minimal required version for stable kernels.
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
kselftest.h makes the __cpuid_count() macro available
to conveniently call the CPUID instruction.
Remove the local CPUID wrapper and use __cpuid_count()
from kselftest.h instead.
__cpuid_count() from kselftest.h is used instead of the
macro provided by the compiler since gcc v4.4 (via cpuid.h)
because the selftest needs to be supported with gcc v3.2,
the minimal required version for stable kernels.
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
kselftest.h makes the __cpuid_count() macro available
to conveniently call the CPUID instruction.
Remove the local CPUID wrapper and use __cpuid_count()
from already included kselftest.h instead.
__cpuid_count() from kselftest.h is used instead of the
macro provided by the compiler since gcc v4.4 (via cpuid.h)
because the selftest needs to be compiled with gcc v3.2,
the minimal required version for stable kernels.
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: linux-mm@kvack.org
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Some selftests depend on information provided by the CPUID instruction.
To support this dependency the selftests implement private wrappers for
CPUID.
Duplication of the CPUID wrappers should be avoided.
Both gcc and clang/LLVM provide __cpuid_count() macros but neither
the macro nor its header file are available in all the compiler
versions that need to be supported by the selftests. __cpuid_count()
as provided by gcc is available starting with gcc v4.4, so it is
not available if the latest tests need to be run in all the
environments required to support kernels v4.9 and v4.14 that
have the minimal required gcc v3.2.
Duplicate gcc's __cpuid_count() macro to provide a centrally defined
macro for __cpuid_count() to help eliminate the duplicate CPUID wrappers
while continuing to compile in older environments.
Suggested-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently the damon selftests are not built with the rest of the
selftests. We add damon to the list of targets.
Fixes: b348eb7abd ("mm/damon: add user space selftests")
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Yuanchu Xie <yuanchu@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Most of the test suites in tools/testing/selftests contain a config file
that specifies which kernel config options need to be present in order for
the test suite to be able to run and perform meaningful validation. There
is no config file for the tools/testing/selftests/cgroup test suite, so
this patch adds one.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cgroup cpu controller selftests have a test_cpucg_max() testcase
that validates the behavior of the cpu.max knob. Let's also add a
testcase that verifies that the behavior works correctly when set on a
nested cgroup.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cgroup cpu controller test suite has a number of testcases that
validate the expected behavior of the cpu.weight knob, but none for
cpu.max. This testcase fixes that by adding a testcase for cpu.max as well.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cgroup cpu controller test suite currently contains a testcase called
test_cpucg_nested_weight_underprovisioned() which verifies the expected
behavior of cpu.weight when applied to nested cgroups. That first testcase
validated the expected behavior when the processes in the leaf cgroups
overcommitted the system. This patch adds a complementary
test_cpucg_nested_weight_underprovisioned() testcase which validates
behavior when those leaf cgroups undercommit the system.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cgroup cpu controller tests in
tools/testing/selftests/cgroup/test_cpu.c have some testcases that validate
the expected behavior of setting cpu.weight on cgroups, and then hogging
CPUs. What is still missing from the suite is a testcase that validates
nested cgroups. This patch adds test_cpucg_nested_weight_overprovisioned(),
which validates that a parent's cpu.weight will override its children if
they overcommit a host, and properly protect any sibling groups of that
parent.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Currently the binderfs test says what failure it encountered
without saying why it may occurred when it fails to mount
binderfs. So, Warn about enabling CONFIG_ANDROID_BINDERFS in the
running kernel.
Signed-off-by: Karthik Alapati <mail@karthek.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The link variable is already of type 'struct bpf_link *', casting it to
'struct bpf_link *' is redundant, drop it.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220424143420.457082-1-ytcoode@gmail.com
Once line card is activated, check the device FW version is exposed.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once line card is provisioned, check if HW revision and INI version
are exposed.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Once line card is provisioned, check the count of devices on it and
print them out.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Not much of a test but we keep on getting problems with boolean controls
not being called Switches so let's add a few basic checks to help people
spot problems.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20220421115020.14118-1-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Factor out perf_evsel__ioctl() so it can be reused.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220422162402.147958-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since for cpu_core or cpu_atom, they have different topdown events
groups.
For cpu_core, --topdown equals to:
"{slots,cpu_core/topdown-retiring/,cpu_core/topdown-bad-spec/,
cpu_core/topdown-fe-bound/,cpu_core/topdown-be-bound/,
cpu_core/topdown-heavy-ops/,cpu_core/topdown-br-mispredict/,
cpu_core/topdown-fetch-lat/,cpu_core/topdown-mem-bound/}"
For cpu_atom, --topdown equals to:
"{cpu_atom/topdown-retiring/,cpu_atom/topdown-bad-spec/,
cpu_atom/topdown-fe-bound/,cpu_atom/topdown-be-bound/}"
To simplify the implementation, on hybrid, --topdown is used
together with --cputype. If without --cputype, it uses cpu_core
topdown events by default.
# ./perf stat --topdown -a sleep 1
WARNING: default to use cpu_core topdown events
Performance counter stats for 'system wide':
retiring bad speculation frontend bound backend bound heavy operations light operations branch mispredict machine clears fetch latency fetch bandwidth memory bound Core bound
4.1% 0.0% 5.1% 90.8% 2.3% 1.8% 0.0% 0.0% 4.2% 0.9% 9.9% 81.0%
1.002624229 seconds time elapsed
# ./perf stat --topdown -a --cputype atom sleep 1
Performance counter stats for 'system wide':
retiring bad speculation frontend bound backend bound
13.5% 0.1% 31.2% 55.2%
1.002366987 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-3-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Fix header include for LLVM >= 14 when building with libclang.
- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE perf.data
files, fixing processing data with such attributes.
- Fix error message for test case 71 ("Convert perf time to TSC") on s390, where
it is not supported.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYmNPCAAKCRCyPKLppCJ+
JxW9AQCgzYxEw5CJ+zn58lGmYJdfV5Kc6C8MPD671oo39lC49AD/Qw8tyklKTok5
hJkZ3CqahjMdN1j+xNgskXBNcJW6Rww=
=Ayk0
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix header include for LLVM >= 14 when building with libclang.
- Allow access to 'data_src' for auxtrace in 'perf script' with ARM SPE
perf.data files, fixing processing data with such attributes.
- Fix error message for test case 71 ("Convert perf time to TSC") on
s390, where it is not supported.
* tag 'perf-tools-fixes-for-v5.18-2022-04-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf test: Fix error message for test case 71 on s390, where it is not supported
perf report: Set PERF_SAMPLE_DATA_SRC bit for Arm SPE event
perf script: Always allow field 'data_src' for auxtrace
perf clang: Fix header include for LLVM >= 14
This adds an initial subset of forwarding selftests which I considered
to be relevant for DSA drivers, along with a forwarding.config that
makes it easier to run them (disables veth pair creation, makes sure MAC
addresses are unique and stable).
The intention is to request driver writers to run these selftests during
review and make sure that the tests pass, or at least that the problems
are known.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This tests the capability of switch ports to filter out undesired
traffic. Different drivers are expected to have different capabilities
here (so some may fail and some may pass), yet the test still has some
value, for example to check for regressions.
There are 2 kinds of failures, one is when a packet which should have
been accepted isn't (and that should be fixed), and the other "failure"
(as reported by the test) is when a packet could have been filtered out
(for being unnecessary) yet it was received.
The bridge driver fares particularly badly at this test:
TEST: br0: Unicast IPv4 to primary MAC address [ OK ]
TEST: br0: Unicast IPv4 to macvlan MAC address [ OK ]
TEST: br0: Unicast IPv4 to unknown MAC address [FAIL]
reception succeeded, but should have failed
TEST: br0: Unicast IPv4 to unknown MAC address, promisc [ OK ]
TEST: br0: Unicast IPv4 to unknown MAC address, allmulti [FAIL]
reception succeeded, but should have failed
TEST: br0: Multicast IPv4 to joined group [ OK ]
TEST: br0: Multicast IPv4 to unknown group [FAIL]
reception succeeded, but should have failed
TEST: br0: Multicast IPv4 to unknown group, promisc [ OK ]
TEST: br0: Multicast IPv4 to unknown group, allmulti [ OK ]
TEST: br0: Multicast IPv6 to joined group [ OK ]
TEST: br0: Multicast IPv6 to unknown group [FAIL]
reception succeeded, but should have failed
TEST: br0: Multicast IPv6 to unknown group, promisc [ OK ]
TEST: br0: Multicast IPv6 to unknown group, allmulti [ OK ]
mainly because it does not implement IFF_UNICAST_FLT. Yet I still think
having the test (with the failures) is useful in case somebody wants to
tackle that problem in the future, to make an easy before-and-after
comparison.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Bombard a standalone switch port with various kinds of traffic to ensure
it is really standalone and doesn't leak packets to other switch ports.
Also check for switch ports in different bridges, and switch ports in a
VLAN-aware bridge but having different pvids. No forwarding should take
place in either case.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pinging an IPv6 link-local multicast address selects the link-local
unicast address of the interface as source, and we'd like to monitor for
that in tcpdump.
Add a helper to the forwarding library which retrieves the link-local
IPv6 address of an interface, to make that task easier.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the forwarding library with calls to some small C programs which
join an IP multicast group and send some packets to it. Both IPv4 and
IPv6 groups are supported. Use cases range from testing IGMP/MLD
snooping, to RX filtering, to multicast routing.
Testing multicast traffic using msend/mreceive is intended to be done
using tcpdump.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend tcpdump_start() & C:o to handle multiple instances. Useful when
observing bridge operation, e.g., unicast learning/flooding, and any
case of multicast distribution (to these ports but not that one ...).
This means the interface argument is now a mandatory argument to all
tcpdump_*() functions, hence the changes to the ocelot flower test.
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some use-cases we may want to change the tcpdump flags used in
tcpdump_start(). For instance, observing interfaces without the PROMISC
flag, e.g. to see what's really being forwarded to the bridge interface.
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By default, DSA switch ports inherit their MAC address from the DSA
master.
This works well for practical situations, but some selftests like
bridge_vlan_unaware.sh loop back 2 standalone DSA ports with 2 bridged
DSA ports, and require the bridge to forward packets between the
standalone ports.
Due to the bridge seeing that the MAC DA it needs to forward is present
as a local FDB entry (it coincides with the MAC address of the bridge
ports), the test packets are not forwarded, but terminated locally on
br0. In turn, this makes the ping and ping6 tests fail.
Address this by introducing an option to have stable MAC addresses.
When mac_addr_prepare is called, the current addresses of the netifs are
saved and replaced with 00:01:02:03:04:${netif number}. Then when
mac_addr_restore is called at the end of the test, the original MAC
addresses are restored. This ensures that the MAC addresses are unique,
which makes the test pass even for DSA ports.
The usage model is for the behavior to be opt-in via STABLE_MAC_ADDRS,
which DSA should set to true, all others behave as before. By hooking
the calls to mac_addr_prepare and mac_addr_restore within the forwarding
lib itself, we do not need to patch each individual selftest, the only
requirement is that pre_cleanup is called.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a function chk_infi_nr() to check the mibs for the
infinite mapping. Invoke it in chk_join_nr() when validate_checksum
is set.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Remove 's' & 'u' as valid ISA extension
* Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
x86:
* Fix NMI watchdog in guests on AMD
* Fix for SEV cache incoherency issues
* Don't re-acquire SRCU lock in complete_emulated_io()
* Avoid NULL pointer deref if VM creation fails
* Fix race conditions between APICv disabling and vCPU creation
* Bugfixes for disabling of APICv
* Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
selftests:
* Do not use bitfields larger than 32-bits, they differ between GCC and clang
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJi3KUUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMhvQf/Yncfg3MkOvKsVxnCe7diKDTI/E2n
wBGNIcL8r7L9oIltHL4Mh7JQTacHFQOZ9PQ30NO1p+pznZ03e8LR59IF1JpP7VOU
sWrLZ5a4bIAEjOpA7Jxcee6hUBwewBauDgFLbb+YAI2lAahiH7jVfywDRife/c3k
N2LjeA75K8UvMiDCfjxxxerFJK91zaqjWlUNF2OhtFp/5pnMfS+nli9Q8QS837pZ
oUf+0Beb2RpSHan+wbYVU7X3ZLwtpR0M3w3uXOG+X3as56wDf26znXS02aSwa45x
lfX+pqJfmb4vCJJDXt6avH27EVgTq0Vew+BhQHG3VLRO6uxZ+smX6qmsuw==
=kvbw
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"The main and larger change here is a workaround for AMD's lack of
cache coherency for encrypted-memory guests.
I have another patch pending, but it's waiting for review from the
architecture maintainers.
RISC-V:
- Remove 's' & 'u' as valid ISA extension
- Do not allow disabling the base extensions 'i'/'m'/'a'/'c'
x86:
- Fix NMI watchdog in guests on AMD
- Fix for SEV cache incoherency issues
- Don't re-acquire SRCU lock in complete_emulated_io()
- Avoid NULL pointer deref if VM creation fails
- Fix race conditions between APICv disabling and vCPU creation
- Bugfixes for disabling of APICv
- Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
selftests:
- Do not use bitfields larger than 32-bits, they differ between GCC
and clang"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
kvm: selftests: introduce and use more page size-related constants
kvm: selftests: do not use bitfields larger than 32-bits for PTEs
KVM: SEV: add cache flush to solve SEV cache incoherency issues
KVM: SVM: Flush when freeing encrypted pages even on SME_COHERENT CPUs
KVM: SVM: Simplify and harden helper to flush SEV guest page(s)
KVM: selftests: Silence compiler warning in the kvm_page_table_test
KVM: x86/pmu: Update AMD PMC sample period to fix guest NMI-watchdog
x86/kvm: Preserve BSP MSR_KVM_POLL_CONTROL across suspend/resume
KVM: SPDX style and spelling fixes
KVM: x86: Skip KVM_GUESTDBG_BLOCKIRQ APICv update if APICv is disabled
KVM: x86: Pend KVM_REQ_APICV_UPDATE during vCPU creation to fix a race
KVM: nVMX: Defer APICv updates while L2 is active until L1 is active
KVM: x86: Tag APICv DISABLE inhibit, not ABSENT, if APICv is disabled
KVM: Initialize debugfs_dentry when a VM is created to avoid NULL deref
KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused
KVM: RISC-V: Use kvm_vcpu.srcu_idx, drop RISC-V's unnecessary copy
KVM: x86: Don't re-acquire SRCU lock in complete_emulated_io()
RISC-V: KVM: Restrict the extensions that can be disabled
RISC-V: KVM: Remove 's' & 'u' as valid ISA extension
It turns out that by having CONFIG_ACPI=n, we've been failing to boot
additional CPUs, and so these systems were functionally UP. The code
bloat is unfortunate for build times, but I don't see an alternative. So
this commit sets CONFIG_ACPI=y for x86_64 and i686 configs.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Use bpf_link_create() API in fexit_stress test to attach FEXIT programs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Kui-Feng Lee <kuifeng@fb.com>
Link: https://lore.kernel.org/bpf/20220421033945.3602803-4-andrii@kernel.org
Teach bpf_link_create() to fallback to bpf_raw_tracepoint_open() on
older kernels for programs that are attachable through
BPF_RAW_TRACEPOINT_OPEN. This makes bpf_link_create() more unified and
convenient interface for creating bpf_link-based attachments.
With this approach end users can just use bpf_link_create() for
tp_btf/fentry/fexit/fmod_ret/lsm program attachments without needing to
care about kernel support, as libbpf will handle this transparently. On
the other hand, as newer features (like BPF cookie) are added to
LINK_CREATE interface, they will be readily usable though the same
bpf_link_create() API without any major refactoring from user's
standpoint.
bpf_program__attach_btf_id() is now using bpf_link_create() internally
as well and will take advantaged of this unified interface when BPF
cookie is added for fentry/fexit.
Doing proactive feature detection of LINK_CREATE support for
fentry/tp_btf/etc is quite involved. It requires parsing vmlinux BTF,
determining some stable and guaranteed to be in all kernels versions
target BTF type (either raw tracepoint or fentry target function),
actually attaching this program and thus potentially affecting the
performance of the host kernel briefly, etc. So instead we are taking
much simpler "lazy" approach of falling back to
bpf_raw_tracepoint_open() call only if initial LINK_CREATE command
fails. For modern kernels this will mean zero added overhead, while
older kernels will incur minimal overhead with a single fast-failing
LINK_CREATE call.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Kui-Feng Lee <kuifeng@fb.com>
Link: https://lore.kernel.org/bpf/20220421033945.3602803-3-andrii@kernel.org
Test case 71 'Convert perf time to TSC' is not supported on s390.
Subtest 71.1 is skipped with the correct message, but subtest 71.2 is
not skipped and fails.
The root cause is function evlist__open() called from
test__perf_time_to_tsc(). evlist__open() returns -ENOENT because the
event cycles:u is not supported by the selected PMU, for example
platform s390 on z/VM or an x86_64 virtual machine.
The PMU driver returns -ENOENT in this case. This error is leads to the
failure.
Fix this by returning TEST_SKIP on -ENOENT.
Output before:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: FAILED!
Output after:
71: Convert perf time to TSC:
71.1: TSC support: Skip (This architecture does not support)
71.2: Perf time to TSC: Skip (perf_read_tsc_conversion is not supported)
This also happens on an x86_64 virtual machine:
# uname -m
x86_64
$ ./perf test -F 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : FAILED!
$
Committer testing:
Continues to work on x86_64:
$ perf test 71
71: Convert perf time to TSC :
71.1: TSC support : Ok
71.2: Perf time to TSC : Ok
$
Fixes: 290fa68bdc ("perf test tsc: Fix error message when not supported")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: chengdongli@tencent.com
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220420062921.1211825-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit bb30acae4c ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't report result if the PERF_SAMPLE_DATA_SRC bit is missed in sample
type.
The commit ffab487052 ("perf: arm-spe: Fix perf report
--mem-mode") partially fixes the issue. It adds PERF_SAMPLE_DATA_SRC
bit for Arm SPE event, this allows the perf data file generated by
kernel v5.18-rc1 or later version can be reported properly.
On the other hand, perf tool still fails to be backward compatibility
for a data file recorded by an older version's perf which contains Arm
SPE trace data. This patch is a workaround in reporting phase, when
detects ARM SPE PMU event and without PERF_SAMPLE_DATA_SRC bit, it will
force to set the bit in the sample type and give a warning info.
Fixes: bb30acae4c ("perf report: Bail out --mem-mode if mem info is not available")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: German Gomez <german.gomez@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Link: https://lore.kernel.org/r/20220414123201.842754-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If use command 'perf script -F,+data_src' to dump memory samples with
Arm SPE trace data, it reports error:
# perf script -F,+data_src
Samples for 'dummy:u' event do not have DATA_SRC attribute set. Cannot print 'data_src' field.
This is because the 'dummy:u' event is absent DATA_SRC bit in its sample
type, so if a file contains AUX area tracing data then always allow
field 'data_src' to be selected as an option for perf script.
Fixes: e55ed3423c ("perf arm-spe: Synthesize memory event")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220417114837.839896-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The header TargetRegistry.h has moved in LLVM/clang 14.
Committer notes:
The problem as noticed when building in ubuntu:22.04:
90 98.61 ubuntu:22.04 : FAIL gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
util/c++/clang.cpp:23:10: fatal error: llvm/Support/TargetRegistry.h: No such file or directory
23 | #include "llvm/Support/TargetRegistry.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Fixed after applying this patch.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Guilherme Amadio <amadio@gentoo.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://twitter.com/GuilhermeAmadio/status/1514970524232921088
Link: http://lore.kernel.org/lkml/Ylp0M/VYgHOxtcnF@gentoo.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
test_cpu.c includes testcases that validate the cgroup cpu controller.
This patch adds a new testcase called test_cpucg_weight_underprovisioned()
that verifies that processes with different cpu.weight that are all running
on an underprovisioned system, still get roughly the same amount of cpu
time.
Because test_cpucg_weight_underprovisioned() is very similar to
test_cpucg_weight_overprovisioned(), this patch also pulls the common logic
into a separate helper function that is invoked from both testcases, and
which uses function pointers to invoke the unique portions of the
testcases.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
test_cpu.c includes testcases that validate the cgroup cpu controller.
This patch adds a new testcase called test_cpucg_weight_overprovisioned()
that verifies the expected behavior of creating multiple processes with
different cpu.weight, on a system that is overprovisioned.
So as to avoid code duplication, this patch also updates cpu_hog_func_param
to take a new hog_clock_type enum which informs how time is counted in
hog_cpus_timed() (either process time or wall clock time).
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
test_cpu.c includes testcases that validate the cgroup cpu controller.
This patch adds a new testcase called test_cpucg_stats() that verifies the
expected behavior of the cpu.stat interface. In doing so, we define a
new hog_cpus_timed() function which takes a cpu_hog_func_param struct
that configures how many CPUs it uses, and how long it runs. Future
patches will also spawn threads that hog CPUs, so this function will
eventually serve those use-cases as well.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
The cgroup selftests suite currently contains tests that validate various
aspects of cgroup, such as validating the expected behavior for memory
controllers, the expected behavior of cgroup.procs, etc. There are no tests
that validate the expected behavior of the cgroup cpu controller.
This patch therefore adds a new test_cpu.c file that will contain cpu
controller testcases. The file currently only contains a single testcase
that validates creating nested cgroups with cgroup.subtree_control
including cpu. Future patches will add more sophisticated testcases that
validate functional aspects of the cpu controller.
Signed-off-by: David Vernet <void@manifault.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Turn kmem_cache_alloc() into a wrapper around kmem_cache_alloc_lru().
Fixes: 9bbdc0f324 ("xarray: use kmem_cache_alloc_lru to allocate xa_node")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reported-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reported-by: Li Wang <liwang@redhat.com>
For hybrid events, by default stat aggregates and reports the event counts
per pmu.
# ./perf stat -e cycles -a sleep 1
Performance counter stats for 'system wide':
14,066,877,268 cpu_core/cycles/
6,814,443,147 cpu_atom/cycles/
1.002760625 seconds time elapsed
Sometimes, it's also useful to aggregate event counts from all PMUs.
Create a new option '--hybrid-merge' to enable that behavior and report
the counts without PMUs.
# ./perf stat -e cycles -a --hybrid-merge sleep 1
Performance counter stats for 'system wide':
20,732,982,512 cycles
1.002776793 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-2-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One metric such as 'Kernel_Utilization' may be from different PMUs and
consists of different events.
For core,
Kernel_Utilization = cpu_clk_unhalted.thread:k / cpu_clk_unhalted.thread
For atom,
Kernel_Utilization = cpu_clk_unhalted.core:k / cpu_clk_unhalted.core
The metric group string for core is:
'{cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.thread_p/metric-id=cpu_clk_unhalted.thread_p:k/k,cpu_clk_unhalted.thread/metric-id=cpu_clk_unhalted.thread/}:W#cpu_core'
The metric group string for atom is:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W'
It's internally expanded to:
'{cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core:k/k,cpu_clk_unhalted.core/metric-id=cpu_clk_unhalted.core/}:W#cpu_atom'
That means the group "{cpu_clk_unhalted.thread:k,cpu_clk_unhalted.thread}:W"
is from cpu_core PMU and the group "{cpu_clk_unhalted.core:k,cpu_clk_unhalted.core}"
is from cpu_atom PMU. And then next, check if the events in the group are
valid on that PMU. If one event is not valid on that PMU, the associated
group would be removed internally.
In this example, cpu_clk_unhalted.thread is valid on cpu_core and
cpu_clk_unhalted.core is valid on cpu_atom. So the checks for these two
groups are passed.
Before:
# ./perf stat -M Kernel_Utilization -a sleep 1
WARNING: events in group from different hybrid PMUs!
WARNING: grouped events cpus do not match, disabling group:
anon group { CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD_P:k, CPU_CLK_UNHALTED.THREAD, CPU_CLK_UNHALTED.THREAD }
Performance counter stats for 'system wide':
17,639,501 cpu_atom/CPU_CLK_UNHALTED.CORE/ # 1.00 Kernel_Utilization
17,578,757 cpu_atom/CPU_CLK_UNHALTED.CORE:k/
1,005,350,226 ns duration_time
43,012,352 cpu_core/CPU_CLK_UNHALTED.THREAD_P:k/ # 0.99 Kernel_Utilization
17,608,010 cpu_atom/CPU_CLK_UNHALTED.THREAD_P:k/
43,608,755 cpu_core/CPU_CLK_UNHALTED.THREAD/
17,630,838 cpu_atom/CPU_CLK_UNHALTED.THREAD/
1,005,350,226 ns duration_time
1.005350226 seconds time elapsed
After:
# ./perf stat -M Kernel_Utilization -a sleep 1
Performance counter stats for 'system wide':
17,981,895 CPU_CLK_UNHALTED.CORE [cpu_atom] # 1.00 Kernel_Utilization
17,925,405 CPU_CLK_UNHALTED.CORE:k [cpu_atom]
1,004,811,366 ns duration_time
41,246,425 CPU_CLK_UNHALTED.THREAD_P:k [cpu_core] # 0.99 Kernel_Utilization
41,819,129 CPU_CLK_UNHALTED.THREAD [cpu_core]
1,004,811,366 ns duration_time
1.004811366 seconds time elapsed
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220422065635.767648-1-zhengjun.xing@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge misc fixes from Andrew Morton:
"13 patches.
Subsystems affected by this patch series: mm (memory-failure, memcg,
userfaultfd, hugetlbfs, mremap, oom-kill, kasan, hmm), and kcov"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/mmu_notifier.c: fix race in mmu_interval_notifier_remove()
kcov: don't generate a warning on vm_insert_page()'s failure
MAINTAINERS: add Vincenzo Frascino to KASAN reviewers
oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup
selftest/vm: add skip support to mremap_test
selftest/vm: support xfail in mremap_test
selftest/vm: verify remap destination address in mremap_test
selftest/vm: verify mmap addr in mremap_test
mm, hugetlb: allow for "high" userspace addresses
userfaultfd: mark uffd_wp regardless of VM_WRITE flag
memcg: sync flush only if periodic flush is delayed
mm/memory-failure.c: skip huge_zero_page in memory_failure()
mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()
Move the libbpf init code into a single function, so that we have a single
place doing that.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/r/20220422100025.1469207-4-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The '--lto' option is a confusing way of telling objtool to do stack
validation despite it being a linked object. It's no longer needed now
that an explicit '--stackval' option exists. The '--vmlinux' option is
also redundant.
Remove both options in favor of a straightforward '--link' option which
identifies a linked object.
Also, implicitly set '--link' with a warning if the user forgets to do
so and we can tell that it's a linked object. This makes it easier for
manual vmlinux runs.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/dcd3ceffd15a54822c6183e5766d21ad06082b45.1650300597.git.jpoimboe@redhat.com
Objtool has some hacks in place to workaround toolchain limitations
which otherwise would break no-instrumentation rules. Make the hacks
explicit (and optional for other arches) by turning it into a cmdline
option and kernel config option.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/b326eeb9c33231b9dfbb925f194ed7ee40edcd7c.1650300597.git.jpoimboe@redhat.com
Objtool secretly does a jump label hack to overcome the limitations of
the toolchain. Make the hack explicit (and optional for other arches)
by turning it into a cmdline option and kernel config option.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/3bdcbfdd27ecb01ddec13c04bdf756a583b13d24.1650300597.git.jpoimboe@redhat.com
Now that CONFIG_STACK_VALIDATION is frame-pointer specific, do the same
for the '--stackval' option. Now the '--no-fp' option is redundant and
can be removed.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/f563fa064b3b63d528de250c72012d49e14742a3.1650300597.git.jpoimboe@redhat.com
Now that stack validation is an optional feature of objtool, add
CONFIG_OBJTOOL and replace most usages of CONFIG_STACK_VALIDATION with
it.
CONFIG_STACK_VALIDATION can now be considered to be frame-pointer
specific. CONFIG_UNWINDER_ORC is already inherently valid for live
patching, so no need to "validate" it.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/939bf3d85604b2a126412bf11af6e3bd3b872bcb.1650300597.git.jpoimboe@redhat.com
Extricate ibt from validate_branch() so they can be executed (or ported)
independently from each other.
While shuffling code around, simplify and improve the ibt logic:
- Ignore an explicit list of known sections which reference functions
for reasons other than indirect branching to them. This helps prevent
unnnecesary sealing.
- Warn on missing !ENDBR for all other sections, not just .data and
.rodata. This finds additional warnings, because there are sections
other than .[ro]data which reference function pointers. For example,
the ksymtab sections which are used for exporting symbols.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/fd1435e46bb95f81031b8fb1fa360f5f787e4316.1650300597.git.jpoimboe@redhat.com
To help prevent objtool users from having to do math to convert function
addresses to section addresses, and to help out with finding data
addresses reported by IBT validation, add an option to print the section
address in addition to the function address.
Normal:
vmlinux.o: warning: objtool: fixup_exception()+0x2d1: unreachable instruction
With '--sec-address':
vmlinux.o: warning: objtool: fixup_exception()+0x2d1 (.text+0x76c51): unreachable instruction
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/2cea4d5299d53d1a4c09212a6ad7820aa46fda7a.1650300597.git.jpoimboe@redhat.com
The parentheses in the "func()+off" address output are inconsistent with
how the kernel prints function addresses, breaking Peter's scripts.
Remove them.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/f2bec70312f62ef4f1ea21c134d9def627182ad3.1650300597.git.jpoimboe@redhat.com
Objtool has a fairly singular focus. It runs on object files and does
validations and transformations which can be combined in various ways.
The subcommand model has never been a good fit, making it awkward to
combine and remove options.
Remove the "check" and "orc" subcommands in favor of a more traditional
cmdline option model. This makes it much more flexible to use, and
easier to port individual features to other arches.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/5c61ebf805e90aefc5fa62bc63468ffae53b9df6.1650300597.git.jpoimboe@redhat.com
Split the existing options into two groups: actions, which actually do
something; and options, which modify the actions in some way.
Also there's no need to have short flags for all the non-action options.
Reserve short flags for the more important actions.
While at it:
- change a few of the short flags to be more intuitive
- make option descriptions more consistently descriptive
- sort options in the source like they are when printed
- move options to a global struct
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/9dcaa752f83aca24b1b21f0b0eeb28a0c181c0b0.1650300597.git.jpoimboe@redhat.com
The OPTION_GROUP option type is a way of grouping certain options
together in the printed usage text. It happens to be completely broken,
thanks to the fact that the subcmd option sorting just sorts everything,
without regard for grouping. Luckily, nobody uses this option anyway,
though that will change shortly.
Fix it by sorting each group individually.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Link: https://lkml.kernel.org/r/e167ea3a11e2a9800eb062c1fd0f13e9cd05140c.1650300597.git.jpoimboe@redhat.com
Occasionally objtool driven code patching (think .static_call_sites
.retpoline_sites etc..) goes sideways and it tries to patch an
instruction that doesn't match.
Much head-scatching and cursing later the problem is as outlined below
and affects every section that objtool generates for us, very much
including the ORC data. The below uses .static_call_sites because it's
convenient for demonstration purposes, but as mentioned the ORC
sections, .retpoline_sites and __mount_loc are all similarly affected.
Consider:
foo-weak.c:
extern void __SCT__foo(void);
__attribute__((weak)) void foo(void)
{
return __SCT__foo();
}
foo.c:
extern void __SCT__foo(void);
extern void my_foo(void);
void foo(void)
{
my_foo();
return __SCT__foo();
}
These generate the obvious code
(gcc -O2 -fcf-protection=none -fno-asynchronous-unwind-tables -c foo*.c):
foo-weak.o:
0000000000000000 <foo>:
0: e9 00 00 00 00 jmpq 5 <foo+0x5> 1: R_X86_64_PLT32 __SCT__foo-0x4
foo.o:
0000000000000000 <foo>:
0: 48 83 ec 08 sub $0x8,%rsp
4: e8 00 00 00 00 callq 9 <foo+0x9> 5: R_X86_64_PLT32 my_foo-0x4
9: 48 83 c4 08 add $0x8,%rsp
d: e9 00 00 00 00 jmpq 12 <foo+0x12> e: R_X86_64_PLT32 __SCT__foo-0x4
Now, when we link these two files together, you get something like
(ld -r -o foos.o foo-weak.o foo.o):
foos.o:
0000000000000000 <foo-0x10>:
0: e9 00 00 00 00 jmpq 5 <foo-0xb> 1: R_X86_64_PLT32 __SCT__foo-0x4
5: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:0x0(%rax,%rax,1)
f: 90 nop
0000000000000010 <foo>:
10: 48 83 ec 08 sub $0x8,%rsp
14: e8 00 00 00 00 callq 19 <foo+0x9> 15: R_X86_64_PLT32 my_foo-0x4
19: 48 83 c4 08 add $0x8,%rsp
1d: e9 00 00 00 00 jmpq 22 <foo+0x12> 1e: R_X86_64_PLT32 __SCT__foo-0x4
Noting that ld preserves the weak function text, but strips the symbol
off of it (hence objdump doing that funny negative offset thing). This
does lead to 'interesting' unused code issues with objtool when ran on
linked objects, but that seems to be working (fingers crossed).
So far so good.. Now lets consider the objtool static_call output
section (readelf output, old binutils):
foo-weak.o:
Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 .text + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foo.o:
Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 .text + d
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foos.o:
Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000100000002 R_X86_64_PC32 0000000000000000 .text + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
0000000000000008 0000000100000002 R_X86_64_PC32 0000000000000000 .text + 1d
000000000000000c 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
So we have two patch sites, one in the dead code of the weak foo and one
in the real foo. All is well.
*HOWEVER*, when the toolchain strips unused section symbols it
generates things like this (using new enough binutils):
foo-weak.o:
Relocation section '.rela.static_call_sites' at offset 0x2c8 contains 1 entry:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 foo + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foo.o:
Relocation section '.rela.static_call_sites' at offset 0x310 contains 2 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000200000002 R_X86_64_PC32 0000000000000000 foo + d
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
foos.o:
Relocation section '.rela.static_call_sites' at offset 0x430 contains 4 entries:
Offset Info Type Symbol's Value Symbol's Name + Addend
0000000000000000 0000000100000002 R_X86_64_PC32 0000000000000000 foo + 0
0000000000000004 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
0000000000000008 0000000100000002 R_X86_64_PC32 0000000000000000 foo + d
000000000000000c 0000000d00000002 R_X86_64_PC32 0000000000000000 __SCT__foo + 1
And now we can see how that foos.o .static_call_sites goes side-ways, we
now have _two_ patch sites in foo. One for the weak symbol at foo+0
(which is no longer a static_call site!) and one at foo+d which is in
fact the right location.
This seems to happen when objtool cannot find a section symbol, in which
case it falls back to any other symbol to key off of, however in this
case that goes terribly wrong!
As such, teach objtool to create a section symbol when there isn't
one.
Fixes: 44f6a7c075 ("objtool: Fix seg fault with Clang non-section symbols")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20220419203807.655552918@infradead.org
Allow the mremap test to be skipped due to errors such as failing to
parse the mmap_min_addr sysctl.
Link: https://lkml.kernel.org/r/20220420215721.4868-4-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use ksft_test_result_xfail for the tests which are expected to fail.
Link: https://lkml.kernel.org/r/20220420215721.4868-3-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Because mremap does not have a MAP_FIXED_NOREPLACE flag, it can destroy
existing mappings. This causes a segfault when regions such as text are
remapped and the permissions are changed.
Verify the requested mremap destination address does not overlap any
existing mappings by using mmap's MAP_FIXED_NOREPLACE flag. Keep
incrementing the destination address until a valid mapping is found or
fail the current test once the max address is reached.
Link: https://lkml.kernel.org/r/20220420215721.4868-2-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Avoid calling mmap with requested addresses that are less than the
system's mmap_min_addr. When run as root, mmap returns EACCES when
trying to map addresses < mmap_min_addr. This is not one of the error
codes for the condition to retry the mmap in the test.
Rather than arbitrarily retrying on EACCES, don't attempt an mmap until
addr > vm.mmap_min_addr.
Add a munmap call after an alignment check as the mappings are retained
after the retry and can reach the vm.max_map_count sysctl.
Link: https://lkml.kernel.org/r/20220420215721.4868-1-sidhartha.kumar@oracle.com
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up code that was hardcoding masks for various fields,
now that the masks are included in processor.h.
For more cleanup, define PAGE_SIZE and PAGE_MASK just like in Linux.
PAGE_SIZE in particular was defined by several tests.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Red Hat's QE team reported test failure on access_tracking_perf_test:
Testing guest mode: PA-bits:ANY, VA-bits:48, 4K pages
guest physical test memory offset: 0x3fffbffff000
Populating memory : 0.684014577s
Writing to populated memory : 0.006230175s
Reading from populated memory : 0.004557805s
==== Test Assertion Failure ====
lib/kvm_util.c:1411: false
pid=125806 tid=125809 errno=4 - Interrupted system call
1 0x0000000000402f7c: addr_gpa2hva at kvm_util.c:1411
2 (inlined by) addr_gpa2hva at kvm_util.c:1405
3 0x0000000000401f52: lookup_pfn at access_tracking_perf_test.c:98
4 (inlined by) mark_vcpu_memory_idle at access_tracking_perf_test.c:152
5 (inlined by) vcpu_thread_main at access_tracking_perf_test.c:232
6 0x00007fefe9ff81ce: ?? ??:0
7 0x00007fefe9c64d82: ?? ??:0
No vm physical memory at 0xffbffff000
I can easily reproduce it with a Intel(R) Xeon(R) CPU E5-2630 with 46 bits
PA.
It turns out that the address translation for clearing idle page tracking
returned a wrong result; addr_gva2gpa()'s last step, which is based on
"pte[index[0]].pfn", did the calculation with 40 bits length and the
high 12 bits got truncated. In above case the GPA address to be returned
should be 0x3fffbffff000 for GVA 0xc0000000, but it got truncated into
0xffbffff000 and the subsequent gpa2hva lookup failed.
The width of operations on bit fields greater than 32-bit is
implementation defined, and differs between GCC (which uses the bitfield
precision) and clang (which uses 64-bit arithmetic), so this is a
potential minefield. Remove the bit fields and using manual masking
instead.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2075036
Reported-by: Nana Liu <nanliu@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Current release - regressions:
- rxrpc: restore removed timer deletion
Current release - new code bugs:
- gre: fix device lookup for l3mdev use-case
- xfrm: fix egress device lookup for l3mdev use-case
Previous releases - regressions:
- sched: cls_u32: fix netns refcount changes in u32_change()
- smc: fix sock leak when release after smc_shutdown()
- xfrm: limit skb_page_frag_refill use to a single page
- eth: atlantic: invert deep par in pm functions, preventing null
derefs
- eth: stmmac: use readl_poll_timeout_atomic() in atomic state
Previous releases - always broken:
- gre: fix skb_under_panic on xmit
- openvswitch: fix OOB access in reserve_sfa_size()
- dsa: hellcreek: calculate checksums in tagger
- eth: ice: fix crash in switchdev mode
- eth: igc:
- fix infinite loop in release_swfw_sync
- fix scheduling while atomic
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmJhLbcSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOknxkP/jiAATyBt/HQFykQNiZ7cdrcv1gzJ3Lg
BmrV1QmbrNwfffBtmBHRliP7x0vNF6fV8LUKjfyQh1YgJw8I9F/FDH/1fojhBZq/
JJpZrh+TFikBBM4RDMJ0aQi6ssOEo8S9gfN4W48F/49O4S3Q/Gbgv7Lk0jL8utRz
7RgGUVxX+xOSklvh2Tn/xHdYPeebPhLojiKhmH+l6xghyDEUHkemF3AkLwV9QMnq
LXmNP4y100xcdCW1bLbyVcq0lbwdLSg4SL+2wC2bvgEDRR0MUezQyNxD6Oqrmusn
sASZYgNK92R9ekLBqTX/QwV/XIT+17hclTk4u0eV8GnemnibqOq7DhDqtKyeAzbD
mfU6Z5Ku6LuXA1U+2w1jxnd4cJTacA+dCRKcQD91ReguBbCd6zOweB996iBNLucK
Kf+r6qWWLxt+JmhSexb/T+oQHsdgvIPSQXNHUH2W8w+2DdTB/EPcSL76DlbZUxrP
up4EC3Nr3oxJjHbv7Iq4d9mHuRlwoOOpNJ3mARlfRDL6iuL0zECTweST3qT9YyIH
Cz4FGj7kwEDTxGtufoTVia+/JmS39f2lBrMKuhbTVo+qcYhs3zJM4Ki9bAgOKXqI
Qf+I73x9yQZ182afq4DsRXLnq/BajmRMyX2/kebY8KsARzRsPAktBhsT17SI6tUG
3MiLiHiIb0qM
=thBq
-----END PGP SIGNATURE-----
Merge tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from xfrm and can.
Current release - regressions:
- rxrpc: restore removed timer deletion
Current release - new code bugs:
- gre: fix device lookup for l3mdev use-case
- xfrm: fix egress device lookup for l3mdev use-case
Previous releases - regressions:
- sched: cls_u32: fix netns refcount changes in u32_change()
- smc: fix sock leak when release after smc_shutdown()
- xfrm: limit skb_page_frag_refill use to a single page
- eth: atlantic: invert deep par in pm functions, preventing null
derefs
- eth: stmmac: use readl_poll_timeout_atomic() in atomic state
Previous releases - always broken:
- gre: fix skb_under_panic on xmit
- openvswitch: fix OOB access in reserve_sfa_size()
- dsa: hellcreek: calculate checksums in tagger
- eth: ice: fix crash in switchdev mode
- eth: igc:
- fix infinite loop in release_swfw_sync
- fix scheduling while atomic"
* tag 'net-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (37 commits)
drivers: net: hippi: Fix deadlock in rr_close()
selftests: mlxsw: vxlan_flooding_ipv6: Prevent flooding of unwanted packets
selftests: mlxsw: vxlan_flooding: Prevent flooding of unwanted packets
nfc: MAINTAINERS: add Bug entry
net: stmmac: Use readl_poll_timeout_atomic() in atomic state
doc/ip-sysctl: add bc_forwarding
netlink: reset network and mac headers in netlink_dump()
net: mscc: ocelot: fix broken IP multicast flooding
net: dsa: hellcreek: Calculate checksums in tagger
net: atlantic: invert deep par in pm functions, preventing null derefs
can: isotp: stop timeout monitoring when no first frame was sent
bonding: do not discard lowest hash bit for non layer3+4 hashing
net: lan966x: Make sure to release ptp interrupt
ipv6: make ip6_rt_gc_expire an atomic_t
net: Handle l3mdev in ip_tunnel_init_flow
l3mdev: l3mdev_master_upper_ifindex_by_index_rcu should be using netdev_master_upper_dev_get_rcu
net/sched: cls_u32: fix possible leak in u32_init_knode()
net/sched: cls_u32: fix netns refcount changes in u32_change()
powerpc: Update MAINTAINERS for ibmvnic and VAS
net: restore alpha order to Ethernet devices in config
...
When compiling kvm_page_table_test.c, I get this compiler warning
with gcc 11.2:
kvm_page_table_test.c: In function 'pre_init_before_test':
../../../../tools/include/linux/kernel.h:44:24: warning: comparison of
distinct pointer types lacks a cast
44 | (void) (&_max1 == &_max2); \
| ^~
kvm_page_table_test.c:281:21: note: in expansion of macro 'max'
281 | alignment = max(0x100000, alignment);
| ^~~
Fix it by adjusting the type of the absolute value.
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Message-Id: <20220414103031.565037-1-thuth@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Switching to libbpf 1.0 API broke test_lpm_map and test_lru_map as error
reporting changed. Instead of setting errno and returning -1 bpf calls
now return -Exxx directly.
Drop errno checks and look at return code directly.
Fixes: b858ba8c52 ("selftests/bpf: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/bpf/20220421094320.1563570-1-asavkov@redhat.com
I am getting the following compilation error for prog_tests/uprobe_autoattach.c:
tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]
The value of mem is now used in one of the asserts, which is why it may be
confusing compilers. However, it is not dereferenced. Silence this by moving
free(mem) after the assert block.
Fixes: 1717e24801 ("selftests/bpf: Uprobe tests should verify param/return values")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220421132317.1583867-1-asavkov@redhat.com
Switching to libbpf 1.0 API broke test_sock and test_sysctl as they
check for return of bpf_prog_attach to be exactly -1. Switch the check
to '< 0' instead.
Fixes: b858ba8c52 ("selftests/bpf: Use libbpf 1.0 API mode instead of RLIMIT_MEMLOCK")
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/bpf/20220421130104.1582053-1-asavkov@redhat.com
This adds documentation for the following API functions:
- bpf_program__set_expected_attach_type()
- bpf_program__set_type()
- bpf_program__set_attach_target()
- bpf_program__attach()
- bpf_program__pin()
- bpf_program__unpin()
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220420161226.86803-3-grantseltzer@gmail.com
This updates usage of the following API functions within
libbpf so their newly added error return is checked:
- bpf_program__set_expected_attach_type()
- bpf_program__set_type()
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220420161226.86803-2-grantseltzer@gmail.com
This adds an error return to the following API functions:
- bpf_program__set_expected_attach_type()
- bpf_program__set_type()
In both cases, the error occurs when the BPF object has
already been loaded when the function is called. In this
case -EBUSY is returned.
Signed-off-by: Grant Seltzer <grantseltzer@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220420161226.86803-1-grantseltzer@gmail.com
These functions are currently only available on architectures that have
my_syscall6() macro implemented. Since these functions use malloc(),
malloc() uses mmap(), mmap() depends on my_syscall6() macro.
On architectures that don't support my_syscall6(), these function will
always return NULL with errno set to ENOSYS.
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
size_t strnlen(const char *str, size_t maxlen);
The strnlen() function returns the number of bytes in the string
pointed to by sstr, excluding the terminating null byte ('\0'), but at
most maxlen. In doing this, strnlen() looks only at the first maxlen
characters in the string pointed to by str and never beyond str[maxlen-1].
The first use case of this function is for determining the memory
allocation size in the strndup() function.
Link: https://lore.kernel.org/lkml/CAOG64qMpEMh+EkOfjNdAoueC+uQyT2Uv3689_sOr37-JxdJf4g@mail.gmail.com
Suggested-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Implement basic dynamic allocator functions. These functions are
currently only available on architectures that have nolibc mmap()
syscall implemented. These are not a super-fast memory allocator,
but at least they can satisfy basic needs for having heap without
libc.
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Implement `offsetof()` and `container_of()` macro. The first use case
of these macros is for `malloc()`, `realloc()` and `free()`.
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Implement mmap() and munmap(). Currently, they are only available for
architecures that have my_syscall6 macro. For architectures that don't
have, this function will return -1 with errno set to ENOSYS (Function
not implemented).
This has been tested on x86 and i386.
Notes for i386:
1) The common mmap() syscall implementation uses __NR_mmap2 instead
of __NR_mmap.
2) The offset must be shifted-right by 12-bit.
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
On i386, the 6th argument of syscall goes in %ebp. However, both Clang
and GCC cannot use %ebp in the clobber list and in the "r" constraint
without using -fomit-frame-pointer. To make it always available for
any kind of compilation, the below workaround is implemented.
1) Push the 6-th argument.
2) Push %ebp.
3) Load the 6-th argument from 4(%esp) to %ebp.
4) Do the syscall (int $0x80).
5) Pop %ebp (restore the old value of %ebp).
6) Add %esp by 4 (undo the stack pointer).
Cc: x86@kernel.org
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/lkml/2e335ac54db44f1d8496583d97f9dab0@AcuMS.aculab.com
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Building with clang yields the following error:
```
<inline asm>:3:1: error: _start changed binding to STB_GLOBAL
.global _start
^
1 error generated.
```
Make sure only specify one between `.global _start` and `.weak _start`.
Remove `.global _start`.
Cc: llvm@lists.linux.dev
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Replace `asm` with `__asm__` to support compilation with -std flag.
Using `asm` with -std flag makes GCC think `asm()` is a function call
instead of an inline assembly.
GCC doc says:
For the C language, the `asm` keyword is a GNU extension. When
writing C code that can be compiled with `-ansi` and the `-std`
options that select C dialects without GNU extensions, use
`__asm__` instead of `asm`.
Link: https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html
Reported-by: Alviro Iskandar Setiawan <alviro.iskandar@gnuweeb.org>
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The old link no longer works, update it.
Acked-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
When building with gcc at -O0 we're seeing link errors due to the
"environ" variable being referenced by getenv(). The problem is that
at -O0 gcc will not inline getenv() and will not drop the external
reference. One solution would be to locally declare the variable as
weak, but then it would appear in all programs even those not using
it, and would be confusing to users of getenv() who would forget to
set environ to envp.
An alternate approach used in this patch consists in always inlining
the outer part of getenv() that references this extern so that it's
always dropped when not used. The biggest part of the function was
now moved to a new function called _getenv() that's still not inlined
by default.
Reported-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Tested-by: Ammar Faizi <ammarfaizi2@gnuweeb.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
clang wants to use strlen() for __builtin_strlen() at -O0. We don't
really care about -O0 but it at least ought to build, so let's make
sure we don't choke on this, by dropping the optimizationn for
constant strings in this case.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The Makefile in tools/ is used to forward options to the makefiles
in the various subdirs. Let's add nolibc there so that it becomes
possible to make tools/nolibc_headers_standalone from the main tree
to simply create a completely usable sysroot.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This provides a target "headers_standalone" which installs the nolibc's
arch-specific headers with "arch.h" taken from the current arch (or a
concatenation of both i386 and x86_64 for arch=x86), then installs
kernel headers. This creates a convenient sysroot which is directly
usable by a bare-metal compiler to create any executable.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
- POLLIN etc were missing, so poll() could only be used with timeouts.
- WNOHANG was not defined and is convenient to check if a child is still
running
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This is essentially for completeness as it's not the most often used
in regtests.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
We need these functions all the time, including when checking environment
variables and parsing command-line arguments. These implementations were
optimized to show optimal code size on a wide range of compilers (22 bytes
return included for strcmp(), 33 for strncmp()).
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
%p remains quite useful in test code, and the code path can easily be
merged with the existing "%x" thus only adds ~50 bytes, thus let's
add it.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This implementation relies on an extern definition of the environ
variable, that the caller must declare and initialize from envp.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
It's often convenient to support this, especially in test programs where
a NULL may correspond to an allocation error or a non-existing value.
Let's make printf("%s") support being passed a NULL. In this case it
prints "(null)" like glibc's printf().
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
libgcc uses it for certain divide functions, so it must be exported. Like
for memset() we do that in its own section so that the linker can strip
it when not needed.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Now that a few basic include files are provided, some simple portable
programs may build, which will save them from having to surround their
includes with #ifndef NOLIBC. This patch mentions how to proceed, and
enumerates the list of files that are covered.
A comprehensive list of required include files is available here:
https://en.cppreference.com/w/c/header
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The time() syscall is used by a few simple applications, and is trivial
to implement based on gettimeofday() that we already have. Let's create
the file to ease porting and provide the function. It never returns any
error, though it may segfault in case of invalid pointer, like other
implementations relying on gettimeofday().
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This function is normally found in signal.h, and providing the file
eases porting of existing programs. Let's move it there.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This call is trivial to implement based on select() to complete sleep()
and msleep(), let's add it.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
These functions are normally provided by unistd.h. For ease of porting,
let's create the file and move them there.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This allows us to provide a minimal errno.h to ease porting applications
that use it.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
"clang -Os" and "gcc -Ofast" without -ffreestanding may ignore memset()
and memmove(), hoping to provide their builtin equivalents, and finally
not find them. Thus we must export these functions for these rare cases.
Note that as they're set in their own sections, they will be eliminated
by the linker if not used. In addition, they do not prevent gcc from
identifying them and replacing them with the shorter "rep movsb" or
"rep stosb" when relevant.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
These ones are often used and commonly set by applications to fallback
values. Let's fix them both to agree on PATH_MAX=4096 by default, as is
already present in linux/limits.h.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
By doing so we can link together multiple C files that have been compiled
with nolibc and which each have a _start symbol.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Some functions like raise() and memcpy() are permanently exported because
they're needed by libgcc on certain platforms. However most of the time
they are not needed and needlessly take space.
Let's move them to their own sub-section, called .text.nolibc_<function>.
This allows ld to get rid of them if unused when passed --gc-sections.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
While these functions are often dangerous, forcing the user to work
around their absence is often much worse. Let's provide small versions
of each of them. The respective sizes in bytes on a few architectures
are:
strncat(): x86:0x33 mips:0x68 arm:0x3c
strlcat(): x86:0x25 mips:0x4c arm:0x2c
The two are quite different, and strncat() is even different from
strncpy() in that it limits the amount of data it copies and will always
terminate the output by one zero, while strlcat() will always limit the
total output to the specified size and will put a zero if possible.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
These are minimal variants. strncpy() always fills the destination for
<size> chars, while strlcpy() copies no more than <size> including the
zero and returns the source's length. The respective sizes on various
archs are:
strncpy(): x86:0x1f mips:0x30 arm:0x20
strlcpy(): x86:0x17 mips:0x34 arm:0x1a
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The direction test inside the loop was not always completely optimized,
resulting in a larger than necessary function. This change adds a
direction variable that is set out of the loop. Now the function is down
to 48 bytes on x86, 32 on ARM and 68 on mips. It's worth noting that other
approaches were attempted (including relying on the up and down functions)
but they were only slightly beneficial on x86 and cost more on others.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Till now memcpy() relies on memmove(), but it's always included for libgcc,
so we have a larger than needed function. Let's implement two unidirectional
variants to copy from bottom to top and from top to bottom, and use the
former for memcpy(). The variants are optimized to be compact, and at the
same time the compiler is sometimes able to detect the loop and to replace
it with a "rep movsb". The new function is 24 bytes instead of 52 on x86_64.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
These syscalls never fail so there is no need to extract and set errno
for them.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
raise() doesn't set errno, so there's no point calling kill(), better
call sys_kill(), which also reduces the function's size.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The build of printf() on mips requires libgcc for functions __ashldi3 and
__lshrdi3 due to 64-bit shifts when scanning the input number. These are
not really needed in fact since we scan the number 4 bits at a time. Let's
arrange the loop to perform two 32-bit shifts instead on 32-bit platforms.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Let's pass a vararg to open() so that it remains compatible with existing
code. The arg is only dereferenced when flags contain O_CREAT. The function
is generally not inlined anymore, causing an extra call (total 16 extra
bytes) but it's still optimized for constant propagation, limiting the
excess to no more than 16 bytes in practice when open() is called without
O_CREAT, and ~40 with O_CREAT, which remains reasonable.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
It doesn't contain the text for the error codes, but instead displays
"errno=" followed by the errno value. Just like the regular errno, if
a non-empty message is passed, it's placed followed with ": " on the
output before the errno code. The message is emitted on stderr.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
These ones are found in some examples found in man pages and ease
portability tests.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This adds a minimal vfprintf() implementation as well as the commonly
used fprintf() and printf() that rely on it.
For now the function supports:
- formats: %s, %c, %u, %d, %x
- modifiers: %l and %ll
- unknown chars are considered as modifiers and are ignored
It is designed to remain minimalist, despite this printf() is 549 bytes
on x86_64. It would be wise not to add too many formats.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
We'll use it to write substrings. It relies on a simpler _fwrite() that
only takes one size. fputs() was also modified to rely on it.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The standard puts() function always emits the trailing LF which makes it
unconvenient for small string concatenation. fputs() ought to be used
instead but it requires a FILE*.
This adds 3 dummy FILE* values (stdin, stdout, stderr) which are in fact
pointers to struct FILE of one byte. We reserve 3 pointer values for them,
-3, -2 and -1, so that they are ordered, easing the tests and mapping to
integer.
>From this, fgetc(), fputc(), fgets() and fputs() were implemented, and
the previous putchar() and getchar() now remap to these. The standard
getc() and putc() macros were also implemented as pointing to these
ones.
There is absolutely no buffering, fgetc() and fgets() read one byte at
a time, fputc() writes one byte at a time, and only fputs() which knows
the string's length writes all of it at once.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
These are 64-bit variants of the itoa() and utoa() functions. They also
support reentrant ones, and use the same itoa_buffer. The functions are
a bit larger than the previous ones in 32-bit mode (86 and 98 bytes on
x86_64 and armv7 respectively), which is why we continue to provide them
as separate functions.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The original ltoa() function and the reentrant one ltoa_r() present a
number of drawbacks. The divide by 10 generates calls to external code
from libgcc_s, and the number does not necessarily start at the beginning
of the buffer.
Let's rewrite these functions so that they do not involve a divide and
only use loops on powers of 10, and implement both signed and unsigned
variants, always starting from the buffer's first character. Instead of
using a static buffer for each function, we're now using a common one.
In order to avoid confusion with the ltoa() name, the new functions are
called itoa_r() and utoa_r() to distinguish the signed and unsigned
versions, and for convenience for their callers, these functions now
reutrn the number of characters emitted. The ltoa_r() function is just
an inline mapping to the signed one and which returns the buffer.
The functions are quite small (86 bytes on x86_64, 68 on armv7) and
do not depend anymore on external code.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This function is not standard and performs the opposite of atol(). Let's
move it with atol(). It's been split between a reentrant function and one
using a static buffer.
There's no more definition in nolibc.h anymore now.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The makedev() man page says it's supposed to be a macro and that some
OSes have it with the other ones in sys/types.h so it now makes sense
to move it to types.h as a macro. Let's also define major() and
minor() that perform the reverse operation.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The macro was hard-coded to 256 but it's common to see it redefined.
Let's support this and make sure we always allocate enough entries for
the cases where it wouldn't be multiple of 32.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
FD_SET, FD_CLR, FD_ISSET, FD_ZERO are often expected to be macros and
not functions. In addition we already have a file dedicated to such
macros and types used by syscalls, it's types.h, so let's move them
there and turn them to macros. FD_CLR() and FD_ISSET() were missing,
so they were added. FD_ZERO() now deals with its own loop so that it
doesn't rely on memset() that sets one byte at a time.
Cc: David Laight <David.Laight@aculab.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
In fact there's only isdigit() for now. More should definitely be added.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The string manipulation functions (mem*, str*) are now found in
string.h. The file depends on almost nothing and will be
usable from other includes if needed. Maybe more functions could
be added.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The new file stdlib.h contains the definitions of functions that
are usually found in stdlib.h. Many more could certainly be added.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The syscall definitions were moved to sys.h. They were arranged
in a more easily maintainable order, whereby the sys_xxx() and xxx()
functions were grouped together, which also enlights the occasional
mappings such as wait relying on wait4().
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
In order to ease maintenance, this splits the arch-specific code into
one file per architecture. A common file "arch.h" is used to include the
right file among arch-* based on the detected architecture. Projects
which are already split per architecture could simply rename these
files to $arch/arch.h and get rid of the common arch.h. For this
reason, include guards were placed into each arch-specific file.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The macros and type definitions used by a number of syscalls were moved
to types.h where they will be easier to maintain. A few of them
are arch-specific and must not be moved there (e.g. O_*, sys_stat_struct).
A warning about them was placed at the top of the file.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The ordering of includes and definitions for now is a bit of a mess, as
for example asm/signal.h is included after int definitions, but plenty of
structures are defined later as they rely on other includes.
Let's move the standard type definitions to a dedicated file that is
included first. We also move NULL there. This way all other includes
are aware of it, and we can bring asm/signal.h back to the top of the
file.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The torture.sh script provides extra memory for scftorture and rcuscale.
However, the total memory provided is only 1G, which is less than the
2G that is required for KASAN testing. This commit therefore ups the
torture.sh script's 1G to 2G.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Now that the Tasks RCU flavors are selected by their users rather than
by the rcutorture scenarios, torture.sh fails when attempting to run
NOPREEMPT scenarios for refscale and rcuscale. This commit therefore
makes torture.sh specify CONFIG_TASKS_TRACE_RCU=y to avoid such failure.
Why not also CONFIG_TASKS_RCU? Because tracing selects this one.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
KASAN allots significant memory to track allocation state, and the amount
of memory has increased recently, which results in frequent OOMs on a
few of the rcutorture scenarios. This commit therefore provides 2G of
memory for --kasan runs, up from the 512M default.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, torture.sh saves only the build output and exit code from the
"make allmodconfig" test. This commit also saves the .config file.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
There is an extraneous "scf" in the per_version_boot_params shell function
used by scftorture. No harm done in that it is just passed as an argument
to the /init program in initrd, but this commit nevertheless removes it.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Now that CONFIG_PREEMPT_DYNAMIC=y is the default, kernels that are
ostensibly built with CONFIG_PREEMPT_NONE=y or CONFIG_PREEMPT_VOLUNTARY=y
are now actually built with CONFIG_PREEMPT=y, but are by default booted
so as to disable preemption. Although this allows much more flexibility
from a single kernel binary, it means that the current rcutorture
scenarios won't find build errors that happen only when preemption is
fully disabled at build time.
This commit therefore adds CONFIG_PREEMPT_DYNAMIC=n to several scenarios,
and while in the area switches one from CONFIG_PREEMPT_NONE=y to
CONFIG_PREEMPT_VOLUNTARY=y to add coverage of this Kconfig option.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit passes the csdlock_debug=1 kernel parameter in order to
enable CSD-lock stall reports for torture.sh scftorure runs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The kvm-again.sh script reruns an previously built set of kernels, so
the vmlinux files are associated with that previous run, not this on.
This results in kvm-find_errors.sh reporting spurious failed-build errors.
This commit therefore omits the vmlinux check for kvm-again.sh runs.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit adjusts the scftorture PREEMPT and NOPREEMPT scenarios to
account for the TASKS_RCU Kconfig option being explicitly selected rather
than computed in isolation.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for
RCU Tasks Rude and RCU Tasks Trace. Unless that kernel builds rcuscale,
whether built-in or as a module, in which case these RCU Tasks flavors are
(unnecessarily) built in. This both increases kernel size and increases
the complexity of certain tracing operations. This commit therefore
decouples the presence of rcuscale from the presence of RCU Tasks Rude
and RCU Tasks Trace.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for
RCU Tasks. Unless that kernel builds rcuscale, whether built-in or as
a module, in which case RCU Tasks is (unnecessarily) built. This both
increases kernel size and increases the complexity of certain tracing
operations. This commit therefore decouples the presence of rcuscale
from the presence of RCU Tasks.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for
RCU Tasks Rude and RCU Tasks Trace. Unless that kernel builds refscale,
whether built-in or as a module, in which case these RCU Tasks flavors are
(unnecessarily) built in. This both increases kernel size and increases
the complexity of certain tracing operations. This commit therefore
decouples the presence of refscale from the presence of RCU Tasks Rude
and RCU Tasks Trace.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for
RCU Tasks. Unless that kernel builds refscale, whether built-in or as a
module, in which case RCU Tasks is (unnecessarily) built in. This both
increases kernel size and increases the complexity of certain tracing
operations. This commit therefore decouples the presence of refscale
from the presence of RCU Tasks.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The rcutorture test suite makes double use of the rcutorture.stat_interval
module parameter. As its name suggests, it controls the frequency
of statistics printing, but it also controls the rcu_torture_writer()
stall timeout. The current setting of 15 seconds works surprisingly well.
However, given that the RCU tasks stall-warning timeout is ten -minutes-,
15 seconds is too short for TASKS02, which runs a non-preemptible kernel
on a single CPU.
This commit therefore adds checks for per-scenario specification of the
rcutorture.stat_interval module parameter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Now that CONFIG_PREEMPT_DYNAMIC=y is the default, TASKS02 no longer
builds a pure non-preemptible kernel that uses Tiny RCU. This commit
therefore fixes this new hole in rcutorture testing by adding
CONFIG_PREEMPT_DYNAMIC=n to the TASKS02 rcutorture scenario.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Unless a kernel builds rcutorture, whether built-in or as a module, that
kernel is also built with CONFIG_TASKS_RUDE_RCU, whether anything else
needs Tasks Rude RCU or not. This unnecessarily increases kernel size.
This commit therefore decouples the presence of rcutorture from the
presence of RCU Tasks Rude.
However, there is a need to select CONFIG_TASKS_RUDE_RCU for testing
purposes. Except that casual users must not be bothered with
questions -- for them, this needs to be fully automated. There is
thus a CONFIG_FORCE_TASKS_RUDE_RCU that selects CONFIG_TASKS_RUDE_RCU,
is user-selectable, but which depends on CONFIG_RCU_EXPERT.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, a CONFIG_PREEMPT_NONE=y kernel substitutes normal RCU for
RCU Tasks. Unless that kernel builds rcutorture, whether built-in or as
a module, in which case RCU Tasks is (unnecessarily) used. This both
increases kernel size and increases the complexity of certain tracing
operations. This commit therefore decouples the presence of rcutorture
from the presence of RCU Tasks.
However, there is a need to select CONFIG_TASKS_RCU for testing purposes.
Except that casual users must not be bothered with questions -- for them,
this needs to be fully automated. There is thus a CONFIG_FORCE_TASKS_RCU
that selects CONFIG_TASKS_RCU, is user-selectable, but which depends
on CONFIG_RCU_EXPERT.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Unless a kernel builds rcutorture, whether built-in or as a module, that
kernel is also built with CONFIG_TASKS_TRACE_RCU, whether anything else
needs Tasks Trace RCU or not. This unnecessarily increases kernel size.
This commit therefore decouples the presence of rcutorture from the
presence of RCU Tasks Trace.
However, there is a need to select CONFIG_TASKS_TRACE_RCU for
testing purposes. Except that casual users must not be bothered with
questions -- for them, this needs to be fully automated. There is thus
a CONFIG_FORCE_TASKS_TRACE_RCU that selects CONFIG_TASKS_TRACE_RCU,
is user-selectable, but which depends on CONFIG_RCU_EXPERT.
[ paulmck: Apply kernel test robot feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, any kernel built with CONFIG_PREEMPTION=y also gets
CONFIG_TASKS_RCU=y, which is not helpful to people trying to build
preemptible kernels of minimal size.
Because CONFIG_TASKS_RCU=y is needed only in kernels doing tracing of
one form or another, this commit moves from TASKS_RCU deciding when it
should be enabled to the tracing Kconfig options explicitly selecting it.
This allows building preemptible kernels without TASKS_RCU, if desired.
This commit also updates the SRCU-N and TREE09 rcutorture scenarios
in order to avoid Kconfig errors that would otherwise result from
CONFIG_TASKS_RCU being selected without its CONFIG_RCU_EXPERT dependency
being met.
[ paulmck: Apply BPF_SYSCALL feedback from Andrii Nakryiko. ]
Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Use bpf_prog_test_run_opts to test the skb_load_bytes function. Tests
the behavior when offset is greater than INT_MAX or a normal value.
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20220416105801.88708-4-liujian56@huawei.com
It bothered me that during benchmarking using 'perf stat' (to collect
for example CPU cache events) I could not simultaneously retrieve the
times spend in user or kernel mode in a machine readable format.
When running 'perf stat' the output for humans contains the times
reported by rusage and wait4.
$ perf stat -e cache-misses:u -- true
Performance counter stats for 'true':
4,206 cache-misses:u
0.001113619 seconds time elapsed
0.001175000 seconds user
0.000000000 seconds sys
But 'perf stat's machine-readable format does not provide this information.
$ perf stat -x, -e cache-misses:u -- true
4282,,cache-misses:u,492859,100.00,,
I found no way to retrieve this information using the available events
while using machine-readable output.
This patch adds two new tool internal events 'user_time' and
'system_time', similarly to the already present 'duration_time' event.
Both events use the already collected rusage information obtained by
wait4 and tracked in the global ru_stats.
Examples presenting cache-misses and rusage information in both human
and machine-readable form:
$ perf stat -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time .
Performance counter stats for 'grep -q -r duration_time .':
67,422,542 ns duration_time:u
50,517,000 ns user_time:u
16,839,000 ns system_time:u
30,937 cache-misses:u
0.067422542 seconds time elapsed
0.050517000 seconds user
0.016839000 seconds sys
$ perf stat -x, -e duration_time,user_time,system_time,cache-misses -- grep -q -r duration_time .
72134524,ns,duration_time:u,72134524,100.00,,
65225000,ns,user_time:u,65225000,100.00,,
6865000,ns,system_time:u,6865000,100.00,,
38705,,cache-misses:u,71189328,100.00,,
Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220420102354.468173-3-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is preparation for exporting rusage values as tool events.
Add new global stats tracking the values obtained via rusage.
For now only ru_utime and ru_stime are part of the tracked stats.
Both are stored as nanoseconds to be consistent with 'duration_time',
although the finest resolution the struct timeval data in rusage
provides are microseconds.
Signed-off-by: Florian Fischer <florian.fischer@muhq.space>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220420102354.468173-2-florian.fischer@muhq.space
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When one requests debuginfod, either via --debuginfod option, or with a
perf-config value, complain when perf is not built with it.
Signed-off-by: Martin Liška <mliska@suse.cz>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/35bae747-3951-dc3d-a66b-abf4cebcd9cb@suse.cz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test verifies that packets are correctly flooded by the bridge and
the VXLAN device by matching on the encapsulated packets at the other
end. However, if packets other than those generated by the test also
ingress the bridge (e.g., MLD packets), they will be flooded as well and
interfere with the expected count.
Make the test more robust by making sure that only the packets generated
by the test can ingress the bridge. Drop all the rest using tc filters
on the egress of 'br0' and 'h1'.
In the software data path, the problem can be solved by matching on the
inner destination MAC or dropping unwanted packets at the egress of the
VXLAN device, but this is not currently supported by mlxsw.
Fixes: d01724dd2a ("selftests: mlxsw: spectrum-2: Add a test for VxLAN flooding with IPv6")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test verifies that packets are correctly flooded by the bridge and
the VXLAN device by matching on the encapsulated packets at the other
end. However, if packets other than those generated by the test also
ingress the bridge (e.g., MLD packets), they will be flooded as well and
interfere with the expected count.
Make the test more robust by making sure that only the packets generated
by the test can ingress the bridge. Drop all the rest using tc filters
on the egress of 'br0' and 'h1'.
In the software data path, the problem can be solved by matching on the
inner destination MAC or dropping unwanted packets at the egress of the
VXLAN device, but this is not currently supported by mlxsw.
Fixes: 94d302deae ("selftests: mlxsw: Add a test for VxLAN flooding")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add riscv-specific USDT argument specification parsing logic.
riscv USDT argument format is shown below:
- Memory dereference case:
"size@off(reg)", e.g. "-8@-88(s0)"
- Constant value case:
"size@val", e.g. "4@5"
- Register read case:
"size@reg", e.g. "-8@a1"
s8 will be marked as poison while it's a reg of riscv, we need
to alias it in advance. Both RV32 and RV64 have been tested.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220419145238.482134-3-pulehui@huawei.com
The usdt_cookie is defined as __u64, which should not be
used as a long type because it will be cast to 32 bits
in 32-bit platforms.
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220419145238.482134-2-pulehui@huawei.com
Drop duplicate macro min() definition in mq_perf_tests.c, use MIN() in
sys/param.h instead.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This is the mips variant of commit <3990b5baf225> ("selftests/ftrace:
Add s390 support for kprobe args tests").
Signed-off-by: Ze Zhang <zhangze@loongson.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
This is the mips variant of commit <3990b5baf225> ("selftests/ftrace:
Add s390 support for kprobe args tests").
Signed-off-by: Ze Zhang <zhangze@loongson.cn>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Add a few test cases that ensure we catch cases of badly ordered type
tags in modifier chains.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220419164608.1990559-3-memxor@gmail.com
Take advantage of new libbpf feature for declarative non-autoloaded BPF
program SEC() definitions in few test that test single program at a time
out of many available programs within the single BPF object.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220419002452.632125-2-andrii@kernel.org
Establish SEC("?abc") naming convention (i.e., adding question mark in
front of otherwise normal section name) that allows to set corresponding
program's autoload property to false. This is effectively just
a declarative way to do bpf_program__set_autoload(prog, false).
Having a way to do this declaratively in BPF code itself is useful and
convenient for various scenarios. E.g., for testing, when BPF object
consists of multiple independent BPF programs that each needs to be
tested separately. Opting out all of them by default and then setting
autoload to true for just one of them at a time simplifies testing code
(see next patch for few conversions in BPF selftests taking advantage of
this new feature).
Another real-world use case is in libbpf-tools for cases when different
BPF programs have to be picked depending on particulars of the host
kernel due to various incompatible changes (like kernel function renames
or signature change, or to pick kprobe vs fentry depending on
corresponding kernel support for the latter). Marking all the different
BPF program candidates as non-autoloaded declaratively makes this more
obvious in BPF source code and allows simpler code in user-space code.
When BPF program marked as SEC("?abc") it is otherwise treated just like
SEC("abc") and bpf_program__section_name() reported will be "abc".
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220419002452.632125-1-andrii@kernel.org
Objtool's function fallthrough detection only works on C objects.
The distinction between C and assembly objects no longer makes sense
with objtool running on vmlinux.o.
Now that copy_user_64.S has been fixed up, and an objtool sibling call
detection bug has been fixed, the asm code is in "compliance" and this
hack is no longer needed. Remove it.
Fixes: ed53a0d971 ("x86/alternative: Use .ibt_endbr_seal to seal indirect calls")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/b434cff98eca3a60dcc64c620d7d5d405a0f441c.1649718562.git.jpoimboe@redhat.com
In add_jump_destinations(), sibling call detection requires 'insn->func'
to be valid. But alternative instructions get their 'func' set in
handle_group_alt(), which runs *after* add_jump_destinations(). So
sibling calls in alternatives code don't get properly detected.
Fix that by changing the initialization order: call
add_special_section_alts() *before* add_jump_destinations().
This also means the special case for a missing 'jump_dest' in
add_jump_destinations() can be removed, as it has already been dealt
with.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/c02e0a0a2a4286b5f848d17c77fdcb7e0caf709c.1649718562.git.jpoimboe@redhat.com
For most sibling calls, 'jump_dest' is NULL because objtool treats the
jump like a call and sets 'call_dest'. But there are a few edge cases
where that's not true. Make it consistent to avoid unexpected behavior.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/8737d6b9d1691831aed73375f444f0f42da3e2c9.1649718562.git.jpoimboe@redhat.com
When a "!ENDBR" warning is reported for a data section, objtool just
prints the text address of the relocation target twice, without giving
any clues about the location of the original data reference:
vmlinux.o: warning: objtool: dcbnl_netdevice_event()+0x0: .text+0xb64680: data relocation to !ENDBR: dcbnl_netdevice_event+0x0
Instead, print the address of the data reference, in addition to the
address of the relocation target.
vmlinux.o: warning: objtool: dcbnl_nb+0x0: .data..read_mostly+0xe260: data relocation to !ENDBR: dcbnl_netdevice_event+0x0
Fixes: 89bc853eae ("objtool: Find unused ENDBR instructions")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/762e88d51300e8eaf0f933a5b0feae20ac033bea.1650300597.git.jpoimboe@redhat.com
GCC-8 isn't clever enough to figure out that cpu_start_entry() is a
noreturn while objtool is. This results in code after the call in
start_secondary(). Give GCC a hand so that they all agree on things.
vmlinux.o: warning: objtool: start_secondary()+0x10e: unreachable
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lore.kernel.org/r/20220408094718.383658532@infradead.org
The llvm patch [1] enabled opaque pointer which caused selftest
'exhandler' failure.
...
; work = task->task_works;
7: (79) r1 = *(u64 *)(r6 +2120) ; R1_w=ptr_callback_head(off=0,imm=0) R6_w=ptr_task_struct(off=0,imm=0)
; func = work->func;
8: (79) r2 = *(u64 *)(r1 +8) ; R1_w=ptr_callback_head(off=0,imm=0) R2_w=scalar()
; if (!work && !func)
9: (4f) r1 |= r2
math between ptr_ pointer and register with unbounded min value is not allowed
below is insn 10 and 11
10: (55) if r1 != 0 goto +5
11: (18) r1 = 0 ll
...
In llvm, the code generation of 'r1 |= r2' happened in codegen
selectiondag phase due to difference of opaque pointer vs. non-opaque pointer.
Without [1], the related code looks like:
r2 = *(u64 *)(r6 + 2120)
r1 = *(u64 *)(r2 + 8)
if r2 != 0 goto +6 <LBB0_4>
if r1 != 0 goto +5 <LBB0_4>
r1 = 0 ll
...
I haven't found a good way in llvm to fix this issue. So let us workaround the
problem first so bpf CI won't be blocked.
[1] https://reviews.llvm.org/D123300
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220419050900.3136024-1-yhs@fb.com
Pull turbostat changes for 5.19 from Len Brown:
"Chen Yu (1):
tools/power turbostat: Support thermal throttle count print
Dan Merillat (1):
tools/power turbostat: fix dump for AMD cpus
Len Brown (5):
tools/power turbostat: tweak --show and --hide capability
tools/power turbostat: fix ICX DRAM power numbers
tools/power turbostat: be more useful as non-root
tools/power turbostat: No build warnings with -Wextra
tools/power turbostat: version 2022.04.16
Sumeet Pawnikar (2):
tools/power turbostat: Add Power Limit4 support
tools/power turbostat: print power values upto three decimal
Zephaniah E. Loss-Cutler-Hull (2):
tools/power turbostat: Allow -e for all names.
tools/power turbostat: Allow printing header every N iterations"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: version 2022.04.16
tools/power turbostat: No build warnings with -Wextra
tools/power turbostat: be more useful as non-root
tools/power turbostat: fix ICX DRAM power numbers
tools/power turbostat: Support thermal throttle count print
tools/power turbostat: Allow printing header every N iterations
tools/power turbostat: Allow -e for all names.
tools/power turbostat: print power values upto three decimal
tools/power turbostat: Add Power Limit4 support
tools/power turbostat: fix dump for AMD cpus
tools/power turbostat: tweak --show and --hide capability
This is a pre-req to add separate logging for each subtest in
test_progs.
Move all the mutable test data to the test_result struct.
Move per-test init/de-init into the run_one_test function.
Consolidate data aggregation and final log output in
calculate_and_print_summary function.
As a side effect, this patch fixes double counting of errors
for subtests and possible duplicate output of subtest log
on failures.
Also, add prog_tests_framework.c test to verify some of the
counting logic.
As part of verification, confirmed that number of reported
tests is the same before and after the change for both parallel
and sequential test execution.
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220418222507.1726259-1-mykolal@fb.com
Update the topic of BTCLEAR.ANY and add additional uncore event names
as per:
https://github.com/intel/event-converter-for-linux-perf/
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>1
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Update the topic of ASSISTS.ANY as per:
https://github.com/intel/event-converter-for-linux-perf/
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
JSON uncore events are generated for Skylake Server for v1.26
with events from:
https://download.01.org/perfmon/SKX/
New event names are added, that match the original JSON names,
due to an update to:
https://github.com/intel/event-converter-for-linux-perf/
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
JSON uncore events are generated for CascadeLake Server for v1.14 with
events from:
https://download.01.org/perfmon/CLX/
New event names are added, that match the original JSON names,
due to an update to:
https://github.com/intel/event-converter-for-linux-perf/
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Apply cstate fix from:
https://github.com/intel/event-converter-for-linux-perf/
so that metrics for cstates that exist on the particular architecture
are generated. This corrects issues with metric testing.
Also correct topic of ASSISTS.ANY event.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Apply cstate fix from:
https://github.com/intel/event-converter-for-linux-perf/
so that metrics for cstates that exist on the particular architecture
are generated. This corrects issues with metric testing.
Also correct topic of ASSISTS.ANY event.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Link: https://lore.kernel.org/r/20220413210503.3256922-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce basic line card manipulation which consists of provisioning,
unprovisioning and activation of a line card.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX to
cover all CPUs which allow to disable it.
- Disable TSX development mode at boot so that a microcode update which
provides TSX development mode does not suddenly make the system
vulnerable to TSX Asynchronous Abort.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJb5LYTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoVVbD/9cxZWkFctCiymedUZqLabkfpYSki65
MngdpCPzCNaaIdlp44lwCido5+gJsY9unXdm3OAUzLjv6SsxxpDr5njz1/C6TM1l
XmWjlkLEbG2QDPd1Ybd/lpYQORBmiukyo8v8x0yFT7ZzwvSddoDZAbeUtkQBrIin
sDTeExsewKzL2X5qXhttrHLHu1PYgurn4ThIrrG+eg2e4FNk6UUFUS3TOyMvzJDg
NWJ7N5pGy9YkR7CISq1q+qdnH55pGaUrgonDi2qBTt3EaH0fQtZP2ZtIOYr3O4nI
YCx6isrIiGUB6kSygofxmk4B+22CaUJXd2OcUxMZ/Th/a2aCK+35BtGVPXQGi6nU
d7m+ZWB7dShOiejFygS59ty+5L5kliKXYZfUASsq1CLoXH8K1xUwBMkbY5FQ2WH1
Ue4KUvjguNqsgSRAfeHdOi6B36oot0Xf9JO013Wm3V/r9hsGPtSOjWwFuVvT/euw
a9iFtruATxDssBxH/l0djCKnwwm5yuOt1OpyizcIMFnlCgRD06h/6zgAvsJK7c8d
dh6lC4D2mXP1e2wtEyZelve1tmRJ/FeReyG2V5FNU7m1mWYGm1rJZ4AEvnbrzcbC
ePwFva0lPu8GVKG6HRgHfR8PjuQ7TFmKPKytT7fboIqQpTIY+1Q75wYD4eXkSu8Q
/ltzXQz/8lz7bA==
=UQaW
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Two x86 fixes related to TSX:
- Use either MSR_TSX_FORCE_ABORT or MSR_IA32_TSX_CTRL to disable TSX
to cover all CPUs which allow to disable it.
- Disable TSX development mode at boot so that a microcode update
which provides TSX development mode does not suddenly make the
system vulnerable to TSX Asynchronous Abort"
* tag 'x86-urgent-2022-04-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsx: Disable TSX development mode at boot
x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
Add a new neighbour cache entry in STALE state for routers on receiving
an unsolicited (gratuitous) neighbour advertisement with
target link-layer-address option specified.
This is similar to the arp_accept configuration for IPv4.
A new sysctl endpoint is created to turn on this behaviour:
/proc/sys/net/ipv6/conf/interface/accept_unsolicited_na.
Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't exit if used this way:
sudo setcap cap_sys_nice,cap_sys_rawio=+ep ./turbostat
sudo chmod +r /dev/cpu/*/msr
./turbostat
note: cap_sys_admin is now also needed for the perf IPC counter:
sudo setcap cap_sys_admin,cap_sys_nice,cap_sys_rawio=+ep ./turbostat
Reported-by: Artem S. Tashkinov <aros@gmx.com>
Reported-by: Toby Broom <tbroom@outlook.com>
Signed-off-by: Len Brown <len.brown@intel.com>
ICX (and its duplicates) require special hard-coded DRAM RAPL units,
rather than using the generic RAPL energy units.
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The turbostat data is collected by end user for power evaluationit. However
it looks like we are missing enough thermal context there. Already a couple of
time we found that power management developer asking something like this:
grep -r . /sys/devices/system/cpu/cpu*/thermal_throttle/*
Print the per core thermal throttle count so as to get suffificent thermal
context.
turbostat -i 5 -s Core,CPU,CoreThr
Core CPU CoreThr
- - 104
0 0 61
0 4
1 1 0
1 5
2 2 104
2 6
3 3 7
3 7
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This gives the ability to reprint the header every N iterations, so you
can ensure that a scrolling display always has the header visible
somewhere on the screen.
Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Currently, there are a number of variables which are displayed by
default, enabled with -e all, and listed by --list, but which you can
not give to --enable/-e.
So you can enable CPU0c1 (in the bic array), but you can't enable C1 or
C1% (not in the bic array, but exists in sysfs).
This runs counter to both the documentation and user expectations, and
it's just not very user friendly.
As such, the mechanism used by --hide has been duplicated, and is now
also used by --enable, so we can handle unknown names gracefully.
Note: One impact of this is that truly unknown fields given to --enable
will no longer generate errors, they will be silently ignored, as --hide
does.
Signed-off-by: Zephaniah E. Loss-Cutler-Hull <zephaniah@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Print power values upto three decimal places in watts.
Suggested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat --Dump exits early with status 243 (-13)
get_counters() calls get_msr_sum() on zen CPUS
for MSR_PKG_ENERGY_STAT, but per_cpu_msr_sum
has not been initialized.
Signed-off-by: Dan Merillat <git@dan.eginity.com>
Signed-off-by: Len Brown <len.brown@intel.com>
This Kselftest fixes update consists of a mqueue perf test memory leak
bug fix. mq_perf_tests fail to call CPU_FREE to free memory allocated
by CPU_SET.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmJZm/0ACgkQCwJExA0N
QxwZLQ//Z3yHo1MEsxwCh7qiPOrZhomWyWprpNYn6KHrHm9jPoUSld1M++DF/1PW
jEjJISy4WUFLDemFABvRop/K7FuLJxtpxWB8iI+zbhhoPqdM3rThzd10nHfpgKke
g8x5umdBuux1pNzBcekq8BT3dBEUHU9JxGMPvTsp8v0pr5xoKAwUenhhDCmARukJ
tUH83jLY4MKNI17Z8YIAVb9MjqPseMPruZuU25n8fXDkqqSAI/99ZeeNsLnSEQRo
1IA5Np0Qd+cUNFJ54Aqjb/nsGOc7ev0X2VSbWlrO+gBuX6GTvl7mKtxil2PIFz0p
OjsiG9RgWcuLQTtIgKvtfpfUjsZjW6YuhxjPq+K/3jARoPk0tzk6Im7VfqRKApdA
lWr41OEDMotaNbCFaQQyhOXe4N81n33skCs3d8xS94DJ6pAW51zFoJGiOihdobof
CLOnpYCaA63MMHruOQb0bB4sZGFFA5tczmUe+KENH42b8NjLdUmPhyNOw6P2rkeO
e34UrdysM1It3iJS+pKq4FiRwGuRcRw6HoK9c93/HOzG/t+l8HpDkfbsljxl2JDF
2VEiieYqITpGmhqgJvhqr0hoHhXiw/uFZzKzkO8+W/453AnO9deLy6HCiK/s7idz
GbI2wnd4LHPkysn/CtEJBlFtCF/XpW7N29Nbr7igHl6qfFWSjII=
=+5jP
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"A mqueue perf test memory leak bug fix.
mq_perf_tests failed to call CPU_FREE to free memory allocated by
CPU_SET"
* tag 'linux-kselftest-fixes-5.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
testing/selftests/mqueue: Fix mq_perf_tests to free the allocated cpu set
The 'perf bench numa' testcase fails on systems with more than 1K CPUs.
Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1
Snippet of code:
<<>>
perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed.
Aborted (core dumped)
<<>>
bind_to_node() uses "sched_getaffinity" to save the original cpumask and
this call is returning EINVAL ((invalid argument).
This happens because the default mask size in glibc is 1024. To
overcome this 1024 CPUs mask size limitation of cpu_set_t, change the
mask size using the CPU_*_S macros ie, use CPU_ALLOC to allocate
cpumask, CPU_ALLOC_SIZE for size.
Apart from fixing this for "orig_mask", apply same logic to "mask" as
well which is used to setaffinity so that mask size is large enough to
represent number of possible CPU's in the system.
sched_getaffinity is used in one more place in perf numa bench. It is in
"bind_to_cpu" function. Apply the same logic there also. Though
currently no failure is reported from there, it is ideal to change
getaffinity to work with such system configurations having CPU's more
than default mask size supported by glibc.
Also fix "sched_setaffinity" to use mask size which is large enough to
represent number of possible CPU's in the system.
Fixed all places where "bind_cpumask" which is part of "struct
thread_data" is used such that bind_cpumask works in all configuration.
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220412164059.42654-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Perf numa bench test fails with error:
Testcase:
./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk
Failure snippet:
<<>>
Running 'numa/mem' benchmark:
# Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk"
perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed.
<<>>
The Testcases uses CPU's 0 and 8. In function "parse_setup_cpu_list",
There is check to see if cpu number is greater than max cpu's possible
in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus || bind_cpu_1 >=
g->p.nr_cpus) {".
But it could happen that system has say 48 CPU's, but only number of
online CPU's is 0-7. Other CPU's are offlined. Since "g->p.nr_cpus" is
48, so function will go ahead and set bit for CPU 8 also in cpumask (
td->bind_cpumask).
bind_to_cpumask function is called to set affinity using
sched_setaffinity and the cpumask. Since the CPU8 is not present, set
affinity will fail here with EINVAL.
Fix this issue by adding a check to make sure that, CPU's provided in
the input argument values are online before proceeding further and skip
the test. For this, include new helper function "is_cpu_online" in
"tools/perf/util/header.c".
Since "BIT(x)" definition will get included from header.h, remove
that from bench/numa.c
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220412164059.42654-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Per-thread mode doesn't have specific CPUs for events, add checks for
this case.
Minor fix to a pr_debug by Ian Rogers <irogers@google.com> to avoid an
out of bound array access.
Fixes: 7954f71689 ("perf record: Introduce thread affinity and mmap masks")
Reported-by: Ian Rogers <irogers@google.com>
Signed-off-by: Alexey Bayduraev <alexey.bayduraev@gmail.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220414014642.3308206-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The SPE integration in Perf has quite a few usability quirks that
can't be found by just reading the reference manual. So document this
and at the same time add a summary of the feature that is also hard to
find elsewhere.
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Signed-off-by: James Clark <james.clark@arm.com>
Co-authored-by: Al Grant <al.grant@arm.com>
Co-authored-by: Luke Dare <Luke.Dare@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220413084021.2556142-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_evsel::sample_id is an xyarray which can cause a segfault when
accessed beyond its size. e.g.
# perf record -e intel_pt// -C 1 sleep 1
Segmentation fault (core dumped)
#
That is happening because a dummy event is opened to capture text poke
events accross all CPUs, however the mmap logic is allocating according
to the number of user_requested_cpus.
In general, perf sometimes uses the evsel cpus to open events, and
sometimes the evlist user_requested_cpus. However, it is not necessary
to determine which case is which because the opened event file
descriptors are also in an xyarray, the size of whch can be used
to correctly allocate the size of the sample_id xyarray, because there
is one ID per file descriptor.
Note, in the affected code path, perf_evsel fd array is subsequently
used to get the file descriptor for the mmap, so it makes sense for the
xyarrays to be the same size there.
Fixes: d1a177595b ("libperf: Adopt perf_evlist__mmap()/munmap() from tools/perf")
Fixes: 246eba8e90 ("perf tools: Add support for PERF_RECORD_TEXT_POKE")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: stable@vger.kernel.org # 5.5+
Link: https://lore.kernel.org/r/20220413114232.26914-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
hashmap__new() returns ERR_PTR(-ENOMEM) when it fails, so we should use
IS_ERR() to check it in error handling path.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220413093302.2538128-1-lv.ruyi@zte.com.cn
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
ACPICA commit 738d7b0726e6c0458ef93c0a01c0377490888d1e
Affects all source modules and utility signons.
Link: https://github.com/acpica/acpica/commit/738d7b07
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Build of intel-speed-select will fail if you run:
$ LDFLAGS="-Wl,--as-needed" /usr/bin/make V=1
...
gcc -O2 -Wall -g -D_GNU_SOURCE -Iinclude -I/usr/include/libnl3 -Wl,--as-needed -lnl-genl-3 -lnl-3 intel-speed-select-in.o -o intel-speed-select
/usr/bin/ld: intel-speed-select-in.o: in function `handle_event':
(...)/linux/tools/power/x86/intel-speed-select/hfi-events.c:189: undefined reference to `nlmsg_hdr'
...
In this case the problem is that order when linking matters when using
the flag -Wl,--as-needed, symbols not used at that point are discarded.
So since intel-speed-select-in.o comes after, at that point the
libraries/symbols are already discarded and then missing/undefined
references are reported.
To fix this, make sure we specify LDFLAGS after the object file.
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Herton R. Krzesinski <herton@redhat.com>
Link: https://lore.kernel.org/r/20220404210525.725611-1-herton@redhat.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Add boilerplate test loop in test to run all tests
in fib_rule_tests.sh
Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix incorrect debug message:
Attempting to add event pmu 'intel_pt' with '' that may result in
non-fatal errors
which always appears with perf record -vv and intel_pt e.g.
perf record -vv -e intel_pt//u uname
The message is incorrect because there will never be non-fatal errors.
Suppress the message if the PMU is 'selectable' i.e. meant to be
selected directly as an event.
Fixes: 4ac22b484d ("perf parse-events: Make add PMU verbose output clearer")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lore.kernel.org/lkml/20220411061758.2458417-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Miscellaneous bugfixes
* A small cleanup for the new workqueue code
* Documentation syntax fix
RISC-V:
* Remove hgatp zeroing in kvm_arch_vcpu_put()
* Fix alignment of the guest_hang() in KVM selftest
* Fix PTE A and D bits in KVM selftest
* Missing #include in vcpu_fp.c
ARM:
* Some PSCI fixes after introducing PSCIv1.1 and SYSTEM_RESET2
* Fix the MMU write-lock not being taken on THP split
* Fix mixed-width VM handling
* Fix potential UAF when debugfs registration fails
* Various selftest updates for all of the above
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJVtdMUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroO33QgAiPh80xUkYfnl8FVN440S5F7UOPQ2
Cs/PbroNoP+Oz2GoG07aaqnUkFFApeBE5S+VMu1zhRNAernqpreN64/Y2iNaz0Y6
+MbvEX0FhQRW0UZJIF2m49ilgO8Gkt6aEpVRulq5G9w4NWiH1PtR25FVXfDMi8OG
xdw4x1jwXNI9lOQJ5EpUKVde3rAbxCfoC6hCTh5pCNd9oLuVeLfnC+Uv91fzXltl
EIeBlV0/mAi3RLp2E/AX38WP6ucMZqOOAy91/RTqX6oIx/7QL28ZNHXVrwQ67Hkd
pAr3MAk84tZL58lnosw53i5aXAf9CBp0KBnpk2KGutfRNJ4Vzs1e+DZAJA==
=vqAv
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86:
- Miscellaneous bugfixes
- A small cleanup for the new workqueue code
- Documentation syntax fix
RISC-V:
- Remove hgatp zeroing in kvm_arch_vcpu_put()
- Fix alignment of the guest_hang() in KVM selftest
- Fix PTE A and D bits in KVM selftest
- Missing #include in vcpu_fp.c
ARM:
- Some PSCI fixes after introducing PSCIv1.1 and SYSTEM_RESET2
- Fix the MMU write-lock not being taken on THP split
- Fix mixed-width VM handling
- Fix potential UAF when debugfs registration fails
- Various selftest updates for all of the above"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (24 commits)
KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU
KVM: SVM: Do not activate AVIC for SEV-enabled guest
Documentation: KVM: Add SPDX-License-Identifier tag
selftests: kvm: add tsc_scaling_sync to .gitignore
RISC-V: KVM: include missing hwcap.h into vcpu_fp
KVM: selftests: riscv: Fix alignment of the guest_hang() function
KVM: selftests: riscv: Set PTE A and D bits in VS-stage page table
RISC-V: KVM: Don't clear hgatp CSR in kvm_arch_vcpu_put()
selftests: KVM: Free the GIC FD when cleaning up in arch_timer
selftests: KVM: Don't leak GIC FD across dirty log test iterations
KVM: Don't create VM debugfs files outside of the VM directory
KVM: selftests: get-reg-list: Add KVM_REG_ARM_FW_REG(3)
KVM: avoid NULL pointer dereference in kvm_dirty_ring_push
KVM: arm64: selftests: Introduce vcpu_width_config
KVM: arm64: mixed-width check should be skipped for uninitialized vCPUs
KVM: arm64: vgic: Remove unnecessary type castings
KVM: arm64: Don't split hugepages outside of MMU write lock
KVM: arm64: Drop unneeded minor version check from PSCI v1.x handler
KVM: arm64: Actually prevent SMC64 SYSTEM_RESET2 from AArch32
KVM: arm64: Generally disallow SMC64 for AArch32 guests
...
The selftest "mqueue/mq_perf_tests.c" use CPU_ALLOC to allocate
CPU set. This cpu set is used further in pthread_attr_setaffinity_np
and by pthread_create in the code. But in current code, allocated
cpu set is not freed.
Fix this issue by adding CPU_FREE in the "shutdown" function which
is called in most of the error/exit path for the cleanup. There are
few error paths which exit without using shutdown. Add a common goto
error path with CPU_FREE for these cases.
Fixes: 7820b0715b ("tools/selftests: add mq_perf_tests")
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Boiler plate for testing static mdb entries. This first test verifies
adding and removing host mdb entries for all supported types: IPv4,
IPv6, and MAC multicast.
Signed-off-by: Joachim Wiberg <troglobit@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Including nolibc.h multiple times results in build errors due to multiple
definitions. Let's add a guard against multiple inclusions.
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This arch doesn't provide the old-style select() syscall, we have to
use pselect6().
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
For consecutive numbers the lscpu command collapses the output and just
shows the range with start and end. The processors are numbered that
way on POWER8.
$ sudo ppc64_cpu --smt=8
$ lscpu | grep '^NUMA node'
NUMA node(s): 2
NUMA node0 CPU(s): 0-79
NUMA node8 CPU(s): 80-159
This causes the heuristic to detect the number threads per core, looking
for the number after the first comma, to fail, and QEMU aborts because of
invalid arguments.
$ lscpu | grep '^NUMA node0' | sed -e 's/^[^,-]*(,|\-)\([0-9]*\),.*$/\1/'
NUMA node0 CPU(s): 0-79
But the lscpu command shows the number of threads per core:
$ sudo ppc64_cpu --smt=8
$ lscpu | grep 'Thread(s) per core'
Thread(s) per core: 8
$ sudo ppc64_cpu --smt=off
$ lscpu | grep 'Thread(s) per core'
Thread(s) per core: 1
This commit therefore directly uses that value and replaces use of grep
with "sed -n" and its "p" command.
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
This commit weakens the checks of the kvm.sh script's --torture parameter
and the kvm-recheck.sh script's parsing so that experimental torture tests
may be created without updating these two scripts. The changes required
are to the appropriate Makefile and Kconfig file, plus a directory
whose name begins with "X" must be added to the rcutorture/configs file.
This new directory's name can then be passed in via the kvm.sh script's
--torture parameter.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
The torture.sh script normally runs unattended, so there is not much
point in the "ssh" command asking for a password. This commit therefore
adds the "-o Batchmode=yes" argument to each "ssh" command to cause it
to fail rather than ask for a password.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
An "echo" slipped in between an "ssh" and the "ret=$?" that was intended
to collect its exit code, which prevents torture.sh from detecting
"ssh" failure. This commit therefore reassociates the two.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Currently, the rcupdate.rcu_normal and rcupdate.rcu_expedited kernel
boot parameters are not regularly tested. The potential addition of
polled expedited grace-period APIs increases the amount of code that is
affected by these kernel boot parameters. This commit therefore adds a
"--do-rt" argument to torture.sh to exercise these kernel-boot options.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Parsing of USDT arguments is architecture-specific. On aarch64 it is
relatively easy since registers used are x[0-31] and sp. Format is
slightly different compared to x86_64. Possible forms are:
- "size@[reg[,offset]]" for dereferences, e.g. "-8@[sp,76]" and "-4@[sp]";
- "size@reg" for register values, e.g. "-4@x0";
- "size@value" for raw values, e.g. "-8@1".
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649690496-1902-2-git-send-email-alan.maguire@oracle.com
'perf test''s shell runner will just run everything in the tests
directory (as long as it's not another directory or does not begin
with a dot), but sometimes you find files in there that are not shell
scripts - perf.data output for example if you do some testing and then
the next time you run perf test it tries to run these.
Check the files are executable so they are actually intended to be test
scripts and not just some "random junk" files there.
Signed-off-by: Carsten Haitzler <carsten.haitzler@arm.com>
Reviewed-by: Leo Yan <leo.yan@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: coresight@lists.linaro.org
Link: http://lore.kernel.org/lkml/20220309122859.31487-1-carsten.haitzler@foss.arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This change adds the symbol offset to the data exported for each
call-chain entry. This can not be calculated from the script and
only the ip value, and no related mmap information.
In addition, also export the source file and line information, if
available, to avoid an external lookup if this information is needed.
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/164554263724.752731.14651017093796049736.stgit@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The tsc_scaling_sync's binary should be present in the .gitignore
file for the git to ignore it.
Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220406063715.55625-3-likexu@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Check dumping of mptcp listener sockets:
1. filter by dport should not return any results
2. filter by sport should return listen sk
3. filter by saddr+sport should return listen sk
4. no filter should return listen sk
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Replace unnecessary list_for_each_entry_continue() in nf_tables,
from Jakob Koschel.
2) Add struct nf_conntrack_net_ecache to conntrack event cache and
use it, from Florian Westphal.
3) Refactor ctnetlink_dump_list(), also from Florian.
4) Bump module reference counter on cttimeout object addition/removal,
from Florian.
5) Consolidate nf_log MAC printer, from Phil Sutter.
6) Add basic logging support for unknown ethertype, from Phil Sutter.
7) Consolidate check for sysctl nf_log_all_netns toggle, also from Phil.
8) Replace hardcode value in nft_bitwise, from Jeremy Sowden.
9) Rename BASIC-like goto tags in nft_bitwise to more meaningful names,
also from Jeremy.
10) nft_fib support for reverse path filtering with policy-based routing
on iif. Extend selftests to cover for this new usecase, from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Its now possible to use fib expression in the forward chain (where both
the input and output interfaces are known).
Add a simple test case for this.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
A microcode update on some Intel processors causes all TSX transactions
to always abort by default[*]. Microcode also added functionality to
re-enable TSX for development purposes. With this microcode loaded, if
tsx=on was passed on the cmdline, and TSX development mode was already
enabled before the kernel boot, it may make the system vulnerable to TSX
Asynchronous Abort (TAA).
To be on safer side, unconditionally disable TSX development mode during
boot. If a viable use case appears, this can be revisited later.
[*]: Intel TSX Disable Update for Selected Processors, doc ID: 643557
[ bp: Drop unstable web link, massage heavily. ]
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/347bd844da3a333a9793c6687d4e4eb3b2419a3e.1646943780.git.pawan.kumar.gupta@linux.intel.com
We have switched to memcg-based memory accouting and thus the rlimit is
not needed any more. LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK was introduced in
libbpf for backward compatibility, so we can use it instead now.
libbpf_set_strict_mode always return 0, so we don't need to check whether
the return value is 0 or not.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409125958.92629-4-laoar.shao@gmail.com
We have switched to memcg-based memory accouting and thus the rlimit is
not needed any more. LIBBPF_STRICT_AUTO_RLIMIT_MEMLOCK was introduced in
libbpf for backward compatibility, so we can use it instead now. After
this change, the header tools/testing/selftests/bpf/bpf_rlimit.h can be
removed.
This patch also removes the useless header sys/resource.h from many files
in tools/testing/selftests/bpf/.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409125958.92629-3-laoar.shao@gmail.com
Background:
Libbpf automatically replaces calls to BPF bpf_probe_read_{kernel,user}
[_str]() helpers with bpf_probe_read[_str](), if libbpf detects that
kernel doesn't support new APIs. Specifically, libbpf invokes the
probe_kern_probe_read_kernel function to load a small eBPF program into
the kernel in which bpf_probe_read_kernel API is invoked and lets the
kernel checks whether the new API is valid. If the loading fails, libbpf
considers the new API invalid and replaces it with the old API.
static int probe_kern_probe_read_kernel(void)
{
struct bpf_insn insns[] = {
BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) */
BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), /* r1 += -8 */
BPF_MOV64_IMM(BPF_REG_2, 8), /* r2 = 8 */
BPF_MOV64_IMM(BPF_REG_3, 0), /* r3 = 0 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel),
BPF_EXIT_INSN(),
};
int fd, insn_cnt = ARRAY_SIZE(insns);
fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL,
"GPL", insns, insn_cnt, NULL);
return probe_fd(fd);
}
Bug:
On older kernel versions [0], the kernel checks whether the version
number provided in the bpf syscall, matches the LINUX_VERSION_CODE.
If not matched, the bpf syscall fails. eBPF However, the
probe_kern_probe_read_kernel code does not set the kernel version
number provided to the bpf syscall, which causes the loading process
alwasys fails for old versions. It means that libbpf will replace the
new API with the old one even the kernel supports the new one.
Solution:
After a discussion in [1], the solution is using BPF_PROG_TYPE_TRACEPOINT
program type instead of BPF_PROG_TYPE_KPROBE because kernel does not
enfoce version check for tracepoint programs. I test the patch in old
kernels (4.18 and 4.19) and it works well.
[0] https://elixir.bootlin.com/linux/v4.19/source/kernel/bpf/syscall.c#L1360
[1] Closes: https://github.com/libbpf/libbpf/issues/473
Signed-off-by: Runqing Yang <rainkin1993@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409144928.27499-1-rainkin1993@gmail.com
Improve subtest selection logic when using -t/-a/-d parameters.
In particular, more than one subtest can be specified or a
combination of tests / subtests.
-a send_signal -d send_signal/send_signal_nmi* - runs send_signal
test without nmi tests
-a send_signal/send_signal_nmi*,find_vma - runs two send_signal
subtests and find_vma test
-a 'send_signal*' -a find_vma -d send_signal/send_signal_nmi* -
runs 2 send_signal test and find_vma test. Disables two send_signal
nmi subtests
-t send_signal -t find_vma - runs two *send_signal* tests and one
*find_vma* test
This will allow us to have granular control over which subtests
to disable in the CI system instead of disabling whole tests.
Also, add new selftest to avoid possible regression when
changing prog_test test name selection logic.
Signed-off-by: Mykola Lysenko <mykolal@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220409001750.529930-1-mykolal@fb.com
- Use local labels in the exception table macros to avoid symbol
conflicts with clang LTO builds
- A couple of fixes to objtool checking of the relatively newly added
SLS and IBT code
- Rename a local var in the WARN* macro machinery to prevent shadowing
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmJSwSkACgkQEsHwGGHe
VUp6QQ//TGhL2xxLoN+7pYjIBDEDHJ3Oi0m6fOweqyQAZTYcm/rAPqd7hvoWVSoO
YsLdWi9jeMwkzG0ItSm/qPVm/UvrViXwuQMdz4nDWqg2IPFIbhgNA3CKCIyPTio2
WHp2NXvYyDnwPMr6xTTRndMDoxiwxMBnXf91pNwoU3toxw0GuUuXan0Y+GKnvx1A
sqhbpWO27bAmhKb26wPw5soJVxBbSqx+1TbFVG0Sz/uwYQowMa+nfNg1DXF0sXyJ
E/ssqBB6wjl7ANVbQsxBQHRzr/EksLVPwHHrlT8ga/5loin+VJ6mTBCPLgG7SMBE
+R1fm79Bp/9KU194fcqhJ3pvnyJPi8hfizzCqNKnK871V8LRzC+jW0l3EdvASEXC
sDj0XWsSFoWft9eAtMV11d641uVC4rLB90GyyzmWWrEw9BbxmasBgED6QBx9d+V6
o1L4y58Tsz88HKzwd0PtBkeGDkvkA7xOx8ViG24IeLA0tcbixnfnATQdelQeWKqO
4m3o1JU8ogJp9JCEBY7ZeXyStFjZMedM4U/V0akF6AKnpDuVfR3T5C68cYhoLKBu
XU6Swf5sFHImNWp0+54HPnXhHj/uhuwj9YWCkxx/eXViwvVlxSdTdIQWa380EddN
0KhOFLwLOdhha2+81FJc6vmkDHwiu6hlR38yqdGvdxZf/KPKjM0=
=kMtP
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Fix the MSI message data struct definition
- Use local labels in the exception table macros to avoid symbol
conflicts with clang LTO builds
- A couple of fixes to objtool checking of the relatively newly added
SLS and IBT code
- Rename a local var in the WARN* macro machinery to prevent shadowing
* tag 'x86_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/msi: Fix msi message data shadow struct
x86/extable: Prefer local labels in .set directives
x86,bpf: Avoid IBT objtool warning
objtool: Fix SLS validation for kcov tail-call replacement
objtool: Fix IBT tail-call detection
x86/bug: Prevent shadowing in __WARN_FLAGS
x86/mm/tlb: Revert retpoline avoidance approach
- Fix the clang command line option probing and remove some options to filter
out, fixing the build with the latest clang versions.
- Fix 'perf bench' futex and epoll benchmarks to deal with machines with more
than 1K CPUs.
- Fix 'perf test tsc' error message when not supported.
- Remap perf ring buffer if there is no space for event, fixing perf usage
in 32-bit ChromeOS.
- Drop objdump stderr to avoid getting stuck waiting for stdout output in
'perf annotate'.
- Fix up garbled output by now showing unwind error messages when augmenting
frame in best effort mode.
- Fix perf's libperf_print callback, use the va_args eprintf() variant.
- Sync vhost and arm64 cputype headers with the kernel sources.
- Fix 'perf report --mem-mode' with ARM SPE.
- Add missing external commands ('perf iiostat', etc) to 'perf --list-cmds'.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYlIM+gAKCRCyPKLppCJ+
J5l6AQCCY4co/6FBh8JMmMX4RVHAUriX0YfKTJfpeLU3nsiXPAD/TVqf1LOyYaPv
/ZqJ8DwqvKr9nkUsf5kAOfPrDB/j/QQ=
=0UV/
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-fixes-for-v5.18-2022-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix the clang command line option probing and remove some options to
filter out, fixing the build with the latest clang versions
- Fix 'perf bench' futex and epoll benchmarks to deal with machines
with more than 1K CPUs
- Fix 'perf test tsc' error message when not supported
- Remap perf ring buffer if there is no space for event, fixing perf
usage in 32-bit ChromeOS
- Drop objdump stderr to avoid getting stuck waiting for stdout output
in 'perf annotate'
- Fix up garbled output by now showing unwind error messages when
augmenting frame in best effort mode
- Fix perf's libperf_print callback, use the va_args eprintf() variant
- Sync vhost and arm64 cputype headers with the kernel sources
- Fix 'perf report --mem-mode' with ARM SPE
- Add missing external commands ('iiostat', etc) to 'perf --list-cmds'
* tag 'perf-tools-fixes-for-v5.18-2022-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf annotate: Drop objdump stderr to avoid getting stuck waiting for stdout output
perf tools: Add external commands to list-cmds
perf docs: Add perf-iostat link to manpages
perf session: Remap buf if there is no space for event
perf bench: Fix epoll bench to correct usage of affinity for machines with #CPUs > 1K
perf bench: Fix futex bench to correct usage of affinity for machines with #CPUs > 1K
perf tools: Fix perf's libperf_print callback
perf: arm-spe: Fix perf report --mem-mode
perf unwind: Don't show unwind error messages when augmenting frame pointer stack
tools headers arm64: Sync arm64's cputype.h with the kernel sources
perf test tsc: Fix error message when not supported
perf build: Don't use -ffat-lto-objects in the python feature test when building with clang-13
perf python: Fix probing for some clang command line options
tools build: Filter out options and warnings not supported by clang
tools build: Use $(shell ) instead of `` to get embedded libperl's ccopts
tools include UAPI: Sync linux/vhost.h with the kernel sources
- Fix a compile error in the nvdimm unit tests
- Fix a shadowed variable warning in the CXL PCI driver
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYlIY6QAKCRDfioYZHlFs
Z0aeAQDDYcicYRhLZ3Ljbg6stitBIumpdVcKDHm4WkC9gbmB4QEArnXLpcHPWyAa
zmgc1Yrp9gOnpNSRMog9Wc8NaR45KA8=
=Ov5t
-----END PGP SIGNATURE-----
Merge tag 'cxl+nvdimm-for-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull cxl and nvdimm fixes from Dan Williams:
- Fix a compile error in the nvdimm unit tests
- Fix a shadowed variable warning in the CXL PCI driver
* tag 'cxl+nvdimm-for-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
cxl/pci: Drop shadowed variable
tools/testing/nvdimm: Fix security_init() symbol collision
If objdump writes to stderr it can block waiting for it to be read. As
perf doesn't read stderr then progress stops with perf waiting for
stdout output.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Lexi Shao <shaolexi@huawei.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: William Cohen <wcohen@redhat.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lore.kernel.org/lkml/20220407230503.1265036-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The `perf --list-cmds` output prints only internal commands, although
there is no reason for that from users' perspective.
Adding the external commands to commands array with NULL function
pointer allows printing all perf commands while not changing the logic
of command handler selection.
Signed-off-by: Michael Petlan <mpetlan@redhat.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220404221541.30312-2-mpetlan@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If a perf event doesn't fit into remaining buffer space return NULL to
remap buf and fetch the event again.
Keep the logic to error out on inadequate input from fuzzing.
This fixes perf failing on ChromeOS (with 32b userspace):
$ perf report -v -i perf.data
...
prefetch_event: head=0x1fffff8 event->header_size=0x30, mmap_size=0x2000000: fuzzed or compressed perf.data?
Error:
failed to process sample
Fixes: 57fc032ad6 ("perf session: Avoid infinite loop when seeing invalid header.size")
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Denis Nikitin <denik@chromium.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220330031130.2152327-1-denik@chromium.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf bench epoll' testcase fails on systems with more than 1K CPUs.
Testcase: perf bench epoll all
Result snippet:
<<>>
Run summary [PID 106497]: 1399 threads monitoring on 64 file-descriptors for 8 secs.
perf: pthread_create: No such file or directory
<<>>
In epoll benchmarks (ctl, wait) pthread_create is invoked in do_threads
from respective bench_epoll_* function. Though the logs shows direct
failure from pthread_create, the actual failure is from
"sched_setaffinity" returning EINVAL (invalid argument).
This happens because the default mask size in glibc is 1024. To overcome
this 1024 CPUs mask size limitation of cpu_set_t, change the mask size
using the CPU_*_S macros.
Patch addresses this by fixing all the epoll benchmarks to use CPU_ALLOC
to allocate cpumask, CPU_ALLOC_SIZE for size, and CPU_SET_S to set the
mask.
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220406175113.87881-3-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'perf bench futex' testcase fails on systems with more than 1K CPUs.
Testcase: perf bench futex all
Failure snippet:
<<>>Running futex/hash benchmark...
perf: pthread_create: No such file or directory
<<>>
All the futex benchmarks (ie hash, lock-api, requeue, wake,
wake-parallel), pthread_create is invoked in respective bench_futex_*
function. Though the logs shows direct failure from pthread_create,
strace logs showed that actual failure is from "sched_setaffinity"
returning EINVAL (invalid argument).
This happens because the default mask size in glibc is 1024. To overcome
this 1024 CPUs mask size limitation of cpu_set_t, change the mask size
using the CPU_*_S macros.
Patch addresses this by fixing all the futex benchmarks to use CPU_ALLOC
to allocate cpumask, CPU_ALLOC_SIZE for size, and CPU_SET_S to set the
mask.
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nageswara R Sastry <rnsastry@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Link: https://lore.kernel.org/r/20220406175113.87881-2-atrajeev@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
eprintf() does not expect va_list as the type of the 4th parameter.
Use veprintf() because it does.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 428dab813a ("libperf: Merge libperf_set_print() into libperf_init()")
Cc: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20220408132625.2451452-1-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit bb30acae4c ("perf report: Bail out --mem-mode if mem
info is not available") "perf mem report" and "perf report --mem-mode"
don't allow opening the file unless one of the events has
PERF_SAMPLE_DATA_SRC set.
SPE doesn't have this set even though synthetic memory data is generated
after it is decoded. Fix this issue by setting DATA_SRC on SPE events.
This has no effect on the data collected because the SPE driver doesn't
do anything with that flag and doesn't generate samples.
Fixes: bb30acae4c ("perf report: Bail out --mem-mode if mem info is not available")
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20220408144056.1955535-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit Fixes: b9f6fbb3b2 ("perf arm64: Inject missing frames when
using 'perf record --call-graph=fp'") intended to add a 'best effort'
DWARF unwind that improved the frame pointer stack in most scenarios.
It's expected that the unwind will fail sometimes, but this shouldn't be
reported as an error. It only works when the return address can be
determined from the contents of the link register alone.
Fix the error shown when the unwinder requires extra registers by adding
a new flag that suppresses error messages. This flag is not set in the
normal --call-graph=dwarf unwind mode so that behavior is not changed.
Fixes: b9f6fbb3b2 ("perf arm64: Inject missing frames when using 'perf record --call-graph=fp'")
Reported-by: John Garry <john.garry@huawei.com>
Signed-off-by: James Clark <james.clark@arm.com>
Tested-by: John Garry <john.garry@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Truong <alexandre.truong@arm.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220406145651.1392529-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
83bea32ac7 ("arm64: Add part number for Arm Cortex-A78AE")
That addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/asm/cputype.h' differs from latest version at 'arch/arm64/include/asm/cputype.h'
diff -u tools/arch/arm64/include/asm/cputype.h arch/arm64/include/asm/cputype.h
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Chanho Park <chanho61.park@samsung.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By default `perf test tsc` does not return the error message when the
child process detected kernel does not support it. Instead, the child
process prints an error message to stderr, unfortunately stderr is
redirected to /dev/null when verbose <= 0.
This patch does:
- return TEST_SKIP to the parent process instead of TEST_OK when
perf_read_tsc_conversion() is not supported.
- Add a new subtest of testing if TSC is supported on current
architecture by moving exist code to a separate function.
It avoids two places in test__perf_time_to_tsc() that return
TEST_SKIP by doing this.
- Extend the test suite definition to contain above two subtests.
Current test_suite and test_case structs do not support printing skip
reason when the number of subtest less than 1. To print skip reason, it
is necessary to extend current test suite definition.
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Chengdong Li <chengdongli@tencent.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: likexu@tencent.com
Link: https://lore.kernel.org/r/20220408084748.43707-1-chengdongli@tencent.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using -ffat-lto-objects in the python feature test when building with
clang-13 results in:
clang-13: error: optimization flag '-ffat-lto-objects' is not supported [-Werror,-Wignored-optimization-argument]
error: command '/usr/sbin/clang' failed with exit code 1
cp: cannot stat '/tmp/build/perf/python_ext_build/lib/perf*.so': No such file or directory
make[2]: *** [Makefile.perf:639: /tmp/build/perf/python/perf.so] Error 1
Noticed when building on a docker.io/library/archlinux:base container.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The clang compiler complains about some options even without a source
file being available, while others require one, so use the simple
tools/build/feature/test-hello.c file.
Then check for the "is not supported" string in its output, in addition
to the "unknown argument" already being looked for.
This was noticed when building with clang-13 where -ffat-lto-objects
isn't supported and since we were looking just for "unknown argument"
and not providing a source code to clang, was mistakenly assumed as
being available and not being filtered to set of command line options
provided to clang, leading to a build failure.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These make the feature check fail when using clang, so remove them just
like is done in tools/perf/Makefile.config to build perf itself.
Adding -Wno-compound-token-split-by-macro to tools/perf/Makefile.config
when building with clang is also necessary to avoid these warnings
turned into errors (-Werror):
CC /tmp/build/perf/util/scripting-engines/trace-event-perl.o
In file included from util/scripting-engines/trace-event-perl.c:35:
In file included from /usr/lib64/perl5/CORE/perl.h:4085:
In file included from /usr/lib64/perl5/CORE/hv.h:659:
In file included from /usr/lib64/perl5/CORE/hv_func.h:34:
In file included from /usr/lib64/perl5/CORE/sbox32_hash.h:4:
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '(' and '{' tokens introducing statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:80:38: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
#define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \
^~~~~~~~~~
/usr/lib64/perl5/CORE/perl.h:737:29: note: expanded from macro 'STMT_START'
# define STMT_START (void)( /* gcc supports "({ STATEMENTS; })" */
^
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: '{' token is here
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:80:49: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
#define ZAPHOD32_SCRAMBLE32(v,prime) STMT_START { \
^
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: error: '}' and ')' tokens terminating statement expression appear in different macro expansion contexts [-Werror,-Wcompound-token-split-by-macro]
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:87:41: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
v ^= (v>>23); \
^
/usr/lib64/perl5/CORE/zaphod32_hash.h:150:5: note: ')' token is here
ZAPHOD32_SCRAMBLE32(state[0],0x9fade23b);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib64/perl5/CORE/zaphod32_hash.h:88:3: note: expanded from macro 'ZAPHOD32_SCRAMBLE32'
} STMT_END
^~~~~~~~
/usr/lib64/perl5/CORE/perl.h:738:21: note: expanded from macro 'STMT_END'
# define STMT_END )
^
Please refer to the discussion on the Link: tag below, where Nathan
clarifies the situation:
<quote>
acme> And then get to the problems at the end of this message, which seem
acme> similar to the problem described here:
acme>
acme> From Nathan Chancellor <>
acme> Subject [PATCH] mwifiex: Remove unnecessary braces from HostCmd_SET_SEQ_NO_BSS_INFO
acme>
acme> https://lkml.org/lkml/2020/9/1/135
acme>
acme> So perhaps in this case its better to disable that
acme> -Werror,-Wcompound-token-split-by-macro when building with clang?
Yes, I think that is probably the best solution. As far as I can tell,
at least in this file and context, the warning appears harmless, as the
"create a GNU C statement expression from two different macros" is very
much intentional, based on the presence of PERL_USE_GCC_BRACE_GROUPS.
The warning is fixed in upstream Perl by just avoiding creating GNU C
statement expressions using STMT_START and STMT_END:
https://github.com/Perl/perl5/issues/18780https://github.com/Perl/perl5/pull/18984
If I am reading the source code correctly, an alternative to disabling
the warning would be specifying -DPERL_GCC_BRACE_GROUPS_FORBIDDEN but it
seems like that might end up impacting more than just this site,
according to the issue discussion above.
</quote>
Based-on-a-patch-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Debian/Selfmade LLVM-14 (x86-64)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lore.kernel.org/lkml/YkxWcYzph5pC1EK8@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Just like its done for ldopts and for both in tools/perf/Makefile.config.
Using `` to initialize PERL_EMBED_CCOPTS somehow precludes using:
$(filter-out SOMETHING_TO_FILTER,$(PERL_EMBED_CCOPTS))
And we need to do it to allow for building with versions of clang where
some gcc options selected by distros are not available.
Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # Debian/Selfmade LLVM-14 (x86-64)
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Fangrui Song <maskray@google.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Keeping <john@metanate.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lore.kernel.org/lkml/YktYX2OnLtyobRYD@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The guest_hang() function is used as the default exception handler
for various KVM selftests applications by setting it's address in
the vstvec CSR. The vstvec CSR requires exception handler base address
to be at least 4-byte aligned so this patch fixes alignment of the
guest_hang() function.
Fixes: 3e06cdf105 ("KVM: selftests: Add initial support for RISC-V
64-bit")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Tested-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Supporting hardware updates of PTE A and D bits is optional for any
RISC-V implementation so current software strategy is to always set
these bits in both G-stage (hypervisor) and VS-stage (guest kernel).
If PTE A and D bits are not set by software (hypervisor or guest)
then RISC-V implementations not supporting hardware updates of these
bits will cause traps even for perfectly valid PTEs.
Based on above explanation, the VS-stage page table created by various
KVM selftest applications is not correct because PTE A and D bits are
not set. This patch fixes VS-stage page table programming of PTE A and
D bits for KVM selftests.
Fixes: 3e06cdf105 ("KVM: selftests: Add initial support for RISC-V
64-bit")
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Tested-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Daniel Borkmann says:
====================
pull-request: bpf-next 2022-04-09
We've added 63 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 4852 insertions(+), 619 deletions(-).
The main changes are:
1) Add libbpf support for USDT (User Statically-Defined Tracing) probes.
USDTs are an abstraction built on top of uprobes, critical for tracing
and BPF, and widely used in production applications, from Andrii Nakryiko.
2) While Andrii was adding support for x86{-64}-specific logic of parsing
USDT argument specification, Ilya followed-up with USDT support for s390
architecture, from Ilya Leoshkevich.
3) Support name-based attaching for uprobe BPF programs in libbpf. The format
supported is `u[ret]probe/binary_path:[raw_offset|function[+offset]]`, e.g.
attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc")
now, from Alan Maguire.
4) Various load/store optimizations for the arm64 JIT to shrink the image
size by using arm64 str/ldr immediate instructions. Also enable pointer
authentication to verify return address for JITed code, from Xu Kuohai.
5) BPF verifier fixes for write access checks to helper functions, e.g.
rd-only memory from bpf_*_cpu_ptr() must not be passed to helpers that
write into passed buffers, from Kumar Kartikeya Dwivedi.
6) Fix overly excessive stack map allocation for its base map structure and
buckets which slipped-in from cleanups during the rlimit accounting removal
back then, from Yuntao Wang.
7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter
connection tracking tuple direction, from Lorenzo Bianconi.
8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde.
9) Minor cleanups all over the place from various others.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits)
bpf: Fix excessive memory allocation in stack_map_alloc()
selftests/bpf: Fix return value checks in perf_event_stackmap test
selftests/bpf: Add CO-RE relos into linked_funcs selftests
libbpf: Use weak hidden modifier for USDT BPF-side API functions
libbpf: Don't error out on CO-RE relos for overriden weak subprogs
samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread
libbpf: Allow WEAK and GLOBAL bindings during BTF fixup
libbpf: Use strlcpy() in path resolution fallback logic
libbpf: Add s390-specific USDT arg spec parsing logic
libbpf: Make BPF-side of USDT support work on big-endian machines
libbpf: Minor style improvements in USDT code
libbpf: Fix use #ifdef instead of #if to avoid compiler warning
libbpf: Potential NULL dereference in usdt_manager_attach_usdt()
selftests/bpf: Uprobe tests should verify param/return values
libbpf: Improve string parsing for uprobe auto-attach
libbpf: Improve library identification for uprobe binary path resolution
selftests/bpf: Test for writes to map key from BPF helpers
selftests/bpf: Test passing rdonly mem to global func
bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access
bpf: Check PTR_TO_MEM | MEM_RDONLY in check_helper_mem_access
...
====================
Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The bpf_get_stackid() function may also return 0 on success as per UAPI BPF
helper documentation. Therefore, correct checks from 'val > 0' to 'val >= 0'
to ensure that they cover all possible success return values.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408041452.933944-1-ytcoode@gmail.com
Add CO-RE relocations into __weak subprogs for multi-file linked_funcs
selftest to make sure libbpf handles such combination well.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-4-andrii@kernel.org
Use __weak __hidden for bpf_usdt_xxx() APIs instead of much more
confusing `static inline __noinline`. This was previously impossible due
to libbpf erroring out on CO-RE relocations pointing to eliminated weak
subprogs. Now that previous patch fixed this issue, switch back to
__weak __hidden as it's a more direct way of specifying the desired
behavior.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-3-andrii@kernel.org
During BPF static linking, all the ELF relocations and .BTF.ext
information (including CO-RE relocations) are preserved for __weak
subprograms that were logically overriden by either previous weak
subprogram instance or by corresponding "strong" (non-weak) subprogram.
This is just how native user-space linkers work, nothing new.
But libbpf is over-zealous when processing CO-RE relocation to error out
when CO-RE relocation belonging to such eliminated weak subprogram is
encountered. Instead of erroring out on this expected situation, log
debug-level message and skip the relocation.
Fixes: db2b8b0642 ("libbpf: Support CO-RE relocations for multi-prog sections")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220408181425.2287230-2-andrii@kernel.org
Starting with the new perf-event support in the nvdimm core, the
nfit_test mock module stops compiling. Rename its security_init() to
nfit_security_init().
tools/testing/nvdimm/test/nfit.c:1845:13: error: conflicting types for ‘security_init’; have ‘void(struct nfit_test *)’
1845 | static void security_init(struct nfit_test *t)
| ^~~~~~~~~~~~~
In file included from ./include/linux/perf_event.h:61,
from ./include/linux/nd.h:11,
from ./drivers/nvdimm/nd-core.h:11,
from tools/testing/nvdimm/test/nfit.c:19:
Fixes: 9a61d0838c ("drivers/nvdimm: Add nvdimm pmu structure")
Cc: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>
Link: https://lore.kernel.org/r/164904238610.1330275.1889212115373993727.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
During BTF fix up for global variables, global variable can be global
weak and will have STB_WEAK binding in ELF. Support such global
variables in addition to non-weak ones.
This is not the problem when using BPF static linking, as BPF static
linker "fixes up" BTF during generation so that libbpf doesn't have to
do it anymore during bpf_object__open(), which led to this not being
noticed for a while, along with a pretty rare (currently) use of __weak
variables and maps.
Reported-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220407230446.3980075-2-andrii@kernel.org
Coverity static analyzer complains that strcpy() can cause buffer
overflow. Use libbpf_strlcpy() instead to be 100% sure this doesn't
happen.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220407230446.3980075-1-andrii@kernel.org
The logic is superficially similar to that of x86, but the small
differences (no need for register table and dynamic allocation of
register names, no $ sign before constants) make maintaining a common
implementation too burdensome. Therefore simply add a s390x-specific
version of parse_usdt_arg().
Note that while bcc supports index registers, this patch does not. This
should not be a problem in most cases, since s390 uses a default value
"nor" for STAP_SDT_ARG_CONSTRAINT.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-4-iii@linux.ibm.com
BPF_USDT_ARG_REG_DEREF handling always reads 8 bytes, regardless of
the actual argument size. On little-endian the relevant argument bits
end up in the lower bits of val, and later on the code that handles
all the argument types expects them to be there.
On big-endian they end up in the upper bits of val, breaking that
expectation. Fix by right-shifting val on big-endian.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-3-iii@linux.ibm.com
Fix several typos and references to non-existing headers.
Also use __BYTE_ORDER__ instead of __BYTE_ORDER for consistency with
the rest of the bpf code - see commit 45f2bebc80 ("libbpf: Fix
endianness detection in BPF_CORE_READ_BITFIELD_PROBED()") for
rationale).
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220407214411.257260-2-iii@linux.ibm.com
As reported by Naresh:
perf build errors on i386 [1] on Linux next-20220407 [2]
usdt.c:1181:5: error: "__x86_64__" is not defined, evaluates to 0
[-Werror=undef]
1181 | #if __x86_64__
| ^~~~~~~~~~
usdt.c:1196:5: error: "__x86_64__" is not defined, evaluates to 0
[-Werror=undef]
1196 | #if __x86_64__
| ^~~~~~~~~~
cc1: all warnings being treated as errors
Use #ifdef instead of #if to avoid this.
Fixes: 4c59e584d1 ("libbpf: Add x86-specific USDT arg spec parsing logic")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220407203842.3019904-1-andrii@kernel.org
uprobe/uretprobe tests don't do any validation of arguments/return values,
and without this we can't be sure we are attached to the right function,
or that we are indeed attached to a uprobe or uretprobe. To fix this
record argument and return value for auto-attached functions and ensure
these match expectations. Also need to filter by pid to ensure we do
not pick up stray malloc()s since auto-attach traces libc system-wide.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-4-git-send-email-alan.maguire@oracle.com
For uprobe auto-attach, the parsing can be simplified for the SEC()
name to a single sscanf(); the return value of the sscanf can then
be used to distinguish between sections that simply specify
"u[ret]probe" (and thus cannot auto-attach), those that specify
"u[ret]probe/binary_path:function+offset" etc.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-3-git-send-email-alan.maguire@oracle.com
In the process of doing path resolution for uprobe attach, libraries are
identified by matching a ".so" substring in the binary_path.
This matches a lot of patterns that do not conform to library.so[.version]
format, so instead match a ".so" _suffix_, and if that fails match a
".so." substring for the versioned library case.
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1649245431-29956-2-git-send-email-alan.maguire@oracle.com
In order to correctly destroy a VM, all references to the VM must be
freed. The arch_timer selftest creates a VGIC for the guest, which
itself holds a reference to the VM.
Close the GIC FD when cleaning up a VM.
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220406235615.1447180-4-oupton@google.com
dirty_log_perf_test instantiates a VGICv3 for the guest (if supported by
hardware) to reduce the overhead of guest exits. However, the test does
not actually close the GIC fd when cleaning up the VM between test
iterations, meaning that the VM is never actually destroyed in the
kernel.
While this is generally a bad idea, the bug was detected from the kernel
spewing about duplicate debugfs entries as subsequent VMs happen to
reuse the same FD even though the debugfs directory is still present.
Abstract away the notion of setup/cleanup of the GIC FD from the test
by creating arch-specific helpers for test setup/cleanup. Close the GIC
FD on VM cleanup and do nothing for the other architectures.
Fixes: c340f7899a ("KVM: selftests: Add vgic initialization for dirty log perf test for ARM")
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220406235615.1447180-3-oupton@google.com
When testing a kernel with commit a5905d6af4 ("KVM: arm64:
Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated")
get-reg-list output
vregs: Number blessed registers: 234
vregs: Number registers: 238
vregs: There are 1 new registers.
Consider adding them to the blessed reg list with the following lines:
KVM_REG_ARM_FW_REG(3),
vregs: PASS
...
That output inspired two changes: 1) add the new register to the
blessed list and 2) explain why "Number registers" is actually four
larger than "Number blessed registers" (on the system used for
testing), even though only one register is being stated as new.
The reason is that some registers are host dependent and they get
filtered out when comparing with the blessed list. The system
used for the test apparently had three filtered registers.
Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220316125129.392128-1-drjones@redhat.com
Alexei Starovoitov says:
====================
pull-request: bpf 2022-04-06
We've added 8 non-merge commits during the last 8 day(s) which contain
a total of 9 files changed, 139 insertions(+), 36 deletions(-).
The main changes are:
1) rethook related fixes, from Jiri and Masami.
2) Fix the case when tracing bpf prog is attached to struct_ops, from Martin.
3) Support dual-stack sockets in bpf_tcp_check_syncookie, from Maxim.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Adjust bpf_tcp_check_syncookie selftest to test dual-stack sockets
bpf: Support dual-stack sockets in bpf_tcp_check_syncookie
bpf: selftests: Test fentry tracing a struct_ops program
bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT
rethook: Fix to use WRITE_ONCE() for rethook:: Handler
selftests/bpf: Fix warning comparing pointer to 0
bpf: Fix sparse warnings in kprobe_multi_resolve_syms
bpftool: Explicit errno handling in skeletons
====================
Link: https://lore.kernel.org/r/20220407031245.73026-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When invoking bpf_for_each_map_elem callback, we are passed a
PTR_TO_MAP_KEY, previously writes to this through helper may be allowed,
but the fix in previous patches is meant to prevent that case. The test
case tries to pass it as writable memory to helper, and fails test if it
succeeds to pass the verifier.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add two test cases, one pass read only map value pointer to global
func, which should be rejected. The same code checks it for kfunc, so
that is covered as well. Second one tries to use the missing check for
PTR_TO_MEM's MEM_RDONLY flag and tries to write to a read only memory
pointer. Without prior patches, both of these tests fail.
Reviewed-by: Hao Luo <haoluo@google.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220319080827.73251-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_map_value_size() uses num_possible_cpus() to determine map size, but
some of the tests only allocate enough memory for online cpus. This
results in out-of-bound writes in userspace during bpf(BPF_MAP_LOOKUP_ELEM)
syscalls in cases when number of online cpus is lower than the number of
possible cpus. Fix by switching from get_nprocs_conf() to
bpf_num_possible_cpus() when determining the number of processors in
these tests (test_progs/netcnt and test_cgroup_storage).
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406085408.339336-1-asavkov@redhat.com
The function does not check that parsing_end is false after parsing
argument. Thus, if the final part of the argument is something like '4-',
which is invalid, parse_num_list() will discard it instead of returning
-EINVAL.
Before:
$ ./test_progs -n 2,4-
#2 atomic_bounds:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED
After:
$ ./test_progs -n 2,4-
Failed to parse test numbers.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220406003622.73539-1-ytcoode@gmail.com
The previous commit fixed support for dual-stack sockets in
bpf_tcp_check_syncookie. This commit adjusts the selftest to verify the
fixed functionality.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Arthur Fabre <afabre@cloudflare.com>
Link: https://lore.kernel.org/bpf/20220406124113.2795730-2-maximmi@nvidia.com
Introduce a test for aarch64 that ensures non-mixed-width vCPUs
(all 64bit vCPUs or all 32bit vcPUs) can be configured, and
mixed-width vCPUs cannot be configured.
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220329031924.619453-3-reijiw@google.com
Extend urandom_read helper binary to include USDTs of 4 combinations:
semaphore/semaphoreless (refcounted and non-refcounted) and based in
executable or shared library. We also extend urandom_read with ability
to report it's own PID to parent process and wait for parent process to
ready itself up for tracing urandom_read. We utilize popen() and
underlying pipe properties for proper signaling.
Once urandom_read is ready, we add few tests to validate that libbpf's
USDT attachment handles all the above combinations of semaphore (or lack
of it) and static or shared library USDTs. Also, we validate that libbpf
handles shared libraries both with PID filter and without one (i.e., -1
for PID argument).
Having the shared library case tested with and without PID is important
because internal logic differs on kernels that don't support BPF
cookies. On such older kernels, attaching to USDTs in shared libraries
without specifying concrete PID doesn't work in principle, because it's
impossible to determine shared library's load address to derive absolute
IPs for uprobe attachments. Without absolute IPs, it's impossible to
perform correct look up of USDT spec based on uprobe's absolute IP (the
only kind available from BPF at runtime). This is not the problem on
newer kernels with BPF cookie as we don't need IP-to-ID lookup because
BPF cookie value *is* spec ID.
So having those two situations as separate subtests is good because
libbpf CI is able to test latest selftests against old kernels (e.g.,
4.9 and 5.5), so we'll be able to disable PID-less shared lib attachment
for old kernels, but will still leave PID-specific one enabled to validate
this legacy logic is working correctly.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-8-andrii@kernel.org
Add semaphore-based USDT to test_progs itself and write basic tests to
valicate both auto-attachment and manual attachment logic, as well as
BPF-side functionality.
Also add subtests to validate that libbpf properly deduplicates USDT
specs and handles spec overflow situations correctly, as well as proper
"rollback" of partially-attached multi-spec USDT.
BPF-side of selftest intentionally consists of two files to validate
that usdt.bpf.h header can be included from multiple source code files
that are subsequently linked into final BPF object file without causing
any symbol duplication or other issues. We are validating that __weak
maps and bpf_usdt_xxx() API functions defined in usdt.bpf.h do work as
intended.
USDT selftests utilize sys/sdt.h header that on Ubuntu systems comes
from systemtap-sdt-devel package. But to simplify everyone's life,
including CI but especially casual contributors to bpf/bpf-next that
are trying to build selftests, I've checked in sys/sdt.h header from [0]
directly. This way it will work on all architectures and distros without
having to figure it out for every relevant combination and adding any
extra implicit package dependencies.
[0] https://sourceware.org/git?p=systemtap.git;a=blob_plain;f=includes/sys/sdt.h;h=ca0162b4dc57520b96638c8ae79ad547eb1dd3a1;hb=HEAD
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-7-andrii@kernel.org
Add x86/x86_64-specific USDT argument specification parsing. Each
architecture will require their own logic, as all this is arch-specific
assembly-based notation. Architectures that libbpf doesn't support for
USDTs will pr_warn() with specific error and return -ENOTSUP.
We use sscanf() as a very powerful and easy to use string parser. Those
spaces in sscanf's format string mean "skip any whitespaces", which is
pretty nifty (and somewhat little known) feature.
All this was tested on little-endian architecture, so bit shifts are
probably off on big-endian, which our CI will hopefully prove.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-6-andrii@kernel.org
Last part of architecture-agnostic user-space USDT handling logic is to
set up BPF spec and, optionally, IP-to-ID maps from user-space.
usdt_manager performs a compact spec ID allocation to utilize
fixed-sized BPF maps as efficiently as possible. We also use hashmap to
deduplicate USDT arg spec strings and map identical strings to single
USDT spec, minimizing the necessary BPF map size. usdt_manager supports
arbitrary sequences of attachment and detachment, both of the same USDT
and multiple different USDTs and internally maintains a free list of
unused spec IDs. bpf_link_usdt's logic is extended with proper setup and
teardown of this spec ID free list and supporting BPF maps.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-5-andrii@kernel.org
Implement architecture-agnostic parts of USDT parsing logic. The code is
the documentation in this case, it's futile to try to succinctly
describe how USDT parsing is done in any sort of concreteness. But
still, USDTs are recorded in special ELF notes section (.note.stapsdt),
where each USDT call site is described separately. Along with USDT
provider and USDT name, each such note contains USDT argument
specification, which uses assembly-like syntax to describe how to fetch
value of USDT argument. USDT arg spec could be just a constant, or
a register, or a register dereference (most common cases in x86_64), but
it technically can be much more complicated cases, like offset relative
to global symbol and stuff like that. One of the later patches will
implement most common subset of this for x86 and x86-64 architectures,
which seems to handle a lot of real-world production application.
USDT arg spec contains a compact encoding allowing usdt.bpf.h from
previous patch to handle the above 3 cases. Instead of recording which
register might be needed, we encode register's offset within struct
pt_regs to simplify BPF-side implementation. USDT argument can be of
different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle
this, libbpf pre-calculates necessary bit shifts to do proper casting
and sign-extension in a short sequences of left and right shifts.
The rest is in the code with sometimes extensive comments and references
to external "documentation" for USDTs.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-4-andrii@kernel.org
Wire up libbpf USDT support APIs without yet implementing all the
nitty-gritty details of USDT discovery, spec parsing, and BPF map
initialization.
User-visible user-space API is simple and is conceptually very similar
to uprobe API.
bpf_program__attach_usdt() API allows to programmatically attach given
BPF program to a USDT, specified through binary path (executable or
shared lib), USDT provider and name. Also, just like in uprobe case, PID
filter is specified (0 - self, -1 - any process, or specific PID).
Optionally, USDT cookie value can be specified. Such single API
invocation will try to discover given USDT in specified binary and will
use (potentially many) BPF uprobes to attach this program in correct
locations.
Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that
represents this attachment. It is a virtual BPF link that doesn't have
direct kernel object, as it can consist of multiple underlying BPF
uprobe links. As such, attachment is not atomic operation and there can
be brief moment when some USDT call sites are attached while others are
still in the process of attaching. This should be taken into
consideration by user. But bpf_program__attach_usdt() guarantees that
in the case of success all USDT call sites are successfully attached, or
all the successfuly attachments will be detached as soon as some USDT
call sites failed to be attached. So, in theory, there could be cases of
failed bpf_program__attach_usdt() call which did trigger few USDT
program invocations. This is unavoidable due to multi-uprobe nature of
USDT and has to be handled by user, if it's important to create an
illusion of atomicity.
USDT BPF programs themselves are marked in BPF source code as either
SEC("usdt"), in which case they won't be auto-attached through
skeleton's <skel>__attach() method, or it can have a full definition,
which follows the spirit of fully-specified uprobes:
SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's
attach method will attempt auto-attachment. Similarly, generic
bpf_program__attach() will have enought information to go off of for
parameterless attachment.
USDT BPF programs are actually uprobes, and as such for kernel they are
marked as BPF_PROG_TYPE_KPROBE.
Another part of this patch is USDT-related feature probing:
- BPF cookie support detection from user-space;
- detection of kernel support for auto-refcounting of USDT semaphore.
The latter is optional. If kernel doesn't support such feature and USDT
doesn't rely on USDT semaphores, no error is returned. But if libbpf
detects that USDT requires setting semaphores and kernel doesn't support
this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't
support poking process's memory directly to increment semaphore value,
like BCC does on legacy kernels, due to inherent raciness and danger of
such process memory manipulation. Libbpf let's kernel take care of this
properly or gives up.
Logistically, all the extra USDT-related infrastructure of libbpf is put
into a separate usdt.c file and abstracted behind struct usdt_manager.
Each bpf_object has lazily-initialized usdt_manager pointer, which is
only instantiated if USDT programs are attempted to be attached. Closing
BPF object frees up usdt_manager resources. usdt_manager keeps track of
USDT spec ID assignment and few other small things.
Subsequent patches will fill out remaining missing pieces of USDT
initialization and setup logic.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-3-andrii@kernel.org
Add BPF-side implementation of libbpf-provided USDT support. This
consists of single header library, usdt.bpf.h, which is meant to be used
from user's BPF-side source code. This header is added to the list of
installed libbpf header, along bpf_helpers.h and others.
BPF-side implementation consists of two BPF maps:
- spec map, which contains "a USDT spec" which encodes information
necessary to be able to fetch USDT arguments and other information
(argument count, user-provided cookie value, etc) at runtime;
- IP-to-spec-ID map, which is only used on kernels that don't support
BPF cookie feature. It allows to lookup spec ID based on the place
in user application that triggers USDT program.
These maps have default sizes, 256 and 1024, which are chosen
conservatively to not waste a lot of space, but handling a lot of common
cases. But there could be cases when user application needs to either
trace a lot of different USDTs, or USDTs are heavily inlined and their
arguments are located in a lot of differing locations. For such cases it
might be necessary to size those maps up, which libbpf allows to do by
overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros.
It is an important aspect to keep in mind. Single USDT (user-space
equivalent of kernel tracepoint) can have multiple USDT "call sites".
That is, single logical USDT is triggered from multiple places in user
application. This can happen due to function inlining. Each such inlined
instance of USDT invocation can have its own unique USDT argument
specification (instructions about the location of the value of each of
USDT arguments). So while USDT looks very similar to usual uprobe or
kernel tracepoint, under the hood it's actually a collection of uprobes,
each potentially needing different spec to know how to fetch arguments.
User-visible API consists of three helper functions:
- bpf_usdt_arg_cnt(), which returns number of arguments of current USDT;
- bpf_usdt_arg(), which reads value of specified USDT argument (by
it's zero-indexed position) and returns it as 64-bit value;
- bpf_usdt_cookie(), which functions like BPF cookie for USDT
programs; this is necessary as libbpf doesn't allow specifying actual
BPF cookie and utilizes it internally for USDT support implementation.
Each bpf_usdt_xxx() APIs expect struct pt_regs * context, passed into
BPF program. On kernels that don't support BPF cookie it is used to
fetch absolute IP address of the underlying uprobe.
usdt.bpf.h also provides BPF_USDT() macro, which functions like
BPF_PROG() and BPF_KPROBE() and allows much more user-friendly way to
get access to USDT arguments, if USDT definition is static and known to
the user. It is expected that majority of use cases won't have to use
bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover
all their needs.
Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to
detect kernel support for BPF cookie. If BPF CO-RE dependency is
undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to
either a boolean constant (or equivalently zero and non-zero), or even
point it to its own .rodata variable that can be specified from user's
application user-space code. It is important that
BPF_USDT_HAS_BPF_COOKIE is known to BPF verifier as static value (thus
.rodata and not just .data), as otherwise BPF code will still contain
bpf_get_attach_cookie() BPF helper call and will fail validation at
runtime, if not dead-code eliminated.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404234202.331384-2-andrii@kernel.org
Since not all compilers have a function attribute to disable KCOV
instrumentation, objtool can rewrite KCOV instrumentation in noinstr
functions as per commit:
f56dae88a8 ("objtool: Handle __sanitize_cov*() tail calls")
However, this has subtle interaction with the SLS validation from
commit:
1cc1e4c8aa ("objtool: Add straight-line-speculation validation")
In that when a tail-call instrucion is replaced with a RET an
additional INT3 instruction is also written, but is not represented in
the decoded instruction stream.
This then leads to false positive missing INT3 objtool warnings in
noinstr code.
Instead of adding additional struct instruction objects, mark the RET
instruction with retpoline_safe to suppress the warning (since we know
there really is an INT3).
Fixes: 1cc1e4c8aa ("objtool: Add straight-line-speculation validation")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220323230712.GA8939@worktop.programming.kicks-ass.net
Objtool reports:
arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx() falls through to next function poly1305_blocks_x86_64()
arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_emit_avx() falls through to next function poly1305_emit_x86_64()
arch/x86/crypto/poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2() falls through to next function poly1305_blocks_x86_64()
Which reads like:
0000000000000040 <poly1305_blocks_x86_64>:
40: f3 0f 1e fa endbr64
...
0000000000000400 <poly1305_blocks_avx>:
400: f3 0f 1e fa endbr64
404: 44 8b 47 14 mov 0x14(%rdi),%r8d
408: 48 81 fa 80 00 00 00 cmp $0x80,%rdx
40f: 73 09 jae 41a <poly1305_blocks_avx+0x1a>
411: 45 85 c0 test %r8d,%r8d
414: 0f 84 2a fc ff ff je 44 <poly1305_blocks_x86_64+0x4>
...
These are simple conditional tail-calls and *should* be recognised as
such by objtool, however due to a mistake in commit 08f87a93c8
("objtool: Validate IBT assumptions") this is failing.
Specifically, the jump_dest is +4, this means the instruction pointed
at will not be ENDBR and as such it will fail the second clause of
is_first_func_insn() that was supposed to capture this exact case.
Instead, have is_first_func_insn() look at the previous instruction.
Fixes: 08f87a93c8 ("objtool: Validate IBT assumptions")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220322115125.811582125@infradead.org
attach_probe selftest fails on Debian-based distros with `failed to
resolve full path for 'libc.so.6'`. The reason is that these distros
embraced multiarch to the point where even for the "main" architecture
they store libc in /lib/<triple>.
This is configured in /etc/ld.so.conf and in theory it's possible to
replicate the loader's parsing and processing logic in libbpf, however
a much simpler solution is to just enumerate the known library paths.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404225020.51029-1-iii@linux.ibm.com
Before, our help output contained lines like
--kconfig_add KCONFIG_ADD
--qemu_config qemu_config
--jobs jobs
They're not very helpful.
The former kind come from the automatic 'metavar' we get from argparse,
the uppercase version of the flag name.
The latter are where we manually specified metavar as the flag name.
After:
--build_dir DIR
--make_options X=Y
--kunitconfig PATH
--kconfig_add CONFIG_X=Y
--arch ARCH
--cross_compile PREFIX
--qemu_config FILE
--jobs N
--timeout SECONDS
--raw_output [{all,kunit}]
--json [FILE]
This patch tries to make the code more clear by specifying the _type_ of
input we expect, e.g. --build_dir is a DIR, --qemu_config is a FILE.
I also switched it to uppercase since it looked more clearly like
placeholder text that way.
This patch also changes --raw_output to specify `choices` to make it
more clear what the options are, and this way argparse can validate it
for us, as shown by the added test case.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
attach_probe selftest fails on aarch64 with `failed to create kprobe
'sys_nanosleep+0x0' perf event: No such file or directory`. This is
because, like on several other architectures, nanosleep has a prefix.
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20220404142101.27900-1-iii@linux.ibm.com
In addition to displaying the program type in bpftool prog show
this enables us to be able to query bpf_prog_type_syscall
availability through feature probe as well as see
which helpers are available in those programs (such as
bpf_sys_bpf and bpf_sys_close)
Signed-off-by: Milan Landaverde <milan@mdaverde.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220331154555.422506-2-milan@mdaverde.com
The script for checking that various lists of types in bpftool remain in
sync with the UAPI BPF header uses a regex to parse enum bpf_prog_type.
If this enum contains a set of values different from the list of program
types in bpftool, it complains.
This script should have reported the addition, some time ago, of the new
BPF_PROG_TYPE_SYSCALL, which was not reported to bpftool's program types
list. It failed to do so, because it failed to parse that new type from
the enum. This is because the new value, in the BPF header, has an
explicative comment on the same line, and the regex does not support
that.
Let's update the script to support parsing enum values when they have
comments on the same line.
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404140944.64744-1-quentin@isovalent.com
Filling log files with color codes makes diffs and other comparisons
difficult. Only emit vt100 codes when the stdout is a TTY.
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: linux-kselftest@vger.kernel.org
Cc: kunit-dev@googlegroups.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Before, kunit.py always printed "arch": "UM" in its json output, but...
1. With `kunit.py parse`, we could be parsing output from anywhere, so
we can't say that.
2. Capitalizing it is probably wrong, as it's `ARCH=um`
3. Commit 87c9c16317 ("kunit: tool: add support for QEMU") made it so
kunit.py could knowingly run a different arch, yet we'd still always
claim "UM".
This patch addresses all of those. E.g.
1.
$ ./tools/testing/kunit/kunit.py parse .kunit/test.log --json | grep -o '"arch.*' | sort -u
"arch": "",
2.
$ ./tools/testing/kunit/kunit.py run --json | ...
"arch": "um",
3.
$ ./tools/testing/kunit/kunit.py run --json --arch=x86_64 | ...
"arch": "x86_64",
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
When using --json, kunit.py run/exec/parse will produce results in
KernelCI json format.
As part of that, we include the build_dir that was used, and we
(incorrectly) hardcode in the arch, etc.
We'll want a way to plumb more values (as well as the correct `arch`),
so this patch groups those fields into kunit_json.Metadata type.
This patch should have no user visible changes.
And since we only used build_dir in KunitParseRequest for json, we can
now move it out of that struct and add it into KunitExecRequest, which
needs it and used to get it via inheritance.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Use a more idiomatic check that a list is non-empty (`if mylist:`) and
simplify the function body by dedenting and using a dict to map between
the kunit TestStatus enum => KernelCI json status string.
The dict hopefully makes it less likely to have bugs like commit
9a6bb30a88 ("kunit: tool: fix --json output for skipped tests").
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
--build_dir is set to a default of '.kunit' since commit ddbd60c779
("kunit: use --build_dir=.kunit as default"), but even before then it
was explicitly set to ''.
So outside of one unit test, there was no way for the build_dir to be
ever be None, and we can simplify code by fixing the unit test and
enforcing that via updated type annotations.
E.g. this lets us drop `get_file_path()` since it's now exactly
equivalent to os.path.join().
Note: there's some `if build_dir` checks that also fail if build_dir is
explicitly set to '' that just guard against passing "O=" to make.
But running `make O=` works just fine, so drop these checks.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Since we formally require python3.7+ since commit df4b0807ca
("kunit: tool: Assert the version requirement"), we can just use
@dataclasses.dataclass instead.
In kunit_config.py, we used namedtuple to create a hashable type that
had `name` and `value` fields and had to subclass it to define a custom
`__str__()`.
@datalcass lets us just define one type instead.
In qemu_config.py, we use namedtuple to allow modules to define various
parameters. Using @dataclass, we can add type-annotations for all these
fields, making our code more typesafe and making it easier for users to
figure out how to define new configs.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Commit be886ba90c ("kunit: run kunit_tool from any directory")
introduced this variable, but it was unused even in that commit.
Since it's still unused now and callers can instead use
get_kernel_root_path(), delete this var.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently kunit_json.get_json_result() will output the JSON-ified test
output to json_path, but iff it's not "stdout".
Instead, move the responsibility entirely over to the one caller.
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
FIXTURE_VARIANT data is passed to FIXTURE_SETUP and TEST_F as "variant".
In some cases, the variant will change the setup, such that expectations
also change on teardown. Also pass variant to FIXTURE_TEARDOWN.
The new FIXTURE_TEARDOWN logic is identical to that in FIXTURE_SETUP,
right above.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20201210231010.420298-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The kselftest test harness has traditionally not run the registered
TEARDOWN handler when a test encountered an ASSERT. This creates
unexpected situations and tests need to be very careful about using
ASSERT, which seems a needless hurdle for test writers.
Because of the harness's design for optional failure handlers, the
original implementation of ASSERT used an abort() to immediately
stop execution, but that meant the context for running teardown was
lost. Instead, use setjmp/longjmp so that teardown can be done.
Failed SETUP routines continue to not be followed by TEARDOWN, though.
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Will Drewry <wad@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
I fixed a few warnings like this in commit e2aa5e650b
("selftests: fixup build warnings in pidfd / clone3 tests"), but I
missed this one by mistake. Since this variable is unused, remove it.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The way the test target was defined before, when building with clang we
get a command line like this:
clang -Wall -Werror -g -I../../../../usr/include/ \
regression_enomem.c ../pidfd/pidfd.h -o regression_enomem
This yields an error, because clang thinks we want to produce both a *.o
file, as well as a precompiled header:
clang: error: cannot specify -o when generating multiple output files
gcc, for whatever reason, doesn't exhibit the same behavior which I
suspect is why the problem wasn't noticed before.
This can be fixed simply by using the LOCAL_HDRS infrastructure the
selftests lib.mk provides. It does the right think and marks the target
as depending on the header (so if the header changes, we rebuild), but
it filters the header out of the compiler command line, so we don't get
the error described above.
Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
In order to successfully build all these 32bit tests, these 32bit gcc
and glibc packages, named gcc-32bit and glibc-devel-static-32bit on SUSE,
need to be installed.
This patch added this information in warn_32bit_failure.
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix the following coccicheck warning:
tools/testing/selftests/proc/proc-pid-vm.c:371:26-27:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/proc/proc-pid-vm.c:420:26-27:
WARNING: Use ARRAY_SIZE
It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix the following coccicheck warning:
tools/testing/selftests/vDSO/vdso_test_correctness.c:309:46-47:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/vDSO/vdso_test_correctness.c:373:46-47:
WARNING: Use ARRAY_SIZE
It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Those were added as part of the SMAP enablement but SMAP is currently
an integral part of kernel proper and there's no need to disable it
anymore.
Rip out that functionality. Leave --uaccess default on for objtool as
this is what objtool should do by default anyway.
If still needed - clearcpuid=smap.
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220127115626.14179-4-bp@alien8.de
Since core relos is an optional part of the .BTF.ext ELF section, we should
skip parsing it instead of returning -EINVAL if header size is less than
offsetofend(struct btf_ext_header, core_relo_len).
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220404005320.1723055-1-ytcoode@gmail.com
add tests that verify attaching by name for
1. local functions in a program
2. library functions in a shared object
...succeed for uprobe and uretprobes using new "func_name"
option for bpf_program__attach_uprobe_opts(). Also verify
auto-attach works where uprobe, path to binary and function
name are specified, but fails with -EOPNOTSUPP with a SEC
name that does not specify binary path/function.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-5-git-send-email-alan.maguire@oracle.com
Now that u[ret]probes can use name-based specification, it makes
sense to add support for auto-attach based on SEC() definition.
The format proposed is
SEC("u[ret]probe/binary:[raw_offset|[function_name[+offset]]")
For example, to trace malloc() in libc:
SEC("uprobe/libc.so.6:malloc")
...or to trace function foo2 in /usr/bin/foo:
SEC("uprobe//usr/bin/foo:foo2")
Auto-attach is done for all tasks (pid -1). prog can be an absolute
path or simply a program/library name; in the latter case, we use
PATH/LD_LIBRARY_PATH to resolve the full path, falling back to
standard locations (/usr/bin:/usr/sbin or /usr/lib64:/usr/lib) if
the file is not found via environment-variable specified locations.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-4-git-send-email-alan.maguire@oracle.com
kprobe attach is name-based, using lookups of kallsyms to translate
a function name to an address. Currently uprobe attach is done
via an offset value as described in [1]. Extend uprobe opts
for attach to include a function name which can then be converted
into a uprobe-friendly offset. The calcualation is done in
several steps:
1. First, determine the symbol address using libelf; this gives us
the offset as reported by objdump
2. If the function is a shared library function - and the binary
provided is a shared library - no further work is required;
the address found is the required address
3. Finally, if the function is local, subtract the base address
associated with the object, retrieved from ELF program headers.
The resultant value is then added to the func_offset value passed
in to specify the uprobe attach address. So specifying a func_offset
of 0 along with a function name "printf" will attach to printf entry.
The modes of operation supported are then
1. to attach to a local function in a binary; function "foo1" in
"/usr/bin/foo"
2. to attach to a shared library function in a shared library -
function "malloc" in libc.
[1] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-3-git-send-email-alan.maguire@oracle.com
bpf_program__attach_uprobe_opts() requires a binary_path argument
specifying binary to instrument. Supporting simply specifying
"libc.so.6" or "foo" should be possible too.
Library search checks LD_LIBRARY_PATH, then /usr/lib64, /usr/lib.
This allows users to run BPF programs prefixed with
LD_LIBRARY_PATH=/path2/lib while still searching standard locations.
Similarly for non .so files, we check PATH and /usr/bin, /usr/sbin.
Path determination will be useful for auto-attach of BPF uprobe programs
using SEC() definition.
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/1648654000-21758-2-git-send-email-alan.maguire@oracle.com
Currently, when we run test_progs with just executable file name, for
example 'PATH=. test_progs-no_alu32', cd_flavor_subdir() will not check
if test_progs is running as a flavored test runner and switch into
corresponding sub-directory.
This will cause test_progs-no_alu32 executed by the
'PATH=. test_progs-no_alu32' command to run in the wrong directory and
load the wrong BPF objects.
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220403135245.1713283-1-ytcoode@gmail.com
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions. This fixes the following warnings from coccicheck:
./tools/testing/selftests/bpf/progs/test_xdp_noinline.c:567:9-10: WARNING:
return of 0/1 in function 'get_packet_dst' with return type bool
./tools/testing/selftests/bpf/progs/test_l4lb_noinline.c:221:9-10: WARNING:
return of 0/1 in function 'get_packet_dst' with return type bool
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1648779354-14700-1-git-send-email-baihaowen@meizu.com
Since commit 6521f89170 ("namei: prepare for idmapped mounts")
vfs_link's prototype was changed, the kprobe definition in
profiler selftest in turn wasn't updated. The result is that all
argument after the first are now stored in different registers. This
means that self-test has been broken ever since. Fix it by updating the
kprobe definition accordingly.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20220331140949.1410056-1-nborisov@suse.com
- Make the prctl() for enabling dynamic XSTATE components correct so it
adds the newly requested feature to the permission bitmap instead of
overwriting it. Add a selftest which validates that.
- Unroll string MMIO for encrypted SEV guests as the hypervisor cannot
emulate it.
- Handle supervisor states correctly in the FPU/XSTATE code so it takes
the feature set of the fpstate buffer into account. The feature sets
can differ between host and guest buffers. Guest buffers do not contain
supervisor states. So far this was not an issue, but with enabling
PASID it needs to be handled in the buffer offset calculation and in
the permission bitmaps.
- Avoid a gazillion of repeated CPUID invocations in by caching the values
early in the FPU/XSTATE code.
- Enable CONFIG_WERROR for X86.
- Make the X86 defconfigs more useful by adapting them to Y2022 reality.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmJJWwwTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoT3mEACA9xkNjECn/MHN3B0X5wTPhVyw9+TJ
OdfpqL7C9pbAU1s2mwf3TyicrCOqx8nlnOYB/mXgfRGnbZqmUeGQFpZFM587dm/I
r/BtouAzSASjnaW7SijT3gnRTqMPVNTcLOTUEVjnTa7zatw+t4rH1uxE9dLqEq9B
cKMtsBOJyTTbj4ie3ngkUS2PQngNNHLJ4oQGZW4wCA5snLuwF1LlgcZJy8Zkrlpo
D58h/ZV6K2/tI7INWLINlqGnxaL2B/Ld4zXsFH+t05XGh+JOiq8ueLi5tdfEPG9f
/pzuGia0Cv6WBv+jOHLCBe2kfgvBx+Y8Goi0tqL0hwKCGjpZlQkhRccrjbVSAPhW
2SfxOD1pulTwI1J75csYXjTc/heJvAv/ZpZSz3wldM3fyiwnmgfWKlMYqG6Xb9+T
2OHwEUJHJQnon/f25+yb9dWI7HYMw2fEIqu3CgbRyOviObcB9MM1uKVErkCYAUWY
W7Q8ShjNPrUguCPbw4YFPIwaazuhRbR8t2kRvfBOyTYwh3jo6U3eRL72Cov84uik
hnFtUdiusWtvV59ngZelREmd3iVKif2hxx7EoGDY/VV2Ru4C2X/xgJemKJeKSR/f
gm6pp8wbPSC4TBJOfP6IwYtoZKyu03miIeupPPUDxx0hLbx5j2e6EgVM5NVAeJFF
fu4MEkGvStZc+w==
=GK27
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes and updates:
- Make the prctl() for enabling dynamic XSTATE components correct so
it adds the newly requested feature to the permission bitmap
instead of overwriting it. Add a selftest which validates that.
- Unroll string MMIO for encrypted SEV guests as the hypervisor
cannot emulate it.
- Handle supervisor states correctly in the FPU/XSTATE code so it
takes the feature set of the fpstate buffer into account. The
feature sets can differ between host and guest buffers. Guest
buffers do not contain supervisor states. So far this was not an
issue, but with enabling PASID it needs to be handled in the buffer
offset calculation and in the permission bitmaps.
- Avoid a gazillion of repeated CPUID invocations in by caching the
values early in the FPU/XSTATE code.
- Enable CONFIG_WERROR in x86 defconfig.
- Make the X86 defconfigs more useful by adapting them to Y2022
reality"
* tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/fpu/xstate: Consolidate size calculations
x86/fpu/xstate: Handle supervisor states in XSTATE permissions
x86/fpu/xsave: Handle compacted offsets correctly with supervisor states
x86/fpu: Cache xfeature flags from CPUID
x86/fpu/xsave: Initialize offset/size cache early
x86/fpu: Remove unused supervisor only offsets
x86/fpu: Remove redundant XCOMP_BV initialization
x86/sev: Unroll string mmio with CC_ATTR_GUEST_UNROLL_STRING_IO
x86/config: Make the x86 defconfigs a bit more usable
x86/defconfig: Enable WERROR
selftests/x86/amx: Update the ARCH_REQ_XCOMP_PERM test
x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementation
I made a stupid typo when adding the nexthop route warning selftest and
added both $IP and ip after it (double ip) on the cleanup path. The
error doesn't show up when running the test, but obviously it doesn't
cleanup properly after it.
Fixes: 392baa339c ("selftests: net: add delete nexthop route warning test")
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The test fails:
# ./test_offload.py
[...]
Test bpftool bound info reporting (own ns)...
FAIL: 3 BPF maps loaded, expected 2
File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 1177, in <module>
check_dev_info(False, "")
File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 645, in check_dev_info
maps = bpftool_map_list(expected=2, ns=ns)
File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 190, in bpftool_map_list
fail(True, "%d BPF maps loaded, expected %d" %
File "/root/bpf-next/tools/testing/selftests/bpf/./test_offload.py", line 86, in fail
tb = "".join(traceback.extract_stack().format())
Some base maps do not have names and they cannot be added due to compatibility
with older kernels, see [0]. So, just skip the unnamed maps.
[0] https://lore.kernel.org/bpf/CAEf4BzY66WPKQbDe74AKZ6nFtZjq5e+G3Ji2egcVytB9R6_sGQ@mail.gmail.com/
Signed-off-by: Yauheni Kaliuta <ykaliuta@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20220329081100.9705-1-ykaliuta@redhat.com
Was never used in bpf_sk_assign_test(), and was removed from handle_{tcp,udp}()
in commit 0b9ad56b1e ("selftests/bpf: Use SOCKMAP for server sockets in
bpf_sk_assign test").
Fixes: 0b9ad56b1e ("selftests/bpf: Use SOCKMAP for server sockets in bpf_sk_assign test")
Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220329154914.3718658-1-eyal.birger@gmail.com
Addresses this coccinelle warning:
./tools/perf/util/evlist.c:1333:5-8: Unneeded variable: "err". Return
"- ENOMEM" on line 1358
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/1648432532-23151-1-git-send-email-baihaowen@meizu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
perf_cpu_map__merge() will reuse one of its arguments if they are equal or
the other argument is NULL.
The arguments could be reused if it is known one set of values is a
subset of the other.
For example, a map of 0-1 and a map of just 0 when merged yields the map
of 0-1.
Currently a new map is created rather than adding a reference count to
the original 0-1 map.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Returns true if the second argument is a subset of the first.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
of all evsels.
For non-task targets, cpus is set to be cpus requested from the command
line, defaulting to all online cpus if no cpus are specified.
For an uncore event, all_cpus may be just CPU 0 or every online CPU.
This causes all_cpus to have fewer values than the cpus variable which
is confusing given the 'all' in the name.
To try to make the behavior clearer, rename cpus to user_requested_cpus
and add comments on the two struct variables.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This essentially reverts commit c72e3f04b4 ("tools/perf/build:
Speed up git-version test on re-make") and commit 4e666cdb06
("perf tools: Fix dependency for version file creation")
In commit c72e3f04b4 ("tools/perf/build: Speed up git-version test
on re-make"), a makefile dependency on .git/HEAD was added. The
background is that running PERF-VERSION-FILE is relatively slow, and
commands like "git describe" are particularly slow.
In commit 4e666cdb06 ("perf tools: Fix dependency for version file
creation"), an additional dependency on .git/ORIG_HEAD was added, as
.git/HEAD may not change for "git reset --hard HEAD^" command. However,
depending on whether we're on a branch or not, a "git cherry-pick" may
not lead to the version being updated.
As discussed with the git community in [0], using git internal files for
dependencies is not reliable. Commit 4e666cdb06 also breaks some build
scenarios [1].
As mentioned, c72e3f04b4 ("tools/perf/build: Speed up git-version
test on re-make") was added to speed up the build. However in commit
7572733b84 ("perf tools: Fix version kernel tag") we removed the
call to "git describe", so just revert Makefile.perf back to same as pre
c72e3f04b4 ("tools/perf/build: Speed up git-version test on
re-make") and the build should not be so slow, as below:
Pre 7572733b8499:
$> time util/PERF-VERSION-GEN
PERF_VERSION = 5.17.rc8.g4e666cdb06ee
real 0m0.110s
user 0m0.091s
sys 0m0.019s
Post 7572733b8499:
$> time util/PERF-VERSION-GEN
PERF_VERSION = 5.17.rc8.g7572733b8499
real 0m0.039s
user 0m0.036s
sys 0m0.007s
[0] https://lore.kernel.org/git/87wngkpddp.fsf@igel.home/T/#m4a4dd6de52fdbe21179306cd57b3761eb07f45f8
[1] https://lore.kernel.org/linux-perf-users/20220329093120.4173283-1-matthieu.baerts@tessares.net/T/#u
Committer testing:
After a fresh rebuild using 'make -C tools/perf O=/tmp/build/perf install-bin':
$ perf -v
perf version 5.17.g162f9db407b6
$ git log --oneline -1
162f9db407b6a6e5 (HEAD -> perf/core) perf tools: Stop depending on .git files for building PERF-VERSION-FILE
$
Now using a detached tarball, i.e. outside the kernel source tree:
$ ls -la perf*tar
ls: cannot access 'perf*tar': No such file or directory
$ make perf-tar-src-pkg
TAR
PERF_VERSION = 5.17.g31d10b3ef133
$ ls -la perf*tar
-rw-r--r--. 1 acme acme 22241280 Mar 30 13:26 perf-5.17.0.tar
$ mv perf-5.17.0.tar /tmp
$ cd /tmp
$ tar xf perf-5.17.0.tar
$ cd perf-5.17.0/
$ make -C tools/perf |& tail
CC util/pmu.o
CC util/pmu-flex.o
CC util/expr-flex.o
CC util/expr.o
LD util/scripting-engines/perf-in.o
LD util/intel-pt-decoder/perf-in.o
LD util/perf-in.o
LD perf-in.o
LINK perf
make: Leaving directory '/tmp/perf-5.17.0/tools/perf'
$ tools/perf/perf -v
perf version 5.17.g31d10b3ef133
$ pwd
/tmp/perf-5.17.0
$ cat PERF-VERSION-FILE
#define PERF_VERSION "5.17.g31d10b3ef133"
$
Fixes: 4e666cdb06 ("perf tools: Fix dependency for version file creation")
Reported-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/1648635774-14581-1-git-send-email-john.garry@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
991625f3dd ("x86/ibt: Add IBT feature, MSR and #CP handling")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YkSCx2kr4ambH+Qe@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
caa574ffc4 ("drm/i915/uapi: document behaviour for DG2 64K support")
That don't add any new ioctl, so no changes in tooling.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Link: http://lore.kernel.org/lkml/YkSChHqaOApscFQ0@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
6d8491910f ("KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2")
ef11c9463a ("KVM: s390: Add vm IOCTL for key checked guest absolute memory access")
e9e9feebcb ("KVM: s390: Add optional storage key checking to MEMOP IOCTL")
That just rebuilds perf, as these patches don't add any new KVM ioctl to
be harvested for the the 'perf trace' ioctl syscall argument
beautifiers.
This is also by now used by tools/testing/selftests/kvm/, a simple test
build succeeded.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Janis Schoetterl-Glausch <scgl@linux.ibm.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lore.kernel.org/lkml/YkSCOWHQdir1lhdJ@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
34739fd95f ("KVM: arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field")
583cda1b0e ("KVM: arm64: Refuse to run VCPU if the PMU doesn't match the physical CPU")
That don't causes any changes in tooling (when built on x86), only
addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h
Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Link: https://lore.kernel.org/lkml/YkSB4Q7kWmnaqeZU@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes from:
9457056ac4 ("mm: madvise: MADV_DONTNEED_LOCKED")
That result in these changes in the tools:
$ diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
--- tools/include/uapi/asm-generic/mman-common.h 2022-03-29 16:17:50.461694991 -0300
+++ include/uapi/asm-generic/mman-common.h 2022-03-27 19:12:48.923250468 -0300
@@ -75,6 +75,8 @@
#define MADV_POPULATE_READ 22 /* populate (prefault) page tables readable */
#define MADV_POPULATE_WRITE 23 /* populate (prefault) page tables writable */
+#define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */
+
/* compatibility flags */
#define MAP_FILE 0
$ tools/perf/trace/beauty/madvise_behavior.sh > before
$ cp include/uapi/asm-generic/mman-common.h tools/include/uapi/asm-generic/mman-common.h
$ tools/perf/trace/beauty/madvise_behavior.sh > after
$ diff -u before after
--- before 2022-03-29 16:18:04.091044244 -0300
+++ after 2022-03-29 16:18:11.692238906 -0300
@@ -20,6 +20,7 @@
[21] = "PAGEOUT",
[22] = "POPULATE_READ",
[23] = "POPULATE_WRITE",
+ [24] = "DONTNEED_LOCKED",
[100] = "HWPOISON",
[101] = "SOFT_OFFLINE",
};
$
I.e. now when madvise gets those behaviours as args, 'perf trace' will
be able to translate from the number to a human readable string and to
use the strings in tracepoint filter expressions.
This addresses the following perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkNcUfeh795yqGMV@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
fba60b171a ("libbpf: Use IS_ERR_OR_NULL() in hashmap__free()")
That don't entail any changes in tools/perf.
This addresses this perf build warning:
Warning: Kernel ABI header at 'tools/perf/util/hashmap.h' differs from latest version at 'tools/lib/bpf/hashmap.h'
diff -u tools/perf/util/hashmap.h tools/lib/bpf/hashmap.h
Not a kernel ABI, its just that this uses the mechanism in place for
checking kernel ABI files drift.
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mauricio Vásquez <mauricio@kinvolk.io>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/YkMb2SAIai2VeuUD@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Passing NULL to perf_cpu_map__max doesn't make sense as there is no
valid max. Avoid this problem by null checking in
perf_stat_init_aggr_mode.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328062414.1893550-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -c option is used to cull by stacktrace. Now, --cull option has
been Added in page_owner_sort.c. Culling by stacktrace is one of the
function of "--cull". No need to set an extra parameter. So remove -c
option.
Remove parsing of -c when parse parameter and remove "-c" from usage.
This work is coauthored by
Shenghong Han
Yixuan Cao
Chongxi Zhao
Jiajian Ye
Yuhong Feng
Yongqiang Liu
Link: https://lkml.kernel.org/r/20220326085920.1470081-1-zhangyinan2019@email.szu.edu.cn
Signed-off-by: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Georgi Djakov <georgi.djakov@linaro.org>
Cc: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a test which causes a WARNING on kernels which treat a
nexthop route like a normal route when comparing for deletion and a
device is specified. That is, a route is found but we hit a warning while
matching it. The warning is from fib_info_nh() in include/net/nexthop.h
because we run it on a fib_info with nexthop object. The call chain is:
inet_rtm_delroute -> fib_table_delete -> fib_nh_match (called with a
nexthop fib_info and also with fc_oif set thus calling fib_info_nh on
the fib_info and triggering the warning).
Repro steps:
$ ip nexthop add id 12 via 172.16.1.3 dev veth1
$ ip route add 172.16.101.1/32 nhid 12
$ ip route delete 172.16.101.1/32 dev veth1
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
vdpa generic device type support
More virtio hardening for broken devices
On the same theme, revert some virtio hotplug hardening patches -
they were misusing some interrupt flags, will have to be reverted.
RSS support in virtio-net
max device MTU support in mlx5 vdpa
akcipher support in virtio-crypto
shared IRQ support in ifcvf vdpa
a minor performance improvement in vhost
Enable virtio mem for ARM64
beginnings of advance dma support
Cleanups, fixes all over the place.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmJEEk8PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpcpUH+wRIXrzveirsN4MYH0aAeF+SLYaA5pgtO4U7
da22HYtwlMrDRMxwjepKBOTSu89uP5LEK7IKWPj9VRZg+GLz/Cdfc6BZl/fND3qt
0yFpwG1ZLsBK1+WHbysWQneEbPjXqQdbh9eVkKVGcNkRuLJJwXbmF95dyQEJwzeh
dPHssDcEC2tRgHAMrLyjLPKwMCRwcgtdPoB1ZC+lqTs3G6lktAfREEvqVfJOVe1b
mQcgdAJ+aRM0J/w/PYTmxFOZPYAmQ6hmAQ8Hf7nkjfRWQ4EM91W0cKAoZPc/+7KN
ZfFKVL28GEZLJqnx+3xijwCR2gwVHsRYZHaTjfGgQUWZPoB3Vrc=
=ynRx
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- vdpa generic device type support
- more virtio hardening for broken devices (but on the same theme,
revert some virtio hotplug hardening patches - they were misusing
some interrupt flags and had to be reverted)
- RSS support in virtio-net
- max device MTU support in mlx5 vdpa
- akcipher support in virtio-crypto
- shared IRQ support in ifcvf vdpa
- a minor performance improvement in vhost
- enable virtio mem for ARM64
- beginnings of advance dma support
- cleanups, fixes all over the place
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (33 commits)
vdpa/mlx5: Avoid processing works if workqueue was destroyed
vhost: handle error while adding split ranges to iotlb
vdpa: support exposing the count of vqs to userspace
vdpa: change the type of nvqs to u32
vdpa: support exposing the config size to userspace
vdpa/mlx5: re-create forwarding rules after mac modified
virtio: pci: check bar values read from virtio config space
Revert "virtio_pci: harden MSI-X interrupts"
Revert "virtio-pci: harden INTX interrupts"
drivers/net/virtio_net: Added RSS hash report control.
drivers/net/virtio_net: Added RSS hash report.
drivers/net/virtio_net: Added basic RSS support.
drivers/net/virtio_net: Fixed padded vheader to use v1 with hash.
virtio: use virtio_device_ready() in virtio_device_restore()
tools/virtio: compile with -pthread
tools/virtio: fix after premapped buf support
virtio_ring: remove flags check for unmap packed indirect desc
virtio_ring: remove flags check for unmap split indirect desc
virtio_ring: rename vring_unmap_state_packed() to vring_unmap_extra_packed()
net/mlx5: Add support for configuring max device MTU
...
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
additional flags to be passed to user-space programs.
- Fix missing fflush() bugs in Kconfig and fixdep
- Fix a minor bug in the comment format of the .config file
- Make kallsyms ignore llvm's local labels, .L*
- Fix UAPI compile-test for cross-compiling with Clang
- Extend the LLVM= syntax to support LLVM=<suffix> form for using a
particular version of LLVm, and LLVM=<prefix> form for using custom
LLVM in a particular directory path.
- Clean up Makefiles
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs
IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/
p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj
ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa
rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z
ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF
nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW
/YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX
oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2
p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD
ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1
+LJMEpdZO0dFvpF+
=84rW
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Add new environment variables, USERCFLAGS and USERLDFLAGS to allow
additional flags to be passed to user-space programs.
- Fix missing fflush() bugs in Kconfig and fixdep
- Fix a minor bug in the comment format of the .config file
- Make kallsyms ignore llvm's local labels, .L*
- Fix UAPI compile-test for cross-compiling with Clang
- Extend the LLVM= syntax to support LLVM=<suffix> form for using a
particular version of LLVm, and LLVM=<prefix> form for using custom
LLVM in a particular directory path.
- Clean up Makefiles
* tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: Make $(LLVM) more flexible
kbuild: add --target to correctly cross-compile UAPI headers with Clang
fixdep: use fflush() and ferror() to ensure successful write to files
arch: syscalls: simplify uapi/kapi directory creation
usr/include: replace extra-y with always-y
certs: simplify empty certs creation in certs/Makefile
certs: include certs/signing_key.x509 unconditionally
kallsyms: ignore all local labels prefixed by '.L'
kconfig: fix missing '# end of' for empty menu
kconfig: add fflush() before ferror() check
kbuild: replace $(if A,A,B) with $(or A,B)
kbuild: Add environment variables for userprogs flags
kbuild: unify cmd_copy and cmd_shipped
Features:
- kprobes: rethook: x86: replace kretprobe trampoline with rethook
Current release - regressions:
- sfc: avoid null-deref on systems without NUMA awareness
in the new queue sizing code
Current release - new code bugs:
- vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices
- eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl
when interface is down
Previous releases - always broken:
- openvswitch: correct neighbor discovery target mask field
in the flow dump
- wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak
- rxrpc: fix call timer start racing with call destruction
- rxrpc: fix null-deref when security type is rxrpc_no_security
- can: fix UAF bugs around echo skbs in multiple drivers
Misc:
- docs: move netdev-FAQ to the "process" section of the documentation
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJF3S0ACgkQMUZtbf5S
IruIvA/+NZx+c+fBBrbjOh63avRL7kYIqIDREf+v6lh4ZXmbrp22xalcjIdxgWeK
vAiYfYmzZblWAGkilcvPG3blCBc+9b+YE+pPJXFe60Huv3eYpjKfgTKwQOg/lIeM
8MfPP7eBwcJ/ltSTRtySRl9LYgyVcouP9rAVJavFVYrvuDYunwhfChswVfGCYon8
42O4nRwrtkTE1MjHD8HS3YxvwGlo+iIyhsxgG/gWx8F2xeIG22H6adzjDXcCQph8
air/awrJ4enYkVMRokGNfNppK9Z3vjJDX5xha3CREpvXNPe0F24cAE/L8XqyH7+r
/bXP5y9VC9mmEO7x4Le3VmDhOJGbCOtR89gTlevftDRdSIrbNHffZhbPW48tR7o8
NJFlhiSJb4HEMN0q7BmxnWaKlbZUlvLEXLuU5ytZE/G7i+nETULlunfZrCD4eNYH
gBGYhiob2I/XotJA9QzG/RDyaFwDaC/VARsyv37PSeBAl/yrEGAeP7DsKkKX/ayg
LM9ItveqHXK30J0xr3QJA8s49EkIYejjYR3l0hQ9esf9QvGK99dE/fo44Apf3C3A
Lz6XpnRc5Xd7tZ9Aopwb3FqOH6WR9Hq9Qlbk0qifsL/2sRbatpuZbbDK6L3CR3Ir
WFNcOoNbbqv85kCKFXFjj0jdpoNa9Yej8XFkMkVSkM3sHImYmYQ=
=5Bvy
-----END PGP SIGNATURE-----
Merge tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull more networking updates from Jakub Kicinski:
"Networking fixes and rethook patches.
Features:
- kprobes: rethook: x86: replace kretprobe trampoline with rethook
Current release - regressions:
- sfc: avoid null-deref on systems without NUMA awareness in the new
queue sizing code
Current release - new code bugs:
- vxlan: do not feed vxlan_vnifilter_dump_dev with non-vxlan devices
- eth: lan966x: fix null-deref on PHY pointer in timestamp ioctl when
interface is down
Previous releases - always broken:
- openvswitch: correct neighbor discovery target mask field in the
flow dump
- wireguard: ignore v6 endpoints when ipv6 is disabled and fix a leak
- rxrpc: fix call timer start racing with call destruction
- rxrpc: fix null-deref when security type is rxrpc_no_security
- can: fix UAF bugs around echo skbs in multiple drivers
Misc:
- docs: move netdev-FAQ to the 'process' section of the
documentation"
* tag 'net-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (57 commits)
vxlan: do not feed vxlan_vnifilter_dump_dev with non vxlan devices
openvswitch: Add recirc_id to recirc warning
rxrpc: fix some null-ptr-deref bugs in server_key.c
rxrpc: Fix call timer start racing with call destruction
net: hns3: fix software vlan talbe of vlan 0 inconsistent with hardware
net: hns3: fix the concurrency between functions reading debugfs
docs: netdev: move the netdev-FAQ to the process pages
docs: netdev: broaden the new vs old code formatting guidelines
docs: netdev: call out the merge window in tag checking
docs: netdev: add missing back ticks
docs: netdev: make the testing requirement more stringent
docs: netdev: add a question about re-posting frequency
docs: netdev: rephrase the 'should I update patchwork' question
docs: netdev: rephrase the 'Under review' question
docs: netdev: shorten the name and mention msgid for patch status
docs: netdev: note that RFC postings are allowed any time
docs: netdev: turn the net-next closed into a Warning
docs: netdev: move the patch marking section up
docs: netdev: minor reword
docs: netdev: replace references to old archives
...
The LLVM make variable allows a developer to quickly switch between the
GNU and LLVM tools. However, it does not handle versioned binaries, such
as the ones shipped by Debian, as LLVM=1 just defines the tool variables
with the unversioned binaries.
There was some discussion during the review of the patch that introduces
LLVM=1 around versioned binaries, ultimately coming to the conclusion
that developers can just add the folder that contains the unversioned
binaries to their PATH, as Debian's versioned suffixed binaries are
really just symlinks to the unversioned binaries in /usr/lib/llvm-#/bin:
$ realpath /usr/bin/clang-14
/usr/lib/llvm-14/bin/clang
$ PATH=/usr/lib/llvm-14/bin:$PATH make ... LLVM=1
However, that can be cumbersome to developers who are constantly testing
series with different toolchains and versions. It is simple enough to
support these versioned binaries directly in the Kbuild system by
allowing the developer to specify the version suffix with LLVM=, which
is shorter than the above suggestion:
$ make ... LLVM=-14
It does not change the meaning of LLVM=1 (which will continue to use
unversioned binaries) and it does not add too much additional complexity
to the existing $(LLVM) code, while allowing developers to quickly test
their series with different versions of the whole LLVM suite of tools.
Some developers may build LLVM from source but not add the binaries to
their PATH, as they may not want to use that toolchain systemwide.
Support those developers by allowing them to supply the directory that
the LLVM tools are available in, as it is no more complex to support
than the version suffix change above.
$ make ... LLVM=/path/to/llvm/
Update and reorder the documentation to reflect these new additions.
At the same time, notate that LLVM=0 is not the same as just omitting it
altogether, which has confused people in the past.
Link: https://lore.kernel.org/r/20200317215515.226917-1-ndesaulniers@google.com/
Link: https://lore.kernel.org/r/20220224151322.072632223@infradead.org/
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
This patch tests attaching an fentry prog to a struct_ops prog.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220330011502.2985292-1-kafai@fb.com
The seed_rng() function was written to work across lots of old kernels,
back when WireGuard used a big compatibility layer. Now that things have
evolved, we can vastly simplify this, by just marking the RNG as seeded.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
- Add perf support for nvdimm events, initially only for 'papr_scm'
devices.
- Deprecate the 'block aperture' support in libnvdimm, it only ever
existed in the specification, not in shipping product.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYjvuTwAKCRDfioYZHlFs
Z4JbAQCViArRj/yxffDB4kg4FNlOgAQbxPblC07E06UX8Jj2DgD+NY2YJIucz0qb
87v0+CorJjtoy5dJ9vxAR8keojT3RQ0=
=oAwm
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"The update for this cycle includes the deprecation of block-aperture
mode and a new perf events interface for the papr_scm nvdimm driver.
The perf events approach was acked by PeterZ.
- Add perf support for nvdimm events, initially only for 'papr_scm'
devices.
- Deprecate the 'block aperture' support in libnvdimm, it only ever
existed in the specification, not in shipping product"
* tag 'libnvdimm-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
nvdimm/blk: Fix title level
MAINTAINERS: remove section LIBNVDIMM BLK: MMIO-APERTURE DRIVER
powerpc/papr_scm: Fix build failure when
drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set
nvdimm/region: Delete nd_blk_region infrastructure
ACPI: NFIT: Remove block aperture support
nvdimm/namespace: Delete nd_namespace_blk
nvdimm/namespace: Delete blk namespace consideration in shared paths
nvdimm/blk: Delete the block-aperture window driver
nvdimm/region: Fix default alignment for small regions
docs: ABI: sysfs-bus-nvdimm: Document sysfs event format entries for nvdimm pmu
powerpc/papr_scm: Add perf interface support
drivers/nvdimm: Add perf interface to expose nvdimm performance stats
drivers/nvdimm: Add nvdimm pmu structure
Avoid pointer type value compared with 0 to make code clear. Reported by
coccicheck:
tools/testing/selftests/bpf/progs/map_ptr_kern.c:370:21-22:
WARNING comparing pointer to 0
tools/testing/selftests/bpf/progs/map_ptr_kern.c:397:21-22:
WARNING comparing pointer to 0
Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1648605588-19269-1-git-send-email-baihaowen@meizu.com
Andrii noticed that since f97b8b9bd6 ("bpftool: Fix a bug in subskeleton
code generation") the subskeleton code allows bpf_object__destroy_subskeleton
to overwrite the errno that subskeleton__open would return with. While this
is not currently an issue, let's make it future-proof.
This patch explicitly tracks err in subskeleton__open and skeleton__create
(i.e. calloc failure is explicitly ENOMEM) and ensures that errno is -err on
the error return path. The skeleton code had to be changed since maps and
progs codegen is shared with subskeletons.
Fixes: f97b8b9bd6 ("bpftool: Fix a bug in subskeleton code generation")
Signed-off-by: Delyan Kratunov <delyank@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/3b6bfbb770c79ae64d8de26c1c1bd9d53a4b85f8.camel@fb.com
- do not zero buffer in set_memory_decrypted (Kirill A. Shutemov)
- fix return value of dma-debug __setup handlers (Randy Dunlap)
- swiotlb cleanups (Robin Murphy)
- remove most remaining users of the pci-dma-compat.h API
(Christophe JAILLET)
- share the ABI header for the DMA map_benchmark with userspace
(Tian Tao)
- update the maintainer for DMA MAPPING BENCHMARK (Xiang Chen)
- remove CONFIG_DMA_REMAP (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmJDDgsLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYM9oBAAxm93DZCXsqektM2qJ34o1KCyfAhvTvZ1r38ab+cl
wJwmMPF6/S9MCj6XZEnCzUnXL//TnhcuYVztNpPTWqhx6QaqWmmx9yJKjoYAnHce
svVMef7iipn35w7hAPpiVR/AVwWyxQCkSC+5sgp6XX8mp7l7I3ajfO0fZ52JCcxw
12d4k1E0yjC096Kw8wXQv+rzmCAoQcK9Jj20COUO3rkgOr68ZIXse2HXUJjn76Fy
wym2rJfqJ9mdKrDHqphe1ntIzkcQNWx9xR0UVh7/e4p7Si5H8Lp8QWwC7Zw6Y2Gb
paeotIMu1uTKkcZI4K54J8PXRLA7PLrDSDFdxnKOsWNZU/inIwt9b11kr9FOaYqR
BLJ+w6bF1/PmM6q2gkOwNuoiJD5YQfwF7y+wi84VyaauM0J8ssIHYnVrCWXn0m1E
4veAkWasAYb1oaoNlDhmZEbpI+kcN3xwDyK1WbtHuGvR00oSvxl0d1viGTVXYfDA
k5rBjb7CovK8JIrFIJoMiDM4TvdauxL66IlEL7ohLDh6l1f09Q0+gsdVcAM0ObX6
zOkoulyHCFqkePvoH/xpyIrZZ9cHA228fZYC7QiBcxdWlD3dFMWkKvhajiSDQJSW
SAz94CeEDWn64Q462N+ecivKlLwz7j/TqOig5xU+/6UoMC/2a7+HIim+p6bjh8Pc
5Gg=
=C+Es
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:
- do not zero buffer in set_memory_decrypted (Kirill A. Shutemov)
- fix return value of dma-debug __setup handlers (Randy Dunlap)
- swiotlb cleanups (Robin Murphy)
- remove most remaining users of the pci-dma-compat.h API
(Christophe JAILLET)
- share the ABI header for the DMA map_benchmark with userspace
(Tian Tao)
- update the maintainer for DMA MAPPING BENCHMARK (Xiang Chen)
- remove CONFIG_DMA_REMAP (me)
* tag 'dma-mapping-5.18' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: benchmark: extract a common header file for map_benchmark definition
dma-debug: fix return value of __setup handlers
dma-mapping: remove CONFIG_DMA_REMAP
media: v4l2-pci-skeleton: Remove usage of the deprecated "pci-dma-compat.h" API
rapidio/tsi721: Remove usage of the deprecated "pci-dma-compat.h" API
sparc: Remove usage of the deprecated "pci-dma-compat.h" API
agp/intel: Remove usage of the deprecated "pci-dma-compat.h" API
alpha: Remove usage of the deprecated "pci-dma-compat.h" API
MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
swiotlb: simplify array allocation
swiotlb: tidy up includes
swiotlb: simplify debugfs setup
swiotlb: do not zero buffer in set_memory_decrypted()
llvm upstream patch ([1]) added to issue warning for code like
void test() {
int j = 0;
for (int i = 0; i < 1000; i++)
j++;
return;
}
This triggered several errors in selftests/bpf build since
compilation flag -Werror is used.
...
test_lpm_map.c:212:15: error: variable 'n_matches' set but not used [-Werror,-Wunused-but-set-variable]
size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
^
test_lpm_map.c:212:26: error: variable 'n_matches_after_delete' set but not used [-Werror,-Wunused-but-set-variable]
size_t i, j, n_matches, n_matches_after_delete, n_nodes, n_lookups;
^
...
prog_tests/get_stack_raw_tp.c:32:15: error: variable 'cnt' set but not used [-Werror,-Wunused-but-set-variable]
static __u64 cnt;
^
...
For test_lpm_map.c, 'n_matches'/'n_matches_after_delete' are changed to be volatile
in order to silent the warning. I didn't remove these two declarations since
they are referenced in a commented code which might be used by people in certain
cases. For get_stack_raw_tp.c, the variable 'cnt' is removed.
[1] https://reviews.llvm.org/D122271
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20220325200304.2915588-1-yhs@fb.com
Arnaldo reported perf compilation fail with:
$ make -k BUILD_BPF_SKEL=1 CORESIGHT=1 PYTHON=python3
...
In file included from util/bpf_counter.c:28:
/tmp/build/perf//util/bpf_skel/bperf_leader.skel.h: In function ‘bperf_leader_bpf__assert’:
/tmp/build/perf//util/bpf_skel/bperf_leader.skel.h:351:51: error: unused parameter ‘s’ [-Werror=unused-parameter]
351 | bperf_leader_bpf__assert(struct bperf_leader_bpf *s)
| ~~~~~~~~~~~~~~~~~~~~~~~~~^
cc1: all warnings being treated as errors
If there's nothing to generate in the new assert function,
we will get unused 's' warn/error, adding 'unused' attribute to it.
Fixes: 08d4dba6ae ("bpftool: Bpf skeletons assert type sizes")
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: https://lore.kernel.org/bpf/20220328083703.2880079-1-jolsa@kernel.org
14c174633f ("random: remove unused tracepoints") removed all the
tracepoints from drivers/char/random.c, one of which,
random:urandom_read, was used by stacktrace_build_id selftest to trigger
stack trace capture.
Fix breakage by switching to kprobing urandom_read() function.
Suggested-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20220325225643.2606-1-andrii@kernel.org
Current release - regressions:
- llc: only change llc->dev when bind() succeeds, fix null-deref
Current release - new code bugs:
- smc: fix a memory leak in smc_sysctl_net_exit()
- dsa: realtek: make interface drivers depend on OF
Previous releases - regressions:
- sched: act_ct: fix ref leak when switching zones
Previous releases - always broken:
- netfilter: egress: report interface as outgoing
- vsock/virtio: enable VQs early on probe and finish the setup
before using them
Misc:
- memcg: enable accounting for nft objects
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmJCSMMACgkQMUZtbf5S
Irt4eQ//fTAC/7mBmT8uoUJMZlrRckSDnJ/Y1ukgOQrjbwcgeRi0PK1cy2oGmU4w
mRZ8zhskVpmzodPuduCIdmsdE2PaWTCFoVRC52QH1HffCRbj1mRK9vrf94q0TP9+
jqzaIOhKyWKGMgYQGObIFbojnF4H1wm+tIXcEVWzxivS/2yY4W/3hdBIblBO++r5
c9vxO//qzGH1kGDCWfahuJSTvZBpQ3HTmjGLC1F8xTh8RkR7MGQyGCQ984j+DClb
PJJQXeV/Zoyxvrzv14MU5Ms9+lsgH2pyBdVzvN8p2QSwSaU8CsbbM05I4lB5mT/b
tGBYNreMmuQbXRxNVoxaZOTgQqEtTgH+AKJ9L0f2Es6Ftp5TTrFZZA97lO0/qzMj
NGbxa0p7tlNyOGKDxyUw6SB1+kqqgR2a6skk4XnQ6CAH7AvxSFOxvt63mjeJfCY7
+j5Lxtm+a/RpVt6Djsvpwq12lKiootcbEyMoUKxKeQ+4I08z6W6hoS1zjUDeMDM6
q8eDXsxpZgGF6k7x3eKkOWKLMVeQ1cv0CjGaCoTXCqtGZTixRll3v6I6/Oh405Gw
18fZkIC4TjRdXyfA23n7MzyukjOjmbzn5Kx01lfiMYFeiS/tMwFt/W+ka836j0R6
gzUdEHLEZdPN699WP4fRrxmIjGlGpEpl02WDEFrP+LDdFCYHCzY=
=sOIu
-----END PGP SIGNATURE-----
Merge tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter.
Current release - regressions:
- llc: only change llc->dev when bind() succeeds, fix null-deref
Current release - new code bugs:
- smc: fix a memory leak in smc_sysctl_net_exit()
- dsa: realtek: make interface drivers depend on OF
Previous releases - regressions:
- sched: act_ct: fix ref leak when switching zones
Previous releases - always broken:
- netfilter: egress: report interface as outgoing
- vsock/virtio: enable VQs early on probe and finish the setup before
using them
Misc:
- memcg: enable accounting for nft objects"
* tag 'net-5.18-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (39 commits)
Revert "selftests: net: Add tls config dependency for tls selftests"
net/smc: Send out the remaining data in sndbuf before close
net: move net_unlink_todo() out of the header
net: dsa: bcm_sf2_cfp: fix an incorrect NULL check on list iterator
net: bnxt_ptp: fix compilation error
selftests: net: Add tls config dependency for tls selftests
memcg: enable accounting for nft objects
net/sched: act_ct: fix ref leak when switching zones
net/smc: fix a memory leak in smc_sysctl_net_exit()
selftests: tls: skip cmsg_to_pipe tests with TLS=n
octeontx2-af: initialize action variable
net: sparx5: switchdev: fix possible NULL pointer dereference
net/x25: Fix null-ptr-deref caused by x25_disconnect
qlcnic: dcb: default to returning -EOPNOTSUPP
net: sparx5: depends on PTP_1588_CLOCK_OPTIONAL
net: hns3: fix phy can not link up when autoneg off and reset
net: hns3: add NULL pointer check for hns3_set/get_ringparam()
net: hns3: add netdev reset check for hns3_set_tunable()
net: hns3: clean residual vf config after disable sriov
net: hns3: add max order judgement for tx spare buffer
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmJBmMMACgkQUqAMR0iA
lPLeXBAAnAqK3rY+mberKcFKHLaNJ0O2Y7OMcCf5Xh8snnivgi9RYcqklSbxXQwm
hILa2oP6gUug16zhD2XVb5Mxic7MfgsN8mfy/eItMfEVs3KqUzHKSryTp6N1PA5x
DiQvC7Fg7NGYZs95prMCrFILwVrkLYiKlWGTmlWrz/MTfOOsbAjB9yv5bfalvlo+
A3+XpXxHfb/Wl2kXrUjTey61Rrk3gdgLhucrHVxttb9VPp1ODoLLLu4ePoN9CArA
fpGVUfeh1IDV3sUgwpGgXBwJFBsXxJ9ZYGnJzea0opNn8EgfwgIC97qTaa+GXX/j
bUJFPUNrGGEq99JbPgHmu+imXC1eFfCwxXK7zi6TR7mIOq6I/DfQxCLYUHZpFIMn
mt30wm21j2zVRsOt27frhjyXCSnts7HmOleBcd8NL+aIVKaOqamEOQrmPZPj8eH2
cx9gAphhFv6EDnr3Cj3SbpBrqf1pcxjVa9T2gfhJjtkLLyxR2ruvlRvnWNnaKJZZ
bC7OL74h6eAhJk1pwPcHW2BsABv3jWPzBrOYkjIhRWUY77UriWNKJ27Dd83cAVkw
7P6GbGfTbSCX7m2+0pEdKxc9hMshK2zyTLbu02PopD7yGBDkrcnkgpPGPVMDsj4c
44ANkVlLojBAE43fXXdRPfpSKKBa0pi6MO5WXORrWiY7PNZjAAw=
=PhGM
-----END PGP SIGNATURE-----
Merge tag 'livepatching-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Forced transitions block only to-be-removed livepatches [Chengming]
- Detect when ftrace handler could not be disabled in self-tests [David]
- Calm down warning from a static analyzer [Tom]
* tag 'livepatching-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
livepatch: Reorder to use before freeing a pointer
livepatch: Don't block removal of patches that are safe to unload
livepatch: Skip livepatch tests if ftrace cannot be configured
When using pthreads, one has to compile and link with -lpthread,
otherwise e.g. glibc is not guaranteed to be reentrant.
This replaces -lpthread.
Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Here is the big set of char/misc and other small driver subsystem
updates for 5.18-rc1.
Included in here are merges from driver subsystems which contain:
- iio driver updates and new drivers
- fsi driver updates
- fpga driver updates
- habanalabs driver updates and support for new hardware
- soundwire driver updates and new drivers
- phy driver updates and new drivers
- coresight driver updates
- icc driver updates
Individual changes include:
- mei driver updates
- interconnect driver updates
- new PECI driver subsystem added
- vmci driver updates
- lots of tiny misc/char driver updates
There will be two merge conflicts with your tree, one in MAINTAINERS
which is obvious to fix up, and one in drivers/phy/freescale/Kconfig
which also should be easy to resolve.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYkG3fQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykNEgCfaRG8CRxewDXOO4+GSeA3NGK+AIoAnR89donC
R4bgCjfg8BWIBcVVXg3/
=WWXC
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc and other driver updates from Greg KH:
"Here is the big set of char/misc and other small driver subsystem
updates for 5.18-rc1.
Included in here are merges from driver subsystems which contain:
- iio driver updates and new drivers
- fsi driver updates
- fpga driver updates
- habanalabs driver updates and support for new hardware
- soundwire driver updates and new drivers
- phy driver updates and new drivers
- coresight driver updates
- icc driver updates
Individual changes include:
- mei driver updates
- interconnect driver updates
- new PECI driver subsystem added
- vmci driver updates
- lots of tiny misc/char driver updates
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (556 commits)
firmware: google: Properly state IOMEM dependency
kgdbts: fix return value of __setup handler
firmware: sysfb: fix platform-device leak in error path
firmware: stratix10-svc: add missing callback parameter on RSU
arm64: dts: qcom: add non-secure domain property to fastrpc nodes
misc: fastrpc: Add dma handle implementation
misc: fastrpc: Add fdlist implementation
misc: fastrpc: Add helper function to get list and page
misc: fastrpc: Add support to secure memory map
dt-bindings: misc: add fastrpc domain vmid property
misc: fastrpc: check before loading process to the DSP
misc: fastrpc: add secure domain support
dt-bindings: misc: add property to support non-secure DSP
misc: fastrpc: Add support to get DSP capabilities
misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP
misc: fastrpc: separate fastrpc device from channel context
dt-bindings: nvmem: brcm,nvram: add basic NVMEM cells
dt-bindings: nvmem: make "reg" property optional
nvmem: brcm_nvram: parse NVRAM content into NVMEM cells
nvmem: dt-bindings: Fix the error of dt-bindings check
...
selftest net tls test cases need TLS=m without this the test hangs.
Enabling config TLS solves this problem and runs to complete.
- CONFIG_TLS=m
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
New features:
perf ftrace:
- Add -n/--use-nsec option to the 'latency' subcommand.
Default: usecs:
$ sudo perf ftrace latency -T dput -a sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 2098375 | ############################# |
1 - 2 us | 61 | |
2 - 4 us | 33 | |
4 - 8 us | 13 | |
8 - 16 us | 124 | |
16 - 32 us | 123 | |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 1 | |
256 - 512 us | 0 | |
Better granularity with nsec:
$ sudo perf ftrace latency -T dput -a -n sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 0 | |
1 - 2 ns | 0 | |
2 - 4 ns | 0 | |
4 - 8 ns | 0 | |
8 - 16 ns | 0 | |
16 - 32 ns | 0 | |
32 - 64 ns | 0 | |
64 - 128 ns | 1163434 | ############## |
128 - 256 ns | 914102 | ############# |
256 - 512 ns | 884 | |
512 - 1024 ns | 613 | |
1 - 2 us | 31 | |
2 - 4 us | 17 | |
4 - 8 us | 7 | |
8 - 16 us | 123 | |
16 - 32 us | 83 | |
perf lock:
- Add -c/--combine-locks option to merge lock instances in the same class into
a single entry.
# perf lock report -c
Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)
rcu_read_lock 251225 0 0 0 0 0
hrtimer_bases.lock 39450 0 0 0 0 0
&sb->s_type->i_l... 10301 1 662 662 662 662
ptlock_ptr(page) 10173 2 701 1402 760 642
&(ei->i_block_re... 8732 0 0 0 0 0
&xa->xa_lock 8088 0 0 0 0 0
&base->lock 6705 0 0 0 0 0
&p->pi_lock 5549 0 0 0 0 0
&dentry->d_lockr... 5010 4 1274 5097 1844 789
&ep->lock 3958 0 0 0 0 0
- Add -F/--field option to customize the list of fields to output:
$ perf lock report -F contended,wait_max -k avg_wait
Name contended max wait(ns) avg wait(ns)
slock-AF_INET6 1 23543 23543
&lruvec->lru_lock 5 18317 11254
slock-AF_INET6 1 10379 10379
rcu_node_1 1 2104 2104
&dentry->d_lockr... 1 1844 1844
&dentry->d_lockr... 1 1672 1672
&newf->file_lock 15 2279 1025
&dentry->d_lockr... 1 792 792
- Add --synth=no option for record, as there is no need to symbolize,
lock names comes from the tracepoints.
perf record:
- Threaded recording, opt-in, via the new --threads command line option.
- Improve AMD IBS (Instruction-Based Sampling) error handling messages.
perf script:
- Add 'brstackinsnlen' field (use it with -F) for branch stacks.
- Output branch sample type in 'perf script'.
perf report:
- Add "addr_from" and "addr_to" sort dimensions.
- Print branch stack entry type in 'perf report --dump-raw-trace'
- Fix symbolization for chrooted workloads.
Hardware tracing:
Intel PT:
- Add CFE (Control Flow Event) and EVD (Event Data) packets support.
- Add MODE.Exec IFLAG bit support.
Explanation about these features from the "Intel® 64 and IA-32 architectures
software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C,
3D, and 4" PDF at:
https://cdrdv2.intel.com/v1/dl/getContent/671200
At page 3951:
<quote>
32.2.4
Event Trace is a capability that exposes details about the asynchronous
events, when they are generated, and when their corresponding software
event handler completes execution. These include:
o Interrupts, including NMI and SMI, including the interrupt vector when
defined.
o Faults, exceptions including the fault vector.
— Page faults additionally include the page fault address, when in context.
o Event handler returns, including IRET and RSM.
o VM exits and VM entries.¹
— VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields.
INIT and SIPI events.
o TSX aborts, including the abort status returned for the RTM instructions.
o Shutdown.
Additionally, it provides indication of the status of the Interrupt Flag
(IF), to indicate when interrupts are masked.
</quote>
ARM CoreSight:
- Use advertised caps/min_interval as default sample_period on ARM spe.
- Update deduction of TRCCONFIGR register for branch broadcast on ARM's CoreSight ETM.
Vendor Events (JSON):
Intel:
- Update events and metrics for:
Alderlake, Broadwell, Broadwell DE, BroadwellX, CascadelakeX, Elkhartlake,
Bonnell, Goldmont, GoldmontPlus, Westmere EP-DP, Haswell, HaswellX,
Icelake, IcelakeX, Ivybridge, Ivytown, Jaketown, Knights Landing,
Nehalem EP, Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
Tigerlake, TremontX, Westmere EP-SP, Westmere EX.
ARM:
- Add support for HiSilicon CPA PMU aliasing.
perf stat:
- Fix forked applications enablement of counters.
- The 'slots' should only be printed on a different order than the one specified
on the command line when 'topdown' events are present, fix it.
Miscellaneous:
- Sync msr-index, cpufeatures header files with the kernel sources.
- Stop using some deprecated libbpf APIs in 'perf trace'.
- Fix some spelling mistakes.
- Refactor the maps pointers usage to pave the way for using refcount debugging.
- Only offer the --tui option on perf top, report and annotate when perf was
built with libslang.
- Don't mention --to-ctf in 'perf data --help' when not linking with the required
library, libbabeltrace.
- Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci.
- Enhance the matching of sub-commands abbreviations:
'perf c2c rec' -> 'perf c2c record'
'perf c2c recport -> error
- Set build-id using build-id header on new mmap records.
- Fix generation of 'perf --version' string.
perf test:
- Add test for the arm_spe event.
- Add test to check unwinding using fame-pointer (fp) mode on arm64.
- Make metric testing more robust in 'perf test'.
- Add error message for unsupported branch stack cases.
libperf:
- Add API for allocating new thread map array.
- Fix typo in perf_evlist__open() failure error messages in libperf tests.
perf c2c:
- Replace bitmap_weight() with bitmap_empty() where appropriate.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYj8viwAKCRCyPKLppCJ+
J8K3AQDpN45P4/TWJxVWhZlvYzJtWDSboXHZJfmBiEd4Xu2zbwD7BFW02f1ATHPr
dGBFXxRQQufBIqfE+OQXG59Awp1m8wE=
=1l8S
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools updates from Arnaldo Carvalho de Melo:
"New features:
perf ftrace:
- Add -n/--use-nsec option to the 'latency' subcommand.
Default: usecs:
$ sudo perf ftrace latency -T dput -a sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 2098375 | ############################# |
1 - 2 us | 61 | |
2 - 4 us | 33 | |
4 - 8 us | 13 | |
8 - 16 us | 124 | |
16 - 32 us | 123 | |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 1 | |
256 - 512 us | 0 | |
Better granularity with nsec:
$ sudo perf ftrace latency -T dput -a -n sleep 1
# DURATION | COUNT | GRAPH |
0 - 1 us | 0 | |
1 - 2 ns | 0 | |
2 - 4 ns | 0 | |
4 - 8 ns | 0 | |
8 - 16 ns | 0 | |
16 - 32 ns | 0 | |
32 - 64 ns | 0 | |
64 - 128 ns | 1163434 | ############## |
128 - 256 ns | 914102 | ############# |
256 - 512 ns | 884 | |
512 - 1024 ns | 613 | |
1 - 2 us | 31 | |
2 - 4 us | 17 | |
4 - 8 us | 7 | |
8 - 16 us | 123 | |
16 - 32 us | 83 | |
perf lock:
- Add -c/--combine-locks option to merge lock instances in the same
class into a single entry.
# perf lock report -c
Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns)
rcu_read_lock 251225 0 0 0 0 0
hrtimer_bases.lock 39450 0 0 0 0 0
&sb->s_type->i_l... 10301 1 662 662 662 662
ptlock_ptr(page) 10173 2 701 1402 760 642
&(ei->i_block_re... 8732 0 0 0 0 0
&xa->xa_lock 8088 0 0 0 0 0
&base->lock 6705 0 0 0 0 0
&p->pi_lock 5549 0 0 0 0 0
&dentry->d_lockr... 5010 4 1274 5097 1844 789
&ep->lock 3958 0 0 0 0 0
- Add -F/--field option to customize the list of fields to output:
$ perf lock report -F contended,wait_max -k avg_wait
Name contended max wait(ns) avg wait(ns)
slock-AF_INET6 1 23543 23543
&lruvec->lru_lock 5 18317 11254
slock-AF_INET6 1 10379 10379
rcu_node_1 1 2104 2104
&dentry->d_lockr... 1 1844 1844
&dentry->d_lockr... 1 1672 1672
&newf->file_lock 15 2279 1025
&dentry->d_lockr... 1 792 792
- Add --synth=no option for record, as there is no need to symbolize,
lock names comes from the tracepoints.
perf record:
- Threaded recording, opt-in, via the new --threads command line
option.
- Improve AMD IBS (Instruction-Based Sampling) error handling
messages.
perf script:
- Add 'brstackinsnlen' field (use it with -F) for branch stacks.
- Output branch sample type in 'perf script'.
perf report:
- Add "addr_from" and "addr_to" sort dimensions.
- Print branch stack entry type in 'perf report --dump-raw-trace'
- Fix symbolization for chrooted workloads.
Hardware tracing:
Intel PT:
- Add CFE (Control Flow Event) and EVD (Event Data) packets support.
- Add MODE.Exec IFLAG bit support.
Explanation about these features from the "Intel® 64 and IA-32
architectures software developer’s manual combined volumes: 1, 2A,
2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at:
https://cdrdv2.intel.com/v1/dl/getContent/671200
At page 3951:
"32.2.4
Event Trace is a capability that exposes details about the
asynchronous events, when they are generated, and when their
corresponding software event handler completes execution. These
include:
o Interrupts, including NMI and SMI, including the interrupt
vector when defined.
o Faults, exceptions including the fault vector.
- Page faults additionally include the page fault address,
when in context.
o Event handler returns, including IRET and RSM.
o VM exits and VM entries.¹
- VM exits include the values written to the “exit reason”
and “exit qualification” VMCS fields. INIT and SIPI events.
o TSX aborts, including the abort status returned for the RTM
instructions.
o Shutdown.
Additionally, it provides indication of the status of the
Interrupt Flag (IF), to indicate when interrupts are masked"
ARM CoreSight:
- Use advertised caps/min_interval as default sample_period on ARM
spe.
- Update deduction of TRCCONFIGR register for branch broadcast on
ARM's CoreSight ETM.
Vendor Events (JSON):
Intel:
- Update events and metrics for: Alderlake, Broadwell, Broadwell DE,
BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont,
GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX,
Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP,
Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX,
Tigerlake, TremontX, Westmere EP-SP, and Westmere EX.
ARM:
- Add support for HiSilicon CPA PMU aliasing.
perf stat:
- Fix forked applications enablement of counters.
- The 'slots' should only be printed on a different order than the
one specified on the command line when 'topdown' events are
present, fix it.
Miscellaneous:
- Sync msr-index, cpufeatures header files with the kernel sources.
- Stop using some deprecated libbpf APIs in 'perf trace'.
- Fix some spelling mistakes.
- Refactor the maps pointers usage to pave the way for using refcount
debugging.
- Only offer the --tui option on perf top, report and annotate when
perf was built with libslang.
- Don't mention --to-ctf in 'perf data --help' when not linking with
the required library, libbabeltrace.
- Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by
array_size.cocci.
- Enhance the matching of sub-commands abbreviations:
'perf c2c rec' -> 'perf c2c record'
'perf c2c recport -> error
- Set build-id using build-id header on new mmap records.
- Fix generation of 'perf --version' string.
perf test:
- Add test for the arm_spe event.
- Add test to check unwinding using fame-pointer (fp) mode on arm64.
- Make metric testing more robust in 'perf test'.
- Add error message for unsupported branch stack cases.
libperf:
- Add API for allocating new thread map array.
- Fix typo in perf_evlist__open() failure error messages in libperf
tests.
perf c2c:
- Replace bitmap_weight() with bitmap_empty() where appropriate"
* tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (143 commits)
perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages
perf python: Add perf_env stubs that will be needed in evsel__open_strerror()
perf tools: Enhance the matching of sub-commands abbreviations
libperf tests: Fix typo in perf_evlist__open() failure error messages
tools arm64: Import cputype.h
perf lock: Add -F/--field option to control output
perf lock: Extend struct lock_key to have print function
perf lock: Add --synth=no option for record
tools headers cpufeatures: Sync with the kernel sources
tools headers cpufeatures: Sync with the kernel sources
perf stat: Fix forked applications enablement of counters
tools arch x86: Sync the msr-index.h copy with the kernel sources
perf evsel: Make evsel__env() always return a valid env
perf build-id: Fix spelling mistake "Cant" -> "Can't"
perf header: Fix spelling mistake "could't" -> "couldn't"
perf script: Add 'brstackinsnlen' for branch stacks
perf parse-events: Move slots only with topdown
perf ftrace latency: Update documentation
perf ftrace latency: Add -n/--use-nsec option
perf tools: Fix version kernel tag
...
* A small cleanup of unused variable in __next_mem_pfn_range_in_zone
* Initial test suite to simulate memblock behaviour in userspace
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmI9bD4THHJwcHRAbGlu
dXguaWJtLmNvbQAKCRA5A4Ymyw79kXwhB/wNXR1wUb/eD3eKD+aNa2KMY5+8csjD
ghJph8wQmM9U9hsLViv3/M/H5+bY/s0riZNulKYrcmzW2BgIzF2ebcoqgfQ89YGV
bLx7lMJGxG/lCglur9m6KnOF89//Owq6Vfk7Jd6jR/F+43JO/3+5siCbTo6NrbVw
3DjT/WzvaICA646foyFTh8WotnIRbB2iYX1k/vIA3gwJ2C6n7WwoKzxU3ulKMUzg
hVlhcuTVnaV4mjFBbl23wC7i4l9dgPO9M4ZrTtlEsNHeV6uoFYRObwy6/q/CsBqI
avwgV0bQDch+QuCteUXcqIcnBpcUAfGxgiqp2PYX4lXA4gYTbo7plTna
=IemP
-----END PGP SIGNATURE-----
Merge tag 'memblock-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
Pull memblock updates from Mike Rapoport:
"Test suite and a small cleanup:
- A small cleanup of unused variable in __next_mem_pfn_range_in_zone
- Initial test suite to simulate memblock behaviour in userspace"
* tag 'memblock-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: (27 commits)
memblock tests: Add TODO and README files
memblock tests: Add memblock_alloc_try_nid tests for bottom up
memblock tests: Add memblock_alloc_try_nid tests for top down
memblock tests: Add memblock_alloc_from tests for bottom up
memblock tests: Add memblock_alloc_from tests for top down
memblock tests: Add memblock_alloc tests for bottom up
memblock tests: Add memblock_alloc tests for top down
memblock tests: Add simulation of physical memory
memblock tests: Split up reset_memblock function
memblock tests: Fix testing with 32-bit physical addresses
memblock: __next_mem_pfn_range_in_zone: remove unneeded local variable nid
memblock tests: Add memblock_free tests
memblock tests: Add memblock_add_node test
memblock tests: Add memblock_remove tests
memblock tests: Add memblock_reserve tests
memblock tests: Add memblock_add tests
memblock tests: Add memblock reset function
memblock tests: Add skeleton of the memblock simulator
tools/include: Add debugfs.h stub
tools/include: Add pfn.h stub
...
coarse grained, hardware based, forward edge Control-Flow-Integrity mechanism
where any indirect CALL/JMP must target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation is
limited to 2 instructions (and typically fewer) on branch targets not starting
with ENDBR. CET-IBT also limits speculation of the next sequential instruction
after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides, as
described above, speculation limits itself.
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEv3OU3/byMaA0LqWJdkfhpEvA5LoFAmI/LI8VHHBldGVyekBp
bmZyYWRlYWQub3JnAAoJEHZH4aRLwOS6ZnkP/2QCgQLTu6oRxv9O020CHwlaSEeD
1Hoy3loum5q5hAi1Ik3dR9p0H5u64c9qbrBVxaFoNKaLt5GKrtHaDSHNk2L/CFHX
urpH65uvTLxbyZzcahkAahoJ71XU+m7PcrHLWMunw9sy10rExYVsUOlFyoyG6XCF
BDCNZpdkC09ZM3vwlWGMZd5Pp+6HcZNPyoV9tpvWAS2l+WYFWAID7mflbpQ+tA8b
y/hM6b3Ud0rT2ubuG1iUpopgNdwqQZ+HisMPGprh+wKZkYwS2l8pUTrz0MaBkFde
go7fW16kFy2HQzGm6aIEBmfcg0palP/mFVaWP0zS62LwhJSWTn5G6xWBr3yxSsht
9gWCiI0oDZuTg698MedWmomdG2SK6yAuZuqmdKtLLoWfWgviPEi7TDFG/cKtZdAW
ag8GM8T4iyYZzpCEcWO9GWbjo6TTGq30JBQefCBG47GjD0csv2ubXXx0Iey+jOwT
x3E8wnv9dl8V9FSd/tMpTFmje8ges23yGrWtNpb5BRBuWTeuGiBPZED2BNyyIf+T
dmewi2ufNMONgyNp27bDKopY81CPAQq9cVxqNm9Cg3eWPFnpOq2KGYEvisZ/rpEL
EjMQeUBsy/C3AUFAleu1vwNnkwP/7JfKYpN00gnSyeQNZpqwxXBCKnHNgOMTXyJz
beB/7u2KIUbKEkSN
=jZfK
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 CET-IBT (Control-Flow-Integrity) support from Peter Zijlstra:
"Add support for Intel CET-IBT, available since Tigerlake (11th gen),
which is a coarse grained, hardware based, forward edge
Control-Flow-Integrity mechanism where any indirect CALL/JMP must
target an ENDBR instruction or suffer #CP.
Additionally, since Alderlake (12th gen)/Sapphire-Rapids, speculation
is limited to 2 instructions (and typically fewer) on branch targets
not starting with ENDBR. CET-IBT also limits speculation of the next
sequential instruction after the indirect CALL/JMP [1].
CET-IBT is fundamentally incompatible with retpolines, but provides,
as described above, speculation limits itself"
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/branch-history-injection.html
* tag 'x86_core_for_5.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
kvm/emulate: Fix SETcc emulation for ENDBR
x86/Kconfig: Only allow CONFIG_X86_KERNEL_IBT with ld.lld >= 14.0.0
x86/Kconfig: Only enable CONFIG_CC_HAS_IBT for clang >= 14.0.0
kbuild: Fixup the IBT kbuild changes
x86/Kconfig: Do not allow CONFIG_X86_X32_ABI=y with llvm-objcopy
x86: Remove toolchain check for X32 ABI capability
x86/alternative: Use .ibt_endbr_seal to seal indirect calls
objtool: Find unused ENDBR instructions
objtool: Validate IBT assumptions
objtool: Add IBT/ENDBR decoding
objtool: Read the NOENDBR annotation
x86: Annotate idtentry_df()
x86,objtool: Move the ASM_REACHABLE annotation to objtool.h
x86: Annotate call_on_stack()
objtool: Rework ASM_REACHABLE
x86: Mark __invalid_creds() __noreturn
exit: Mark do_group_exit() __noreturn
x86: Mark stop_this_cpu() __noreturn
objtool: Ignore extra-symbol code
objtool: Rename --duplicate to --lto
...
These are negative tests, testing TLS code rejects certain
operations. They won't pass without TLS enabled, pure TCP
accepts those operations.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Fixes: d87d67fd61 ("selftests: tls: test splicing cmsgs")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve the error message returned on failed perf_event_open() on AMD
systems when using IBS (Instruction-Based Sampling).
Output of executing 'perf record -e ibs_op// true' as a non root user
BEFORE this patch (perf will add the 'u' modifier at the end to exclude
kernel/hypervisor sampling):
The sys_perf_event_open() syscall returned with 22 (Invalid argument)for event (ibs_op//u).
/bin/dmesg | grep -i perf may provide additional information.
Output after:
AMD IBS can't exclude kernel events. Try running at a higher privilege level.
Output of executing 'sudo perf record -e ibs_op// true' BEFORE this patch:
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (ibs_op//).
/bin/dmesg | grep -i perf may provide additional information.
Output after:
Error:
Invalid event (ibs_op//) in per-thread mode, enable system wide with '-a'.
Folowing the suggestion:
$ sudo perf record -a -e ibs_op// true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.664 MB perf.data (194 samples) ]
$
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: João Martins <joao.m.martins@oracle.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20220322221517.2510440-12-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The AMD IBS error message enhancements will use these, but we're not
using evsel__open_strerror() in the python binding so far.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We support short command 'rec*' for 'record' and 'rep*' for 'report' in
lots of sub-commands, but the matching is not quite strict currnetly.
It may be puzzling sometime, like we mis-type a 'recport' to report but
it will perform 'record' in fact without any message.
To fix this, add a check to ensure that the short cmd is valid prefix
of the real command.
Committer testing:
[root@quaco ~]# perf c2c re sleep 1
Usage: perf c2c {record|report}
-v, --verbose be more verbose (show counter open errors, etc)
# perf c2c rec sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data (16 samples) ]
# perf c2c recport sleep 1
Usage: perf c2c {record|report}
-v, --verbose be more verbose (show counter open errors, etc)
# perf c2c record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.038 MB perf.data (15 samples) ]
# perf c2c records sleep 1
Usage: perf c2c {record|report}
-v, --verbose be more verbose (show counter open errors, etc)
#
Signed-off-by: Wei Li <liwei391@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rui Xiang <rui.xiang@huawei.com>
Link: http://lore.kernel.org/lkml/20220325092032.2956161-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch corrects typos in error messages. I should be "evlist", not
"evsel" as the function that fails is perf_evlist__open().
Fixes: 3ce311afb5 ("libperf: Move to tools/lib/perf")
Fixes: a7f3713f6b ("libperf tests: Add test_stat_multiplexing test")
Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220325043829.224045-2-nakamura.shun@fujitsu.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Bring-in the kernel's arch/arm64/include/asm/cputype.h into tools/
for arm64 to make use of all the core-type definitions in perf.
Replace sysreg.h with the version already imported into tools/.
Committer notes:
Added an entry to tools/perf/check-headers.sh, so that we get notified
when the original file in the kernel sources gets modified.
Tester notes:
LGTM. I did the testing on both my x86 and Arm64 platforms, thanks for
the fixing up.
Signed-off-by: Ali Saidi <alisaidi@amazon.com>
Tested-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrew Kilroy <andrew.kilroy@arm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick.Forrington@arm.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20220324183323.31414-2-alisaidi@amazon.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The purpose of the last test case is to test VXLAN encapsulation and
decapsulation when the underlay lookup takes place in a non-default VRF.
This is achieved by enslaving the physical device of the tunnel to a
VRF.
The binding of the VXLAN UDP socket to the VRF happens when the VXLAN
device itself is opened, not when its physical device is opened. This
was also mentioned in the cited commit ("tests that moving the underlay
from a VRF to another works when down/up the VXLAN interface"), but the
test did something else.
Fix it by reopening the VXLAN device instead of its physical device.
Before:
# ./test_vxlan_under_vrf.sh
Checking HV connectivity [ OK ]
Check VM connectivity through VXLAN (underlay in the default VRF) [ OK ]
Check VM connectivity through VXLAN (underlay in a VRF) [FAIL]
After:
# ./test_vxlan_under_vrf.sh
Checking HV connectivity [ OK ]
Check VM connectivity through VXLAN (underlay in the default VRF) [ OK ]
Check VM connectivity through VXLAN (underlay in a VRF) [ OK ]
Fixes: 03f1c26b1c ("test/net: Add script for VXLAN underlay in a VRF")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220324200514.1638326-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Highlights:
- new drivers:
- AMD Host System Management Port (HSMP)
- Intel Software Defined Silicon
- removed drivers (functionality folded into other drivers):
- intel_cht_int33fe_microb
- surface3_button
- amd-pmc:
- s2idle bug-fixes
- Support for AMD Spill to DRAM STB feature
- hp-wmi:
- Fix SW_TABLET_MODE detection method (and other fixes)
- Support omen thermal profile policy v1
- serial-multi-instantiate:
- Add SPI device support
- Add support for CS35L41 amplifiers used in new laptops
- think-lmi:
- syfs-class-firmware-attributes Certificate authentication support
- thinkpad_acpi:
- Fixes + quirks
- Add platform_profile support on AMD based ThinkPads
- x86-android-tablets
- Improve Asus ME176C / TF103C support
- Support Nextbook Ares 8, Lenovo Tab 2 830 and 1050 tablets
- Lots of various other small fixes and hardware-id additions
The following is an automated git shortlog grouped by driver:
ACPI / scan:
- Create platform device for CS35L41
ACPI / x86:
- Add support for LPS0 callback handler
ALSA:
- hda/realtek: Add support for HP Laptops
Add AMD system management interface:
- Add AMD system management interface
Add Intel Software Defined Silicon driver:
- Add Intel Software Defined Silicon driver
Documentation:
- syfs-class-firmware-attributes: Lenovo Certificate support
- Add x86/amd_hsmp driver
ISST:
- Fix possible circular locking dependency detected
Input:
- soc_button_array - add support for Microsoft Surface 3 (MSHW0028) buttons
Merge remote-tracking branch 'pdx86/platform-drivers-x86-pinctrl-pmu_clk' into review-hans-gcc12:
- Merge remote-tracking branch 'pdx86/platform-drivers-x86-pinctrl-pmu_clk' into review-hans-gcc12
Merge tag 'platform-drivers-x86-serial-multi-instantiate-1' into review-hans:
- Merge tag 'platform-drivers-x86-serial-multi-instantiate-1' into review-hans
Replace acpi_bus_get_device():
- Replace acpi_bus_get_device()
amd-pmc:
- Only report STB errors when STB enabled
- Drop CPU QoS workaround
- Output error codes in messages
- Move to later in the suspend process
- Validate entry into the deepest state on resume
- uninitialized variable in amd_pmc_s2d_init()
- Set QOS during suspend on CZN w/ timer wakeup
- Add support for AMD Spill to DRAM STB feature
- Correct usage of SMU version
- Make amd_pmc_stb_debugfs_fops static
asus-tf103c-dock:
- Make 2 global structs static
asus-wmi:
- Fix regression when probing for fan curve control
hp-wmi:
- support omen thermal profile policy v1
- Changing bios_args.data to be dynamically allocated
- Fix 0x05 error code reported by several WMI calls
- Fix SW_TABLET_MODE detection method
- Fix hp_wmi_read_int() reporting error (0x05)
huawei-wmi:
- check the return value of device_create_file()
i2c-multi-instantiate:
- Rename it for a generic serial driver name
int3472:
- Add terminator to gpiod_lookup_table
intel-uncore-freq:
- fix uncore_freq_common_init() error codes
intel_cht_int33fe:
- Move to intel directory
- Drop Lenovo Yogabook YB1-X9x code
- Switch to DMI modalias based loading
intel_crystal_cove_charger:
- Fix IRQ masking / unmasking
lg-laptop:
- Move setting of battery charge limit to common location
pinctrl:
- baytrail: Add pinconf group + function for the pmu_clk
platform/dcdbas:
- move EXPORT_SYMBOL after function
platform/surface:
- Remove Surface 3 Button driver
- surface3-wmi: Simplify resource management
- Replace acpi_bus_get_device()
- Reinstate platform dependency
platform/x86/intel-uncore-freq:
- Split common and enumeration part
platform/x86/intel/uncore-freq:
- Display uncore current frequency
- Use sysfs API to create attributes
- Move to uncore-frequency folder
selftests:
- sdsi: test sysfs setup
serial-multi-instantiate:
- Add SPI support
- Reorganize I2C functions
spi:
- Add API to count spi acpi resources
- Support selection of the index of the ACPI Spi Resource before alloc
- Create helper API to lookup ACPI info for spi device
- Make spi_alloc_device and spi_add_device public again
surface:
- surface3_power: Fix battery readings on batteries without a serial number
think-lmi:
- Certificate authentication support
thinkpad_acpi:
- consistently check fan_get_status return.
- Don't use test_bit on an integer
- Fix compiler warning about uninitialized err variable
- clean up dytc profile convert
- Add PSC mode support
- Add dual fan probe
- Add dual-fan quirk for T15g (2nd gen)
- Fix incorrect use of platform profile on AMD platforms
- Add quirk for ThinkPads without a fan
tools arch x86:
- Add Intel SDSi provisiong tool
touchscreen_dmi:
- Add info for the RWC NANOTE P8 AY07J 2-in-1
x86-android-tablets:
- Depend on EFI and SPI
- Lenovo Yoga Tablet 2 830/1050 sound support
- Workaround Lenovo Yoga Tablet 2 830/1050 poweroff hang
- Add Lenovo Yoga Tablet 2 830 / 1050 data
- Fix EBUSY error when requesting IOAPIC IRQs
- Minor charger / fuel-gauge improvements
- Add Nextbook Ares 8 data
- Add IRQ to Asus ME176C accelerometer info
- Add lid-switch gpio-keys pdev to Asus ME176C + TF103C
- Add x86_android_tablet_get_gpiod() helper
- Add Asus ME176C/TF103C charger and fuelgauge props
- Add battery swnode support
- Trivial typo fix for MODULE_AUTHOR
- Fix the buttons on CZC P10T tablet
- Constify the gpiod_lookup_tables arrays
- Add an init() callback to struct x86_dev_info
- Add support for disabling ACPI _AEI handlers
- Correct crystal_cove_charger module name
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmI8SjEUHGhkZWdvZWRl
QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9wYUwf/cdUMPFy5cwpHq1LuqGy+PxVCRHCe
71PFd2Ycj+HGOtrt66RxSiCC1Seb4tylr7FvudToDaqWjlBf5n6LhpDudg4ds7Qw
lCuRlaXTIrF7p3nOLIsWvJPRqacMG79KkRM62MLTS2evtRYjbnKvFzNPJPzr8827
1AhCakE92S8gkR5lUZYYHtsaz9rZ4z4TrEtjO6GdlbL2bDw0l18dNNwdMomfVpNS
bBIHIDLeufDuMJ4PxIHlE5MB3AuZAuc0HTJWihozyJX/h5FMGI6qVm0/s9RAfHgX
XdMCpADtS/JjHCmkFgLZYIzvXTxwQVZRo5VO0Wrv5Mis6gSpxJXCd0aKlA==
=1x9/
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver updates from Hans de Goede:
"New drivers:
- AMD Host System Management Port (HSMP)
- Intel Software Defined Silicon
Removed drivers (functionality folded into other drivers):
- intel_cht_int33fe_microb
- surface3_button
amd-pmc:
- s2idle bug-fixes
- Support for AMD Spill to DRAM STB feature
hp-wmi:
- Fix SW_TABLET_MODE detection method (and other fixes)
- Support omen thermal profile policy v1
serial-multi-instantiate:
- Add SPI device support
- Add support for CS35L41 amplifiers used in new laptops
think-lmi:
- syfs-class-firmware-attributes Certificate authentication support
thinkpad_acpi:
- Fixes + quirks
- Add platform_profile support on AMD based ThinkPads
x86-android-tablets:
- Improve Asus ME176C / TF103C support
- Support Nextbook Ares 8, Lenovo Tab 2 830 and 1050 tablets
Lots of various other small fixes and hardware-id additions"
* tag 'platform-drivers-x86-v5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (60 commits)
platform/x86: think-lmi: Certificate authentication support
Documentation: syfs-class-firmware-attributes: Lenovo Certificate support
platform/x86: amd-pmc: Only report STB errors when STB enabled
platform/x86: amd-pmc: Drop CPU QoS workaround
platform/x86: amd-pmc: Output error codes in messages
platform/x86: amd-pmc: Move to later in the suspend process
ACPI / x86: Add support for LPS0 callback handler
platform/x86: thinkpad_acpi: consistently check fan_get_status return.
platform/x86: hp-wmi: support omen thermal profile policy v1
platform/x86: hp-wmi: Changing bios_args.data to be dynamically allocated
platform/x86: hp-wmi: Fix 0x05 error code reported by several WMI calls
platform/x86: hp-wmi: Fix SW_TABLET_MODE detection method
platform/x86: hp-wmi: Fix hp_wmi_read_int() reporting error (0x05)
platform/x86: amd-pmc: Validate entry into the deepest state on resume
platform/x86: thinkpad_acpi: Don't use test_bit on an integer
platform/x86: thinkpad_acpi: Fix compiler warning about uninitialized err variable
platform/x86: thinkpad_acpi: clean up dytc profile convert
platform/x86: x86-android-tablets: Depend on EFI and SPI
platform/x86: amd-pmc: uninitialized variable in amd_pmc_s2d_init()
platform/x86: intel-uncore-freq: fix uncore_freq_common_init() error codes
...
And use it to print output for each key field. No functional change
intended and the output should be identical.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220323230259.288494-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf lock command has nothing to symbolize and lock names come
from the tracepoint. Moreover, kernel symbols are available even the
--synth=no option is given.
This will reduce the startup time by avoiding unnecessary synthesis.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220323230259.288494-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge yet more updates from Andrew Morton:
"This is the material which was staged after willystuff in linux-next.
Subsystems affected by this patch series: mm (debug, selftests,
pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise),
and selftests"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits)
selftests: kselftest framework: provide "finished" helper
mm: madvise: MADV_DONTNEED_LOCKED
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
mm: generalize ARCH_HAS_FILTER_PGPROT
mm: unmap_mapping_range_tree() with i_mmap_rwsem shared
mm: warn on deleting redirtied only if accounted
mm/huge_memory: remove stale locking logic from __split_huge_pmd()
mm/huge_memory: remove stale page_trans_huge_mapcount()
mm/swapfile: remove stale reuse_swap_page()
mm/khugepaged: remove reuse_swap_page() usage
mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
mm: streamline COW logic in do_swap_page()
mm: slightly clarify KSM logic in do_swap_page()
mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
mm: optimize do_wp_page() for exclusive pages in the swapcache
mm/huge_memory: make is_transparent_hugepage() static
userfaultfd/selftests: enable hugetlb remap and remove event testing
selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
mm: enable MADV_DONTNEED for hugetlb mappings
kasan: disable LOCKDEP when printing reports
...
* Support for Sv57-based virtual memory.
* Various improvements for the MicroChip PolarFire SOC and the
associated Icicle dev board, which should allow upstream kernels to
boot without any additional modifications.
* An improved memmove() implementation.
* Support for the new Ssconfpmf and SBI PMU extensions, which allows for
a much more useful perf implementation on RISC-V systems.
* Support for restartable sequences.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmI96FcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYiQBFD/425+6xmoOru6Wiki3Ja0fqQToNrQyW
IbmE/8AxUP7UxMvJSNzvQm8deXgklzvmegXCtnjwZZins971vMzzDSI83k/zn8I7
m5thVC9z01BjodV+pvIp/44hS6FesolOLzkVHksX0Zh6h0iidrc34Qf5HrqvvNfN
CZ/4K1+E9ig5r9qZp4WdvocCXj+FzwF/30GjKoW9vwA599CEG/dCo+TNN9GKD6XS
k+xiUGwlIRA+kCLSPFCi7ev9XPr1tCmQB7uB8Igcvr7Y3mWl8HKfajQVXBnXNRC3
ifbDxpx1elJiLPyf7Rza8jIDwDhLQdxBiwPgDgP9h9R4x0uF4efq8PzLzFlFmaE+
9Z9thfykBb5dXYDFDje9bAOXvKnGk7Iqxdsz0qWo/ChEQawX1+11bJb0TNN8QTT9
YvlQfUXgb1dmEcj5yG2uVE1Y8L7YNLRMsZU3W3FbmPJZoavSOuU4x0yCGeLyv597
76af3nuBJ5v80Db97gu6St+HIACeevKflsZUf/8GS/p7d1DlvmrWzQUMEycxPTG9
UZpZak58jh7AqQ9JbLnavhwmeacY50vpZOw6QHGAHSN+8daCPlOHDG7Ver7Z+kNj
+srJ7iKMvLnnaEjGNgavfxdqTOme1gv4LWs/JdHYMkpphqVN92xBDJnhXTPRVZiQ
0x39vK86qtB46A==
=Omc6
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for Sv57-based virtual memory.
- Various improvements for the MicroChip PolarFire SOC and the
associated Icicle dev board, which should allow upstream kernels to
boot without any additional modifications.
- An improved memmove() implementation.
- Support for the new Ssconfpmf and SBI PMU extensions, which allows
for a much more useful perf implementation on RISC-V systems.
- Support for restartable sequences.
* tag 'riscv-for-linus-5.18-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (36 commits)
rseq/selftests: Add support for RISC-V
RISC-V: Add support for restartable sequence
MAINTAINERS: Add entry for RISC-V PMU drivers
Documentation: riscv: Remove the old documentation
RISC-V: Add sscofpmf extension support
RISC-V: Add perf platform driver based on SBI PMU extension
RISC-V: Add RISC-V SBI PMU extension definitions
RISC-V: Add a simple platform driver for RISC-V legacy perf
RISC-V: Add a perf core library for pmu drivers
RISC-V: Add CSR encodings for all HPMCOUNTERS
RISC-V: Remove the current perf implementation
RISC-V: Improve /proc/cpuinfo output for ISA extensions
RISC-V: Do no continue isa string parsing without correct XLEN
RISC-V: Implement multi-letter ISA extension probing framework
RISC-V: Extract multi-letter extension names from "riscv, isa"
RISC-V: Minimal parser for "riscv, isa" strings
RISC-V: Correctly print supported extensions
riscv: Fixed misaligned memory access. Fixed pointer comparison.
MAINTAINERS: update riscv/microchip entry
riscv: dts: microchip: add new peripherals to icicle kit device tree
...
- Raise minimum supported machine generation to z10, which comes with
various cleanups and code simplifications (usercopy/spectre
mitigation/etc).
- Rework extables and get rid of anonymous out-of-line fixups.
- Page table helpers cleanup. Add set_pXd()/set_pte() helper
functions. Covert pte_val()/pXd_val() macros to functions.
- Optimize kretprobe handling by avoiding extra kprobe on
__kretprobe_trampoline.
- Add support for CEX8 crypto cards.
- Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.
- Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
group sections which simplifies kpatch support.
- Always use the packed stack layout and extend kernel unwinder tests.
- Add sanity checks for ftrace code patching.
- Add s390dbf debug log for the vfio_ap device driver.
- Various virtual vs physical address confusion fixes.
- Various small fixes and improvements all over the code.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmI94dsACgkQjYWKoQLX
FBiaCggAm9xYJ06Qt9c+T9B7aA4Lt50w7Bnxqx1/Q7UHQQgDpkNhKzI1kt/xeKY4
JgZQ9lJC4YRLlyfIVzffLI2DWGbl8BcTpuRWVLhPI5D2yHZBXr2ARe7IGFJueddy
MVqU/r+U3H0r3obQeUc4TSrHtSRX7eQZWIoVuDU75b9fCniee/bmGZqs6yXPXXh4
pTZQ/gsIhF/o6eBJLEXLjUAcIasxCk15GXWXmkaSwKHAhfYiintwGmtKqQ8etCvw
17vdlTjA4ce+3ooD/hXGPa8TqeiGKsIB2Xr89x/48f1eJyp2zPJZ1ZvAUBHJBCNt
b4sF4ql8303Lj7Be+LeqdlbXfa5PZg==
=meZf
-----END PGP SIGNATURE-----
Merge tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Raise minimum supported machine generation to z10, which comes with
various cleanups and code simplifications (usercopy/spectre
mitigation/etc).
- Rework extables and get rid of anonymous out-of-line fixups.
- Page table helpers cleanup. Add set_pXd()/set_pte() helper functions.
Covert pte_val()/pXd_val() macros to functions.
- Optimize kretprobe handling by avoiding extra kprobe on
__kretprobe_trampoline.
- Add support for CEX8 crypto cards.
- Allow to trigger AP bus rescan via writing to /sys/bus/ap/scans.
- Add CONFIG_EXPOLINE_EXTERN option to build the kernel without COMDAT
group sections which simplifies kpatch support.
- Always use the packed stack layout and extend kernel unwinder tests.
- Add sanity checks for ftrace code patching.
- Add s390dbf debug log for the vfio_ap device driver.
- Various virtual vs physical address confusion fixes.
- Various small fixes and improvements all over the code.
* tag 's390-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (69 commits)
s390/test_unwind: add kretprobe tests
s390/kprobes: Avoid additional kprobe in kretprobe handling
s390: convert ".insn" encoding to instruction names
s390: assume stckf is always present
s390/nospec: move to single register thunks
s390: raise minimum supported machine generation to z10
s390/uaccess: Add copy_from/to_user_key functions
s390/nospec: align and size extern thunks
s390/nospec: add an option to use thunk-extern
s390/nospec: generate single register thunks if possible
s390/pci: make zpci_set_irq()/zpci_clear_irq() static
s390: remove unused expoline to BC instructions
s390/irq: use assignment instead of cast
s390/traps: get rid of magic cast for per code
s390/traps: get rid of magic cast for program interruption code
s390/signal: fix typo in comments
s390/asm-offsets: remove unused defines
s390/test_unwind: avoid build warning with W=1
s390: remove .fixup section
s390/bpf: encode register within extable entry
...
- Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
- Add support for livepatch to 32-bit.
- Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
- Merge vdso64 and vdso32 into a single directory.
- Fix build errors with newer binutils.
- Add support for UADDR64 relocations, which are emitted by some toolchains. This allows
powerpc to build with the latest lld.
- Fix (another) potential userspace r13 corruption in transactional memory handling.
- Cleanups of function descriptor handling & related fixes to LKDTM.
Thanks to: Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh Kumar K.V, Anton
Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar Chowdhury, Cédric Le Goater, Chen
Jingwen, Christophe JAILLET, Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel
Henrique Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu Hua, Haren
Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason Wang, Jeremy Kerr, Joachim
Wiberg, Jordan Niethe, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan
Srinivasan, Mamatha Inamdar, Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal
Suchanek, Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin, Nour-eddine
Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy Dunlap, Ritesh Harjani, Rohan
McLure, Russell Currey, Sachin Sant, Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain,
Thierry Reding, Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean, Wedson
Almeida Filho, YueHaibing.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmI9TtQTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLp2D/0dwoliEJubRCfoawYUGhxTRZuo6ZYw
EQzprOiFA/MtrZyPfbrX/FwxeeetzQWysaw2r5JAuQwx5Jb7Od9dNIrVmueFEktC
hD4fkO8YT+QuOD3Xhp/rDQTImdw4fkeofIjnWIqEAtz0XGInmiRQKOnojVe/Po7f
72Yi1u80LxYBAnkN/Hhpmi/BsVmu0Nh3wELu+JZopQXjINj4RyD49ayCBSLbmiNc
uo7oYzJ0/WsZHNTpX9kAzzCq+XmI3dKZPyf2AOCvoRxJTmUPCRZF9QCwsmQFikiI
vZOdz4fI5e+C0aYJj8ODmWMrXiS+JUQdEShjGg9t9K6EN8idC8joKWpAuXjTA9KN
kRjzXX7AvjxaMEGbLe8gjU0PmEjY3eSzMOy15Oc/C0DRRswXRzrXdx2AF+/J6bQb
MWMM4aCKfcYs5/TENkEnV0xpbOCOy4ikHM1KZbxvVrShvjSlNIL9XTOnl/pNK5BJ
XSSI2mfnjKkbI1+l0KQ4NBXIRTo6HLpu5jwY3Xh97Tq7kaEfqDbO5p2P2HoOCiLa
ZjdzmpP99zM6wnqUSj+lyvjob7btyhoq6TKmPtxfKbR6OaSfRJ760BCJ5y15Y9Hc
rHey4Y/NL7LqsVYFZxi4/T6Ncq1hNeYr2Fiis4gH+/1zjr6Cd4othnvw3Slaxhst
AaHpN3pyx1QI6g==
=8r2c
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
"Livepatch support for 32-bit is probably the standout new feature,
otherwise mostly just lots of bits and pieces all over the board.
There's a series of commits cleaning up function descriptor handling,
which touches a few other arches as well as LKDTM. It has acks from
Arnd, Kees and Helge.
Summary:
- Enforce kernel RO, and implement STRICT_MODULE_RWX for 603.
- Add support for livepatch to 32-bit.
- Implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS.
- Merge vdso64 and vdso32 into a single directory.
- Fix build errors with newer binutils.
- Add support for UADDR64 relocations, which are emitted by some
toolchains. This allows powerpc to build with the latest lld.
- Fix (another) potential userspace r13 corruption in transactional
memory handling.
- Cleanups of function descriptor handling & related fixes to LKDTM.
Thanks to Abdul Haleem, Alexey Kardashevskiy, Anders Roxell, Aneesh
Kumar K.V, Anton Blanchard, Arnd Bergmann, Athira Rajeev, Bhaskar
Chowdhury, Cédric Le Goater, Chen Jingwen, Christophe JAILLET,
Christophe Leroy, Corentin Labbe, Daniel Axtens, Daniel Henrique
Barboza, David Dai, Fabiano Rosas, Ganesh Goudar, Guo Zhengkui, Hangyu
Hua, Haren Myneni, Hari Bathini, Igor Zhbanov, Jakob Koschel, Jason
Wang, Jeremy Kerr, Joachim Wiberg, Jordan Niethe, Julia Lawall, Kajol
Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mamatha Inamdar,
Maxime Bizon, Maxim Kiselev, Maxim Kochetkov, Michal Suchanek,
Nageswara R Sastry, Nathan Lynch, Naveen N. Rao, Nicholas Piggin,
Nour-eddine Taleb, Paul Menzel, Ping Fang, Pratik R. Sampat, Randy
Dunlap, Ritesh Harjani, Rohan McLure, Russell Currey, Sachin Sant,
Segher Boessenkool, Shivaprasad G Bhat, Sourabh Jain, Thierry Reding,
Tobias Waldekranz, Tyrel Datwyler, Vaibhav Jain, Vladimir Oltean,
Wedson Almeida Filho, and YueHaibing"
* tag 'powerpc-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (179 commits)
powerpc/pseries: Fix use after free in remove_phb_dynamic()
powerpc/time: improve decrementer clockevent processing
powerpc/time: Fix KVM host re-arming a timer beyond decrementer range
powerpc/tm: Fix more userspace r13 corruption
powerpc/xive: fix return value of __setup handler
powerpc/64: Add UADDR64 relocation support
powerpc: 8xx: fix a return value error in mpc8xx_pic_init
powerpc/ps3: remove unneeded semicolons
powerpc/64: Force inlining of prevent_user_access() and set_kuap()
powerpc/bitops: Force inlining of fls()
powerpc: declare unmodified attribute_group usages const
powerpc/spufs: Fix build warning when CONFIG_PROC_FS=n
powerpc/secvar: fix refcount leak in format_show()
powerpc/64e: Tie PPC_BOOK3E_64 to PPC_FSL_BOOK3E
powerpc: Move C prototypes out of asm-prototypes.h
powerpc/kexec: Declare kexec_paca static
powerpc/smp: Declare current_set static
powerpc: Cleanup asm-prototypes.c
powerpc/ftrace: Use STK_GOT in ftrace_mprofile.S
powerpc/ftrace: Regroup PPC64 specific operations in ftrace_mprofile.S
...
Instead of having each time that wants to use ksft_exit() have to figure
out the internals of kselftest.h, add the helper ksft_finished() that
makes sure the passes, xfails, and skips are equal to the test plan count.
Link: https://lkml.kernel.org/r/20220201013717.2464392-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With MADV_DONTNEED support added to hugetlb mappings, mremap testing can
also be enabled for hugetlb.
Modify the tests to use madvise MADV_DONTNEED and MADV_REMOVE instead of
fallocate hole puch for releasing hugetlb pages.
Link: https://lkml.kernel.org/r/20220215002348.128823-4-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that MADV_DONTNEED support for hugetlb is enabled, add corresponding
tests. MADV_REMOVE has been enabled for some time, but no tests exist so
add them as well.
Link: https://lkml.kernel.org/r/20220215002348.128823-3-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
PAGE_SIZE is not 4096 in many configurations, particularly ppc64 uses 64K
pages in majority of cases.
Add helpers to detect PAGE_SIZE and PAGE_SHIFT dynamically.
Without this tests are broken w.r.t reading /proc/self/pagemap
if (pread(pagemap_fd, ent, sizeof(ent),
(uintptr_t)ptr >> (PAGE_SHIFT - 3)) != sizeof(ent))
err(2, "read pagemap");
Link: https://lkml.kernel.org/r/20220307054355.149820-2-aneesh.kumar@linux.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When viewing page owner information, we may want to cull blocks of
information with our own rules. So it is important to enhance culling
function to provide the support for customizing culling rules.
Therefore, following adjustments are made:
1. Add --cull option to support the culling of blocks of information
with user-defined culling rules.
./page_owner_sort <input> <output> --cull=<rules>
./page_owner_sort <input> <output> --cull <rules>
<rules> is a single argument in the form of a comma-separated list to
specify individual culling rules, by the sequence of keys k1,k2, ....
Mixed use of abbreviated and complete-form of keys is allowed.
For reference, please see the document(Documentation/vm/page_owner.rst).
Now, assuming two blocks in the input file are as follows:
Page allocated via order 0, mask xxxx, pid 1, tgid 1 (task_name_demo)
PFN xxxx
prep_new_page+0xd0/0xf8
get_page_from_freelist+0x4a0/0x1290
__alloc_pages+0x168/0x340
alloc_pages+0xb0/0x158
Page allocated via order 0, mask xxxx, pid 32, tgid 32 (task_name_demo)
PFN xxxx
prep_new_page+0xd0/0xf8
get_page_from_freelist+0x4a0/0x1290
__alloc_pages+0x168/0x340
alloc_pages+0xb0/0x158
If we want to cull the blocks by stacktrace and task command name, we can
use this command:
./page_owner_sort <input> <output> --cull=stacktrace,name
The output would be like:
2 times, 2 pages, task_comm_name: task_name_demo
prep_new_page+0xd0/0xf8
get_page_from_freelist+0x4a0/0x1290
__alloc_pages+0x168/0x340
alloc_pages+0xb0/0x158
As we can see, these two blocks are culled successfully, for they share
the same pid and task command name.
However, if we want to cull the blocks by pid, stacktrace and task command
name, we can this command:
./page_owner_sort <input> <output> --cull=stacktrace,name,pid
The output would be like:
1 times, 1 pages, PID 1, task_comm_name: task_name_demo
prep_new_page+0xd0/0xf8
get_page_from_freelist+0x4a0/0x1290
__alloc_pages+0x168/0x340
alloc_pages+0xb0/0x158
1 times, 1 pages, PID 32, task_comm_name: task_name_demo
prep_new_page+0xd0/0xf8
get_page_from_freelist+0x4a0/0x1290
__alloc_pages+0x168/0x340
alloc_pages+0xb0/0x158
As we can see, these two blocks are failed to cull, for their PIDs are
different.
2. Add explanations of --cull options to the document.
This work is coauthored by
Yixuan Cao
Shenghong Han
Yinan Zhang
Chongxi Zhao
Yuhong Feng
Link: https://lkml.kernel.org/r/20220312145834.624-1-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Cc: Yuhong Feng <yuhongf@szu.edu.cn>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When viewing page owner information, we may also need to select the blocks
by PID, TGID or task command name, which helps to get more accurate page
allocation information as needed.
Therefore, following adjustments are made:
1. Add three new options, including --pid, --tgid and --name, to support
the selection of information blocks by a specific pid, tgid and task
command name. In addtion, multiple options are allowed to be used at
the same time.
./page_owner_sort [input] [output] --pid <PID>
./page_owner_sort [input] [output] --tgid <TGID>
./page_owner_sort [input] [output] --name <TASK_COMMAND_NAME>
Assuming a scenario when a multi-threaded program, ./demo (PID =
5280), is running, and ./demo creates a child process (PID = 5281).
$ps
PID TTY TIME CMD
5215 pts/0 00:00:00 bash
5280 pts/0 00:00:00 ./demo
5281 pts/0 00:00:00 ./demo
5282 pts/0 00:00:00 ps
It would be better to filter out the records with tgid=5280 and the
task name "demo" when debugging the parent process, and the specific
usage is
./page_owner_sort [input] [output] --tgid 5280 --name demo
2. Add explanations of three new options, including --pid, --tgid and
--name, to the document.
This work is coauthored by
Shenghong Han <hanshenghong2019@email.szu.edu.cn>,
Yixuan Cao <caoyixuan2019@email.szu.edu.cn>,
Yinan Zhang <zhangyinan2019@email.szu.edu.cn>,
Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>,
Yuhong Feng <yuhongf@szu.edu.cn>.
Link: https://lkml.kernel.org/r/1646835223-7584-1-git-send-email-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When viewing page owner information, we may also need to the block to be
sorted by task command name. Therefore, the following adjustments are
made:
1. Add a member variable to record task command name of block.
2. Add a new -n option to sort the information of blocks by task command
name.
3. Add -n option explanation in the document.
Link: https://lkml.kernel.org/r/20220306030640.43054-2-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: <zhaochongxi2019@email.szu.edu.cn>
Cc: <hanshenghong2019@email.szu.edu.cn>
Cc: <zhangyinan2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The following adjustments are made:
1. Instead of using another array to cull the blocks after sorting,
reuse the old array. So there is no need to malloc a new array.
2. When enabling '-f' option to filter out the blocks which have been
released, only add those have not been released in the list, rather
than add all of blocks in the list and then do the filtering when
printing the result.
3. When enabling '-c' option to cull the blocks by comparing
stacktrace, print the stacetrace rather than the total block.
Link: https://lkml.kernel.org/r/20220306030640.43054-1-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: <hanshenghong2019@email.szu.edu.cn>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: <zhangyinan2019@email.szu.edu.cn>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When the "page owner" information is read, the information sorted
by TGID is expected.
As a result, the following adjustments have been made:
1. Add a new -P option to sort the information of blocks by TGID in
ascending order.
2. Adjust the order of member variables in block_list strust to avoid
one 4 byte hole.
3. Add -P option explanation in the document.
Link: https://lkml.kernel.org/r/20220301151438.166118-3-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a security check after using malloc() to allocate memory.
Link: https://lkml.kernel.org/r/20220301151438.166118-2-yejiajian2018@email.szu.edu.cn
Signed-off-by: Jiajian Ye <yejiajian2018@email.szu.edu.cn>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed a discrepancy between the usage method and the code logic.
If we enable the -f option, it should be "Filter out the information of
blocks whose memory has been released".
Link: https://lkml.kernel.org/r/20220219143106.2805-1-caoyixuan2019@email.szu.edu.cn
Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I noticed that there is two invalid lines of duplicate code. It's better
to delete it.
Link: https://lkml.kernel.org/r/20211213095743.3630-1-caoyixuan2019@email.szu.edu.cn
Signed-off-by: Yixuan Cao <caoyixuan2019@email.szu.edu.cn>
Cc: Mark Brown <broonie@kernel.org>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1) There is an unused variable. It's better to delete it.
2) One case is missing in the usage().
Link: https://lkml.kernel.org/r/20211213164518.2461-1-hanshenghong2019@email.szu.edu.cn
Signed-off-by: Shenghong Han <hanshenghong2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When viewing the page owner information, we expect that the information
can be sorted by PID, so that we can quickly combine PID with the program
to check the information together.
We also expect that the information can be sorted by time. Time sorting
helps to view the running status of the program according to the time
interval when the program hangs up.
Finally, we hope to pass the page_ owner_ Sort. C can reduce part of the
output and only output the plate information whose memory has not been
released, which can make us locate the problem of the program faster.
Therefore, the following adjustments have been made:
1. Add the static functions search_pattern and check_regcomp to
improve the cleanliness.
2. Add member attributes and their corresponding sorting methods. In
terms of comparison time, int will overflow because the data of ull is
too large, so the ternary operator is used
3. Add the -f parameter to filter out the information of blocks whose
memory has not been released
Link: https://lkml.kernel.org/r/20211206165653.5093-1-zhaochongxi2019@email.szu.edu.cn
Signed-off-by: Chongxi Zhao <zhaochongxi2019@email.szu.edu.cn>
Reviewed-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Culling by comparing stacktrace would casue loss of some information. For
example, if there exists 2 blocks which have the same stacktrace and the
different head info
Page allocated via order 0, mask 0x108c48(...), pid 73696,
ts 1578829190639010 ns, free_ts 1576583851324450 ns
prep_new_page+0x80/0xb8
get_page_from_freelist+0x924/0xee8
__alloc_pages+0x138/0xc18
alloc_pages+0x80/0xf0
__page_cache_alloc+0x90/0xc8
Page allocated via order 0, mask 0x108c48(...), pid 61806,
ts 1354113726046100 ns, free_ts 1354104926841400 ns
prep_new_page+0x80/0xb8
get_page_from_freelist+0x924/0xee8
__alloc_pages+0x138/0xc18
alloc_pages+0x80/0xf0
__page_cache_alloc+0x90/0xc8
After culling, it would be like this
2 times, 2 pages:
Page allocated via order 0, mask 0x108c48(...), pid 73696,
ts 1578829190639010 ns, free_ts 1576583851324450 ns
prep_new_page+0x80/0xb8
get_page_from_freelist+0x924/0xee8
__alloc_pages+0x138/0xc18
alloc_pages+0x80/0xf0
__page_cache_alloc+0x90/0xc8
The info of second block missed. So, add -c to turn on culling by
stacktrace. By default, it will cull by txt.
Link: https://lkml.kernel.org/r/20211129145658.2491-1-zhangyinan2019@email.szu.edu.cn
Signed-off-by: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Cc: Changhee Han <ch0.han@lge.com>
Cc: Sean Anderson <seanga2@gmail.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the ability to sort by stacktraces. This is helpful when
comparing multiple dumps of page_owner taken at different times, since
blocks will not be reordered if they were allocated/free'd.
Link: https://lkml.kernel.org/r/20211124193709.1805776-2-seanga2@gmail.com
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Changhee Han <ch0.han@lge.com>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The contents of page_owner have changed to include more information than
the stack trace. On a modern kernel, the blocks look like
Page allocated via order 0, mask 0x0(), pid 1, ts 165564237 ns, free_ts 0 ns
register_early_stack+0x4b/0x90
init_page_owner+0x39/0x250
kernel_init_freeable+0x11e/0x242
kernel_init+0x16/0x130
Sorting by the contents of .txt will result in almost no repeated pages,
as the pid, ts, and free_ts will almost never be the same. Instead,
sort by the contents of the stack trace, which we assume to be whatever
is after the first line.
[seanga2@gmail.com: fix NULL-pointer dereference when comparing stack traces]
Link: https://lkml.kernel.org/r/20211125162653.1855958-1-seanga2@gmail.com
Link: https://lkml.kernel.org/r/20211124193709.1805776-1-seanga2@gmail.com
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Cc: Changhee Han <ch0.han@lge.com>
Cc: Tang Bin <tangbin@cmss.chinamobile.com>
Cc: Zhang Shengju <zhangshengju@cmss.chinamobile.com>
Cc: Zhenliang Wei <weizhenliang@huawei.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Yinan Zhang <zhangyinan2019@email.szu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add a driver for 'struct cxl_memdev' objects responsible for CXL.mem
operation as distinct from 'cxl_pci' mailbox operations. Its primary
responsibility is enumerating an endpoint 'struct cxl_port' and all the
'struct cxl_port' instances between an endpoint and the CXL platform
root.
- Add a driver for 'struct cxl_port' objects responsible for enumerating
and operating all Host-managed Device Memory (HDM) decoder resources
between the platform-level CXL memory description, all intervening host
bridges / switches, and the HDM resources in endpoints.
- Update the cxl_pci driver to validate CXL.mem operation precursors to
HDM decoder operation like ready-polling, and legacy CXL 1.1 DVSEC
based CXL.mem configuration.
- Add basic lockdep coverage for usage of device_lock() on CXL subsystem
objects similar to what exists for LIBNVDIMM. Include a compile-time
switch for which subsystem to validate at run-time.
- Update cxl_test to emulate a one level switch topology.
- Document a "Theory of Operation" for the subsystem.
- Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs
- Include miscellaneous fixes for spec / QEMU CXL emulation
compatibility and static analysis reports.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYjpX6AAKCRDfioYZHlFs
ZzyxAQCztxAXj7mzkm1Qt5zZz4e7p/6sR49B03jBTfPtrEF9kQEAl9R15WVt6U+o
Ooof1XhRic3kT6e8zS3ZVKHzGduYxwM=
=mR94
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
"This development cycle extends the subsystem to discover CXL resources
throughout a CXL/PCIe switch topology and respond to hot add/remove
events anywhere in that topology.
This is more foundational infrastructure in preparation for dynamic
memory region provisioning support. Recall that CXL memory regions, as
the new "Theory of Operation" section of
Documentation/driver-api/cxl/memory-devices.rst describes, bring
storage volume striping semantics to memory.
The hot add/remove behavior is validated with extensions to the
cxl_test unit test environment and this test in the cxl-cli test
suite:
https://github.com/pmem/ndctl/blob/djbw/for-74/cxl/test/cxl-topology.sh
Summary:
- Add a driver for 'struct cxl_memdev' objects responsible for
CXL.mem operation as distinct from 'cxl_pci' mailbox operations.
Its primary responsibility is enumerating an endpoint 'struct
cxl_port' and all the 'struct cxl_port' instances between an
endpoint and the CXL platform root.
- Add a driver for 'struct cxl_port' objects responsible for
enumerating and operating all Host-managed Device Memory (HDM)
decoder resources between the platform-level CXL memory
description, all intervening host bridges / switches, and the HDM
resources in endpoints.
- Update the cxl_pci driver to validate CXL.mem operation precursors
to HDM decoder operation like ready-polling, and legacy CXL 1.1
DVSEC based CXL.mem configuration.
- Add basic lockdep coverage for usage of device_lock() on CXL
subsystem objects similar to what exists for LIBNVDIMM. Include a
compile-time switch for which subsystem to validate at run-time.
- Update cxl_test to emulate a one level switch topology.
- Document a "Theory of Operation" for the subsystem.
- Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs
- Include miscellaneous fixes for spec / QEMU CXL emulation
compatibility and static analysis reports"
* tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (48 commits)
cxl/core/port: Fix NULL but dereferenced coccicheck error
cxl/port: Hold port reference until decoder release
cxl/port: Fix endpoint refcount leak
cxl/core: Fix cxl_device_lock() class detection
cxl/core/port: Fix unregister_port() lock assertion
cxl/regs: Fix size of CXL Capability Header Register
cxl/core/port: Handle invalid decoders
cxl/core/port: Fix / relax decoder target enumeration
tools/testing/cxl: Add a physical_node link
tools/testing/cxl: Enumerate mock decoders
tools/testing/cxl: Mock one level of switches
tools/testing/cxl: Fix root port to host bridge assignment
tools/testing/cxl: Mock dvsec_ranges()
cxl/core/port: Add endpoint decoders
cxl/core: Move target_list out of base decoder attributes
cxl/mem: Add the cxl_mem driver
cxl/core/port: Add switch port enumeration
cxl/memdev: Add numa_node attribute
cxl/pci: Emit device serial number
cxl/pci: Implement wait for media active
...
Merge more updates from Andrew Morton:
"Various misc subsystems, before getting into the post-linux-next
material.
41 patches.
Subsystems affected by this patch series: procfs, misc, core-kernel,
lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
taskstats, panic, kcov, resource, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
kernel/resource: fix kfree() of bootmem memory again
kcov: properly handle subsequent mmap calls
kcov: split ioctl handling into locked and unlocked parts
panic: move panic_print before kmsg dumpers
panic: add option to dump all CPUs backtraces in panic_print
docs: sysctl/kernel: add missing bit to panic_print
taskstats: remove unneeded dead assignment
kasan: no need to unset panic_on_warn in end_report()
ubsan: no need to unset panic_on_warn in ubsan_epilogue()
panic: unset panic_on_warn inside panic()
docs: kdump: add scp example to write out the dump file
docs: kdump: update description about sysfs file system support
arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
cgroup: use irqsave in cgroup_rstat_flush_locked().
fat: use pointer to simple type in put_user()
minix: fix bug when opening a file with O_DIRECT
...
To pick the changes from:
fa31a4d669 ("x86/cpufeatures: Put the AMX macros in the word 18 block")
7b8f40b3de ("x86/cpu: Add definitions for the Intel Hardware Feedback Interface")
This only causes these perf files to be rebuilt:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Borislav Petkov <bp@suse.de>
Cc: Jim Mattson <jmattson@google.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/lkml/YjzZPxdyLjf76gM+@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick the changes in:
7c1ef59145 ("x86/cpufeatures: Re-enable ENQCMD")
That causes only these 'perf bench' objects to rebuild:
CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o
And addresses these perf build warnings:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h'
diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h
Cc: Borislav Petkov <bp@suse.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/lkml/YjzX+PknzGoKaGMX@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I have run into the following issue:
# perf stat -a -e new_pmu/INSTRUCTION_7/ -- mytest -c1 7
Performance counter stats for 'system wide':
0 new_pmu/INSTRUCTION_7/
0.000366428 seconds time elapsed
#
The new PMU for s390 counts the execution of certain CPU instructions.
The root cause is the extremely small run time of the mytest program. It
just executes some assembly instructions and then exits.
In above invocation the instruction is executed exactly one time (-c1
option). The PMU is expected to report this one time execution by a
counter value of one, but fails to do so in some cases, not all.
Debugging reveals the invocation of the child process is done
*before* the counter events are installed and enabled.
Tracing reveals that sometimes the child process starts and exits before
the event is installed on all CPUs. The more CPUs the machine has, the
more often this miscount happens.
Fix this by reversing the start of the work load after the events have
been installed on the specified CPUs. Now the comment also matches the
code.
Output after:
# perf stat -a -e new_pmu/INSTRUCTION_7/ -- mytest -c1 7
Performance counter stats for 'system wide':
1 new_pmu/INSTRUCTION_7/
0.000366428 seconds time elapsed
#
Now the correct result is reported rock solid all the time regardless
how many CPUs are online.
Reviewers notes:
Jiri:
Right, without -a the event has enable_on_exec so the race does not
matter, but it's a problem for system wide with fork.
Namhyung:
Agreed. Also we may move the enable_counters() and the clock code out of
the if block to be shared with the else block.
Fixes: acf2892270 ("perf stat: Use perf_evlist__prepare/start_workload()")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20220317155346.577384-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes from these csets:
7b8f40b3de ("x86/cpu: Add definitions for the Intel Hardware Feedback Interface")
That cause no changes to tooling:
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
$ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
$ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
$ diff -u before after
$
Just silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'
diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Link: https://lore.kernel.org/lkml/YjzVt8CjAORAsTCo@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout
the stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower
iTLB pressure and prevents identity mapping huge pages from
getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop
the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called
from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling
of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
/DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
=ELpe
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical sections
that last too long for very large guests backed by 4 KiB SPTEs.
- Zap invalid and defunct roots asynchronously via concurrency-managed
work queue.
- Allowing yielding when zapping TDP MMU roots in response to the root's
last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the paging
structure now holds RCU as a proxy for all vCPUs running in the guest,
i.e. to prolongs the grace period on their behalf. It then kicks the
the vCPUs out of guest mode before doing rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that
need memcg accounting
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmI4fdwUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroMq8gf/WoeVHtw2QlL5Mmz6McvRRmPAYPLV
wLUIFNrRqRvd8Tw4kivzZoh/xTpwmnojv0YdK5SjKAiMjgv094YI1LrNp1JSPvmL
pitocMkA10RSJNWHeEMg9cMSKH0rKiqeYl6S1e2XsdB+UZZ2BINOCVtvglmjTAvJ
dFBdKdBkqjAUZbdXAGIvz4JEEER3N/LkFDKGaUGX+0QIQOzGBPIyLTxynxIDG6mt
RViCCFyXdy5NkVp5hZFm96vQ2qAlWL9B9+iKruQN++82+oqWbeTdSqPhdwF7GyFz
BfOv3gobQ2c4ef/aMLO5LswZ9joI1t/4kQbbAn6dNybpOAz/NXfDnbNefg==
=keox
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"ARM:
- Proper emulation of the OSLock feature of the debug architecture
- Scalibility improvements for the MMU lock when dirty logging is on
- New VMID allocator, which will eventually help with SVA in VMs
- Better support for PMUs in heterogenous systems
- PSCI 1.1 support, enabling support for SYSTEM_RESET2
- Implement CONFIG_DEBUG_LIST at EL2
- Make CONFIG_ARM64_ERRATUM_2077057 default y
- Reduce the overhead of VM exit when no interrupt is pending
- Remove traces of 32bit ARM host support from the documentation
- Updated vgic selftests
- Various cleanups, doc updates and spelling fixes
RISC-V:
- Prevent KVM_COMPAT from being selected
- Optimize __kvm_riscv_switch_to() implementation
- RISC-V SBI v0.3 support
s390:
- memop selftest
- fix SCK locking
- adapter interruptions virtualization for secure guests
- add Claudio Imbrenda as maintainer
- first step to do proper storage key checking
x86:
- Continue switching kvm_x86_ops to static_call(); introduce
static_call_cond() and __static_call_ret0 when applicable.
- Cleanup unused arguments in several functions
- Synthesize AMD 0x80000021 leaf
- Fixes and optimization for Hyper-V sparse-bank hypercalls
- Implement Hyper-V's enlightened MSR bitmap for nested SVM
- Remove MMU auditing
- Eager splitting of page tables (new aka "TDP" MMU only) when dirty
page tracking is enabled
- Cleanup the implementation of the guest PGD cache
- Preparation for the implementation of Intel IPI virtualization
- Fix some segment descriptor checks in the emulator
- Allow AMD AVIC support on systems with physical APIC ID above 255
- Better API to disable virtualization quirks
- Fixes and optimizations for the zapping of page tables:
- Zap roots in two passes, avoiding RCU read-side critical
sections that last too long for very large guests backed by 4
KiB SPTEs.
- Zap invalid and defunct roots asynchronously via
concurrency-managed work queue.
- Allowing yielding when zapping TDP MMU roots in response to the
root's last reference being put.
- Batch more TLB flushes with an RCU trick. Whoever frees the
paging structure now holds RCU as a proxy for all vCPUs running
in the guest, i.e. to prolongs the grace period on their behalf.
It then kicks the the vCPUs out of guest mode before doing
rcu_read_unlock().
Generic:
- Introduce __vcalloc and use it for very large allocations that need
memcg accounting"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (246 commits)
KVM: use kvcalloc for array allocations
KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2
kvm: x86: Require const tsc for RT
KVM: x86: synthesize CPUID leaf 0x80000021h if useful
KVM: x86: add support for CPUID leaf 0x80000021
KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask
Revert "KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()"
kvm: x86/mmu: Flush TLB before zap_gfn_range releases RCU
KVM: arm64: fix typos in comments
KVM: arm64: Generalise VM features into a set of flags
KVM: s390: selftests: Add error memop tests
KVM: s390: selftests: Add more copy memop tests
KVM: s390: selftests: Add named stages for memop test
KVM: s390: selftests: Add macro as abstraction for MEM_OP
KVM: s390: selftests: Split memop tests
KVM: s390x: fix SCK locking
RISC-V: KVM: Implement SBI HSM suspend call
RISC-V: KVM: Add common kvm_riscv_vcpu_wfi() function
RISC-V: Add SBI HSM suspend related defines
RISC-V: KVM: Implement SBI v0.3 SRST extension
...
Hi Linus,
Please, pull the following treewide patch that replaces zero-length arrays with
flexible-array members. This patch has been baking in linux-next for a
whole development cycle.
Thanks
--
Gustavo
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmI6GIUACgkQRwW0y0cG
2zFLWw/+OB1gZeQD3boKpUMntWnn6wjhUxdrO8CYkpzG+B+8TFECXNjy8HV1CSiw
GKKRndYELOyYaD5o/F2vtPe10iPHbrdIlMFRPBRoht0/cvSZgzHlfT8EjWQwerYY
dieztUFKjeSj0MXivdNDnKOTm8o9cz8KmCrWFP+My37Fasn/9+nBX8iNVIvAX4xy
T+IVmjtDifQUsTs298UGnBvDeuZOiGHhXXU5rq6lIX0Rl554OsWZW94d6jUPj/h7
t1v6jdojNuyaMKn45/xnPj9VvmDiSu3K67m3fjRdzLPDOhISjr2fw4KEUOKdsebh
yJ9t5u8IufyPbm9kyI+rZt+T8ZlV2/qt2+mt6QgtDMnWrs+4nU15JY0SHImMSBZQ
rBEZcQlrIcGJ+CsNB8Y7jIGYO0SSkhodAvfl0LRA0AbTqLGqq0OkAQS5D52r3H2r
uz6xdYb7kG43XaRyaAIPqhZsp/jk2NrXvEvin2tSaXZFR1cxp+oxcV2UajmnOU6i
EIBS4PzJnYx2RZRa+h8YbBa/+D4N6+fj/tjmwBawiUBPjjaLAsGFNwUHqvBoD05S
bk6oXi654NBwVjsknZ0grVz0TtSvdZ3uJL5FZApTOHITqH8vlxlNefmHri4vZRZO
NN7NIQ0yaUCnorzMg+vP8ZtflhQwrMJbjwIS9YD0RHd7MBhYX8k=
=xZD2
-----END PGP SIGNATURE-----
Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull flexible-array transformations from Gustavo Silva:
"Treewide patch that replaces zero-length arrays with flexible-array
members.
This has been baking in linux-next for a whole development cycle"
* tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
treewide: Replace zero-length arrays with flexible-array members
There are no users of "__bitwise__" except the definition of
"__bitwise". Remove __bitwise__ and define __bitwise directly.
This is a follow-up to 05de97003c ("linux/types.h: enable endian
checks for all sparse builds").
[akpm@linux-foundation.org: change the tools/include/linux/types.h definition also]
Link: https://lkml.kernel.org/r/20220310220927.245704-2-helgaas@kernel.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
It's been a fairly calm development cycle. There are a few
last-minute ALSA core fixes, most notably for covering PCM ioctl
races, but the most of rest are device-specific changes.
Below are some highlights:
* ALSA core:
- Fixes for PCM ioctl races that may lead to UAF
- Fix for oversized allocations in PCM OSS layer
* ASoC:
- Start of moving SoF to support multiple IPC mechanisms
- Use of NHLT ACPI table to reduce the amount of quirking required for
Intel systems
- Preliminary works forthcoming Intel AVS driver for legacy Intel DSP
firmwares
- Support for AMD PDM, Atmel PDMC, Awinic AW8738, i.MX cards with
TLV320AIC31xx, Intel machines with CS35L41 and ESSX8336, Mediatek
MT8181 wideband bluetooth, nVidia Tegra234, Qualcomm SC7280, Renesas
RZ/V2L, Texas Instruments TAS585M
* HD-audio:
- Driver re-binding fix for HD-audio
- Updates for Intel ADL and Tegra234, various platform quirks for
Dell, HP, Lenovo, ASUS, Samsung and Clevo machines
* USB-audio:
- Quirk updates for Scarlett2, RODE, Corsair devices
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmI7AkUOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/faBAAvPFODmyJlt16UG7bSlqwoSafWho+Bp4GSH4O
+pEm47+kULgkKOm9k2NK7sci6nOsNIabQsVhMeryCLgDlNlFqR4FQjIswbgtRWsO
lmu3TMw26I0vS2joNE+tpqCOyJuEGI/ekQru3aKAZx6JyBlXmrzuf7L4BNomVORr
fgBgpMg/tRcE9ceWjc1qHMggueAfkcjnI4ioFYxaWYXp4wyVX1mx3mVHEf6WQnff
ZXsgQLhupUKLvyBr2D1vkN6JcRyTahkBprbLEtZhKszR8hl6tFlnyILkzsiZ/B+K
oJAvtEoC6z2PW+suPSPPl2qnbyOJyX32m43iCXW8uSG1KG/K2JshZIJshMbVw3pV
rLK3XYr2zoE3VzzNUL+QyGYhLpdDPSNF+E19z7jfWU/wKwCUu8qWuejhf9uAlQgx
XtlrZuyCpnsNVyILqLM2Sgzvc1U8vJd68uYwhecchTmP0Aurld5NM2PiAagcvVpW
RtEMbTJbIBYbou3UPhxDjEdQOeT+KZUYrClEjb61pJQ9sHAbC4l0LoRyS4NEWCZH
J7Z5DNPqPf6CFU1AVpfktL4Dh+VtM7nb4DVyyyLWWZgG3NcXSVLLbUA8Uo9qoDV5
7tHnV+1MURBwEq1CUvZtb3sRC5tyNVkzXMMAJfcVWlv7JkoXs8pzwK9w685aP2zl
YDOfau8=
=5cCU
-----END PGP SIGNATURE-----
Merge tag 'sound-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"It's been a fairly calm development cycle. There are a few last-minute
ALSA core fixes, most notably for covering PCM ioctl races, but the
most of rest are device-specific changes.
Below are some highlights:
ALSA core:
- Fixes for PCM ioctl races that may lead to UAF
- Fix for oversized allocations in PCM OSS layer
ASoC:
- Start of moving SoF to support multiple IPC mechanisms
- Use of NHLT ACPI table to reduce the amount of quirking required
for Intel systems
- Preliminary works forthcoming Intel AVS driver for legacy Intel DSP
firmwares
- Support for AMD PDM, Atmel PDMC, Awinic AW8738, i.MX cards with
TLV320AIC31xx, Intel machines with CS35L41 and ESSX8336, Mediatek
MT8181 wideband bluetooth, nVidia Tegra234, Qualcomm SC7280,
Renesas RZ/V2L, Texas Instruments TAS585M
HD-audio:
- Driver re-binding fix for HD-audio
- Updates for Intel ADL and Tegra234, various platform quirks for
Dell, HP, Lenovo, ASUS, Samsung and Clevo machines
USB-audio:
- Quirk updates for Scarlett2, RODE, Corsair devices"
* tag 'sound-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (486 commits)
ALSA: hda/realtek: Add alc256-samsung-headphone fixup
ALSA: pci: fix reading of swapped values from pcmreg in AC97 codec
ALSA: pcm: Add stream lock during PCM reset ioctl operations
ALSA: pcm: Fix races among concurrent prealloc proc writes
ALSA: pcm: Fix races among concurrent prepare and hw_params/hw_free calls
ALSA: pcm: Fix races among concurrent read/write and buffer changes
ALSA: pcm: Fix races among concurrent hw_params and hw_free calls
ASoC: atmel: mchp-pdmc: print the correct property name
MAINTAINERS: Add Shengjiu to maintainer list of sound/soc/fsl
ASoC: SOF: Add a new dai_get_clk topology IPC op
ASoC: SOF: topology: Add ops for setting up and tearing down pipelines
ASoC: SOF: expose sof_route_setup()
ASoC: SOF: Add dai_link_fixup PCM op for IPC3
ASoC: SOF: Add trigger PCM op for IPC3
ASoC: SOF: Define hw_params PCM op for IPC3
ASoC: SOF: Introduce IPC3 PCM hw_free op
ASoC: SOF: pcm: expose the sof_pcm_setup_connected_widgets() function
ASoC: SOF: Introduce IPC-specific PCM ops
ASoC: SOF: Add bytes_ext control IPC ops for IPC3
ASoC: SOF: Add bytes_get/put control IPC ops for IPC3
...
Update the arch_prctl test to check the permission bitmap whether the
requested feature is added as expected or not.
Every non-dynamic feature that is enabled is permitted already for use.
TILECFG is not dynamic feature. Ensure the bit is always on from
ARCH_GET_XCOMP_PERM.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220129173647.27981-3-chang.seok.bae@intel.com