The previous patches have removed the need to do the mount and umount
dance when switching netns. In particular:
* Avoid remounting /sys/fs/bpf to have a clean start
* Avoid remounting /sys to get a ifindex of a particular netns
This patch can finally remove the mount and umount dance in
{open,close}_netns which is unnecessarily complicated and
error-prone.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-6-martin.lau@linux.dev
This patch removes the need to pin prog in the remaining tests in
tc_redirect.c by directly using the bpf_tc_hook_create() and
bpf_tc_attach(). The clsact qdisc will go away together with
the test netns, so no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-5-martin.lau@linux.dev
This patch removes the need to pin prog in the tc_redirect_peer_l3
test by directly using the bpf_tc_hook_create() and bpf_tc_attach().
The clsact qdisc will go away together with the test netns, so
no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-4-martin.lau@linux.dev
This patch removes the need to pin prog in the tc_redirect_dtime
test by directly using the bpf_tc_hook_create() and bpf_tc_attach().
The clsact qdisc will go away together with the test netns, so
no need to do bpf_tc_hook_destroy().
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-3-martin.lau@linux.dev
When switching netns, the setns_by_fd() is doing dances in mount/umounting
the /sys directories. One reason is the tc_redirect.c test is depending
on the /sys/net/class/*/ifindex instead of using the if_nametoindex().
if_nametoindex() uses ioctl() to get the ifindex.
This patch is to move all /sys/net/class/*/ifindex usages to
if_nametoindex(). The current code checks ifindex >= 0 which is
incorrect. ifindex > 0 should be checked instead. This patch also
stores ifindex_veth_src and ifindex_veth_dst since the latter patch
will need them.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221129070900.3142427-2-martin.lau@linux.dev
Torture test the cases where the runstate crosses a page boundary, and
and especially the case where it's configured in 32-bit mode and doesn't,
but then switching to 64-bit mode makes it go onto the second page.
To simplify this, make the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST ioctl
also update the guest runstate area. It already did so if the actual
runstate changed, as a side-effect of kvm_xen_update_runstate(). So
doing it in the plain adjustment case is making it more consistent, as
well as giving us a nice way to trigger the update without actually
running the vCPU again and changing the values.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Closer inspection of the Xen code shows that we aren't supposed to be
using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
If we randomly set the top bit of ->state_entry_time for a guest that
hasn't asked for it and doesn't expect it, that could make the runtimes
fail to add up and confuse the guest. Without the flag it's perfectly
safe for a vCPU to read its own vcpu_runstate_info; just not for one
vCPU to read *another's*.
I briefly pondered adding a word for the whole set of VMASST_TYPE_*
flags but the only one we care about for HVM guests is this, so it
seemed a bit pointless.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The guest runstate area can be arbitrarily byte-aligned. In fact, even
when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
fields in the structure end up being unaligned due to the fact that the
32-bit ABI only aligns them to 32 bits.
So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
is buggy, because if it's unaligned then we can't update the whole field
atomically; the low bytes might be observable before the _UPDATE bit is.
Xen actually updates the *byte* containing that top bit, on its own. KVM
should do the same.
In addition, we cannot assume that the runstate area fits within a single
page. One option might be to make the gfn_to_pfn cache cope with regions
that cross a page — but getting a contiguous virtual kernel mapping of a
discontiguous set of IOMEM pages is a distinctly non-trivial exercise,
and it seems this is the *only* current use case for the GPC which would
benefit from it.
An earlier version of the runstate code did use a gfn_to_hva cache for
this purpose, but it still had the single-page restriction because it
used the uhva directly — because it needs to be able to do so atomically
when the vCPU is being scheduled out, so it used pagefault_disable()
around the accesses and didn't just use kvm_write_guest_cached() which
has a fallback path.
So... use a pair of GPCs for the first and potential second page covering
the runstate area. We can get away with locking both at once because
nothing else takes more than one GPC lock at a time so we can invent
a trivial ordering rule.
The common case where it's all in the same page is kept as a fast path,
but in both cases, the actual guest structure (compat or not) is built
up from the fields in @vx, following preset pointers to the state and
times fields. The only difference is whether those pointers point to
the kernel stack (in the split case) or to guest memory directly via
the GPC. The fast path is also fixed to use a byte access for the
XEN_RUNSTATE_UPDATE bit, then the only real difference is the dual
memcpy.
Finally, Xen also does write the runstate area immediately when it's
configured. Flip the kvm_xen_update_runstate() and …_guest() functions
and call the latter directly when the runstate area is set. This means
that other ioctls which modify the runstate also write it immediately
to the guest when they do so, which is also intended.
Update the xen_shinfo_test to exercise the pathological case where the
XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is
actually in a different page to the rest of the 64-bit word.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The systemwide perf hardware breakpoint test tries to open a perf event
on each cpu. On large systems, we run out of file descriptors and fail
the test. Instead, have the test set the file descriptor limit to an
arbitraty high value.
Reported-by: Rohan Deshpande <rohan_d@linux.vnet.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/187fed5843cecc1e5066677b6296ee88337d7bef.1669096083.git.naveen.n.rao@linux.vnet.ibm.com
This patch first adds TFO support in mptcp_connect.c.
This can be enabled via a new option: -o MPTFO.
Once enabled, the TCP_FASTOPEN socket option is enabled for the server
side and a sendto() with MSG_FASTOPEN is used instead of a connect() for
the client side.
Note that the first SYN has a limit of bytes it can carry. In other
words, it is allowed to send less data than the provided one. We then
need to track more status info to properly allow the next sendmsg()
starting from the next part of the data to send the rest.
Also in TFO scenarios, we need to completely spool the partially xmitted
buffer -- and account for that -- before starting sendfile/mmap xmit,
otherwise the relevant tests will fail.
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Dmytro Shytyi <dmytro@shytyi.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the async test case was introduced, despite being a completely
independent test case, the command to run it was added to the same shell
script as the smoke test case. Since a shell script implicitly returns
the error code from the last run command, this effectively caused the
script to only return as error code the result from the async test case,
hiding the smoke test result (which could then only be seen from the
python unittest logs).
Move the async test case call to its own shell script runner to avoid
the aforementioned issue. This also makes the output clearer to read,
since each kselftest KTAP result now matches with one python unittest
report.
While at it, also make it so the async test case is skipped if
/dev/tpmrm0 doesn't exist, since commit 8335adb8f9 ("selftests: tpm:
add async space test with noneexisting handle") added a test that relies
on it.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
sysfs now supports splice_* operations with
commit f2d6c2708b ("kernfs: wire up ->splice_read and ->splice_write")
Update the selftests to expect success instead of failure.
Cc: Shuah Khan <shuah@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Siddharth Gupta <sidgup@codeaurora.org
Reported-by: Dilip Kota <dilip.kota@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmOGOdYACgkQMUZtbf5S
IrsknQ//SAoOyDOEu15YzOt8hAupLKoF6MM+D0dwwTEQZLf7IVXCjPpkKtVh7Si7
YCBoyrqrDs7vwaUrVoKY19Amwov+EYrHCpdC+c7wdZ7uxTaYfUbJJUGmxYOR179o
lV1+1Aiqg9F9C6CUsmZ5lDN2Yb7/uPDBICIV8LM+VzJAtXjurBVauyMwAxLxPOAr
cgvM+h5xzE7DXMF2z8R/mUq5MSIWoJo9hy2UwbV+f2liMTQuw9rwTbyw3d7+H/6p
xmJcBcVaABjoUEsEhld3NTlYbSEnlFgCQBfDWzf2e4y6jBxO0JepuIc7SZwJFRJY
XBqdsKcGw5RkgKbksKUgxe126XFX0SUUQEp0UkOIqe15k7eC2yO9uj1gRm6OuV4s
J94HKzHX9WNV5OQ790Ed2JyIJScztMZlNFVJ/cz2/+iKR42xJg6kaO6Rt2fobtmL
VC2cH+RfHzLl+2+7xnfzXEDgFePSBlA02Aq1wihU3zB3r7WCFHchEf9T7sGt1QF0
03R+8E3+N2tYqphPAXyDoy6kXQJTPxJHAe1FNHJlwgfieUDEWZi/Pm+uQrKIkDeo
oq9MAV2QBNSD1w4wl7cXfvicO5kBr/OP6YBqwkpsGao2jCSIgkWEX2DRrUaLczXl
5/Z+m/gCO5tAEcVRYfMivxUIon//9EIhbErVpHTlNWpRHk24eS4=
=0Lnw
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, can and wifi.
Current release - new code bugs:
- eth: mlx5e:
- use kvfree() in mlx5e_accel_fs_tcp_create()
- MACsec, fix RX data path 16 RX security channel limit
- MACsec, fix memory leak when MACsec device is deleted
- MACsec, fix update Rx secure channel active field
- MACsec, fix add Rx security association (SA) rule memory leak
Previous releases - regressions:
- wifi: cfg80211: don't allow multi-BSSID in S1G
- stmmac: set MAC's flow control register to reflect current settings
- eth: mlx5:
- E-switch, fix duplicate lag creation
- fix use-after-free when reverting termination table
Previous releases - always broken:
- ipv4: fix route deletion when nexthop info is not specified
- bpf: fix a local storage BPF map bug where the value's spin lock
field can get initialized incorrectly
- tipc: re-fetch skb cb after tipc_msg_validate
- wifi: wilc1000: fix Information Element parsing
- packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
- sctp: fix memory leak in sctp_stream_outq_migrate()
- can: can327: fix potential skb leak when netdev is down
- can: add number of missing netdev freeing on error paths
- aquantia: do not purge addresses when setting the number of rings
- wwan: iosm:
- fix incorrect skb length leading to truncated packet
- fix crash in peek throughput test due to skb UAF"
* tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits)
net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed
MAINTAINERS: Update maintainer list for chelsio drivers
ionic: update MAINTAINERS entry
sctp: fix memory leak in sctp_stream_outq_migrate()
packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE
net/mlx5: Lag, Fix for loop when checking lag
Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path"
net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions
net: tun: Fix use-after-free in tun_detach()
net: mdiobus: fix unbalanced node reference count
net: hsr: Fix potential use-after-free
tipc: re-fetch skb cb after tipc_msg_validate
mptcp: fix sleep in atomic at close time
mptcp: don't orphan ssk in mptcp_close()
dsa: lan9303: Correct stat name
ipv4: Fix route deletion when nexthop info is not specified
net: wwan: iosm: fix incorrect skb length
net: wwan: iosm: fix crash in peek throughput test
net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type
net: wwan: iosm: fix kernel test robot reported error
...
Signal that a test run is complete through perf_test_args instead of
having tests open code a similar solution. Ensure that the field resets
to false at the beginning of a test run as the structure is reused
between test runs, eliminating a couple of bugs:
access_tracking_perf_test hangs indefinitely on a subsequent test run,
as 'done' remains true. The bug doesn't amount to much right now, as x86
supports a single guest mode. However, this is a precondition of
enabling the test for other architectures with >1 guest mode, like
arm64.
memslot_modification_stress_test has the exact opposite problem, where
subsequent test runs complete immediately as 'run_vcpus' remains false.
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
[oliver: added commit message, preserve spin_wait_for_next_iteration()]
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221118211503.4049023-2-oliver.upton@linux.dev
The minimal alsa-lib configuration code is similar in both mixer
and pcm tests. Move this code to the shared conf.c source file.
Also, fix the build rules inspired by rseq tests. Build libatest.so
which is linked to the both test utilities dynamically.
Also, set the TEST_FILES variable for lib.mk.
Cc: linux-kselftest@vger.kernel.org
Cc: Shuah Khan <shuah@kernel.org>
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Tested-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221129085306.2345763-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY4AC5QAKCRDbK58LschI
g1e0AQCfAqduTy7mYd02jDNCV0wLphNp9FbPiP9OrQT37ABpKAEA1ulj1X59bX3d
HnZdDKuatcPZT9MV5hDLM7MFJ9GjOA4=
=fNmM
-----END PGP SIGNATURE-----
Daniel Borkmann says:
====================
bpf-next 2022-11-25
We've added 101 non-merge commits during the last 11 day(s) which contain
a total of 109 files changed, 8827 insertions(+), 1129 deletions(-).
The main changes are:
1) Support for user defined BPF objects: the use case is to allocate own
objects, build own object hierarchies and use the building blocks to
build own data structures flexibly, for example, linked lists in BPF,
from Kumar Kartikeya Dwivedi.
2) Add bpf_rcu_read_{,un}lock() support for sleepable programs,
from Yonghong Song.
3) Add support storing struct task_struct objects as kptrs in maps,
from David Vernet.
4) Batch of BPF map documentation improvements, from Maryam Tahhan
and Donald Hunter.
5) Improve BPF verifier to propagate nullness information for branches
of register to register comparisons, from Eduard Zingerman.
6) Fix cgroup BPF iter infra to hold reference on the start cgroup,
from Hou Tao.
7) Fix BPF verifier to not mark fentry/fexit program arguments as trusted
given it is not the case for them, from Alexei Starovoitov.
8) Improve BPF verifier's realloc handling to better play along with dynamic
runtime analysis tools like KASAN and friends, from Kees Cook.
9) Remove legacy libbpf mode support from bpftool,
from Sahid Orentino Ferdjaoui.
10) Rework zero-len skb redirection checks to avoid potentially breaking
existing BPF test infra users, from Stanislav Fomichev.
11) Two small refactorings which are independent and have been split out
of the XDP queueing RFC series, from Toke Høiland-Jørgensen.
12) Fix a memory leak in LSM cgroup BPF selftest, from Wang Yufen.
13) Documentation on how to run BPF CI without patch submission,
from Daniel Müller.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
====================
Link: https://lore.kernel.org/r/20221125012450.441-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY4ACtQAKCRDbK58LschI
gyIcAP4nE/chC+gYDOleloK2tQlQawM5Sa4kwHjFzBdPD3tRrAEAs6y2fJjb5vZo
OXIFKhXv5Xo3knmynkTSVSVOvB48IAE=
=Cme8
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
bpf 2022-11-25
We've added 10 non-merge commits during the last 8 day(s) which contain
a total of 7 files changed, 48 insertions(+), 30 deletions(-).
The main changes are:
1) Several libbpf ringbuf fixes related to probing for its availability,
size overflows when mmaping a 2G ringbuf and rejection of invalid
reservationsizes, from Hou Tao.
2) Fix a buggy return pointer in libbpf for attach_raw_tp function,
from Jiri Olsa.
3) Fix a local storage BPF map bug where the value's spin lock field
can get initialized incorrectly, from Xu Kuohai.
4) Two follow-up fixes in kprobe_multi BPF selftests for BPF CI,
from Jiri Olsa.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Make test_bench_attach serial
selftests/bpf: Filter out default_idle from kprobe_multi bench
bpf: Set and check spin lock value in sk_storage_map_test
bpf: Do not copy spin lock field from user in bpf_selem_alloc
libbpf: Check the validity of size in user_ring_buffer__reserve()
libbpf: Handle size overflow for user ringbuf mmap
libbpf: Handle size overflow for ringbuf mmap
libbpf: Use page size as max_entries when probing ring buffer map
bpf, perf: Use subprog name when reporting subprog ksymbol
libbpf: Use correct return pointer in attach_raw_tp
====================
Link: https://lore.kernel.org/r/20221125001034.29473-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When the kernel receives a route deletion request from user space it
tries to delete a route that matches the route attributes specified in
the request.
If only prefix information is specified in the request, the kernel
should delete the first matching FIB alias regardless of its associated
FIB info. However, an error is currently returned when the FIB info is
backed by a nexthop object:
# ip nexthop add id 1 via 192.0.2.2 dev dummy10
# ip route add 198.51.100.0/24 nhid 1
# ip route del 198.51.100.0/24
RTNETLINK answers: No such process
Fix by matching on such a FIB info when legacy nexthop attributes are
not specified in the request. An earlier check already covers the case
where a nexthop ID is specified in the request.
Add tests that cover these flows. Before the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [FAIL]
Tests passed: 11
Tests failed: 1
After the fix:
# ./fib_nexthops.sh -t ipv4_fcnal
...
TEST: Delete route when not specifying nexthop attributes [ OK ]
Tests passed: 12
Tests failed: 0
No regressions in other tests:
# ./fib_nexthops.sh
...
Tests passed: 228
Tests failed: 0
# ./fib_tests.sh
...
Tests passed: 186
Tests failed: 0
Cc: stable@vger.kernel.org
Reported-by: Jonas Gorski <jonas.gorski@gmail.com>
Tested-by: Jonas Gorski <jonas.gorski@gmail.com>
Fixes: 493ced1ac4 ("ipv4: Allow routes to use nexthop objects")
Fixes: 6bf92d70e6 ("net: ipv4: fix route with nexthop object delete warning")
Fixes: 61b91eb33a ("ipv4: Handle attempt to delete multipath route when fib_info contains an nh reference")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20221124210932.2470010-1-idosch@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
* Fixes for Xen emulation. While nobody should be enabling it in
the kernel (the only public users of the feature are the selftests),
the bug effectively allows userspace to read arbitrary memory.
* Correctness fixes for nested hypervisors that do not intercept INIT
or SHUTDOWN on AMD; the subsequent CPU reset can cause a use-after-free
when it disables virtualization extensions. While downgrading the panic
to a WARN is quite easy, the full fix is a bit more laborious; there
are also tests. This is the bulk of the pull request.
* Fix race condition due to incorrect mmu_lock use around
make_mmu_pages_available().
Generic:
* Obey changes to the kvm.halt_poll_ns module parameter in VMs
not using KVM_CAP_HALT_POLL, restoring behavior from before
the introduction of the capability
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmODI84UHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroPVJwgAombWOBf549JiHGPtwejuQO20nTSj
Om9pzWQ9dR182P+ju/FdqSPXt/Lc8i+z5zSXDrV3HQ6/a3zIItA+bOAUiMFvHNAQ
w/7pEb1MzVOsEg2SXGOjZvW3WouB4Z4R0PosInYjrFrRGRAaw5iaTOZHGezE44t2
WBWk1PpdMap7J/8sjNT1ble72ig9JdSW4qeJUQ1GWxHCigI5sESCQVqF446KM0jF
gTYPGX5TqpbWiIejF0yNew9yNKMi/yO4Pz8I5j3vtopeHx24DCIqUAGaEg6ykErX
vnzYbVP7NaFrqtje49PsK6i1cu2u7uFPArj0dxo3DviQVZVHV1q6tNmI4A==
=Qgei
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"x86:
- Fixes for Xen emulation. While nobody should be enabling it in the
kernel (the only public users of the feature are the selftests),
the bug effectively allows userspace to read arbitrary memory.
- Correctness fixes for nested hypervisors that do not intercept INIT
or SHUTDOWN on AMD; the subsequent CPU reset can cause a
use-after-free when it disables virtualization extensions. While
downgrading the panic to a WARN is quite easy, the full fix is a
bit more laborious; there are also tests. This is the bulk of the
pull request.
- Fix race condition due to incorrect mmu_lock use around
make_mmu_pages_available().
Generic:
- Obey changes to the kvm.halt_poll_ns module parameter in VMs not
using KVM_CAP_HALT_POLL, restoring behavior from before the
introduction of the capability"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Update gfn_to_pfn_cache khva when it moves within the same page
KVM: x86/xen: Only do in-kernel acceleration of hypercalls for guest CPL0
KVM: x86/xen: Validate port number in SCHEDOP_poll
KVM: x86/mmu: Fix race condition in direct_page_fault
KVM: x86: remove exit_int_info warning in svm_handle_exit
KVM: selftests: add svm part to triple_fault_test
KVM: x86: allow L1 to not intercept triple fault
kvm: selftests: add svm nested shutdown test
KVM: selftests: move idt_entry to header
KVM: x86: forcibly leave nested mode on vCPU reset
KVM: x86: add kvm_leave_nested
KVM: x86: nSVM: harden svm_free_nested against freeing vmcb02 while still in use
KVM: x86: nSVM: leave nested mode on vCPU free
KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL
KVM: Avoid re-reading kvm->max_halt_poll_ns during halt-polling
KVM: Cap vcpu->halt_poll_ns before halting rather than after
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/ftrace`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests/gpio`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The latest version of grep claims the egrep is now obsolete so the build
now contains warnings that look like:
egrep: warning: egrep is obsolescent; using grep -E
fix this up by moving the related file to use "grep -E" instead.
sed -i "s/egrep/grep -E/g" `grep egrep -rwl tools/testing/selftests`
Here are the steps to install the latest grep:
wget http://ftp.gnu.org/gnu/grep/grep-3.8.tar.gz
tar xf grep-3.8.tar.gz
cd grep-3.8 && ./configure && make
sudo make install
export PATH=/usr/local/bin:$PATH
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The newly added PCM test produces a binary which is not ignored by git
when built in tree, fix that.
Fixes: aba51cd094 ("selftests: alsa - add PCM test")
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Link: https://lore.kernel.org/r/20221125153654.1037868-1-broonie@kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Since we now flush output immediately on starting children we should ensure
that the child name is set beforehand so that any output that does get
flushed from the newly created child has the name of the child attached.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221124120722.150988-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add a few positive/negative tests to test bpf_rcu_read_lock()
and its corresponding verifier support. The new test will fail
on s390x and aarch64, so an entry is added to each of their
respective deny lists.
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221124053222.2374650-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verify when a bond is configured with {up,down}delay and the link state
of slave members flaps if there are no remaining members up the bond
should immediately select a member to bring up. (from bonding.txt
section 13.1 paragraph 4)
Suggested-by: Liang Li <liali@redhat.com>
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add some selftest testcases that validate the expected behavior of the
bpf_task_from_pid() kfunc that was added in the prior patch.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221122145300.251210-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf_cgroup_ancestor() allows BPF programs to access the ancestor of a
struct cgroup *. This patch adds selftests that validate its expected
behavior.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221122055458.173143-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds a selftest suite to validate the cgroup kfuncs that were
added in the prior patch.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221122055458.173143-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently LLVM fails to recognize .data.* as data section and defaults to .text
section. Later BPF backend tries to emit 4-byte NOP instruction which doesn't
exist in BPF ISA and aborts.
The fix for LLVM is pending:
https://reviews.llvm.org/D138477
While waiting for the fix lets workaround the linked_list test case
by using .bss.* prefix which is properly recognized by LLVM as BSS section.
Fix libbpf to support .bss. prefix and adjust tests.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Install a cleanup function using the trap command for signals EXIT,
SIGINT, SIGQUIT and SIGABRT. The cleanup function will perform:
1. Online the CPUs that were made offline during the test.
2. Removing the cgroups created.
3. Restoring the original /sys/kernel/debug/sched/verbose value,
currently it's left turned on, irrespective of the original
configuration value.
the test performs steps 1 and 2, on the successful runs, but not during
all of the failed runs. With the cleanup(), the system will perform all
three steps during failed/passed test runs.
Signed-off-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Acked-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Add checking of the test return value, otherwise it will report success
forever for test_create_read().
Fixes: dff6d2ae56 ("selftests/efivarfs: clean up test files from test_create*()")
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The selftests/net does not have proper cross-compilation support, and
does not properly state libbpf as a dependency. Mimic/copy the BPF
build from selftests/bpf, which has the nice side-effect that libbpf
is built as well.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/r/20221119171841.2014936-1-bjorn@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
LWT_XMIT to test L3 case, TC to test L2 case.
v2:
- s/veth_ifindex/ipip_ifindex/ in two places (Martin)
- add comment about which condition triggers the rejection (Martin)
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221121180340.1983627-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Alexei hit another rcu warnings because of this test.
Making test_bench_attach serial so it does not disrupts
other tests during parallel tests run.
While this change is not the fix, it should be less likely
to hit it with this test being executed serially.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221116100228.2064612-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alexei hit following rcu warning when running prog_test -j.
[ 128.049567] WARNING: suspicious RCU usage
[ 128.049569] 6.1.0-rc2 #912 Tainted: G O
...
[ 128.050944] kprobe_multi_link_handler+0x6c/0x1d0
[ 128.050947] ? kprobe_multi_link_handler+0x42/0x1d0
[ 128.050950] ? __cpuidle_text_start+0x8/0x8
[ 128.050952] ? __cpuidle_text_start+0x8/0x8
[ 128.050958] fprobe_handler.part.1+0xac/0x150
[ 128.050964] 0xffffffffa02130c8
[ 128.050991] ? default_idle+0x5/0x20
[ 128.050998] default_idle+0x5/0x20
It's caused by bench test attaching kprobe_multi link to default_idle
function, which is not executed in rcu safe context so the kprobe
handler on top of it will trigger the rcu warning.
Filtering out default_idle function from the bench test.
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20221116100228.2064612-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update sk_storage_map_test to make sure kernel does not copy user
non-zero value spin lock to kernel, and does not copy kernel spin
lock value to user.
If user spin lock value is copied to kernel, this test case will
make kernel spin on the copied lock, resulting in rcu stall and
softlockup.
Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20221114134720.1057939-3-xukuohai@huawei.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test closes both iterator link fd and cgroup fd, and removes the
cgroup file to make a dead cgroup before reading from cgroup iterator.
It also uses kern_sync_rcu() and usleep() to wait for the release of
start cgroup. If the start cgroup is not pinned by cgroup iterator,
reading from iterator fd will trigger use-after-free.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hao Luo <haoluo@google.com>
Link: https://lore.kernel.org/bpf/20221121073440.1828292-4-houtao@huaweicloud.com
Add remove_cgroup() to remove a cgroup which doesn't have any children
or live processes. It will be used by the following patch to test cgroup
iterator on a dead cgroup.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221121073440.1828292-3-houtao@huaweicloud.com
The `nettest` binary, built from `selftests/net/nettest.c`,
was expected to be found in the path during test execution of
`fcnal-test.sh` and `pmtu.sh`, leading to tests getting
skipped when the binary is not installed in the system, as can
be seen in these logs found in the wild [1]:
# TEST: vti4: PMTU exceptions [SKIP]
[ 350.600250] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 350.607421] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm6udp not supported
# TEST: vti6: PMTU exceptions (ESP-in-UDP) [SKIP]
[ 351.605102] IPv6: ADDRCONF(NETDEV_CHANGE): veth_b: link becomes ready
[ 351.612243] IPv6: ADDRCONF(NETDEV_CHANGE): veth_a: link becomes ready
# 'nettest' command not found; skipping tests
# xfrm4udp not supported
The `unicast_extensions.sh` tests also rely on `nettest`, but
it runs fine there because it looks for the binary in the
current working directory [2]:
The same mechanism that works for the Unicast extensions tests
is here copied over to the PMTU and functional tests.
[1] https://lkft.validation.linaro.org/scheduler/job/5839508#L6221
[2] https://lkft.validation.linaro.org/scheduler/job/5839508#L7958
Signed-off-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conform to the rest of Hyper-V emulation selftests which have 'hyperv'
prefix. Get rid of '_test' suffix as well as the purpose of this code
is fairly obvious.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-49-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the Partition
assist page.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-48-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Enable Hyper-V L2 TLB flush and check that Hyper-V TLB flush hypercalls
from L2 don't exit to L1 unless 'TlbLockCount' is set in the
Partition assist page.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-47-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V MSR-Bitmap tests do RDMSR from L2 to exit to L1. While 'evmcs_test'
correctly clobbers all GPRs (which are not preserved), 'hyperv_svm_test'
does not. Introduce a more generic rdmsr_from_l2() to avoid code
duplication and remove hardcoding of MSRs. Do not put it in common code
because it is really just a selftests bug rather than a processor
feature that requires it.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-46-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
vmmcall()/vmcall() are used to exit from L2 to L1 and no concrete hypercall
ABI is currenty followed. With the introduction of Hyper-V L2 TLB flush
it becomes (theoretically) possible that L0 will take responsibility for
handling the call and no L1 exit will happen. Prevent this by stuffing RAX
(KVM ABI) and RCX (Hyper-V ABI) with 'safe' values.
While on it, convert vmmcall() to 'static inline', make it setup stack
frame and move to include/x86_64/svm_util.h.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-45-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
There's no need to pollute VMX and SVM code with Hyper-V specific
stuff and allocate Hyper-V specific test pages for all test as only
few really need them. Create a dedicated struct and an allocation
helper.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-43-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In preparation to putting Hyper-V specific test pages to a dedicated
struct, move eVMCS load logic from load_vmcs(). Tests call load_vmcs()
directly and the only one which needs 'enlightened' version is
evmcs_test so there's not much gain in having this merged.
Temporary pass both GPA and HVA to load_evmcs().
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-42-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Hyper-V VP assist page is not eVMCS specific, it is also used for
enlightened nSVM. Move the code to vendor neutral place.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-41-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'struct hv_vp_assist_page' definition doesn't match TLFS. Also, define
'struct hv_nested_enlightenments_control' and use it instead of opaque
'__u64'.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-40-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
'struct hv_enlightened_vmcs' definition in selftests is not '__packed'
and so we rely on the compiler doing the right padding. This is not
obvious so it seems beneficial to use the same definition as in kernel.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-39-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a selftest for Hyper-V PV TLB flush hypercalls
(HvFlushVirtualAddressSpace/HvFlushVirtualAddressSpaceEx,
HvFlushVirtualAddressList/HvFlushVirtualAddressListEx).
The test creates one 'sender' vCPU and two 'worker' vCPU which do busy
loop reading from a certain GVA checking the observed value. Sender
vCPU swaos the data page with another page filled with a different value.
The expectation for workers is also altered. Without TLB flush on worker
vCPUs, they may continue to observe old value. To guard against accidental
TLB flushes for worker vCPUs the test is repeated 100 times.
Hyper-V TLB flush hypercalls are tested in both 'normal' and 'XMM
fast' modes.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-38-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Extend the test to check the scenario when NCI core tries to send data
to already closed device to ensure that nothing bad happens.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Bongsu Jeon <bongsu.jeon@samsung.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Three tests are added. One is from John Fastabend ({1]) which tests
tracing style access for xdp program from the kernel ctx.
Another is a tc test to test both kernel ctx tracing style access
and explicit non-ctx type cast. The third one is for negative tests
including two tests, a tp_bpf test where the bpf_rdonly_cast()
returns a untrusted ptr which cannot be used as helper argument,
and a tracepoint test where the kernel ctx is a u64.
Also added the test to DENYLIST.s390x since s390 does not currently
support calling kernel functions in JIT mode.
[1] https://lore.kernel.org/bpf/20221109215242.1279993-1-john.fastabend@gmail.com/
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221120195442.3114844-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
A previous change added a series of kfuncs for storing struct
task_struct objects as referenced kptrs. This patch adds a new
task_kfunc test suite for validating their expected behavior.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221120051004.3605026-5-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
to the verifier that it should enforce that a BPF program passes it a
"safe", trusted pointer. Currently, "safe" means that the pointer is
either PTR_TO_CTX, or is refcounted. There may be cases, however, where
the kernel passes a BPF program a safe / trusted pointer to an object
that the BPF program wishes to use as a kptr, but because the object
does not yet have a ref_obj_id from the perspective of the verifier, the
program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
kfunc.
The solution is to expand the set of pointers that are considered
trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
with these pointers without getting rejected by the verifier.
There is already a PTR_UNTRUSTED flag that is set in some scenarios,
such as when a BPF program reads a kptr directly from a map
without performing a bpf_kptr_xchg() call. These pointers of course can
and should be rejected by the verifier. Unfortunately, however,
PTR_UNTRUSTED does not cover all the cases for safety that need to
be addressed to adequately protect kfuncs. Specifically, pointers
obtained by a BPF program "walking" a struct are _not_ considered
PTR_UNTRUSTED according to BPF. For example, say that we were to add a
kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
that a task was unsafe to pass to a kfunc, the verifier would mistakenly
allow the following unsafe BPF program to be loaded:
SEC("tp_btf/task_newtask")
int BPF_PROG(unsafe_acquire_task,
struct task_struct *task,
u64 clone_flags)
{
struct task_struct *acquired, *nested;
nested = task->last_wakee;
/* Would not be rejected by the verifier. */
acquired = bpf_task_acquire(nested);
if (!acquired)
return 0;
bpf_task_release(acquired);
return 0;
}
To address this, this patch defines a new type flag called PTR_TRUSTED
which tracks whether a PTR_TO_BTF_ID pointer is safe to pass to a
KF_TRUSTED_ARGS kfunc or a BPF helper function. PTR_TRUSTED pointers are
passed directly from the kernel as a tracepoint or struct_ops callback
argument. Any nested pointer that is obtained from walking a PTR_TRUSTED
pointer is no longer PTR_TRUSTED. From the example above, the struct
task_struct *task argument is PTR_TRUSTED, but the 'nested' pointer
obtained from 'task->last_wakee' is not PTR_TRUSTED.
A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
and then another patch will add selftests to validate.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20221120051004.3605026-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
'size' is unsigned, it never less than zero.
Link: https://lkml.kernel.org/r/20221105110611.28920-1-yuehaibing@huawei.com
Fixes: 6c26df84e1 ("selftests: cgroup: return -errno from cg_read()/cg_write() on failure")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: zefan li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add local_config.h and local_config.mk to .gitignore to avoid accidentally
checking it in.
Link: https://lkml.kernel.org/r/20221103005754.205420-1-zhaogongyi@huawei.com
Signed-off-by: Zhao Gongyi <zhaogongyi@huawei.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
syscall(3) returns -1 and sets errno on error, unlike "syscall"
instruction.
Systems which have <= 32/64 CPUs are unaffected. Test won't bounce
to all CPUs before completing if there are more of them.
Link: https://lkml.kernel.org/r/Y1bUiT7VRXlXPQa1@p183
Fixes: 1f5bd05476 ("proc: selftests: test /proc/uptime")
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Instead of adding the whole test to DENYLIST.s390x, which also has
success test cases that should be run, just skip over failure test
cases in case the JIT does not support kfuncs.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118185938.2139616-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, tests can only request a new vaddr range by using
vm_vaddr_alloc()/vm_vaddr_alloc_page()/vm_vaddr_alloc_pages() but
these functions allocate and map physical pages too. Make it possible
to request unmapped range too.
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-36-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Similar to vm_vaddr_alloc(), virt_map() needs to reflect the mapping
in vm->vpages_mapped.
While on it, remove unneeded code wrapping in vm_vaddr_alloc().
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-35-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Introduce a selftest for Hyper-V PV IPI hypercalls
(HvCallSendSyntheticClusterIpi, HvCallSendSyntheticClusterIpiEx).
The test creates one 'sender' vCPU and two 'receiver' vCPU and then
issues various combinations of send IPI hypercalls in both 'normal'
and 'fast' (with XMM input where necessary) mode. Later, the test
checks whether IPIs were delivered to the expected destination vCPU[s].
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-34-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
All Hyper-V specific tests issuing hypercalls need this.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-33-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
HYPERV_LINUX_OS_ID needs to be written to HV_X64_MSR_GUEST_OS_ID by
each Hyper-V specific selftest.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-32-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
set_xmm()/get_xmm() helpers are fairly useless as they only read 64 bits
from 128-bit registers. Moreover, these helpers are not used. Borrow
_kvm_read_sse_reg()/_kvm_write_sse_reg() from KVM limiting them to
XMM0-XMM8 for now.
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-31-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Now that KVM isn't littered with "struct hv_enlightenments" casts, rename
the struct to "hv_vmcb_enlightenments" to highlight the fact that the
struct is specifically for SVM's VMCB.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a union to provide hv_enlightenments side-by-side with the sw_reserved
bytes that Hyper-V's enlightenments overlay. Casting sw_reserved
everywhere is messy, confusing, and unnecessarily unsafe.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move Hyper-V's VMCB "struct hv_enlightenments" to the svm.h header so
that the struct can be referenced in "struct vmcb_control_area".
Alternatively, a dedicated header for SVM+Hyper-V could be added, a la
x86_64/evmcs.h, but it doesn't appear that Hyper-V will end up needing
a wholesale replacement for the VMCB.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-3-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Move Hyper-V's VMCB enlightenment definitions to the TLFS header; the
definitions come directly from the TLFS[*], not from KVM.
No functional change intended.
[*] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/datatypes/hv_svm_enlightened_vmcb_fields
[vitaly: rename VMCB_HV_ -> HV_VMCB_ to match the rest of
hyperv-tlfs.h, keep svm/hyperv.h]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20221101145426.251680-2-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The preferred form of the str/ldr for predicate registers with an immediate
of zero is to omit the zero, and the clang built in assembler rejects the
zero immediate. Drop the immediate.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221117114130.687261-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
After commit afef88e655 ("selftests/bpf: Store BPF object files with
.bpf.o extension"), we should use xdp_dummy.bpf.o instade of xdp_dummy.o.
In addition, use the BPF_FILE variable to save the BPF object file name,
which can be better identified and modified.
Fixes: afef88e655 ("selftests/bpf: Store BPF object files with .bpf.o extension")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Daniel Müller <deso@posteo.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds 12 small test cases: 01-04 test for the sysctl
net.sctp.l3mdev_accept. 05-10 test for only binding to a right
l3mdev device, the connection can be created. 11-12 test for
two socks binding to different l3mdev devices at the same time,
each of them can process the packets from the corresponding
peer. The tests run for both IPv4 and IPv6 SCTP.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The estimated time was supposing the rate was expressed in mibit
(bit * 1024^2) but it is in mbit (bit * 1000^2).
This makes the threshold higher but in a more realistic way to avoid
false positives reported by CI instances.
Before this patch, the thresholds were at 7561/4005ms and now they are
at 7906/4178ms.
While at it, also fix a typo in the linked comment, spotted by Mat.
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/310
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Not running it from a new netns causes issues if some MPTCP settings are
modified, e.g. if MPTCP is disabled from the sysctl knob, if multiple
addresses are available and added to the MPTCP path-manager, etc.
In these cases, the created connection will not behave as expected, e.g.
unable to create an MPTCP socket, more than one subflow is seen, etc.
A new "sandbox" net namespace is now created and used to run
mptcp_sockopt from this controlled environment.
Fixes: ce9979129a ("selftests: mptcp: add mptcp getsockopt test cases")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
On slow or busy VM, some test-cases still fail because the
data transfer completes before the endpoint manipulation
actually took effect.
Address the issue by artificially increasing the runtime for
the relevant test-cases.
Fixes: ef360019db ("selftests: mptcp: signal addresses testcases")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/309
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Preparing the metadata for bpf_list_head involves a complicated parsing
step and type resolution for the contained value. Ensure that corner
cases are tested against and invalid specifications in source are duly
rejected. Also include tests for incorrect ownership relationships in
the BTF.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-24-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Include various tests covering the success and failure cases. Also, run
the success cases at runtime to verify correctness of linked list
manipulation routines, in addition to ensuring successful verification.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-23-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
First, ensure that whenever a bpf_spin_lock is present in an allocation,
the reg->id is preserved. This won't be true for global variables
however, since they have a single map value per map, hence the verifier
harcodes it to 0 (so that multiple pseudo ldimm64 insns can yield the
same lock object per map at a given offset).
Next, add test cases for all possible combinations (kptr, global, map
value, inner map value). Since we lifted restriction on locking in inner
maps, also add test cases for them. Currently, each lookup into an inner
map gets a fresh reg->id, so even if the reg->map_ptr is same, they will
be treated as separate allocations and the incorrect unlock pairing will
be rejected.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-22-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Make updates in preparation for adding more test cases to this selftest:
- Convert from CHECK_ to ASSERT macros.
- Use BPF skeleton
- Fix typo sping -> spin
- Rename spinlock.c -> spin_lock.c
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-21-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add user facing __contains macro which provides a convenient wrapper
over the verbose kernel specific BTF declaration tag required to
annotate BPF list head structs in user types.
Acked-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-20-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a linked list API for use in BPF programs, where it expects
protection from the bpf_spin_lock in the same allocation as the
bpf_list_head. For now, only one bpf_spin_lock can be present hence that
is assumed to be the one protecting the bpf_list_head.
The following functions are added to kick things off:
// Add node to beginning of list
void bpf_list_push_front(struct bpf_list_head *head, struct bpf_list_node *node);
// Add node to end of list
void bpf_list_push_back(struct bpf_list_head *head, struct bpf_list_node *node);
// Remove node at beginning of list and return it
struct bpf_list_node *bpf_list_pop_front(struct bpf_list_head *head);
// Remove node at end of list and return it
struct bpf_list_node *bpf_list_pop_back(struct bpf_list_head *head);
The lock protecting the bpf_list_head needs to be taken for all
operations. The verifier ensures that the lock that needs to be taken is
always held, and only the correct lock is taken for these operations.
These checks are made statically by relying on the reg->id preserved for
registers pointing into regions having both bpf_spin_lock and the
objects protected by it. The comment over check_reg_allocation_locked in
this change describes the logic in detail.
Note that bpf_list_push_front and bpf_list_push_back are meant to
consume the object containing the node in the 1st argument, however that
specific mechanism is intended to not release the ref_obj_id directly
until the bpf_spin_unlock is called. In this commit, nothing is done,
but the next commit will be introducing logic to handle this case, so it
has been left as is for now.
bpf_list_pop_front and bpf_list_pop_back delete the first or last item
of the list respectively, and return pointer to the element at the
list_node offset. The user can then use container_of style macro to get
the actual entry type. The verifier however statically knows the actual
type, so the safety properties are still preserved.
With these additions, programs can now manage their own linked lists and
store their objects in them.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-17-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce bpf_obj_drop, which is the kfunc used to free allocated
objects (allocated using bpf_obj_new). Pairing with bpf_obj_new, it
implicitly destructs the fields part of object automatically without
user intervention.
Just like the previous patch, btf_struct_meta that is needed to free up
the special fields is passed as a hidden argument to the kfunc.
For the user, a convenience macro hides over the kernel side kfunc which
is named bpf_obj_drop_impl.
Continuing the previous example:
void prog(void) {
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return;
bpf_obj_drop(f);
}
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-15-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Introduce type safe memory allocator bpf_obj_new for BPF programs. The
kernel side kfunc is named bpf_obj_new_impl, as passing hidden arguments
to kfuncs still requires having them in prototype, unlike BPF helpers
which always take 5 arguments and have them checked using bpf_func_proto
in verifier, ignoring unset argument types.
Introduce __ign suffix to ignore a specific kfunc argument during type
checks, then use this to introduce support for passing type metadata to
the bpf_obj_new_impl kfunc.
The user passes BTF ID of the type it wants to allocates in program BTF,
the verifier then rewrites the first argument as the size of this type,
after performing some sanity checks (to ensure it exists and it is a
struct type).
The second argument is also fixed up and passed by the verifier. This is
the btf_struct_meta for the type being allocated. It would be needed
mostly for the offset array which is required for zero initializing
special fields while leaving the rest of storage in unitialized state.
It would also be needed in the next patch to perform proper destruction
of the object's special fields.
Under the hood, bpf_obj_new will call bpf_mem_alloc and bpf_mem_free,
using the any context BPF memory allocator introduced recently. To this
end, a global instance of the BPF memory allocator is initialized on
boot to be used for this purpose. This 'bpf_global_ma' serves all
allocations for bpf_obj_new. In the future, bpf_obj_new variants will
allow specifying a custom allocator.
Note that now that bpf_obj_new can be used to allocate objects that can
be linked to BPF linked list (when future linked list helpers are
available), we need to also free the elements using bpf_mem_free.
However, since the draining of elements is done outside the
bpf_spin_lock, we need to do migrate_disable around the call since
bpf_list_head_free can be called from map free path where migration is
enabled. Otherwise, when called from BPF programs migration is already
disabled.
A convenience macro is included in the bpf_experimental.h header to hide
over the ugly details of the implementation, leading to user code
looking similar to a language level extension which allocates and
constructs fields of a user type.
struct bar {
struct bpf_list_node node;
};
struct foo {
struct bpf_spin_lock lock;
struct bpf_list_head head __contains(bar, node);
};
void prog(void) {
struct foo *f;
f = bpf_obj_new(typeof(*f));
if (!f)
return;
...
}
A key piece of this story is still missing, i.e. the free function,
which will come in the next patch.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-14-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As we continue to add more features, argument types, kfunc flags, and
different extensions to kfuncs, the code to verify the correctness of
the kfunc prototype wrt the passed in registers has become ad-hoc and
ugly to read. To make life easier, and make a very clear split between
different stages of argument processing, move all the code into
verifier.c and refactor into easier to read helpers and functions.
This also makes sharing code within the verifier easier with kfunc
argument processing. This will be more and more useful in later patches
as we are now moving to implement very core BPF helpers as kfuncs, to
keep them experimental before baking into UAPI.
Remove all kfunc related bits now from btf_check_func_arg_match, as
users have been converted away to refactored kfunc argument handling.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-12-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
It's very unusual to have both a command line option and a compile time
option, and apparently that's confusing to people. Also, basically
everybody enables the compile time option now, which means people who
want to disable this wind up having to use the command line option to
ensure that anyway. So just reduce the number of moving pieces and nix
the compile time option in favor of the more versatile command line
option.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
When cross-compiling [1], the get_sys_includes make macro should use
the target system include path, and not the build hosts system include
path.
Make clang honor the CROSS_COMPILE triple.
[1] e.g. "ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- make"
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Link: https://lore.kernel.org/bpf/20221115182051.582962-2-bjorn@kernel.org
When cross-compiling selftests/bpf, the resolve_btfids binary end up
in a different directory, than the regular resolve_btfids
builds. Populate RESOLVE_BTFIDS for sub-make, so it can find the
binary.
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20221115182051.582962-1-bjorn@kernel.org
Attestation is used to verify the trustworthiness of a TDX guest.
During the guest bring-up, the Intel TDX module measures and records
the initial contents and configuration of the guest, and at runtime,
guest software uses runtime measurement registers (RMTRs) to measure
and record details related to kernel image, command line params, ACPI
tables, initrd, etc. At guest runtime, the attestation process is used
to attest to these measurements.
The first step in the TDX attestation process is to get the TDREPORT
data. It is a fixed size data structure generated by the TDX module
which includes the above mentioned measurements data, a MAC ID to
protect the integrity of the TDREPORT, and a 64-Byte of user specified
data passed during TDREPORT request which can uniquely identify the
TDREPORT.
Intel's TDX guest driver exposes TDX_CMD_GET_REPORT0 IOCTL interface to
enable guest userspace to get the TDREPORT subtype 0.
Add a kernel self test module to test this ABI and verify the validity
of the generated TDREPORT.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Acked-by: Wander Lairson Costa <wander@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/all/20221116223820.819090-4-sathyanarayanan.kuppuswamy%40linux.intel.com
Current release - regressions:
- tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()
Previous releases - regressions:
- bridge: fix memory leaks when changing VLAN protocol
- dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims
- dsa: don't leak tagger-owned storage on switch driver unbind
- eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6 is removed
- eth: stmmac: ensure tx function is not running in stmmac_xdp_release()
- eth: hns3: fix return value check bug of rx copybreak
Previous releases - always broken:
- kcm: close race conditions on sk_receive_queue
- bpf: fix alignment problem in bpf_prog_test_run_skb()
- bpf: fix writing offset in case of fault in strncpy_from_kernel_nofault
- eth: macvlan: use built-in RCU list checking
- eth: marvell: add sleep time after enabling the loopback bit
- eth: octeon_ep: fix potential memory leak in octep_device_setup()
Misc:
- tcp: configurable source port perturb table size
- bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmN2FlMSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkWAwQAJcV7XEB7bEssgabFkEmC4uvS/sFlyHC
uSwFRn5ojaB2c56T1CnNYmitg9Wr4arC6Vca28iai6BgqB6t4qLRI/WWTsZiEPhi
mt/pjNN2u9JMyaafHFHYfXnbSDWRF7kPMpNw4l3uL0vkGyjSI7LGAOP4Qh8C1h/d
tNVSDZnj4Laj/3JtDf7Rk6ydCqPYnNdWxFfoZ/SQkjYZKD3Ze9tml7WJykAzCTLp
yUiPC6TvHOnWIZYbB04sVVOQD4V+95TSOgEhB6wzs/CXB7iBEY+N+oCedjP9Xrfw
n3ea4anBoTleDnJXJI57LhdJBkyoXncfbpbYLwXljyIgosr7XVTALvOG8XUhg/DW
FzN5DWQ54jzTsx2eXFJzjQQcDIgyxazk9EdoHdqF8byCasP+fofq1JvzyqtvNSyh
h8Ps6jdMZrWpXuFDVApXUhP32A/+9q+dFSYHJO681m6mf4CIaUXdm4aB1dkxDAvg
PSlk797U94RQCzJgqxhrgsq1PGQPBb+qadZrAiD3aQi26g0NWCTg7uFpCeCEK2ZF
fLwc2XxrwLQm1q7xQVoEg4UxPIIf0mUesvOD9sLDYop0rFIw8x0v7jdYM4kyhN3o
6FWAXKxBe3LJ9jTTsVTbZbfHYpTnS8Q2KSclBN+/dZNHwwsUPHjy17Z2Ct3o3Jlm
lNbiiD30BgsD
=vVJk
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from bpf.
Current release - regressions:
- tls: fix memory leak in tls_enc_skb() and tls_sw_fallback_init()
Previous releases - regressions:
- bridge: fix memory leaks when changing VLAN protocol
- dsa: make dsa_master_ioctl() see through port_hwtstamp_get() shims
- dsa: don't leak tagger-owned storage on switch driver unbind
- eth: mlxsw: avoid warnings when not offloaded FDB entry with IPv6
is removed
- eth: stmmac: ensure tx function is not running in
stmmac_xdp_release()
- eth: hns3: fix return value check bug of rx copybreak
Previous releases - always broken:
- kcm: close race conditions on sk_receive_queue
- bpf: fix alignment problem in bpf_prog_test_run_skb()
- bpf: fix writing offset in case of fault in
strncpy_from_kernel_nofault
- eth: macvlan: use built-in RCU list checking
- eth: marvell: add sleep time after enabling the loopback bit
- eth: octeon_ep: fix potential memory leak in octep_device_setup()
Misc:
- tcp: configurable source port perturb table size
- bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)"
* tag 'net-6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (51 commits)
net: use struct_group to copy ip/ipv6 header addresses
net: usb: smsc95xx: fix external PHY reset
net: usb: qmi_wwan: add Telit 0x103a composition
netdevsim: Fix memory leak of nsim_dev->fa_cookie
tcp: configurable source port perturb table size
l2tp: Serialize access to sk_user_data with sk_callback_lock
net: thunderbolt: Fix error handling in tbnet_init()
net: microchip: sparx5: Fix potential null-ptr-deref in sparx_stats_init() and sparx5_start()
net: lan966x: Fix potential null-ptr-deref in lan966x_stats_init()
net: dsa: don't leak tagger-owned storage on switch driver unbind
net/x25: Fix skb leak in x25_lapb_receive_frame()
net: ag71xx: call phylink_disconnect_phy if ag71xx_hw_enable() fail in ag71xx_open()
bridge: switchdev: Fix memory leaks when changing VLAN protocol
net: hns3: fix setting incorrect phy link ksettings for firmware in resetting process
net: hns3: fix return value check bug of rx copybreak
net: hns3: fix incorrect hw rss hash type of rx packet
net: phy: marvell: add sleep time after enabling the loopback bit
net: ena: Fix error handling in ena_init()
kcm: close race conditions on sk_receive_queue
net: ionic: Fix error handling in ionic_init_module()
...
This fixes three issues in nested SVM:
1) in the shutdown_interception() vmexit handler we call kvm_vcpu_reset().
However, if running nested and L1 doesn't intercept shutdown, the function
resets vcpu->arch.hflags without properly leaving the nested state.
This leaves the vCPU in inconsistent state and later triggers a kernel
panic in SVM code. The same bug can likely be triggered by sending INIT
via local apic to a vCPU which runs a nested guest.
On VMX we are lucky that the issue can't happen because VMX always
intercepts triple faults, thus triple fault in L2 will always be
redirected to L1. Plus, handle_triple_fault() doesn't reset the vCPU.
INIT IPI can't happen on VMX either because INIT events are masked while
in VMX mode.
Secondarily, KVM doesn't honour SHUTDOWN intercept bit of L1 on SVM.
A normal hypervisor should always intercept SHUTDOWN, a unit test on
the other hand might want to not do so.
Finally, the guest can trigger a kernel non rate limited printk on SVM
from the guest, which is fixed as well.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a SVM implementation to triple_fault_test to test that
emulated/injected shutdown works.
Since instead of the VMX, the SVM allows the hypervisor to avoid
intercepting shutdown in guest, don't intercept shutdown to test that
KVM suports this correctly.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-9-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add test that tests that on SVM if L1 doesn't intercept SHUTDOWN,
then L2 crashes L1 and doesn't crash L2
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-7-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
struct idt_entry will be used for a test which will break IDT on purpose.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20221103141351.50662-6-mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
kmemleak reports this issue:
unreferenced object 0xffff88810b7835c0 (size 32):
comm "test_progs", pid 270, jiffies 4294969007 (age 1621.315s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
03 00 00 00 03 00 00 00 0f 00 00 00 00 00 00 00 ................
backtrace:
[<00000000376cdeab>] kmalloc_trace+0x27/0x110
[<000000003bcdb3b6>] selinux_sk_alloc_security+0x66/0x110
[<000000003959008f>] security_sk_alloc+0x47/0x80
[<00000000e7bc6668>] sk_prot_alloc+0xbd/0x1a0
[<0000000002d6343a>] sk_alloc+0x3b/0x940
[<000000009812a46d>] unix_create1+0x8f/0x3d0
[<000000005ed0976b>] unix_create+0xa1/0x150
[<0000000086a1d27f>] __sock_create+0x233/0x4a0
[<00000000cffe3a73>] __sys_socket_create.part.0+0xaa/0x110
[<0000000007c63f20>] __sys_socket+0x49/0xf0
[<00000000b08753c8>] __x64_sys_socket+0x42/0x50
[<00000000b56e26b3>] do_syscall_64+0x3b/0x90
[<000000009b4871b8>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue occurs in the following scenarios:
unix_create1()
sk_alloc()
sk_prot_alloc()
security_sk_alloc()
call_int_hook()
hlist_for_each_entry()
entry1->hook.sk_alloc_security
<-- selinux_sk_alloc_security() succeeded,
<-- sk->security alloced here.
entry2->hook.sk_alloc_security
<-- bpf_lsm_sk_alloc_security() failed
goto out_free;
... <-- the sk->security not freed, memleak
The core problem is that the LSM is not yet fully stacked (work is
actively going on in this space) which means that some LSM hooks do
not support multiple LSMs at the same time. To fix, skip the
"EPERM" test when it runs in the environments that already have
non-bpf lsms installed
Fixes: dca85aac88 ("selftests/bpf: lsm_cgroup functional test")
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Cc: Stanislav Fomichev <sdf@google.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/1668482980-16163-1-git-send-email-wangyufen@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Now that a VM isn't needed to check for nEPT support, assert that KVM
supports nEPT in prepare_eptp() instead of skipping the test, and push
the TEST_REQUIRE() check out to individual tests. The require+assert are
somewhat redundant and will incur some amount of ongoing maintenance
burden, but placing the "require" logic in the test makes it easier to
find/understand a test's requirements and in this case, provides a very
strong hint that the test cares about nEPT.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com
[sean: rebase on merged code, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
When checking for nEPT support in KVM, use kvm_get_feature_msr() instead
of vcpu_get_msr() to retrieve KVM's default TRUE_PROCBASED_CTLS and
PROCBASED_CTLS2 MSR values, i.e. don't require a VM+vCPU to query nEPT
support.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20220927165209.930904-1-dmatlack@google.com
[sean: rebase on merged code, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Drop kvm_get_supported_cpuid_entry() and its inner helper now that all
known usage can use X86_FEATURE_*, X86_PROPERTY_*, X86_PMU_FEATURE_*, or
the dedicated Family/Model helpers. Providing "raw" access to CPUID
leafs is undesirable as it encourages open coding CPUID checks, which is
often error prone and not self-documenting.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-13-seanjc@google.com
Add KVM variants of the x86 Family and Model helpers, and use them in the
PMU event filter test. Open code the retrieval of KVM's supported CPUID
entry 0x1.0 in anticipation of dropping kvm_get_supported_cpuid_entry().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-12-seanjc@google.com
Add dedicated helpers for getting x86's Family and Model, which are the
last holdouts that "need" raw access to CPUID information. FMS info is
a mess and requires not only splicing together multiple values, but
requires doing so conditional in the Family case.
Provide wrappers to reduce the odds of copy+paste errors, but mostly to
allow for the eventual removal of kvm_get_supported_cpuid_entry().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-11-seanjc@google.com
Add an X86_PMU_FEATURE_* framework to simplify probing architectural
events on Intel PMUs, which require checking the length of a bit vector
and the _absence_ of a "feature" bit. Add helpers for both KVM and
"this CPU", and use the newfangled magic (along with X86_PROPERTY_*)
to clean up pmu_event_filter_test.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-10-seanjc@google.com
Add X86_PROPERTY_PMU_VERSION and use it in vmx_pmu_caps_test to replace
open coded versions of the same functionality.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-9-seanjc@google.com
Add and use x86 "properties" for the myriad AMX CPUID values that are
validated by the AMX test. Drop most of the test's single-usage
helpers so that the asserts more precisely capture what check failed.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-8-seanjc@google.com
Extent X86_PROPERTY_* support to KVM, i.e. add kvm_cpu_property() and
kvm_cpu_has_p(), and use the new helpers in kvm_get_cpu_address_width().
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-7-seanjc@google.com
Use X86_PROPERTY_MAX_KVM_LEAF to replace the equivalent open coded check
on KVM's maximum paravirt CPUID leaf.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-5-seanjc@google.com
Introduce X86_PROPERTY_* to allow retrieving values/properties from CPUID
leafs, e.g. MAXPHYADDR from CPUID.0x80000008. Use the same core code as
X86_FEATURE_*, the primary difference is that properties are multi-bit
values, whereas features enumerate a single bit.
Add this_cpu_has_p() to allow querying whether or not a property exists
based on the maximum leaf associated with the property, e.g. MAXPHYADDR
doesn't exist if the max leaf for 0x8000_xxxx is less than 0x8000_0008.
Use the new property infrastructure in vm_compute_max_gfn() to prove
that the code works as intended. Future patches will convert additional
selftests code.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-4-seanjc@google.com
Refactor the X86_FEATURE_* framework to prepare for extending the core
logic to support "properties". The "feature" framework allows querying a
single CPUID bit to detect the presence of a feature; the "property"
framework will extend the idea to allow querying a value, i.e. to get a
value that is a set of contiguous bits in a CPUID leaf.
Opportunistically add static asserts to ensure features are fully defined
at compile time, and to try and catch mistakes in the definition of
features.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-3-seanjc@google.com
Add X86_FEATURE_PAE and use it to guesstimate the MAXPHYADDR when the
MAXPHYADDR CPUID entry isn't supported.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006005125.680782-2-seanjc@google.com
Add a selftest to exercise the KVM_CAP_EXIT_ON_EMULATION_FAILURE
capability.
This capability is also exercised through
smaller_maxphyaddr_emulation_test, but that test requires
allow_smaller_maxphyaddr=Y, which is off by default on Intel when ept=Y
and unconditionally disabled on AMD when npt=Y. This new test ensures
that KVM_CAP_EXIT_ON_EMULATION_FAILURE is exercised independent of
allow_smaller_maxphyaddr.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-11-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Change smaller_maxphyaddr_emulation_test to expect a #PF(RSVD), rather
than an emulation failure, when TDP is disabled. KVM only needs to
emulate instructions to emulate a smaller guest.MAXPHYADDR when TDP is
enabled.
Fixes: 39bbcc3a4e ("selftests: kvm: Allows userspace to handle emulation errors.")
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-10-dmatlack@google.com
[sean: massage comment to talk about having to emulate due to MAXPHYADDR]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Provide the error code on a fault in KVM_ASM_SAFE(), e.g. to allow tests
to assert that #PF generates the correct error code without needing to
manually install a #PF handler. Use r10 as the scratch register for the
error code, as it's already clobbered by the asm blob (loaded with the
RIP of the to-be-executed instruction). Deliberately load the output
"error_code" even in the non-faulting path so that error_code is always
initialized with deterministic data (the aforementioned RIP), i.e to
ensure a selftest won't end up with uninitialized consumption regardless
of how KVM_ASM_SAFE() is used.
Don't clear r10 in the non-faulting case and instead load error code with
the RIP (see above). The error code is valid if and only if an exception
occurs, and '0' isn't necessarily a better "invalid" value, e.g. '0'
could result in false passes for a buggy test.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-9-dmatlack@google.com
Clear R9 in the non-faulting path of KVM_ASM_SAFE() and fall through to
to a common load of "vector" to effectively load "vector" with '0' to
reduce the code footprint of the asm blob, to reduce the runtime overhead
of the non-faulting path (when "vector" is stored in a register), and so
that additional output constraints that are valid if and only if a fault
occur are loaded even in the non-faulting case.
A future patch will add a 64-bit output for the error code, and if its
output is not explicitly loaded with _something_, the user of the asm
blob can end up technically consuming uninitialized data. Using a
common path to load the output constraints will allow using an existing
scratch register, e.g. r10, to hold the error code in the faulting path,
while also guaranteeing the error code is initialized with deterministic
data in the non-faulting patch (r10 is loaded with the RIP of
to-be-executed instruction).
Consuming the error code when a fault doesn't occur would obviously be a
test bug, but there's no guarantee the compiler will detect uninitialized
consumption. And conversely, it's theoretically possible that the
compiler might throw a false positive on uninitialized data, e.g. if the
compiler can't determine that the non-faulting path won't touch the error
code.
Alternatively, the error code could be explicitly loaded in the
non-faulting path, but loading a 64-bit memory|register output operand
with an explicitl value requires a sign-extended "MOV imm32, r/m64",
which isn't exactly straightforward and has a largish code footprint.
And loading the error code with what is effectively garbage (from a
scratch register) avoids having to choose an arbitrary value for the
non-faulting case.
Opportunistically remove a rogue asterisk in the block comment.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-8-dmatlack@google.com
Copy KVM's macros for page fault error masks into processor.h so they
can be used in selftests.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-7-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Move the flds instruction emulation failure handling code to a header
so it can be re-used in an upcoming test.
No functional change intended.
Signed-off-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-5-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Delete a bunch of code related to ucall handling from
smaller_maxphyaddr_emulation_test. The only thing
smaller_maxphyaddr_emulation_test needs to check is that the vCPU exits
with UCALL_DONE after the second vcpu_run().
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-4-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Hard-code the flds instruction and assert the exact instruction bytes
are present in run->emulation_failure. The test already requires the
instruction bytes to be present because that's the only way the test
will advance the RIP past the flds and get to GUEST_DONE().
Note that KVM does not necessarily return exactly 2 bytes in
run->emulation_failure since it may not know the exact instruction
length in all cases. So just assert that
run->emulation_failure.insn_size is at least 2.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-3-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename emulator_error_test to smaller_maxphyaddr_emulation_test and
update the comment at the top of the file to document that this is
explicitly a test to validate that KVM emulates instructions in response
to an EPT violation when emulating a smaller MAXPHYADDR.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221102184654.282799-2-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
In xapic_state_test's test_icr(), explicitly skip iterations that would
match vcpu->id instead of assuming vcpu->id is '0', so that IPIs are
are correctly sent to non-existent vCPUs.
Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/kvm/YyoZr9rXSSMEtdh5@google.com
Signed-off-by: Gautam Menghani <gautammenghani201@gmail.com>
Link: https://lore.kernel.org/r/20221017175819.12672-1-gautammenghani201@gmail.com
[sean: massage shortlog and changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add arch specific API kvm_arch_vm_post_create to perform any required setup
after VM creation.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-4-vannapurve@google.com
[sean: place x86's implementation by vm_arch_vcpu_add()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Introduce arch specific API: kvm_selftest_arch_init to allow each arch to
handle initialization before running any selftest logic.
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-3-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Consolidate common startup logic in one place by implementing a single
setup function with __attribute((constructor)) for all selftests within
kvm_util.c.
This allows moving logic like:
/* Tell stdout not to buffer its content */
setbuf(stdout, NULL);
to a single file for all selftests.
This will also allow any required setup at entry in future to be done in
common main function.
Link: https://lore.kernel.org/lkml/Ywa9T+jKUpaHLu%2Fl@google.com
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Link: https://lore.kernel.org/r/20221115213845.3348210-2-vannapurve@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Play nice with huge pages when getting PTEs and translating GVAs to GPAs,
there's no reason to disallow using huge pages in selftests. Use
PG_LEVEL_NONE to indicate that the caller doesn't care about the mapping
level and just wants to get the pte+level.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-8-seanjc@google.com
Verify the parent PTE is PRESENT when getting a child via virt_get_pte()
so that the helper can be used for getting PTEs/GPAs without losing
sanity checks that the walker isn't wandering into the weeds.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-5-seanjc@google.com
Remove the pointless shift from GPA=>GFN and immediately back to
GFN=>GPA when creating guest page tables. Ignore the other walkers
that have a similar pattern for the moment, they will be converted
to use virt_get_pte() in the near future.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-4-seanjc@google.com
Drop the reserved bit checks from the helper to retrieve a PTE, there's
very little value in sanity checking the constructed page tables as any
will quickly be noticed in the form of an unexpected #PF. The checks
also place unnecessary restrictions on the usage of the helpers, e.g. if
a test _wanted_ to set reserved bits for whatever reason.
Removing the NX check in particular allows for the removal of the @vcpu
param, which will in turn allow the helper to be reused nearly verbatim
for addr_gva2gpa().
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-3-seanjc@google.com
Drop vm_{g,s}et_page_table_entry() and instead expose the "inner"
helper (was _vm_get_page_table_entry()) that returns a _pointer_ to the
PTE, i.e. let tests directly modify PTEs instead of bouncing through
helpers that just make life difficult.
Opportunsitically use BIT_ULL() in emulator_error_test, and use the
MAXPHYADDR define to set the "rogue" GPA bit instead of open coding the
same value.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006004512.666529-2-seanjc@google.com
There is a spelling mistake in an assert message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20220928213458.64089-1-colin.i.king@gmail.com
[sean: fix an ironic typo in the changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
To play nice with guests whose stack memory is encrypted, e.g. AMD SEV,
introduce a new "ucall pool" implementation that passes the ucall struct
via dedicated memory (which can be mapped shared, a.k.a. as plain text).
Because not all architectures have access to the vCPU index in the guest,
use a bitmap with atomic accesses to track which entries in the pool are
free/used. A list+lock could also work in theory, but synchronizing the
individual pointers to the guest would be a mess.
Note, there's no need to rewalk the bitmap to ensure success. If all
vCPUs are simply allocating, success is guaranteed because there are
enough entries for all vCPUs. If one or more vCPUs are freeing and then
reallocating, success is guaranteed because vCPUs _always_ walk the
bitmap from 0=>N; if vCPU frees an entry and then wins a race to
re-allocate, then either it will consume the entry it just freed (bit is
the first free bit), or the losing vCPU is guaranteed to see the freed
bit (winner consumes an earlier bit, which the loser hasn't yet visited).
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Peter Gonda <pgonda@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-8-seanjc@google.com
Drop ucall_uninit() and ucall_arch_uninit() now that ARM doesn't modify
the host's copy of ucall_exit_mmio_addr, i.e. now that there's no need to
reset the pointer before potentially creating a new VM. The few calls to
ucall_uninit() are all immediately followed by kvm_vm_free(), and that is
likely always going to hold true, i.e. it's extremely unlikely a test
will want to effectively disable ucall in the middle of a test.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-7-seanjc@google.com
Fix a mostly-theoretical bug where ARM's ucall MMIO setup could result in
different VMs stomping on each other by cloberring the global pointer.
Fix the most obvious issue by saving the MMIO gpa into the VM.
A more subtle bug is that creating VMs in parallel (on multiple tasks)
could result in a VM using the wrong address. Synchronizing a global to
a guest effectively snapshots the value on a per-VM basis, i.e. the
"global" is already prepped to work with multiple VMs, but setting the
global in the host is not thread-safe. To fix that bug, add
write_guest_global() to allow stuffing a VM's copy of a "global" without
modifying the host value.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-6-seanjc@google.com
Do init_ucall() automatically during VM creation to kill two (three?)
birds with one stone.
First, initializing ucall immediately after VM creations allows forcing
aarch64's MMIO ucall address to immediately follow memslot0. This is
still somewhat fragile as tests could clobber the MMIO address with a
new memslot, but it's safe-ish since tests have to be conversative when
accounting for memslot0. And this can be hardened in the future by
creating a read-only memslot for the MMIO page (KVM ARM exits with MMIO
if the guest writes to a read-only memslot). Add a TODO to document that
selftests can and should use a memslot for the ucall MMIO (doing so
requires yet more rework because tests assumes thay can use all memslots
except memslot0).
Second, initializing ucall for all VMs prepares for making ucall
initialization meaningful on all architectures. aarch64 is currently the
only arch that needs to do any setup, but that will change in the future
by switching to a pool-based implementation (instead of the current
stack-based approach).
Lastly, defining the ucall MMIO address from common code will simplify
switching all architectures (except s390) to a common MMIO-based ucall
implementation (if there's ever sufficient motivation to do so).
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-4-seanjc@google.com
Consolidate the actual copying of a ucall struct from guest=>host into
the common get_ucall(). Return a host virtual address instead of a guest
virtual address even though the addr_gva2hva() part could be moved to
get_ucall() too. Conceptually, get_ucall() is invoked from the host and
should return a host virtual address (and returning NULL for "nothing to
see here" is far superior to returning 0).
Use pointer shenanigans instead of an unnecessary bounce buffer when the
caller of get_ucall() provides a valid pointer.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-3-seanjc@google.com
Make ucall() a common helper that populates struct ucall, and only calls
into arch code to make the actually call out to userspace.
Rename all arch-specific helpers to make it clear they're arch-specific,
and to avoid collisions with common helpers (one more on its way...)
Add WRITE_ONCE() to stores in ucall() code (as already done to aarch64
code in commit 9e2f6498ef ("selftests: KVM: Handle compiler
optimizations in ucall")) to prevent clang optimizations breaking ucalls.
Cc: Colton Lewis <coltonlewis@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Tested-by: Peter Gonda <pgonda@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221006003409.649993-2-seanjc@google.com
Automatically disable single-step when the guest reaches the end of the
verified section instead of using an explicit ucall() to ask userspace to
disable single-step. An upcoming change to implement a pool-based scheme
for ucall() will add an atomic operation (bit test and set) in the guest
ucall code, and if the compiler generate "old school" atomics, e.g.
40e57c: c85f7c20 ldxr x0, [x1]
40e580: aa100011 orr x17, x0, x16
40e584: c80ffc31 stlxr w15, x17, [x1]
40e588: 35ffffaf cbnz w15, 40e57c <__aarch64_ldset8_sync+0x1c>
the guest will hang as the local exclusive monitor is reset by eret,
i.e. the stlxr will always fail due to the debug exception taken to EL2.
Link: https://lore.kernel.org/all/20221006003409.649993-8-seanjc@google.com
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221117002350.2178351-3-seanjc@google.com
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Disable single-step by setting debug.control to KVM_GUESTDBG_ENABLE,
not to SINGLE_STEP_DISABLE. The latter is an arbitrary test enum that
just happens to have the same value as KVM_GUESTDBG_ENABLE, and so
effectively disables single-step debug.
No functional change intended.
Cc: Reiji Watanabe <reijiw@google.com>
Fixes: b18e4d4aeb ("KVM: arm64: selftests: Add a test case for KVM_GUESTDBG_SINGLESTEP")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221117002350.2178351-2-seanjc@google.com
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
This tests PTRACE_SETREGSET with NT_X86_XSTATE modifying PKRU directly and
removing the PKRU bit from XSTATE_BV.
Signed-off-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20221115230932.7126-7-khuey%40kylehuey.com
Replace the perf_test_ prefix on symbol names with memstress_ to match
the new file name.
"memstress" better describes the functionality proveded by this library,
which is to provide functionality for creating and running a VM that
stresses VM memory by reading and writing to guest memory on all vCPUs
in parallel.
"memstress" also contains the same number of chracters as "perf_test",
making it a drop-in replacement in symbols, e.g. function names, without
impacting line lengths. Also the lack of underscore between "mem" and
"stress" makes it clear "memstress" is a noun.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221012165729.3505266-4-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename the local variables "pta" (which is short for perf_test_args) for
args. "pta" is not an obvious acronym and using "args" mirrors
"vcpu_args".
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221012165729.3505266-3-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Rename the perf_test_util.[ch] files to memstress.[ch]. Symbols are
renamed in the following commit to reduce the amount of churn here in
hopes of playiing nice with git's file rename detection.
The name "memstress" was chosen to better describe the functionality
proveded by this library, which is to create and run a VM that
reads/writes to guest memory on all vCPUs in parallel.
"memstress" also contains the same number of chracters as "perf_test",
making it a drop-in replacement in symbols, e.g. function names, without
impacting line lengths. Also the lack of underscore between "mem" and
"stress" makes it clear "memstress" is a noun.
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221012165729.3505266-2-dmatlack@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Create the ability to randomize page access order with the -a
argument. This includes the possibility that the same pages may be hit
multiple times during an iteration or not at all.
Population has random access as false to ensure all pages will be
touched by population and avoid page faults in late dirty memory that
would pollute the test results.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-5-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Randomize which pages are written vs read using the random number
generator.
Change the variable wr_fract and associated function calls to
write_percent that now operates as a percentage from 0 to 100 where X
means each page has an X% chance of being written. Change the -f
argument to -w to reflect the new variable semantics. Keep the same
default of 100% writes.
Population always uses 100% writes to ensure all memory is actually
populated and not just mapped to the zero page. The prevents expensive
copy-on-write faults from occurring during the dirty memory iterations
below, which would pollute the performance results.
Each vCPU calculates its own random seed by adding its index to the
seed provided.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-4-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Create a -r argument to specify a random seed. If no argument is
provided, the seed defaults to 1. The random seed is set with
perf_test_set_random_seed() and must be set before guest_code runs to
apply.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-3-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Implement random number generator for guest code to randomize parts
of the test, making it less predictable and a more accurate reflection
of reality.
The random number generator chosen is the Park-Miller Linear
Congruential Generator, a fancy name for a basic and well-understood
random number generator entirely sufficient for this purpose.
Signed-off-by: Colton Lewis <coltonlewis@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Link: https://lore.kernel.org/r/20221107182208.479157-2-coltonlewis@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add a command line option, -c, to pin vCPUs to physical CPUs (pCPUs),
i.e. to force vCPUs to run on specific pCPUs.
Requirement to implement this feature came in discussion on the patch
"Make page tables for eager page splitting NUMA aware"
https://lore.kernel.org/lkml/YuhPT2drgqL+osLl@google.com/
This feature is useful as it provides a way to analyze performance based
on the vCPUs and dirty log worker locations, like on the different NUMA
nodes or on the same NUMA nodes.
To keep things simple, implementation is intentionally very limited,
either all of the vCPUs will be pinned followed by an optional main
thread or nothing will be pinned.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-8-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Many KVM selftests take command line arguments which are supposed to be
positive (>0) or non-negative (>=0). Some tests do these validation and
some missed adding the check.
Add atoi_positive() and atoi_non_negative() to validate inputs in
selftests before proceeding to use those values.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-7-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Change test args memslot_modification_delay and nr_memslot_modifications
to delay and nr_iterations for simplicity.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-6-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Replace size_1gb defined in max_guest_memory_test.c with the SZ_1G,
SZ_2G and SZ_4G from linux/sizes.h header file.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-5-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
atoi() doesn't detect errors. There is no way to know that a 0 return
is correct conversion or due to an error.
Introduce atoi_paranoid() to detect errors and provide correct
conversion. Replace all atoi() calls with atoi_paranoid().
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Suggested-by: David Matlack <dmatlack@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-4-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
There are 13 command line options and they are not in any order. Put
them in alphabetical order to make it easy to add new options.
No functional change intended.
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-3-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Passing -e option (Run VCPUs while dirty logging is being disabled) in
dirty_log_perf_test also unintentionally enables -g (Do not enable
KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2). Add break between two switch case
logic.
Fixes: cfe12e64b0 ("KVM: selftests: Add an option to run vCPUs while disabling dirty logging")
Signed-off-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20221103191719.1559407-2-vipinsh@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
This initial code does a simple sample transfer tests. By default,
all PCM devices are detected and tested with short and long
buffering parameters for 4 seconds. If the sample transfer timing
is not in a +-100ms boundary, the test fails. Only the interleaved
buffering scheme is supported in this version.
The configuration may be modified with the configuration files.
A specific hardware configuration is detected and activated
using the sysfs regex matching. This allows to use the DMI string
(/sys/class/dmi/id/* tree) or any other system parameters
exposed in sysfs for the matching for the CI automation.
The configuration file may also specify the PCM device list to detect
the missing PCM devices.
v1..v2:
- added missing alsa-local.h header file
Cc: Mark Brown <broonie@kernel.org>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@intel.com>
Cc: Liam Girdwood <liam.r.girdwood@intel.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Cc: Jimmy Cheng-Yi Chiang <cychiang@google.com>
Cc: Curtis Malainey <cujomalainey@google.com>
Cc: Brian Norris <briannorris@chromium.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221108115914.3751090-1-perex@perex.cz
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Enable unprivileged bpf for selftests kernel by default.
This forces CI to run test_verifier tests in both privileged
and unprivileged modes.
The test_verifier.c:do_test uses sysctl kernel.unprivileged_bpf_disabled
to decide whether to run or to skip test cases in unprivileged mode.
The CONFIG_BPF_UNPRIV_DEFAULT_OFF controls the default value of the
kernel.unprivileged_bpf_disabled.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20221116015456.2461135-1-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Verify that nullness information is porpagated in the branches of
register to register JEQ and JNE operations.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221115224859.2452988-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There is no point in failing the tests when RTC is not present.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Tested-by: Daniel Diaz <daniel.diaz@linaro.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Currently, verifier uses MEM_ALLOC type tag to specially tag memory
returned from bpf_ringbuf_reserve helper. However, this is currently
only used for this purpose and there is an implicit assumption that it
only refers to ringbuf memory (e.g. the check for ARG_PTR_TO_ALLOC_MEM
in check_func_arg_reg_off).
Hence, rename MEM_ALLOC to MEM_RINGBUF to indicate this special
relationship and instead open the use of MEM_ALLOC for more generic
allocations made for user types.
Also, since ARG_PTR_TO_ALLOC_MEM_OR_NULL is unused, simply drop it.
Finally, update selftests using 'alloc_' verifier string to 'ringbuf_'.
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
test_cpuset_prs.sh is failing with the following error:
test_cpuset_prs.sh: line 29: [[: 8
57%: syntax error in expression (error token is "57%")
This is happening because `lscpu | grep "^CPU(s)"` returns two lines in
some systems (such as Debian unstable):
# lscpu | grep "^CPU(s)"
CPU(s): 8
CPU(s) scaling MHz: 55%
This is a simple fix that discard the second line.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Kamalesh Babulal <kamalesh.babulal@oracle.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
In preparation for cxl_acpi walking pci_root->bus->bridge, add that
association to the mock pci_root instances.
Note that the missing 3rd entry in mock_pci_root[] was not noticed until
now given that the test version of to_cxl_host_bridge()
(tools/testing/cxl/mock_acpi.c), obviated the need for that entry.
However, "cxl/acpi: Improve debug messages in cxl_acpi_probe()" [1]
needs pci_root->bus->bridge to be populated.
Link: https://lore.kernel.org/r/20221018132341.76259-6-rrichter@amd.com [1]
Cc: Robert Richter <rrichter@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL dports are added in a couple of code paths using
devm_cxl_add_dport(). Debug messages are individually generated, but are
incomplete and inconsistent. Change this by moving its generation to
devm_cxl_add_dport(). This unifies the messages and reduces code
duplication. Also, generate messages on failure. Use a
__devm_cxl_add_dport() wrapper to keep the readability of the error
exits.
Signed-off-by: Robert Richter <rrichter@amd.com>
Link: https://lore.kernel.org/r/20221018132341.76259-5-rrichter@amd.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
In some platform, the schedule event may came slowly, delay 100ms can't
cover it.
I was notice that on my board which running in low cpu_freq,and this
selftests allways gose fail.
So maybe we can check more times here to wait longer.
Fixes: 43bb45da82 ("selftests: ftrace: Add a selftest to test event enable/disable func trigger")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Fix following coccicheck warning:
tools/testing/selftests/arm64/mte/check_mmap_options.c:64:24-25:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:66:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:135:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:96:25-26:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_mmap_options.c:190:24-25:
WARNING: Use ARRAY_SIZE
Signed-off-by: KaiLong Wang <wangkailong@jari.cn>
Link: https://lore.kernel.org/r/777ce8ba.12e.184705d4211.Coremail.wangkailong@jari.cn
Signed-off-by: Will Deacon <will@kernel.org>
'time' is the local variable of run_test() function, while 'max_time' is
the local variable of do_transfer() function. So in do_transfer(),
$max_time should be used, not $time.
Please note that here $time == $max_time so the behaviour is not changed
but the right variable is used.
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEET63h6RnJhTJHuKTjXOwUVIRcSScFAmNu2EkACgkQXOwUVIRc
SSebKhAA0ffmp5jJgEJpQYNABGLYIJcwKkBrGClDbMJLtwCjevGZJajT9fpbCLb1
eK6EIhdfR0NTO+0KtUVkZ8WMa81OmLEJYdTNtJfNE23ENMpssiAWhlhDF8AoXeKv
Bo3j719gn3Cw9PWXQoircH3wpj+5RMDnjxy4iYlA5yNrvzC7XVmssMF+WALvQnuK
CGrfR57hxdgmphmasRqeCzEoriwihwPsG3k6eQN8rf7ZytLhs90tMVgT9L3Cd2u9
DafA0Xl8mZdz2mHhThcJhQVq4MUymZj44ufuHDiOs1j6nhUlWToyQuvegPOqxKti
uLGtZul0ls+3UP0Lbrv1oEGU/MWMxyDz4IBc0EVs0k3ItQbmSKs6r9WuPFGd96Sb
GHk68qFVySeLGN0LfKe3rCHJ9ZoIOPYJg9qT8Rd5bOhetgGwSsxZTxUI39BxkFup
CEqwIDnts1TMU37GDjj+vssKW91k4jEzMZVtRfsL3J36aJs28k/Ez4AqLXg6WU6u
ADqFaejVPcXbN9rX90onIYxxiL28gZSeT+i8qOPELZtqTQmNWz+tC/ySVuWnD8Mn
Nbs7PZ1IWiNZpsKS8pZnpd6j4mlBeJnwXkPKiFy+xHGuwRSRdYl6G9e5CtlRely/
rwQ8DtaOpRYMrGhnmBEdAOCa9t/iqzrzHzjoigjJ7iAST4ToJ5s=
=Y+/e
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:
====================
bpf-next 2022-11-11
We've added 49 non-merge commits during the last 9 day(s) which contain
a total of 68 files changed, 3592 insertions(+), 1371 deletions(-).
The main changes are:
1) Veristat tool improvements to support custom filtering, sorting, and replay
of results, from Andrii Nakryiko.
2) BPF verifier precision tracking fixes and improvements,
from Andrii Nakryiko.
3) Lots of new BPF documentation for various BPF maps, from Dave Tucker,
Donald Hunter, Maryam Tahhan, Bagas Sanjaya.
4) BTF dedup improvements and libbpf's hashmap interface clean ups, from
Eduard Zingerman.
5) Fix veth driver panic if XDP program is attached before veth_open, from
John Fastabend.
6) BPF verifier clean ups and fixes in preparation for follow up features,
from Kumar Kartikeya Dwivedi.
7) Add access to hwtstamp field from BPF sockops programs,
from Martin KaFai Lau.
8) Various fixes for BPF selftests and samples, from Artem Savkov,
Domenico Cerasuolo, Kang Minchul, Rong Tao, Yang Jihong.
9) Fix redirection to tunneling device logic, preventing skb->len == 0, from
Stanislav Fomichev.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (49 commits)
selftests/bpf: fix veristat's singular file-or-prog filter
selftests/bpf: Test skops->skb_hwtstamp
selftests/bpf: Fix incorrect ASSERT in the tcp_hdr_options test
bpf: Add hwtstamp field for the sockops prog
selftests/bpf: Fix xdp_synproxy compilation failure in 32-bit arch
bpf, docs: Document BPF_MAP_TYPE_ARRAY
docs/bpf: Document BPF map types QUEUE and STACK
docs/bpf: Document BPF ARRAY_OF_MAPS and HASH_OF_MAPS
docs/bpf: Document BPF_MAP_TYPE_CPUMAP map
docs/bpf: Document BPF_MAP_TYPE_LPM_TRIE map
libbpf: Hashmap.h update to fix build issues using LLVM14
bpf: veth driver panics when xdp prog attached before veth_open
selftests: Fix test group SKIPPED result
selftests/bpf: Tests for btf_dedup_resolve_fwds
libbpf: Resolve unambigous forward declarations
libbpf: Hashmap interface update to allow both long and void* keys/values
samples/bpf: Fix sockex3 error: Missing BPF prog type
selftests/bpf: Fix u32 variable compared with less than zero
Documentation: bpf: Escape underscore in BPF type name prefix
selftests/bpf: Use consistent build-id type for liburandom_read.so
...
====================
Link: https://lore.kernel.org/r/20221111233733.1088228-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Andrii Nakryiko says:
====================
bpf 2022-11-11
We've added 11 non-merge commits during the last 8 day(s) which contain
a total of 11 files changed, 83 insertions(+), 74 deletions(-).
The main changes are:
1) Fix strncpy_from_kernel_nofault() to prevent out-of-bounds writes,
from Alban Crequy.
2) Fix for bpf_prog_test_run_skb() to prevent wrong alignment,
from Baisong Zhong.
3) Switch BPF_DISPATCHER to static_call() instead of ftrace infra, with
a small build fix on top, from Peter Zijlstra and Nathan Chancellor.
4) Fix memory leak in BPF verifier in some error cases, from Wang Yufen.
5) 32-bit compilation error fixes for BPF selftests, from Pu Lehui and
Yang Jihong.
6) Ensure even distribution of per-CPU free list elements, from Xu Kuohai.
7) Fix copy_map_value() to track special zeroed out areas properly,
from Xu Kuohai.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix offset calculation error in __copy_map_value and zero_map_value
bpf: Initialize same number of free nodes for each pcpu_freelist
selftests: bpf: Add a test when bpf_probe_read_kernel_str() returns EFAULT
maccess: Fix writing offset in case of fault in strncpy_from_kernel_nofault()
selftests/bpf: Fix test_progs compilation failure in 32-bit arch
selftests/bpf: Fix casting error when cross-compiling test_verifier for 32-bit platforms
bpf: Fix memory leaks in __check_func_call
bpf: Add explicit cast to 'void *' for __BPF_DISPATCHER_UPDATE()
bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)
bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")
bpf, test_run: Fix alignment problem in bpf_prog_test_run_skb()
====================
Link: https://lore.kernel.org/r/20221111231624.938829-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
introduced post-6.0 or which aren't considered serious enough to justify a
-stable backport.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY27xPAAKCRDdBJ7gKXxA
juFXAP4tSmfNDrT6khFhV0l4cS43bluErVNLh32RfXBqse8GYgEA5EPvZkOssLqY
86ejRXFgAArxYC4caiNURUQL+IASvQo=
=YVOx
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc hotfixes from Andrew Morton:
"22 hotfixes.
Eight are cc:stable and the remainder address issues which were
introduced post-6.0 or which aren't considered serious enough to
justify a -stable backport"
* tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits)
docs: kmsan: fix formatting of "Example report"
mm/damon/dbgfs: check if rm_contexts input is for a real context
maple_tree: don't set a new maximum on the node when not reusing nodes
maple_tree: fix depth tracking in maple_state
arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging
fs: fix leaked psi pressure state
nilfs2: fix use-after-free bug of ns_writer on remount
x86/traps: avoid KMSAN bugs originating from handle_bug()
kmsan: make sure PREEMPT_RT is off
Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARN
x86/uaccess: instrument copy_from_user_nmi()
kmsan: core: kmsan_in_runtime() should return true in NMI context
mm: hugetlb_vmemmap: include missing linux/moduleparam.h
mm/shmem: use page_mapping() to detect page cache for uffd continue
mm/memremap.c: map FS_DAX device memory as decrypted
Partly revert "mm/thp: carry over dirty bit when thp splits on pmd"
nilfs2: fix deadlock in nilfs_count_free_blocks()
mm/mmap: fix memory leak in mmap_region()
hugetlbfs: don't delete error page from pagecache
maple_tree: reorganize testing to restore module testing
...
Fix the bug of filtering out filename too early, before we know the
program name, if using unified file-or-prog filter (i.e., -f
<any-glob>). Because we try to filter BPF object file early without
opening and parsing it, if any_glob (file-or-prog) filter is used we
have to accept any filename just to get program name, which might match
any_glob.
Fixes: 10b1b3f3e5 ("selftests/bpf: consolidate and improve file/prog filtering in veristat")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221111181242.2101192-1-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmNuf4IQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgppvMD/9K2kFcAiD85QmRoIgwlIRM604KZ6aGXqk3
BjTavxfB+3DJcb82FHywBF5DC0sUtrBOTn7+DJpf13lb4L2DZY1lfLkRL7SKHSs5
o1z+1uLcBtZtGCq5M+yhpxbAzJ2kNWdRe+FutSA6wiz03ATXTwo2qE1MLaw1jxap
DowK08DUtLNaFNoEGdpW8iub9ql1OVWWZdOaxZmVJkdPWeWMD6Zaqwi/MeyNv0aY
KbVpYHa2AGxGY6+2krLpL09kqYlW++UvFsofM6RJrHTlLyBdYKvM2Z+Tv9I6w81s
ZerVl5srC2pVj1K0isO7A25GTVIVzI9im/GCzStNTasFtlzW85CwLEcDS8T679bY
I0P+Wl3ZoLJztChrcSufiAaOfJIichML7H3h/iEkSE51+9cBr42fqJO64dc+s/Bi
OGmaFowYgJgOClzpAJ2upd2aNu4sLiR2DUb3qdHDpcio9bfpIme1Do1yB94kRR//
yIFrs47PW+JumE90iKJPnDRHWrl3dVUK27MqkAWSBuvOkBjKxLBSVHIARs1lGWy1
25y4atEMaEYnvjC3ATwM0WX0LY+5jCVqOXyfMPAMmEZ7WDbER7FfGxnnmw/pwka7
D4CiSWn5H2Jp9Lq7HiblgYucXXNCPYgSx9JiXnY/KBpARaKUIXuTOq2PuJ/FW4UG
dsJap0W2rw==
=s8Z1
-----END PGP SIGNATURE-----
Merge tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
"Nothing major, just a few minor tweaks:
- Tweak for the TCP zero-copy io_uring self test (Pavel)
- Rather than use our internal cached value of number of CQ events
available, use what the user can see (Dylan)
- Fix a typo in a comment, added in this release (me)
- Don't allow wrapping while adding provided buffers (me)
- Fix a double poll race, and add a lockdep assertion for it too
(Pavel)"
* tag 'io_uring-6.1-2022-11-11' of git://git.kernel.dk/linux:
io_uring/poll: lockdep annote io_poll_req_insert_locked
io_uring/poll: fix double poll req->flags races
io_uring: check for rollover of buffer ID when providing buffers
io_uring: calculate CQEs from the user visible value
io_uring: fix typo in io_uring.h comment
selftests/net: don't tests batched TCP io_uring zc
This patch tests reading the skops->skb_hwtstamp field.
A local test was also done such that the shinfo hwtstamp was temporary
set to a non zero value in the kernel bpf_skops_parse_hdr()
and the same value can be read by the skops test.
An adjustment is needed to the btf_dump selftest because
the changes in the 'struct bpf_sock_ops'.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-4-martin.lau@linux.dev
This patch fixes the incorrect ASSERT test in tcp_hdr_options during
the CHECK to ASSERT macro cleanup.
Fixes: 3082f8cd4b ("selftests/bpf: Convert tcp_hdr_options test to ASSERT_* macros")
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Wang Yufen <wangyufen@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-3-martin.lau@linux.dev
xdp_synproxy fails to be compiled in the 32-bit arch, log is as follows:
xdp_synproxy.c: In function 'parse_options':
xdp_synproxy.c:175:36: error: left shift count >= width of type [-Werror=shift-count-overflow]
175 | *tcpipopts = (mss6 << 32) | (ttl << 24) | (wscale << 16) | mss4;
| ^~
xdp_synproxy.c: In function 'syncookie_open_bpf_maps':
xdp_synproxy.c:289:28: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
289 | .map_ids = (__u64)map_ids,
| ^
Fix it.
Fixes: fb5cd0ce70 ("selftests/bpf: Add selftests for raw syncookie helpers")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221111030836.37632-1-yangjihong1@huawei.com
This commit tests previous fix of bpf_probe_read_kernel_str().
The BPF helper bpf_probe_read_kernel_str should return -EFAULT when
given a bad source pointer and the target buffer should only be modified
to make the string NULL terminated.
bpf_probe_read_kernel_str() was previously inserting a NULL before the
beginning of the dst buffer. This test should ensure that the
implementation stays correct for now on.
Without the fix, this test will fail as follows:
$ cd tools/testing/selftests/bpf
$ make
$ sudo ./test_progs --name=varlen
...
test_varlen:FAIL:check got 0 != exp 66
Signed-off-by: Alban Crequy <albancrequy@linux.microsoft.com>
Signed-off-by: Francis Laniel <flaniel@linux.microsoft.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221110085614.111213-3-albancrequy@linux.microsoft.com
Changes v1 to v2:
- add ack tag
- fix my email
- rebase on bpf tree and tag for bpf tree
Current release - new code bugs:
- can: af_can: can_exit(): add missing dev_remove_pack() of canxl_packet
Previous releases - regressions:
- bpf, sockmap: fix the sk->sk_forward_alloc warning
- wifi: mac80211: fix general-protection-fault in
ieee80211_subif_start_xmit()
- can: af_can: fix NULL pointer dereference in can_rx_register()
- can: dev: fix skb drop check, avoid o-o-b access
- nfnetlink: fix potential dead lock in nfnetlink_rcv_msg()
Previous releases - always broken:
- bpf: fix wrong reg type conversion in release_reference()
- gso: fix panic on frag_list with mixed head alloc types
- wifi: brcmfmac: fix buffer overflow in brcmf_fweh_event_worker()
- wifi: mac80211: set TWT Information Frame Disabled bit as 1
- eth: macsec offload related fixes, make sure to clear the keys
from memory
- tun: fix memory leaks in the use of napi_get_frags
- tun: call napi_schedule_prep() to ensure we own a napi
- tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent
- ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg
to network
- tipc: fix a msg->req tlv length check
- sctp: clear out_curr if all frag chunks of current msg are pruned,
avoid list corruption
- mctp: fix an error handling path in mctp_init(), avoid leaks
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmNtnlEACgkQMUZtbf5S
IrvSfg//axNePPwFiAdbYUmSNmnnv2Zpyz1l9a2/WvKKMeyAH3d4zuQGyTz7VgoJ
at4k1fr14vm+3qBhlL0UFdd+h/wBewwuuWLiogIfhgqDO7KavZsbTJWQ59DSHH08
ujihvt7dF9ByVd3hOpUDjrYGd2rPghqXk8l/2gpPp/KIrbj1jSW0DdF7Y48/0RRw
PYzNYZ9tqICw1crBT52ZilNEebGaUuWpPLzV2owlhJpzqyRLcgd9GWN9DkKieiiw
wF0Wi7A8b/+cR/Wo93RAXtvEayN9vp/t6iyiI1opv3Yg6bhAMlzDUX/v79ccnAM6
wJ3b8bKyLgph5ZTNmbL8GwC2pwl/20hOgCVLb/Haykqrk4oO2+xD39fjKniFP/71
IBYuLCethi0zmiSyR8yO4iyrfJCnkJffoxtcG8O5x+FuCfMI1xQWx44bSc34KlqT
vDw/VmnIfXH9K3F+QdWtlZfLiM0F6vd7RNGIxX0cC2wQCwaubCo0LOs5vl2+jpR8
Xclo+OquQtX5XRqGGQDtA7kCM9jfuc/DWla1v10wy7ZagiKkdfrV7Zu7r431Dtwn
BWeKZAA38o9WNRb4FD5GGUN0dK5R5V25LmbpvYuerq5Ub3pGJgHMsdA15LqsqTnW
MGIokGFhu7ToAQEnaRkF96jh3c3yoMU/sWXsqh7x/G6Tir7JGUw=
=WPta
-----END PGP SIGNATURE-----
Merge tag 'net-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wifi, can and bpf.
Current release - new code bugs:
- can: af_can: can_exit(): add missing dev_remove_pack() of
canxl_packet
Previous releases - regressions:
- bpf, sockmap: fix the sk->sk_forward_alloc warning
- wifi: mac80211: fix general-protection-fault in
ieee80211_subif_start_xmit()
- can: af_can: fix NULL pointer dereference in can_rx_register()
- can: dev: fix skb drop check, avoid o-o-b access
- nfnetlink: fix potential dead lock in nfnetlink_rcv_msg()
Previous releases - always broken:
- bpf: fix wrong reg type conversion in release_reference()
- gso: fix panic on frag_list with mixed head alloc types
- wifi: brcmfmac: fix buffer overflow in brcmf_fweh_event_worker()
- wifi: mac80211: set TWT Information Frame Disabled bit as 1
- eth: macsec offload related fixes, make sure to clear the keys from
memory
- tun: fix memory leaks in the use of napi_get_frags
- tun: call napi_schedule_prep() to ensure we own a napi
- tcp: prohibit TCP_REPAIR_OPTIONS if data was already sent
- ipv6: addrlabel: fix infoleak when sending struct ifaddrlblmsg to
network
- tipc: fix a msg->req tlv length check
- sctp: clear out_curr if all frag chunks of current msg are pruned,
avoid list corruption
- mctp: fix an error handling path in mctp_init(), avoid leaks"
* tag 'net-6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits)
eth: sp7021: drop free_netdev() from spl2sw_init_netdev()
MAINTAINERS: Move Vivien to CREDITS
net: macvlan: fix memory leaks of macvlan_common_newlink
ethernet: tundra: free irq when alloc ring failed in tsi108_open()
net: mv643xx_eth: disable napi when init rxq or txq failed in mv643xx_eth_open()
ethernet: s2io: disable napi when start nic failed in s2io_card_up()
net: atlantic: macsec: clear encryption keys from the stack
net: phy: mscc: macsec: clear encryption keys when freeing a flow
stmmac: dwmac-loongson: fix missing of_node_put() while module exiting
stmmac: dwmac-loongson: fix missing pci_disable_device() in loongson_dwmac_probe()
stmmac: dwmac-loongson: fix missing pci_disable_msi() while module exiting
cxgb4vf: shut down the adapter when t4vf_update_port_info() failed in cxgb4vf_open()
mctp: Fix an error handling path in mctp_init()
stmmac: intel: Update PCH PTP clock rate from 200MHz to 204.8MHz
net: cxgb3_main: disable napi when bind qsets failed in cxgb_up()
net: cpsw: disable napi in cpsw_ndo_open()
iavf: Fix VF driver counting VLAN 0 filters
ice: Fix spurious interrupt during removal of trusted VF
net/mlx5e: TC, Fix slab-out-of-bounds in parse_tc_actions
net/mlx5e: E-Switch, Fix comparing termination table instance
...
Add some mix of tests into page_fault_test: memory regions with all the
pairwise combinations of read-only, userfaultfd, and dirty-logging. For
example, writing into a read-only region which has a hole handled with
userfaultfd.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-15-ricarkol@google.com
Add some readonly memslot tests into page_fault_test. Mark the data and/or
page-table memory regions as readonly, perform some accesses, and check
that the right fault is triggered when expected (e.g., a store with no
write-back should lead to an mmio exit).
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-14-ricarkol@google.com
Add some dirty logging tests into page_fault_test. Mark the data and/or
page-table memory regions for dirty logging, perform some accesses, and
check that the dirty log bits are set or clean when expected.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-13-ricarkol@google.com
Add some userfaultfd tests into page_fault_test. Punch holes into the
data and/or page-table memslots, perform some accesses, and check that
the faults are taken (or not taken) when expected.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-12-ricarkol@google.com
Add a new test for stage 2 faults when using different combinations of
guest accesses (e.g., write, S1PTW), backing source type (e.g., anon)
and types of faults (e.g., read on hugetlbfs with a hole). The next
commits will add different handling methods and more faults (e.g., uffd
and dirty logging). This first commit starts by adding two sanity checks
for all types of accesses: AF setting by the hw, and accessing memslots
with holes.
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-11-ricarkol@google.com
Now that kvm_vm allows specifying different memslots for code, page tables,
and data, use the appropriate memslot when making allocations in
common/libraty code. Change them accordingly:
- code (allocated by lib/elf) use the CODE memslot
- stacks, exception tables, and other core data pages (like the TSS in x86)
use the DATA memslot
- page tables and the PGD use the PT memslot
- test data (anything allocated with vm_vaddr_alloc()) uses the TEST_DATA
memslot
No functional change intended. All allocators keep using memslot #0.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-10-ricarkol@google.com
Refactor virt_arch_pgd_alloc() and vm_vaddr_alloc() in both RISC-V and
aarch64 to fix the alignment of parameters in a couple of calls. This will
make it easier to fix the alignment in a future commit that adds an extra
parameter (that happens to be very long).
No functional change intended.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-9-ricarkol@google.com
The vm_create() helpers are hardcoded to place most page types (code,
page-tables, stacks, etc) in the same memslot #0, and always backed with
anonymous 4K. There are a couple of issues with that. First, tests
willing to differ a bit, like placing page-tables in a different backing
source type must replicate much of what's already done by the vm_create()
functions. Second, the hardcoded assumption of memslot #0 holding most
things is spread everywhere; this makes it very hard to change.
Fix the above issues by having selftests specify how they want memory to be
laid out. Start by changing ____vm_create() to not create memslot #0; a
test (to come) will specify all memslots used by the VM. Then, add the
vm->memslots[] array to specify the right memslot for different memory
allocators, e.g.,: lib/elf should use the vm->[MEM_REGION_CODE] memslot.
This will be used as a way to specify the page-tables memslots (to be
backed by huge pages for example).
There is no functional change intended. The current commit lays out memory
exactly as before. A future commit will change the allocators to get the
region they should be using, e.g.,: like the page table allocators using
the pt memslot.
Cc: Sean Christopherson <seanjc@google.com>
Cc: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-8-ricarkol@google.com
Add the backing_src_type into struct userspace_mem_region. This struct
already stores a lot of info about memory regions, except the backing
source type. This info will be used by a future commit in order to
determine the method for punching a hole.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-7-ricarkol@google.com
Define macros for memory type indexes and construct DEFAULT_MAIR_EL1
with macros from asm/sysreg.h. The index macros can then be used when
constructing PTEs (instead of using raw numbers).
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Oliver Upton <oupton@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-5-ricarkol@google.com
Deleting a memslot (when freeing a VM) is not closing the backing fd,
nor it's unmapping the alias mapping. Fix by adding the missing close
and munmap.
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Reviewed-by: Oliver Upton <oupton@google.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-4-ricarkol@google.com
Add a library function to get the PTE (a host virtual address) of a
given GVA. This will be used in a future commit by a test to clear and
check the access flag of a particular page.
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-3-ricarkol@google.com
Move the generic userfaultfd code out of demand_paging_test.c into a
common library, userfaultfd_util. This library consists of a setup and a
stop function. The setup function starts a thread for handling page
faults using the handler callback function. This setup returns a
uffd_desc object which is then used in the stop function (to wait and
destroy the threads).
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221017195834.2295901-2-ricarkol@google.com
Currently, the debug-exceptions test always uses only
{break,watch}point#0 and the highest numbered context-aware
breakpoint. Modify the test to use all {break,watch}points and
context-aware breakpoints supported on the system.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-10-reijiw@google.com
Currently, the debug-exceptions test doesn't have a test case for
a linked watchpoint. Add a test case for the linked watchpoint to
the test. The new test case uses the highest numbered context-aware
breakpoint (for Context ID match), and the watchpoint#0, which is
linked to the context-aware breakpoint.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-9-reijiw@google.com
Currently, the debug-exceptions test doesn't have a test case for
a linked breakpoint. Add a test case for the linked breakpoint to
the test. The new test case uses a pair of breakpoints. One is the
higiest numbered context-aware breakpoint (for Context ID match),
and the other one is the breakpoint#0 (for Address Match), which
is linked to the context-aware breakpoint.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-8-reijiw@google.com
Change debug_version() to take the ID_AA64DFR0_EL1 value instead of
vcpu as an argument, and change its callsite to read ID_AA64DFR0_EL1
(and pass it to debug_version()).
Subsequent patches will reuse the register value in the callsite.
No functional change intended.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-7-reijiw@google.com
Currently, debug-exceptions test unnecessarily tracks some test stages
using GUEST_SYNC(). The code for it needs to be updated as test cases
are added or removed. Stop doing the unnecessary stage tracking,
as they are not so useful and are a bit pain to maintain.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-6-reijiw@google.com
Remove the hard-coded {break,watch}point #0 from the guest_code() in
debug-exceptions to allow {break,watch}point number to be specified.
Change reset_debug_state() to zeroing all dbg{b,w}{c,v}r_el0 registers
so that guest_code() can use the function to reset those registers
even when non-zero {break,watch}points are specified for guest_code().
Subsequent patches will add test cases for non-zero {break,watch}points.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-4-reijiw@google.com
Introduce helpers in the debug-exceptions test to write to
dbg{b,w}{c,v}r registers. Those helpers will be useful for
test cases that will be added to the test in subsequent patches.
No functional change intended.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-3-reijiw@google.com
Use FIELD_GET() macro to extract ID register fields for existing
aarch64 selftests code. No functional change intended.
Signed-off-by: Reiji Watanabe <reijiw@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020054202.2119018-2-reijiw@google.com
The memory area in each slot should be aligned to host page size.
Otherwise, the test will fail. For example, the following command
fails with the following messages with 64KB-page-size-host and
4KB-pae-size-guest. It's not user friendly to abort the test.
Lets do something to report the optimal memory slots, instead of
failing the test.
# ./memslot_perf_test -v -s 1000
Number of memory slots: 999
Testing map performance with 1 runs, 5 seconds each
Adding slots 1..999, each slot with 8 pages + 216 extra pages last
==== Test Assertion Failure ====
lib/kvm_util.c:824: vm_adjust_num_guest_pages(vm->mode, npages) == npages
pid=19872 tid=19872 errno=0 - Success
1 0x00000000004065b3: vm_userspace_mem_region_add at kvm_util.c:822
2 0x0000000000401d6b: prepare_vm at memslot_perf_test.c:273
3 (inlined by) test_execute at memslot_perf_test.c:756
4 (inlined by) test_loop at memslot_perf_test.c:994
5 (inlined by) main at memslot_perf_test.c:1073
6 0x0000ffff7ebb4383: ?? ??:0
7 0x00000000004021ff: _start at :?
Number of guest pages is not compatible with the host. Try npages=16
Report the optimal memory slots instead of failing the test when
the memory area in each slot isn't aligned to host page size. With
this applied, the optimal memory slots is reported.
# ./memslot_perf_test -v -s 1000
Number of memory slots: 999
Testing map performance with 1 runs, 5 seconds each
Memslot count too high for this test, decrease the cap (max is 514)
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-7-gshan@redhat.com
The addresses and sizes passed to vm_userspace_mem_region_add() and
madvise() should be aligned to host page size, which can be 64KB on
aarch64. So it's wrong by passing additional fixed 4KB memory area
to various tests.
Fix it by passing additional fixed 64KB memory area to various tests.
We also add checks to ensure that none of host/guest page size exceeds
64KB. MEM_TEST_MOVE_SIZE is fixed up to 192KB either.
With this, the following command works fine on 64KB-page-size-host and
4KB-page-size-guest.
# ./memslot_perf_test -v -s 512
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-6-gshan@redhat.com
The test case is obviously broken on aarch64 because non-4KB guest
page size is supported. The guest page size on aarch64 could be 4KB,
16KB or 64KB.
This supports variable guest page size, mostly for aarch64.
- The host determines the guest page size when virtual machine is
created. The value is also passed to guest through the synchronization
area.
- The number of guest pages are unknown until the virtual machine
is to be created. So all the related macros are dropped. Instead,
their values are dynamically calculated based on the guest page
size.
- The static checks on memory sizes and pages becomes dependent
on guest page size, which is unknown until the virtual machine
is about to be created. So all the static checks are converted
to dynamic checks, done in check_memory_sizes().
- As the address passed to madvise() should be aligned to host page,
the size of page chunk is automatically selected, other than one
page.
- MEM_TEST_MOVE_SIZE has fixed and non-working 64KB. It will be
consolidated in next patch. However, the comments about how
it's calculated has been correct.
- All other changes included in this patch are almost mechanical
replacing '4096' with 'guest_page_size'.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-5-gshan@redhat.com
prepare_vm() is called in every iteration and run. The allowed memory
slots (KVM_CAP_NR_MEMSLOTS) are probed for multiple times. It's not
free and unnecessary.
Move the probing logic for the allowed memory slots to parse_args()
for once, which is upper layer of prepare_vm().
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-4-gshan@redhat.com
There are two loops in prepare_vm(), which have different conditions.
'slot' is treated as meory slot index in the first loop, but index of
the host virtual address array in the second loop. It makes it a bit
hard to understand the code.
Change the usage of 'slot' in the second loop, to treat it as the
memory slot index either.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-3-gshan@redhat.com
In prepare_vm(), 'data->nslots' is assigned with 'max_mem_slots - 1'
at the beginning, meaning they are interchangeable.
Use 'data->nslots' isntead of 'max_mem_slots - 1'. With this, it
becomes easier to move the logic of probing number of slots into
upper layer in subsequent patches.
No functional change intended.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221020071209.559062-2-gshan@redhat.com
In the dirty ring case, we rely on vcpu exit due to full dirty ring
state. On ARM64 system, there are 4096 host pages when the host
page size is 64KB. In this case, the vcpu never exits due to the
full dirty ring state. The similar case is 4KB page size on host
and 64KB page size on guest. The vcpu corrupts same set of host
pages, but the dirty page information isn't collected in the main
thread. This leads to infinite loop as the following log shows.
# ./dirty_log_test -M dirty-ring -c 65536 -m 5
Setting log mode to: 'dirty-ring'
Test iterations: 32, interval: 10 (ms)
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
guest physical test memory offset: 0xffbffe0000
vcpu stops because vcpu is kicked out...
Notifying vcpu to continue
vcpu continues now.
Iteration 1 collected 576 pages
<No more output afterwards>
Fix the issue by automatically choosing the best dirty ring size,
to ensure vcpu exit due to full dirty ring state. The option '-c'
becomes a hint to the dirty ring count, instead of the value of it.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-8-gshan@redhat.com
There are two states, which need to be cleared before next mode
is executed. Otherwise, we will hit failure as the following messages
indicate.
- The variable 'dirty_ring_vcpu_ring_full' shared by main and vcpu
thread. It's indicating if the vcpu exit due to full ring buffer.
The value can be carried from previous mode (VM_MODE_P40V48_4K) to
current one (VM_MODE_P40V48_64K) when VM_MODE_P40V48_16K isn't
supported.
- The current ring buffer index needs to be reset before next mode
(VM_MODE_P40V48_64K) is executed. Otherwise, the stale value is
carried from previous mode (VM_MODE_P40V48_4K).
# ./dirty_log_test -M dirty-ring
Setting log mode to: 'dirty-ring'
Test iterations: 32, interval: 10 (ms)
Testing guest mode: PA-bits:40, VA-bits:48, 4K pages
guest physical test memory offset: 0xffbfffc000
:
Dirtied 995328 pages
Total bits checked: dirty (1012434), clear (7114123), track_next (966700)
Testing guest mode: PA-bits:40, VA-bits:48, 64K pages
guest physical test memory offset: 0xffbffc0000
vcpu stops because vcpu is kicked out...
vcpu continues now.
Notifying vcpu to continue
Iteration 1 collected 0 pages
vcpu stops because dirty ring is full...
vcpu continues now.
vcpu stops because dirty ring is full...
vcpu continues now.
vcpu stops because dirty ring is full...
==== Test Assertion Failure ====
dirty_log_test.c:369: cleared == count
pid=10541 tid=10541 errno=22 - Invalid argument
1 0x0000000000403087: dirty_ring_collect_dirty_pages at dirty_log_test.c:369
2 0x0000000000402a0b: log_mode_collect_dirty_pages at dirty_log_test.c:492
3 (inlined by) run_test at dirty_log_test.c:795
4 (inlined by) run_test at dirty_log_test.c:705
5 0x0000000000403a37: for_each_guest_mode at guest_modes.c:100
6 0x0000000000401ccf: main at dirty_log_test.c:938
7 0x0000ffff9ecd279b: ?? ??:0
8 0x0000ffff9ecd286b: ?? ??:0
9 0x0000000000401def: _start at ??:?
Reset dirty pages (0) mismatch with collected (35566)
Fix the issues by clearing 'dirty_ring_vcpu_ring_full' and the ring
buffer index before next new mode is to be executed.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-7-gshan@redhat.com
In vcpu_map_dirty_ring(), the guest's page size is used to figure out
the offset in the virtual area. It works fine when we have same page
sizes on host and guest. However, it fails when the page sizes on host
and guest are different on arm64, like below error messages indicates.
# ./dirty_log_test -M dirty-ring -m 7
Setting log mode to: 'dirty-ring'
Test iterations: 32, interval: 10 (ms)
Testing guest mode: PA-bits:40, VA-bits:48, 64K pages
guest physical test memory offset: 0xffbffc0000
vcpu stops because vcpu is kicked out...
Notifying vcpu to continue
vcpu continues now.
==== Test Assertion Failure ====
lib/kvm_util.c:1477: addr == MAP_FAILED
pid=9000 tid=9000 errno=0 - Success
1 0x0000000000405f5b: vcpu_map_dirty_ring at kvm_util.c:1477
2 0x0000000000402ebb: dirty_ring_collect_dirty_pages at dirty_log_test.c:349
3 0x00000000004029b3: log_mode_collect_dirty_pages at dirty_log_test.c:478
4 (inlined by) run_test at dirty_log_test.c:778
5 (inlined by) run_test at dirty_log_test.c:691
6 0x0000000000403a57: for_each_guest_mode at guest_modes.c:105
7 0x0000000000401ccf: main at dirty_log_test.c:921
8 0x0000ffffb06ec79b: ?? ??:0
9 0x0000ffffb06ec86b: ?? ??:0
10 0x0000000000401def: _start at ??:?
Dirty ring mapped private
Fix the issue by using host's page size to map the ring buffer.
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221110104914.31280-6-gshan@redhat.com
When showing the result of a test group, if one
of the subtests was skipped, while still having
passing subtests, the group result was marked as
SKIP. E.g.:
223/1 usdt/basic:SKIP
223/2 usdt/multispec:OK
223/3 usdt/urand_auto_attach:OK
223/4 usdt/urand_pid_attach:OK
223 usdt:SKIP
The test result of usdt in the example above
should be OK instead of SKIP, because the test
group did have passing tests and it would be
considered in "normal" state.
With this change, only if all of the subtests
were skipped, the group test is marked as SKIP.
When only some of the subtests are skipped, a
more detailed result is given, stating how
many of the subtests were skipped. E.g:
223/1 usdt/basic:SKIP
223/2 usdt/multispec:OK
223/3 usdt/urand_auto_attach:OK
223/4 usdt/urand_pid_attach:OK
223 usdt:OK (SKIP: 1/4)
Signed-off-by: Domenico Cerasuolo <dceras@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221109184039.3514033-1-cerasuolodomenico@gmail.com
Tests to verify the following behavior of `btf_dedup_resolve_fwds`:
- remapping for struct forward declarations;
- remapping for union forward declarations;
- no remapping if forward declaration kind does not match similarly
named struct or union declaration;
- no remapping if forward declaration name is ambiguous;
- base ids are considered for fwd resolution in split btf scenario.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Link: https://lore.kernel.org/bpf/20221109142611.879983-4-eddyz87@gmail.com
An update for libbpf's hashmap interface from void* -> void* to a
polymorphic one, allowing both long and void* keys and values.
This simplifies many use cases in libbpf as hashmaps there are mostly
integer to integer.
Perf copies hashmap implementation from libbpf and has to be
updated as well.
Changes to libbpf, selftests/bpf and perf are packed as a single
commit to avoid compilation issues with any future bisect.
Polymorphic interface is acheived by hiding hashmap interface
functions behind auxiliary macros that take care of necessary
type casts, for example:
#define hashmap_cast_ptr(p) \
({ \
_Static_assert((p) == NULL || sizeof(*(p)) == sizeof(long),\
#p " pointee should be a long-sized integer or a pointer"); \
(long *)(p); \
})
bool hashmap_find(const struct hashmap *map, long key, long *value);
#define hashmap__find(map, key, value) \
hashmap_find((map), (long)(key), hashmap_cast_ptr(value))
- hashmap__find macro casts key and value parameters to long
and long* respectively
- hashmap_cast_ptr ensures that value pointer points to a memory
of appropriate size.
This hack was suggested by Andrii Nakryiko in [1].
This is a follow up for [2].
[1] https://lore.kernel.org/bpf/CAEf4BzZ8KFneEJxFAaNCCFPGqp20hSpS2aCj76uRk3-qZUH5xg@mail.gmail.com/
[2] https://lore.kernel.org/bpf/af1facf9-7bc8-8a3d-0db4-7b3f333589a2@meta.com/T/#m65b28f1d6d969fcd318b556db6a3ad499a42607d
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20221109142611.879983-2-eddyz87@gmail.com
Test that locked bridge port configurations that are not supported by
mlxsw are rejected.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that packets received via a locked bridge port whose {SMAC, VID}
does not appear in the bridge's FDB or appears with a different port,
trigger the "locked_port" packet trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Test that packets with a destination MAC of 01:80:C2:00:00:03 trigger
the "eapol" packet trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Merely checking whether a trap counter incremented or not without
logging a test result is useful on its own. Split this functionality to
a helper which will be used by subsequent patches.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
test_progs fails to be compiled in the 32-bit arch, log is as follows:
test_progs.c:1013:52: error: format '%ld' expects argument of type 'long int', but argument 3 has type 'size_t' {aka 'unsigned int'} [-Werror=format=]
1013 | sprintf(buf, "MSG_TEST_LOG (cnt: %ld, last: %d)",
| ~~^
| |
| long int
| %d
1014 | strlen(msg->test_log.log_buf),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| size_t {aka unsigned int}
Fix it.
Fixes: 91b2c0afd0 ("selftests/bpf: Add parallelism to test_progs")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221108015857.132457-1-yangjihong1@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
When cross-compiling test_verifier for 32-bit platforms, the casting error is shown below:
test_verifier.c:1263:27: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
1263 | info.xlated_prog_insns = (__u64)*buf;
| ^
cc1: all warnings being treated as errors
Fix it by adding zero-extension for it.
Fixes: 933ff53191 ("selftests/bpf: specify expected instructions in test_verifier tests")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20221108121945.4104644-1-pulehui@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Since the newly added instruction is in the HINT space we can't reasonably
test for it actually being present.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-5-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Add FEAT_CSSC to the set of features checked by the hwcap selftest.
Signed-off-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20221017152520.1039165-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
When using the flags in KVM_X86_SET_MSR_FILTER and
KVM_CAP_X86_USER_SPACE_MSR it is expected that an attempt to write to
any of the unused bits will fail. Add testing to walk over every bit
in each of the flag fields in MSR filtering and MSR exiting to verify
that unused bits return and error and used bits, i.e. valid bits,
succeed.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Message-Id: <20220921151525.904162-6-aaronlewis@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Some users of KVM implement the UEFI variable store through a paravirtual device
that does not require the "SMM lockbox" component of edk2; allow them to
compile out system management mode, which is not a full implementation
especially in how it interacts with nested virtualization.
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220929172016.319443-6-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Address a few problems with the initial test script version:
* On systems with ip6tables but no ip6tables-legacy, testing for
ip6tables was disabled by accident.
* Firewall setup phase did not respect possibly unavailable tools.
* Consistently call nft via '$nft'.
Fixes: 6e31ce831c ("selftests: netfilter: Test reverse path filtering")
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let's trigger a R/O longterm pin on three cases of R/O mapped anonymous
pages:
* exclusive (never shared)
* shared (child still alive)
* previously shared (child no longer alive)
... and make sure that the pin is reliable: whatever we write via the page
tables has to be observable via the pin.
Link: https://lkml.kernel.org/r/20220927110120.106906-8-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
io_uring provides a simple mechanism to test long-term, R/W GUP pins
-- via fixed buffers -- and can be used to verify that GUP pins stay
in sync with the pages in the page table even if a page would
temporarily get mapped R/O or concurrent fork() could accidentially
end up sharing pinned pages with the child.
Note that this essentially re-introduces local_config support that was
removed recently in commit 6f83d6c74e ("Kselftests: remove support of
libhugetlbfs from kselftests").
[david@redhat.com: s/size_t/ssize_t/ on `cur', `total'.]
Link: https://lkml.kernel.org/r/445fe1ae-9e22-0d1d-4d09-272231d2f84a@redhat.com
Link: https://lkml.kernel.org/r/20220927110120.106906-6-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's run all existing test cases with all hugetlb sizes we're able to
detect.
Note that some tests cases still fail. This will, for example, be fixed
once vmsplice properly uses FOLL_PIN instead of FOLL_GET for pinning.
With 2 MiB and 1 GiB hugetlb on x86_64, the expected failures are:
# [RUN] vmsplice() + unmap in child ... with hugetlb (2048 kB)
not ok 23 No leak from parent into child
# [RUN] vmsplice() + unmap in child ... with hugetlb (1048576 kB)
not ok 24 No leak from parent into child
# [RUN] vmsplice() before fork(), unmap in parent after fork() ... with hugetlb (2048 kB)
not ok 35 No leak from child into parent
# [RUN] vmsplice() before fork(), unmap in parent after fork() ... with hugetlb (1048576 kB)
not ok 36 No leak from child into parent
# [RUN] vmsplice() + unmap in parent after fork() ... with hugetlb (2048 kB)
not ok 47 No leak from child into parent
# [RUN] vmsplice() + unmap in parent after fork() ... with hugetlb (1048576 kB)
not ok 48 No leak from child into parent
Link: https://lkml.kernel.org/r/20220927110120.106906-5-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add various THP variants that we'll run with our existing test
cases.
Link: https://lkml.kernel.org/r/20220927110120.106906-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We'll reuse it in the anon_cow test next.
Link: https://lkml.kernel.org/r/20220927110120.106906-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/vm: test COW handling of anonymous memory".
This is my current set of tests for testing COW handling of anonymous
memory, especially when interacting with GUP. I developed these tests
while working on PageAnonExclusive and managed to clean them up just now.
On current upstream Linux, all tests pass except the hugetlb tests that
rely on vmsplice -- these tests should pass as soon as vmsplice properly
uses FOLL_PIN instead of FOLL_GET.
I'm working on additional tests for COW handling in private mappings,
focusing on long-term R/O pinning e.g., of the shared zeropage, pagecache
pages and KSM pages. These tests, however, will go into a different file.
So this is everything I have regarding tests for anonymous memory.
This patch (of 7):
Let's start adding tests for our COW handling of anonymous memory. We'll
focus on basic tests that we can achieve without additional libraries or
gup_test extensions.
We'll add THP and hugetlb tests separately.
[david@redhat.com: s/size_t/ssize_t/ on `cur', `total', `transferred';]
Link: https://lkml.kernel.org/r/51302b9e-dc69-d709-3214-f23868028555@redhat.com
Link: https://lkml.kernel.org/r/20220927110120.106906-1-david@redhat.com
Link: https://lkml.kernel.org/r/20220927110120.106906-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Christoph von Recklinghausen <crecklin@redhat.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
After converting all the three relevant testcases (uffd, madvise, mremap)
to use memfd, no test will need the hugetlb mount point anymore. Drop the
code.
Link: https://lkml.kernel.org/r/20221014144015.94039-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For dropping the hugetlb mountpoint in run_vmtests.sh. Cleaned it up a
little bit around the changed codes.
Link: https://lkml.kernel.org/r/20221014144013.94027-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
For dropping the hugetlb mountpoint in run_vmtests.sh. Since no parameter
is needed, drop USAGE too.
Link: https://lkml.kernel.org/r/20221014143921.93887-3-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/vm: Drop hugetlb mntpoint in run_vmtests.sh", v2.
Clean the code up so we can use the same memfd for both hugetlb and shmem
which is cleaner.
This patch (of 4):
We already used memfd for shmem test, move it forward with hugetlb too so
that we don't need user to specify the hugetlb file path explicitly when
running hugetlb shared tests.
Link: https://lkml.kernel.org/r/20221014143921.93887-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221014143921.93887-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Along the development cycle, the testing code support for module/in-kernel
compiles was removed. Restore this functionality by moving any internal
API tests to the userspace side, as well as threading tests. Fix the
lockdep issues and add a way to reduce memory usage so the tests can
complete with KASAN + memleak detection. Make the tests work on 32 bit
hosts where possible and detect 32 bit hosts in the radix test suite.
[akpm@linux-foundation.org: fix module export]
[akpm@linux-foundation.org: fix it some more]
[liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()]
Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com
Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com
Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use ARRAY_SIZE to fix the following coccicheck warnings:
tools/testing/selftests/arm64/mte/check_buffer_fill.c:341:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:35:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:168:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:72:20-21:
WARNING: Use ARRAY_SIZE
tools/testing/selftests/arm64/mte/check_buffer_fill.c:369:25-26:
WARNING: Use ARRAY_SIZE
Signed-off-by: Kang Minchul <tegongkang@gmail.com>
Link: https://lore.kernel.org/r/20221105073143.78521-1-tegongkang@gmail.com
Signed-off-by: Will Deacon <will@kernel.org>
The signal magic values are supposed to be allocated as somewhat meaningful
ASCII so if we encounter a bad magic value print the any alphanumeric
characters we find in it as well as the hex value to aid debuggability.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221102140543.98193-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
When fixing up support for extra_context in the signal handling tests I
didn't notice that there is a TODO file in the directory which lists this
as a thing to be done. Since it's been done remove it from the list.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221027110324.33802-1-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Especially when the test is configured to run for a longer time it can be
reassuring to users to see that the supervising program is running OK so
provide a message every second when the output timer expires.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221017144553.773176-3-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
Currently we don't have an explicit check that when it's been a second
since we have seen output produced from the test programs starting up that
means all of them are running and we should start both sending signals and
timing out. This is not reliable, especially on very heavily loaded systems
where the test programs might take longer than a second to run.
We do skip sending signals to children that have not produced output yet
so we won't cause them to exit unexpectedly by sending a signal but this
can create confusion when interpreting output, for example appearing to
show the tests running for less time than expected or appearing to show
missed signal deliveries. Avoid issues by explicitly checking that we have
seen output from all the child processes before we start sending signals
or counting test run time.
This is especially likely on virtual platforms with large numbers of vector
lengths supported since the platforms are slow and there will be a lot of
tasks per CPU.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20221017144553.773176-2-broonie@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>