No functional change here we merely pull the helpers in sockmap_listen.c
into a header file so we can use these in other programs. The tests we
are about to add aren't really _listen tests so doesn't make sense
to add them here.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-10-john.fastabend@gmail.com
In the end of the test, there will be an error message induced by the
`ip netns del ns1` command in cleanup()
Tests passed: 201
Tests failed: 0
Cannot remove namespace file "/run/netns/ns1": No such file or directory
This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.
Redirect the error message to /dev/null to mute it.
V2: Update commit message and fixes tag.
V3: resubmit due to missing netdev ML in V2
Fixes: b60417a9f2 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move cxl_await_media_ready() to cxl_pci probe before driver starts issuing
IDENTIFY and retrieving memory device information to ensure that the
device is ready to provide the information. Allow cxl_pci_probe() to succeed
even if media is not ready. Cache the media failure in cxlds and don't ask
the device for any media information.
The rationale for proceeding in the !media_ready case is to allow for
mailbox operations to interrogate and/or remediate the device. After
media is repaired then rebinding the cxl_pci driver is expected to
restart the capacity scan.
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Fixes: b39cb1052a ("cxl/mem: Register CXL memX devices")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168445310026.3251520.8124296540679268206.stgit@djiang5-mobl3
[djbw: fixup cxl_test]
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Derick noticed, when testing hot plug, that hot-add behaves nominally
after a removal. However, if the hot-add is done without a prior
removal, CXL.mem accesses fail. It turns out that the original
implementation of the port driver and region programming wrongly assumed
that platform-firmware always enables the host-bridge HDM decoder
capability. Add support turning on switch-level HDM decoders in the case
where platform-firmware has not.
The implementation is careful to only arrange for the enable to be
undone if the current instance of the driver was the one that did the
enable. This is to interoperate with platform-firmware that may expect
CXL.mem to remain active after the driver is shutdown. This comes at the
cost of potentially not shutting down the enable on kexec flows, but it
is mitigated by the fact that the related HDM decoders still need to be
enabled on an individual basis.
Cc: <stable@vger.kernel.org>
Reported-by: Derick Marks <derick.w.marks@intel.com>
Fixes: 54cdbf845c ("cxl/port: Add a driver for 'struct cxl_port' objects")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/168437998331.403037.15719879757678389217.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
bluetooth and netfilter.
Current release - regressions:
- ipv6: fix RCU splat in ipv6_route_seq_show()
- wifi: iwlwifi: disable RFI feature
Previous releases - regressions:
- tcp: fix possible sk_priority leak in tcp_v4_send_reset()
- tipc: do not update mtu if msg_max is too small in mtu negotiation
- netfilter: fix null deref on element insertion
- devlink: change per-devlink netdev notifier to static one
- phylink: fix ksettings_set() ethtool call
- wifi: mac80211: fortify the spinlock against deadlock by interrupt
- wifi: brcmfmac: check for probe() id argument being NULL
- eth: ice:
- fix undersized tx_flags variable
- fix ice VF reset during iavf initialization
- eth: hns3: fix sending pfc frames after reset issue
Previous releases - always broken:
- xfrm: release all offloaded policy memory
- nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
- vsock: avoid to close connected socket after the timeout
- dsa: rzn1-a5psw: enable management frames for CPU port
- eth: virtio_net: fix error unwinding of XDP initialization
- eth: tun: fix memory leak for detached NAPI queue.
Misc:
- MAINTAINERS: sctp: move Neil to CREDITS
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRmEHoSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkXu0P/AnSeVtu2CYZSjyCQYvkKpdpbDOSlsee
GOnG0jduOdJ+OabYyM6prg9JHp1t3QxxTVJs+Spc7Eh9EX+YHRwK5DNPhv3GQ7RI
pSwQxwiEhLVXVjaEtqUo1Wf8JgCQJ+ThisSIDgQRnaHKQnrRIlbZRngwn6TwNFba
kxBpCUn2RZcZZhL8xdVF4UbSbEeLEupN76rHiePBVKNG70QVRptX3Rd6EV6FxmTa
EzAtfNh1r0p3BHW1YYgsbpy7PeXhKGhlVLqIld8h9r/y4hATsrQ2f7Bv0RNrVBDf
f8r1bdG5E0K3V8AzFPpyOe304G3GAwV+V/wtvA3GRjiwmPUwzJOzaNnSgJZZ/sbq
mnR4pCwJ4gDGnDxLa8hBh6+emQGB6LJJX0JTQY7vMPNmUwtIuEQc6tLxJ4DpXTzW
psEndQPDA7yR/7pNoE7ax+8CKCxPvfiBRnV9sxzmPV6FcxWtzJeLQihOuOA4IB8i
Ddhq2OYH+HCodTNOLWNyMSjk65O7Whee1O/YGiVW9+iUbKBUSBFatJ9fJjH4bXMT
VRZZnhlFGGIMuZXhkL5+a4ZnomqfPRXNGJ/QBbB4Ty7CXr84mXb0SX5gW4qsLOBA
YEuxiqD8Oej2Mrid4ypF5GmwBYLAf4CCZajGVii0yyz2hp69RvlQ0c5lA0WisQu3
wvY2HDFGBsa+
=PfWP
-----END PGP SIGNATURE-----
Merge tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm, bluetooth and netfilter.
Current release - regressions:
- ipv6: fix RCU splat in ipv6_route_seq_show()
- wifi: iwlwifi: disable RFI feature
Previous releases - regressions:
- tcp: fix possible sk_priority leak in tcp_v4_send_reset()
- tipc: do not update mtu if msg_max is too small in mtu negotiation
- netfilter: fix null deref on element insertion
- devlink: change per-devlink netdev notifier to static one
- phylink: fix ksettings_set() ethtool call
- wifi: mac80211: fortify the spinlock against deadlock by interrupt
- wifi: brcmfmac: check for probe() id argument being NULL
- eth: ice:
- fix undersized tx_flags variable
- fix ice VF reset during iavf initialization
- eth: hns3: fix sending pfc frames after reset issue
Previous releases - always broken:
- xfrm: release all offloaded policy memory
- nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
- vsock: avoid to close connected socket after the timeout
- dsa: rzn1-a5psw: enable management frames for CPU port
- eth: virtio_net: fix error unwinding of XDP initialization
- eth: tun: fix memory leak for detached NAPI queue.
Misc:
- MAINTAINERS: sctp: move Neil to CREDITS"
* tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
MAINTAINERS: skip CCing netdev for Bluetooth patches
mdio_bus: unhide mdio_bus_init prototype
bridge: always declare tunnel functions
atm: hide unused procfs functions
net: isa: include net/Space.h
Revert "ARM: dts: stm32: add CAN support on stm32f746"
netfilter: nft_set_rbtree: fix null deref on element insertion
netfilter: nf_tables: fix nft_trans type confusion
netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
net: wwan: t7xx: Ensure init is completed before system sleep
net: selftests: Fix optstring
net: pcs: xpcs: fix C73 AN not getting enabled
net: wwan: iosm: fix NULL pointer dereference when removing device
vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
mailmap: add entries for Nikolay Aleksandrov
igb: fix bit_shift to be in [1..8] range
net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset
cassini: Fix a memory leak in the error handling path of cas_init_one()
tun: Fix memory leak for detached NAPI queue.
can: kvaser_pciefd: Disable interrupts in probe error path
...
This Kselftest fixes update for Linux 6.4-rc3 consists of:
- sgx test fix for false negatives.
- ftrace output is hard to parses and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRlDzgACgkQCwJExA0N
Qxzlfw//fc+Uq5hC2KUeVOiIAPYejEZ3+ToY+OUjaKvykWndMpYys5bCB68qERm8
vtgzYOKm04ovvnyh2jXzxYTjd8439qeRxr0Eybqc5HjiVDhyA1jm/pcsy3EDBf1+
5QGjgIN2fK9rdv7TmyMHL8pSxLvYQfe2zMjLtULuHZWtjzIY98hh3zjA9mdnn3pK
f2zqEJzu2lysTiwkhDF/yspOa1TymVp1W6dCVvfcxr5wJOWYtZaZyOTYNiCI3gX6
9Qy2fm4txPcsooNrwSEekXUIV669fet9vbQX5PF3pRDipxydarAAUUbTm1hOP4X+
3YnKk9dx/Pg483Go/sGMhtf+/P3ZJlWj2YZSro+maULt8O1Jb8TGe1MB7myheiFc
CiXj5duiG+WlwteF8WMsm58vjplgbnFAfyVZ+mumRhE1e4qLtjnc5YxSdNYoOHOF
40wI1Nebfwofh/qW7/ipn5tx7bN29UUSXRd26q1T4qF3KwKYaaj720lcisS7kBR0
gK8FFCdqroKilJ/Fj3e+AM3xqmlJxornj7EhQoAVtBZCsjN90Wyaw2gaCceABp+f
lDAisdO00v/+fpVhWFjXy4Mww5t6MDsRxmE5gLGOS1QHZUV9/mV7H9e9gd9ix6mV
GXYAeytvRrQQBCDEbOz1ax8SS3eA2MFVdbmMxHYn/gWVx6ClpL4=
=DOlj
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- sgx test fix for false negatives
- ftrace output is hard to parses and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner
* tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Improve integration with kselftest runner
selftests/sgx: Add "test_encl.elf" to TEST_FILES
The cited commit added a stray colon to the 'v' option. That makes the
option work incorrectly.
ex:
tools/testing/selftests/net# ./fib_nexthops.sh -v
(should enable verbose mode, instead it shows help text due to missing arg)
Fixes: 5feba47273 ("selftests: fib_nexthops: Make ping timeout configurable")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The qemu argument -enable-kvm is duplicated because the qemu_args bash
variable in kvm-test-1-run.sh already provides it. This commit therefore
removes the ppc64-specific copy in functions.sh.
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
When building sign-file, the call to get the CFLAGS for libcrypto is
missing white-space between `pkg-config` and `--cflags`:
$(shell $(HOSTPKG_CONFIG)--cflags libcrypto 2> /dev/null)
Removing the redirection of stderr, we see:
$ make -C tools/testing/selftests/bpf sign-file
make: Entering directory '[...]/tools/testing/selftests/bpf'
make: pkg-config--cflags: No such file or directory
SIGN-FILE sign-file
make: Leaving directory '[...]/tools/testing/selftests/bpf'
Add the missing space.
Fixes: fc97590668 ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/bpf/20230426215032.415792-1-jeremy@azazel.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
- Fix a compilation issue with DEFINE_STATIC_SRCU() in the unit tests
- Fix leaking kernel memory to a root-only sysfs attribute
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZGETgQAKCRDfioYZHlFs
Zy+FAQDDwPDprMrALvuWz3rYPROPH0h6X2zLYH5JFq29cqjO9wD/RVlrXFFkGaG+
3n7Uip2rZaW3OpC2TOaqBaDxTkXo0ww=
=yFDG
-----END PGP SIGNATURE-----
Merge tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link fixes from Dan Williams:
- Fix a compilation issue with DEFINE_STATIC_SRCU() in the unit tests
- Fix leaking kernel memory to a root-only sysfs attribute
* tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl: Add missing return to cdat read error path
tools/testing/cxl: Use DEFINE_STATIC_SRCU()
Starting with commit:
95433f7263 ("srcu: Begin offloading srcu_struct fields to srcu_update")
...it is no longer possible to do:
static DEFINE_SRCU(x)
Switch to DEFINE_STATIC_SRCU(x) to fix:
tools/testing/cxl/test/mock.c:22:1: error: duplicate ‘static’
22 | static DEFINE_SRCU(cxl_mock_srcu);
| ^~~~~~
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/168392709546.1135523.10424917245934547117.stgit@dwillia2-xfh.jf.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT4 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
Fixes: 2195444e09 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The srv6_end_dt4_l3vpn_test instantiates a virtual network consisting of
several routers (rt-1, rt-2) and hosts.
When the IPv6 addresses of rt-{1,2} routers are configured, the Deduplicate
Address Detection (DAD) kicks in when enabled in the Linux distros running
the selftests. DAD is used to check whether an IPv6 address is already
assigned in a network. Such a mechanism consists of sending an ICMPv6 Echo
Request and waiting for a reply.
As the DAD process could take too long to complete, it may cause the
failing of some tests carried out by the srv6_end_dt4_l3vpn_test script.
To make the srv6_end_dt4_l3vpn_test more robust, we disable DAD on routers
since we configure the virtual network manually and do not need any address
deduplication mechanism at all.
Fixes: 2195444e09 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The BUSTED-BOOST and TREE03 scenarios specify a mythical tree.use_softirq
module parameter, which means a failure to get full test coverage. This
commit therefore corrects the name to rcutree.use_softirq.
Fixes: e2b949d543 ("rcutorture: Make TREE03 use real-time tree.use_softirq setting")
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
According to Mirsad the gpio-sim.sh test appears to FAIL in a wrong way
due to missing initialisation of shell variables:
4.2. Bias settings work correctly
cat: /sys/devices/platform/gpio-sim.0/gpiochip18/sim_gpio0/value: No such file or directory
./gpio-sim.sh: line 393: test: =: unary operator expected
bias setting does not work
GPIO gpio-sim test FAIL
After this change the test passed:
4.2. Bias settings work correctly
GPIO gpio-sim test PASS
His testing environment is AlmaLinux 8.7 on Lenovo desktop box with
the latest Linux kernel based on v6.2:
Linux 6.2.0-mglru-kmlk-andy-09238-gd2980d8d8265 x86_64
Suggested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmRbUnQACgkQ1V2XiooU
IOQNVw//eoCiid3J4TuNdzHaBHXUlZvln1n5Z6K5fIz5ytrY1T7oKC8Y9StkzzWR
29ToFLOJt1iRAcgxsghWRIwUzNwpuUdqgd9cUZMHMQxT0BJItp6FXUql2+1LkF/I
b/gnnb90zyE7lBS/VSRyOiqMiJlP+Som22d7Nn5k2KfTYEdXKwfzjsWAu3W3Sb0s
Lv/MA9DE42qcwiZubmFmDtOtAunPJFZm3HgkcAVeXoNkBDrSfkvxLIMYG6VfFNhQ
AkKMyzX293wpwVxfOuQfJr4QVlxAgOQUko+FqajoWMBtfA3yldZjJ8RC7c9Af9uI
ciOP11vHBCG84KrTabC5kdqOcvadreDiM/oIvk57ztQhCr3e+po+vIz6Cv4p9I30
m5GXfgbtMRl6hM2S5lrRc5fNRkYJHE4aNvesFTGaLpK3LogusH1E9mH5jdjnbU42
TwkIe250qJJelNn9ZxS5Jt0BgyogNfeiA9lOXmaQBpYmwIahrjDf0g8z2I85nPhF
PDukjSXsxi4uHwpSF5wFrlqkAPiEX4vC95uSUTVbbBgZrHhNJdGgu4FJSHRM4mVo
6Awxk2O2bvcDpXEfTBdD4EJF71bZ/aH2i8ddU1oupIl88O01TWTtkO4SYHqMX+tC
fPnpGzBN8KJvHgeFxO4p0v1oelv2RtiITv1gi7YJOnq/+Vd2yy0=
=HmfP
-----END PGP SIGNATURE-----
Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter updates for net
The following patchset contains Netfilter fixes for net:
1) Fix UAF when releasing netnamespace, from Florian Westphal.
2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
from Florian Westphal.
3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
4) Extend nft_flowtable.sh selftest to cover integration with
ingress/egress hooks, from Florian Westphal.
* tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: nft_flowtable.sh: check ingress/egress chain too
selftests: nft_flowtable.sh: monitor result file sizes
selftests: nft_flowtable.sh: wait for specific nc pids
selftests: nft_flowtable.sh: no need for ps -x option
selftests: nft_flowtable.sh: use /proc for pid checking
netfilter: conntrack: fix possible bug_on with enable_hooks=1
netfilter: nf_tables: always release netdev hooks from notifier
====================
Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When run the test in netns, it's not easy to get the tc stats via
tc_rule_handle_stats_get(). With the new netns parameter, we can get
stats from specific netns like
num=$(tc_rule_handle_stats_get "dev eth0 ingress" 101 ".packets" "-n ns")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure flowtable interacts correctly with ingress and egress
chains, i.e. those get handled before and after flow table respectively.
Adds three more tests:
1. repeat flowtable test, but with 'ip dscp set cs3' done in
inet forward chain.
Expect that some packets have been mangled (before flowtable offload
became effective) while some pass without mangling (after offload
succeeds).
2. repeat flowtable test, but with 'ip dscp set cs3' done in
veth0:ingress.
Expect that all packets pass with cs3 dscp field.
3. same as 2, but use veth1:egress. Expect the same outcome.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When running nft_flowtable.sh in VM on a busy server we've found that
the time of the netcat file transfers vary wildly.
Therefore replace hardcoded 3 second sleep with the loop checking for
a change in the file sizes. Once no change in detected we test the results.
Nice side effect is that we shave 1 second sleep in the fast case
(hard-coded 3 second sleep vs two 1 second sleeps).
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Doing wait with no parameters may interfere with some of the tests
having their own background processes.
Although no such test is currently present, the cleanup is useful
to rely on the nft_flowtable.sh for local development (e.g. running
background tcpdump command during the tests).
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some ps commands (e.g. busybox derived) have no -x option. For the
purposes of hash calculation of the list of processes this option is
inessential.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some ps commands (e.g. busybox derived) have no -p option. Use /proc for
pid existence check.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ftrace selftests do not currently produce KTAP output, they produce a
custom format much nicer for human consumption. This means that when run in
automated test systems we just get a single result for the suite as a whole
rather than recording results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default, users using ftracetest
directly will continue to get the existing output.
This is not the most elegant solution but it is simple and effective. I
did consider implementing this by post processing the existing output
format but that felt more complex and likely to result in all output being
lost if something goes seriously wrong during the run which would not be
helpful. I did also consider just writing a separate runner script but
there's enough going on with things like the signal handling for that to
seem like it would be duplicating too much.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx causes false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e6 ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Writing `subprocess.Popen[str]` requires python 3.9+.
kunit.py has an assertion that the python version is 3.7+, so we should
try to stay backwards compatible.
This conflicts a bit with commit 1da2e6220e ("kunit: tool: fix
pre-existing `mypy --strict` errors and update run_checks.py"), since
mypy complains like so
> kunit_kernel.py:95: error: Missing type parameters for generic type "Popen" [type-arg]
Note: `mypy --strict --python-version 3.7` does not work.
We could annotate each file with comments like
`# mypy: disable-error-code="type-arg"
but then we might still get nudged to break back-compat in other files.
This patch adds a `mypy.ini` file since it seems like the only way to
disable specific error codes for all our files.
Note: run_checks.py doesn't need to specify `--config_file mypy.ini`,
but I think being explicit is better, particularly since most kernel
devs won't be familiar with how mypy works.
Fixes: 695e260308 ("kunit: tool: add subscripts for type annotations where appropriate")
Reported-by: SeongJae Park <sj@kernel.org>
Link: https://lore.kernel.org/linux-kselftest/20230501171520.138753-1-sj@kernel.org
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Tested-by: SeongJae Park <sj@kernel.org>
Reviewed-by: David Gow <davidgow@google.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We need to better polish building with BPF skels, so revert back to
making it an experimental feature that has to be explicitely enabled
using BUILD_BPF_SKEL=1.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZFbCXwAKCRCyPKLppCJ+
J7cHAP97erKY4hBXArjpfzcvpFmboh/oqhbTLntyIpS6TEnOyQEAyervAPGIjQYC
DCo4foyXmOWn3dhNtK9M+YiRl3o2SgQ=
=7G78
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features requires it, for instance Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.
- Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.
- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.
- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_locked)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
- Update default map size to 16384.
- Allocate single letter option -M for --map-nr-entries, as it is
proving being frequently used.
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSAn.
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparision with each sample.
perf sched:
- Make 'perf sched latency/map/replay' to use "sched:sched_waking"
instead of "sched:sched_waking", consistent with 'perf record'
since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig., so that
you can use alternative binaries, such as llvm's.
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:
https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
- The above caught, for instance, fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforrest events.
- Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics.
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in thesapphirerapids json file:
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
S/390:
- Add common metrics: - CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix return incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
- Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE
ioctl's behavior more similar to KSM's behavior.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZFLsxAAKCRDdBJ7gKXxA
jl8yAQCqjstPsOULf9QN0z4bGAUhY+Wj4ERz1jbKSIuhFCJWiQEAgQvgRXObKjmi
OtUB0Ek4CMDCQzbyIQ1Bhp3kxi6+Jgs=
=AbyC
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- Some DAMON cleanups from Kefeng Wang
- Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE
ioctl's behavior more similar to KSM's behavior.
[ Andrew called these "final", but I suspect we'll have a series fixing
up the fact that the last commit in the dmapools series in the
previous pull seems to have unintentionally just reverted all the
other commits in the same series.. - Linus ]
* tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: hwpoison: coredump: support recovery from dump_user_range()
mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
mm/ksm: move disabling KSM from s390/gmap code to KSM code
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0
mm/damon/paddr: fix missing folio_sz update in damon_pa_young()
mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate()
mm/damon/paddr: minor refactor of damon_pa_pageout()
1. Don't hard-code pkg-config
2. Remove distro-specific default for CFLAGS
3. Use pkg-config for LDLIBS
Fixes: a50a88f026 ("selftests: netfilter: fix a build error on openSUSE")
Suggested-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let's test whether setting PR_SET_MEMORY_MERGE to 0 after setting it to 1
will unmerge pages, similar to how setting MADV_UNMERGEABLE after setting
MADV_MERGEABLE would.
Link: https://lkml.kernel.org/r/20230422205420.30372-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Stefan Roesch <shr@devkernel.io>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* More phys_to_virt conversions
* Improvement of AP management for VSIE (nested virtualization)
ARM64:
* Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
* New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features
being moved to VMMs rather than be implemented in the kernel.
* Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one.
This last part allows the NV timer code to be implemented on
top.
* A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
* The usual selftest fixes and improvements.
KVM x86 changes for 6.4:
* Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
and by giving the guest control of CR0.WP when EPT is enabled on VMX
(VMX-only because SVM doesn't support per-bit controls)
* Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
as a bool
* Move AMD_PSFD to cpufeatures.h and purge KVM's definition
* Avoid unnecessary writes+flushes when the guest is only adding new PTEs
* Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
when emulating invalidations
* Clean up the range-based flushing APIs
* Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
changed SPTE" overhead associated with writing the entire entry
* Track the number of "tail" entries in a pte_list_desc to avoid having
to walk (potentially) all descriptors during insertion and deletion,
which gets quite expensive if the guest is spamming fork()
* Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
* Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, similar to CPUID features
* Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
* Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
x86 AMD:
* Add support for virtual NMIs
* Fixes for edge cases related to virtual interrupts
x86 Intel:
* Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
* Fix a bug in emulation of ENCLS in compatibility mode
* Allow emulation of NOP and PAUSE for L2
* AMX selftests improvements
* Misc cleanups
MIPS:
* Constify MIPS's internal callbacks (a leftover from the hardware enabling
rework that landed in 6.3)
Generic:
* Drop unnecessary casts from "void *" throughout kvm_main.c
* Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
size by 8 bytes on 64-bit kernels by utilizing a padding hole
Documentation:
* Fix goof introduced by the conversion to rST
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
=S2Hq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"s390:
- More phys_to_virt conversions
- Improvement of AP management for VSIE (nested virtualization)
ARM64:
- Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
- New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features being
moved to VMMs rather than be implemented in the kernel.
- Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one. This
last part allows the NV timer code to be implemented on top.
- A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
- The usual selftest fixes and improvements.
x86:
- Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
enabled, and by giving the guest control of CR0.WP when EPT is
enabled on VMX (VMX-only because SVM doesn't support per-bit
controls)
- Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
return as a bool
- Move AMD_PSFD to cpufeatures.h and purge KVM's definition
- Avoid unnecessary writes+flushes when the guest is only adding new
PTEs
- Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
optimizations when emulating invalidations
- Clean up the range-based flushing APIs
- Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
single A/D bit using a LOCK AND instead of XCHG, and skip all of
the "handle changed SPTE" overhead associated with writing the
entire entry
- Track the number of "tail" entries in a pte_list_desc to avoid
having to walk (potentially) all descriptors during insertion and
deletion, which gets quite expensive if the guest is spamming
fork()
- Disallow virtualizing legacy LBRs if architectural LBRs are
available, the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably
PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features
- Overhaul the vmx_pmu_caps selftest to better validate
PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- AMD SVM:
- Add support for virtual NMIs
- Fixes for edge cases related to virtual interrupts
- Intel AMX:
- Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
XTILE_DATA is not being reported due to userspace not opting in
via prctl()
- Fix a bug in emulation of ENCLS in compatibility mode
- Allow emulation of NOP and PAUSE for L2
- AMX selftests improvements
- Misc cleanups
MIPS:
- Constify MIPS's internal callbacks (a leftover from the hardware
enabling rework that landed in 6.3)
Generic:
- Drop unnecessary casts from "void *" throughout kvm_main.c
- Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
struct size by 8 bytes on 64-bit kernels by utilizing a padding
hole
Documentation:
- Fix goof introduced by the conversion to rST"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
KVM: s390: pci: fix virtual-physical confusion on module unload/load
KVM: s390: vsie: clarifications on setting the APCB
KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
KVM: selftests: Test the PMU event "Instructions retired"
KVM: selftests: Copy full counter values from guest in PMU event filter test
KVM: selftests: Use error codes to signal errors in PMU event filter test
KVM: selftests: Print detailed info in PMU event filter asserts
KVM: selftests: Add helpers for PMC asserts in PMU event filter test
KVM: selftests: Add a common helper for the PMU event filter guest code
KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
KVM: arm64: vhe: Drop extra isb() on guest exit
KVM: arm64: vhe: Synchronise with page table walker on MMU update
KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
KVM: arm64: nvhe: Synchronise with page table walker on TLBI
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
KVM: arm64: vgic: Don't acquire its_lock before config_lock
KVM: selftests: Add test to verify KVM's supported XCR0
...
- Refactor the DOE infrastructure (Data Object Exchange PCI-config-cycle
mailbox) to be a facility of the PCI core rather than the CXL core.
This is foundational for upcoming support for PCI device-attestation and
PCIe / CXL link encryption.
- Add support for retrieving and injecting poison for CXL memory
expanders. This enabling uses trace-events to convey CXL media error
records to user tooling. It includes translation of device-local
addresses (DPA) to system physical addresses (SPA) and their
corresponding CXL region.
- Fixes for decoder enumeration that missed v6.3-final
- Miscellaneous fixups
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCZE2nNwAKCRDfioYZHlFs
Z5c2AQCTWebok6CD+HN01xnIx+CBWAUQe0QIGR40dT2P6/WGEgEA8wMae0w/FDlc
lQDvSoIyPvy1hGO7Ppb0K2AT6jrQAgU=
=blcC
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull compute express link updates from Dan Williams:
"DOE support is promoted from drivers/cxl/ to drivers/pci/ with Bjorn's
blessing, and the CXL core continues to mature its media management
capabilities with support for listing and injecting media errors. Some
late fixes that missed v6.3-final are also included:
- Refactor the DOE infrastructure (Data Object Exchange
PCI-config-cycle mailbox) to be a facility of the PCI core rather
than the CXL core.
This is foundational for upcoming support for PCI
device-attestation and PCIe / CXL link encryption.
- Add support for retrieving and injecting poison for CXL memory
expanders.
This enabling uses trace-events to convey CXL media error records
to user tooling. It includes translation of device-local addresses
(DPA) to system physical addresses (SPA) and their corresponding
CXL region.
- Fixes for decoder enumeration that missed v6.3-final
- Miscellaneous fixups"
* tag 'cxl-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (38 commits)
cxl/test: Add mock test for set_timestamp
cxl/mbox: Update CMD_RC_TABLE
tools/testing/cxl: Require CONFIG_DEBUG_FS
tools/testing/cxl: Add a sysfs attr to test poison inject limits
tools/testing/cxl: Use injected poison for get poison list
tools/testing/cxl: Mock the Clear Poison mailbox command
tools/testing/cxl: Mock the Inject Poison mailbox command
cxl/mem: Add debugfs attributes for poison inject and clear
cxl/memdev: Trace inject and clear poison as cxl_poison events
cxl/memdev: Warn of poison inject or clear to a mapped region
cxl/memdev: Add support for the Clear Poison mailbox command
cxl/memdev: Add support for the Inject Poison mailbox command
tools/testing/cxl: Mock support for Get Poison List
cxl/trace: Add an HPA to cxl_poison trace events
cxl/region: Provide region info to the cxl_poison trace event
cxl/memdev: Add trigger_poison_list sysfs attribute
cxl/trace: Add TRACE support for CXL media-error records
cxl/mbox: Add GET_POISON_LIST mailbox command
cxl/mbox: Initialize the poison state
cxl/mbox: Restrict poison cmds to debugfs cxl_raw_allow_all
...
* cpuset changes including the fix for an incorrect interaction with CPU
hotplug and an optimization.
* Other doc and cosmetic changes.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErfng4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGVVtAQCDycK4VSgc4nsFPG1vh1Oy1A723ciEUwAbKmV/
F1n7xwEA68FiDvE29LpMJJuYP9HnX0A5zRMyNnb52kN9jmgcEQI=
=ALol
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset changes including the fix for an incorrect interaction with
CPU hotplug and an optimization
- Other doc and cosmetic changes
* tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1/cpusets: update libcgroup project link
cgroup/cpuset: Minor updates to test_cpuset_prs.sh
cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
cpuset: Clean up cpuset_node_allowed
cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers
* Support for runtime detection of the Svnapot extension.
* Support for Zicboz when clearing pages.
* We've moved to GENERIC_ENTRY.
* Support for !MMU on rv32 systems.
* The linear region is now mapped via huge pages.
* Support for building relocatable kernels.
* Support for the hwprobe interface.
* Various fixes and cleanups throughout the tree.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRL5rcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYibpcD/0RnmO+N2OJxsJXf0KtHv4LlChAFaMZ
mfcsU8lv8r3Rz1USJGyVoE57885R+iUw1664ic6Gj9Ll9/A+BDVyqlNeo1BZ7nnv
6hZawSh8XGMyCJoatjaCSMW6VKObsSpHXLoA0mxtj06w1XhtpUnzjv4SZQqBYxC2
7+/cfy6l3uGdSKQ0R402sF8PE+l3HthhO+Cw9NYHQZisAHEQrfFpXRnrovhs+vX0
aVxoWo8bmIhhNke2jh6dnGhfFfAs+UClbaKgZfe8af6feboo+Tal3+OibiEy1K1j
hDQ3w/G5jAdwSqnNPdXzpk4srskUOhP9is8AG79vCasMxybQIBfZcc7/kLmmQX+2
xt1EoDVD/lSO1p+CWRautLXEsInWbpBYaSJie7WcR4SHe8S7/nomTDlwkJHx5cma
mkSYHJKNwCbamDTI3gXg8nrScbxsRnJQsQUolFDwAeRz7AYVwtqVh8VxAWqAdU3q
xUNKrUpCAzNC3d5GL7pmRfZrqjpQhuFXkHFSy85vaCPuckBu926OzxpKBmX4Kea1
qLYWfxv78bcwuY47FWJKcd97Ib63iBYDgarJxvrHrwDaHV2xjBOmdapNPUc2PswT
a938enbYYnJHIbuSmbeNBPF4iF6nKUXshyfZu7tCZl6MzsXloUckGdm++j97Bpvr
g6G3ZP6STSQBmw==
=oxQd
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for runtime detection of the Svnapot extension
- Support for Zicboz when clearing pages
- We've moved to GENERIC_ENTRY
- Support for !MMU on rv32 systems
- The linear region is now mapped via huge pages
- Support for building relocatable kernels
- Support for the hwprobe interface
- Various fixes and cleanups throughout the tree
* tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits)
RISC-V: hwprobe: Explicity check for -1 in vdso init
RISC-V: hwprobe: There can only be one first
riscv: Allow to downgrade paging mode from the command line
dt-bindings: riscv: add sv57 mmu-type
RISC-V: hwprobe: Remove __init on probe_vendor_features()
riscv: Use --emit-relocs in order to move .rela.dyn in init
riscv: Check relocations at compile time
powerpc: Move script to check relocations at compile time in scripts/
riscv: Introduce CONFIG_RELOCATABLE
riscv: Move .rela.dyn outside of init to avoid empty relocations
riscv: Prepare EFI header for relocatable kernels
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
riscv: Fix ptdump when KASAN is enabled
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
riscv: Move DTB_EARLY_BASE_VA to the kernel address space
riscv: Rework kasan population functions
riscv: Split early and final KASAN population functions
riscv: Use PUD/P4D/PGD pages for the linear mapping
riscv: Move the linear mapping creation in its own function
riscv: Get rid of riscv_pfn_base variable
...
- Add support for building the kernel using PC-relative addressing on Power10.
- Allow HV KVM guests on Power10 to use prefixed instructions.
- Unify support for the P2020 CPU (85xx) into a single machine description.
- Always build the 64-bit kernel with 128-bit long double.
- Drop support for several obsolete 2000's era development boards as
identified by Paul Gortmaker.
- A series fixing VFIO on Power since some generic changes.
- Various other small features and fixes.
Thanks to: Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel Stanley,
Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro Yamada, Michael
Neuling, Nathan Chancellor, Nathan Lynch, Nicholas Miehlbradt, Nicholas Piggin,
Nick Desaulniers, Nysal Jan K.A, Pali Rohár, Paul Gortmaker, Paul Mackerras,
Petr Vaněk, Randy Dunlap, Rob Herring, Sachin Sant, Sean Christopherson, Segher
Boessenkool, Timothy Pearson.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmRLdD8THG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgPrED/93Hwi1/8udXYV6OAFBnLLLDX9RZWNx
sj6W7PC/yY7MGUTIGcayrAURt5xFfQZMBdq9nhhH46Wyd8pSUe1IlpXIEgqeH64Y
5uaSe6u4OZ5dDmiYz8bM+Y4Ixkfq1xMO0Rj27FIRJyU4Pp6gyMQ6/8W+iPU2vYIb
jl4frVO8PpmCbW8euOyT/b9YB2h79+5nLgZT4RvmYJblKQwgzRZWaiCAU0wYf+xT
xYsMQcqJlPstizTnXIr+GC08VrMPQR51kJCurnNUMXoAQY6toEXebveWRNZ3sH39
K0BRQ036NNGPS4GlHVbjgLIdoWr4pUEROZ48Jy9WeiDh6OwO2vb2zrm7ZLtKlGXI
LCQ08T9diPbAAJyVJaKjpsNXhTffuLPRhIOr1o2vqGvY+Fqbw96bPQAyFbxpJtrw
rGTTc5e93fEdZV+eaGR8kfdNM75WM6lgiteaIbUnpirYTh/0j6YEO1Ijc4d9e5ux
aoHN2eXQDjMm9foZf1D6vaCTRyN8LwLa9hcydKCn6+qULUQzoZ2r4cRt6eSD7+fx
Wni1Zg7vD6xx294nVmhg5dWuD4qVIAZj3BZPFHHR2SPqy+UbEJLk/NKgznmrhhgQ
HBcZAEFqIBZkA4e0w6LJKh/1j1rxpZUokgjMotOJq4VRivq82W1P7SioFDY3/6w5
UvGtIgGq4Qsa2Q==
=89WH
-----END PGP SIGNATURE-----
Merge tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc updates from Michael Ellerman:
- Add support for building the kernel using PC-relative addressing on
Power10.
- Allow HV KVM guests on Power10 to use prefixed instructions.
- Unify support for the P2020 CPU (85xx) into a single machine
description.
- Always build the 64-bit kernel with 128-bit long double.
- Drop support for several obsolete 2000's era development boards as
identified by Paul Gortmaker.
- A series fixing VFIO on Power since some generic changes.
- Various other small features and fixes.
Thanks to Alexey Kardashevskiy, Andrew Donnellan, Benjamin Gray, Bo Liu,
Christophe Leroy, Dan Carpenter, David Binderman, Ira Weiny, Joel
Stanley, Kajol Jain, Kautuk Consul, Liang He, Luis Chamberlain, Masahiro
Yamada, Michael Neuling, Nathan Chancellor, Nathan Lynch, Nicholas
Miehlbradt, Nicholas Piggin, Nick Desaulniers, Nysal Jan K.A, Pali
Rohár, Paul Gortmaker, Paul Mackerras, Petr Vaněk, Randy Dunlap, Rob
Herring, Sachin Sant, Sean Christopherson, Segher Boessenkool, and
Timothy Pearson.
* tag 'powerpc-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (156 commits)
powerpc/64s: Disable pcrel code model on Clang
powerpc: Fix merge conflict between pcrel and copy_thread changes
powerpc/configs/powernv: Add IGB=y
powerpc/configs/64s: Drop JFS Filesystem
powerpc/configs/64s: Use EXT4 to mount EXT2 filesystems
powerpc/configs: Make pseries_defconfig an alias for ppc64le_guest
powerpc/configs: Make pseries_le an alias for ppc64le_guest
powerpc/configs: Incorporate generic kvm_guest.config into guest configs
powerpc/configs: Add IBMVETH=y and IBMVNIC=y to guest configs
powerpc/configs/64s: Enable Device Mapper options
powerpc/configs/64s: Enable PSTORE
powerpc/configs/64s: Enable VLAN support
powerpc/configs/64s: Enable BLK_DEV_NVME
powerpc/configs/64s: Drop REISERFS
powerpc/configs/64s: Use SHA512 for module signatures
powerpc/configs/64s: Enable IO_STRICT_DEVMEM
powerpc/configs/64s: Enable SCHEDSTATS
powerpc/configs/64s: Enable DEBUG_VM & other options
powerpc/configs/64s: Enable EMULATED_STATS
powerpc/configs/64s: Enable KUNIT and most tests
...
- User events are finally ready!
After lots of collaboration between various parties, we finally locked
down on a stable interface for user events that can also work with user
space only tracing. This is implemented by telling the kernel (or user
space library, but that part is user space only and not part of this
patch set), where the variable is that the application uses to know if
something is listening to the trace. There's also an interface to tell
the kernel about these events, which will show up in the
/sys/kernel/tracing/events/user_events/ directory, where it can be
enabled. When it's enabled, the kernel will update the variable, to tell
the application to start writing to the kernel.
See https://lwn.net/Articles/927595/
- Cleaned up the direct trampolines code to simplify arm64 addition of
direct trampolines. Direct trampolines use the ftrace interface but
instead of jumping to the ftrace trampoline, applications (mostly BPF)
can register their own trampoline for performance reasons.
- Some updates to the fprobe infrastructure. fprobes are more efficient than
kprobes, as it does not need to save all the registers that kprobes on
ftrace do. More work needs to be done before the fprobes will be exposed
as dynamic events.
- More updates to references to the obsolete path of
/sys/kernel/debug/tracing for the new /sys/kernel/tracing path.
- Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer line
by line instead of all at once. There's users in production kernels that
have a large data dump that originally used printk() directly, but the
data dump was larger than what printk() allowed as a single print.
Using seq_buf() to do the printing fixes that.
- Add /sys/kernel/tracing/touched_functions that shows all functions that
was every traced by ftrace or a direct trampoline. This is used for
debugging issues where a traced function could have caused a crash by
a bpf program or live patching.
- Add a "fields" option that is similar to "raw" but outputs the fields of
the events. It's easier to read by humans.
- Some minor fixes and clean ups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZEr36xQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6quZHAQCzuqnn2S8DsPd3Sy1vKIYaj0uajW5D
Kz1oUJH4F0H7kgEA8XwXkdtfKpOXWc/ZH4LWfL7Orx2wJZJQMV9dVqEPDAE=
=w0Z1
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- User events are finally ready!
After lots of collaboration between various parties, we finally
locked down on a stable interface for user events that can also work
with user space only tracing.
This is implemented by telling the kernel (or user space library, but
that part is user space only and not part of this patch set), where
the variable is that the application uses to know if something is
listening to the trace.
There's also an interface to tell the kernel about these events,
which will show up in the /sys/kernel/tracing/events/user_events/
directory, where it can be enabled.
When it's enabled, the kernel will update the variable, to tell the
application to start writing to the kernel.
See https://lwn.net/Articles/927595/
- Cleaned up the direct trampolines code to simplify arm64 addition of
direct trampolines.
Direct trampolines use the ftrace interface but instead of jumping to
the ftrace trampoline, applications (mostly BPF) can register their
own trampoline for performance reasons.
- Some updates to the fprobe infrastructure. fprobes are more efficient
than kprobes, as it does not need to save all the registers that
kprobes on ftrace do. More work needs to be done before the fprobes
will be exposed as dynamic events.
- More updates to references to the obsolete path of
/sys/kernel/debug/tracing for the new /sys/kernel/tracing path.
- Add a seq_buf_do_printk() helper to seq_bufs, to print a large buffer
line by line instead of all at once.
There are users in production kernels that have a large data dump
that originally used printk() directly, but the data dump was larger
than what printk() allowed as a single print.
Using seq_buf() to do the printing fixes that.
- Add /sys/kernel/tracing/touched_functions that shows all functions
that was every traced by ftrace or a direct trampoline. This is used
for debugging issues where a traced function could have caused a
crash by a bpf program or live patching.
- Add a "fields" option that is similar to "raw" but outputs the fields
of the events. It's easier to read by humans.
- Some minor fixes and clean ups.
* tag 'trace-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (41 commits)
ring-buffer: Sync IRQ works before buffer destruction
tracing: Add missing spaces in trace_print_hex_seq()
ring-buffer: Ensure proper resetting of atomic variables in ring_buffer_reset_online_cpus
recordmcount: Fix memory leaks in the uwrite function
tracing/user_events: Limit max fault-in attempts
tracing/user_events: Prevent same address and bit per process
tracing/user_events: Ensure bit is cleared on unregister
tracing/user_events: Ensure write index cannot be negative
seq_buf: Add seq_buf_do_printk() helper
tracing: Fix print_fields() for __dyn_loc/__rel_loc
tracing/user_events: Set event filter_type from type
ring-buffer: Clearly check null ptr returned by rb_set_head_page()
tracing: Unbreak user events
tracing/user_events: Use print_format_fields() for trace output
tracing/user_events: Align structs with tabs for readability
tracing/user_events: Limit global user_event count
tracing/user_events: Charge event allocs to cgroups
tracing/user_events: Update documentation for ABI
tracing/user_events: Use write ABI in example
tracing/user_events: Add ABI self-test
...
to ARM's Top Byte Ignore and allows userspace to store metadata in some
bits of pointers without masking it out before use.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmRK/WIACgkQaDWVMHDJ
krAL+RAAw33EhsWyYVkeAtYmYBKkGvlgeSDULtfJKe5bynJBTHkGKfM6RE9MSJIt
5fHWaConGh8HNpy0Us1sDvd/aWcWRm5h7ZcCVD+R4qrgh/vc7ULzM+elXe5jzr4W
cyuTckF2eW6SVrYg6fH5q+6Uy/moDtrdkLRvwRBf+AYeepB8gvSSH5XixKDNiVBE
pjNy1xXVZQokqD4tjsFelmLttyacR5OabiE/aeVNoFYf9yTwfnN8N3T6kwuOoS4l
Lp6NA+/0ux+oBlR+Is+JJG8Mxrjvz96yJGZYdR2YP5k3bMQtHAAjuq2w+GgqZm5i
j3/E6KQepEGaCfC+bHl68xy/kKx8ik+jMCEcBalCC25J3uxbLz41g6K3aI890wJn
+5ZtfcmoDUk9pnUyLxR8t+UjOSBFAcRSUE+FTjUH1qEGsMPK++9a4iLXz5vYVK1+
+YCt1u5LNJbkDxE8xVX3F5jkXh0G01SJsuUVAOqHSNfqSNmohFK8/omqhVRrRqoK
A7cYLtnOGiUXLnvjrwSxPNOzRrG+GAwqaw8gwOTaYogETWbTY8qsSCEVl204uYwd
m8io9rk2ZXUdDuha56xpBbPE0JHL9hJ2eKCuPkfvRgJT9YFyTh+e0UdX20k+nDjc
ang1S350o/Y0sus6rij1qS8AuxJIjHucG0GdgpZk3KUbcxoRLhI=
=qitk
-----END PGP SIGNATURE-----
Merge tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 LAM (Linear Address Masking) support from Dave Hansen:
"Add support for the new Linear Address Masking CPU feature.
This is similar to ARM's Top Byte Ignore and allows userspace to store
metadata in some bits of pointers without masking it out before use"
* tag 'x86_mm_for_6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm/iommu/sva: Do not allow to set FORCE_TAGGED_SVA bit from outside
x86/mm/iommu/sva: Fix error code for LAM enabling failure due to SVA
selftests/x86/lam: Add test cases for LAM vs thread creation
selftests/x86/lam: Add ARCH_FORCE_TAGGED_SVA test cases for linear-address masking
selftests/x86/lam: Add inherit test cases for linear-address masking
selftests/x86/lam: Add io_uring test cases for linear-address masking
selftests/x86/lam: Add mmap and SYSCALL test cases for linear-address masking
selftests/x86/lam: Add malloc and tag-bits test cases for linear-address masking
x86/mm/iommu/sva: Make LAM and SVA mutually exclusive
iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()
mm: Expose untagging mask in /proc/$PID/status
x86/mm: Provide arch_prctl() interface for LAM
x86/mm: Reduce untagged_addr() overhead for systems without LAM
x86/uaccess: Provide untagged_addr() and remove tags before address check
mm: Introduce untagged_addr_remote()
x86/mm: Handle LAM on context switch
x86: CPUID and CR3/CR4 flags for Linear Address Masking
x86: Allow atomic MM_CONTEXT flags setting
x86/mm: Rework address range check in get_user() and put_user()
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
Fixes: 03a0b567a0 ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page().
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive rather
than exclusive, and has fixed a bunch of errors which were caused by its
unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZEr3zQAKCRDdBJ7gKXxA
jlLoAP0fpQBipwFxED0Us4SKQfupV6z4caXNJGPeay7Aj11/kQD/aMRC2uPfgr96
eMG3kwn2pqkB9ST2QpkaRbxA//eMbQY=
=J+Dj
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Nick Piggin's "shoot lazy tlbs" series, to improve the peformance of
switching from a user process to a kernel thread.
- More folio conversions from Kefeng Wang, Zhang Peng and Pankaj
Raghav.
- zsmalloc performance improvements from Sergey Senozhatsky.
- Yue Zhao has found and fixed some data race issues around the
alteration of memcg userspace tunables.
- VFS rationalizations from Christoph Hellwig:
- removal of most of the callers of write_one_page()
- make __filemap_get_folio()'s return value more useful
- Luis Chamberlain has changed tmpfs so it no longer requires swap
backing. Use `mount -o noswap'.
- Qi Zheng has made the slab shrinkers operate locklessly, providing
some scalability benefits.
- Keith Busch has improved dmapool's performance, making part of its
operations O(1) rather than O(n).
- Peter Xu adds the UFFD_FEATURE_WP_UNPOPULATED feature to userfaultd,
permitting userspace to wr-protect anon memory unpopulated ptes.
- Kirill Shutemov has changed MAX_ORDER's meaning to be inclusive
rather than exclusive, and has fixed a bunch of errors which were
caused by its unintuitive meaning.
- Axel Rasmussen give userfaultfd the UFFDIO_CONTINUE_MODE_WP feature,
which causes minor faults to install a write-protected pte.
- Vlastimil Babka has done some maintenance work on vma_merge():
cleanups to the kernel code and improvements to our userspace test
harness.
- Cleanups to do_fault_around() by Lorenzo Stoakes.
- Mike Rapoport has moved a lot of initialization code out of various
mm/ files and into mm/mm_init.c.
- Lorenzo Stoakes removd vmf_insert_mixed_prot(), which was added for
DRM, but DRM doesn't use it any more.
- Lorenzo has also coverted read_kcore() and vread() to use iterators
and has thereby removed the use of bounce buffers in some cases.
- Lorenzo has also contributed further cleanups of vma_merge().
- Chaitanya Prakash provides some fixes to the mmap selftesting code.
- Matthew Wilcox changes xfs and afs so they no longer take sleeping
locks in ->map_page(), a step towards RCUification of pagefaults.
- Suren Baghdasaryan has improved mmap_lock scalability by switching to
per-VMA locking.
- Frederic Weisbecker has reworked the percpu cache draining so that it
no longer causes latency glitches on cpu isolated workloads.
- Mike Rapoport cleans up and corrects the ARCH_FORCE_MAX_ORDER Kconfig
logic.
- Liu Shixin has changed zswap's initialization so we no longer waste a
chunk of memory if zswap is not being used.
- Yosry Ahmed has improved the performance of memcg statistics
flushing.
- David Stevens has fixed several issues involving khugepaged,
userfaultfd and shmem.
- Christoph Hellwig has provided some cleanup work to zram's IO-related
code paths.
- David Hildenbrand has fixed up some issues in the selftest code's
testing of our pte state changing.
- Pankaj Raghav has made page_endio() unneeded and has removed it.
- Peter Xu contributed some rationalizations of the userfaultfd
selftests.
- Yosry Ahmed has fixed an issue around memcg's page recalim
accounting.
- Chaitanya Prakash has fixed some arm-related issues in the
selftests/mm code.
- Longlong Xia has improved the way in which KSM handles hwpoisoned
pages.
- Peter Xu fixes a few issues with uffd-wp at fork() time.
- Stefan Roesch has changed KSM so that it may now be used on a
per-process and per-cgroup basis.
* tag 'mm-stable-2023-04-27-15-30' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
mm,unmap: avoid flushing TLB in batch if PTE is inaccessible
shmem: restrict noswap option to initial user namespace
mm/khugepaged: fix conflicting mods to collapse_file()
sparse: remove unnecessary 0 values from rc
mm: move 'mmap_min_addr' logic from callers into vm_unmapped_area()
hugetlb: pte_alloc_huge() to replace huge pte_alloc_map()
maple_tree: fix allocation in mas_sparse_area()
mm: do not increment pgfault stats when page fault handler retries
zsmalloc: allow only one active pool compaction context
selftests/mm: add new selftests for KSM
mm: add new KSM process and sysfs knobs
mm: add new api to enable ksm per process
mm: shrinkers: fix debugfs file permissions
mm: don't check VMA write permissions if the PTE/PMD indicates write permissions
migrate_pages_batch: fix statistics for longterm pin retry
userfaultfd: use helper function range_in_vma()
lib/show_mem.c: use for_each_populated_zone() simplify code
mm: correct arg in reclaim_pages()/reclaim_clean_pages_from_list()
fs/buffer: convert create_page_buffers to folio_create_buffers
fs/buffer: add folio_create_empty_buffers helper
...
- sh: Use generic GCC library routines
- sh: sq: Use the bitmap API when applicable
- sh: sq: Fix incorrect element size for allocating bitmap buffer
- sh: pci: Remove unused variable in SH-7786 PCI Express code
- sh: mcount.S: fix build error when PRINTK is not enabled
- sh: remove sh5/sh64 last fragments
- sh: math-emu: fix macro redefined warning
- sh: init: use OF_EARLY_FLATTREE for early init
- sh: nmi_debug: fix return value of __setup handler
- sh: SH2007: drop the bad URL info
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEYv+KdYTgKVaVRgAGdCY7N/W1+RMFAmRIz0UACgkQdCY7N/W1
+RMm1Q/9Hw5xMnxHbryDoBAqgwEOZRH+MUMBnAyMw3shqxO/Cp/nIAacvdNmF4Me
iszDjATleshk8vbTwUE6cFPzKuLM8r4o1JfBvYSEBgkfs5YEEhoa+1TQZ6aYl3zD
v6vcVQnobaV5dUc9yUA3FdG/vuXEj7wctZuqO0QYsC/bE5g/r1fFTEd37Jbo2qwg
6sJ+xL8KEa29Abq9OP0QmeOWvHBuGcCLZNgagA4JxT7U4+jYhg0ddphw+c3yybnP
FX1eFMulB98V/oDPCOlfrYsZAkQGoYPWwY0WI/nVg8ujA3lbRkSu6Fd9ic95/PGG
KVjr6Mol6/+ESy4k/MB46bJzq0un2FPWhZzyfL0RoCbX2zQWBtC/1XbT0PmTsRud
CzcPAMpNPDwUTcoSWdUpOfEAbxjIgGNhQBth9lRMNFhNkk8cwgk1UAN0LjBRm5nq
MteTim3qCyiFkNlngpvSVbIokBKWllKAtPSL3wCi6OgQCNm7XWZxme2z8G5tVkit
Q9bTVD5qMt24pRJsGsVho8wvRsqMmtl5hwMzFVP02WBNxb9csHpQHrhG7MRLN9kt
0BPYU6erCcRl9DQ9HonUaKCmJDJEyxUcXan48TSyGzajFDnURS7AfkreO7NHQIbO
YAaCvqCDwGVygBjUQtHLrBWlORjAD8IoMEJ1sivRzCeHXGlmI6s=
=RGSv
-----END PGP SIGNATURE-----
Merge tag 'sh-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux
Pull sh updates from John Paul Adrian Glaubitz:
"This is a bit larger than my previous one and mainly consists of
clean-up work in the arch/sh directory by Geert Uytterhoeven and Randy
Dunlap.
Additionally, this fixes a bug in the Storage Queue code that was
discovered while I was reviewing a patch to switch the code to the
bitmap API by Christophe Jaillet.
So this contains both a fix for the original bug in the Storage Queue
code that can be backported later as well as the Christophe's patch to
swich the code to the bitmap API.
Summary:
- Use generic GCC library routines
- sq: Use the bitmap API when applicable
- sq: Fix incorrect element size for allocating bitmap buffer
- pci: Remove unused variable in SH-7786 PCI Express code
- mcount.S: fix build error when PRINTK is not enabled
- remove sh5/sh64 last fragments
- math-emu: fix macro redefined warning
- init: use OF_EARLY_FLATTREE for early init
- nmi_debug: fix return value of __setup handler
- SH2007: drop the bad URL info"
* tag 'sh-for-v6.4-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/glaubitz/sh-linux:
sh: Replace <uapi/asm/types.h> by <asm-generic/int-ll64.h>
sh: Use generic GCC library routines
sh: sq: Use the bitmap API when applicable
sh: sq: Fix incorrect element size for allocating bitmap buffer
sh: pci: Remove unused variable in SH-7786 PCI Express code
sh: mcount.S: fix build error when PRINTK is not enabled
sh: remove sh5/sh64 last fragments
sh: math-emu: fix macro redefined warning
sh: init: use OF_EARLY_FLATTREE for early init
sh: nmi_debug: fix return value of __setup handler
sh: SH2007: drop the bad URL info
Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening in
the driver core in the quest to be able to move "struct bus" and "struct
class" into read-only memory, a task now complete with these changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules for
all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most of
them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCZEp7Sw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykitQCfamUHpxGcKOAGuLXMotXNakTEsxgAoIquENm5
LEGadNS38k5fs+73UaxV
=7K4B
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIVAwUAZEmThqZi849r7WBJAQLKTxAAwLKvk8xCUVPardg2tYxLSaNJAeSgo4L0
CKgB52kXa5R6+L3OApKgkREkj0TotNpNA5Gc/1DlPiRrUXPAj7g+NS2ID8SfXOUd
Iii42DoVZli03kG2xoLgU9Fy7mJ1JdfCC6dhP95y6oDzsZqb87M8sk+2G59KVhXO
KXaVMSU68+AKdXwDCbxhDwR+CH0YpGUqZxURKYycIZQhWPCssBDHorqJLLHzodSx
jk+OKAqTAURjt3Pqqn6BwyOXmjhsomUfJ2z01i/I062+zFTjy+6RBhqqbOPBpJ0w
D34nDwunyhlha11u1dJoP2lpmujJvliTUPM0ddeZTTMbRf58LzpxtVBPSsy389uI
pqC14OdUDEvlp4WX4Xkj7K2m4HpE9hYL1gF2ebnwvyS2f1Sjti1mKSYvs/cJk5nY
nlivD7lmj4Cc0SDasyfqnkP9TUxF+1SNoDAImtku/ajtIGsguveU8kYZtZxKj3WO
A0LZKabKH/jEvJug/aQA0l5+AdP88mGLre+WYc6xh7IxTlsXnYeLpaYOdGZ19WCQ
tjpc+z+nPSszc0wQs2TsJSxQpkzcO+8qS+h9GFhBm5DREfVHR8wrsMrdxot55zvm
+j9sMN8oD7RQwxtG9DUF2wzIyjKe/k9b3qbe/BApC65WsMiXdSvlhJKhvNZQs+w3
1OGeT5LJpqc=
=JhWj
-----END PGP SIGNATURE-----
Merge tag 'for-linus-2023042601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID updates from Jiri Kosina:
- import a bunch of HID selftests from out-of-tree hid-tools project
(Benjamin Tissoires)
- drastically reducing Bluetooth disconnects on hid-nintendo driven
devices (Daniel J. Ogorchock)
- lazy initialization of battery interfaces in wacom driver (Jason
Gerecke)
- generic support for all Kye tablets (David Yang)
- proper rumble queue overrun handling in hid-nintendo (Daniel J.
Ogorchock)
- support for ADC measurement in logitech-hidpp driver (Bastien Nocera)
- reset GPIO support in i2c-hid (Hans de Goede)
- improved handling of generic "Digitizer" usage (Jason Gerecke)
- support for KEY_CAMERA_FOCUS (Feng Qi)
- quirks for Apple Geyser 3 and Apple Geyser 4 (Alex Henrie)
- assorted functional fixes and device ID additions
* tag 'for-linus-2023042601' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid: (54 commits)
HID: amd_sfh: Fix max supported HID devices
HID: wacom: generic: Set battery quirk only when we see battery data
HID: wacom: Lazy-init batteries
HID: Ignore battery for ELAN touchscreen on ROG Flow X13 GV301RA
HID: asus: explicitly include linux/leds.h
HID: lg-g15: explicitly include linux/leds.h
HID: steelseries: explicitly include linux/leds.h
HID: apple: Set the tilde quirk flag on the Geyser 3
HID: apple: explicitly include linux/leds.h
HID: mcp2221: fix get and get_direction for gpio
HID: mcp2221: fix report layout for gpio get
HID: wacom: Set a default resolution for older tablets
HID: i2c-hid-of: Add reset GPIO support to i2c-hid-of
HID: i2c-hid-of: Allow using i2c-hid-of on non OF platforms
HID: i2c-hid-of: Consistenly use dev local variable in probe()
HID: kye: Fix rdesc for kye tablets
HID: amd_sfh: Support for additional light sensor
HID: amd_sfh: Handle "no sensors" enabled for SFH1.1
HID: amd_sfh: Increase sensor command timeout for SFH1.1
HID: amd_sfh: Correct the stop all command
...
At this time, it's an interesting mixture of changes for both old and
new stuff. Majority of changes are about ASoC (lots of systematic
changes for converting remove callbacks to void, and cleanups), while
we got the fixes and the enhancements of very old PCI cards, too.
Here are some highlights:
ALSA/ASoC Core:
- Continued effort of more ASoC core cleanups
- Minor improvements for XRUN handling in indirect PCM helpers
- Code refactoring of PCM core code
ASoC:
- Continued feature and simplification work on SOF, including addition
of a no-DSP mode for bringup, HDA MLink and extensions to the IPC4
protocol
- Hibernation support for CS35L45
- More DT binding conversions
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas R-Car
Gen4, Rockchip RK3588 and TI TAS5733
ALSA:
- Lots of works for legacy emu10k1 and ymfpci PCI drivers
- PCM kselftest fixes and enhancements
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAmRJBkcOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE8S/Q/+If1MEW+XXYushYU6VcWbHevwsRwmUZPtIJzT
Nx4PE4Ia8rX++GbsH5Iqt6tmldbb/vMbwy7TGbn/Q4ju2cO5qGT4/qgWdC2TuUX6
icWRHslJ//TffSd/yh1g6JIKBlcCmQeYcw5KoaLzBE/qO3iRP0IQUc17gkLKYNni
u1XOGrU9zuh3uwz+UQFfUhB8NlKhD3HVYjwrbd3gwcDsE/0G+q76A/wWghfA+RAb
0ruDhIDtJoem6PKQTwC05UgDpmwd7XFAIgcbOu7E7t/lr4YKwQZhQmJI0IexCR9i
aLPqg3Q/6S+WFKpcPcGCHNljqRNp9lUlIXak+NsbCZ7mXKE6tALywAtuB57sZ0sO
QM1YrmUAsi0RaD7foPcT64CAq8IVQ6aLWusXwvcxzzvJuHvJdeiBKiI5gmF0GqMu
ZLpAMGCoKxft4Il2r+BPTbLHe57uHmp1fKMWUK4NfyIUW7jEdKmf7ALSSJmvcqwU
+R0PXikc0lOo1GH9ZQojpVNFwV8XLOd2CWaNfoPl85A0+ngYhTY3ZRQ3qbYWHlU6
zXAu06IUOef5phsn3zerJ1orV729Xdjf+JUbL0uxJvANsX6R93CQWw0tgrUI62EZ
0vhoOp3PPZUKmDKvUo/NtIyuvSGREg3wDug5tiDOb53Qwfr2VIThJa999kNzH76c
lHUfrv4=
=7XGG
-----END PGP SIGNATURE-----
Merge tag 'sound-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound updates from Takashi Iwai:
"At this time, it's an interesting mixture of changes for both old and
new stuff. Majority of changes are about ASoC (lots of systematic
changes for converting remove callbacks to void, and cleanups), while
we got the fixes and the enhancements of very old PCI cards, too.
Here are some highlights:
ALSA/ASoC Core:
- Continued effort of more ASoC core cleanups
- Minor improvements for XRUN handling in indirect PCM helpers
- Code refactoring of PCM core code
ASoC:
- Continued feature and simplification work on SOF, including
addition of a no-DSP mode for bringup, HDA MLink and extensions to
the IPC4 protocol
- Hibernation support for CS35L45
- More DT binding conversions
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas
R-Car Gen4, Rockchip RK3588 and TI TAS5733
ALSA:
- Lots of works for legacy emu10k1 and ymfpci PCI drivers
- PCM kselftest fixes and enhancements"
* tag 'sound-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (586 commits)
ALSA: emu10k1: use high-level I/O in set_filterQ()
ALSA: emu10k1: use high-level I/O functions also during init
ALSA: emu10k1: fix error handling in snd_audigy_i2c_volume_put()
ALSA: emu10k1: don't stop DSP in _snd_emu10k1_{,audigy_}init_efx()
ALSA: emu10k1: fix SNDRV_EMU10K1_IOCTL_SINGLE_STEP
ALSA: emu10k1: skip Sound Blaster-specific hacks for E-MU cards
ALSA: emu10k1: fixup DSP defines
ALSA: emu10k1: pull in some register definitions from kX-project
ALSA: emu10k1: remove some bogus defines
ALSA: emu10k1: eliminate some unused defines
ALSA: emu10k1: fix lineup of EMU_HANA_* defines
ALSA: emu10k1: comment updates
ALSA: emu10k1: fix snd_emu1010_fpga_read() input masking for rev2 cards
ALSA: emu10k1: remove unused emu->pcm_playback_efx_substream field
ALSA: emu10k1: remove unused `resume` parameter from snd_emu10k1_init()
ALSA: emu10k1: minor optimizations
ALSA: emu10k1: remove remaining cruft from snd_emu10k1_emu1010_init()
ALSA: emu10k1: remove apparently pointless EMU_HANA_OPTION_CARDS reads
ALSA: emu10k1: remove apparently pointless FPGA reads
ALSA: emu10k1: stop doing weird things with HCFG in snd_emu10k1_emu1010_init()
...
Two series:
- Reorganize how the hardware page table objects are managed,
particularly their destruction flow. Increase the selftest
test coverage in this area by creating a more complete mock
iommu driver.
This is preparation to add a replace operation for HWPT binding,
which is done but waiting for the VFIO parts to complete so there
is a user.
- Split the iommufd support for "access" to make it two step - allocate
an access then link it to an IOAS. Update VFIO and have VFIO always
create an access even for the VFIO mdevs that never do DMA.
This is also preperation for the replace VFIO series that will allow
replace to work on access types as well.
Three minor fixes:
- Sykzaller found the selftest code didn't check for overflow when
processing user VAs
- smatch noted a .data item should have been static
- Add a selftest that reproduces a syzkaller bug for batch carry already
fixed in rc
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCZEfgkQAKCRCFwuHvBreF
YRw7AQD5BhiTzdTMlMPAasihb2ZITYs3MyiZrxICZuZf3CksLAD9FcO3aYPq01ge
7yHF+LeWHYq1dR6x7YTvcbc3apFFIgA=
=0j6E
-----END PGP SIGNATURE-----
Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd
Pull iommufd updates from Jason Gunthorpe:
"Two series:
- Reorganize how the hardware page table objects are managed,
particularly their destruction flow. Increase the selftest test
coverage in this area by creating a more complete mock iommu
driver.
This is preparation to add a replace operation for HWPT binding,
which is done but waiting for the VFIO parts to complete so there
is a user.
- Split the iommufd support for "access" to make it two step -
allocate an access then link it to an IOAS. Update VFIO and have
VFIO always create an access even for the VFIO mdevs that never do
DMA.
This is also preperation for the replace VFIO series that will
allow replace to work on access types as well.
Three minor fixes:
- Sykzaller found the selftest code didn't check for overflow when
processing user VAs
- smatch noted a .data item should have been static
- Add a selftest that reproduces a syzkaller bug for batch carry
already fixed in rc"
* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (21 commits)
iommufd/selftest: Cover domain unmap with huge pages and access
iommufd/selftest: Set varaiable mock_iommu_device storage-class-specifier to static
vfio: Check the presence for iommufd callbacks in __vfio_register_dev()
vfio/mdev: Uses the vfio emulated iommufd ops set in the mdev sample drivers
vfio-iommufd: Make vfio_iommufd_emulated_bind() return iommufd_access ID
vfio-iommufd: No need to record iommufd_ctx in vfio_device
iommufd: Create access in vfio_iommufd_emulated_bind()
iommu/iommufd: Pass iommufd_ctx pointer in iommufd_get_ioas()
iommufd/selftest: Catch overflow of uptr and length
iommufd/selftest: Add a selftest for iommufd_device_attach() with a hwpt argument
iommufd/selftest: Make selftest create a more complete mock device
iommufd/selftest: Rename the remaining mock device_id's to stdev_id
iommufd/selftest: Rename domain_id to hwpt_id for FIXTURE iommufd_mock_domain
iommufd/selftest: Rename domain_id to stdev_id for FIXTURE iommufd_ioas
iommufd/selftest: Rename the sefltest 'device_id' to 'stdev_id'
iommufd: Make iommufd_hw_pagetable_alloc() do iopt_table_add_domain()
iommufd: Move iommufd_device to iommufd_private.h
iommufd: Move ioas related HWPT destruction into iommufd_hw_pagetable_destroy()
iommufd: Consistently manage hwpt_item
iommufd: Add iommufd_lock_obj() around the auto-domains hwpts
...
Core
----
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances.
- Reduce compound page head access for zero-copy data transfers.
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when possible.
- Threaded NAPI improvements, adding defer skb free support and unneeded
softirq avoidance.
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking.
- Add lockless accesses annotation to sk_err[_soft].
- Optimize again the skb struct layout.
- Extends the skb drop reasons to make it usable by multiple
subsystems.
- Better const qualifier awareness for socket casts.
BPF
---
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and variable-sized
accesses.
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward.
- Add more precise memory usage reporting for all BPF map types.
- Adds support for using {FOU,GUE} encap with an ipip device operating
in collect_md mode and add a set of BPF kfuncs for controlling encap
params.
- Allow BPF programs to detect at load time whether a particular kfunc
exists or not, and also add support for this in light skeleton.
- Bigger batch of BPF verifier improvements to prepare for upcoming BPF
open-coded iterators allowing for less restrictive looping capabilities.
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF
programs to NULL-check before passing such pointers into kfunc.
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in
local storage maps.
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps.
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree.
- Add BPF verifier support for ST instructions in convert_ctx_access()
which will help new -mcpu=v4 clang flag to start emitting them.
- Add ARM32 USDT support to libbpf.
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations.
Protocols
---------
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address.
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition.
- Add the handshake upcall mechanism, allowing the user-space
to implement generic TLS handshake on kernel's behalf.
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures.
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers.
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction.
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore.
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter
---------
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged.
- Update bridge netfilter and ovs conntrack helpers to handle
IPv6 Jumbo packets properly, i.e. fetch the packet length
from hop-by-hop extension header. This is needed for BIT TCP
support.
- The iptables 32bit compat interface isn't compiled in by default
anymore.
- Move ip(6)tables builtin icmp matches to the udptcp one.
This has the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used.
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device.
Driver API
----------
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time.
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them.
- Allow the page_pool to directly recycle the pages from safely
localized NAPI.
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization.
- Add YNL support for user headers and struct attrs.
- Add partial YNL specification for devlink.
- Add partial YNL specification for ethtool.
- Add tc-mqprio and tc-taprio support for preemptible traffic classes.
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device.
- Add basic LED support for switch/phy.
- Add NAPI documentation, stop relaying on external links.
- Convert dsa_master_ioctl() to netdev notifier. This is a preparatory
work to make the hardware timestamping layer selectable by user
space.
- Add transceiver support and improve the error messages for CAN-FD
controllers.
New hardware / drivers
----------------------
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers
-------
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors.
- add support for configuring max SDU for each Tx queue.
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only
on shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll.
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates.
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices
(e.g. MAC address from efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRI/mUSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkgO0QAJGxpuN67YgYV0BIM+/atWKEEexJYG7B
9MMpU4jMO3EW/pUS5t7VRsBLUybLYVPmqCZoHodObDfnu59jiPOegb6SikJv/ZwJ
Zw62PVk5MvDnQjlu4e6kDcGwkplteN08TlgI+a49BUTedpdFitrxHAYGW8f2fRO6
cK2XSld+ZucMoym5vRwf8yWS1BwdxnslPMxDJ+/8ZbWBZv44qAnG2vMB/kIx7ObC
Vel/4m6MzTwVsLYBsRvcwMVbNNlZ9GuhztlTzEbfGA4ZhTadIAMgb5VTWXB84Ws7
Aic5wTdli+q+x6/2cxhbyeoVuB9HHObYmLBAciGg4GNljP5rnQBY3X3+KVZ/x9TI
HQB7CmhxmAZVrO9pLARFV+ECrMTH2/dy3NyrZ7uYQ3WPOXJi8hJZjOTO/eeEGL7C
eTjdz0dZBWIBK2gON/6s4nExXVQUTEF2ZsPi52jTTClKjfe5pz/ddeFQIWaY1DTm
pInEiWPAvd28JyiFmhFNHsuIBCjX/Zqe2JuMfMBeBibDAC09o/OGdKJYUI15AiRf
F46Pdb7use/puqfrYW44kSAfaPYoBiE+hj1RdeQfen35xD9HVE4vdnLNeuhRlFF9
aQfyIRHYQofkumRDr5f8JEY66cl9NiKQ4IVW1xxQfYDNdC6wQqREPG1md7rJVMrJ
vP7ugFnttneg
=ITVa
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Introduce a config option to tweak MAX_SKB_FRAGS. Increasing the
default value allows for better BIG TCP performances
- Reduce compound page head access for zero-copy data transfers
- RPS/RFS improvements, avoiding unneeded NET_RX_SOFTIRQ when
possible
- Threaded NAPI improvements, adding defer skb free support and
unneeded softirq avoidance
- Address dst_entry reference count scalability issues, via false
sharing avoidance and optimize refcount tracking
- Add lockless accesses annotation to sk_err[_soft]
- Optimize again the skb struct layout
- Extends the skb drop reasons to make it usable by multiple
subsystems
- Better const qualifier awareness for socket casts
BPF:
- Add skb and XDP typed dynptrs which allow BPF programs for more
ergonomic and less brittle iteration through data and
variable-sized accesses
- Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward
- Add more precise memory usage reporting for all BPF map types
- Adds support for using {FOU,GUE} encap with an ipip device
operating in collect_md mode and add a set of BPF kfuncs for
controlling encap params
- Allow BPF programs to detect at load time whether a particular
kfunc exists or not, and also add support for this in light
skeleton
- Bigger batch of BPF verifier improvements to prepare for upcoming
BPF open-coded iterators allowing for less restrictive looping
capabilities
- Rework RCU enforcement in the verifier, add kptr_rcu and enforce
BPF programs to NULL-check before passing such pointers into kfunc
- Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and
in local storage maps
- Enable RCU semantics for task BPF kptrs and allow referenced kptr
tasks to be stored in BPF maps
- Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree
- Add BPF verifier support for ST instructions in
convert_ctx_access() which will help new -mcpu=v4 clang flag to
start emitting them
- Add ARM32 USDT support to libbpf
- Improve bpftool's visual program dump which produces the control
flow graph in a DOT format by adding C source inline annotations
Protocols:
- IPv4: Allow adding to IPv4 address a 'protocol' tag. Such value
indicates the provenance of the IP address
- IPv6: optimize route lookup, dropping unneeded R/W lock acquisition
- Add the handshake upcall mechanism, allowing the user-space to
implement generic TLS handshake on kernel's behalf
- Bridge: support per-{Port, VLAN} neighbor suppression, increasing
resilience to nodes failures
- SCTP: add support for Fair Capacity and Weighted Fair Queueing
schedulers
- MPTCP: delay first subflow allocation up to its first usage. This
will allow for later better LSM interaction
- xfrm: Remove inner/outer modes from input/output path. These are
not needed anymore
- WiFi:
- reduced neighbor report (RNR) handling for AP mode
- HW timestamping support
- support for randomized auth/deauth TA for PASN privacy
- per-link debugfs for multi-link
- TC offload support for mac80211 drivers
- mac80211 mesh fast-xmit and fast-rx support
- enable Wi-Fi 7 (EHT) mesh support
Netfilter:
- Add nf_tables 'brouting' support, to force a packet to be routed
instead of being bridged
- Update bridge netfilter and ovs conntrack helpers to handle IPv6
Jumbo packets properly, i.e. fetch the packet length from
hop-by-hop extension header. This is needed for BIT TCP support
- The iptables 32bit compat interface isn't compiled in by default
anymore
- Move ip(6)tables builtin icmp matches to the udptcp one. This has
the advantage that icmp/icmpv6 match doesn't load the
iptables/ip6tables modules anymore when iptables-nft is used
- Extended netlink error report for netdevice in flowtables and
netdev/chains. Allow for incrementally add/delete devices to netdev
basechain. Allow to create netdev chain without device
Driver API:
- Remove redundant Device Control Error Reporting Enable, as PCI core
has already error reporting enabled at enumeration time
- Move Multicast DB netlink handlers to core, allowing devices other
then bridge to use them
- Allow the page_pool to directly recycle the pages from safely
localized NAPI
- Implement lockless TX queue stop/wake combo macros, allowing for
further code de-duplication and sanitization
- Add YNL support for user headers and struct attrs
- Add partial YNL specification for devlink
- Add partial YNL specification for ethtool
- Add tc-mqprio and tc-taprio support for preemptible traffic classes
- Add tx push buf len param to ethtool, specifies the maximum number
of bytes of a transmitted packet a driver can push directly to the
underlying device
- Add basic LED support for switch/phy
- Add NAPI documentation, stop relaying on external links
- Convert dsa_master_ioctl() to netdev notifier. This is a
preparatory work to make the hardware timestamping layer selectable
by user space
- Add transceiver support and improve the error messages for CAN-FD
controllers
New hardware / drivers:
- Ethernet:
- AMD/Pensando core device support
- MediaTek MT7981 SoC
- MediaTek MT7988 SoC
- Broadcom BCM53134 embedded switch
- Texas Instruments CPSW9G ethernet switch
- Qualcomm EMAC3 DWMAC ethernet
- StarFive JH7110 SoC
- NXP CBTX ethernet PHY
- WiFi:
- Apple M1 Pro/Max devices
- RealTek rtl8710bu/rtl8188gu
- RealTek rtl8822bs, rtl8822cs and rtl8821cs SDIO chipset
- Bluetooth:
- Realtek RTL8821CS, RTL8851B, RTL8852BS
- Mediatek MT7663, MT7922
- NXP w8997
- Actions Semi ATS2851
- QTI WCN6855
- Marvell 88W8997
- Can:
- STMicroelectronics bxcan stm32f429
Drivers:
- Ethernet NICs:
- Intel (1G, icg):
- add tracking and reporting of QBV config errors
- add support for configuring max SDU for each Tx queue
- Intel (100G, ice):
- refactor mailbox overflow detection to support Scalable IOV
- GNSS interface optimization
- Intel (i40e):
- support XDP multi-buffer
- nVidia/Mellanox:
- add the support for linux bridge multicast offload
- enable TC offload for egress and engress MACVLAN over bond
- add support for VxLAN GBP encap/decap flows offload
- extend packet offload to fully support libreswan
- support tunnel mode in mlx5 IPsec packet offload
- extend XDP multi-buffer support
- support MACsec VLAN offload
- add support for dynamic msix vectors allocation
- drop RX page_cache and fully use page_pool
- implement thermal zone to report NIC temperature
- Netronome/Corigine:
- add support for multi-zone conntrack offload
- Solarflare/Xilinx:
- support offloading TC VLAN push/pop actions to the MAE
- support TC decap rules
- support unicast PTP
- Other NICs:
- Broadcom (bnxt): enforce software based freq adjustments only on
shared PHC NIC
- RealTek (r8169): refactor to addess ASPM issues during NAPI poll
- Micrel (lan8841): add support for PTP_PF_PEROUT
- Cadence (macb): enable PTP unicast
- Engleder (tsnep): add XDP socket zero-copy support
- virtio-net: implement exact header length guest feature
- veth: add page_pool support for page recycling
- vxlan: add MDB data path support
- gve: add XDP support for GQI-QPL format
- geneve: accept every ethertype
- macvlan: allow some packets to bypass broadcast queue
- mana: add support for jumbo frame
- Ethernet high-speed switches:
- Microchip (sparx5): Add support for TC flower templates
- Ethernet embedded switches:
- Broadcom (b54):
- configure 6318 and 63268 RGMII ports
- Marvell (mv88e6xxx):
- faster C45 bus scan
- Microchip:
- lan966x:
- add support for IS1 VCAP
- better TX/RX from/to CPU performances
- ksz9477: add ETS Qdisc support
- ksz8: enhance static MAC table operations and error handling
- sama7g5: add PTP capability
- NXP (ocelot):
- add support for external ports
- add support for preemptible traffic classes
- Texas Instruments:
- add CPSWxG SGMII support for J7200 and J721E
- Intel WiFi (iwlwifi):
- preparation for Wi-Fi 7 EHT and multi-link support
- EHT (Wi-Fi 7) sniffer support
- hardware timestamping support for some devices/firwmares
- TX beacon protection on newer hardware
- Qualcomm 802.11ax WiFi (ath11k):
- MU-MIMO parameters support
- ack signal support for management packets
- RealTek WiFi (rtw88):
- SDIO bus support
- better support for some SDIO devices (e.g. MAC address from
efuse)
- RealTek WiFi (rtw89):
- HW scan support for 8852b
- better support for 6 GHz scanning
- support for various newer firmware APIs
- framework firmware backwards compatibility
- MediaTek WiFi (mt76):
- P2P support
- mesh A-MSDU support
- EHT (Wi-Fi 7) support
- coredump support"
* tag 'net-next-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2078 commits)
net: phy: hide the PHYLIB_LEDS knob
net: phy: marvell-88x2222: remove unnecessary (void*) conversions
tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp.
net: amd: Fix link leak when verifying config failed
net: phy: marvell: Fix inconsistent indenting in led_blink_set
lan966x: Don't use xdp_frame when action is XDP_TX
tsnep: Add XDP socket zero-copy TX support
tsnep: Add XDP socket zero-copy RX support
tsnep: Move skb receive action to separate function
tsnep: Add functions for queue enable/disable
tsnep: Rework TX/RX queue initialization
tsnep: Replace modulo operation with mask
net: phy: dp83867: Add led_brightness_set support
net: phy: Fix reading LED reg property
drivers: nfc: nfcsim: remove return value check of `dev_dir`
net: phy: dp83867: Remove unnecessary (void*) conversions
net: ethtool: coalesce: try to make user settings stick twice
net: mana: Check if netdev/napi_alloc_frag returns single page
net: mana: Rename mana_refill_rxoob and remove some empty lines
net: veth: add page_pool stats
...
- Don't advertisze XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
- Overhaul the AMX selftests to improve coverage and cleanup the test
- Misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRGt50SHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5MskP/2PhSrdgHxCwfpqpdVe/q5OWwFuhn3wG
f5QKMpEBg4wJFeIE3eGJEaDlg776nWtWDNgUmqdjoZ8vyyadkPX9CV2Y2Hq0M7Tw
d0gKPjQrz2BavyDYoPNfs4pfshs4EvDTswBkhdAt8KTZhGZosJOywQIp61V3ePqr
1rDP6C4+CmwTRAK0f7egslyJ2pZXiUcvhITvzx8XhIAQh6nEK4gUZ/l3hLmg38kD
Af23kiLnP8lHUUx4BQtRAnTw0SZXJ8DcKtoFkzEH8mdj4g6EqXpxy48zuyZcqWVi
4XIFr+WECPsV5gdqWN9rMDqIG2ib+2heKDmcdUptcVuvr1ktv0reQybmgVck4CKX
fTAdu86/LBaQmIHwNHaNFPwdUby4QQZ8ajafPC62oc+B6N1lQg8bbCwnvO6KGlGl
FaQTnzaZq7ft4tfQRXOMu1AbLZLK7dIqJHHhxR3MkBkd4MAcZ1bVKkvlJLqsOKNw
TEsreXErY7AsegZK73Rn4IN/CJGBof5bZ2NIchmiN+0UfMsd9zGn66Als6oRNh4E
tRUhFONPIEmydy9UB50qe6b98ElB6R++opZbvkVW2hy8lMy3iJrCvUbOs1nx3wbn
cxvIuTfw/dAFf70S03/zudf7lYHs2wKV1rrIAebyTd4NnvWdVB8OaSHgZswMgVjb
UzzQfnQ+u9so
=BY10
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-selftests-6.4' of https://github.com/kvm-x86/linux into HEAD
KVM selftests, and an AMX/XCR0 bugfix, for 6.4:
- Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
- Overhaul the AMX selftests to improve coverage and cleanup the test
- Misc cleanups
- Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better
validate PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- Misc cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEEMHr+pfEFOIzK+KY1YJEiAU0MEvkFAmRGtd4SHHNlYW5qY0Bn
b29nbGUuY29tAAoJEGCRIgFNDBL5Z9kP/i3WZ40hevvQvB/5cEpxxmxYDwCYnnjM
hiQgK5jT4SrMTmVjLgkNdI2PogQoS4CX+GC7lcA9bvse84hjuPvgOflb2B+p2UQi
Ytbr9g/tfKNIpnKIk9mcPcSObN9vm2Kgt7n28rtPrHWj89eQzgc66eijqdpKBLxA
c3crVR8krwYAQK0tmzHq1+H6hB369YbHAHyTTRRI/bNWnqKblnvUbt0NL2aBusa9
rNMaOdRtinLpy2dmuX/b3japRB8QTnlf7zpPIF4cBEhbYXy5woClZpf1D2fCA6Er
XFbEoYawMVd9UeJYbW4z5yErLT83eYoGp4U0eFXWp6fvh8nZlgCGvBKE9g4mmqwj
aSLaTR5eVN2qlw6jXVeg3unCo8Eyl36AwYwve2L6sFmBvZvNV5iz2eQ7rrOe4oE3
dnTUaLQ8I2SVg04MbYmCq5W+frTL/I7kqNpbccL1Z3R5WO4y5gz63mug6NfLIvhR
t45TAIaifxBfcXQsBZM3v2KUK/xQrD3AbJmFKh54L2CKqiGaNWsMLX+6NZ7LZWgf
8rEqsVkkQDgF7z8eXai4TR26nYfSX6g9gDqtOH73L87aJ7PJk5cRoDWQ1sWs1e/l
4HA/L0Bo/3pnKAa0ZWxJOixmzqY49gNQf3dj8gt3jk3y2ijbAivshiSpPBmIxn0u
QLeOf/LGvipl
=m18F
-----END PGP SIGNATURE-----
Merge tag 'kvm-x86-pmu-6.4' of https://github.com/kvm-x86/linux into HEAD
KVM x86 PMU changes for 6.4:
- Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, and overhaul the vmx_pmu_caps selftest to better
validate PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- Misc cleanups and fixes
- Update the ACPICA code in the kernel to upstream revision 20230331
including the following changes:
* Delete bogus node_array array of pointers from AEST table (Jessica
Clarke).
* Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang).
* Add missing macro ACPI_FUNCTION_TRACE() for acpi_ns_repair_HID()
(Xiongfeng Wang).
* Add missing tables to astable (Pedro Falcato).
* Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen).
* Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski).
* Add support for Arm's MPAM ACPI table version 2 (Hesham Almatary).
* Update all copyrights/signons in ACPICA to 2023 (Bob Moore).
* Add support for ClockInput resource (v6.5) (Niyas Sait).
* Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L).
* Add structure definitions for the RISC-V RHCT ACPI table (Sunil V L).
* Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein).
* Make ACPICA code support flexible arrays properly (Kees Cook).
* Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red).
* Add os specific support for Zephyr RTOS to ACPICA (Najumon).
* Update version to 20230331 (Bob Moore).
- Fix evaluating the _PDC ACPI control method when running as Xen
dom0 (Roger Pau Monne).
- Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu).
- Check for null return of devm_kzalloc() in fch_misc_setup() (Kang
Chen).
- Log a message if enable_irq_wake() fails for the ACPI SCI (Simon
Gaiser).
- Initialize the correct IOMMU fwspec while parsing ACPI VIOT
(Jean-Philippe Brucker).
- Amend indentation and prefix error messages with FW_BUG in the ACPI
SPCR parsing code (Andy Shevchenko).
- Enable ACPI sysfs support for CCEL records (Kuppuswamy
Sathyanarayanan).
- Make the APEI error injection code warn on invalid arguments when
explicitly indicated by platform (Shuai Xue).
- Add CXL error types to the error injection code in APEI (Tony Luck).
- Refactor acpi_data_prop_read_single() (Andy Shevchenko).
- Fix two issues in the ACPI SBS driver (Armin Wolf).
- Replace ternary operator with min_t() in the generic ACPI thermal
zone driver (Jiangshan Yi).
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki).
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede).
- Replace irqdomain.h include with struct declarations in ACPI headers
and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring).
- Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K).
- Update the pm_profile sysfs attribute documentation (Rafael Wysocki).
- Add 80862289 ACPI _HID for second PWM controller on Cherry Trail to
the ACPI driver for Intel SoCs (Hans de Goede).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmRGvLQSHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxoV4P/jxWGAdldtgXORR58lKGbSs6lx/0Y+SF
iI7qK88NcbcbWS+a3PqRrisNkjN17rjzajfp28Ue2CXFxzwTViyw6KYELbPJ6N/h
/3prem++jKgf7qiueDJG/AyO8N2+Z+yciubhxdMiK1+c1dZM2ycwSyBzJgYocpXn
fH+YFPhxE7c8Z8doBrTOZjRuU4SIEKCmxo3c5BbCuyVZkbqCRdQMIDCiBJgLTmbo
z4pu9OFhAamB8Cth2QFfRbZWqmuY71Gt54+c4ITPPV2ALlLUYODyHZoSISBJULp3
k0lU/hMCD+i1WRwv+Bb6of7pJPM4Lqp+wOirAtiiibjE9LRxVTNyOUAHLXbx+t2V
PN8JKVJVCLaZO6TRELgFIL4nh4aBdOtr4BuaLnClZho9bG68jEkc8grnOZYhFYtM
66BuJBW30rwwGY4N5VSZGzFFR7l2qaHIOSHdq681bxQ3e6erFEeIc5jQVEOKgCqd
XWdELVkqf3CnCX0lgonj+AgoeCqOpYdrNcWqMsJ+6OyQRoFhLFltDSPeJm9gHGO7
X+qCQru4ZgEDKexWKpGgH9x8AllDKbh/ApyyumXgsQOsRocVdoNaf+yCBlaaDyqu
UYif6hgFYnIxF2Fg1r/POgHDXFobE4iUTHcUU1V2QhuByc4PkN9ljKsHeC2FgVUz
JityWRiMABNv
=O61K
-----END PGP SIGNATURE-----
Merge tag 'acpi-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"These update the ACPICA code in the kernel to upstream revision
20230331, fix the ACPI SBS driver and the evaluation of the _PDC
method on Xen dom0 in the ACPI processor driver, update the ACPI
driver for Intel SoCs and clean up code in multiple places.
Specifics:
- Update the ACPICA code in the kernel to upstream revision 20230331
including the following changes:
* Delete bogus node_array array of pointers from AEST table
(Jessica Clarke)
* Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang)
* Add missing macro ACPI_FUNCTION_TRACE() for
acpi_ns_repair_HID() (Xiongfeng Wang)
* Add missing tables to astable (Pedro Falcato)
* Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen)
* Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski)
* Add support for Arm's MPAM ACPI table version 2 (Hesham
Almatary)
* Update all copyrights/signons in ACPICA to 2023 (Bob Moore)
* Add support for ClockInput resource (v6.5) (Niyas Sait)
* Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L)
* Add structure definitions for the RISC-V RHCT ACPI table (Sunil
V L)
* Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein)
* Make ACPICA code support flexible arrays properly (Kees Cook)
* Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red)
* Add os specific support for Zephyr RTOS to ACPICA (Najumon)
* Update version to 20230331 (Bob Moore)
- Fix evaluating the _PDC ACPI control method when running as Xen
dom0 (Roger Pau Monne)
- Use platform devices to load ACPI PPC and PCC drivers (Petr Pavlu)
- Check for null return of devm_kzalloc() in fch_misc_setup() (Kang
Chen)
- Log a message if enable_irq_wake() fails for the ACPI SCI (Simon
Gaiser)
- Initialize the correct IOMMU fwspec while parsing ACPI VIOT
(Jean-Philippe Brucker)
- Amend indentation and prefix error messages with FW_BUG in the ACPI
SPCR parsing code (Andy Shevchenko)
- Enable ACPI sysfs support for CCEL records (Kuppuswamy
Sathyanarayanan)
- Make the APEI error injection code warn on invalid arguments when
explicitly indicated by platform (Shuai Xue)
- Add CXL error types to the error injection code in APEI (Tony Luck)
- Refactor acpi_data_prop_read_single() (Andy Shevchenko)
- Fix two issues in the ACPI SBS driver (Armin Wolf)
- Replace ternary operator with min_t() in the generic ACPI thermal
zone driver (Jiangshan Yi)
- Ensure that ACPI notify handlers are not running after removal and
clean up code in acpi_sb_notify() (Rafael Wysocki)
- Remove register_backlight_delay module option and code and remove
quirks for false-positive backlight control support advertised on
desktop boards (Hans de Goede)
- Replace irqdomain.h include with struct declarations in ACPI
headers and update several pieces of code previously including of.h
implicitly through those headers (Rob Herring)
- Fix acpi_evaluate_dsm_typed() redefinition error (Kiran K)
- Update the pm_profile sysfs attribute documentation (Rafael
Wysocki)
- Add 80862289 ACPI _HID for second PWM controller on Cherry Trail to
the ACPI driver for Intel SoCs (Hans de Goede)"
* tag 'acpi-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (64 commits)
ACPI: LPSS: Add 80862289 ACPI _HID for second PWM controller on Cherry Trail
ACPI: bus: Ensure that notify handlers are not running after removal
ACPI: bus: Add missing braces to acpi_sb_notify()
ACPI: video: Remove desktops without backlight DMI quirks
ACPI: video: Remove register_backlight_delay module option and code
ACPI: Replace irqdomain.h include with struct declarations
fpga: lattice-sysconfig-spi: Add explicit include for of.h
tpm: atmel: Add explicit include for of.h
virtio-mmio: Add explicit include for of.h
pata: ixp4xx: Add explicit include for of.h
ata: pata_macio: Add explicit include of irqdomain.h
serial: 8250_tegra: Add explicit include for of.h
net: rfkill-gpio: Add explicit include for of.h
staging: iio: resolver: ad2s1210: Add explicit include for of.h
iio: adc: ad7292: Add explicit include for of.h
ACPICA: Update version to 20230331
ACPICA: add os specific support for Zephyr RTOS
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array
ACPICA: acpi_madt_oem_data: Fix flexible array member definition
...
User processes register an address and bit pair for events. If the same
address and bit pair are registered multiple times in the same process,
it can cause undefined behavior when events are enabled/disabled.
When more than one are used, the bit could be turned off by another
event being disabled, while the original event is still enabled.
Prevent undefined behavior by checking the current mm to see if any
event has already been registered for the address and bit pair. Return
EADDRINUSE back to the user process if it's already being used.
Update ftrace self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-4-beaub@linux.microsoft.com
Suggested-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
If an event is enabled and a user process unregisters user_events, the
bit is left set. Fix this by always clearing the bit in the user process
if unregister is successful.
Update abi self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-3-beaub@linux.microsoft.com
Suggested-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
The write index indicates which event the data is for and accesses a
per-file array. The index is passed by user processes during write()
calls as the first 4 bytes. Ensure that it cannot be negative by
returning -EINVAL to prevent out of bounds accesses.
Update ftrace self-test to ensure this occurs properly.
Link: https://lkml.kernel.org/r/20230425225107.8525-2-beaub@linux.microsoft.com
Fixes: 7f5a08c79d ("user_events: Add minimal support for trace_event into ftrace")
Reported-by: Doug Cook <dcook@linux.microsoft.com>
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
- Improve the VDSO build time checks to cover all dynamic relocations
VDSO does not allow dynamic relcations, but the build time check is
incomplete and fragile.
It's based on architectures specifying the relocation types to search
for and does not handle R_*_NONE relocation entries correctly.
R_*_NONE relocations are injected by some GNU ld variants if they fail
to determine the exact .rel[a]/dyn_size to cover trailing zeros.
R_*_NONE relocations must be ignored by dynamic loaders, so they
should be ignored in the build time check too.
Remove the architecture specific relocation types to check for and
validate strictly that no other relocations than R_*_NONE end up
in the VSDO .so file.
- Prefer signal delivery to the current thread for
CLOCK_PROCESS_CPUTIME_ID based posix-timers
Such timers prefer to deliver the signal to the main thread of a
process even if the context in which the timer expires is the current
task. This has the downside that it might wake up an idle thread.
As there is no requirement or guarantee that the signal has to be
delivered to the main thread, avoid this by preferring the current
task if it is part of the thread group which shares sighand.
This not only avoids waking idle threads, it also distributes the
signal delivery in case of multiple timers firing in the context
of different threads close to each other better.
- Align the tick period properly (again)
For a long time the tick was starting at CLOCK_MONOTONIC zero, which
allowed users space applications to either align with the tick or to
place a periodic computation so that it does not interfere with the
tick. The alignement of the tick period was more by chance than by
intention as the tick is set up before a high resolution clocksource is
installed, i.e. timekeeping is still tick based and the tick period
advances from there.
The early enablement of sched_clock() broke this alignement as the time
accumulated by sched_clock() is taken into account when timekeeping is
initialized. So the base value now(CLOCK_MONOTONIC) is not longer a
multiple of tick periods, which breaks applications which relied on
that behaviour.
Cure this by aligning the tick starting point to the next multiple of
tick periods, i.e 1000ms/CONFIG_HZ.
- A set of NOHZ fixes and enhancements
- Cure the concurrent writer race for idle and IO sleeptime statistics
The statitic values which are exposed via /proc/stat are updated from
the CPU local idle exit and remotely by cpufreq, but that happens
without any form of serialization. As a consequence sleeptimes can be
accounted twice or worse.
Prevent this by restricting the accumulation writeback to the CPU
local idle exit and let the remote access compute the accumulated
value.
- Protect idle/iowait sleep time with a sequence count
Reading idle/iowait sleep time, e.g. from /proc/stat, can race with
idle exit updates. As a consequence the readout may result in random
and potentially going backwards values.
Protect this by a sequence count, which fixes the idle time
statistics issue, but cannot fix the iowait time problem because
iowait time accounting races with remote wake ups decrementing the
remote runqueues nr_iowait counter. The latter is impossible to fix,
so the only way to deal with that is to document it properly and to
remove the assertion in the selftest which triggers occasionally due
to that.
- Restructure struct tick_sched for better cache layout
- Some small cleanups and a better cache layout for struct tick_sched
- Implement the missing timer_wait_running() callback for POSIX CPU timers
For unknown reason the introduction of the timer_wait_running() callback
missed to fixup posix CPU timers, which went unnoticed for almost four
years.
While initially only targeted to prevent livelocks between a timer
deletion and the timer expiry function on PREEMPT_RT enabled kernels, it
turned out that fixing this for mainline is not as trivial as just
implementing a stub similar to the hrtimer/timer callbacks.
The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems
there is a livelock issue independent of RT.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers
out from hard interrupt context to task work, which is handled before
returning to user space or to a VM. The expiry mechanism moves the
expired timers to a stack local list head with sighand lock held. Once
sighand is dropped the task can be preempted and a task which wants to
delete a timer will spin-wait until the expiry task is scheduled back
in. In the worst case this will end up in a livelock when the preempting
task and the expiry task are pinned on the same CPU.
The timer wheel has a timer_wait_running() mechanism for RT, which uses
a per CPU timer-base expiry lock which is held by the expiry code and the
task waiting for the timer function to complete blocks on that lock.
This does not work in the same way for posix CPU timers as there is no
timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry lock
can be used too in a slightly different way.
Add a per task mutex to struct posix_cputimers_work, let the expiry task
hold it accross the expiry function and let the deleting task which
waits for the expiry to complete block on the mutex.
In the non-contended case this results in an extra mutex_lock()/unlock()
pair on both sides.
This avoids spin-waiting on a task which is scheduled out, prevents the
livelock and cures the problem for RT and !RT systems.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGrj4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoZhdEAC/lwfDWCnTXHC8ExQQRDIVNyXmDlLb
EHB8ZY7Wc4gNZ8UEXEOLOXJHMG9bsbtPGctVewJwRGnXZWKVhpPwQba6kCRycyX0
0J6l5DlvUaGGrpoOzOZwgETRmtIZE9tEArZR8xlfRScYd93a7yLhwIjO8JaV9vKs
IQpAQMeJ/ysp6gHrS59qakYfoHU/ERUAu3Tk4GqHUtPtcyz3nX3eTlLWV8LySqs+
00qr2yc0bQFUFoKzTCxtM8lcEi9ja9SOj1rw28348O+BXE4d0HC12Ie7eU/CDN2Y
OAlWYxVjy4LMh24LDrRQKTzoVqx9MXDx2g+09B3t8NK5LgeS+EJIjujDhZF147/H
5y906nplZUKa8BiZW5Rpm/HKH8tFI80T9XWSQCRBeMgTEJyRyRU1yASAwO4xw+dY
Dn3tGmFGymcV/72o4ic9JFKQd8cTSxPjEJS3qqzMkEAtyI/zPBmKxj/Tce50OH40
6FSZq1uU21ZQzszwSHISwgFtNr75laUSK4Z1te5OhPOOz+C7O9YqHvqS/1jwhPj2
tMd8X17fRW3UTUBlBj+zqxqiEGBl/Yk2AvKrJIXGUtfWYCtjMJ7ieCf0kZ7NSVJx
9ewubA0gqseMD783YomZsy8LLtMKnhclJeslUOVb1oKs1q/WF1R/k6qjy9vUwYaB
nIJuHl8mxSetag==
=SVnj
-----END PGP SIGNATURE-----
Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timers and timekeeping updates from Thomas Gleixner:
- Improve the VDSO build time checks to cover all dynamic relocations
VDSO does not allow dynamic relocations, but the build time check is
incomplete and fragile.
It's based on architectures specifying the relocation types to search
for and does not handle R_*_NONE relocation entries correctly.
R_*_NONE relocations are injected by some GNU ld variants if they
fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
R_*_NONE relocations must be ignored by dynamic loaders, so they
should be ignored in the build time check too.
Remove the architecture specific relocation types to check for and
validate strictly that no other relocations than R_*_NONE end up in
the VSDO .so file.
- Prefer signal delivery to the current thread for
CLOCK_PROCESS_CPUTIME_ID based posix-timers
Such timers prefer to deliver the signal to the main thread of a
process even if the context in which the timer expires is the current
task. This has the downside that it might wake up an idle thread.
As there is no requirement or guarantee that the signal has to be
delivered to the main thread, avoid this by preferring the current
task if it is part of the thread group which shares sighand.
This not only avoids waking idle threads, it also distributes the
signal delivery in case of multiple timers firing in the context of
different threads close to each other better.
- Align the tick period properly (again)
For a long time the tick was starting at CLOCK_MONOTONIC zero, which
allowed users space applications to either align with the tick or to
place a periodic computation so that it does not interfere with the
tick. The alignement of the tick period was more by chance than by
intention as the tick is set up before a high resolution clocksource
is installed, i.e. timekeeping is still tick based and the tick
period advances from there.
The early enablement of sched_clock() broke this alignement as the
time accumulated by sched_clock() is taken into account when
timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
not longer a multiple of tick periods, which breaks applications
which relied on that behaviour.
Cure this by aligning the tick starting point to the next multiple of
tick periods, i.e 1000ms/CONFIG_HZ.
- A set of NOHZ fixes and enhancements:
* Cure the concurrent writer race for idle and IO sleeptime
statistics
The statitic values which are exposed via /proc/stat are updated
from the CPU local idle exit and remotely by cpufreq, but that
happens without any form of serialization. As a consequence
sleeptimes can be accounted twice or worse.
Prevent this by restricting the accumulation writeback to the CPU
local idle exit and let the remote access compute the accumulated
value.
* Protect idle/iowait sleep time with a sequence count
Reading idle/iowait sleep time, e.g. from /proc/stat, can race
with idle exit updates. As a consequence the readout may result
in random and potentially going backwards values.
Protect this by a sequence count, which fixes the idle time
statistics issue, but cannot fix the iowait time problem because
iowait time accounting races with remote wake ups decrementing
the remote runqueues nr_iowait counter. The latter is impossible
to fix, so the only way to deal with that is to document it
properly and to remove the assertion in the selftest which
triggers occasionally due to that.
* Restructure struct tick_sched for better cache layout
* Some small cleanups and a better cache layout for struct
tick_sched
- Implement the missing timer_wait_running() callback for POSIX CPU
timers
For unknown reason the introduction of the timer_wait_running()
callback missed to fixup posix CPU timers, which went unnoticed for
almost four years.
While initially only targeted to prevent livelocks between a timer
deletion and the timer expiry function on PREEMPT_RT enabled kernels,
it turned out that fixing this for mainline is not as trivial as just
implementing a stub similar to the hrtimer/timer callbacks.
The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
systems there is a livelock issue independent of RT.
CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
timers out from hard interrupt context to task work, which is handled
before returning to user space or to a VM. The expiry mechanism moves
the expired timers to a stack local list head with sighand lock held.
Once sighand is dropped the task can be preempted and a task which
wants to delete a timer will spin-wait until the expiry task is
scheduled back in. In the worst case this will end up in a livelock
when the preempting task and the expiry task are pinned on the same
CPU.
The timer wheel has a timer_wait_running() mechanism for RT, which
uses a per CPU timer-base expiry lock which is held by the expiry
code and the task waiting for the timer function to complete blocks
on that lock.
This does not work in the same way for posix CPU timers as there is
no timer base and expiry for process wide timers can run on any task
belonging to that process, but the concept of waiting on an expiry
lock can be used too in a slightly different way.
Add a per task mutex to struct posix_cputimers_work, let the expiry
task hold it accross the expiry function and let the deleting task
which waits for the expiry to complete block on the mutex.
In the non-contended case this results in an extra
mutex_lock()/unlock() pair on both sides.
This avoids spin-waiting on a task which is scheduled out, prevents
the livelock and cures the problem for RT and !RT systems
* tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
posix-cpu-timers: Implement the missing timer_wait_running callback
selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
selftests/proc: Remove idle time monotonicity assertions
MAINTAINERS: Remove stale email address
timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
timers/nohz: Add a comment about broken iowait counter update race
timers/nohz: Protect idle/iowait sleep time under seqcount
timers/nohz: Only ever update sleeptime from idle exit
timers/nohz: Restructure and reshuffle struct tick_sched
tick/common: Align tick period with the HZ tick.
selftests/timers/posix_timers: Test delivery of signals across threads
posix-timers: Prefer delivery of signals to the current thread
vdso: Improve cmd_vdso_check to check all dynamic relocations
Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes which
utilize syscall user dispatch correctly.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGgIETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYockhEACWVd/KOBlQIdUMpM3jfSWsm+VZrITg
sKN2WCKaz8MS5RA7xTAfZIEqMzkI0V+GPoj+8eK70W39XFU/PlSQo8LUFahSxVHF
RVyz4zFKeR2XZpDa8J3ytoOvngiAnpOUflssvfA0+f3gq/B48jgLmj8XsrkmkL2T
6txRpusYNlzVTBoza0+1uEmxBTNhRxvURXa6OR/l24Kbh2udyNd6dlAoRHBV0iOW
qn7ILgoYIr/74ChCbrr8yZe2rZ+BqqlS1fsjDWkuUqq9AgzeuOjGJnZtMKG6WbGg
/NBj0Ewe7gsgZwBo7t4MbKNF7bXRkLczp8BX/l9xOTe+mpZ+LyNIHvOM3/TD6O1A
NFJNwTAGAnhU5Uoba9HzaKYZZnanqgLxuszXznJDU3zKV5pCNMNzlKxjPT73Jzsl
T1WTCyhSydluSuhOHLU4awC38pqVEQwichx98c9agIBPo7kxkb5RcTVq223wOSeI
h8otkecJ6U+gmjNDHnRtNBzykEIjVFjgiSBYGTr+/6ek2Myf0O/RMr13oe9OZG5R
jaKyjcDIADbYRow1rXfEs7Bq42K8rIkbVZvEEK/auYRUFngAoQ3l090i9wj6ViXf
7CqAjCC1K1BBxbqQwf0YLuDXCzUaXxcWfvNGEGEGs/NYDuu291QntGSFSxsJgsym
HXvO4NzHOHi13A==
=AS+6
-----END PGP SIGNATURE-----
Merge tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry/ptrace update from Thomas Gleixner:
"Provide a ptrace set/get interface for syscall user dispatch. The main
purpose is to enable checkpoint/restore (CRIU) to handle processes
which utilize syscall user dispatch correctly"
* tag 'core-entry-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
selftest, ptrace: Add selftest for syscall user dispatch config api
ptrace: Provide set/get interface for syscall user dispatch
syscall_user_dispatch: Untag selector address before access_ok()
syscall_user_dispatch: Split up set_syscall_user_dispatch()
iter_pass_iter_ptr_to_subprog subtest is relying on actual array size
being passed as subprog parameter. This combined with recent fixes to
precision tracking in conditional jumps ([0]) is now causing verifier to
backtrack all the way to the point where sum() and fill() subprogs are
called, at which point precision backtrack bails out and forces all the
states to have precise SCALAR registers. This in turn causes each
possible value of i within fill() and sum() subprogs to cause
a different non-equivalent state, preventing iterator code to converge.
For now, change the test to assume fixed size of passed in array. Once
BPF verifier supports precision tracking across subprogram calls, these
changes will be reverted as unnecessary.
[0] 71b547f561 ("bpf: Fix incorrect verifier pruning due to missing register precision taints")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230424235128.1941726-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As reported by Kumar in [0], the shared ownership implementation for BPF
programs has some race conditions which need to be addressed before it
can safely be used. This patch does so in a minimal way instead of
ripping out shared ownership entirely, as proper fixes for the issues
raised will follow ASAP, at which point this patch's commit can be
reverted to re-enable shared ownership.
The patch removes the ability to call bpf_refcount_acquire_impl from BPF
programs. Programs can only bump refcount and obtain a new owning
reference using this kfunc, so removing the ability to call it
effectively disables shared ownership.
Instead of changing success / failure expectations for
bpf_refcount-related selftests, this patch just disables them from
running for now.
[0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2/
Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230424204321.2680232-1-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZEEuiAAKCRCRxhvAZXjc
otFNAP4iesF3tqrzb1JLf/vKg8PT98re0Iec5tGYyfDSCjfGYAEAraeFOSry16sy
VUn6JBjDwrkAbUeraz6zauS+rXwstQk=
=xYmX
-----END PGP SIGNATURE-----
Merge tag 'v6.4/kernel.clone3.tests' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull clone3 selftest fix from Christian Brauner:
"This is a single fix to the clone3() selftstests.
It fell through the sefltest tree cracks a few times so I'll provide
it here. It has low urgency but we should still correctly report the
number of tests"
* tag 'v6.4/kernel.clone3.tests' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
selftests/clone3: fix number of tests in ksft_set_plan
This KUnit update Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRFYFsACgkQCwJExA0N
Qxw+PxAA1KHnHool3QbzZouFgLgTS2N/hxsOIoWKeUl6guUPX0XYu67FEIyt7p5k
a1eFLjt+q4URW/heHKYdffP+Up6xhN5yVP8xJEcbn6GD13lz1clI9RAjObiPOehc
KOV90PeAEfzosEGRIp97g4Gzu8NUMZqN7BsKBdzYJ4rEftlcjaILBVp4OfSuCyAi
UbYBdRjK4eIOwGXuHVfhNqzH1HRSbzcoSRTywj5qW0Qhpe6KnZBRuZESXYBsxzGb
G0nd4+OttjZyplI/xQYwaU0XGAI6roG5G4nAT5YGHLp5g8rTaHetTi+i3iK4iEru
wEL0NgywkA0ujAge97RldOjtU97vvSFk7FwxdS9lxaMW/Ut2sN72I2ThI8dBvVRZ
fcw8t8mmT1gUv3SCq+s1X13vz22IedXLOfvOY2o/fLk2zxOw5e8FirAz/aFeOf3K
++hK+IQvDmeMMv08bz0ORzdRQcjdwQNQ3klnfdrUVFN9yK+iAllOJ/nrXHLNIXu4
c3ITlAMldcAf2W+LRWzvqqKyT4H8MCXL3L0bBc1M1reRu9nM89AZedO8MHCB0R9Q
2ic0rOxIwZzPJuk0qPDxEVmN7Rpyx85I96YOwRemJTEfdkB/ZX+BfOU0KzinOVHC
3qrHuIw/SyRTlUEDAr53gJ5WHbdjhKAmrd1/FuplyoOSX0w6VVA=
=COQn
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
* tag 'linux-kselftest-kunit-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: add tests for using current KUnit test field
kunit: tool: Add support for SH under QEMU
kunit: tool: Add support for overriding the QEMU serial port
.gitignore: Unignore .kunitconfig
list: test: Test the klist structure
kunit: increase KUNIT_LOG_SIZE to 2048 bytes
kunit: Use gfp in kunit_alloc_resource() kernel-doc
kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py
kunit: tool: remove unused imports and variables
kunit: tool: add subscripts for type annotations where appropriate
kunit: fix bug of extra newline characters in debugfs logs
kunit: fix bug in the order of lines in debugfs logs
kunit: fix bug in debugfs logs of parameterized tests
kunit: tool: Add support for m68k under QEMU
linux-kselftest-next-6.4-rc1
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRFWfMACgkQCwJExA0N
QxxXPg/9Fjo+jLEt6/CraOQeeyQsmzQkPiXutqrjUoUwaMGpOoon+En39g1eDAiZ
5iJE8lu47ldUJNp7Pn2THfcLXJ4Le+12s0imWuXaDf6VdFUNsyd9AZMfziMmeUGa
6GXLF+FUzT7uXbIuIrPvZmyX3QFL8WEcywmgFRlAyfLFcg37uUgiKSl7ATrgWCEw
jlPELO2p2+t+EB0/n1VMoXup6tGD6tpuuNH50rDeRMTV+cW7wKTvJyFXbMvGThcx
YfjzofYm+drX5gka/XQYynZehTNcbASjkvYYEqm3piwWCyfqt4Y1aOAA8fS9h+86
jKGF/pxz1Zcl7vgZW5WixKaJ+cMf8gfCMRsny6h2x2pmqKwqSJtg+jk8XqNkJQnh
JKwRxosLjEQMyIRPMH9PUjBQD46VuC2nvu4SY5apxkbHH2iG8SKG/DNIpSFXB2m0
7BuziwKZe9uw671vaZU2b0r5eeCxETtuFwlHWN9Rl954g0zeueyUuBqg0ibpQois
Jp2SvwVR+oXRLHoqDMrCEDk0WEGJ9WD/mMxW4iHGMYRwoXb1RCKzFWEoYw3dFAxw
N/knx2IeEFshiKIAqaGnpER1chLe6ChX/5rDXkj1YLDwHEJdowHhE9/GxqGbhEUi
zOzZIAGMVTOZp6g/TIutGYfpyxUAXj19eRoAbA/KkLmv4d+s1nY=
=992w
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-next-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest updates from Shuah Khan:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
* tag 'linux-kselftest-next-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (25 commits)
selftests/resctrl: Fix incorrect error return on test complete
selftests/resctrl: Remove duplicate codes that clear each test result file
selftests/resctrl: Commonize the signal handler register/unregister for all tests
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Return MBA check result and make it to output message
selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test
selftests/resctrl: Use correct exit code when tests fail
kselftest/arm64: Convert za-fork to use kselftest.h
kselftest: Support nolibc
tools/nolibc/stdio: Implement vprintf()
selftests/resctrl: Correct get_llc_perf() param in function comment
selftests/resctrl: Use remount_resctrlfs() consistently with boolean
selftests/resctrl: Change name from CBM_MASK_PATH to INFO_PATH
selftests/resctrl: Change initialize_llc_perf() return type to void
selftests/resctrl: Replace obsolete memalign() with posix_memalign()
selftests/resctrl: Check for return value after write_schemata()
selftests/resctrl: Allow ->setup() to return errors
selftests/resctrl: Move ->setup() call outside of test specific branches
selftests/resctrl: Return NULL if malloc_and_init_memory() did not alloc mem
...
o MAINTAINERS files additions and changes.
o Fix hotplug warning in nohz code.
o Tick dependency changes by Zqiang.
o Lazy-RCU shrinker fixes by Zqiang.
o rcu-tasks stall reporting improvements by Neeraj.
o Initial changes for renaming of k[v]free_rcu() to its new k[v]free_rcu_mightsleep()
name for robustness.
o Documentation Updates:
o Significant changes to srcu_struct size.
o Deadlock detection for srcu_read_lock() vs synchronize_srcu() from Boqun.
o rcutorture and rcu-related tool, which are targeted for v6.4 from Boqun's tree.
o Other misc changes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEcoCIrlGe4gjE06JJqA4nf2o45hAFAmQuBnIACgkQqA4nf2o4
5hACVRAAoXu7/gfh5Pjw9O4E4pCdPJKsZZVYrcrVGrq6NAxRn6M1SgurAdC5grj2
96x0waoGaiO82V0H5iJMcKdAVu67x9R8WaQ1JoxN75Efn8h9W4TguB87TV1gk0xS
eZ18b/CyEaM5mNb80DFFF4FLohy5737p/kNTMqXQdUyR1BsDl16iRMgjiBiFhNUx
yPo8Y2kC2U2OTbldZgaE7s9bQO3xxEcifx93sGWsAex/gx54FYNisiwSlCOSgOE+
XkYo/OKk8Xvr82tLVX8XQVEPCMJ+rxea8T5zSs8/alvsPq7gA8wW3y6fsoa3vUU/
+Gd+W+Q/OsONIDtp8rQAY1qsD0ScDpaR8052RSH0zTa7pj8HsQgE5PjZ+cJW0SEi
cKN+Oe8+ETqKald+xZ6PDf58O212VLrru3RpQWrOQcJ7fmKmfT4REK0RcbLgg4qT
CBgOo6eg+ub4pxq2y11LZJBNTv1/S7xAEzFE0kArew64KB2gyVud0VJRZVAJnEfe
93QQVDFrwK2bhgWQZ6J6IbTvGeQW0L93IibuaU6jhZPR283VtUIIvM7vrOylN7Fq
4jsae0T7YGYfKUhgTpm7rCnm8A/D3Ni8MY0sKYYgDSyKmZUsnpI5wpx1xke4lwwV
ErrY46RCFa+k8wscc6iWfB4cGXyyFHyu+wtyg0KpFn5JAzcfz4A=
=Rgbj
-----END PGP SIGNATURE-----
Merge tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux
Pull RCU updates from Joel Fernandes:
- Updates and additions to MAINTAINERS files, with Boqun being added to
the RCU entry and Zqiang being added as an RCU reviewer.
I have also transitioned from reviewer to maintainer; however, Paul
will be taking over sending RCU pull-requests for the next merge
window.
- Resolution of hotplug warning in nohz code, achieved by fixing
cpu_is_hotpluggable() through interaction with the nohz subsystem.
Tick dependency modifications by Zqiang, focusing on fixing usage of
the TICK_DEP_BIT_RCU_EXP bitmask.
- Avoid needless calls to the rcu-lazy shrinker for CONFIG_RCU_LAZY=n
kernels, fixed by Zqiang.
- Improvements to rcu-tasks stall reporting by Neeraj.
- Initial renaming of k[v]free_rcu() to k[v]free_rcu_mightsleep() for
increased robustness, affecting several components like mac802154,
drbd, vmw_vmci, tracing, and more.
A report by Eric Dumazet showed that the API could be unknowingly
used in an atomic context, so we'd rather make sure they know what
they're asking for by being explicit:
https://lore.kernel.org/all/20221202052847.2623997-1-edumazet@google.com/
- Documentation updates, including corrections to spelling,
clarifications in comments, and improvements to the srcu_size_state
comments.
- Better srcu_struct cache locality for readers, by adjusting the size
of srcu_struct in support of SRCU usage by Christoph Hellwig.
- Teach lockdep to detect deadlocks between srcu_read_lock() vs
synchronize_srcu() contributed by Boqun.
Previously lockdep could not detect such deadlocks, now it can.
- Integration of rcutorture and rcu-related tools, targeted for v6.4
from Boqun's tree, featuring new SRCU deadlock scenarios, test_nmis
module parameter, and more
- Miscellaneous changes, various code cleanups and comment improvements
* tag 'rcu.6.4.april5.2023.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux: (71 commits)
checkpatch: Error out if deprecated RCU API used
mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep()
rcuscale: Rename kfree_rcu() to kfree_rcu_mightsleep()
ext4/super: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/mlx5: Rename kfree_rcu() to kfree_rcu_mightsleep()
net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
lib/test_vmalloc.c: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
tracing: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
misc: vmw_vmci: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
drbd: Rename kvfree_rcu() to kvfree_rcu_mightsleep()
rcu: Protect rcu_print_task_exp_stall() ->exp_tasks access
rcu: Avoid stack overflow due to __rcu_irq_enter_check_tick() being kprobe-ed
rcu-tasks: Report stalls during synchronize_srcu() in rcu_tasks_postscan()
rcu: Permit start_poll_synchronize_rcu_expedited() to be invoked early
rcu: Remove never-set needwake assignment from rcu_report_qs_rdp()
rcu: Register rcu-lazy shrinker only for CONFIG_RCU_LAZY=y kernels
rcu: Fix missing TICK_DEP_MASK_RCU_EXP dependency check
rcu: Fix set/clear TICK_DEP_BIT_RCU_EXP bitmask race
rcu/trace: use strscpy() to instead of strncpy()
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
...
o Add support for loongarch.
o Fix stack-protector issues.
o Support additional integral types and signal-related macros.
o Add support for stdin, stdout, and stderr.
o Add getuid() and geteuid().
o Allow S_I* macros to be overridden by program.
o Defer to linux/fcntl.h and linux/stat.h to avoid duplicate
definitions.
o Many improvements to the self tests.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmQsjLUTHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jKuOD/9Y18ncS7+K7D9m89h5zwqBoPeNdP65
fC/kSgBeLreqO9hR7fsDegPk/pC8ZcMZj7iiLMV8dQIan0N74Hcd4mnL+Nf9LdkP
KpwMfN/DG7h+p69Ug4d6qQqSr7jwEGa1NE280NguNtugs+kanSayVoSh0lhWPp7K
5t9s4oITOQNa69EgluB25b1qbaY36DibsBbt7vO2b/Rsmbw13UdgXQxKBQTgEOmQ
yfNqjSYatRrl7pQQiQQDwV8xzd84jNLgNKsTDtG18kUWqYBhleMDQxt54betRFZ2
O3SNylt7pJKVFsgjAB90TcOl4SRmzXJ0+jNLIuNYJkqlQaPjQy29N4nBBUp2ciBf
sSkL4OBfaegJlnBWZ4LwyV+ZTPLGQ5PZ78EhfqMpQzVbEibL5Es/pPAVuUxLVK4B
U03+sh4IAr2aK4aXcxXgDWYr/HmKP6DWrlp/oH679sW1e2mUpnjghOfuGMYwLifo
d+a/M2VH8++KOV/0aZf/Z4p1Jbexn6/jz4pW7DTfiARduUtlaN5F69oAr85q5vrS
T1dZ84ZjmxrWJUYEJjfs5DM+fvPzsOyho8ae9TRlxnJJtW6UOZ0TTUo+mUDfgqWQ
OU6XteK6zTQmE5H9aVOVwO/jl0uXWzmHAoJTAynpPgEkP01I+t6Vx8e1PAK/oPbT
fJdUIEFiMo1qMg==
=BS2x
-----END PGP SIGNATURE-----
Merge tag 'nolibc.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull nolibc updates from Paul McKenney:
- Add support for loongarch
- Fix stack-protector issues
- Support additional integral types and signal-related macros
- Add support for stdin, stdout, and stderr
- Add getuid() and geteuid()
- Allow S_I* macros to be overridden by program
- Defer to linux/fcntl.h and linux/stat.h to avoid duplicate
definitions
- Many improvements to the selftests
* tag 'nolibc.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (22 commits)
tools/nolibc: x86_64: add stackprotector support
tools/nolibc: i386: add stackprotector support
tools/nolibc: tests: add test for -fstack-protector
tools/nolibc: tests: fold in no-stack-protector cflags
tools/nolibc: add support for stack protector
tools/nolibc: tests: constify test_names
tools/nolibc: add helpers for wait() signal exits
tools/nolibc: add definitions for standard fds
selftests/nolibc: Adjust indentation for Makefile
selftests/nolibc: Add support for LoongArch
tools/nolibc: Add support for LoongArch
tools/nolibc: Add statx() and make stat() rely on statx() if necessary
tools/nolibc: Include linux/fcntl.h and remove duplicate code
tools/nolibc: check for S_I* macros before defining them
selftests/nolibc: skip the chroot_root and link_dir tests when not privileged
tools/nolibc: add getuid() and geteuid()
tools/nolibc: add tests for the integer limits in stdint.h
tools/nolibc: enlarge column width of tests
tools/nolibc: add integer types and integer limit macros
tools/nolibc: add stdint.h
...
This update adds tests for nested locking and also adds support for
testing raw spinlocks in PREEMPT_RT kernels.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEbK7UrM+RBIrCoViJnr8S83LZ+4wFAmQs8kETHHBhdWxtY2tA
a2VybmVsLm9yZwAKCRCevxLzctn7jImaEACFX7CPZyRUG32Yo6wdzxRHuZPid6cR
Si5GyRiTJzKuS9aDgl6jMYRvFXSXE9Xx1TVX0ad6fkNW40IMAkXprmUkQwN3ZtSb
K/pOLyOSFkm/XDrfDinPU46kh+DgSrAZtB3jhELa5doRxr9lWWSnwV4HoBx64T3/
84LEyIi47OSVxucaUWfimDUyBbNl4Oq95hdpD3hwxyxq5nsv2Q+oLWy2syXeegOz
3ru4Aswg40cwjYT9tjnrfZKZeteby2q55JYUDvP3kPfu/utyMyafUOda0DhHFdRB
dT1EISkY/zyqf3orTfghLpYJEplDNkSKhVtyn2dQcRHhoUJ9e/8xnRclqVo4tkqv
QWUZHJFar08P6iNBh9Z/YiM8D4kpeQNVCmR29h094BlQMbTLYbcZUjJ3YeE5nsz+
Bid7Ln6aBvGb3Ui6EWq7FVfcGzrPms3MUXw6nQLh6HaQg0F2g73MKS9Wd75OjEc/
cKPxkqzC35pM87eEf0xBlJzudZYxkYhP8Rt0bCGt/tq/pZAulCyOgnET2mcBv7Z0
94uEIGVvswVPB9/VKyqf7mHVrk/uJeygGKD1++4pzGumdhfsaM1dl3g6DkrSgK1j
A/kAApkhha8Zacj3oAAQuBPi8JuIqUFQvfbA8Os6d/8PXfTRaaMnV9DRS7wcohkP
7haDPwX8pHj+Gg==
=QAhX
-----END PGP SIGNATURE-----
Merge tag 'locktorture.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull locktorture updates from Paul McKenney:
"This adds tests for nested locking and also adds support for testing
raw spinlocks in PREEMPT_RT kernels"
* tag 'locktorture.2023.04.04a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
locktorture: Add raw_spinlock* torture tests for PREEMPT_RT kernels
locktorture: With nested locks, occasionally skip main lock
locktorture: Add nested locking to rtmutex torture tests
locktorture: Add nested locking to mutex torture tests
locktorture: Add nested_[un]lock() hooks and nlocks parameter
Support the command testing in a unit-test fashion.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://lore.kernel.org/r/20230423221231.6357-1-dave@stgolabs.net
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Merge ACPICA material for 6.4-rc1:
- Delete bogus node_array array of pointers from AEST table (Jessica
Clarke).
- Add support for trace buffer extension in GICC to the ACPI MADT
parser (Xiongfeng Wang).
- Add missing macro ACPI_FUNCTION_TRACE() for acpi_ns_repair_HID()
(Xiongfeng Wang).
- Add missing tables to astable (Pedro Falcato).
- Add support for 64 bit loong_arch compilation to ACPICA (Huacai
Chen).
- Add support for ASPT table in disassembler to ACPICA (Jeremi
Piotrowski).
- Add support for Arm's MPAM ACPI table version 2 (Hesham Almatary).
- Update all copyrights/signons in ACPICA to 2023 (Bob Moore).
- Add support for ClockInput resource (v6.5) (Niyas Sait).
- Add RISC-V INTC interrupt controller definition to the list of
supported interrupt controllers for MADT (Sunil V L).
- Add structure definitions for the RISC-V RHCT ACPI table (Sunil V L).
- Address several cases in which the ACPICA code might lead to
undefined behavior (Tamir Duberstein).
- Make ACPICA code support flexible arrays properly (Kees Cook).
- Check null return of ACPI_ALLOCATE_ZEROED in
acpi_db_display_objects() (void0red).
- Add os specific support for Zephyr RTOS to ACPICA (Najumon).
- Update version to 20230331 (Bob Moore).
* acpica: (32 commits)
ACPICA: Update version to 20230331
ACPICA: add os specific support for Zephyr RTOS
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: acpi_resource_irq: Replace 1-element arrays with flexible array
ACPICA: acpi_madt_oem_data: Fix flexible array member definition
ACPICA: acpi_dmar_andd: Replace 1-element array with flexible array
ACPICA: acpi_pci_routing_table: Replace fixed-size array with flex array member
ACPICA: struct acpi_resource_dma: Replace 1-element array with flexible array
ACPICA: Introduce ACPI_FLEX_ARRAY
ACPICA: struct acpi_nfit_interleave: Replace 1-element array with flexible array
ACPICA: actbl2: Replace 1-element arrays with flexible arrays
ACPICA: actbl1: Replace 1-element arrays with flexible arrays
ACPICA: struct acpi_resource_vendor: Replace 1-element array with flexible array
ACPICA: Avoid undefined behavior: load of misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within misaligned address
ACPICA: Avoid undefined behavior: member access within null pointer
ACPICA: Avoid undefined behavior: applying zero offset to null pointer
...
The bulk of the commits here are for the conversion of drivers to use
void remove callbacks but there's a reasonable amount of other stuff
going on, the pace of development with the SOF code continues to be high
and there's a bunch of new drivers too:
- More core cleanups from Morimto-san.
- Update drivers to have remove() callbacks returning void, mostly
mechanical with some substantial changes.
- Continued feature and simplification work on SOF, including addition
of a no-DSP mode for bringup, HDA MLink and extensions to the IPC4
protocol.
- Hibernation support for CS35L45.
- More DT binding conversions.
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas R-Car
Gen4, Rockchip RK3588 and TI TAS5733.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEreZoqmdXGLWf4p/qJNaLcl1Uh9AFAmRGdEsACgkQJNaLcl1U
h9BNVAf+Ijupg3dhAl6847v1PRYXkYK6YjAayhNd0xRoRePKnnv1zkrsXSBzxZUM
8KHpQDUJyfQbPnE2JRmr1WfHSoNDl/NXdJl+lefPBgol5bzxRji68hDPGjA0645o
R6vzrqLNw8mh3jwptXfEpQfKH8tIvwOeMeLkncDvsm0ZCUR2GhYnjB1g82ih0ssx
lvh36PdCRF0e/ruHxkiVn9b/riID65oTRkN6IxJqoPnqJZVyCiqmiJcfWePpaPir
4R9Dyk+REos/aCLdne1g6H21Tgi0td+blv6empqwdEXG41VSdRMTrOZb1ZISKmpF
ggPbKsk9BjJFBCewllHXJ0YEcBp9/g==
=TVxt
-----END PGP SIGNATURE-----
Merge tag 'asoc-v6.4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-next
ASoC: Updates for v6.4
The bulk of the commits here are for the conversion of drivers to use
void remove callbacks but there's a reasonable amount of other stuff
going on, the pace of development with the SOF code continues to be high
and there's a bunch of new drivers too:
- More core cleanups from Morimto-san.
- Update drivers to have remove() callbacks returning void, mostly
mechanical with some substantial changes.
- Continued feature and simplification work on SOF, including addition
of a no-DSP mode for bringup, HDA MLink and extensions to the IPC4
protocol.
- Hibernation support for CS35L45.
- More DT binding conversions.
- Support for Cirrus Logic CS35L56, Freescale QMC, Maxim MAX98363,
nVidia systems with MAX9809x and RT5631, Realtek RT712, Renesas R-Car
Gen4, Rockchip RK3588 and TI TAS5733.
The cxl_mem driver uses debugfs to support poison inject and clear.
Add debugfs to the list of required symbols so that cxl_test can
emulate those poison operations.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/4f3aab57fbf1cc3ccde2eb887c5d90566c8d0e90.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
CXL devices may report a maximum number of addresses that a device
allows to be poisoned using poison injection. When cxl_test creates
mock CXL memory devices, it defaults to MOCK_INJECT_DEV_MAX==88 for
all mocked memdevs.
Add a sysfs attribute, poison_inject_max to module cxl_mock_mem so
that users can set a custom device injection limit. Fail, and return
-EBUSY, if the mock poison list is not empty.
/sys/bus/platform/drivers/cxl_mock_mem/poison_inject_max
A simple usage model is to set the attribute before running a test in
order to emulate a device's poison handling.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/0f25b2862b90013545450222d2199e435c6cc11a.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Prior to poison inject support, the mock of 'Get Poison List'
returned a poison list containing a single mocked error record.
Following the addition of poison inject and clear support to the
mock driver, use the mock_poison_list[], rather than faking an
error record. Mock_poison_list[] list tracks the actual poison
inject and clear requests issued by userspace.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/0f4242c81821f4982b02cb1009c22783ef66b2f1.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Mock the clear of poison by deleting the device:address entry from
the mock_poison_list[]. Behave like a real CXL device and do not fail
if the address is not in the poison list, but offer a dev_dbg()
message.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/ecf19743c6572e60971bbd078f67d520cf5bca5d.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Mock the injection of poison by storing the device:address entries in
mock_poison_list[]. Enforce a limit of 8 poison injections per memdev
device and 128 total entries for the cxl_test mock driver.
Introducing the mock_poison[] list here, makes it available for use in
the mock of Clear Poison, and the mock of Get Poison List.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/f6b7f03541eaa8c2260d3eafadd04afe3f0d7962.1681874357.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Make mock memdevs support the Get Poison List mailbox command.
Return a fake poison error record when the get poison list command
is issued.
This supports testing the kernel tracing and cxl list capabilities
for media errors.
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://lore.kernel.org/r/14d661ce3e3a32b7d8e76b8ecc5eb88343b3d09c.1681838292.git.alison.schofield@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The QFQ qdisc class has parameter bounds that are not being
checked for correctness.
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZELn8wAKCRDbK58LschI
g1khAQC1nmXPuKjM4EAfFK8Ysb3KoF8ADmpE97n+/HEDydCagwD/bX0+NABR75Nh
ueGcoU1TcfcbshDzrH0s+C95owZDZw4=
=BeZM
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-04-21
We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).
The main changes are:
1) Add a new BPF netfilter program type and minimal support to hook
BPF programs to netfilter hooks such as prerouting or forward,
from Florian Westphal.
2) Fix race between btf_put and btf_idr walk which caused a deadlock,
from Alexei Starovoitov.
3) Second big batch to migrate test_verifier unit tests into test_progs
for ease of readability and debugging, from Eduard Zingerman.
4) Add support for refcounted local kptrs to the verifier for allowing
shared ownership, useful for adding a node to both the BPF list and
rbtree, from Dave Marchevsky.
5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
selftests into libbpf-provided bpf_helpers.h header and improve
kfunc handling, from Andrii Nakryiko.
6) Support 64-bit pointers to kfuncs needed for archs like s390x,
from Ilya Leoshkevich.
7) Support BPF progs under getsockopt with a NULL optval,
from Stanislav Fomichev.
8) Improve verifier u32 scalar equality checking in order to enable
LLVM transformations which earlier had to be disabled specifically
for BPF backend, from Yonghong Song.
9) Extend bpftool's struct_ops object loading to support links,
from Kui-Feng Lee.
10) Add xsk selftest follow-up fixes for hugepage allocated umem,
from Magnus Karlsson.
11) Support BPF redirects from tc BPF to ifb devices,
from Daniel Borkmann.
12) Add BPF support for integer type when accessing variable length
arrays, from Feng Zhou.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
selftests/bpf: verifier/value_ptr_arith converted to inline assembly
selftests/bpf: verifier/value_illegal_alu converted to inline assembly
selftests/bpf: verifier/unpriv converted to inline assembly
selftests/bpf: verifier/subreg converted to inline assembly
selftests/bpf: verifier/spin_lock converted to inline assembly
selftests/bpf: verifier/sock converted to inline assembly
selftests/bpf: verifier/search_pruning converted to inline assembly
selftests/bpf: verifier/runtime_jit converted to inline assembly
selftests/bpf: verifier/regalloc converted to inline assembly
selftests/bpf: verifier/ref_tracking converted to inline assembly
selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
selftests/bpf: verifier/map_in_map converted to inline assembly
selftests/bpf: verifier/lwt converted to inline assembly
selftests/bpf: verifier/loops1 converted to inline assembly
selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
selftests/bpf: verifier/direct_packet_access converted to inline assembly
selftests/bpf: verifier/d_path converted to inline assembly
selftests/bpf: verifier/ctx converted to inline assembly
selftests/bpf: verifier/btf_ctx_access converted to inline assembly
selftests/bpf: verifier/bpf_get_stack converted to inline assembly
...
====================
Link: https://lore.kernel.org/r/20230421211035.9111-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
if sch_fq is configured with "initial quantum" having values greater than
INT_MAX, the first assignment of "credit" does signed integer overflow to
a very negative value.
In this situation, the syzkaller script provided by Cristoph triggers the
CPU soft-lockup warning even with few sockets. It's not an infinite loop,
but "credit" wasn't probably meant to be minus 2Gb for each new flow.
Capping "initial quantum" to INT_MAX proved to fix the issue.
v2: validation of "initial quantum" is done in fq_policy, instead of open
coding in fq_change() _ suggested by Jakub Kicinski
Reported-by: Christoph Paasch <cpaasch@apple.com>
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/377
Fixes: afe4fd0624 ("pkt_sched: fq: Fair Queue packet scheduler")
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://lore.kernel.org/r/7b3a3c7e36d03068707a021760a194a8eb5ad41a.1682002300.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This adds three new tests to the selftests for KSM. These tests use the
new prctl API's to enable and disable KSM.
1) add new prctl flags to prctl header file in tools dir
This adds the new prctl flags to the include file prct.h in the
tools directory. This makes sure they are available for testing.
2) add KSM prctl merge test to ksm_tests
This adds the -t option to the ksm_tests program. The -t flag
allows to specify if it should use madvise or prctl ksm merging.
3) add two functions for debugging merge outcome for ksm_tests
This adds two functions to report the metrics in /proc/self/ksm_stat
and /sys/kernel/debug/mm/ksm. The debug output is enabled with the
-d option.
4) add KSM prctl test to ksm_functional_tests
This adds a test to the ksm_functional_test that verifies that the
prctl system call to enable / disable KSM works.
5) add KSM fork test to ksm_functional_test
Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
by the child process.
Link: https://lkml.kernel.org/r/20230418051342.1919757-4-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr@devkernel.io>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test suite (with 10 more sub-tests) to cover RO pinning against
fork() over uffd-wp. It covers both:
(1) Early CoW test in fork() when page pinned,
(2) page unshare due to RO longterm pin.
They are:
Testing wp-fork-pin on anon... done
Testing wp-fork-pin on shmem... done
Testing wp-fork-pin on shmem-private... done
Testing wp-fork-pin on hugetlb... done
Testing wp-fork-pin on hugetlb-private... done
Testing wp-fork-pin-with-event on anon... done
Testing wp-fork-pin-with-event on shmem... done
Testing wp-fork-pin-with-event on shmem-private... done
Testing wp-fork-pin-with-event on hugetlb... done
Testing wp-fork-pin-with-event on hugetlb-private... done
CONFIG_GUP_TEST needed or they'll be skipped.
Testing wp-fork-pin on anon... skipped [reason: Possibly CONFIG_GUP_TEST missing or unprivileged]
Note that the major test goal is on private memory, but no hurt to also run
all of them over shared because shared memory should work the same.
Link: https://lkml.kernel.org/r/20230417195317.898696-7-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The macro and facility can be reused in other tests too. Make it general.
Link: https://lkml.kernel.org/r/20230417195317.898696-6-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Extend it to all types of mem, meanwhile add one parallel test when
EVENT_FORK is enabled, where uffd-wp bits should be persisted rather than
dropped.
Since at it, rename the test to "wp-fork" to better show what it means.
Making the new test called "wp-fork-with-event".
Before:
Testing pagemap on anon... done
After:
Testing wp-fork on anon... done
Testing wp-fork on shmem... done
Testing wp-fork on shmem-private... done
Testing wp-fork on hugetlb... done
Testing wp-fork on hugetlb-private... done
Testing wp-fork-with-event on anon... done
Testing wp-fork-with-event on shmem... done
Testing wp-fork-with-event on shmem-private... done
Testing wp-fork-with-event on hugetlb... done
Testing wp-fork-with-event on hugetlb-private... done
Link: https://lkml.kernel.org/r/20230417195317.898696-5-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Namely:
"-f": add a wildcard filter for tests to run
"-l": list tests rather than running any
"-h": help msg
Link: https://lkml.kernel.org/r/20230417195317.898696-4-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Mika Penttilä <mpenttil@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Test verifier/value_ptr_arith automatically converted to use inline assembly.
Test cases "sanitation: alu with different scalars 2" and
"sanitation: alu with different scalars 3" are updated to
avoid -ENOENT as return value, as __retval() annotation
only supports numeric literals.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-25-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test verifier/unpriv semi-automatically converted to use inline assembly.
The verifier/unpriv.c had to be split in two parts:
- the bulk of the tests is in the progs/verifier_unpriv.c;
- the single test that needs `struct bpf_perf_event_data`
definition is in the progs/verifier_unpriv_perf.c.
The tests above can't be in a single file because:
- first requires inclusion of the filter.h header
(to get access to BPF_ST_MEM macro, inline assembler does
not support this isntruction);
- the second requires vmlinux.h, which contains definitions
conflicting with filter.h.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-23-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test verifier/loops1 automatically converted to use inline assembly.
There are a few modifications for the converted tests.
"tracepoint" programs do not support test execution, change program
type to "xdp" (which supports test execution) for the following tests
that have __retval tags:
- bounded loop, count to 4
- bonded loop containing forward jump
Also, remove the __retval tag for test:
- bounded loop, count from positive unknown to 4
As it's return value is a random number.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-10-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
In order to express test cases that use bpf_tail_call() intrinsic it
is necessary to have several programs to be loaded at a time.
This commit adds __auxiliary annotation to the set of annotations
supported by test_loader.c. Programs marked as auxiliary are always
loaded but are not treated as a separate test.
For example:
void dummy_prog1(void);
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 4);
__uint(key_size, sizeof(int));
__array(values, void (void));
} prog_map SEC(".maps") = {
.values = {
[0] = (void *) &dummy_prog1,
},
};
SEC("tc")
__auxiliary
__naked void dummy_prog1(void) {
asm volatile ("r0 = 42; exit;");
}
SEC("tc")
__description("reference tracking: check reference or tail call")
__success __retval(0)
__naked void check_reference_or_tail_call(void)
{
asm volatile (
"r2 = %[prog_map] ll;"
"r3 = 0;"
"call %[bpf_tail_call];"
"r0 = 0;"
"exit;"
:: __imm(bpf_tail_call),
: __clobber_all);
}
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230421174234.2391278-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Extend prog_tests with two test cases:
# ./test_progs --allow=verifier_netfilter_retcode
#278/1 verifier_netfilter_retcode/bpf_exit with invalid return code. test1:OK
#278/2 verifier_netfilter_retcode/bpf_exit with valid return code. test2:OK
#278/3 verifier_netfilter_retcode/bpf_exit with valid return code. test3:OK
#278/4 verifier_netfilter_retcode/bpf_exit with invalid return code. test4:OK
#278 verifier_netfilter_retcode:OK
This checks that only accept and drop (0,1) are permitted.
NF_QUEUE could be implemented later if we can guarantee that attachment
of such programs can be rejected if they get attached to a pf/hook that
doesn't support async reinjection.
NF_STOLEN could be implemented via trusted helpers that can guarantee
that the skb will eventually be free'd.
v4: test case for bpf_nf_ctx access checks, requested by Alexei Starovoitov.
v5: also check ctx->{state,skb} can be dereferenced (Alexei).
# ./test_progs --allow=verifier_netfilter_ctx
#281/1 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
#281/2 verifier_netfilter_ctx/netfilter invalid context access, size too short:OK
#281/3 verifier_netfilter_ctx/netfilter invalid context access, past end of ctx:OK
#281/4 verifier_netfilter_ctx/netfilter invalid context, write:OK
#281/5 verifier_netfilter_ctx/netfilter valid context read and invalid write:OK
#281/6 verifier_netfilter_ctx/netfilter test prog with skb and state read access:OK
#281/7 verifier_netfilter_ctx/netfilter test prog with skb and state read access @unpriv:OK
#281 verifier_netfilter_ctx:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED
This checks:
1/2: partial reads of ctx->{skb,state} are rejected
3. read access past sizeof(ctx) is rejected
4. write to ctx content, e.g. 'ctx->skb = NULL;' is rejected
5. ctx->state content cannot be altered
6. ctx->state and ctx->skb can be dereferenced
7. ... same program fails for unpriv (CAP_NET_ADMIN needed).
Link: https://lore.kernel.org/bpf/20230419021152.sjq4gttphzzy6b5f@dhcp-172-26-102-232.dhcp.thefacebook.com/
Link: https://lore.kernel.org/bpf/20230420201655.77kkgi3dh7fesoll@MacBook-Pro-6.local/
Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-8-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Put the flag MAP_HUGE_2MB in the correct flags argument instead of the
wrong offset argument.
Fixes: 2ddade3229 ("selftests/xsk: Fix munmap for hugepage allocated umem")
Reported-by: Kal Cutter Conley <kal.conley@dectris.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230421062208.3772-1-magnus.karlsson@gmail.com
When calculating the address of the refcount_t struct within a local
kptr, bpf_refcount_acquire_impl should add refcount_off bytes to the
address of the local kptr. Due to some missing parens, the function is
incorrectly adding sizeof(refcount_t) * refcount_off bytes. This patch
fixes the calculation.
Due to the incorrect calculation, bpf_refcount_acquire_impl was trying
to refcount_inc some memory well past the end of local kptrs, resulting
in kasan and refcount complaints, as reported in [0]. In that thread,
Florian and Eduard discovered that bpf selftests written in the new
style - with __success and an expected __retval, specifically - were
not actually being run. As a result, selftests added in bpf_refcount
series weren't really exercising this behavior, and thus didn't unearth
the bug.
With this fixed behavior it's safe to revert commit 7c4b96c000
("selftests/bpf: disable program test run for progs/refcounted_kptr.c"),
this patch does so.
[0] https://lore.kernel.org/bpf/ZEEp+j22imoN6rn9@strlen.de/
Fixes: 7c50b1cb76 ("bpf: Add bpf_refcount_acquire kfunc")
Reported-by: Florian Westphal <fw@strlen.de>
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20230421074431.3548349-1-davemarchevsky@fb.com
* kvm-arm64/smccc-filtering:
: .
: SMCCC call filtering and forwarding to userspace, courtesy of
: Oliver Upton. From the cover letter:
:
: "The Arm SMCCC is rather prescriptive in regards to the allocation of
: SMCCC function ID ranges. Many of the hypercall ranges have an
: associated specification from Arm (FF-A, PSCI, SDEI, etc.) with some
: room for vendor-specific implementations.
:
: The ever-expanding SMCCC surface leaves a lot of work within KVM for
: providing new features. Furthermore, KVM implements its own
: vendor-specific ABI, with little room for other implementations (like
: Hyper-V, for example). Rather than cramming it all into the kernel we
: should provide a way for userspace to handle hypercalls."
: .
KVM: selftests: Fix spelling mistake "KVM_HYPERCAL_EXIT_SMC" -> "KVM_HYPERCALL_EXIT_SMC"
KVM: arm64: Test that SMC64 arch calls are reserved
KVM: arm64: Prevent userspace from handling SMC64 arch range
KVM: arm64: Expose SMC/HVC width to userspace
KVM: selftests: Add test for SMCCC filter
KVM: selftests: Add a helper for SMCCC calls with SMC instruction
KVM: arm64: Let errors from SMCCC emulation to reach userspace
KVM: arm64: Return NOT_SUPPORTED to guest for unknown PSCI version
KVM: arm64: Introduce support for userspace SMCCC filtering
KVM: arm64: Add support for KVM_EXIT_HYPERCALL
KVM: arm64: Use a maple tree to represent the SMCCC filter
KVM: arm64: Refactor hvc filtering to support different actions
KVM: arm64: Start handling SMCs from EL1
KVM: arm64: Rename SMC/HVC call handler to reflect reality
KVM: arm64: Add vm fd device attribute accessors
KVM: arm64: Add a helper to check if a VM has ran once
KVM: x86: Redefine 'longmode' as a flag for KVM_EXIT_HYPERCALL
Signed-off-by: Marc Zyngier <maz@kernel.org>
* kvm-arm64/timer-vm-offsets: (21 commits)
: .
: This series aims at satisfying multiple goals:
:
: - allow a VMM to atomically restore a timer offset for a whole VM
: instead of updating the offset each time a vcpu get its counter
: written
:
: - allow a VMM to save/restore the physical timer context, something
: that we cannot do at the moment due to the lack of offsetting
:
: - provide a framework that is suitable for NV support, where we get
: both global and per timer, per vcpu offsetting, and manage
: interrupts in a less braindead way.
:
: Conflict resolution involves using the new per-vcpu config lock instead
: of the home-grown timer lock.
: .
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: selftests: Augment existing timer test to handle variable offset
KVM: arm64: selftests: Deal with spurious timer interrupts
KVM: arm64: selftests: Add physical timer registers to the sysreg list
KVM: arm64: nv: timers: Support hyp timer emulation
KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset
KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co
KVM: arm64: timers: Abstract the number of valid timers per vcpu
KVM: arm64: timers: Fast-track CNTPCT_EL0 trap handling
KVM: arm64: Elide kern_hyp_va() in VHE-specific parts of the hypervisor
KVM: arm64: timers: Move the timer IRQs into arch_timer_vm_data
KVM: arm64: timers: Abstract per-timer IRQ access
KVM: arm64: timers: Rationalise per-vcpu timer init
KVM: arm64: timers: Allow save/restoring of the physical timer
KVM: arm64: timers: Allow userspace to set the global counter offset
KVM: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM
KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2
KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer
arm64: Add HAS_ECV_CNTPOFF capability
arm64: Add CNTPOFF_EL2 register definition
...
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add test cases for bridge neighbor suppression, testing both per-port
and per-{Port, VLAN} neighbor suppression with both ARP and NS packets.
Example truncated output:
# ./test_bridge_neigh_suppress.sh
[...]
Tests passed: 148
Tests failed: 0
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC Merge layer (IEEE 802.3-2018 clause 99) does all the heavy
lifting for Frame Preemption (IEEE 802.1Q-2018 clause 6.7.2), a TSN
feature for minimizing latency.
Preemptible traffic is different on the wire from normal traffic in
incompatible ways. If we send a preemptible packet and the link partner
doesn't support preemption, it will drop it as an error frame and we
will never know. The MAC Merge layer has a control plane of its own,
which can be manipulated (using ethtool) in order to negotiate this
capability with the link partner (through LLDP).
Actually the TLV format for LLDP solves this problem only partly,
because both partners only advertise:
- if they support preemption (RX and TX)
- if they have enabled preemption (TX)
so we cannot tell the link partner what to do - we cannot force it to
enable reception of our preemptible packets.
That is fully solved by the verification feature, where the local device
generates some small probe frames which look like preemptible frames
with no useful content, and the link partner is obliged to respond to
them if it supports the standard. If the verification times out, we know
that preemption isn't active in our TX direction on the link.
Having clarified the definition, this selftest exercises the manual
(ethtool) configuration path of 2 link partners (with and without
verification), and the LLDP code path, using the openlldp project.
The test also verifies the TX activity of the MAC Merge layer by
sending traffic through a traffic class configured as preemptible
(using mqprio). There isn't a good way to make this really portable
(user space cannot find out how many traffic classes there are for
a device), but I chose num_tc 4 here, that should work reasonably well.
I also know that some devices (stmmac) only permit TXQ0 to be
preemptible, so this is why PREEMPTIBLE_PRIO was strategically chosen
as 0. Even if other hardware is more configurable, this test should
cover the baseline.
This is not really a "forwarding" selftest, but I put it near the other
"ethtool" selftests.
$ ./ethtool_mm.sh eno0 swp0
TEST: Manual configuration with verification: eno0 to swp0 [ OK ]
TEST: Manual configuration with verification: swp0 to eno0 [ OK ]
TEST: Manual configuration without verification: eno0 to swp0 [ OK ]
TEST: Manual configuration without verification: swp0 to eno0 [ OK ]
TEST: Manual configuration with failed verification: eno0 to swp0 [ OK ]
TEST: Manual configuration with failed verification: swp0 to eno0 [ OK ]
TEST: LLDP [ OK ]
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Counters for the MAC Merge layer and preemptible MAC have standardized
so far on using structured ethtool stats as opposed to the driver
specific names and meanings.
Benefit from that rare opportunity and introduce a helper to lib.sh for
querying standardized counters, in the hope that these will take off for
other uses as well.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
mlxsw selftests often invoke a bail_on_lldpad() helper to make sure LLDPAD
is not running, to prevent conflicts between the QoS configuration applied
through TC or DCB command line tool, and the DCB configuration that LLDPAD
might apply. This helper might be useful to others. Move the function to
lib.sh, and parameterize to make reusable in other contexts.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The driver-specific wrappers of these selftests invoke bail_on_lldpad to
make sure that LLDPAD doesn't trample the configuration. The function
bail_on_lldpad is going to move to lib.sh in the next patch. With that, it
won't be visible for the wrappers before sourcing the framework script. And
after sourcing it, it is too late: the selftest will have run by then.
One option might be to source NUM_NETIFS=0 lib.sh from the wrapper, but
even if that worked (it might, it might not), that seems cumbersome. lib.sh
is doing fair amount of stuff, and even if it works today, it does not look
particularly solid as a solution.
Instead, introduce a hook, sch_tbf_pre_hook(), that when available, gets
invoked. Move the bail to the hook.
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Two test cases:
- "valid read map access into a read-only array 1" and
- "valid read map access into a read-only array 2"
Expect that map_array_ro map is filled with mock data. This logic was
not taken into acount during initial test conversion.
This commit modifies prog_tests/verifier.c entry point for this test
to fill the map.
Fixes: a3c830ae02 ("selftests/bpf: verifier/array_access.c converted to inline assembly")
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-5-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When a test case is annotated with __retval tag the test_loader engine
would use libbpf's bpf_prog_test_run_opts() to do a test run of the
program and compare retvals.
This commit allows to perform arbitrary actions on bpf object right
before test loader invokes bpf_prog_test_run_opts(). This could be
used to setup some state for program execution, e.g. fill some maps.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florian Westphal found a bug in and suggested a fix for test_loader.c
processing of __retval tag. Because of this bug the function
test_loader.c:do_prog_test_run() never executed and all __retval test
tags were ignored.
If this bug is fixed a number of test cases from
progs/verifier_array_access.c fail with retval not matching the
expected value. This test was recently converted to use test_loader.c
and inline assembly in [1]. When doing the conversion I missed the
important detail of test_verifier.c operation: when it creates
fixup_map_array_ro, fixup_map_array_wo and fixup_map_array_small it
populates these maps with a dummy record.
Disabling the __retval checks for the affected verifier_array_access
in this commit to avoid false-postivies in any potential bisects.
The issue is addressed in the next patch.
I verified that the __retval tags are now respected by changing
expected return values for all tests annotated with __retval, and
checking that these tests started to fail.
[1] https://lore.kernel.org/bpf/20230325025524.144043-1-eddyz87@gmail.com/
Fixes: 19a8e06f5f ("selftests/bpf: Tests execution support for test_loader.c")
Reported-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/bpf/f4c4aee644425842ee6aa8edf1da68f0a8260e7c.camel@gmail.com/T/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-3-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Florian Westphal found a bug in test_loader.c processing of __retval
tag. Because of this bug the function test_loader.c:do_prog_test_run()
never executed and all __retval test tags were ignored. This hid an
issue with progs/refcounted_kptr.c tests.
When __retval tag bug is fixed and refcounted_kptr.c tests are run
kernel reports various issues and eventually hangs. Shortest reproducer
is the following command run a few times:
$ for i in $(seq 1 4); do (./test_progs --allow=refcounted_kptr &); done
Commenting out __retval tags for these tests until this issue is resolved.
Reported-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/bpf/f4c4aee644425842ee6aa8edf1da68f0a8260e7c.camel@gmail.com/T/
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230420232317.2181776-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add prog test for accessing integer type of variable array in tracing
program.
In addition, hook load_balance function to access sd->span[0], only
to confirm whether the load is successful. Because there is no direct
way to trigger load_balance call.
Co-developed-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Link: https://lore.kernel.org/r/20230420032735.27760-3-zhoufeng.zf@bytedance.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reducing the time taken by dscr_sysfs_test.c allows restoring the
default timeout, which was removed in
commit 850507f30c ("selftests/powerpc: Turn off timeout setting for
benchmarks, dscr, signal, tm") because that test took too long.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-8-bgray@linux.ibm.com
This test case is extremely slow, taking around a minute compared to
most of the other DSCR tests taking a second at most. Perf shows most
time is spent by the kernel switching to each CPU it reads in
/sys/devices/system/cpu. This switching is an unavoidable consequnce
of reading all the .../cpuN/dscr values.
Remove the outer iteration loop from this test case, reducing the reads
from 1600 to 16. This still updates the DSCR 16 times and verifies on
every CPU each time, so I do not expect the lower coverage to be
meaningful. The speedup is significant: back down to ~1 second like the
other tests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-7-bgray@linux.ibm.com
The tests currently have a single writer thread updating the system
DSCR with a 1/1000 chance looped only 100 times. So only around one in
10 runs actually do anything.
* Add multiple threads to the dscr_explicit_random_test case.
* Use a barrier to make all the threads start work as simultaneously as
possible.
* Use a rwlock and make all threads have a reasonable chance to write to
the DSCR on each iteration.
PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP is used to prevent
writers from starving while all the other threads keep reading.
Logging the reads/writes shows a decent mix across the whole test.
* Allow all threads a chance to write.
* Make the chance of writing more likely.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-6-bgray@linux.ibm.com
Add new cases to the relevant tests that use explicitly synchronized
threads to test the behaviour across context switches with less
randomness. By locking the participants to the same CPU we guarantee a
context switch occurs each time they make progress, which is a likely
failure point if the kernel is not tracking the thread local DSCR
correctly.
The random case is left in to keep exercising potential edge cases.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-5-bgray@linux.ibm.com
All current users of bind_to_cpu() don't care _which_ CPU they get, just
that they are bound to a single free one. So alter the interface to
1. Accept a BIND_CPU_ANY value that tells it to automatically
pick a CPU
2. Return the picked CPU
And convert all these users to bind_to_cpu(BIND_CPU_ANY).
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-4-bgray@linux.ibm.com
This function will be useful in the DSCR test patches later in this
series, so promote it to be shared by all powerpc selftests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-3-bgray@linux.ibm.com
Correct a couple of typos while working on other improvements to the
DSCR tests.
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230406043320.125138-2-bgray@linux.ibm.com
This macro is to be used in assembly where C functions are called.
pcrel addressing mode requires branches to functions with a
localentry value of 1 to have either a trailing nop or @notoc.
This macro permits the latter without changing callers.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Add dummy definitions to fix selftests build]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20230408021752.862660-5-npiggin@gmail.com
Fix the unmapping of hugepage allocated umems so that they are
properly unmapped. The new test referred to in the fixes label,
introduced a test that allocated a umem that is not a multiple of a 2M
hugepage size. This is fine for mmap() that rounds the size up the
nearest multiple of 2M. But munmap() requires the size to be a
multiple of the hugepage size in order for it to unmap the region. The
current behaviour of not properly unmapping the umem, was discovered
when further additions of tests that require hugepages (unaligned mode
tests only) started failing as the system was running out of
hugepages.
Fixes: c0801598e5 ("selftests: xsk: Add test UNALIGNED_INV_DESC_4K1_FRAME_SIZE")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230418143617.27762-1-magnus.karlsson@gmail.com
sysctl memfd_noexec is pid-namespaced, non-reservable, and inherent to the
child process.
Move the inherence test from init ns to child ns, so init ns can keep the
default value.
Link: https://lkml.kernel.org/r/20230414022801.2545257-1-jeffxu@google.com
Signed-off-by: Jeff Xu <jeffxu@google.com>
Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/oe-lkp/202303312259.441e35db-yujie.liu@intel.com
Tested-by: Yujie Liu <yujie.liu@intel.com>
Cc: Daniel Verkamp <dverkamp@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jorge Lucangeli Obes <jorgelo@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The va_high_addr_switch selftest is used to test mmap across 128TB
boundary. It divides the selftest cases into two main categories on the
basis of size. One set is used to create mappings that are multiples of
PAGE_SIZE while the other creates mappings that are multiples of
HUGETLB_SIZE.
In order to run the hugetlb testcases the binary must be appended with
"--run-hugetlb" but the file that used to run the test only invokes the
binary, thereby completely skipping the hugetlb testcases. Hence, the
required statement has been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-6-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Arm64 has a default hugepage size of 512MB when CONFIG_ARM64_64K_PAGES=y
is enabled. While testing on arm64 platforms having up to 4PB of virtual
address space, a minimum of 6 hugepages were required for all test cases
to pass. Support for this requirement has been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-5-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The in code comments for the selftest were made on the basis of 128TB
switch, an architecture feature specific to PowerPc and x86 platforms.
Keeping in mind the support added for arm64 platforms which implements a
256TB switch, a more generic explanation has been provided.
Link: https://lkml.kernel.org/r/20230323105243.2807166-4-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
As the initial selftest only took into consideration PowperPC and x86
architectures, on adding support for arm64, a platform independent naming
convention is chosen.
Link: https://lkml.kernel.org/r/20230323105243.2807166-3-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: Implement support for arm64 on va".
The va_128TBswitch selftest is designed and implemented for PowerPC and
x86 architectures which support a 128TB switch, up to 256TB of virtual
address space and hugepage sizes of 16MB and 2MB respectively. Arm64
platforms on the other hand support a 256Tb switch, up to 4PB of virtual
address space and a default hugepage size of 512MB when 64k pagesize is
enabled.
These architectural differences require introducing support for arm64
platforms, after which a more generic naming convention is suggested. The
in code comments are amended to provide a more platform independent
explanation of the working of the code and nr_hugepages are configured as
required. Finally, the file running the testcase is modified in order to
prevent skipping of hugetlb testcases of va_high_addr_switch.
This patch (of 5):
Arm64 platforms have the ability to support 64kb pagesize, 512MB default
hugepage size and up to 4PB of virtual address space. The address switch
occurs at 256TB as opposed to 128TB. Hence, the necessary support has
been added.
Link: https://lkml.kernel.org/r/20230323105243.2807166-1-chaitanyas.prakash@arm.com
Link: https://lkml.kernel.org/r/20230323105243.2807166-2-chaitanyas.prakash@arm.com
Signed-off-by: Chaitanya S Prakash <chaitanyas.prakash@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This new test tests against the returned ioctls from UFFDIO_REGISTER,
where put into uffdio_register.ioctls.
This also tests the expected failure cases of UFFDIO_REGISTER, aka:
- Register with empty mode should fail with -EINVAL
- Register minor without page cache (anon) should fail with -EINVAL
Link: https://lkml.kernel.org/r/20230412164548.329376-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The userfaultfd stress test never tested private shmem, which I think was
overlooked long due. Add it so it matches with uffd unit test and it'll
cover all memory supported with the three memory types.
Meanwhile, rename the memory types a bit. Considering shared mem is the
major use case for both shmem / hugetlbfs, changing from:
anon, hugetlb, hugetlb_shared, shmem
To (with shmem-private added):
anon, hugetlb, hugetlb-private, shmem, shmem-private
Add the shmem-private to run_vmtests.sh too.
Link: https://lkml.kernel.org/r/20230412164546.329355-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
With the new uffd unit test covering the /dev/userfaultfd path and syscall
path of uffd initializations, we can safely drop the devnode test in the
old stress test.
One thing is to avoid duplication of running the stress test twice which is
an overkill to only test the /dev/ interface in run_vmtests.sh.
The other benefit is now all uffd tests (that uses userfaultfd_open) can
run automatically as long as any type of interface is enabled (either
syscall or dev), so it's more likely to succeed rather than fail due to
unprivilege.
With this patch lands, we can drop all the "mem_type:XXX" handlings too.
Link: https://lkml.kernel.org/r/20230412164525.329176-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Allow skip a unit test properly due to no privilege (e.g. sigbus and
events tests).
[colin.i.king@gmail.com: fix spelling mistake "priviledge" -> "privilege"]
Link: https://lkml.kernel.org/r/20230414081506.1678998-1-colin.i.king@gmail.com
Link: https://lkml.kernel.org/r/20230412164520.329163-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Userfaultfd minor+wp mode was very recently added. The test will fail on
the old kernels at ioctl(UFFDIO_CONTINUE) which is misterious.
Unfortunately there's no feature bit to detect for this support.
Add a hack to leverage WP_UNPOPULATED to detect whether that feature
existed, since WP_UNPOPULATED was merged right after minor+wp.
Link: https://lkml.kernel.org/r/20230412164517.329152-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Simplifies it a bit along the way, e.g., drop the never used offset field
(which was always the 1st page so offset=0).
Introduce uffd_register_with_ioctls() out of uffd_register() to detect
uffdio_register.ioctls got returned. Check that automatically when testing
UFFDIO_ZEROPAGE on different types of memory (and kernel).
Link: https://lkml.kernel.org/r/20230412164404.328815-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move the two tests into the unit test, and convert it into 20 standalone
tests:
- events test on all 5 mem types, with wp on/off
- signal test on all 5 mem types, with wp on/off
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... done
Testing sigbus on hugetlb-private... done
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... done
Testing sigbus-wp on hugetlb-private... done
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... done
Testing events on hugetlb-private... done
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... done
Testing events-wp on hugetlb-private... done
It'll also remove a lot of global references along the way,
e.g. test_uffdio_wp will be replaced with the wp value passed over.
Link: https://lkml.kernel.org/r/20230412164400.328798-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This moves the minor test to the new unit test.
Rewrite the content check with char* opeartions to avoid fiddling with
my_bcmp().
Drop global vars test_uffdio_minor and test_collapse, just assume test them
always in common code for now.
OTOH make this single test into five tests:
- minor test on [shmem, hugetlb] with wp=false
- minor test on [shmem, hugetlb] with wp=true
- minor test + collapse on shmem only
One thing to mention that we used to test COLLAPSE+WP but that doesn't
sound right at all. It's possible it's silently broken but unnoticed
because COLLAPSE is not part of the default test suite.
Make the MADV_COLLAPSE test fail-able (by skip it when failing), because
it's not guaranteed to success anyway.
Drop a bunch of useless code after the move, because the unit test always
use aligned num of pages and has nothing to do with n_cpus.
Link: https://lkml.kernel.org/r/20230412164357.328779-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move it over and make it split into two tests, one for pagemap and one for
the new WP_UNPOPULATED (to be a separate one).
The thp pagemap test wasn't really working (with MADV_HUGEPAGE). Let's
just drop it (since it never really worked anyway..) and leave that for
later.
Link: https://lkml.kernel.org/r/20230412164352.328733-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a framework to be prepared to move unit tests from uffd-stress.c into
uffd-unit-tests.c. The goal is to allow detection of uffd features for
each test, and also loop over specified types of memory that a test
support.
Link: https://lkml.kernel.org/r/20230412164348.328710-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Mostly to detect hugetlb allocation errors and skip hugetlb tests when
pages are not allocated.
Link: https://lkml.kernel.org/r/20230412164345.328659-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make the handler optionally apply WP bit when resolving page faults for
either missing or minor page faults. This moves towards removing global
test_uffdio_wp outside of the common code.
Link: https://lkml.kernel.org/r/20230412164341.328618-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Prepare for adding more fields into the struct.
Link: https://lkml.kernel.org/r/20230412164337.328607-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Suggested-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
hpage_size was wrongly used. Sometimes it means hugetlb default size,
sometimes it was used as thp size.
Remove the global variable and use the right one at each place.
Link: https://lkml.kernel.org/r/20230412164333.328596-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Drop it by creating the memfd dynamically in the tests.
Link: https://lkml.kernel.org/r/20230412164331.328584-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add one simple test for UFFDIO_API. With that, I also added a bunch of
small but handy helpers along the way.
Link: https://lkml.kernel.org/r/20230412164257.328375-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Provide two helpers to open an uffd handle. Drop the error checks around
SKIPs because it's inside an errexit() anyway, which IMHO doesn't really
help much if the test will not continue.
Link: https://lkml.kernel.org/r/20230412164254.328335-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add two helpers to register/unregister to an uffd. Use them to drop
duplicate codes.
This patch also drops assert_expected_ioctls_present() and
get_expected_ioctls(). Reasons:
- It'll need a lot of effort to pass test_type==HUGETLB into it from
the upper, so it's the simplest way to get rid of another global var
- The ioctls returned in UFFDIO_REGISTER is hardly useful at all,
because any app can already detect kernel support on any ioctl via its
corresponding UFFD_FEATURE_*. The check here is for sanity mostly but
it's probably destined no user app will even use it.
- It's not friendly to one future goal of uffd to run on old
kernels, the problem is get_expected_ioctls() compiles against
UFFD_API_RANGE_IOCTLS, which is a value that can change depending on
where the test is compiled, rather than reflecting what the kernel
underneath has. It means it'll report false negatives on old kernels
so it's against our will.
So let's make our lives easier.
[peterx@redhat.com; tools/testing/selftests/mm/hugepage-mremap.c: add headers]
Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
Link: https://lkml.kernel.org/r/20230412164247.328293-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
In many ways it's weird and unwanted to keep all the tests in the same
userfaultfd.c at least when still in the current way.
For example, it doesn't make much sense to run the stress test for each
method we can create an userfaultfd handle (either via syscall or /dev/
node). It's a waste of time running this twice for the whole stress as
the stress paths are the same, only the open path is different.
It's also just weird to need to manually specify different types of memory
to run all unit tests for the userfaultfd interface. We should be able to
just run a single program and that should go through all functional uffd
tests without running the stress test at all. The stress test was more
for torturing and finding race conditions. We don't want to wait for
stress to finish just to regress test a functional test.
When we start to pile up more things on top of the same file and same
functions, things start to go a bit chaos and the code is just harder to
maintain too with tons of global variables.
This patch creates a new test uffd-unit-tests to keep userfaultfd unit
tests in the future, currently empty.
Meanwhile rename the old userfaultfd.c test to uffd-stress.c.
Link: https://lkml.kernel.org/r/20230412164244.328270-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Move common utility functions into uffd-common.[ch] files from the
original userfaultfd.c. This prepares for a split of userfaultfd.c into
two tests: one to only cover the old but powerful stress test, the other
one covers all the functional tests.
This movement is kind of a brute-force effort for now, with light
touch-ups but nothing should really change. There's chances to optimize
more, but let's leave that for later.
Link: https://lkml.kernel.org/r/20230412164241.328259-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
The idea was trying to flip this var in the alarm handler from time to
time to test -EEXIST of UFFDIO_ZEROPAGE, but firstly it's only used in the
zeropage test so probably only used once, meanwhile we passed
"retry==false" so it'll never got tested anyway.
Drop both sides so we always test UFFDIO_ZEROPAGE retries if has_zeropage
is set (!hugetlb).
One more thing to do is doing UFFDIO_REGISTER for the alias buffer too,
because otherwise the test won't even pass! We were just lucky that this
test never really got ran at all.
Link: https://lkml.kernel.org/r/20230412164238.328238-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Make the check as simple as "test_type == TEST_HUGETLB" because that's the
only mem that doesn't support ZEROPAGE.
Link: https://lkml.kernel.org/r/20230412164234.328168-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We've got the macros in uffd-stress.c, move it over and use it in
vm_util.h.
Link: https://lkml.kernel.org/r/20230412164227.328145-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There're already 3 same definitions of the three functions. Move it into
vm_util.[ch].
Link: https://lkml.kernel.org/r/20230412164223.328134-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
We do have plenty of files that want to link against vm_util.c. Just make
it simple by linking it always.
Link: https://lkml.kernel.org/r/20230412164220.328123-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
TEST_GEN_PROGS and TEST_GEN_FILES are used randomly in the mm/Makefile to
specify programs that need to build. Logically all these binaries should
all fall into TEST_GEN_PROGS.
Replace those TEST_GEN_FILES with TEST_GEN_PROGS, so that we can reference
all the tests easily later.
[peterx@redhat.com: tools/testing/selftests/mm/Makefile: don't wipe out TEST_GEN_PROGS]
Link: https://lkml.kernel.org/r/ZDxrvZh/cw357D8P@x1n
Link: https://lkml.kernel.org/r/20230412164218.328104-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
There're two util headers under mm/ kselftest. Merge one with another.
It turns out util.h is the easy one to move.
When merging, drop PAGE_SIZE / PAGE_SHIFT because they're unnecessary
wrappers to page_size() / page_shift(), meanwhile rename them to psize()
and pshift() so as to not conflict with some existing definitions in some
test files that includes vm_util.h.
Link: https://lkml.kernel.org/r/20230412164120.327731-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Dump a summary after running whatever test specified. Useful for human
runners to identify any kind of failures (besides exit code).
Link: https://lkml.kernel.org/r/20230412164117.327720-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "selftests/mm: Split / Refactor userfault test", v2.
This patchset splits userfaultfd.c into two tests:
- uffd-stress: the "vanilla", old and powerful stress test
- uffd-unit-tests: all the unit tests will be moved here
This is on my todo list for a long time but I never did it for real. The
uffd test is growing into a small and cute monster. I start to notice it's
going harder to maintain such a test and make it useful.
A few issues I found when looking at userfaultfd test:
- We have a bunch of unit tests in userfaultfd.c, but they always need to
be run only after a stress type. No way to not do it.
- We can only run an unit test for one memory type only, if we want to
do a quick smoke test to check regressions, there's no good way. The
best to come currently is "bash ./run_vmtests.sh -t userfaultfd" thanks
to the most recent changes to run_vmtests.sh on tagging. Still, that
needs to run the stress tests always and hard to see what's wrong.
- It's hard to add a new unit test to userfaultfd.c, we don't really know
what's happening, not until we mostly read the whole file.
- We did a bunch of useless tests, e.g. we run twice the whole suite of
stress test just to verify both syscall and /dev/userfaultfd. They're
all using userfaultfd_new() to create the handle, everything should
really be the same underneath. One simple unit test should cover that!
- We have tens of global variables in one file but shared with all the
tests. Some of them are not suitable to be a global var from
maintainance pov. It enforces every unit test to consider how these
vars affects the stress test and vice versa, but that's logically not
necessary.
- Userfaultfd test is not friendly to old kernels. Mostly it only works
on the latest kernel tree. It's preferrable to be run on all kernels
and properly report what's missing.
I'll stop here, I feel like I can still list some..
This patchset should resolve all issues above, and actually we can do even
more on top. I stopped doing that until I found I already got 29 patches
and 2000+ LOC changes. That's already a patchset terrible enough so we
should move in small steps.
After the whole set applied, "./run_vmtests.sh -t userfaultfd" looks like
this:
===8<===
vm.nr_hugepages = 1024
-------------------------
running ./uffd-unit-tests
-------------------------
Testing UFFDIO_API (with syscall)... done
Testing UFFDIO_API (with /dev/userfaultfd)... done
Testing register-ioctls on anon... done
Testing register-ioctls on shmem... done
Testing register-ioctls on shmem-private... done
Testing register-ioctls on hugetlb... done
Testing register-ioctls on hugetlb-private... done
Testing zeropage on anon... done
Testing zeropage on shmem... done
Testing zeropage on shmem-private... done
Testing zeropage on hugetlb... done
Testing zeropage on hugetlb-private... done
Testing pagemap on anon... done
Testing wp-unpopulated on anon... done
Testing minor on shmem... done
Testing minor on hugetlb... done
Testing minor-wp on shmem... done
Testing minor-wp on hugetlb... done
Testing minor-collapse on shmem... done
Testing sigbus on anon... done
Testing sigbus on shmem... done
Testing sigbus on shmem-private... done
Testing sigbus on hugetlb... done
Testing sigbus on hugetlb-private... done
Testing sigbus-wp on anon... done
Testing sigbus-wp on shmem... done
Testing sigbus-wp on shmem-private... done
Testing sigbus-wp on hugetlb... done
Testing sigbus-wp on hugetlb-private... done
Testing events on anon... done
Testing events on shmem... done
Testing events on shmem-private... done
Testing events on hugetlb... done
Testing events on hugetlb-private... done
Testing events-wp on anon... done
Testing events-wp on shmem... done
Testing events-wp on shmem-private... done
Testing events-wp on hugetlb... done
Testing events-wp on hugetlb-private... done
Userfaults unit tests: pass=39, skip=0, fail=0 (total=39)
[PASS]
--------------------------------
running ./uffd-stress anon 20 16
--------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 345 missing (26+48+61+102+30+12+59+7) 1596 wp (120+139+317+346+215+67+306+86)
[...]
[PASS]
------------------------------------
running ./uffd-stress hugetlb 128 32
------------------------------------
nr_pages: 64, nr_pages_per_cpu: 8
bounces: 31, mode: rnd racing ver poll, userfaults: 29 missing (6+6+6+5+4+2+0+0) 104 wp (20+19+22+18+7+12+5+1)
[...]
[PASS]
--------------------------------------------
running ./uffd-stress hugetlb-private 128 32
--------------------------------------------
nr_pages: 64, nr_pages_per_cpu: 8
bounces: 31, mode: rnd racing ver poll, userfaults: 33 missing (12+9+7+0+5+0+0+0) 111 wp (24+25+14+14+11+17+5+1)
[...]
[PASS]
---------------------------------
running ./uffd-stress shmem 20 16
---------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 247 missing (15+17+34+60+81+37+3+0) 2038 wp (180+114+276+400+381+318+165+204)
[...]
[PASS]
-----------------------------------------
running ./uffd-stress shmem-private 20 16
-----------------------------------------
nr_pages: 5120, nr_pages_per_cpu: 640
bounces: 15, mode: rnd racing ver poll, userfaults: 235 missing (52+29+55+56+13+9+16+5) 2849 wp (218+406+461+531+328+284+430+191)
[...]
[PASS]
SUMMARY: PASS=6 SKIP=0 FAIL=0
===8<===
The output may be different if we miss some features (e.g., hugetlb not
allocated, old kernel, less privilege of uffd handle), but they should show
up with good reasons. E.g., I tried to run the unit test on my Fedora
kernel and it gives me:
===8<===
UFFDIO_API (with syscall)... failed [reason: UFFDIO_API should fail with wrong api but didn't]
UFFDIO_API (with /dev/userfaultfd)... skipped [reason: cannot open userfaultfd handle]
zeropage on anon... done
zeropage on shmem... done
zeropage on shmem-private... done
zeropage-hugetlb on hugetlb... done
zeropage-hugetlb on hugetlb-private... done
pagemap on anon... pagemap on anon... pagemap on anon... done
wp-unpopulated on anon... skipped [reason: feature missing]
minor on shmem... done
minor on hugetlb... done
minor-wp on shmem... skipped [reason: feature missing]
minor-wp on hugetlb... skipped [reason: feature missing]
minor-collapse on shmem... done
sigbus on anon... skipped [reason: possible lack of priviledge]
sigbus on shmem... skipped [reason: possible lack of priviledge]
sigbus on shmem-private... skipped [reason: possible lack of priviledge]
sigbus on hugetlb... skipped [reason: possible lack of priviledge]
sigbus on hugetlb-private... skipped [reason: possible lack of priviledge]
sigbus-wp on anon... skipped [reason: possible lack of priviledge]
sigbus-wp on shmem... skipped [reason: possible lack of priviledge]
sigbus-wp on shmem-private... skipped [reason: possible lack of priviledge]
sigbus-wp on hugetlb... skipped [reason: possible lack of priviledge]
sigbus-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
events on anon... skipped [reason: possible lack of priviledge]
events on shmem... skipped [reason: possible lack of priviledge]
events on shmem-private... skipped [reason: possible lack of priviledge]
events on hugetlb... skipped [reason: possible lack of priviledge]
events on hugetlb-private... skipped [reason: possible lack of priviledge]
events-wp on anon... skipped [reason: possible lack of priviledge]
events-wp on shmem... skipped [reason: possible lack of priviledge]
events-wp on shmem-private... skipped [reason: possible lack of priviledge]
events-wp on hugetlb... skipped [reason: possible lack of priviledge]
events-wp on hugetlb-private... skipped [reason: possible lack of priviledge]
Userfaults unit tests: pass=9, skip=24, fail=1 (total=34)
===8<===
Patch layout:
- Revert "userfaultfd: don't fail on unrecognized features"
Something I found when I got the UFFDIO_API test below. Axel, I still
propose to revert it as a whole, but feel free to continue the discussion
from the original patch thread.
- selftests/mm: Update .gitignore with two missing tests
- selftests/mm: Dump a summary in run_vmtests.sh
- selftests/mm: Merge util.h into vm_util.h
- selftests/mm: Use TEST_GEN_PROGS where proper
- selftests/mm: Link vm_util.c always
- selftests/mm: Merge default_huge_page_size() into one
- selftests/mm: Use PM_* macros in vm_utils.h
- selftests/mm: Reuse pagemap_get_entry() in vm_util.h
- selftests/mm: Test UFFDIO_ZEROPAGE only when !hugetlb
- selftests/mm: Drop test_uffdio_zeropage_eexist
Until here, all cleanups here and there. I wanted to keep going, but I
found that maybe it'll take a few more days to split the test. Hence I
did a split starting from the next one, so we have a working thing first.
- selftests/mm: Create uffd-common.[ch]
- selftests/mm: Split uffd tests into uffd-stress and uffd-unit-tests
This did the major brute force split of common codes into
uffd-common.[ch]. That'll be the so far common base for stress and unit
tests. Then a new unit test is created.
- selftests/mm: uffd_[un]register()
- selftests/mm: uffd_open_{dev|sys}()
- selftests/mm: UFFDIO_API test
This patch hides here to start writting the 1st unit test with
UFFDIO_API, also detection of userfaultfd privileges.
- selftests/mm: Drop global mem_fd in uffd tests
- selftests/mm: Drop global hpage_size in uffd tests
- selftests/mm: Rename uffd_stats to uffd_args
- selftests/mm: Let uffd_handle_page_fault() takes wp parameter
- selftests/mm: Allow allocate_area() to fail properly
Some further cleanup that I noticed otherwise hard to move the tests.
- selftests/mm: Add framework for uffd-unit-test
The major patch provides the framework for most of the rest unit tests.
- selftests/mm: Move uffd pagemap test to unit test
- selftests/mm: Move uffd minor test to unit test
- selftests/mm: Move uffd sig/events tests into uffd unit tests
- selftests/mm: Move zeropage test into uffd unit tests
Move unit tests and suite them into the new file.
- selftests/mm: Workaround no way to detect uffd-minor + wp
- selftests/mm: Allow uffd test to skip properly with no privilege
- selftests/mm: Drop sys/dev test in uffd-stress test
- selftests/mm: Add shmem-private test to uffd-stress
A bunch of changes to do better on error reportings, and add
shmem-private to the stress test which was long missing.
- selftests/mm: Add uffdio register ioctls test
One more patch to test uffdio_register.ioctls.
This patch (of 30):
Update .gitignore with two missing tests.
Link: https://lkml.kernel.org/r/20230412163922.327282-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20230412164114.327709-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Let's add some tests that trigger (pte|pmd)_mkdirty on VMAs without write
permissions. If an architecture implementation is wrong, we might
accidentally set the PTE/PMD writable and allow for write access in a VMA
without write permissions.
The tests include reproducers for the two issues recently discovered
and worked-around in core-MM for now:
(1) commit 624a2c94f5 ("Partly revert "mm/thp: carry over dirty
bit when thp splits on pmd"")
(2) commit 96a9c287e2 ("mm/migrate: fix wrongly apply write bit
after mkdirty on sparc64")
In addition, some other tests that reveal further issues.
All tests pass under x86_64:
./mkdirty
# [INFO] detected THP size: 2048 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
ok 6 SIGSEGV generated, page not modified
# Totals: pass:6 fail:0 xfail:0 xpass:0 skip:0 error:0
But some fail on sparc64:
./mkdirty
# [INFO] detected THP size: 8192 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
not ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
not ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
not ok 6 SIGSEGV generated, page not modified
Bail out! 3 out of 6 tests failed
# Totals: pass:3 fail:3 xfail:0 xpass:0 skip:0 error:0
Reverting both above commits makes all tests fail on sparc64:
./mkdirty
# [INFO] detected THP size: 8192 KiB
TAP version 13
1..6
# [INFO] PTRACE write access
not ok 1 SIGSEGV generated, page not modified
# [INFO] PTRACE write access to THP
not ok 2 SIGSEGV generated, page not modified
# [INFO] Page migration
not ok 3 SIGSEGV generated, page not modified
# [INFO] Page migration of THP
not ok 4 SIGSEGV generated, page not modified
# [INFO] PTE-mapping a THP
not ok 5 SIGSEGV generated, page not modified
# [INFO] UFFDIO_COPY
not ok 6 SIGSEGV generated, page not modified
Bail out! 6 out of 6 tests failed
# Totals: pass:0 fail:6 xfail:0 xpass:0 skip:0 error:0
The tests are useful to detect other problematic archs, to verify new
arch fixes, and to stop such issues from reappearing in the future.
For now, we don't add any hugetlb tests.
Link: https://lkml.kernel.org/r/20230411142512.438404-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "mm: (pte|pmd)_mkdirty() should not unconditionally allow for
write access".
This is the follow-up on [1], adding selftests (testing for known issues
we added workarounds for and other issues that haven't been fixed yet),
fixing sparc64, reverting the workarounds, and perform one cleanup.
The patch from [1] was modified slightly (updated/extended patch
description, dropped one unnecessary NOP instruction from the ASM in
__pte_mkhwwrite()).
Retested on x86_64 and sparc64 (sun4u in QEMU).
I scanned most architectures to make sure their (pte|pmd)_mkdirty()
handling is correct. To be sure, we can run the selftests and find out if
other architectures are still affectes (loongarch was fixed recently as
well).
Based on master for now. I don't expect surprises regarding mm-tress, but
I can rebase if there are any problems.
This patch (of 6):
The COW selftest can deal with THP not being configured. So move error
handling of read_pmd_pagesize() into the callers such that we can reuse it
in the COW selftest.
Link: https://lkml.kernel.org/r/20230411142512.438404-1-david@redhat.com
Link: https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com [1]
Link: https://lkml.kernel.org/r/20230411142512.438404-2-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add a test case to check whether the number of maple_alloc structures is
actually equal to mas->alloc->total.
Link: https://lkml.kernel.org/r/20230411041005.26205-2-zhangpeng.00@bytedance.com
Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This adds a test for the recently added RISC-V interface for probing
hardware capabilities. It happens to be the first selftest we have for
RISC-V, so I've added some infrastructure for those as well.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-6-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
To make it easier for bleeding-edge BPF applications, such as sched_ext,
to utilize open-coded iterators, move bpf_for(), bpf_for_each(), and
bpf_repeat() macros from selftests/bpf-internal bpf_misc.h helper, to
libbpf-provided bpf_helpers.h header.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230418002148.3255690-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The first field of /proc/uptime relies on the CLOCK_BOOTTIME clock which
can also be fetched from clock_gettime() API.
Improve the test coverage while verifying the monotonicity of
CLOCK_BOOTTIME accross both interfaces.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-9-frederic@kernel.org
Due to broken iowait task counting design (cf: comments above
get_cpu_idle_time_us() and nr_iowait()), it is not possible to provide
the guarantee that /proc/stat or /proc/uptime display monotonic idle
time values.
Remove the assertions that verify the related wrong assumption so that
testers and maintainers don't spend more time on that.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230222144649.624380-8-frederic@kernel.org
Add a selftest to ensure subreg equality if source register
upper 32bit is 0. Without previous patch, the test will
fail verification.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230417222139.360607-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Most of the code had an issue according to ShellCheck.
That's mainly due to the fact it incorrectly believes most of the code
was unreachable because it's invoked by variable name, see how the
"tests" array is used.
Once SC2317 has been ignored, three small warnings were still visible:
- SC2155: Declare and assign separately to avoid masking return values.
- SC2046: Quote this to prevent word splitting: can be ignored because
"ip netns pids" can display more than one pid.
- SC2166: Prefer [ p ] || [ q ] as [ p -o q ] is not well defined.
This probably didn't fix any actual issues but it might help spotting
new interesting warnings reported by ShellCheck as just before,
ShellCheck was reporting issues for most lines making it a bit useless.
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
mptcp_connect tool was printing some duplicated entries when showing how
to use it: -j -l -r
While at it, I also:
- moved the very few entries that were not sorted,
- added -R that was missing since
commit 8a4b910d00 ("mptcp: selftests: add rcvbuf set option"),
- removed the -u parameter that has been removed in
commit f730b65c9d ("selftests: mptcp: try to set mptcp ulp mode in different sk states").
No need to backport this, it is just an internal tool used by our
selftests. The help menu is mainly useful for MPTCP kernel devs.
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The upcall socket interface can be exercised now to make sure that
future feature adjustments to the field can maintain backwards
compatibility.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a basic set of fields to print in a 'dpflow' format. This will be
used by future commits to check for flow fields after parsing, as
well as verifying the flow fields pushed into the kernel from
userspace.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Includes an associated test to generate netns and connect
interfaces, with the option to include packet tracing.
This will be used in the future when flow support is added
for additional test cases.
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We've managed to improve the UX for kptrs significantly over the last 9
months. All of the prior main use cases, struct bpf_cpumask *, struct
task_struct *, and struct cgroup *, have all been updated to be
synchronized mainly using RCU. In other words, their KF_ACQUIRE kfunc
calls are all KF_RCU, and the pointers themselves are MEM_RCU and can be
accessed in an RCU read region in BPF.
In a follow-on change, we'll be removing the KF_KPTR_GET kfunc flag.
This patch prepares for that by removing the
bpf_kfunc_call_test_kptr_get() kfunc, and all associated selftests.
Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230416084928.326135-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Validate that the following new ptrace requests work as expected
* PTRACE_GET_SYSCALL_USER_DISPATCH_CONFIG
returns the contents of task->syscall_dispatch
* PTRACE_SET_SYSCALL_USER_DISPATCH_CONFIG
sets the contents of task->syscall_dispatch
Signed-off-by: Gregory Price <gregory.price@memverge.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230407171834.3558-5-gregory.price@memverge.com
Test that POSIX timers using CLOCK_PROCESS_CPUTIME_ID eventually deliver
a signal to all running threads. This effectively tests that the kernel
doesn't prefer any one thread (or subset of threads) for signal delivery.
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230316123028.2890338-2-elver@google.com
Test refcounted local kptr functionality added in previous patches in
the series.
Usecases which pass verification:
* Add refcounted local kptr to both tree and list. Then, read and -
possibly, depending on test variant - delete from tree, then list.
* Also test doing read-and-maybe-delete in opposite order
* Stash a refcounted local kptr in a map_value, then add it to a
rbtree. Read from both, possibly deleting after tree read.
* Add refcounted local kptr to both tree and list. Then, try reading and
deleting twice from one of the collections.
* bpf_refcount_acquire of just-added non-owning ref should work, as
should bpf_refcount_acquire of owning ref just out of bpf_obj_new
Usecases which fail verification:
* The simple successful bpf_refcount_acquire cases from above should
both fail to verify if the newly-acquired owning ref is not dropped
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-10-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch modifies bpf_rbtree_remove to account for possible failure
due to the input rb_node already not being in any collection.
The function can now return NULL, and does when the aforementioned
scenario occurs. As before, on successful removal an owning reference to
the removed node is returned.
Adding KF_RET_NULL to bpf_rbtree_remove's kfunc flags - now KF_RET_NULL |
KF_ACQUIRE - provides the desired verifier semantics:
* retval must be checked for NULL before use
* if NULL, retval's ref_obj_id is released
* retval is a "maybe acquired" owning ref, not a non-owning ref,
so it will live past end of critical section (bpf_spin_unlock), and
thus can be checked for NULL after the end of the CS
BPF programs must add checks
============================
This does change bpf_rbtree_remove's verifier behavior. BPF program
writers will need to add NULL checks to their programs, but the
resulting UX looks natural:
bpf_spin_lock(&glock);
n = bpf_rbtree_first(&ghead);
if (!n) { /* ... */}
res = bpf_rbtree_remove(&ghead, &n->node);
bpf_spin_unlock(&glock);
if (!res) /* Newly-added check after this patch */
return 1;
n = container_of(res, /* ... */);
/* Do something else with n */
bpf_obj_drop(n);
return 0;
The "if (!res)" check above is the only addition necessary for the above
program to pass verification after this patch.
bpf_rbtree_remove no longer clobbers non-owning refs
====================================================
An issue arises when bpf_rbtree_remove fails, though. Consider this
example:
struct node_data {
long key;
struct bpf_list_node l;
struct bpf_rb_node r;
struct bpf_refcount ref;
};
long failed_sum;
void bpf_prog()
{
struct node_data *n = bpf_obj_new(/* ... */);
struct bpf_rb_node *res;
n->key = 10;
bpf_spin_lock(&glock);
bpf_list_push_back(&some_list, &n->l); /* n is now a non-owning ref */
res = bpf_rbtree_remove(&some_tree, &n->r, /* ... */);
if (!res)
failed_sum += n->key; /* not possible */
bpf_spin_unlock(&glock);
/* if (res) { do something useful and drop } ... */
}
The bpf_rbtree_remove in this example will always fail. Similarly to
bpf_spin_unlock, bpf_rbtree_remove is a non-owning reference
invalidation point. The verifier clobbers all non-owning refs after a
bpf_rbtree_remove call, so the "failed_sum += n->key" line will fail
verification, and in fact there's no good way to get information about
the node which failed to add after the invalidation. This patch removes
non-owning reference invalidation from bpf_rbtree_remove to allow the
above usecase to pass verification. The logic for why this is now
possible is as follows:
Before this series, bpf_rbtree_add couldn't fail and thus assumed that
its input, a non-owning reference, was in the tree. But it's easy to
construct an example where two non-owning references pointing to the same
underlying memory are acquired and passed to rbtree_remove one after
another (see rbtree_api_release_aliasing in
selftests/bpf/progs/rbtree_fail.c).
So it was necessary to clobber non-owning refs to prevent this
case and, more generally, to enforce "non-owning ref is definitely
in some collection" invariant. This series removes that invariant and
the failure / runtime checking added in this patch provide a clean way
to deal with the aliasing issue - just fail to remove.
Because the aliasing issue prevented by clobbering non-owning refs is no
longer an issue, this patch removes the invalidate_non_owning_refs
call from verifier handling of bpf_rbtree_remove. Note that
bpf_spin_unlock - the other caller of invalidate_non_owning_refs -
clobbers non-owning refs for a different reason, so its clobbering
behavior remains unchanged.
No BPF program changes are necessary for programs to remain valid as a
result of this clobbering change. A valid program before this patch
passed verification with its non-owning refs having shorter (or equal)
lifetimes due to more aggressive clobbering.
Also, update existing tests to check bpf_rbtree_remove retval for NULL
where necessary, and move rbtree_api_release_aliasing from
progs/rbtree_fail.c to progs/rbtree.c since it's now expected to pass
verification.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-8-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The linked_list tests use macros and function pointers to reduce code
duplication. Earlier in the series, bpf_list_push_{front,back} were
modified to be macros, expanding to invoke actual kfuncs
bpf_list_push_{front,back}_impl. Due to this change, a code snippet
like:
void (*p)(void *, void *) = (void *)&bpf_list_##op;
p(hexpr, nexpr);
meant to do bpf_list_push_{front,back}(hexpr, nexpr), will no longer
work as it's no longer valid to do &bpf_list_push_{front,back} since
they're no longer functions.
This patch fixes issues of this type, along with two other minor changes
- one improvement and one fix - both related to the node argument to
list_push_{front,back}.
* The fix: migration of list_push tests away from (void *, void *)
func ptr uncovered that some tests were incorrectly passing pointer
to node, not pointer to struct bpf_list_node within the node. This
patch fixes such issues (CHECK(..., f) -> CHECK(..., &f->node))
* The improvement: In linked_list tests, the struct foo type has two
list_node fields: node and node2, at byte offsets 0 and 40 within
the struct, respectively. Currently node is used in ~all tests
involving struct foo and lists. The verifier needs to do some work
to account for the offset of bpf_list_node within the node type, so
using node2 instead of node exercises that logic more in the tests.
This patch migrates linked_list tests to use node2 instead of node.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-7-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Consider this code snippet:
struct node {
long key;
bpf_list_node l;
bpf_rb_node r;
bpf_refcount ref;
}
int some_bpf_prog(void *ctx)
{
struct node *n = bpf_obj_new(/*...*/), *m;
bpf_spin_lock(&glock);
bpf_rbtree_add(&some_tree, &n->r, /* ... */);
m = bpf_refcount_acquire(n);
bpf_rbtree_add(&other_tree, &m->r, /* ... */);
bpf_spin_unlock(&glock);
/* ... */
}
After bpf_refcount_acquire, n and m point to the same underlying memory,
and that node's bpf_rb_node field is being used by the some_tree insert,
so overwriting it as a result of the second insert is an error. In order
to properly support refcounted nodes, the rbtree and list insert
functions must be allowed to fail. This patch adds such support.
The kfuncs bpf_rbtree_add, bpf_list_push_{front,back} are modified to
return an int indicating success/failure, with 0 -> success, nonzero ->
failure.
bpf_obj_drop on failure
=======================
Currently the only reason an insert can fail is the example above: the
bpf_{list,rb}_node is already in use. When such a failure occurs, the
insert kfuncs will bpf_obj_drop the input node. This allows the insert
operations to logically fail without changing their verifier owning ref
behavior, namely the unconditional release_reference of the input
owning ref.
With insert that always succeeds, ownership of the node is always passed
to the collection, since the node always ends up in the collection.
With a possibly-failed insert w/ bpf_obj_drop, ownership of the node
is always passed either to the collection (success), or to bpf_obj_drop
(failure). Regardless, it's correct to continue unconditionally
releasing the input owning ref, as something is always taking ownership
from the calling program on insert.
Keeping owning ref behavior unchanged results in a nice default UX for
insert functions that can fail. If the program's reaction to a failed
insert is "fine, just get rid of this owning ref for me and let me go
on with my business", then there's no reason to check for failure since
that's default behavior. e.g.:
long important_failures = 0;
int some_bpf_prog(void *ctx)
{
struct node *n, *m, *o; /* all bpf_obj_new'd */
bpf_spin_lock(&glock);
bpf_rbtree_add(&some_tree, &n->node, /* ... */);
bpf_rbtree_add(&some_tree, &m->node, /* ... */);
if (bpf_rbtree_add(&some_tree, &o->node, /* ... */)) {
important_failures++;
}
bpf_spin_unlock(&glock);
}
If we instead chose to pass ownership back to the program on failed
insert - by returning NULL on success or an owning ref on failure -
programs would always have to do something with the returned ref on
failure. The most likely action is probably "I'll just get rid of this
owning ref and go about my business", which ideally would look like:
if (n = bpf_rbtree_add(&some_tree, &n->node, /* ... */))
bpf_obj_drop(n);
But bpf_obj_drop isn't allowed in a critical section and inserts must
occur within one, so in reality error handling would become a
hard-to-parse mess.
For refcounted nodes, we can replicate the "pass ownership back to
program on failure" logic with this patch's semantics, albeit in an ugly
way:
struct node *n = bpf_obj_new(/* ... */), *m;
bpf_spin_lock(&glock);
m = bpf_refcount_acquire(n);
if (bpf_rbtree_add(&some_tree, &n->node, /* ... */)) {
/* Do something with m */
}
bpf_spin_unlock(&glock);
bpf_obj_drop(m);
bpf_refcount_acquire is used to simulate "return owning ref on failure".
This should be an uncommon occurrence, though.
Addition of two verifier-fixup'd args to collection inserts
===========================================================
The actual bpf_obj_drop kfunc is
bpf_obj_drop_impl(void *, struct btf_struct_meta *), with bpf_obj_drop
macro populating the second arg with 0 and the verifier later filling in
the arg during insn fixup.
Because bpf_rbtree_add and bpf_list_push_{front,back} now might do
bpf_obj_drop, these kfuncs need a btf_struct_meta parameter that can be
passed to bpf_obj_drop_impl.
Similarly, because the 'node' param to those insert functions is the
bpf_{list,rb}_node within the node type, and bpf_obj_drop expects a
pointer to the beginning of the node, the insert functions need to be
able to find the beginning of the node struct. A second
verifier-populated param is necessary: the offset of {list,rb}_node within the
node type.
These two new params allow the insert kfuncs to correctly call
__bpf_obj_drop_impl:
beginning_of_node = bpf_rb_node_ptr - offset
if (already_inserted)
__bpf_obj_drop_impl(beginning_of_node, btf_struct_meta->record);
Similarly to other kfuncs with "hidden" verifier-populated params, the
insert functions are renamed with _impl prefix and a macro is provided
for common usage. For example, bpf_rbtree_add kfunc is now
bpf_rbtree_add_impl and bpf_rbtree_add is now a macro which sets
"hidden" args to 0.
Due to the two new args BPF progs will need to be recompiled to work
with the new _impl kfuncs.
This patch also rewrites the "hidden argument" explanation to more
directly say why the BPF program writer doesn't need to populate the
arguments with anything meaningful.
How does this new logic affect non-owning references?
=====================================================
Currently, non-owning refs are valid until the end of the critical
section in which they're created. We can make this guarantee because, if
a non-owning ref exists, the referent was added to some collection. The
collection will drop() its nodes when it goes away, but it can't go away
while our program is accessing it, so that's not a problem. If the
referent is removed from the collection in the same CS that it was added
in, it can't be bpf_obj_drop'd until after CS end. Those are the only
two ways to free the referent's memory and neither can happen until
after the non-owning ref's lifetime ends.
On first glance, having these collection insert functions potentially
bpf_obj_drop their input seems like it breaks the "can't be
bpf_obj_drop'd until after CS end" line of reasoning. But we care about
the memory not being _freed_ until end of CS end, and a previous patch
in the series modified bpf_obj_drop such that it doesn't free refcounted
nodes until refcount == 0. So the statement can be more accurately
rewritten as "can't be free'd until after CS end".
We can prove that this rewritten statement holds for any non-owning
reference produced by collection insert functions:
* If the input to the insert function is _not_ refcounted
* We have an owning reference to the input, and can conclude it isn't
in any collection
* Inserting a node in a collection turns owning refs into
non-owning, and since our input type isn't refcounted, there's no
way to obtain additional owning refs to the same underlying
memory
* Because our node isn't in any collection, the insert operation
cannot fail, so bpf_obj_drop will not execute
* If bpf_obj_drop is guaranteed not to execute, there's no risk of
memory being free'd
* Otherwise, the input to the insert function is refcounted
* If the insert operation fails due to the node's list_head or rb_root
already being in some collection, there was some previous successful
insert which passed refcount to the collection
* We have an owning reference to the input, it must have been
acquired via bpf_refcount_acquire, which bumped the refcount
* refcount must be >= 2 since there's a valid owning reference and the
node is already in a collection
* Insert triggering bpf_obj_drop will decr refcount to >= 1, never
resulting in a free
So although we may do bpf_obj_drop during the critical section, this
will never result in memory being free'd, and no changes to non-owning
ref logic are needed in this patch.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-6-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently, BPF programs can interact with the lifetime of refcounted
local kptrs in the following ways:
bpf_obj_new - Initialize refcount to 1 as part of new object creation
bpf_obj_drop - Decrement refcount and free object if it's 0
collection add - Pass ownership to the collection. No change to
refcount but collection is responsible for
bpf_obj_dropping it
In order to be able to add a refcounted local kptr to multiple
collections we need to be able to increment the refcount and acquire a
new owning reference. This patch adds a kfunc, bpf_refcount_acquire,
implementing such an operation.
bpf_refcount_acquire takes a refcounted local kptr and returns a new
owning reference to the same underlying memory as the input. The input
can be either owning or non-owning. To reinforce why this is safe,
consider the following code snippets:
struct node *n = bpf_obj_new(typeof(*n)); // A
struct node *m = bpf_refcount_acquire(n); // B
In the above snippet, n will be alive with refcount=1 after (A), and
since nothing changes that state before (B), it's obviously safe. If
n is instead added to some rbtree, we can still safely refcount_acquire
it:
struct node *n = bpf_obj_new(typeof(*n));
struct node *m;
bpf_spin_lock(&glock);
bpf_rbtree_add(&groot, &n->node, less); // A
m = bpf_refcount_acquire(n); // B
bpf_spin_unlock(&glock);
In the above snippet, after (A) n is a non-owning reference, and after
(B) m is an owning reference pointing to the same memory as n. Although
n has no ownership of that memory's lifetime, it's guaranteed to be
alive until the end of the critical section, and n would be clobbered if
we were past the end of the critical section, so it's safe to bump
refcount.
Implementation details:
* From verifier's perspective, bpf_refcount_acquire handling is similar
to bpf_obj_new and bpf_obj_drop. Like the former, it returns a new
owning reference matching input type, although like the latter, type
can be inferred from concrete kptr input. Verifier changes in
{check,fixup}_kfunc_call and check_kfunc_args are largely copied from
aforementioned functions' verifier changes.
* An exception to the above is the new KF_ARG_PTR_TO_REFCOUNTED_KPTR
arg, indicated by new "__refcounted_kptr" kfunc arg suffix. This is
necessary in order to handle both owning and non-owning input without
adding special-casing to "__alloc" arg handling. Also a convenient
place to confirm that input type has bpf_refcount field.
* The implemented kfunc is actually bpf_refcount_acquire_impl, with
'hidden' second arg that the verifier sets to the type's struct_meta
in fixup_kfunc_call.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
Link: https://lore.kernel.org/r/20230415201811.343116-5-davemarchevsky@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add testing for the event "Instructions retired" (0xc0) in the PMU
event filter on both Intel and AMD to ensure that the event doesn't
count when it is disallowed. Unlike most of the other events, the
event "Instructions retired" will be incremented by KVM when an
instruction is emulated. Test that this case is being properly handled
and that KVM doesn't increment the counter when that event is
disallowed.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230307141400.1486314-6-aaronlewis@google.com
Link: https://lore.kernel.org/r/20230407233254.957013-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use a single struct to track all PMC event counts in the PMU filter test,
and copy the full struct to/from the guest when running and measuring each
guest workload. Using a common struct avoids naming conflicts, e.g. the
loads/stores testcase has claimed "perf_counter", and eliminates the
unnecessary truncation of the counter values when they are propagated from
the guest MSRs to the host structs.
Zero the struct before running the guest workload to ensure that the test
doesn't get a false pass due to consuming data from a previous run.
Link: https://lore.kernel.org/r/20230407233254.957013-6-seanjc@google.com
Reviewed by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Use '0' to signal success and '-errno' to signal failure in the PMU event
filter test so that the values are slightly less magical/arbitrary. Using
'0' in the error paths is especially confusing as understanding it's an
error value requires following the breadcrumbs to the host code that
ultimately consumes the value.
Arguably there should also be a #define for "success", but 0/-errno is a
common enough pattern that defining another macro on top would likely do
more harm than good.
Link: https://lore.kernel.org/r/20230407233254.957013-5-seanjc@google.com
Reviewed by: Aaron Lewis <aaronlewis@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Provide the actual vs. expected count in the PMU event filter test's
asserts instead of relying on pr_info() to provide the context, e.g. so
that all information needed to triage a failure is readily available even
if the environment in which the test is run captures only the assert
itself.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
[sean: rewrite changelog]
Link: https://lore.kernel.org/r/20230407233254.957013-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Add helper macros to consolidate the asserts that a PMC is/isn't counting
(branch) instructions retired. This will make it easier to add additional
asserts related to counting instructions later on.
No functional changes intended.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
[sean: add "INSTRUCTIONS", massage changelog]
Link: https://lore.kernel.org/r/20230407233254.957013-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
Split out the common parts of the Intel and AMD guest code in the PMU
event filter test into a helper function. This is in preparation for
adding additional counters to the test.
No functional changes intended.
Signed-off-by: Aaron Lewis <aaronlewis@google.com>
Link: https://lore.kernel.org/r/20230407233254.957013-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
An error snuck in between two recent conflicting changes:
Until recently ->setup() used negative values to indicate
normal test termination. This was changed in
commit fa10366cc6 ("selftests/resctrl: Allow ->setup() to return
errors") that transitioned ->setup() to use negative values
to indicate errors and a new END_OF_TESTS to indicate normal
termination.
commit 42e3b093eb ("selftests/resctrl: Fix set up schemata with 100%
allocation on first run in MBM test") continued to use
negative return to indicate normal test termination.
Fix mbm_setup() to use the new END_OF_TESTS to indicate
error-free test termination.
Fixes: 42e3b093eb ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test")
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/lkml/bb65cce8-54d7-68c5-ef19-3364ec95392a@linux.intel.com/
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Some distros ship with older vm_sockets.h that doesn't have VMADDR_CID_LOCAL
which causes selftests build to fail:
/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c:261:18: error: ‘VMADDR_CID_LOCAL’ undeclared (first use in this function); did you mean ‘VMADDR_CID_HOST’?
261 | addr->svm_cid = VMADDR_CID_LOCAL;
| ^~~~~~~~~~~~~~~~
| VMADDR_CID_HOST
Workaround this issue by defining it on demand.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>