* Address some fallout of the locking rework, this time affecting
the way the vgic is configured
* Fix an issue where the page table walker frees a subtree and
then proceeds with walking what it has just freed...
* Check that a given PA donated to the guest is actually memory
(only affecting pKVM)
* Correctly handle MTE CMOs by Set/Way
* Fix the reported address of a watchpoint forwarded to userspace
* Fix the freeing of the root of stage-2 page tables
* Stop creating spurious PMU events to perform detection of the
default PMU and use the existing PMU list instead.
x86:
* Fix a memslot lookup bug in the NX recovery thread that could
theoretically let userspace bypass the NX hugepage mitigation
* Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
* Account exit stats for fastpath VM-Exits that never leave the super
tight run-loop
* Fix an out-of-bounds bug in the optimized APIC map code, and add a
regression test for the race.
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmR7k1QUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNblwf/faUVOBMv7mQBGsGa7FNcmaNhYeIT
U1k4pFNlo7dNNuNJrGdpo+sOGP5A8CRLNSVvlyjgCHF1Qc9gVtXNvZ9PnA6nAYmB
qqvUz/TDw9/NLTlJEkbSs05B4am4yfd5pV6R/32jrPIbXOW++6ae2LpILS/NPBrB
y0tGiVUJrO3zVXdBKa4PFmlO8jsXPmMEiicEJa5v2Boeo5SFyFfErw9zDNwSMsQc
27bzbs3O2daXTNMFnwVCCpWUxt1EqWYUXGvBjsChAUI0K10F2/GW9f6YeFsGXqKI
d8g1QuCukSt/CvN0pT+g/540mR6i0Azpek1myQfuCu2IhQ1jCJaSWOjoEw==
=8VrO
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Address some fallout of the locking rework, this time affecting the
way the vgic is configured
- Fix an issue where the page table walker frees a subtree and then
proceeds with walking what it has just freed...
- Check that a given PA donated to the guest is actually memory (only
affecting pKVM)
- Correctly handle MTE CMOs by Set/Way
- Fix the reported address of a watchpoint forwarded to userspace
- Fix the freeing of the root of stage-2 page tables
- Stop creating spurious PMU events to perform detection of the
default PMU and use the existing PMU list instead
x86:
- Fix a memslot lookup bug in the NX recovery thread that could
theoretically let userspace bypass the NX hugepage mitigation
- Fix a s/BLOCKING/PENDING bug in SVM's vNMI support
- Account exit stats for fastpath VM-Exits that never leave the super
tight run-loop
- Fix an out-of-bounds bug in the optimized APIC map code, and add a
regression test for the race"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: selftests: Add test for race in kvm_recalculate_apic_map()
KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds
KVM: x86: Account fastpath-only VM-Exits in vCPU stats
KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker
KVM: arm64: Document default vPMU behavior on heterogeneous systems
KVM: arm64: Iterate arm_pmus list to probe for default PMU
KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed()
KVM: arm64: Populate fault info for watchpoint
KVM: arm64: Reload PTE after invoking walker callback on preorder traversal
KVM: arm64: Handle trap of tagged Set/Way CMOs
arm64: Add missing Set/Way CMO encodings
KVM: arm64: Prevent unconditional donation of unmapped regions from the host
KVM: arm64: vgic: Fix a comment
KVM: arm64: vgic: Fix locking comment
KVM: arm64: vgic: Wrap vgic_its_create() with config_lock
KVM: arm64: vgic: Fix a circular locking issue
- Return NULL if the trace_probe list on trace_probe_event is empty.
- selftests/ftrace: Choose testing symbol name for filtering feature
from sample data instead of fixed symbol.
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmR640AACgkQ2/sHvwUr
PxugGgf/YwwocmUqiEtTukTB7fzoAjYyQXr0YaJM+DjeZXMqAJ4dl9tV1/AmAL4j
iWtZd53aolTym/3P2VADfSc4xiyWjFdkYv7zRPjpqfMg3XsELJgshwz+12dmmMdx
0uco1l2/Ge3JNPK6BuWaO3V44QjoPSgiRsmxxKLh5K7M9V5swL7fadoLtins1B0r
TVVqdyEHQkZLTByexg7wHYd/ro+4lexv1yhvyP4rEmYRPDoR56eOF2zwcQMHPvaY
qstdP2ce6m5rG0gp4TsY7oRkezb64y903hNQuumoU6VR9nI3IK4PZjuX5/xns2By
G9mRaOqb02+UmP+HhX4QGmr92G9Vyw==
=o07w
-----END PGP SIGNATURE-----
Merge tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull probes fixes from Masami Hiramatsu:
- Return NULL if the trace_probe list on trace_probe_event is empty
- selftests/ftrace: Choose testing symbol name for filtering feature
from sample data instead of fixed symbol
* tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
selftests/ftrace: Choose target function for filter test from samples
tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during
APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map
recalc makes multiple passes over the list of vCPUs, and the calculations
ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where
toggling LAPIC_MODE_DISABLED is quite interesting.
Signed-off-by: Michal Luczaj <mhal@rbox.co>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20230602233250.1014316-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
sub-tree submissions this week. The mlx5 IRQ fixes are notable,
people were complaining about that. No fires burning.
Current release - regressions:
- eth: mlx5e:
- multiple fixes for dynamic IRQ allocation
- prevent encap offload when neigh update is running
- eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters
Current release - new code bugs:
- eth: mlx5e: DR, add missing mutex init/destroy in pattern manager
Previous releases - always broken:
- tcp: deny tcp_disconnect() when threads are waiting
- sched: prevent ingress Qdiscs from getting installed in random
locations in the hierarchy and moving around
- sched: flower: fix possible OOB write in fl_set_geneve_opt()
- netlink: fix NETLINK_LIST_MEMBERSHIPS length report
- udp6: fix race condition in udp6_sendmsg & connect
- tcp: fix mishandling when the sack compression is deferred
- rtnetlink: validate link attributes set at creation time
- mptcp: fix connect timeout handling
- eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked
- eth: amd-xgbe: fix the false linkup in xgbe_phy_status
- eth: mlx5e:
- fix corner cases in internal buffer configuration
- drain health before unregistering devlink
- usb: qmi_wwan: set DTR quirk for BroadMobi BM818
Misc:
- tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmR42fUACgkQMUZtbf5S
IrtD7w//R4zkxkNpT/pTmBsbm4zA7MSttPEW07HPTBXcMBbJMPV3pEyo18Pezm/a
kJLO+2mvNTz/S74WAU0H2M3ux2I+Uc/srRXroff52ttCeV7mO/OHPAYBna0PqDoq
O5A+laSGxaq3ulmLJdbE0SMhrH4t6iCeelEFUw03q49XhEKQlfgSHW4+lws16ffE
Togxy1Iip8PXCMYXqWb1Hc0MFqF+2MdPm8YVaIfYRuFrI56apdknywrKuHYwk7kl
gsy8hKHRJDRQy5RjmHDgbsLm4RCr2abHOe2mYwtdGAXWfvdRUI4HGeFYtc/b5i32
55WAkegBzZPVkql7OLQM1N0hZYjz7JdAV/MiT8Taf8kSJGmLNbQxgJxgF8t8xj8S
9ddUGSuxOhKBU3jeen6zjenVVbKWXXHDHICS3T5j/rlGUeeUmfgmLyvZ3PfYQeHA
gibfLsq2KQqYQDKESO97hfORXnBV+dP0xXPMNcjKLNx+NnmhmaCI4vVYPfLc7X02
1XqApdfhR/CK2ytNTATDVQdP72xyhtuqRTuhWEYi0q5UYQamTCXt+mDG7DKXfCj5
cyJOcFNOJQH14XKiiAxf0IkD2VimuPVYxuhebTu79YQTXSqf6sUw67pLJ7i8LhUW
tQMM4qOCcIH7y1qJvHbUFDa/4fK/+HnnmC2H3LH/Q8wtpZLKgRU=
=o77c
-----END PGP SIGNATURE-----
Merge tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Happy Wear a Dress Day.
Fairly standard-sized batch of fixes, accounting for the lack of
sub-tree submissions this week. The mlx5 IRQ fixes are notable, people
were complaining about that. No fires burning.
Current release - regressions:
- eth: mlx5e:
- multiple fixes for dynamic IRQ allocation
- prevent encap offload when neigh update is running
- eth: mana: fix perf regression: remove rx_cqes, tx_cqes counters
Current release - new code bugs:
- eth: mlx5e: DR, add missing mutex init/destroy in pattern manager
Previous releases - always broken:
- tcp: deny tcp_disconnect() when threads are waiting
- sched: prevent ingress Qdiscs from getting installed in random
locations in the hierarchy and moving around
- sched: flower: fix possible OOB write in fl_set_geneve_opt()
- netlink: fix NETLINK_LIST_MEMBERSHIPS length report
- udp6: fix race condition in udp6_sendmsg & connect
- tcp: fix mishandling when the sack compression is deferred
- rtnetlink: validate link attributes set at creation time
- mptcp: fix connect timeout handling
- eth: stmmac: fix call trace when stmmac_xdp_xmit() is invoked
- eth: amd-xgbe: fix the false linkup in xgbe_phy_status
- eth: mlx5e:
- fix corner cases in internal buffer configuration
- drain health before unregistering devlink
- usb: qmi_wwan: set DTR quirk for BroadMobi BM818
Misc:
- tcp: return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if
user_mss set"
* tag 'net-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (71 commits)
mptcp: fix active subflow finalization
mptcp: add annotations around sk->sk_shutdown accesses
mptcp: fix data race around msk->first access
mptcp: consolidate passive msk socket initialization
mptcp: add annotations around msk->subflow accesses
mptcp: fix connect timeout handling
rtnetlink: add the missing IFLA_GRO_ tb check in validate_linkmsg
rtnetlink: move IFLA_GSO_ tb check to validate_linkmsg
rtnetlink: call validate_linkmsg in rtnl_create_link
ice: recycle/free all of the fragments from multi-buffer frame
net: phy: mxl-gpy: extend interrupt fix to all impacted variants
net: renesas: rswitch: Fix return value in error path of xmit
net: dsa: mv88e6xxx: Increase wait after reset deactivation
net: ipa: Use correct value for IPA_STATUS_SIZE
tcp: fix mishandling when the sack compression is deferred.
net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
sfc: fix error unwinds in TC offload
net/mlx5: Read embedded cpu after init bit cleared
net/mlx5e: Fix error handling in mlx5e_refresh_tirs
net/mlx5: Ensure af_desc.mask is properly initialized
...
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 259a834fad ("selftests: mptcp: functional tests for the userspace PM type")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: dc65fe82fb ("selftests: mptcp: add packet mark test case")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 1a418cb8e8 ("mptcp: simult flow self-tests")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: df62f2ec3d ("selftests/mptcp: add diag interface tests")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: b08fbf2410 ("selftests: add test-cases for MPTCP MP_JOIN")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped".
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: eedbc68532 ("selftests: add PM netlink functional tests")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Selftests are supposed to run on any kernels, including the old ones not
supporting MPTCP.
A new check is then added to make sure MPTCP is supported. If not, the
test stops and is marked as "skipped". Note that this check can also
mark the test as failed if 'SELFTESTS_MPTCP_LIB_EXPECT_ALL_FEATURES' env
var is set to 1: by doing that, we can make sure a test is not being
skipped by mistake.
A new shared file is added here to be able to re-used the same check in
the different selftests we have.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: 048d19d444 ("mptcp: add basic kselftest for mptcp")
Cc: stable@vger.kernel.org
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
BusyBox's 'cmp' command doesn't support the '--bytes' parameter.
Some CIs -- i.e. LKFT -- use BusyBox and have the mptcp_join.sh test
failing [1] because their 'cmp' command doesn't support this '--bytes'
option:
cmp: unrecognized option '--bytes=1024'
BusyBox v1.35.0 () multi-call binary.
Usage: cmp [-ls] [-n NUM] FILE1 [FILE2]
Instead, 'head --bytes' can be used as this option is supported by
BusyBox. A temporary file is needed for this operation.
Because it is apparently quite common to use BusyBox, it is certainly
better to backport this fix to impacted kernels.
Fixes: 6bf41020b7 ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable@vger.kernel.org
Link: https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc5-5-g148341f0a2f5/testrun/16088933/suite/kselftest-net-mptcp/test/net_mptcp_userspace_pm_sh/log [1]
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
User events:
- Use long instead of int for storing the enable set/clear bit, as it was
found that big endian machines could end up using the wrong bits.
- Split allocating mm and attaching it. This keeps the allocation separate
from the registration and avoids various races.
- Remove RCU locking around pin_user_pages_remote() as that can schedule. The
RCU protection is no longer needed with the above split of mm allocation and
attaching.
- Rename the "link" fields of the various structs to something more
meaningful.
- Add comments around user_event_mm struct usage and locking requirements.
Timerlat tracer:
- Fix missed wakeup of timerlat thread caused by the timerlat interrupt
triggering when tracing is off. The timer interrupt handler needs to always
wake up the timerlat thread regardless if tracing is enabled or not,
otherwise, it will never wake up.
Histograms:
- Fix regression of breaking the "stacktrace" modifier for variables. That
modifier cannot be used for values, but can be used for variables that are
passed from one histogram to the next. This was broken when adding the
restriction to values as the variable logic used the same code.
- Rename the special field "stacktrace" to "common_stacktrace". Special fields
(that are not actually part of the event, but can act just like event
fields, like 'comm' and 'timestamp') should be prefixed with 'common_' for
consistency. To keep backward compatibility, 'stacktrace' can still be used
(as with the special field 'cpu'), but can be overridden if the event has a
field called 'stacktrace'.
- Update the synthetic event selftests to use the new name (synthetic events
are created by histograms)
Tracing bootup selftests:
- Reorganize the code to keep artifacts of the selftests not compiled in when
selftests are not configured.
- Add various cond_resched() around the selftest code, as the softlock
watchdog was triggering much more often. It appears that the kernel runs
slower now with full debugging enabled.
- While debugging ftrace with ftrace (using an instance ring buffer instead of
the top level one), I found that the selftests were disabling prints to the
debug instance. This should not happen, as the selftests only disable
printing to the main buffer as the selftests examine the main buffer to see
if it has what it expects, and prints can make the tests fail. Make the
selftests only disable printing to the toplevel buffer, and leave the
instance buffers alone.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZHQGJBQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qu6hAQCJ1WebZUTJ/s7pFo36mXirLnrW4afB
Ua6sALseqKNesgEAyhLmd2+sMeqmAbCCIUWtcWJb/Pod0jGOt0U8+cBxfw8=
=PhaX
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"User events:
- Use long instead of int for storing the enable set/clear bit, as it
was found that big endian machines could end up using the wrong
bits.
- Split allocating mm and attaching it. This keeps the allocation
separate from the registration and avoids various races.
- Remove RCU locking around pin_user_pages_remote() as that can
schedule. The RCU protection is no longer needed with the above
split of mm allocation and attaching.
- Rename the "link" fields of the various structs to something more
meaningful.
- Add comments around user_event_mm struct usage and locking
requirements.
Timerlat tracer:
- Fix missed wakeup of timerlat thread caused by the timerlat
interrupt triggering when tracing is off. The timer interrupt
handler needs to always wake up the timerlat thread regardless if
tracing is enabled or not, otherwise, it will never wake up.
Histograms:
- Fix regression of breaking the "stacktrace" modifier for variables.
That modifier cannot be used for values, but can be used for
variables that are passed from one histogram to the next. This was
broken when adding the restriction to values as the variable logic
used the same code.
- Rename the special field "stacktrace" to "common_stacktrace".
Special fields (that are not actually part of the event, but can
act just like event fields, like 'comm' and 'timestamp') should be
prefixed with 'common_' for consistency. To keep backward
compatibility, 'stacktrace' can still be used (as with the special
field 'cpu'), but can be overridden if the event has a field called
'stacktrace'.
- Update the synthetic event selftests to use the new name (synthetic
events are created by histograms)
Tracing bootup selftests:
- Reorganize the code to keep artifacts of the selftests not compiled
in when selftests are not configured.
- Add various cond_resched() around the selftest code, as the
softlock watchdog was triggering much more often. It appears that
the kernel runs slower now with full debugging enabled.
- While debugging ftrace with ftrace (using an instance ring buffer
instead of the top level one), I found that the selftests were
disabling prints to the debug instance.
This should not happen, as the selftests only disable printing to
the main buffer as the selftests examine the main buffer to see if
it has what it expects, and prints can make the tests fail.
Make the selftests only disable printing to the toplevel buffer,
and leave the instance buffers alone"
* tag 'trace-v6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Have function_graph selftest call cond_resched()
tracing: Only make selftest conditionals affect the global_trace
tracing: Make tracing_selftest_running/delete nops when not used
tracing: Have tracer selftests call cond_resched() before running
tracing: Move setting of tracing_selftest_running out of register_tracer()
tracing/selftests: Update synthetic event selftest to use common_stacktrace
tracing: Rename stacktrace field to common_stacktrace
tracing/histograms: Allow variables to have some modifiers
tracing/user_events: Document user_event_mm one-shot list usage
tracing/user_events: Rename link fields for clarity
tracing/user_events: Remove RCU lock while pinning pages
tracing/user_events: Split up mm alloc and attach
tracing/timerlat: Always wakeup the timerlat thread
tracing/user_events: Use long vs int for atomic bit ops
- fix incorrect output in in-tree gpio tools
- fix a shell coding issue in gpio-sim selftests
- correctly set the permissions for debugfs attributes exposed by gpio-mockup
- fix chip name and pin count in gpio-f7188x for one of the supported models
- fix numberspace pollution when using dynamically and statically allocated
GPIOs together
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEFp3rbAvDxGAT0sefEacuoBRx13IFAmRws3YACgkQEacuoBRx
13JHlhAAkyUNjtvkcTdO9buB2AXB3e/JH31MgSEEzoAi9kr54n70D+mtmSz5iX0X
4a1KjZrd5Gd7nSGA3AHll7jxL1Rfa4vJsTID/khfhIPB0arGl7brZ0J2fItRUGqz
oxAvpH3OpSWI+QWQMzp/NEZV0F5rfWBme93w6OqYItzGNr4LfP+3x49zPDUngOjp
d3EsDzQbgxezYvrmWjPvZuZvk3RYud+dfU8bjvtnI6TJZqIxzePmcqEAFgqsnDJF
VLEJQFPGpLaX7U+ZepcAQAs2s5ggk6Mb0JdGKp1w9hQ9w91TClBUgUURG6hzdLmZ
w6/X3iGaFGaHugA4DmWhYd4FWE7H58dWazw/RVK4RM2ZGZ9cT8R8OVQlDsBhesyG
tOTY4RJlU1DgMt66nlUvf8r3ECdm2ayvFq7qSN9xRDQ0W4Ti3+QU2+/re3s8jevR
VYAo1BHtVai28+lCMjopK+fYBe8G4DqsGx9Uh3y0RxYIFvMKdo4EQF9Ku3yTEcq3
3FNFO00KU5ktKgnx7z6bnO7+F3OvHUeSRh44U8O6pm/odfdkUn7CHjFVi2dM6yO5
763UKwMZq9rDT2owdwljolIRECb8IAN0siBhhHYYW4oAFTD2w2vTQ/y2KYrxbntv
CptwauQMM9vjzg9bxMP+4R5YVZhZhtJ4I1oa2kQx7eXIPBy/6OM=
=R4dV
-----END PGP SIGNATURE-----
Merge tag 'gpio-fixes-for-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull gpio fixes from Bartosz Golaszewski:
- fix incorrect output in in-tree gpio tools
- fix a shell coding issue in gpio-sim selftests
- correctly set the permissions for debugfs attributes exposed by
gpio-mockup
- fix chip name and pin count in gpio-f7188x for one of the supported
models
- fix numberspace pollution when using dynamically and statically
allocated GPIOs together
* tag 'gpio-fixes-for-v6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio-f7188x: fix chip name and pin count on Nuvoton chip
gpiolib: fix allocation of mixed dynamic/static GPIOs
gpio: mockup: Fix mode of debugfs files
selftests: gpio: gpio-sim: Fix BUG: test FAILED due to recent change
tools: gpio: fix debounce_period_us output of lsgpio
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZG4AiAAKCRDbK58LschI
g+xlAQCmefGbDuwPckZLnomvt6gl4bkIjs7kc1ySbG9QBnaInwD/WyrJaQIPijuD
qziHPAyx+MEgPseFU1b7Le35SZ66IwM=
=s4R1
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-05-24
We've added 19 non-merge commits during the last 10 day(s) which contain
a total of 20 files changed, 738 insertions(+), 448 deletions(-).
The main changes are:
1) Batch of BPF sockmap fixes found when running against NGINX TCP tests,
from John Fastabend.
2) Fix a memleak in the LRU{,_PERCPU} hash map when bucket locking fails,
from Anton Protopopov.
3) Init the BPF offload table earlier than just late_initcall,
from Jakub Kicinski.
4) Fix ctx access mask generation for 32-bit narrow loads of 64-bit fields,
from Will Deacon.
5) Remove a now unsupported __fallthrough in BPF samples,
from Andrii Nakryiko.
6) Fix a typo in pkg-config call for building sign-file,
from Jeremy Sowden.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf, sockmap: Test progs verifier error with latest clang
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer with drops
bpf, sockmap: Test FIONREAD returns correct bytes in rx buffer
bpf, sockmap: Test shutdown() correctly exits epoll and recv()=0
bpf, sockmap: Build helper to create connected socket pair
bpf, sockmap: Pull socket helpers out of listen test for general use
bpf, sockmap: Incorrectly handling copied_seq
bpf, sockmap: Wake up polling after data copy
bpf, sockmap: TCP data stall on recv before accept
bpf, sockmap: Handle fin correctly
bpf, sockmap: Improved check for empty queue
bpf, sockmap: Reschedule is now done through backlog
bpf, sockmap: Convert schedule_work into delayed_work
bpf, sockmap: Pass skb ownership through read_skb
bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps
bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
samples/bpf: Drop unnecessary fallthrough
bpf: netdev: init the offload table earlier
selftests/bpf: Fix pkg-config call building sign-file
====================
Link: https://lore.kernel.org/r/20230524170839.13905-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
With the rename of the stacktrace field to common_stacktrace, update the
selftests to reflect this change. Copy the current selftest to test the
backward compatibility "stacktrace" keyword. Also the "requires" of that
test was incorrect, so it would never actually ran before. That is fixed
now.
Link: https://lore.kernel.org/linux-trace-kernel/20230523225402.55951f2f@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When BPF program drops pkts the sockmap logic 'eats' the packet and
updates copied_seq. In the PASS case where the sk_buff is accepted
we update copied_seq from recvmsg path so we need a new test to
handle the drop case.
Original patch series broke this resulting in
test_sockmap_skb_verdict_fionread:PASS:ioctl(FIONREAD) error 0 nsec
test_sockmap_skb_verdict_fionread:FAIL:ioctl(FIONREAD) unexpected ioctl(FIONREAD): actual 1503041772 != expected 256
After updated patch with fix.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-14-john.fastabend@gmail.com
A bug was reported where ioctl(FIONREAD) returned zero even though the
socket with a SK_SKB verdict program attached had bytes in the msg
queue. The result is programs may hang or more likely try to recover,
but use suboptimal buffer sizes.
Add a test to check that ioctl(FIONREAD) returns the correct number of
bytes.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-13-john.fastabend@gmail.com
When session gracefully shutdowns epoll needs to wake up and any recv()
readers should return 0 not the -EAGAIN they previously returned.
Note we use epoll instead of select to test the epoll wake on shutdown
event as well.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-12-john.fastabend@gmail.com
A common operation for testing is to spin up a pair of sockets that are
connected. Then we can use these to run specific tests that need to
send data, check BPF programs and so on.
The sockmap_listen programs already have this logic lets move it into
the new sockmap_helpers header file for general use.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-11-john.fastabend@gmail.com
No functional change here we merely pull the helpers in sockmap_listen.c
into a header file so we can use these in other programs. The tests we
are about to add aren't really _listen tests so doesn't make sense
to add them here.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-10-john.fastabend@gmail.com
In the end of the test, there will be an error message induced by the
`ip netns del ns1` command in cleanup()
Tests passed: 201
Tests failed: 0
Cannot remove namespace file "/run/netns/ns1": No such file or directory
This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.
Redirect the error message to /dev/null to mute it.
V2: Update commit message and fixes tag.
V3: resubmit due to missing netdev ML in V2
Fixes: b60417a9f2 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bluetooth and netfilter.
Current release - regressions:
- ipv6: fix RCU splat in ipv6_route_seq_show()
- wifi: iwlwifi: disable RFI feature
Previous releases - regressions:
- tcp: fix possible sk_priority leak in tcp_v4_send_reset()
- tipc: do not update mtu if msg_max is too small in mtu negotiation
- netfilter: fix null deref on element insertion
- devlink: change per-devlink netdev notifier to static one
- phylink: fix ksettings_set() ethtool call
- wifi: mac80211: fortify the spinlock against deadlock by interrupt
- wifi: brcmfmac: check for probe() id argument being NULL
- eth: ice:
- fix undersized tx_flags variable
- fix ice VF reset during iavf initialization
- eth: hns3: fix sending pfc frames after reset issue
Previous releases - always broken:
- xfrm: release all offloaded policy memory
- nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
- vsock: avoid to close connected socket after the timeout
- dsa: rzn1-a5psw: enable management frames for CPU port
- eth: virtio_net: fix error unwinding of XDP initialization
- eth: tun: fix memory leak for detached NAPI queue.
Misc:
- MAINTAINERS: sctp: move Neil to CREDITS
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmRmEHoSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkXu0P/AnSeVtu2CYZSjyCQYvkKpdpbDOSlsee
GOnG0jduOdJ+OabYyM6prg9JHp1t3QxxTVJs+Spc7Eh9EX+YHRwK5DNPhv3GQ7RI
pSwQxwiEhLVXVjaEtqUo1Wf8JgCQJ+ThisSIDgQRnaHKQnrRIlbZRngwn6TwNFba
kxBpCUn2RZcZZhL8xdVF4UbSbEeLEupN76rHiePBVKNG70QVRptX3Rd6EV6FxmTa
EzAtfNh1r0p3BHW1YYgsbpy7PeXhKGhlVLqIld8h9r/y4hATsrQ2f7Bv0RNrVBDf
f8r1bdG5E0K3V8AzFPpyOe304G3GAwV+V/wtvA3GRjiwmPUwzJOzaNnSgJZZ/sbq
mnR4pCwJ4gDGnDxLa8hBh6+emQGB6LJJX0JTQY7vMPNmUwtIuEQc6tLxJ4DpXTzW
psEndQPDA7yR/7pNoE7ax+8CKCxPvfiBRnV9sxzmPV6FcxWtzJeLQihOuOA4IB8i
Ddhq2OYH+HCodTNOLWNyMSjk65O7Whee1O/YGiVW9+iUbKBUSBFatJ9fJjH4bXMT
VRZZnhlFGGIMuZXhkL5+a4ZnomqfPRXNGJ/QBbB4Ty7CXr84mXb0SX5gW4qsLOBA
YEuxiqD8Oej2Mrid4ypF5GmwBYLAf4CCZajGVii0yyz2hp69RvlQ0c5lA0WisQu3
wvY2HDFGBsa+
=PfWP
-----END PGP SIGNATURE-----
Merge tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm, bluetooth and netfilter.
Current release - regressions:
- ipv6: fix RCU splat in ipv6_route_seq_show()
- wifi: iwlwifi: disable RFI feature
Previous releases - regressions:
- tcp: fix possible sk_priority leak in tcp_v4_send_reset()
- tipc: do not update mtu if msg_max is too small in mtu negotiation
- netfilter: fix null deref on element insertion
- devlink: change per-devlink netdev notifier to static one
- phylink: fix ksettings_set() ethtool call
- wifi: mac80211: fortify the spinlock against deadlock by interrupt
- wifi: brcmfmac: check for probe() id argument being NULL
- eth: ice:
- fix undersized tx_flags variable
- fix ice VF reset during iavf initialization
- eth: hns3: fix sending pfc frames after reset issue
Previous releases - always broken:
- xfrm: release all offloaded policy memory
- nsh: use correct mac_offset to unwind gso skb in nsh_gso_segment()
- vsock: avoid to close connected socket after the timeout
- dsa: rzn1-a5psw: enable management frames for CPU port
- eth: virtio_net: fix error unwinding of XDP initialization
- eth: tun: fix memory leak for detached NAPI queue.
Misc:
- MAINTAINERS: sctp: move Neil to CREDITS"
* tag 'net-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (107 commits)
MAINTAINERS: skip CCing netdev for Bluetooth patches
mdio_bus: unhide mdio_bus_init prototype
bridge: always declare tunnel functions
atm: hide unused procfs functions
net: isa: include net/Space.h
Revert "ARM: dts: stm32: add CAN support on stm32f746"
netfilter: nft_set_rbtree: fix null deref on element insertion
netfilter: nf_tables: fix nft_trans type confusion
netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
net: wwan: t7xx: Ensure init is completed before system sleep
net: selftests: Fix optstring
net: pcs: xpcs: fix C73 AN not getting enabled
net: wwan: iosm: fix NULL pointer dereference when removing device
vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
mailmap: add entries for Nikolay Aleksandrov
igb: fix bit_shift to be in [1..8] range
net: dsa: mv88e6xxx: Fix mv88e6393x EPC write command offset
cassini: Fix a memory leak in the error handling path of cas_init_one()
tun: Fix memory leak for detached NAPI queue.
can: kvaser_pciefd: Disable interrupts in probe error path
...
This Kselftest fixes update for Linux 6.4-rc3 consists of:
- sgx test fix for false negatives.
- ftrace output is hard to parses and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmRlDzgACgkQCwJExA0N
Qxzlfw//fc+Uq5hC2KUeVOiIAPYejEZ3+ToY+OUjaKvykWndMpYys5bCB68qERm8
vtgzYOKm04ovvnyh2jXzxYTjd8439qeRxr0Eybqc5HjiVDhyA1jm/pcsy3EDBf1+
5QGjgIN2fK9rdv7TmyMHL8pSxLvYQfe2zMjLtULuHZWtjzIY98hh3zjA9mdnn3pK
f2zqEJzu2lysTiwkhDF/yspOa1TymVp1W6dCVvfcxr5wJOWYtZaZyOTYNiCI3gX6
9Qy2fm4txPcsooNrwSEekXUIV669fet9vbQX5PF3pRDipxydarAAUUbTm1hOP4X+
3YnKk9dx/Pg483Go/sGMhtf+/P3ZJlWj2YZSro+maULt8O1Jb8TGe1MB7myheiFc
CiXj5duiG+WlwteF8WMsm58vjplgbnFAfyVZ+mumRhE1e4qLtjnc5YxSdNYoOHOF
40wI1Nebfwofh/qW7/ipn5tx7bN29UUSXRd26q1T4qF3KwKYaaj720lcisS7kBR0
gK8FFCdqroKilJ/Fj3e+AM3xqmlJxornj7EhQoAVtBZCsjN90Wyaw2gaCceABp+f
lDAisdO00v/+fpVhWFjXy4Mww5t6MDsRxmE5gLGOS1QHZUV9/mV7H9e9gd9ix6mV
GXYAeytvRrQQBCDEbOz1ax8SS3eA2MFVdbmMxHYn/gWVx6ClpL4=
=DOlj
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
- sgx test fix for false negatives
- ftrace output is hard to parses and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner
* tag 'linux-kselftest-fixes-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Improve integration with kselftest runner
selftests/sgx: Add "test_encl.elf" to TEST_FILES
The cited commit added a stray colon to the 'v' option. That makes the
option work incorrectly.
ex:
tools/testing/selftests/net# ./fib_nexthops.sh -v
(should enable verbose mode, instead it shows help text due to missing arg)
Fixes: 5feba47273 ("selftests: fib_nexthops: Make ping timeout configurable")
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When building sign-file, the call to get the CFLAGS for libcrypto is
missing white-space between `pkg-config` and `--cflags`:
$(shell $(HOSTPKG_CONFIG)--cflags libcrypto 2> /dev/null)
Removing the redirection of stderr, we see:
$ make -C tools/testing/selftests/bpf sign-file
make: Entering directory '[...]/tools/testing/selftests/bpf'
make: pkg-config--cflags: No such file or directory
SIGN-FILE sign-file
make: Leaving directory '[...]/tools/testing/selftests/bpf'
Add the missing space.
Fixes: fc97590668 ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Link: https://lore.kernel.org/bpf/20230426215032.415792-1-jeremy@azazel.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT4 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
Fixes: 2195444e09 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The srv6_end_dt4_l3vpn_test instantiates a virtual network consisting of
several routers (rt-1, rt-2) and hosts.
When the IPv6 addresses of rt-{1,2} routers are configured, the Deduplicate
Address Detection (DAD) kicks in when enabled in the Linux distros running
the selftests. DAD is used to check whether an IPv6 address is already
assigned in a network. Such a mechanism consists of sending an ICMPv6 Echo
Request and waiting for a reply.
As the DAD process could take too long to complete, it may cause the
failing of some tests carried out by the srv6_end_dt4_l3vpn_test script.
To make the srv6_end_dt4_l3vpn_test more robust, we disable DAD on routers
since we configure the virtual network manually and do not need any address
deduplication mechanism at all.
Fixes: 2195444e09 ("selftests: add selftest for the SRv6 End.DT4 behavior")
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
According to Mirsad the gpio-sim.sh test appears to FAIL in a wrong way
due to missing initialisation of shell variables:
4.2. Bias settings work correctly
cat: /sys/devices/platform/gpio-sim.0/gpiochip18/sim_gpio0/value: No such file or directory
./gpio-sim.sh: line 393: test: =: unary operator expected
bias setting does not work
GPIO gpio-sim test FAIL
After this change the test passed:
4.2. Bias settings work correctly
GPIO gpio-sim test PASS
His testing environment is AlmaLinux 8.7 on Lenovo desktop box with
the latest Linux kernel based on v6.2:
Linux 6.2.0-mglru-kmlk-andy-09238-gd2980d8d8265 x86_64
Suggested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac@alu.unizg.hr>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmRbUnQACgkQ1V2XiooU
IOQNVw//eoCiid3J4TuNdzHaBHXUlZvln1n5Z6K5fIz5ytrY1T7oKC8Y9StkzzWR
29ToFLOJt1iRAcgxsghWRIwUzNwpuUdqgd9cUZMHMQxT0BJItp6FXUql2+1LkF/I
b/gnnb90zyE7lBS/VSRyOiqMiJlP+Som22d7Nn5k2KfTYEdXKwfzjsWAu3W3Sb0s
Lv/MA9DE42qcwiZubmFmDtOtAunPJFZm3HgkcAVeXoNkBDrSfkvxLIMYG6VfFNhQ
AkKMyzX293wpwVxfOuQfJr4QVlxAgOQUko+FqajoWMBtfA3yldZjJ8RC7c9Af9uI
ciOP11vHBCG84KrTabC5kdqOcvadreDiM/oIvk57ztQhCr3e+po+vIz6Cv4p9I30
m5GXfgbtMRl6hM2S5lrRc5fNRkYJHE4aNvesFTGaLpK3LogusH1E9mH5jdjnbU42
TwkIe250qJJelNn9ZxS5Jt0BgyogNfeiA9lOXmaQBpYmwIahrjDf0g8z2I85nPhF
PDukjSXsxi4uHwpSF5wFrlqkAPiEX4vC95uSUTVbbBgZrHhNJdGgu4FJSHRM4mVo
6Awxk2O2bvcDpXEfTBdD4EJF71bZ/aH2i8ddU1oupIl88O01TWTtkO4SYHqMX+tC
fPnpGzBN8KJvHgeFxO4p0v1oelv2RtiITv1gi7YJOnq/+Vd2yy0=
=HmfP
-----END PGP SIGNATURE-----
Merge tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter updates for net
The following patchset contains Netfilter fixes for net:
1) Fix UAF when releasing netnamespace, from Florian Westphal.
2) Fix possible BUG_ON when nf_conntrack is enabled with enable_hooks,
from Florian Westphal.
3) Fixes for nft_flowtable.sh selftest, from Boris Sukholitko.
4) Extend nft_flowtable.sh selftest to cover integration with
ingress/egress hooks, from Florian Westphal.
* tag 'nf-23-05-10' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
selftests: nft_flowtable.sh: check ingress/egress chain too
selftests: nft_flowtable.sh: monitor result file sizes
selftests: nft_flowtable.sh: wait for specific nc pids
selftests: nft_flowtable.sh: no need for ps -x option
selftests: nft_flowtable.sh: use /proc for pid checking
netfilter: conntrack: fix possible bug_on with enable_hooks=1
netfilter: nf_tables: always release netdev hooks from notifier
====================
Link: https://lore.kernel.org/r/20230510083313.152961-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When run the test in netns, it's not easy to get the tc stats via
tc_rule_handle_stats_get(). With the new netns parameter, we can get
stats from specific netns like
num=$(tc_rule_handle_stats_get "dev eth0 ingress" 101 ".packets" "-n ns")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure flowtable interacts correctly with ingress and egress
chains, i.e. those get handled before and after flow table respectively.
Adds three more tests:
1. repeat flowtable test, but with 'ip dscp set cs3' done in
inet forward chain.
Expect that some packets have been mangled (before flowtable offload
became effective) while some pass without mangling (after offload
succeeds).
2. repeat flowtable test, but with 'ip dscp set cs3' done in
veth0:ingress.
Expect that all packets pass with cs3 dscp field.
3. same as 2, but use veth1:egress. Expect the same outcome.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When running nft_flowtable.sh in VM on a busy server we've found that
the time of the netcat file transfers vary wildly.
Therefore replace hardcoded 3 second sleep with the loop checking for
a change in the file sizes. Once no change in detected we test the results.
Nice side effect is that we shave 1 second sleep in the fast case
(hard-coded 3 second sleep vs two 1 second sleeps).
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Doing wait with no parameters may interfere with some of the tests
having their own background processes.
Although no such test is currently present, the cleanup is useful
to rely on the nft_flowtable.sh for local development (e.g. running
background tcpdump command during the tests).
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some ps commands (e.g. busybox derived) have no -x option. For the
purposes of hash calculation of the list of processes this option is
inessential.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Some ps commands (e.g. busybox derived) have no -p option. Use /proc for
pid existence check.
Signed-off-by: Boris Sukholitko <boris.sukholitko@broadcom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ftrace selftests do not currently produce KTAP output, they produce a
custom format much nicer for human consumption. This means that when run in
automated test systems we just get a single result for the suite as a whole
rather than recording results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default, users using ftracetest
directly will continue to get the existing output.
This is not the most elegant solution but it is simple and effective. I
did consider implementing this by post processing the existing output
format but that felt more complex and likely to result in all output being
lost if something goes seriously wrong during the run which would not be
helpful. I did also consider just writing a separate runner script but
there's enough going on with things like the signal handling for that to
seem like it would be duplicating too much.
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx causes false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e6 ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai@intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
We need to better polish building with BPF skels, so revert back to
making it an experimental feature that has to be explicitely enabled
using BUILD_BPF_SKEL=1.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZFbCXwAKCRCyPKLppCJ+
J7cHAP97erKY4hBXArjpfzcvpFmboh/oqhbTLntyIpS6TEnOyQEAyervAPGIjQYC
DCo4foyXmOWn3dhNtK9M+YiRl3o2SgQ=
=7G78
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features requires it, for instance Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.
- Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.
- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.
- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_locked)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
- Update default map size to 16384.
- Allocate single letter option -M for --map-nr-entries, as it is
proving being frequently used.
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSAn.
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparision with each sample.
perf sched:
- Make 'perf sched latency/map/replay' to use "sched:sched_waking"
instead of "sched:sched_waking", consistent with 'perf record'
since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig., so that
you can use alternative binaries, such as llvm's.
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:
https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
- The above caught, for instance, fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforrest events.
- Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics.
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in thesapphirerapids json file:
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
S/390:
- Add common metrics: - CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix return incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
- Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE
ioctl's behavior more similar to KSM's behavior.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZFLsxAAKCRDdBJ7gKXxA
jl8yAQCqjstPsOULf9QN0z4bGAUhY+Wj4ERz1jbKSIuhFCJWiQEAgQvgRXObKjmi
OtUB0Ek4CMDCQzbyIQ1Bhp3kxi6+Jgs=
=AbyC
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull more MM updates from Andrew Morton:
- Some DAMON cleanups from Kefeng Wang
- Some KSM work from David Hildenbrand, to make the PR_SET_MEMORY_MERGE
ioctl's behavior more similar to KSM's behavior.
[ Andrew called these "final", but I suspect we'll have a series fixing
up the fact that the last commit in the dmapools series in the
previous pull seems to have unintentionally just reverted all the
other commits in the same series.. - Linus ]
* tag 'mm-stable-2023-05-03-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm: hwpoison: coredump: support recovery from dump_user_range()
mm/page_alloc: add some comments to explain the possible hole in __pageblock_pfn_to_page()
mm/ksm: move disabling KSM from s390/gmap code to KSM code
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: unmerge and clear VM_MERGEABLE when setting PR_SET_MEMORY_MERGE=0
mm/damon/paddr: fix missing folio_sz update in damon_pa_young()
mm/damon/paddr: minor refactor of damon_pa_mark_accessed_or_deactivate()
mm/damon/paddr: minor refactor of damon_pa_pageout()
1. Don't hard-code pkg-config
2. Remove distro-specific default for CFLAGS
3. Use pkg-config for LDLIBS
Fixes: a50a88f026 ("selftests: netfilter: fix a build error on openSUSE")
Suggested-by: Jan Engelhardt <jengelh@inai.de>
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Let's test whether setting PR_SET_MEMORY_MERGE to 0 after setting it to 1
will unmerge pages, similar to how setting MADV_UNMERGEABLE after setting
MADV_MERGEABLE would.
Link: https://lkml.kernel.org/r/20230422205420.30372-3-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Stefan Roesch <shr@devkernel.io>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
* More phys_to_virt conversions
* Improvement of AP management for VSIE (nested virtualization)
ARM64:
* Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
* New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features
being moved to VMMs rather than be implemented in the kernel.
* Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one.
This last part allows the NV timer code to be implemented on
top.
* A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
* The usual selftest fixes and improvements.
KVM x86 changes for 6.4:
* Optimize CR0.WP toggling by avoiding an MMU reload when TDP is enabled,
and by giving the guest control of CR0.WP when EPT is enabled on VMX
(VMX-only because SVM doesn't support per-bit controls)
* Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long" return
as a bool
* Move AMD_PSFD to cpufeatures.h and purge KVM's definition
* Avoid unnecessary writes+flushes when the guest is only adding new PTEs
* Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s optimizations
when emulating invalidations
* Clean up the range-based flushing APIs
* Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a single
A/D bit using a LOCK AND instead of XCHG, and skip all of the "handle
changed SPTE" overhead associated with writing the entire entry
* Track the number of "tail" entries in a pte_list_desc to avoid having
to walk (potentially) all descriptors during insertion and deletion,
which gets quite expensive if the guest is spamming fork()
* Disallow virtualizing legacy LBRs if architectural LBRs are available,
the two are mutually exclusive in hardware
* Disallow writes to immutable feature MSRs (notably PERF_CAPABILITIES)
after KVM_RUN, similar to CPUID features
* Overhaul the vmx_pmu_caps selftest to better validate PERF_CAPABILITIES
* Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
x86 AMD:
* Add support for virtual NMIs
* Fixes for edge cases related to virtual interrupts
x86 Intel:
* Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if XTILE_DATA is
not being reported due to userspace not opting in via prctl()
* Fix a bug in emulation of ENCLS in compatibility mode
* Allow emulation of NOP and PAUSE for L2
* AMX selftests improvements
* Misc cleanups
MIPS:
* Constify MIPS's internal callbacks (a leftover from the hardware enabling
rework that landed in 6.3)
Generic:
* Drop unnecessary casts from "void *" throughout kvm_main.c
* Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the struct
size by 8 bytes on 64-bit kernels by utilizing a padding hole
Documentation:
* Fix goof introduced by the conversion to rST
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNExkUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroNyjwf+MkzDael9y9AsOZoqhEZ5OsfQYJ32
Im5ZVYsPRU2K5TuoWql6meIihgclCj1iIU32qYHa2F1WYt2rZ72rJp+HoY8b+TaI
WvF0pvNtqQyg3iEKUBKPA4xQ6mj7RpQBw86qqiCHmlfNt0zxluEGEPxH8xrWcfhC
huDQ+NUOdU7fmJ3rqGitCvkUbCuZNkw3aNPR8dhU8RAWrwRzP2hBOmdxIeo81WWY
XMEpJSijbGpXL9CvM0Jz9nOuMJwZwCCBGxg1vSQq0xTfLySNMxzvWZC2GFaBjucb
j0UOQ7yE0drIZDVhd3sdNslubXXU6FcSEzacGQb9aigMUon3Tem9SHi7Kw==
=S2Hq
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm updates from Paolo Bonzini:
"s390:
- More phys_to_virt conversions
- Improvement of AP management for VSIE (nested virtualization)
ARM64:
- Numerous fixes for the pathological lock inversion issue that
plagued KVM/arm64 since... forever.
- New framework allowing SMCCC-compliant hypercalls to be forwarded
to userspace, hopefully paving the way for some more features being
moved to VMMs rather than be implemented in the kernel.
- Large rework of the timer code to allow a VM-wide offset to be
applied to both virtual and physical counters as well as a
per-timer, per-vcpu offset that complements the global one. This
last part allows the NV timer code to be implemented on top.
- A small set of fixes to make sure that we don't change anything
affecting the EL1&0 translation regime just after having having
taken an exception to EL2 until we have executed a DSB. This
ensures that speculative walks started in EL1&0 have completed.
- The usual selftest fixes and improvements.
x86:
- Optimize CR0.WP toggling by avoiding an MMU reload when TDP is
enabled, and by giving the guest control of CR0.WP when EPT is
enabled on VMX (VMX-only because SVM doesn't support per-bit
controls)
- Add CR0/CR4 helpers to query single bits, and clean up related code
where KVM was interpreting kvm_read_cr4_bits()'s "unsigned long"
return as a bool
- Move AMD_PSFD to cpufeatures.h and purge KVM's definition
- Avoid unnecessary writes+flushes when the guest is only adding new
PTEs
- Overhaul .sync_page() and .invlpg() to utilize .sync_page()'s
optimizations when emulating invalidations
- Clean up the range-based flushing APIs
- Revamp the TDP MMU's reaping of Accessed/Dirty bits to clear a
single A/D bit using a LOCK AND instead of XCHG, and skip all of
the "handle changed SPTE" overhead associated with writing the
entire entry
- Track the number of "tail" entries in a pte_list_desc to avoid
having to walk (potentially) all descriptors during insertion and
deletion, which gets quite expensive if the guest is spamming
fork()
- Disallow virtualizing legacy LBRs if architectural LBRs are
available, the two are mutually exclusive in hardware
- Disallow writes to immutable feature MSRs (notably
PERF_CAPABILITIES) after KVM_RUN, similar to CPUID features
- Overhaul the vmx_pmu_caps selftest to better validate
PERF_CAPABILITIES
- Apply PMU filters to emulated events and add test coverage to the
pmu_event_filter selftest
- AMD SVM:
- Add support for virtual NMIs
- Fixes for edge cases related to virtual interrupts
- Intel AMX:
- Don't advertise XTILE_CFG in KVM_GET_SUPPORTED_CPUID if
XTILE_DATA is not being reported due to userspace not opting in
via prctl()
- Fix a bug in emulation of ENCLS in compatibility mode
- Allow emulation of NOP and PAUSE for L2
- AMX selftests improvements
- Misc cleanups
MIPS:
- Constify MIPS's internal callbacks (a leftover from the hardware
enabling rework that landed in 6.3)
Generic:
- Drop unnecessary casts from "void *" throughout kvm_main.c
- Tweak the layout of "struct kvm_mmu_memory_cache" to shrink the
struct size by 8 bytes on 64-bit kernels by utilizing a padding
hole
Documentation:
- Fix goof introduced by the conversion to rST"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (211 commits)
KVM: s390: pci: fix virtual-physical confusion on module unload/load
KVM: s390: vsie: clarifications on setting the APCB
KVM: s390: interrupt: fix virtual-physical confusion for next alert GISA
KVM: arm64: Have kvm_psci_vcpu_on() use WRITE_ONCE() to update mp_state
KVM: arm64: Acquire mp_state_lock in kvm_arch_vcpu_ioctl_vcpu_init()
KVM: selftests: Test the PMU event "Instructions retired"
KVM: selftests: Copy full counter values from guest in PMU event filter test
KVM: selftests: Use error codes to signal errors in PMU event filter test
KVM: selftests: Print detailed info in PMU event filter asserts
KVM: selftests: Add helpers for PMC asserts in PMU event filter test
KVM: selftests: Add a common helper for the PMU event filter guest code
KVM: selftests: Fix spelling mistake "perrmited" -> "permitted"
KVM: arm64: vhe: Drop extra isb() on guest exit
KVM: arm64: vhe: Synchronise with page table walker on MMU update
KVM: arm64: pkvm: Document the side effects of kvm_flush_dcache_to_poc()
KVM: arm64: nvhe: Synchronise with page table walker on TLBI
KVM: arm64: Handle 32bit CNTPCTSS traps
KVM: arm64: nvhe: Synchronise with page table walker on vcpu run
KVM: arm64: vgic: Don't acquire its_lock before config_lock
KVM: selftests: Add test to verify KVM's supported XCR0
...
* cpuset changes including the fix for an incorrect interaction with CPU
hotplug and an optimization.
* Other doc and cosmetic changes.
-----BEGIN PGP SIGNATURE-----
iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZErfng4cdGpAa2VybmVs
Lm9yZwAKCRCxYfJx3gVYGVVtAQCDycK4VSgc4nsFPG1vh1Oy1A723ciEUwAbKmV/
F1n7xwEA68FiDvE29LpMJJuYP9HnX0A5zRMyNnb52kN9jmgcEQI=
=ALol
-----END PGP SIGNATURE-----
Merge tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- cpuset changes including the fix for an incorrect interaction with
CPU hotplug and an optimization
- Other doc and cosmetic changes
* tag 'cgroup-for-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
docs: cgroup-v1/cpusets: update libcgroup project link
cgroup/cpuset: Minor updates to test_cpuset_prs.sh
cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated
cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset
cpuset: Clean up cpuset_node_allowed
cgroup: bpf: use cgroup_lock()/cgroup_unlock() wrappers
* Support for runtime detection of the Svnapot extension.
* Support for Zicboz when clearing pages.
* We've moved to GENERIC_ENTRY.
* Support for !MMU on rv32 systems.
* The linear region is now mapped via huge pages.
* Support for building relocatable kernels.
* Support for the hwprobe interface.
* Various fixes and cleanups throughout the tree.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRL5rcTHHBhbG1lckBk
YWJiZWx0LmNvbQAKCRAuExnzX7sYibpcD/0RnmO+N2OJxsJXf0KtHv4LlChAFaMZ
mfcsU8lv8r3Rz1USJGyVoE57885R+iUw1664ic6Gj9Ll9/A+BDVyqlNeo1BZ7nnv
6hZawSh8XGMyCJoatjaCSMW6VKObsSpHXLoA0mxtj06w1XhtpUnzjv4SZQqBYxC2
7+/cfy6l3uGdSKQ0R402sF8PE+l3HthhO+Cw9NYHQZisAHEQrfFpXRnrovhs+vX0
aVxoWo8bmIhhNke2jh6dnGhfFfAs+UClbaKgZfe8af6feboo+Tal3+OibiEy1K1j
hDQ3w/G5jAdwSqnNPdXzpk4srskUOhP9is8AG79vCasMxybQIBfZcc7/kLmmQX+2
xt1EoDVD/lSO1p+CWRautLXEsInWbpBYaSJie7WcR4SHe8S7/nomTDlwkJHx5cma
mkSYHJKNwCbamDTI3gXg8nrScbxsRnJQsQUolFDwAeRz7AYVwtqVh8VxAWqAdU3q
xUNKrUpCAzNC3d5GL7pmRfZrqjpQhuFXkHFSy85vaCPuckBu926OzxpKBmX4Kea1
qLYWfxv78bcwuY47FWJKcd97Ib63iBYDgarJxvrHrwDaHV2xjBOmdapNPUc2PswT
a938enbYYnJHIbuSmbeNBPF4iF6nKUXshyfZu7tCZl6MzsXloUckGdm++j97Bpvr
g6G3ZP6STSQBmw==
=oxQd
-----END PGP SIGNATURE-----
Merge tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for runtime detection of the Svnapot extension
- Support for Zicboz when clearing pages
- We've moved to GENERIC_ENTRY
- Support for !MMU on rv32 systems
- The linear region is now mapped via huge pages
- Support for building relocatable kernels
- Support for the hwprobe interface
- Various fixes and cleanups throughout the tree
* tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits)
RISC-V: hwprobe: Explicity check for -1 in vdso init
RISC-V: hwprobe: There can only be one first
riscv: Allow to downgrade paging mode from the command line
dt-bindings: riscv: add sv57 mmu-type
RISC-V: hwprobe: Remove __init on probe_vendor_features()
riscv: Use --emit-relocs in order to move .rela.dyn in init
riscv: Check relocations at compile time
powerpc: Move script to check relocations at compile time in scripts/
riscv: Introduce CONFIG_RELOCATABLE
riscv: Move .rela.dyn outside of init to avoid empty relocations
riscv: Prepare EFI header for relocatable kernels
riscv: Unconditionnally select KASAN_VMALLOC if KASAN
riscv: Fix ptdump when KASAN is enabled
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
riscv: Move DTB_EARLY_BASE_VA to the kernel address space
riscv: Rework kasan population functions
riscv: Split early and final KASAN population functions
riscv: Use PUD/P4D/PGD pages for the linear mapping
riscv: Move the linear mapping creation in its own function
riscv: Get rid of riscv_pfn_base variable
...