To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.
Move filter handle initialization earlier.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.
CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.
Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.
Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.
Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Stefan Schmidt says:
====================
pull-request: ieee802154-next 2023-02-20
Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.
Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).
Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
tx path
* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
ieee802154: Drop device trackers
mac802154: Fix an always true condition
mac802154: Send beacons using the MLME Tx path
ieee802154: Change error code on monitor scan netlink request
ieee802154: Convert scan error messages to extack
ieee802154: Use netlink policies when relevant on scan parameters
ieee802154: at86rf230: switch to using gpiod API
ieee802154: at86rf230: drop support for platform data
Revert "at86rf230: convert to gpio descriptors"
cc2520: move to gpio descriptors
mac802154: Avoid superfluous endianness handling
at86rf230: convert to gpio descriptors
mac802154: Handle basic beaconing
ieee802154: Add support for user beaconing requests
mac802154: Handle passive scanning
mac802154: Add MLME Tx locked helpers
mac802154: Prepare forcing specific symbol duration
ieee802154: Introduce a helper to validate a channel
ieee802154: Define a beacon frame header
ieee802154: Add support for user scanning requests
====================
Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Commit 2c02d41d71 ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+/uBgAKCRDbK58LschI
g0ngAPwJHd1RicBuy2C4fLv0nGKZtmYZBAnTGlI2RisPxU6BRwEAwUDLHuc5K6nR
j261okOxOy/MRxdN1NhmR6Qe7nMyQAk=
=tYU+
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-02-17
We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).
The main changes are:
1) Add a rbtree data structure following the "next-gen data structure"
precedent set by recently-added linked-list, that is, by using
kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.
2) Add a new benchmark for hashmap lookups to BPF selftests,
from Anton Protopopov.
3) Fix bpf_fib_lookup to only return valid neighbors and add an option
to skip the neigh table lookup, from Martin KaFai Lau.
4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
accouting for container environments, from Yafang Shao.
5) Batch of ice multi-buffer and driver performance fixes,
from Alexander Lobakin.
6) Fix a bug in determining whether global subprog's argument is
PTR_TO_CTX, which is based on type names which breaks kprobe progs,
from Andrii Nakryiko.
7) Prep work for future -mcpu=v4 LLVM option which includes usage of
BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
from Eduard Zingerman.
8) More prep work for later building selftests with Memory Sanitizer
in order to detect usages of undefined memory, from Ilya Leoshkevich.
9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
dereference via sendmsg(), from Maciej Fijalkowski.
10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.
11) Fix BPF memory allocator in combination with BPF hashtab where it could
corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.
12) Fix LoongArch BPF JIT to always use 4 instructions for function
address so that instruction sequences don't change between passes,
from Hengqi Chen.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
selftests/bpf: Add bpf_fib_lookup test
bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
riscv, bpf: Add bpf trampoline support for RV64
riscv, bpf: Add bpf_arch_text_poke support for RV64
riscv, bpf: Factor out emit_call for kernel and bpf context
riscv: Extend patch_text for multiple instructions
Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
selftests/bpf: Add global subprog context passing tests
selftests/bpf: Convert test_global_funcs test to test_loader framework
bpf: Fix global subprog context argument resolution logic
LoongArch, bpf: Use 4 instructions for function address in JIT
bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
bpf: Disable bh in bpf_test_run for xdp and tc prog
xsk: check IFF_UP earlier in Tx path
Fix typos in selftest/bpf files
selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
...
====================
Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPvfncQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpob2EADXJxcr2jjYHm/7cjKkyuVX8fr80dNdMeuY
JFdsjG1k6Uj73BVhQQWYTcs/PsrWBHWRsv6uz4WgOELj55eXmf5Q0kJszyUeJW33
/DjqLvtoppVcYf80xE13wKvCfn73BjwQo6xkGM0qAYn15eaXiD/Ax3xC6eJlsBeK
PEw7EJyhacbSxZa/1D2B6+mqII1jUQWProTCc3udZ4JHi3WvdWa3Rda0qCqHl4a1
+K2aP2YTFIRPxBzfMNa/CafWVIFubTdht+4Ds6R60RImzB9e0VUBfcsiUyW5Zg7L
Fwv7ptXuWrALwVNdW56Oz1QikBxn2pdRR2HMLwKJW1MD8kP9r8LMm2jV5Rhiwe0B
OQsGRYkOzBvw+bxeP5fvk0iPGVMz6ActH4gkraA5QdLqayDaFYOadlhqz0uRo5SH
Fb42Vl658K/MHDSIk8U58TNkmrsIJsBGohXI9DOGINPPvv3XOPi4Q1HmXkGRmii0
y+lNU/QEGh7xXXew29SPP76uQpQaYfC7NxXCMw/OpOMwehzjsjshmM2lpxi8zsgt
PJUmfHv5qxCplNmTJXmUpmX7sS7550HUdu9FJb13DM+gzKg8bk9jWVuLrzqrVlG5
1hKWEl1+heg1heRfaIuJVLbPI0au6Sb4uqhih/PHyrP9TWIoAruDbDJM65GKTxyE
2uEgcHzHQw==
=poRc
-----END PGP SIGNATURE-----
Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
- NVMe updates via Christoph:
- Small improvements to the logging functionality (Amit Engel)
- Authentication cleanups (Hannes Reinecke)
- Cleanup and optimize the DMA mapping cod in the PCIe driver
(Keith Busch)
- Work around the command effects for Format NVM (Keith Busch)
- Misc cleanups (Keith Busch, Christoph Hellwig)
- Fix and cleanup freeing single sgl (Keith Busch)
- MD updates via Song:
- Fix a rare crash during the takeover process
- Don't update recovery_cp when curr_resync is ACTIVE
- Free writes_pending in md_stop
- Change active_io to percpu
- Updates to drbd, inching us closer to unifying the out-of-tree driver
with the in-tree one (Andreas, Christoph, Lars, Robert)
- BFQ update adding support for multi-actuator drives (Paolo, Federico,
Davide)
- Make brd compliant with REQ_NOWAIT (me)
- Fix for IOPOLL and queue entering, fixing stalled IO waiting on
timeouts (me)
- Fix for REQ_NOWAIT with multiple bios (me)
- Fix memory leak in blktrace cleanup (Greg)
- Clean up sbitmap and fix a potential hang (Kemeng)
- Clean up some bits in BFQ, and fix a bug in the request injection
(Kemeng)
- Clean up the request allocation and issue code, and fix some bugs
related to that (Kemeng)
- ublk updates and fixes:
- Add support for unprivileged ublk (Ming)
- Improve device deletion handling (Ming)
- Misc (Liu, Ziyang)
- s390 dasd fixes (Alexander, Qiheng)
- Improve utility of request caching and fixes (Anuj, Xiao)
- zoned cleanups (Pankaj)
- More constification for kobjs (Thomas)
- blk-iocost cleanups (Yu)
- Remove bio splitting from drivers that don't need it (Christoph)
- Switch blk-cgroups to use struct gendisk. Some of this is now
incomplete as select late reverts were done. (Christoph)
- Add bvec initialization helpers, and convert callers to use that
rather than open-coding it (Christoph)
- Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
Matthew, Ulf, Zhong)
* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
brd: use radix_tree_maybe_preload instead of radix_tree_preload
block: use proper return value from bio_failfast()
block: bio-integrity: Copy flags when bio_integrity_payload is cloned
block: Fix io statistics for cgroup in throttle path
brd: mark as nowait compatible
brd: check for REQ_NOWAIT and set correct page allocation mask
brd: return 0/-error from brd_insert_page()
block: sync mixed merged request's failfast with 1st bio's
Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
Revert "blk-cgroup: pass a gendisk to blkg_lookup"
Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
Revert "blk-cgroup: move the cgroup information to struct gendisk"
nvme-pci: remove iod use_sgls
nvme-pci: fix freeing single sgl
block: ublk: check IO buffer based on flag need_get_data
s390/dasd: Fix potential memleak in dasd_eckd_init()
s390/dasd: sort out physical vs virtual pointers usage
block: Remove the ALLOC_CACHE_SLACK constant
block: make kobj_type structures constant
...
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY+5NlQAKCRCRxhvAZXjc
orOaAP9i2h3OJy95nO2Fpde0Bt2UT+oulKCCcGlvXJ8/+TQpyQD/ZQq47gFQ0EAz
Br5NxeyGeecAb0lHpFz+CpLGsxMrMwQ=
=+BG5
-----END PGP SIGNATURE-----
Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull vfs idmapping updates from Christian Brauner:
- Last cycle we introduced the dedicated struct mnt_idmap type for
mount idmapping and the required infrastucture in 256c8aed2b ("fs:
introduce dedicated idmap type for mounts"). As promised in last
cycle's pull request message this converts everything to rely on
struct mnt_idmap.
Currently we still pass around the plain namespace that was attached
to a mount. This is in general pretty convenient but it makes it easy
to conflate namespaces that are relevant on the filesystem with
namespaces that are relevant on the mount level. Especially for
non-vfs developers without detailed knowledge in this area this was a
potential source for bugs.
This finishes the conversion. Instead of passing the plain namespace
around this updates all places that currently take a pointer to a
mnt_userns with a pointer to struct mnt_idmap.
Now that the conversion is done all helpers down to the really
low-level helpers only accept a struct mnt_idmap argument instead of
two namespace arguments.
Conflating mount and other idmappings will now cause the compiler to
complain loudly thus eliminating the possibility of any bugs. This
makes it impossible for filesystem developers to mix up mount and
filesystem idmappings as they are two distinct types and require
distinct helpers that cannot be used interchangeably.
Everything associated with struct mnt_idmap is moved into a single
separate file. With that change no code can poke around in struct
mnt_idmap. It can only be interacted with through dedicated helpers.
That means all filesystems are and all of the vfs is completely
oblivious to the actual implementation of idmappings.
We are now also able to extend struct mnt_idmap as we see fit. For
example, we can decouple it completely from namespaces for users that
don't require or don't want to use them at all. We can also extend
the concept of idmappings so we can cover filesystem specific
requirements.
In combination with the vfs{g,u}id_t work we finished in v6.2 this
makes this feature substantially more robust and thus difficult to
implement wrong by a given filesystem and also protects the vfs.
- Enable idmapped mounts for tmpfs and fulfill a longstanding request.
A long-standing request from users had been to make it possible to
create idmapped mounts for tmpfs. For example, to share the host's
tmpfs mount between multiple sandboxes. This is a prerequisite for
some advanced Kubernetes cases. Systemd also has a range of use-cases
to increase service isolation. And there are more users of this.
However, with all of the other work going on this was way down on the
priority list but luckily someone other than ourselves picked this
up.
As usual the patch is tiny as all the infrastructure work had been
done multiple kernel releases ago. In addition to all the tests that
we already have I requested that Rodrigo add a dedicated tmpfs
testsuite for idmapped mounts to xfstests. It is to be included into
xfstests during the v6.3 development cycle. This should add a slew of
additional tests.
* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
shmem: support idmapped mounts for tmpfs
fs: move mnt_idmap
fs: port vfs{g,u}id helpers to mnt_idmap
fs: port fs{g,u}id helpers to mnt_idmap
fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
fs: port i_{g,u}id_{needs_}update() to mnt_idmap
quota: port to mnt_idmap
fs: port privilege checking helpers to mnt_idmap
fs: port inode_owner_or_capable() to mnt_idmap
fs: port inode_init_owner() to mnt_idmap
fs: port acl to mnt_idmap
fs: port xattr to mnt_idmap
fs: port ->permission() to pass mnt_idmap
fs: port ->fileattr_set() to pass mnt_idmap
fs: port ->set_acl() to pass mnt_idmap
fs: port ->get_acl() to pass mnt_idmap
fs: port ->tmpfile() to pass mnt_idmap
fs: port ->rename() to pass mnt_idmap
fs: port ->mknod() to pass mnt_idmap
fs: port ->mkdir() to pass mnt_idmap
...
I'm guessing that the warning fired because there's some code path
that is called on module unload where the gss_krb5_enctypes file
was never set up.
name 'gss_krb5_enctypes'
WARNING: CPU: 0 PID: 6187 at fs/proc/generic.c:712 remove_proc_entry+0x38d/0x460 fs/proc/generic.c:712
destroy_krb5_enctypes_proc_entry net/sunrpc/auth_gss/svcauth_gss.c:1543 [inline]
gss_svc_shutdown_net+0x7d/0x2b0 net/sunrpc/auth_gss/svcauth_gss.c:2120
ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
setup_net+0x9bd/0xe60 net/core/net_namespace.c:356
copy_net_ns+0x320/0x6b0 net/core/net_namespace.c:483
create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
copy_namespaces+0x410/0x500 kernel/nsproxy.c:179
copy_process+0x311d/0x76b0 kernel/fork.c:2272
kernel_clone+0xeb/0x9a0 kernel/fork.c:2684
__do_sys_clone+0xba/0x100 kernel/fork.c:2825
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Reported-by: syzbot+04a8437497bcfb4afa95@syzkaller.appspotmail.com
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There's no need for the cost of this extra virtual function call
during every RPC transaction: the RQ_SECURE bit can be set properly
in ->xpo_recvfrom() instead.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
With the KUnit infrastructure recently added, we are free to define
other unit tests particular to our implementation. As an example,
I've added a self-test that encrypts then decrypts a string, and
checks the result.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 8009 provides sample encryption results. Add KUnit tests to
ensure our implementation derives the expected results for the
provided sample input.
I hate how large this test is, but using non-standard key usage
values means rfc8009_encrypt_case() can't simply reuse ->import_ctx
to allocate and key its ciphers; and the test provides its own
confounders, which means krb5_etm_encrypt() can't be used directly.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 8009 provides sample checksum results. Add KUnit tests to ensure
our implementation derives the expected results for the provided
sample input.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 8009 provides sample key derivation results, so Kunit tests are
added to ensure our implementation derives the expected keys for the
provided sample input.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The Camellia enctypes use a new KDF, so add some tests to ensure it
is working properly.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add Kunit tests for ENCTYPE_AES128_CTS_HMAC_SHA1_96. The test
vectors come from RFC 3962 Appendix B.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 3961 Appendix A provides tests for the KDF specified in that
document as well as other parts of Kerberos. The other three usage
scenarios in Section 10 are not implemented by the Linux kernel's
RPCSEC GSS Kerberos 5 mechanism, so tests are not added for those.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
I plan to add KUnit tests that will need enctype profile
information. Export the enctype profile lookup function.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The Kerberos RFCs provide test vectors to verify the operation of
an implementation. Introduce a KUnit test framework to exercise the
Linux kernel's implementation of Kerberos.
Start with test cases for the RFC 3961-defined n-fold function. The
sample vectors for that are found in RFC 3961 Section 10.
Run the GSS Kerberos 5 mechanism's unit tests with this command:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./net/sunrpc/.kunitconfig
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The goal is to leave only protocol-defined items in gss_krb5.h so
that it can be easily replaced by a generic header. Implementation
specific items are moved to the new internal header.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add the RFC 6803 encryption types to the string of integers that is
reported to gssd during upcalls. This enables gssd to utilize keys
with these encryption types when support for them is built into the
kernel.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The Camellia enctypes use the KDF_FEEDBACK_CMAC Key Derivation
Function defined in RFC 6803 Section 3.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 6803 defines two encryption types that use Camellia ciphers (RFC
3713) and CMAC digests. Implement support for those in SunRPC's GSS
Kerberos 5 mechanism.
There has not been an explicit request to support these enctypes.
However, this new set of enctypes provides a good alternative to the
AES-SHA1 enctypes that are to be deprecated at some point.
As this implementation is still a "beta", the default is to not
build it automatically.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Add the RFC 8009 encryption types to the string of integers that is
reported to gssd during upcalls. This enables gssd to utilize keys
with these encryption types when support for them is built into the
kernel.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=400
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 8009 enctypes use different crypt formulae than previous
Kerberos 5 encryption types. Section 1 of RFC 8009 explains the
reason for this change:
> The new types conform to the framework specified in [RFC3961],
> but do not use the simplified profile, as the simplified profile
> is not compliant with modern cryptographic best practices such as
> calculating Message Authentication Codes (MACs) over ciphertext
> rather than plaintext.
Add new .encrypt and .decrypt functions to handle this variation.
The new approach described above is referred to as Encrypt-then-MAC
(or EtM). Hence the names of the new functions added here are
prefixed with "krb5_etm_".
A critical second difference with previous crypt formulae is that
the cipher state is included in the computed HMAC. Note however that
for RPCSEC, the initial cipher state is easy to compute on both
initiator and acceptor because it is always all zeroes.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The RFC 8009 encryption types use a different key derivation
function than the RFC 3962 encryption types. The new key derivation
function is defined in Section 3 of RFC 8009.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Fill in entries in the supported_gss_krb5_enctypes array for the
encryption types defined in RFC 8009. These new enctypes use the
SHA-256 and SHA-384 message digest algorithms (as defined in
FIPS-180) instead of the deprecated SHA-1 algorithm, and are thus
more secure.
Note that NIST has scheduled SHA-1 for deprecation:
https://www.nist.gov/news-events/news/2022/12/nist-retires-sha-1-cryptographic-algorithm
Thus these new encryption types are placed under a separate CONFIG
option to enable distributors to separately introduce support for
the AES-SHA2 enctypes and deprecate support for the current set of
AES-SHA1 encryption types as their user space allows.
As this implementation is still a "beta", the default is to not
build it automatically.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cryptosystem profile enctypes all use cipher block chaining
with ciphertext steal (CBC-with-CTS). However enctypes that are
currently supported in the Linux kernel SunRPC implementation
use only the encrypt-&-MAC approach. The RFC 8009 enctypes use
encrypt-then-MAC, which performs encryption and checksumming in
a different order.
Refactor to make it possible to share the CBC with CTS encryption
and decryption mechanisms between e&M and etM enctypes.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The aes256-cts-hmac-sha384-192 enctype specifies the length of its
checksum and integrity subkeys as 192 bits, but the length of its
encryption subkey (Ke) as 256 bits. Add new fields to struct
gss_krb5_enctype that specify the key lengths individually, and
where needed, use the correct new field instead of ->keylength.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Although the Kerberos specs have always listed separate subkey
lengths, the Linux kernel's SunRPC GSS Kerberos enctype profiles
assume the base key and the derived keys have identical lengths.
The aes256-cts-hmac-sha384-192 enctype specifies the length of its
checksum and integrity subkeys as 192 bits, but the length of its
encryption subkey (Ke) as 256 bits.
To support that enctype, parametrize context_v2_alloc_cipher() so
that each of its call sites can pass in its desired key length. For
now it will be the same length as before (gk5e->keylength), but a
subsequent patch will change this.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
De-duplicate some common code.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Each Kerberos enctype can have a different KDF. Refactor the key
derivation path to support different KDFs for the enctypes
introduced in subsequent patches.
In particular, expose the key derivation function in struct
gss_krb5_enctype instead of the enctype's preferred random-to-key
function. The latter is usually the identity function and is only
ever called during key derivation, so have each KDF call it
directly.
A couple of extra clean-ups:
- Deduplicate the set_cdata() helper
- Have ->derive_key return negative errnos, in accordance with usual
kernel coding conventions
This patch is a little bigger than I'd like, but these are all
mechanical changes and they are all to the same areas of code. No
behavior change is intended.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: there is now only one encrypt and only one decrypt method,
thus there is no longer a need for the v2-suffixed method names.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: ->encrypt is set to only one value. Replace the two
remaining call sites with direct calls to krb5_encrypt().
There have never been any call sites for the ->decrypt() method.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Because the DES block cipher has been deprecated by Internet
standard, highly secure configurations might require that DES
support be blacklisted or not installed. NFS Kerberos should still
be able to work correctly with only the AES-based enctypes in that
situation.
Also note that MIT Kerberos has begun a deprecation process for DES
encryption types. Their README for 1.19.3 states:
> Beginning with the krb5-1.19 release, a warning will be issued
> if initial credentials are acquired using the des3-cbc-sha1
> encryption type. In future releases, this encryption type will
> be disabled by default and eventually removed.
>
> Beginning with the krb5-1.18 release, single-DES encryption
> types have been removed.
Aside from the CONFIG option name change, there are two important
policy changes:
1. The 'insecure enctype' group is now disabled by default.
Distributors have to take action to enable support for deprecated
enctypes. Implementation of these enctypes will be removed in a
future kernel release.
2. des3-cbc-sha1 is now considered part of the 'insecure enctype'
group, having been deprecated by RFC 8429, and is thus disabled
by default
After this patch is applied, SunRPC support can be built with
Kerberos 5 support but without CRYPTO_DES enabled in the kernel.
And, when these enctypes are disabled, the Linux kernel's SunRPC
RPCSEC GSS implementation fully complies with BCP 179 / RFC 6649
and BCP 218 / RFC 8429.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that all consumers of the KRB5_SUPPORTED_ENCTYPES macro are
within the SunRPC layer, the macro can be replaced with something
private and more flexible.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
I would like to replace the KRB5_SUPPORTED_ENCTYPES macro so that
there is finer granularity about what enctype support is built in
to the kernel and then advertised by it.
The /proc/fs/nfsd/supported_krb5_enctypes file is a legacy API
that advertises supported enctypes to rpc.svcgssd (I think?). It
simply prints the value of the KRB5_SUPPORTED_ENCTYPES macro, so it
will need to be replaced with something that can instead display
exactly which enctypes are configured and built into the SunRPC
layer.
Completely decommissioning such APIs is hard. Instead, add a file
that is managed by SunRPC's GSS Kerberos mechanism, which is
authoritative about enctype support status. A subsequent patch will
replace /proc/fs/nfsd/supported_krb5_enctypes with a symlink to this
new file.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace another switch on encryption type so that it does not have
to be modified when adding or removing support for an enctype.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace a number of switches on encryption type so that all of them don't
have to be modified when adding or removing support for an enctype.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There's no need to keep the integrity keys around if we instead
allocate and key a pair of ahashes and keep those. This not only
enables the subkeys to be destroyed immediately after deriving
them, but it makes the Kerberos integrity code path more efficient.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There's no need to keep the signing keys around if we instead allocate
and key an ahash and keep that. This not only enables the subkeys to
be destroyed immediately after deriving them, but it makes the
Kerberos signing code path more efficient.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The encryption subkeys are not used after the cipher transforms have
been allocated and keyed. There is no need to retain them in struct
krb5_ctx.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Hoist the name of the aux_cipher into struct gss_krb5_enctype to
prepare for obscuring the encryption keys just after they are
derived.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
ctx->Ksess is never used after import has completed. Obscure it
immediately so it cannot be re-used or copied.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Other common Kerberos implementations use a fully random confounder
for encryption. The reason for this is explained in the new comment
added by this patch. The current get_random_bytes() implementation
does not exhaust system entropy.
Since confounder generation is part of Kerberos itself rather than
the GSS-API Kerberos mechanism, the function is renamed and moved.
Note that light top-down analysis shows that the SHA-1 transform
is by far the most CPU-intensive part of encryption. Thus we do not
expect this change to result in a significant performance impact.
However, eventually it might be necessary to generate an independent
stream of confounders for each Kerberos context to help improve I/O
parallelism.
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that arcfour-hmac is gone, the confounder length is again the
same as the cipher blocksize for every implemented enctype. The
gss_krb5_enctype::conflen field is no longer necessary.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It is not clear from documenting comments, specifications, or code
usage what value the gss_krb5_enctype.blocksize field is supposed
to store. The "encryption blocksize" depends only on the cipher
being used, so that value can be derived where it's needed instead
of stored as a constant.
RFC 3961 Section 5.2 says:
> cipher block size, c
> This is the block size of the block cipher underlying the
> encryption and decryption functions indicated above, used for key
> derivation and for the size of the message confounder and initial
> vector. (If a block cipher is not in use, some comparable
> parameter should be determined.) It must be at least 5 octets.
>
> This is not actually an independent parameter; rather, it is a
> property of the functions E and D. It is listed here to clarify
> the distinction between it and the message block size, m.
In the Linux kernel's implemenation of the SunRPC RPCSEC GSS
Kerberos 5 mechanism, the cipher block size, which is dependent on
the encryption and decryption transforms, is used only in
krb5_derive_key(), so it is straightforward to replace it.
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Eliminate the use of bus-locked operations in svc_xprt_enqueue(),
which is a hot path. Replace them with per-cpu variables to reduce
cross-CPU memory bus traffic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that svcauth_gss_prepare_to_wrap() no longer computes the
location of RPC header fields in the response buffer,
svcauth_gss_accept() can save the location of the databody
rather than the location of the verifier.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
To navigate around the space that svcauth_gss_accept() reserves
for the RPC payload body length and sequence number fields,
svcauth_gss_release() does a little dance with the reply's
accept_stat, moving the accept_stat value in the response buffer
down by two words.
Instead, let's have the ->accept() methods each set the proper
final location of the accept_stat to avoid having to move
things.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Currently, svcauth_gss_accept() pre-reserves response buffer space
for the RPC payload length and GSS sequence number before returning
to the dispatcher, which then adds the header's accept_stat field.
The problem is the accept_stat field is supposed to go before the
length and seq_num fields. So svcauth_gss_release() has to relocate
the accept_stat value (see svcauth_gss_prepare_to_wrap()).
To enable these fields to be added to the response buffer in the
correct (final) order, the pointer to the accept_stat has to be made
available to svcauth_gss_accept() so that it can set it before
reserving space for the length and seq_num fields.
As a first step, move the pointer to the location of the accept_stat
field into struct svc_rqst.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The main part of RPC header encoding and the formation of error
responses are now done using the xdr_stream helpers. Bounds checking
before each XDR data item is encoded makes the server's encoding
path safer against accidental buffer overflows.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that each ->accept method has been converted, the
svcxdr_init_encode() calls can be hoisted back up into the generic
RPC server code.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header encoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This code constructs replies to the decorated NULL procedure calls
that establish GSS contexts. Convert this code path to use struct
xdr_stream to encode such responses.
Done as part of hardening the server-side RPC header encoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
We're now moving svcxdr_init_encode() to /before/ the flavor's
->accept method has set rq_auth_slack. Add a helper that can
set rq_auth_slack /after/ svcxdr_init_encode() has been called.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header encoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header encoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header encoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Refactor: So that the overhaul of each ->accept method can be done
in separate smaller patches, temporarily move the
svcxdr_init_encode() call into those methods.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that all vs_dispatch functions invoke svcxdr_init_encode(), it
is common code and can be pushed down into the generic RPC server.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 5531 defines an MSG_ACCEPTED Reply message like this:
struct accepted_reply {
opaque_auth verf;
union switch (accept_stat stat) {
case SUCCESS:
...
In the current server code, struct opaque_auth encoding is open-
coded. Introduce a helper that encodes an opaque_auth data item
within the context of a xdr_stream.
Done as part of hardening the server-side RPC header decoding and
encoding paths.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There's no RPC header field called rpc_stat; more precisely, the
variable appears to be recording an accept_stat value. But it looks
like we don't need to preserve this value at all, actually, so
simply remove the variable.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Commit 5b304bc5bf ("[PATCH] knfsd: svcrpc: gss: fix failure on
SVC_DENIED in integrity case") added a check to prevent wrapping an
RPC response if reply_stat == MSG_DENIED, assuming that the only way
to get to svcauth_gss_release() with that reply_stat value was if
the reject_stat was AUTH_ERROR (reject_stat == MISMATCH is handled
earlier in svc_process_common()).
The code there is somewhat confusing. For one thing, rpc_success is
an accept_stat value, not a reply_stat value. The correct reply_stat
value to look for is RPC_MSG_DENIED. It happens to be the same value
as rpc_success, so it all works out, but it's not terribly readable.
Since commit 438623a06b ("SUNRPC: Add svc_rqst::rq_auth_stat"),
the actual auth_stat value is stored in the svc_rqst, so that value
is now available to svcauth_gss_prepare_to_wrap() to make its
decision to wrap, based on direct information about the
authentication status of the RPC caller.
No behavior change is intended, this simply replaces some old code
with something that should be more self-documenting.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Actually xdr_stream does not add value here because of how
gss_wrap() works. This is just a clean-up patch.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Simplify the references to the head and tail iovecs for readability.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Match the error reporting in the other unwrap and wrap functions.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up variable names to match the other unwrap and wrap
functions.
Additionally, the explicit type cast on @gsd in unnecessary; and
@resbuf is renamed to match the variable naming in the unwrap
functions.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Replace finicky logic: Instead of trying to find scratch space in
the response buffer, use the scratch buffer from struct
gss_svc_data.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
An error computing the checksum here is an exceptional event.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: To help orient readers, name the stack variables to match
the XDR field names.
Additionally, the explicit type cast on @gsd is unnecessary; and
@resbuf is renamed to match the variable naming in the unwrap
functions.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that upper layers use an xdr_stream to track the construction
of each RPC Reply message, resbuf->len is kept up-to-date
automatically. There's no need to recompute it in svc_gss_release().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now the entire RPC Call header parsing path is handled via struct
xdr_stream-based decoders.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: With xdr_stream decoding, the @argv parameter is no longer
used.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: Saving the RPC program number in two places is
unnecessary.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that each ->accept method has been converted to use xdr_stream,
the svcxdr_init_decode() calls can be hoisted back up into the
generic RPC server code.
The dprintk in svc_authenticate() is removed, since
trace_svc_authenticate() reports the same information.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Micro-optimizations:
1. The value of rqstp->rq_auth_stat is replaced no matter which
arm of the switch is taken, so the initial assignment can be
safely removed.
2. Avoid checking the value of gc->gc_proc twice in the I/O
(RPC_GSS_PROC_DATA) path.
The cost is a little extra code redundancy.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: To help orient readers, name the stack variables to match
the XDR field names.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: To help orient readers, name the stack variables to match
the XDR field names.
For readability, I'm also going to rename the unwrap and wrap
functions in a consistent manner, starting with unwrap_integ_data().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up / code de-duplication - this functionality is already
available in the generic XDR layer.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The entire RPC_GSS_PROC_INIT path is converted over to xdr_stream
for decoding the Call credential and verifier.
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
gss_read_verf() is already short. Fold it into its only caller.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
gss_read_common_verf() is now just a wrapper for dup_netobj(), thus
it can be replaced with direct calls to dup_netobj().
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Since upcalls are infrequent, ensure the compiler places the upcall
mechanism out-of-line from the I/O path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Since the server-side of the Linux kernel SunRPC implementation
ignores the contents of the Call's machinename field, there's no
need for its RPC_AUTH_UNIX authenticator to reject names that are
larger than UNX_MAXNODENAME.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Done as part of hardening the server-side RPC header decoding path.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
RFC 5531 defines the body of an RPC Call message like this:
struct call_body {
unsigned int rpcvers;
unsigned int prog;
unsigned int vers;
unsigned int proc;
opaque_auth cred;
opaque_auth verf;
/* procedure-specific parameters start here */
};
In the current server code, decoding a struct opaque_auth type is
open-coded in several places, and is thus difficult to harden
everywhere.
Introduce a helper for decoding an opaque_auth within the context
of a xdr_stream. This helper can be shared with all authentication
flavor implemenations, even on the client-side.
Done as part of hardening the server-side RPC header decoding paths.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Refactor: So that the overhaul of each ->accept method can be done
in separate smaller patches, temporarily move the
svcxdr_init_decode() call into those methods.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that all vs_dispatch functions invoke svcxdr_init_decode(), it
is common code and can be pushed down into the generic RPC server.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This is a followup of commit 2558b8039d ("net: use a bounce
buffer for copying skb->mark")
x86 and powerpc define user_access_begin, meaning
that they are not able to perform user copy checks
when using user_write_access_begin() / unsafe_copy_to_user()
and friends [1]
Instead of waiting bugs to trigger on other arches,
add a check_object_size() in put_cmsg() to make sure
that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y
will perform more security checks.
[1] We can not generically call check_object_size() from
unsafe_copy_to_user() because UACCESS is enabled at this point.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The recent merge from net left-over some unused code in
leftover.c - nomen omen.
Just drop the unused bits.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
That really was meant to be a per netns attribute from the beginning.
The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.
To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Add safeguard to check for NULL tupe in objects updates via
NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari.
2) Incorrect pointer check in the new destroy rule command,
from Yang Yingliang.
3) Incorrect status bitcheck in nf_conntrack_udp_packet(),
from Florian Westphal.
4) Simplify seq_print_acct(), from Ilia Gavrilov.
5) Use 2-arg optimal variant of kfree_rcu() in IPVS,
from Julian Anastasov.
6) TCP connection enters CLOSE state in conntrack for locally
originated TCP reset packet from the reject target,
from Florian Westphal.
The fixes#2 and #3 in this series address issues from the previous pull
nf-next request in this net-next cycle.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Change icmpv6_echo_reply() to return a drop reason.
For the moment, return NOT_SPECIFIED or SKB_CONSUMED.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hosts can often receive neighbour discovery messages
that are not for them.
Use a dedicated drop reason to make clear the packet is dropped
for this normal case.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a generic drop reason for any error detected
in ndisc_parse_options().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change ndisc_redirect_rcv() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and values from icmpv6_notify().
More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change ndisc_router_discovery() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and SKB_CONSUMED.
More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change ndisc_recv_rs() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change ndisc_recv_na() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change ndisc_recv_ns() to return a drop reason.
For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we try to start AF_XDP on some machines with long running time, due
to the machine's memory fragmentation problem, there is no sufficient
contiguous physical memory that will cause the start failure.
If the size of the queue is 8 * 1024, then the size of the desc[] is
8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is
16page+. This is necessary to apply for a 4-order memory. If there are a
lot of queues, it is difficult to these machine with long running time.
Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but
we only use 17 pages.
This patch replaces __get_free_pages() by vmalloc() to allocate memory
to solve these problems.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a certain probability that following
exceptions will occur in the wrk benchmark test:
Running 10s test @ http://11.213.45.6:80
8 threads and 64 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 3.72ms 13.94ms 245.33ms 94.17%
Req/Sec 1.96k 713.67 5.41k 75.16%
155262 requests in 10.10s, 23.10MB read
Non-2xx or 3xx responses: 3
We will find that the error is HTTP 400 error, which is a serious
exception in our test, which means the application data was
corrupted.
Consider the following scenarios:
CPU0 CPU1
buf_desc->used = 0;
cmpxchg(buf_desc->used, 0, 1)
deal_with(buf_desc)
memset(buf_desc->cpu_addr,0);
This will cause the data received by a victim connection to be cleared,
thus triggering an HTTP 400 error in the server.
This patch exchange the order between clear used and memset, add
barrier to ensure memory consistency.
Fixes: 1c5526968e ("net/smc: Clear memory when release and reuse buffer")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is a certain chance to trigger the following panic:
PID: 5900 TASK: ffff88c1c8af4100 CPU: 1 COMMAND: "kworker/1:48"
#0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7
#1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a
#2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60
#3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7
#4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715
#5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654
#6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62
[exception RIP: ib_alloc_mr+19]
RIP: ffffffffc0c9cce3 RSP: ffff9456c1cc7c38 RFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000004
RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88c1ea281d00 R8: 000000020a34ffff R9: ffff88c1350bbb20
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000010 R14: ffff88c1ab040a50 R15: ffff88c1ea281d00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc]
#8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc]
#9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc]
The reason here is that when the server tries to create a second link,
smc_llc_srv_add_link() has no protection and may add a new link to
link group. This breaks the security environment protected by
llc_conf_mutex.
Fixes: 2d2209f201 ("net/smc: first part of add link processing as SMC server")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It makes no sense to keep randomly large max_sdu values, especially if
larger than the device's max_mtu. These are visible in "tc qdisc show".
Such a max_sdu is practically unlimited and will cause no packets for
that traffic class to be dropped on enqueue.
Just set max_sdu_dynamic to U32_MAX, which in the logic below causes
taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user
space of 0 (unlimited).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The overhead specified in the size table comes from the user. With small
time intervals (or gates always closed), the overhead can be larger than
the max interval for that traffic class, and their difference is
negative.
What we want to happen is for max_sdu_dynamic to have the smallest
non-zero value possible (1) which means that all packets on that traffic
class are dropped on enqueue. However, since max_sdu_dynamic is u32, a
negative is represented as a large value and oversized dropping never
happens.
Use max_t with int to force a truncation of max_frm_len to no smaller
than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no
smaller than 1.
Fixes: fed87cc671 ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
taprio_calculate_gate_durations() depends on netdev_get_num_tc() and
this returns 0. So it calculates the maximum gate durations for no
traffic class.
I had tested the blamed commit only with another patch in my tree, one
which in the end I decided isn't valuable enough to submit ("net/sched:
taprio: mask off bits in gate mask that exceed number of TCs").
The problem is that having this patch threw off my testing. By moving
the netdev_set_num_tc() call earlier, we implicitly gave to
taprio_calculate_gate_durations() the information it needed.
Extract only the portion from the unsubmitted change which applies the
mqprio configuration to the netdev earlier.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/
Fixes: a306a90c8f ("net/sched: taprio: calculate tc gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Fix three cases of overproduction of wakeups:
(1) rxrpc_input_split_jumbo() conditionally notifies the app that there's
data for recvmsg() to collect if it queues some data - and then its
only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway.
Fix the rxrpc_input_data() to only do the wakeup in failure cases.
(2) If a DATA packet is received for a call by the I/O thread whilst
recvmsg() is busy draining the call's rx queue in the app thread, the
call will left on the recvmsg() queue for recvmsg() to pick up, even
though there isn't any data on it.
This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR
set after the reply has been posted to a service call.
Fix this by discarding pending calls from the recvmsg() queue that
don't need servicing yet.
(3) Not-yet-completed calls get requeued after having data read from them,
even if they have no data to read.
Fix this by only requeuing them if they have data waiting on them; if
they don't, the I/O thread will requeue them when data arrives or they
fail.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
In order to prevent a device from disappearing when a background job was
started, dev_hold() and dev_put() calls were made. During the
stabilization phase of the scan/beacon features, it was later decided
that removing the device while a background job was ongoing was a valid use
case, and we should instead stop the background job and then remove the
device, rather than prevent the device from being removed. This is what
is currently done, which means manually reference counting the device
during background jobs is no longer needed.
Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b ("ieee802154: Add support for user beaconing requests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-7-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
At this stage we simply do not care about the delayed work value,
because active scan is not yet supported, so we can blindly queue
another work once a beacon has been sent.
It fixes a smatch warning:
mac802154_beacon_worker() warn: always true condition
'(local->beacon_interval >= 0) => (0-u32max >= 0)'
Fixes: 3accf47627 ("mac802154: Handle basic beaconing")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-6-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Using ieee802154_subif_start_xmit() to bypass the net queue when
sending beacons is broken because it does not acquire the
HARD_TX_LOCK(), hence not preventing datagram buffers to be smashed by
beacons upon contention situation. Using the mlme_tx helper is not the
best fit either but at least it is not buggy and has little-to-no
performance hit. More details are given in the comment explaining this
choice in the code.
Fixes: 3accf47627 ("mac802154: Handle basic beaconing")
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-5-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Returning EPERM gives the impression that "right now" it is not
possible, but "later" it could be, while what we want to express is the
fact that this is not currently supported at all (might change in the
future). So let's return EOPNOTSUPP instead.
Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Suggested-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-4-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Instead of printing error messages in the kernel log, let's use extack.
When there is a netlink error returned that could be further specified
with a string, use extack as well.
Apply this logic to the very recent scan/beacon infrastructure.
Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b ("ieee802154: Add support for user beaconing requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-3-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Instead of open-coding scan parameters (page, channels, duration, etc),
let's use the existing NLA_POLICY* macros. This help greatly reducing
the error handling and clarifying the overall logic.
Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-2-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.
In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.
This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always call
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is skipped together when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac also.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
The bpf_fib_lookup() helper does not only look up the fib (ie. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zero in the eth header.
This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev
Some of the bpf helpers require bh disabled. eg. The bpf_fib_lookup
helper that will be used in a latter selftest. In particular, it
calls ___neigh_lookup_noref that expects the bh disabled.
This patch disables bh before calling bpf_prog_run[_xdp], so
the testing prog can also use those helpers.
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-2-martin.lau@linux.dev
iptables/nftables support responding to tcp packets with tcp resets.
The generated tcp reset packet passes through both output and postrouting
netfilter hooks, but conntrack will never see them because the generated
skb has its ->nfct pointer copied over from the packet that triggered the
reset rule.
If the reset rule is used for established connections, this
may result in the conntrack entry to be around for a very long
time (default timeout is 5 days).
One way to avoid this would be to not copy the nf_conn pointer
so that the rest packet passes through conntrack too.
Problem is that output rules might not have the same conntrack
zone setup as the prerouting ones, so its possible that the
reset skb won't find the correct entry. Generating a template
entry for the skb seems error prone as well.
Add an explicit "closing" function that switches a confirmed
conntrack entry to closed state and wire this up for tcp.
If the entry isn't confirmed, no action is needed because
the conntrack entry will never be committed to the table.
Reported-by: Russel King <linux@armlinux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* EHT channel puncturing support (client & AP)
* some support for AP MLD without mac80211
* fixes for A-MSDU on mesh connections
Major driver changes:
iwlwifi
* EHT rate reporting
* Bump FW API to 74 for AX devices
* STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware
mt76
* switch to using page pool allocator
* mt7996 EHT (Wi-Fi 7) support
* Wireless Ethernet Dispatch (WED) reset support
libertas
* WPS enrollee support
brcmfmac
* Rename Cypress 89459 to BCM4355
* BCM4355 and BCM4377 support
mwifiex
* SD8978 chipset support
rtl8xxxu
* LED support
ath12k
* new driver for Qualcomm Wi-Fi 7 devices
ath11k
* IPQ5018 support
* Fine Timing Measurement (FTM) responder role support
* channel 177 support
ath10k
* store WLAN firmware version in SMEM image table
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmPuCvYACgkQ10qiO8sP
aADG4g/+NkmZHdlQVaWHRnv+hm6/BOODJDpe+tzSqP+kUKFuKWJA6k2RozV6iz2L
SLpOefU9Wq05xGwP2BfVPGdYrD9eEoPrToN0t6up85iPyJurtd2yg7fiOt7vLk0K
UahZQd+hnp2N2wDFH+pJn6xWaQvO1ZM46i2ufvxGKEOuRyDiq3abKeba4UWxCOWF
Zw+3rnfhvki1I4ZBLWuEBHrRoTPYfU7s7bkV2k2GdKJhhTMbojFkKPUMhO8XMba0
NxT5NcSCsQR3kBLbms/Wy1Q0f6G/rK8O3N0QLaOTpT3UMLLkgKLMHyjU+MNiQ6rH
G6aC5T9U6br8t+xXMFUmkvdI2jnbZ7nHYDBKrs9kGqq0lfbp2Xjifxu2882qOOFh
4OfVrnoUQQOrl8RGgwvs6/8gpoXSIsb4M03G3xD5MVwFncTs9V5ehMnSUSkzgrdK
XN2RijXN4V3SULGXyR8MqsS8O70RK6QjHarJtTClzdIHoHTet7NqSpnx5stuF6+K
MbZD0C10z1sgiUA591wrfwINFLO1xQKYDsUlds2xoaXi3zb6LiEgfyNUUouN//aj
dJ/YEtnwc4qCGqv2SeAbkjj0T9jGS3njziUp2tQyi2GGZB8bSLcb9Vz96+o3cjLC
V4AXk7o4y3XurCN+gj2VL9tqNigJqyHMRbb1HxxyPskb1BXOOJo=
=p6+A
-----END PGP SIGNATURE-----
Merge tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Johannes Berg says:
====================
Major stack changes:
* EHT channel puncturing support (client & AP)
* some support for AP MLD without mac80211
* fixes for A-MSDU on mesh connections
Major driver changes:
iwlwifi
* EHT rate reporting
* Bump FW API to 74 for AX devices
* STEP equalizer support: transfer some STEP (connection to radio
on platforms with integrated wifi) related parameters from the
BIOS to the firmware
mt76
* switch to using page pool allocator
* mt7996 EHT (Wi-Fi 7) support
* Wireless Ethernet Dispatch (WED) reset support
libertas
* WPS enrollee support
brcmfmac
* Rename Cypress 89459 to BCM4355
* BCM4355 and BCM4377 support
mwifiex
* SD8978 chipset support
rtl8xxxu
* LED support
ath12k
* new driver for Qualcomm Wi-Fi 7 devices
ath11k
* IPQ5018 support
* Fine Timing Measurement (FTM) responder role support
* channel 177 support
ath10k
* store WLAN firmware version in SMEM image table
* tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits)
wifi: brcmfmac: p2p: Introduce generic flexible array frame member
wifi: mac80211: add documentation for amsdu_mesh_control
wifi: cfg80211: remove gfp parameter from cfg80211_obss_color_collision_notify description
wifi: mac80211: always initialize link_sta with sta
wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()
wifi: cfg80211: Set SSID if it is not already set
wifi: rtw89: move H2C of del_pkt_offload before polling FW status ready
wifi: rtw89: use readable return 0 in rtw89_mac_cfg_ppdu_status()
wifi: rtw88: usb: drop now unnecessary URB size check
wifi: rtw88: usb: send Zero length packets if necessary
wifi: rtw88: usb: Set qsel correctly
wifi: mac80211: fix off-by-one link setting
wifi: mac80211: Fix for Rx fragmented action frames
wifi: mac80211: avoid u32_encode_bits() warning
wifi: mac80211: Don't translate MLD addresses for multicast
wifi: cfg80211: call reg_notifier for self managed wiphy from driver hint
wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify
wifi: nl80211: Allow authentication frames and set keys on NAN interface
wifi: mac80211: fix non-MLO station association
wifi: mac80211: Allow NSS change only up to capability
...
====================
Link: https://lore.kernel.org/r/20230216105406.208416-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. We report these flavors hereafter:
- Penultimate Segment Pop (PSP);
- Ultimate Segment Pop (USP);
- Ultimate Segment Decapsulation (USD).
Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combinations.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.
A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.
Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
i) decreasing the Segment Left (SL) from 1 to 0;
ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
Address (DA);
iii) removing (i.e., popping) the outer SRH from the extension headers
following the IPv6 header.
It is important to note that PSP operation (steps i, ii, iii) takes place
only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and
does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of
PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the
PSP operation is not performed because it would not be possible to decrease
the SL from 1 to 0.
SL=2 SL=1 SL=0
| | |
For example, given the SRv6 policy (SID List := < X, Y, Z >):
- a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
operation as Segment Left (SL) is 1, corresponding to the Penultimate
Segment of the SID List;
- a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
PSP operation as the Segment Left is 2. This behavior instance will
apply the "standard" End packet processing, ignoring the configured PSP
flavor at all.
[1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The End nexthop lookup/input operations are moved into a new helper
function named input_action_end_finish(). This avoids duplicating the
code needed to compute the nexthop in the different flavors of the End
behavior.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Cited commit changed devlink to register its netdev notifier block on
the global netdev notifier chain instead of on the per network namespace
one.
However, when changing the network namespace of the devlink instance,
devlink still tries to unregister its notifier block from the chain of
the old namespace and register it on the chain of the new namespace.
This results in corruption of the notifier chains, as the same notifier
block is registered on two different chains: The global one and the per
network namespace one. In turn, this causes other problems such as the
inability to dismantle namespaces due to netdev reference count issues.
Fix by preventing devlink from moving its notifier block between
namespaces.
Reproducer:
# echo "10 1" > /sys/bus/netdevsim/new_device
# ip netns add test123
# devlink dev reload netdevsim/netdevsim10 netns test123
# ip netns del test123
[ 71.935619] unregister_netdevice: waiting for lo to become free. Usage count = 2
[ 71.938348] leaked reference.
Fixes: 565b4824c3 ("devlink: change port event netdev notifier from per-net to global")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230215073139.1360108-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Since act_pedit now has access to percpu counters, use the
tcf_action_inc_overlimit_qstats wrapper that will use the percpu
counter whenever they are available.
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The tc action act_gate was using shared stats, move it to percpu stats.
tdc results:
1..12
ok 1 5153 - Add gate action with priority and sched-entry
ok 2 7189 - Add gate action with base-time
ok 3 a721 - Add gate action with cycle-time
ok 4 c029 - Add gate action with cycle-time-ext
ok 5 3719 - Replace gate base-time action
ok 6 d821 - Delete gate action with valid index
ok 7 3128 - Delete gate action with invalid index
ok 8 7837 - List gate actions
ok 9 9273 - Flush gate actions
ok 10 c829 - Add gate action with duplicate index
ok 11 3043 - Add gate action with invalid index
ok 12 2930 - Add gate action with cookie
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The tc action act_connmark was using shared stats and taking the per
action lock in the datapath. Improve it by using percpu stats and rcu.
perf before:
- 13.55% tcf_connmark_act
- 81.18% _raw_spin_lock
80.46% native_queued_spin_lock_slowpath
perf after:
- 2.85% tcf_connmark_act
tdc results:
1..15
ok 1 2002 - Add valid connmark action with defaults
ok 2 56a5 - Add valid connmark action with control pass
ok 3 7c66 - Add valid connmark action with control drop
ok 4 a913 - Add valid connmark action with control pipe
ok 5 bdd8 - Add valid connmark action with control reclassify
ok 6 b8be - Add valid connmark action with control continue
ok 7 d8a6 - Add valid connmark action with control jump
ok 8 aae8 - Add valid connmark action with zone argument
ok 9 2f0b - Add valid connmark action with invalid zone argument
ok 10 9305 - Add connmark action with unsupported argument
ok 11 71ca - Add valid connmark action and replace it
ok 12 5f8f - Add valid connmark action with cookie
ok 13 c506 - Replace connmark with invalid goto chain control
ok 14 6571 - Delete connmark action with valid index
ok 15 3426 - Delete connmark action with invalid index
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The tc action act_nat was using shared stats and taking the per action
lock in the datapath. Improve it by using percpu stats and rcu.
perf before:
- 10.48% tcf_nat_act
- 81.83% _raw_spin_lock
81.08% native_queued_spin_lock_slowpath
perf after:
- 0.48% tcf_nat_act
tdc results:
1..27
ok 1 7565 - Add nat action on ingress with default control action
ok 2 fd79 - Add nat action on ingress with pipe control action
ok 3 eab9 - Add nat action on ingress with continue control action
ok 4 c53a - Add nat action on ingress with reclassify control action
ok 5 76c9 - Add nat action on ingress with jump control action
ok 6 24c6 - Add nat action on ingress with drop control action
ok 7 2120 - Add nat action on ingress with maximum index value
ok 8 3e9d - Add nat action on ingress with invalid index value
ok 9 f6c9 - Add nat action on ingress with invalid IP address
ok 10 be25 - Add nat action on ingress with invalid argument
ok 11 a7bd - Add nat action on ingress with DEFAULT IP address
ok 12 ee1e - Add nat action on ingress with ANY IP address
ok 13 1de8 - Add nat action on ingress with ALL IP address
ok 14 8dba - Add nat action on egress with default control action
ok 15 19a7 - Add nat action on egress with pipe control action
ok 16 f1d9 - Add nat action on egress with continue control action
ok 17 6d4a - Add nat action on egress with reclassify control action
ok 18 b313 - Add nat action on egress with jump control action
ok 19 d9fc - Add nat action on egress with drop control action
ok 20 a895 - Add nat action on egress with DEFAULT IP address
ok 21 2572 - Add nat action on egress with ANY IP address
ok 22 37f3 - Add nat action on egress with ALL IP address
ok 23 6054 - Add nat action on egress with cookie
ok 24 79d6 - Add nat action on ingress with cookie
ok 25 4b12 - Replace nat action with invalid goto chain control
ok 26 b811 - Delete nat action with valid index
ok 27 a521 - Delete nat action with invalid index
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The kernel stack can be more consistent by printing the IFF_PROMISC
aka promiscuous enable/disable messages with the standard netdev_info
message which can include bus and driver info as well as the device.
typical command usage from user space looks like:
ip link set eth0 promisc <on|off>
But lots of utilities such as bridge, tcpdump, etc put the interface into
promiscuous mode.
old message:
[ 406.034418] device eth0 entered promiscuous mode
[ 408.424703] device eth0 left promiscuous mode
new message:
[ 406.034431] ice 0000:17:00.0 eth0: entered promiscuous mode
[ 408.424715] ice 0000:17:00.0 eth0: left promiscuous mode
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When the user sets or clears the IFF_ALLMULTI flag in the netdev, there are
no log messages printed to the kernel log to indicate anything happened.
This is inexplicably different from most other dev->flags changes, and
could suprise the user.
Typically this occurs from user-space when a user:
ip link set eth0 allmulticast <on|off>
However, other devices like bridge set allmulticast as well, and many
other flows might trigger entry into allmulticast as well.
The new message uses the standard netdev_info print and looks like:
[ 413.246110] ixgbe 0000:17:00.0 eth0: entered allmulticast mode
[ 415.977184] ixgbe 0000:17:00.0 eth0: left allmulticast mode
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The rsvp classifier has served us well for about a quarter of a century but has
has not been getting much maintenance attention due to lack of known users.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The tcindex classifier has served us well for about a quarter of a century
but has not been getting much TLC due to lack of known users. Most recently
it has become easy prey to syzkaller. For this reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The dsmark qdisc has served us well over the years for diffserv but has not
been getting much attention due to other more popular approaches to do diffserv
services. Most recently it has become a shooting target for syzkaller. For this
reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
The ATM qdisc has served us well over the years but has not been getting much
TLC due to lack of known users. Most recently it has become a shooting target
for syzkaller. For this reason, we are retiring it.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
While this amazing qdisc has served us well over the years it has not been
getting any tender love and care and has bitrotted over time.
It has become mostly a shooting target for syzkaller lately.
For this reason, we are retiring it. Goodbye CBQ - we loved you.
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
MSG_ZEROCOPY ensures that pinned user pages do not exceed the limit.
If no limit is set, skip this accounting as otherwise expensive
atomic_long operations are called for no reason.
This accounting is already skipped for privileged (CAP_IPC_LOCK)
users. Rely on the same mechanism: if no mmp->user is set,
mm_unaccount_pinned_pages does not decrement either.
Tested by running tools/testing/selftests/net/msg_zerocopy.sh with
an unprivileged user for the TXMODE binary:
ip netns exec "${NS1}" sudo -u "{$USER}" "${BIN}" "-${IP}" ...
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230214155740.3448763-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Now that all devlink health callbacks and related code are in file
health.c move common health functions and devlink_health_reporter struct
to be local in health.c file.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health report test callback from leftover.c to health.c. No
functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health report dump callbacks and related code from
leftover.c to health.c. No functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Devlink fmsg (formatted message) is used by devlink health diagnose,
dump and drivers which support these devlink health callbacks.
Therefore, move devlink fmsg helpers and related code to file health.c.
Move devlink health diagnose to file health.c. No functional change in
this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health report helper and recover callback and related code
from leftover.c to health.c. No functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health get and set callbacks and related code from
leftover.c to health.c. No functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix
it to call nla_nest_cancel() instead.
Note the bug is harmless as genlmsg_cancel() cancel the entire message,
so no fixes tag added.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Move devlink health reporter create/destroy and related dev code to new
file health.c. This file shall include all callbacks and functionality
that are related to devlink health.
In addition, fix kdoc indentation and make reporter create/destroy kdoc
more clear. No functional change in this patch.
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
&xdp_buff and &xdp_frame are bound in a way that
xdp_buff->data_hard_start == xdp_frame
It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:
for (u32 i = 0; i < 0xdead; i++) {
xdpf = xdp_convert_buff_to_frame(&xdp);
xdp_convert_frame_to_buff(xdpf, &xdp);
}
shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.
Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.
(was found while testing XDP traffic generator on ice, which calls
xdp_convert_frame_to_buff() for each XDP frame)
Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This documentation wasn't added in the original patch,
add it now.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 6e4c0d0460 ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When we have multiple interfaces receiving the same frame,
such as a multicast frame, one interface might have a sta
and the other not. In this case, link_sta would be set but
not cleared again.
Always set link_sta, so we keep an invariant that link_sta
and sta are either both set or both not set.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
There's at least one case in ieee80211_rx_for_interface()
where we might pass &((struct sta_info *)NULL)->sta to it
only to then do container_of(), and then checking the
result for NULL, but checking the result of container_of()
for NULL looks really odd.
Fix this by just passing the struct sta_info * instead.
Fixes: e66b7920aa ("wifi: mac80211: fix initialization of rx->link and rx->link_sta")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it in __cfg80211_connect_result() when it is not already set.
When using a userspace configuration that does not call
cfg80211_connect() (can be checked with breakpoints in the kernel),
this patch should allow `networkctl status device_name` to output the
SSID instead of null.
Cc: stable@vger.kernel.org
Reported-by: Yohan Prod'homme <kernel@zoddo.fr>
Fixes: 7b0a0e3c3a (wifi: cfg80211: do some rework towards MLO link APIs)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand@systemb.ch>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When swap is activated to a file on an NFSv4 mount we arrange that the
state manager thread is always present as starting a new thread requires
memory allocations that might block waiting for swap.
Unfortunately the code for allowing the state manager thread to exit when
swap is disabled was not tested properly and does not work.
This can be seen by examining /proc/fs/nfsfs/servers after disabling swap
and unmounting the filesystem. The servers file will still list one
entry. Also a "ps" listing will show the state manager thread is still
present.
There are two problems.
1/ rpc_clnt_swap_deactivate() doesn't walk up the ->cl_parent list to
find the primary client on which the state manager runs.
2/ The thread is not woken up properly and it immediately goes back to
sleep without checking whether it is really needed. Using
nfs4_schedule_state_manager() ensures a proper wake-up.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 4dc73c6791 ("NFSv4: keep state manager thread active if swap is enabled")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
lianhui reports that when MPLS fails to register the sysctl table
under new location (during device rename) the old pointers won't
get overwritten and may be freed again (double free).
Handle this gracefully. The best option would be unregistering
the MPLS from the device completely on failure, but unfortunately
mpls_ifdown() can fail. So failing fully is also unreliable.
Another option is to register the new table first then only
remove old one if the new one succeeds. That requires more
code, changes order of notifications and two tables may be
visible at the same time.
sysctl point is not used in the rest of the code - set to NULL
on failures and skip unregister if already NULL.
Reported-by: lianhui tang <bluetlh@gmail.com>
Fixes: 0fae3bf018 ("mpls: handle device renames for per-device sysctls")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e48c414ee6 ("[INET]: Generalise the TCP sock ID lookup routines")
commented out the definition of SOCK_REFCNT_DEBUG in 2005 and later another
commit 463c84b97f ("[NET]: Introduce inet_connection_sock") removed it.
Since we could track all of them through bpf and kprobe related tools
and the feature could print loads of information which might not be
that helpful even under a little bit pressure, the whole feature which
has been inactive for many years is no longer supported.
Link: https://lore.kernel.org/lkml/20230211065153.54116-1-kerneljasonxing@gmail.com/
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definitions to prevent
modification at runtime.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definition to prevent
modification at runtime.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
When sending a SYN message, this kernel stack trace is observed:
...
[ 13.396352] RIP: 0010:_copy_from_iter+0xb4/0x550
...
[ 13.398494] Call Trace:
[ 13.398630] <TASK>
[ 13.398630] ? __alloc_skb+0xed/0x1a0
[ 13.398630] tipc_msg_build+0x12c/0x670 [tipc]
[ 13.398630] ? shmem_add_to_page_cache.isra.71+0x151/0x290
[ 13.398630] __tipc_sendmsg+0x2d1/0x710 [tipc]
[ 13.398630] ? tipc_connect+0x1d9/0x230 [tipc]
[ 13.398630] ? __local_bh_enable_ip+0x37/0x80
[ 13.398630] tipc_connect+0x1d9/0x230 [tipc]
[ 13.398630] ? __sys_connect+0x9f/0xd0
[ 13.398630] __sys_connect+0x9f/0xd0
[ 13.398630] ? preempt_count_add+0x4d/0xa0
[ 13.398630] ? fpregs_assert_state_consistent+0x22/0x50
[ 13.398630] __x64_sys_connect+0x16/0x20
[ 13.398630] do_syscall_64+0x42/0x90
[ 13.398630] entry_SYSCALL_64_after_hwframe+0x63/0xcd
It is because commit a41dad905e ("iov_iter: saner checks for attempt
to copy to/from iterator") has introduced sanity check for copying
from/to iov iterator. Lacking of copy direction from the iterator
viewpoint would lead to kernel stack trace like above.
This commit fixes this issue by initializing the iov iterator with
the correct copy direction when sending SYN or ACK without data.
Fixes: f25dcc7687 ("tipc: tipc ->sendmsg() conversion")
Reported-by: syzbot+d43608d061e8847ec9f3@syzkaller.appspotmail.com
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230214012606.5804-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.
Take advantage of this to constify the structure definitions to prevent
modification at runtime.
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The convention for find_first_bit() is 0-based, while ffs()
is 1-based, so this is now off-by-one. I cannot reproduce the
gcc-9 problem, but since the -1 is now removed, I'm hoping it
will still avoid the original issue.
Reported-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Fixes: 1d8d4af434 ("wifi: mac80211: avoid u32_encode_bits() warning")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The ieee80211_accept_frame() function performs a number of early checks
to decide whether or not further processing needs to be done on a frame.
One of those checks is the ieee80211_is_robust_mgmt_frame() function.
It requires to peek into the frame payload, but because defragmentation
does not occur until later on in the receive path, this peek is invalid
for any fragment other than the first one. Also, in this scenario there
is no STA and so the fragmented frame will be dropped later on in the
process and will not reach the upper stack. This can happen with large
action frames at low rates, for example, we see issues with DPP on S1G.
This change will only check if the frame is robust if it's the first
fragment. Invalid fragmented packets will be discarded later after
defragmentation is completed.
Signed-off-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com>
Link: https://lore.kernel.org/r/20221124005336.1618411-1-gilad.itzkovitch@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
gcc-9 triggers a false-postive warning in ieee80211_mlo_multicast_tx()
for u32_encode_bits(ffs(links) - 1, ...), since ffs() can return zero
on an empty bitmask, and the negative argument to u32_encode_bits()
is then out of range:
In file included from include/linux/ieee80211.h:21,
from include/net/cfg80211.h:23,
from net/mac80211/tx.c:23:
In function 'u32_encode_bits',
inlined from 'ieee80211_mlo_multicast_tx' at net/mac80211/tx.c:4437:17,
inlined from 'ieee80211_subif_start_xmit' at net/mac80211/tx.c:4485:3:
include/linux/bitfield.h:177:3: error: call to '__field_overflow' declared with attribute error: value doesn't fit into mask
177 | __field_overflow(); \
| ^~~~~~~~~~~~~~~~~~
include/linux/bitfield.h:197:2: note: in expansion of macro '____MAKE_OP'
197 | ____MAKE_OP(u##size,u##size,,)
| ^~~~~~~~~~~
include/linux/bitfield.h:200:1: note: in expansion of macro '__MAKE_OP'
200 | __MAKE_OP(32)
| ^~~~~~~~~
Newer compiler versions do not cause problems with the zero argument
because they do not consider this a __builtin_constant_p().
It's also harmless since the hweight16() check already guarantees
that this cannot be 0.
Replace the ffs() with an equivalent find_first_bit() check that
matches the later for_each_set_bit() style and avoids the warning.
Fixes: 963d0e8d08 ("wifi: mac80211: optionally implement MLO multicast TX")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230214132025.1532147-1-arnd@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The change on network namespace only makes sense during re-init reload
action. For FW activation it is not applicable. So check if user passed
an ATTR indicating network namespace change request and forbid it.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230213115836.3404039-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
MLD address translation should be done only for individually addressed
frames. Otherwise, AAD calculation would be wrong and the decryption
would fail.
Fixes: e66b7920aa ("wifi: mac80211: fix initialization of rx->link and rx->link_sta")
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Link: https://lore.kernel.org/r/20230214101048.792414-1-andrei.otcheretianski@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently the regulatory driver does not call the regulatory callback
reg_notifier for self managed wiphys. Sometimes driver needs cfg80211
to calculate the info of ieee80211_channel such as flags and power,
and driver needs to get the info of ieee80211_channel after hint of
driver, but driver does not know when calculation of the info of
ieee80211_channel become finished, so add notify to driver in
reg_process_self_managed_hint() from cfg80211 is a good way, then
driver could get the correct info in callback of reg_notifier.
Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Link: https://lore.kernel.org/r/20230201065313.27203-1-quic_wgong@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Wi-Fi Aware R4 specification defines NAN Pairing which uses PASN handshake
to authenticate the peer and generate keys. Hence allow to register and transmit
the PASN authentication frames on NAN interface and set the keys to driver or
underlying modules on NAN interface.
The driver needs to configure the feature flag NL80211_EXT_FEATURE_SECURE_NAN,
which also helps userspace modules to know if the driver supports secure NAN.
Signed-off-by: Vinay Gannevaram <quic_vganneva@quicinc.com>
Link: https://lore.kernel.org/r/1675519179-24174-1-git-send-email-quic_vganneva@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Non-MLO station frames are dropped in Rx path due to the condition
check in ieee80211_rx_is_valid_sta_link_id(). In multi-link AP scenario,
non-MLO stations try to connect in any of the valid links in the ML AP,
where the station valid_links and link_id params are valid in the
ieee80211_sta object. But ieee80211_rx_is_valid_sta_link_id() always
return false for the non-MLO stations by the assumption taken is
valid_links and link_id are not valid in non-MLO stations object
(ieee80211_sta), this assumption is wrong. Due to this assumption,
non-MLO station frames are dropped which leads to failure in association.
Fix it by removing the condition check and allow the link validation
check for the non-MLO stations.
Fixes: e66b7920aa ("wifi: mac80211: fix initialization of rx->link and rx->link_sta")
Signed-off-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com>
Link: https://lore.kernel.org/r/20230206160330.1613-1-quic_periyasa@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Stations can update bandwidth/NSS change in
VHT action frame with action type Operating Mode Notification.
(IEEE Std 802.11-2020 - 9.4.1.53 Operating Mode field)
For Operating Mode Notification, an RX NSS change to a value
greater than AP's maximum NSS should not be allowed.
During fuzz testing, by forcefully sending VHT Op. mode notif.
frames from STA with random rx_nss values, it is found that AP
accepts rx_nss values greater that APs maximum NSS instead of
discarding such NSS change.
Hence allow NSS change only up to maximum NSS that is negotiated
and capped to AP's capability during association.
Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com>
Link: https://lore.kernel.org/r/20230207114146.10567-1-quic_ramess@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
At least ath10k and ath11k supported hardware (maybe more) does not implement
mesh A-MSDU aggregation in a standard compliant way.
802.11-2020 9.3.2.2.2 declares that the Mesh Control field is part of the
A-MSDU header (and little-endian).
As such, its length must not be included in the subframe length field.
Hardware affected by this bug treats the mesh control field as part of the
MSDU data and sets the length accordingly.
In order to avoid packet loss, keep track of which stations are affected
by this and take it into account when converting A-MSDU to 802.3 + mesh control
packets.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-5-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The current mac80211 mesh A-MSDU receive path fails to parse A-MSDU packets
on mesh interfaces, because it assumes that the Mesh Control field is always
directly after the 802.11 header.
802.11-2020 9.3.2.2.2 Figure 9-70 shows that the Mesh Control field is
actually part of the A-MSDU subframe header.
This makes more sense, since it allows packets for multiple different
destinations to be included in the same A-MSDU, as long as RA and TID are
still the same.
Another issue is the fact that the A-MSDU subframe length field was apparently
accidentally defined as little-endian in the standard.
In order to fix this, the mesh forwarding path needs happen at a different
point in the receive path.
ieee80211_data_to_8023_exthdr is changed to ignore the mesh control field
and leave it in after the ethernet header. This also affects the source/dest
MAC address fields, which now in the case of mesh point to the mesh SA/DA.
ieee80211_amsdu_to_8023s is changed to deal with the endian difference and
to add the Mesh Control length to the subframe length, since it's not covered
by the MSDU length field.
With these changes, the mac80211 will get the same packet structure for
converted regular data packets and unpacked A-MSDU subframes.
The mesh forwarding checks are now only performed after the A-MSDU decap.
For locally received packets, the Mesh Control header is stripped away.
For forwarded packets, a new 802.11 header gets added.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-4-nbd@nbd.name
[fix fortify build error]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Now that all drivers use iTXQ, it does not make sense to check to drop
tx forwarding packets when the driver has stopped the queues.
fq_codel will take care of dropping packets when the queues fill up
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
When parsing the outer A-MSDU header, don't check for inner bridge tunnel
or RFC1042 headers. This is handled by ieee80211_amsdu_to_8023s already.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The value of last_rate in ieee80211_sta_rx_stats is degraded from u32 to
u16 after being assigned to rate variable, which causes information loss
in STA_STATS_FIELD_TYPE and later bitfields.
Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Link: https://lore.kernel.org/r/20230209110659.25447-1-shayne.chen@mediatek.com
Fixes: 41cbb0f5a2 ("mac80211: add support for HE")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.
Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230206081641.3193-1-liubo03@inspur.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Currently action frames TX only with ML address as A3(BSSID) are
allowed in an ML AP, but TX for a non-ML Station can happen in any
link of an ML BSS with link BSS address as A3.
In case of an MLD, if User-space has provided a valid link_id in
action frame TX request, allow transmission of the frame in that link.
Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com>
Link: https://lore.kernel.org/r/20230201061602.3918-1-quic_ramess@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Configure the bitmap in link_conf and notify the driver.
- Modify 'change' in ieee80211_start_ap() from u32 to u64 to support
BSS_CHANGED_EHT_PUNCTURING.
- Propagate the bitmap in channel switch events to userspace.
Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Signed-off-by: Muna Sinada <quic_msinada@quicinc.com>
Link: https://lore.kernel.org/r/20230131001227.25014-5-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- New feature flag, NL80211_EXT_FEATURE_PUNCT, to advertise
driver support for preamble puncturing in AP mode.
- New attribute, NL80211_ATTR_PUNCT_BITMAP, to receive a puncturing
bitmap from the userspace during AP bring up (NL80211_CMD_START_AP)
and channel switch (NL80211_CMD_CHANNEL_SWITCH) operations. Each bit
corresponds to a 20 MHz channel in the operating bandwidth, lowest
bit for the lowest channel. Bit set to 1 indicates that the channel
is punctured. Higher 16 bits are reserved.
- New members added to structures cfg80211_ap_settings and
cfg80211_csa_settings to propagate the bitmap to the driver after
validation.
Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Signed-off-by: Muna Sinada <quic_msinada@quicinc.com>
Link: https://lore.kernel.org/r/20230131001227.25014-3-quic_alokad@quicinc.com
[move validation against 0xffff into policy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
- Move ieee80211_valid_disable_subchannel_bitmap() from mlme.c to
chan.c, rename it as cfg80211_valid_disable_subchannel_bitmap()
and export it.
- Modify the prototype to include struct cfg80211_chan_def instead
of only bandwidth to support a check which returns false if the
primary channel is punctured.
Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Link: https://lore.kernel.org/r/20230131001227.25014-2-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Add an error message to the missing frequency case to have all
-EINVAL in nl80211_parse_chandef() return a better error.
Signed-off-by: Jaewan Kim <jaewan@google.com>
Link: https://lore.kernel.org/r/20230130074514.1560021-1-jaewan@google.com
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
nl80211_send_ap_stopped() can be called multiple times on the same
netdev for each link when using Multi-Link Operation. Add the
MLO_LINK_ID attribute to the event to allow userspace to distinguish
which link the event is for.
Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Link: https://lore.kernel.org/r/20230128125844.2407135-2-alvin@pqrs.dk
Signed-off-by: Johannes Berg <johannes.berg@intel.com>