Commit Graph

72612 Commits

Author SHA1 Message Date
Paul Blakey 606c7c43d0 net/sched: flower: Support hardware miss to tc action
To support hardware miss to tc action in actions on the flower
classifier, implement the required getting of filter actions,
and setup filter exts (actions) miss by giving it the filter's
handle and actions.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Paul Blakey 08a0063df3 net/sched: flower: Move filter handle initialization earlier
To support miss to action during hardware offload the filter's
handle is needed when setting up the actions (tcf_exts_init()),
and before offloading.

Move filter handle initialization earlier.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Paul Blakey 80cd22c35c net/sched: cls_api: Support hardware miss to tc action
For drivers to support partial offload of a filter's action list,
add support for action miss to specify an action instance to
continue from in sw.

CT action in particular can't be fully offloaded, as new connections
need to be handled in software. This imposes other limitations on
the actions that can be offloaded together with the CT action, such
as packet modifications.

Assign each action on a filter's action list a unique miss_cookie
which drivers can then use to fill action_miss part of the tc skb
extension. On getting back this miss_cookie, find the action
instance with relevant cookie and continue classifying from there.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Paul Blakey db4b49025c net/sched: Rename user cookie and act cookie
struct tc_action->act_cookie is a user defined cookie,
and the related struct flow_action_entry->act_cookie is
used as an handle similar to struct flow_cls_offload->cookie.

Rename tc_action->act_cookie to user_cookie, and
flow_action_entry->act_cookie to cookie so their names
would better fit their usage.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:46:10 -08:00
Jakub Kicinski 871489dd01 Merge tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:

====================
pull-request: ieee802154-next 2023-02-20

Miquel Raynal build upon his earlier work and introduced two new
features into the ieee802154 stack. Beaconing to announce existing
PAN's and passive scanning to discover the beacons and associated
PAN's. The matching changes to the userspace configuration tool
have been posted as well and will be released together with the
kernel release.

Arnd Bergmann and Dmitry Torokhov worked on converting the
at86rf230 and cc2520 drivers away from the unused platform_data
usage and towards the new gpiod API. (I had to add a revert as
Dmitry found a regression on an already pushed tree on my side).

Changes since v1 (pull request 2023-02-02)
- Netlink API extack and NLA_POLICY* usage as suggested by Jakub
- Removed always true condition found by kernel test robot
- Simplify device removal with running background job for scanning
- Fix problems with beacon sending in some cases by using the MLME
  tx path

* tag 'ieee802154-for-net-next-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
  ieee802154: Drop device trackers
  mac802154: Fix an always true condition
  mac802154: Send beacons using the MLME Tx path
  ieee802154: Change error code on monitor scan netlink request
  ieee802154: Convert scan error messages to extack
  ieee802154: Use netlink policies when relevant on scan parameters
  ieee802154: at86rf230: switch to using gpiod API
  ieee802154: at86rf230: drop support for platform data
  Revert "at86rf230: convert to gpio descriptors"
  cc2520: move to gpio descriptors
  mac802154: Avoid superfluous endianness handling
  at86rf230: convert to gpio descriptors
  mac802154: Handle basic beaconing
  ieee802154: Add support for user beaconing requests
  mac802154: Handle passive scanning
  mac802154: Add MLME Tx locked helpers
  mac802154: Prepare forcing specific symbol duration
  ieee802154: Introduce a helper to validate a channel
  ieee802154: Define a beacon frame header
  ieee802154: Add support for user scanning requests
====================

Link: https://lore.kernel.org/r/20230220213749.386451-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:40:52 -08:00
Kuniyuki Iwashima be9832c2e9 net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
Commit 2c02d41d71 ("net/ulp: prevent ULP without clone op from entering
the LISTEN status") guarantees that all ULP listeners have clone() op, so
we no longer need to test it in inet_clone_ulp().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230217200920.85306-1-kuniyu@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:31:49 -08:00
Jakub Kicinski ee8d72a157 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+/uBgAKCRDbK58LschI
 g0ngAPwJHd1RicBuy2C4fLv0nGKZtmYZBAnTGlI2RisPxU6BRwEAwUDLHuc5K6nR
 j261okOxOy/MRxdN1NhmR6Qe7nMyQAk=
 =tYU+
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-17

We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).

The main changes are:

1) Add a rbtree data structure following the "next-gen data structure"
   precedent set by recently-added linked-list, that is, by using
   kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.

2) Add a new benchmark for hashmap lookups to BPF selftests,
   from Anton Protopopov.

3) Fix bpf_fib_lookup to only return valid neighbors and add an option
   to skip the neigh table lookup, from Martin KaFai Lau.

4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
   accouting for container environments, from Yafang Shao.

5) Batch of ice multi-buffer and driver performance fixes,
   from Alexander Lobakin.

6) Fix a bug in determining whether global subprog's argument is
   PTR_TO_CTX, which is based on type names which breaks kprobe progs,
   from Andrii Nakryiko.

7) Prep work for future -mcpu=v4 LLVM option which includes usage of
   BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
   from Eduard Zingerman.

8) More prep work for later building selftests with Memory Sanitizer
   in order to detect usages of undefined memory, from Ilya Leoshkevich.

9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
   dereference via sendmsg(), from Maciej Fijalkowski.

10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.

11) Fix BPF memory allocator in combination with BPF hashtab where it could
    corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.

12) Fix LoongArch BPF JIT to always use 4 instructions for function
    address so that instruction sequences don't change between passes,
    from Hengqi Chen.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
  selftests/bpf: Add bpf_fib_lookup test
  bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
  riscv, bpf: Add bpf trampoline support for RV64
  riscv, bpf: Add bpf_arch_text_poke support for RV64
  riscv, bpf: Factor out emit_call for kernel and bpf context
  riscv: Extend patch_text for multiple instructions
  Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
  selftests/bpf: Add global subprog context passing tests
  selftests/bpf: Convert test_global_funcs test to test_loader framework
  bpf: Fix global subprog context argument resolution logic
  LoongArch, bpf: Use 4 instructions for function address in JIT
  bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
  bpf: Disable bh in bpf_test_run for xdp and tc prog
  xsk: check IFF_UP earlier in Tx path
  Fix typos in selftest/bpf files
  selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
  ...
====================

Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:31:14 -08:00
Linus Torvalds 5b0ed59649 for-6.3/block-2023-02-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPvfncQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpob2EADXJxcr2jjYHm/7cjKkyuVX8fr80dNdMeuY
 JFdsjG1k6Uj73BVhQQWYTcs/PsrWBHWRsv6uz4WgOELj55eXmf5Q0kJszyUeJW33
 /DjqLvtoppVcYf80xE13wKvCfn73BjwQo6xkGM0qAYn15eaXiD/Ax3xC6eJlsBeK
 PEw7EJyhacbSxZa/1D2B6+mqII1jUQWProTCc3udZ4JHi3WvdWa3Rda0qCqHl4a1
 +K2aP2YTFIRPxBzfMNa/CafWVIFubTdht+4Ds6R60RImzB9e0VUBfcsiUyW5Zg7L
 Fwv7ptXuWrALwVNdW56Oz1QikBxn2pdRR2HMLwKJW1MD8kP9r8LMm2jV5Rhiwe0B
 OQsGRYkOzBvw+bxeP5fvk0iPGVMz6ActH4gkraA5QdLqayDaFYOadlhqz0uRo5SH
 Fb42Vl658K/MHDSIk8U58TNkmrsIJsBGohXI9DOGINPPvv3XOPi4Q1HmXkGRmii0
 y+lNU/QEGh7xXXew29SPP76uQpQaYfC7NxXCMw/OpOMwehzjsjshmM2lpxi8zsgt
 PJUmfHv5qxCplNmTJXmUpmX7sS7550HUdu9FJb13DM+gzKg8bk9jWVuLrzqrVlG5
 1hKWEl1+heg1heRfaIuJVLbPI0au6Sb4uqhih/PHyrP9TWIoAruDbDJM65GKTxyE
 2uEgcHzHQw==
 =poRc
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Christoph:
      - Small improvements to the logging functionality (Amit Engel)
      - Authentication cleanups (Hannes Reinecke)
      - Cleanup and optimize the DMA mapping cod in the PCIe driver
        (Keith Busch)
      - Work around the command effects for Format NVM (Keith Busch)
      - Misc cleanups (Keith Busch, Christoph Hellwig)
      - Fix and cleanup freeing single sgl (Keith Busch)

 - MD updates via Song:
      - Fix a rare crash during the takeover process
      - Don't update recovery_cp when curr_resync is ACTIVE
      - Free writes_pending in md_stop
      - Change active_io to percpu

 - Updates to drbd, inching us closer to unifying the out-of-tree driver
   with the in-tree one (Andreas, Christoph, Lars, Robert)

 - BFQ update adding support for multi-actuator drives (Paolo, Federico,
   Davide)

 - Make brd compliant with REQ_NOWAIT (me)

 - Fix for IOPOLL and queue entering, fixing stalled IO waiting on
   timeouts (me)

 - Fix for REQ_NOWAIT with multiple bios (me)

 - Fix memory leak in blktrace cleanup (Greg)

 - Clean up sbitmap and fix a potential hang (Kemeng)

 - Clean up some bits in BFQ, and fix a bug in the request injection
   (Kemeng)

 - Clean up the request allocation and issue code, and fix some bugs
   related to that (Kemeng)

 - ublk updates and fixes:
      - Add support for unprivileged ublk (Ming)
      - Improve device deletion handling (Ming)
      - Misc (Liu, Ziyang)

 - s390 dasd fixes (Alexander, Qiheng)

 - Improve utility of request caching and fixes (Anuj, Xiao)

 - zoned cleanups (Pankaj)

 - More constification for kobjs (Thomas)

 - blk-iocost cleanups (Yu)

 - Remove bio splitting from drivers that don't need it (Christoph)

 - Switch blk-cgroups to use struct gendisk. Some of this is now
   incomplete as select late reverts were done. (Christoph)

 - Add bvec initialization helpers, and convert callers to use that
   rather than open-coding it (Christoph)

 - Misc fixes and cleanups (Jinke, Keith, Arnd, Bart, Li, Martin,
   Matthew, Ulf, Zhong)

* tag 'for-6.3/block-2023-02-16' of git://git.kernel.dk/linux: (169 commits)
  brd: use radix_tree_maybe_preload instead of radix_tree_preload
  block: use proper return value from bio_failfast()
  block: bio-integrity: Copy flags when bio_integrity_payload is cloned
  block: Fix io statistics for cgroup in throttle path
  brd: mark as nowait compatible
  brd: check for REQ_NOWAIT and set correct page allocation mask
  brd: return 0/-error from brd_insert_page()
  block: sync mixed merged request's failfast with 1st bio's
  Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"
  Revert "blk-cgroup: pass a gendisk to blkg_lookup"
  Revert "blk-cgroup: delay blk-cgroup initialization until add_disk"
  Revert "blk-cgroup: delay calling blkcg_exit_disk until disk_release"
  Revert "blk-cgroup: move the cgroup information to struct gendisk"
  nvme-pci: remove iod use_sgls
  nvme-pci: fix freeing single sgl
  block: ublk: check IO buffer based on flag need_get_data
  s390/dasd: Fix potential memleak in dasd_eckd_init()
  s390/dasd: sort out physical vs virtual pointers usage
  block: Remove the ALLOC_CACHE_SLACK constant
  block: make kobj_type structures constant
  ...
2023-02-20 14:27:21 -08:00
Linus Torvalds 05e6295f7b fs.idmapped.v6.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY+5NlQAKCRCRxhvAZXjc
 orOaAP9i2h3OJy95nO2Fpde0Bt2UT+oulKCCcGlvXJ8/+TQpyQD/ZQq47gFQ0EAz
 Br5NxeyGeecAb0lHpFz+CpLGsxMrMwQ=
 =+BG5
 -----END PGP SIGNATURE-----

Merge tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs idmapping updates from Christian Brauner:

 - Last cycle we introduced the dedicated struct mnt_idmap type for
   mount idmapping and the required infrastucture in 256c8aed2b ("fs:
   introduce dedicated idmap type for mounts"). As promised in last
   cycle's pull request message this converts everything to rely on
   struct mnt_idmap.

   Currently we still pass around the plain namespace that was attached
   to a mount. This is in general pretty convenient but it makes it easy
   to conflate namespaces that are relevant on the filesystem with
   namespaces that are relevant on the mount level. Especially for
   non-vfs developers without detailed knowledge in this area this was a
   potential source for bugs.

   This finishes the conversion. Instead of passing the plain namespace
   around this updates all places that currently take a pointer to a
   mnt_userns with a pointer to struct mnt_idmap.

   Now that the conversion is done all helpers down to the really
   low-level helpers only accept a struct mnt_idmap argument instead of
   two namespace arguments.

   Conflating mount and other idmappings will now cause the compiler to
   complain loudly thus eliminating the possibility of any bugs. This
   makes it impossible for filesystem developers to mix up mount and
   filesystem idmappings as they are two distinct types and require
   distinct helpers that cannot be used interchangeably.

   Everything associated with struct mnt_idmap is moved into a single
   separate file. With that change no code can poke around in struct
   mnt_idmap. It can only be interacted with through dedicated helpers.
   That means all filesystems are and all of the vfs is completely
   oblivious to the actual implementation of idmappings.

   We are now also able to extend struct mnt_idmap as we see fit. For
   example, we can decouple it completely from namespaces for users that
   don't require or don't want to use them at all. We can also extend
   the concept of idmappings so we can cover filesystem specific
   requirements.

   In combination with the vfs{g,u}id_t work we finished in v6.2 this
   makes this feature substantially more robust and thus difficult to
   implement wrong by a given filesystem and also protects the vfs.

 - Enable idmapped mounts for tmpfs and fulfill a longstanding request.

   A long-standing request from users had been to make it possible to
   create idmapped mounts for tmpfs. For example, to share the host's
   tmpfs mount between multiple sandboxes. This is a prerequisite for
   some advanced Kubernetes cases. Systemd also has a range of use-cases
   to increase service isolation. And there are more users of this.

   However, with all of the other work going on this was way down on the
   priority list but luckily someone other than ourselves picked this
   up.

   As usual the patch is tiny as all the infrastructure work had been
   done multiple kernel releases ago. In addition to all the tests that
   we already have I requested that Rodrigo add a dedicated tmpfs
   testsuite for idmapped mounts to xfstests. It is to be included into
   xfstests during the v6.3 development cycle. This should add a slew of
   additional tests.

* tag 'fs.idmapped.v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (26 commits)
  shmem: support idmapped mounts for tmpfs
  fs: move mnt_idmap
  fs: port vfs{g,u}id helpers to mnt_idmap
  fs: port fs{g,u}id helpers to mnt_idmap
  fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap
  fs: port i_{g,u}id_{needs_}update() to mnt_idmap
  quota: port to mnt_idmap
  fs: port privilege checking helpers to mnt_idmap
  fs: port inode_owner_or_capable() to mnt_idmap
  fs: port inode_init_owner() to mnt_idmap
  fs: port acl to mnt_idmap
  fs: port xattr to mnt_idmap
  fs: port ->permission() to pass mnt_idmap
  fs: port ->fileattr_set() to pass mnt_idmap
  fs: port ->set_acl() to pass mnt_idmap
  fs: port ->get_acl() to pass mnt_idmap
  fs: port ->tmpfile() to pass mnt_idmap
  fs: port ->rename() to pass mnt_idmap
  fs: port ->mknod() to pass mnt_idmap
  fs: port ->mkdir() to pass mnt_idmap
  ...
2023-02-20 11:53:11 -08:00
Chuck Lever 2172e84ea0 SUNRPC: Fix occasional warning when destroying gss_krb5_enctypes
I'm guessing that the warning fired because there's some code path
that is called on module unload where the gss_krb5_enctypes file
was never set up.

name 'gss_krb5_enctypes'
WARNING: CPU: 0 PID: 6187 at fs/proc/generic.c:712 remove_proc_entry+0x38d/0x460 fs/proc/generic.c:712

destroy_krb5_enctypes_proc_entry net/sunrpc/auth_gss/svcauth_gss.c:1543 [inline]
gss_svc_shutdown_net+0x7d/0x2b0 net/sunrpc/auth_gss/svcauth_gss.c:2120
ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
setup_net+0x9bd/0xe60 net/core/net_namespace.c:356
copy_net_ns+0x320/0x6b0 net/core/net_namespace.c:483
create_new_namespaces+0x3f6/0xb20 kernel/nsproxy.c:110
copy_namespaces+0x410/0x500 kernel/nsproxy.c:179
copy_process+0x311d/0x76b0 kernel/fork.c:2272
kernel_clone+0xeb/0x9a0 kernel/fork.c:2684
__do_sys_clone+0xba/0x100 kernel/fork.c:2825
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Reported-by: syzbot+04a8437497bcfb4afa95@syzkaller.appspotmail.com
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:57 -05:00
Chuck Lever 319951eba0 SUNRPC: Remove ->xpo_secure_port()
There's no need for the cost of this extra virtual function call
during every RPC transaction: the RQ_SECURE bit can be set properly
in ->xpo_recvfrom() instead.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:55 -05:00
Chuck Lever ecfa398773 SUNRPC: Fix whitespace damage in svcauth_unix.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:52 -05:00
Chuck Lever c4a9f0552c SUNRPC: Add encryption self-tests
With the KUnit infrastructure recently added, we are free to define
other unit tests particular to our implementation. As an example,
I've added a self-test that encrypts then decrypts a string, and
checks the result.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever 4d2d15c0f1 SUNRPC: Add RFC 8009 encryption KUnit tests
RFC 8009 provides sample encryption results. Add KUnit tests to
ensure our implementation derives the expected results for the
provided sample input.

I hate how large this test is, but using non-standard key usage
values means rfc8009_encrypt_case() can't simply reuse ->import_ctx
to allocate and key its ciphers; and the test provides its own
confounders, which means krb5_etm_encrypt() can't be used directly.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever 003caf4f8c SUNRPC: Add RFC 8009 checksum KUnit tests
RFC 8009 provides sample checksum results. Add KUnit tests to ensure
our implementation derives the expected results for the provided
sample input.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever fcbad14b58 SUNRPC: Add KDF-HMAC-SHA2 Kunit tests
RFC 8009 provides sample key derivation results, so Kunit tests are
added to ensure our implementation derives the expected keys for the
provided sample input.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever b958cff6b2 SUNRPC: Add encryption KUnit tests for the RFC 6803 encryption types
Add tests for the new-to-RPCSEC Camellia cipher.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:51 -05:00
Chuck Lever 02142b2ca8 SUNRPC: Add checksum KUnit tests for the RFC 6803 encryption types
Test the new-to-RPCSEC CMAC digest algorithm.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever 35f6e42e81 SUNRPC: Add KDF KUnit tests for the RFC 6803 encryption types
The Camellia enctypes use a new KDF, so add some tests to ensure it
is working properly.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever e1a9a3849d SUNRPC: Add Kunit tests for RFC 3962-defined encryption/decryption
Add Kunit tests for ENCTYPE_AES128_CTS_HMAC_SHA1_96. The test
vectors come from RFC 3962 Appendix B.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever 6eb6b8a446 SUNRPC: Add KUnit tests RFC 3961 Key Derivation
RFC 3961 Appendix A provides tests for the KDF specified in that
document as well as other parts of Kerberos. The other three usage
scenarios in Section 10 are not implemented by the Linux kernel's
RPCSEC GSS Kerberos 5 mechanism, so tests are not added for those.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:50 -05:00
Chuck Lever ddd8c1f975 SUNRPC: Export get_gss_krb5_enctype()
I plan to add KUnit tests that will need enctype profile
information. Export the enctype profile lookup function.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:48 -05:00
Chuck Lever eebd8c2d19 SUNRPC: Add KUnit tests for rpcsec_krb5.ko
The Kerberos RFCs provide test vectors to verify the operation of
an implementation. Introduce a KUnit test framework to exercise the
Linux kernel's implementation of Kerberos.

Start with test cases for the RFC 3961-defined n-fold function. The
sample vectors for that are found in RFC 3961 Section 10.

Run the GSS Kerberos 5 mechanism's unit tests with this command:

$ ./tools/testing/kunit/kunit.py run \
	--kunitconfig ./net/sunrpc/.kunitconfig

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:46 -05:00
Chuck Lever 6e460c230d SUNRPC: Move remaining internal definitions to gss_krb5_internal.h
The goal is to leave only protocol-defined items in gss_krb5.h so
that it can be easily replaced by a generic header. Implementation
specific items are moved to the new internal header.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:46 -05:00
Chuck Lever 6e6d9eee0e SUNRPC: Advertise support for the Camellia encryption types
Add the RFC 6803 encryption types to the string of integers that is
reported to gssd during upcalls. This enables gssd to utilize keys
with these encryption types when support for them is built into the
kernel.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:44 -05:00
Chuck Lever 45b4ef46b5 SUNRPC: Add KDF_FEEDBACK_CMAC
The Camellia enctypes use the KDF_FEEDBACK_CMAC Key Derivation
Function defined in RFC 6803 Section 3.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever 3394682fba SUNRPC: Support the Camellia enctypes
RFC 6803 defines two encryption types that use Camellia ciphers (RFC
3713) and CMAC digests. Implement support for those in SunRPC's GSS
Kerberos 5 mechanism.

There has not been an explicit request to support these enctypes.
However, this new set of enctypes provides a good alternative to the
AES-SHA1 enctypes that are to be deprecated at some point.

As this implementation is still a "beta", the default is to not
build it automatically.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever f26ec6b1b1 SUNRPC: Advertise support for RFC 8009 encryption types
Add the RFC 8009 encryption types to the string of integers that is
reported to gssd during upcalls. This enables gssd to utilize keys
with these encryption types when support for them is built into the
kernel.

Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=400
Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever 0d5b5a0f32 SUNRPC: Add RFC 8009 encryption and decryption functions
RFC 8009 enctypes use different crypt formulae than previous
Kerberos 5 encryption types. Section 1 of RFC 8009 explains the
reason for this change:

> The new types conform to the framework specified in [RFC3961],
> but do not use the simplified profile, as the simplified profile
> is not compliant with modern cryptographic best practices such as
> calculating Message Authentication Codes (MACs) over ciphertext
> rather than plaintext.

Add new .encrypt and .decrypt functions to handle this variation.

The new approach described above is referred to as Encrypt-then-MAC
(or EtM). Hence the names of the new functions added here are
prefixed with "krb5_etm_".

A critical second difference with previous crypt formulae is that
the cipher state is included in the computed HMAC. Note however that
for RPCSEC, the initial cipher state is easy to compute on both
initiator and acceptor because it is always all zeroes.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever ae2e4d2bae SUNRPC: Add KDF-HMAC-SHA2
The RFC 8009 encryption types use a different key derivation
function than the RFC 3962 encryption types. The new key derivation
function is defined in Section 3 of RFC 8009.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever a40cf7530d SUNRPC: Add gk5e definitions for RFC 8009 encryption types
Fill in entries in the supported_gss_krb5_enctypes array for the
encryption types defined in RFC 8009. These new enctypes use the
SHA-256 and SHA-384 message digest algorithms (as defined in
FIPS-180) instead of the deprecated SHA-1 algorithm, and are thus
more secure.

Note that NIST has scheduled SHA-1 for deprecation:

https://www.nist.gov/news-events/news/2022/12/nist-retires-sha-1-cryptographic-algorithm

Thus these new encryption types are placed under a separate CONFIG
option to enable distributors to separately introduce support for
the AES-SHA2 enctypes and deprecate support for the current set of
AES-SHA1 encryption types as their user space allows.

As this implementation is still a "beta", the default is to not
build it automatically.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:43 -05:00
Chuck Lever dfb632432a SUNRPC: Refactor CBC with CTS into helpers
Cryptosystem profile enctypes all use cipher block chaining
with ciphertext steal (CBC-with-CTS). However enctypes that are
currently supported in the Linux kernel SunRPC implementation
use only the encrypt-&-MAC approach. The RFC 8009 enctypes use
encrypt-then-MAC, which performs encryption and checksumming in
a different order.

Refactor to make it possible to share the CBC with CTS encryption
and decryption mechanisms between e&M and etM enctypes.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever af664fc902 SUNRPC: Add new subkey length fields
The aes256-cts-hmac-sha384-192 enctype specifies the length of its
checksum and integrity subkeys as 192 bits, but the length of its
encryption subkey (Ke) as 256 bits. Add new fields to struct
gss_krb5_enctype that specify the key lengths individually, and
where needed, use the correct new field instead of ->keylength.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever 8b3a09f345 SUNRPC: Parametrize the key length passed to context_v2_alloc_cipher()
Although the Kerberos specs have always listed separate subkey
lengths, the Linux kernel's SunRPC GSS Kerberos enctype profiles
assume the base key and the derived keys have identical lengths.

The aes256-cts-hmac-sha384-192 enctype specifies the length of its
checksum and integrity subkeys as 192 bits, but the length of its
encryption subkey (Ke) as 256 bits.

To support that enctype, parametrize context_v2_alloc_cipher() so
that each of its call sites can pass in its desired key length. For
now it will be the same length as before (gk5e->keylength), but a
subsequent patch will change this.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever ec4aaab39a SUNRPC: Clean up cipher set up for v1 encryption types
De-duplicate some common code.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever 2691a27d9b SUNRPC: Hoist KDF into struct gss_krb5_enctype
Each Kerberos enctype can have a different KDF. Refactor the key
derivation path to support different KDFs for the enctypes
introduced in subsequent patches.

In particular, expose the key derivation function in struct
gss_krb5_enctype instead of the enctype's preferred random-to-key
function. The latter is usually the identity function and is only
ever called during key derivation, so have each KDF call it
directly.

A couple of extra clean-ups:
- Deduplicate the set_cdata() helper
- Have ->derive_key return negative errnos, in accordance with usual
  kernel coding conventions

This patch is a little bigger than I'd like, but these are all
mechanical changes and they are all to the same areas of code. No
behavior change is intended.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever ae6ad5d0b7 SUNRPC: Rename .encrypt_v2 and .decrypt_v2 methods
Clean up: there is now only one encrypt and only one decrypt method,
thus there is no longer a need for the v2-suffixed method names.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:42 -05:00
Chuck Lever d50b8152c9 SUNRPC: Remove ->encrypt and ->decrypt methods from struct gss_krb5_enctype
Clean up: ->encrypt is set to only one value. Replace the two
remaining call sites with direct calls to krb5_encrypt().

There have never been any call sites for the ->decrypt() method.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:41 -05:00
Chuck Lever dfe9a12345 SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES
Because the DES block cipher has been deprecated by Internet
standard, highly secure configurations might require that DES
support be blacklisted or not installed. NFS Kerberos should still
be able to work correctly with only the AES-based enctypes in that
situation.

Also note that MIT Kerberos has begun a deprecation process for DES
encryption types. Their README for 1.19.3 states:

> Beginning with the krb5-1.19 release, a warning will be issued
> if initial credentials are acquired using the des3-cbc-sha1
> encryption type.  In future releases, this encryption type will
> be disabled by default and eventually removed.
>
> Beginning with the krb5-1.18 release, single-DES encryption
> types have been removed.

Aside from the CONFIG option name change, there are two important
policy changes:

1. The 'insecure enctype' group is now disabled by default.
   Distributors have to take action to enable support for deprecated
   enctypes. Implementation of these enctypes will be removed in a
   future kernel release.

2. des3-cbc-sha1 is now considered part of the 'insecure enctype'
   group, having been deprecated by RFC 8429, and is thus disabled
   by default

After this patch is applied, SunRPC support can be built with
Kerberos 5 support but without CRYPTO_DES enabled in the kernel.
And, when these enctypes are disabled, the Linux kernel's SunRPC
RPCSEC GSS implementation fully complies with BCP 179 / RFC 6649
and BCP 218 / RFC 8429.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:41 -05:00
Chuck Lever 17781b2ce4 SUNRPC: Replace KRB5_SUPPORTED_ENCTYPES macro
Now that all consumers of the KRB5_SUPPORTED_ENCTYPES macro are
within the SunRPC layer, the macro can be replaced with something
private and more flexible.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:39 -05:00
Chuck Lever bdb12fb157 SUNRPC: Add /proc/net/rpc/gss_krb5_enctypes file
I would like to replace the KRB5_SUPPORTED_ENCTYPES macro so that
there is finer granularity about what enctype support is built in
to the kernel and then advertised by it.

The /proc/fs/nfsd/supported_krb5_enctypes file is a legacy API
that advertises supported enctypes to rpc.svcgssd (I think?). It
simply prints the value of the KRB5_SUPPORTED_ENCTYPES macro, so it
will need to be replaced with something that can instead display
exactly which enctypes are configured and built into the SunRPC
layer.

Completely decommissioning such APIs is hard. Instead, add a file
that is managed by SunRPC's GSS Kerberos mechanism, which is
authoritative about enctype support status. A subsequent patch will
replace /proc/fs/nfsd/supported_krb5_enctypes with a symlink to this
new file.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:37 -05:00
Chuck Lever 279a67cdd4 SUNRPC: Remove another switch on ctx->enctype
Replace another switch on encryption type so that it does not have
to be modified when adding or removing support for an enctype.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever e01b2c79f4 SUNRPC: Refactor the GSS-API Per Message calls in the Kerberos mechanism
Replace a number of switches on encryption type so that all of them don't
have to be modified when adding or removing support for an enctype.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever 8270dbfceb SUNRPC: Obscure Kerberos integrity keys
There's no need to keep the integrity keys around if we instead
allocate and key a pair of ahashes and keep those. This not only
enables the subkeys to be destroyed immediately after deriving
them, but it makes the Kerberos integrity code path more efficient.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever 2dbe0cac3c SUNRPC: Obscure Kerberos signing keys
There's no need to keep the signing keys around if we instead allocate
and key an ahash and keep that. This not only enables the subkeys to
be destroyed immediately after deriving them, but it makes the
Kerberos signing code path more efficient.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever 9f0b49f933 SUNRPC: Obscure Kerberos encryption keys
The encryption subkeys are not used after the cipher transforms have
been allocated and keyed. There is no need to retain them in struct
krb5_ctx.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:35 -05:00
Chuck Lever 7989a4f4ab SUNRPC: Refactor set-up for aux_cipher
Hoist the name of the aux_cipher into struct gss_krb5_enctype to
prepare for obscuring the encryption keys just after they are
derived.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever 01c4e32632 SUNRPC: Obscure Kerberos session key
ctx->Ksess is never used after import has completed. Obscure it
immediately so it cannot be re-used or copied.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever 7f675ca775 SUNRPC: Improve Kerberos confounder generation
Other common Kerberos implementations use a fully random confounder
for encryption. The reason for this is explained in the new comment
added by this patch. The current get_random_bytes() implementation
does not exhaust system entropy.

Since confounder generation is part of Kerberos itself rather than
the GSS-API Kerberos mechanism, the function is renamed and moved.

Note that light top-down analysis shows that the SHA-1 transform
is by far the most CPU-intensive part of encryption. Thus we do not
expect this change to result in a significant performance impact.
However, eventually it might be necessary to generate an independent
stream of confounders for each Kerberos context to help improve I/O
parallelism.

Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever 4be416a5f2 SUNRPC: Remove .conflen field from struct gss_krb5_enctype
Now that arcfour-hmac is gone, the confounder length is again the
same as the cipher blocksize for every implemented enctype. The
gss_krb5_enctype::conflen field is no longer necessary.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever f03640a1a9 SUNRPC: Remove .blocksize field from struct gss_krb5_enctype
It is not clear from documenting comments, specifications, or code
usage what value the gss_krb5_enctype.blocksize field is supposed
to store. The "encryption blocksize" depends only on the cipher
being used, so that value can be derived where it's needed instead
of stored as a constant.

RFC 3961 Section 5.2 says:

> cipher block size, c
>    This is the block size of the block cipher underlying the
>    encryption and decryption functions indicated above, used for key
>    derivation and for the size of the message confounder and initial
>    vector.  (If a block cipher is not in use, some comparable
>    parameter should be determined.)  It must be at least 5 octets.
>
>    This is not actually an independent parameter; rather, it is a
>    property of the functions E and D.  It is listed here to clarify
>    the distinction between it and the message block size, m.

In the Linux kernel's implemenation of the SunRPC RPCSEC GSS
Kerberos 5 mechanism, the cipher block size, which is dependent on
the encryption and decryption transforms, is used only in
krb5_derive_key(), so it is straightforward to replace it.

Tested-by: Scott Mayhew <smayhew@redhat.com>
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:34 -05:00
Chuck Lever ccf08bed6e SUNRPC: Replace pool stats with per-CPU variables
Eliminate the use of bus-locked operations in svc_xprt_enqueue(),
which is a hot path. Replace them with per-cpu variables to reduce
cross-CPU memory bus traffic.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:32 -05:00
Chuck Lever 65ba3d2425 SUNRPC: Use per-CPU counters to tally server RPC counts
- Improves counting accuracy
 - Reduces cross-CPU memory traffic

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:32 -05:00
Chuck Lever db1d61656c SUNRPC: Go back to using gsd->body_start
Now that svcauth_gss_prepare_to_wrap() no longer computes the
location of RPC header fields in the response buffer,
svcauth_gss_accept() can save the location of the databody
rather than the location of the verifier.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever 4bcf0343e8 SUNRPC: Set rq_accept_statp inside ->accept methods
To navigate around the space that svcauth_gss_accept() reserves
for the RPC payload body length and sequence number fields,
svcauth_gss_release() does a little dance with the reply's
accept_stat, moving the accept_stat value in the response buffer
down by two words.

Instead, let's have the ->accept() methods each set the proper
final location of the accept_stat to avoid having to move
things.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever cee4db1945 SUNRPC: Refactor RPC server dispatch method
Currently, svcauth_gss_accept() pre-reserves response buffer space
for the RPC payload length and GSS sequence number before returning
to the dispatcher, which then adds the header's accept_stat field.

The problem is the accept_stat field is supposed to go before the
length and seq_num fields. So svcauth_gss_release() has to relocate
the accept_stat value (see svcauth_gss_prepare_to_wrap()).

To enable these fields to be added to the response buffer in the
correct (final) order, the pointer to the accept_stat has to be made
available to svcauth_gss_accept() so that it can set it before
reserving space for the length and seq_num fields.

As a first step, move the pointer to the location of the accept_stat
field into struct svc_rqst.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever 5f69d5f65a SUNRPC: Final clean-up of svc_process_common()
The @resv parameter is no longer used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:31 -05:00
Chuck Lever 649a692e0f SUNRPC: Convert RPC Reply header encoding to use xdr_stream
The main part of RPC header encoding and the formation of error
responses are now done using the xdr_stream helpers. Bounds checking
before each XDR data item is encoded makes the server's encoding
path safer against accidental buffer overflows.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:30 -05:00
Chuck Lever fcef2afffe SUNRPC: Hoist init_encode out of svc_authenticate()
Now that each ->accept method has been converted, the
svcxdr_init_encode() calls can be hoisted back up into the generic
RPC server code.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:29 -05:00
Chuck Lever 72a1e53a8b SUNRPC: Use xdr_stream for encoding GSS reply verifiers
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:29 -05:00
Chuck Lever b2f42f1d99 SUNRPC: Use xdr_stream to encode replies in server-side GSS upcall helpers
This code constructs replies to the decorated NULL procedure calls
that establish GSS contexts. Convert this code path to use struct
xdr_stream to encode such responses.

Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever 7bb0dfb223 SUNRPC: Convert unwrap data paths to use xdr_stream for replies
We're now moving svcxdr_init_encode() to /before/ the flavor's
->accept method has set rq_auth_slack. Add a helper that can
set rq_auth_slack /after/ svcxdr_init_encode() has been called.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever df18f9ccb9 SUNRPC: Use xdr_stream to encode Reply verifier in svcauth_tls_accept()
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever 3b03f3c5d4 SUNRPC: Use xdr_stream to encode Reply verifier in svcauth_unix_accept()
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever b2c88ca65a SUNRPC: Use xdr_stream to encode Reply verifier in svcauth_null_accept()
Done as part of hardening the server-side RPC header encoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever faca897816 SUNRPC: Move svcxdr_init_encode() into ->accept methods
Refactor: So that the overhaul of each ->accept method can be done
in separate smaller patches, temporarily move the
svcxdr_init_encode() call into those methods.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:28 -05:00
Chuck Lever 8dd41d70f3 SUNRPC: Push svcxdr_init_encode() into svc_process_common()
Now that all vs_dispatch functions invoke svcxdr_init_encode(), it
is common code and can be pushed down into the generic RPC server.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:27 -05:00
Chuck Lever 7b402c8db6 SUNRPC: Add XDR encoding helper for opaque_auth
RFC 5531 defines an MSG_ACCEPTED Reply message like this:

	struct accepted_reply {
		opaque_auth verf;
		union switch (accept_stat stat) {
		case SUCCESS:
		   ...

In the current server code, struct opaque_auth encoding is open-
coded. Introduce a helper that encodes an opaque_auth data item
within the context of a xdr_stream.

Done as part of hardening the server-side RPC header decoding and
encoding paths.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:27 -05:00
Chuck Lever 6d037b15e4 SUNRPC: Remove the rpc_stat variable in svc_process_common()
There's no RPC header field called rpc_stat; more precisely, the
variable appears to be recording an accept_stat value. But it looks
like we don't need to preserve this value at all, actually, so
simply remove the variable.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever 99d074d6b1 SUNRPC: Check rq_auth_stat when preparing to wrap a response
Commit 5b304bc5bf ("[PATCH] knfsd: svcrpc: gss: fix failure on
SVC_DENIED in integrity case") added a check to prevent wrapping an
RPC response if reply_stat == MSG_DENIED, assuming that the only way
to get to svcauth_gss_release() with that reply_stat value was if
the reject_stat was AUTH_ERROR (reject_stat == MISMATCH is handled
earlier in svc_process_common()).

The code there is somewhat confusing. For one thing, rpc_success is
an accept_stat value, not a reply_stat value. The correct reply_stat
value to look for is RPC_MSG_DENIED. It happens to be the same value
as rpc_success, so it all works out, but it's not terribly readable.

Since commit 438623a06b ("SUNRPC: Add svc_rqst::rq_auth_stat"),
the actual auth_stat value is stored in the svc_rqst, so that value
is now available to svcauth_gss_prepare_to_wrap() to make its
decision to wrap, based on direct information about the
authentication status of the RPC caller.

No behavior change is intended, this simply replaces some old code
with something that should be more self-documenting.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever eb1b780f2f SUNRPC: Convert svcauth_gss_wrap_priv() to use xdr_stream()
Actually xdr_stream does not add value here because of how
gss_wrap() works. This is just a clean-up patch.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever a84cfbcd5a SUNRPC: Add @head and @tail variables in svcauth_gss_wrap_priv()
Simplify the references to the head and tail iovecs for readability.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever ba8b13e5f4 SUNRPC: Record gss_wrap() errors in svcauth_gss_wrap_priv()
Match the error reporting in the other unwrap and wrap functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever 7b135c65bb SUNRPC: Rename automatic variables in svcauth_gss_wrap_resp_priv()
Clean up variable names to match the other unwrap and wrap
functions.

Additionally, the explicit type cast on @gsd in unnecessary; and
@resbuf is renamed to match the variable naming in the unwrap
functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:25 -05:00
Chuck Lever 7702378ac4 SUNRPC: Convert svcauth_gss_wrap_integ() to use xdr_stream()
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:24 -05:00
Chuck Lever d91f0323a0 SUNRPC: Replace checksum construction in svcauth_gss_wrap_integ()
Replace finicky logic: Instead of trying to find scratch space in
the response buffer, use the scratch buffer from struct
gss_svc_data.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:24 -05:00
Chuck Lever 15d8f80891 SUNRPC: Record gss_get_mic() errors in svcauth_gss_wrap_integ()
An error computing the checksum here is an exceptional event.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:24 -05:00
Chuck Lever 0adaddd32f SUNRPC: Rename automatic variables in svcauth_gss_wrap_resp_integ()
Clean up: To help orient readers, name the stack variables to match
the XDR field names.

Additionally, the explicit type cast on @gsd is unnecessary; and
@resbuf is renamed to match the variable naming in the unwrap
functions.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:24 -05:00
Chuck Lever 5a92938309 SUNRPC: Clean up svcauth_gss_release()
Now that upper layers use an xdr_stream to track the construction
of each RPC Reply message, resbuf->len is kept up-to-date
automatically. There's no need to recompute it in svc_gss_release().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:22 -05:00
Chuck Lever f4afc8fead SUNRPC: Hoist svcxdr_init_decode() into svc_process()
Now the entire RPC Call header parsing path is handled via struct
xdr_stream-based decoders.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:21 -05:00
Chuck Lever 1c59a532ae SUNRPC: Remove svc_process_common's argv parameter
Clean up: With xdr_stream decoding, the @argv parameter is no longer
used.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:21 -05:00
Chuck Lever 163cdfca34 SUNRPC: Decode most of RPC header with xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:21 -05:00
Chuck Lever 4119bd0306 SUNRPC: Eliminate unneeded variable
Clean up: Saving the RPC program number in two places is
unnecessary.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:19 -05:00
Chuck Lever 2009e32997 SUNRPC: Re-order construction of the first reply fields
Clean up: Group these together for legibility.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:19 -05:00
Chuck Lever 6898b47a0f SUNRPC: Hoist init_decode out of svc_authenticate()
Now that each ->accept method has been converted to use xdr_stream,
the svcxdr_init_decode() calls can be hoisted back up into the
generic RPC server code.

The dprintk in svc_authenticate() is removed, since
trace_svc_authenticate() reports the same information.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:19 -05:00
Chuck Lever b0bc53470d SUNRPC: Convert the svcauth_gss_accept() pre-amble to use xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:17 -05:00
Chuck Lever 6734706bc0 SUNRPC: Clean up svcauth_gss_accept's NULL procedure check
Micro-optimizations:

1. The value of rqstp->rq_auth_stat is replaced no matter which
   arm of the switch is taken, so the initial assignment can be
   safely removed.

2. Avoid checking the value of gc->gc_proc twice in the I/O
   (RPC_GSS_PROC_DATA) path.

The cost is a little extra code redundancy.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:17 -05:00
Chuck Lever 0653028e8f SUNRPC: Convert gss_verify_header() to use xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:16 -05:00
Chuck Lever 42140718ea SUNRPC: Convert unwrap_priv_data() to use xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:16 -05:00
Chuck Lever f4a59e822f SUNRPC: Rename automatic variables in unwrap_priv_data()
Clean up: To help orient readers, name the stack variables to match
the XDR field names.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:16 -05:00
Chuck Lever b68e4c5c32 SUNRPC: Convert unwrap_integ_data() to use xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:16 -05:00
Chuck Lever e14673c9c1 SUNRPC: Rename automatic variables in unwrap_integ_data()
Clean up: To help orient readers, name the stack variables to match
the XDR field names.

For readability, I'm also going to rename the unwrap and wrap
functions in a consistent manner, starting with unwrap_integ_data().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:16 -05:00
Chuck Lever 26a949d1db SUNRPC: Replace read_u32_from_xdr_buf() with existing XDR helper
Clean up / code de-duplication - this functionality is already
available in the generic XDR layer.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:16 -05:00
Chuck Lever c020fa695a SUNRPC: Convert server-side GSS upcall helpers to use xdr_stream
The entire RPC_GSS_PROC_INIT path is converted over to xdr_stream
for decoding the Call credential and verifier.

Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:15 -05:00
Chuck Lever 1cbfb92197 SUNRPC: Remove gss_read_verf()
gss_read_verf() is already short. Fold it into its only caller.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:15 -05:00
Chuck Lever 4d51366dee SUNRPC: Remove gss_read_common_verf()
gss_read_common_verf() is now just a wrapper for dup_netobj(), thus
it can be replaced with direct calls to dup_netobj().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:13 -05:00
Chuck Lever 20ebe927ed SUNRPC: Hoist common verifier decoding code into svcauth_gss_proc_init()
Pre-requisite to replacing gss_read_common_verf().

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:11 -05:00
Chuck Lever 4ac5e7a690 SUNRPC: Move the server-side GSS upcall to a noinline function
Since upcalls are infrequent, ensure the compiler places the upcall
mechanism out-of-line from the I/O path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:11 -05:00
Chuck Lever e8e38e1400 SUNRPC: Convert svcauth_tls_accept() to use xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:11 -05:00
Chuck Lever 6181b0c643 SUNRPC: Convert svcauth_unix_accept() to use xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Since the server-side of the Linux kernel SunRPC implementation
ignores the contents of the Call's machinename field, there's no
need for its RPC_AUTH_UNIX authenticator to reject names that are
larger than UNX_MAXNODENAME.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:11 -05:00
Chuck Lever bee13639c0 SUNRPC: Convert svcauth_null_accept() to use xdr_stream
Done as part of hardening the server-side RPC header decoding path.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:09 -05:00
Chuck Lever 846b5756d7 SUNRPC: Add an XDR decoding helper for struct opaque_auth
RFC 5531 defines the body of an RPC Call message like this:

	struct call_body {
		unsigned int rpcvers;
		unsigned int prog;
		unsigned int vers;
		unsigned int proc;
		opaque_auth cred;
		opaque_auth verf;
		/* procedure-specific parameters start here */
	};

In the current server code, decoding a struct opaque_auth type is
open-coded in several places, and is thus difficult to harden
everywhere.

Introduce a helper for decoding an opaque_auth within the context
of a xdr_stream. This helper can be shared with all authentication
flavor implemenations, even on the client-side.

Done as part of hardening the server-side RPC header decoding paths.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:09 -05:00
Chuck Lever 1e9e177df3 SUNRPC: Move svcxdr_init_decode() into ->accept methods
Refactor: So that the overhaul of each ->accept method can be done
in separate smaller patches, temporarily move the
svcxdr_init_decode() call into those methods.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:09 -05:00
Chuck Lever dba5eaa46b SUNRPC: Push svcxdr_init_decode() into svc_process_common()
Now that all vs_dispatch functions invoke svcxdr_init_decode(), it
is common code and can be pushed down into the generic RPC server.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-02-20 09:20:09 -05:00
Eric Dumazet 5f1eb1ff58 scm: add user copy checks to put_cmsg()
This is a followup of commit 2558b8039d ("net: use a bounce
buffer for copying skb->mark")

x86 and powerpc define user_access_begin, meaning
that they are not able to perform user copy checks
when using user_write_access_begin() / unsafe_copy_to_user()
and friends [1]

Instead of waiting bugs to trigger on other arches,
add a check_object_size() in put_cmsg() to make sure
that new code tested on x86 with CONFIG_HARDENED_USERCOPY=y
will perform more security checks.

[1] We can not generically call check_object_size() from
unsafe_copy_to_user() because UACCESS is enabled at this point.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 11:39:59 +00:00
Paolo Abeni fce10282a0 devlink: drop leftover duplicate/unused code
The recent merge from net left-over some unused code in
leftover.c - nomen omen.

Just drop the unused bits.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 11:38:35 +00:00
Paolo Abeni 50bcfe8df7 net: make default_rps_mask a per netns attribute
That really was meant to be a per netns attribute from the beginning.

The idea is that once proper isolation is in place in the main
namespace, additional demux in the child namespaces will be redundant.
Let's make child netns default rps mask empty by default.

To avoid bloating the netns with a possibly large cpumask, allocate
it on-demand during the first write operation.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 11:22:54 +00:00
David S. Miller 1155a2281d Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add safeguard to check for NULL tupe in objects updates via
   NFT_MSG_NEWOBJ, this should not ever happen. From Alok Tiwari.

2) Incorrect pointer check in the new destroy rule command,
   from Yang Yingliang.

3) Incorrect status bitcheck in nf_conntrack_udp_packet(),
   from Florian Westphal.

4) Simplify seq_print_acct(), from Ilia Gavrilov.

5) Use 2-arg optimal variant of kfree_rcu() in IPVS,
   from Julian Anastasov.

6) TCP connection enters CLOSE state in conntrack for locally
   originated TCP reset packet from the reject target,
   from Florian Westphal.

The fixes #2 and #3 in this series address issues from the previous pull
nf-next request in this net-next cycle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 10:53:56 +00:00
Shigeru Yoshida 9ca5e7ecab l2tp: Avoid possible recursive deadlock in l2tp_tunnel_register()
When a file descriptor of pppol2tp socket is passed as file descriptor
of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register().
This situation is reproduced by the following program:

int main(void)
{
	int sock;
	struct sockaddr_pppol2tp addr;

	sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP);
	if (sock < 0) {
		perror("socket");
		return 1;
	}

	addr.sa_family = AF_PPPOX;
	addr.sa_protocol = PX_PROTO_OL2TP;
	addr.pppol2tp.pid = 0;
	addr.pppol2tp.fd = sock;
	addr.pppol2tp.addr.sin_family = PF_INET;
	addr.pppol2tp.addr.sin_port = htons(0);
	addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1");
	addr.pppol2tp.s_tunnel = 1;
	addr.pppol2tp.s_session = 0;
	addr.pppol2tp.d_tunnel = 0;
	addr.pppol2tp.d_session = 0;

	if (connect(sock, (const struct sockaddr *)&addr, sizeof(addr)) < 0) {
		perror("connect");
		return 1;
	}

	return 0;
}

This program causes the following lockdep warning:

 ============================================
 WARNING: possible recursive locking detected
 6.2.0-rc5-00205-gc96618275234 #56 Not tainted
 --------------------------------------------
 repro/8607 is trying to acquire lock:
 ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0

 but task is already holding lock:
 ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(sk_lock-AF_PPPOX);
   lock(sk_lock-AF_PPPOX);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by repro/8607:
  #0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30

 stack backtrace:
 CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x100/0x178
  __lock_acquire.cold+0x119/0x3b9
  ? lockdep_hardirqs_on_prepare+0x410/0x410
  lock_acquire+0x1e0/0x610
  ? l2tp_tunnel_register+0x2b7/0x11c0
  ? lock_downgrade+0x710/0x710
  ? __fget_files+0x283/0x3e0
  lock_sock_nested+0x3a/0xf0
  ? l2tp_tunnel_register+0x2b7/0x11c0
  l2tp_tunnel_register+0x2b7/0x11c0
  ? sprintf+0xc4/0x100
  ? l2tp_tunnel_del_work+0x6b0/0x6b0
  ? debug_object_deactivate+0x320/0x320
  ? lockdep_init_map_type+0x16d/0x7a0
  ? lockdep_init_map_type+0x16d/0x7a0
  ? l2tp_tunnel_create+0x2bf/0x4b0
  ? l2tp_tunnel_create+0x3c6/0x4b0
  pppol2tp_connect+0x14e1/0x1a30
  ? pppol2tp_put_sk+0xd0/0xd0
  ? aa_sk_perm+0x2b7/0xa80
  ? aa_af_perm+0x260/0x260
  ? bpf_lsm_socket_connect+0x9/0x10
  ? pppol2tp_put_sk+0xd0/0xd0
  __sys_connect_file+0x14f/0x190
  __sys_connect+0x133/0x160
  ? __sys_connect_file+0x190/0x190
  ? lockdep_hardirqs_on+0x7d/0x100
  ? ktime_get_coarse_real_ts64+0x1b7/0x200
  ? ktime_get_coarse_real_ts64+0x147/0x200
  ? __audit_syscall_entry+0x396/0x500
  __x64_sys_connect+0x72/0xb0
  do_syscall_64+0x38/0xb0
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

This patch fixes the issue by getting/creating the tunnel before
locking the pppol2tp socket.

Fixes: 0b2c59720e ("l2tp: close all race conditions in l2tp_tunnel_register()")
Cc: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 09:25:20 +00:00
Eric Dumazet ac03694bc0 ipv6: icmp6: add drop reason support to icmpv6_echo_reply()
Change icmpv6_echo_reply() to return a drop reason.

For the moment, return NOT_SPECIFIED or SKB_CONSUMED.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet c34b8bb11e ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_NS_OTHERHOST
Hosts can often receive neighbour discovery messages
that are not for them.

Use a dedicated drop reason to make clear the packet is dropped
for this normal case.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet 784d4477f0 ipv6: icmp6: add SKB_DROP_REASON_IPV6_NDISC_BAD_OPTIONS
This is a generic drop reason for any error detected
in ndisc_parse_options().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet ec993edf05 ipv6: icmp6: add drop reason support to ndisc_redirect_rcv()
Change ndisc_redirect_rcv() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and values from icmpv6_notify().

More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet 2f326d9d9f ipv6: icmp6: add drop reason support to ndisc_router_discovery()
Change ndisc_router_discovery() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
and SKB_CONSUMED.

More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet 243e37c642 ipv6: icmp6: add drop reason support to ndisc_recv_rs()
Change ndisc_recv_rs() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet 3009f9ae21 ipv6: icmp6: add drop reason support to ndisc_recv_na()
Change ndisc_recv_na() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED. More reasons are added later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet 7c9c8913f4 ipv6: icmp6: add drop reason support to ndisc_recv_ns()
Change ndisc_recv_ns() to return a drop reason.

For the moment, return PKT_TOO_SMALL, NOT_SPECIFIED
or SKB_CONSUMED.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:54:23 +00:00
Eric Dumazet dd1b527831 net: add location to trace_consume_skb()
kfree_skb() includes the location, it makes sense
to add it to consume_skb() as well.

After patch:

 taskd_EventMana  8602 [004]   420.406239: skb:consume_skb: skbaddr=0xffff893a4a6d0500 location=unix_stream_read_generic
         swapper     0 [011]   422.732607: skb:consume_skb: skbaddr=0xffff89597f68cee0 location=mlx4_en_free_tx_desc
      discipline  9141 [043]   423.065653: skb:consume_skb: skbaddr=0xffff893a487e9c00 location=skb_consume_udp
         swapper     0 [010]   423.073166: skb:consume_skb: skbaddr=0xffff8949ce9cdb00 location=icmpv6_rcv
         borglet  8672 [014]   425.628256: skb:consume_skb: skbaddr=0xffff8949c42e9400 location=netlink_dump
         swapper     0 [028]   426.263317: skb:consume_skb: skbaddr=0xffff893b1589dce0 location=net_rx_action
            wget 14339 [009]   426.686380: skb:consume_skb: skbaddr=0xffff893a51b552e0 location=tcp_rcv_state_process

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:28:49 +00:00
Xuan Zhuo 9f78bf330a xsk: support use vaddr as ring
When we try to start AF_XDP on some machines with long running time, due
to the machine's memory fragmentation problem, there is no sufficient
contiguous physical memory that will cause the start failure.

If the size of the queue is 8 * 1024, then the size of the desc[] is
8 * 1024 * 8 = 16 * PAGE, but we also add struct xdp_ring size, so it is
16page+. This is necessary to apply for a 4-order memory. If there are a
lot of queues, it is difficult to these machine with long running time.

Here, that we actually waste 15 pages. 4-Order memory is 32 pages, but
we only use 17 pages.

This patch replaces __get_free_pages() by vmalloc() to allocate memory
to solve these problems.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:22:12 +00:00
D. Wythe 475f9ff63e net/smc: fix application data exception
There is a certain probability that following
exceptions will occur in the wrk benchmark test:

Running 10s test @ http://11.213.45.6:80
  8 threads and 64 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.72ms   13.94ms 245.33ms   94.17%
    Req/Sec     1.96k   713.67     5.41k    75.16%
  155262 requests in 10.10s, 23.10MB read
Non-2xx or 3xx responses: 3

We will find that the error is HTTP 400 error, which is a serious
exception in our test, which means the application data was
corrupted.

Consider the following scenarios:

CPU0                            CPU1

buf_desc->used = 0;
                                cmpxchg(buf_desc->used, 0, 1)
                                deal_with(buf_desc)

memset(buf_desc->cpu_addr,0);

This will cause the data received by a victim connection to be cleared,
thus triggering an HTTP 400 error in the server.

This patch exchange the order between clear used and memset, add
barrier to ensure memory consistency.

Fixes: 1c5526968e ("net/smc: Clear memory when release and reuse buffer")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:09:27 +00:00
D. Wythe e40b801b36 net/smc: fix potential panic dues to unprotected smc_llc_srv_add_link()
There is a certain chance to trigger the following panic:

PID: 5900   TASK: ffff88c1c8af4100  CPU: 1   COMMAND: "kworker/1:48"
 #0 [ffff9456c1cc79a0] machine_kexec at ffffffff870665b7
 #1 [ffff9456c1cc79f0] __crash_kexec at ffffffff871b4c7a
 #2 [ffff9456c1cc7ab0] crash_kexec at ffffffff871b5b60
 #3 [ffff9456c1cc7ac0] oops_end at ffffffff87026ce7
 #4 [ffff9456c1cc7ae0] page_fault_oops at ffffffff87075715
 #5 [ffff9456c1cc7b58] exc_page_fault at ffffffff87ad0654
 #6 [ffff9456c1cc7b80] asm_exc_page_fault at ffffffff87c00b62
    [exception RIP: ib_alloc_mr+19]
    RIP: ffffffffc0c9cce3  RSP: ffff9456c1cc7c38  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000004
    RDX: 0000000000000010  RSI: 0000000000000000  RDI: 0000000000000000
    RBP: ffff88c1ea281d00   R8: 000000020a34ffff   R9: ffff88c1350bbb20
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000010  R14: ffff88c1ab040a50  R15: ffff88c1ea281d00
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff9456c1cc7c60] smc_ib_get_memory_region at ffffffffc0aff6df [smc]
 #8 [ffff9456c1cc7c88] smcr_buf_map_link at ffffffffc0b0278c [smc]
 #9 [ffff9456c1cc7ce0] __smc_buf_create at ffffffffc0b03586 [smc]

The reason here is that when the server tries to create a second link,
smc_llc_srv_add_link() has no protection and may add a new link to
link group. This breaks the security environment protected by
llc_conf_mutex.

Fixes: 2d2209f201 ("net/smc: first part of add link processing as SMC server")
Signed-off-by: D. Wythe <alibuda@linux.alibaba.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20 08:07:01 +00:00
Vladimir Oltean 64cb6aad12 net/sched: taprio: dynamic max_sdu larger than the max_mtu is unlimited
It makes no sense to keep randomly large max_sdu values, especially if
larger than the device's max_mtu. These are visible in "tc qdisc show".
Such a max_sdu is practically unlimited and will cause no packets for
that traffic class to be dropped on enqueue.

Just set max_sdu_dynamic to U32_MAX, which in the logic below causes
taprio to save a max_frm_len of U32_MAX and a max_sdu presented to user
space of 0 (unlimited).

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20 08:46:57 +01:00
Vladimir Oltean bdf366bd86 net/sched: taprio: don't allow dynamic max_sdu to go negative after stab adjustment
The overhead specified in the size table comes from the user. With small
time intervals (or gates always closed), the overhead can be larger than
the max interval for that traffic class, and their difference is
negative.

What we want to happen is for max_sdu_dynamic to have the smallest
non-zero value possible (1) which means that all packets on that traffic
class are dropped on enqueue. However, since max_sdu_dynamic is u32, a
negative is represented as a large value and oversized dropping never
happens.

Use max_t with int to force a truncation of max_frm_len to no smaller
than dev->hard_header_len + 1, which in turn makes max_sdu_dynamic no
smaller than 1.

Fixes: fed87cc671 ("net/sched: taprio: automatically calculate queueMaxSDU based on TC gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20 08:46:57 +01:00
Vladimir Oltean 09dbdf28f9 net/sched: taprio: fix calculation of maximum gate durations
taprio_calculate_gate_durations() depends on netdev_get_num_tc() and
this returns 0. So it calculates the maximum gate durations for no
traffic class.

I had tested the blamed commit only with another patch in my tree, one
which in the end I decided isn't valuable enough to submit ("net/sched:
taprio: mask off bits in gate mask that exceed number of TCs").

The problem is that having this patch threw off my testing. By moving
the netdev_set_num_tc() call earlier, we implicitly gave to
taprio_calculate_gate_durations() the information it needed.

Extract only the portion from the unsubmitted change which applies the
mqprio configuration to the netdev earlier.

Link: https://patchwork.kernel.org/project/netdevbpf/patch/20230130173145.475943-15-vladimir.oltean@nxp.com/
Fixes: a306a90c8f ("net/sched: taprio: calculate tc gate durations")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20 08:46:57 +01:00
David Howells c078381856 rxrpc: Fix overproduction of wakeups to recvmsg()
Fix three cases of overproduction of wakeups:

 (1) rxrpc_input_split_jumbo() conditionally notifies the app that there's
     data for recvmsg() to collect if it queues some data - and then its
     only caller, rxrpc_input_data(), goes and wakes up recvmsg() anyway.

     Fix the rxrpc_input_data() to only do the wakeup in failure cases.

 (2) If a DATA packet is received for a call by the I/O thread whilst
     recvmsg() is busy draining the call's rx queue in the app thread, the
     call will left on the recvmsg() queue for recvmsg() to pick up, even
     though there isn't any data on it.

     This can cause an unexpected recvmsg() with a 0 return and no MSG_EOR
     set after the reply has been posted to a service call.

     Fix this by discarding pending calls from the recvmsg() queue that
     don't need servicing yet.

 (3) Not-yet-completed calls get requeued after having data read from them,
     even if they have no data to read.

     Fix this by only requeuing them if they have data waiting on them; if
     they don't, the I/O thread will requeue them when data arrives or they
     fail.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/3386149.1676497685@warthog.procyon.org.uk
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-20 08:33:25 +01:00
Miquel Raynal ed9a8ad7d8 ieee802154: Drop device trackers
In order to prevent a device from disappearing when a background job was
started, dev_hold() and dev_put() calls were made. During the
stabilization phase of the scan/beacon features, it was later decided
that removing the device while a background job was ongoing was a valid use
case, and we should instead stop the background job and then remove the
device, rather than prevent the device from being removed. This is what
is currently done, which means manually reference counting the device
during background jobs is no longer needed.

Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b ("ieee802154: Add support for user beaconing requests")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-7-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18 16:49:53 +01:00
Miquel Raynal 61d7dddf46 mac802154: Fix an always true condition
At this stage we simply do not care about the delayed work value,
because active scan is not yet supported, so we can blindly queue
another work once a beacon has been sent.

It fixes a smatch warning:
    mac802154_beacon_worker() warn: always true condition
    '(local->beacon_interval >= 0) => (0-u32max >= 0)'

Fixes: 3accf47627 ("mac802154: Handle basic beaconing")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-6-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18 16:47:26 +01:00
Miquel Raynal 1375e3ba9d mac802154: Send beacons using the MLME Tx path
Using ieee802154_subif_start_xmit() to bypass the net queue when
sending beacons is broken because it does not acquire the
HARD_TX_LOCK(), hence not preventing datagram buffers to be smashed by
beacons upon contention situation. Using the mlme_tx helper is not the
best fit either but at least it is not buggy and has little-to-no
performance hit. More details are given in the comment explaining this
choice in the code.

Fixes: 3accf47627 ("mac802154: Handle basic beaconing")
Reported-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-5-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18 16:44:53 +01:00
Miquel Raynal 1edecbd0bd ieee802154: Change error code on monitor scan netlink request
Returning EPERM gives the impression that "right now" it is not
possible, but "later" it could be, while what we want to express is the
fact that this is not currently supported at all (might change in the
future). So let's return EOPNOTSUPP instead.

Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Suggested-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-4-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18 16:41:23 +01:00
Miquel Raynal a0b6106672 ieee802154: Convert scan error messages to extack
Instead of printing error messages in the kernel log, let's use extack.
When there is a netlink error returned that could be further specified
with a string, use extack as well.

Apply this logic to the very recent scan/beacon infrastructure.

Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Fixes: 9bc114504b ("ieee802154: Add support for user beaconing requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-3-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18 16:38:41 +01:00
Miquel Raynal 648324c9b6 ieee802154: Use netlink policies when relevant on scan parameters
Instead of open-coding scan parameters (page, channels, duration, etc),
let's use the existing NLA_POLICY* macros. This help greatly reducing
the error handling and clarifying the overall logic.

Fixes: ed3557c947 ("ieee802154: Add support for user scanning requests")
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Link: https://lore.kernel.org/r/20230214135035.1202471-2-miquel.raynal@bootlin.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2023-02-18 16:35:10 +01:00
Martin KaFai Lau 31de4105f0 bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.

In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always call
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is skipped together when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac also.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
2023-02-17 22:12:04 +01:00
Martin KaFai Lau 181127fb76 Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
This reverts commit 6c20822fad.

build bot failed on arch with different cache line size:
https://lore.kernel.org/bpf/50c35055-afa9-d01e-9a05-ea5351280e4f@intel.com/

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-17 12:24:33 -08:00
Martin KaFai Lau 1fe4850b34 bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
The bpf_fib_lookup() helper does not only look up the fib (ie. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zero in the eth header.

This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev
2023-02-17 16:19:42 +01:00
Martin KaFai Lau af2d0d09ea bpf: Disable bh in bpf_test_run for xdp and tc prog
Some of the bpf helpers require bh disabled. eg. The bpf_fib_lookup
helper that will be used in a latter selftest. In particular, it
calls ___neigh_lookup_noref that expects the bh disabled.

This patch disables bh before calling bpf_prog_run[_xdp], so
the testing prog can also use those helpers.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-2-martin.lau@linux.dev
2023-02-17 16:19:23 +01:00
Maciej Fijalkowski 1596dae2f1 xsk: check IFF_UP earlier in Tx path
Xsk Tx can be triggered via either sendmsg() or poll() syscalls. These
two paths share a call to common function xsk_xmit() which has two
sanity checks within. A pseudo code example to show the two paths:

__xsk_sendmsg() :                       xsk_poll():
if (unlikely(!xsk_is_bound(xs)))        if (unlikely(!xsk_is_bound(xs)))
    return -ENXIO;                          return mask;
if (unlikely(need_wait))                (...)
    return -EOPNOTSUPP;                 xsk_xmit()
mark napi id
(...)
xsk_xmit()

xsk_xmit():
if (unlikely(!(xs->dev->flags & IFF_UP)))
	return -ENETDOWN;
if (unlikely(!xs->tx))
	return -ENOBUFS;

As it can be observed above, in sendmsg() napi id can be marked on
interface that was not brought up and this causes a NULL ptr
dereference:

[31757.505631] BUG: kernel NULL pointer dereference, address: 0000000000000018
[31757.512710] #PF: supervisor read access in kernel mode
[31757.517936] #PF: error_code(0x0000) - not-present page
[31757.523149] PGD 0 P4D 0
[31757.525726] Oops: 0000 [#1] PREEMPT SMP NOPTI
[31757.530154] CPU: 26 PID: 95641 Comm: xdpsock Not tainted 6.2.0-rc5+ #40
[31757.536871] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
[31757.547457] RIP: 0010:xsk_sendmsg+0xde/0x180
[31757.551799] Code: 00 75 a2 48 8b 00 a8 04 75 9b 84 d2 74 69 8b 85 14 01 00 00 85 c0 75 1b 48 8b 85 28 03 00 00 48 8b 80 98 00 00 00 48 8b 40 20 <8b> 40 18 89 85 14 01 00 00 8b bd 14 01 00 00 81 ff 00 01 00 00 0f
[31757.570840] RSP: 0018:ffffc90034f27dc0 EFLAGS: 00010246
[31757.576143] RAX: 0000000000000000 RBX: ffffc90034f27e18 RCX: 0000000000000000
[31757.583389] RDX: 0000000000000001 RSI: ffffc90034f27e18 RDI: ffff88984cf3c100
[31757.590631] RBP: ffff88984714a800 R08: ffff88984714a800 R09: 0000000000000000
[31757.597877] R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffa
[31757.605123] R13: 0000000000000000 R14: 0000000000000003 R15: 0000000000000000
[31757.612364] FS:  00007fb4c5931180(0000) GS:ffff88afdfa00000(0000) knlGS:0000000000000000
[31757.620571] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[31757.626406] CR2: 0000000000000018 CR3: 000000184b41c003 CR4: 00000000007706e0
[31757.633648] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[31757.640894] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[31757.648139] PKRU: 55555554
[31757.650894] Call Trace:
[31757.653385]  <TASK>
[31757.655524]  sock_sendmsg+0x8f/0xa0
[31757.659077]  ? sockfd_lookup_light+0x12/0x70
[31757.663416]  __sys_sendto+0xfc/0x170
[31757.667051]  ? do_sched_setscheduler+0xdb/0x1b0
[31757.671658]  __x64_sys_sendto+0x20/0x30
[31757.675557]  do_syscall_64+0x38/0x90
[31757.679197]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[31757.687969] Code: 8e f6 ff 44 8b 4c 24 2c 4c 8b 44 24 20 41 89 c4 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 3a 44 89 e7 48 89 44 24 08 e8 b5 8e f6 ff 48
[31757.707007] RSP: 002b:00007ffd49c73c70 EFLAGS: 00000293 ORIG_RAX: 000000000000002c
[31757.714694] RAX: ffffffffffffffda RBX: 000055a996565380 RCX: 00007fb4c5727c16
[31757.721939] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
[31757.729184] RBP: 0000000000000040 R08: 0000000000000000 R09: 0000000000000000
[31757.736429] R10: 0000000000000040 R11: 0000000000000293 R12: 0000000000000000
[31757.743673] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[31757.754940]  </TASK>

To fix this, let's make xsk_xmit a function that will be responsible for
generic Tx, where RCU is handled accordingly and pull out sanity checks
and xs->zc handling. Populate sanity checks to __xsk_sendmsg() and
xsk_poll().

Fixes: ca2e1a6270 ("xsk: Mark napi_id on sendmsg()")
Fixes: 18b1ab7aa7 ("xsk: Fix race at socket teardown")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215143309.13145-1-maciej.fijalkowski@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2023-02-17 16:05:28 +01:00
Florian Westphal 2954fe60e3 netfilter: let reset rules clean out conntrack entries
iptables/nftables support responding to tcp packets with tcp resets.

The generated tcp reset packet passes through both output and postrouting
netfilter hooks, but conntrack will never see them because the generated
skb has its ->nfct pointer copied over from the packet that triggered the
reset rule.

If the reset rule is used for established connections, this
may result in the conntrack entry to be around for a very long
time (default timeout is 5 days).

One way to avoid this would be to not copy the nf_conn pointer
so that the rest packet passes through conntrack too.

Problem is that output rules might not have the same conntrack
zone setup as the prerouting ones, so its possible that the
reset skb won't find the correct entry.  Generating a template
entry for the skb seems error prone as well.

Add an explicit "closing" function that switches a confirmed
conntrack entry to closed state and wire this up for tcp.

If the entry isn't confirmed, no action is needed because
the conntrack entry will never be committed to the table.

Reported-by: Russel King <linux@armlinux.org.uk>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-02-17 13:04:56 +01:00
David S. Miller 675f176b4d Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net
Some of the devlink bits were tricky, but I think I got it right.

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-17 11:06:39 +00:00
Jakub Kicinski ca0df43d21 Major stack changes:
* EHT channel puncturing support (client & AP)
  * some support for AP MLD without mac80211
  * fixes for A-MSDU on mesh connections
 
 Major driver changes:
 
 iwlwifi
  * EHT rate reporting
  * Bump FW API to 74 for AX devices
  * STEP equalizer support: transfer some STEP (connection to radio
    on platforms with integrated wifi) related parameters from the
    BIOS to the firmware
 
 mt76
  * switch to using page pool allocator
  * mt7996 EHT (Wi-Fi 7) support
  * Wireless Ethernet Dispatch (WED) reset support
 
 libertas
  * WPS enrollee support
 
 brcmfmac
  * Rename Cypress 89459 to BCM4355
  * BCM4355 and BCM4377 support
 
 mwifiex
  * SD8978 chipset support
 
 rtl8xxxu
  * LED support
 
 ath12k
  * new driver for Qualcomm Wi-Fi 7 devices
 
 ath11k
  * IPQ5018 support
  * Fine Timing Measurement (FTM) responder role support
  * channel 177 support
 
 ath10k
  * store WLAN firmware version in SMEM image table
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEpeA8sTs3M8SN2hR410qiO8sPaAAFAmPuCvYACgkQ10qiO8sP
 aADG4g/+NkmZHdlQVaWHRnv+hm6/BOODJDpe+tzSqP+kUKFuKWJA6k2RozV6iz2L
 SLpOefU9Wq05xGwP2BfVPGdYrD9eEoPrToN0t6up85iPyJurtd2yg7fiOt7vLk0K
 UahZQd+hnp2N2wDFH+pJn6xWaQvO1ZM46i2ufvxGKEOuRyDiq3abKeba4UWxCOWF
 Zw+3rnfhvki1I4ZBLWuEBHrRoTPYfU7s7bkV2k2GdKJhhTMbojFkKPUMhO8XMba0
 NxT5NcSCsQR3kBLbms/Wy1Q0f6G/rK8O3N0QLaOTpT3UMLLkgKLMHyjU+MNiQ6rH
 G6aC5T9U6br8t+xXMFUmkvdI2jnbZ7nHYDBKrs9kGqq0lfbp2Xjifxu2882qOOFh
 4OfVrnoUQQOrl8RGgwvs6/8gpoXSIsb4M03G3xD5MVwFncTs9V5ehMnSUSkzgrdK
 XN2RijXN4V3SULGXyR8MqsS8O70RK6QjHarJtTClzdIHoHTet7NqSpnx5stuF6+K
 MbZD0C10z1sgiUA591wrfwINFLO1xQKYDsUlds2xoaXi3zb6LiEgfyNUUouN//aj
 dJ/YEtnwc4qCGqv2SeAbkjj0T9jGS3njziUp2tQyi2GGZB8bSLcb9Vz96+o3cjLC
 V4AXk7o4y3XurCN+gj2VL9tqNigJqyHMRbb1HxxyPskb1BXOOJo=
 =p6+A
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
Major stack changes:
 * EHT channel puncturing support (client & AP)
 * some support for AP MLD without mac80211
 * fixes for A-MSDU on mesh connections

Major driver changes:

iwlwifi
 * EHT rate reporting
 * Bump FW API to 74 for AX devices
 * STEP equalizer support: transfer some STEP (connection to radio
   on platforms with integrated wifi) related parameters from the
   BIOS to the firmware

mt76
 * switch to using page pool allocator
 * mt7996 EHT (Wi-Fi 7) support
 * Wireless Ethernet Dispatch (WED) reset support

libertas
 * WPS enrollee support

brcmfmac
 * Rename Cypress 89459 to BCM4355
 * BCM4355 and BCM4377 support

mwifiex
 * SD8978 chipset support

rtl8xxxu
 * LED support

ath12k
 * new driver for Qualcomm Wi-Fi 7 devices

ath11k
 * IPQ5018 support
 * Fine Timing Measurement (FTM) responder role support
 * channel 177 support

ath10k
 * store WLAN firmware version in SMEM image table

* tag 'wireless-next-2023-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (207 commits)
  wifi: brcmfmac: p2p: Introduce generic flexible array frame member
  wifi: mac80211: add documentation for amsdu_mesh_control
  wifi: cfg80211: remove gfp parameter from cfg80211_obss_color_collision_notify description
  wifi: mac80211: always initialize link_sta with sta
  wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()
  wifi: cfg80211: Set SSID if it is not already set
  wifi: rtw89: move H2C of del_pkt_offload before polling FW status ready
  wifi: rtw89: use readable return 0 in rtw89_mac_cfg_ppdu_status()
  wifi: rtw88: usb: drop now unnecessary URB size check
  wifi: rtw88: usb: send Zero length packets if necessary
  wifi: rtw88: usb: Set qsel correctly
  wifi: mac80211: fix off-by-one link setting
  wifi: mac80211: Fix for Rx fragmented action frames
  wifi: mac80211: avoid u32_encode_bits() warning
  wifi: mac80211: Don't translate MLD addresses for multicast
  wifi: cfg80211: call reg_notifier for self managed wiphy from driver hint
  wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify
  wifi: nl80211: Allow authentication frames and set keys on NAN interface
  wifi: mac80211: fix non-MLO station association
  wifi: mac80211: Allow NSS change only up to capability
  ...
====================

Link: https://lore.kernel.org/r/20230216105406.208416-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-16 11:33:10 -08:00
Andrea Mayer bdf3c0b9c1 seg6: add PSP flavor support for SRv6 End behavior
The "flavors" framework defined in RFC8986 [1] represents additional
operations that can modify or extend a subset of existing behaviors such as
SRv6 End, End.X and End.T. We report these flavors hereafter:
 - Penultimate Segment Pop (PSP);
 - Ultimate Segment Pop (USP);
 - Ultimate Segment Decapsulation (USD).

Depending on how the Segment Routing Header (SRH) has to be handled, an
SRv6 End* behavior can support these flavors either individually or in
combinations.
In this patch, we only consider the PSP flavor for the SRv6 End behavior.

A PSP enabled SRv6 End behavior is used by the Source/Ingress SR node
(i.e., the one applying the SRv6 Policy) when it needs to instruct the
penultimate SR Endpoint node listed in the SID List (carried by the SRH) to
remove the SRH from the IPv6 header.

Specifically, a PSP enabled SRv6 End behavior processes the SRH by:
   i) decreasing the Segment Left (SL) from 1 to 0;
  ii) copying the Last Segment IDentifier (SID) into the IPv6 Destination
      Address (DA);
 iii) removing (i.e., popping) the outer SRH from the extension headers
      following the IPv6 header.

It is important to note that PSP operation (steps i, ii, iii) takes place
only at a penultimate SR Segment Endpoint node (i.e., when the SL=1) and
does not happen at non-penultimate Endpoint nodes. Indeed, when a SID of
PSP flavor is processed at a non-penultimate SR Segment Endpoint node, the
PSP operation is not performed because it would not be possible to decrease
the SL from 1 to 0.

                                                 SL=2 SL=1 SL=0
                                                   |    |    |
For example, given the SRv6 policy (SID List := <  X,   Y,   Z  >):
 - a PSP enabled SRv6 End behavior bound to SID "Y" will apply the PSP
   operation as Segment Left (SL) is 1, corresponding to the Penultimate
   Segment of the SID List;
 - a PSP enabled SRv6 End behavior bound to SID "X" will *NOT* apply the
   PSP operation as the Segment Left is 2. This behavior instance will
   apply the "standard" End packet processing, ignoring the configured PSP
   flavor at all.

[1] - RFC8986: https://datatracker.ietf.org/doc/html/rfc8986

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 13:18:06 +01:00
Andrea Mayer 525c65ff56 seg6: factor out End lookup nexthop processing to a dedicated function
The End nexthop lookup/input operations are moved into a new helper
function named input_action_end_finish(). This avoids duplicating the
code needed to compute the nexthop in the different flavors of the End
behavior.

Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 13:18:06 +01:00
Ido Schimmel b20b8aec6f devlink: Fix netdev notifier chain corruption
Cited commit changed devlink to register its netdev notifier block on
the global netdev notifier chain instead of on the per network namespace
one.

However, when changing the network namespace of the devlink instance,
devlink still tries to unregister its notifier block from the chain of
the old namespace and register it on the chain of the new namespace.
This results in corruption of the notifier chains, as the same notifier
block is registered on two different chains: The global one and the per
network namespace one. In turn, this causes other problems such as the
inability to dismantle namespaces due to netdev reference count issues.

Fix by preventing devlink from moving its notifier block between
namespaces.

Reproducer:

 # echo "10 1" > /sys/bus/netdevsim/new_device
 # ip netns add test123
 # devlink dev reload netdevsim/netdevsim10 netns test123
 # ip netns del test123
 [   71.935619] unregister_netdevice: waiting for lo to become free. Usage count = 2
 [   71.938348] leaked reference.

Fixes: 565b4824c3 ("devlink: change port event netdev notifier from per-net to global")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230215073139.1360108-1-idosch@nvidia.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 11:53:47 +01:00
Pedro Tammela 2d2e75d2d4 net/sched: act_pedit: use percpu overlimit counter when available
Since act_pedit now has access to percpu counters, use the
tcf_action_inc_overlimit_qstats wrapper that will use the percpu
counter whenever they are available.

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 10:39:28 +01:00
Pedro Tammela 7afd073e55 net/sched: act_gate: use percpu stats
The tc action act_gate was using shared stats, move it to percpu stats.

tdc results:
1..12
ok 1 5153 - Add gate action with priority and sched-entry
ok 2 7189 - Add gate action with base-time
ok 3 a721 - Add gate action with cycle-time
ok 4 c029 - Add gate action with cycle-time-ext
ok 5 3719 - Replace gate base-time action
ok 6 d821 - Delete gate action with valid index
ok 7 3128 - Delete gate action with invalid index
ok 8 7837 - List gate actions
ok 9 9273 - Flush gate actions
ok 10 c829 - Add gate action with duplicate index
ok 11 3043 - Add gate action with invalid index
ok 12 2930 - Add gate action with cookie

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 10:39:28 +01:00
Pedro Tammela 288864effe net/sched: act_connmark: transition to percpu stats and rcu
The tc action act_connmark was using shared stats and taking the per
action lock in the datapath. Improve it by using percpu stats and rcu.

perf before:
- 13.55% tcf_connmark_act
   - 81.18% _raw_spin_lock
       80.46% native_queued_spin_lock_slowpath

perf after:
- 2.85% tcf_connmark_act

tdc results:
1..15
ok 1 2002 - Add valid connmark action with defaults
ok 2 56a5 - Add valid connmark action with control pass
ok 3 7c66 - Add valid connmark action with control drop
ok 4 a913 - Add valid connmark action with control pipe
ok 5 bdd8 - Add valid connmark action with control reclassify
ok 6 b8be - Add valid connmark action with control continue
ok 7 d8a6 - Add valid connmark action with control jump
ok 8 aae8 - Add valid connmark action with zone argument
ok 9 2f0b - Add valid connmark action with invalid zone argument
ok 10 9305 - Add connmark action with unsupported argument
ok 11 71ca - Add valid connmark action and replace it
ok 12 5f8f - Add valid connmark action with cookie
ok 13 c506 - Replace connmark with invalid goto chain control
ok 14 6571 - Delete connmark action with valid index
ok 15 3426 - Delete connmark action with invalid index

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 10:39:28 +01:00
Pedro Tammela 7d12057b45 net/sched: act_nat: transition to percpu stats and rcu
The tc action act_nat was using shared stats and taking the per action
lock in the datapath. Improve it by using percpu stats and rcu.

perf before:
- 10.48% tcf_nat_act
   - 81.83% _raw_spin_lock
        81.08% native_queued_spin_lock_slowpath

perf after:
- 0.48% tcf_nat_act

tdc results:
1..27
ok 1 7565 - Add nat action on ingress with default control action
ok 2 fd79 - Add nat action on ingress with pipe control action
ok 3 eab9 - Add nat action on ingress with continue control action
ok 4 c53a - Add nat action on ingress with reclassify control action
ok 5 76c9 - Add nat action on ingress with jump control action
ok 6 24c6 - Add nat action on ingress with drop control action
ok 7 2120 - Add nat action on ingress with maximum index value
ok 8 3e9d - Add nat action on ingress with invalid index value
ok 9 f6c9 - Add nat action on ingress with invalid IP address
ok 10 be25 - Add nat action on ingress with invalid argument
ok 11 a7bd - Add nat action on ingress with DEFAULT IP address
ok 12 ee1e - Add nat action on ingress with ANY IP address
ok 13 1de8 - Add nat action on ingress with ALL IP address
ok 14 8dba - Add nat action on egress with default control action
ok 15 19a7 - Add nat action on egress with pipe control action
ok 16 f1d9 - Add nat action on egress with continue control action
ok 17 6d4a - Add nat action on egress with reclassify control action
ok 18 b313 - Add nat action on egress with jump control action
ok 19 d9fc - Add nat action on egress with drop control action
ok 20 a895 - Add nat action on egress with DEFAULT IP address
ok 21 2572 - Add nat action on egress with ANY IP address
ok 22 37f3 - Add nat action on egress with ALL IP address
ok 23 6054 - Add nat action on egress with cookie
ok 24 79d6 - Add nat action on ingress with cookie
ok 25 4b12 - Replace nat action with invalid goto chain control
ok 26 b811 - Delete nat action with valid index
ok 27 a521 - Delete nat action with invalid index

Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 10:39:28 +01:00
Jesse Brandeburg 3ba0bf47ed net/core: refactor promiscuous mode message
The kernel stack can be more consistent by printing the IFF_PROMISC
aka promiscuous enable/disable messages with the standard netdev_info
message which can include bus and driver info as well as the device.

typical command usage from user space looks like:
ip link set eth0 promisc <on|off>

But lots of utilities such as bridge, tcpdump, etc put the interface into
promiscuous mode.

old message:
[  406.034418] device eth0 entered promiscuous mode
[  408.424703] device eth0 left promiscuous mode

new message:
[  406.034431] ice 0000:17:00.0 eth0: entered promiscuous mode
[  408.424715] ice 0000:17:00.0 eth0: left promiscuous mode

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 10:11:14 +01:00
Jesse Brandeburg 802dcbd6f3 net/core: print message for allmulticast
When the user sets or clears the IFF_ALLMULTI flag in the netdev, there are
no log messages printed to the kernel log to indicate anything happened.
This is inexplicably different from most other dev->flags changes, and
could suprise the user.

Typically this occurs from user-space when a user:
ip link set eth0 allmulticast <on|off>

However, other devices like bridge set allmulticast as well, and many
other flows might trigger entry into allmulticast as well.

The new message uses the standard netdev_info print and looks like:
[  413.246110] ixgbe 0000:17:00.0 eth0: entered allmulticast mode
[  415.977184] ixgbe 0000:17:00.0 eth0: left allmulticast mode

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 10:11:14 +01:00
Jamal Hadi Salim 265b4da82d net/sched: Retire rsvp classifier
The rsvp classifier has served us well for about a quarter of a century but has
has not been getting much maintenance attention due to lack of known users.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 09:27:07 +01:00
Jamal Hadi Salim 8c710f7525 net/sched: Retire tcindex classifier
The tcindex classifier has served us well for about a quarter of a century
but has not been getting much TLC due to lack of known users. Most recently
it has become easy prey to syzkaller. For this reason, we are retiring it.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 09:27:07 +01:00
Jamal Hadi Salim bbe77c14ee net/sched: Retire dsmark qdisc
The dsmark qdisc has served us well over the years for diffserv but has not
been getting much attention due to other more popular approaches to do diffserv
services. Most recently it has become a shooting target for syzkaller. For this
reason, we are retiring it.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 09:27:06 +01:00
Jamal Hadi Salim fb38306ceb net/sched: Retire ATM qdisc
The ATM qdisc has served us well over the years but has not been getting much
TLC due to lack of known users. Most recently it has become a shooting target
for syzkaller. For this reason, we are retiring it.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 09:27:06 +01:00
Jamal Hadi Salim 051d442098 net/sched: Retire CBQ qdisc
While this amazing qdisc has served us well over the years it has not been
getting any tender love and care and has bitrotted over time.
It has become mostly a shooting target for syzkaller lately.
For this reason, we are retiring it. Goodbye CBQ - we loved you.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-16 09:27:06 +01:00
Willem de Bruijn 14ade6ba41 net: msg_zerocopy: elide page accounting if RLIM_INFINITY
MSG_ZEROCOPY ensures that pinned user pages do not exceed the limit.
If no limit is set, skip this accounting as otherwise expensive
atomic_long operations are called for no reason.

This accounting is already skipped for privileged (CAP_IPC_LOCK)
users. Rely on the same mechanism: if no mmp->user is set,
mm_unaccount_pinned_pages does not decrement either.

Tested by running tools/testing/selftests/net/msg_zerocopy.sh with
an unprivileged user for the TXMODE binary:

    ip netns exec "${NS1}" sudo -u "{$USER}" "${BIN}" "-${IP}" ...

Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230214155740.3448763-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 21:26:36 -08:00
Moshe Shemesh 12af29e779 devlink: Move health common function to health file
Now that all devlink health callbacks and related code are in file
health.c move common health functions and devlink_health_reporter struct
to be local in health.c file.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Moshe Shemesh c9311ee13f devlink: Move devlink health test to health file
Move devlink health report test callback from leftover.c to health.c. No
functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Moshe Shemesh 7004c6c457 devlink: Move devlink health dump to health file
Move devlink health report dump callbacks and related code from
leftover.c to health.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Moshe Shemesh a929df7fd9 devlink: Move devlink fmsg and health diagnose to health file
Devlink fmsg (formatted message) is used by devlink health diagnose,
dump and drivers which support these devlink health callbacks.
Therefore, move devlink fmsg helpers and related code to file health.c.
Move devlink health diagnose to file health.c. No functional change in
this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Moshe Shemesh 55b9b24968 devlink: Move devlink health report and recover to health file
Move devlink health report helper and recover callback and related code
from leftover.c to health.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Moshe Shemesh db6b5f3ec4 devlink: Move devlink health get and set code to health file
Move devlink health get and set callbacks and related code from
leftover.c to health.c. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Moshe Shemesh bfd4e6a5db devlink: health: Fix nla_nest_end in error flow
devlink_nl_health_reporter_fill() error flow calls nla_nest_end(). Fix
it to call nla_nest_cancel() instead.

Note the bug is harmless as genlmsg_cancel() cancel the entire message,
so no fixes tag added.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Moshe Shemesh b4740e3a81 devlink: Split out health reporter create code
Move devlink health reporter create/destroy and related dev code to new
file health.c. This file shall include all callbacks and functionality
that are related to devlink health.

In addition, fix kdoc indentation and make reporter create/destroy kdoc
more clear. No functional change in this patch.

Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-15 19:15:44 -08:00
Alexander Lobakin 6c20822fad bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES
&xdp_buff and &xdp_frame are bound in a way that

xdp_buff->data_hard_start == xdp_frame

It's always the case and e.g. xdp_convert_buff_to_frame() relies on
this.
IOW, the following:

	for (u32 i = 0; i < 0xdead; i++) {
		xdpf = xdp_convert_buff_to_frame(&xdp);
		xdp_convert_frame_to_buff(xdpf, &xdp);
	}

shouldn't ever modify @xdpf's contents or the pointer itself.
However, "live packet" code wrongly treats &xdp_frame as part of its
context placed *before* the data_hard_start. With such flow,
data_hard_start is sizeof(*xdpf) off to the right and no longer points
to the XDP frame.

Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
places and praying that there are no more miscalcs left somewhere in the
code, unionize ::frm with ::data in a flex array, so that both starts
pointing to the actual data_hard_start and the XDP frame actually starts
being a part of it, i.e. a part of the headroom, not the context.
A nice side effect is that the maximum frame size for this mode gets
increased by 40 bytes, as xdp_buff::frame_sz includes everything from
data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
info.
Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
hardcoded for 64 bit && 4k pages, it can be made more flexible later on.

Minor: align `&head->data` with how `head->frm` is assigned for
consistency.
Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
clarity.

(was found while testing XDP traffic generator on ice, which calls
 xdp_convert_frame_to_buff() for each XDP frame)

Fixes: b530e9e106 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-02-15 17:39:36 -08:00
Johannes Berg 3caf31e7b1 wifi: mac80211: add documentation for amsdu_mesh_control
This documentation wasn't added in the original patch,
add it now.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 6e4c0d0460 ("wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15 18:31:16 +01:00
Johannes Berg ab5f171e36 wifi: mac80211: always initialize link_sta with sta
When we have multiple interfaces receiving the same frame,
such as a multicast frame, one interface might have a sta
and the other not. In this case, link_sta would be set but
not cleared again.

Always set link_sta, so we keep an invariant that link_sta
and sta are either both set or both not set.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15 18:27:35 +01:00
Johannes Berg 0d846bdc11 wifi: mac80211: pass 'sta' to ieee80211_rx_data_set_sta()
There's at least one case in ieee80211_rx_for_interface()
where we might pass &((struct sta_info *)NULL)->sta to it
only to then do container_of(), and then checking the
result for NULL, but checking the result of container_of()
for NULL looks really odd.

Fix this by just passing the struct sta_info * instead.

Fixes: e66b7920aa ("wifi: mac80211: fix initialization of rx->link and rx->link_sta")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15 18:27:25 +01:00
Marc Bornand c38c701851 wifi: cfg80211: Set SSID if it is not already set
When a connection was established without going through
NL80211_CMD_CONNECT, the ssid was never set in the wireless_dev struct.
Now we set it in __cfg80211_connect_result() when it is not already set.

When using a userspace configuration that does not call
cfg80211_connect() (can be checked with breakpoints in the kernel),
this patch should allow `networkctl status device_name` to output the
SSID instead of null.

Cc: stable@vger.kernel.org
Reported-by: Yohan Prod'homme <kernel@zoddo.fr>
Fixes: 7b0a0e3c3a (wifi: cfg80211: do some rework towards MLO link APIs)
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216711
Signed-off-by: Marc Bornand <dev.mbornand@systemb.ch>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-15 18:26:58 +01:00
NeilBrown 5bab56fff5 NFS: fix disabling of swap
When swap is activated to a file on an NFSv4 mount we arrange that the
state manager thread is always present as starting a new thread requires
memory allocations that might block waiting for swap.

Unfortunately the code for allowing the state manager thread to exit when
swap is disabled was not tested properly and does not work.
This can be seen by examining /proc/fs/nfsfs/servers after disabling swap
and unmounting the filesystem.  The servers file will still list one
entry.  Also a "ps" listing will show the state manager thread is still
present.

There are two problems.
 1/ rpc_clnt_swap_deactivate() doesn't walk up the ->cl_parent list to
    find the primary client on which the state manager runs.

 2/ The thread is not woken up properly and it immediately goes back to
    sleep without checking whether it is really needed.  Using
    nfs4_schedule_state_manager() ensures a proper wake-up.

Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: 4dc73c6791 ("NFSv4: keep state manager thread active if swap is enabled")
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-15 10:33:00 -05:00
Jakub Kicinski fda6c89fe3 net: mpls: fix stale pointer if allocation fails during device rename
lianhui reports that when MPLS fails to register the sysctl table
under new location (during device rename) the old pointers won't
get overwritten and may be freed again (double free).

Handle this gracefully. The best option would be unregistering
the MPLS from the device completely on failure, but unfortunately
mpls_ifdown() can fail. So failing fully is also unreliable.

Another option is to register the new table first then only
remove old one if the new one succeeds. That requires more
code, changes order of notifications and two tables may be
visible at the same time.

sysctl point is not used in the rest of the code - set to NULL
on failures and skip unregister if already NULL.

Reported-by: lianhui tang <bluetlh@gmail.com>
Fixes: 0fae3bf018 ("mpls: handle device renames for per-device sysctls")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-15 10:26:37 +00:00
Jason Xing fe33311c3e net: no longer support SOCK_REFCNT_DEBUG feature
Commit e48c414ee6 ("[INET]: Generalise the TCP sock ID lookup routines")
commented out the definition of SOCK_REFCNT_DEBUG in 2005 and later another
commit 463c84b97f ("[NET]: Introduce inet_connection_sock") removed it.
Since we could track all of them through bpf and kprobe related tools
and the feature could print loads of information which might not be
that helpful even under a little bit pressure, the whole feature which
has been inactive for many years is no longer supported.

Link: https://lore.kernel.org/lkml/20230211065153.54116-1-kerneljasonxing@gmail.com/
Suggested-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Jason Xing <kernelxing@tencent.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-15 10:25:21 +00:00
Pedro Tammela 42018a322b net/sched: tcindex: search key must be 16 bits
Syzkaller found an issue where a handle greater than 16 bits would trigger
a null-ptr-deref in the imperfect hash area update.

general protection fault, probably for non-canonical address
0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af]
CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted
6.2.0-rc7-syzkaller-00112-gc68f345b7c42 #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/21/2023
RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509
Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb
a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c
02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00
RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8
RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8
R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00
FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572
tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155
rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
____sys_sendmsg+0x334/0x8c0 net/socket.c:2476
___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
__sys_sendmmsg+0x18f/0x460 net/socket.c:2616
__do_sys_sendmmsg net/socket.c:2645 [inline]
__se_sys_sendmmsg net/socket.c:2642 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80

Fixes: ee059170b1 ("net/sched: tcindex: update imperfect hash filters respecting rcu")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-15 10:23:54 +00:00
Thomas Weißschuh b279351705 net-sysfs: make kobj_type structures constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14 20:48:08 -08:00
Thomas Weißschuh e8c6cbd765 net: bridge: make kobj_type structure constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definition to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14 20:48:08 -08:00
Tung Nguyen 11a4d6f67c tipc: fix kernel warning when sending SYN message
When sending a SYN message, this kernel stack trace is observed:

...
[   13.396352] RIP: 0010:_copy_from_iter+0xb4/0x550
...
[   13.398494] Call Trace:
[   13.398630]  <TASK>
[   13.398630]  ? __alloc_skb+0xed/0x1a0
[   13.398630]  tipc_msg_build+0x12c/0x670 [tipc]
[   13.398630]  ? shmem_add_to_page_cache.isra.71+0x151/0x290
[   13.398630]  __tipc_sendmsg+0x2d1/0x710 [tipc]
[   13.398630]  ? tipc_connect+0x1d9/0x230 [tipc]
[   13.398630]  ? __local_bh_enable_ip+0x37/0x80
[   13.398630]  tipc_connect+0x1d9/0x230 [tipc]
[   13.398630]  ? __sys_connect+0x9f/0xd0
[   13.398630]  __sys_connect+0x9f/0xd0
[   13.398630]  ? preempt_count_add+0x4d/0xa0
[   13.398630]  ? fpregs_assert_state_consistent+0x22/0x50
[   13.398630]  __x64_sys_connect+0x16/0x20
[   13.398630]  do_syscall_64+0x42/0x90
[   13.398630]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

It is because commit a41dad905e ("iov_iter: saner checks for attempt
to copy to/from iterator") has introduced sanity check for copying
from/to iov iterator. Lacking of copy direction from the iterator
viewpoint would lead to kernel stack trace like above.

This commit fixes this issue by initializing the iov iterator with
the correct copy direction when sending SYN or ACK without data.

Fixes: f25dcc7687 ("tipc: tipc ->sendmsg() conversion")
Reported-by: syzbot+d43608d061e8847ec9f3@syzkaller.appspotmail.com
Acked-by: Jon Maloy <jmaloy@redhat.com>
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Link: https://lore.kernel.org/r/20230214012606.5804-1-tung.q.nguyen@dektech.com.au
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14 20:46:24 -08:00
Eric Dumazet 2558b8039d net: use a bounce buffer for copying skb->mark
syzbot found arm64 builds would crash in sock_recv_mark()
when CONFIG_HARDENED_USERCOPY=y

x86 and powerpc are not detecting the issue because
they define user_access_begin.
This will be handled in a different patch,
because a check_object_size() is missing.

Only data from skb->cb[] can be copied directly to/from user space,
as explained in commit 79a8a642bf ("net: Whitelist
the skbuff_head_cache "cb" field")

syzbot report was:
usercopy: Kernel memory exposure attempt detected from SLUB object 'skbuff_head_cache' (offset 168, size 4)!
------------[ cut here ]------------
kernel BUG at mm/usercopy.c:102 !
Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 4410 Comm: syz-executor533 Not tainted 6.2.0-rc7-syzkaller-17907-g2d3827b3f393 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/21/2023
pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : usercopy_abort+0x90/0x94 mm/usercopy.c:90
lr : usercopy_abort+0x90/0x94 mm/usercopy.c:90
sp : ffff80000fb9b9a0
x29: ffff80000fb9b9b0 x28: ffff0000c6073400 x27: 0000000020001a00
x26: 0000000000000014 x25: ffff80000cf52000 x24: fffffc0000000000
x23: 05ffc00000000200 x22: fffffc000324bf80 x21: ffff0000c92fe1a8
x20: 0000000000000001 x19: 0000000000000004 x18: 0000000000000000
x17: 656a626f2042554c x16: ffff0000c6073dd0 x15: ffff80000dbd2118
x14: ffff0000c6073400 x13: 00000000ffffffff x12: ffff0000c6073400
x11: ff808000081bbb4c x10: 0000000000000000 x9 : 7b0572d7cc0ccf00
x8 : 7b0572d7cc0ccf00 x7 : ffff80000bf650d4 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000001 x3 : 0000000000000000
x2 : ffff0001fefbff08 x1 : 0000000100000000 x0 : 000000000000006c
Call trace:
usercopy_abort+0x90/0x94 mm/usercopy.c:90
__check_heap_object+0xa8/0x100 mm/slub.c:4761
check_heap_object mm/usercopy.c:196 [inline]
__check_object_size+0x208/0x6b8 mm/usercopy.c:251
check_object_size include/linux/thread_info.h:199 [inline]
__copy_to_user include/linux/uaccess.h:115 [inline]
put_cmsg+0x408/0x464 net/core/scm.c:238
sock_recv_mark net/socket.c:975 [inline]
__sock_recv_cmsgs+0x1fc/0x248 net/socket.c:984
sock_recv_cmsgs include/net/sock.h:2728 [inline]
packet_recvmsg+0x2d8/0x678 net/packet/af_packet.c:3482
____sys_recvmsg+0x110/0x3a0
___sys_recvmsg net/socket.c:2737 [inline]
__sys_recvmsg+0x194/0x210 net/socket.c:2767
__do_sys_recvmsg net/socket.c:2777 [inline]
__se_sys_recvmsg net/socket.c:2774 [inline]
__arm64_sys_recvmsg+0x2c/0x3c net/socket.c:2774
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x64/0x178 arch/arm64/kernel/syscall.c:52
el0_svc_common+0xbc/0x180 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x48/0x110 arch/arm64/kernel/syscall.c:193
el0_svc+0x58/0x14c arch/arm64/kernel/entry-common.c:637
el0t_64_sync_handler+0x84/0xf0 arch/arm64/kernel/entry-common.c:655
el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:591
Code: 91388800 aa0903e1 f90003e8 94e6d752 (d4210000)

Fixes: 6fd1d51cfa ("net: SO_RCVMARK socket option for SO_MARK with recvmsg()")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erin MacNeil <lnx.erin@gmail.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/r/20230213160059.3829741-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-14 20:31:13 -08:00
Thomas Weißschuh d67307b414 SUNRPC: make kobj_type structures constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-02-14 16:21:52 -05:00
Johannes Berg cf08e29db7 wifi: mac80211: fix off-by-one link setting
The convention for find_first_bit() is 0-based, while ffs()
is 1-based, so this is now off-by-one. I cannot reproduce the
gcc-9 problem, but since the -1 is now removed, I'm hoping it
will still avoid the original issue.

Reported-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Fixes: 1d8d4af434 ("wifi: mac80211: avoid u32_encode_bits() warning")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 20:09:30 +01:00
Gilad Itzkovitch e6f5dcb7ec wifi: mac80211: Fix for Rx fragmented action frames
The ieee80211_accept_frame() function performs a number of early checks
to decide whether or not further processing needs to be done on a frame.
One of those checks is the ieee80211_is_robust_mgmt_frame() function.
It requires to peek into the frame payload, but because defragmentation
does not occur until later on in the receive path, this peek is invalid
for any fragment other than the first one. Also, in this scenario there
is no STA and so the fragmented frame will be dropped later on in the
process and will not reach the upper stack. This can happen with large
action frames at low rates, for example, we see issues with DPP on S1G.

This change will only check if the frame is robust if it's the first
fragment. Invalid fragmented packets will be discarded later after
defragmentation is completed.

Signed-off-by: Gilad Itzkovitch <gilad.itzkovitch@morsemicro.com>
Link: https://lore.kernel.org/r/20221124005336.1618411-1-gilad.itzkovitch@morsemicro.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 14:48:25 +01:00
Arnd Bergmann 1d8d4af434 wifi: mac80211: avoid u32_encode_bits() warning
gcc-9 triggers a false-postive warning in ieee80211_mlo_multicast_tx()
for u32_encode_bits(ffs(links) - 1, ...), since ffs() can return zero
on an empty bitmask, and the negative argument to u32_encode_bits()
is then out of range:

In file included from include/linux/ieee80211.h:21,
                 from include/net/cfg80211.h:23,
                 from net/mac80211/tx.c:23:
In function 'u32_encode_bits',
    inlined from 'ieee80211_mlo_multicast_tx' at net/mac80211/tx.c:4437:17,
    inlined from 'ieee80211_subif_start_xmit' at net/mac80211/tx.c:4485:3:
include/linux/bitfield.h:177:3: error: call to '__field_overflow' declared with attribute error: value doesn't fit into mask
  177 |   __field_overflow();     \
      |   ^~~~~~~~~~~~~~~~~~
include/linux/bitfield.h:197:2: note: in expansion of macro '____MAKE_OP'
  197 |  ____MAKE_OP(u##size,u##size,,)
      |  ^~~~~~~~~~~
include/linux/bitfield.h:200:1: note: in expansion of macro '__MAKE_OP'
  200 | __MAKE_OP(32)
      | ^~~~~~~~~

Newer compiler versions do not cause problems with the zero argument
because they do not consider this a __builtin_constant_p().
It's also harmless since the hweight16() check already guarantees
that this cannot be 0.

Replace the ffs() with an equivalent find_first_bit() check that
matches the later for_each_set_bit() style and avoids the warning.

Fixes: 963d0e8d08 ("wifi: mac80211: optionally implement MLO multicast TX")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20230214132025.1532147-1-arnd@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 14:44:13 +01:00
Jiri Pirko 2edd925704 devlink: don't allow to change net namespace for FW_ACTIVATE reload action
The change on network namespace only makes sense during re-init reload
action. For FW activation it is not applicable. So check if user passed
an ATTR indicating network namespace change request and forbid it.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230213115836.3404039-1-jiri@resnulli.us
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-02-14 14:04:21 +01:00
Andrei Otcheretianski daf8fb4295 wifi: mac80211: Don't translate MLD addresses for multicast
MLD address translation should be done only for individually addressed
frames. Otherwise, AAD calculation would be wrong and the decryption
would fail.

Fixes: e66b7920aa ("wifi: mac80211: fix initialization of rx->link and rx->link_sta")
Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com>
Link: https://lore.kernel.org/r/20230214101048.792414-1-andrei.otcheretianski@intel.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 13:36:06 +01:00
Wen Gong d99975c495 wifi: cfg80211: call reg_notifier for self managed wiphy from driver hint
Currently the regulatory driver does not call the regulatory callback
reg_notifier for self managed wiphys. Sometimes driver needs cfg80211
to calculate the info of ieee80211_channel such as flags and power,
and driver needs to get the info of ieee80211_channel after hint of
driver, but driver does not know when calculation of the info of
ieee80211_channel become finished, so add notify to driver in
reg_process_self_managed_hint() from cfg80211 is a good way, then
driver could get the correct info in callback of reg_notifier.

Signed-off-by: Wen Gong <quic_wgong@quicinc.com>
Link: https://lore.kernel.org/r/20230201065313.27203-1-quic_wgong@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:37:39 +01:00
Lorenzo Bianconi 935ef47b16 wifi: cfg80211: get rid of gfp in cfg80211_bss_color_notify
Since cfg80211_bss_color_notify() is now always run in non-atomic
context, get rid of gfp_t flags in the routine signature and always use
GFP_KERNEL for netlink message allocation.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/c687724e7b53556f7a2d9cbe3d11cdcf065cb687.1675255390.git.lorenzo@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:35:02 +01:00
Vinay Gannevaram 9b89495e47 wifi: nl80211: Allow authentication frames and set keys on NAN interface
Wi-Fi Aware R4 specification defines NAN Pairing which uses PASN handshake
to authenticate the peer and generate keys. Hence allow to register and transmit
the PASN authentication frames on NAN interface and set the keys to driver or
underlying modules on NAN interface.

The driver needs to configure the feature flag NL80211_EXT_FEATURE_SECURE_NAN,
which also helps userspace modules to know if the driver supports secure NAN.

Signed-off-by: Vinay Gannevaram <quic_vganneva@quicinc.com>
Link: https://lore.kernel.org/r/1675519179-24174-1-git-send-email-quic_vganneva@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:35:02 +01:00
Karthikeyan Periyasamy aaacf1740f wifi: mac80211: fix non-MLO station association
Non-MLO station frames are dropped in Rx path due to the condition
check in ieee80211_rx_is_valid_sta_link_id(). In multi-link AP scenario,
non-MLO stations try to connect in any of the valid links in the ML AP,
where the station valid_links and link_id params are valid in the
ieee80211_sta object. But ieee80211_rx_is_valid_sta_link_id() always
return false for the non-MLO stations by the assumption taken is
valid_links and link_id are not valid in non-MLO stations object
(ieee80211_sta), this assumption is wrong. Due to this assumption,
non-MLO station frames are dropped which leads to failure in association.

Fix it by removing the condition check and allow the link validation
check for the non-MLO stations.

Fixes: e66b7920aa ("wifi: mac80211: fix initialization of rx->link and rx->link_sta")
Signed-off-by: Karthikeyan Periyasamy <quic_periyasa@quicinc.com>
Link: https://lore.kernel.org/r/20230206160330.1613-1-quic_periyasa@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:35:02 +01:00
Rameshkumar Sundaram 57b341e9ab wifi: mac80211: Allow NSS change only up to capability
Stations can update bandwidth/NSS change in
VHT action frame with action type Operating Mode Notification.
(IEEE Std 802.11-2020 - 9.4.1.53 Operating Mode field)

For Operating Mode Notification, an RX NSS change to a value
greater than AP's maximum NSS should not be allowed.
During fuzz testing, by forcefully sending VHT Op. mode notif.
frames from STA with random rx_nss values, it is found that AP
accepts rx_nss values greater that APs maximum NSS instead of
discarding such NSS change.

Hence allow NSS change only up to maximum NSS that is negotiated
and capped to AP's capability during association.

Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com>
Link: https://lore.kernel.org/r/20230207114146.10567-1-quic_ramess@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:35:02 +01:00
Felix Fietkau 6e4c0d0460 wifi: mac80211: add a workaround for receiving non-standard mesh A-MSDU
At least ath10k and ath11k supported hardware (maybe more) does not implement
mesh A-MSDU aggregation in a standard compliant way.
802.11-2020 9.3.2.2.2 declares that the Mesh Control field is part of the
A-MSDU header (and little-endian).
As such, its length must not be included in the subframe length field.
Hardware affected by this bug treats the mesh control field as part of the
MSDU data and sets the length accordingly.
In order to avoid packet loss, keep track of which stations are affected
by this and take it into account when converting A-MSDU to 802.3 + mesh control
packets.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-5-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:35:02 +01:00
Felix Fietkau 986e43b19a wifi: mac80211: fix receiving A-MSDU frames on mesh interfaces
The current mac80211 mesh A-MSDU receive path fails to parse A-MSDU packets
on mesh interfaces, because it assumes that the Mesh Control field is always
directly after the 802.11 header.
802.11-2020 9.3.2.2.2 Figure 9-70 shows that the Mesh Control field is
actually part of the A-MSDU subframe header.
This makes more sense, since it allows packets for multiple different
destinations to be included in the same A-MSDU, as long as RA and TID are
still the same.
Another issue is the fact that the A-MSDU subframe length field was apparently
accidentally defined as little-endian in the standard.

In order to fix this, the mesh forwarding path needs happen at a different
point in the receive path.

ieee80211_data_to_8023_exthdr is changed to ignore the mesh control field
and leave it in after the ethernet header. This also affects the source/dest
MAC address fields, which now in the case of mesh point to the mesh SA/DA.

ieee80211_amsdu_to_8023s is changed to deal with the endian difference and
to add the Mesh Control length to the subframe length, since it's not covered
by the MSDU length field.

With these changes, the mac80211 will get the same packet structure for
converted regular data packets and unpacked A-MSDU subframes.

The mesh forwarding checks are now only performed after the A-MSDU decap.
For locally received packets, the Mesh Control header is stripped away.
For forwarded packets, a new 802.11 header gets added.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-4-nbd@nbd.name
[fix fortify build error]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:34:51 +01:00
Felix Fietkau 5c1e269aa5 wifi: mac80211: remove mesh forwarding congestion check
Now that all drivers use iTXQ, it does not make sense to check to drop
tx forwarding packets when the driver has stopped the queues.
fq_codel will take care of dropping packets when the queues fill up

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-3-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:25:23 +01:00
Felix Fietkau 9f718554e7 wifi: cfg80211: factor out bridge tunnel / RFC1042 header check
The same check is done in multiple places, unify it.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-2-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:25:11 +01:00
Felix Fietkau 0f690e6b4d wifi: cfg80211: move A-MSDU check in ieee80211_data_to_8023_exthdr
When parsing the outer A-MSDU header, don't check for inner bridge tunnel
or RFC1042 headers. This is handled by ieee80211_amsdu_to_8023s already.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Link: https://lore.kernel.org/r/20230213100855.34315-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:25:01 +01:00
Shayne Chen 59336e07b2 wifi: mac80211: make rate u32 in sta_set_rate_info_rx()
The value of last_rate in ieee80211_sta_rx_stats is degraded from u32 to
u16 after being assigned to rate variable, which causes information loss
in STA_STATS_FIELD_TYPE and later bitfields.

Signed-off-by: Shayne Chen <shayne.chen@mediatek.com>
Link: https://lore.kernel.org/r/20230209110659.25447-1-shayne.chen@mediatek.com
Fixes: 41cbb0f5a2 ("mac80211: add support for HE")
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:23:12 +01:00
Bo Liu 796703baea rfkill: Use sysfs_emit() to instead of sprintf()
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.

Signed-off-by: Bo Liu <liubo03@inspur.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230206081641.3193-1-liubo03@inspur.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:21:14 +01:00
Rameshkumar Sundaram 19085ef39f wifi: cfg80211: Allow action frames to be transmitted with link BSS in MLD
Currently action frames TX only with ML address as A3(BSSID) are
allowed in an ML AP, but TX for a non-ML Station can happen in any
link of an ML BSS with link BSS address as A3.
In case of an MLD, if User-space has provided a valid link_id in
action frame TX request, allow transmission of the frame in that link.

Signed-off-by: Rameshkumar Sundaram <quic_ramess@quicinc.com>
Link: https://lore.kernel.org/r/20230201061602.3918-1-quic_ramess@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:17:54 +01:00
Aloka Dixit 2cc25e4b2a wifi: mac80211: configure puncturing bitmap
- Configure the bitmap in link_conf and notify the driver.
- Modify 'change' in ieee80211_start_ap() from u32 to u64 to support
BSS_CHANGED_EHT_PUNCTURING.
- Propagate the bitmap in channel switch events to userspace.

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Signed-off-by: Muna Sinada <quic_msinada@quicinc.com>
Link: https://lore.kernel.org/r/20230131001227.25014-5-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:17:22 +01:00
Aloka Dixit b345f0637c wifi: cfg80211: include puncturing bitmap in channel switch events
Add puncturing bitmap in channel switch notifications
and corresponding trace functions.

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Link: https://lore.kernel.org/r/20230131001227.25014-4-quic_alokad@quicinc.com
[fix qtnfmac]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:14:39 +01:00
Aloka Dixit d7c1a9a0ed wifi: nl80211: validate and configure puncturing bitmap
- New feature flag, NL80211_EXT_FEATURE_PUNCT, to advertise
  driver support for preamble puncturing in AP mode.
- New attribute, NL80211_ATTR_PUNCT_BITMAP, to receive a puncturing
  bitmap from the userspace during AP bring up (NL80211_CMD_START_AP)
  and channel switch (NL80211_CMD_CHANNEL_SWITCH) operations. Each bit
  corresponds to a 20 MHz channel in the operating bandwidth, lowest
  bit for the lowest channel. Bit set to 1 indicates that the channel
  is punctured. Higher 16 bits are reserved.
- New members added to structures cfg80211_ap_settings and
  cfg80211_csa_settings to propagate the bitmap to the driver after
  validation.

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Signed-off-by: Muna Sinada <quic_msinada@quicinc.com>
Link: https://lore.kernel.org/r/20230131001227.25014-3-quic_alokad@quicinc.com
[move validation against 0xffff into policy]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:13:24 +01:00
Aloka Dixit b25413fed3 wifi: cfg80211: move puncturing bitmap validation from mac80211
- Move ieee80211_valid_disable_subchannel_bitmap() from mlme.c to
  chan.c, rename it as cfg80211_valid_disable_subchannel_bitmap()
  and export it.
- Modify the prototype to include struct cfg80211_chan_def instead
  of only bandwidth to support a check which returns false if the
  primary channel is punctured.

Signed-off-by: Aloka Dixit <quic_alokad@quicinc.com>
Link: https://lore.kernel.org/r/20230131001227.25014-2-quic_alokad@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:09:18 +01:00
Jaewan Kim 90b2c3cc4b wifi: nl80211: return error message for malformed chandef
Add an error message to the missing frequency case to have all
-EINVAL in nl80211_parse_chandef() return a better error.

Signed-off-by: Jaewan Kim <jaewan@google.com>
Link: https://lore.kernel.org/r/20230130074514.1560021-1-jaewan@google.com
[rewrite commit message]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:09:18 +01:00
Alvin Šipraga cba7217a92 wifi: nl80211: add MLO_LINK_ID to CMD_STOP_AP event
nl80211_send_ap_stopped() can be called multiple times on the same
netdev for each link when using Multi-Link Operation. Add the
MLO_LINK_ID attribute to the event to allow userspace to distinguish
which link the event is for.

Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Link: https://lore.kernel.org/r/20230128125844.2407135-2-alvin@pqrs.dk
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2023-02-14 12:09:17 +01:00