Commit Graph

1154783 Commits

Author SHA1 Message Date
Linus Torvalds 28b4387f0e Networking fixes for 6.2-rc6, including fixes from netfilter.
Current release - regressions:
 
   - sched: sch_taprio: do not schedule in taprio_reset()
 
 Previous releases - regressions:
 
   - core: fix UaF in netns ops registration error path
 
   - ipv4: prevent potential spectre v1 gadgets
 
   - ipv6: fix reachability confirmation with proxy_ndp
 
   - netfilter: fix for the set rbtree
 
   - eth: fec: use page_pool_put_full_page when freeing rx buffers
 
   - eth: iavf: fix temporary deadlock and failure to set MAC address
 
 Previous releases - always broken:
 
  - netlink: prevent potential spectre v1 gadgets
 
  - netfilter: fixes for SCTP connection tracking
 
  - mctp: struct sock lifetime fixes
 
  - eth: ravb: fix possible hang if RIS2_QFF1 happen
 
  - eth: tg3: resolve deadlock in tg3_reset_task() during EEH
 
 Misc:
 
  - Mat stepped out as MPTCP co-maintainer
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - sched: sch_taprio: do not schedule in taprio_reset()

  Previous releases - regressions:

   - core: fix UaF in netns ops registration error path

   - ipv4: prevent potential spectre v1 gadgets

   - ipv6: fix reachability confirmation with proxy_ndp

   - netfilter: fix for the set rbtree

   - eth: fec: use page_pool_put_full_page when freeing rx buffers

   - eth: iavf: fix temporary deadlock and failure to set MAC address

  Previous releases - always broken:

   - netlink: prevent potential spectre v1 gadgets

   - netfilter: fixes for SCTP connection tracking

   - mctp: struct sock lifetime fixes

   - eth: ravb: fix possible hang if RIS2_QFF1 happen

   - eth: tg3: resolve deadlock in tg3_reset_task() during EEH

  Misc:

   - Mat stepped out as MPTCP co-maintainer"

* tag 'net-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (40 commits)
  net: mdio-mux-meson-g12a: force internal PHY off on mux switch
  docs: networking: Fix bridge documentation URL
  tsnep: Fix TX queue stop/wake for multiple queues
  net/tg3: resolve deadlock in tg3_reset_task() during EEH
  net: mctp: mark socks as dead on unhash, prevent re-add
  net: mctp: hold key reference when looking up a general key
  net: mctp: move expiry timer delete to unhash
  net: mctp: add an explicit reference from a mctp_sk_key to sock
  net: ravb: Fix possible hang if RIS2_QFF1 happen
  net: ravb: Fix lack of register setting after system resumed for Gen3
  net/x25: Fix to not accept on connected socket
  ice: move devlink port creation/deletion
  sctp: fail if no bound addresses can be used for a given scope
  net/sched: sch_taprio: do not schedule in taprio_reset()
  Revert "Merge branch 'ethtool-mac-merge'"
  netrom: Fix use-after-free of a listening socket.
  netfilter: conntrack: unify established states for SCTP paths
  Revert "netfilter: conntrack: add sctp DATA_SENT state"
  netfilter: conntrack: fix bug in for_each_sctp_chunk
  netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
  ...
2023-01-26 10:20:12 -08:00
Linus Torvalds 262b42e02d treewide: fix up files incorrectly marked executable
I'm not exactly clear on what strange workflow causes people to do it,
but clearly occasionally some files end up being committed as executable
even though they clearly aren't.

This is a reprise of commit 90fda63fa1 ("treewide: fix up files
incorrectly marked executable"), just with a different set of files (but
with the same trivial shell scripting).

So apparently we need to re-do this every five years or so, and Joe
needs to just keep reminding me to do so ;)

Reported-by: Joe Perches <joe@perches.com>
Fixes: 523375c943 ("drm/vmwgfx: Port vmwgfx to arm64")
Fixes: 5c43993777 ("ASoC: codecs: add support for ES8326")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-26 10:05:39 -08:00
Jerome Brunet 7083df59ab net: mdio-mux-meson-g12a: force internal PHY off on mux switch
Force the internal PHY off then on when switching to the internal path.
This fixes problems where the PHY ID is not properly set.

Fixes: 7090425104 ("net: phy: add amlogic g12a mdio mux support")
Suggested-by: Qi Duan <qi.duan@amlogic.com>
Co-developed-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
Link: https://lore.kernel.org/r/20230124101157.232234-1-jbrunet@baylibre.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25 22:46:51 -08:00
Ivan Vecera aee2770d19 docs: networking: Fix bridge documentation URL
Current documentation URL [1] is no longer valid.

[1] https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124145127.189221-1-ivecera@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25 22:44:27 -08:00
Gerhard Engleder 3d53aaef43 tsnep: Fix TX queue stop/wake for multiple queues
netif_stop_queue() and netif_wake_queue() act on TX queue 0. This is ok
as long as only a single TX queue is supported. But support for multiple
TX queues was introduced with 762031375d and I failed to adapt the
stopping and waking of TX queues.

Use netif_stop_subqueue() and netif_tx_wake_queue() to act on specific
TX queue.

Fixes: 762031375d ("tsnep: Support multiple TX/RX queue pairs")
Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com>
Link: https://lore.kernel.org/r/20230124191440.56887-1-gerhard@engleder-embedded.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25 22:41:50 -08:00
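The difference between acting on queue 0 and acting on a specific TX queue, as described in the tsnep commit above, can be modelled in plain userspace C. This is a toy sketch and not driver code: `toy_dev`, `toy_stop_queue0()`, `toy_stop_subqueue()` and `toy_wake_subqueue()` are hypothetical stand-ins that only track a per-queue "stopped" flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_TX_QUEUES 4

/* Hypothetical model of a multi-queue device: one "stopped" flag per TX queue. */
struct toy_dev {
	bool stopped[TOY_NUM_TX_QUEUES];
};

/* Models a helper that always acts on TX queue 0, whichever queue is full. */
void toy_stop_queue0(struct toy_dev *dev)
{
	dev->stopped[0] = true;
}

/* Models a per-queue helper: stops only the queue that ran out of space. */
void toy_stop_subqueue(struct toy_dev *dev, size_t queue)
{
	assert(queue < TOY_NUM_TX_QUEUES);
	dev->stopped[queue] = true;
}

/* Models the matching per-queue wake once space is available again. */
void toy_wake_subqueue(struct toy_dev *dev, size_t queue)
{
	assert(queue < TOY_NUM_TX_QUEUES);
	dev->stopped[queue] = false;
}
```

In this model, calling `toy_stop_queue0()` when queue 2 is full leaves queue 2 running and stalls queue 0 instead, which is the kind of mismatch the commit describes.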
David Christensen 6c4ca03bd8 net/tg3: resolve deadlock in tg3_reset_task() during EEH
During EEH error injection testing, a deadlock was encountered in the tg3
driver when tg3_io_error_detected() was attempting to cancel outstanding
reset tasks:

crash> foreach UN bt
...
PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
...
 #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
 #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
 #7 [c00000000681faf0] eeh_report_error at c00000000004e25c
...

PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
...
 #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
 #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c00000003721fc60] process_one_work at c00000000019e5c4
...

PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
...
 #4 [c000000037247bc0] rtnl_lock at c000000000c940d8
 #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c000000037247c60] process_one_work at c00000000019e5c4
...

PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
...
 #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
 #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
 #6 [c0000000373ebc60] process_one_work at c00000000019e5c4
...

Code inspection shows that both tg3_io_error_detected() and
tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
their code blocks.  If tg3_reset_task() should happen to execute between
the times when tg3_io_error_detected() acquires the RTNL lock and
tg3_reset_task_cancel() is called, a deadlock will occur.

Moving tg3_reset_task_cancel() call earlier within the code block, prior
to acquiring RTNL, prevents this from happening, but also exposes another
deadlock issue where tg3_reset_task() may execute AFTER
tg3_io_error_detected() has executed:

crash> foreach UN bt
PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
...
 #4 [c000000006867a60] rtnl_lock at c000000000c940d8
 #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
 #6 [c000000006867b00] eeh_report_reset at c00000000004de88
...
PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
...
 #3 [c000000036c1bb70] msleep at c000000000259e6c
 #4 [c000000036c1bba0] napi_disable at c000000000c6b848
 #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
 #6 [c000000036c1bc60] process_one_work at c00000000019e5c4
...

This issue can be avoided by aborting tg3_reset_task() if EEH error
recovery is already in progress.

Fixes: db84bf43ef ("tg3: tg3_reset_task() needs to use rtnl_lock to synchronize")
Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Link: https://lore.kernel.org/r/20230124185339.225806-1-drc@linux.vnet.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-25 22:35:42 -08:00
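The second part of the tg3 fix above, aborting the reset task while error recovery is in progress, can be sketched as a small userspace model. Everything here is hypothetical (`toy_nic`, its `eeh_recovery_in_progress` flag, `toy_reset_task()`); the sketch leaves out locking entirely and only shows the early-abort check.

```c
#include <stdbool.h>

/* Hypothetical device state for the sketch. */
struct toy_nic {
	bool eeh_recovery_in_progress; /* set by the (modelled) error handler */
	int resets_done;               /* counts resets actually performed */
};

/*
 * Models the reset work item. Returns true if a reset was performed and
 * false if it was skipped because error recovery already owns the device.
 */
bool toy_reset_task(struct toy_nic *nic)
{
	if (nic->eeh_recovery_in_progress)
		return false; /* recovery will reinitialise the device itself */

	nic->resets_done++;
	return true;
}
```

The point of the check is that the work item no longer competes with the recovery path for the same resources once recovery has started.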
Linus Torvalds 7c46948a6e fs.fuse.acl.v6.2-rc6

Merge tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull fuse ACL fix from Christian Brauner:
 "The new posix acl API doesn't depend on the xattr handler
  infrastructure anymore and instead only relies on the posix acl inode
  operations. As a result daemons without FUSE_POSIX_ACL are unable to
  use posix acls like they used to.

  Fix this by copying what we did for overlayfs during the posix acl api
  conversion. Make fuse implement a dedicated ->get_inode_acl() method
  as does overlayfs. Fuse can then also use this to express different
  needs for vfs permission checking during lookup and acl based
  retrieval via the regular system call path.

  This allows fuse to continue to refuse retrieving posix acls for
  daemons that don't set FUSE_POSIX_ACL for permission checking while
  also allowing a fuse server to retrieve it via the usual system calls"

* tag 'fs.fuse.acl.v6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  fuse: fixes after adapting to new posix acl api
2023-01-25 09:15:15 -08:00
David S. Miller ac8d986cbf Merge branch 'mptcp-fixes'
Jeremy Kerr says:

====================
net: mctp: struct sock lifetime fixes

This series is a set of fixes for the sock lifetime handling in the
AF_MCTP code, fixing a uaf reported by Noam Rathaus
<noamr@ssd-disclosure.com>.

The Fixes: tags indicate the original patches affected, but some
tweaking to backport to those commits may be needed; I have a separate
branch with backports to 5.15 if that helps with stable trees.

Of course, any comments/queries most welcome.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:07:38 +00:00
Jeremy Kerr b98e1a04e2 net: mctp: mark socks as dead on unhash, prevent re-add
Once a socket has been unhashed, we want to prevent it from being
re-used in a sk_key entry as part of a routing operation.

This change marks the sk as SOCK_DEAD on unhash, which prevents addition
into the net's key list.

We need to do this during the key add path, rather than key lookup, as
we release the net keys_lock between those operations.

Fixes: 4a992bbd36 ("mctp: Implement message fragmentation & reassembly")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:07:37 +00:00
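The "mark dead on unhash, refuse in the add path" idea from the commit above can be sketched in userspace C. This is a simplified model under stated assumptions: `toy_sock`, `toy_unhash()` and `toy_key_add()` are hypothetical names, the `-EINVAL` return value is an arbitrary choice for the sketch, and the locking the commit mentions is not modelled.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical socket model. */
struct toy_sock {
	bool dead;   /* models the SOCK_DEAD marking */
	int nr_keys; /* keys currently pointing at this sock */
};

/* Models unhash: drop the existing keys and mark the sock dead. */
void toy_unhash(struct toy_sock *sk)
{
	sk->nr_keys = 0;
	sk->dead = true;
}

/* Models the key add path: a dead sock must not gain new keys. */
int toy_key_add(struct toy_sock *sk)
{
	if (sk->dead)
		return -EINVAL; /* arbitrary errno, chosen for the sketch only */

	sk->nr_keys++;
	return 0;
}
```

Checking in the add path rather than at lookup time matters because, per the commit text, the lock is released between lookup and add, so the sock can be unhashed in between.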
Paolo Abeni 6e54ea37e3 net: mctp: hold key reference when looking up a general key
Currently, we have a race where we look up a sock through a "general"
(ie, not directly associated with the (src,dest,tag) tuple) key, then
drop the key reference while still holding the key's sock.

This change expands the key reference until we've finished using the
sock, and hence the sock reference too.

Commit message changes from Jeremy Kerr <jk@codeconstruct.com.au>.

Reported-by: Noam Rathaus <noamr@ssd-disclosure.com>
Fixes: 73c618456d ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:07:37 +00:00
Jeremy Kerr 5f41ae6fca net: mctp: move expiry timer delete to unhash
Currently, we delete the key expiry timer (in sk->close) before
unhashing the sk. This means that another thread may find the sk through
its presence on the key list, and re-queue the timer.

This change moves the timer deletion to the unhash, after we have made
the key no longer observable, so the timer cannot be re-queued.

Fixes: 7b14e15ae6 ("mctp: Implement a timeout for tags")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:07:37 +00:00
Jeremy Kerr de8a6b15d9 net: mctp: add an explicit reference from a mctp_sk_key to sock
Currently, we correlate the mctp_sk_key lifetime to the sock lifetime
through the sock hash/unhash operations, but this is pretty tenuous, and
there are cases where we may have a temporary reference to an unhashed
sk.

This change makes the reference more explicit, by adding a hold on the
sock when it's associated with a mctp_sk_key, released on final key
unref.

Fixes: 73c618456d ("mctp: locking, lifetime and validity changes for sk_keys")
Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:07:37 +00:00
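The explicit reference described in the commit above follows a common pattern: the key takes a hold on the sock when the two are associated and drops it on the final key unref. A minimal userspace sketch, assuming plain integer counters and hypothetical names (`toy_sock`, `toy_key`, `toy_key_init()`, `toy_key_get()`, `toy_key_put()`), could look like this:

```c
#include <stddef.h>

/* Hypothetical refcounted objects; real code would use atomic counters. */
struct toy_sock {
	int refs;
};

struct toy_key {
	int refs;
	struct toy_sock *sk;
};

/* Associates a key with a sock and takes a hold on the sock. */
void toy_key_init(struct toy_key *key, struct toy_sock *sk)
{
	key->refs = 1;
	key->sk = sk;
	sk->refs++;
}

/* Takes an additional reference on the key. */
void toy_key_get(struct toy_key *key)
{
	key->refs++;
}

/* Drops a key reference; the sock hold is released on the final unref. */
void toy_key_put(struct toy_key *key)
{
	if (--key->refs == 0) {
		key->sk->refs--;
		key->sk = NULL;
	}
}
```

With this arrangement the sock cannot reach a zero count while any key still points at it, regardless of whether the sock has been unhashed.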
David S. Miller a9e9b78d53 Merge branch 'ravb-fixes'
Yoshihiro Shimoda says:

====================
net: ravb: Fix potential issues

Fix potential issues in the ravb driver.

Changes from v2:
https://lore.kernel.org/all/20230123131331.1425648-1-yoshihiro.shimoda.uh@renesas.com/
 - Add Reviewed-by in the patch [2/2].
 - Add a commit description in the patch [2/2].

Changes from v1:
https://lore.kernel.org/all/20230119043920.875280-1-yoshihiro.shimoda.uh@renesas.com/
 - Fix typo in the patch [1/2].
 - Add Reviewed-by in the patch [1/2].
 - Fix "Fixed" tag in the patch [2/2].
 - Fix a comment indentation of the code in the patch [2/2].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:00:27 +00:00
Yoshihiro Shimoda f3c07758c9 net: ravb: Fix possible hang if RIS2_QFF1 happen
Since this driver enables the interrupt via RIC2_QFE1, it should clear
the interrupt flag when the interrupt fires. Otherwise, the interrupt
can hang the system.

Note that this also fixes a minor coding style issue (a comment
indentation) around the fixed code.

Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:00:27 +00:00
Yoshihiro Shimoda c2b6cdee1d net: ravb: Fix lack of register setting after system resumed for Gen3
After the system enters Suspend to RAM, the register settings of this
hardware are reset because the SoC is turned off. On R-Car Gen3
(info->ccc_gac), ravb_ptp_init() is called in ravb_probe() only, so
after the system resumes, the initial settings for ptp are missing.
Add ravb_ptp_{init,stop}() to ravb_{resume,suspend}().

Fixes: f5d7837f96 ("ravb: ptp: Add CONFIG mode support")
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 13:00:27 +00:00
Hyunwoo Kim f2b0b5210f net/x25: Fix to not accept on connected socket
When listen() and accept() are called on an x25 socket
on which connect() has already succeeded, accept() succeeds immediately.
This is because x25_connect() queues the skb to
sk->sk_receive_queue, and x25_accept() dequeues it.

This creates a child socket with the sk of the parent
x25 socket, which can cause confusion.

Fix x25_listen() to return -EINVAL if the socket has
already been successfully connect()ed to avoid this issue.

Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-25 09:51:04 +00:00
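The check described in the x25 commit above, refusing listen() on a socket that has already connected, can be sketched as a small state check. The types and names (`toy_state`, `toy_x25_sock`, `toy_listen()`) are hypothetical; only the `-EINVAL` result comes from the commit text.

```c
#include <errno.h>

/* Hypothetical socket states for the sketch. */
enum toy_state {
	TOY_CLOSED,
	TOY_CONNECTED,
	TOY_LISTENING,
};

struct toy_x25_sock {
	enum toy_state state;
};

/* Models listen(): a socket that already connected cannot start listening. */
int toy_listen(struct toy_x25_sock *sk)
{
	if (sk->state == TOY_CONNECTED)
		return -EINVAL;

	sk->state = TOY_LISTENING;
	return 0;
}
```

Rejecting the call up front means accept() never gets the chance to dequeue the data that the connect path queued on the same socket.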
Jakub Kicinski 2a48216cff Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Perform SCTP vtag verification for ABORT/SHUTDOWN_COMPLETE according
   to RFC 9260, Sect 8.5.1.

2) Fix infinite loop if SCTP chunk size is zero in for_each_sctp_chunk().
   And remove useless check in this macro too.

3) Revert DATA_SENT state in the SCTP tracker, this was applied in the
   previous merge window. Next patch in this series provides a simpler
   approach to multihoming support.

4) Unify HEARTBEAT_ACKED and ESTABLISHED states for SCTP multihoming
   support, use default ESTABLISHED timeout of 210 seconds based on
   heartbeat timeout * maximum number of retransmissions + round-trip timeout.
   Otherwise, SCTP conntrack entries that represent secondary paths
   remain stale in the table for up to 5 days.

This is a slightly large batch with fixes for the SCTP connection
tracking helper, all patches from Sriram Yagnaraman.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: conntrack: unify established states for SCTP paths
  Revert "netfilter: conntrack: add sctp DATA_SENT state"
  netfilter: conntrack: fix bug in for_each_sctp_chunk
  netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
====================

Link: https://lore.kernel.org/r/20230124183933.4752-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24 18:59:37 -08:00
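Item 2 of the netfilter summary above concerns a loop over SCTP chunks that never terminates when a chunk reports a length of zero. The following userspace sketch shows the general guard, assuming the standard SCTP chunk layout (1-byte type, 1-byte flags, 2-byte big-endian length that includes the 4-byte header, chunks padded to 4 bytes). `toy_count_chunks()` is a hypothetical helper and is not the kernel's for_each_sctp_chunk macro.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CHUNK_HDR_LEN 4u

/*
 * Walks the chunks in buf and returns how many well-formed chunks it saw,
 * or -1 if a chunk length is malformed.
 */
int toy_count_chunks(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off + TOY_CHUNK_HDR_LEN <= len) {
		/* The length field is big-endian and includes the header. */
		size_t chunk_len = ((size_t)buf[off + 2] << 8) | buf[off + 3];

		/*
		 * A length below the header size (including zero) would never
		 * advance the offset, so treat it as malformed and stop. Also
		 * reject lengths that run past the end of the buffer.
		 */
		if (chunk_len < TOY_CHUNK_HDR_LEN || chunk_len > len - off)
			return -1;

		count++;
		/* Advance by the length rounded up to a multiple of 4. */
		off += (chunk_len + 3) & ~(size_t)3;
	}

	return count;
}
```

Without the minimum-length check, a zero length would leave `off` unchanged and the loop would spin forever on the same header.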
Paul M Stillwell Jr 418e53401e ice: move devlink port creation/deletion
Commit a286ba7387 ("ice: reorder PF/representor devlink
port register/unregister flows") moved the code to create
and destroy the devlink PF port. This was fine, but created
a corner case issue in the case of ice_register_netdev()
failing. In that case, the driver would end up calling
ice_devlink_destroy_pf_port() twice.

Additionally, it makes no sense to tie creation of the devlink
PF port to the creation of the netdev so separate out the
code to create/destroy the devlink PF port from the netdev
code. This makes it a cleaner interface.

Fixes: a286ba7387 ("ice: reorder PF/representor devlink port register/unregister flows")
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20230124005714.3996270-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24 18:52:15 -08:00
Marcelo Ricardo Leitner 458e279f86 sctp: fail if no bound addresses can be used for a given scope
Currently, if you bind the socket to something like:
        servaddr.sin6_family = AF_INET6;
        servaddr.sin6_port = htons(0);
        servaddr.sin6_scope_id = 0;
        inet_pton(AF_INET6, "::1", &servaddr.sin6_addr);

And then request a connect to:
        connaddr.sin6_family = AF_INET6;
        connaddr.sin6_port = htons(20000);
        connaddr.sin6_scope_id = if_nametoindex("lo");
        inet_pton(AF_INET6, "fe88::1", &connaddr.sin6_addr);

What the stack does is:
 - bind the socket
 - create a new asoc
 - to handle the connect
   - copy the addresses that can be used for the given scope
   - try to connect

But the copy returns 0 addresses, and the effect is that it ends up
trying to connect as if the socket wasn't bound, which is not the
desired behavior. This unexpected behavior also allows KASLR leaks
through SCTP diag interface.

The fix, then, is to bail out if copying the addresses that can be
used for the scope given in connect() returns 0 addresses. This is
what TCP does with a similar reproducer.

Reported-by: Pietro Borrello <borrello@diag.uniroma1.it>
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/9fcd182f1099f86c6661f3717f63712ddd1c676c.1674496737.git.marcelo.leitner@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24 18:32:33 -08:00
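The behaviour change in the sctp commit above, failing instead of continuing with an empty address list, can be sketched as follows. This is a deliberately simplified model: `toy_scope`, `toy_addr` and `toy_copy_usable_addrs()` are hypothetical, an address counts as usable only on an exact scope match (real SCTP scope rules are richer), and the `-ENETUNREACH` error code is an assumption because the commit text does not name one.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified address scopes. */
enum toy_scope {
	TOY_SCOPE_LOOPBACK,
	TOY_SCOPE_LINK,
	TOY_SCOPE_GLOBAL,
};

struct toy_addr {
	enum toy_scope scope;
};

/*
 * Copies the bound addresses usable for `scope` into `out`, which must have
 * room for `n` entries. Returns the number copied, or a negative errno when
 * nothing is usable, instead of silently continuing with an empty list.
 */
int toy_copy_usable_addrs(const struct toy_addr *bound, size_t n,
			  enum toy_scope scope, struct toy_addr *out)
{
	size_t copied = 0;

	for (size_t i = 0; i < n; i++) {
		/* Simplification: usable only on an exact scope match. */
		if (bound[i].scope == scope)
			out[copied++] = bound[i];
	}

	/* Assumed error code, for illustration only. */
	if (copied == 0)
		return -ENETUNREACH;

	return (int)copied;
}
```

In the scenario from the commit, a socket bound to a loopback address and asked to connect with link scope would get an error here rather than proceeding as if it had never been bound.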
Linus Torvalds 948ef7bb70 modules-6.2-rc6
There has been a fix we have been delaying for v6.2 due to lack of
 early testing on linux-next. The commit has been sitting on linux-next
 since December and testing has also been now a bit extensive by a few
 developers. Since this is a fix which definitely will go to v6.3 it
 should also apply to v6.2 so if there are any issues we pick them up
 earlier rather than later. The fix fixes a regression since v5.3, prior
 to me helping with module maintenance, however, the issue is real in
 that in the worst case now can prevent boot.
 
 We've discussed all possible corner cases [0] and at last do feel this is
 ready for v6.2-rc6.
 
 [0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/

Merge tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module fix from Luis Chamberlain:
 "This is a fix we have been delaying for v6.2 due to lack of early
  testing on linux-next.

  The commit has been sitting in linux-next since December and testing
  has also been now a bit extensive by a few developers. Since this is a
  fix which definitely will go to v6.3 it should also apply to v6.2 so
  if there are any issues we pick them up earlier rather than later. The
  fix fixes a regression since v5.3, prior to me helping with module
  maintenance, however, the issue is real in that in the worst case now
  can prevent boot.

  We've discussed all possible corner cases [0] and at last do feel this
  is ready for v6.2-rc6"

Link https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/ [0]

* tag 'modules-6.2-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  module: Don't wait for GOING modules
2023-01-24 18:19:44 -08:00
Eric Dumazet ea4fdbaa2f net/sched: sch_taprio: do not schedule in taprio_reset()
As reported by syzbot and hinted by Vinicius, I should not have added
a qdisc_synchronize() call in taprio_reset()

taprio_reset() can be called with qdisc spinlock held (and BH disabled)
as shown in included syzbot report [1].

Only taprio_destroy() needed this synchronization, as explained
in the blamed commit changelog.

[1]

BUG: scheduling while atomic: syz-executor150/5091/0x00000202
2 locks held by syz-executor150/5091:
Modules linked in:
Preemption disabled at:
[<0000000000000000>] 0x0
Kernel panic - not syncing: scheduling while atomic: panic_on_warn set ...
CPU: 1 PID: 5091 Comm: syz-executor150 Not tainted 6.2.0-rc3-syzkaller-00219-g010a74f52203 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
panic+0x2cc/0x626 kernel/panic.c:318
check_panic_on_warn.cold+0x19/0x35 kernel/panic.c:238
__schedule_bug.cold+0xd5/0xfe kernel/sched/core.c:5836
schedule_debug kernel/sched/core.c:5865 [inline]
__schedule+0x34e4/0x5450 kernel/sched/core.c:6500
schedule+0xde/0x1b0 kernel/sched/core.c:6682
schedule_timeout+0x14e/0x2a0 kernel/time/timer.c:2167
schedule_timeout_uninterruptible kernel/time/timer.c:2201 [inline]
msleep+0xb6/0x100 kernel/time/timer.c:2322
qdisc_synchronize include/net/sch_generic.h:1295 [inline]
taprio_reset+0x93/0x270 net/sched/sch_taprio.c:1703
qdisc_reset+0x10c/0x770 net/sched/sch_generic.c:1022
dev_reset_queue+0x92/0x130 net/sched/sch_generic.c:1285
netdev_for_each_tx_queue include/linux/netdevice.h:2464 [inline]
dev_deactivate_many+0x36d/0x9f0 net/sched/sch_generic.c:1351
dev_deactivate+0xed/0x1b0 net/sched/sch_generic.c:1374
qdisc_graft+0xe4a/0x1380 net/sched/sch_api.c:1080
tc_modify_qdisc+0xb6b/0x19a0 net/sched/sch_api.c:1689
rtnetlink_rcv_msg+0x43e/0xca0 net/core/rtnetlink.c:6141
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2564
netlink_unicast_kernel net/netlink/af_netlink.c:1330 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1356
netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1932
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
____sys_sendmsg+0x712/0x8c0 net/socket.c:2476
___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
__sys_sendmsg+0xf7/0x1c0 net/socket.c:2559
do_syscall_x64 arch/x86/entry/common.c:50 [inline]

Fixes: 3a415d59c1 ("net/sched: sch_taprio: fix possible use-after-free")
Link: https://lore.kernel.org/netdev/167387581653.2747.13878941339893288655.git-patchwork-notify@kernel.org/T/
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Link: https://lore.kernel.org/r/20230123084552.574396-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-24 18:17:29 -08:00
Linus Torvalds 246dc53fb2 Rust fixes for v6.2
A soundness fix:
 
  - Avoid evaluating arguments in 'pr_*' macros in 'unsafe' blocks.

Merge tag 'rust-fixes-6.2' of https://github.com/Rust-for-Linux/linux

Pull rust fix from Miguel Ojeda:

 - Avoid evaluating arguments in 'pr_*' macros in 'unsafe' blocks

* tag 'rust-fixes-6.2' of https://github.com/Rust-for-Linux/linux:
  rust: print: avoid evaluating arguments in `pr_*` macros in `unsafe` blocks
2023-01-24 17:54:25 -08:00
Linus Torvalds b2f317173e ARM64:
- Pass the correct address to mte_clear_page_tags() on initialising
   a tagged page
 
 - Plug a race against a GICv4.1 doorbell interrupt while saving
   the vgic-v3 pending state.
 
 x86:
 
 - A command line parsing fix and a clang compilation fix for selftests
 
 - A fix for a longstanding VMX issue, that surprisingly was only found
   now to affect real world guests

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "ARM64:

   - Pass the correct address to mte_clear_page_tags() on initialising a
     tagged page

   - Plug a race against a GICv4.1 doorbell interrupt while saving the
     vgic-v3 pending state.

  x86:

   - A command line parsing fix and a clang compilation fix for
     selftests

   - A fix for a longstanding VMX issue, that surprisingly was only
     found now to affect real world guests"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: selftests: Make reclaim_period_ms input always be positive
  KVM: x86/vmx: Do not skip segment attributes if unusable bit is set
  selftests: kvm: move declaration at the beginning of main()
  KVM: arm64: GICv4.1: Fix race with doorbell on VPE activation/deactivation
  KVM: arm64: Pass the actual page address to mte_clear_page_tags()
2023-01-24 17:48:09 -08:00
Linus Torvalds 02db81a787 SCSI fixes on 20230123
Six fixes, all in drivers.  The biggest are the UFS devfreq fixes
 which address a lock inversion and the two iscsi_tcp fixes which try
 to prevent a use after free from userspace still accessing an area
 which the kernel has released (seen by KASAN).
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Six fixes, all in drivers.

  The biggest are the UFS devfreq fixes which address a lock inversion
  and the two iscsi_tcp fixes which try to prevent a use after free from
  userspace still accessing an area which the kernel has released (seen
  by KASAN)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: device_handler: alua: Remove a might_sleep() annotation
  scsi: iscsi_tcp: Fix UAF during login when accessing the shost ipaddress
  scsi: iscsi_tcp: Fix UAF during logout when accessing the shost ipaddress
  scsi: ufs: core: Fix devfreq deadlocks
  scsi: hpsa: Fix allocation size for scsi_host_alloc()
  scsi: target: core: Fix warning on RT kernels
2023-01-24 17:42:53 -08:00
Linus Torvalds fb6e71db53 nfsd-6.2 fixes:
- Nail another UAF in NFSD's filecache

Merge tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fix from Chuck Lever:

 - Nail another UAF in NFSD's filecache

* tag 'nfsd-6.2-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: don't free files unconditionally in __nfsd_file_cache_purge
2023-01-24 12:58:47 -08:00
Linus Torvalds 50306df38a Update the MAINTAINERS file entry for fscrypt
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCY88exxQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK2GqAQD5N14vXZ7Xpn2B4pAK1bD9QTBpzFdD
 NC+iC7Da9euEswD/WBCOw92Ce9N5IV3Yea9M5TsNTBF459+7F1N85TLjEg4=
 =hYmO
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt MAINTAINERS entry update from Eric Biggers:
 "Update the MAINTAINERS file entry for fscrypt"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  MAINTAINERS: update fscrypt git repo
2023-01-24 12:53:26 -08:00
Petr Pavlu 0254127ab9 module: Don't wait for GOING modules
During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded which causes the init function of the new module to
return an error.

Since commit 6e6de3dee5 ("kernel/module.c: Only return -EEXIST for
modules that have finished loading"), the kernel waits for modules in
MODULE_STATE_GOING state to finish unloading before making another
attempt to load the same module.

This creates unnecessary work in the described scenario and delays the
boot. In the worst case, it can prevent udev from loading drivers for
other devices and might cause timeouts of services waiting on them and
subsequently a failed boot.

This patch attempts a different solution for the problem 6e6de3dee5
was trying to solve. Rather than waiting for the unloading to complete,
it returns a different error code (-EBUSY) for modules in the GOING
state. This should avoid the error situation that was described in
6e6de3dee5 (user space attempting to load a dependent module because
the -EEXIST error code would suggest to user space that the first module
had been loaded successfully), while avoiding the delay situation too.
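
The resulting decision can be sketched in user-space C as follows. This is only an illustration of the error-code choice described above: the enum and the function name same_name_load_result() are hypothetical, and treating every other state as "still loading, wait" is an assumption of the sketch, not a statement about the kernel source.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's module states. */
enum mod_state { MOD_UNFORMED, MOD_COMING, MOD_LIVE, MOD_GOING };

/*
 * Result of a load request that finds a module of the same name.
 * LIVE:  the module finished loading, so report -EEXIST.
 * GOING: the module is on its way out (for example after a failed
 *        init call); report -EBUSY at once instead of waiting.
 * Any other state returns 0, which in this sketch means
 * "still loading, wait and look again".
 */
int same_name_load_result(enum mod_state state)
{
	switch (state) {
	case MOD_LIVE:
		return -EEXIST;
	case MOD_GOING:
		return -EBUSY;
	default:
		return 0;
	}
}
```

With this split, user space can tell "already loaded" (-EEXIST) apart from "not loaded, try again later" (-EBUSY).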

This has been tested on linux-next since December 2022 and passes
all kmod selftests except test 0009 with module compression enabled
but it has been confirmed that this issue has existed and has gone
unnoticed since prior to this commit and can also be reproduced without
module compression with a simple usleep(5000000) on tools/modprobe.c [0].
These failures are caused by hitting the kernel mod_concurrent_max and can
happen either due to a self inflicted kernel module auto-load DoS somehow
or on a system with large CPU count and each CPU count incorrectly triggering
many module auto-loads. Both of those issues need to be fixed in-kernel.

[0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/

Fixes: 6e6de3dee5 ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Co-developed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Petr Mladek <pmladek@suse.com>
[mcgrof: enhance commit log with testing and kmod test result interpretation ]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-24 12:52:52 -08:00
Linus Torvalds 5149394c89 Update the MAINTAINERS file entry for fsverity
-----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCY88euhQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+yQAP4wWnC29X3t6kQC4+T2hlw+MOuZBdfd
 dm70qTNd6itL7QD8DToDCGm6gt6IqunjIllUBGfEU2oyeKU5MT7SVITfnAo=
 =4W3N
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity MAINTAINERS entry update from Eric Biggers:
 "Update the MAINTAINERS file entry for fsverity"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  MAINTAINERS: update fsverity git repo, list, and patchwork
2023-01-24 12:51:49 -08:00
Linus Torvalds 854f0912f8 ext4: make xattr char unsignedness in hash explicit
Commit f3bbac3247 ("ext4: deal with legacy signed xattr name hash
values") added a hashing function for the legacy case of having the
xattr hash calculated using a signed 'char' type.  It left the unsigned
case alone, since it's all implicitly handled by the '-funsigned-char'
compiler option.

However, there's been some noise about back-porting it all into stable
kernels that lack the '-funsigned-char', so let's just make that at
least possible by making the whole 'this uses unsigned char' very
explicit in the code itself.  Whether such a back-port is really
warranted or not, I'll leave to others, but at least together with this
change it is technically sensible.
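
A minimal user-space sketch of why the signedness matters. The function names and the shift constant NAME_HASH_SHIFT are assumptions made for illustration; the point is only that a byte with the top bit set is sign-extended when read through a signed 'char', so the two variants disagree for non-ASCII names and agree for plain ASCII ones.

```c
#include <stddef.h>
#include <stdint.h>

#define NAME_HASH_SHIFT 5	/* assumed rotate amount for this sketch */

/* Hash with every byte explicitly treated as unsigned. */
uint32_t name_hash_unsigned(const char *name, size_t len)
{
	uint32_t hash = 0;

	while (len--) {
		hash = (hash << NAME_HASH_SHIFT) ^
		       (hash >> (8 * sizeof(hash) - NAME_HASH_SHIFT)) ^
		       (uint32_t)(unsigned char)*name++;
	}
	return hash;
}

/* Legacy flavour: bytes >= 0x80 are sign-extended before mixing. */
uint32_t name_hash_signed(const char *name, size_t len)
{
	uint32_t hash = 0;

	while (len--) {
		hash = (hash << NAME_HASH_SHIFT) ^
		       (hash >> (8 * sizeof(hash) - NAME_HASH_SHIFT)) ^
		       (uint32_t)(int32_t)(signed char)*name++;
	}
	return hash;
}
```

Because both helpers spell out the cast, their results no longer depend on whether the compiler treats plain 'char' as signed or unsigned.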

Also, add a 'pr_warn_once()' for reporting the "hey, signedness for this
hash calculation has changed" issue.  Hopefully it never triggers except
for that xfstests generic/454 test-case, but even if it does it's just
good information to have.

If for no other reason than "we can remove the legacy signed hash code
entirely if nobody ever sees the message any more".

Cc: Sasha Levin <sashal@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jason Donenfeld <Jason@zx2c4.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-01-24 12:38:45 -08:00
Paolo Abeni d968117a7e Revert "Merge branch 'ethtool-mac-merge'"
This reverts commit 0ad999c1ee, reversing
changes made to e38553bdc3.

It was not intended for net.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-24 17:44:14 +01:00
Christian Brauner facd61053c fuse: fixes after adapting to new posix acl api
This cycle we ported all filesystems to the new posix acl api. While
looking at further simplifications in this area to remove the last
remnants of the generic dummy posix acl handlers we realized that we
regressed fuse daemons that don't set FUSE_POSIX_ACL but still make use
of posix acls.

With the change to a dedicated posix acl api, interacting with posix acls
doesn't go through the old xattr codepaths anymore and instead only
relies on the get acl and set acl inode operations.

Before this change fuse daemons that don't set FUSE_POSIX_ACL were able
to get and set posix acl albeit with two caveats. First, that posix acls
aren't cached. And second, that they aren't used for permission checking
in the vfs.

We regressed that use-case as we currently refuse to retrieve any posix
acls if they aren't enabled via FUSE_POSIX_ACL. So older fuse daemons
would see a change in behavior.

We can restore the old behavior in multiple ways. We could change the
new posix acl api and look for a dedicated xattr handler and if we find
one prefer that over the dedicated posix acl api. That would break the
consistency of the new posix acl api so we would very much prefer not to
do that.

We could introduce a new ACL_*_CACHE sentinel that would instruct the
vfs permission checking codepath to not call into the filesystem and
ignore acls.

But a more straightforward fix for v6.2 is to do the same thing that
Overlayfs does and give fuse a separate get acl method for permission
checking. Overlayfs uses this to express different needs for vfs
permission lookup and acl based retrieval via the regular system call
path as well. Let fuse do the same for now. This way fuse can continue
to refuse to retrieve posix acls for daemons that don't set
FUSE_POSIX_ACL for permission checking while allowing a fuse server to
retrieve it via the usual system calls.

In the future, we could extend the get acl inode operation to not just
pass a simple boolean to indicate rcu lookup but instead make it a flag
argument. Then in addition to passing the information that this is an
rcu lookup to the filesystem we could also introduce a flag that tells
the filesystem that this is a request from the vfs to use these acls for
permission checking. Then fuse could refuse the get acl request for
permission checking when the daemon doesn't have FUSE_POSIX_ACL set in
the same get acl method. This would also help Overlayfs and allow us to
remove the second method for it as well.

But since that change is more invasive as we need to update the get acl
inode operation for multiple filesystems we should not do this as a fix
for v6.2. Instead we will do this for the v6.3 merge window.

Fwiw, since posix acls are now always correctly translated in the new
posix acl api we could also allow them to be used for daemons without
FUSE_POSIX_ACL that are not mounted on the host. But this is a behavioral
change and again, if done, should be done for v6.3. For now, let's just
restore the original behavior.

A nice side-effect of this change is that for fuse daemons with and
without FUSE_POSIX_ACL the same code is used for posix acls in a
backwards compatible way. This also means we can remove the legacy xattr
handlers completely. We've also added comments to explain the expected
behavior for daemons without FUSE_POSIX_ACL into the code.

Fixes: 318e66856d ("xattr: use posix acl api")
Signed-off-by: Seth Forshee (Digital Ocean) <sforshee@kernel.org>
Reviewed-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-01-24 16:33:37 +01:00
Kuniyuki Iwashima 409db27e3a netrom: Fix use-after-free of a listening socket.
syzbot reported a use-after-free in do_accept(), precisely nr_accept()
as sk_prot_alloc() allocated the memory and sock_put() frees it. [0]

The issue could happen if the heartbeat timer is fired and
nr_heartbeat_expiry() calls nr_destroy_socket(), where a socket
has SOCK_DESTROY or a listening socket has SOCK_DEAD.

In this case, the first condition cannot be true.  SOCK_DESTROY is
flagged in nr_release() only when the file descriptor is close()d,
but accept() is being called for the listening socket, so the second
condition must be true.

Usually, the AF_NETROM listener neither starts timers nor sets
SOCK_DEAD.  However, the condition is met if connect() fails before
listen().  connect() starts the t1 timer and heartbeat timer, and
t1timer calls nr_disconnect() when timeout happens.  Then, SOCK_DEAD
is set, and if we call listen(), the heartbeat timer calls
nr_destroy_socket().

  nr_connect
    nr_establish_data_link(sk)
      nr_start_t1timer(sk)
    nr_start_heartbeat(sk)
                                    nr_t1timer_expiry
                                      nr_disconnect(sk, ETIMEDOUT)
                                        nr_sk(sk)->state = NR_STATE_0
                                        sk->sk_state = TCP_CLOSE
                                        sock_set_flag(sk, SOCK_DEAD)
nr_listen
  if (sk->sk_state != TCP_LISTEN)
    sk->sk_state = TCP_LISTEN
                                    nr_heartbeat_expiry
                                      switch (nr->state)
                                      case NR_STATE_0
                                        if (sk->sk_state == TCP_LISTEN &&
                                            sock_flag(sk, SOCK_DEAD))
                                          nr_destroy_socket(sk)

This path seems expected, and nr_destroy_socket() is called to clean
up resources.  Initially, there was sock_hold() before nr_destroy_socket()
so that the socket would not be freed, but the commit 517a16b1a8
("netrom: Decrease sock refcount when sock timers expire") accidentally
removed it.

To fix use-after-free, let's add sock_hold().
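
The reference counting can be illustrated with a toy model. Everything here is hypothetical (struct toy_sock, toy_hold(), toy_put(), heartbeat_expiry_frees()). The sketch assumes the listener starts with two references, one owned by the open file descriptor and one by the armed timer, and that the destroy step and the handler's exit path each drop one. Those assumptions paraphrase the description and the trace below, not the kernel source.

```c
/* Toy socket with just a reference count and a "freed" marker. */
struct toy_sock {
	int refcnt;
	int freed;
};

void toy_hold(struct toy_sock *sk)
{
	sk->refcnt++;
}

void toy_put(struct toy_sock *sk)
{
	if (--sk->refcnt == 0)
		sk->freed = 1;
}

/*
 * Models the heartbeat handler on a dead listener. Returns 1 if the
 * socket ends up freed although the file descriptor still points at
 * it, 0 otherwise.
 */
int heartbeat_expiry_frees(int take_extra_hold)
{
	/* Assumed starting point: fd reference + timer reference. */
	struct toy_sock sk = { .refcnt = 2, .freed = 0 };

	if (take_extra_hold)
		toy_hold(&sk);	/* the hold this fix puts back */
	toy_put(&sk);		/* dropped by the destroy step */
	toy_put(&sk);		/* dropped on the handler's exit path */
	return sk.freed;
}
```

Without the extra hold the count reaches zero inside the timer handler, which matches the "Freed by task" stack in the report; with it, one reference survives for the accept() caller.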

[0]:
BUG: KASAN: use-after-free in do_accept+0x483/0x510 net/socket.c:1848
Read of size 8 at addr ffff88807978d398 by task syz-executor.3/5315

CPU: 0 PID: 5315 Comm: syz-executor.3 Not tainted 6.2.0-rc3-syzkaller-00165-gd9fc1511728c #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:306 [inline]
 print_report+0x15e/0x461 mm/kasan/report.c:417
 kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
 do_accept+0x483/0x510 net/socket.c:1848
 __sys_accept4_file net/socket.c:1897 [inline]
 __sys_accept4+0x9a/0x120 net/socket.c:1927
 __do_sys_accept net/socket.c:1944 [inline]
 __se_sys_accept net/socket.c:1941 [inline]
 __x64_sys_accept+0x75/0xb0 net/socket.c:1941
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fa436a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fa437784168 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 00007fa436bac050 RCX: 00007fa436a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
RBP: 00007fa436ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffebc6700df R14: 00007fa437784300 R15: 0000000000022000
 </TASK>

Allocated by task 5294:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 ____kasan_kmalloc mm/kasan/common.c:371 [inline]
 ____kasan_kmalloc mm/kasan/common.c:330 [inline]
 __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380
 kasan_kmalloc include/linux/kasan.h:211 [inline]
 __do_kmalloc_node mm/slab_common.c:968 [inline]
 __kmalloc+0x5a/0xd0 mm/slab_common.c:981
 kmalloc include/linux/slab.h:584 [inline]
 sk_prot_alloc+0x140/0x290 net/core/sock.c:2038
 sk_alloc+0x3a/0x7a0 net/core/sock.c:2091
 nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433
 __sock_create+0x359/0x790 net/socket.c:1515
 sock_create net/socket.c:1566 [inline]
 __sys_socket_create net/socket.c:1603 [inline]
 __sys_socket_create net/socket.c:1588 [inline]
 __sys_socket+0x133/0x250 net/socket.c:1636
 __do_sys_socket net/socket.c:1649 [inline]
 __se_sys_socket net/socket.c:1647 [inline]
 __x64_sys_socket+0x73/0xb0 net/socket.c:1647
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 14:
 kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
 kasan_set_track+0x25/0x30 mm/kasan/common.c:52
 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518
 ____kasan_slab_free mm/kasan/common.c:236 [inline]
 ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200
 kasan_slab_free include/linux/kasan.h:177 [inline]
 __cache_free mm/slab.c:3394 [inline]
 __do_kmem_cache_free mm/slab.c:3580 [inline]
 __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587
 sk_prot_free net/core/sock.c:2074 [inline]
 __sk_destruct+0x5df/0x750 net/core/sock.c:2166
 sk_destruct net/core/sock.c:2181 [inline]
 __sk_free+0x175/0x460 net/core/sock.c:2192
 sk_free+0x7c/0xa0 net/core/sock.c:2203
 sock_put include/net/sock.h:1991 [inline]
 nr_heartbeat_expiry+0x1d7/0x460 net/netrom/nr_timer.c:148
 call_timer_fn+0x1da/0x7c0 kernel/time/timer.c:1700
 expire_timers+0x2c6/0x5c0 kernel/time/timer.c:1751
 __run_timers kernel/time/timer.c:2022 [inline]
 __run_timers kernel/time/timer.c:1995 [inline]
 run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035
 __do_softirq+0x1fb/0xadc kernel/softirq.c:571

Fixes: 517a16b1a8 ("netrom: Decrease sock refcount when sock timers expire")
Reported-by: syzbot+5fafd5cfe1fc91f6b352@syzkaller.appspotmail.com
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230120231927.51711-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-01-24 11:54:01 +01:00
Sriram Yagnaraman a44b765148 netfilter: conntrack: unify established states for SCTP paths
An SCTP endpoint can start an association through a path and tear it
down over another one. That means the initial path will not see the
shutdown sequence, and the conntrack entry will remain in ESTABLISHED
state for 5 days.

By merging the HEARTBEAT_ACKED and ESTABLISHED states into one
ESTABLISHED state, there remains no difference between a primary or
secondary path. The timeout for the merged ESTABLISHED state is set to
210 seconds (hb_interval * max_path_retrans + rto_max). So, even if a
path doesn't see the shutdown sequence, it will expire in a reasonable
amount of time.
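
As a worked check of the 210 second figure, the formula can be evaluated directly. The function name is hypothetical, and the inputs used in the note after the code (30 s heartbeat interval, 5 path retransmissions, 60 s maximum RTO) are assumed to be the usual SCTP defaults; the message above only gives the formula and the result.

```c
/*
 * Timeout for the merged ESTABLISHED state, in seconds:
 * hb_interval * max_path_retrans + rto_max
 */
unsigned int sctp_established_timeout(unsigned int hb_interval,
				      unsigned int max_path_retrans,
				      unsigned int rto_max)
{
	return hb_interval * max_path_retrans + rto_max;
}
```

With the assumed defaults, sctp_established_timeout(30, 5, 60) gives 30 * 5 + 60 = 210.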

With this change in place, there is now more than one state from which
we can transition to ESTABLISHED, COOKIE_ECHOED and HEARTBEAT_SENT, so
handle the setting of ASSURED bit whenever a state change has happened
and the new state is ESTABLISHED. Removed the check for dir==REPLY since
the transition to ESTABLISHED can happen only in the reply direction.

Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24 09:52:52 +01:00
Sriram Yagnaraman 13bd9b31a9 Revert "netfilter: conntrack: add sctp DATA_SENT state"
This reverts commit (bff3d0534804: "netfilter: conntrack: add sctp
DATA_SENT state")

Using DATA/SACK to detect a new connection on secondary/alternate paths
works only on new connections, while a HEARTBEAT is required on
connection re-use. It is probably consistent to wait for HEARTBEAT to
create a secondary connection in conntrack.

Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24 09:52:32 +01:00
Sriram Yagnaraman 98ee007745 netfilter: conntrack: fix bug in for_each_sctp_chunk
skb_header_pointer() will return NULL if offset + sizeof(_sch) exceeds
skb->len, so this offset < skb->len test is redundant.

If sch->length == 0, this will end up in an infinite loop, so add a check
for sch->length > 0.
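
A user-space sketch of such a chunk walk over a flat byte buffer. The function name count_chunks() and the buffer-based interface are hypothetical; it assumes a 4-byte chunk header (type, flags, 16-bit big-endian length) and chunks padded to a multiple of 4 bytes. It is not the kernel macro, only an illustration of why a zero length has to end the walk.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the chunks in buf, stopping at the first malformed one. */
int count_chunks(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	/* A chunk is only looked at if its whole header fits. */
	while (len - off >= 4) {
		uint16_t clen = (uint16_t)((buf[off + 2] << 8) | buf[off + 3]);
		size_t step;

		/* Zero length would never advance off: stop here. */
		if (clen == 0)
			break;

		count++;
		step = ((size_t)clen + 3) & ~(size_t)3;	/* pad to 4 bytes */
		if (step > len - off)
			break;				/* truncated chunk */
		off += step;
	}
	return count;
}
```

The header-fits test plays the role of the NULL return from skb_header_pointer(), so no separate offset comparison is needed.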

Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24 09:52:31 +01:00
Sriram Yagnaraman a9993591fa netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE
RFC 9260, Sec 8.5.1 states that for ABORT/SHUTDOWN_COMPLETE, the chunk
MUST be accepted if the vtag of the packet matches its own tag and the
T bit is not set OR if it is set to its peer's vtag and the T bit is set
in chunk flags. Otherwise the packet MUST be silently dropped.

Update vtag verification for ABORT/SHUTDOWN_COMPLETE based on the above
description.
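
The rule quoted above can be written as a small predicate. The function name and parameter names are hypothetical, and the T bit is taken to be bit 0 of the chunk flags; this is a sketch of the RFC wording, not the conntrack code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_FLAG_T 0x01	/* assumed position of the T bit */

/*
 * Accept an ABORT/SHUTDOWN_COMPLETE chunk if either
 *   - the T bit is clear and the packet carries the receiver's own tag, or
 *   - the T bit is set and the packet carries the peer's tag.
 * Anything else is to be dropped silently.
 */
bool abort_vtag_ok(uint32_t pkt_vtag, uint8_t chunk_flags,
		   uint32_t own_vtag, uint32_t peer_vtag)
{
	bool t_bit = (chunk_flags & CHUNK_FLAG_T) != 0;

	if (!t_bit)
		return pkt_vtag == own_vtag;
	return pkt_vtag == peer_vtag;
}
```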

Fixes: 9fb9cbb108 ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-24 09:52:31 +01:00
Jakub Kicinski 208a21107e Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-01-20 (iavf)

This series contains updates to iavf driver only.

Michal Schmidt converts single iavf workqueue to per adapter to avoid
deadlock issues.

Marcin moves setting of VLAN related netdev features to watchdog task to
avoid RTNL deadlock.

Stefan Assmann schedules immediate watchdog task execution on changing
primary MAC to avoid excessive delay.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  iavf: schedule watchdog immediately when changing primary MAC
  iavf: Move netdev_update_features() into watchdog task
  iavf: fix temporary deadlock and failure to set MAC address
====================

Link: https://lore.kernel.org/r/20230120211036.430946-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 22:36:59 -08:00
Jakub Kicinski 571cca79df Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix overlap detection in rbtree set backend: Detect overlap by going
   through the ordered list of valid tree nodes. To shorten the number of
   visited nodes in the list, this algorithm descends the tree to search
   for an existing element greater than the key value to insert that is
   greater than the new element.

2) Fix for the rbtree set garbage collector: Skip inactive and busy
   elements when checking for expired elements to avoid interference
   with an ongoing transaction from control plane.

This is a rather large fix coming at this stage of the 6.2-rc. Since
33c7aba0b4 ("netfilter: nf_tables: do not set up extensions for end
interval"), bogus overlap errors in the rbtree set occur more frequently.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
  netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
====================

Link: https://lore.kernel.org/r/20230123211601.292930-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:50:58 -08:00
Mat Martineau bce4affe30 MAINTAINERS: Update MPTCP maintainer list and CREDITS
My responsibilities at Intel have changed, so I'm handing off exclusive
MPTCP subsystem maintainer duties to Matthieu. It has been a privilege
to see MPTCP through its initial upstreaming and first few years in the
upstream kernel!

Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20230120231121.36121-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:42:13 -08:00
Alexandru Tachici 8a4f6d0232 net: ethernet: adi: adin1110: Fix multicast offloading
Driver marked broadcast/multicast frames as offloaded incorrectly.
Mark them as offloaded only when HW offloading has been enabled.
This should happen only for ADIN2111 when both ports are bridged
by the software.

Fixes: bc93e19d08 ("net: ethernet: adi: Add ADIN1110 support")
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20230120090846.18172-1-alexandru.tachici@analog.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:41:33 -08:00
Ahmad Fatoum 360fdc999d net: dsa: microchip: fix probe of I2C-connected KSZ8563
Starting with commit eee16b1471 ("net: dsa: microchip: perform the
compatibility check for dev probed"), the KSZ switch driver now bails
out if it thinks the DT compatible doesn't match the actual chip ID
read back from the hardware:

  ksz9477-switch 1-005f: Device tree specifies chip KSZ9893 but found
  KSZ8563, please fix it!

For the KSZ8563, which used ksz_switch_chips[KSZ9893], this was fine
at first, because it indeed shares the same chip id as the KSZ9893.

Commit b449080956 ("net: dsa: microchip: add separate struct
ksz_chip_data for KSZ8563 chip") started differentiating KSZ9893
compatible chips by consulting the 0x1F register. The resulting breakage
was fixed for the SPI driver in the same commit by introducing the
appropriate ksz_switch_chips[KSZ8563], but not for the I2C driver.

Fix this for I2C-connected KSZ8563 now to get it probing again.

Fixes: b449080956 ("net: dsa: microchip: add separate struct ksz_chip_data for KSZ8563 chip")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Acked-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20230120110933.1151054-1-a.fatoum@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:40:54 -08:00
Eric Dumazet 5e9398a26a ipv4: prevent potential spectre v1 gadget in fib_metrics_match()
    if (!type)
        continue;
    if (type > RTAX_MAX)
        return false;
    ...
    fi_val = fi->fib_metrics->metrics[type - 1];

@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.
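
The usual remedy is to clamp the index with a branch-free mask after the bounds check; the kernel provides array_index_nospec() for this. The following user-space sketch shows the idea only. The names index_mask_nospec(), clamp_index_nospec(), metric_lookup() and METRIC_MAX are hypothetical stand-ins, and a plain C version cannot promise that the compiler keeps the code branch-free.

```c
#include <limits.h>

#define METRIC_MAX 16	/* hypothetical stand-in for RTAX_MAX */

/*
 * All-ones when index < size, zero otherwise, computed without a
 * conditional branch. Assumes index and size are at most LONG_MAX and
 * that right-shifting a negative long is an arithmetic shift, which
 * holds for GCC and Clang.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/* Returns index unchanged when in bounds, 0 otherwise. */
unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
	return index & index_mask_nospec(index, size);
}

/* Returns 0 and stores the metric, or -1 when type is out of range. */
int metric_lookup(const unsigned int *metrics, unsigned long type,
		  unsigned int *out)
{
	if (type == 0 || type > METRIC_MAX)
		return -1;
	/* Even if the check above is mispredicted, type stays in range. */
	type = clamp_index_nospec(type, METRIC_MAX + 1);
	*out = metrics[type - 1];
	return 0;
}
```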

Fixes: 5f9ae3d9e7 ("ipv4: do metrics match when looking up and deleting a route")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:37:39 -08:00
Eric Dumazet 1d1d63b612 ipv4: prevent potential spectre v1 gadget in ip_metrics_convert()
	if (!type)
		continue;
	if (type > RTAX_MAX)
		return -EINVAL;
	...
	metrics[type - 1] = val;

@type being used as an array index, we need to prevent
cpu speculation or risk leaking kernel memory content.

Fixes: 6cf9dfd3bd ("net: fib: move metrics parsing to a helper")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230120133040.3623463-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:37:25 -08:00
Jakub Kicinski d6ab640c21 Merge branch 'netlink-annotate-various-data-races'
Eric Dumazet says:

====================
netlink: annotate various data races

A recent syzbot report came to my attention.

After addressing it, I also fixed other related races.
====================

Link: https://lore.kernel.org/r/20230120125955.3453768-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:35:56 -08:00
Eric Dumazet 9b663b5cbb netlink: annotate data races around sk_state
netlink_getsockbyportid() reads sk_state while a concurrent
netlink_connect() can change its value.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:35:53 -08:00
Eric Dumazet 004db64d18 netlink: annotate data races around dst_portid and dst_group
netlink_getname(), netlink_sendmsg() and netlink_getsockbyportid()
can read nlk->dst_portid and nlk->dst_group while another
thread is changing them.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:35:53 -08:00
Eric Dumazet c1bb9484e3 netlink: annotate data races around nlk->portid
syzbot reminds us netlink_getname() runs locklessly [1]

This first patch annotates the race against nlk->portid.

Following patches take care of the remaining races.
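
Such an annotation typically means marking the lockless accesses with READ_ONCE()/WRITE_ONCE(). The sketch below imitates that in user space with volatile accesses; the macros, struct toy_nlk and both functions are hypothetical. It only shows the shape of the annotation: each access is performed exactly once and cannot be torn or re-fetched by the compiler, and no ordering between threads is implied.

```c
#include <stdint.h>

/* Simplified user-space imitations of READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE_U32(x)	(*(const volatile uint32_t *)&(x))
#define WRITE_ONCE_U32(x, v)	(*(volatile uint32_t *)&(x) = (v))

/* Toy socket carrying only the field under discussion. */
struct toy_nlk {
	uint32_t portid;
};

/* Writer side, standing in for the insert path. */
void toy_insert(struct toy_nlk *nlk, uint32_t portid)
{
	WRITE_ONCE_U32(nlk->portid, portid);
}

/* Lockless reader side, standing in for the getname path. */
uint32_t toy_getname(const struct toy_nlk *nlk)
{
	return READ_ONCE_U32(nlk->portid);
}
```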

[1]
BUG: KCSAN: data-race in netlink_getname / netlink_insert

write to 0xffff88814176d310 of 4 bytes by task 2315 on cpu 1:
netlink_insert+0xf1/0x9a0 net/netlink/af_netlink.c:583
netlink_autobind+0xae/0x180 net/netlink/af_netlink.c:856
netlink_sendmsg+0x444/0x760 net/netlink/af_netlink.c:1895
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg net/socket.c:734 [inline]
____sys_sendmsg+0x38f/0x500 net/socket.c:2476
___sys_sendmsg net/socket.c:2530 [inline]
__sys_sendmsg+0x19a/0x230 net/socket.c:2559
__do_sys_sendmsg net/socket.c:2568 [inline]
__se_sys_sendmsg net/socket.c:2566 [inline]
__x64_sys_sendmsg+0x42/0x50 net/socket.c:2566
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff88814176d310 of 4 bytes by task 2316 on cpu 0:
netlink_getname+0xcd/0x1a0 net/netlink/af_netlink.c:1144
__sys_getsockname+0x11d/0x1b0 net/socket.c:2026
__do_sys_getsockname net/socket.c:2041 [inline]
__se_sys_getsockname net/socket.c:2038 [inline]
__x64_sys_getsockname+0x3e/0x50 net/socket.c:2038
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x00000000 -> 0xc9a49780

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 2316 Comm: syz-executor.2 Not tainted 6.2.0-rc3-syzkaller-00030-ge8f60cd7db24-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-23 21:35:53 -08:00
Pablo Neira Ayuso 5d235d6ce7 netfilter: nft_set_rbtree: skip elements in transaction from garbage collection
Skip interference with an ongoing transaction, do not perform garbage
collection on inactive elements. Reset annotated previous end interval
if the expired element is marked as busy (control plane removed the
element right before expiration).

Fixes: 8d8540c4f5 ("netfilter: nft_set_rbtree: add timeout support")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-23 21:38:33 +01:00
Pablo Neira Ayuso c9e6978e27 netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
...instead of a tree descent, which became overly complicated in an
attempt to cover cases where expired or inactive elements would affect
comparisons with the new element being inserted.

Further, it turned out that it's probably impossible to cover all those
cases, as inactive nodes might entirely hide subtrees consisting of a
complete interval plus a node that makes the current insertion not
overlap.

To speed up the overlap check, descend the tree to find a greater
element that is closer to the key value to insert. Then walk down the
node list for overlap detection. Starting the overlap check from
rb_first() unconditionally is slow: it takes 10 times longer due to the
full linear traversal of the list.

Moreover, perform garbage collection of expired elements when walking
down the node list to avoid bogus overlap reports.

For the insertion operation itself, this essentially reverts back to the
implementation before commit 7c84d41416 ("netfilter: nft_set_rbtree:
Detect partial overlaps on insertion"), except that cases of complete
overlap are already handled in the overlap detection phase itself, which
slightly simplifies the loop to find the insertion point.

Based on initial patch from Stefano Brivio, including text from the
original patch description too.

Fixes: 7c84d41416 ("netfilter: nft_set_rbtree: Detect partial overlaps on insertion")
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-01-23 21:36:38 +01:00
Linus Torvalds 7bf70dbb18 VFIO fixes for v6.2-rc6
- Honor reserved regions when testing for IOMMU fine grained super
    page support, avoiding a regression on s390 for a firmware device
    where the existence of the mapping, even if unused can trigger
    an error state. (Niklas Schnelle)
 
  - Fix a deadlock in releasing KVM references by using the alternate
    .release() rather than .destroy() callback for the kvm-vfio device.
    (Yi Liu)
 -----BEGIN PGP SIGNATURE-----
 
 iQJPBAABCAA5FiEEQvbATlQL0amee4qQI5ubbjuwiyIFAmPOwUUbHGFsZXgud2ls
 bGlhbXNvbkByZWRoYXQuY29tAAoJECObm247sIsicDAQAJSMkMjCkkMQRZaPhD3O
 k3/5F2cHC73zY5ldxy+a5oq7na00jkyL3+cCIgZBN7NrADPTjpgPGruhw0zhrzIw
 17UvKZZYznLK7x4iapKXZEeRN6HgtUWgeYzGj43lIsiQJtablrcIYbExnCfwIOUo
 APgh0crV2RJ6F7pCVuyYCJefKPDjgsrMO8YKJrVh6f6Avv445kwtk2xpIgSwp0C6
 O2b/AkysT8Q2bwD7XnHrMIKJ7on2qUcfMJFgYJQ8DjDzbXw+NCW4YizcqNc3/DWM
 SoPaSuNZqEbu8Q3cZuQn7uafwL6FTk9WoOef7RowSvO3dn3RA3B61hO+heYukudl
 APz2dgAXPnt0fqIadyiFKDrjXcIwgmi29Xb51mAiJbKMYogmrmBhY6jVPCSOoilv
 heYEvDxqwD/AiCBzuJP3Dqpc21Xq4kN4jePVFh4aR3Jd+vBITK0EIktonALhnMH+
 2ik8FQ9L/HefssytcsXjtbO5K778+OsTP3Bhdbsj6qjGDXHaOIjQzJXnxeT/Uysm
 5KLjNRpeRjeIRgsiOB1L8bDyD2bR7SbZWcvw3Z6E9NMY719Txvs19X6OQ9nQd90b
 OPJrgVnxCZDMegy7yi68/pyOBSQduo75AgKbNdQa9Nyf92LbfL1HeBkF6+NxjAh5
 SQDZYxlWtKHXH0l7yWkksn2G
 =tdng
 -----END PGP SIGNATURE-----

Merge tag 'vfio-v6.2-rc6' of https://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Honor reserved regions when testing for IOMMU fine grained super page
   support, avoiding a regression on s390 for a firmware device where
   the existence of the mapping, even if unused can trigger an error
   state. (Niklas Schnelle)

 - Fix a deadlock in releasing KVM references by using the alternate
   .release() rather than .destroy() callback for the kvm-vfio device.
   (Yi Liu)

* tag 'vfio-v6.2-rc6' of https://github.com/awilliam/linux-vfio:
  kvm/vfio: Fix potential deadlock on vfio group_lock
  vfio/type1: Respect IOMMU reserved regions in vfio_test_domain_fgsp()
2023-01-23 11:56:07 -08:00