Commit Graph

69453 Commits

Author SHA1 Message Date
Johannes Berg bfd8403add wifi: mac80211: reorg some iface data structs for MLD
Start reorganizing interface related data structures toward
MLD. The most complex part here is for the keys, since we
have to split the various kinds of GTKs off to the link but
still need to use (for WEP) the other keys as a fallback
even for multicast frames.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20 12:55:06 +02:00
Johannes Berg f276e20b18 wifi: mac80211: move interface config to new struct
We'll use bss_conf for per-link configuration later, so
move out all the non-link-specific data out into a new
struct ieee80211_vif_cfg used in the vif.

Some adjustments were done with the following spatch:

    @@
    expression sdata;
    struct ieee80211_vif *vifp;
    identifier var = { assoc, ibss_joined, aid, arp_addr_list, arp_addr_cnt, ssid, ssid_len, s1g, ibss_creator };
    @@
    (
    -sdata->vif.bss_conf.var
    +sdata->vif.cfg.var
    |
    -vifp->bss_conf.var
    +vifp->cfg.var
    )

    @bss_conf@
    struct ieee80211_bss_conf *bss_conf;
    identifier var = { assoc, ibss_joined, aid, arp_addr_list, arp_addr_cnt, ssid, ssid_len, s1g, ibss_creator };
    @@
    -bss_conf->var
    +vif_cfg->var

(though more manual fixups were needed, e.g. replacing
"vif_cfg->" by "vif->cfg." in many files.)

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20 12:55:03 +02:00
Johannes Berg d0a9123ef5 wifi: mac80211: move some future per-link data to bss_conf
To add MLD, reuse the bss_conf structure later for per-link
information, so move some things into it that are per link.

Most transformations were done with the following spatch:

    @@
    expression sdata;
    identifier var = { chanctx_conf, mu_mimo_owner, csa_active, color_change_active, color_change_color };
    @@
    -sdata->vif.var
    +sdata->vif.bss_conf.var

    @@
    struct ieee80211_vif *vif;
    identifier var = { chanctx_conf, mu_mimo_owner, csa_active, color_change_active, color_change_color };
    @@
    -vif->var
    +vif->bss_conf.var

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20 12:55:01 +02:00
Johannes Berg 7b0a0e3c3a wifi: cfg80211: do some rework towards MLO link APIs
In order to support multi-link operation with multiple links,
start adding some APIs. The notable addition here is to have
the link ID in a new nl80211 attribute, that will be used to
differentiate the links in many nl80211 operations.

So far, this patch adds the netlink NL80211_ATTR_MLO_LINK_ID
attribute (as well as the NL80211_ATTR_MLO_LINKS attribute)
and plugs it through the system in some places, checking the
validity etc. along with other infrastructure needed for it.

For now, I've decided to include only the over-the-air link
ID in the API. I know we discussed that we eventually need to
have to have other ways of identifying a link, but for local
AP mode and auth/assoc commands as well as set_key etc. we'll
use the OTA ID.

Also included in this patch is some refactoring of the data
structures in struct wireless_dev, splitting for the first
time the data into type dependent pieces, to make reasoning
about these things easier.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20 12:54:58 +02:00
Johannes Berg 92ea8df110 wifi: mac80211: reject WEP or pairwise keys with key ID > 3
We don't really care too much right now since our data
structures are set up to not have a problem with this,
but clearly it's wrong to accept WEP and pairwise keys
with key ID > 3.

However, with MLD we need to split into per-link (GTK,
IGTK, BIGTK) and per interface/MLD (including WEP) keys
so make sure this is not a problem.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-20 12:54:48 +02:00
Sieng Piaw Liew 49ae83fc4f net: don't check skb_count twice
NAPI cache skb_count is being checked twice without condition. Change to
checking the second time only if the first check is run.

Signed-off-by: Sieng Piaw Liew <liew.s.piaw@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-15 12:51:31 +01:00
Casper Andersson 2aa4abed37 net: bridge: allow add/remove permanent mdb entries on disabled ports
Adding mdb entries on disabled ports allows you to do setup before
accepting any traffic, avoiding any time where the port is not in the
multicast group.

Signed-off-by: Casper Andersson <casper.casan@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-15 09:35:21 +01:00
Marco Bonelli 19d62f5eea ethtool: Fix and simplify ethtool_convert_link_mode_to_legacy_u32()
Fix the implementation of ethtool_convert_link_mode_to_legacy_u32(), which
is supposed to return false if src has bits higher than 31 set. The current
implementation uses the complement of bitmap_fill(ext, 32) to test high
bits of src, which is wrong as bitmap_fill() fills _with long granularity_,
and sizeof(long) can be > 4. No users of this function currently check the
return value, so the bug was dormant.

Also remove the check for __ETHTOOL_LINK_MODE_MASK_NBITS > 32, as the enum
ethtool_link_mode_bit_indices contains far beyond 32 values. Using
find_next_bit() to test the src bitmask works regardless of this anyway.

Signed-off-by: Marco Bonelli <marco@mebeim.net>
Link: https://lore.kernel.org/r/20220609134900.11201-1-marco@mebeim.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-13 23:11:35 -07:00
Yajun Deng c04245328d net: make __sys_accept4_file() static
__sys_accept4_file() isn't used outside of the file, make it static.

As the same time, move file_flags and nofile parameters into
__sys_accept4_file().

Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 13:47:15 +01:00
Eric Dumazet 219160be49 tcp: sk_forced_mem_schedule() optimization
sk_memory_allocated_add() has three callers, and returns
to them @memory_allocated.

sk_forced_mem_schedule() is one of them, and ignores
the returned value.

Change sk_memory_allocated_add() to return void.

Change sock_reserve_memory() and __sk_mem_raise_allocated()
to call sk_memory_allocated().

This removes one cache line miss [1] for RPC workloads,
as first skbs in TCP write queue and receive queue go through
sk_forced_mem_schedule().

[1] Cache line holding tcp_memory_allocated.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-06-13 13:35:25 +01:00
Eric Dumazet 0f2c269398 net: unexport __sk_mem_{raise|reduce}_allocated
These two helpers are only used from core networking.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:27 -07:00
Eric Dumazet 4890b686f4 net: keep sk->sk_forward_alloc as small as possible
Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast.

Each TCP socket can forward allocate up to 2 MB of memory, even after
flow became less active.

10,000 sockets can have reserved 20 GB of memory,
and we have no shrinker in place to reclaim that.

Instead of trying to reclaim the extra allocations in some places,
just keep sk->sk_forward_alloc values as small as possible.

This should not impact performance too much now we have per-cpu
reserves: Changes to tcp_memory_allocated should not be too frequent.

For sockets not using SO_RESERVE_MEM:
 - idle sockets (no packets in tx/rx queues) have zero forward alloc.
 - non idle sockets have a forward alloc smaller than one page.

Note:

 - Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD
   is left to MPTCP maintainers as a follow up.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:27 -07:00
Eric Dumazet 0defbb0af7 net: add per_cpu_fw_alloc field to struct proto
Each protocol having a ->memory_allocated pointer gets a corresponding
per-cpu reserve, that following patches will use.

Instead of having reserved bytes per socket,
we want to have per-cpu reserves.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:26 -07:00
Eric Dumazet 100fdd1faf net: remove SK_MEM_QUANTUM and SK_MEM_QUANTUM_SHIFT
Due to memcg interface, SK_MEM_QUANTUM is effectively PAGE_SIZE.

This might change in the future, but it seems better to avoid the
confusion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 16:21:26 -07:00
Jakub Kicinski 5c281b4e52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 15:55:32 -07:00
Jakub Kicinski b97dcb8575 wireless-next patches for v5.20
Here's a first set of patches for v5.20. This is just a
 queue flush, before we get things back from net-next that
 are causing conflicts, and then can start merging a lot
 of MLO (multi-link operation, part of 802.11be) code.
 
 Lots of cleanups all over.
 
 The only notable change is perhaps wilc1000 being the
 first driver to disable WEP (while enabling WPA3).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEH1e1rEeCd0AIMq6MB8qZga/fl8QFAmKjUdAACgkQB8qZga/f
 l8RE0xAAhVNBB3r0n8bcZXNxmb/zswjyQcRV3BrSxRwfOGppB4iqHuTEx7U7iBOK
 9hMacse+myVlFNncWzGnOiZ9XIIElepPATfHXYPlVOrUO5AzqvtuuZG/6cBShO+G
 A1YrdVPYd87WiowTovY2x7tknZYMoQYeVeGmIMIEViM0RjULkXPC9AhpKbiHoV4I
 Ayn97E0j2+6R/gCtlhYTm0ASvzbVVoIB9cHMwvopzEXtsIjcE5Tglgrhygtw0FI3
 w2EZi5091c6IA2lc+kEmN2saAX72f6G3cewYID84/l8U2+VuwzdDUnXsyXYgGFF8
 UM47qizFSrwAn7eSiUNpLK0b8um/C2+ryBBUDrhbCvlR6/8shwvV1YMSX5eo00Av
 rPtC7/7wXF0ox8Os+FTTqAptyWDFQMI4dYkbQjZ4KsR7/jXssReIsYLLPlYGRgU5
 zemdd1onofZN4N9QXMtMxR7xwoKvPBRGqZa0YgnbSGF7dSjL+fleVlRwuhLZsWvb
 KJQyut9/InC9C2kKjsdK+bcv8lLmJE65PdFM5CZBLnEZvf7stOkeg2WcuqNSzjca
 VO7UIv8yQeJV2cpSBgmC4XchAU21r2rEzViz7PDLTFB9ZfYgcBIad9G10Mx5u11L
 2GHmDX5r2X1QD91nsTqOBCn0xO67jpcgxMpiGC31VReV7BTKvSc=
 =E0Dx
 -----END PGP SIGNATURE-----

Merge tag 'wireless-next-2022-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Johannes Berg says:

====================
wireless-next patches for v5.20

Here's a first set of patches for v5.20. This is just a
queue flush, before we get things back from net-next that
are causing conflicts, and then can start merging a lot
of MLO (multi-link operation, part of 802.11be) code.

Lots of cleanups all over.

The only notable change is perhaps wilc1000 being the
first driver to disable WEP (while enabling WPA3).

* tag 'wireless-next-2022-06-10' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (29 commits)
  wifi: mac80211_hwsim: Directly use ida_alloc()/free()
  wifi: mac80211: refactor some key code
  wifi: mac80211: remove cipher scheme support
  wifi: nl80211: fix typo in comment
  wifi: virt_wifi: fix typo in comment
  rtw89: add new state to CFO state machine for UL-OFDMA
  rtw89: 8852c: add trigger frame counter
  ieee80211: add trigger frame definition
  wifi: wfx: Remove redundant NULL check before release_firmware() call
  wifi: rtw89: support MULTI_BSSID and correct BSSID mask of H2C
  wifi: ray_cs: Drop useless status variable in parse_addr()
  wifi: ray_cs: Utilize strnlen() in parse_addr()
  wifi: rtw88: use %*ph to print small buffer
  wifi: wilc1000: add IGTK support
  wifi: wilc1000: add WPA3 SAE support
  wifi: wilc1000: remove WEP security support
  wifi: wilc1000: use correct sequence of RESET for chip Power-UP/Down
  wifi: rtlwifi: fix error codes in rtl_debugfs_set_write_h2c()
  wifi: rtw88: Fix Sparse warning for rtw8821c_hw_spec
  wifi: rtw88: Fix Sparse warning for rtw8723d_hw_spec
  ...
====================

Link: https://lore.kernel.org/r/20220610142838.330862-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-10 08:57:35 -07:00
Johannes Berg 8cbf0c2ab6 wifi: mac80211: refactor some key code
There's some pretty close code here, with the exception
of RCU dereference vs. protected dereference. Refactor
this to just return a pointer and then do the deref in
the caller later.

Change-Id: Ide5315e2792da6ac66eaf852293306a3ac71ced9
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10 16:12:57 +02:00
Johannes Berg 23a5f0af6f wifi: mac80211: remove cipher scheme support
The only driver using this was iwlwifi, where we just removed
the support because it was never really used. Remove the code
from mac80211 as well.

Change-Id: I1667417a5932315ee9d81f5c233c56a354923f09
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2022-06-10 15:35:53 +02:00
Jakub Kicinski 6cbd05b2d0 Merge tag 'ieee802154-for-net-next-2022-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:

====================
pull-request: ieee802154-next 2022-06-09

This is a separate pull request for 6lowpan changes. We agreed with the
bluetooth maintainers to switch the trees these changing are going into
from bluetooth to ieee802154.

Jukka Rissanen stepped down as a co-maintainer of 6lowpan (Thanks for the
work!). Alexander is staying as maintainer.

Alexander reworked the nhc_id lookup in 6lowpan to be way simpler.
Moved the data structure from rb to an array, which is all we need in this
case.

* tag 'ieee802154-for-net-next-2022-06-09' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
  MAINTAINERS: Remove Jukka Rissanen as 6lowpan maintainer
  net: 6lowpan: constify lowpan_nhc structures
  net: 6lowpan: use array for find nhc id
  net: 6lowpan: remove const from scalars
====================

Link: https://lore.kernel.org/r/20220609202956.1512156-1-stefan@datenfreihafen.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 23:21:29 -07:00
Andrea Mayer a3bd2102e4 net: seg6: fix seg6_lookup_any_nexthop() to handle VRFs using flowi_l3mdev
Commit 40867d74c3 ("net: Add l3mdev index to flow struct and avoid oif
reset for port devices") adds a new entry (flowi_l3mdev) in the common
flow struct used for indicating the l3mdev index for later rule and
table matching.
The l3mdev_update_flow() has been adapted to properly set the
flowi_l3mdev based on the flowi_oif/flowi_iif. In fact, when a valid
flowi_iif is supplied to the l3mdev_update_flow(), this function can
update the flowi_l3mdev entry only if it has not yet been set (i.e., the
flowi_l3mdev entry is equal to 0).

The SRv6 End.DT6 behavior in VRF mode leverages a VRF device in order to
force the routing lookup into the associated routing table. This routing
operation is performed by seg6_lookup_any_nextop() preparing a flowi6
data structure used by ip6_route_input_lookup() which, in turn,
(indirectly) invokes l3mdev_update_flow().

However, seg6_lookup_any_nexthop() does not initialize the new
flowi_l3mdev entry which is filled with random garbage data. This
prevents l3mdev_update_flow() from properly updating the flowi_l3mdev
with the VRF index, and thus SRv6 End.DT6 (VRF mode)/DT46 behaviors are
broken.

This patch correctly initializes the flowi6 instance allocated and used
by seg6_lookup_any_nexhtop(). Specifically, the entire flowi6 instance
is wiped out: in case new entries are added to flowi/flowi6 (as happened
with the flowi_l3mdev entry), we should no longer have incorrectly
initialized values. As a result of this operation, the value of
flowi_l3mdev is also set to 0.

The proposed fix can be tested easily. Starting from the commit
referenced in the Fixes, selftests [1],[2] indicate that the SRv6
End.DT6 (VRF mode)/DT46 behaviors no longer work correctly. By applying
this patch, those behaviors are back to work properly again.

[1] - tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
[2] - tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh

Fixes: 40867d74c3 ("net: Add l3mdev index to flow struct and avoid oif reset for port devices")
Reported-by: Anton Makarov <am@3a-alliance.com>
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220608091917.20345-1-andrea.mayer@uniroma2.it
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 22:04:47 -07:00
Eric Dumazet fd9ea57f4e net: add napi_get_frags_check() helper
This is a follow up of commit 3226b158e6
("net: avoid 32 x truesize under-estimation for tiny skbs")

When/if we increase MAX_SKB_FRAGS, we better make sure
the old bug will not come back.

Adding a check in napi_get_frags() would be costly,
even if using DEBUG_NET_WARN_ON_ONCE().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:57 -07:00
Eric Dumazet ee2640df23 net: add debug checks in napi_consume_skb and __napi_alloc_skb()
Commit 6454eca81e ("net: Use lockdep_assert_in_softirq()
in napi_consume_skb()") added a check in napi_consume_skb()
which is a bit weak.

napi_consume_skb() and __napi_alloc_skb() should only
be used from BH context, not from hard irq or nmi context,
otherwise we could have races.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:56 -07:00
Eric Dumazet 7890e2f09d net: use DEBUG_NET_WARN_ON_ONCE() in skb_release_head_state()
Remove this check from fast path unless CONFIG_DEBUG_NET=y

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:56 -07:00
Eric Dumazet dd29c67dbb af_unix: use DEBUG_NET_WARN_ON_ONCE()
Replace four WARN_ON() that have not triggered recently
with DEBUG_NET_WARN_ON_ONCE().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:56 -07:00
Eric Dumazet c59f02f848 net: use WARN_ON_ONCE() in sk_stream_kill_queues()
sk_stream_kill_queues() has three checks which have been
useful to detect kernel bugs in the past.

However they are potentially a problem because they
could flood the syslog.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:55 -07:00
Eric Dumazet 3e7f2b8d30 net: use WARN_ON_ONCE() in inet_sock_destruct()
inet_sock_destruct() has four warnings which have been
useful to point to kernel bugs in the past.

However they are potentially a problem because they
could flood the syslog.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:55 -07:00
Eric Dumazet 76458faeb2 net: use DEBUG_NET_WARN_ON_ONCE() in dev_loopback_xmit()
One check in dev_loopback_xmit() has not caught issues
in the past.

Keep it for CONFIG_DEBUG_NET=y builds only.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:55 -07:00
Eric Dumazet 63fbdd3c77 net: use DEBUG_NET_WARN_ON_ONCE() in __release_sock()
Check against skb dst in socket backlog has never triggered
in past years.

Keep the check omly for CONFIG_DEBUG_NET=y builds.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:54 -07:00
Eric Dumazet c6cce71e74 drop_monitor: adopt u64_stats_t
As explained in commit 316580b69d ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:12 -07:00
Eric Dumazet 958751e080 devlink: adopt u64_stats_t
As explained in commit 316580b69d ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:11 -07:00
Eric Dumazet 9962acefbc net: adopt u64_stats_t in struct pcpu_sw_netstats
As explained in commit 316580b69d ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:11 -07:00
Eric Dumazet afd2051b18 ip6_tunnel: use dev_sw_netstats_rx_add()
We have a convenient helper, let's use it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:10 -07:00
Eric Dumazet 3a960ca7f6 sit: use dev_sw_netstats_rx_add()
We have a convenient helper, let's use it.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:10 -07:00
Eric Dumazet 09cca53c16 vlan: adopt u64_stats_t
As explained in commit 316580b69d ("u64_stats: provide u64_stats_t type")
we should use u64_stats_t and related accessors to avoid load/store tearing.

Add READ_ONCE() when reading rx_errors & tx_dropped.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:53:09 -07:00
Jakub Kicinski d62607c3fe net: rename reference+tracking helpers
Netdev reference helpers have a dev_ prefix for historic
reasons. Renaming the old helpers would be too much churn
but we can rename the tracking ones which are relatively
recent and should be the default for new code.

Rename:
 dev_hold_track()    -> netdev_hold()
 dev_put_track()     -> netdev_put()
 dev_replace_track() -> netdev_ref_replace()

Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:52:55 -07:00
Maxim Mikityanskiy b489a6e587 tls: Rename TLS_INFO_ZC_SENDFILE to TLS_INFO_ZC_TX
To embrace possible future optimizations of TLS, rename zerocopy
sendfile definitions to more generic ones:

* setsockopt: TLS_TX_ZEROCOPY_SENDFILE- > TLS_TX_ZEROCOPY_RO
* sock_diag: TLS_INFO_ZC_SENDFILE -> TLS_INFO_ZC_RO_TX

RO stands for readonly and emphasizes that the application shouldn't
modify the data being transmitted with zerocopy to avoid potential
disconnection.

Fixes: c1318b39c7 ("tls: Add opt-in zerocopy mode of sendfile()")
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Link: https://lore.kernel.org/r/20220608153425.3151146-1-maximmi@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 21:51:57 -07:00
Jakub Kicinski a98a62e456 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-09 16:38:15 -07:00
Alexander Aring f3de6f4ecc net: 6lowpan: constify lowpan_nhc structures
This patch constify the lowpan_nhc declarations. Since we drop the rb
node datastructure there is no need for runtime manipulation of this
structure.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Link: https://lore.kernel.org/r/20220428030534.3220410-4-aahringo@redhat.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09 21:53:28 +02:00
Alexander Aring 31264f9563 net: 6lowpan: use array for find nhc id
This patch will remove the complete overengineered and overthinking rb data
structure for looking up the nhc by nhcid. Instead we using the existing
nhc next header array and iterate over it. It works now for 1 byte values
only. However there are only 1 byte nhc id values currently
supported and IANA also does not specify large than 1 byte values yet.
If there are 2 byte values for nhc ids specified we can revisit this
data structure and add support for it.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Link: https://lore.kernel.org/r/20220428030534.3220410-3-aahringo@redhat.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09 21:53:28 +02:00
Alexander Aring eb9edf4366 net: 6lowpan: remove const from scalars
The keyword const makes no sense for scalar types inside the lowpan_nhc
structure. Most compilers will ignore it so we remove the keyword from
the scalar types.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Stefan Schmidt <stefan@datenfreihafen.org>
Acked-by: Jukka Rissanen <jukka.rissanen@linux.intel.com>
Link: https://lore.kernel.org/r/20220428030534.3220410-2-aahringo@redhat.com
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
2022-06-09 21:53:28 +02:00
Linus Torvalds 825464e79d Networking fixes for 5.19-rc2, including fixes from bpf and netfilter.
Current release - regressions:
   - eth: amt: fix possible null-ptr-deref in amt_rcv()
 
 Previous releases - regressions:
   - tcp: use alloc_large_system_hash() to allocate table_perturb
 
   - af_unix: fix a data-race in unix_dgram_peer_wake_me()
 
   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
 
   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF
 
 Previous releases - always broken:
   - ipv6: fix signed integer overflow in __ip6_append_data
 
   - netfilter:
     - nat: really support inet nat without l3 address
     - nf_tables: memleak flow rule from commit path
 
   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs
 
   - openvswitch: fix misuse of the cached connection on tuple changes
 
   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred
 
   - eth: altera: fix refcount leak in altera_tse_mdio_create
 
 Misc:
   - add Quentin Monnet to bpftool maintainers
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmKhykgSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkN7sQAIn+ZmzQqTm5MVWnlvt/GcRGjjMP2VQY
 60oS2re8QC773yWoP6PvXqxCSFc99paDCC5BmCK6DMLbp9yuVSp5W8iAPuFuyjXE
 /Nur4Ti57LcGJ8ZpcJheBD4cRFbf+xtsGzx9a1WhUDrCYASo7vqRes5Eos2dT7P7
 qjgTduhUtaj6S1CfenfTnYqemZPzSGa+1euDuQ/Bu4mjCPUTrNZZQVYjmfTYM9p1
 UzwfCQr9TtmRKo8wLFHnYDLoWHNpfp55SNL0ShAwIQqgldiJ2OdMje+a2Sa4m6uF
 etRz8H0WrGVqfneD424tdyZv4nwhHw5dnaSrGe8DGq98c4/lIIcVyC38oDAbfWqI
 l8p7ZmtvNid7rpgoQFcxKpb2TAYAI+jaFq5GySEhvj5ZAblNQgFyghfMGPoncXCO
 XW6va8TtP2lmHFScAljQiQb6GNwDO52x77/q14Jkwvr+DILRKXMZZ3hCGrKUn5JM
 lafGkdL5ufm+E9C9RlaWN3imb2KoRj+wdThgV79efEPGG1py7yLOPVMoOCP3qmLq
 torcGcfDi1LGb7ohQxN6tCMv0JgXjS5nd1i+qJnImpkhRrUmahOfmpnElHoPuzs3
 6FU8HR77Eo15x70Jt+WOMy4oXrNh2MeEm8/Fhpj84MEhKpxVn+2o/53M+++5h+ru
 YtiLwEri0dCA
 =rdoB
 -----END PGP SIGNATURE-----

Merge tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf and netfilter.

  Current release - regressions:

   - eth: amt: fix possible null-ptr-deref in amt_rcv()

  Previous releases - regressions:

   - tcp: use alloc_large_system_hash() to allocate table_perturb

   - af_unix: fix a data-race in unix_dgram_peer_wake_me()

   - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling

   - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF

  Previous releases - always broken:

   - ipv6: fix signed integer overflow in __ip6_append_data

   - netfilter:
       - nat: really support inet nat without l3 address
       - nf_tables: memleak flow rule from commit path

   - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs

   - openvswitch: fix misuse of the cached connection on tuple changes

   - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred

   - eth: altera: fix refcount leak in altera_tse_mdio_create

  Misc:

   - add Quentin Monnet to bpftool maintainers"

* tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits)
  net: amd-xgbe: fix clang -Wformat warning
  tcp: use alloc_large_system_hash() to allocate table_perturb
  net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY
  net: dsa: mv88e6xxx: correctly report serdes link failure
  net: dsa: mv88e6xxx: fix BMSR error to be consistent with others
  net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete
  net: altera: Fix refcount leak in altera_tse_mdio_create
  net: openvswitch: fix misuse of the cached connection on tuple changes
  net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag
  ip_gre: test csum_start instead of transport header
  au1000_eth: stop using virt_to_bus()
  ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
  ipv6: Fix signed integer overflow in __ip6_append_data
  nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred
  nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION
  nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling
  nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION
  net: ipv6: unexport __init-annotated seg6_hmac_init()
  net: xfrm: unexport __init-annotated xfrm4_protocol_init()
  net: mdio: unexport __init-annotated mdio_bus_init()
  ...
2022-06-09 12:06:52 -07:00
Muchun Song e67b72b90b tcp: use alloc_large_system_hash() to allocate table_perturb
In our server, there may be no high order (>= 6) memory since we reserve
lots of HugeTLB pages when booting.  Then the system panic.  So use
alloc_large_system_hash() to allocate table_perturb.

Fixes: e926147618 ("tcp: dynamically allocate the perturb table used by source ports")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220607070214.94443-1-songmuchun@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 21:11:05 -07:00
Ilya Maximets 2061ecfdf2 net: openvswitch: fix misuse of the cached connection on tuple changes
If packet headers changed, the cached nfct is no longer relevant
for the packet and attempt to re-use it leads to the incorrect packet
classification.

This issue is causing broken connectivity in OpenStack deployments
with OVS/OVN due to hairpin traffic being unexpectedly dropped.

The setup has datapath flows with several conntrack actions and tuple
changes between them:

  actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
          set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
          set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
          ct(zone=8),recirc(0x4)

After the first ct() action the packet headers are almost fully
re-written.  The next ct() tries to re-use the existing nfct entry
and marks the packet as invalid, so it gets dropped later in the
pipeline.

Clearing the cached conntrack entry whenever packet tuple is changed
to avoid the issue.

The flow key should not be cleared though, because we should still
be able to match on the ct_state if the recirculation happens after
the tuple change but before the next ct() action.

Cc: stable@vger.kernel.org
Fixes: 7f8a436eaa ("openvswitch: Add conntrack action")
Reported-by: Frode Nordahl <frode.nordahl@canonical.com>
Link: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-May/051829.html
Link: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
Link: https://lore.kernel.org/r/20220606221140.488984-1-i.maximets@ovn.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 20:49:52 -07:00
Willem de Bruijn 8d21e9963b ip_gre: test csum_start instead of transport header
GRE with TUNNEL_CSUM will apply local checksum offload on
CHECKSUM_PARTIAL packets.

ipgre_xmit must validate csum_start after an optional skb_pull,
else lco_csum may trigger an overflow. The original check was

	if (csum && skb_checksum_start(skb) < skb->data)
		return -EINVAL;

This had false positives when skb_checksum_start is undefined:
when ip_summed is not CHECKSUM_PARTIAL. A discussed refinement
was straightforward

	if (csum && skb->ip_summed == CHECKSUM_PARTIAL &&
	    skb_checksum_start(skb) < skb->data)
		return -EINVAL;

But was eventually revised more thoroughly:
- restrict the check to the only branch where needed, in an
  uncommon GRE path that uses header_ops and calls skb_pull.
- test skb_transport_header, which is set along with csum_start
  in skb_partial_csum_set in the normal header_ops datapath.

Turns out skbs can arrive in this branch without the transport
header set, e.g., through BPF redirection.

Revise the check back to check csum_start directly, and only if
CHECKSUM_PARTIAL. Do leave the check in the updated location.
Check field regardless of whether TUNNEL_CSUM is configured.

Link: https://lore.kernel.org/netdev/YS+h%2FtqCJJiQei+W@shredder/
Link: https://lore.kernel.org/all/20210902193447.94039-2-willemdebruijn.kernel@gmail.com/T/#u
Fixes: 8a0ed250f9 ("ip_gre: validate csum_start only on pull")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Link: https://lore.kernel.org/r/20220606132107.3582565-1-willemdebruijn.kernel@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 20:34:43 -07:00
Jakub Kicinski d5d4c36398 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2022-06-09

We've added 6 non-merge commits during the last 2 day(s) which contain
a total of 8 files changed, 49 insertions(+), 15 deletions(-).

The main changes are:

1) Fix an illegal copy_to_user() attempt seen by syzkaller through arm64
   BPF JIT compiler, from Eric Dumazet.

2) Fix calling global functions from BPF_PROG_TYPE_EXT programs by using
   the correct program context type, from Toke Høiland-Jørgensen.

3) Fix XSK TX batching invalid descriptor handling, from Maciej Fijalkowski.

4) Fix potential integer overflows in multi-kprobe link code by using safer
   kvmalloc_array() allocation helpers, from Dan Carpenter.

5) Add Quentin as bpftool maintainer, from Quentin Monnet.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  MAINTAINERS: Add a maintainer for bpftool
  xsk: Fix handling of invalid descriptors in XSK TX batching API
  selftests/bpf: Add selftest for calling global functions from freplace
  bpf: Fix calling global functions from BPF_PROG_TYPE_EXT programs
  bpf: Use safer kvmalloc_array() where possible
  bpf, arm64: Clear prog->jited_len along prog->jited
====================

Link: https://lore.kernel.org/r/20220608234133.32265-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 20:31:21 -07:00
Wang Yufen f638a84afe ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg
When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be
overflow. To fix, we can follow what udpv6 does and subtract the
transhdrlen from the max.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/r/20220607120028.845916-2-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 10:56:43 -07:00
Wang Yufen f93431c86b ipv6: Fix signed integer overflow in __ip6_append_data
Resurrect ubsan overflow checks and ubsan report this warning,
fix it by change the variable [length] type to size_t.

UBSAN: signed-integer-overflow in net/ipv6/ip6_output.c:1489:19
2147479552 + 8567 cannot be represented in type 'int'
CPU: 0 PID: 253 Comm: err Not tainted 5.16.0+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
  dump_backtrace+0x214/0x230
  show_stack+0x30/0x78
  dump_stack_lvl+0xf8/0x118
  dump_stack+0x18/0x30
  ubsan_epilogue+0x18/0x60
  handle_overflow+0xd0/0xf0
  __ubsan_handle_add_overflow+0x34/0x44
  __ip6_append_data.isra.48+0x1598/0x1688
  ip6_append_data+0x128/0x260
  udpv6_sendmsg+0x680/0xdd0
  inet6_sendmsg+0x54/0x90
  sock_sendmsg+0x70/0x88
  ____sys_sendmsg+0xe8/0x368
  ___sys_sendmsg+0x98/0xe0
  __sys_sendmmsg+0xf4/0x3b8
  __arm64_sys_sendmmsg+0x34/0x48
  invoke_syscall+0x64/0x160
  el0_svc_common.constprop.4+0x124/0x300
  do_el0_svc+0x44/0xc8
  el0_svc+0x3c/0x1e8
  el0t_64_sync_handler+0x88/0xb0
  el0t_64_sync+0x16c/0x170

Changes since v1:
-Change the variable [length] type to unsigned, as Eric Dumazet suggested.
Changes since v2:
-Don't change exthdrlen type in ip6_make_skb, as Paolo Abeni suggested.
Changes since v3:
-Don't change ulen type in udpv6_sendmsg and l2tp_ip6_sendmsg, as
Jakub Kicinski suggested.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Link: https://lore.kernel.org/r/20220607120028.845916-1-wangyufen@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 10:56:43 -07:00
Masahiro Yamada 5801f064e3 net: ipv6: unexport __init-annotated seg6_hmac_init()
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.

modpost used to detect it, but it has been broken for a decade.

Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.

There are two ways to fix it:

  - Remove __init
  - Remove EXPORT_SYMBOL

I chose the latter for this case because the caller (net/ipv6/seg6.c)
and the callee (net/ipv6/seg6_hmac.c) belong to the same module.
It seems an internal function call in ipv6.ko.

Fixes: bf355b8d2c ("ipv6: sr: add core files for SR HMAC support")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 10:10:14 -07:00
Masahiro Yamada 4a388f08d8 net: xfrm: unexport __init-annotated xfrm4_protocol_init()
EXPORT_SYMBOL and __init is a bad combination because the .init.text
section is freed up after the initialization. Hence, modules cannot
use symbols annotated __init. The access to a freed symbol may end up
with kernel panic.

modpost used to detect it, but it has been broken for a decade.

Recently, I fixed modpost so it started to warn it again, then this
showed up in linux-next builds.

There are two ways to fix it:

  - Remove __init
  - Remove EXPORT_SYMBOL

I chose the latter for this case because the only in-tree call-site,
net/ipv4/xfrm4_policy.c is never compiled as modular.
(CONFIG_XFRM is boolean)

Fixes: 2f32b51b60 ("xfrm: Introduce xfrm_input_afinfo to access the the callbacks properly")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08 10:10:13 -07:00
Maciej Fijalkowski d678cbd2f8 xsk: Fix handling of invalid descriptors in XSK TX batching API
xdpxceiver run on a AF_XDP ZC enabled driver revealed a problem with XSK
Tx batching API. There is a test that checks how invalid Tx descriptors
are handled by AF_XDP. Each valid descriptor is followed by invalid one
on Tx side whereas the Rx side expects only to receive a set of valid
descriptors.

In current xsk_tx_peek_release_desc_batch() function, the amount of
available descriptors is hidden inside xskq_cons_peek_desc_batch(). This
can be problematic in cases where invalid descriptors are present due to
the fact that xskq_cons_peek_desc_batch() returns only a count of valid
descriptors. This means that it is impossible to properly update XSK
ring state when calling xskq_cons_release_n().

To address this issue, pull out the contents of
xskq_cons_peek_desc_batch() so that callers (currently only
xsk_tx_peek_release_desc_batch()) will always be able to update the
state of ring properly, as total count of entries is now available and
use this value as an argument in xskq_cons_release_n(). By
doing so, xskq_cons_peek_desc_batch() can be dropped altogether.

Fixes: 9349eb3a9d ("xsk: Introduce batched Tx descriptor interfaces")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220607142200.576735-1-maciej.fijalkowski@intel.com
2022-06-08 16:20:07 +02:00