Commit Graph

1136178 Commits

Author SHA1 Message Date
Paolo Abeni a5ef058dc4 net: introduce and use custom sockopt socket flag
We will soon introduce custom setsockopt for UDP sockets, too.
Instead of doing even more complex arbitrary checks inside
sock_use_custom_sol_socket(), add a new socket flag and set it
for the relevant socket types (currently only MPTCP).

Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:52:50 +01:00
David S. Miller ea5ed0f00b Merge branch 'net-800Gbps-support'
Petr Machata says:

====================
net: Add support for 800Gbps speed

Amit Cohen <amcohen@nvidia.com> writes:

The next Nvidia Spectrum ASIC will support 800Gbps speed.
The IEEE 802 LAN/MAN Standards Committee already published standards for
800Gbps, see the last update [1] and the list of approved changes [2].

As first phase, add support for 800Gbps over 8 lanes (100Gbps/lane).
In the future 800Gbps over 4 lanes can be supported also.

Extend ethtool to support the relevant PMDs and extend mlxsw and bonding
drivers to support 800Gbps.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:43:39 +01:00
Amit Cohen 41305d3781 bonding: 3ad: Add support for 800G speed
Add support for 800Gbps speed to allow using 3ad mode with 800G devices.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:43:39 +01:00
Amit Cohen cceef209dd mlxsw: Add support for 800Gbps link modes
Add support for 800Gbps speed, link modes of 100Gbps per lane.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:43:39 +01:00
Amit Cohen 404c76783f ethtool: Add support for 800Gbps link modes
Add support for 800Gbps speed, link modes of 100Gbps per lane.
As mentioned in slide 21 in IEEE documentation [1], all adopted 802.3df
copper and optical PMDs baselines using 100G/lane will be supported.

Add the relevant PMDs which are mentioned in slide 5 in IEEE
documentation [1] and were approved on 10-2022 [2]:
BP - KR8
Cu Cable - CR8
MMF 50m - VR8
MMF 100m - SR8
SMF 500m - DR8
SMF 2km - DR8-2

[1]: https://www.ieee802.org/3/df/public/22_10/22_1004/shrikhande_3df_01a_221004.pdf
[2]: https://ieee802.org/3/df/KeyMotions_3df_221005.pdf

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:43:39 +01:00
David S. Miller c1aa0a9078 Merge branch 'sparx5-IS2-VCAP'
Steen Hegelund says:

====================
Add support for Sparx5 IS2 VCAP

This provides initial support for the Sparx5 VCAP functionality via the
'tc' traffic control userspace tool and its flower filter.

Overview:
=========

The supported flower filter keys and actions are:

- source and destination MAC address keys
- trap action
- pass action

The supported Sparx5 VCAPs are: IS2 (see below for more info)

The VCAP (Versatile Content-Aware Processor) feature is essentially a TCAM
with rules consisting of:

- Programmable key fields
- Programmable action fields
- A counter (which may be only one bit wide)

Besides this each VCAP has:

- A number of independent lookups
- A keyset configuration typically per port per lookup

VCAPs are used in many of the TSN features such as PSFP, PTP, FRER as well
as the general shaping, policing and access control, so it is an important
building block for these advanced features.

Functionality:
==============

When a frame is passed to a VCAP the VCAP will generate a set of keys
(keyset) based on the traffic type.  If there is a rule created with this
keyset in the VCAP and the values of the keys matches the values in the
keyset of the frame, the rule is said to match and the actions in the rule
will be executed and the rule counter will be incremented.  No more rules
will be examined in this VCAP lookup.

If there is no match in the current lookup the frame will be matched
against the next lookup (some VCAPs do the processing of the lookups in
parallel).

The Sparx5 SoC has 6 different VCAP types:

- IS0: Ingress Stage 0 (AKA CLM) mostly handles classification
- IS2: Ingress Stage 2 mostly handles access control
- IP6PFX: IPv6 prefix: Provides tables for IPV6 address management
- LPM: Longest Path Match for IP guarding and routing
- ES0: Egress Stage 0 is mostly used for CPU copying and multicast handling
- ES2: Egress Stage 2 is known as the rewriter and mostly updates tags

Design:
=======

The VCAP implementation provides switchcore independent handling of rules
and supports:

- Creating and deleting rules
- Updating and getting rules

The platform specific API implementation as well as the platform specific
model of the VCAP instances are attached to the VCAP API and a client can
then access rules via the API in a platform independent way, with the
limitations that each VCAP has in terms of is supported keys and actions.

The VCAP model is generated from information delivered by the designers of
the VCAP hardware.

Here is an illustration of this:

  +------------------+     +------------------+
  | TC flower filter |     | PTP client       |
  | for Sparx5       |     | for Sparx5       |
  +-------------\----+     +---------/--------+
                 \                  /
                  \                /
                   \              /
                    \            /
                     \          /
                 +----v--------v----+
                 |     VCAP API     |
                 +---------|--------+
                           |
                           |
                           |
                           |
                 +---------v--------+
                 |   VCAP control   |
                 |   instance       |
                 +----/--------|----+
                     /         |
                    /          |
                   /           |
                  /            |
  +--------------v---+    +----v-------------+
  |   Sparx5 VCAP    |    | Sparx5 VCAP API  |
  |   model          |    | Implementation   |
  +------------------+    +---------|--------+
                                    |
                                    |
                                    |
                                    |
                          +---------v--------+
                          | Sparx5 VCAP HW   |
                          +------------------+

Delivery:
=========

For now only the IS2 is supported but later the IS0, ES0 and ES2 will be
added. There are currently no plans to support the IP6PFX and the LPM
VCAPs.

The IS2 VCAP has 4 lookups and they are accessible with a TC chain id:

- chain 8000000: IS2 Lookup 0
- chain 8100000: IS2 Lookup 1
- chain 8200000: IS2 Lookup 2
- chain 8300000: IS2 Lookup 3

These lookups are executed in parallel by the IS2 VCAP but the actions are
executed in series (the datasheet explains what happens if actions
overlap).

The functionality of TC flower as well as TC matchall filters will be
expanded in later submissions as well as the number of VCAPs supported.

This is current plan:

- add support for more TC flower filter keys and extend the Sparx5 port
  keyset configuration
- support for TC protocol all
- debugfs support for inspecting rules
- TC flower filter statistics
- Sparx5 IS0 VCAP support and more TC keys and actions to support this
- add TC policer and drop action support (depends on the Sparx5 QoS support
  upstreamed separately)
- Sparx5 ES0 VCAP support and more TC actions to support this
- TC flower template support
- TC matchall filter support for mirroring and policing ports
- TC flower filter mirror action support
- Sparx5 ES2 VCAP support

The LAN966x switchcore will also be updated to use the VCAP API as well as
future Microchip switches.
The LAN966x has 3 VCAPS (IS1, IS2 and ES0) and a slightly different keyset
and actionset portfolio than Sparx5.

Version History:
================
v3      Moved the sparx5_tc_flower_set_exterr function to the VCAP API and
        renamed it.
        Moved the sparx5_netbytes_copy function to the VCAP_API and renamed
        it (thanks Horatiu Vultur).
        Fixed indentation in the vcap_write_rule function.
        Added a comment mentioning the typegroup table terminator in the
        vcap_iter_skip_tg function.

v2      Made the KUNIT test model a superset of the real model to fix a
        kernel robot build error.

v1      Initial version
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:43 +01:00
Steen Hegelund 67d637516f net: microchip: sparx5: Adding KUNIT test for the VCAP API
This provides a KUNIT test suite for the VCAP APIs encoding functionality.

The test can be run by adding these settings in a .kunitconfig file

CONFIG_KUNIT=y
CONFIG_NET=y
CONFIG_VCAP_KUNIT_TEST=y

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:43 +01:00
Steen Hegelund 5d7e5b0401 net: microchip: sparx5: Adding KUNIT test VCAP model
This provides a test VCAP model for use in a KUNIT test.  The model
provides 3 different VCAP types for better test coverage.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:43 +01:00
Steen Hegelund 683e05c032 net: microchip: sparx5: Writing rules to the IS2 VCAP
This adds rule encoding functionality to the VCAP API.

A rule consists of keys and actions in separate cache sections.

The maximum size of the keyset or actionset determines the size of the
rule.

The VCAP hardware need to be able to distinguish different rule sizes from
each other, and for that purpose some extra typegroup bits are added to the
rule when it is encoded.

The API provides a bit stream iterator that allows highlevel encoding
functionality to add key and action value bits independent of typegroup
bits.

This is handled by letting the concrete VCAP model provide the typegroup
table for the different rule sizes.
After the key and action values have been added to the encoding bit streams
the typegroup bits are set to their correct values just before the rule is
written to the VCAP hardware.

The key and action offsets provided in the VCAP model are the offset before
adding the typegroup bits.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Tested-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:43 +01:00
Steen Hegelund 8e10490b00 net: microchip: sparx5: Adding basic rule management in VCAP API
This provides most of the rule handling needed to add a new rule to a VCAP.
To add a rule a client must follow these steps:

1) Allocate a new rule (provide an id or get one automatically assigned)
2) Add keys to the rule
3) Add actions to the rule
4) Optionally set a keyset on the rule
5) Optionally set an actionset on the rule
6) Validate the rule (this will add keyset and actionset if not specified
   in the previous steps)
7) Add the rule (if the validation was successful)
8) Free the rule instance (a copy has been added to the VCAP)

The validation step will fail if there are no keysets with the requested
keys, or there are no actionsets with the requested actions.
The validation will also fail if the keyset is not configured for the port
for the requested protocol).

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Tested-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:43 +01:00
Steen Hegelund 46be056ee0 net: microchip: sparx5: Adding port keyset config and callback interface
This provides a default port keyset configuration for the Sparx5 IS2 VCAP
where all ports and all lookups in IS2 use the same keyset (MAC_ETYPE) for
all types of traffic.

This means that no matter what frame type is received on any front port it
will generate the MAC_ETYPE keyset in the IS VCAP and any rule in the IS2
VCAP that uses this keyset will be matched against the keys in the
MAC_ETYPE keyset.

The callback interface used by the VCAP API is populated with Sparx5
specific handler functions that takes care of the actual reading and
writing to data to the Sparx5 IS2 VCAP instance.

A few functions are also added to the VCAP API to support addition of rule
fields such as the ingress port mask and the lookup bit.

The IS2 VCAP in Sparx5 is really divided in two instances with lookup 0
and 1 in the first instance and lookup 2 and 3 in the second instance.
The lookup bit selects lookup 0 or 3 in the respective instance when it is
set.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Tested-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:43 +01:00
Steen Hegelund c9da1ac1c2 net: microchip: sparx5: Adding initial tc flower support for VCAP API
This adds initial TC flower filter support to Sparx5 for the IS2 VCAP.

The support consists of the source and destination MAC addresses,
and the trap and pass actions.

This is how you can create a rule that test the functionality:

tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress chain 8000000 prio 10 handle 10 \
      protocol all flower skip_sw \
      dst_mac 0a:0b:0c:0d:0e:0f \
      src_mac 2:0:0:0:0:1 \
      action trap

The IS2 chains in Sparx5 are assigned like this:

- chain 8000000: IS2 Lookup 0
- chain 8100000: IS2 Lookup 1
- chain 8200000: IS2 Lookup 2
- chain 8300000: IS2 Lookup 3

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Tested-by: Casper Andersson <casper.casan@gmail.com>
Reviewed-by: Casper Andersson <casper.casan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:43 +01:00
Steen Hegelund 45c00ad003 net: microchip: sparx5: Adding IS2 VCAP register interface
This adds the register interface needed to access the Sparx5 Ingress Stage
2 VCAP (IS2).

The Sparx5 Chip Register Model can be browsed at this location:
https://github.com/microchip-ung/sparx-5_reginfo

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:42 +01:00
Steen Hegelund e8145e0685 net: microchip: sparx5: Adding IS2 VCAP model to VCAP API
This provides the Sparx5 Ingress Stage 2 (IS2) model and adds it to the
VCAP control instance that will be provided to the VCAP API.

The Sparx5 IS2 C code model is generated from the Sparx5 RTL design model.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:42 +01:00
Steen Hegelund 8beef08f46 net: microchip: sparx5: Adding initial VCAP API support
This provides the initial VCAP API framework and Sparx5 specific VCAP
implementation.

When the Sparx5 Switchdev driver is initialized it will also initialize its
VCAP module, and this hooks up the concrete Sparx5 VCAP model to the VCAP
API, so that the VCAP API knows what VCAP instances are available.

Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:37:42 +01:00
Yanguo Li abc210952a nfp: flower: tunnel neigh support bond offload
Support hardware offload when tunnel neigh out port is bond.
These feature work with the nfp firmware. If the firmware
supports the NFP_FL_FEATS_TUNNEL_NEIGH_LAG feature, nfp driver
write the bond information to the firmware neighbor table or
do nothing for bond. when neighbor MAC changes, nfp driver
need to update the neighbor information too.

Signed-off-by: Yanguo Li <yanguo.li@corigine.com>
Reviewed-by: Louis Peens <louis.peens@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 10:32:45 +01:00
David S. Miller 04d63e62ef Merge branch 'inet6_destroy_sock-calls-remove'
Kuniyuki Iwashima says:

====================
inet6: Remove inet6_destroy_sock() calls.

This is a follow-up series for commit d38afeec26 ("tcp/udp: Call
inet6_destroy_sock() in IPv6 sk->sk_destruct().").

This series cleans up unnecessary inet6_destory_sock() calls in
sk->sk_prot->destroy() and call it from sk->sk_destruct() to make
sure we do not leak memory related to IPv6 specific-resources.

Changes:
  v2:
    * patch 1
      * Fix build failure for CONFIG_MPTCP_IPV6=y

  v1: https://lore.kernel.org/netdev/20221018190956.1308-1-kuniyu@amazon.com/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:40:39 +01:00
Kuniyuki Iwashima b45a337f06 inet6: Clean up failure path in do_ipv6_setsockopt().
We can reuse the unlock label above and need not repeat the same code.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:40:39 +01:00
Kuniyuki Iwashima 1f8c4eeb94 inet6: Remove inet6_destroy_sock().
The last user of inet6_destroy_sock() is its wrapper inet6_cleanup_sock().
Let's rename inet6_destroy_sock() to inet6_cleanup_sock().

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:40:39 +01:00
Kuniyuki Iwashima 6431b0f6ff sctp: Call inet6_destroy_sock() via sk->sk_destruct().
After commit d38afeec26 ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.

SCTP sets its own sk->sk_destruct() in the sctp_init_sock(), and
SCTPv6 socket reuses it as the init function.

To call inet6_sock_destruct() from SCTPv6 sk->sk_destruct(), we
set sctp_v6_destruct_sock() in a new init function.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:40:39 +01:00
Kuniyuki Iwashima 1651951ebe dccp: Call inet6_destroy_sock() via sk->sk_destruct().
After commit d38afeec26 ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.

DCCP sets its own sk->sk_destruct() in the dccp_init_sock(), and
DCCPv6 socket shares it by calling the same init function via
dccp_v6_init_sock().

To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we
export it and set dccp_v6_sk_destruct() in the init function.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:40:38 +01:00
Kuniyuki Iwashima b5fc29233d inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().
After commit d38afeec26 ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.

Now we can remove unnecessary inet6_destroy_sock() calls in
sk->sk_prot->destroy().

DCCP and SCTP have their own sk->sk_destruct() function, so we
change them separately in the following patches.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:40:38 +01:00
David S. Miller 225480f040 Merge branch 'dpaa2-eth-AF_XDP-zc'
Ioana Ciornei says:

====================
net: dpaa2-eth: AF_XDP zero-copy support

This patch set adds support for AF_XDP zero-copy in the dpaa2-eth
driver. The support is available on the LX2160A SoC and its variants and
only on interfaces (DPNIs) with a maximum of 8 queues (HW limitations
are the root cause).

We are first implementing the .get_channels() callback since this a
dependency for further work.

Patches 2-3 are working on making the necessary changes for multiple
buffer pools on a single interface. By default, without an AF_XDP socket
attached, only a single buffer pool will be used and shared between all
the queues. The changes in the functions are made in this patch, but the
actual allocation and setup of a new BP is done in patch#10.

Patches 4-5 are improving the information exposed in debugfs. We are
exposing a new file to show which buffer pool is used by what channels
and how many buffers it currently has.

The 6th patch updates the dpni_set_pools() firmware API so that we are
capable of setting up a different buffer per queue in later patches.

In the 7th patch the generic dev_open/close APIs are used instead of the
dpaa2-eth internal ones.

Patches 8-9 are rearranging the existing code in dpaa2-eth.c in order to
create new functions which will be used in the XSK implementation in
dpaa2-xsk.c

Finally, the last 3 patches are adding the actual support for both the
Rx and Tx path of AF_XDP zero-copy and some associated tracepoints.
Details on the implementation can be found in the actual patch.

Changes in v2:
 - 3/12:  Export dpaa2_eth_allocate_dpbp/dpaa2_eth_free_dpbp in this
   patch to avoid a build warning. The functions will be used in next
   patches.
 - 6/12:  Use __le16 instead of u16 for the dpbp_id field.
 - 12/12: Use xdp_buff->data_hard_start when tracing the BP seeding.

Changes in v3:
 - 3/12: fix leaking of bp on error path
====================

Acked-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:12 +01:00
Robert-Ionut Alexa 3817b2ac71 net: dpaa2-eth: add trace points on XSK events
Define the dpaa2_tx_xsk_fd and dpaa2_rx_xsk_fd trace events for the XSK
zero-copy Rx and Tx path.  Also, define the dpaa2_eth_buf as an event
class so that both dpaa2_eth_buf_seed and dpaa2_xsk_buf_seed traces can
derive from the same class.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Robert-Ionut Alexa 4a7f6c5ac9 net: dpaa2-eth: AF_XDP TX zero copy support
Add support in dpaa2-eth for packet processing on the Tx path using
AF_XDP zero copy mode.

The newly added dpaa2_xsk_tx() function will handle enqueuing AF_XDP Tx
packets into the appropriate queue and update any necessary statistics.

On a more detailed note, the dpaa2_xsk_tx_build_fd() function handles
creating a Scatter-Gather frame descriptor with only one data buffer.
This is needed because otherwise we would need to impose a headroom in
the Tx buffer to store our software annotation structures.
This tactic is already used on the normal data path of the dpaa2-eth
driver, thus we are reusing the dpaa2_eth_sgt_get/dpaa2_eth_sgt_recycle
functions in order to allocate and recycle the Scatter-Gather table
buffers.

In case we have reached the maximum number of Tx XSK packets to be sent
in a NAPI cycle, we'll exit the dpaa2_eth_poll() and hope to be
rescheduled again.

On the XSK Tx confirmation path, we are just unmapping the SGT buffer
and recycle it for further use.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Robert-Ionut Alexa 48276c08cf net: dpaa2-eth: AF_XDP RX zero copy support
This patch adds the support for receiving packets via the AF_XDP
zero-copy mechanism in the dpaa2-eth driver. The support is available
only on the LX2160A SoC and variants because we are relying on the HW
capability to associate a buffer pool to a specific queue (QDBIN), only
available on newer WRIOP versions.

On the control path, the dpaa2_xsk_enable_pool() function is responsible
to allocate a buffer pool (BP), setup this new BP to be used only on the
requested queue and change the consume function to point to the XSK ZC
one.
We are forced to call dev_close() in order to change the queue to buffer
pool association (dpaa2_xsk_set_bp_per_qdbin) . This also works in our
favor since at dev_close() the buffer pools will be drained and at the
later dev_open() call they will be again seeded, this time with buffers
allocated from the XSK pool if needed.

On the data path, a new software annotation type is defined to be used
only for the XSK scenarios. This will enable us to pass keep necessary
information about a packet buffer between the moment in which it was
seeded and when it's received by the driver. In the XSK case, we are
keeping the associated xdp_buff.
Depending on the action returned by the BPF program, we will do the
following:
 - XDP_PASS: copy the contents of the packet into a brand new skb,
   recycle the initial buffer.
 - XDP_TX: just enqueue the same frame descriptor back into the Tx path,
   the buffer will get automatically released into the initial BP.
 - XDP_REDIRECT: call xdp_do_redirect() and exit.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Robert-Ionut Alexa ee2a3bdef9 net: dpaa2-eth: create and export the dpaa2_eth_receive_skb() function
Carve out code from the dpaa2_eth_rx() function in order to create and
export the dpaa2_eth_receive_skb() function. Do this in order to reuse
this code also from the XSK path which will be introduced in a later
patch.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Robert-Ionut Alexa 129902a351 net: dpaa2-eth: create and export the dpaa2_eth_alloc_skb function
The dpaa2_eth_alloc_skb() function is added by moving code from the
dpaa2_eth_copybreak() previously defined function. What the new API does
is to allocate a new skb, copy the frame data from the passed FD to the
new skb and then return the skb.
Export this new function since we'll need the this functionality also
from the XSK code path.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Ioana Ciornei e3caeb2ddb net: dpaa2-eth: use dev_close/open instead of the internal functions
Instead of calling the internal functions which implement .ndo_stop and
.ndo_open, we can simply call dev_close and dev_open, so that we keep
the code cleaner.

Also, in the next patches we'll use the same APIs from other files
without needing to export the internal functions.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Robert-Ionut Alexa 801c76dd06 net: dpaa2-eth: update the dpni_set_pools() API to support per QDBIN pools
Update the dpni_set_pool() firmware API so that in the next patches we
can configure per Rx queue (per QDBIN) buffer pools.
This is a hard requirement of the AF_XDP, thus we need the newer API
version.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Ioana Ciornei b1dd9bf6ea net: dpaa2-eth: export buffer pool info into a new debugfs file
Export the allocated buffer pools, the number of buffers that they have
currently and which channels are using which BP.

The output looks like below:

Buffer pool info for eth2:
IDX        BPID      Buf count      CH#0      CH#1      CH#2      CH#3
BP#0         1           5124         x         x         x         x

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Ioana Ciornei 96b44697e5 net: dpaa2-eth: export the CH#<index> in the 'ch_stats' debug file
Just give out an index for each channel that we export into the debug
file in the form of CH#<index>. This is purely to help corelate each
channel information from one debugfs file to another one.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Robert-Ionut Alexa 095174dafc net: dpaa2-eth: add support for multiple buffer pools per DPNI
This patch allows the configuration of multiple buffer pools associated
with a single DPNI object, each distinct DPBP object not necessarily
shared among all queues.
The user can interogate both the number of buffer pools and the buffer
count in each buffer pool by using the .get_ethtool_stats() callback.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Ioana Ciornei 3313206827 net: dpaa2-eth: rearrange variable in dpaa2_eth_get_ethtool_stats
Rearrange the variables in the dpaa2_eth_get_ethtool_stats() function so
that we adhere to the reverse Christmas tree rule.
Also, in the next patch we are adding more variables and I didn't know
where to place them with the current ordering.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Robert-Ionut Alexa 14e493ddc3 net: dpaa2-eth: add support to query the number of queues through ethtool
The .get_channels() ethtool_ops callback is implemented and exports the
number of queues: Rx, Tx, Tx conf and Rx err.
The last two ones, Tx confirmation and Rx err, are counted as 'others'.

The .set_channels() callback is not implemented since the DPAA2
software/firmware architecture does not allow the dynamic
reconfiguration of the number of queues.

Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com>
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24 09:22:11 +01:00
Alexandru Tachici 3bd5549bd4 dt-bindings: net: adin1110: Document reset
Document GPIO for HW reset.

Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-21 13:11:33 +01:00
Alexandru Tachici 36934cac7a net: ethernet: adi: adin1110: add reset GPIO
Add an optional GPIO to be used for a hardware reset of the IC.

Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-21 13:11:33 +01:00
Jeff Johnson c0facc045a net: ipa: Make QMI message rules const
Commit ff6d365898 ("soc: qcom: qmi: use const for struct
qmi_elem_info") allows QMI message encoding/decoding rules to be
const, so do that for IPA.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Reviewed-by: Sibi Sankar <quic_sibis@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-21 12:39:16 +01:00
Doug Berger 070f822d07 net: bcmgenet: add RX_CLS_LOC_ANY support
If a matching flow spec exists its current location is as good
as ANY. If not add the new flow spec at the first available
location.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Link: https://lore.kernel.org/r/20221019215123.316997-1-opendmb@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20 21:43:57 -07:00
Alexey Kodanev 377eb9aab0 sctp: remove unnecessary NULL checks in sctp_enqueue_event()
After commit 178ca044aa ("sctp: Make sctp_enqueue_event tak an
skb list."), skb_list cannot be NULL.

Detected using the static analysis tool - Svace.
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20221019180735.161388-3-aleksei.kodanev@bell-sw.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20 21:43:10 -07:00
Alexey Kodanev b66aeddbe3 sctp: remove unnecessary NULL check in sctp_ulpq_tail_event()
After commit 013b96ec64 ("sctp: Pass sk_buff_head explicitly to
sctp_ulpq_tail_event().") there is one more unneeded check of
skb_list for NULL.

Detected using the static analysis tool - Svace.
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20221019180735.161388-2-aleksei.kodanev@bell-sw.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20 21:43:10 -07:00
Alexey Kodanev 6fdfdef7fd sctp: remove unnecessary NULL check in sctp_association_init()
'&asoc->ulpq' passed to sctp_ulpq_init() as the first argument,
then sctp_qlpq_init() initializes it and eventually returns the
address of the struct member back. Therefore, in this case, the
return pointer cannot be NULL.

Moreover, it seems sctp_ulpq_init() has always been used only in
sctp_association_init(), so there's really no need to return ulpq
anymore.

Detected using the static analysis tool - Svace.
Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20221019180735.161388-1-aleksei.kodanev@bell-sw.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20 21:43:10 -07:00
Jakub Kicinski 94adb5e29e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-20 17:49:10 -07:00
Linus Torvalds 6d36c728bc Networking fixes for 6.1-rc2, including fixes from netfilter
Current release - regressions:
   - revert "net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}"
 
   - revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()"
 
   - dsa: uninitialized variable in dsa_slave_netdevice_event()
 
   - eth: sunhme: uninitialized variable in happy_meal_init()
 
 Current release - new code bugs:
   - eth: octeontx2: fix resource not freed after malloc
 
 Previous releases - regressions:
   - sched: fix return value of qdisc ingress handling on success
 
   - sched: fix race condition in qdisc_graft()
 
   - udp: update reuse->has_conns under reuseport_lock.
 
   - tls: strp: make sure the TCP skbs do not have overlapping data
 
   - hsr: avoid possible NULL deref in skb_clone()
 
   - tipc: fix an information leak in tipc_topsrv_kern_subscr
 
   - phylink: add mac_managed_pm in phylink_config structure
 
   - eth: i40e: fix DMA mappings leak
 
   - eth: hyperv: fix a RX-path warning
 
   - eth: mtk: fix memory leaks
 
 Previous releases - always broken:
   - sched: cake: fix null pointer access issue when cake_init() fails
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmNRKdcSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkYn8P/31xjE9/BRQKVGQOMxj78vhQvVHEZYXJ
 OJcaLcjxUCqj6hu3pEcuf88PeTicyfEqN32zzH1k+SS8jGQCmoVtXUfbZ7pDR6Tc
 rsAqVhLD6JnYkGEgtzm3i+8EfSeBoCy9kT4JZzRxQmOfZr1rBmtoMHOB4cGk9g1K
 lSF3KJcKT1GDacB/gVei+ms0Y+Q9WULOg3OFuyLSeltAkhZKaTfx/qqsLLEHFqZc
 u6eR31GwG28Y4GVurLQSOdaWrFOKqmPFOpzvjmeKC2RBqS6hVl4/YKZTmTV53Lee
 brm6kuVlU7CJVZEN2qF8G2+/SqLgcB0o26JVnml1kT8n0GlFAbyFf5akawjT8/Je
 G/zgz6k6wUAI2g3nSPNmgqVtobsypthzWL/bOpWfGfJFXxGOgLG3pbZYIl816Tha
 KnibZqQOBHxfPaUzh0xCLhidoi5G0T8ip9o9tyKlnmvbKY/EWk6HiIjCWlxnkPiO
 GRdHkyF7KMxqo/QuE9AK3LnD/AsLeWcuqzMveiaTbYLkMjGDW1yz3otL7KXW5l8U
 zYoUn1HQLkNqDE17+PjRo28awOMyN6ujXggKBK/hfXVnPpdW3yWPUoslqQdVn5KC
 3PLeSNM1v4UQSMWx1alRx1PvA+zYDX4GQpSSXbQgYGVim8LMwZQ1xui2qWF8xEau
 k9ZKfMSUNGEr
 =y2As
 -----END PGP SIGNATURE-----

Merge tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from netfilter.

  Current release - regressions:

   - revert "net: fix cpu_max_bits_warn() usage in
     netif_attrmask_next{,_and}"

   - revert "net: sched: fq_codel: remove redundant resource cleanup in
     fq_codel_init()"

   - dsa: uninitialized variable in dsa_slave_netdevice_event()

   - eth: sunhme: uninitialized variable in happy_meal_init()

  Current release - new code bugs:

   - eth: octeontx2: fix resource not freed after malloc

  Previous releases - regressions:

   - sched: fix return value of qdisc ingress handling on success

   - sched: fix race condition in qdisc_graft()

   - udp: update reuse->has_conns under reuseport_lock.

   - tls: strp: make sure the TCP skbs do not have overlapping data

   - hsr: avoid possible NULL deref in skb_clone()

   - tipc: fix an information leak in tipc_topsrv_kern_subscr

   - phylink: add mac_managed_pm in phylink_config structure

   - eth: i40e: fix DMA mappings leak

   - eth: hyperv: fix a RX-path warning

   - eth: mtk: fix memory leaks

  Previous releases - always broken:

   - sched: cake: fix null pointer access issue when cake_init() fails"

* tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
  net: phy: dp83822: disable MDI crossover status change interrupt
  net: sched: fix race condition in qdisc_graft()
  net: hns: fix possible memory leak in hnae_ae_register()
  wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
  sfc: include vport_id in filter spec hash and equal()
  genetlink: fix kdoc warnings
  selftests: add selftest for chaining of tc ingress handling to egress
  net: Fix return value of qdisc ingress handling on success
  net: sched: sfb: fix null pointer access issue when sfb_init() fails
  Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()"
  net: sched: cake: fix null pointer access issue when cake_init() fails
  ethernet: marvell: octeontx2 Fix resource not freed after malloc
  netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
  netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
  ionic: catch NULL pointer issue on reconfig
  net: hsr: avoid possible NULL deref in skb_clone()
  bnxt_en: fix memory leak in bnxt_nvm_test()
  ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed
  udp: Update reuse->has_conns under reuseport_lock.
  net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable()
  ...
2022-10-20 17:24:59 -07:00
Linus Torvalds c7b006525b ata fixes for 6.1-rc2
Several minor fixes for rc2:
 
 * Fix the module alias for the ahci_imx driver to get autoloading to
   work (from Alexander).
 
 * Fix a potential array-index-out-of-bounds problem with the enclosure
   managment support in the ahci driver (from Kai-Heng).
 
 * Several patches to fix compilation warnings thrown by clang in the
   ahci_st, sata_rcar, ahci_brcm, ahci_xgene, ahci_imx and ahci_qoriq
   drivers (from me).
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCY1HZ7wAKCRDdoc3SxdoY
 dqs9AP4lTc4nUh/cf6EOsPFDBsT4WmAkHy4XFzEF7VQsGmfPAwEAtOFFNfR3//Ku
 EeskPDGwy3E7F/s/pI92J01l/z4fSwg=
 =qKSO
 -----END PGP SIGNATURE-----

Merge tag 'ata-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

Pull ata fixes from Damien Le Moal:
 "Several minor fixes:

   - Fix the module alias for the ahci_imx driver to get autoloading to
     work (Alexander)

   - Fix a potential array-index-out-of-bounds problem with the
     enclosure managment support in the ahci driver (Kai-Heng)

   - Several patches to fix compilation warnings thrown by clang in the
     ahci_st, sata_rcar, ahci_brcm, ahci_xgene, ahci_imx and ahci_qoriq
     drivers (me)"

* tag 'ata-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
  ata: ahci_qoriq: Fix compilation warning
  ata: ahci_imx: Fix compilation warning
  ata: ahci_xgene: Fix compilation warning
  ata: ahci_brcm: Fix compilation warning
  ata: sata_rcar: Fix compilation warning
  ata: ahci_st: Fix compilation warning
  ata: ahci: Match EM_MAX_SLOTS with SATA_PMP_MAX_PORTS
  ata: ahci-imx: Fix MODULE_ALIAS
2022-10-20 17:07:54 -07:00
Linus Torvalds a3ccea6ed8 - Fix dm-bufio to use test_bit_acquire to properly test_bit on arches
with weaker memory ordering.
 
 - DM core replace DMWARN with DMERR or DMCRIT for fatal errors.
 
 - Enable WQ_HIGHPRI on DM verity target's verify_wq.
 
 - Add documentation for DM verity's try_verify_in_tasklet option.
 
 - Various typo and redundant word fixes in code and/or comments.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAmNPGkIACgkQxSPxCi2d
 A1oRywf/Si+OVFkBfLo43xxz9lHjJ8cbZktwo79X4J8WUAJsXbhACaMhPGi7kADS
 AmmVvuEAHzghRWzQewUoqNxdmXVyP9KYnpAqlAncxHTfVqdQGeg8FEv+ssoHkJgL
 UGBy6eIZLEJiIBkEnjgzLRsy/XEtNsV2TDcu09CdjFc1tFwJng9aUbkRAXuFwoWh
 x7z7qr8R7GogpoLw2sN7MmKM+l7svxsxZ+Dbt6l6/TWbjCwmKo/+kf/COhS7B+Bt
 Dpi0HxzFhBEuWod4qVZwLSggSKdAAfBG4Tcn2iRiJhJN5VPdyqo5SyH1ni4cie4q
 pvFX5KlkdkiYiMIBuxZMQq9mNT4moA==
 =dumQ
 -----END PGP SIGNATURE-----

Merge tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Fix dm-bufio to use test_bit_acquire to properly test_bit on arches
   with weaker memory ordering.

 - DM core replace DMWARN with DMERR or DMCRIT for fatal errors.

 - Enable WQ_HIGHPRI on DM verity target's verify_wq.

 - Add documentation for DM verity's try_verify_in_tasklet option.

 - Various typo and redundant word fixes in code and/or comments.

* tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm clone: Fix typo in block_device format specifier
  dm: remove unnecessary assignment statement in alloc_dev()
  dm verity: Add documentation for try_verify_in_tasklet option
  dm cache: delete the redundant word 'each' in comment
  dm raid: fix typo in analyse_superblocks code comment
  dm verity: enable WQ_HIGHPRI on verify_wq
  dm raid: delete the redundant word 'that' in comment
  dm: change from DMWARN to DMERR or DMCRIT for fatal errors
  dm bufio: use the acquire memory barrier when testing for B_READING
2022-10-20 17:00:54 -07:00
Kees Cook 36875a063b net: ipa: Proactively round up to kmalloc bucket size
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org
Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/lkml/4d75a9fd-1b94-7208-9de8-5a0102223e68@ieee.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221018092724.give.735-kees@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-10-20 10:13:54 +02:00
Felix Riemann 7f378c03aa net: phy: dp83822: disable MDI crossover status change interrupt
If the cable is disconnected the PHY seems to toggle between MDI and
MDI-X modes. With the MDI crossover status interrupt active this causes
roughly 10 interrupts per second.

As the crossover status isn't checked by the driver, the interrupt can
be disabled to reduce the interrupt load.

Fixes: 87461f7a58 ("net: phy: DP83822 initial driver submission")
Signed-off-by: Felix Riemann <felix.riemann@sma.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20221018104755.30025-1-svc.sw.rte.linux@sma.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-19 18:46:17 -07:00
Eric Dumazet ebda44da44 net: sched: fix race condition in qdisc_graft()
We had one syzbot report [1] in syzbot queue for a while.
I was waiting for more occurrences and/or a repro but
Dmitry Vyukov spotted the issue right away.

<quoting Dmitry>
qdisc_graft() drops reference to qdisc in notify_and_destroy
while it's still assigned to dev->qdisc
</quoting>

Indeed, RCU rules are clear when replacing a data structure.
The visible pointer (dev->qdisc in this case) must be updated
to the new object _before_ RCU grace period is started
(qdisc_put(old) in this case).

[1]
BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027

CPU: 0 PID: 21027 Comm: syz-executor.4 Not tainted 6.0.0-rc3-syzkaller-00363-g7726d4c3e60b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:317 [inline]
print_report.cold+0x2ba/0x719 mm/kasan/report.c:433
kasan_report+0xb1/0x1e0 mm/kasan/report.c:495
__tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
__tcf_qdisc_find net/sched/cls_api.c:1051 [inline]
tc_new_tfilter+0x34f/0x2200 net/sched/cls_api.c:2018
rtnetlink_rcv_msg+0x955/0xca0 net/core/rtnetlink.c:6081
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f5efaa89279
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f5efbc31168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f5efab9bf80 RCX: 00007f5efaa89279
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000005
RBP: 00007f5efaae32e9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007f5efb0cfb1f R14: 00007f5efbc31300 R15: 0000000000022000
</TASK>

Allocated by task 21027:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:45 [inline]
set_alloc_info mm/kasan/common.c:437 [inline]
____kasan_kmalloc mm/kasan/common.c:516 [inline]
____kasan_kmalloc mm/kasan/common.c:475 [inline]
__kasan_kmalloc+0xa9/0xd0 mm/kasan/common.c:525
kmalloc_node include/linux/slab.h:623 [inline]
kzalloc_node include/linux/slab.h:744 [inline]
qdisc_alloc+0xb0/0xc50 net/sched/sch_generic.c:938
qdisc_create_dflt+0x71/0x4a0 net/sched/sch_generic.c:997
attach_one_default_qdisc net/sched/sch_generic.c:1152 [inline]
netdev_for_each_tx_queue include/linux/netdevice.h:2437 [inline]
attach_default_qdiscs net/sched/sch_generic.c:1170 [inline]
dev_activate+0x760/0xcd0 net/sched/sch_generic.c:1229
__dev_open+0x393/0x4d0 net/core/dev.c:1441
__dev_change_flags+0x583/0x750 net/core/dev.c:8556
rtnl_configure_link+0xee/0x240 net/core/rtnetlink.c:3189
rtnl_newlink_create net/core/rtnetlink.c:3371 [inline]
__rtnl_newlink+0x10b8/0x17e0 net/core/rtnetlink.c:3580
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3593
rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Freed by task 21020:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
kasan_set_track+0x21/0x30 mm/kasan/common.c:45
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
____kasan_slab_free mm/kasan/common.c:367 [inline]
____kasan_slab_free+0x166/0x1c0 mm/kasan/common.c:329
kasan_slab_free include/linux/kasan.h:200 [inline]
slab_free_hook mm/slub.c:1754 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1780
slab_free mm/slub.c:3534 [inline]
kfree+0xe2/0x580 mm/slub.c:4562
rcu_do_batch kernel/rcu/tree.c:2245 [inline]
rcu_core+0x7b5/0x1890 kernel/rcu/tree.c:2505
__do_softirq+0x1d3/0x9c6 kernel/softirq.c:571

Last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
call_rcu+0x99/0x790 kernel/rcu/tree.c:2793
qdisc_put+0xcd/0xe0 net/sched/sch_generic.c:1083
notify_and_destroy net/sched/sch_api.c:1012 [inline]
qdisc_graft+0xeb1/0x1270 net/sched/sch_api.c:1084
tc_modify_qdisc+0xbb7/0x1a00 net/sched/sch_api.c:1671
rtnetlink_rcv_msg+0x43a/0xca0 net/core/rtnetlink.c:6090
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2501
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x543/0x7f0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x917/0xe10 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
____sys_sendmsg+0x6eb/0x810 net/socket.c:2482
___sys_sendmsg+0x110/0x1b0 net/socket.c:2536
__sys_sendmsg+0xf3/0x1c0 net/socket.c:2565
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

Second to last potentially related work creation:
kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
__kasan_record_aux_stack+0xbe/0xd0 mm/kasan/generic.c:348
kvfree_call_rcu+0x74/0x940 kernel/rcu/tree.c:3322
neigh_destroy+0x431/0x630 net/core/neighbour.c:912
neigh_release include/net/neighbour.h:454 [inline]
neigh_cleanup_and_release+0x1f8/0x330 net/core/neighbour.c:103
neigh_del net/core/neighbour.c:225 [inline]
neigh_remove_one+0x37d/0x460 net/core/neighbour.c:246
neigh_forced_gc net/core/neighbour.c:276 [inline]
neigh_alloc net/core/neighbour.c:447 [inline]
___neigh_create+0x18b5/0x29a0 net/core/neighbour.c:642
ip6_finish_output2+0xfb8/0x1520 net/ipv6/ip6_output.c:125
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x690/0x1160 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:296 [inline]
ip6_output+0x1ed/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:451 [inline]
NF_HOOK include/linux/netfilter.h:307 [inline]
NF_HOOK include/linux/netfilter.h:301 [inline]
mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
mld_send_cr net/ipv6/mcast.c:2121 [inline]
mld_ifc_work+0x71c/0xdc0 net/ipv6/mcast.c:2653
process_one_work+0x991/0x1610 kernel/workqueue.c:2289
worker_thread+0x665/0x1080 kernel/workqueue.c:2436
kthread+0x2e4/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306

The buggy address belongs to the object at ffff88802065e000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 56 bytes inside of
1024-byte region [ffff88802065e000, ffff88802065e400)

The buggy address belongs to the physical page:
page:ffffea0000819600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x20658
head:ffffea0000819600 order:3 compound_mapcount:0 compound_pincount:0
flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000010200 0000000000000000 dead000000000001 ffff888011841dc0
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd20c0(__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 3523, tgid 3523 (sshd), ts 41495190986, free_ts 41417713212
prep_new_page mm/page_alloc.c:2532 [inline]
get_page_from_freelist+0x109b/0x2ce0 mm/page_alloc.c:4283
__alloc_pages+0x1c7/0x510 mm/page_alloc.c:5515
alloc_pages+0x1a6/0x270 mm/mempolicy.c:2270
alloc_slab_page mm/slub.c:1824 [inline]
allocate_slab+0x27e/0x3d0 mm/slub.c:1969
new_slab mm/slub.c:2029 [inline]
___slab_alloc+0x7f1/0xe10 mm/slub.c:3031
__slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3118
slab_alloc_node mm/slub.c:3209 [inline]
__kmalloc_node_track_caller+0x2f2/0x380 mm/slub.c:4955
kmalloc_reserve net/core/skbuff.c:358 [inline]
__alloc_skb+0xd9/0x2f0 net/core/skbuff.c:430
alloc_skb_fclone include/linux/skbuff.h:1307 [inline]
tcp_stream_alloc_skb+0x38/0x580 net/ipv4/tcp.c:861
tcp_sendmsg_locked+0xc36/0x2f80 net/ipv4/tcp.c:1325
tcp_sendmsg+0x2b/0x40 net/ipv4/tcp.c:1483
inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:734
sock_write_iter+0x291/0x3d0 net/socket.c:1108
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:578
ksys_write+0x1e8/0x250 fs/read_write.c:631
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1449 [inline]
free_pcp_prepare+0x5e4/0xd20 mm/page_alloc.c:1499
free_unref_page_prepare mm/page_alloc.c:3380 [inline]
free_unref_page+0x19/0x4d0 mm/page_alloc.c:3476
__unfreeze_partials+0x17c/0x1a0 mm/slub.c:2548
qlink_free mm/kasan/quarantine.c:168 [inline]
qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:447
kasan_slab_alloc include/linux/kasan.h:224 [inline]
slab_post_alloc_hook mm/slab.h:727 [inline]
slab_alloc_node mm/slub.c:3243 [inline]
slab_alloc mm/slub.c:3251 [inline]
__kmem_cache_alloc_lru mm/slub.c:3258 [inline]
kmem_cache_alloc+0x267/0x3b0 mm/slub.c:3268
kmem_cache_zalloc include/linux/slab.h:723 [inline]
alloc_buffer_head+0x20/0x140 fs/buffer.c:2974
alloc_page_buffers+0x280/0x790 fs/buffer.c:829
create_empty_buffers+0x2c/0xee0 fs/buffer.c:1558
ext4_block_write_begin+0x1004/0x1530 fs/ext4/inode.c:1074
ext4_da_write_begin+0x422/0xae0 fs/ext4/inode.c:2996
generic_perform_write+0x246/0x560 mm/filemap.c:3738
ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:270
ext4_file_write_iter+0x44a/0x1660 fs/ext4/file.c:679
call_write_iter include/linux/fs.h:2187 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9e9/0xdd0 fs/read_write.c:578

Fixes: af356afa01 ("net_sched: reintroduce dev->qdisc for use by sch_api")
Reported-by: syzbot <syzkaller@googlegroups.com>
Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20221018203258.2793282-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-19 17:35:30 -07:00
Yang Yingliang ff2f5ec5d0 net: hns: fix possible memory leak in hnae_ae_register()
Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().

unreferenced object 0xffff00c01aba2100 (size 128):
  comm "systemd-udevd", pid 1259, jiffies 4294903284 (age 294.152s)
  hex dump (first 32 bytes):
    68 6e 61 65 30 00 00 00 18 21 ba 1a c0 00 ff ff  hnae0....!......
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000034783f26>] slab_post_alloc_hook+0xa0/0x3e0
    [<00000000748188f2>] __kmem_cache_alloc_node+0x164/0x2b0
    [<00000000ab0743e8>] __kmalloc_node_track_caller+0x6c/0x390
    [<000000006c0ffb13>] kvasprintf+0x8c/0x118
    [<00000000fa27bfe1>] kvasprintf_const+0x60/0xc8
    [<0000000083e10ed7>] kobject_set_name_vargs+0x3c/0xc0
    [<000000000b87affc>] dev_set_name+0x7c/0xa0
    [<000000003fd8fe26>] hnae_ae_register+0xcc/0x190 [hnae]
    [<00000000fe97edc9>] hns_dsaf_ae_init+0x9c/0x108 [hns_dsaf]
    [<00000000c36ff1eb>] hns_dsaf_probe+0x548/0x748 [hns_dsaf]

Fixes: 6fe6611ff2 ("net: add Hisilicon Network Subsystem hnae framework support")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Link: https://lore.kernel.org/r/20221018122451.1749171-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-19 17:28:52 -07:00