Commit Graph

1092364 Commits

Author SHA1 Message Date
Alex Elder 8797972aff net: ipa: remove command info pool
The ipa_cmd_info structure now contains only one field, and it's an
enumerated type whose values all fit in 8 bits.  Currently we'll
never use more than 8 TREs in a command transaction, and we can
represent that number of command opcodes in the same space as a 64
bit pointer to an ipa_cmd_info structure.

Define IPA_COMMAND_TRANS_TRE_MAX as the maximum number of TREs that
can be in a command transaction.  Replace the info pointer in a
transaction with a fixed-size array named cmd_opcode[] of that many
bytes.  Store the opcode in this array when adding a command TRE to
a transaction, as was done previously for the info array.  This
makes the ipa_cmd_info unused, so get rid of it.

When committing an immediate command transaction, use the channel's
Boolean command flag to determine whether to fill in the opcode,
which will be taken (as before) from the array in the transaction.

This makes the command info pool unnecessary, so get rid of it.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Alex Elder 4de284b72e net: ipa: remove command direction argument
We no longer use the direction argument for gsi_trans_cmd_add(), so
get rid of it in its definition, and in its seven callers.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Alex Elder 7ffba3bdf7 net: ipa: get rid of ipa_cmd_info->direction
The direction field of the ipa_cmd_info structure is set, but never
used.  It seems it might have been used for the DMA_SHARED_MEM
immediate command, but the DIRECTION flag is set based on the value
of the passed-in direction flag there.

Anyway, remove this unused field from the ipa_cmd_info structure.
This is done as a separate patch to make it very obvious that it's
not required.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Alex Elder 2091c79ac4 net: ipa: count the number of modem TX endpoints
In ipa_endpoint_modem_exception_reset_all(), a high estimate was
made of the number of endpoints that need their status register
updated.  We only used what was needed, so the high estimate didn't
matter much.

However the next few patches are going to limit the number of
commands in a single transaction, and the overestimate would exceed
that.  So count the number of modem TX endpoints at initialization
time, and use it in ipa_endpoint_modem_exception_reset_all().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Alex Elder d15180b4ea net: ipa: kill gsi_trans_commit_wait_timeout()
Since the beginning gsi_trans_commit_wait_timeout() has existed to
provide a way to allow waiting a limited time for a transaction
to complete.  But that function has never been used.

In fact, there is no use for this function, because a transaction
committed to hardware should *always* complete.  The only reason it
might not complete is if there were a hardware failure, or perhaps a
system configuration error.

Furthermore, if a timeout ever did occur, the IPA hardware would be
in an indeterminate state, from which there is no recovery.  It
would require some sort of complete IPA reset, and would require the
participation of the modem, and at this time there is no such
sequence defined.

So get rid of the definition of gsi_trans_commit_wait_timeout(), and
update a few comments accordingly.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Alex Elder beb90cba60 net: ipa: specify RX aggregation time limit in config data
Don't assume that a 500 microsecond time limit should be used for
all receive endpoints that support aggregation.  Instead, specify
the time limit to use in the configuration data.

Set a 500 microsecond limit for all existing RX endpoints, as before.

Checking for overflow for the time limit field is a bit complicated.
Rather than duplicate a lot of code in ipa_endpoint_data_valid_one(),
call WARN() if any value is found to be too large when encoding it.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Alex Elder 3cebb7c2ed net: ipa: support hard aggregation limits
Add a new flag for AP receive endpoints that indicates whether
a "hard limit" is used as a criterion for closing aggregation.
Add comments explaining the difference between "hard" and "soft"
aggregation limits.

Pass a flag to ipa_aggr_size_kb() so it computes the proper
aggregation size value whether using hard or soft limits.  Move
that function earlier in "ipa_endpoint.c" so it can be used
without a forward-reference.

Update ipa_endpoint_data_valid_one() so it validates endpoints whose
data indicate a hard aggregation limit is used, and so it reports
set aggregation flags for endpoints without aggregation enabled.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Alex Elder 153213f055 net: ipa: make endpoint HOLB drop configurable
Add a new Boolean flag for RX endpoints defining whether HOLB drop
is initially enabled or disabled for the endpoint.  All existing AP
endpoints should have HOLB drop disabled.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:46:12 +01:00
Julia Lawall 60f243ad14 qed: fix typos in comments
Spelling mistakes (triple letters) in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:30 +01:00
Julia Lawall b993e72cdd nfp: flower: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:29 +01:00
Julia Lawall 878e2eb29a net: marvell: prestera: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:29 +01:00
Julia Lawall 3f660c1820 cirrus: cs89x0: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:29 +01:00
Julia Lawall cc4e7fa549 net: qed: fix typos in comments
Spelling mistakes (triple letters) in comments.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:29 +01:00
Julia Lawall b0ea505ba0 net/mlx5: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:29 +01:00
Julia Lawall e34be16bee net: mvpp2: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:29 +01:00
Julia Lawall 1f36a72ae3 net: sparx5: switchdev: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-22 20:44:29 +01:00
Jakub Kicinski aa5334b1f9 Merge branch 'add-a-bhash2-table-hashed-by-port-address'
Joanne Koong says:

====================
Add a bhash2 table hashed by port + address

This patchset proposes adding a bhash2 table that hashes by port and address.
The motivation behind bhash2 is to expedite bind requests in situations where
the port has many sockets in its bhash table entry, which makes checking bind
conflicts costly especially given that we acquire the table entry spinlock
while doing so, which can cause softirq cpu lockups and can prevent new tcp
connections.

We ran into this problem at Meta where the traffic team binds a large number
of IPs to port 443 and the bind() call took a significant amount of time
which led to cpu softirq lockups, which caused packet drops and other failures
on the machine

The patches are as follows:
1/2 - Adds a second bhash table (bhash2) hashed by port and address
2/2 - Adds a test for timing how long an additional bind request takes when
the bhash entry is populated

When experimentally testing this on a local server for ~24k sockets bound to
the port, the results seen were:

ipv4:
before - 0.002317 seconds
with bhash2 - 0.000018 seconds

ipv6:
before - 0.002431 seconds
with bhash2 - 0.000021 seconds
====================

Link: https://lore.kernel.org/r/20220520001834.2247810-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 18:16:31 -07:00
Joanne Koong 538aaf9b23 selftests: Add test for timing a bind request to a port with a populated bhash entry
This test populates the bhash table for a given port with
MAX_THREADS * MAX_CONNECTIONS sockets, and then times how long
a bind request on the port takes.

When populating the bhash table, we create the sockets and then bind
the sockets to the same address and port (SO_REUSEADDR and SO_REUSEPORT
are set). When timing how long a bind on the port takes, we bind on a
different address without SO_REUSEPORT set. We do not set SO_REUSEPORT
because we are interested in the case where the bind request does not
go through the tb->fastreuseport path, which is fragile (eg
tb->fastreuseport path does not work if binding with a different uid).

To run the test locally, I did:
* ulimit -n 65535000
* ip addr add 2001:0db8:0:f101::1 dev eth0
* ./bind_bhash_test 443

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 18:16:24 -07:00
Joanne Koong d5a42de8bd net: Add a second bind table hashed by port and address
We currently have one tcp bind table (bhash) which hashes by port
number only. In the socket bind path, we check for bind conflicts by
traversing the specified port's inet_bind2_bucket while holding the
bucket's spinlock (see inet_csk_get_port() and inet_csk_bind_conflict()).

In instances where there are tons of sockets hashed to the same port
at different addresses, checking for a bind conflict is time-intensive
and can cause softirq cpu lockups, as well as stops new tcp connections
since __inet_inherit_port() also contests for the spinlock.

This patch proposes adding a second bind table, bhash2, that hashes by
port and ip address. Searching the bhash2 table leads to significantly
faster conflict resolution and less time holding the spinlock.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 18:16:24 -07:00
Jakub Kicinski eac67d83bf wwan: iosm: use a flexible array rather than allocate short objects
GCC array-bounds warns that ipc_coredump_get_list() under-allocates
the size of struct iosm_cd_table *cd_table.

This is avoidable - we just need a flexible array. Nothing calls
sizeof() on struct iosm_cd_list or anything that contains it.

Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Link: https://lore.kernel.org/r/20220520060013.2309497-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:56:50 -07:00
Xiu Jianfeng 29849a486a stcp: Use memset_after() to zero sctp_stream_out_ext
Use memset_after() helper to simplify the code, there is no functional
change in this patch.

Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Link: https://lore.kernel.org/r/20220519062932.249926-1-xiujianfeng@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:42:53 -07:00
Alaa Mohamed f7b5a89c66 net: mscc: fix the alignment in ocelot_port_fdb_del()
align the extack argument of the ocelot_port_fdb_del()
function.

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520002040.4442-1-eng.alaamohamedsoliman.am@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:40:44 -07:00
Alaa Mohamed c2e10f5345 net: vxlan: Fix kernel coding style
The continuation line does not align with the opening bracket
and this patch fix it.

Signed-off-by: Alaa Mohamed <eng.alaamohamedsoliman.am@gmail.com>
Link: https://lore.kernel.org/r/20220520003614.6073-1-eng.alaamohamedsoliman.am@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:38:27 -07:00
Jakub Kicinski dbb2f362c7 eth: bnxt: make ulp_id unsigned to make GCC 12 happy
GCC array bounds checking complains that ulp_id is validated
only against upper bound. Make it unsigned.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://lore.kernel.org/r/20220520061955.2312968-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:30:19 -07:00
Amit Cohen 5feba47273 selftests: fib_nexthops: Make ping timeout configurable
Commit 49bb39bdda ("selftests: fib_nexthops: Make the test more robust")
increased the timeout of ping commands to 5 seconds, to make the test
more robust. Make the timeout configurable using '-w' argument to allow
user to change it depending on the system that runs the test. Some systems
suffer from slow forwarding performance, so they may need to change the
timeout.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220519070921.3559701-1-amcohen@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:21:37 -07:00
Yang Yingliang 9ee152ee3e net: wwan: t7xx: use GFP_ATOMIC under spin lock in t7xx_cldma_gpd_set_next_ptr()
Sometimes t7xx_cldma_gpd_set_next_ptr() is called under spin lock,
so add 'gfp_mask' parameter in t7xx_cldma_gpd_set_next_ptr() to pass
the flag.

Fixes: 39d439047f ("net: wwan: t7xx: Add control DMA interface")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/20220519032108.2996400-1-yangyingliang@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:19:19 -07:00
Rolf Eike Beer dc2df00af9 net: tulip: fix build with CONFIG_GSC
Fix typo which breaks build for parisc.

Fixes: 3daebfbeb4 ("net: tulip: convert to devres")
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Link: https://lore.kernel.org/all/CA+G9fYuCzU5VZ_nc+6NEdBXJdVCH=J2SB1Na1G_NS_0BNdGYtg@mail.gmail.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Link: https://lore.kernel.org/r/4719560.GXAFRqVoOG@eto.sf-tec.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:14:40 -07:00
Jakub Kicinski c09b0cd2cc net: avoid strange behavior with skb_defer_max == 1
When user sets skb_defer_max to 1 the kick threshold is 0
(half of 1). If we increment queue length before the check
the kick will never happen, and the skb may get stranded.
This is likely harmless but can be avoided by moving the
increment after the check. This way skb_defer_max == 1
will always kick. Still a silly config to have, but
somehow that feels more correct.

While at it drop a comment which seems to be outdated
or confusing, and wrap the defer_count write with
a WRITE_ONCE() since it's read on the fast path
that avoids taking the lock.

Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220518185522.2038683-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 17:05:36 -07:00
Martin Habets cc398a34d1 sfc/siena: Remove duplicate check on segments
Siena only supports software TSO. This means more code can be deleted,
as pointed out by the Smatch static checker warning:
	drivers/net/ethernet/sfc/siena/tx.c:184 __efx_siena_enqueue_skb()
	warn: duplicate check 'segments' (previous on line 158)

Fixes: 956f2d86cb ("sfc/siena: Remove build references to missing functionality")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/kernel-janitors/YoH5tJMnwuGTrn1Z@kili/
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
Link: https://lore.kernel.org/r/165294463549.23865.4557617334650441347.stgit@palantir17.mph.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 16:46:49 -07:00
Jakub Kicinski dc7769244e tcp_ipv6: set the drop_reason in the right place
Looks like the IPv6 version of the patch under Fixes was
a copy/paste of the IPv4 but hit the wrong spot.
It is tcp_v6_rcv() which uses drop_reason as a boolean, and
needs to be protected against reason == 0 before calling free.
tcp_v6_do_rcv() has a pretty straightforward flow.

The resulting warning looks like this:
  WARNING: CPU: 1 PID: 0 at net/core/skbuff.c:775
  Call Trace:
    tcp_v6_rcv (net/ipv6/tcp_ipv6.c:1767)
    ip6_protocol_deliver_rcu (net/ipv6/ip6_input.c:438)
    ip6_input_finish (include/linux/rcupdate.h:726)
    ip6_input (include/linux/netfilter.h:307)

Fixes: f8319dfd1b ("net: tcp: reset 'drop_reason' to NOT_SPCIFIED in tcp_v{4,6}_rcv()")
Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Link: https://lore.kernel.org/r/20220520021347.2270207-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-20 09:35:42 -07:00
David S. Miller b6d261449e Merge branch 'net-ipa-next'
Alex Elder says:

====================
net: ipa: a mix of patches

This series includes a mix of things things that are generally
minor.  The first four are sort of unrelated fixes, and summarizing
them here wouldn't be that helpful.

The last three together make it so only the "configuration data" we
need after initialization is saved for later use.  Most such data is
used only during driver initialization.  But endpoint configuration
is needed later, so the last patch saves a copy of that.  Eventually
we'll want to support reconfiguring endpoints at runtime as well,
and this will facilitate that.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:24 +01:00
Alex Elder 660e52d651 net: ipa: save a copy of endpoint default config
All elements of the default endpoint configuration are used in the
code when programming an endpoint for use.  But none of the other
configuration data is ever needed once things are initialized.

So rather than saving a pointer to *all* of the configuration data,
save a copy of only the endpoint configuration portion.

This will eventually allow endpoint configuration to be modifiable
at runtime.  But even before that it means we won't keep a pointer
to configuration data after when no longer needed.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:24 +01:00
Alex Elder cf4e73a166 net: ipa: rename a few endpoint config data types
Rename the just-moved data structure types to drop the "_data"
suffix, to make it more obvious they are no longer meant to be used
just as read-only initialization data.  Rename the fields and
variables of these types to use "config" instead of "data" in the
name.  This is another small step meant to facilitate review.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:24 +01:00
Alex Elder f0488c540e net: ipa: move endpoint configuration data definitions
Move the definitions of the structures defining endpoint-specific
configuration data out of "ipa_data.h" and into "ipa_endpoint.h".
This is a trivial movement of code without any other change, to
prepare for the next few patches.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:24 +01:00
Alex Elder 75944b040b net: ipa: open-code ether_setup()
About half of the fields set by the call in ipa_modem_netdev_setup()
are overwritten after the call.  Instead, just skip the call, and
open-code the (other) assignments it makes to the net_device
structure fields.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:24 +01:00
Alex Elder 332ef7c814 net: ipa: ignore endianness if there is no header
If we program an RX endpoint to have no header (header length is 0),
header-related endpoint configuration values are meaningless and are
ignored.

The only case we support that defines a header is QMAP endpoints.
In ipa_endpoint_init_hdr_ext() we set the endianness mask value
unconditionally, but it should not be done if there is no header
(meaning it is not configured for QMAP).

Set the endianness conditionally, and rearrange the logic in that
function slightly to avoid testing the qmap flag twice.

Delete an incorrect comment in ipa_endpoint_init_aggr().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:23 +01:00
Alex Elder c9d92cf28c net: ipa: rename a GSI error code
The CHANNEL_NOT_RUNNING error condition has been generalized, so
rename it to be INCORRECT_CHANNEL_STATE.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:23 +01:00
Alex Elder c15f950d14 net: ipa: drop an unneeded transaction reference
In gsi_channel_update(), a reference count is taken on the last
completed transaction "to keep it from completing" before we give
the event back to the hardware.  Completion processing for that
transaction (and any other "new" ones) will not occur until after
this function returns, so there's no risk it completing early.  So
there's no need to take and drop the additional transaction
reference.

Use local variables in the call to gsi_evt_ring_doorbell().

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-05-20 11:12:23 +01:00
Jakub Kicinski 805cb5aadc Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next, misc
updates and fallout fixes from recent Florian's code rewritting (from
last pull request):

1) Use new flowi4_l3mdev field in ip_route_me_harder(), from Martin Willi.

2) Avoid unnecessary GC with a timestamp in conncount, from William Tu
   and Yifeng Sun.

3) Remove TCP conntrack debugging, from Florian Westphal.

4) Fix compilation warning in ctnetlink, from Florian.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: ctnetlink: fix up for "netfilter: conntrack: remove unconfirmed list"
  netfilter: conntrack: remove pr_debug callsites from tcp tracker
  netfilter: nf_conncount: reduce unnecessary GC
  netfilter: Use l3mdev flow key when re-routing mangled packets
====================

Link: https://lore.kernel.org/r/20220519220206.722153-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 21:53:08 -07:00
Jakub Kicinski 16ea52c44e eth: mtk_ppe: fix up after merge
I missed this in the barrage of GCC 12 warnings. Commit cf2df74e20
("net: fix dev_fill_forward_path with pppoe + bridge") changed
the pointer into an array.

Fixes: d7e6f58360 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Link: https://lore.kernel.org/r/20220520012555.2262461-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 21:45:56 -07:00
Jakub Kicinski 0784c25d21 Merge branch 'mptcp-miscellaneous-fixes-and-a-new-test-case'
Mat Martineau says:

====================
mptcp: Miscellaneous fixes and a new test case

Patches 1 and 3 remove helpers that were iterating over the subflow
connection list without proper locking. Iteration was not needed in
either case.

Patch 2 fixes handling of MP_FAIL timeout, checking for orphaned
subflows instead of using the MPTCP socket data lock and connection
state.

Patch 4 adds a test for MP_FAIL timeout using tc pedit to induce checksum
failures.
====================

Link: https://lore.kernel.org/r/20220518220446.209750-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 20:05:10 -07:00
Geliang Tang 2ba18161d4 selftests: mptcp: add MP_FAIL reset testcase
Add the multiple subflows test case for MP_FAIL, to test the MP_FAIL
reset case. Use the test_linkfail value to make 1024KB test files.

Invoke reset_with_fail() to use 'iptables' and 'tc action pedit' rules
to produce the bit flips to trigger the checksum failures on ns2eth2.
Add delays on ns2eth1 to make sure more data can translate on ns2eth2.

The check_invert flag is enabled in reset_with_fail(), so this test
prints out the inverted bytes, instead of the file mismatch errors.

Invoke pedit_action_pkts() to get the numbers of the packets edited
by the tc pedit actions, and print this numbers to the output.

Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 20:05:07 -07:00
Mat Martineau d9fb797046 mptcp: Do not traverse the subflow connection list without lock
The MPTCP socket's conn_list (list of subflows) requires the socket lock
to access. The MP_FAIL timeout code added such an access, where it would
check the list of subflows both in timer context and (later) in workqueue
context where the socket lock is held.

Rather than check the list twice, remove the check in the timeout
handler and only depend on the check in the workqueue. Also remove the
MPTCP_FAIL_NO_RESPONSE flag, since mptcp_mp_fail_no_response() has
insignificant overhead and can be checked on each worker run.

Fixes: 49fa1919d6 ("mptcp: reset subflow when MP_FAIL doesn't respond")
Reported-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 20:05:07 -07:00
Mat Martineau d42f9e4e23 mptcp: Check for orphaned subflow before handling MP_FAIL timer
MP_FAIL timeout (waiting for a peer to respond to an MP_FAIL with
another MP_FAIL) is implemented using the MPTCP socket's sk_timer. That
timer is also used at MPTCP socket close, so it's important to not have
the two timer users interfere with each other.

At MPTCP socket close, all subflows are orphaned before sk_timer is
manipulated. By checking the SOCK_DEAD flag on the subflows, each
subflow can determine if the timer is safe to alter without acquiring
any MPTCP-level lock. This replaces code that was using the
mptcp_data_lock and MPTCP-level socket state checks that did not
correctly protect the timer.

Fixes: 49fa1919d6 ("mptcp: reset subflow when MP_FAIL doesn't respond")
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 20:05:07 -07:00
Paolo Abeni 7b16871f99 mptcp: stop using the mptcp_has_another_subflow() helper
The mentioned helper requires the msk socket lock, and the
current callers don't own it nor can't acquire it, so the
access is racy.

All the current callers are really checking for infinite mapping
fallback, and the latter condition is explicitly tracked by
the relevant msk variable: we can safely remove the caller usage
- and the caller itself.

The issue is present since MP_FAIL implementation, but the
fix only applies since the infinite fallback support, ence the
somewhat unexpected fixes tag.

Fixes: 0530020a7c ("mptcp: track and update contiguous data status")
Acked-and-tested-by: Geliang Tang <geliang.tang@suse.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 20:05:07 -07:00
Yuchung Cheng 9ad084d666 tcp: improve PRR loss recovery
This patch improves TCP PRR loss recovery behavior for a corner
case. Previously during PRR conservation-bound mode, it strictly
sends the amount equals to the amount newly acked or s/acked.

The patch changes s.t. PRR may send additional amount that was banked
previously (e.g. application-limited) in the conservation-bound
mode, similar to the slow-start mode. This unifies and simplifies the
algorithm further and may improve the recovery latency. This change
still follow the general packet conservation design principle and
always keep inflight/cwnd below the slow start threshold set
by the congestion control module.

PRR is described in RFC 6937. We'll include this change in the
latest revision rfc6937-bis as well.

Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220519003410.2531936-1-ycheng@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 18:49:50 -07:00
Jakub Kicinski 7ebe52f555 docs: change the title of networking docs
The current title of our section of the documentation is
Linux Networking Documentation. Since we're describing
a section of Linux Documentation repeating those two
words seems redundant.

Link: https://lore.kernel.org/r/20220518234346.2088436-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 18:44:57 -07:00
Jakub Kicinski 1172aa6e4a net: ipa: don't proceed to out-of-bound write
GCC 12 seems upset that we check ipa_irq against array bound
but then proceed, anyway:

drivers/net/ipa/ipa_interrupt.c: In function ‘ipa_interrupt_add’:
drivers/net/ipa/ipa_interrupt.c:196:27: warning: array subscript 30 is above array bounds of ‘void (*[30])(struct ipa *, enum ipa_irq_id)’ [-Warray-bounds]
  196 |         interrupt->handler[ipa_irq] = handler;
      |         ~~~~~~~~~~~~~~~~~~^~~~~~~~~
drivers/net/ipa/ipa_interrupt.c:42:27: note: while referencing ‘handler’
   42 |         ipa_irq_handler_t handler[IPA_IRQ_COUNT];
      |                           ^~~~~~~

Reviewed-by: Alex Elder <elder@linaro.org>
Link: https://lore.kernel.org/r/20220519004417.2109886-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 18:44:51 -07:00
Jakub Kicinski dbbc7d04c5 net: wwan: iosm: remove pointless null check
GCC 12 warns:

drivers/net/wwan/iosm/iosm_ipc_protocol_ops.c: In function ‘ipc_protocol_dl_td_process’:
drivers/net/wwan/iosm/iosm_ipc_protocol_ops.c:406:13: warning: the comparison will always evaluate as ‘true’ for the address of ‘cb’ will never be NULL [-Waddress]
  406 |         if (!IPC_CB(skb)) {
      |             ^

Indeed the check seems entirely pointless. Hopefully the other
validation checks will catch if the cb is bad, but it can't be
NULL.

Reviewed-by: M Chetan Kumar <m.chetan.kumar@intel.com>
Link: https://lore.kernel.org/r/20220519004342.2109832-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 18:44:44 -07:00
Jakub Kicinski 53332f8451 Merge branch 'lantiq_gswip-two-small-fixes'
Martin Blumenstingl says:

====================
lantiq_gswip: Two small fixes

While updating the Lantiq target in OpenWrt to Linux 5.15 I came across
an FDB related error message. While that still needs to be solved I
found two other small issues on the way.

This series fixes the two minor issues found while revisiting the FDB
code in the lantiq_gswip driver:
- The first patch fixes the start index used in gswip_port_fdb() to
  find the entry with the matching bridge. The updated logic is now
  consistent with the rest of the driver.
- The second patch fixes a typo in a dev_err() message.

[0] https://lore.kernel.org/netdev/20220517194015.1081632-1-martin.blumenstingl@googlemail.com/
====================

Link: https://lore.kernel.org/r/20220518220051.1520023-1-martin.blumenstingl@googlemail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-19 18:40:57 -07:00