Commit Graph

918659 Commits

Author SHA1 Message Date
Florian Westphal 72511aab95 mptcp: avoid blocking in tcp_sendpages
The transmit loop continues to xmit new data until an error is returned
or all data was transmitted.

For the blocking i/o case, this means that tcp_sendpages() may block on
the subflow until more space becomes available, i.e. we end up sleeping
with the mptcp socket lock held.

Instead we should check if a different subflow is ready to be used.

This restarts the subflow sk lookup when the tx operation succeeded
and the tcp subflow can't accept more data or if tcp_sendpages
indicates -EAGAIN on a blocking mptcp socket.

In that case we also need to set the NOSPACE bit to make sure we get
notified once memory becomes available.

In case all subflows are busy, the existing logic will wait until a
subflow is ready, releasing the mptcp socket lock while doing so.

The mptcp worker already sets DONTWAIT, so no need to make changes there.

v2:
 * set NOSPACE bit
 * add a comment to clarify that mptcp-sk sndbuf limits need to
   be checked as well.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17 12:35:34 -07:00
Florian Westphal fb529e62d3 mptcp: break and restart in case mptcp sndbuf is full
Its not enough to check for available tcp send space.

We also hold on to transmitted data for mptcp-level retransmits.
Right now we will send more and more data if the peer can ack data
at the tcp level fast enough, since that frees up tcp send buffer space.

But we also need to check that data was acked and reclaimed at the mptcp
level.

Therefore add needed check in mptcp_sendmsg, flush tcp data and
wait until more mptcp snd space becomes available if we are over the
limit.  Before we wait for more data, also make sure we start the
retransmit timer if we ran out of sndbuf space.

Otherwise there is a very small chance that we wait forever:

 * receiver is waiting for data
 * sender is blocked because mptcp socket buffer is full
 * at tcp level, all data was acked
 * mptcp-level snd_una was not updated, because last ack
   that acknowledged the last data packet carried an older
   MPTCP-ack.

Restarting the retransmit timer avoids this problem: if TCP
subflow is idle, data is retransmitted from the RTX queue.

New data will make the peer send a new, updated MPTCP-Ack.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17 12:35:34 -07:00
Florian Westphal a0e17064d4 mptcp: move common nospace-pattern to a helper
Paolo noticed that ssk_check_wmem() has same pattern, so add/use
common helper for both places.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17 12:35:34 -07:00
David Ahern eb682677f5 selftests: Drop 'pref medium' in route checks
The 'pref medium' attribute was moved in iproute2 to be near the prefix
which is where it applies versus after the last nexthop. The nexthop
tests were updated to drop the string from route checking, but it crept
in again with the compat tests.

Fixes: 4dddb5be13 ("selftests: net: add new testcases for nexthop API compat mode sysctl")
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-17 12:26:55 -07:00
David S. Miller 2f6ca9570d Merge branch 'net-ipa-sc7180-suspend-resume'
Alex Elder says:

====================
net: ipa: sc7180 suspend/resume

This series permits suspend/resume to work for the IPA driver
on the Qualcomm SC7180 SoC.  The IPA version on this SoC requires
interrupts to be enabled when the suspend and resume callbacks are
made, and the first patch moves away from using the noirq variants.
The second patch fixes a problem with resume that occurs because
pending interrupts were being cleared before starting a channel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:47:19 -07:00
Alex Elder 195ef57f87 net: ipa: do not clear interrupt in gsi_channel_start()
In gsi_channel_start() there is harmless-looking comment "Clear the
channel's event ring interrupt in case it's pending".  The intent
was to avoid getting spurious interrupts when first bringing up a
channel.

However we now use channel stop/start to implement suspend and
resume, and an interrupt pending at the time we resume is actually
something we don't want to ignore.

The very first time we bring up the channel we do not expect an
interrupt to be pending, and even if it were, the effect would
simply be to schedule NAPI on that channel, which would find nothing
to do, which is not a problem.

Stop clearing any pending IEOB interrupt in gsi_channel_start().
That leaves one caller of the trivial function gsi_isr_ieob_clear().
Get rid of that function and just open-code it in gsi_isr_ieob()
instead.

This fixes a problem where suspend/resume IPA v4.2 would get stuck
when resuming after a suspend.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:47:19 -07:00
Alex Elder a4f48458ca net: ipa: don't use noirq suspend/resume callbacks
Use the suspend and resume callbacks rather than suspend_noirq and
resume_noirq.  With IPA v4.2, we use the CHANNEL_STOP command to
implement a suspend, and without interrupts enabled, that command
won't complete.

Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:47:19 -07:00
David S. Miller d53b1162d7 Merge branch 'mlxsw-Reorganize-trap-data'
Ido Schimmel says:

====================
mlxsw: Reorganize trap data

This patch set does not include any functional changes. It merely
reworks the internal storage of traps, trap groups and trap policers in
mlxsw to each use a single array.

These changes allow us to get rid of the multiple arrays we currently
have for traps, which make the trap data easier to validate and extend
with more per-trap information in the future. It will also allow us to
more easily add per-ASIC traps in future submissions.

Last two patches include minor changes to devlink-trap selftests.

Tested with existing devlink-trap selftests.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:42:32 -07:00
Ido Schimmel 04cc99d9bd selftests: mlxsw: Do not hard code trap group name
It can be derived dynamically from the trap's name, so drop it.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:42:31 -07:00
Ido Schimmel 84e0d83567 selftests: devlink_lib: Remove double blank line
One blank line is enough.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:42:31 -07:00
Ido Schimmel 200b7cca0b mlxsw: spectrum_trap: Store all trap data in one array
Each trap registered with devlink is mapped to one or more Rx listeners.
These listeners allow the switch driver (e.g., mlxsw_spectrum) to
register a function that is called when a packet is received (trapped)
for a specific reason.

Currently, three arrays are used to describe the mapping between the
logical devlink traps and the Rx listeners.

Instead, get rid of these arrays and store all the information in one
array that is easier to validate and extend with more per-trap
information.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:42:31 -07:00
Ido Schimmel b14a40dbde mlxsw: spectrum_trap: Store all trap group data in one array
Use one array to store all the information about all the trap groups
instead of hard coding it in code. This will be used in future patches
to disable certain functionality (e.g., policer binding) on a trap group
basis.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:42:31 -07:00
Ido Schimmel cc678f4dbc mlxsw: spectrum_trap: Store all trap policer data in one array
Instead of maintaining an array of policers and a linked list, only
maintain an array.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:42:31 -07:00
Ido Schimmel 85d4ec5925 mlxsw: spectrum_trap: Move struct definition out of header file
'struct mlxsw_sp_trap_policer_item' is only used in one file, so move it
there.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 16:42:31 -07:00
Heiner Kallweit 13f15b59ad r8169: remove remaining call to mdiobus_unregister
After having switched to devm_mdiobus_register() also this remaining
call to mdiobus_unregister() can be removed.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 15:20:34 -07:00
David S. Miller 1ab9b5eabb Merge branch 'ethtool-set_channels-add-a-few-more-checks'
Jakub Kicinski says:

====================
ethtool: set_channels: add a few more checks

There seems to be a few more things we can check in the core before
we call drivers' ethtool_ops->set_channels. Adding the checks to
the core simplifies the drivers. This set only includes changes
to the NFP driver as an example.

There is a small risk in the first patch that someone actually
purposefully accepts a strange configuration without RX or TX
channels, but I couldn't find such a driver in the tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:56:30 -07:00
Jakub Kicinski 75c36dbb1c ethtool: don't call set_channels in drivers if config didn't change
Don't call drivers if nothing changed. Netlink code already
contains this logic.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:56:30 -07:00
Jakub Kicinski 4df6ff2a99 nfp: don't check lack of RX/TX channels
Core will now perform this check.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:56:30 -07:00
Jakub Kicinski 7be92514b9 ethtool: check if there is at least one channel for TX/RX in the core
Having a channel config with no ability to RX or TX traffic is
clearly wrong. Check for this in the core so the drivers don't
have to.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:56:30 -07:00
Christoph Paasch a0c1d0eafd mptcp: Use 32-bit DATA_ACK when possible
RFC8684 allows to send 32-bit DATA_ACKs as long as the peer is not
sending 64-bit data-sequence numbers. The 64-bit DSN is only there for
extreme scenarios when a very high throughput subflow is combined with a
long-RTT subflow such that the high-throughput subflow wraps around the
32-bit sequence number space within an RTT of the high-RTT subflow.

It is thus a rare scenario and we should try to use the 32-bit DATA_ACK
instead as long as possible. It allows to reduce the TCP-option overhead
by 4 bytes, thus makes space for an additional SACK-block. It also makes
tcpdumps much easier to read when the DSN and DATA_ACK are both either
32 or 64-bit.

Signed-off-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:51:10 -07:00
Nicolas Dichtel 9efd6a3cec netns: enable to inherit devconf from current netns
The goal is to be able to inherit the initial devconf parameters from the
current netns, ie the netns where this new netns has been created.

This is useful in a containers environment where /proc/sys is read only.
For example, if a pod is created with specifics devconf parameters and has
the capability to create netns, the user expects to get the same parameters
than his 'init_net', which is not the real init_net in this case.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:46:37 -07:00
Ioana Ciornei 74a1c05916 dpaa2-eth: add bulking to XDP_TX
Add driver level bulking to the XDP_TX action.

An array of frame descriptors is held for each Tx frame queue and
populated accordingly when the action returned by the XDP program is
XDP_TX. The frames will be actually enqueued only when the array is
filled. At the end of the NAPI cycle a flush on the queued frames is
performed in order to enqueue the remaining FDs.

Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:45:25 -07:00
Kevin Lo 6f42a29305 net: phy: broadcom: fix checkpatch complains about tabs
This patch makes checkpatch happy for tabs

Signed-off-by: Kevin Lo <kevlo@kevlo.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-16 13:38:33 -07:00
David S. Miller ea6119aa67 mlx5-updates-2020-05-15
mlx5 core and mlx5e (netdev) updates:
 
 1) Two fixes for release all FW pages support.
 2) Improvement in calculating the send queue stop room on tx
 3) Flow steering auto-groups creation improvements
 4) TC offload fix for Connection tracking with NAT action
 5) IPoIB support for self looback to allow communication between ipoib
 pkey child interfaces on the same host.
 6) DCBNL cleanup to avoid #ifdef DCBNL all over the main mlx5e code
 7) Small and trivial code cleanup
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl6/G1YACgkQSD+KveBX
 +j6z5Qf9GgUyciytFq3gcmIqjjvhugWuGAjsyD5i0X/TblQJXAAfXLBk4SDJDwdC
 FbjUpDzO6kbKUUoOYSUlyY8LAzve+jObCqRn6GHtJAm7qxaN/OuPhIBrh7ysphxV
 aPRV564KXMqOyOKmufWSmYfJtxthSv/c3ZkTZpYeqNr0psNSfz8mXX3YOrtr+UBH
 5fUoJO5sICtMvnswQD/uy1KcyH6YAtxSEcAtTSw5NZJRuXe8cxZ7g+4CrPmFUiQy
 JvXigroGgpL7WdHDaId74VkIOplSQZvIbebhEZ53Fy7aQQpwLjLY26TmCcvLMBSk
 4Y1rFp0/+rTXUXWaRfKYRQfW4qGLFA==
 =pY9E
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-updates-2020-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2020-05-15

mlx5 core and mlx5e (netdev) updates:

1) Two fixes for release all FW pages support.
2) Improvement in calculating the send queue stop room on tx
3) Flow steering auto-groups creation improvements
4) TC offload fix for Connection tracking with NAT action
5) IPoIB support for self looback to allow communication between ipoib
pkey child interfaces on the same host.
6) DCBNL cleanup to avoid #ifdef DCBNL all over the main mlx5e code
7) Small and trivial code cleanup
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 16:36:46 -07:00
Nathan Chancellor 2ea46dc686 ethernet: ti: am65-cpts: Add missing inline qualifier to stub functions
When building with Clang:

In file included from drivers/net/ethernet/ti/am65-cpsw-ethtool.c:15:
drivers/net/ethernet/ti/am65-cpts.h:58:12: warning: unused function
'am65_cpts_ns_gettime' [-Wunused-function]
static s64 am65_cpts_ns_gettime(struct am65_cpts *cpts)
           ^
drivers/net/ethernet/ti/am65-cpts.h:63:12: warning: unused function
'am65_cpts_estf_enable' [-Wunused-function]
static int am65_cpts_estf_enable(struct am65_cpts *cpts,
           ^
drivers/net/ethernet/ti/am65-cpts.h:69:13: warning: unused function
'am65_cpts_estf_disable' [-Wunused-function]
static void am65_cpts_estf_disable(struct am65_cpts *cpts, int idx)
            ^
3 warnings generated.

These functions need to be marked as inline, which adds __maybe_unused,
to avoid these warnings, which is the pattern for stub functions.

Fixes: ec008fa2a9 ("ethernet: ti: am65-cpts: add routines to support taprio offload")
Link: https://github.com/ClangBuiltLinux/linux/issues/1026
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 16:32:27 -07:00
Tariq Toukan 3f3ab178c7 net/mlx5e: Take DCBNL-related definitions into dedicated files
Take DCBNL-related definitions out of the common en.h header,
Use a dedicated header file for exposing them.
Some need not to be exposed, use them locally in the .c file.
Use stubs to eliminate use of CONFIG_MLX5_CORE_EN_DCB in the
generic control flows.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:36 -07:00
Maxim Mikityanskiy 5ffb4d858b net/mlx5e: Calculate SQ stop room in a robust way
Currently, different formulas are used to estimate the space that may be
taken by WQEs in the SQ during a single packet transmit. This space is
called stop room, and it's checked in the end of packet transmit to find
out if the next packet could overflow the SQ. If it could, the driver
tells the kernel to stop sending next packets.

Many factors affect the stop room:

1. Padding with NOPs to avoid WQEs spanning over page boundaries.

2. Enabled and disabled offloads (TLS, upcoming MPWQE).

3. The maximum size of a WQE.

The padding is performed before every WQE if it doesn't fit the current
page.

The current formula assumes that only one padding will be required per
packet, and it doesn't take into account that the WQEs posted during the
transmission of a single packet might exceed the page size in very rare
circumstances. For example, to hit this condition with 4096-byte pages,
TLS offload will have to interrupt an almost-full MPWQE session, be in
the resync flow and try to transmit a near to maximum amount of data.

To avoid SQ overflows in such rare cases after MPWQE is added, this
patch introduces a more robust formula to estimate the stop room. The
new formula uses the fact that a WQE of size X will not require more
than X-1 WQEBBs of padding. More exact estimations are possible, but
they result in much more complex and error-prone code for little gain.

Before this patch, the TLS stop room included space for both INNOVA and
ConnectX TLS offloads that couldn't run at the same time anyway, so this
patch accounts only for the active one.

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:34 -07:00
Erez Shitrit 8b46d424a7 net/mlx5e: IPoIB, Drop multicast packets that this interface sent
After enabled loopback packets for IPoIB, we need to drop these packets
that this HCA has replicated and came back to the same interface that
sent them.

Fixes: 4c6c615e3f ("net/mlx5e: IPoIB, Add PKEY child interface nic profile")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:32 -07:00
Erez Shitrit 80639b199c net/mlx5e: IPoIB, Enable loopback packets for IPoIB interfaces
Enable loopback of unicast and multicast traffic for IPoIB enhanced
mode.
This will allow interfaces with the same pkey to communicate between
them e.g cloned interfaces that located in different namespaces.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:30 -07:00
Roi Dayan 9102d836d2 net/mlx5e: CT: Fix offload with CT action after CT NAT action
It could be a chain of rules will do action CT again after CT NAT
Before this fix matching will break as we get into the CT table
after NAT changes and not CT NAT.
Fix this by adding pre ct and pre ct nat tables to skip ct/ct_nat
tables and go straight to post_ct table if ct/nat was already done.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:27 -07:00
Eran Ben Elisha 90bf1c8dbd net/mlx5: Move internal timer read function to clock library
Move mlx5_read_internal_timer() into lib/clock.c file as it is being
used there. As such, make this function a static one.

In addition, rearrange headers include to support function move.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:25 -07:00
Paul Blakey 49c0355d30 net/mlx5: Wait for inactive autogroups
Currently, if one thread tries to add an entry to an autogrouped table
with no free matching group, while another thread is in the process of
creating a new matching autogroup, it doesn't wait for the new group
creation, and creates an unnecessary new autogroup.

Instead of skipping inactive, wait on the write lock of those groups.

Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:22 -07:00
Parav Pandit 41798df9bf net/mlx5: Drain wq first during PCI device removal
mlx5_unload_one() is done with cleanup = true only once.

So instead of doing health wq drain inside the if(), directly do
during PCI device removal.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:20 -07:00
Parav Pandit 4162f58b47 net/mlx5: Have single error unwinding path
Having multiple error unwinding path are error prone.
Lets have just one error unwinding path.

Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:17 -07:00
Eran Ben Elisha e7f860e210 net/mlx5: Fix a bug of releasing wrong chunks on > 4K page size systems
On systems with page size larger than 4K, a fwp object has few 4K chunks.
Fix a bug in fwp free flow where the chunk address was dropped and
fwp->addr was used instead (first chunk address). This caused a wrong
update of fwp->bitmask which later can cause errors in re-alloc fwp
chunk flow.

In order to fix this it, re-factor the release flow:
- Free 4k: Releases a specific 4k chunk inside the fwp, defined by
  starting address.
- Free fwp: Unconditionally release the whole fwp and its resources.
Free addr will call free fwp if all chunks were released, in order to do
code sharing.

In addition, fix npages to count for all released chunks correctly.

Fixes: c6168161f6 ("net/mlx5: Add support for release all pages event")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:15 -07:00
Eran Ben Elisha 2726cd4a29 net/mlx5: Dedicate fw page to the requesting function
The cited patch assumes that all chuncks in a fw page belong to the same
function, thus the driver must dedicate fw page to the requesting
function, which is actually what was intedned in the original fw pages
allocator design, hence the fwp->func_id !

Up until the cited patch everything worked ok, but now "relase all pages"
is broken on systems with page_size > 4k.

Fix this by dedicating fw page to the requesting function id via adding a
func_id parameter to alloc_4k() function.

Fixes: c6168161f6 ("net/mlx5: Add support for release all pages event")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2020-05-15 15:44:12 -07:00
David S. Miller da07f52d3c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Move the bpf verifier trace check into the new switch statement in
HEAD.

Resolve the overlapping changes in hinic, where bug fixes overlap
the addition of VF support.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 13:48:59 -07:00
Linus Torvalds f85c1598dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Fix sk_psock reference count leak on receive, from Xiyu Yang.

 2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.

 3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
    from Maciej Żenczykowski.

 4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
    Abeni.

 5) Fix netprio cgroup v2 leak, from Zefan Li.

 6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.

 7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.

 8) Various bpf probe handling fixes, from Daniel Borkmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
  selftests: mptcp: pm: rm the right tmp file
  dpaa2-eth: properly handle buffer size restrictions
  bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
  bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
  bpf: Restrict bpf_probe_read{, str}() only to archs where they work
  MAINTAINERS: Mark networking drivers as Maintained.
  ipmr: Add lockdep expression to ipmr_for_each_table macro
  ipmr: Fix RCU list debugging warning
  drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
  net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
  tcp: fix error recovery in tcp_zerocopy_receive()
  MAINTAINERS: Add Jakub to networking drivers.
  MAINTAINERS: another add of Karsten Graul for S390 networking
  drivers: ipa: fix typos for ipa_smp2p structure doc
  pppoe: only process PADT targeted at local interfaces
  selftests/bpf: Enforce returning 0 for fentry/fexit programs
  bpf: Enforce returning 0 for fentry/fexit progs
  net: stmmac: fix num_por initialization
  security: Fix the default value of secid_to_secctx hook
  libbpf: Fix register naming in PT_REGS s390 macros
  ...
2020-05-15 13:10:06 -07:00
Linus Torvalds d5dfe4f1b4 Second RDMA 5.7 rc pull request
A few minor bug fixes for user visible defects, and one regression:
 
 - Various bugs from static checkers and syzkaller
 
 - Add missing error checking in mlx4
 
 - Prevent RTNL lock recursion in i40iw
 
 - Fix segfault in cxgb4 in peer abort cases
 
 - Fix a regression added in 5.7 where the IB_EVENT_DEVICE_FATAL could be
   lost, and wasn't delivered to all the FDs
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl6+5V8ACgkQOG33FX4g
 mxo/Zw//QlrhDSM4DG4hZsFhFX3i/vT7xbxkNTp/U1+Vue9cKcUlotNOzQ79lQur
 /12QEiDffLZNsy2jCD1nxf/rEo9NDc8iSHPcryQ3aGa2K3XJ3QQ2ZhHBzfFU9GVP
 GQYlGt9uB4t6LEcU4P/ZCdf1nmkfYFvNDfmveVWrPLasABK7WqxCBXaqbLed+0ca
 lGhFBGLdGNJqyK2BkPCtXr9XYTjzZhW0lJqMuex0YD7cIAFfn+qbzvLQheBy/mhH
 o9n28GQbIgpNXvYz2HvUkTfiwDLylFaNVBYVctnm10cbNtLHv2J0bBQQb2ZaXUAM
 xt54AQ2QAMHKC2h6sUqyDAmWKfPEOnxZ9LycYDa0ZaIvm/uK/jiEOXE9XbQeiOKF
 eHiyuDg0EB4AVcYlNSm7roHdbh3rAluHerGe4Kv3efF/b1Zt2mtTW7q/XuROfSeT
 WBnALyl7RurnSG0HY9hyUMak65JwolgqPdBDpzRjqIF5jeKW0nn8MsbGTPGWlcyN
 zLu5IQn/vv+sfISav1cVMRoijboe85+3SOVtMBQOJ2m7EkdzRyRSQh3oG1556n2y
 oqseNSQFKNFA1p4hEeTf8BVEqVX3gj50ykj+2g1HyxH3sB+iE3+EctnHNhfd3IKI
 n6e/67Nm0ub/WM1OXeuXBmW0NvsTt0aPBfsp4s+oq3eICD4YbFE=
 =R5BJ
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:
 "A few minor bug fixes for user visible defects, and one regression:

   - Various bugs from static checkers and syzkaller

   - Add missing error checking in mlx4

   - Prevent RTNL lock recursion in i40iw

   - Fix segfault in cxgb4 in peer abort cases

   - Fix a regression added in 5.7 where the IB_EVENT_DEVICE_FATAL could
     be lost, and wasn't delivered to all the FDs"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/uverbs: Move IB_EVENT_DEVICE_FATAL to destroy_uobj
  RDMA/uverbs: Do not discard the IB_EVENT_DEVICE_FATAL event
  RDMA/iw_cxgb4: Fix incorrect function parameters
  RDMA/core: Fix double put of resource
  IB/core: Fix potential NULL pointer dereference in pkey cache
  IB/hfi1: Fix another case where pq is left on waitlist
  IB/i40iw: Remove bogus call to netdev_master_upper_dev_get()
  IB/mlx4: Test return value of calls to ib_get_cached_pkey
  RDMA/rxe: Always return ERR_PTR from rxe_create_mmap_info()
  i40iw: Fix error handling in i40iw_manage_arp_cache()
2020-05-15 13:06:56 -07:00
Linus Torvalds ce24729667 linux-kselftest-5.7-rc6
This Kselftest update for Linux 5.7-rc6 consists of
 
 - lkdtm runner fixes to prevent dmesg clearing and shellcheck errors
 - ftrace test handling when test module doesn't exist
 - nsfs test fix to replace zero-length array with flexible-array
 - dmabuf-heaps test fix to return clear error value
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAl6+0KwACgkQCwJExA0N
 QxyBog/8DME+7YtjZ8TqgeWuxMckDHqVBTLTobd3Wyd2Vsk6W3EWKaDo+R8sqRfN
 VmjExY1PH0+N4JAFuQxR1UAmO0E/YZ/HwGxU6IH4iAoXtVZ0Tc52jzlZkwo0/RwF
 uOXukaEjLNlMva06/CEptxZy4UAWKP1l9DaEYjY48zDC4zlXaPgGjk69MACZhK4E
 2M6mWJuUewf+PobZTGFv/4RPIAihZpx8gDMRBR/hy+RWKh1qNhpNeXQRrwTQm0u3
 5ewOQjP26VRam4pA/e7RsU1b2IR8nbDZFpX3dTEzkme9dJ2hoN3DZLOw0/WXS/aQ
 /Ha5bPBI7RMhq0FUQP1vMErZ6Da/YEkbTxmCpx1kL6vliKqMkHb+epLABTJlGGnD
 i+ENoLeJVhcyQ3vyWq3VNZDGlHYF3KuUftf21sdG4bXDajucnu/lSPvfSKzhMNIG
 c1+AcvTAeGTPHi2dCfdETjg1dakO4exrX9/ABvu3JY1pp9D8DLHZNc/O9iOvG0O5
 6Lp9LhtxCxKdAGJVxKxzXPjwPDSnzNV5sGm1ElFpXzA0Nv17VH/DIL8i6745sQXW
 ik29Z1QNv3QIOs5U1pe5wyY22D51UJOfS7hVWH+pWPveG46ApYeUxqmfNYoG/Vtr
 cMspPzCSgZTAvkswCGfnSahlqCSpJxzuxUEkM4HlPTulrW4jbDk=
 =2cqI
 -----END PGP SIGNATURE-----

Merge tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:

 - lkdtm runner fixes to prevent dmesg clearing and shellcheck errors

 - ftrace test handling when test module doesn't exist

 - nsfs test fix to replace zero-length array with flexible-array

 - dmabuf-heaps test fix to return clear error value

* tag 'linux-kselftest-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests/lkdtm: Use grep -E instead of egrep
  selftests/lkdtm: Don't clear dmesg when running tests
  selftests/ftrace: mark irqsoff_tracer.tc test as unresolved if the test module does not exist
  tools/testing: Replace zero-length array with flexible-array
  kselftests: dmabuf-heaps: Fix confused return value on expected error testing
2020-05-15 12:57:50 -07:00
Linus Torvalds 67e45621af RISC-V Fixes for 5.7-rc6
This consists of a handful of build fixes, all found by Huawei's autobuilder.
 None of these patches should have any functional impact on kernels that build,
 and they're mostly related to various features intermingling with !MMU.  While
 some of these might be better hoisted to generic code, it seems better to have
 the simple fixes in the meanwhile.
 
 As far as I know these are the only outstanding patches for 5.7.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAl6+3AsTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYic+uD/9nfqXgmKUV+X0/Sr4L61fqOPPXAVY1
 k6JprJETUAMQdd2blGvuBcQqr76HQ6SgmBe3mQG2blL0XEyp54RWRAOwd5Ygz96P
 C8pQ+7n6Rmc1kA7U3RUkgGCWpCf6WHyq8wpYeZaPkPYBJz02wi8Czzwx8HHRNc+g
 r2hSlP2WT4UI6q1xWwKtwgQ8OpQXGvNHxG8DNkg0scm8CDWxO253FhbknrhK3Dx+
 2+kbUyqLdS0kQfPxcCooFkb05X4fCIwfbYYamNIfYGh8r8qNcNctw8j4maZ8/PcZ
 +Y+XuHHR/dck6oky9KNGE2K0X+P1yjbzsEb5rTs09GjDxpvtegGHRyngdeQ2GVXA
 8GeVTSpIfe76PcQ1yZ4FJTYZAp6yqIsRE+Brcp//AsWghZp7jnBL3TRJ+aV4u51V
 qU/UL+wIlUv2YIT9Kqe5D2/uki7IDdCqe/IZc0eoY+2seJ+Y5WbLwtz10f/Dfgt7
 6qBYsbjISsIOUqXrGsv4guvQuO0omwz55MikQkKWius5IRe8RkAFZWezBpD5BJ8q
 IyhRvtsIHjCxr/5V4KIw9RMeQXOtt1PI6CELvlRpu5XjoB3Q2M/2E2YzN7xGPGc7
 nuW5V9rAWhxAAta4kwtjb899AqFl3w6wt9c8znoVKzZHYOztF3a4gSnY7s3r1kfp
 klG2AHELH3jdfg==
 =8koH
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:
 "A handful of build fixes, all found by Huawei's autobuilder.

  None of these patches should have any functional impact on kernels
  that build, and they're mostly related to various features
  intermingling with !MMU.

  While some of these might be better hoisted to generic code, it seems
  better to have the simple fixes in the meanwhile.

  As far as I know these are the only outstanding patches for 5.7"

* tag 'riscv-for-linus-5.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: mmiowb: Fix implicit declaration of function 'smp_processor_id'
  riscv: pgtable: Fix __kernel_map_pages build error if NOMMU
  riscv: Make SYS_SUPPORTS_HUGETLBFS depends on MMU
  riscv: Disable ARCH_HAS_DEBUG_VIRTUAL if NOMMU
  riscv: Add pgprot_writecombine/device and PAGE_SHARED defination if NOMMU
  riscv: stacktrace: Fix undefined reference to `walk_stackframe'
  riscv: Fix unmet direct dependencies built based on SOC_VIRT
  riscv: perf: RISCV_BASE_PMU should be independent
  riscv: perf_event: Make some funciton static
2020-05-15 12:47:15 -07:00
David S. Miller 93d43e5868 Merge branch 'mptcp-fix-MP_JOIN-failure-handling'
Paolo Abeni says:

====================
mptcp: fix MP_JOIN failure handling

Currently if we hit an MP_JOIN failure on the third ack, the child socket is
closed with reset, but the request socket is not deleted, causing weird
behaviors.

The main problem is that MPTCP's MP_JOIN code needs to plug it's own
'valid 3rd ack' checks and the current TCP callbacks do not allow that.

This series tries to address the above shortcoming introducing a new MPTCP
specific bit in a 'struct tcp_request_sock' hole, and leveraging that to allow
tcp_check_req releasing the request socket when needed.

The above allows cleaning-up a bit current MPTCP hooking in tcp_check_req().

An alternative solution, possibly cleaner but more invasive, would be
changing the 'bool *own_req' syn_recv_sock() argument into 'int *req_status'
and let MPTCP set it to 'REQ_DROP'.

v1 -> v2:
 - be more conservative about drop_req initialization

RFC -> v1:
 - move the drop_req bit inside tcp_request_sock (Eric)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 12:30:13 -07:00
Paolo Abeni 729cd6436f mptcp: cope better with MP_JOIN failure
Currently, on MP_JOIN failure we reset the child
socket, but leave the request socket untouched.

tcp_check_req will deal with it according to the
'tcp_abort_on_overflow' sysctl value - by default the
req socket will stay alive.

The above leads to inconsistent behavior on MP JOIN
failure, and bad listener overflow accounting.

This patch addresses the issue leveraging the infrastructure
just introduced to ask the TCP stack to drop the req on
failure.

The child socket is not freed anymore by subflow_syn_recv_sock(),
instead it's moved to a dead state and will be disposed by the
next sock_put done by the TCP stack, so that listener overflow
accounting is not affected by MP JOIN failure.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 12:30:13 -07:00
Paolo Abeni 2f8a397d0a inet_connection_sock: factor out destroy helper.
Move the steps to prepare an inet_connection_sock for
forced disposal inside a separate helper. No functional
changes inteded, this will just simplify the next patch.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 12:30:13 -07:00
Paolo Abeni 90bf45134d mptcp: add new sock flag to deal with join subflows
MP_JOIN subflows must not land into the accept queue.
Currently tcp_check_req() calls an mptcp specific helper
to detect such scenario.

Such helper leverages the subflow context to check for
MP_JOIN subflows. We need to deal also with MP JOIN
failures, even when the subflow context is not available
due allocation failure.

A possible solution would be changing the syn_recv_sock()
signature to allow returning a more descriptive action/
error code and deal with that in tcp_check_req().

Since the above need is MPTCP specific, this patch instead
uses a TCP request socket hole to add a MPTCP specific flag.
Such flag is used by the MPTCP syn_recv_sock() to tell
tcp_check_req() how to deal with the request socket.

This change is a no-op for !MPTCP build, and makes the
MPTCP code simpler. It allows also the next patch to deal
correctly with MP JOIN failure.

v1 -> v2:
 - be more conservative on drop_req initialization (Mat)

RFC -> v1:
 - move the drop_req bit inside tcp_request_sock (Eric)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Reviewed-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 12:30:13 -07:00
Linus Torvalds 01d8a74803 Fix flush_icache_range() second argument in machine_kexec() to be an
address rather than size.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl6+z98ACgkQa9axLQDI
 XvFKOA//VFEAWOrCWS65DUn4TuonMG+foUp82zxaFZWK1epGCZoIEJQrSG6g5Png
 phL6Gx2oniqj+taLYZWqckMe+7k/yn9Xq+0DwDliMjaY9NB9bldE9J9ZGxpa4jBI
 qIrdaTgtRip1LQvm5BczmANGf86/p9E4nC6qHBJf9Stn9SYxoQuZlOGiwaK8XTe2
 QGn+NQ69ln7TkfUwEixhGI39dOHasrDAGgcUp/pTn2ki8mtpE2lG8+qSxxGTGTIT
 1SaJftER8JSc3VKjXKhOIaSwx1MqwTx0Cx4dTfAGhKzcQLdmXz0GDJjQCzU0weOP
 yHyH1gbJYUmB5X+CSA2OKssxsxcckVNMvTm7Jnn3Pb0f9oHVZQV4UR8cE/G40/hh
 c6OHAJXfmGla5GhNpRUXB5SGJyWDxaIrdZRXbDPCWGOqWxwIJyFWNz5h4kiVsyAe
 OJBtgzCqw3oUkr7KXgz0EECWV+T22+Y5cBbolptpqm7m6JbDQf/iviVJV5tkXnLF
 vkbZEx94Ajy4U4TZkQBWlJFN8M4Z+ttYuGYxebnurNI2Ottb0czGFypx+WXdGItp
 9rBaklIfzsCPoXZH0iHtaRxbDl4avwS46yKndtb2Y6tGoqqSNW4A6rf+S3YxvqBN
 bEBxMqxCuMxPWNPBb3bXpw4QlFo4LLkWQeAvbc59SROep35cPTo=
 =BoxO
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:
 "Fix flush_icache_range() second argument in machine_kexec() to be an
  address rather than size"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix the flush_icache_range arguments in machine_kexec
2020-05-15 11:08:46 -07:00
Oleksij Rempel ca1c933bce net: phy: tja11xx: execute cable test on link up
A typical 100Base-T1 link should be always connected. If the link is in
a shot or open state, it is a failure. In most cases, we won't be able
to automatically handle this issue, but we need to log it or notify user
(if possible).

With this patch, the cable will be tested on "ip l s dev .. up" attempt
and send ethnl notification to the user space.

This patch was tested with TJA1102 PHY and "ethtool --monitor" command.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 11:03:03 -07:00
David S. Miller 8e1381049e Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Alexei Starovoitov says:

====================
pull-request: bpf 2020-05-15

The following pull-request contains BPF updates for your *net* tree.

We've added 9 non-merge commits during the last 2 day(s) which contain
a total of 14 files changed, 137 insertions(+), 43 deletions(-).

The main changes are:

1) Fix secid_to_secctx LSM hook default value, from Anders.

2) Fix bug in mmap of bpf array, from Andrii.

3) Restrict bpf_probe_read to archs where they work, from Daniel.

4) Enforce returning 0 for fentry/fexit progs, from Yonghong.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 10:57:21 -07:00
Kevin Lo b0ed0bbfb3 net: phy: broadcom: add support for BCM54811 PHY
The BCM54811 PHY shares many similarities with the already supported BCM54810
PHY but additionally requires some semi-unique configuration.

Signed-off-by: Kevin Lo <kevlo@kevlo.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 10:56:31 -07:00
David S. Miller d42d118cfc Merge branch 'cxgb4-improve-and-tune-TC-MQPRIO-offload'
Rahul Lakkireddy says:

====================
cxgb4: improve and tune TC-MQPRIO offload

Patch 1 improves the Tx path's credit request and recovery mechanism
when running under heavy load.

Patch 2 adds ability to tune the burst buffer sizes of all traffic
classes to improve performance for <= 1500 MTU, under heavy load.

Patch 3 adds support to track EOTIDs and dump software queue
contexts used by TC-MQPRIO offload.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-15 10:54:08 -07:00