Commit Graph

104711 Commits

Author SHA1 Message Date
Vladimir Oltean c57fe0037a net: mscc: ocelot: make use of all 63 PTP timestamp identifiers
At present, there is a problem when user space bombards a port with PTP
event frames which have TX timestamping requests (or when a tc-taprio
offload is installed on a port, which delays the TX timestamps by a
significant amount of time). The driver will happily roll over the 2-bit
timestamp ID and this will cause incorrect matches between an skb and
the TX timestamp collected from the FIFO.

The Ocelot switches have a 6-bit PTP timestamp identifier, and the value
63 is reserved, so that leaves identifiers 0-62 to be used.

The timestamp identifiers are selected by the REW_OP packet field, and
are actually shared between CPU-injected frames and frames which match a
VCAP IS2 rule that modifies the REW_OP. The hardware supports
partitioning between the two uses of the REW_OP field through the
PTP_ID_LOW and PTP_ID_HIGH registers, and by default reserves the PTP
IDs 0-3 for CPU-injected traffic and the rest for VCAP IS2.

The driver does not use VCAP IS2 to set REW_OP for 2-step timestamping,
and it also writes 0xffffffff to both PTP_ID_HIGH and PTP_ID_LOW in
ocelot_init_timestamp() which makes all timestamp identifiers available
to CPU injection.

Therefore, we can make use of all 63 timestamp identifiers, which should
allow more timestampable packets to be in flight on each port. This is
only part of the solution, more issues will be addressed in future changes.

Fixes: 4e3b0468e6 ("net: mscc: PTP Hardware Clock (PHC) support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:35:17 -07:00
Vladimir Oltean 4ac0567e40 net: dsa: sja1105: break dependency between dsa_port_is_sja1105 and switch driver
It's nice to be able to test a tagging protocol with dsa_loop, but not
at the cost of losing the ability of building the tagging protocol and
switch driver as modules, because as things stand, there is a circular
dependency between the two. Tagging protocol drivers cannot depend on
switch drivers, that is a hard fact.

The reasoning behind the blamed patch was that accessing dp->priv should
first make sure that the structure behind that pointer is what we really
think it is.

Currently the "sja1105" and "sja1110" tagging protocols only operate
with the sja1105 switch driver, just like any other tagging protocol and
switch combination. The only way to mix and match them is by modifying
the code, and this applies to dsa_loop as well (by default that uses
DSA_TAG_PROTO_NONE). So while in principle there is an issue, in
practice there isn't one.

Until we extend dsa_loop to allow user space configuration, treat the
problem as a non-issue and just say that DSA ports found by tag_sja1105
are always sja1105 ports, which is in fact true. But keep the
dsa_port_is_sja1105 function so that it's easy to patch it during
testing, and rely on dead code elimination.

Fixes: 994d2cbb08 ("net: dsa: tag_sja1105: be dsa_loop-safe")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:33:36 -07:00
Vladimir Oltean 28da0555c3 net: dsa: move sja1110_process_meta_tstamp inside the tagging protocol driver
The problem is that DSA tagging protocols really must not depend on the
switch driver, because this creates a circular dependency at insmod
time, and the switch driver will effectively not load when the tagging
protocol driver is missing.

The code was structured in the way it was for a reason, though. The DSA
driver-facing API for PTP timestamping relies on the assumption that
two-step TX timestamps are provided by the hardware in an out-of-band
manner, typically by raising an interrupt and making that timestamp
available inside some sort of FIFO which is to be accessed over
SPI/MDIO/etc.

So the API puts .port_txtstamp into dsa_switch_ops, because it is
expected that the switch driver needs to save some state (like put the
skb into a queue until its TX timestamp arrives).

On SJA1110, TX timestamps are provided by the switch as Ethernet
packets, so this makes them be received and processed by the tagging
protocol driver. This in itself is great, because the timestamps are
full 64-bit and do not require reconstruction, and since Ethernet is the
fastest I/O method available to/from the switch, PTP timestamps arrive
very quickly, no matter how bottlenecked the SPI connection is, because
SPI interaction is not needed at all.

DSA's code structure and strict isolation between the tagging protocol
driver and the switch driver break the natural code organization.

When the tagging protocol driver receives a packet which is classified
as a metadata packet containing timestamps, it passes those timestamps
one by one to the switch driver, which then proceeds to compare them
based on the recorded timestamp ID that was generated in .port_txtstamp.

The communication between the tagging protocol and the switch driver is
done through a method exported by the switch driver, sja1110_process_meta_tstamp.
To satisfy build requirements, we force a dependency to build the
tagging protocol driver as a module when the switch driver is a module.
However, as explained in the first paragraph, that causes the circular
dependency.

To solve this, move the skb queue from struct sja1105_private :: struct
sja1105_ptp_data to struct sja1105_private :: struct sja1105_tagger_data.
The latter is a data structure for which hacks have already been put
into place to be able to create persistent storage per switch that is
accessible from the tagging protocol driver (see sja1105_setup_ports).

With the skb queue directly accessible from the tagging protocol driver,
we can now move sja1110_process_meta_tstamp into the tagging driver
itself, and avoid exporting a symbol.

Fixes: 566b18c8b7 ("net: dsa: sja1105: implement TX timestamping for SJA1110")
Link: https://lore.kernel.org/netdev/20210908220834.d7gmtnwrorhharna@skbuf/
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 17:33:36 -07:00
Baowen Zheng 60d950f443 nfp: flow_offload: move flow_indr_dev_register from app init to app start
In commit 74fc4f8287 ("net: Fix offloading indirect devices dependency
on qdisc order creation"), it adds a process to trigger the callback to
setup the bo callback when the driver regists a callback.

In our current implement, we are not ready to run the callback when nfp
call the function flow_indr_dev_register, then there will be error
message as:

kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 0 PID: 14119 Comm: kworker/0:0 Tainted: G
kernel: Workqueue: events work_for_cpu_fn
kernel: RIP: 0010:nfp_flower_indr_setup_tc_cb+0x258/0x410
kernel: RSP: 0018:ffffbc1e02c57bf8 EFLAGS: 00010286
kernel: RAX: 0000000000000000 RBX: ffff9c761fabc000 RCX: 0000000000000001
kernel: RDX: 0000000000000001 RSI: fffffffffffffff0 RDI: ffffffffc0be9ef1
kernel: RBP: ffffbc1e02c57c58 R08: ffffffffc08f33aa R09: ffff9c6db7478800
kernel: R10: 0000009c003f6e00 R11: ffffbc1e02800000 R12: ffffbc1e000d9000
kernel: R13: ffffbc1e000db428 R14: ffff9c6db7478800 R15: ffff9c761e884e80
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: fffffffffffffff0 CR3: 00000009e260a004 CR4: 00000000007706f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: PKRU: 55555554
kernel: Call Trace:
kernel: ? flow_indr_dev_register+0xab/0x210
kernel: ? __cond_resched+0x15/0x30
kernel: ? kmem_cache_alloc_trace+0x44/0x4b0
kernel: ? nfp_flower_setup_tc+0x1d0/0x1d0 [nfp]
kernel: flow_indr_dev_register+0x158/0x210
kernel: ? tcf_block_unbind+0xe0/0xe0
kernel: nfp_flower_init+0x40b/0x650 [nfp]
kernel: nfp_net_pci_probe+0x25f/0x960 [nfp]
kernel: ? nfp_rtsym_read_le+0x76/0x130 [nfp]
kernel: nfp_pci_probe+0x6a9/0x820 [nfp]
kernel: local_pci_probe+0x45/0x80

So we need to call flow_indr_dev_register in app start process instead of
init stage.

Fixes: 74fc4f8287 ("net: Fix offloading indirect devices dependency on qdisc order creation")
Signed-off-by: Baowen Zheng <baowen.zheng@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Louis Peens <louis.peens@corigine.com>
Link: https://lore.kernel.org/r/20211012124850.13025-1-louis.peens@corigine.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-12 16:07:52 -07:00
Jacob Keller 4d4a223a86 ice: fix locking for Tx timestamp tracking flush
Commit 4dd0d5c33c ("ice: add lock around Tx timestamp tracker flush")
added a lock around the Tx timestamp tracker flow which is used to
cleanup any left over SKBs and prepare for device removal.

This lock is problematic because it is being held around a call to
ice_clear_phy_tstamp. The clear function takes a mutex to send a PHY
write command to firmware. This could lead to a deadlock if the mutex
actually sleeps, and causes the following warning on a kernel with
preemption debugging enabled:

[  715.419426] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:573
[  715.427900] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3100, name: rmmod
[  715.435652] INFO: lockdep is turned off.
[  715.439591] Preemption disabled at:
[  715.439594] [<0000000000000000>] 0x0
[  715.446678] CPU: 52 PID: 3100 Comm: rmmod Tainted: G        W  OE     5.15.0-rc4+ #42 bdd7ec3018e725f159ca0d372ce8c2c0e784891c
[  715.458058] Hardware name: Intel Corporation S2600STQ/S2600STQ, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
[  715.468483] Call Trace:
[  715.470940]  dump_stack_lvl+0x6a/0x9a
[  715.474613]  ___might_sleep.cold+0x224/0x26a
[  715.478895]  __mutex_lock+0xb3/0x1440
[  715.482569]  ? stack_depot_save+0x378/0x500
[  715.486763]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.494979]  ? kfree+0xc1/0x520
[  715.498128]  ? mutex_lock_io_nested+0x12a0/0x12a0
[  715.502837]  ? kasan_set_free_info+0x20/0x30
[  715.507110]  ? __kasan_slab_free+0x10b/0x140
[  715.511385]  ? slab_free_freelist_hook+0xc7/0x220
[  715.516092]  ? kfree+0xc1/0x520
[  715.519235]  ? ice_deinit_lag+0x16c/0x220 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.527359]  ? ice_remove+0x1cf/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.535133]  ? pci_device_remove+0xab/0x1d0
[  715.539318]  ? __device_release_driver+0x35b/0x690
[  715.544110]  ? driver_detach+0x214/0x2f0
[  715.548035]  ? bus_remove_driver+0x11d/0x2f0
[  715.552309]  ? pci_unregister_driver+0x26/0x250
[  715.556840]  ? ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.564799]  ? __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[  715.570554]  ? do_syscall_64+0x3b/0x90
[  715.574303]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[  715.579529]  ? start_flush_work+0x542/0x8f0
[  715.583719]  ? ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.591923]  ice_sq_send_cmd+0x78/0x14c0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.599960]  ? wait_for_completion_io+0x250/0x250
[  715.604662]  ? lock_acquire+0x196/0x200
[  715.608504]  ? do_raw_spin_trylock+0xa5/0x160
[  715.612864]  ice_sbq_rw_reg+0x1e6/0x2f0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.620813]  ? ice_reset+0x130/0x130 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.628497]  ? __debug_check_no_obj_freed+0x1e8/0x3c0
[  715.633550]  ? trace_hardirqs_on+0x1c/0x130
[  715.637748]  ice_write_phy_reg_e810+0x70/0xf0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.646220]  ? do_raw_spin_trylock+0xa5/0x160
[  715.650581]  ? ice_ptp_release+0x910/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.658797]  ? ice_ptp_release+0x255/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.667013]  ice_clear_phy_tstamp+0x2c/0x110 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.675403]  ice_ptp_release+0x408/0x910 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.683440]  ice_remove+0x560/0x6a0 [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.691037]  ? _raw_spin_unlock_irqrestore+0x46/0x73
[  715.696005]  pci_device_remove+0xab/0x1d0
[  715.700018]  __device_release_driver+0x35b/0x690
[  715.704637]  driver_detach+0x214/0x2f0
[  715.708389]  bus_remove_driver+0x11d/0x2f0
[  715.712489]  pci_unregister_driver+0x26/0x250
[  715.716857]  ice_module_exit+0xc/0x2f [ice 9a7e1ec00971c89ecd3fe0d4dc7da2b3786a421d]
[  715.724637]  __do_sys_delete_module.constprop.0+0x2d8/0x4e0
[  715.730210]  ? free_module+0x6d0/0x6d0
[  715.733963]  ? task_work_run+0xe1/0x170
[  715.737803]  ? exit_to_user_mode_loop+0x17f/0x1d0
[  715.742509]  ? rcu_read_lock_sched_held+0x12/0x80
[  715.747215]  ? trace_hardirqs_on+0x1c/0x130
[  715.751401]  do_syscall_64+0x3b/0x90
[  715.754981]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[  715.760033] RIP: 0033:0x7f4dfe59000b
[  715.763612] Code: 73 01 c3 48 8b 0d 6d 1e 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 3d 1e 0c 00 f7 d8 64 89 01 48
[  715.782357] RSP: 002b:00007ffe8c891708 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
[  715.789923] RAX: ffffffffffffffda RBX: 00005558a20468b0 RCX: 00007f4dfe59000b
[  715.797054] RDX: 000000000000000a RSI: 0000000000000800 RDI: 00005558a2046918
[  715.804189] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[  715.811319] R10: 00007f4dfe603ac0 R11: 0000000000000206 R12: 00007ffe8c891940
[  715.818455] R13: 00007ffe8c8920a3 R14: 00005558a20462a0 R15: 00005558a20468b0

Notice that this is the only case where we use the lock in this way. In
the cleanup kthread and work kthread the lock is only taken around the
bit accesses. This was done intentionally to avoid this kind of issue.
The way the lock is used, we only protect ordering of bit sets vs bit
clears. The Tx writers in the hot path don't need to be protected
against the entire kthread loop. The Tx queues threads only need to
ensure that they do not re-use an index that is currently in use. The
cleanup loop does not need to block all new set bits, since it will
re-queue itself if new timestamps are present.

Fix the tracker flow so that it uses the same flow as the standard
cleanup thread. In addition, ensure the in_use bitmap actually gets
cleared properly.

This fixes the warning and also avoids the potential deadlock that might
have occurred otherwise.

Fixes: 4dd0d5c33c ("ice: add lock around Tx timestamp tracker flush")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 12:10:39 +01:00
Arun Ramadoss ef1100ef20 net: dsa: microchip: Added the condition for scheduling ksz_mib_read_work
When the ksz module is installed and removed using rmmod, kernel crashes
with null pointer dereferrence error. During rmmod, ksz_switch_remove
function tries to cancel the mib_read_workqueue using
cancel_delayed_work_sync routine and unregister switch from dsa.

During dsa_unregister_switch it calls ksz_mac_link_down, which in turn
reschedules the workqueue since mib_interval is non-zero.
Due to which queue executed after mib_interval and it tries to access
dp->slave. But the slave is unregistered in the ksz_switch_remove
function. Hence kernel crashes.

To avoid this crash, before canceling the workqueue, resetted the
mib_interval to 0.

v1 -> v2:
-Removed the if condition in ksz_mib_read_work

Fixes: 469b390e1b ("net: dsa: microchip: use delayed_work instead of timer + work")
Signed-off-by: Arun Ramadoss <arun.ramadoss@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:35:53 +01:00
Vegard Nossum 9973a43012 r8152: select CRC32 and CRYPTO/CRYPTO_HASH/CRYPTO_SHA256
Fix the following build/link errors by adding a dependency on
CRYPTO, CRYPTO_HASH, CRYPTO_SHA256 and CRC32:

  ld: drivers/net/usb/r8152.o: in function `rtl8152_fw_verify_checksum':
  r8152.c:(.text+0x2b2a): undefined reference to `crypto_alloc_shash'
  ld: r8152.c:(.text+0x2bed): undefined reference to `crypto_shash_digest'
  ld: r8152.c:(.text+0x2c50): undefined reference to `crypto_destroy_tfm'
  ld: drivers/net/usb/r8152.o: in function `_rtl8152_set_rx_mode':
  r8152.c:(.text+0xdcb0): undefined reference to `crc32_le'

Fixes: 9370f2d05a ("r8152: support request_firmware for RTL8153")
Fixes: ac718b6930 ("net/usb: new driver for RTL8152")
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:33:32 +01:00
Maarten Zanders 4a3e0aeddf net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's
mv88e6xxx_port_ppu_updates() interpretes data in the PORT_STS
register incorrectly for internal ports (ie no PPU). In these
cases, the PHY_DETECT bit indicates link status. This results
in forcing the MAC state whenever the PHY link goes down which
is not intended. As a side effect, LED's configured to show
link status stay lit even though the physical link is down.

Add a check in mac_link_down and mac_link_up to see if it
concerns an external port and only then, look at PPU status.

Fixes: 5d5b231da7 (net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down)
Reported-by: Maarten Zanders <m.zanders@televic.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: Maarten Zanders <maarten.zanders@mind.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:32:14 +01:00
Wan Jiabing 74a3bc42fe net: mscc: ocelot: Fix dumplicated argument in ocelot
Fix the following coccicheck warning:
drivers/net/ethernet/mscc/ocelot.c:474:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot.c:476:duplicated argument to & or |
drivers/net/ethernet/mscc/ocelot_net.c:1627:duplicated argument
to & or |

These DEV_CLOCK_CFG_MAC_TX_RST are duplicate here.
Here should be DEV_CLOCK_CFG_MAC_RX_RST.

Fixes: e6e12df625 ("net: mscc: ocelot: convert to phylink")
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-12 11:21:55 +01:00
Xuan Zhuo 732b74d647 virtio-net: fix for skb_over_panic inside big mode
commit 126285651b ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
accidentally reverted the effect of
commit 1a8024239d ("virtio-net: fix for skb_over_panic inside big mode")
on drivers/net/virtio_net.c

As a result, users of crosvm (which is using large packet mode)
are experiencing crashes with 5.14-rc1 and above that do not
occur with 5.13.

Crash trace:

[   61.346677] skbuff: skb_over_panic: text:ffffffff881ae2c7 len:3762 put:3762 head:ffff8a5ec8c22000 data:ffff8a5ec8c22010 tail:0xec2 end:0xec0 dev:<NULL>
[   61.369192] kernel BUG at net/core/skbuff.c:111!
[   61.372840] invalid opcode: 0000 [#1] SMP PTI
[   61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1
[   61.376450] Hardware name: ChromiumOS crosvm, BIOS 0

..

[   61.393635] Call Trace:
[   61.394127]  <IRQ>
[   61.394488]  skb_put.cold+0x10/0x10
[   61.395095]  page_to_skb+0xf7/0x410
[   61.395689]  receive_buf+0x81/0x1660
[   61.396228]  ? netif_receive_skb_list_internal+0x1ad/0x2b0
[   61.397180]  ? napi_gro_flush+0x97/0xe0
[   61.397896]  ? detach_buf_split+0x67/0x120
[   61.398573]  virtnet_poll+0x2cf/0x420
[   61.399197]  __napi_poll+0x25/0x150
[   61.399764]  net_rx_action+0x22f/0x280
[   61.400394]  __do_softirq+0xba/0x257
[   61.401012]  irq_exit_rcu+0x8e/0xb0
[   61.401618]  common_interrupt+0x7b/0xa0
[   61.402270]  </IRQ>

See
https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com
for the report.

Apply the original 1a8024239d ("virtio-net: fix for skb_over_panic inside big mode")
again, the original logic still holds:

In virtio-net's large packet mode, there is a hole in the space behind
buf.

    hdr_padded_len - hdr_len

We must take this into account when calculating tailroom.

Cc: Greg KH <gregkh@linuxfoundation.org>
Fixes: fb32856b16 ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Fixes: 126285651b ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: Corentin Noël <corentin.noel@collabora.com>
Tested-by: Corentin Noël <corentin.noel@collabora.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:50:33 +01:00
Florian Fainelli f49823939e net: phy: Do not shutdown PHYs in READY state
In case a PHY device was probed thus in the PHY_READY state, but not
configured and with no network device attached yet, we should not be
trying to shut it down because it has been brought back into reset by
phy_device_reset() towards the end of phy_probe() and anyway we have not
configured the PHY yet.

Fixes: e2f016cf77 ("net: phy: add a shutdown procedure")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:47:37 +01:00
chongjiapeng a5a14ea7b4 qed: Fix missing error code in qed_slowpath_start()
The error code is missing in this code scenario, add the error code
'-EINVAL' to the return value 'rc'.

Eliminate the follow smatch warning:

drivers/net/ethernet/qlogic/qed/qed_main.c:1298 qed_slowpath_start()
warn: missing error code 'rc'.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Fixes: d51e4af5c2 ("qed: aRFS infrastructure support")
Signed-off-by: chongjiapeng <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 13:46:41 +01:00
Shannon Nelson 5c976a5657 ionic: don't remove netdev->dev_addr when syncing uc list
Bridging, and possibly other upper stack gizmos, adds the
lower device's netdev->dev_addr to its own uc list, and
then requests it be deleted when the upper bridge device is
removed.  This delete request also happens with the bridging
vlan_filtering is enabled and then disabled.

Bonding has a similar behavior with the uc list, but since it
also uses set_mac to manage netdev->dev_addr, it doesn't have
the same the failure case.

Because we store our netdev->dev_addr in our uc list, we need
to ignore the delete request from dev_uc_sync so as to not
lose the address and all hope of communicating.  Note that
ndo_set_mac_address is expressly changing netdev->dev_addr,
so no limitation is set there.

Fixes: 2a654540be ("ionic: Add Rx filter and rx_mode ndo support")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-09 11:56:59 +01:00
Haiyang Zhang be0499369d net: mana: Fix error handling in mana_create_rxq()
Fix error handling in mana_create_rxq() when
cq->gdma_id >= gc->max_num_cqs.

Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://lore.kernel.org/r/1633698691-31721-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 17:00:04 -07:00
Vladimir Oltean 5bded8259e net: dsa: mv88e6xxx: isolate the ATU databases of standalone and bridged ports
Similar to commit 6087175b79 ("net: dsa: mt7530: use independent VLAN
learning on VLAN-unaware bridges"), software forwarding between an
unoffloaded LAG port (a bonding interface with an unsupported policy)
and a mv88e6xxx user port directly under a bridge is broken.

We adopt the same strategy, which is to make the standalone ports not
find any ATU entry learned on a bridge port.

Theory: the mv88e6xxx ATU is looked up by FID and MAC address. There are
as many FIDs as VIDs (4096). The FID is derived from the VID when
possible (the VTU maps a VID to a FID), with a fallback to the port
based default FID value when not (802.1Q Mode is disabled on the port,
or the classified VID isn't present in the VTU).

The mv88e6xxx driver makes the following use of FIDs and VIDs:

- the port's DefaultVID (to which untagged & pvid-tagged packets get
  classified) is 0 and is absent from the VTU, so this kind of packets is
  processed in FID 0, the default FID assigned by mv88e6xxx_setup_port.

- every time a bridge VLAN is created, mv88e6xxx_port_vlan_join() ->
  mv88e6xxx_atu_new() associates a FID with that VID which increases
  linearly starting from 1. Like this:

  bridge vlan add dev lan0 vid 100 # FID 1
  bridge vlan add dev lan1 vid 100 # still FID 1
  bridge vlan add dev lan2 vid 1024 # FID 2

The FID allocation made by the driver is sub-optimal for the following
reasons:

(a) A standalone port has a DefaultPVID of 0 and a default FID of 0 too.
    A VLAN-unaware bridged port has a DefaultPVID of 0 and a default FID
    of 0 too. The difference is that the bridged ports may learn ATU
    entries, while the standalone port has the requirement that it must
    not, and must not find them either. Standalone ports must not use
    the same FID as ports belonging to a bridge. All standalone ports
    can use the same FID, since the ATU will never have an entry in
    that FID.

(b) Multiple VLAN-unaware bridges will all use a DefaultPVID of 0 and a
    default FID of 0 on all their ports. The FDBs will not be isolated
    between these bridges. Every VLAN-unaware bridge must use the same
    FID on all its ports, different from the FID of other bridge ports.

(c) Each bridge VLAN uses a unique FID which is useful for Independent
    VLAN Learning, but the same VLAN ID on multiple VLAN-aware bridges
    will result in the same FID being used by mv88e6xxx_atu_new().
    The correct behavior is for VLAN 1 in br0 to have a different FID
    compared to VLAN 1 in br1.

This patch cannot fix all the above. Traditionally the DSA framework did
not care about this, and the reality is that DSA core involvement is
needed for the aforementioned issues to be solved. The only thing we can
solve here is an issue which does not require API changes, and that is
issue (a), aka use a different FID for standalone ports vs ports under
VLAN-unaware bridges.

The first step is deciding what VID and FID to use for standalone ports,
and what VID and FID for bridged ports. The 0/0 pair for standalone
ports is what they used up till now, let's keep using that. For bridged
ports, there are 2 cases:

- VLAN-aware ports will never end up using the port default FID, because
  packets will always be classified to a VID in the VTU or dropped
  otherwise. The FID is the one associated with the VID in the VTU.

- On VLAN-unaware ports, we _could_ leave their DefaultVID (pvid) at
  zero (just as in the case of standalone ports), and just change the
  port's default FID from 0 to a different number (say 1).

However, Tobias points out that there is one more requirement to cater to:
cross-chip bridging. The Marvell DSA header does not carry the FID in
it, only the VID. So once a packet crosses a DSA link, if it has a VID
of zero it will get classified to the default FID of that cascade port.
Relying on a port default FID for upstream cascade ports results in
contradictions: a default FID of 0 breaks ATU isolation of bridged ports
on the downstream switch, a default FID of 1 breaks standalone ports on
the downstream switch.

So not only must standalone ports have different FIDs compared to
bridged ports, they must also have different DefaultVID values.
IEEE 802.1Q defines two reserved VID values: 0 and 4095. So we simply
choose 4095 as the DefaultVID of ports belonging to VLAN-unaware
bridges, and VID 4095 maps to FID 1.

For the xmit operation to look up the same ATU database, we need to put
VID 4095 in DSA tags sent to ports belonging to VLAN-unaware bridges
too. All shared ports are configured to map this VID to the bridging
FID, because they are members of that VLAN in the VTU. Shared ports
don't need to have 802.1QMode enabled in any way, they always parse the
VID from the DSA header, they don't need to look at the 802.1Q header.

We install VID 4095 to the VTU in mv88e6xxx_setup_port(), with the
mention that mv88e6xxx_vtu_setup() which was located right below that
call was flushing the VTU so those entries wouldn't be preserved.
So we need to relocate the VTU flushing prior to the port initialization
during ->setup(). Also note that this is why it is safe to assume that
VID 4095 will get associated with FID 1: the user ports haven't been
created, so there is no avenue for the user to create a bridge VLAN
which could otherwise race with the creation of another FID which would
otherwise use up the non-reserved FID value of 1.

[ Currently mv88e6xxx_port_vlan_join() doesn't have the option of
  specifying a preferred FID, it always calls mv88e6xxx_atu_new(). ]

mv88e6xxx_port_db_load_purge() is the function to access the ATU for
FDB/MDB entries, and it used to determine the FID to use for
VLAN-unaware FDB entries (VID=0) using mv88e6xxx_port_get_fid().
But the driver only called mv88e6xxx_port_set_fid() once, during probe,
so no surprises, the port FID was always 0, the call to get_fid() was
redundant. As much as I would have wanted to not touch that code, the
logic is broken when we add a new FID which is not the port-based
default. Now the port-based default FID only corresponds to standalone
ports, and FDB/MDB entries belong to the bridging service. So while in
the future, when the DSA API will support FDB isolation, we will have to
figure out the FID based on the bridge number, for now there's a single
bridging FID, so hardcode that.

Lastly, the tagger needs to check, when it is transmitting a VLAN
untagged skb, whether it is sending it towards a bridged or a standalone
port. When we see it is bridged we assume the bridge is VLAN-unaware.
Not because it cannot be VLAN-aware but:

- if we are transmitting from a VLAN-aware bridge we are likely doing so
  using TX forwarding offload. That code path guarantees that skbs have
  a vlan hwaccel tag in them, so we would not enter the "else" branch
  of the "if (skb->protocol == htons(ETH_P_8021Q))" condition.

- if we are transmitting on behalf of a VLAN-aware bridge but with no TX
  forwarding offload (no PVT support, out of space in the PVT, whatever),
  we would indeed be transmitting with VLAN 4095 instead of the bridge
  device's pvid. However we would be injecting a "From CPU" frame, and
  the switch won't learn from that - it only learns from "Forward" frames.
  So it is inconsequential for address learning. And VLAN 4095 is
  absolutely enough for the frame to exit the switch, since we never
  remove that VLAN from any port.

Fixes: 57e661aae6 ("net: dsa: mv88e6xxx: Link aggregation support")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 15:47:46 -07:00
Vladimir Oltean 8b6836d824 net: dsa: mv88e6xxx: keep the pvid at 0 when VLAN-unaware
The VLAN support in mv88e6xxx has a loaded history. Commit 2ea7a679ca
("net: dsa: Don't add vlans when vlan filtering is disabled") noticed
some issues with VLAN and decided the best way to deal with them was to
make the DSA core ignore VLANs added by the bridge while VLAN awareness
is turned off. Those issues were never explained, just presented as
"at least one corner case".

That approach had problems of its own, presented by
commit 54a0ed0df4 ("net: dsa: provide an option for drivers to always
receive bridge VLANs") for the DSA core, followed by
commit 1fb7419198 ("net: dsa: mv88e6xxx: fix vlan setup") which
applied ds->configure_vlan_while_not_filtering = true for mv88e6xxx in
particular.

We still don't know what corner case Andrew saw when he wrote
commit 2ea7a679ca ("net: dsa: Don't add vlans when vlan filtering is
disabled"), but Tobias now reports that when we use TX forwarding
offload, pinging an external station from the bridge device is broken if
the front-facing DSA user port has flooding turned off. The full
description is in the link below, but for short, when a mv88e6xxx port
is under a VLAN-unaware bridge, it inherits that bridge's pvid.
So packets ingressing a user port will be classified to e.g. VID 1
(assuming that value for the bridge_default_pvid), whereas when
tag_dsa.c xmits towards a user port, it always sends packets using a VID
of 0 if that port is standalone or under a VLAN-unaware bridge - or at
least it did so prior to commit d82f8ab0d8 ("net: dsa: tag_dsa:
offload the bridge forwarding process").

In any case, when there is a conversation between the CPU and a station
connected to a user port, the station's MAC address is learned in VID 1
but the CPU tries to transmit through VID 0. The packets reach the
intended station, but via flooding and not by virtue of matching the
existing ATU entry.

DSA has established (and enforced in other drivers: sja1105, felix,
mt7530) that a VLAN-unaware port should use a private pvid, and not
inherit the one from the bridge. The bridge's pvid should only be
inherited when that bridge is VLAN-aware, so all state transitions need
to be handled. On the other hand, all bridge VLANs should sit in the VTU
starting with the moment when the bridge offloads them via switchdev,
they are just not used.

This solves the problem that Tobias sees because packets ingressing on
VLAN-unaware user ports now get classified to VID 0, which is also the
VID used by tag_dsa.c on xmit.

Fixes: d82f8ab0d8 ("net: dsa: tag_dsa: offload the bridge forwarding process")
Link: https://patchwork.kernel.org/project/netdevbpf/patch/20211003222312.284175-2-vladimir.oltean@nxp.com/#24491503
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-08 15:47:46 -07:00
Herve Codina 9cb1d19f47 net: stmmac: add support for dwmac 3.40a
dwmac 3.40a is an old ip version that can be found on SPEAr3xx soc.

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08 16:22:39 +01:00
Herve Codina 075da584ba net: stmmac: fix get_hw_feature() on old hardware
Some old IPs do not provide the hardware feature register.
On these IPs, this register is read 0x00000000.

In old driver version, this feature was handled but a regression came
with the commit f10a6a3541 ("stmmac: rework get_hw_feature function").
Indeed, this commit removes the return value in dma->get_hw_feature().
This return value was used to indicate the validity of retrieved
information and used later on in stmmac_hw_init() to override
priv->plat data if this hardware feature were valid.

This patch restores the return code in ->get_hw_feature() in order
to indicate the hardware feature validity and override priv->plat
data only if this hardware feature is valid.

Fixes: f10a6a3541 ("stmmac: rework get_hw_feature function")
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-08 16:22:38 +01:00
Linus Torvalds 4a16df549d Networking fixes for 5.15-rc5, including fixes from xfrm, bpf,
netfilter, and wireless.
 
 Current release - regressions:
 
  - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting
    a new value in the middle of an enum
 
  - unix: fix an issue in unix_shutdown causing the other end
    read/write failures
 
  - phy: mdio: fix memory leak
 
 Current release - new code bugs:
 
  - mlx5e: improve MQPRIO resiliency against bad configs
 
 Previous releases - regressions:
 
  - bpf: fix integer overflow leading to OOB access in map element
    pre-allocation
 
  - stmmac: dwmac-rk: fix ethernet on rk3399 based devices
 
  - netfilter: conntrack: fix boot failure with nf_conntrack.enable_hooks=1
 
  - brcmfmac: revert using ISO3166 country code and 0 rev as fallback
 
  - i40e: fix freeing of uninitialized misc IRQ vector
 
  - iavf: fix double unlock of crit_lock
 
 Previous releases - always broken:
 
  - bpf, arm: fix register clobbering in div/mod implementation
 
  - netfilter: nf_tables: correct issues in netlink rule change
    event notifications
 
  - dsa: tag_dsa: fix mask for trunked packets
 
  - usb: r8152: don't resubmit rx immediately to avoid soft lockup
    on device unplug
 
  - i40e: fix endless loop under rtnl if FW fails to correctly
    respond to capability query
 
  - mlx5e: fix rx checksum offload coexistence with ipsec offload
 
  - mlx5: force round second at 1PPS out start time and allow it
    only in supported clock modes
 
  - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
    sequence
 
 Misc:
 
  - xfrm: slightly rejig the new policy uAPI to make it less cryptic
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmFfEpUACgkQMUZtbf5S
 Iru3vQ//fgm+pdDE860BXmLEgrbJTHU4rq/YD1vwZBcWw/i5wI1vnLr6BzZsdPNX
 DkhcKFGOZUTj+8ctuRDuqrkqoDjva6uRjwM0vcPWh5i9sGqJpKjxB3dFksyxJELR
 SnXM3Jmlk7YiGAw9Bi+66OuIwt2ouRLR/bNIwg8/qCnFI1efIF7IPeCpuvKCw/yd
 SOiSBIfuSPD1qcs5Sy4aHZqA8Xr9qbwDNwWQfFLXgNDKEiY7XOSdo3CoCddSxdR+
 2nmpOiz4w68wspapdZn3GSZHYrQh5kjz7b0Aru0Jvw86M79mKp3b9AfJ9uXTcJhp
 4cQBralLnQvLKanvKi1z5CI6NjXx+r6rXI43N6NjHOtjRUPoFMqxZEH0d7o11aT1
 sN3UDgtFtuE9Pfrhnc5ZHuHqFCCyxA6NWD6nt1dUoSEo0oWt9mOHfeoFT4+45fO0
 5no5+1q3EkYdH4jiJlavnM2DMvVzMd6FbxDzWpXJ2j4W1vM6TLkexEJIK4MLGxPV
 76lxeXzcvbM9a0vq5BabR4QbPIAv+A9qYPmXJwPVrvjo+zynwuWc3gMO5hc4EaOf
 FXF2Ka5Jn97jW8/JS7i7Gj6M8GKdyIxaFHgS4MLtJNs6pt3h7m6bSgcIOQZ5psBZ
 dKRYjM2lxeVkvDDmy5Gztkw2asbofYQP004tgagc+jwXP7DwXaY=
 =xzZg
 -----END PGP SIGNATURE-----

Merge tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from xfrm, bpf, netfilter, and wireless.

  Current release - regressions:

   - xfrm: fix XFRM_MSG_MAPPING ABI breakage caused by inserting a new
     value in the middle of an enum

   - unix: fix an issue in unix_shutdown causing the other end
     read/write failures

   - phy: mdio: fix memory leak

  Current release - new code bugs:

   - mlx5e: improve MQPRIO resiliency against bad configs

  Previous releases - regressions:

   - bpf: fix integer overflow leading to OOB access in map element
     pre-allocation

   - stmmac: dwmac-rk: fix ethernet on rk3399 based devices

   - netfilter: conntrack: fix boot failure with
     nf_conntrack.enable_hooks=1

   - brcmfmac: revert using ISO3166 country code and 0 rev as fallback

   - i40e: fix freeing of uninitialized misc IRQ vector

   - iavf: fix double unlock of crit_lock

  Previous releases - always broken:

   - bpf, arm: fix register clobbering in div/mod implementation

   - netfilter: nf_tables: correct issues in netlink rule change event
     notifications

   - dsa: tag_dsa: fix mask for trunked packets

   - usb: r8152: don't resubmit rx immediately to avoid soft lockup on
     device unplug

   - i40e: fix endless loop under rtnl if FW fails to correctly respond
     to capability query

   - mlx5e: fix rx checksum offload coexistence with ipsec offload

   - mlx5: force round second at 1PPS out start time and allow it only
     in supported clock modes

   - phy: pcs: xpcs: fix incorrect CL37 AN sequence, EEE disable
     sequence

  Misc:

   - xfrm: slightly rejig the new policy uAPI to make it less cryptic"

* tag 'net-5.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (66 commits)
  net: prefer socket bound to interface when not in VRF
  iavf: fix double unlock of crit_lock
  i40e: Fix freeing of uninitialized misc IRQ vector
  i40e: fix endless loop under rtnl
  dt-bindings: net: dsa: marvell: fix compatible in example
  ionic: move filter sync_needed bit set
  gve: report 64bit tx_bytes counter from gve_handle_report_stats()
  gve: fix gve_get_stats()
  rtnetlink: fix if_nlmsg_stats_size() under estimation
  gve: Properly handle errors in gve_assign_qpl
  gve: Avoid freeing NULL pointer
  gve: Correct available tx qpl check
  unix: Fix an issue in unix_shutdown causing the other end read/write failures
  net: stmmac: trigger PCS EEE to turn off on link down
  net: pcs: xpcs: fix incorrect steps on disable EEE
  netlink: annotate data races around nlk->bound
  net: pcs: xpcs: fix incorrect CL37 AN sequence
  net: sfp: Fix typo in state machine debug string
  net/sched: sch_taprio: properly cancel timer from taprio_destroy()
  net: bridge: fix under estimation in br_get_linkxstats_size()
  ...
2021-10-07 09:50:31 -07:00
Stefan Assmann 54ee39439a iavf: fix double unlock of crit_lock
The crit_lock mutex could be unlocked twice as reported here
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210823/025525.html

Remove the superfluous unlock. Technically the problem was already
present before 5ac49f3c27 as that commit only replaced the locking
primitive, but no functional change.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 5ac49f3c27 ("iavf: use mutexes for locking of critical sections")
Fixes: bac8486116 ("iavf: Refactor the watchdog state machine")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06 09:49:34 -07:00
Sylwester Dziedziuch 2e5a20573a i40e: Fix freeing of uninitialized misc IRQ vector
When VSI set up failed in i40e_probe() as part of PF switch set up
driver was trying to free misc IRQ vectors in
i40e_clear_interrupt_scheme and produced a kernel Oops:

   Trying to free already-free IRQ 266
   WARNING: CPU: 0 PID: 5 at kernel/irq/manage.c:1731 __free_irq+0x9a/0x300
   Workqueue: events work_for_cpu_fn
   RIP: 0010:__free_irq+0x9a/0x300
   Call Trace:
   ? synchronize_irq+0x3a/0xa0
   free_irq+0x2e/0x60
   i40e_clear_interrupt_scheme+0x53/0x190 [i40e]
   i40e_probe.part.108+0x134b/0x1a40 [i40e]
   ? kmem_cache_alloc+0x158/0x1c0
   ? acpi_ut_update_ref_count.part.1+0x8e/0x345
   ? acpi_ut_update_object_reference+0x15e/0x1e2
   ? strstr+0x21/0x70
   ? irq_get_irq_data+0xa/0x20
   ? mp_check_pin_attr+0x13/0xc0
   ? irq_get_irq_data+0xa/0x20
   ? mp_map_pin_to_irq+0xd3/0x2f0
   ? acpi_register_gsi_ioapic+0x93/0x170
   ? pci_conf1_read+0xa4/0x100
   ? pci_bus_read_config_word+0x49/0x70
   ? do_pci_enable_device+0xcc/0x100
   local_pci_probe+0x41/0x90
   work_for_cpu_fn+0x16/0x20
   process_one_work+0x1a7/0x360
   worker_thread+0x1cf/0x390
   ? create_worker+0x1a0/0x1a0
   kthread+0x112/0x130
   ? kthread_flush_work_fn+0x10/0x10
   ret_from_fork+0x1f/0x40

The problem is that at that point misc IRQ vectors
were not allocated yet and we get a call trace
that driver is trying to free already free IRQ vectors.

Add a check in i40e_clear_interrupt_scheme for __I40E_MISC_IRQ_REQUESTED
PF state before calling i40e_free_misc_vector. This state is set only if
misc IRQ vectors were properly initialized.

Fixes: c17401a1dd ("i40e: use separate state bit for miscellaneous IRQ setup")
Reported-by: PJ Waskiewicz <pwaskiewicz@jumptrading.com>
Signed-off-by: Sylwester Dziedziuch <sylwesterx.dziedziuch@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06 09:49:34 -07:00
Jiri Benc 857b6c6f66 i40e: fix endless loop under rtnl
The loop in i40e_get_capabilities can never end. The problem is that
although i40e_aq_discover_capabilities returns with an error if there's
a firmware problem, the returned error is not checked. There is a check for
pf->hw.aq.asq_last_status but that value is set to I40E_AQ_RC_OK on most
firmware problems.

When i40e_aq_discover_capabilities encounters a firmware problem, it will
encounter the same problem on its next invocation. As the result, the loop
becomes endless. We hit this with I40E_ERR_ADMIN_QUEUE_TIMEOUT but looking
at the code, it can happen with a range of other firmware errors.

I don't know what the correct behavior should be: whether the firmware
should be retried a few times, or whether pf->hw.aq.asq_last_status should
be always set to the encountered firmware error (but then it would be
pointless and can be just replaced by the i40e_aq_discover_capabilities
return value). However, the current behavior with an endless loop under the
rtnl mutex(!) is unacceptable and Intel has not submitted a fix, although we
explained the bug to them 7 months ago.

This may not be the best possible fix but it's better than hanging the whole
system on a firmware bug.

Fixes: 56a62fc868 ("i40e: init code and hardware support")
Tested-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-06 09:49:23 -07:00
Shannon Nelson 3707428dda ionic: move filter sync_needed bit set
Move the setting of the filter-sync-needed bit to the error
case in the filter add routine to be sure we're checking the
live filter status rather than a copy of the pre-sync status.

Fixes: 969f843946 ("ionic: sync the filters in the work task")
Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:15:05 +01:00
Eric Dumazet 17c37d748f gve: report 64bit tx_bytes counter from gve_handle_report_stats()
Each tx queue maintains a 64bit counter for bytes, there is
no reason to truncate this to 32bit (or this has not been
documented)

Fixes: 24aeb56f2d ("gve: Add Gvnic stats AQ command and ethtool show/set-priv-flags.")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yangchun Fu <yangchun@google.com>
Cc: Kuo Zhao <kuozhao@google.com>
Cc: David Awogbemila <awogbemila@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:11:51 +01:00
Eric Dumazet 2f57d4975f gve: fix gve_get_stats()
gve_get_stats() can report wrong numbers if/when u64_stats_fetch_retry()
returns true.

What is needed here is to sample values in temporary variables,
and only use them after each loop is ended.

Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Catherine Sullivan <csully@google.com>
Cc: Sagi Shahar <sagis@google.com>
Cc: Jon Olson <jonolson@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Luigi Rizzo <lrizzo@google.com>
Cc: Jeroen de Borst <jeroendb@google.com>
Cc: Tao Liu <xliutaox@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:11:51 +01:00
Catherine Sullivan d4b111fda6 gve: Properly handle errors in gve_assign_qpl
Ignored errors would result in crash.

Fixes: ede3fcf5ec ("gve: Add support for raw addressing to the rx path")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:04:26 +01:00
Tao Liu 922aa9bcac gve: Avoid freeing NULL pointer
Prevent possible crashes when cleaning up after unsuccessful
initializations.

Fixes: 893ce44df5 ("gve: Add basic driver framework for Compute Engine Virtual NIC")
Signed-off-by: Tao Liu <xliutaox@google.com>
Signed-off-by: Catherine Sully <csully@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:04:26 +01:00
Catherine Sullivan d03477ee10 gve: Correct available tx qpl check
The qpl_map_size is rounded up to a multiple of sizeof(long), but the
number of qpls doesn't have to be.

Fixes: f5cedc84a3 ("gve: Add transmit and receive support")
Signed-off-by: Catherine Sullivan <csully@google.com>
Signed-off-by: Jeroen de Borst <jeroendb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 15:04:26 +01:00
Wong Vee Khee d4aeaed80b net: stmmac: trigger PCS EEE to turn off on link down
The current implementation enable PCS EEE feature in the event of link
up, but PCS EEE feature is not disabled on link down.

This patch makes sure PCE EEE feature is disabled on link down.

Fixes: 656ed8b015 ("net: stmmac: fix EEE init issue when paired with EEE capable PHYs")
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 11:18:27 +01:00
Wong Vee Khee 590df78bc7 net: pcs: xpcs: fix incorrect steps on disable EEE
When Energy-Efficient Ethernet(EEE) is disable from the MAC side,
we need to clear the DW_VR_MII_EEE_TRN_LPI bit of DW_VR_MII_EEE_MCTRL1
register.

Fixes: 7617af3d1a ("net: pcs: Introducing support for DWC xpcs Energy Efficient Ethernet")
Cc: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-06 11:18:27 +01:00
Wong Vee Khee e3cf002d5a net: pcs: xpcs: fix incorrect CL37 AN sequence
According to Synopsys DesignWare Cores Ethernet PCS databook, it is
required to disable Clause 37 auto-negotiation by programming bit-12
(AN_ENABLE) to 0 if it is already enabled, before programming various
fields of VR_MII_AN_CTRL registers.

After all these programming are done, it is then required to enable
Clause 37 auto-negotiation by programming bit-12 (AN_ENABLE) to 1.

Fixes: b97b5331b8 ("net: pcs: add C37 SGMII AN support for intel mGbE controller")
Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:10:24 +01:00
Sean Anderson 25a9da6641 net: sfp: Fix typo in state machine debug string
The string should be "tx_disable" to match the state enum.

Fixes: 4005a7cb4f ("net: phy: sftp: print debug message with text, not numbers")
Signed-off-by: Sean Anderson <sean.anderson@seco.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 13:08:13 +01:00
Hayes Wang baf33d7a75 r8152: avoid to resubmit rx immediately
For the situation that the disconnect event comes very late when the
device is unplugged, the driver would resubmit the RX bulk transfer
after getting the callback with -EPROTO immediately and continually.
Finally, soft lockup occurs.

This patch avoids to resubmit RX immediately. It uses a workqueue to
schedule the RX NAPI. And the NAPI would resubmit the RX. It let the
disconnect event have opportunity to stop the submission before soft
lockup.

Reported-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Tested-by: Jason-ch Chen <jason-ch.chen@mediatek.com>
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-05 12:38:39 +01:00
Linus Torvalds 84928ce3bb Driver core fixes for 5.15-rc4
Here are some driver core and kernfs fixes for reported issues for
 5.15-rc4.  These fixes include:
 	- kernfs positive dentry bugfix
 	- debugfs_create_file_size error path fix
 	- cpumask sysfs file bugfix to preserve the user/kernel abi (has
 	  been reported multiple times.)
 	- devlink fixes for mdiobus devices as reported by the subsystem
 	  maintainers.
 
 Also included in here are some devlink debugging changes to make it
 easier for people to report problems when asked.  They have already
 helped with the mdiobus and other subsystems reporting issues.
 
 All of these have been linux-next for a while with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYVmC1A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykfCQCgtQ0kdPoPdUG6mPCn45rrbJQLBY4AnRL5JyhO
 zB60l6C2EHHnLnRxMnQq
 =gygB
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are some driver core and kernfs fixes for reported issues for
  5.15-rc4. These fixes include:

   - kernfs positive dentry bugfix

   - debugfs_create_file_size error path fix

   - cpumask sysfs file bugfix to preserve the user/kernel abi (has been
     reported multiple times.)

   - devlink fixes for mdiobus devices as reported by the subsystem
     maintainers.

  Also included in here are some devlink debugging changes to make it
  easier for people to report problems when asked. They have already
  helped with the mdiobus and other subsystems reporting issues.

  All of these have been linux-next for a while with no reported issues"

* tag 'driver-core-5.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: also call kernfs_set_rev() for positive dentry
  driver core: Add debug logs when fwnode links are added/deleted
  driver core: Create __fwnode_link_del() helper function
  driver core: Set deferred probe reason when deferred by driver core
  net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
  driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  driver core: fw_devlink: Improve handling of cyclic dependencies
  cpumask: Omit terminating null byte in cpumap_print_{list,bitmask}_to_buf
  debugfs: debugfs_create_file_size(): use IS_ERR to check for error
2021-10-03 11:10:09 -07:00
Punit Agrawal aec3f415f7 net: stmmac: dwmac-rk: Fix ethernet on rk3399 based devices
Commit 2d26f6e39a ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings")
while getting rid of a runtime PM warning ended up breaking ethernet
on rk3399 based devices. By dropping an extra reference to the device,
the commit ends up enabling suspend / resume of the ethernet device -
which appears to be broken.

While the issue with runtime pm is being investigated, partially
revert commit 2d26f6e39a to restore the network on rk3399.

Fixes: 2d26f6e39a ("net: stmmac: dwmac-rk: fix unbalanced pm_runtime_enable warnings")
Suggested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Punit Agrawal <punitagrawal@gmail.com>
Cc: Michael Riesch <michael.riesch@wolfvision.net>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20210929135049.3426058-1-punitagrawal@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 15:15:29 -07:00
Vladimir Oltean 019d9329e7 net: mscc: ocelot: fix VCAP filters remaining active after being deleted
When ocelot_flower.c calls ocelot_vcap_filter_add(), the filter has a
given filter->id.cookie. This filter is added to the block->rules list.

However, when ocelot_flower.c calls ocelot_vcap_block_find_filter_by_id()
which passes the cookie as argument, the filter is never found by
filter->id.cookie when searching through the block->rules list.

This is unsurprising, since the filter->id.cookie is an unsigned long,
but the cookie argument provided to ocelot_vcap_block_find_filter_by_id()
is a signed int, and the comparison fails.

Fixes: 50c6cc5b92 ("net: mscc: ocelot: store a namespaced VCAP filter ID")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20210930125330.2078625-1-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-01 15:13:20 -07:00
David S. Miller 78764f450b mlx5-fixes-2021-09-30
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAmFWJy0ACgkQSD+KveBX
 +j7wJQgAqlx3rBiJGYgtnEr/HHzBSgZXkIICGfoQHfiOORoNTjpq46Csxuiu2Fg1
 JvxB1Dahoht/8Vfcm+GooKKDRlkUXx8vYhQF3BSQ7oboVGkWZ1a6ZuEUSfW3rrag
 oBkSCuVcn6EwVBFF0NvAi9ARd2G5GAYsew9yQHnXVQzp9ZWqzZcMMsp3pdhQvKLu
 CL0SlY0uZhXHyHNl5Gz79xQDPwSyXbVlhhEK9lIaPi5tcqA3X1Y7ZTqP/ouDjZBj
 /VLCIYJZ40471c4a2YVLEXiVZu0E0BDgOLlEWRiHIkTrQf+gjcjOCOUT1CkMOuid
 F7Ny4xnVaCzCIx8V5i3FXAOfcOeOIA==
 =TCu2
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2021-09-30' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

mlx5-fixes-2021-09-30
2021-10-01 14:16:52 +01:00
David S. Miller 5abab4982d wireless-drivers fixes for v5.15
Second set of fixes for v5.15, nothing major this time. Most important
 here are reverting a brcmfmac regression and a fix for an old rare
 ath5k build error.
 
 iwlwifi
 
 * fixes to NULL dereference, off by one and missing unlock
 
 * add support for Killer AX1650 on Dell XPS 15 (9510) laptop
 
 ath5k
 
 * build fix with LEDS=m
 
 brcmfmac
 
 * revert a regression causing BCM4359/9 devices stop working as access point
 
 mwifiex
 
 * fix clang warning about null pointer arithmetic
 -----BEGIN PGP SIGNATURE-----
 
 iQFJBAABCgAzFiEEiBjanGPFTz4PRfLobhckVSbrbZsFAmFWy3oVHGt2YWxvQGNv
 ZGVhdXJvcmEub3JnAAoJEG4XJFUm622bBRYIAKBa0NeCkgX8Jnfj492SQj8E9B2o
 +fim/W1OcBQqJnfFKaTgEHIYvudnaLuOAiCnTpzvQ5+Dc+dvM3RaPz9c405uzhwj
 TaHYs8zoB+//VWTGwmL+ACBYpUCdlVHRzwREv6hCiprd7M57rs1IYkgDgQ8EFCop
 prm3z4+Gcr48cNB0CYY39xOEz8K0+pjq82DMK5DnBrfBhO9xjf9PybXqKwjnEcxm
 qoRApBjNtDXrQVYHg+AcF8XQ+Vix9H5kSOW+60nGMCtkQszxvu/SQi6hJWALY1U2
 0iFNGj/uY92wibQBenqW/nc96vq99HSWPPJeAD52KPXIWZFIUt3V5vJ8aRU=
 =hYgS
 -----END PGP SIGNATURE-----

Merge tag 'wireless-drivers-2021-10-01' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers

wireless-drivers fixes for v5.15

Second set of fixes for v5.15, nothing major this time. Most important
here are reverting a brcmfmac regression and a fix for an old rare
ath5k build error.

iwlwifi

* fixes to NULL dereference, off by one and missing unlock

* add support for Killer AX1650 on Dell XPS 15 (9510) laptop

ath5k

* build fix with LEDS=m

brcmfmac

* revert a regression causing BCM4359/9 devices stop working as access point

mwifiex

* fix clang warning about null pointer arithmetic
2021-10-01 14:16:29 +01:00
Pavel Skripkin ca6e11c337 phy: mdio: fix memory leak
Syzbot reported memory leak in MDIO bus interface, the problem was in
wrong state logic.

MDIOBUS_ALLOCATED indicates 2 states:
	1. Bus is only allocated
	2. Bus allocated and __mdiobus_register() fails, but
	   device_register() was called

In case of device_register() has been called we should call put_device()
to correctly free the memory allocated for this device, but mdiobus_free()
calls just kfree(dev) in case of MDIOBUS_ALLOCATED state

To avoid this behaviour we need to set bus->state to MDIOBUS_UNREGISTERED
_before_ calling device_register(), because put_device() should be
called even in case of device_register() failure.

Link: https://lore.kernel.org/netdev/YVMRWNDZDUOvQjHL@shell.armlinux.org.uk/
Fixes: 46abc02175 ("phylib: give mdio buses a device tree presence")
Reported-and-tested-by: syzbot+398e7dc692ddbbb4cfec@syzkaller.appspotmail.com
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/eceae1429fbf8fa5c73dd2a0d39d525aa905074d.1633024062.git.paskripkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 17:11:59 -07:00
Pavel Skripkin 10eff1f578 Revert "net: mdiobus: Fix memory leak in __mdiobus_register"
This reverts commit ab609f25d1.

This patch is correct in the sense that we _should_ call device_put() in
case of device_register() failure, but the problem in this code is more
vast.

We need to set bus->state to UNMDIOBUS_REGISTERED before calling
device_register() to correctly release the device in mdiobus_free().
This patch prevents us from doing it, since in case of device_register()
failure put_device() will be called 2 times and it will cause UAF or
something else.

Also, Reported-by: tag in revered commit was wrong, since syzbot
reported different leak in same function.

Link: https://lore.kernel.org/netdev/20210928092657.GI2048@kadam/
Acked-by: Yanfei Xu <yanfei.xu@windriver.com>
Signed-off-by: Pavel Skripkin <paskripkin@gmail.com>
Link: https://lore.kernel.org/r/f12fb1faa4eccf0f355788225335eb4309ff2599.1633024062.git.paskripkin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-09-30 17:11:58 -07:00
Aya Levin 3bf1742f3c net/mlx5e: Mutually exclude setting of TX-port-TS and MQPRIO in channel mode
TX-port-TS hijacks the PTP traffic to a specific HW TX-queue. This
conflicts with MQPRIO in channel mode, which specifies explicitly which
TC accepts the packet. This patch mutually excludes the above
configuration.

Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:57 -07:00
Lama Kayal dd1979cf3c net/mlx5e: Fix the presented RQ index in PTP stats
PTP-RQ counters title format contains PTP-RQ identifier, which is
mistakenly not passed to sprinft().
This leads to unexpected garbage values instead.
This patch fixes it.

Before applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq15_packets: 0
     ptp_rq8_bytes: 0
     ptp_rq6_csum_complete: 0
     ptp_rq14_csum_complete_tail: 0
     ptp_rq3_csum_complete_tail_slow : 0
     ptp_rq9_csum_unnecessary: 0
     ptp_rq1_csum_unnecessary_inner: 0
     ptp_rq7_csum_none: 0
     ptp_rq10_xdp_drop: 0
     ptp_rq9_xdp_redirect: 0
     ptp_rq13_lro_packets: 0
     ptp_rq12_lro_bytes: 0
     ptp_rq10_ecn_mark: 0
     ptp_rq9_removed_vlan_packets: 0
     ptp_rq5_wqe_err: 0
     ptp_rq8_mpwqe_filler_cqes: 0
     ptp_rq2_mpwqe_filler_strides: 0
     ptp_rq5_oversize_pkts_sw_drop: 0
     ptp_rq6_buff_alloc_err: 0
     ptp_rq15_cqe_compress_blks: 0
     ptp_rq2_cqe_compress_pkts: 0
     ptp_rq2_cache_reuse: 0
     ptp_rq12_cache_full: 0
     ptp_rq11_cache_empty: 256
     ptp_rq12_cache_busy: 0
     ptp_rq11_cache_waive: 0
     ptp_rq12_congst_umr: 0
     ptp_rq11_arfs_err: 0
     ptp_rq9_recover: 0

After applying the patch:
ethtool -S eth3 | grep ptp_rq
     ptp_rq0_packets: 0
     ptp_rq0_bytes: 0
     ptp_rq0_csum_complete: 0
     ptp_rq0_csum_complete_tail: 0
     ptp_rq0_csum_complete_tail_slow : 0
     ptp_rq0_csum_unnecessary: 0
     ptp_rq0_csum_unnecessary_inner: 0
     ptp_rq0_csum_none: 0
     ptp_rq0_xdp_drop: 0
     ptp_rq0_xdp_redirect: 0
     ptp_rq0_lro_packets: 0
     ptp_rq0_lro_bytes: 0
     ptp_rq0_ecn_mark: 0
     ptp_rq0_removed_vlan_packets: 0
     ptp_rq0_wqe_err: 0
     ptp_rq0_mpwqe_filler_cqes: 0
     ptp_rq0_mpwqe_filler_strides: 0
     ptp_rq0_oversize_pkts_sw_drop: 0
     ptp_rq0_buff_alloc_err: 0
     ptp_rq0_cqe_compress_blks: 0
     ptp_rq0_cqe_compress_pkts: 0
     ptp_rq0_cache_reuse: 0
     ptp_rq0_cache_full: 0
     ptp_rq0_cache_empty: 256
     ptp_rq0_cache_busy: 0
     ptp_rq0_cache_waive: 0
     ptp_rq0_congst_umr: 0
     ptp_rq0_arfs_err: 0
     ptp_rq0_recover: 0

Fixes: a28359e922 ("net/mlx5e: Add PTP-RX statistics")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:57 -07:00
Shay Drory f88c487634 net/mlx5: Fix setting number of EQs of SFs
When setting number of completion EQs of the SF, consider number of
online CPUs.
Without this consideration, when number of online cpus are less than 8,
unnecessary 8 completion EQs are allocated.

Fixes: c36326d38d ("net/mlx5: Round-Robin EQs over IRQs")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Shay Drory ac8b7d50ae net/mlx5: Fix length of irq_index in chars
The maximum irq_index can be 2047, This means irq_name should have 4
characters reserve for the irq_index. Hence, increase it to 4.

Fixes: 3af26495a2 ("net/mlx5: Enlarge interrupt field in CREATE_EQ")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Aya Levin 99b9a678b2 net/mlx5: Avoid generating event after PPS out in Real time mode
When in Real-time mode, HW clock is synced with the PTP daemon. Hence
driver should not re-calibrate the next pulse (via MTPPSE repetitive
events mechanism).

This patch arms repetitive events only in free-running mode.

Fixes: 432119de33 ("net/mlx5: Add cyc2time HW translation mode support")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:56 -07:00
Aya Levin 6472829470 net/mlx5: Force round second at 1PPS out start time
Allow configuration of 1PPS start time only with time-stamp representing
a round second. Prior to this patch driver allowed setting of a
non-round-second which is not supported by the device. Avoid unexpected
behavior by restricting start-time configuration to a round-second.

Fixes: 4272f9b88d ("net/mlx5e: Change 1PPS out scheme")
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Moshe Shemesh a586775f83 net/mlx5: E-Switch, Fix double allocation of acl flow counter
Flow counter is allocated in eswitch legacy acl setting functions
without checking if already allocated by previous setting. Add a check
to avoid such double allocation.

Fixes: 07bab95026 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes")
Fixes: ea651a86d4 ("net/mlx5: E-Switch, Refactor eswitch egress acl codes")
Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Tariq Toukan 7dbc849b2a net/mlx5e: Improve MQPRIO resiliency
* Add netdev->tc_to_txq rollback in case of failure in
  mlx5e_update_netdev_queues().
* Fix broken transition between the two modes:
  MQPRIO DCB mode with tc==8, and MQPRIO channel mode.
* Disable MQPRIO channel mode if re-attaching with a different number
  of channels.
* Improve code sharing.

Fixes: ec60c4581b ("net/mlx5e: Support MQPRIO channel mode")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:55 -07:00
Tariq Toukan 9d758d4a3a net/mlx5e: Keep the value for maximum number of channels in-sync
The value for maximum number of channels is first calculated based
on the netdev's profile and current function resources (specifically,
number of MSIX vectors, which depends among other things on the number
of online cores in the system).
This value is then used to calculate the netdev's number of rxqs/txqs.
Once created (by alloc_etherdev_mqs), the number of netdev's rxqs/txqs
is constant and we must not exceed it.

To achieve this, keep the maximum number of channels in sync upon any
netdevice re-attach.

Use mlx5e_get_max_num_channels() for calculating the number of netdev's
rxqs/txqs. After netdev is created, use mlx5e_calc_max_nch() (which
coinsiders core device resources, profile, and netdev) to init or
update priv->max_nch.

Before this patch, the value of priv->max_nch might get out of sync,
mistakenly allowing accesses to out-of-bounds objects, which would
crash the system.

Track the number of channels stats structures used in a separate
field, as they are persistent to suspend/resume operations. All the
collected stats of every channel index that ever existed should be
preserved. They are reset only when struct mlx5e_priv is,
in mlx5e_priv_cleanup(), which is part of the profile changing flow.

There is no point anymore in blocking a profile change due to max_nch
mismatch in mlx5e_netdev_change_profile(). Remove the limitation.

Fixes: a1f240f180 ("net/mlx5e: Adjust to max number of channles when re-attaching")
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:54 -07:00
Raed Salem f9a10440f0 net/mlx5e: IPSEC RX, enable checksum complete
Currently in Rx data path IPsec crypto offloaded packets uses
csum_none flag, so checksum is handled by the stack, this naturally
have some performance/cpu utilization impact on such flows. As Nvidia
NIC starting from ConnectX6DX provides checksum complete value out of
the box also for such flows there is no sense in taking csum_none path,
furthermore the stack (xfrm) have the method to handle checksum complete
corrections for such flows i.e. IPsec trailer removal and consequently
checksum value adjustment.

Because of the above and in addition the ConnectX6DX is the first HW
which supports IPsec crypto offload then it is safe to report csum
complete for IPsec offloaded traffic.

Fixes: b2ac7541e3 ("net/mlx5e: IPsec: Add Connect-X IPsec Rx data path offload")
Signed-off-by: Raed Salem <raeds@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-09-30 14:07:54 -07:00