OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Amit Cohen	2be5c8a963	mlxsw: spectrum_ethtool: Move mlxsw_sp_port_type_speed_ops structs Move mlxsw_sp1_port_type_speed_ops and mlxsw_sp2_port_type_speed_ops with the relevant code from spectrum.c to spectrum_ethtool.c. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-29 17:45:02 -07:00
Amit Cohen	614d509aa1	mlxsw: Move ethtool_ops to spectrum_ethtool.c Add spectrum_ethtool.c file for ethtool code. Move ethtool_ops and the relevant code from spectrum.c to spectrum_ethtool.c. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-29 17:45:02 -07:00
Amit Cohen	a2af44b64c	mlxsw: spectrum_dcb: Rename mlxsw_sp_port_headroom_set() mlxsw_sp_port_headroom_set() is defined twice - in spectrum.c and in spectrum_dcb.c, with different arguments and different implementation but the name is same. Rename mlxsw_sp_port_headroom_set() to mlxsw_sp_port_headroom_ets_set() in order to allow using the second function in several files, and not only as static function in spectrum.c. Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-29 17:45:02 -07:00
Tariq Toukan	a29074367b	net/mlx5e: kTLS, Improve rx handler function call Prior to this patch mlx5e tls rx handler was called unconditionally on all rx frames and the decision whether a frame is a valid tls record is done inside that function. A function call can be expensive especially for regular rx packet rate. To avoid this, check the tls validity before jumping into the tls rx handler. While at it, split between kTLS device offload rx handler and FPGA tls rx handler using a similar method. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com>	2020-06-27 14:00:25 -07:00
Tariq Toukan	ed9a7c53b8	net/mlx5e: kTLS, Cleanup redundant capability check All callers of mlx5e_ktls_build_netdev() check capability before the call. Remove the repeated check in the function. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:25 -07:00
Tariq Toukan	c5607360ec	net/mlx5e: Increase Async ICO SQ size Resync communication with HW for kTLS RX is done via the async ICOSQs. kTLS RX resync requests might come in bursts. To improve the success chances for such bursts, use a larger ICOSQ. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:24 -07:00
Tariq Toukan	76c1e1ac2a	net/mlx5e: kTLS, Add kTLS RX stats Add global and per-channel ethtool SW stats for the device offload. Document the new counters in tls-offload.rst. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:23 -07:00
Tariq Toukan	0419d8c9d8	net/mlx5e: kTLS, Add kTLS RX resync support Implement the RX resync procedure, using the TLS async resync API. The HW offload of TLS decryption in RX side might get out-of-sync due to out-of-order reception of packets. This requires SW intervention to update the HW context and get it back in-sync. Performance: CPU: Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz, 24 cores, HT off NIC: ConnectX-6 Dx 100GbE dual port Goodput (app-layer throughput) comparison: +---------------+-------+-------+---------+ \| # connections \| 1 \| 4 \| 8 \| +---------------+-------+-------+---------+ \| SW (Gbps) \| 7.26 \| 24.70 \| 50.30 \| +---------------+-------+-------+---------+ \| HW (Gbps) \| 18.50 \| 64.30 \| 92.90 \| +---------------+-------+-------+---------+ \| Speedup \| 2.55x \| 2.56x \| 1.85x * \| +---------------+-------+-------+---------+ * After linerate is reached, diff is observed in CPU util. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:23 -07:00
Tariq Toukan	1182f36593	net/mlx5e: kTLS, Add kTLS RX HW offload support Implement driver support for the kTLS RX HW offload feature. Resync support is added in a downstream patch. New offload contexts post their static/progress params WQEs over the per-channel async ICOSQ, protected under a spin-lock. The Channel/RQ is selected according to the socket's rxq index. Feature is OFF by default. Can be turned on by: $ ethtool -K <if> tls-hw-rx-offload on A new TLS-RX workqueue is used to allow asynchronous addition of steering rules, out of the NAPI context. It will be also used in a downstream patch in the resync procedure. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:21 -07:00
Tariq Toukan	df8d866770	net/mlx5e: kTLS, Use kernel API to extract private offload context Modify the implementation of the private kTLS TX HW offload context getter and setter, so it uses the kernel API functions, instead of a local shadow structure. A single BUILD_BUG_ON check is sufficient, remove the duplicate. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:20 -07:00
Tariq Toukan	7d0d0d86ec	net/mlx5e: kTLS, Improve TLS feature modularity Better separate the code into c/h files, so that kTLS internals are exposed to the corresponding non-accel flow as follows: - Necessary datapath functions are exposed via ktls_txrx.h. - Necessary caps and configuration functions are exposed via ktls.h, which became very small. In addition, kTLS internal code sharing is done via ktls_utils.h, which is not exposed to any non-accel file. Add explicit WQE structures for the TLS static and progress params, breaking the union of the static with UMR, and the progress with PSV. Generalize the API as a preparation for TLS RX offload support. Move kTLS TX-specific code to the proper file. Remove the inline tag for function in C files, let the compiler decide. Use kzalloc/kfree for the priv_tx context. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>	2020-06-27 14:00:20 -07:00
Tariq Toukan	5229a96e59	net/mlx5e: Accel, Expose flow steering API for rules add/del Given a socket, the function extracts the TCP/IP{4,6} ntuple and adds rule to steering. Another function gets the rule and deletes it. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>	2020-06-27 14:00:19 -07:00
Boris Pismenny	c062d52ac2	net/mlx5e: Receive flow steering framework for accelerated TCP flows The framework allows creating flow tables to steer incoming traffic of TCP sockets to the acceleration TIRs. This is used in downstream patches for TLS, and will be used in the future for other offloads. Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:18 -07:00
Saeed Mahameed	b8922a73ec	net/mlx5e: API to manipulate TTC rules destinations Store the default destinations of the on-load generated TTC (Traffic Type Classifier) rules in the ttc rules table. Introduce TTC API functions to manipulate/restore and get the TTC rule destination and use these API functions in arfs implementation. This will allow a better decoupling between TTC implementation and its users. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>	2020-06-27 14:00:18 -07:00
Tariq Toukan	c293ac927f	net/mlx5e: Refactor build channel params Take the CQ params into their respective RQ/SQ params. Split the params build of the different ICOSQs (sync and async), as they require different init values. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:17 -07:00
Tariq Toukan	8d94b590f1	net/mlx5e: Turn XSK ICOSQ into a general asynchronous one There is an upcoming demand (in downstream patches) for an ICOSQ to be populated out of the NAPI context, asynchronously. There is already an existing one serving XSK-related use case. In this patch, promote this ICOSQ to serve as general async ICOSQ, to be used for XSK and non-XSK flows. As part of this, the reg_umr bit of the SQ context is now set (if capable), as the general async ICOSQ should support possible posts of UMR WQEs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:16 -07:00
Saeed Mahameed	e396eccf0f	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux * 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: kTLS, Improve TLS params layout structures net/mlx5: Avoid eswitch header inclusion in fs core layer net/mlx5: Avoid RDMA file inclusion in core driver net/mlx5: Add support in query QP, CQ and MKEY segments net/mlx5: Export resource dump interface Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 14:00:13 -07:00
Tariq Toukan	2d1b69ed65	net/mlx5: kTLS, Improve TLS params layout structures Add explicit WQE segment structures for the TLS static and progress params. According to the HW spec, TISN is not part of the progress params context, take it out of it. Rename the control segment tisn field as it could hold either a TIS or a TIR number. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 13:50:46 -07:00
Parav Pandit	188f0f988b	net/mlx5: Avoid eswitch header inclusion in fs core layer Flow steering core layer is independent of the eswitch layer. Hence avoid fs_core dependency on eswitch. Fixes: `328edb499f` ("net/mlx5: Split FDB fast path prio to multiple namespaces") Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-27 13:50:46 -07:00
David S. Miller	7bed145516	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Minor overlapping changes in xfrm_device.c, between the double ESP trailing bug fix setting the XFRM_INIT flag and the changes in net-next preparing for bonding encryption support. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-25 19:29:51 -07:00
Saeed Mahameed	efbb974d8e	net/mlx5e: vxlan: Return bool instead of opaque ptr in port_lookup() struct mlx5_vxlan_port is not exposed to the outside callers, it is redundant to return a pointer to it from mlx5_vxlan_port_lookup(), to be only used as a boolean, so just return a boolean. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-25 12:41:46 -07:00
Saeed Mahameed	7a64ca862a	net/mlx5e: vxlan: Use RCU for vxlan table lookup Remove the spinlock protecting the vxlan table and use RCU instead. This will improve performance as it will eliminate contention on data path cores. Fixes: `b3f63c3d5e` ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>	2020-06-25 12:41:43 -07:00
Vlad Buslov	185901ceeb	net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT en_tc.h header file declares several TC-specific functions in CONFIG_MLX5_ESWITCH block even though those functions are only compiled when CONFIG_MLX5_CLS_ACT is set, which is a recent change. Move them to proper block. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Maor Dickman <maord@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-25 12:41:40 -07:00
Alaa Hleihel	8fab0175aa	net/mlx5e: Move including net/arp.h from en_rep.c to rep/neigh.c After the cited commit, the header net/arp.h is no longer used in en_rep.c. So, move it to the new file rep/neigh.c that uses it now. Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-25 12:41:38 -07:00
Maxim Mikityanskiy	d39c9885b6	net/mlx5e: Remove unused mlx5e_xsk_first_unused_channel mlx5e_xsk_first_unused_channel is a leftover from old versions of the first XSK commit, and it was never used. Remove it. Fixes: `db05815b36` ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-25 12:41:35 -07:00
Denis Efremov	360000b26e	net/mlx5: Use kfree(ft->g) in arfs_create_groups() Use kfree() instead of kvfree() on ft->g in arfs_create_groups() because the memory is allocated with kcalloc(). Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-25 12:41:32 -07:00
Hu Haowen	39797f1c53	net/mlx5: FWTrace: Add missing space Missing space at the end of a comment line, add it. Signed-off-by: Hu Haowen <xianfengting221@163.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-25 12:41:30 -07:00
Parav Pandit	04dfa7057b	net/mlx5: Avoid eswitch header inclusion in fs core layer Flow steering core layer is independent of the eswitch layer. Hence avoid fs_core dependency on eswitch. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-25 12:41:28 -07:00
Jarod Wilson	bdfd2d1fa7	bonding/xfrm: use real_dev instead of slave_dev Rather than requiring every hw crypto capable NIC driver to do a check for slave_dev being set, set real_dev in the xfrm layer and xso init time, and then override it in the bonding driver as needed. Then NIC drivers can always use real_dev, and at the same time, we eliminate the use of a variable name that probably shouldn't have been used in the first place, particularly given recent current events. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Suggested-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 15:19:55 -07:00
Petr Machata	34639fa383	mlxsw: Enforce firmware version for Spectrum-3 In a fashion similar to the other Spectrum systems, enforce a specific firmware version for Spectrum-3 so that the driver and firmware are always in sync with regards to new features. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 15:14:13 -07:00
Petr Machata	69c8a8c543	mlxsw: Bump firmware version to XX.2007.1168 This version comes with fixes to the following problems, among others: - Wrong shaper configuration on Spectrum-1 - Bogus temperature reading on Spectrum-2 - Problems in setting egress buffer size after MTU change on Spectrum-2 Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 15:14:13 -07:00
Masanari Iida	13fdc4193c	mlxsw: spectrum_dcb: Fix a spelling typo in spectrum_dcb.c This patch fixes a spelling typo in spectrum_dcb.c Signed-off-by: Masanari Iida <standby24x7@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-23 15:03:54 -07:00
Maor Gottlieb	608ca553c9	net/mlx5: Add support in query QP, CQ and MKEY segments Introduce new resource dump segments - PRM_QUERY_QP, PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contains the resource dump in PRM query format. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2020-06-23 17:26:10 +03:00
Maor Gottlieb	d63cc24933	net/mlx5: Export resource dump interface Export some of the resource dump API. mlx5_ib driver will use it in downstream patches. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>	2020-06-23 17:26:10 +03:00
Petr Machata	ce10d7d4ad	mlxsw: spectrum_acl: Support FLOW_ACTION_MANGLE for TCP, UDP ports Spectrum-2 supports an ACL action L4_PORT, which allows TCP and UDP source and destination port number change. Offload suitable mangles to this action. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Petr Machata	faad0525c0	mlxsw: core_acl_flex_actions: Add L4_PORT_ACTION Add fields related to L4_PORT_ACTION, which is used for changing of TCP and UDP port numbers. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Petr Machata	3cc9a15a0b	mlxsw: spectrum: Split handling of pedit mangle by chip type Certain ACL actions are only available on some Spectrum revisions. In particular, L4_PORT_ACTION is not available on Spectrum-1. Introduce a new ops struct intended to hold these differences, mlxsw_sp_rulei_ops. Prime it with a sole member, act_mangle_field, meant for handling of pedit mangles. Create two ops structures, one for Spectrum-1, the other for Spectrum-2 and above. Add callbacks for act_mangle_field and dispatch to the common handler. Invoke mlxsw_sp_rulei_ops.act_mangle_field from the field mangler instead of calling the common handler directly. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:32:11 -07:00
Ido Schimmel	f3fe412b0a	mlxsw: spectrum: Do not rely on machine endianness The second commit cited below performed a cast of 'u32 buffsize' to '(u16 )' when calling mlxsw_sp_port_headroom_8x_adjust(): mlxsw_sp_port_headroom_8x_adjust(mlxsw_sp_port, (u16 ) &buffsize); Colin noted that this will behave differently on big endian architectures compared to little endian architectures. Fix this by following Colin's suggestion and have the function accept and return 'u32' instead of passing the current size by reference. Fixes: `da382875c6` ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC") Fixes: `60833d54d5` ("mlxsw: spectrum: Adjust headroom buffers for 8x ports") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Colin Ian King <colin.king@canonical.com> Suggested-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 16:29:51 -07:00
Jarod Wilson	bf3a058de5	mlx5: become aware of when running as a bonding slave I've been unable to get my hands on suitable supported hardware to date, but I believe this ought to be all that is needed to enable the mlx5 driver to also work with bonding active-backup crypto offload passthru. CC: Boris Pismenny <borisp@mellanox.com> CC: Saeed Mahameed <saeedm@mellanox.com> CC: Leon Romanovsky <leon@kernel.org> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Jakub Kicinski <kuba@kernel.org> CC: Steffen Klassert <steffen.klassert@secunet.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:38:57 -07:00
Parav Pandit	330077d14d	net/mlx5: E-switch, Supporting setting devlink port function mac address Enable user to set mac address of the PCI PF and VF port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	1094795ce4	net/mlx5: Split mac address setting function for using state_lock Refactor mac address setting function to let caller hold the necessary state_lock mutex, so that subsequent patch and use this helper routine. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	f099fde16d	net/mlx5: E-switch, Support querying port function mac address Support querying mac address of the eswitch devlink port function. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	443bf36eb5	net/mlx5: Move helper to eswitch layer To use port number to port index conversion at eswitch level, move it to eswitch header. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	bd93975353	net/mlx5: E-switch, Introduce and use eswitch support check helper Introduce an helper routine to get esw from a devlink device and use it at eswitch callbacks and in subsequent patch. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
Parav Pandit	fa997825eb	net/mlx5: Constify mac address pointer Since none of the functions need to modify the input mac address, constify them. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-22 15:29:19 -07:00
wenxu	a1db217861	net: flow_offload: fix flow_indr_dev_unregister path If the representor is removed, then identify the indirect flow_blocks that need to be removed by the release callback and the port representor structure. To identify the port representor structure, a new indr.cb_priv field needs to be introduced. The flow_block also needs to be removed from the driver list from the cleanup path. Fixes: `1fac52da59` ("net: flow_offload: consolidate indirect flow_block infrastructure") Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-19 20:12:58 -07:00
wenxu	66f1939a1b	flow_offload: use flow_indr_block_cb_alloc/remove function Prepare fix the bug in the next patch. use flow_indr_block_cb_alloc/remove function and remove the __flow_block_indr_binding. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-19 20:12:58 -07:00
Po Liu	4b61d3e8d3	net: qos offload add flow status with dropped count This patch adds a drop frames counter to tc flower offloading. Reporting h/w dropped frames is necessary for some actions. Some actions like police action and the coming introduced stream gate action would produce dropped frames which is necessary for user. Status update shows how many filtered packets increasing and how many dropped in those packets. v2: Changes - Update commit comments suggest by Jiri Pirko. Signed-off-by: Po Liu <Po.Liu@nxp.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-19 12:53:30 -07:00
Ido Schimmel	60833d54d5	mlxsw: spectrum: Adjust headroom buffers for 8x ports The port's headroom buffers are used to store packets while they traverse the device's pipeline and also to store packets that are egress mirrored. On Spectrum-3, ports with eight lanes use two headroom buffers between which the configured headroom size is split. In order to prevent packet loss, multiply the calculated headroom size by two for 8x ports. Fixes: `da382875c6` ("mlxsw: spectrum: Extend to support Spectrum-3 ASIC") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-16 13:46:27 -07:00
Linus Torvalds	96144c58ab	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: 1) Fix cfg80211 deadlock, from Johannes Berg. 2) RXRPC fails to send norigications, from David Howells. 3) MPTCP RM_ADDR parsing has an off by one pointer error, fix from Geliang Tang. 4) Fix crash when using MSG_PEEK with sockmap, from Anny Hu. 5) The ucc_geth driver needs __netdev_watchdog_up exported, from Valentin Longchamp. 6) Fix hashtable memory leak in dccp, from Wang Hai. 7) Fix how nexthops are marked as FDB nexthops, from David Ahern. 8) Fix mptcp races between shutdown and recvmsg, from Paolo Abeni. 9) Fix crashes in tipc_disc_rcv(), from Tuong Lien. 10) Fix link speed reporting in iavf driver, from Brett Creeley. 11) When a channel is used for XSK and then reused again later for XSK, we forget to clear out the relevant data structures in mlx5 which causes all kinds of problems. Fix from Maxim Mikityanskiy. 12) Fix memory leak in genetlink, from Cong Wang. 13) Disallow sockmap attachments to UDP sockets, it simply won't work. From Lorenz Bauer. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (83 commits) net: ethernet: ti: ale: fix allmulti for nu type ale net: ethernet: ti: am65-cpsw-nuss: fix ale parameters init net: atm: Remove the error message according to the atomic context bpf: Undo internal BPF_PROBE_MEM in BPF insns dump libbpf: Support pre-initializing .bss global variables tools/bpftool: Fix skeleton codegen bpf: Fix memlock accounting for sock_hash bpf: sockmap: Don't attach programs to UDP sockets bpf: tcp: Recv() should return 0 when the peer socket is closed ibmvnic: Flush existing work items before device removal genetlink: clean up family attributes allocations net: ipa: header pad field only valid for AP->modem endpoint net: ipa: program upper nibbles of sequencer type net: ipa: fix modem LAN RX endpoint id net: ipa: program metadata mask differently ionic: add pcie_print_link_status rxrpc: Fix race between incoming ACK parser and retransmitter net/mlx5: E-Switch, Fix some error pointer dereferences net/mlx5: Don't fail driver on failure to create debugfs net/mlx5e: CT: Fix ipv6 nat header rewrite actions ...	2020-06-13 16:27:13 -07:00
Masahiro Yamada	a7f7f6248d	treewide: replace '---help---' in Kconfig files with 'help' Since commit `84af7a6194` ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig' \| xargs sed -i 's/^[[:space:]]---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>	2020-06-14 01:57:21 +09:00
Dan Carpenter	09a9297574	net/mlx5: E-Switch, Fix some error pointer dereferences We can't leave "counter" set to an error pointer. Otherwise either it will lead to an error pointer dereference later in the function or it leads to an error pointer dereference when we call mlx5_fc_destroy(). Fixes: `07bab95026` ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:08 -07:00
Leon Romanovsky	17e73d47cd	net/mlx5: Don't fail driver on failure to create debugfs Clang warns: drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:6: warning: variable 'err' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized] if (!priv->dbg_root) { ^~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1303:9: note: uninitialized use occurs here return err; ^~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1278:2: note: remove the 'if' if its condition is always false if (!priv->dbg_root) { ^~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/main.c:1259:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 1 warning generated. The check of returned value of debugfs_create_dir() is wrong because by the design debugfs failures should never fail the driver and the check itself was wrong too. The kernel compiled without CONFIG_DEBUG_FS will return ERR_PTR(-ENODEV) and not NULL as expected. Fixes: `11f3b84d70` ("net/mlx5: Split mdev init and pci init") Link: https://github.com/ClangBuiltLinux/linux/issues/1042 Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:06 -07:00
Oz Shlomo	0d156f2ded	net/mlx5e: CT: Fix ipv6 nat header rewrite actions Set the ipv6 word fields according to the hardware definitions. Fixes: `ac991b48d4` ("net/mlx5e: CT: Offload established flows") Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:04 -07:00
Parav Pandit	98f91c4576	net/mlx5: Fix devlink objects and devlink device unregister sequence Current below problems exists. 1. devlink device is registered by mlx5_load_one(). But it is not unregistered by mlx5_unload_one(). This is incorrect. 2. Above issue leads to, When mlx5 PCI device is removed, currently devlink device is unregistered before devlink ports are unregistered in below ladder diagram. remove_one() mlx5_devlink_unregister() [..] devlink_unregister() <- ports are still registered! mlx5_unload_one() mlx5_unregister_device() mlx5_remove_device() mlx5e_remove() mlx5e_devlink_port_unregister() devlink_port_unregister() 3. Condition checking for registering and unregister device are not symmetric either in these routines. Hence, fix the sequence by having load and unload routines symmetric and in right order. i.e. (a) register devlink device followed by registering devlink ports (b) unregister devlink ports followed by devlink device Do this based on boot and cleanup flags instead of different conditions. Fixes: `c6acd629ee` ("net/mlx5e: Add support for devlink-port in non-representors mode") Fixes: `f60f315d33` ("net/mlx5e: Register devlink ports for physical link, PCI PF, VFs") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:02 -07:00
Parav Pandit	60904cd349	net/mlx5: Disable reload while removing the device While unregistration is in progress, user might be reloading the interface. This can race with unregistration in below flow which uses the resources which are getting disabled by reload flow. Hence, disable the devlink reloading first when removing the device. CPU0 CPU1 ---- ---- local_pci_remove() devlink_mutex remove_one() devlink_nl_cmd_reload() mlx5_unregister_device() devlink_reload() ops->reload_down() mlx5_unload_one() Fixes: `4383cfcc65` ("net/mlx5: Add devlink reload") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:38:00 -07:00
Aya Levin	5f1572e617	net/mlx5e: Fix ethtool hfunc configuration change Changing RX hash function requires rearranging of RQT internal indexes, the user isn't exposed to such changes and these changes do not affect the user configured indirection table. Rebuild RQ table on hfunc change. Fixes: `bdfc028de1` ("net/mlx5e: Fix ethtool RX hash func configuration change") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:57 -07:00
Maxim Mikityanskiy	36d45fb9d2	net/mlx5e: Fix repeated XSK usage on one channel After an XSK is closed, the relevant structures in the channel are not zeroed. If an XSK is opened the second time on the same channel without recreating channels, the stray values in the structures will lead to incorrect operation of queues, which causes CQE errors, and the new socket doesn't work at all. This patch fixes the issue by explicitly zeroing XSK-related structs in the channel on XSK close. Note that those structs are zeroed on channel creation, and usually a configuration change (XDP program is set) happens on XSK open, which leads to recreating channels, so typical XSK usecases don't suffer from this issue. However, if XSKs are opened and closed on the same channel without removing the XDP program, this bug reproduces. Fixes: `db05815b36` ("net/mlx5e: Add XSK zero-copy support") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:55 -07:00
Denis Efremov	47a357de2b	net/mlx5: DR, Fix freeing in dr_create_rc_qp() Variable "in" in dr_create_rc_qp() is allocated with kvzalloc() and should be freed with kvfree(). Fixes: `297cccebdc` ("net/mlx5: DR, Expose an internal API to issue RDMA operations") Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:53 -07:00
Shay Drory	b6e0b6bebe	net/mlx5: Fix fatal error handling during device load Currently, in case of fatal error during mlx5_load_one(), we cannot enter error state until mlx5_load_one() is finished, what can take several minutes until commands will get timeouts, because these commands can't be processed due to the fatal error. Fix it by setting dev->state as MLX5_DEVICE_STATE_INTERNAL_ERROR before requesting the lock. Fixes: `c1d4d2e92a` ("net/mlx5: Avoid calling sleeping function by the health poll thread") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:51 -07:00
Shay Drory	42ea9f1b5c	net/mlx5: drain health workqueue in case of driver load error In case there is a work in the health WQ when we teardown the driver, in driver load error flow, the health work will try to read dev->iseg, which was already unmap in mlx5_pci_close(). Fix it by draining the health workqueue first thing in mlx5_pci_close(). Trace of the error: BUG: unable to handle page fault for address: ffffb5b141c18014 PF: supervisor read access in kernel mode PF: error_code(0x0000) - not-present page PGD 1fe95d067 P4D 1fe95d067 PUD 1fe95e067 PMD 1b7823067 PTE 0 Oops: 0000 [#1] SMP PTI CPU: 3 PID: 6755 Comm: kworker/u128:2 Not tainted 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ? mlx5_health_try_recover+0x4d/0x270 [mlx5_core] mlx5_fw_fatal_reporter_recover+0x16/0x20 [mlx5_core] devlink_health_reporter_recover+0x1c/0x50 devlink_health_report+0xfb/0x240 mlx5_fw_fatal_reporter_err_work+0x65/0xd0 [mlx5_core] process_one_work+0x1fb/0x4e0 ? process_one_work+0x16b/0x4e0 worker_thread+0x4f/0x3d0 kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 ? kthread_cancel_delayed_work_sync+0x20/0x20 ret_from_fork+0x1f/0x30 Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 nfs fscache 8021q garp mrp stp llc ipmi_devintf ipmi_msghandler rpcrdma rdma_ucm ib_iser rdma_cm ib_umad iw_cm ib_ipoib libiscsi scsi_transport_iscsi ib_cm mlx5_ib ib_uverbs ib_core mlx5_core sb_edac crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 mlxfw crypto_simd cryptd glue_helper input_leds hyperv_fb intel_rapl_perf joydev serio_raw pci_hyperv pci_hyperv_mini mac_hid hv_balloon nfsd auth_rpcgss nfs_acl lockd grace sunrpc sch_fq_codel ip_tables x_tables autofs4 hv_utils hid_generic hv_storvsc ptp hid_hyperv hid hv_netvsc hyperv_keyboard pps_core scsi_transport_fc psmouse hv_vmbus i2c_piix4 floppy pata_acpi CR2: ffffb5b141c18014 ---[ end trace b12c5503157cad24 ]--- RIP: 0010:ioread32be+0x30/0x40 Code: 00 77 27 48 81 ff 00 00 01 00 76 07 0f b7 d7 ed 0f c8 c3 55 48 c7 c6 3b ee d5 9f 48 89 e5 e8 67 fc ff ff b8 ff ff ff ff 5d c3 <8b> 07 0f c8 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 81 fe ff ff 03 RSP: 0018:ffffb5b14c56fd78 EFLAGS: 00010292 RAX: ffffb5b141c18000 RBX: ffff8e9f78a801c0 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffff8e9f7ecd7628 RDI: ffffb5b141c18014 RBP: ffffb5b14c56fd90 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8e9f372a2c30 R11: ffff8e9f87f4bc40 R12: ffff8e9f372a1fc0 R13: ffff8e9f78a80000 R14: ffffffffc07136a0 R15: ffff8e9f78ae6f20 FS: 0000000000000000(0000) GS:ffff8e9f7ecc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffb5b141c18014 CR3: 00000001c8f82006 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 BUG: sleeping function called from invalid context at ./include/linux/percpu-rwsem.h:38 in_atomic(): 0, irqs_disabled(): 1, pid: 6755, name: kworker/u128:2 INFO: lockdep is turned off. CPU: 3 PID: 6755 Comm: kworker/u128:2 Tainted: G D 5.2.0-net-next-mlx5-hv_stats-over-last-worked-hyperv #1 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016 Workqueue: mlx5_healtha050:00:02.0 mlx5_fw_fatal_reporter_err_work [mlx5_core] Call Trace: dump_stack+0x63/0x88 ___might_sleep+0x10a/0x130 __might_sleep+0x4a/0x80 exit_signals+0x33/0x230 ? blocking_notifier_call_chain+0x16/0x20 do_exit+0xb1/0xc30 ? kthread+0x10d/0x140 ? process_one_work+0x4e0/0x4e0 Fixes: `52c368dc3d` ("net/mlx5: Move health and page alloc init to mdev_init") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-06-11 15:37:48 -07:00
Linus Torvalds	af7b480103	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from David Miller: - Fix the build with certain Kconfig combinations for the Chelsio inline TLS device, from Rohit Maheshwar and Vinay Kumar Yadavi. - Fix leak in genetlink, from Cong Lang. - Fix out of bounds packet header accesses in seg6, from Ahmed Abdelsalam. - Two XDP fixes in the ENA driver, from Sameeh Jubran - Use rwsem in device rename instead of a seqcount because this code can sleep, from Ahmed S. Darwish. - Fix WoL regressions in r8169, from Heiner Kallweit. - Fix qed crashes in kdump mode, from Alok Prasad. - Fix the callbacks used for certain thermal zones in mlxsw, from Vadim Pasternak. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (35 commits) net: dsa: lantiq_gswip: fix and improve the unsupported interface error mlxsw: core: Use different get_trend() callbacks for different thermal zones net: dp83869: Reset return variable if PHY strap is read rhashtable: Drop raw RCU deref in nested_table_free cxgb4: Use kfree() instead kvfree() where appropriate net: qed: fixes crash while running driver in kdump kernel vsock/vmci: make vmci_vsock_transport_cb() static net: ethtool: Fix comment mentioning typo in IS_ENABLED() net: phy: mscc: fix Serdes configuration in vsc8584_config_init net: mscc: Fix OF_MDIO config check net: marvell: Fix OF_MDIO config check net: dp83867: Fix OF_MDIO config check net: dp83869: Fix OF_MDIO config check net: ethernet: mvneta: fix MVNETA_SKB_HEADROOM alignment ethtool: linkinfo: remove an unnecessary NULL check net/xdp: use shift instead of 64 bit division crypto/chtls:Fix compile error when CONFIG_IPV6 is disabled inet_connection_sock: clear inet_num out of destroy helper yam: fix possible memory leak in yam_init_driver lan743x: Use correct MAC_CR configuration for 1 GBit speed ...	2020-06-07 17:27:45 -07:00
Vadim Pasternak	2dc2f76005	mlxsw: core: Use different get_trend() callbacks for different thermal zones The driver registers three different types of thermal zones: For the ASIC itself, for port modules and for gearboxes. Currently, all three types use the same get_trend() callback which does not work correctly for the ASIC thermal zone. The callback assumes that the device data is of type 'struct mlxsw_thermal_module', whereas for the ASIC thermal zone 'struct mlxsw_thermal' is passed as device data. Fix this by using one get_trend() callback for the ASIC thermal zone and another for the other two types. Fixes: `6f73862fab` ("mlxsw: core: Add the hottest thermal zone detection") Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-07 16:59:43 -07:00
Linus Torvalds	242b233198	RDMA 5.8 merge window pull request A few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and the removal of FMR has been a recurring discussion theme for a long time. The usual smattering of features and bug fixes. - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl7X/IwACgkQOG33FX4g mxp2uw/+MI2S/aXqEBvZfTT8yrkAwqYezS0VeTDnwH/T6UlTMDhHVN/2Ji3tbbX3 FEKT1i2mnAL5RqUAL1lr9g4sG/bVozrpN46Ws5Lu9dTbIPLKTNPWDuLFQDUShKY7 OyMI/bRx6anGnsOy20iiBqnrQbrrZj5TECgnmrkAl62QFdcl7aBWe/yYjy4CT11N ub+aBXBREN1F1pc0HIjd2tI+8gnZc+mNm1LVVDRH9Capun/pI26qDNh7e6QwGyIo n8ItraC8znLwv/nsUoTE7/JRcsTEe6vJI26PQmczZfNJs/4O65G7fZg0eSBseZYi qKf7Uwtb3qW0R7jRUMEgFY4DKXVAA0G2ph40HXBuzOSsqlT6HqYMO2wgG8pJkrTc qAjoSJGzfAHIsjxzxKI8wKuufCddjCm30VWWU7EKeriI6h1J0uPVqKkQMfYBTkik 696eZSBycAVgwayOng3XaehiTxOL7qGMTjUpDjUR6UscbiPG919vP+QsbIUuBXdb YoddBQJdyGJiaCXv32ciJjo9bjPRRi/bII7Q5qzCNI2mi4ZVbudF4ffzyQvdHtNJ nGnpRXoPi7kMvUrKTMPWkFjj0R5/UsPszsA51zbxPydfgBe0Dlc2PrrIG8dlzYAp wbV0Lec+iJucKlt7EZtrjz1xOiOOaQt/5/cW1bWqL+wk2t6gAuY= =9zTe -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "A more active cycle than most of the recent past, with a few large, long discussed works this time. The RNBD block driver has been posted for nearly two years now, and flowing through RDMA due to it also introducing a new ULP. The removal of FMR has been a recurring discussion theme for a long time. And the usual smattering of features and bug fixes. Summary: - Various small driver bugs fixes in rxe, mlx5, hfi1, and efa - Continuing driver cleanups in bnxt_re, hns - Big cleanup of mlx5 QP creation flows - More consistent use of src port and flow label when LAG is used and a mlx5 implementation - Additional set of cleanups for IB CM - 'RNBD' network block driver and target. This is a network block RDMA device specific to ionos's cloud environment. It brings strong multipath and resiliency capabilities. - Accelerated IPoIB for HFI1 - QP/WQ/SRQ ioctl migration for uverbs, and support for multiple async fds - Support for exchanging the new IBTA defiend ECE data during RDMA CM exchanges - Removal of the very old and insecure FMR interface from all ULPs and drivers. FRWR should be preferred for at least a decade now" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (247 commits) RDMA/cm: Spurious WARNING triggered in cm_destroy_id() RDMA/mlx5: Return ECE DC support RDMA/mlx5: Don't rely on FW to set zeros in ECE response RDMA/mlx5: Return an error if copy_to_user fails IB/hfi1: Use free_netdev() in hfi1_netdev_free() RDMA/hns: Uninitialized variable in modify_qp_init_to_rtr() RDMA/core: Move and rename trace_cm_id_create() IB/hfi1: Fix hfi1_netdev_rx_init() error handling RDMA: Remove 'max_map_per_fmr' RDMA: Remove 'max_fmr' RDMA/core: Remove FMR device ops RDMA/rdmavt: Remove FMR memory registration RDMA/mthca: Remove FMR support for memory registration RDMA/mlx4: Remove FMR support for memory registration RDMA/i40iw: Remove FMR leftovers RDMA/bnxt_re: Remove FMR leftovers RDMA/mlx5: Remove FMR leftovers RDMA/core: Remove FMR pool API RDMA/rds: Remove FMR support for memory registration RDMA/srp: Remove support for FMR memory registration ...	2020-06-05 14:05:57 -07:00
Max Gurtovoy	1f55b7ab90	RDMA/mlx4: Remove FMR support for memory registration HCA's that are driven by mlx4 driver support FRWR method to register memory. Remove the ancient and unsafe FMR method. Link: https://lore.kernel.org/r/8-v3-f58e6669d5d3+2cf-fmr_removal_jgg@mellanox.com Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-06-02 20:32:54 -03:00
Lorenzo Bianconi	1b698fa5d8	xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame In order to use standard 'xdp' prefix, rename convert_to_xdp_frame utility routine in xdp_convert_buff_to_frame and replace all the occurrences Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org	2020-06-01 15:02:53 -07:00
Ido Schimmel	88e2774961	mlxsw: spectrum_trap: Register ACL control traps In a similar fashion to other control traps, register ACL control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	8110668ecd	mlxsw: spectrum_trap: Register layer 3 control traps In a similar fashion to layer 2 control traps, register layer 3 control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	39c10350cf	mlxsw: spectrum_trap: Register layer 2 control traps In a similar fashion to other traps, register layer 2 control traps with devlink. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	45b1c87313	mlxsw: spectrum_trap: Factor out common Rx listener function We currently have an Rx listener function for exception traps that marks received skbs with 'offload_fwd_mark' and injects them to the kernel's Rx path. The marking is done because all these exceptions occur during L3 forwarding, after the packets were potentially flooded at L2. A subsequent patch will add support for control traps. Packets received via some of these control traps need different handling: 1. Packets might not need to be marked with 'offload_fwd_mark'. For example, if packet was trapped before L2 forwarding 2. Packets might not need to be injected to the kernel's Rx path. For example, sampled packets are reported to user space via the psample module Factor out a common Rx listener function that only reports trapped packets to devlink. Call it from mlxsw_sp_rx_no_mark_listener() and mlxsw_sp_rx_mark_listener() that will inject the packets to the kernel's Rx path, without and with the marking, respectively. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Ido Schimmel	1e292f5c11	mlxsw: spectrum_trap: Move layer 3 exceptions to exceptions trap group The layer 3 exceptions are still subject to the same trap policer, so nothing changes, but user space can choose to assign a different one. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:49:23 -07:00
Pablo Neira Ayuso	9eabd18871	mlx5: update indirect block support Register ndo callback via flow_indr_dev_register() and flow_indr_dev_unregister(). No need for mlx5e_rep_indr_clean_block_privs() since flow_block_cb_free() already releases the internal mapping via ->release callback, which in this case is mlx5e_rep_indr_tc_block_unbind(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-06-01 11:41:50 -07:00
David S. Miller	1806c13dc2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-31 17:48:46 -07:00
Saeed Mahameed	eb24387183	net/mlx5e: Make mlx5e_dcbnl_ops static Fix sparse warning: drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c:988:29: error: symbol 'mlx5e_dcbnl_ops' was not declared. Should it be static? Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:23 -07:00
Saeed Mahameed	58ff18e12c	net/mlx5e: en_tc: Fix cast to restricted __be32 warning Fixes sparse warnings: warning: cast to restricted __be32 warning: restricted __be32 degrades to integer Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:22 -07:00
Saeed Mahameed	c51323ee7a	net/mlx5e: en_tc: Fix incorrect type in initializer warnings Fix some trivial warnings of the type: warning: incorrect type in initializer (different base types) Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:22 -07:00
Saeed Mahameed	aee3e9c457	net/mlx5: Accel: fpga tls fix cast to __be64 and incorrect argument types tls handle and rcd_sn are actually big endian and not in host format. Fix that. Fix the following sparse warnings: drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:177:21: warning: cast to restricted __be64 drivers/net/ethernet/mellanox/mlx5/core/en_accel/tls.c:178:52: warning: incorrect type in argument 2 (different base types) expected unsigned int [usertype] handle got restricted __be32 [usertype] handle Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:22 -07:00
Saeed Mahameed	2553f421f4	net/mlx5: cmd: Fix memset with byte count warning Fix sparse warning: drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1949:15: warning: memset with byte count of 271720 mlx5_cmd_stats array is too big to be held inline in mlx5_cmd. Allocate it separately. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:21 -07:00
Saeed Mahameed	9ff2e92c46	net/mlx5: DR: Fix incorrect type in return expression dr_ste_crc32_calc() calculates crc32 and should return it in HW format. It is being used to calculate a u32 index, hence we force the return value of u32 to avoid the sparse warning: drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c:115:16: warning: incorrect type in return expression (different base types) expected unsigned int got restricted __be32 [usertype] Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:21 -07:00
Saeed Mahameed	c2ba2c2287	net/mlx5: DR: Fix cast to restricted __be32 raw_ip actual type is __be32 and not u32. Fix that and get rid of the warning. drivers/net/ethernet/mellanox/mlx5/core/steering/dr_ste.c:906:31: warning: cast to restricted __be32 Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:21 -07:00
Saeed Mahameed	618f88c4c4	net/mlx5: DR: Fix incorrect type in argument HW spec objects should receive a void ptr to work on, the MLX5_SET/GET macro will know how to handle it. No need to provide explicit or wrong pointer type in this case. warning: incorrect type in argument 1 (different base types) expected unsigned long long const [usertype] sw_action got restricted __be64 [usertype] [assigned] sw_action Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com>	2020-05-29 21:20:21 -07:00
Eli Cohen	f7e3ac424a	net/mlx5e: Use generic API to build MPLS label Make use of generic API mpls_entry_encode() to build mpls label and get rid of local function. Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:20 -07:00
Arnd Bergmann	e1167e1611	net/mlx5: reduce stack usage in qp_read_field Moving the mlx5_ifc_query_qp_out_bits structure on the stack was a bit excessive and now causes the compiler to complain on 32-bit architectures: drivers/net/ethernet/mellanox/mlx5/core/debugfs.c: In function 'qp_read_field': drivers/net/ethernet/mellanox/mlx5/core/debugfs.c:274:1: error: the frame size of 1104 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] Revert the previous patch partially to use dynamically allocation as the code did before. Unfortunately there is no good error handling in case the allocation fails. Fixes: `57a6c5e992` ("net/mlx5: Replace hand written QP context struct with automatic getters") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:20 -07:00
Nathan Chancellor	2861904697	net/mlx5e: Don't use err uninitialized in mlx5e_attach_decap Clang warns: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3712:6: warning: variable 'err' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] if (IS_ERR(d->pkt_reformat)) { ^~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3718:6: note: uninitialized use occurs here if (err) ^~~ drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3712:2: note: remove the 'if' if its condition is always true if (IS_ERR(d->pkt_reformat)) { ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:3670:9: note: initialize the variable 'err' to silence this warning int err; ^ = 0 1 warning generated. It is not wrong, err is only ever initialized in if statements but this one is not in one. Initialize err to 0 to fix this. Fixes: `14e6b038af` ("net/mlx5e: Add support for hw decapsulation of MPLS over UDP") Link: https://github.com/ClangBuiltLinux/linux/issues/1037 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:20 -07:00
Saeed Mahameed	2950d1d64f	net/mlx5: Kconfig: Fix spelling typo "mdoe"->"mode" Fixes: `d956873f90` ("net/mlx5e: Introduce kconfig var for TC support") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>	2020-05-29 21:20:19 -07:00
Jesper Dangaard Brouer	56e2287b41	mlx5: fix xdp data_meta setup in mlx5e_fill_xdp_buff The helper function xdp_set_data_meta_invalid() must be called after setting xdp->data as it depends on it. The bug was introduced in the cited patch below, and cause the kernel to crash when using BPF helper bpf_xdp_adjust_head() on mlx5 driver. Fixes: `39d6443c8d` ("mlx5, xsk: Migrate to new MEM_TYPE_XSK_BUFF_POOL") Reported-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 21:20:19 -07:00
Saeed Mahameed	971ae1ed03	Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux net/mlx5: Add ability to read and write ECE options net/mlx5: Add support for RDMA TX FT headers modifying net/mlx5: Move iseg access helper routines close to mlx5_core driver net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits net/mlx5: Add support in forward to namespace {IB/net}/mlx5: Simplify don't trap code net/mlx5: Replace zero-length array with flexible-array Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 14:38:57 -07:00
Pablo Neira Ayuso	a683012a8e	net/mlx5e: replace EINVAL in mlx5e_flower_parse_meta() The drivers reports EINVAL to userspace through netlink on invalid meta match. This is confusing since EINVAL is usually reserved for malformed netlink messages. Replace it by more meaningful codes. Fixes: `6d65bc64e2` ("net/mlx5e: Add mlx5e_flower_parse_meta support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:54 -07:00
Vlad Buslov	cb9a0641b5	net/mlx5e: Fix MLX5_TC_CT dependencies Change MLX5_TC_CT config dependencies to include MLX5_ESWITCH instead of MLX5_CORE_EN && NET_SWITCHDEV, which are already required by MLX5_ESWITCH. Without this change mlx5 fails to compile if user disables MLX5_ESWITCH without also manually disabling MLX5_TC_CT. Fixes: `4c3844d9e9` ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:54 -07:00
Tal Gilboa	ebeaf084ad	net/mlx5e: Properly set default values when disabling adaptive moderation Add a call to mlx5e_reset_rx/tx_moderation() when enabling/disabling adaptive moderation, in order to select the proper default values. In order to do so, we separate the logic of selecting the moderation values and setting moderion mode (CQE/EQE based). Fixes: `0088cbbc4b` ("net/mlx5e: Enable CQE based moderation on TX CQ") Fixes: `9908aa2929` ("net/mlx5e: CQE based moderation") Signed-off-by: Tal Gilboa <talgi@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:53 -07:00
Aya Levin	b623603bbb	net/mlx5e: Fix arch depending casting issue in FEC Change type of active_fec to u32 to match the type expected by mlx5e_get_fec_mode. Copy active_fec and configured_fec values to unsigned long before preforming bitwise manipulations. Take the same approach when configuring FEC over 50G link modes: copy the policy into an unsigned long and only than preform bitwise operations. Fixes: `2132b71f78` ("net/mlx5e: Advertise globaly supported FEC modes") Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:53 -07:00
Maor Dickman	20300aafa7	net/mlx5e: Remove warning "devices are not on same switch HW" On tunnel decap rule insertion, the indirect mechanism will attempt to offload the rule on all uplink representors which will trigger the "devices are not on same switch HW, can't offload forwarding" message for the uplink which isn't on the same switch HW as the VF representor. The above flow is valid and shouldn't cause warning message, fix by removing the warning and only report this flow using extack. Fixes: `321348475d` ("net/mlx5e: Fix allowed tc redirect merged eswitch offload cases") Signed-off-by: Maor Dickman <maord@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:53 -07:00
Roi Dayan	0a2a6f498f	net/mlx5e: Fix stats update for matchall classifier It's bytes, packets, lastused. Fixes: `fcb64c0f56` ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:52 -07:00
Mark Bloch	8fc3e29be9	net/mlx5: Fix crash upon suspend/resume Currently a Linux system with the mlx5 NIC always crashes upon hibernation - suspend/resume. Add basic callbacks so the NIC could be suspended and resumed. Fixes: `9603b61de1` ("mlx5: Move pci device handling from mlx5_ib to mlx5_core") Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-29 13:07:52 -07:00
David S. Miller	1eba1110f0	mlx5-updates-2020-05-26 Updates highlights: 1) From Vu Pham (8): Support VM traffics failover with bonded VF representors and e-switch egress/ingress ACLs This series introduce the support for Virtual Machine running I/O traffic over direct/fast VF path and failing over to slower paravirtualized path using the following features: __________________________________ \| VM _________________ \| \| \|FAILOVER device \| \| \| \|________________\| \| \| \| \| \| ____\|_____ \| \| \| \| \| \| ______ \|___ ____\|_______ \| \| \| VF PT \| \|VIRTIO-NET \| \| \| \| device \| \| device \| \| \| \|_________\| \|___________\| \| \|___________\|______________\|________\| \| \| \| HYPERVISOR \| \| ____\|______ \| \| macvtap \| \| \|virtio BE \| \| \|___________\| \| \| \| ____\|_____ \| \|host VF \| \| \|_________\| \| \| _____\|______ _____\|_____ \| PT VF \| \| host VF \| \|representor\| \|representor\| \|___________\| \|___________\| \ / \ / \ / \ / _________________ \_______/ \| \| _______\|________ \| V-SWITCH \| \|VF representors \|________________\| (OVS) \| \| bond \| \|________________\| \|________________\| \| ________\|________ \| Uplink \| \| representor \| \|_________________\| Summary: -------- Problem statement: ------------------ Currently in above topology, when netfailover device is configured using VFs and eswitch VF representors, and when traffic fails over to stand-by VF which is exposed using macvtap device to guest VM, eswitch fails to switch the traffic to the stand-by VF representor. This occurs because there is no knowledge at eswitch level of the stand-by representor device. Solution: --------- Using standard bonding driver, a bond netdevice is created over VF representor device which is used for offloading tc rules. Two VF representors are bonded together, one for the passthrough VF device and another one for the stand-by VF device. With this solution, mlx5 driver listens to the failover events occuring at the bond device level to failover traffic to either of the active VF representor of the bond. a. VM with netfailover device of VF pass-thru (PT) device and virtio-net paravirtualized device with same MAC-address to handle failover traffics at VM level. b. Host bond is active-standby mode, with the lower devices being the VM VF PT representor, and the representor of the 2nd VF to handle failover traffics at Hypervisor/V-Switch OVS level. - During the steady state (fast datapath): set the bond active device to be the VM PT VF representor. - During failover: apply bond failover to the second VF representor device which connects to the VM non-accelerated path. c. E-Switch ingress/egress ACL tables to support failover traffics at E-Switch level I. E-Switch egress ACL with forward-to-vport rule: - By default, eswitch vport egress acl forward packets to its counterpart NIC vport. - During port failover, the egress acl forward-to-vport rule will be added to e-switch vport of passive/in-active slave VF representor to forward packets to other e-switch vport ie. the active slave representor's e-switch vport to handle egress "failover" traffics. - Using lower change netdev event to detect a representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all other slave netdevs to forward to this representor's vport. - Using upper change netdev event to detect a representor unslaving from bond device to delete its vport's egress acl forward-to-vport rule. II. E-Switch ingress ACL metadata reg_c for match - Bonded representors' vorts sharing tc block have the same root ingress acl table and a unique metadata for match. - Traffics from both representors's vports will be tagged with same unique metadata reg_c. - Using upper change netdev event to detect a representor enslaving/unslaving from bond device to setup shared root ingress acl and unique metadata. 2) From Alex Vesker (2): Slpit RX and TX lock for parallel rule insertion in software steering 3) Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype use the HW ip_version register rather than parsing eth frames for ethertype. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl7PEFAACgkQSD+KveBX +j4Z5Af+NYwihYZpQYBBN00K7Wu10XZ65u5MbGSDmzpdN62w0kKfjsJ70bb9aiws h8LC7lspdMLRMMn9pWwFKshyF6RoSD9Ku3ZYhUbtj+hJLElAd9IwGt6pPKr8hPDd 9h+ZcBkacdhNwWKf7CKThic0c/0PLdVyzRysHxcQWKSMPCTdgiL5Z3PQHA0TM6J3 6Excs2z7kSuuyyxQ1cyWCaqSz4rqCrYyd8Ws4HOPhXgSbX14Q3mtMsBDayx2gHNW rdVbaNN6s2o0TxbrCwd0AaNP3UWcnjNqu1ohxgJiSe8y+MHMoB0OMoO+6vQJnwNI bzpZEioswV1zdgK3qNmXqbHOiHRSVQ== =xM1D -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-05-26 Updates highlights: 1) From Vu Pham (8): Support VM traffics failover with bonded VF representors and e-switch egress/ingress ACLs This series introduce the support for Virtual Machine running I/O traffic over direct/fast VF path and failing over to slower paravirtualized path using the following features: __________________________________ \| VM _________________ \| \| \|FAILOVER device \| \| \| \|________________\| \| \| \| \| \| ____\|_____ \| \| \| \| \| \| ______ \|___ ____\|_______ \| \| \| VF PT \| \|VIRTIO-NET \| \| \| \| device \| \| device \| \| \| \|_________\| \|___________\| \| \|___________\|______________\|________\| \| \| \| HYPERVISOR \| \| ____\|______ \| \| macvtap \| \| \|virtio BE \| \| \|___________\| \| \| \| ____\|_____ \| \|host VF \| \| \|_________\| \| \| _____\|______ _____\|_____ \| PT VF \| \| host VF \| \|representor\| \|representor\| \|___________\| \|___________\| \ / \ / \ / \ / _________________ \_______/ \| \| _______\|________ \| V-SWITCH \| \|VF representors \|________________\| (OVS) \| \| bond \| \|________________\| \|________________\| \| ________\|________ \| Uplink \| \| representor \| \|_________________\| Summary: -------- Problem statement: ------------------ Currently in above topology, when netfailover device is configured using VFs and eswitch VF representors, and when traffic fails over to stand-by VF which is exposed using macvtap device to guest VM, eswitch fails to switch the traffic to the stand-by VF representor. This occurs because there is no knowledge at eswitch level of the stand-by representor device. Solution: --------- Using standard bonding driver, a bond netdevice is created over VF representor device which is used for offloading tc rules. Two VF representors are bonded together, one for the passthrough VF device and another one for the stand-by VF device. With this solution, mlx5 driver listens to the failover events occuring at the bond device level to failover traffic to either of the active VF representor of the bond. a. VM with netfailover device of VF pass-thru (PT) device and virtio-net paravirtualized device with same MAC-address to handle failover traffics at VM level. b. Host bond is active-standby mode, with the lower devices being the VM VF PT representor, and the representor of the 2nd VF to handle failover traffics at Hypervisor/V-Switch OVS level. - During the steady state (fast datapath): set the bond active device to be the VM PT VF representor. - During failover: apply bond failover to the second VF representor device which connects to the VM non-accelerated path. c. E-Switch ingress/egress ACL tables to support failover traffics at E-Switch level I. E-Switch egress ACL with forward-to-vport rule: - By default, eswitch vport egress acl forward packets to its counterpart NIC vport. - During port failover, the egress acl forward-to-vport rule will be added to e-switch vport of passive/in-active slave VF representor to forward packets to other e-switch vport ie. the active slave representor's e-switch vport to handle egress "failover" traffics. - Using lower change netdev event to detect a representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all other slave netdevs to forward to this representor's vport. - Using upper change netdev event to detect a representor unslaving from bond device to delete its vport's egress acl forward-to-vport rule. II. E-Switch ingress ACL metadata reg_c for match - Bonded representors' vorts sharing tc block have the same root ingress acl table and a unique metadata for match. - Traffics from both representors's vports will be tagged with same unique metadata reg_c. - Using upper change netdev event to detect a representor enslaving/unslaving from bond device to setup shared root ingress acl and unique metadata. 2) From Alex Vesker (2): Slpit RX and TX lock for parallel rule insertion in software steering 3) Eli Britstein (2): Optimize performance for IPv4/IPv6 ethertype use the HW ip_version register rather than parsing eth frames for ethertype. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-28 11:04:12 -07:00
Alex Vesker	ed03a418ab	net/mlx5: DR, Split RX and TX lock for parallel insertion Change the locking flow to support RX and TX locks, splitting the single lock to two will allow inserting rules in parallel for RX and TX parts of the FDB. Locking the dr_domain will be done by locking the RX domain and the TX domain locks, this is mostly used for control operations on the dr_domain. When inserting rules for RX or TX the single nic_doamin RX or TX lock will be used. Splitting the lock is safe since RX and TX domains are logically separated from each other, shared objects such the send-ring and memory pool are protected by locks. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:52 -07:00
Alex Vesker	cedb28191f	net/mlx5: DR, Add a spinlock to protect the send ring Adding this lock will allow writing steering entries without locking the dr_domain and allow parallel insertion. Signed-off-by: Alex Vesker <valex@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:51 -07:00
Eli Britstein	fca533041a	net/mlx5e: Optimize performance for IPv4/IPv6 ethertype The HW is optimized for IPv4/IPv6. For such cases, pending capability, avoid matching on ethertype, and use ip_version field instead. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:51 -07:00
Eli Britstein	4a5d5d7392	net/mlx5e: Helper function to set ethertype Set ethertype match in a helper function as a pre-step towards optimizing it. Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:50 -07:00
Parav Pandit	810cbb2554	net/mlx5: Add missing mutex destroy Add mutex destroy calls to balance with mutex_init() done in the init path. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:50 -07:00
Vu Pham	9728366f53	net/mlx5e: Use change upper event to setup representors' bond_metadata Use change upper event to detect slave representor from enslaving/unslaving to/from lag device. On enslaving event, call mlx5_enslave_rep() API to create, add this slave representor shadow entry to the slaves list of bond_metadata structure representing master lag device and use its metadata to setup ingress acl metadata header. On unslaving event, resetting the vport of unslaved representor to use its default ingress/egress acls and rx rules with its default_metadata. The last slave will free the shared bond_metadata and its unique metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:50 -07:00
Vu Pham	88e96e533c	net/mlx5e: Slave representors sharing unique metadata for match Bonded slave representors' vports must share a unique metadata for match. On enslaving event of slave representor to lag device, allocate new unique "bond_metadata" for match if this is the first slave. The subsequent enslaved representors will share the same unique "bond_metadata". On unslaving event of slave representor, reset the slave representor's vport to use its own default metadata. Replace ingress acl and rx rules of the slave representors' vports using new vport->bond_metadata. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:49 -07:00
Vu Pham	133dcfc577	net/mlx5: E-Switch, Alloc and free unique metadata for match Introduce infrastructure to create unique metadata for match for vport without depending on vport_num. Vport uses its default metadata for match in standalone configuration but will share a different unique "bond_metadata" for match with other vports in bond configuration. Using ida to generate unique metadata for match for vports in default and bond configurations. Introduce APIs to generate, free metadata for match. Introduce APIs to set vport's bond_metadata and replace its ingress acl rules with bond_metatada. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:49 -07:00
Vu Pham	d97555e145	net/mlx5e: Add bond_metadata and its slave entries Adding bond_metadata and its slave entries to represent a lag device and its slaves VF representors. Bond_metadata structure includes a unique metadata shared by slaves VF respresentors, and a list of slaves representors slave entries. On enslaving event, create a bond_metadata structure representing the upper lag device of this slave representor if it has not been created yet. Create and add entry for the slave representor to the slaves list. On unslaving event, free the slave entry of the slave representor. On the last unslave event, free the bond_metadata structure and its resources. Introduce APIs to create and remove bond_metadata and its resources, enslave and unslave VF representor slave entries. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:49 -07:00
Or Gerlitz	d34eb2fcd0	net/mlx5e: Offload flow rules to active lower representor When a bond device is created over one or more non uplink representors, and when a flow rule is offloaded to such bond device, offload a rule to the active lower device. Assuming that this is active-backup lag, the rules should be offloaded to the active lower device which is the representor of the direct path (not the failover). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:48 -07:00
Vu Pham	553f932838	net/mlx5e: Support tc block sharing for representors Currently offloading a rule over a tc block shared by multiple representors fails because an e-switch global hashtable to keep the mapping from tc cookies to mlx5e flow instances is used, and tc block sharing offloads the same rule/cookie multiple times, each time for different representor sharing the tc block. Changing the implementation and behavior by acknowledging and returning success if the same rule/cookie is offloaded again to other slave representor sharing the tc block by setting, checking and comparing the netdev that added the rule first. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:48 -07:00
Or Gerlitz	7e51891a23	net/mlx5e: Use netdev events to set/del egress acl forward-to-vport rule Register a notifier block to handle netdev events for bond device of non-uplink representors to support eswitch vports bonding. When a non-uplink representor is a lower dev (slave) of bond and becomes active, adding egress acl forward-to-vport rule of all slave netdevs (active + standby) to forward to this representor's vport. Use change lower netdev event to do this. Use change upper event to detect slave representor unslaved from lag device to delete its vport egress acl forward rule if any. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:47 -07:00
Vu Pham	bf773dc0e6	net/mlx5: E-Switch, Introduce APIs to enable egress acl forward-to-vport rule By default, e-switch vport's egress acl just forward packets to its counterpart NIC vport using existing egress acl table. During port failover in bonding scenario where two VFs representors are bonded, the egress acl forward-to-vport rule will be added to the existing egress acl table of e-switch vport of passive/inactive slave representor to forward packets to other NIC vport ie. the active slave representor's NIC vport to handle egress "failover" traffic. Enable egress acl and have APIs to create and destroy egress acl forward-to-vport rule and group. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:47 -07:00
Vu Pham	07bab95026	net/mlx5: E-Switch, Refactor eswitch ingress acl codes Restructure the eswitch ingress acl codes into eswitch directory and different files: . Acl ingress helper functions to acl_helper.c/h . Acl ingress functions used in offloads mode to acl_ingress_ofld.c . Acl ingress functions used in legacy mode to acl_ingress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com>	2020-05-27 18:13:47 -07:00
Vu Pham	ea651a86d4	net/mlx5: E-Switch, Refactor eswitch egress acl codes Refactor the egress acl codes so that offloads and legacy modes can configure specifically their own needs of egress acl table, groups and rules. While at it, restructure the eswitch egress acl codes into eswitch directory and different files: . Acl egress helper functions to acl_helper.c/h . Acl egress functions used in offloads mode to acl_egress_ofld.c . Acl egress functions used in legacy mode to acl_egress_lgy.c This patch does not change any functionality. Signed-off-by: Vu Pham <vuhuong@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-27 18:13:46 -07:00
Jason Gunthorpe	e4fdf7625b	Merge branch 'mellanox/mlx5-next' into rdma.git for/next From the mlx5-next branch at git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Required for dependencies in following patches * branch 'mellanox/mlx5-next': net/mlx5: Add ability to read and write ECE options net/mlx5: Add support for RDMA TX FT headers modifying net/mlx5: Move iseg access helper routines close to mlx5_core driver net/mlx5: Cleanup mlx5_ifc_fte_match_set_misc2_bits Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2020-05-27 16:01:17 -03:00
Colin Ian King	7cf4eda481	mlxsw: spectrum_router: remove redundant initialization of pointer br_dev The pointer br_dev is being initialized with a value that is never read and is being updated with a new value later on. The initialization is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-27 11:23:27 -07:00
Ido Schimmel	10d3757fcb	mlxsw: spectrum_router: Allow programming link-local prefix routes The device has a trap for IPv6 packets that need be routed and have a unicast link-local destination IP (i.e., fe80::/10). This allows mlxsw to ignore link-local routes, as the packets will be trapped to the CPU in any case. However, since link-local routes are not programmed, it is possible for routed packets to hit the default route which might also be programmed to trap packets. This means that packets with a link-local destination IP might be trapped for the wrong reason. To overcome this, allow programming link-local prefix routes (usually one fe80::/64 per-table), so that the packets will be forwarded until reaching the link-local trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	9785b92b44	mlxsw: spectrum: Add packet traps for BFD packets Bidirectional Forwarding Detection (BFD) provides "low-overhead, short-duration detection of failures in the path between adjacent forwarding engines" (RFC 5880). This is accomplished by exchanging BFD packets between the two forwarding engines. Up until now these packets were trapped via the general local delivery (i.e., IP2ME) trap which also traps a lot of other packets that are not as time-sensitive as BFD packets. Expose dedicated traps for BFD packets so that user space could configure a dedicated policer for them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	dacc4e3acf	mlxsw: spectrum: Treat IPv6 link-local SIP as an exception IPv6 packets that need to be forwarded and have a link-local source IP are dropped by the kernel and an ICMPv6 "Destination unreachable" is sent to the sending host. As such, change the trap group of such packets so that they do not interfere with IPv6 management packets. In the future this trap will be exposed as an exception via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	1260e083d4	mlxsw: spectrum: Share one group for all locally delivered packets Routed IP packets with the Router Alert option need to be trapped to the CPU as they might need to be locally delivered to raw sockets with the IP_ROUTER_ALERT / IPV6_ROUTER_ALERT socket option. Move them to the same group with other packets that might need to be trapped following route lookup. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	500769bebe	mlxsw: reg: Move all trap groups under the same enum After the previous patch the split is no longer necessary and all the trap groups can be moved under the same enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	b87bde80da	mlxsw: spectrum_trap: Do not hard code "thin" policer identifier As explained in commit `e612523041` ("mlxsw: spectrum_trap: Introduce dummy group with thin policer"), the purpose of the "thin" policer is to pass as less packets as possible to the CPU. The identifier of this policer is currently set according to the maximum number of used trap groups, but this is fragile: On Spectrum-1 the maximum number of policers is less than the maximum number of trap groups, which might result in an invalid policer identifier in case the number of used trap groups grows beyond the policer limit. Solve this by dynamically allocating the policer identifier. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	03cb0ce0dd	mlxsw: switchx2: Move SwitchX-2 trap groups out of main enum The number of Spectrum trap groups is not infinite, but two identifiers are occupied by SwitchX-2 specific trap groups. Free these identifiers by moving them out of the main enum. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	025b7de7f4	mlxsw: spectrum: Reduce priority of locally delivered packets To align with recent recommended values. Will be configurable by future patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:58 -07:00
Ido Schimmel	1e3cd58942	mlxsw: spectrum: Use same trap group for local routes and link-local destination Packets with an IPv6 link-local destination (i.e., fe80::/10) should not be forwarded and are therefore trapped to the CPU for local delivery. Since these packets are trapped for the same logical reason as packets hitting local routes, associate both traps with the same group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	d322309d72	mlxsw: spectrum: Use separate trap group for FID miss When a packet enters the device it is classified to a filtering identifier (FID) based on the ingress port and VLAN. The FID miss trap is used to trap packets for which a FID could not be found. In mlxsw this trap should only be triggered when a port is enslaved to an OVS bridge and a matching ACL rule could not be found, so as to trigger learning. These packets are therefore completely unrelated to packets hitting local routes and should be in a different group. Move them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	954eef2677	mlxsw: spectrum: Use same trap group for various IPv6 packets Group these various IPv6 packets (e.g., router solicitations, router advertisement) together and subject them to the same policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	412df3d1bb	mlxsw: spectrum: Rename IPv6 ND trap group The IPv6 Neighbour Discovery (ND) group will be used for various IPv6 packets, not all of which fall under the definition of ND, so rename it to "IPV6" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	761bc42fbe	mlxsw: spectrum: Use same switch case for identical groups Trap groups that use the same policer settings can share the same switch case. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Ido Schimmel	3c2d8a046a	mlxsw: spectrum: Use dedicated trap group for ACL trap Packets that are trapped via tc's trap action are currently subject to the same policer as packets hitting local routes. The latter are critical to the correct functioning of the control plane, while the former are mainly used for traffic inspection. Split the ACL trap to a separate group with its own policer. Use a higher priority for these traps than for traps using mirror action (e.g., ARP, IGMP). Otherwise, packets matching both traps will not be forwarded in hardware (because of trap action) and also not forwarded in software because they will be marked with 'offload_fwd_mark'. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 20:33:57 -07:00
Guillaume Nault	58cff782cc	flow_dissector: Parse multiple MPLS Label Stack Entries The current MPLS dissector only parses the first MPLS Label Stack Entry (second LSE can be parsed too, but only to set a key_id). This patch adds the possibility to parse several LSEs by making __skb_flow_dissect_mpls() return FLOW_DISSECT_RET_PROTO_AGAIN as long as the Bottom Of Stack bit hasn't been seen, up to a maximum of FLOW_DIS_MPLS_MAX entries. FLOW_DIS_MPLS_MAX is arbitrarily set to 7. This should be enough for many practical purposes, without wasting too much space. To record the parsed values, flow_dissector_key_mpls is modified to store an array of stack entries, instead of just the values of the first one. A bit field, "used_lses", is also added to keep track of the LSEs that have been set. The objective is to avoid defining a new FLOW_DISSECTOR_KEY_MPLS_XX for each level of the MPLS stack. TC flower is adapted for the new struct flow_dissector_key_mpls layout. Matching on several MPLS Label Stack Entries will be added in the next patch. The NFP and MLX5 drivers are also adapted: nfp_flower_compile_mac() and mlx5's parse_tunnel() now verify that the rule only uses the first LSE and fail if it doesn't. Finally, the behaviour of the FLOW_DISSECTOR_KEY_MPLS_ENTROPY key is slightly modified. Instead of recording the first Entropy Label, it now records the last one. This shouldn't have any consequences since there doesn't seem to have any user of FLOW_DISSECTOR_KEY_MPLS_ENTROPY in the tree. We'd probably better do a hash of all parsed MPLS labels instead (excluding reserved labels) anyway. That'd give better entropy and would probably also simplify the code. But that's not the purpose of this patch, so I'm keeping that as a future possible improvement. Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-26 15:22:58 -07:00
Ido Schimmel	154388e112	mlxsw: spectrum: Fix spelling mistake in trap's name Fix incorrect spelling of "advertisement". Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	ce3c3bf0bf	mlxsw: spectrum: Use dedicated trap group for sampled packets The rate with which packets are sampled is determined by user space, so there is no need to associate such packets with a policer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	b33f5d9fb7	mlxsw: spectrum: Use same trap group for IPv6 ND and ARP packets Both packet types are needed for the same reason (neighbour discovery), so associate them with the same trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	32446438cc	mlxsw: spectrum: Rename ARP trap group The ARP trap group will be used for IPv6 ND traps in the next patch, so rename it to "NEIGH_DISCOVERY" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	d88f8cc158	mlxsw: spectrum_trap: Remove unnecessary field Now that traffic class (TC) and priority are set to the same value, there is no need to store both. Remove the first. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	5047d819f5	mlxsw: spectrum: Align TC and trap priority The traffic class (TC) attribute of packet traps determines through which TC a packet trap will be scheduled through the CPU port. The priority attribute determines which trap will be triggered in case several packet traps match a packet. We try to configure these attributes to the same value for all packet traps as there is little reason not to. Some packet traps did not use the same value, so rectify that now. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	e0d848477a	mlxsw: spectrum_buffers: Assign non-zero quotas to TC 0 of the CPU port As explained in commit `9ffcc3725f` ("mlxsw: spectrum: Allow packets to be trapped from any PG"), incoming packets can be admitted to the shared buffer and forwarded / trapped, if: (Ingress{Port}.Usage < Thres && Ingress{Port,PG}.Usage < Thres && Egress{Port}.Usage < Thres && Egress{Port,TC}.Usage < Thres) \|\| (Ingress{Port}.Usage < Min \|\| Ingress{Port,PG} < Min \|\| Egress{Port}.Usage < Min \|\| Egress{Port,TC}.Usage < Min) Trapped packets are scheduled to transmission through the CPU port. Currently, the minimum and maximum quotas of traffic class (TC) 0 of the CPU port are 0, which means it is not usable. Assign non-zero quotas to TC 0 of the CPU port, so that it could be utilized by subsequent patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	938e6d0b76	mlxsw: spectrum: Change default rate and priority of DHCP packets Reduce the default acceptable rate of DHCP packets to 128 packets per second and reduce their priority. This is reasonable given the Spectrum ASICs are limited to 128 ports at the moment. These are only the default values. Users will be able to modify them via devlink-trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	0ecb947412	mlxsw: spectrum: Trap IPv4 DHCP packets in router Currently, IPv4 DHCP packets are trapped during L2 forwarding, which means that packets might be trapped unnecessarily. Instead, only trap the DHCP packets that reach the router. Either because they were flooded to the router port or forwarded to it by the FDB. This is consistent with the corresponding IPv6 trap. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	99129069b7	mlxsw: spectrum: Use same trap group for MLD and IGMP packets Both packet types are needed for the same reason (multicast snooping), so associate them with the same trap group. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
Ido Schimmel	debb7af686	mlxsw: spectrum: Rename IGMP trap group The IGMP trap group will be used for MLD traps in the next patch, so rename it to "MC_SNOOPING" which is more appropriate. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 19:32:23 -07:00
David S. Miller	13209a8f73	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net The MSCC bug fix in 'net' had to be slightly adjusted because the register accesses are done slightly differently in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-24 13:47:27 -07:00
David S. Miller	e3181e9a72	mlx5-fixes-2020-05-22 -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl7IbksACgkQSD+KveBX +j5T8Af/XT6b23VlSn2Km4tg8WQNDRJLdq1s6fTS5SGcyc0awxfH07cvYvJ26kKW kmdDNijkVbd0ma2UxHiiD3vmE8Vs85gZ6BDNyl485x/cH3zFzAm54R5fZdnK5JgN YNgdFP0MOwPtAdDtxLH+r8aOyNKncIOmCZrMNnxVgI+IytG1L5QLnS6GeQy2zyIx 9F/9sihta2z567IstGu2wvmgviSHVk/zV9yqn/orD9tV6oFvvrBQMlEt8l27b1tA 4bajbHIyc1WmfQ+wg56eXATdbqCQ2YYfMjhchiCfFv5DhnMnPi5bV0PNR9Rq0CYw 05xpF16/85uvDbTizsgGNZ1Pb1nGsQ== =oFWF -----END PGP SIGNATURE----- Merge tag 'mlx5-fixes-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2020-05-22 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.13 ('net/mlx5: Add command entry handling completion') For -stable v5.2 ('net/mlx5: Fix error flow in case of function_setup failure') ('net/mlx5: Fix memory leak in mlx5_events_init') For -stable v5.3 ('net/mlx5e: Update netdev txq on completions during closure') ('net/mlx5e: kTLS, Destroy key object after destroying the TIS') ('net/mlx5e: Fix inner tirs handling') For -stable v5.6 ('net/mlx5: Fix cleaning unmanaged flow tables') ('net/mlx5: Fix a race when moving command interface to events mode') ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:39:45 -07:00
David S. Miller	46c54f9500	mlx5-updates-2020-05-22 This series includes two updates and one cleanup patch 1) Tang Bim, clean-up with IS_ERR() usage 2) Vlad introduces a new mlx5 kconfig flag for TC support This is required due to the high volume of current and upcoming development in the eswitch and representors areas where some of the feature are TC based such as the downstream patches of MPLSoUDP and the following representor bonding support for VF live migration and uplink representor dynamic loading. For this Vlad kept TC specific code in tc.c and rep/tc.c and organized non TC code in representors specific files. 3) Eli Cohen adds support for MPLS over UPD encap and decap TC offloads. -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl7IZFEACgkQSD+KveBX +j7IEQf/RFv633bWTlL63fEJjViRv1rjfkbyaXrGVL3gzr/Er01DeAPR22CNOlC3 bu1jHLKqVn0Mg0g5g2B4/H/7JoFbMBRTy4MXpM5VrQCIqwMuXG4zhWuoUj7ncQ5w kXHAU6DUuZRn8/x1JLQOHDRTzKhav7ldT+nvvoKEMrad/DEMGz+bq67xh4l8nfi+ ktSFAO0UFi9ysb25CMfdqIqAL0J5nAJ7DNhw5x7IvtwUxNxate7HtBaBhBgZ9NWv jYf8R3p+7JdgvVW18pZhmjbaBqaApXcZrC7rI07PR6rCOAHfToX6miR8gUtpIEno itQkzYt9UF2dgNwMmxoJLqnUNiy/Cg== =wkSR -----END PGP SIGNATURE----- Merge tag 'mlx5-updates-2020-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2020-05-22 This series includes two updates and one cleanup patch 1) Tang Bim, clean-up with IS_ERR() usage 2) Vlad introduces a new mlx5 kconfig flag for TC support This is required due to the high volume of current and upcoming development in the eswitch and representors areas where some of the feature are TC based such as the downstream patches of MPLSoUDP and the following representor bonding support for VF live migration and uplink representor dynamic loading. For this Vlad kept TC specific code in tc.c and rep/tc.c and organized non TC code in representors specific files. 3) Eli Cohen adds support for MPLS over UPD encap and decap TC offloads. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:37:00 -07:00
Qiushi Wu	febfd9d3c7	net/mlx4_core: fix a memory leak bug. In function mlx4_opreq_action(), pointer "mailbox" is not released, when mlx4_cmd_box() return and error, causing a memory leak bug. Fix this issue by going to "out" label, mlx4_free_cmd_mailbox() can free this pointer. Fixes: `fe6f700d6c` ("net/mlx4_core: Respond to operation request by firmware") Signed-off-by: Qiushi Wu <wu000273@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-23 16:34:37 -07:00
David S. Miller	a152b85984	Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2020-05-23 The following pull-request contains BPF updates for your net-next tree. We've added 50 non-merge commits during the last 8 day(s) which contain a total of 109 files changed, 2776 insertions(+), 2887 deletions(-). The main changes are: 1) Add a new AF_XDP buffer allocation API to the core in order to help lowering the bar for drivers adopting AF_XDP support. i40e, ice, ixgbe as well as mlx5 have been moved over to the new API and also gained a small improvement in performance, from Björn Töpel and Magnus Karlsson. 2) Add getpeername()/getsockname() attach types for BPF sock_addr programs in order to allow for e.g. reverse translation of load-balancer backend to service address/port tuple from a connected peer, from Daniel Borkmann. 3) Improve the BPF verifier is_branch_taken() logic to evaluate pointers being non-NULL, e.g. if after an initial test another non-NULL test on that pointer follows in a given path, then it can be pruned right away, from John Fastabend. 4) Larger rework of BPF sockmap selftests to make output easier to understand and to reduce overall runtime as well as adding new BPF kTLS selftests that run in combination with sockmap, also from John Fastabend. 5) Batch of misc updates to BPF selftests including fixing up test_align to match verifier output again and moving it under test_progs, allowing bpf_iter selftest to compile on machines with older vmlinux.h, and updating config options for lirc and v6 segment routing helpers, from Stanislav Fomichev, Andrii Nakryiko and Alan Maguire. 6) Conversion of BPF tracing samples outdated internal BPF loader to use libbpf API instead, from Daniel T. Lee. 7) Follow-up to BPF kernel test infrastructure in order to fix a flake in the XDP selftests, from Jesper Dangaard Brouer. 8) Minor improvements to libbpf's internal hashmap implementation, from Ian Rogers. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2020-05-22 18:30:34 -07:00
Shay Drory	4f7400d5cb	net/mlx5: Fix error flow in case of function_setup failure Currently, if an error occurred during mlx5_function_setup(), we keep dev->state as DEVICE_STATE_UP. Fixing it by adding a goto label. Fixes: `e161105e58` ("net/mlx5: Function setup/teardown procedures") Signed-off-by: Shay Drory <shayd@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:58 -07:00
Roi Dayan	d37bd5e81e	net/mlx5e: CT: Correctly get flow rule The correct way is to us the flow_cls_offload_flow_rule() wrapper instead of f->rule directly. Fixes: `4c3844d9e9` ("net/mlx5e: CT: Introduce connection tracking") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:56 -07:00
Moshe Shemesh	5e911e2c06	net/mlx5e: Update netdev txq on completions during closure On sq closure when we free its descriptors, we should also update netdev txq on completions which would not arrive. Otherwise if we reopen sqs and attach them back, for example on fw fatal recovery flow, we may get tx timeout. Fixes: `29429f3300` ("net/mlx5e: Timeout if SQ doesn't flush during close") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:54 -07:00
Roi Dayan	9ca415399d	net/mlx5: Annotate mutex destroy for root ns Invoke mutex_destroy() to catch any errors. Fixes: `2cc43b494a` ("net/mlx5_core: Managing root flow table") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:52 -07:00
Roi Dayan	6eb7a268a9	net/mlx5: Don't maintain a case of del_sw_func being null Add del_sw_func cb for root ns. Now there is no need to maintain a case of del_sw_func being null when freeing the node. Fixes: `2cc43b494a` ("net/mlx5_core: Managing root flow table") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:50 -07:00
Roi Dayan	aee37f3d94	net/mlx5: Fix cleaning unmanaged flow tables Unmanaged flow tables doesn't have a parent and tree_put_node() assume there is always a parent if cleaning is needed. fix that. Fixes: `5281a0c909` ("net/mlx5: fs_core: Introduce unmanaged flow tables") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:48 -07:00
Moshe Shemesh	df14ad1ecc	net/mlx5: Fix memory leak in mlx5_events_init Fix memory leak in mlx5_events_init(), in case create_single_thread_workqueue() fails, events struct should be freed. Fixes: `5d3c537f90` ("net/mlx5: Handle event of power detection in the PCIE slot") Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>	2020-05-22 17:28:46 -07:00

1 2 3 4 5 ...

6615 Commits