Up until now, the action of a trap was never changed during its lifetime.
This is going to change with subsequent patches that will allow devlink to
control the action of certain traps.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Gunthorpe says:
====================
This is a collection of general cleanups for ODP to clarify some of the
flows around umem creation and use of the interval tree.
====================
The branch is based on v5.3-rc5 due to dependencies
* odp_fixes:
RDMA/mlx5: Use odp instead of mr->umem in pagefault_mr
RDMA/mlx5: Use ib_umem_start instead of umem.address
RDMA/core: Make invalidate_range a device operation
RDMA/odp: Use kvcalloc for the dma_list and page_list
RDMA/odp: Check for overflow when computing the umem_odp end
RDMA/odp: Provide ib_umem_odp_release() to undo the allocs
RDMA/odp: Split creating a umem_odp from ib_umem_get
RDMA/odp: Make the three ways to create a umem_odp clear
RMDA/odp: Consolidate umem_odp initialization
RDMA/odp: Make it clearer when a umem is an implicit ODP umem
RDMA/odp: Iterate over the whole rbtree directly
RDMA/odp: Use the common interval tree library instead of generic
RDMA/mlx5: Fix MR npages calculation for IB_ACCESS_HUGETLB
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use different namespaces for bypass and switchdev loopback because they
have different priorities and default table miss action requirements:
1. bypass: with multiple priorities support, and
MLX5_FLOW_TABLE_MISS_ACTION_DEF as the default table miss action;
2. switchdev loopback: with single priority support, and
MLX5_FLOW_TABLE_MISS_ACTION_SWITCH_DOMAIN as the default table miss
action.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Currently all the namespaces under the same steering domain share the same
default table miss action, however in some situations (e.g., RDMA RX)
different actions are required. This patch adds a per-namespace default
table miss action instead of using the miss action of the steering domain.
Signed-off-by: Mark Zhang <markz@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that the single target build descends into sub-directories in the
same way as the normal build, these dummy Makefiles are not needed
any more.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Fix the documentation of mlx5_eq_enable/disable to clean up compiler warnings.
drivers/net/ethernet/mellanox/mlx5/core//eq.c:334:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_enable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_enable'
drivers/net/ethernet/mellanox/mlx5/core//eq.c:355:
warning: Function parameter or member 'dev' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'eq' not described in 'mlx5_eq_disable'
warning: Function parameter or member 'nb' not described in 'mlx5_eq_disable'
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Previously, mlx5_cleanup_fc_stats() would clean up the flow counter
pool before releasing all the counters to it, which would result in
flow counter bulks not getting freed. Resolve this by changing the
order in which elements of fc_stats are cleaned up, so that the flow
counter pool is cleaned up after all the counters are released.
Also move cleanup actions for freeing the bulk query memory and
destroying the idr to the end of mlx5_cleanup_fc_stats().
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from error on completion on RQ by
setting the queue back to ready state. Handle only errors with a
syndrome indicating the RQ might enter error state and could be
recovered.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Just to be aligned with the MPWQE handlers, handle an RX WQE with error
for legacy RQs in the top RX handlers, just before calling skb_from_cqe().
CQE error handling will now be called at the same stage regardless of
the RQ type or netdev mode (NIC, Representor, IPoIB, etc.).
This will be useful for a downstream patch that improves error CQE
handling.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from an rx timeout. On driver open
we post a NOP work request on the rx channels to trigger NAPI in order
to fill up the rx rings. In case NAPI wasn't scheduled due to a lost
interrupt, perform EQ recovery.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add support for report and recovery from error on completion on ICOSQ.
Deactivate RQ and flush, then deactivate ICOSQ. Set the queue back to
ready state (firmware) and reset the ICOSQ and the RQ (software
resources). Finally, activate the ICOSQ and the RQ.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Align ICOSQ open/close behaviour with RQ and SQ. Split open flow into
open and activate where open handles creation and activate enables the
queue. Do a symmetric thing in close flow: split into close and
deactivate.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Introduce helper functions for creating and destroying reporters and for
updating channels. In the following patch, an rx reporter is added and it will use
these helpers too.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The following patches in the set enhance the diagnostics info of tx
reporter. Therefore, it is better to pass a pointer to the SQ for
further data extraction.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Prepare for code sharing with the rx reporter, which is added in the
following patches in the set. Introduce a generic error_ctx for
agnostic recovery dispatch.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Change from mlx5e_tx_reporter_* to mlx5e_reporter_tx_*. In the following
patches in the set an rx reporter is added; the new naming convention is
more uniform.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Rename reporter.h -> health.h so patches in the set can use it for
health related functionality.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
tc transparently maps the software priority number to hardware. Update
it to pass the major priority which is what most drivers expect. Update
drivers too so they do not need to lshift the priority field of the
flow_cls_common_offload object. The stmmac driver is an exception, since
this code assumes the tc software priority is fine, therefore, lshift it
just to be conservative.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This commit adds support for the new need_wakeup feature of AF_XDP. The
applications can opt-in by using the XDP_USE_NEED_WAKEUP bind() flag.
When this feature is enabled, some behavior changes:
RX side: If the Fill Ring is empty, instead of busy-polling, set the
flag to tell the application to kick the driver when it refills the Fill
Ring.
TX side: If there are pending completions or packets queued for
transmission, set the flag to tell the application that it can skip the
sendto() syscall and save time.
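For reference, the driver-side pattern looks roughly as follows (a
minimal sketch; the xsk_*_need_wakeup() helpers come from the generic
need_wakeup series, while the umem and fill_ring_empty variables here
are illustrative):

	/* RX: after trying to refill from the Fill Ring */
	if (xsk_umem_uses_need_wakeup(umem)) {
		if (fill_ring_empty)
			xsk_set_rx_need_wakeup(umem);
		else
			xsk_clear_rx_need_wakeup(umem);
	}

	/* TX: let the application skip sendto() while work is queued */
	if (xsk_umem_uses_need_wakeup(umem))
		xsk_set_tx_need_wakeup(umem);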
The performance testing was performed on a machine with the following
configuration:
- 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
- Mellanox ConnectX-5 Ex with 100 Gbit/s link
The results with retpoline disabled:
       | without need_wakeup  | with need_wakeup     |
       |----------------------|----------------------|
       | one core | two cores | one core | two cores |
-------|----------|-----------|----------|-----------|
txonly | 20.1     | 33.5      | 29.0     | 34.2      |
rxdrop | 0.065    | 14.1      | 12.0     | 14.1      |
l2fwd  | 0.032    | 7.3       | 6.6      | 7.2       |
"One core" means the application and NAPI run on the same core. "Two
cores" means they are pinned to different cores.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Two XSK tasks are performed during NAPI polling, that are not bound to
hardware interrupts: TXing packets and polling for frames in the Fill
Ring. They are special in a way that the hardware doesn't know about
these tasks, so it doesn't trigger interrupts if there is still some
work to be done; it's our driver's responsibility to ensure NAPI will be
rescheduled if needed.
Create a new function to handle these tasks and move the corresponding
code from mlx5e_napi_poll to the new function to improve modularity and
prepare for the changes in the following patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
ndo provides the same functionality as before but with the addition of
a new flags field that is used to specify if Rx, Tx or both should be
woken up. The previous ndo only woke up Tx, as implied by the
name. The i40e and ixgbe drivers (which are all the supported ones)
are updated with this new interface.
This new ndo will be used by the new need_wakeup functionality of XDP
sockets that need to be able to wake up both Rx and Tx driver
processing.
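For reference, the new ndo and its flags look roughly like this
(sketch based on the description above):

	int (*ndo_xsk_wakeup)(struct net_device *dev, u32 queue_id,
			      u32 flags);

	/* flags telling the driver what needs to be processed */
	#define XDP_WAKEUP_RX (1 << 0)
	#define XDP_WAKEUP_TX (1 << 1)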
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cited patch deleted ethtool flash device support, as the ethtool core can
fall back to the devlink flash callback. However, this is supported only if
there is a devlink port registered over the corresponding netdevice.
As mlx5e does not have devlink port support over the native netdevice, it
broke the ability to flash the device via ethtool.
This patch re-adds the ethtool callback to avoid breaking user functionality
when flashing the device via ethtool.
Fixes: 9c8bca2637 ("mlx5: Move firmware flash implementation to devlink")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a missing spinlock around XSKICOSQ usage at the activation stage,
because there is a race between a configuration change and the
application calling sendto().
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In mlx4_en_config_rss_steer(), 'rss_map->indir_qp' is allocated through
kzalloc(). After that, mlx4_qp_alloc() is invoked to configure RSS
indirection. However, if mlx4_qp_alloc() fails, the allocated
'rss_map->indir_qp' is not deallocated, leading to a memory leak bug.
To fix the above issue, add the 'qp_alloc_err' label to free
'rss_map->indir_qp'.
Fixes: 4931c6ef04 ("net/mlx4_en: Optimized single ring steering")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Merging tip of mlx5-next in order to get changes related to adding
XRQ support to the DEVX interface needed prior to the following two
patches.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add XRQ legacy command opcodes, which will be used via the DEVX interface.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
As a QP may be created by DEVX, it may be valid to not find the rsn in
the mlx5 core tree; change the log level to debug.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
To identify timestamps for matching with their packets, Spectrum-1 uses a
five-tuple of (port, direction, domain number, message type, sequence ID).
If there are several clients from the same domain behind a single port
sending Delay_Req's, the only thing differentiating these packets, as far
as Spectrum-1 is concerned, is the sequence ID. Should sequence IDs between
individual clients be similar, conflicts may arise. That is not a problem
to hardware, which will simply deliver timestamps on a first come, first
served basis.
However the driver uses a simple hash table to store the unmatched pieces.
When a new conflicting piece arrives, it pushes out the previously stored
one, which if it is a packet, is delivered without timestamp. Later on as
the corresponding timestamps arrive, the first one is mismatched to the
second packet, and the second one is never matched and eventually is GCd.
To correct this issue, instead of using a simple rhashtable, use rhltable
to keep the unmatched entries.
Previously, a found unmatched entry would always be removed from the hash
table. That is not the case anymore--an incompatible entry is left in the
hash table. Therefore removal from the hash table cannot be used to confirm
the validity of the looked-up pointer, instead the lookup would simply need
to be redone. Therefore move it inside the critical section. This
simplifies a lot of the code.
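A minimal sketch of the rhltable pattern described above (the structure
and key layout are illustrative, the rhltable calls are the standard
kernel API):

	struct ptp_unmatched {
		struct rhlist_head ht_node;	/* rhltable linkage */
		struct ptp_key key;		/* port, dir, domain, type, seq id */
	};

	/* insert: entries with conflicting keys may now coexist */
	err = rhltable_insert(&unmatched_ht, &unmatched->ht_node, ht_params);

	/* lookup: walk all entries sharing the key, pick a compatible one */
	struct rhlist_head *list, *tmp;

	list = rhltable_lookup(&unmatched_ht, &key, ht_params);
	rhl_for_each_entry_rcu(unmatched, tmp, list, ht_node)
		if (is_compatible(unmatched))	/* illustrative predicate */
			break;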
Fixes: 8748642751 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Reported-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When calling debugfs functions, there is no need to ever check the
return value. The function can work or not, but the code logic should
never do something different based on this.
This cleans up a lot of unneeded code and logic around the debugfs
files, making all of this much simpler and easier to understand as we
don't need to keep the dentries saved anymore.
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
refcount_t is better for reference counters since its
implementation can prevent overflows.
So convert atomic_t ref counters to refcount_t.
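The conversion follows the usual pattern (illustrative diff; the object
and field names are hypothetical):

	-	atomic_t refcount;
	+	refcount_t refcount;

	-	atomic_set(&obj->refcount, 1);
	+	refcount_set(&obj->refcount, 1);

	-	atomic_inc(&obj->refcount);
	+	refcount_inc(&obj->refcount);

	-	if (atomic_dec_and_test(&obj->refcount))
	+	if (refcount_dec_and_test(&obj->refcount))
			kfree(obj);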
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
It is desired to use unique port indices when multiple pci devices'
devlink instances have the same switch-id.
Make use of vhca-id to generate such unique devlink port indices.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
System image GUID doesn't depend on eswitch switchdev mode.
Hence, remove the check which simplifies the code.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently mlx5_eswitch_rep stores the same hw ID for all representors.
However, it is never used from this structure.
It is always used from mlx5_vport.
Hence, remove unused field.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Vu Pham <vuhuong@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Encap entries creation is fully synchronized by encap_tbl_lock. In order to
allow concurrent allocation of hardware resources used to offload
encapsulation, extend mlx5e_encap_entry with 'res_ready' completion. Move
call to mlx5e_tc_tun_create_header_ipv{4|6}() out of encap_tbl_lock
critical section. Modify code that attaches new flows to existing encap to
wait for 'res_ready' completion before using the entry. Insert encap entry
to table before provisioning it to hardware and modify all users of the
encap table to verify that encap was fully initialized by checking
completion result for non-zero value (and to wait for 'res_ready'
completion, if necessary).
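As a hedged sketch, the synchronization pattern looks roughly like this
(the completion API is the standard kernel one; the 'compl_result' field
name and error check are illustrative):

	/* creator, outside the encap_tbl_lock critical section */
	init_completion(&e->res_ready);
	/* ... insert e into the encap table under encap_tbl_lock ... */
	err = mlx5e_tc_tun_create_header_ipv4(...);	/* hardware resources */
	e->compl_result = err;
	complete_all(&e->res_ready);

	/* concurrent user that found e in the table */
	wait_for_completion(&e->res_ready);
	if (e->compl_result)		/* entry failed to initialize */
		goto err_out;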
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, protect encap hash table from concurrent
modifications with new "encap_tbl_lock" mutex. Use the mutex to protect
internal encap entry state from concurrent modification. This is necessary
because a flow can be attached to multiple encap entries simultaneously,
which significantly complicates using finer grained per-entry lock.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
List of flows attached to encap entry is used as implicit reference
counter (encap entry is deallocated when the list becomes empty) and as a
mechanism to obtain encap entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to encap entry is possible. Proper atomic reference counter is
required to support concurrent access.
As a preparation for extending encap with reference counting, extract the
code that looks up and deletes an encap entry into standalone put/get helpers. In
order to remove this dependency on external locking, extend encap entry
with reference counter to manage its lifetime and extend flow structure
with direct pointer to encap entry that flow is attached to.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Mod_hdr entries creation is fully synchronized by mod_hdr_tbl->lock. In
order to allow concurrent allocation of hardware resources used to offload
header rewrite, extend mlx5e_mod_hdr_entry with 'res_ready' completion.
Move call to mlx5_modify_header_alloc() out of mod_hdr_tbl->lock critical
section. Modify code that attaches new flows to existing mh to wait for
'res_ready' completion before using the entry. Insert mh to mod_hdr table
before provisioning it to hardware and modify all users of mod_hdr table to
verify that mh was fully initialized by checking completion result for
negative value (and to wait for 'res_ready' completion, if necessary).
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, protect mod_hdr hash table from
concurrent modifications with new mutex.
Implement helper function to get flow namespace to prevent code
duplication.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, extend mod header entry with spinlock
and use it to protect list of flows attached to mod header entry from
concurrent modifications.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
List of flows attached to mod header entry is used as implicit reference
counter (mod header entry is deallocated when the list becomes empty) and as a
mechanism to obtain mod header entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to mod header entry is possible. Proper atomic reference counter
is required to support concurrent access.
As a preparation for extending mod header with reference counting, extract
the code that looks up and deletes a mod header entry into standalone put/get
helpers. In order to remove this dependency on external locking, extend mod
header entry with reference counter to manage its lifetime and extend flow
structure with direct pointer to mod header entry that flow is attached to.
To remove code duplication between legacy and switchdev mode
implementations that both support mod_hdr functionality, store mod_hdr
table in dedicated structure used by both fdb and kernel namespaces. New
table structure is extended with table lock by one of the following patches
in this series. Implement helper function to get correct mod_hdr table
depending on flow namespace.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Hairpin entries creation is fully synchronized by hairpin_tbl_lock. In
order to allow concurrent initialization of mlx5e_hairpin structure
instances and provisioning of hairpin entries to hardware, extend
mlx5e_hairpin_entry with 'res_ready' completion. Move call to
mlx5e_hairpin_create() out of hairpin_tbl_lock critical section. Modify
code that attaches new flows to existing hpe to wait for 'res_ready'
completion before using the hpe. Insert hpe to hairpin table before
provisioning it to hardware and modify all users of hairpin table to verify
that hpe was fully initialized by checking hpe->hp pointer (and to wait for
'res_ready' completion, if necessary).
Modify dead peer update event handling function to save hpe's to temporary
list with their reference counter incremented. Wait for completion of hpe's
in temporary list and update their 'peer_gone' flag outside of
hairpin_tbl_lock critical section.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, protect hairpin hash table from
concurrent modifications with new "hairpin_tbl_lock" mutex.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock, extend hairpin entry with spinlock and
use it to protect list of flows attached to hairpin entry from concurrent
modifications.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
List of flows attached to hairpin entry is used as implicit reference
counter (hairpin entry is deallocated when the list becomes empty) and as a
mechanism to obtain hairpin entry that flow is attached to (through list
head). This is not safe when concurrent modification of list of flows
attached to hairpin entry is possible. Proper atomic reference counter is
required to support concurrent access.
As a preparation for extending hairpin with reference counting, extract
code that deletes hairpin entry into standalone function. In order to
remove this dependency on external locking, extend hairpin entry with
reference counter to manage its lifetime and extend flow structure with
direct pointer to hairpin entry that flow is attached to.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The size of the snapshot has to be the same as the size of the region,
therefore no need to pass it again during snapshot creation. Remove the
arg and use region->size instead.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend existing driver for Spectrum and Spectrum-2 ASICs
to support Spectrum-3 ASIC as well.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the tc indirect block infrastructure to flow_offload and rename
it to flow indirect block, so that nf_tables can use the
indirect block architecture.
Signed-off-by: wenxu <wenxu@ucloud.cn>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the check of the recovery bit at the beginning of the CQE recovery
function. This test is already performed right before the reporter
is invoked, when a CQE error is detected.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CQE recovery function begins with test and set of recovery bit. Add an
error flow which ensures clearing of this bit when leaving the recovery
function, to allow further recoveries to take place. This allows removal
of clearing recovery bit on sq activate.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove the wrong error return value when the SQ is not in error state.
CQE recovery on the TX reporter queries the SQ state. If the SQ is not in
error state, the SQ is either in ready or reset state. Ready state is a
good state which doesn't require recovery, and reset state is a temporary
state which ends in ready state. With this patch, CQE recovery in this
scenario is successful.
Fixes: de8650a820 ("net/mlx5e: Add tx reporter support")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Shift the tisn field in the WQE control segment, per the
HW specification.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the proper tisn field name from the union in struct mlx5_wqe_ctrl_seg.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The TLS progress params context WQE should not include an
Eth segment, drop it.
In addition, align the tls_progress_params layout with the
HW specification document:
- fix the tisn field name.
- remove the valid bit.
Fixes: a12ff35e0f ("net/mlx5: Introduce TLS TX offload hardware bits and structures")
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Setting the speed to 56GBASE is allowed only with auto-negotiation enabled.
This patch prevents setting the speed to 56GBASE when auto-negotiation is
disabled.
Fixes: f62b8bb8f2 ("net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality")
Signed-off-by: Mohamad Heib <mohamadh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Only support changing tx/rx pause frame setting if the net device
is the vport group manager.
Fixes: 3c2d18ef22 ("net/mlx5e: Support ethtool get/set_pauseparam")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We have an issue where the OVS application creates an offloaded drop rule
that drops VXLAN traffic with both inner and outer header match
criteria. The mlx5_core driver correctly detects the inner and outer
header match criteria but does not enable the inner header match criteria,
due to an incorrect assumption in mlx5_eswitch_add_offloaded_rule that
only decap rules need inner header criteria.
Solution:
Remove mlx5_esw_flow_attr's match_level and tunnel_match_level and add
two new members: inner_match_level and outer_match_level.
inner/outer_match_level is set to NONE if the inner/outer match criteria
is not specified in the tc rule creation request. The decap assumption is
removed and the code just needs to check for inner/outer_match_level to
enable the corresponding bit in firmware's match_criteria_enable value.
Fixes: 6363651d6d ("net/mlx5e: Properly set steering match levels for offloaded TC decap rules")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current ARFS code relies on certain fields to be set in the SKB
(e.g. transport_header) and extracts IP addresses and ports by custom
code that parses the packet. The necessary SKB fields, however, are not
always set at that point, which leads to an out-of-bounds access. Use
skb_flow_dissect_flow_keys() to get the necessary information reliably,
fix the out-of-bounds access and reuse the code.
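A sketch of the extraction using the flow dissector (surrounding ARFS
code and error handling are illustrative):

	struct flow_keys fk;

	if (!skb_flow_dissect_flow_keys(skb, &fk, 0))
		return NULL;	/* could not dissect, skip ARFS */

	if (fk.basic.n_proto == htons(ETH_P_IP)) {
		src_ipv4 = fk.addrs.v4addrs.src;
		dst_ipv4 = fk.addrs.v4addrs.dst;
	}
	ip_proto = fk.basic.ip_proto;
	src_port = fk.ports.src;
	dst_port = fk.ports.dst;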
Fixes: 18c908e477 ("net/mlx5e: Add accelerated RFS support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
For reference counters, refcount_t is preferred over atomic_t.
This is because the implementation of refcount_t can prevent
overflows and detect possible use-after-free.
So convert the atomic_t ref counters to refcount_t.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
There is a self assignment of esw->dev to itself, clean this up by
removing it. Also make dev a const pointer.
Addresses-Coverity: ("Self assignment")
Fixes: 6cedde4513 ("net/mlx5: E-Switch, Verify support QoS element type")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The commit 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue
memory scheme") introduced the undefined behaviour below, because
"frag->last_in_page" is only initialized in mlx5e_init_frags_partition()
when,
if (next_frag.offset + frag_info[f].frag_stride > PAGE_SIZE)
or after bailing out of the loop,
for (i = 0; i < mlx5_wq_cyc_get_size(&rq->wqe.wq); i++)
As a result, some "frag"s could have an uninitialized value of
"last_in_page". Later, get_frag() obtains those "frag"s and checks
"frag->last_in_page" in mlx5e_put_rx_frag(), which triggers the error
during boot. Fix it by always initializing "frag->last_in_page" to
"false" in mlx5e_init_frags_partition().
UBSAN: Undefined behaviour in
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:325:12
load of value 170 is not a valid value for type 'bool' (aka '_Bool')
Call trace:
dump_backtrace+0x0/0x264
show_stack+0x20/0x2c
dump_stack+0xb0/0x104
__ubsan_handle_load_invalid_value+0x104/0x128
mlx5e_handle_rx_cqe+0x8e8/0x12cc [mlx5_core]
mlx5e_poll_rx_cq+0xca8/0x1a94 [mlx5_core]
mlx5e_napi_poll+0x17c/0xa30 [mlx5_core]
net_rx_action+0x248/0x940
__do_softirq+0x350/0x7b8
irq_exit+0x200/0x26c
__handle_domain_irq+0xc8/0x128
gic_handle_irq+0x138/0x228
el1_irq+0xb8/0x140
arch_cpu_idle+0x1a4/0x348
do_idle+0x114/0x1b0
cpu_startup_entry+0x24/0x28
rest_init+0x1ac/0x1dc
arch_call_rest_init+0x10/0x18
start_kernel+0x4d4/0x57c
Fixes: 069d11465a ("net/mlx5e: RX, Enhance legacy Receive Queue memory scheme")
Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The line continuations unintentionally add whitespace so
instead use coalesced formats to remove the whitespace.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Link: https://lore.kernel.org/r/f14db3287b23ed8af9bdbf8001e2e2fe7ae9e43a.camel@perches.com
Signed-off-by: Doug Ledford <dledford@redhat.com>
In some cases, we don't want to allow specific tunnel packets to reach
the host, in order to avoid high CPU usage (e.g. during network attacks),
while other tunnel packets which are not matched in hardware will still
be sent to the host.
$ tc filter add dev vxlan_sys_4789 \
protocol ip chain 0 parent ffff: prio 1 handle 1 \
flower dst_ip 1.1.1.100 ip_proto tcp dst_port 80 \
enc_dst_ip 2.2.2.100 enc_key_id 100 enc_dst_port 4789 \
action tunnel_key unset pipe action drop
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When failing to create the tx reporter, don't set the reporter's pointer.
Creating a reporter is not mandatory for driver load; avoid keeping a
garbage/error pointer.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Return an error when failing to create a reporter in devlink. Since
NET_DEVLINK is mandatory for MLX5_CORE in Kconfig, the returned pointer
can't be NULL and can only hold an error on the failure path.
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Move vlan checksum fixup flow into mlx5e_skb_padding_csum(), which is
supposed to fixup SKB checksum if needed. And rename
mlx5e_skb_padding_csum() to mlx5e_skb_csum_fixup().
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
If capable, use zero inline mode in TX WQE for non-VLAN packets.
For VLAN ones, keep the enforcement of at least L2 inline mode,
unless the WQE VLAN insertion offload cap is on.
Performance:
Tested single core packet rate of 64Bytes.
NIC: ConnectX-5
CPU: Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
pktgen:
Before: 12.46 Mpps
After: 14.65 Mpps (+17.5%)
XDP_TX:
The MPWQE flow is not affected, as it already has this optimization.
So we test with priv-flag xdp_tx_mpwqe: off.
Before: 9.90 Mpps
After: 10.20 Mpps (+3%)
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Tested-by: Noam Stolero <noams@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of passing an output param, let function return the
WQE pointer.
In addition, pass &pi so it gets its value in the function,
and save the redundant assignment that comes after it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In MPWQE mode, when transmitting packets with XDP, a packet that is smaller
than a certain size (set to 256 bytes) would be sent inline within its WQE
TX descriptor (mem-copied), in case the hardware tx queue is congested
beyond a pre-defined water-mark.
If a MPWQE cannot contain an additional inline packet, we close this
MPWQE session, and send the packet inlined within the next MPWQE.
To save some MPWQE session close+open operations, we don't open MPWQE
sessions when the contiguous room left is smaller than a certain size (set to the
HW MPWQE maximum size). If there isn't enough contiguous room in the
send queue, we fill it with NOPs and wrap the send queue index around.
This way, qualified packets are always sent inline.
Perf tests:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
XDP_TX:
With 24 channels:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 113.6 Mpps      | 96.3 Mpps       | 84%          |
| after  | 115 Mpps        | 99.5 Mpps       | 86%          |
With one channel:
| ------ | bounced packets | inlined packets | inline ratio |
| before | 6.7 Mpps        | 0 pps           | 0%           |
| after  | 6.8 Mpps        | 0 pps           | 0%           |
As we can see, there is improvement in both inline ratio and overall
packet rate for 24 channels. Also, we see no degradation for the
one-channel case.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We use NOPs to populate the WQ fragment edge if the WQE does not fit
in frag, to avoid WQEs crossing a page boundary (or wrap-around the WQ).
The upper bound on the needed number of NOPs is one WQEBB less than
the largest possible WQE, for otherwise the WQE would certainly fit.
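As a hedged sketch, the edge handling then looks roughly like this
(assuming the existing mlx5_wq_cyc_get_contig_wqebbs() and
mlx5e_post_nop() helpers; variable names are illustrative):

	contig = mlx5_wq_cyc_get_contig_wqebbs(wq, pi);
	if (unlikely(contig < num_wqebbs)) {
		/* at most (max WQE size in WQEBBs - 1) iterations */
		while (contig--)
			mlx5e_post_nop(wq, sq->sqn, &sq->pc);
	}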
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a pool of flow counters, based on flow counter bulks, removing the
need to allocate a new counter via a costly FW command during the flow
creation process. The time it takes to acquire/release a flow counter
is cut from ~50 [us] to ~50 [ns].
The pool is part of the mlx5 driver instance, and provides flow
counters for aging flows. mlx5_fc_create() was modified to provide
counters for aging flows from the pool by default, and
mlx5_destroy_fc() was modified to release counters back to the pool
for later reuse. If bulk allocation is not supported or fails, and for
non-aging flows, the fallback behavior is to allocate and free
individual counters.
The pool is comprised of three lists of flow counter bulks, one of
fully used bulks, one of partially used bulks, and one of unused
bulks. Counters are provided from the partially used bulks first, to
help limit bulk fragmentation.
The pool maintains a threshold, and strives to maintain the amount of
available counters below it. The pool is increased in size when a
counter acquisition request is made and there are no available
counters, and it is decreased in size when the last counter in a bulk
is released and there are more available counters than the threshold.
All pool size changes are done in the context of the
acquiring/releasing process.
The value of the threshold is directly correlated to the amount of
used counters the pool is providing, while constrained by a hard
maximum, and is recalculated every time a bulk is allocated/freed.
This ensures that the pool only consumes large amounts of memory for
available counters if the pool is being used heavily. When fully
populated and at the hard maximum, the buffer of available counters
consumes ~40 [MB].
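The pool described above can be pictured roughly as follows (a sketch;
the field names are illustrative, not necessarily the exact ones added
by the patch):

	struct mlx5_fc_pool {
		struct mlx5_core_dev *dev;
		struct mutex pool_lock;		/* protects the bulk lists */
		struct list_head fully_used_bulks;
		struct list_head partially_used_bulks;
		struct list_head unused_bulks;
		int available_fcs;	/* counters ready to be handed out */
		int used_fcs;
		int threshold;		/* recalculated per bulk alloc/free */
	};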
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add infrastructure to track bulks of flow counters, providing
the means to allocate and deallocate bulks, and to acquire and
release individual counters from the bulks.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the scheduling elements to implement an ingress rate limiter on an
eswitch port's ingress traffic. Since the ingress of an eswitch port is the
egress of the VF port, we control eswitch ingress by controlling VF egress.
Configuration is done using the ports' representor net devices.
Please note that burst size configuration is not supported by devices
ConnectX-5 and earlier generations.
Configuration examples:
tc:
tc filter add dev enp59s0f0_0 root protocol ip matchall action police rate 1mbit burst 20k
ovs:
ovs-vsctl set interface eth0 ingress_policing_rate=1000
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Misc updates from mlx5-next branch.
1) Eli improves the handling of the support for QoS element type
2) Gavi refactors and prepares mlx5 flow counters for bulk allocation
support
3) Parav, refactors and improves E-Switch load/unload flows
4) Saeed, two misc cleanups
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently for PF and ECPF vports, representors are created before
their eswitch hardware ports are initialized, in the below flow.
mlx5_eswitch_enable()
esw_offloads_init()
esw_offloads_load_all_reps()
[..]
esw_enable_vport()
However for VFs, vports are initialized before creating their
respective netdev representors in the event handling context.
Similarly, while disabling the eswitch, hardware vports are disabled
first, followed by destroying their representors.
Here the underlying vport gets destroyed, but its respective user-facing
netdevice can still exist, and the user can continue to perform more
offload operations on it.
Instead, it is more accurate to do:
enable_eswitch switchdev mode:
1. perform FDB tables initialization
2. initialize hw vport
3. create and publish representor for this vport
disable_eswitch switchdev mode:
1. destroy user facing representor for the vport
2. disable hw vport
3. perform FDB tables cleanup
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The mc_promisc pointer points to an instance of struct esw_mc_addr
allocated as part of the esw structure.
Hence it cannot be NULL.
Remove the redundant check and do the assignment where it is actually used.
While at it, add a comment around the legacy mode fields and move mc_promisc
close to the other legacy mode structures to improve code readability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
We don't need to handle error flow of esw_create_legacy_table() in the
same branch, it is already being handled directly after the if statement,
for both legacy and switchdev modes in one place.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
vports need to be enabled in both switchdev and legacy modes.
In switchdev mode, vports should be enabled after initializing
the FDB tables and before creating their representors, so that a
representor works on an initialized vport object.
Prepare a helper function which can be called when enabling either of
the eswitch modes.
Similarly, add a helper function to disable vports.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
First enable the TSAR QoS hardware block in the device before enabling its
user vports.
This refactor is needed so that vports can be enabled before their
representor netdevices are created.
While at it: esw_create_tsar() returns an error code which was used only to
print an error. However esw_create_tsar() already prints a warning if it
hits an error.
Hence, remove the redundant warning.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Except for the bit toggling code, the rest of the code to enable/disable
the metadata passing functionality is the same.
Hence, combine them into a single function controlled by an enable flag.
Also, instead of checking whether metadata is supported in multiple places,
fold the check into the helper function.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Check if the firmware supports the requested element type before
attempting to create the element type.
In addition, explicitly specify the requested element type and tsar type.
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently mlx5_load_one() performs device registration using
mlx5_register_device(), but mlx5_unload_one() doesn't unregister.
Make them symmetric by doing device unregistration in
mlx5_unload_one().
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Add a handle to invoke the new FW capability of allocating a bulk of
flow counters.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Towards introducing the ability to allocate bulks of flow counters,
refactor the flow counter bulk query process, removing functions and
structs whose names indicated being used for flow counter bulk
allocation FW commands, despite them actually only being used to
support bulk querying, and migrate their functionality to correctly
named functions in their natural location, fs_counters.c.
Additionally, optimize the bulk query process by:
* Extracting the memory used for the query to mlx5_fc_stats so
that it is only allocated once, and not for each bulk query.
* Querying all the counters in one function call.
Signed-off-by: Gavi Teitz <gavi@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In commit e891ce1dd2 ("mlxsw: spectrum_buffers: Reduce pool size on
Spectrum-2"), pool size was reduced to mitigate a problem in port buffer
usage of ports split four ways. It turns out that this workaround does not
solve the issue, and a further reduction is required.
Thus reduce the size of pool 0 by another 2.7 MiB, and round down to a
whole number of cells.
Fixes: e891ce1dd2 ("mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case sp2 pci driver registration fails, fix the error path to
start with sp1 pci driver unregistration.
Fixes: c3ab435466 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there are duplicated checks on orig_egr_types which are
redundant. I believe this is a typo and the expression should actually be
orig_ing_types || orig_egr_types instead of
orig_egr_types || orig_egr_types. Fix these.
Addresses-Coverity: ("Same on both sides")
Fixes: c6b36bdd04 ("mlxsw: spectrum_ptp: Increase parsing depth when PTP is enabled")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TC flow table is created when first flow is added, and destroyed when last
flow is removed. This assumes that all accesses to the table are externally
synchronized with rtnl lock. To remove dependency on rtnl lock, add new
mutex mlx5e_tc_table->t_lock and use it to protect the flow table.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Function netdev_master_upper_dev_get() generates warning if caller doesn't
hold rtnl lock. Modify rules update path to use rcu version of that
function.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
esw->state_lock is already used to protect vlan vport configuration change.
However, all preparation and correctness checks, and code that sets vport
data are not protected by this lock and assume external synchronization by
rtnl lock. In order to remove dependency on rtnl lock, extend
esw->state_lock protection to whole eswitch vlan add/del functions.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eswitch implements its own locking by means of the state_lock mutex and
multiple fine-grained locks in contained data structures, and is supposed
to not rely on the rtnl lock. However, the eswitch offloads num_flows field
is a regular long long integer and cannot be modified concurrently. This is
an implicit assumption that mlx5 tc is serialized (by the rtnl lock or any
other means). In order to remove the implicit dependency on the rtnl lock,
change the num_flows type to atomic64 to allow concurrent modifications.
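The type change boils down to the following (sketch; the exact field
path is illustrative):

	-	long long num_flows;
	+	atomic64_t num_flows;

	-	esw->offloads.num_flows++;
	+	atomic64_inc(&esw->offloads.num_flows);

	-	return esw->offloads.num_flows;
	+	return atomic64_read(&esw->offloads.num_flows);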
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to remove dependency on rtnl lock for protecting unready_flows
list when reoffloading unready flows on workqueue, extend representor
uplink private structure with dedicated 'unready_flows_lock' mutex. Take
the lock in all users of unready_flows list before accessing it. Implement
helper functions to add and delete unready flow.
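The helpers follow the usual list-plus-mutex pattern, roughly (sketch;
structure and helper names are illustrative):

	static void add_unready_flow(struct mlx5_rep_uplink_priv *uplink_priv,
				     struct mlx5e_tc_flow *flow)
	{
		mutex_lock(&uplink_priv->unready_flows_lock);
		list_add_tail(&flow->unready, &uplink_priv->unready_flows);
		mutex_unlock(&uplink_priv->unready_flows_lock);
	}

	static void remove_unready_flow(struct mlx5_rep_uplink_priv *uplink_priv,
					struct mlx5e_tc_flow *flow)
	{
		mutex_lock(&uplink_priv->unready_flows_lock);
		list_del(&flow->unready);
		mutex_unlock(&uplink_priv->unready_flows_lock);
	}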
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
In order to remove dependency on rtnl lock, access to tc flows hashtable
must be explicitly protected from concurrent flows removal.
Extend tc flow structure with rcu to allow concurrent parallel access. Use
rcu read lock to safely lookup flow in tc flows hash table, and take
reference to it. Use rcu free for flow deletion to accommodate concurrent
stats requests.
Add a new DELETED flow flag. Implement a new flow_flag_test_and_set() helper
that is used to set a flag and return its previous value. Use it to
atomically set the flag in mlx5e_delete_flower() to guarantee that flow can
only be deleted once, even when same flow is deleted concurrently by
multiple tasks.
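Lookup then takes the rcu read lock and a reference, roughly (sketch
using the standard rhashtable and refcount primitives; the hash table
and params names are illustrative):

	rcu_read_lock();
	flow = rhashtable_lookup(tc_ht, &cookie, tc_ht_params);
	if (flow && !refcount_inc_not_zero(&flow->refcnt))
		flow = NULL;	/* lost the race against concurrent delete */
	rcu_read_unlock();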
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
To remove dependency on rtnl lock and allow concurrent modification of
'flags' field of tc flow structure, change flow flag type to unsigned long
and use atomic bit ops for reading and changing the flags. Implement
auxiliary functions for setting, resetting and getting specific flag, and
for checking most often used flag values.
Always set flags with smp_mb__before_atomic() to ensure that all
mlx5e_tc_flow are updated before concurrent readers can read new flags
value. Rearrange all code paths to actually set flow->rule[] pointers
before setting the OFFLOADED flag. On read side, use smp_mb__after_atomic()
when accessing flags to ensure that offload-related flow fields are only
read after the flags.
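As a sketch, the set-side helper pairs the barrier with the bit
operation like this (flag and structure names are illustrative):

	/* make prior writes to the flow (e.g. flow->rule[]) visible
	 * before the flag itself */
	#define __flow_flag_set(flow, flag) do {		\
		smp_mb__before_atomic();			\
		set_bit(flag, &(flow)->flags);			\
	} while (0)

	#define __flow_flag_test(flow, flag) \
		test_bit(flag, &(flow)->flags)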
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
With new classifier type that doesn't require rtnl lock, following
invariant holds:
- Filter with specified cookie created only once.
- Filter with specified cookie deleted only once.
- Stats updates can be performed in parallel to each other.
Extend tc flow with rcu and reference counter. To protect from concurrent
delete, get reference to tc flow when:
- Reading flow stats.
- Accessing flow in neigh update handler.
- Accessing flow in neigh update used value handler.
Only free the flow when the reference counter reaches zero. Modify flow
cleanup to account for flows that might not be fully initialized, by
checking if the flow is actually in the list of the corresponding mod_hdr,
hairpin and encap entries. Don't clean up the flow directly in case of
error, to allow concurrent
neigh update (neigh update will be modified to always take reference to
flow when using it).
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The helper function has "if" branches that do the same thing. Merge them to
simplify the code.
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When flow_block_cb_is_busy() is called, indr_priv is guaranteed to be a
NULL pointer, so there is no need to call flow_block_cb_is_busy().
Fixes: 0d4fd02e71 ("net: flow_offload: add flow_block_cb_is_busy() and use it")
Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Don't choose who implements the rxnfc "get/set" callbacks according to
CONFIG_MLX5_EN_RXNFC. Instead, have the callbacks always available and
delegate to a function of a different driver module when needed
(en_fs_ethtool.c), and have stubs in en/fs.h to fall back to when
en_fs_ethtool.c is compiled out, to avoid complications and ifdefs in
en_main.c.
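The en/fs.h fallback follows the standard Kconfig stub pattern, roughly
(sketch; the exact prototypes may differ):

	#ifdef CONFIG_MLX5_EN_RXNFC
	int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
				    struct ethtool_rxnfc *info, u32 *rule_locs);
	int mlx5e_ethtool_set_rxnfc(struct net_device *dev,
				    struct ethtool_rxnfc *cmd);
	#else
	static inline int mlx5e_ethtool_get_rxnfc(struct net_device *dev,
						  struct ethtool_rxnfc *info,
						  u32 *rule_locs)
	{ return -EOPNOTSUPP; }
	static inline int mlx5e_ethtool_set_rxnfc(struct net_device *dev,
						  struct ethtool_rxnfc *cmd)
	{ return -EOPNOTSUPP; }
	#endif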
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When disabling CQE compression in favor of time-stamping, don't show a
warning when CQE compression is already disabled.
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When a user enables LRO via ethtool and the RQ mode is legacy,
mlx5e_fix_features drops the request without any explanation.
Add a netdev_warn to cover this case.
Fixes: 6c3a823e1e ("net/mlx5e: RX, Remove HW LRO support in legacy RQ")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The commit b9a7ba5562 ("net/mlx5: Use event mask based on device
capabilities") introduced a few compilation warnings because it bumps
MLX5_EVENT_TYPE_MAX from 0x27 to 0x100, which is always greater than
"struct {mlx5_eqe|mlx5_nb}.type", a "u8".
drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
'mlx5_eq_notifier_register':
drivers/net/ethernet/mellanox/mlx5/core/eq.c:948:21: warning: comparison
is always false due to limited range of data type [-Wtype-limits]
if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
^~
drivers/net/ethernet/mellanox/mlx5/core/eq.c: In function
'mlx5_eq_notifier_unregister':
drivers/net/ethernet/mellanox/mlx5/core/eq.c:959:21: warning: comparison
is always false due to limited range of data type [-Wtype-limits]
if (nb->event_type >= MLX5_EVENT_TYPE_MAX)
Fix them by removing the unnecessary checks.
Fixes: b9a7ba5562 ("net/mlx5: Use event mask based on device capabilities")
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Spectrum systems have a configurable limit on how far into the packet they
parse. By default, the limit is 96 bytes.
An IPv6 PTP packet is layered as Ethernet/IPv6/UDP (14+40+8 bytes), and
sequence ID of a PTP event is only available 32 bytes into payload, for a
total of 94 bytes. When an additional 802.1q header is present as
well (such as when ptp4l is running on a VLAN port), the parsing limit is
exceeded. Such packets are not recognized as PTP, and are not timestamped.
Therefore generalize the current VXLAN-specific parsing depth setting to
allow reference-counted requests from other modules as well. Keep it in the
VXLAN module, because the MPRS register also configures UDP destination
port number used for VXLAN, and is thus closely tied to the VXLAN code
anyway.
Then invoke the new interfaces from both VXLAN (in obvious places), as well
as from PTP code, when the (global) timestamping configuration changes from
disabled to enabled or vice versa.
Fixes: 8748642751 ("mlxsw: spectrum: PTP: Support SIOCGHWTSTAMP, SIOCSHWTSTAMP ioctls")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Matching on reserved TCP flags bits is only supported using the custom
parser. Since the use case for that is not known now, just forbid
offloading rules that match on these bits.
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some matches and actions are not supported on egress. Track such rules
and forbid a bind of block which contains them to egress.
With this patch, the kernel tells the user he cannot do that:
$ tc qdisc add dev ens16np1 ingress_block 22 clsact
$ tc filter add block 22 protocol 802.1q pref 2 handle 101 flower vlan_id 100 skip_sw action pass
$ tc qdisc add dev ens16np2 egress_block 22 clsact
Error: mlxsw_spectrum: Block cannot be bound to egress because it contains unsupported rules.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum ASIC does not support redirection on egress, so refuse to
insert such flows:
$ tc qdisc add dev ens16np1 clsact
$ tc filter add dev ens16np1 egress protocol all pref 1 handle 101 flower skip_sw action mirred egress redirect dev ens16np2
Error: mlxsw_spectrum: Redirect action is not supported on egress.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ConnectX-3 Pro can offload transmission of VLAN packets with VXLAN inside:
enable tunnel offloads in dev->vlan_features, like it's done in other
NIC drivers (e.g. be2net and ixgbe).
It is no longer necessary to change dev->hw_enc_features when VXLAN ports
are added or removed, since .ndo_features_check() already checks for VXLAN
packets whose UDP destination port matches the configured value. Just set
dev->hw_enc_features when the NIC is initialized, so that overlying VLAN
devices can correctly inherit the tunnel offload capabilities.
Changes since v1:
- avoid flipping hw_enc_features, instead of calling netdev notifiers,
thanks to Saeed Mahameed
- squash two patches into a single one
CC: Paolo Abeni <pabeni@redhat.com>
CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'mlx5-fixes-2019-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-07-25
This series introduces some fixes to mlx5 driver.
1) Ariel is addressing an issue with an encap flow counter race condition
2) Aya fixes ethtool speed handling
3) Edward fixes modify_cq hw bits alignment
4) Maor fixes RDMA_RX capabilities handling
5) Mark reverses unregister devices order to address an issue with LAG
6) From Tariq,
- wrong max num channels indication regression
- TLS counters naming and documentation as suggested by Jakub
- kTLS, Call WARN_ONCE on netdev mismatch
There is one patch in this series that touches the nfp driver to align
TLS statistics names with the latest documentation; Jakub is CC'ed.
Please pull and let me know if there is any problem.
For -stable v4.9:
('net/mlx5: Use reversed order when unregister devices')
For -stable v4.20
('net/mlx5e: Prevent encap flow counter update async to user query')
('net/mlx5: Fix modify_cq_in alignment')
For -stable v5.1
('net/mlx5e: Fix matching of speed to PRM link modes')
For -stable v5.2
('net/mlx5: Add missing RDMA_RX capabilities')
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
A netdev mismatch in the processed TLS SKB should not occur,
and indicates a kernel bug.
Add WARN_ONCE to spot such cases.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
This patch prevents a race between a user-invoked cached counters
query and the neighbor last usage updater.
The cached flow counter stats can be queried by calling
"mlx5_fc_query_cached" which provides the number of bytes and
packets that passed via this flow since the last time this counter
was queried.
It does so by subtracting the last saved stats from the current cached
stats and then updating the last saved stats with the cached stats.
It also provides the lastuse value for that flow.
Since "mlx5e_tc_update_neigh_used_value" needs to retrieve the
last usage time of encapsulation flows, it calls the flow counter
query method periodically and async to user queries of the flow counter
using cls_flower.
This call causes the driver to update the last reported bytes and
packets from the cache, and therefore future user queries of the flow
stats will return lower than expected numbers for bytes and packets,
since the last saved stats in the driver were updated asynchronously to
the last saved stats in cls_flower.
This causes wrong stats presentation of encapsulation flows to the user.
Since the neighbor usage updater only needs the lastuse stats from the
cached counter, the fix is to use a dedicated lastuse query call that
returns the lastuse value without syncing the cached stats with the
last saved stats.
Fixes: f6dfb4c3f2 ("net/mlx5e: Update neighbour 'used' state using HW flow rules counters")
Signed-off-by: Ariel Levkovich <lariel@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
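A rough sketch of the idea, with a hypothetical cache structure standing in
for the real flow counter (only the field roles are taken from the
description above): the full cached query has the side effect of advancing
the last-saved snapshot, while a lastuse-only query does not.

typedef unsigned long long u64;	/* stand-in for the kernel type */

struct fc_cache_sketch {
	u64 packets, bytes;		/* latest cached HW stats */
	u64 lastpackets, lastbytes;	/* snapshot taken at the last user query */
	u64 lastuse;			/* time of last HW use */
};

/* User-facing query: report the delta and advance the snapshot. */
static void fc_query_cached_sketch(struct fc_cache_sketch *c, u64 *bytes,
				   u64 *packets, u64 *lastuse)
{
	*bytes = c->bytes - c->lastbytes;
	*packets = c->packets - c->lastpackets;
	*lastuse = c->lastuse;

	/* The side effect the neighbor updater must not trigger: */
	c->lastbytes = c->bytes;
	c->lastpackets = c->packets;
}

/* Neighbor usage updater: needs lastuse only, so the snapshot is untouched. */
static u64 fc_query_lastuse_sketch(const struct fc_cache_sketch *c)
{
	return c->lastuse;
}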
Speed translation is performed based on legacy or extended PTYS
register. Translate speed with respect to:
1) Capability bit of extended PTYS table.
2) User request:
a) When auto-negotiation is turned on, inspect advertisement whether it
contains extended link modes.
b) When auto-negotiation is turned off, speed > 100Gbps (maximal
speed supported in legacy mode).
When both conditions are fulfilled, translation is done with the extended
PTYS table; otherwise the legacy PTYS table is used.
Without this patch, 25/50/100 Gbps speeds cannot be set, since the driver
tries to configure in extended mode but reads from legacy mode.
Fixes: dd1b9e09c1 ("net/mlx5: ethtool, Allow legacy link-modes configuration via non-extended ptys")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
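A sketch of the selection rule as stated above (all names are illustrative;
ext_cap stands for the extended-PTYS capability bit, and 100 Gbps is taken
as the top of the legacy table):

#include <stdbool.h>

#define MAX_LEGACY_SPEED_MBPS 100000	/* 100 Gbps */

struct link_req_sketch {
	bool autoneg;
	bool adver_has_ext_modes;	/* advertisement contains extended link modes */
	unsigned int speed_mbps;	/* requested speed when autoneg is off */
};

static bool use_extended_ptys(bool ext_cap, const struct link_req_sketch *req)
{
	if (!ext_cap)			/* condition 1: device capability */
		return false;
	if (req->autoneg)		/* condition 2a: advertisement */
		return req->adver_has_ext_modes;
	return req->speed_mbps > MAX_LEGACY_SPEED_MBPS;	/* condition 2b */
}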
No XSK support in the enhanced IPoIB driver and representors.
Add a profile property to specify this, and enhance the logic
that calculates the max number of channels to take it into
account.
Fixes: db05815b36 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
New flow table type RDMA_RX was added but the MLX5_CAP_FLOW_TABLE_TYPE
didn't handle this new flow table type.
This means that MLX5_CAP_FLOW_TABLE_TYPE returns an empty capability to
this flow table type.
Update both the macro and the maximum supported flow table type to
RDMA_RX.
Fixes: d83eb50e29 ("net/mlx5: Add support in RDMA RX steering")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When LAG is active, which is controlled by the bonded mlx5e netdev, mlx5
interface unregistering must happen in reverse order, where RDMA is
unregistered (unloaded) first, to guarantee that all references to the LAG
context in hardware are removed, and only then the mlx5e netdev interface
is removed, which cleans up the LAG context from hardware.
Without this fix, during destroy of the LAG interface we observed the
following errors:
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xe4ac33)
* mlx5_cmd_check:752:(pid 12556): DESTROY_LAG(0x843) op_mod(0x0) failed,
status bad parameter(0x3), syndrome (0xa5aee8).
Fixes: a31208b1e1 ("net/mlx5_core: New init and exit flow for mlx5_core")
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Mark Zhang <markz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Unlike IPv4, the kernel does not consolidate IPv6 nexthop groups. To
avoid exhausting the device's adjacency table - where nexthops are
stored - the driver does this consolidation instead.
Each nexthop group is hashed by XOR-ing the interface indexes of all the
member nexthop devices. However, the ifindex itself is not hashed, which
can result in identical keys being used for different groups and eventually
an -EBUSY error from rhashtable due to an overly long object list.
Improve the situation by hashing the ifindex itself.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
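A stand-alone sketch of the collision (the mixing function below is a toy
stand-in for a proper hash such as the kernel's jhash, and the group members
are made up): XOR-ing raw ifindexes lets different groups produce the same
key, while hashing each ifindex first keeps them apart.

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

/* Toy 32-bit integer mixer, illustration only. */
static uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

static uint32_t key_xor_raw(const int *ifindex, size_t n)
{
	uint32_t key = 0;

	while (n--)
		key ^= (uint32_t)ifindex[n];	/* raw XOR: collides easily */
	return key;
}

static uint32_t key_xor_hashed(const int *ifindex, size_t n)
{
	uint32_t key = 0;

	while (n--)
		key ^= mix32((uint32_t)ifindex[n]);	/* hash each ifindex first */
	return key;
}

int main(void)
{
	int grp_a[] = { 1, 2, 3 }, grp_b[] = { 1, 4, 5 };

	printf("raw XOR:    a=%u b=%u (identical keys)\n",
	       (unsigned)key_xor_raw(grp_a, 3), (unsigned)key_xor_raw(grp_b, 3));
	printf("hashed XOR: a=%u b=%u\n",
	       (unsigned)key_xor_hashed(grp_a, 3), (unsigned)key_xor_hashed(grp_b, 3));
	return 0;
}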
Unlike Spectrum-1, the KVD (Key-value database) of Spectrum-2 is not
partitioned, so only expose the entire KVD size. This enables users to
query the total size of the KVD.
Signed-off-by: Amit Cohen <amitc@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While using net_dim, a dim_sample was used without ever initializing the
comps value. Add use of DIV_ROUND_DOWN_ULL() to prevent a potential
overflow; it is not a problem to save the final result in an int because
after the division by epms the value should not be larger than a few
thousand.
[ 1040.127124] UBSAN: Undefined behaviour in lib/dim/dim.c:78:23
[ 1040.130118] signed integer overflow:
[ 1040.131643] 134718714 * 100 cannot be represented in type 'int'
Fixes: 398c2b05bb ("linux/dim: Add completions count to dim_sample")
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
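A small stand-alone sketch of the fix (comps is the value from the UBSAN
report; the divisor and variable names are made up): widen to 64 bits before
multiplying, as the kernel's DIV_ROUND_DOWN_ULL() does, and only then store
the small result in an int.

#include <stdio.h>

/* Mirrors the effect of the kernel's DIV_ROUND_DOWN_ULL(): 64-bit divide. */
#define DIV_ROUND_DOWN_ULL(ll, d) ((unsigned long long)(ll) / (d))

int main(void)
{
	int comps = 134718714;	/* value from the UBSAN report */
	int epms = 10;		/* made-up divisor for the demo */
	int ratio;

	/* Buggy form: "comps * 100 / epms" multiplies in 32 bits and
	 * overflows, which is exactly what UBSAN flagged. Widen first: */
	ratio = (int)DIV_ROUND_DOWN_ULL((unsigned long long)comps * 100, epms);

	printf("ratio = %d\n", ratio);
	return 0;
}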
The mlx4_dev_cap and mlx4_init_hca_param are really too large
to be put on the kernel stack, as shown by this clang warning:
drivers/net/ethernet/mellanox/mlx4/main.c:3304:12: error: stack frame size of 1088 bytes in function 'mlx4_load_one' [-Werror,-Wframe-larger-than=]
With gcc, the problem is the same, but it does not warn because
it does not inline this function, and therefore stays just below
the warning limit, while clang is just above it.
Use kzalloc for dynamic allocation instead of putting them on the
stack. This gets the combined stack frame down to 424 bytes.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
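A kernel-style sketch of the pattern only (not the actual mlx4 code; the
surrounding flow is invented and error handling is trimmed): allocate the
two large structures with kzalloc() instead of declaring them on the stack,
and free them on every exit path.

static int load_one_sketch(struct mlx4_dev *dev)
{
	struct mlx4_dev_cap *dev_cap;
	struct mlx4_init_hca_param *init_hca;
	int err = -ENOMEM;

	/* Too large for the stack, so allocate (zeroed) from the heap. */
	dev_cap = kzalloc(sizeof(*dev_cap), GFP_KERNEL);
	init_hca = kzalloc(sizeof(*init_hca), GFP_KERNEL);
	if (!dev_cap || !init_hca)
		goto out;

	err = mlx4_dev_cap(dev, dev_cap);	/* fill the capabilities */
	if (err)
		goto out;
	/* ... use dev_cap and init_hca ... */
	err = 0;
out:
	kfree(dev_cap);		/* kfree(NULL) is a no-op */
	kfree(init_hca);
	return err;
}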
The structure is too large to put on the stack, resulting in a
warning on 32-bit ARM:
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c:59:5: error: stack frame size of 1344 bytes in function
'mlx5e_open_xsk' [-Werror,-Wframe-larger-than=]
Use kvzalloc() instead.
Fixes: a038e9794541 ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This object stores the flow block callbacks that are attached to this
block. Update flow_block_cb_lookup() to take this new object.
This patch restores the block sharing feature.
Fixes: da3eeb904f ("net: flow_offload: add list handling functions")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename this type definition and adapt users.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to annotate the netns on the flow block callback object,
flow_block_cb_is_busy() already checks for used blocks.
Fixes: d63db30c85 ("net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free()")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A variable allocated by kvmalloc should not be freed by kfree,
because it may have been allocated by vmalloc.
So replace kfree with kvfree here.
Fixes: 9b1f298236 ("net/mlx5: Add support for FW fatal reporter dump")
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
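A short sketch of the rule (function names are illustrative): whatever is
obtained from kvmalloc()/kvzalloc() must be released with kvfree(), since it
may be vmalloc-backed.

static void *dump_copy_sketch(const void *src, size_t len)
{
	void *buf = kvzalloc(len, GFP_KERNEL);	/* may fall back to vmalloc */

	if (!buf)
		return NULL;
	memcpy(buf, src, len);
	return buf;
}

static void dump_free_sketch(void *buf)
{
	kvfree(buf);	/* correct for both kmalloc- and vmalloc-backed memory */
}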
The switch periodically sends notifications about learned FDB entries.
Among other things, the notification includes the FID (Filtering
Identifier) and the port on which the MAC was learned.
In case the driver does not have the FID defined on the relevant port,
the following error will be periodically generated:
mlxsw_spectrum2 0000:06:00.0 swp32: Failed to find a matching {Port, VID} following FDB notification
This is not supposed to happen under normal conditions, but can happen
if an ingress tc filter with a redirect action is installed on a bridged
port. The redirect action will cause the packet's FID to be changed to
the dummy FID and a learning notification will be emitted with this FID
- which is not defined on the bridged port.
Fix this by having the driver ignore learning notifications generated
with the dummy FID and delete them from the device.
Another option is to chain an ignore action after the redirect action
which will cause the device to disable learning, but this means that we
need to consume another action whenever a redirect action is used. In
addition, the scenario described above is merely a corner case.
Fixes: cedbb8b259 ("mlxsw: spectrum_flower: Set dummy FID before forward action")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alex Kushnarov <alexanderk@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Alex Kushnarov <alexanderk@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum systems use DSCP rewrite map to update DSCP field in egressing
packets to correspond to priority that the packet has. Whether rewriting
will take place is determined at the point when the packet ingresses the
switch: if the port is in Trust L3 mode, packet priority is determined from
the DSCP map at the port, and DSCP rewrite will happen. If the port is in
Trust L2 mode, 802.1p is used for packet prioritization, and no DSCP
rewrite will happen.
The driver determines the port trust mode based on whether any DSCP
prioritization rules are in effect at given port. If there are any, trust
level is L3, otherwise it's L2. When the last DSCP rule is removed, the
port is switched to trust L2. Under that scenario, if DSCP of a packet
should be rewritten, it should be rewritten to 0.
However, when switching to Trust L2, the driver neglects to also update the
DSCP rewrite map. The last DSCP rule thus remains in effect, and packets
egressing through this port, if they have the right priority, will have
their DSCP set according to this rule.
Fix by first configuring the rewrite map, and only then switching to trust
L2 and bailing out.
Fixes: b2b1dab688 ("mlxsw: spectrum: Support ieee_setapp, ieee_delapp")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reported-by: Alex Veber <alexve@mellanox.com>
Tested-by: Alex Veber <alexve@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recently, fl_flow_key->indev_ifindex int field was refactored into
flow_dissector_key_meta field. With this, flower classifier also sets
FLOW_DISSECTOR_KEY_META flow dissector key. However, mlx5 flower dissector
validation code rejects filters that use flow dissector keys that are not
supported. Add FLOW_DISSECTOR_KEY_META to the list of allowed dissector
keys in __parse_cls_flower() to prevent the following error when offloading
the flower classifier to mlx5:
Error: mlx5_core: Unsupported key.
Fixes: 8212ed777f ("net: sched: cls_flower: use flow_dissector for ingress ifindex")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Currently, tunnel attributes are parsed and inner header matching is used
only when flow dissector specifies match on some of the supported
encapsulation fields. When a user tries to offload a tc filter that doesn't
match any encapsulation fields on a tunnel device, the mlx5 tc layer
incorrectly sets matching of packet header keys on the encap header (outer
header) and the firmware rejects the rule with syndrome 0x7e1579 when
creating a new flow group.
Change __parse_cls_flower() to determine whether a tunnel is used based on
the filter_dev tunnel info, instead of determining it indirectly by checking
flow dissector enc keys.
Fixes: bbd00f7e23 ("net/mlx5e: Add TC tunnel release action for SRIOV offloads")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
When mlx5e_attach_encap() calls mlx5e_get_tc_tun() to get the tunnel
info data struct, check that the returned value is not NULL, as would be
the case for an unsupported encapsulation.
Fixes: d386939a32 ("net/mlx5e: Rearrange tc tunnel code in a modular way")
Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
kvzalloc already zeroes the memory during the allocation.
pci_alloc_consistent calls dma_alloc_coherent directly.
In commit 518a2f1925
("dma-mapping: zero memory returned from dma_alloc_*"),
dma_alloc_coherent has already zeroed the memory.
So the memset after these functions is not needed.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
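An illustrative sketch of the cleanup (the function names are made up): both
allocators below already return zeroed memory, so the memset() calls that
used to follow them can simply be dropped.

static void *alloc_table_sketch(size_t len)
{
	void *tbl = kvzalloc(len, GFP_KERNEL);	/* zeroed by the allocator */

	/* memset(tbl, 0, len);  -- redundant, removed */
	return tbl;
}

static void *alloc_dma_buf_sketch(struct device *dev, size_t len,
				  dma_addr_t *dma)
{
	void *buf = dma_alloc_coherent(dev, len, dma, GFP_KERNEL);

	/* memset(buf, 0, len);  -- redundant since commit 518a2f1925 */
	return buf;
}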
During the review of commit 1ff2f0fa45 ("net/mlx5e: Return in default
case statement in tx_post_resync_params"), Leon and Nick pointed out
that the switch statements can be converted to single if statements
that return early so that the code is easier to follow.
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
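A generic before/after sketch of the transformation (do_setup() and the
surrounding flow are illustrative, not the real mlx5e code):

/* Before: a switch with a single interesting case. */
static int setup_old_sketch(int cipher_type)
{
	switch (cipher_type) {
	case TLS_CIPHER_AES_GCM_128:
		break;
	default:
		WARN_ONCE(1, "unsupported cipher %d\n", cipher_type);
		return -EINVAL;
	}
	return do_setup();	/* hypothetical helper */
}

/* After: a single if that returns early, which reads more directly. */
static int setup_new_sketch(int cipher_type)
{
	if (cipher_type != TLS_CIPHER_AES_GCM_128) {
		WARN_ONCE(1, "unsupported cipher %d\n", cipher_type);
		return -EINVAL;
	}
	return do_setup();
}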
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEGhZs6bAKwk/OTgTpSD+KveBX+j4FAl0ng7AACgkQSD+KveBX
+j41FQgAjcUgjCiq6gmwI62EsMpZjgjwKhGfYz56jwSmNDgG7Ltj9o/K/+b+oL3u
ZVPO9YgNstWuoPyApE12MzsA+kx/ydRz8GW6IrI5v71JK3yy8Mw/HrYVyUKEnzH4
BVnSRnvHgvCnjZ3SjmUb6Xs74vmoxti9cYr0w0XtBogzuXlfpSYILYFiqrkks6sH
VEmD9P3dTBNQUviYwkRcHhJoI7NqdQBApn9/30F4dQsmnUsikoaom5GovrbBw817
L3Q+U/9+2Xp5k1i7J2MLFSL6Vyaxs/48eRKxs3YubBmrtwBjPj+aOYdRdDOscguQ
WcDRiYPOW72cGIlFtYyuzApqo+J+xw==
=mq0x
-----END PGP SIGNATURE-----
Merge tag 'mlx5-fixes-2019-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
Mellanox, mlx5 fixes 2019-07-11
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
For -stable v4.15
('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn')
For -stable v5.1
('net/mlx5e: Fix port tunnel GRE entropy control')
('net/mlx5e: Rx, Fix checksum calculation for new hardware')
('net/mlx5e: Fix return value from timeout recover function')
('net/mlx5e: Fix error flow in tx reporter diagnose')
For -stable v5.2
('net/mlx5: E-Switch, Fix default encap mode')
Conflict note: This pull request will produce a small conflict when
merged with net-next.
In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
Take the hunk from net and replace:
esw_offloads_steering_init(esw, vf_nvports, total_nvports);
with:
esw_offloads_steering_init(esw);
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following compiler warning:
In function ‘esw_vport_add_ingress_acl_modify_metadata’:
the frame size of 1084 bytes is larger than 1024 bytes [-Wframe-larger-than=]
Since the structure is never written to, we can statically allocate
it to avoid the stack usage.
Fixes: 7445cfb116 ("net/mlx5: E-Switch, Tag packet with vport number in VF vports and uplink ingress ACLs")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mlx5e_setup_tc, the "priv" variable is not used if
CONFIG_MLX5_ESWITCH is off; one way to fix this is to actually use it.
mlx5e_setup_tc_mqprio also needs the "priv" variable and it extracts it
on its own. We can simply pass priv to mlx5e_setup_tc_mqprio instead of
the netdev and avoid extracting the priv var, which will also resolve
the compiler warning.
Fixes: 4e95bc268b ("net: flow_offload: add flow_block_cb_setup_simple()")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
CC: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the cited patch below, the Kconfig flags combination of:
CONFIG_MLX5_FPGA is not set
CONFIG_MLX5_TLS=y
CONFIG_MLX5_EN_TLS=y
leads to the compilation error:
./include/linux/mlx5/device.h:61:39: error: invalid application of
sizeof to incomplete type struct mlx5_ifc_tls_flow_bits.
Fix it.
Fixes: 90687e1a9a50 ("net/mlx5: Kconfig, Better organize compilation flags")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CC: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the timeout recover function to return a meaningful value.
When an interrupt was not sent by the FW, return an IO error instead of
'true'.
Fixes: c7981bea48 ("net/mlx5e: Fix return status of TX reporter timeout recover")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
CQE checksum full mode in new HW provides a full checksum of the RX frame,
covering bytes starting from the eth protocol up to the last byte in the
received frame (frame_size - ETH_HLEN), as expected by the stack.
Fixing up skb->csum by the driver is not required in such a case. This fix
avoids wrong checksum calculation in drivers which already support
the new hardware with the new checksum mode.
Fixes: 85327a9c41 ("net/mlx5: Update the list of the PCI supported devices")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
GRE entropy calculation is a single bit per card, and not per port.
Force disable GRE entropy calculation upon the first GRE encap rule,
and release the force at the last GRE encap rule removal. This is done
per port.
Fixes: 97417f6182 ("net/mlx5e: Fix GRE key by controlling port tunnel entropy calculation")
Signed-off-by: Eli Britstein <elibr@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Encap mode is related to switchdev mode only. Move the init of
the encap mode to eswitch_offloads. Before this change, we reported
that the eswitch supports encap even though the device was in
non-SRIOV mode.
Fixes: 7768d1971d ('net/mlx5: E-Switch, Add control for encapsulation')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
clang warns:
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:251:2:
warning: variable 'rec_seq_sz' is used uninitialized whenever switch
default is taken [-Wsometimes-uninitialized]
default:
^~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:255:46: note:
uninitialized use occurs here
skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
^~~~~~~~~~
drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:239:16: note:
initialize the variable 'rec_seq_sz' to silence this warning
u16 rec_seq_sz;
^
= 0
1 warning generated.
This case statement was clearly designed to be one that should not be
hit during runtime because of the WARN_ON statement, so just return early
to prevent copying uninitialized memory up into rn_be.
Fixes: d2ead1f360 ("net/mlx5e: Add kTLS TX HW offload support")
Link: https://github.com/ClangBuiltLinux/linux/issues/590
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
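A simplified sketch of the shape of the fix (the function is heavily
trimmed; only the names from the warning above are reused): the default
branch now returns instead of falling through with rec_seq_sz
uninitialized.

static void tx_post_resync_sketch(const char *rec_seq, u64 rcd_sn,
				  int cipher_type)
{
	__be64 rn_be = cpu_to_be64(rcd_sn);
	bool skip_static_post;
	u16 rec_seq_sz;

	switch (cipher_type) {
	case TLS_CIPHER_AES_GCM_128:
		rec_seq_sz = TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE;
		break;
	default:
		WARN_ONCE(1, "unsupported cipher type %d\n", cipher_type);
		return;		/* don't reach the memcmp() with a garbage size */
	}

	skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
	/* ... post static/progress params based on skip_static_post ... */
}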
The return value was changed to 'int' from void but this return statement
was not updated, or it slipped in via a merge.
Fixes: b5d9a834f4 ("net/tls: don't clear TX resync flag on error")
Signed-off-by: David S. Miller <davem@davemloft.net>
And any other existing fields in this structure that refer to tc.
Specifically:
* tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
* TC_CLSFLOWER_* to FLOW_CLS_*.
* tc_cls_common_offload to flow_cls_common_offload.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds a function to check if flow block callback is already in
use. Call this new function from flow_block_cb_setup_simple() and from
drivers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates flow_block_cb_setup_simple() to use the flow block API.
Several drivers are also adjusted to use it.
This patch introduces the per-driver list of flow blocks to account for
blocks that are already in use.
Remove tc_block_offload alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename from TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* and
remove temporary tcf_block_binder_type alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename from TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND and remove
temporary tc_block_command alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers do the same thing to set up the flow block callbacks, this
patch adds a helper function to do this.
This preparation patch reduces the number of changes to adapt the
existing drivers to use the flow block callback API.
This new helper function takes a flow block list per-driver, which is
set to NULL until this driver list is used.
This patch also introduces the flow_block_command and
flow_block_binder_type enumerations, which are renamed to use
FLOW_BLOCK_* in follow up patches.
There are three definitions (aliases) in order to reduce the number of
updates in this patch, which go away once drivers are fully adapted to
use this flow block API.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register devlink port of physical port, PCI PF and PCI VF flavour
for each PF, VF when a given devlink instance is in switchdev mode.
Implement ndo_get_devlink_port callback API to make use of registered
devlink ports.
This eliminates ndo_get_phys_port_name() and ndo_get_port_parent_id()
callbacks. Hence, remove them.
An example output with 2 VFs, without a PF and single uplink port is
below.
$devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0 flavour physical
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Reviewed-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a return code for the tls_dev_resync callback.
When the driver TX resync fails, the kernel can retry the resync again
until it succeeds. This prevents drivers from attempting to offload
TLS packets if the connection is known to be out of sync.
We don't worry about the RX resync since it will be retried naturally
as more encrypted records get received.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesper recently removed page_pool_destroy() (from driver invocation)
and moved shutdown and free of the page_pool into xdp_rxq_info_unreg(),
in order to handle in-flight packets/pages. This created an asymmetry
in drivers' create/destroy pairs.
This patch reintroduces page_pool_destroy and adds a page_pool user
refcnt. This serves to simplify drivers' error handling, as drivers now
always call page_pool_destroy() and don't need to track whether
xdp_rxq_info_reg_mem_model() was unsuccessful.
This could be used for special cases where a single RX-queue (with a
single page_pool) provides packets for two net_devices, and thus
needs to register the same page_pool twice with two xdp_rxq_info
structures.
This patch is primarily to ease API usage for drivers. The recently
merged netsec driver actually has a bug in this area, which is
solved by this API change.
This patch is a modified version of Ivan Khoronzhuk's original patch.
Link: https://lore.kernel.org/netdev/20190625175948.24771-2-ivan.khoronzhuk@linaro.org/
Fixes: 5c67bf0ec4 ("net: netsec: Use page_pool API")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
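A kernel-style sketch of the driver-side pairing after this change (struct
my_rxq and the surrounding flow are made up, and rq->xdp_rxq is assumed to
have been registered with xdp_rxq_info_reg() earlier): page_pool_create()
gives the driver its own reference, xdp_rxq_info_reg_mem_model() takes an
extra one on success, and teardown simply unregisters and then calls
page_pool_destroy() unconditionally.

static int rxq_create_sketch(struct my_rxq *rq)
{
	struct page_pool_params pp_params = {
		.order		= 0,
		.pool_size	= 1024,
		.nid		= NUMA_NO_NODE,
		.dev		= rq->dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	rq->page_pool = page_pool_create(&pp_params);
	if (IS_ERR(rq->page_pool))
		return PTR_ERR(rq->page_pool);

	/* Takes an extra user reference on the pool when it succeeds. */
	return xdp_rxq_info_reg_mem_model(&rq->xdp_rxq, MEM_TYPE_PAGE_POOL,
					  rq->page_pool);
}

/* Used for both the error path and regular teardown: the driver no longer
 * needs to remember whether the mem model registration succeeded. */
static void rxq_destroy_sketch(struct my_rxq *rq)
{
	xdp_rxq_info_unreg(&rq->xdp_rxq);	/* drops the mem-model ref, if taken */
	page_pool_destroy(rq->page_pool);	/* drops the driver's own ref */
}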
Add support for transmit side kernel-TLS acceleration.
Offload the crypto encryption to HW.
Per TLS connection:
- Use a separate TIS to maintain the HW context.
- Use a separate encryption key.
- Maintain static and progress HW contexts by posting the proper
WQEs at creation time, or upon resync.
- Use a special DUMP opcode to replay the previous frags and sync
the HW context.
To make sure the SQ is able to serve an xmit request, increase
SQ stop room to cover:
- static params WQE,
- progress params WQE, and
- resync DUMP per frag.
Currently supporting TLS 1.2, and key size 128bit.
Tested over SimX simulator.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to the existing mlx5e_post_nop(), but marks a fence
in the WQE control segment.
Added as a separate new function to not hurt the performance
of the common case.
To be used in a downstream patch of the series.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let the EN TIS creation function (mlx5e_create_tis) be responsible
for applying common mdev related fields.
Other specific fields must be set by the caller and passed within
the inbox.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use an SQ field for stop_room, and use the larger value only if TLS
is supported.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When polling a CQE of an SKB-less WQE, don't assume it consumed only
one WQEBB. Use wi->num_wqebbs directly instead.
In the downstream patch, SKB-less WQEs might have more than one WQEBB,
thus this change is needed.
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change mlx5e_sq_fetch_wqe to be agnostic to the Work Queue
Element (WQE) type.
Before this patch, it was specific for struct mlx5e_tx_wqe.
In order to allow the change, the function now returns the
generic void pointer, and gets the WQE size to do the zero
memset.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When ctrl->tisn field exists, this indicates an operation (HW offload)
on the TCP payload.
For such WQEs, inline the headers up to L4.
This is in preparation for kTLS HW offload support, added in
a downstream patch.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Take datapath helper functions to a new header file en/txrx.h.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the new TLS implementation of the Connect-X family.
Introduce a new compilation flag MLX5_TLS for it.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Encryption key create / destroy is done via
CREATE_GENERAL_OBJECT / DESTROY_GENERAL_OBJECT commands.
To be used in downstream patches by TLS API wrappers, to configure
the TIS context with the encryption key.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always contain all acceleration function declarations in
'accel' files, independent of the flags setting.
For this, introduce new flags CONFIG_FPGA_{IPSEC/TLS} and use stubs
where needed.
This obsoletes the need for stubs in 'fpga' files. Remove them.
Also use the new flags in Makefile, to decide whether to compile
TLS-specific or IPSEC-specific objects, or not.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not directly call the FPGA version of the IPsec function from main.c.
Wrap it with an accel version, and call the wrapper.
This will allow deprecating the FPGA IPsec stubs in downstream
patch.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds mlx5 support for devlink fw versions query.
1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
Merge tag 'mlx5-updates-2019-07-04-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-update-2019-07-04
This series adds mlx5 support for devlink fw versions query.
1) Implement the required low level firmware commands
2) Implement the devlink knobs and callbacks for fw versions query.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Apply by filling the PTP shaper parameters array.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When getting port up down event (PUDE), change the PTP shaper
configuration based on hardware time stamping on/off and the port's
speed.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to get more accurate hardware time stamping, the driver needs to
enable PTP shaper on the port, for speeds lower than 40 Gbps.
Enable the PTP shaper on the port when the user turns on the hardware
time stamping, and disable it when the user turns off the hardware time
stamping.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
New operation for getting the port's speed as part of port-type-speed
operations.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set the PTP shaper parameters during the ptp_init(). For different
speeds, there are different parameters.
When the port's speed changes and PTP shaper is enabled, the firmware
changes the ETS shaper values according to the PTP shaper parameters for
this new speed.
The PTP shaper parameters array is left empty for now, will be filled in
a follow-up patch.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The QPSC allows advanced configuration of the PTP shapers.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a note about disabling the PTP shaper when calling
mlxsw_sp_port_ets_maxrate_set().
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PTP Shaper field is used for enabling and disabling a port-rate-based
shaper which is slightly lower than the port rate.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The callback is invoked using the 'devlink dev info <pci>' command and
returns the running and pending firmware versions of the HCA and the
name of the kernel driver.
If there is a pending firmware version (a new version is burned but the
HCA still runs with the previous) it is returned as the stored
firmware version. Otherwise, the running version is returned for this
field.
Output example:
$ devlink dev info pci/0000:00:06.0
pci/0000:00:06.0:
driver mlx5_core
versions:
fixed:
fw.psid MT_0000000009
running:
fw.version 16.26.0100
stored:
fw.version 16.26.0100
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Using the MCQI and MCQS registers, we query the running and pending
fw version of the HCA.
The MCQS is queried with sequentially increasing component index, until
a component of type BOOT_IMG is found. Querying this component's version
using the MCQI register yields the running and pending fw version of the
HCA.
Querying MCQI for the pending fw version should be done only after
validating that such a fw version exists. This is done by checking the
'component update state' field in the MCQS output.
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
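A rough sketch of the query loop described above; query_mcqs() and
query_mcqi() are hypothetical helpers standing in for the real MCQS/MCQI
register accessors, and the constants follow the description rather than
the PRM field names.

static int read_fw_versions_sketch(struct mlx5_core_dev *dev,
				   u32 *running_ver, u32 *pending_ver,
				   bool *pending_valid)
{
	u16 comp_index = 0;
	u8 comp_type, update_state;
	bool last;
	int err;

	/* Walk component indexes until the boot image component is found. */
	do {
		err = query_mcqs(dev, comp_index, &comp_type, &update_state,
				 &last);
		if (err)
			return err;
		if (comp_type == COMP_TYPE_BOOT_IMG)
			break;
		if (last)
			return -ENOENT;
		comp_index++;
	} while (true);

	/* Only report a pending version if the update state says one is staged. */
	*pending_valid = (update_state == COMP_STATE_UPDATE_PENDING);
	return query_mcqi(dev, comp_index, running_ver,
			  *pending_valid ? pending_ver : NULL);
}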
Misc updates from mlx5-next branch:
1) Add the required HW definitions and structures for upcoming TLS
support.
2) Add support for MCQI and MCQS hardware registers for fw version query.
3) Added hardware bits and structures definitions for sub-functions
4) Small code cleanup and improvement for PF pci driver.
5) Bluefield (ECPF) updates and refactoring for better E-Switch
management on ECPF embedded CPU NIC:
5.1) Consolidate querying eswitch number of VFs
5.2) Register event handler at the correct E-Switch init stage
5.3) Setup the PF's inline mode and vlan pop when the ECPF is the
E-Switch manager (the host PF is basically a VF).
5.4) Handle Vport UC address changes in switchdev mode.
6) Cleanup the rep and netdev reference when unloading IB rep.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2019-07-03
The following pull-request contains BPF updates for your *net-next* tree.
There is a minor merge conflict in mlx5 due to 8960b38932 ("linux/dim:
Rename externally used net_dim members") which has been pulled into your
tree in the meantime, but resolution seems not that bad ... getting current
bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed
just so he's aware of the resolution below:
** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
static int mlx5e_open_cq(struct mlx5e_channel *c,
struct dim_cq_moder moder,
struct mlx5e_cq_param *param,
struct mlx5e_cq *cq)
=======
int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
>>>>>>> e5a3e259ef
Resolution is to take the second chunk and rename net_dim_cq_moder into
dim_cq_moder. Also the signature for mlx5e_open_cq() in ...
drivers/net/ethernet/mellanox/mlx5/core/en.h +977
... and in mlx5e_open_xsk() ...
drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64
... needs the same rename from net_dim_cq_moder into dim_cq_moder.
** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
<<<<<<< HEAD
int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
struct dim_cq_moder icocq_moder = {0, 0};
struct net_device *netdev = priv->netdev;
struct mlx5e_channel *c;
unsigned int irq;
=======
struct net_dim_cq_moder icocq_moder = {0, 0};
>>>>>>> e5a3e259ef
Take the second chunk and rename net_dim_cq_moder into dim_cq_moder
as well.
Let me know if you run into any issues. Anyway, the main changes are:
1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.
2) Addition of two new per-cgroup BPF hooks for getsockopt and
setsockopt along with a new sockopt program type which allows more
fine-grained pass/reject settings for containers. Also add a sock_ops
callback that can be selectively enabled on a per-socket basis and is
executed for every RTT to help tracking TCP statistics, both features
from Stanislav.
3) Follow-up fix from loops in precision tracking which was not propagating
precision marks and as a result verifier assumed that some branches were
not taken and therefore wrongly removed as dead code, from Alexei.
4) Fix BPF cgroup release synchronization race which could lead to a
double-free if a leaf's cgroup_bpf object is released and a new BPF
program is attached to the one of ancestor cgroups in parallel, from Roman.
5) Support for bulking XDP_TX on veth devices which improves performance
in some cases by around 9%, from Toshiaki.
6) Allow for lookups into BPF devmap and improve feedback when calling into
bpf_redirect_map() as lookup is now performed right away in the helper
itself, from Toke.
7) Add support for fq's Earliest Departure Time to the Host Bandwidth
Manager (HBM) sample BPF program, from Lawrence.
8) Various cleanups and minor fixes all over the place from many others.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The functions change event output data size changes when functions other
than VFs are enabled in HCA CAP.
With the current API, multiple callers need to align and calculate the
accurate size of the output data depending on the number of non-VF
functions enabled in the device.
Instead of duplicating such math in multiple places, refactor
mlx5_esw_query_functions() to return the raw output allocated by itself.
The caller must free the allocated memory using kvfree() as described in
the function comment section.
This hides the calculation within mlx5_esw_query_functions() and provides
a simpler API.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
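A sketch of the resulting API shape only (run_query_cmd() is a hypothetical
stand-in for the real QUERY_ESW_FUNCTIONS command, and the buffer size would
come from device capabilities): the query allocates its own output buffer
and the caller releases it with kvfree().

static void *esw_query_functions_sketch(struct mlx5_core_dev *dev,
					size_t out_sz)
{
	void *out = kvzalloc(out_sz, GFP_KERNEL);
	int err;

	if (!out)
		return ERR_PTR(-ENOMEM);

	err = run_query_cmd(dev, out, out_sz);	/* hypothetical helper */
	if (err) {
		kvfree(out);
		return ERR_PTR(err);
	}
	return out;	/* caller must kvfree() this */
}

static void caller_sketch(struct mlx5_core_dev *dev)
{
	void *out = esw_query_functions_sketch(dev, 256);

	if (IS_ERR(out))
		return;
	/* ... parse the output ... */
	kvfree(out);
}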
The eswitch function change handler will service multiple types of events
for VF and non-VF function updates.
Hence, introduce and use the helper function
esw_vfs_changed_event_handler() for handling changes in the number of VFs,
to improve code readability.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Instead of MLX5_TOTAL_VPORTS, use mlx5_eswitch_get_total_vports().
mlx5_eswitch_get_total_vports() in subsequent patch accounts for SF
vports as well.
Expanding MLX5_TOTAL_VPORTS macro would require exposing SF internals to
more generic vport.h header file. Such exposure is not desired.
Hence a mlx5_eswitch_get_total_vports() is introduced.
Given that mlx5_eswitch_get_total_vports() API wants to work on const
mlx5_core_dev*, change its helper functions also to accept const *dev.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Report EQE data upon CQ completion to let upper layers use this data.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Report a CQ error event only when a handler was set.
This enables mlx5_ib to not set a handler upon CQ creation and use some
other mechanism to get this event as of other events by the
mlx5_eq_notifier_register API.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Enhance mlx5_core_create_cq() to get the command out buffer from the
callers to let them use the output.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Expose the API to register for ANY event, mlx5_ib will be able to use
this functionality for its needs.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Use the reported device capabilities for the supported user events (i.e.
affiliated and un-affiliated) to set the EQ mask.
As the event mask can be up to 256 bits, defined by 4 u64 entries, change
the applicable code to work accordingly.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
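A stand-alone sketch of the representation (only the sizes are taken from
the text above): a 256-bit mask stored as four u64 words, with set/test
helpers indexing the right word and bit.

#include <stdio.h>
#include <stdint.h>

#define EVENT_TYPE_MAX 256

struct event_mask_sketch {
	uint64_t w[EVENT_TYPE_MAX / 64];	/* 4 x u64 = 256 bits */
};

static void mask_set(struct event_mask_sketch *m, unsigned int event)
{
	m->w[event / 64] |= 1ULL << (event % 64);
}

static int mask_test(const struct event_mask_sketch *m, unsigned int event)
{
	return (m->w[event / 64] >> (event % 64)) & 1;
}

int main(void)
{
	struct event_mask_sketch m = { { 0 } };

	mask_set(&m, 0x27);	/* lands in w[0] */
	mask_set(&m, 0x9d);	/* lands in w[2] */
	printf("0x27:%d 0x9d:%d 0x10:%d\n",
	       mask_test(&m, 0x27), mask_test(&m, 0x9d), mask_test(&m, 0x10));
	return 0;
}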
The firmware command to destroy a CQ might fail when the object is
referenced by another object and the ref count is managed by the firmware.
To enable a second successful destruction after the first failure,
mlx5_eq_del_cq() needs to be changed into a void function.
As an error in mlx5_eq_del_cq() is quite fatal with little chance to
recover, a debug message inside it should be good enough, and it was
changed to be void.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Before mlxsw_sp1_ptp_packet_finish() sends the packet back, it validates
whether the corresponding port is still valid. However the condition is
incorrect: when mlxsw_sp_port == NULL, the code dereferences the port to
compare it to skb->dev.
The condition needs to check whether the port is present and skb->dev still
refers to that port (or else is NULL). If that does not hold, bail out.
Add a pair of parentheses to fix the condition.
Fixes: d92e4e6e33 ("mlxsw: spectrum: PTP: Support timestamping on Spectrum-1")
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
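A minimal sketch of the shape of the bug and of the fix (the types are
simplified stand-ins): without the extra parentheses the '!' negates only
the port pointer, so a NULL port makes the left operand true and the
right operand may then dereference it.

#include <stdbool.h>
#include <stddef.h>

struct net_device;
struct skb_sketch { struct net_device *dev; };
struct port_sketch { struct net_device *dev; };

/* Buggy: when port == NULL, '!port' is true and the right-hand side is
 * still evaluated, dereferencing the NULL port if skb->dev is set. */
static bool bail_out_buggy(struct port_sketch *port, struct skb_sketch *skb)
{
	return !port && (!skb->dev || skb->dev == port->dev);
}

/* Fixed: negate the whole "port is present and skb->dev still refers to it
 * (or is NULL)" expression. */
static bool bail_out_fixed(struct port_sketch *port, struct skb_sketch *skb)
{
	return !(port && (!skb->dev || skb->dev == port->dev));
}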
Similarly, other callers of idr_get_next_ul() suffer the same
overflow bug as they don't handle it properly either.
Introduce idr_for_each_entry_continue_ul() to help these callers
iterate from a given ID.
cls_flower needs more care here because it still has an overflow when
it does arg->cookie++; we have to fold its nested loops into one
and remove the arg->cookie++.
Fixes: 01683a1469 ("net: sched: refactor flower walk to iterate over idr")
Fixes: 12d6066c3b ("net/mlx5: Add flow counters idr")
Reported-by: Li Shuang <shuali@redhat.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Vlad Buslov <vladbu@mellanox.com>
Cc: Chris Mi <chrism@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Tested-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The get_ts_info callback is used for obtaining information about
timestamping capabilities of a network device. On Spectrum-1, implement
it to advertise the PHC and the capability to do HW timestamping, and
the supported RX and TX filters.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The SIOCSHWTSTAMP ioctl configures HW timestamping on a given port.
Dispatch the ioctls to a per-chip handler (added to ptp_ops). Find out
which PTP messages need to be timestamped and configure MTPPPC
accordingly.
The SIOCGHWTSTAMP ioctl is getter for the current configuration.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configure MTPTPT to set which message types should arrive under which
PTP trap, and MOGCR to clear the timestamp queue after its contents are
reported through PTP_ING_FIFO or PTP_EGR_FIFO.
With this configuration, PTP packets start arriving through the PTP
traps. However since timestamping is disabled by default and there is
currently no way to enable it, they will not be timestamped.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>