Commit Graph

4625 Commits

Author SHA1 Message Date
Jiri Pirko 5c661f142c mlxsw: reg: Add multi field to PAGT register
For Spectrum-2 this allows parallel lookups in multiple regions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:50 -08:00
Jiri Pirko a339bf8aaf mlxsw: spectrum_acl: Pass hints priv all the way to ERP code
The hints priv comes from ERP code and it is possible to obtain it from
TCAM code. Add arg to appropriate functions so the hints
priv could be passed back down to ERP code. Pass NULL now as the
follow-up patches would pass an actual hints priv pointer.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:50 -08:00
Jiri Pirko 29a2102a29 mlxsw: spectrum_acl: Implement basic ERP rehash hits creation
Introduce an initial implementation of rehash logic in ERP code.
Currently, the rehash is considered as needed only in case number of
roots in the hints is smaller than the number of roots actually in use.
In that case return hints pointer and let it be obtained through the
callpath through the Spectrum-2 TCAM op.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko c4c2dc5429 mlxsw: spectrum_acl: Split entry struct into entry and ventry
Do the split of entry struct so the new entry struct is related to the
actual HW entry, whereas ventry struct is a SW abstration of that.
This split prepares possibility for ventry to hold 2 HW entries which
is needed for region ERP rehash flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko b2d6b4d2be mlxsw: spectrum_acl: Split chunk struct into chunk and vchunk
Do the split of chunk struct so the new chunk struct is related to the
actual HW chunk (differs between Spectrum and Spectrum-2),
whereas vchunk struct is a SW abstraction of that. This split
prepares possibility for vchunk to hold 2 HW chunks which is needed
for region ERP rehash flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko 0f54236da0 mlxsw: spectrum_acl: Split region struct into region and vregion
Do the split of region struct so the new region struct is related to the
actual HW region, whereas vregion struct is a SW abstration of that.
This split prepares possibility for vregion to hold 2 HW regions which
is needed for region ERP rehash flow.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko 9069a3817d lib: objagg: implement optimization hints assembly and use hints for object creation
Implement simple greedy algo to find more optimized root-delta tree for
a given objagg instance. This "hints" can be used by a driver to:
1) check if the hints are better (driver's choice) than the original
   objagg tree. Driver does comparison of objagg stats and hints stats.
2) use the hints to create a new objagg instance which will construct
   the root-delta tree according to the passed hints. Currently, only a
   simple greedy algorithm is implemented. Basically it finds the roots
   according to the maximal possible user count including deltas.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
Jiri Pirko 7c62cfb8c5 devlink: publish params only after driver init is done
Currently, user can do dump or get of param values right after the
devlink params are registered. However the driver may not be initialized
which is an issue. The same problem happens during notification
upon param registration. Allow driver to publish devlink params
whenever it is ready to handle get() ops. Note that this cannot
be resolved by init reordering, as the "driverinit" params have
to be available before the driver is initialized (it needs the param
values there).

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:02:49 -08:00
David S. Miller a655fe9f19 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
An ipvlan bug fix in 'net' conflicted with the abstraction away
of the IPV6 specific support in 'net-next'.

Similarly, a bug fix for mlx5 in 'net' conflicted with the flow
action conversion in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-08 15:00:17 -08:00
Eran Ben Elisha 7d91126b1a net/mlx5e: Add tx timeout support for mlx5e tx reporter
With this patch, ndo_tx_timeout callback will be redirected to the tx
reporter in order to detect a tx timeout error and report it to the
devlink health. (The watchdog detects tx timeouts, but the driver verify
the issue still exists before launching any recover method).

In addition, recover from tx timeout in case of lost interrupt was added
to the tx reporter recover method. The tx timeout recover from lost
interrupt is not a new feature in the driver, this patch re-organize the
functionality and move it to the tx reporter recovery flow.

tx timeout example:
(with auto_recover set to false, if set to true, the manual recover and
diagnose sections are irrelevant)

$cat /sys/kernel/debug/tracing/trace
...
devlink_health_report: bus_name=pci dev_name=0000:00:09.0
driver_name=mlx5_core reporter_name=tx: TX timeout on queue: 0, SQ: 0x8a,
CQ: 0x35, SQ Cons: 0x2 SQ Prod: 0x2, usecs since last trans: 14912000

$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 0 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": true
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: true
  sqn: 142 HW state: 1 stopped: false

$devlink health recover pci/0000:00:09 reporter tx
$devlink health show
pci/0000:00:09.0:
  name tx
    state healthy #err 1 #recover 1 last_dump_ts N/A
    parameters:
      grace_period 500 auto_recover false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07 10:34:29 -08:00
Eran Ben Elisha de8650a820 net/mlx5e: Add tx reporter support
Add mlx5e tx reporter to devlink health reporters. This reporter will be
responsible for diagnosing, reporting and recovering of tx errors.
This patch declares the TX reporter operations and creates it using the
devlink health API. Currently, this reporter supports reporting and
recovering from send error CQE only. In addition, it adds diagnose
information for the open SQs.

For a local SQ recover (due to driver error report), in case of SQ recover
failure, the recover operation will be considered as a failure.
For a full tx recover, an attempt to close and open the channels will be
done. If this one passed successfully, it will be considered as a
successful recover.

The SQ recover from error CQE flow is not a new feature in the driver,
this patch re-organize the functions and adapt them for the devlink
health API. For this purpose, move code from en_main.c to a new file
named reporter_tx.c.

Diagnose output:
$devlink health diagnose pci/0000:00:09.0 reporter tx -j -p
{
    "SQs": [ {
            "sqn": 138,
            "HW state": 1,
            "stopped": false
        },{
            "sqn": 142,
            "HW state": 1,
            "stopped": false
        } ]
}

$devlink health diagnose pci/0000:00:09.0 reporter tx
SQs:
  sqn: 138 HW state: 1 stopped: false
  sqn: 142 HW state: 1 stopped: false

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-07 10:34:29 -08:00
Tonghao Zhang 218d05ce32 net/mlx5e: Don't overwrite pedit action when multiple pedit used
In some case, we may use multiple pedit actions to modify packets.
The command shown as below: the last pedit action is effective.

$ tc filter add dev netdev_rep parent ffff: protocol ip prio 1    \
	flower skip_sw ip_proto icmp dst_ip 3.3.3.3        \
	action pedit ex munge ip dst set 192.168.1.100 pipe    \
	action pedit ex munge eth src set 00:00:00:00:00:01 pipe    \
	action pedit ex munge eth dst set 00:00:00:00:00:02 pipe    \
	action csum ip pipe    \
	action tunnel_key set src_ip 1.1.1.100 dst_ip 1.1.1.200 dst_port 4789 id 100 \
	action mirred egress redirect dev vxlan0

To fix it, we add max_mod_hdr_actions to mlx5e_tc_flow_parse_attr struction,
max_mod_hdr_actions will store the max pedit action number we support and
num_mod_hdr_actions indicates how many pedit action we used, and store all
pedit action to mod_hdr_actions.

Fixes: d79b6df6b1 ("net/mlx5e: Add parsing of TC pedit actions to HW format")
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 17:12:13 -08:00
Tonghao Zhang 6707f74be8 net/mlx5e: Update hw flows when encap source mac changed
When we offload tc filters to hardware, hardware flows can
be updated when mac of encap destination ip is changed.
But we ignore one case, that the mac of local encap ip can
be changed too, so we should also update them.

To fix it, add route_dev in mlx5e_encap_entry struct to save
the local encap netdevice, and when mac changed, kernel will
flush all the neighbour on the netdevice and send NETEVENT_NEIGH_UPDATE
event. The mlx5 driver will delete the flows and add them when neighbour
available again.

Fixes: 232c001398 ("net/mlx5e: Add support to neighbour update flow")
Cc: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 17:12:13 -08:00
Ido Schimmel 2810c3b252 mlxsw: spectrum_router: Offload blackhole routes
Create a new FIB entry type for blackhole routes and set it in case the
type of the notified route is 'RTN_BLACKHOLE'.

Program such routes with a discard action and mark them as offloaded
since the device is dropping the packets instead of the kernel.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 14:24:05 -08:00
Florian Fainelli 25ba860514 mlxsw: Implement ndo_get_port_parent_id()
mlxsw implements SWITCHDEV_ATTR_ID_PORT_PARENT_ID and we want to get rid
of switchdev_ops eventually, ease that migration by implementing a
ndo_get_port_parent_id() function which returns what
switchdev_port_attr_get() would do.

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 14:16:11 -08:00
Florian Fainelli 6dcfa23438 net/mlx5e: Implement ndo_get_port_parent_id()
mlx5e only supports SWITCHDEV_ATTR_ID_PORT_PARENT_ID, which makes it a
great candidate to be converted to use the ndo_get_port_parent_id() NDO
instead of implementing switchdev_port_attr_get().

Since mlx5e makes use of switchdev_port_parent_id() convert it to use
netdev_port_same_parent_id().

Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 14:16:11 -08:00
Nir Dotan d32d02a548 mlxsw: core: Trace EMAD errors
Trace EMAD errors returned from HW.

Signed-off-by: Nir Dotan <nird@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 11:05:57 -08:00
Pablo Neira Ayuso 7386788175 drivers: net: use flow action infrastructure
This patch updates drivers to use the new flow action infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Pablo Neira Ayuso 3b1903ef97 flow_offload: add statistics retrieval infrastructure and use it
This patch provides the flow_stats structure that acts as container for
tc_cls_flower_offload, then we can use to restore the statistics on the
existing TC actions. Hence, tcf_exts_stats_update() is not used from
drivers anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Pablo Neira Ayuso c500c86b0c net/mlx5e: support for two independent packet edit actions
This patch adds pedit_headers_action structure to store the result of
parsing tc pedit actions. Then, it calls alloc_tc_pedit_action() to
populate the mlx5e hardware intermediate representation once all actions
have been parsed.

This patch comes in preparation for the new flow_action infrastructure,
where each packet mangling comes in an separated action, ie. not packed
as in tc pedit.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Pablo Neira Ayuso 8f2566225a flow_offload: add flow_rule and flow_match structures and use them
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.

To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.

This introduces two new interfaces:

	bool flow_rule_match_key(rule, dissector_id)

that returns true if a given matching key is set on, and:

	flow_rule_match_XYZ(rule, &match);

To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-06 10:38:25 -08:00
Guy Shattah 1651925d40 net/mlx5e: Use the inner headers to determine tc/pedit offload limitation on decap flows
In packets that need to be decaped the internal headers
have to be checked, not the external ones.

Fixes: bdd66ac0ae ("net/mlx5e: Disallow TC offloading of unsupported match/action combinations")
Signed-off-by: Guy Shattah <sguy@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05 12:10:19 -08:00
Or Gerlitz 6363651d6d net/mlx5e: Properly set steering match levels for offloaded TC decap rules
The match level computed by the driver gets to be wrong for decap
rules with wildcarded inner packet match such as:

tc filter add dev vxlan_sys_4789 protocol all parent ffff: prio 2 flower
       enc_dst_ip 192.168.0.9 enc_key_id 100 enc_dst_port 4789
       action tunnel_key unset
       action mirred egress redirect dev eth1

The FW errs for a missing matching meta-data indicator for the outer
headers (where we do have a match), and a wrong matching meta-data
indicator for the inner headers (where we don't have a match).

Fix that by taking into account the matching on the tunnel info and
relating the match level of the encapsulated packet to the firmware
inner headers indicator in case of decap.

As for vxlan we mandate a match on the tunnel udp dst port, and in general
we practically madndate a match on the source or dest ip for any IP tunnel,
the fix was done in a minimal manner around the tunnel match parsing code.

Fixes: d708f90298 ('net/mlx5e: Get the required HW match level while parsing TC flow matches')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Slava Ovsiienko <viacheslavo@mellanox.com>
Reviewed-by: Jianbo Liu <jianbol@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05 12:10:19 -08:00
Raed Salem 82eaa1fa04 net/mlx5e: FPGA, fix Innova IPsec TX offload data path performance
At Innova IPsec TX offload data path a special software parser metadata
is used to pass some packet attributes to the hardware, this metadata
is passed using the Ethernet control segment of a WQE (a HW descriptor)
header.

The cited commit might nullify this header, hence the metadata is lost,
this caused a significant performance drop during hw offloading
operation.

Fix by restoring the metadata at the Ethernet control segment in case
it was nullified.

Fixes: 37fdffb217 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")
Signed-off-by: Raed Salem <raeds@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-02-05 12:10:19 -08:00
Tonghao Zhang fc9c5a4a5a net/mlx5: Fix code style issue in mlx driver
Add the tab before '}' and keep the code style consistent.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-04 18:01:56 -08:00
Jakub Kicinski bff5731d43 net: devlink: report cell size of shared buffers
Shared buffer allocation is usually done in cell increments.
Drivers will either round up the allocation or refuse the
configuration if it's not an exact multiple of cell size.
Drivers know exactly the cell size of shared buffer, so help
out users by providing this information in dumps.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-03 11:25:34 -08:00
Jiri Pirko a97cfe4de1 mlxsw: spectrum_acl: Add C-TCAM spill tracepoint
Add some visibility to the rule addition process and trace whenever rule
spilled into C-TCAM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:00:40 -08:00
Jiri Pirko cb56e21467 mlxsw: spectrum_acl: Include delta bits into hashtable key
Currently only ERP mask masked bits in key are considered for
the hashtable key. That leads to false negative collisions
and fallbacks to C-TCAM in case two keys differ only in delta bits.

Fix this by taking full encoded key as a hashtable key,
including delta bits.

Reported-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30 10:00:40 -08:00
David S. Miller eaf2a47f40 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-29 21:18:54 -08:00
Ido Schimmel 2adeb5f1c3 mlxsw: spectrum_switchdev: Add more extack messages
Add more extack messages that let the user know why VXLAN offload
failed.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
Jiri Pirko 3021afe168 mlxsw: spectrum_acl: Fix rul/rule typo
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
Jiri Pirko 038418eeb9 mlxsw: spectrum_acl: Move mr_ruleset and mr_rule structs
Move the struct to the place where they belong, alongside with the rest
of the MR code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
Jiri Pirko 42d704e018 mlxsw: spectrum_acl: Remove unnecessary arg on action_replace call path
No need to pass ruleset/group and chunk pointers on action_replace call
path, nobody uses them.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-28 10:43:15 -08:00
David S. Miller 3da15ad3e9 mlx5-fixes-2019-01-25
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcS2rdAAoJEEg/ir3gV/o+CZ8IAMX4yWUWjzgrIDdOc1OEgHJQ
 FDCMtTDm/I+m1UInAgoigRia2RyIL99N81Vrx+p+otCOml/U5IXPt5tuJfSNQYZ3
 p9MibQSXtKJ9jhowOVZdYlTx9CXti75BS45wHKZPlL7V+gCC2Q7/8aviuRlkQKAN
 4pQNDreNiiB8PznXG/yCyzEu0Kc9M/Bv89xdLynVpDTxBfEfI8xSv4PttUcAxaQ5
 Fs/489wuODyhAczOquQj1dvn61zUamaPp6z0h+GbZOCQv0AO7ABRZl+7o5C5r+rA
 i7R4ak6WIgW3snIzhCdpynPfas8kUIvxJjkNCZuoFsefrBRsNv5RvtfCkuzyms4=
 =3epF
 -----END PGP SIGNATURE-----

Merge tag 'mlx5-fixes-2019-01-25' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
Mellanox, mlx5 fixes 2019-01-25

This series introduces some fixes to mlx5 driver.
For more information please see tag log below.

Please pull and let me know if there is any problem.

For -stable v4.13
('net/mlx5e: Allow MAC invalidation while spoofchk is ON')

For -stable v4.18
('Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"')
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-27 11:06:45 -08:00
David S. Miller 1d68101367 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2019-01-27 10:43:17 -08:00
Saeed Mahameed b832d4fdf1 net/mlx5e: Reuse fold sw stats in representors
Representors software stats are basic, this patch is reusing the
mlx5e_fold_sw_stats in representors, which sums up the basic stats64 for a
mlx5e netdevice.

Fixes: 8bfaf07f78 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:15 -08:00
Tariq Toukan 168af00a3b net/mlx5e: Present the representors SW stats when state is not opened
This behavior is already adopted for all other cases in the cited patch.
The representor's functions were missed, here we modify the them to
behave similarly.

Fixes: 8bfaf07f78 ("net/mlx5e: Present SW stats when state is not opened")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Saeed Mahameed 9659e49a6d net/mlx5e: Separate between ethtool and netdev software stats folding
mlx5e_grp_sw_update_stats can be called from two threads,
1) ndo_get_stats64
2) get_ethtool_stats

For this reason and to minimize concurrency issue impact on 64bit machines
mlx5e_grp_sw_update_stats folds the software stats into a temporary
variable then copies it to the global driver stats, both ethtool and ndo
statistics callbacks will use the global software stats variable to report
whatever stats they need.

Actually ndo_get_stats64 doesn't need to fold the whole software stats
(mlx5e_grp_sw_update_stats), all it needs is five counters to fill the
rtnl_link_stats64 relevant stats parameter.

Hence this patch introduces a simpler helper function to fold software
stats for ndo_get_stats64 which will work directly on rtnl_link_stats64
stats parameter and not on the global or even temporary mlx5e_sw_stats
variable.

Since now mlx5e_grp_sw_update_stats is not called by ndo_get_stats64 we
can make it static and remove the temp var.

Unlike mlx5e_grp_sw_update_stats the new fold stats function doesn't
need to zero out the output statistics parameter since it is already
done by the stack @dev_get_stats().

This patch is fixing stack usage of mlx5e_grp_sw_update_stats on
x86 gcc-4.9 and higher, the concurrency issue between mlx5's
ndo_get_stats64 and get_ethtool_stats is resolved as well.

Fixes: 8bfaf07f78 ("net/mlx5e: Present SW stats when state is not opened")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Or Gerlitz 8e4ca98609 net/mlx5: Add trace points for flow tables create/destroy
We were not tracking flow tables so far, add it up.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Jason Gunthorpe 71129676ab net/mlx5e: Return the allocated flow directly from __mlx5e_add_fdb_flow
This confusing construction confuses the compiler which can't see
that flow is initialized if !err:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c: In function `mlx5e_configure_flower`
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:2727:28: warning:
`flow` may be used uninitialized in this function
[-Wmaybe-uninitialized]

There is no reason for two function outputs, just return the
pointer directly and use ERR_PTR to encode a failure.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Moshe Shemesh 149e566fef net/mlx5e: Expand XPS cpumask to cover all online cpus
Currently we have one cpu in XPS cpumask per tx queue, this is good
enough for default configuration where there is a tx queue per cpu.
However, once configuration changes to use less tx queues, part of the
cpus are not XPS-mapped and so the select queue decision falls back to
hash calculation and balancing is not guaranteed.

Expand XPS cpumask to enable using all cpus even when number of tx
queues is smaller than number of cpus.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:14 -08:00
Tariq Toukan 79d356ef2c net/mlx5e: Take CQ decompress fields into a separate structure
Only the Receive CQ makes use of these fields. Take them
out into a separate struct and save space in the generic
CQ structure.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:13 -08:00
Tariq Toukan 9481627838 net/mlx5e: RX, Make sure packet header does not cross page boundary
In the non-linear SKB memory scheme of Striding RQ, a packet header
could cross page boundary. This requires special care in fast path
that costs LoC, additional runtime instructions and branches.

It could happen when the header (up to 256B) does not fit in
a single stride. Avoid this by working with a stride size that fits
the maximum possible header. Stride is increased form 64B to 256B.

Performance:
Tested packet rate for UDP streams, single ring, on ConnectX-5.

Configuration:
Set Striding RQ and LRO ON (to enabled the non-linear SKB scheme).
GRO OFF, early drop by TC rule.

64B: 4x worse memory utilization, no page-crossers headers
- No degradation (5,887,305 pps).
- The reduction in memory utilization is compensated by the saving of
  branches tests.

192B: 1.33x worse memory utilization, avoid page-crossers headers
- Before: 5,727,252. After: 5,777,037. ~1% gain.

256B: Same memory util, no page-crossers
- Before: 5,691,885. After: 5,748,007. ~1% gain.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:16:13 -08:00
Or Gerlitz 6ce966fd26 net/mlx5e: Unblock setting vid 0 for VFs through the uplink rep
It turns out that libvirt uses 0-vid as a default if no vlan was
set for the guest (which is the case for switchdev mode) and errs
if we disallow that:

error: Failed to start domain vm75
error: Cannot set interface MAC/vlanid to 6a:66:2d:48:92:c2/0 \
		for ifname enp59s0f0 vf 0: Operation not supported

So allow this in order not to break existing systems.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Maor Dickman <maord@mellanox.com>
Reviewed-by: Gavi Teitz <gavi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:29 -08:00
Or Gerlitz c12ecc2305 net/mlx5e: Move to use common phys port names for vport representors
With VF LAG commit 491c37e49b "net/mlx5e: In case of LAG, one switch
parent id is used for all representors", both uplinks and all the VFs
(on both of them) get the same switchdev id.

This cause the provisioning system method to identify the rep of a given
VF from the parent PF PCI device using switchev id and physical port
name to break, since VFm of PF0 will have the (id, name) as VFm of PF1.

To fix that, we align to use the framework agreed upstream and set by
nfp commit 168c478e10 "nfp: wire get_phys_port_name on representors":

$ cat /sys/class/net/eth4_*/phys_port_name
p0
pf0vf0
pf0vf1

Now, the names will be different, e.g. pf0vf0 vs. pf1vf0.

Fixes: 491c37e49b ("net/mlx5e: In case of LAG, one switch parent id is used for all representors")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reported-by: Waleed Musa <waleedm@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:29 -08:00
Aya Levin 9d2cbdc5d3 net/mlx5e: Allow MAC invalidation while spoofchk is ON
Prior to this patch the driver prohibited spoof checking on invalid MAC.
Now the user can set this configuration if it wishes to.

This is required since libvirt might invalidate the VF Mac by setting it
to zero, while spoofcheck is ON.

Fixes: 1ab2068a4c ("net/mlx5: Implement vports admin state backup/restore")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:29 -08:00
Moni Shoua 33814e5d12 net/mlx5: Take lock with IRQs disabled to avoid deadlock
The lock in qp_table might be taken from process context or from
interrupt context. This may lead to a deadlock unless it is taken with
IRQs disabled.

Discovered by lockdep

================================
WARNING: inconsistent lock state
4.20.0-rc6
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W}

python/12572 [HC1[1]:SC0[0]:HE0:SE1] takes:
00000000052a4df4 (&(&table->lock)->rlock#2){?.+.}, /0x50 [mlx5_core]
{HARDIRQ-ON-W} state was registered at:
  _raw_spin_lock+0x33/0x70
  mlx5_get_rsc+0x1a/0x50 [mlx5_core]
  mlx5_ib_eqe_pf_action+0x493/0x1be0 [mlx5_ib]
  process_one_work+0x90c/0x1820
  worker_thread+0x87/0xbb0
  kthread+0x320/0x3e0
  ret_from_fork+0x24/0x30
irq event stamp: 103928
hardirqs last  enabled at (103927): [] nk+0x1a/0x1c
hardirqs last disabled at (103928): [] unk+0x1a/0x1c
softirqs last  enabled at (103924): [] tcp_sendmsg+0x31/0x40
softirqs last disabled at (103922): [] 80

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&table->lock)->rlock#2);

    lock(&(&table->lock)->rlock#2);

 *** DEADLOCK ***

Fixes: 032080ab43 ("IB/mlx5: Lock QP during page fault handling")
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:28 -08:00
Shay Agroskin 92b3277294 net/mlx5e: Fix wrong private flag usage causing checksum disable
MLX5E_PFLAG_* definitions were changed from bitmask to enumerated
values. However, in mlx5e_open_rq(), the proper API (MLX5E_GET_PFLAG macro)
was not used to read the flag value of MLX5E_PFLAG_RX_NO_CSUM_COMPLETE.
Fixed it.

Fixes: 8ff57c18e9 ("net/mlx5e: Improve ethtool private-flags code structure")
Signed-off-by: Shay Agroskin <shayag@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:28 -08:00
Bodong Wang 4e046de0f5 Revert "net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager"
This reverts commit 5f5991f36d.

With the original commit, eswitch instance will not be initialized for
a function which is vport group manager but not eswitch manager such as
host PF on SmartNIC (BlueField) card. This will result in a kernel crash
when such a vport group manager is trying to access vports in its group.
E.g, PF vport manager (not eswitch manager) tries to configure the MAC
of its VF vport, a kernel trace will happen similar as bellow:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
 ...
 RIP: 0010:mlx5_eswitch_get_vport_config+0xc/0x180 [mlx5_core]
 ...

Fixes: 5f5991f36d ("net/mlx5e: E-Switch, Initialize eswitch only if eswitch manager")
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reported-by: Yuval Avnery <yuvalav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2019-01-25 12:00:28 -08:00
David S. Miller 30e5c2c6bf net: Revert devlink health changes.
This reverts the devlink health changes from 9/17/2019,
Jiri wants things to be designed differently and it was
agreed that the easiest way to do this is start from the
beginning again.

Commits reverted:

cb5ccfbe73
880ee82f03
c7af343b4e
ff253fedab
6f9d56132e
fcd852c69d
8a66704a13
12bd0dcefe
aba25279c1
ce019faa70
b8c45a033a

And the follow-on build fix:

o33a0efa4baecd689da9474ce0e8b673eb6931c60

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-25 10:53:23 -08:00