linux-sg2042

Commit Graph

Author	SHA1	Message	Date
Maor Gottlieb	fbc4a69b56	net/mlx5e: Fix aRFS compilation dependency en_arfs.o should be compiled only if both CONFIG_MLX5_CORE_EN and CONFIG_RFS_ACCEL are enabled. en_arfs calls to rps_may_expire_flow which is compiled only if CONFIG_RFS_ACCEL is defined. Move en_arfs.o compilation dependency to be under CONFIG_MLX5_CORE_EN and wrap the en_arfs.c content with ifdef of CONFIG_RFS_ACCEL. Fixes: `1cabe6b096` ('net/mlx5e: Create aRFS flow tables') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 14:04:46 -04:00
Alexander Duyck	f3ed653cd4	net/mlx5e: Fix IPv6 tunnel checksum offload The mlx5 driver exposes support for TSO6 but not IPv6 csum for hardware encapsulated tunnels. This leads to issues as it triggers warnings in skb_checksum_help as it ends up being called as we report supporting the segmentation but not the checksumming for IPv6 frames. This patch corrects that and drops 2 features that don't actually need to be supported in hw_enc_features since they are Rx features and don't actually impact anything by being present in hw_enc_features. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:28 -04:00
Alexander Duyck	b49663c8fb	net/mlx5e: Add support for UDP tunnel segmentation with outer checksum offload This patch assumes that the mlx5 hardware will ignore existing IPv4/v6 header fields for length and checksum as well as the length and checksum fields for outer UDP headers. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:28 -04:00
Alexander Duyck	09067122db	net/mlx4_en: Add support for inner IPv6 checksum offloads and TSO >From what I can tell the ConnectX-3 will support an inner IPv6 checksum and segmentation offload, however it cannot support outer IPv6 headers. This assumption is based on the fact that I could see the checksum being offloaded for inner header on IPv4 tunnels, but not on IPv6 tunnels. For this reason I am adding the feature to the hw_enc_features and adding an extra check to the features_check call that will disable GSO and checksum offload in the case that the encapsulated frame has an outer IP version of that is not 4. The check in mlx4_en_features_check could be removed if at some point in the future a fix is found that allows the hardware to offload segmentation/checksum on tunnels with an outer IPv6 header. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:27 -04:00
Alexander Duyck	3c9346b240	net/mlx4_en: Add support for UDP tunnel segmentation with outer checksum offload This patch assumes that the mlx4 hardware will ignore existing IPv4/v6 header fields for length and checksum as well as the length and checksum fields for outer UDP headers. Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 13:32:27 -04:00
David S. Miller	cba6532100	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts: net/ipv4/ip_gre.c Minor conflicts between tunnel bug fixes in net and ipv6 tunnel cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-04 00:52:29 -04:00
Matthew Finlay	d8cf2dda3d	net/mlx5e: Use workqueue for vxlan ops The vxlan add/delete port NDOs are called under rcu lock. The current mlx5e implementation can potentially block in these calls, which is not allowed. Move to using the mlx5e workqueue to handle these NDOs. Fixes: `b3f63c3d5e` ('net/mlx5e: Add netdev support for VXLAN tunneling') Signed-off-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-03 13:37:26 -04:00
Matthew Finlay	7bb2975599	net/mlx5e: Implement a mlx5e workqueue Implement a mlx5e workqueue to handle all mlx5e specific tasks. Move all tasks currently using the system workqueue to the new workqueue. This is in preparation for vxlan using the mlx5e workqueue in order to schedule port add/remove operations. Signed-off-by: Matthew Finlay <matt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-03 13:37:26 -04:00
Matthew Finlay	69976fb104	net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue When MLX5_EN=y MLX5_CORE=y and VXLAN=m there is a linker error for vxlan_get_rx_port() due to the fact that VXLAN is a module. Change Kconfig to select VXLAN when MLX5_CORE=y. When MLX5_CORE=m there is no dependency on the value of VXLAN. Fixes: `b3f63c3d5e` ('net/mlx5e: Add netdev support for VXLAN tunneling') Signed-off-by: Matthew Finlay <matt@mellanox.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-03 13:37:26 -04:00
Gal Pressman	5f8a02a441	net/mlx5: Unmap only the relevant IO memory mapping When freeing UAR the driver tries to unmap uar->map and uar->bf_map which are mutually exclusive thus always unmapping a NULL pointer. Make sure we only call iounmap() once, for the actual mapping. Fixes: `0ba422410b` ('net/mlx5: Fix global UAR mapping') Signed-off-by: Gal Pressman <galp@mellanox.com> Reported-by: Doron Tsur <doront@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-05-03 13:37:25 -04:00
Maor Gottlieb	45bf454ae8	net/mlx5e: Enabling aRFS mechanism Accelerated RFS requires that ntuple filtering is enabled via ethtool and driver supports ndo_rx_flow_steer. When the ntuple filtering is enabled, we modify the l3_l4 ttc rules to point on the aRFS flow tables and when the filtering is disabled, we modify the l3_l4 ttc rules to point on the RSS TIRs. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:12 -04:00
Maor Gottlieb	18c908e477	net/mlx5e: Add accelerated RFS support Implement ndo_rx_flow_steer ndo. A new flow steering rule will be composed from the skb 4-tuple and added to the hardware aRFS flow table. Each rule is stored in an internal hash table, if such skb 4-tuple rule already exists we update the corresponding hardware steering rule with the new destination. For garbage collection rps_may_expire_flow will be invoked for a limited amount of old rules upon any ndo_rx_flow_steer invocation. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	1cabe6b096	net/mlx5e: Create aRFS flow tables Create the following four flow tables for aRFS usage: 1. IPv4 TCP - filtering 4-tuple of IPv4 TCP packets. 2. IPv6 TCP - filtering 4-tuple of IPv6 TCP packets. 3. IPv4 UDP - filtering 4-tuple of IPv4 UDP packets. 4. IPv6 UDP - filtering 4-tuple of IPv6 UDP packets. Each flow table has two flow groups: one for the 4-tuple filtering (full match) and the other contains * rule for miss rule. Full match rule means a hit for aRFS and packet will be forwarded to the dedicated RQ/Core, miss rule packets will be forwarded to default RSS hashing. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	5a7b27eb9c	net/mlx5: Initializing CPU reverse mapping Allocating CPU rmap and add entry for each IRQ. CPU rmap is used in aRFS to get the RX queue number of the RX completion interrupts. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	33cfaaa8f3	net/mlx5e: Split the main flow steering table Currently, the main flow table is used for two purposes: One is to do mac filtering and the other is to classify the packet l3-l4 header in order to steer the packet to the right RSS TIR. This design is very complex, for each configured mac address we have to add eleven rules (rule for each traffic type), the same if the device is put to promiscuous/allmulti mode. This scheme isn't scalable for future features like aRFS. In order to simplify it, the main flow table is split to two flow tables: 1. l2 table - filter the packet dmac address, if there is a match we forward to the ttc flow table. 2. TTC (Traffic Type Classifier) table - classify the traffic type of the packet and steer the packet to the right TIR. In this new design, when new mac address is added, the driver adds only one flow rule instead of eleven. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:11 -04:00
Maor Gottlieb	acff797cd1	net/mlx5e: Refactor mlx5e flow steering structs Slightly refactor and re-order the flow steering structs, tables and data-bases for better self-containment and flexibility to add more future steering phases (tables/rules/data bases) e.g: aRFS. Changes: 1. Move the vlan DB and address DB into their table structs. 2. Rename steering table structs to unique format: mlx5e_*_table, e.g: mlx5e_vlan_table. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:10 -04:00
Maor Gottlieb	13de6c106c	net/mlx5: Support different attributes for priorities in namespace Currently, namespace could be initialized only with priorities with the same attributes. Add support to initialize namespace with priorities with different attributes(e.g. different number of levels). Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:10 -04:00
Maor Gottlieb	d63cd28608	net/mlx5: Add user chosen levels when allocating flow tables Currently, consumers of the flow steering infrastructure can't choose their own flow table levels and are limited to one flow table per level. This just waste levels. Instead, we introduce here the possibility to use multiple flow tables in a level. The user is free to connect these flow tables, while following the rule (FTEs in FT of level x could only point to FTs of level y where y > x). In addition this patch switch the order of the create/destroy flow tables of the NIC(vlan and main). Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:09 -04:00
Maor Gottlieb	a257b94a18	net/mlx5: Set number of allowed levels in priority Refactors the flow steering namespace creation, by changing the name num_fts to num_levels. When new flow table is created, the driver assign new level to this flow table therefore the meaning is equivalent. Since downstream patches will introduce the ability to create more than one flow table per level, the name num_fts is no longer accurate. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:09 -04:00
Maor Gottlieb	d745098ced	net/mlx5: Introduce modify flow rule destination This API is used for modifying the flow rule destination. This is needed for modifying the pointed flow table by the traffic type classifier rules to point on the aRFS tables. Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:08 -04:00
Tariq Toukan	1da366964e	net/mlx5e: Direct TIR per RQ Introduce new TIRs for direct access per RQ. Now we have 2 available kinds of TIRs: - indirect TIR per traffic type, each points to one RQT (RSS RQT) same as before. - New direct TIR per RQ, each points to RQT with a size of one that forwards packets to that RQ only. Driver will open max channels (num cores) direct TIRs by default, they will be filled with the actual RQs once channels are allocated. Needed for downstream aRFS and ethtool direct steering functionalities. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:08 -04:00
Matthew Finlay	01a14098d3	net/mlx5e: Call vxlan_get_rx_port() with rtnl lock Hold the rtnl lock when calling vxlan_get_rx_port(). Fixes: `b7aade1548` ("vxlan: break dependency with netdev drivers") Signed-off-by: Matthew Finlay <matt@mellanox.com> Reported-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-29 16:29:08 -04:00
Arnd Bergmann	6b87663fbe	net/mlx5e: avoid stack overflow in mlx5e_open_channels struct mlx5e_channel_param is a large structure that is allocated on the stack of mlx5e_open_channels, and with a recent change it has grown beyond the warning size for the maximum stack that a single function should use: mellanox/mlx5/core/en_main.c: In function 'mlx5e_open_channels': mellanox/mlx5/core/en_main.c:1325:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=] The function is already using dynamic allocation and is not in a fast path, so the easiest workaround is to use another kzalloc for allocating the channel parameters. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `d3c9bc2743` ("net/mlx5e: Added ICO SQs") Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-28 16:46:59 -04:00
Masanari Iida	c01e01597c	treewide: Fix typos in printk This patch fix spelling typos in printk from various part of the codes. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-04-28 10:52:28 +02:00
David S. Miller	c0cc53162a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Minor overlapping changes in the conflicts. In the macsec case, the change of the default ID macro name overlapped with the 64-bit netlink attribute alignment fixes in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-27 15:43:10 -04:00
Saeed Mahameed	1b223dd391	net/mlx5e: Fix checksum handling for non-stripped vlan packets Now as rx-vlan offload can be disabled, packets can be received with vlan tag not stripped, which means is_first_ethertype_ip will return false, for that we need to check if the hardware reported csum OK so we will report CHECKSUM_UNNECESSARY for those packets. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:03 -04:00
Gal Pressman	363501145e	net/mlx5e: Add ethtool support for rxvlan-offload (vlan stripping) Use ethtool -K <interface> rxvlan <on/off> to enable/disable C-TAG vlan stripping by hardware. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Gal Pressman	bb64143eee	net/mlx5e: Add ethtool support for dump module EEPROM Add query MCIA, PMLP registers infrastructure and commands. Add ethtool support for get_module_info() and get_module_eeprom() callbacks. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Gal Pressman	da54d24ec3	net/mlx5e: Add ethtool support for interface identify (LED blinking) Add the needed hardware command and mlx5_ifc structs for managing LED control. Add set_phys_id ethtool callback to support ethtool -p flag. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Eran Ben Elisha	94cb1ebbaf	net/mlx5e: Add support for RXALL netdev feature Introduce new access register named Ports Check Mask Register (PCMR) to control all HW checks on port. With this register, the driver can enable/disable Hardware FCS validation. When RXALL is enabled/disabled using ndo_set_features, enable/disable fcs check at HW. User can change HW configuration using rx-all flag at ethtool. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:02 -04:00
Gal Pressman	0e405443e8	net/mlx5e: Improve set features ndo resiliency In current mlx5e ndo_set_features implementation, setting some features can success while others can fail. Today, we return one error code which doesn't reflect the current features status of the netdev at the end of the ndo callback. Set netdev->features with features which were successfully set in order to keep the current status in case of failure. For this purpose, define new Macro to set/unset specific feature in netdev->features. This patch introduces a mechanism that uses feature handlers for each feature. Set features will call a generic handler, which will then call a specific handler in his turn and update netdev->features according to it's return value. Each specific handler is responsible to perform driver specific actions, and updating params if needed. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	121fcdc84d	net/mlx5e: Add link down events counter Expose link_down_events counter through ethtool -S. This counter is read from PPort statistics, then proccessed and stored as a special handling software counter. This counter is stored along software counters since it is the only PPort counter that it's size is not 64 bits. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	cf678570d5	net/mlx5e: Add per priority group to PPort counters Expose counters providing information for each priority level (PCP) through ethtool -S option and DCBNL. This includes rx/tx bytes, frames, and pause counters. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	8075cb7238	net/mlx5e: Rename VPort counters VPort and software counters names are confusing and may be unclear, all VPort counters now have a prefix of rx/tx_vport_*. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	9218b44dcc	net/mlx5e: Statistics handling refactoring Redesign ethtool statistics handling and reporting in the driver: 1. Move counters to a separate file (en_stats.h). 2. Remove unnecessary dependencies between stats and strings. 3. Use counter descriptors which hold a name and offset for each counter, and will be used to decide which counters will be exposed. For example when adding a new software counter to ethtool, instead of: 1. Add to stats struct. 2. Add to strings struct in the same order. 3. Change macro defining number of software counters. The only thing needed is to link the new counter to a counter descriptor. VPort counters are a set of hardware traffic counters created automatically for each virtual port opened. PPort counters are a set of counters describing per physical port performance statistics. These counters are gathered from hardware register and divided to groups according to different protocols. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:01 -04:00
Gal Pressman	269e6b3af3	net/mlx5e: Report additional error statistics in get stats ndo Provide rtnl_link_stats64 with information regarding physical errors to be seen in ifconfig and ip tool. Signed-off-by: Gal Pressman <galp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 15:58:00 -04:00
Eric Dumazet	fc96256c90	net/mlx4_en: fix spurious timestamping callbacks When multiple skb are TX-completed in a row, we might incorrectly keep a timestamp of a prior skb and cause extra work. Fixes: `ec693d4701` ("net/mlx4_en: Add HW timestamping (TS) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-26 01:13:18 -04:00
Majd Dibbiny	5fc7197d3a	net/mlx5: Add pci shutdown callback This patch introduces kexec support for mlx5. When switching kernels, kexec() calls shutdown, which unloads the driver and cleans its resources. In addition, remove unregister netdev from shutdown flow. This will allow a clean shutdown, even if some netdev clients did not release their reference from this netdev. Releasing The HW resources only is enough as the kernel is shutting down Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Eli Cohen	78228cbdeb	net/mlx5_core: Remove static from local variable The static is not required and breaks re-entrancy if it will be required. Fixes: `2530236303` ("net/mlx5_core: Flow steering tree initialization") Signed-off-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Saeed Mahameed	cd255efff9	net/mlx5e: Use vport MTU rather than physical port MTU Set and report vport MTU rather than physical MTU, Driver will set both vport and physical port mtu and will rely on the query of vport mtu. SRIOV VFs have to report their MTU to their vport manager (PF), and this will allow them to work with any MTU they need without failing the request. Also for some cases where the PF is not a port owner, PF can work with MTU less than the physical port mtu if set physical port mtu didn't take effect. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Saeed Mahameed	d8edd2469a	net/mlx5e: Fix minimum MTU Minimum MTU that can be set in Connectx4 device is 68. This fixes the case where a user wants to set invalid MTU, the driver will fail to satisfy this request and the interface will stay down. It is better to report an error and continue working with old mtu. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:39 -04:00
Saeed Mahameed	046339eaab	net/mlx5e: Device's mtu field is u16 and not int For set/query MTU port firmware commands the MTU field is 16 bits, here I changed all the "int mtu" parameters of the functions wrapping those firmware commands to be u16. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
Majd Dibbiny	64dbbdfef2	net/mlx5_core: Add ConnectX-5 to list of supported devices Add the upcoming ConnectX-5 devices (PF and VF) to the list of supported devices by the mlx5 driver. Signed-off-by: Majd Dibbiny <majd@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
Rana Shahout	6e4c218946	net/mlx5e: Fix MLX5E_100BASE_T define Bit 25 of eth_proto_capability in PTYS register is 1000Base-TT and not 100Base-T. Fixes: `f62b8bb8f2` ('net/mlx5: Extend mlx5_core to support ConnectX-4 Ethernet functionality') Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
Maor Gottlieb	c3f9bf628b	net/mlx5_core: Fix soft lockup in steering error flow In the error flow of adding flow rule to auto-grouped flow table, we call to tree_remove_node. tree_remove_node locks the node's parent, however the node's parent is already locked by mlx5_add_flow_rule and this causes a deadlock. After this patch, if we failed to add the flow rule, we unlock the flow table before calling to tree_remove_node. fixes: `f0d22d1874` ('net/mlx5_core: Introduce flow steering autogrouped flow table') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Reported-by: Amir Vadai <amir@vadai.me> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-24 14:51:38 -04:00
David S. Miller	1602f49b58	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-23 18:51:33 -04:00
Hannes Frederic Sowa	0c5c3252c4	mlx4: protect mlx4_en_start_port in mlx4_en_restart with rtnl_lock mlx4_en_start_port requires rtnl_lock to be held. Cc: Eugenia Emantayev <eugenia@mellanox.com> Cc: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:35:43 -04:00
Tariq Toukan	5498440756	net/mlx5e: Add ethtool counter for RX buffer allocation failures Counts the number of RX buffer allocation failures and shows it in ethtool statistics. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:06 -04:00
Saeed Mahameed	e20a0db304	net/mlx5e: Delay skb->data access Move mlx5e_handle_csum and eth_type_trans to the end of mlx5e_build_rx_skb to gain some more time before accessing skb->data, to reduce cache misses. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:06 -04:00
Tariq Toukan	1bfec31627	net/mlx5e: Remove redundant barrier The bit-op operation one line before is an explicit barrier by itself. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	c5adb96f6c	net/mlx5e: Use napi_alloc_skb for RX SKB allocations Instead of netdev_alloc_skb, we use the napi_alloc_skb function which is designated to allocate skbuff's for RX in a channel-specific NAPI instance, and implies the IP packet alignment. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	bc77b240b3	net/mlx5e: Add fragmented memory support for RX multi packet WQE If the allocation of a linear (physically continuous) MPWQE fails, we allocate a fragmented MPWQE. This is implemented via device's UMR (User Memory Registration) which allows to register multiple memory fragments into ConnectX hardware as a continuous buffer. UMR registration is an asynchronous operation and is done via ICO SQs. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	d3c9bc2743	net/mlx5e: Added ICO SQs Added ICO (Internal Control Operations) SQ per channel to be used for driver internal operations such as memory registration for fragmented memory and nop requests upon ifconfig up. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	461017cb00	net/mlx5e: Support RX multi-packet WQE (Striding RQ) Introduce the feature of multi-packet WQE (RX Work Queue Element) referred to as (MPWQE or Striding RQ), in which WQEs are larger and serve multiple packets each. Every WQE consists of many strides of the same size, every received packet is aligned to a beginning of a stride and is written to consecutive strides within a WQE. In the regular approach, each regular WQE is big enough to be capable of serving one received packet of any size up to MTU or 64K in case of device LRO is enabled, making it very wasteful when dealing with small packets or device LRO is enabled. For its flexibility, MPWQE allows a better memory utilization (implying improvements in CPU utilization and packet rate) as packets consume strides according to their size, preserving the rest of the WQE to be available for other packets. MPWQE default configuration: Num of WQEs = 16 Strides Per WQE = 2048 Stride Size = 64 byte The default WQEs memory footprint went from 1024mtu (~1.5MB) to 16 2048 * 64 = 2MB per ring. However, HW LRO can now be supported at no additional cost in memory footprint, and hence we turn it on by default and get an even better performance. Performance tested on ConnectX4-Lx 50G. To isolate the feature under test, the numbers below were measured with HW LRO turned off. We verified that the performance just improves when LRO is turned back on. * Netperf single TCP stream: - BW raised by 10-15% for representative packet sizes: default, 64B, 1024B, 1478B, 65536B. * Netperf multi TCP stream: - No degradation, line rate reached. * Pktgen: packet rate raised by 2-10% for traffic of different message sizes: 64B, 128B, 256B, 1024B, and 1500B. * Pktgen: packet loss in bursts of small messages (64byte), single stream: - \| num packets \| packets loss before \| packets loss after \| 2K \| ~ 1K \| 0 \| 8K \| ~ 6K \| 0 \| 16K \| ~13K \| 0 \| 32K \| ~28K \| 0 \| 64K \| ~57K \| ~24K As expected as the driver can receive as many small packets (<=64B) as the number of total strides in the ring (default = 2048 * 16) vs. 1024 (default ring size regardless of packets size) before this feature. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:05 -04:00
Tariq Toukan	2f48af128d	net/mlx5e: Use function pointers for RX data path handling In preparation for Striding RQ feature, which will need its own RX handlers. This patch does not change any functionality. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Achiad Shochat <achiad@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Tariq Toukan	d8c9660dac	net/mlx5e: Use only close NUMA node for default RSS Distribute default RSS table uniformly over the rings of the close NUMA node, instead of all available channels. This way we enforce the preference of close rings over far ones. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Rana Shahout	593cf33829	net/mlx5e: Allocate set of queue counters per netdev Connect all netdev RQs to this set of queue counters. Also, add an "rx_out_of_buffer" counter to ethtool, which indicates RX packet drops due to lack of receive buffers. Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Tariq Toukan	237cd21809	net/mlx5: Introduce device queue counters A queue counter can collect several statistics for one or more hardware queues (QPs, RQs, etc ..) that the counter is attached to. For Ethernet it will provide an "out of buffer" counter which collects the number of all packets that are dropped due to lack of software buffers. Here we add device commands to alloc/query/dealloc queue counters. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Rana Shahout <ranas@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:09:04 -04:00
Eran Ben Elisha	d21ed3a311	net/mlx4_en: Split SW RX dropped counter per RX ring Count SW packet drops per RX ring instead of a global counter. This will allow monitoring the number of rx drops per ring. In addition, SW rx_dropped counter was overwritten by HW rx_dropped counter, sum both of them instead to show the accurate value. Fixes: `a3333b35da` ('net/mlx4_en: Moderate ethtool callback to [...] ') Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reported-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:40 -04:00
Eugenia Emantayev	2a500090a4	net/mlx4_core: Don't allow to VF change global pause settings Currently changing global pause settings is done via SET_PORT command with input modifier GENERAL. This command is allowed for each VF since MTU setting is done via the same command. Change the above to the following scheme: before passing the request to the FW, the PF will check whether it was issued by a slave. If yes, don't change global pause and warn, otherwise change to the requested value and store for further reference. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:40 -04:00
Daniel Jurgens	4bfd2e6e53	net/mlx4_core: Avoid repeated calls to pci enable/disable Maintain the PCI status and provide wrappers for enabling and disabling the PCI device. Performing the actions more than once without doing its opposite results in warning logs. This occurred when EEH hotplugged the device causing a warning for disabling an already disabled device. Fixes: `2ba5fbd62b` ('net/mlx4_core: Handle AER flow properly') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:40 -04:00
Daniel Jurgens	c12833acff	net/mlx4_core: Implement pci_resume callback Move resume related activities to a new pci_resume function instead of performing them in mlx4_pci_slot_reset. This change is needed to avoid a hotplug during EEH recovery due to commit `f2da4ccf8b` ("powerpc/eeh: More relaxed hotplug criterion"). Fixes: `2ba5fbd62b` ('net/mlx4_core: Handle AER flow properly') Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-21 15:02:39 -04:00
Konstantin Khlebnikov	851b10d608	net/mlx4_en: do batched put_page using atomic_sub This patch fixes couple error paths after allocation failures. Atomic set of page reference counter is safe only if it is zero, otherwise set can race with any speculative get_page_unless_zero. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:04:24 -04:00
Konstantin Khlebnikov	04aeb56a17	net/mlx4_en: allocate non 0-order pages for RX ring with __GFP_NOMEMALLOC High order pages are optional here since commit `51151a16a6` ("mlx4: allow order-0 memory allocations in RX path"), so here is no reason for depleting reserves. Generic __netdev_alloc_frag() implements the same logic. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-19 20:04:24 -04:00
Masanari Iida	c19ca6cb4c	treewide: Fix typos in printk This patch fix spelling typos found in printk within various part of the kernel sources. Signed-off-by: Masanari Iida <standby24x7@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2016-04-18 11:23:24 +02:00
Jiri Pirko	b94cdabbf1	mlxsw: spectrum_buffers: Use MLXSW_SP_PB_UNUSED define for unused pb Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 13:02:43 -04:00
Jiri Pirko	ce78f02042	mlxsw: spectrum_buffers: Use designated initializers for mlxsw_sp_pbs Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-15 13:02:42 -04:00
Jiri Pirko	2d0ed39fbd	mlxsw: spectrum_buffers: Implement occupancy monitoring Implement occupancy API introduced in devlink and mlxsw core. This is done by accessing SBPM register for Port-Pool and SBSR for Port-TC current and max occupancy values. Max clear is implemented using the same registers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	caf7297e7a	mlxsw: core: Introduce support for asynchronous EMAD register access So far it was possible to have one EMAD register access at a time, locked by mutex. This patch extends this interface to allow multiple EMAD register accesses to be in fly at once. That allows faster processing on firmware side avoiding unused time in between EMADs. Measured speedup is ~30% for shared occupancy snapshot operation. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	dd9bdb04d2	mlxsw: core: Add mlxsw specific workqueue and use it for FDB notif. processing Follow-up patch is going to need to use delayed work as well and frequently. The FDB notification processing is already using that and also quite frequently. It makes sense to create separate workqueue just for mlxsw driver in this case and do not pollute system_wq. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	42a7f1d774	mlxsw: reg: Extend SBPM register for occupancy control Since it is not possible to get and clear Port-Pool occupancy data using SBSR register, there's a need to implement that using SBPM. Extend pack helper and add unpack helper to get occupancy values. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:06 -04:00
Jiri Pirko	26176def3c	mlxsw: reg: Add Shared Buffer Status register definition This register allows to query HW for current and maximal buffer usage. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	1ceecc88d2	mlxsw: core: Add devlink shared buffer occupancy callbacks Add middle layer in mlxsw core code to forward shared buffer occupancy calls into specific ASIC drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	0f433fa0ec	mlxsw: spectrum_buffers: Implement shared buffer configuration Implement previously introduced mlxsw core shared buffer API. For Spectrum, that is done utilizing registers SBPR, SBCM and SBPM. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	325f2f197d	mlxsw: core: Add mlxsw_core_port_driver_priv helper Needed in following patch. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	c30a53c7de	mlxsw: spectrum_buffers: Get max_buff defaults into limits exposed to user Although the device supports max_buff magic values 0 and 0xff, these are not exposed to the user via devlink. Therefore, adjust the default values to be within configurable range. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:05 -04:00
Jiri Pirko	bc872506f5	mlxsw: spectrum_buffers: Change initialization of PG 9 As explained in commit `ff6551ec0c` ("mlxsw: spectrum: Correctly configure headroom size") control packets are directed to priority group buffer 9 (PG9) in the ports' headroom buffers. Since we don't want to drop control packets in case they can't be admitted to the switch's shared buffer we bind PG9 to a different ingress pool from the one used by all other PGs. Unlike other PGs, we currently don't expose the binding between PG9 to a pool and leave it fixed. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	5408f7cba3	mlxsw: spectrum_buffers: Remove eg pool 3 default init and CPU port TC binding to it Since there is no congestion control for CPU port traffic, we can change the CPU port TC binding to pool 0 with min_buff and max_buff zeroed. Remove initialization for pool egress pool 3 since it is no longer used by dafault. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	078f9c7132	mlxsw: spectrum_buffers: Cache shared buffer configuration In order to achieve faster dumping of current setting and also in order to provide possibility to get pool mode without a need to query hardware, do cache the configuration in driver. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	aa99bc70ba	mlxsw: spectrum_buffers: Rename "pool" to "pr" in initialization Be consintent with rest of the registers (pm, cm) and use "pr" here. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	b11c3b4018	mlxsw: spectrum_buffers: Push out indexes and direction out of SB structs Structs are in arrays so use array index as pool/tc/prio index. With that, there is need to maintain separate arrays for ingress and egress. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:04 -04:00
Jiri Pirko	94266e3278	mlxsw: spectrum_buffers: Push out shared buffer register writes Pushed them into helper functions. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:03 -04:00
Jiri Pirko	a6179bf0d1	mlxsw: core: Add devlink shared buffer callbacks Add middle layer in mlxsw core code to forward shared buffer calls into specific ASIC drivers. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-14 16:22:03 -04:00
Jiri Pirko	9efc8f655c	mlxsw: reg: Fix SBPM register name Fix copy&paste error and state the name of SBPM register correctly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:43 -04:00
Jiri Pirko	497e8592c6	mlxsw: reg: Share direction enum between SBPR, SBCM, SBPM Same field, same values, so share the same enum. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:43 -04:00
Jiri Pirko	b2f10571b9	mlxsw: Do not pass around driver_priv directly Instead of that, pass mlxsw_core and use a helper to get driver priv from driver code. Looks much cleaner that way. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Jiri Pirko	307c2431ab	mlxsw: Pass mlxsw_core as a param of mlxsw_core_skb_transmit* Instead of passing around driver priv, pass struct mlxsw_core * directly. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Jiri Pirko	932762b69a	mlxsw: Move devlink port registration into common core code Remove devlink port reg/unreg from spectrum and switchx2 code and rather do the common work in core. That also ensures code separation where devlink is only used in core.c. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-08 15:38:42 -04:00
Ido Schimmel	d81a6bdb87	mlxsw: spectrum: Add IEEE 802.1Qbb PFC support Implement the appropriate DCB ops and allow a user to configure certain traffic classes as lossless. The operation configures PFC for both the egress (respecting PFC frames) and ingress (sending PFC frames) parts of the port. At egress, when a PFC frame is received for a PFC enabled priority, then all the priorities mapped to the same TC are stopped. At ingress, the priority group (PG) buffers to which the enabled PFC priorities are mapped are configured to be lossless. PFC frames will be transmitted when the Xoff threshold is crossed. The user-supplied delay parameter is used to determine the PG's size according to the following formula: PG_SIZE = PG_SIZE_LOSSY + delay * CELL_FACTOR + MTU In the worst case scenario the delay will be made up of packets that are all of size CELL_SIZE + 1, which means each packet will require almost twice its true size when buffered in the switch. We therefore multiply this value by the "cell factor", which is close to 2. Another MTU is added in case the transmitting host already started transmitting a maximum length frame when the PFC packet was received. As with PAUSE enabled ports, when the port's MTU is changed both the PGs' size and threshold are adjusted accordingly. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:20 -04:00
Ido Schimmel	34dba0a59d	mlxsw: reg: Introduce per priority counters We are going to add support for PFC as part of DCB ops, which requires us to report the number of PFC frames sent and received per priority. Add per priority counters in order to report number of PFC frames sent and received per priority. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:20 -04:00
Ido Schimmel	9f7ec052b7	mlxsw: spectrum: Add support for PAUSE frames When a packet ingress the switch it's placed in its assigned priority group (PG) buffer in the port's headroom buffer while it goes through the switch's pipeline. After going through the pipeline - which determines its egress port(s) and traffic class - it's moved to the switch's shared buffer awaiting transmission. However, some packets are not eligible to enter the shared buffer due to exceeded quotas or insufficient space. Marking their associated PGs as lossless will cause the packets to accumulate in the PG buffer. Another reason for packets accumulation are complicated pipelines (e.g. involving a lot of ACLs). To prevent packets from being dropped a user can enable PAUSE frames on the port. This will mark all the active PGs as lossless and set their size according to the maximum delay, as it's not configured by user. +----------------+ + \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| Delay \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| Xon/Xoff threshold +----------------+ + \| \| \| \| \| \| 2 * MTU \| \| \| +----------------+ + The delay (612 [Cells]) was calculated according to worst-case scenario involving maximum MTU and 100m cables. After marking the PGs as lossless the device is configured to respect incoming PAUSE frames (Rx PAUSE) and generate PAUSE frames (Tx PAUSE) according to user's settings. Whenever the port's headroom configuration changes we take into account the PAUSE configuration, so that we correctly set the PG's type (lossy / lossless), size and threshold. This can happen when: a) The port's MTU changes, as it directly affects the PG's size. b) A PG is created following user configuration, by binding a priority to it. Note that the relevant SUPPORTED flags were already mistakenly set by the driver before this commit. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	155f9de2e0	mlxsw: reg: Add lossless settings for PBMC register When configuring PAUSE frames and PFC we'll need to configure the Xon/Xoff threshold for the priority group (PG) buffers. Add the Xon/Xoff threshold fields to the PBMC register so that we can configure these when needed. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	6f253d8381	mlxsw: reg: Add Port Flow Control Configuration register Add the Port Flow Control Configuration (PFCC) register, which configures both flow control and Priority-based Flow Control (PFC). Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	cc7cf51758	mlxsw: spectrum: Allow setting maximum rate for a TC Allow a user to set maximum rate for a particular TC using DCB ops. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:19 -04:00
Ido Schimmel	8e8dfe9fdf	mlxsw: spectrum: Add IEEE 802.1Qaz ETS support Implement the appropriate DCB ops and allow a user to configure: * Priority to traffic class (TC) mapping with a total of 8 supported TCs * Transmission selection algorithm (TSA) for each TC and the corresponding weights in case of weighted round robin (WRR) As previously explained, we treat the priority group (PG) buffer in the port's headroom as the ingress counterpart of the egress TC. Therefore, when a certain priority to TC mapping is configured, we also configure the port's headroom buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	f00817df2b	mlxsw: spectrum: Introduce support for Data Center Bridging (DCB) Introduce basic infrastructure for DCB and add the missing ops in following patches. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	90183b980d	mlxsw: spectrum: Initialize egress scheduling Before introducing support for DCB ops we should first make sure we initialize the relevant parts in the device correctly. Specifically, the egress scheduling. The device supports a superset of the 802.1Qaz standard with 4 hierarchy levels that can be linked to each other in multiple ways and with different transmission selection algorithms (TSA) employed between them. However, since we only intend to support the 802.1Qaz standard we flatten the hierarchies and let the user configure via DCB ops the TSA and max rate shaper at the subgroup hierarchy (see figure below) and the mapping between switch priority to traffic class. By default, all switch priorities are mapped to traffic class 0, strict priority is employed and max shaper is disabled. Default configuration: switch priority 0 ... switch priority 7 + + \| \| +----------------------------------+ \| +--v--+ +-----+ Traffic Class \| \| \| \| Hierarchy \| TC0 \| ... \| TC7 \| \| \| \| \| +--+--+ +--+--+ \| \| +--v--+ +--v--+ Subgroup \| SG0 \| \| SG7 \| Hierarchy \| \| \| \| +-----+ +-----+ \| TSA \| \| TSA \| +-----+ ... +-----+ \| MAX \| \| MAX \| +--+--+ +--+--+ \| \| +---------------+----------------+ \| +--v--+ Group \| \| Hierarchy \| GR0 \| \| \| +--+--+ \| +--v--+ Port \| \| Hierarchy \| PR0 \| \| \| +-----+ Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	2c63a555e8	mlxsw: reg: Add QoS Switch Traffic Class Table register As part of DCB ops we'll have to configure the priority to traffic class mapping of a port. Add the QoS Switch Traffic Class Table (QTCT) register, which configures the mapping between the packet switch priority and traffic class on the transmit port. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:18 -04:00
Ido Schimmel	b9b7cee405	mlxsw: reg: Add QoS ETS Element Configuration register We are going to introduce support for DCB, so we need to be able to configure the traffic selection algorithm (TSA) used by each traffic class (TC), as well as the bandwidth percentage allocated to each TC in case of ETS. Add the QoS ETS Element Configuration register, which controls the above parameters. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:17 -04:00
Ido Schimmel	d6b7c13b01	mlxsw: spectrum: Set port's shared buffer size to 0 In addition to the priority group (PG) buffers in the headroom, the device enables the allocation of headroom shared buffer, which can be shared between different PGs. However, we are not going to use the headroom shared buffer and instead allow the user to use its size for PGs or the switch's shared buffer. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2016-04-06 17:24:17 -04:00

1 2 3 4 5 ...

1560 Commits