Instead of fixing ring -> vector relations up in ring swap functions
put the reassignment into a helper function which will reinit all
links.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Upcoming XDP support will break the assumption that one can iterate
over IRQ vectors to get to all the rings easily. Use nn->.x_ring
arrays directly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ring allocation helpers encapsulate all ring allocation and
initialization steps nicely. Reuse them on .ndo_open() path.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"Shadow" in ring helpers used to mean that the helper will allocate
rings without touching existing configuration, this was used for
reconfiguration while the device was running. We will soon use
the same helpers for .ndo_open() path, so replace "shadow" with
"ring_set".
No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All functions which need to reallocate ring resources at runtime
look very similar. Centralize that logic into a separate function.
Encapsulate configuration parameters in a structure.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Report number of rings via ethtool .get_channels API.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the driver framework to separate out platform/ACPI specific code
from general code during device initialization. This will allow for the
introduction of PCI device support.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx and Rx DMA channel status determiniation is different depending on the
version of the hardware. Update the channel status processing code to
account for the change. Also, reduce the timeout value used when stopping
the channels.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reading all management counter registers as 64-bit
values. The indication of whether to read the high 32-bits to form
a 64-bit value is indicated in the version data.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare the code to be able to support accessing of the PCS registers
in a new way, while maintaining the current access method. Provide a
version specific field that indicates the method to use.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to be able to use clause 37 auto-negotiation.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare for the future introduction of clause 37 auto-negotiation by
updating the current auto-negotiation related functions to identify
them as clause 73 functions. Move interrupt enablement to the
enable/disable auto-negotiation functions. Update what will be common
routines to check for the current type of AN and process accordingly.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare the code to be able to work with more than one type of phy by
adding additional callable functions into the phy interface and removing
phy specific settings/functions from non-phy related files.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate the FIFO across the hardware Rx queues based on the priority
of the queues. Giving more FIFO resources to queues with a higher
priority. If PFC is active but not enabled for a queue, then less
resources can allocated to the queue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the Rx and Tx fifos are evenly allocated between the hardware
queues of the device. As more queues are instantiated, the fifo memory
needs to be able to be allocated based on queue priority. This allows for
higher priority queues to have more fifo memory than lower priority
queues. Prepare for this by modifying the current fifo calculation to
assign the fifo queue allocation in an array that is then used to program
the hardware.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the length value used for the PCS register dump so that the full
value can be displayed.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the ehea driver is missing a call to netif_carrier_off()
before the interface bring-up; this is necessary in order to
initialize the __LINK_STATE_NOCARRIER bit in the net_device state
field. Otherwise, we observe state UNKNOWN on "ip address" command
output.
This patch adds a call to netif_carrier_off() on ehea's net device
open callback.
Reported-by: Xiong Zhou <zhou@redhat.com>
Reference-ID: IBM bz #137702, Red Hat bz #1089134
Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com>
Signed-off-by: Douglas Miller <dougmill@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The system-status register is actually 16-bit wide and not 8 bit-wide.
Fixes: 233fa44bd6 ("mlxsw: pci: Implement reset done check")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver allocates replacement buffers before-hand to make
sure whenever an aggregation begins there would be a replacement
for the Rx buffers, as we can't release the buffer until
aggregation is terminated and driver logic assumes the Rx rings
are always full.
For every other Rx page that's being allocated [I.e., regular]
the page is being completely mapped while for the replacement
buffers only the first portion of the page is being mapped.
This means that:
a. Once replacement buffer replenishes the regular Rx ring,
assuming there's more than a single packet on page we'd post unmapped
memory toward HW [assuming mapping is actually done in granularity
smaller than page].
b. Unmaps are being done for the entire page, which is incorrect.
Fixes: 55482edc25 ("qede: Add slowpath/fastpath support and enable hardware GRO")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Synopsys Designware MAC Glue layer for the Oxford Semiconductor OX820.
Acked-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*,
which is bit mask. This is wrong. Hw actually provides us enum.
Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set l3 and l4 hash type.
Fixes: bf751ba802 ("driver/net: enic: record q_number and rss_hash for skb")
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
XDP statistics are reported in ethtool, in total and per ring,
as follows:
- xdp_drop: the number of packets dropped by xdp.
- xdp_tx: the number of packets forwarded by xdp.
- xdp_tx_full: the number of times an xdp forward failed
due to a full tx xdp ring.
In addition, all packets that are dropped/forwarded by XDP
are no longer accounted in rx_packets/rx_bytes of the ring,
so that they count traffic that is passed to the stack.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separately manage the two types of TX rings: regular ones, and XDP.
Upon an XDP set, do not borrow regular TX rings and convert them
into XDP ones, but allocate new ones, unless we hit the max number
of rings.
Which means that in systems with smaller #cores we will not consume
the current TX rings for XDP, while we are still in the num TX limit.
XDP TX rings counters are not shown in ethtool statistics.
Instead, XDP counters will be added to the respective RX rings
in a downstream patch.
This has no performance implications.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support XDP CQ type, and refactor the CQ type enum.
Rename the is_tx field to match the change.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The coalesce settings behave badly when changing just one value:
... # ethtool -c eth0
rx-usecs: 249
... # ethtool -C eth0 tx-usecs 250
... # ethtool -c eth0
rx-usecs: 248
This occurs due to rounding errors when calculating the microseconds
value - the divisons round down. This causes (eg) the rx-usecs to
decrease by one every time the tx-usecs value is set as per the above.
Fix this by making the divison round-to-nearest.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
'create_root_ns()' does not return an error pointer, so the test can be
simplified to be more consistent.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The documentation says that SGMII_LN_UCDR_SO_GAIN_MODE0 should be
set to 0, not 6, on the Qualcomm Technologies QDF2432.
Signed-off-by: Timur Tabi <timur@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changing the interrupt trigger region id to 2 and the
corresponding threshold set0/set1 values to 8/16.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since ethernet v1 hardware has a bug related to coalescing, disabling
this feature.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: Toan Le <toanle@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We used to always allocate the same number of TX and RX rings
so the support for having r_vectors without one of the rings
was dropped. That makes us, however, unnecessarily limited
to 8 TX rings (8 is the Linux RSS default) most of the time.
Also we are about to add channel count configuration via
ethtool, so bring that support back. TX rings can now default
to num_online_cpus() and RX rings to netif_get_num_default_rss_queues().
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
num_irqs is not used anywhere, replace it with max_r_vecs which holds
number of allocated RX/TX vectors and is going to be useful soon.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_net_irqs_wanted() doesn't really encapsulate much logic,
remove it and inline the calculations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use unsigned int consistently for vector/ring counts.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are currently using define for max TX rings to allocate IRQ
vectors. It's OK since the max number of rings for TX and RX
are currently the same, but lets make the code nicer by taking
max of the two.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already force ring sizes to be power of 2 so replace
modulo operations with AND (size - 1) in index calculations.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a separate buffer allocation function to be called
from NAPI. We can make assumptions about the context and
buffer size.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Speed up RX processing by moving to the alloc_frag()/build_skb()
paradigm. Since we're no longer mapping the entire buffer for
DMA add helpers which take care of calculating offsets and
lengths.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
nfp_net_rx() is quite long already and about to get longer.
Move buffer drop/recycle to a helper.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a helper function to calculate the buffer size at run time.
Buffer lengths will now depend on the FW prepend configuration
instead of assuming the most space consuming configuration and
defaulting to 2k buffers at initialization time.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't declare functions as static inline in .c files and
remove dead code it was hiding.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ether_setup() will be invoked by alloc_etherdev_mqs(), no need
to call it again.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drop all code related to nfp3200. It was never widely deployed
as a NIC.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are few variables in nfp_net_poll() which are used only
once or unused but set. Remove them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When relaxing the limitation on the number of unicast MAC filters
an interface can configure, qed started passing the MAC quota to
qede. However, the value is initialized only for PFs, causing VFs
to always try and configure themselves as promiscuous
[as they believe they lack the resources to configure the rx-mode].
Fixes: 7b7e70f979 ("qed*: Allow unicast filtering")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver is now setting the ndev's priv_flags instead of adding to it,
causing pktgen failure to utilize various features due to the loss
of the IFF_TX_SKB_SHARING indication.
Fixes: 7b7e70f979 ("qed*: Allow unicast filtering")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current bgmac code initializes some DMA settings in the receive control
register for some hardware and then immediately clears those settings.
Not clearing those settings results in ~420Mbps *improvement* in
throughput; this system can now receive frames at line-rate on Broadcom
5871x hardware compared to ~520Mbps today. I also tested a few other
values but found there to be no discernible difference in CPU
utilization even if burst size and prefetching values are different.
On the hardware tested there was no need to keep the code that cleared
all but bits 16-17, but since there is a wide variety of hardware that
used this driver (I did not look at all hardware docs for hardware using
this IP block), I find it wise to move this call up and clear bits just
after reading the default value from the hardware rather than completely
removing it.
This is a good candidate for -stable >=3.14 since that is when the code
that was supposed to improve performance (but did not) was introduced.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Fixes: 56ceecde1f ("bgmac: initialize the DMA controller of core...")
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed some of unnecessary if statements and unreachable code found by
static code analysis tool.
The return value of i40e_vsi_control_rings(..., false) is always 0. So,
test for non-zero will never be true. The function has been split into
"int i40e_vsi_start_rings()" and "void i40e_vsi_stop_rings()" for better
understanding.
Similarly, the function i40e_vsi_kill_vlan() never fails. So, checking
for return value is also unnecessary. Function definition changed to void.
The i40e_loopback_test() function is not implemented. The function and
all references to loopback testing were removed.
Change-ID: Id45cf66f6689ce2bc4e887de13f073e30e8431bd
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds I40E_NVMUPD_STATE_ERROR state for NVM update.
Without this patch driver has no possibility to return NVM image write
failure.This state is being set when ARQ rises error.
arq_last_status is also updated every time when ARQ event comes,
not only on error cases.
Change-ID: I67ce43ba22a240773c2821b436e96054db0b7c81
Signed-off-by: Maciej Sosin <maciej.sosin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
For some cases when reading from device are incorrect or image is
incorrect, this part of code causes crash due to division by zero.
Change-ID: I8961029a7a87b0a479995823ef8fcbf6471405e1
Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a VF is reset, it gets a new VSI, so all of its MAC filters go
away. Correctly set the number of filters to 0 when freeing VF
resources. This corrects a problem with failure to add filters when the
VF driver is reloaded.
Change-ID: I2acbecf734287b67473bb225293e14b5096acbef
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch reorders the logic at the end of i40e_tx_map to address the
fact that the logic was rather convoluted and much larger than it needed
to be.
In order to try and coalesce the code paths I have updated some of the
comments and repurposed some of the variables in order to reduce
unnecessary overhead.
This patch does the following:
1. Quit tracking skb->xmit_more with a flag, just max out packet_stride
2. Drop tail_bump and do_rs and instead just use desc_count and td_cmd
3. Pull comments from ixgbe that make need for wmb() more explicit.
Change-ID: Ic7da85ec75043c634e87fef958109789bcc6317c
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a common method for finding a VSI by type. The main
motivation for doing this is that the Flow Director path actually had two
ways of handling this, one stopped on first match and one did not. This
patch makes it so that all callers of this function will get the same
approach for finding a VSI.
Change-ID: Ibf25de8acd8466582520694424aa87da66965fbd
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove the second call to msleep outside the loop, and move the msleep
within the loop as the first step. This guarantees that a single loop
will wait the minimum time first, and then after the reset finishes we
no longer need an extra msleep.
Change-ID: Ib2086f0a142402b614f67846bc091754203a0b9a
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The current Rx timestamp hang logic is not very robust because it does
not notice a register is hung until all four timestamps have been
latched and we wait a full 5 seconds. Replace this logic with a newer Rx
hang detection based on storing the jiffies when we first notice
a receive timestamp event. We store each register's time separately,
along with a flag indicating if it is currently latched. Upon first
transitioning to latch, we will update the latch_events[i] jiffies
value. This indicates the time we first noticed this event. The watchdog
routine will simply check that the either the flag has been cleared, or
we have passed at least one second. In this case, it is able to clear
the Rx timestamp register under the assumption that it was for a dropped
frame. The benefit if this strategy is that we should be able to
detect and clear out stalled RXTIME_H registers before we exhaust the
supply of 4, and avoid complete stall of Rx timestamp events.
Change-ID: Id55458c0cd7a5dd0c951ff2b8ac0b2509364131f
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We need a locking mechanism to protect the hardware SYSTIME register
which is split over 2 values, and has internal hardware latching. We
can't allow multiple accesses at the same time. However....
The spinlock_t is overkill here, especially use of spin_lock_irqsave,
since every PTP access will halt hardirqs. Notice that the only places
which need the SYSTIME value are user context and are capable of sleeping.
Thus, it is safe to use a mutex here instead of the spinlock.
Change-ID: I971761a89b58c6aad953590162e85a327fbba232
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When hardware has taken a timestamp for a received packet, it indicates
which RXTIME register the timestamp was placed in by some bits in the
receive descriptor. It uses 3 bits, one to indicate if the descriptor
index is valid (ie: there was a timestamp) and 2 bits to indicate which
of the 4 registers to read. However, the driver currently does not check
the TSYNVALID bit and only checks the index. It assumes a zero index
means no timestamp, and a non zero index means a timestamp occurred.
While this appears to be true, it prevents ever reading a timestamp in
RXTIME[0], and causes the first timestamp the device captures to be
ignored.
Fix this by using the TSYNVALID bit correctly as the true indicator of
whether the packet has an associated timestamp.
Also rename the variable rsyn to tsyn as this is more descriptive and
matches the register names.
Change-ID: I4437e8f3a3df2c2ddb458b0fb61420f3dafc4c12
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We duplicate some code around adding and deleting filters using the
adminq interface. This is prone to errors in case there are bugs. Use
functions which extract the logic to their own portion so that we don't
duplicate it twice in code.
Change-ID: I60d68aeb887976787dec00b23ab386a106e61465
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We determine that a VSI is in vlan_mode whenever it has any filters
with a VLAN other than -1 (I40E_VLAN_ALL). The previous method of doing
so was to perform a loop whenever we needed the check. However, we can
notice that only place where filters are added (i40e_add_filter) can
change the condition from false to true, and the only place we can
return to false is in i40e_vsi_sync_filters_subtask. Thus, we can remove
the loop and use a boolean directly.
Doing this avoids looping over filters repeatedly especially while we're
already inside a loop over all the filters. This should reduce the
latency of filter operations throughout the driver.
Change-ID: Iafde08df588da2a2ea666997d05e11fad8edc338
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently there exists a bug where adding at least one VLAN and then
removing all VLANs leaves the mac filters for the VSI with an incorrect
value for 'vid' which indicates the mac filter's VLAN status.
The current implementation for handling the removal of VLANs is wrong
for a couple reasons. The first is that when i40e_vsi_kill_vlan
iterates through the MAC filters, it fails to account for the MAC filter
status; i.e. it's not accommodating for filters that are about to be
deleted. The second problem is that MAC filters can be deleted in other
places (specifically i40e_set_rx_mode). Thus if it occurs that all the
VLAN MAC filters get deleted we need to switch out of VLAN mode, but the
code path through i40e_vsi_kill_vlan has already been executed and we're
now stuck in VLAN mode.
This patch fixes the issue by removing the check from i40e_vsi_kill_vlan
and puts the check instead in i40e_sync_vsi_filters where we're
guaranteed to see all filter deletions and can properly detect when we
need to switch out of VLAN mode.
Change-ID: Ib38fe6034b356eee9a0e20b8a9eeed5ff2debcd9
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, we fail to correctly restore filters on the temporary add
list when we fail to allocate memory either for deletion or addition.
Replace calls to "goto out;" with calls to a new location that correctly
handles memory allocation failures.
Note that it is safe for us to call i40e_undo_filter_entries on the
tmp_del_list even after we've deleted filters because at this point it
will be empty, so we don't need to separate the logic for add and
delete failure.
Change-Id: Iee107fd219c6e03e2fd9645c2debf8e8384a8521
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Replace the mac_filter_list with a static size hash table of 8bits. The
primary advantage of this is a decrease in latency of operations related
to searching for specific MAC filters, including .set_rx_mode. Using
a linked list resulted in several locations which were O(n^2). Using
a hash table should give us latency growth closer to O(n*log(n)).
Change-ID: I5330bd04053b880e670210933e35830b95948ebb
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
When inside a loop where we call i40e_del_filter we use an O(n^2)
pattern where i40e_del_filter calls i40e_find_filter for us. We can
avoid this O(n^2) logic by factoring a function, __i40e_del_filter() out
from the i40e_del_filter code. This allows us to re-use the delete logic
where appropriate without having to search for the filter twice.
This new function benefits several functions including i40e_vsi_add_vlan,
i40e_vsi_kill_vlan, i40e_del_mac_vlan_all, and i40e_vsi_release.
Change-ID: I75fabe0f53bf73f56b80d342e5fdcfcc28f4d3eb
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When adding new MAC address filters, the driver determines if it should
behave in VLAN mode (where all MAC addresses get assigned to every
existing VLAN) or in non-VLAN mode where MAC addresses get assigned the
VLAN_ANY identifier. Under some circumstances it is possible that a VLAN
has been marked for removal (such that all filters of that VLAN are set
to I40E_FILTER_REMOVE), and a subsequent call to i40e_put_mac_in_vlan
may occur prior to the driver subtask that syncs filters to the
hardware.
In this case, we may add filters to the new removed VLAN, even though it
should have been removed. This is most obvious when first adding a new
VLAN. We will delete all filters which are in I40E_VLAN_ANY (-1) and
then re-add them as in VLAN 0 (untagged). Then before we sync filters,
we will add new MAC address filter, which will be added to every VLAN
that exists. Unfortunately, this will include I40E_VLAN_ANY, so we will
end up incorrectly adding filters to the -1 VLAN. This can be fixed by
simply skipping all filters which are marked for removal.
A similar check is not necessary in i40e_del_mac_all_vlan, since we are
deleting, and any filter which we find already marked for removal would
simply be deleted again, which doesn't cause any issues.
Change-Id: I7962154013ce02fe950584690aeeb3ed853d0086
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a PVID has been assigned to a VSI, the function
i40e_put_mac_in_vlan arbitrarily modifies all filters
to have the same VLAN. This is obviously incorrect
because it could be modifying active filters without
putting them into the NEW state. The correct method
is to remove then re-add filters which is already done
in the code where we assign the PVID.
Fix this issue and a few other minor nits at the same
time. First, when we have a PVID don't even bother
looping and simply add the filter with the PVID immediately.
In the case of the loop, we now can remove several checks.
We also don't need to use i40e_find_filter first before
calling i40e_add_filter, since i40e_add_filter implicitly
does a lookup already.
Finally, update the return semantics of this function so
that on failure to add a filter it returns NULL, but on
success, it returns the last filter added. Otherwise,
we're just returning the last filter in the list. An
alternative fix might be to return 0 or an error code,
but this is pretty invasive to every call site.
Change-ID: I2325dfd843aec76d89fb0d7cb0e7c4f290a34840
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A future patch will be modifying these functions and making a call to
a static function which currently is defined after these functions. Move
them in a separate patch to ease review and ensure the moved code is
correct.
Change-ID: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The kernel provides __dev_uc_sync and __dev_mc_sync in order for drivers
which need individual notification of add and delete for each filter.
These functions allow us to vastly simplify our .set_rx_mode handler. We
need to implement two functions for sync and unsync which add and remove
filters respectively.
This change avoids a very complex and inefficient algorithm which
resulted in an abnormal latency for the .set_rx_mode NDO operation. The
resulting code after this change is more readable, more efficient, and
less code.
Due to the callback signature used by these functions we also must
update several other functions to take a const u8 * pointer.
Change-Id: I2ca7fd4e10c0c07ed2291db1ea41bf5987fc6474
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Originally the is_vf and is_netdev fields were added in order to
distinguish between VF and netdev filters in a single VSI. However, it
can be noted that we use separate VSI for SRIOV VFs and for netdev VSI.
Thus, since a single VSI should only ever have one type of filter, we
can simply remove the checks and remove the typing.
In a similar fashion, we can note that the only remaining way to get
multiple filters of a single type is through a debug command that was
added to debugfs. This command is useless in practice, and results in
causing bugs if we keep counter tracking but lose the is_vf and
is_netdev protections as desired above.
Since the only time we'd actually have a counter value besides 0 and
1 is through use of this debugfs hook, we can remove this unnecessary
command, and the entire counter logic it required.
We vastly simplify mac filters by removing
(a) the distinction between VF and netdev filters
(b) counting logic
(c) the ability to add and remove filters bypassing the stack via debugfs
Change-ID: Idf916dd2a1159b1188ddbab5bef6b85ea6bf27d9
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Trival fix, dev_err message is missing a \n, so add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, each interfaces assumes it receives an equal portion
of HW/FW resources, but this is wasteful - different partitions
[and specifically, parititions exposing different protocol support]
might require different resources.
Implement a new resource learning scheme where the information is
received directly from the management firmware [which has knowledge
of all of the functions and can serve as arbiter].
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver sets several restrictions about the number of supported VFs
according to available HW/FW resources.
This creates a problem as there are constellations which can't be
supported [as limitation don't accurately describe the resources],
as well as holes where enabling IOV would fail due to supposed
lack of resources.
This introduces a new interal feature - vf-queues, which would
be used to lift some of the restriction and accurately enumerate
the queues that can be used by a given PF's VFs.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Today, RDMA capabilities are learned from management firmware
which provides a per-device indication for all interfaces.
Newer management firmware is capable of providing a per-device
indication [would later be extended to either RoCE/iWARP].
Try using this newer learning mechanism, but fallback in case
management firmware is too old to retain current functionality.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
While the qed_lm_maps is closely tied with the QED_LM_* defines,
when iterating over the array use actual size instead of the qed
define to prevent future possible issues.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Management firmware is interested in various tidbits about
the driver - including the driver state & several configuration
related fields [MTU, primtary MAC, etc.].
This adds the necessray logic to update MFW with such configurations,
some of which are passed directly via qed while for others APIs
are provide so that qede would be able to later configure if needed.
This also introduces a new default configuration for MTU which would
replace the default inherited by being an ethernet device.
Signed-off-by: Sudarsana Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the device, a MID entry represents a group of local ports, which can
later be bound to a MDB entry.
The lookup of an existing MID entry is currently done using the provided
MC MAC address and VID, from the Linux bridge. However, this can result
in an incorrect reuse of the same MID index in different VLAN-unaware
bridges (same IP MC group and VID 0).
Fix this by performing the lookup based on FID instead of VID, which is
unique across different bridges.
Fixes: 3a49b4fde2 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an interface is configured to use Tx/Rx-only queues,
the length of the statistics would be shortened to accomodate only the
statistics required per-each queue, and the values would be provided
accordingly.
However, the strings provided would still contain both Tx and Rx strings
for each one of the queues [regardless of its configuration], which might
lead to out-of-bound access when filling the buffers as well as incorrect
statistics presented.
Fixes: 9a4d7e86ac ("qede: Add support for Tx/Rx-only queues.")
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following patch fixes an issue with the ldmvsw driver where
the network connection of a guest domain becomes non-functional after
the guest domain has panic'd and rebooted.
The root cause was determined to be from the following series of
events:
1. Guest domain panics - resulting in the guest no longer processing
network packets (from ldmvsw driver)
2. The ldmvsw driver (in the control domain) eventually exerts flow
control due to no more available tx drings and stops the tx queue
for the guest domain
3. The LDC of the network connection for the guest is reset when
the guest domain reboots after the panic.
4. The LDC reset event is received by the ldmvsw driver and the ldmvsw
responds by clearing the tx queue for the guest.
5. ldmvsw waits indefinitely for a DATA ACK from the guest - which is
the normal method to re-enable the tx queue. But the ACK never comes
because the tx queue was cleared due to the LDC reset.
To fix this issue, in addition to clearing the tx queue, re-enable the
tx queue on a LDC reset. This prevents the ldmvsw from getting caught in
this deadlocked state of waiting for a DATA ACK which will never come.
Signed-off-by: Aaron Young <Aaron.Young@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The second SET_NETDEV_DEV() in the hunk should be
removed.
Reported-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYFfxWAAoJEORje4g2clinr4MQAMUMoO4akLhppyUuLiT2ZZHc
KshFwUF5RnZq6c2qXSxTVhfajyG6q71EwQNGaAMeGsjoCXM+u99Adgp/lYkXxbzY
e4W3gCvjGWIik5d3JRVq07XutVpeJIc/Qvvc5zc1lUYNR5f71iBw538eG9ic4PXi
4/CpRvcsa8Z9sbtKPcjHwQQRd4ewx/KAD6QyOsVz9GgkBeNMYag3SO731DYSkjRC
MYK85arNC1JUE/MHQKIfYQvjiJVfEyt2FvC8v9tW+bhzP6dAzxRY0yd8ZFJtCiYH
GFGy8vdeCA/0dFRD5cYKPKBiFwUbRC8bt2lLC5ZoUic2nZ23LlO67uDkaPDRYckt
oyhErFRX6Q/goqKFCI4tLUoSBF1bhy9EnbWyOWmcW7qpXRD3VCclS0Ctr++yJnv2
bhhlID56f+dX+rnW/OAERrk8MdVHo5xBUzQ8ZAAF3WDP9LqW+qlYVrEvrFqFIeFM
OCGUbW2xsZaHMZRyx0K6068hy8O4EujjgC9PARi65rrZAAwxlDm4ElJEYvzXZVXA
YoMeXiZGrhoj7+h8OorV0TyB+7mUgxFNlq0tCoi193QS+zIuQqf3XYNmuQGCUflF
YdzZs7/9LpANN4e5yDTCq3CIr8yYv9sRrdnSX0iShvbFBAKClLxpjj2xkpayHFVR
8CRvlb5O1v1fIwSn2z/I
=D5Ix
-----END PGP SIGNATURE-----
Merge tag 'shared-for-4.10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma
Saeed Mahameed says:
====================
Mellanox mlx5 core driver updates 2016-10-25
This series contains some updates and fixes of mlx5 core and
IB drivers with the addition of two features that demand
new low level commands and infrastructure updates.
- SRIOV VF max rate limit support
- mlx5e tc support for FWD rules with counter.
Needed for both net and rdma subsystems.
Updates and Fixes:
From Saeed Mahameed (2):
- mlx5 IB: Skip handling unknown mlx5 events
- Add ConnectX-5 PCIe 4.0 VF device ID
From Artemy Kovalyov (2):
- Update struct mlx5_ifc_xrqc_bits
- Ensure SRQ physical address structure endianness
From Eugenia Emantayev (1):
- Fix length of async_event_mask
New Features:
From Mohamad Haj Yahia (3): mlx5 SRIOV VF max rate limit support
- Introduce TSAR manipulation firmware commands
- Introduce E-switch QoS management
- Add SRIOV VF max rate configuration support
From Mark Bloch (7): mlx5e Tc support for FWD rule with counter
- Don't unlock fte while still using it
- Use fte status to decide on firmware command
- Refactor find_flow_rule
- Group similar rules under the same fte
- Add multi dest support
- Add option to add fwd rule with counter
- mlx5e tc support for FWD rule with counter
Mark here fixed two trivial issues with the flow steering core, and did
some refactoring in the flow steering API to support adding mulit destination
rules to the same hardware flow table entry at once. In the last two patches
added the ability to populate a flow rule with a flow counter to the same flow entry.
V2: Dropped some patches that added new structures without adding any usage of them.
Added SRIOV VF max rate configuration support patch that introduces
the usage of the TSAR infrastructure.
Added flow steering fixes and refactoring in addition to mlx5 tc
support for forward rule with counter.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
SwitchIB and SwitchIB-2 are Infiniband switches with up to 36 ports. This
driver initialize the hardware and Firmware which implements the IB
management and connection with the SM.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SwitchX-2 is IB capable device. This patch add a support to change the
port type between Ethernet and Infiniband.
When the port is set to IB, the FW implements the Subnet Management Agent
(SMA) manage the port. All port attributes can be control remotely by
the SM.
Usage:
$ devlink port show
pci/0000:03:00.0/1: type eth netdev eth0
pci/0000:03:00.0/3: type eth netdev eth1
pci/0000:03:00.0/5: type eth netdev eth2
pci/0000:03:00.0/6: type eth netdev eth3
pci/0000:03:00.0/8: type eth netdev eth4
$ devlink port set pci/0000:03:00.0/1 type ib
$ devlink port show
pci/0000:03:00.0/1: type ib
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we are about to add Infiniband port remove and create we will add
"eth" prefix to port create and remove APIs.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add "port_type_set" API to mlxsw core. The core layer send the change type
callback to the port along with it's private information.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we are about to introduce IB port APIs, we will add prefixes to
existing APIs.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to put a port in Infiniband fabric it should be assigned
to separate swid (Switch partition) that initialized as IB swid.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, devlink register/unregister is done directly from
spectrum/switchx2 port create/remove functions. With a need to
introduce a port type change, the devlink port instances have to be
persistent across type changes, therefore across port create/remove
function calls. So do a bit of reshuffling to achieve that.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to change a port type to Infiniband port we should change his
mapping from local-port to Infiniband. Adding the PLIB (Port Local to
InfiniBand) allows this mapping.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to support Infiniband fabric, we need to introduce IB speeds and
capabilities to PTYS emads.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to add Infiniband support to PTYS. In order to maintain proper
conventions, we will change pack and unpack prefix to eth.
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In SwitchX-2 we configure the port speed to negotiate with 40G link only.
Add support for all other supported speeds.
Fixes: 31557f0f97 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Export to userspace the front panel name of the port, so that udev can
rename the ports accordingly. The convention suggested by switchdev
documentation is used: pX
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be symmentrical with create and do the check outside the remove function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Be symmentrical with create and do the check outside the remove function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do it in a same way we do it in spectrum. Check if port is usable first
and only in that case create a port instance.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We recently discovered a bug in the firmware in which a field's length in
one of the registers was incorrectly set. This caused the firmware to
access garbage data that wasn't initialized by the driver and therefore
emit error messages.
While the bug is already fixed and the driver usually zeros the buffers
passed to the firmware, there are a handful of cases where this isn't
done. Zero the buffer in these cases and prevent similar bugs from
recurring, as they tend to be hard to debug.
Fixes: 52581961d8 ("mlxsw: core: Implement fan control using hwmon")
Signed-off-by: Elad Raz <eladr@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating a FWD rule using tc create also a HW counter
for this rule.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently the code supports only drop rules to possess counters,
add that ability also for fwd rules.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently when calling mlx5_add_flow_rule we accept
only one flow destination, this commit allows to pass
multiple destinations.
This change forces us to change the return structure to a more
flexible one. We introduce a flow handle (struct mlx5_flow_handle),
it holds internally the number for rules created and holds an array
where each cell points the to a flow rule.
From the consumers (of mlx5_add_flow_rule) point of view this
change is only cosmetic and requires only to change the type
of the returned value they store.
From the core point of view, we now need to use a loop when
allocating and deleting rules (e.g given to us a flow handler).
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When adding a new rule, if we can match it with compare_match_value and
flow tag we might be able to insert the rule to the same fte.
In order to do that, there must be an overlap between the actions of the
fte and the new rule.
When updating the action of an existing fte, we must tell the firmware
we are doing so.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The way we compare between two dests will need to be used in other
places in the future, so we factor out the comparison logic
between two dests into a separate function.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
An fte status becomes FS_FTE_STATUS_EXISTING only after it was
created in HW. We can use this in order to simplify the logic on
what firmware command to use. If the status isn't FS_FTE_STATUS_EXISTING
we need to create the fte, otherwise we need only to update it.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
When adding a new rule to an fte, we need to hold the fte lock
until we add that rule to the fte and increase the fte ref count.
Fixes: 0c56b97503 ("net/mlx5_core: Introduce flow steering API")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Add TSAR to the eswitch which will act as the vports rate limiter.
Create/Destroy TSAR on Enable/Dsiable SRIOV.
Attach/Detach vport to eswitch TSAR on Enable/Disable vport.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
TSAR (stands for Transmit Scheduling ARbiter) is a hardware component
that is responsible for selecting the next entity to serve on the
transmit path.
The arbitration defines the QoS policy between the agents connected to
the TSAR.
The TSAR is a consist two main features:
1) BW Allocation between agents:
The TSAR implements a defecit weighted round robin between the agents.
Each agent attached to the TSAR is assigned with a weight and it is
awarded transmission tokens according to this weight.
2) Rate limer per agent:
Each agent attached to the TSAR is (optionally) assigned with a rate
limit.
TSAR will not allow scheduling for an agent exceeding its defined rate
limit.
In this patch we implement the API of manipulating the TSAR.
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
For the mlx5 driver to support ConnectX-5 PCIe 4.0 VFs, we add the
device ID "0x101a" to mlx5_core_pci_table.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
According to PRM async_event_mask have to be 64 bits long.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Pull networking fixes from David Miller:
"Lots of fixes, mostly drivers as is usually the case.
1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
Khoroshilov.
2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
Pedersen.
3) Don't put aead_req crypto struct on the stack in mac80211, from
Ard Biesheuvel.
4) Several uninitialized variable warning fixes from Arnd Bergmann.
5) Fix memory leak in cxgb4, from Colin Ian King.
6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.
7) Several VRF semantic fixes from David Ahern.
8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.
9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.
10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.
11) Fix stale link state during failover in NCSCI driver, from Gavin
Shan.
12) Fix netdev lower adjacency list traversal, from Ido Schimmel.
13) Propvide proper handle when emitting notifications of filter
deletes, from Jamal Hadi Salim.
14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.
15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.
16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.
17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.
18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
Leitner.
19) Revert a netns locking change that causes regressions, from Paul
Moore.
20) Add recursion limit to GRO handling, from Sabrina Dubroca.
21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.
22) Avoid accessing stale vxlan/geneve socket in data path, from
Pravin Shelar"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
geneve: avoid using stale geneve socket.
vxlan: avoid using stale vxlan socket.
qede: Fix out-of-bound fastpath memory access
net: phy: dp83848: add dp83822 PHY support
enic: fix rq disable
tipc: fix broadcast link synchronization problem
ibmvnic: Fix missing brackets in init_sub_crq_irqs
ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
net/mlx4_en: Save slave ethtool stats command
net/mlx4_en: Fix potential deadlock in port statistics flow
net/mlx4: Fix firmware command timeout during interrupt test
net/mlx4_core: Do not access comm channel if it has not yet been initialized
net/mlx4_en: Fix panic during reboot
net/mlx4_en: Process all completions in RX rings after port goes up
net/mlx4_en: Resolve dividing by zero in 32-bit system
net/mlx4_core: Change the default value of enable_qos
net/mlx4_core: Avoid setting ports to auto when only one port type is supported
net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
...
Driver allocates a shadow array for transmitted SKBs with X entries;
That means valid indices are {0,...,X - 1}. [X == 8191]
Problem is the driver also uses X as a mask for a
producer/consumer in order to choose the right entry in the
array which allows access to entry X which is out of bounds.
To fix this, simply allocate X + 1 entries in the shadow array.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When MTU is changed from 9000 to 1500 while there is burst of inbound 9000
bytes packets, adaptor sometimes delivers 9000 bytes packets to 1500 bytes
buffers. This causes memory corruption and sometimes crash.
This is because of a race condition in adaptor between "RQ disable"
clearing descriptor mini-cache and mini-cache valid bit being set by
completion of descriptor fetch. This can result in stale RQ desc being
cached and used when packets arrive. In this case, the stale descriptor
have old MTU value.
Solution is to write RQ->disable twice. The first write will stop any
further desc fetches, allowing the second disable to clear the mini-cache
valid bit without danger of a race.
Also, the check for rq->running becoming 0 after writing rq->enable to 0
is not done properly. When incoming packets are flooding the interface,
rq->running will pulse high for each dropped packet. Since the driver was
waiting for 10us between each poll, it is possible to see rq->running = 1
1000 times in a row, even though it is not actually stuck running.
This results in false failure of vnic_rq_disable(). Fix is to try more
than 1000 time without delay between polls to ensure we do not miss when
running goes low.
In old adaptors rq->enable needs to be re-written to 0 when posted_index
is reset in vnic_rq_clean() in order to keep rq->prefetch_index in sync.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Schedule these XPORT event tasks in the shared workqueue
so that IRQs are not freed in an interrupt context when
sub-CRQs are released.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do this so the sysfs has "device" link correctly set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2016-10-27
This series contains fixes to ixgbe and i40e.
Emil fixes a NULL pointer dereference when a macvlan interface is brought
up while the PF is still down.
David root caused the original panic that was fixed by commit id
(a036244c06 "i40e: Fix kernel panic on enable/disable LLDP") and the
fix was not quite correct, so removed the get_default_tc() and replaced
it with a #define since there is only one TC supported as a default.
Guilherme Piccoli fixes an issue where if we modprobe the driver module
without enough MSI-X interrupts, then unload the module and reload it
again, the kernel would crash. So if we fail to allocate enough MSI-X
interrupts, we should disable them since they were previously enabled.
Huaibin Wang found that the order of the arguments for
ndo_dflt_bridge_getlink() were in the correct order, so fix the order.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Following the previous patch, as an optimization, the slave will
not even bother sending the DUMP_ETH_STATS command over the
comm channel.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlx4_en_DUMP_ETH_STATS took the *counter mutex* and then
called the FW command, with WRAPPED attribute. As a result, the fw command
is wrapped on the Hypervisor when it calls mlx4_en_DUMP_ETH_STATS.
The FW command wrapper flow on the hypervisor takes the *slave_cmd_mutex*
during processing.
At the same time, a VF could be in the process of coming up, and could
call mlx4_QUERY_FUNC_CAP. On the hypervisor, the command flow takes the
*slave_cmd_mutex*, then executes mlx4_QUERY_FUNC_CAP_wrapper.
mlx4_QUERY_FUNC_CAP wrapper calls mlx4_get_default_counter_index(),
which takes the *counter mutex*. DEADLOCK.
The fix is that the DUMP_ETH_STATS fw command should be called with
the NATIVE attribute, so that on the hypervisor, this command does not
enter the wrapper flow.
Since the Hypervisor no longer goes through the wrapper code, we also
simply return 0 in mlx4_DUMP_ETH_STATS_wrapper (i.e.the function succeeds,
but the returned data will be all zeroes).
No need to test if it is the Hypervisor going through the wrapper.
Fixes: f9baff509f ("mlx4_core: Add "native" argument to mlx4_cmd ...")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently interrupt test that is part of ethtool selftest runs the
check over all interrupt vectors of the device.
In mlx4_en package part of interrupt vectors are uninitialized since
mlx4_ib doesn't exist. This causes NOP FW command to time out.
Change logic to test current port interrupt vectors only.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the Hypervisor, there are several FW commands which are invoked
before the comm channel is initialized (in mlx4_multi_func_init).
These include MOD_STAT_CONFIG, QUERY_DEV_CAP, INIT_HCA, and others.
If any of these commands fails, say with a timeout, the Hypervisor
driver enters the internal error reset flow. In this flow, the driver
attempts to notify all slaves via the comm channel that an internal error
has occurred.
Since the comm channel has not yet been initialized (i.e., mapped via
ioremap), this will cause dereferencing a NULL pointer.
To fix this, do not access the comm channel in the internal error flow
if it has not yet been initialized.
Fixes: 55ad359225 ("net/mlx4_core: Enable device recovery flow with SRIOV")
Fixes: ab9c17a009 ("mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a kernel panic that occurs as a result of an asynchronous event
handled in roce_gid_mgmt:
mlx4_en_get_drvinfo is called and accesses freed resources.
This happens in a shutdown flow only, since pci device is destroyed
while netdevice is still alive.
Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is a race between incoming traffic and
initialization flow. HW is able to receive the packets
after INIT_PORT is done and unicast steering is configured.
Before we set priv->port_up NAPI is not scheduled and
receive queues become full. Therefore we never get
new interrupts about the completions.
This issue could happen if running heavy traffic during
bringing port up.
The resolution is to schedule NAPI once port_up is set.
If receive queues were full this will process all cqes
and release them.
Fixes: c27a02cd94 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When doing roundup_pow_of_two for large enough number with
bit 31, an overflow will occur and a value equal to 1 will
be returned. In this case 1 will be subtracted from the return
value and division by zero will be reached.
Fixes: 31c128b66e ("net/mlx4_en: Choose time-stamping shift value according to HW frequency")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the default status of quality of service back to disabled,
as it hurts performance in some cases.
Fixes: 38438f7c7e ("net/mlx4: Set enhanced QoS support by default when ...")
Signed-off-by: Moshe Lazer <moshel@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When only one port type is supported, it should be read only.
We reject changing requests, even to the auto sense mode.
Fixes: 27bf91d6a0 ("mlx4_core: Add link type autosensing")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The resource type enum in the resource tracker was incorrect.
RES_EQ was put in the position of RES_NPORT_ID (a FC resource).
Since the remaining resources maintain their current values,
and RES_EQ is not passed from slaves to the hypervisor in any
FW command, this change affects only the hypervisor.
Therefore, there is no backwards-compatibility issue.
Fixes: 623ed84b1f ("mlx4_core: initial header-file changes for SRIOV support")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Schedule these XPORT event tasks in the shared workqueue
so that IRQs are not freed in an interrupt context when
sub-CRQs are released.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MAC is capable of RGMII mode and that is probably a more typical
connection type than GMII today (eg it is used by Marvell Reference
designs for several SOCs). Let DT users specify the standard
phy-connection-type = "rgmii-id";
On a phy node.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to release resources properly in error handling path of
alloc_uld_rxqs(), This patch also removes unwanted arguments
and avoids calling the same function twice.
Fixes: 94cdb8bb99 (cxgb4: Add support for dynamic allocation
of resources for ULD
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:
ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
The problem here is that gcc won't track the state of the variable
across a spin_unlock. Moving the assignment out of the lock is
safe here and avoids the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the firmware can't work with a page with dma address 0.
Passing such an address to the firmware will cause the give_pages
command to fail.
To avoid this, in case we get a 0 dma address of a page from the
dma engine, we avoid passing it to FW by remapping to get an address
other than 0.
Fixes: bf0bf77f65 ('mlx5: Support communicating arbitrary host...')
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case that the kernel PCI error handlers are not called, we will
trigger our own recovery flow.
The health work will give priority to the kernel pci error handlers to
recover the PCI by waiting for a small period, if the pci error handlers
are not triggered the manual recovery flow will be executed.
We don't save pci state in case of manual recovery because it will ruin the
pci configuration space and we will lose dma sync.
Fixes: 89d44f0a6c ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently there is a race between the health care work and the kernel
pci error handlers because both of them detect the error, the first one
to be called will do the error handling.
There is a chance that health care will disable the pci after resuming
pci slot.
Also create a separate WQ because now we will have two types of health
works, one for the error detection and one for the recovery.
Fixes: 89d44f0a6c ('net/mlx5_core: Add pci error handlers to mlx5_core driver')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The health sick status should be cleared when we start the health poll.
This is crucial for driver reload (unload + load) in order to behave
right in case of health issue.
Fixes: fd76ee4da5 ('net/mlx5_core: Fix internal error detection conditions')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Ingress/Egress ACL enable function may fail and it should return
status to its caller to avoid NULL pointer dereference.
Fixes: f942380c12 ('net/mlx5: E-Switch, Vport ingress/egress ACLs rules for spoofchk')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Detaching the netdev before unregistering it cause some netdev cleanup
ndos to fail because they check presence of the netdev, so we need to
unregister the netdev first.
Fixes: 26e59d8077 ('net/mlx5e: Implement mlx5e interface attach/detach callbacks')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of predicting the index of the wanted LRO timeout value from
hardware capabilities, look for the nearest LRO timeout value.
Fixes: 5c50368f38 ('net/mlx5e: Light-weight netdev open/stop')
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, last use timestamp is initialized to zero.
This is not the expected value by higher layers such as
when we do TC action offloading. To fix that, set it to
the current time, e.g when the counter/rule is offloaded.
This is the same behaviour of non-offloaded TC actions.
Fixes: 43a335e055 ('mlx5_core: Flow counters infrastructure')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Autogroups groups num is increased when creating a new flow group,
but is never decreased.
Now decreasing it when deleting a flow group.
Fixes: f0d22d1874 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Finding a new autogroup range is done by going over a group list
sorted by each group start index. The search is stopped after finding
the first free range. Adding the newly created group to the list is
wrongly added to the end of the list regardless of its start index as
the parameter of where to insert it is ignored.
This commit makes sure to use that unused parameter to insert
it where requested.
Fixes: f0d22d1874 ('net/mlx5_core: Introduce flow steering autogrouped flow table')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Always query the HCA caps after setting them to update the capablities
data structures. Not doing so results in incorrect capabilities being
reported including max_dc, max_qp and several others.
Fixes: 59211bd3b6 ("net/mlx5: Split the load/unload flow into hardware
and software flows")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ARM 64B cache line systems have L1_CACHE_BYTES set to 128.
cache_line_size() will return the correct size.
Fixes: cf50b5efa2fe('net/mlx5_core/ib: New device capabilities
handling.')
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So the i40e driver had a really convoluted configuration for how to handle
the debug flags contained in msg_level. Part of the issue is that the
driver has its own 32 bit mask that it was using to track a separate set of
debug features. From what I can tell it was trying to use the upper 4 bits
to determine if the value was meant to represent a bit-mask or the numeric
value provided by debug level.
What this patch does is clean this up by compressing those 4 bits into bit
31, as a result we just have to perform a check against the value being
negative to determine if we are looking at a debug level (positive), or a
debug mask (negative). The debug level will populate the msg_level, and
the debug mask will populate the debug_mask in the hardware struct.
I added similar logic for ethtool. If the value being provided has bit 31
set we assume the value being provided is a debug mask, otherwise we assume
it is a msg_enable mask. For displaying we only provide the msg_enable,
and if debug_mask is in use we will print it to the dmesg log.
Lastly I removed the debugfs interface. It is redundant with what we
already have in ethtool and really doesn't belong anyway.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Patch a036244c06 "i40e: Fix kernel panic on enable/disable LLDP"
introduced an error in bit logic.
Originally this bit manipulation was meant to clear two bits to indicate
that DCB was not enabled or capable. An "&" was incorrectly used instead
of an "|" bit operator to combine the two bitmasks into one. This also
created a static checker error since the resultant code was a no-op.
This patch fixes the error by using the correct bit-wise operator.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This is code refactoring. This patch removes the workaround which deleted
a default MAC filter added by the firmware when the interface was brought
up. This filter caused frames to pass disregarding the VLAN tagging.
It used to be automatically applied after reset in pre-SRA FW versions.
This workaround is not needed in production NICs and hence can be removed.
Change-ID: I129fe1aae1f17b5a224c9b29a996d916aa1be1ec
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a problem where it could take a very
long time (>100 msec) to print the link down notification.
This problem is fixed by changing how often we update link
info from fw, when link is down. Without this patch, it can
take over 100msec to notify user link is down.
Change-ID: Ib876eb30834c7080792becd13ee093b9cbb35d78
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch cleans up several pieces of redundant code in the Rx clean-up
paths.
The first bit is that hdr_addr and the status_err_len portions of the Rx
descriptor represent the same value. As such there is no point in setting
them to 0 before setting them to 0. I'm dropping the second spot where we
are updating the value to 0 so that we only have 1 write for this value
instead of 2.
The second piece is the checking for the DD bit in the packet. We only
need to check for a non-zero value for the status_err_len because if the
device is done with the descriptor it will have written something back and
the DD is just one piece of it. In addition I have moved the reading of
the Rx descriptor bits related to rx_ptype down so that they are actually
below the dma_rmb() call so that we are guaranteed that we don't have any
funky 64b on 32b calls causing any ordering issues.
Change-ID: I256e44a025d3c64a7224aaaec37c852bfcb1871b
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Ethtool -L option with the combined parameter is for changing the number of
multi-purpose channels of the specified network device. The pre-set maximum
for the combined channels is cpu dependent. Currently, for an i40e device,
when the user sets a value between 64 and the maximum that the cpu can
support for the combined parameter, the i40e driver displays the confusing
info in dmesg to only show 64 as the RSS count regardless of what the
accepted user input is as long as it is larger than 64.
This patch fixes the message in the i40e driver when the user uses
ethtool -L to change the number of the combined channels to consistently
display the user requested value if it is valid and accepted by ethtool.
Change-ID: Ia80a68bc844b779a49e0f76e7d3dcc915032d9af
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Move some data to text
$ size drivers/net/ethernet/intel/i40e/i40e_ethtool.o*
text data bss dec hex filename
25012 0 32 25044 61d4 drivers/net/ethernet/intel/i40e/i40e_ethtool.o.new
22868 2120 32 25020 61bc drivers/net/ethernet/intel/i40e/i40e_ethtool.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
There exists a bug in which a 'perfect storm' can occur and cause
interrupts to fail to be correctly affinitized. This causes unexpected
behavior and has a substantial impact on performance when it happens.
The bug occurs if there is heavy traffic, any number of CPUs that have
an i40e interrupt are pegged at 100%, and the interrupt afffinity for
those CPUs is changed. Instead of moving to the new CPU, the interrupt
continues to be polled while there is heavy traffic.
The bug is most readily realized as the driver is first brought up and
all interrupts start on CPU0. If there is heavy traffic and the
interrupt starts polling before the interrupt is affinitized, the
interrupt will be stuck on CPU0 until traffic stops. The bug, however,
can also be wrought out more simply by affinitizing all the interrupts
to a single CPU and then attempting to move any of those interrupts off
while there is heavy traffic.
This patch fixes the bug by registering for update notifications from
the kernel when the interrupt affinity changes. When that fires, we
cache the intended affinity mask. Then, while polling, if the cpu is
pegged at 100% and we failed to clean the rings, we check to make sure
we have the correct affinity and stop polling if we're firing on the
wrong CPU. When the kernel successfully moves the interrupt, it will
start polling on the correct CPU. The performance impact is minimal
since the only time this section gets executed is when performance is
already compromised by the CPU.
Change-ID: I4410a880159b9dba1f8297aa72bef36dca34e830
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Group together the minimum set of offload capabilities that are always
supported by VF in base mode. This define would be used by PF to make
sure VF in base mode gets minimum of base capabilities .
Change-ID: Id5e8f22ba169c8f0a38d22fc36b2cb531c02582c
Signed-off-by: Preethi Banala <preethi.banala@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Allow the client interface to reopen existing clients if they were
closed. This allows clients to recover from reset, which is essential
for supporting VF RDMA. In one instance, the driver was not clearing the
open bit when the client was closed. Add the code to clear this bit so
that the state is accurate and the driver will not attempt to reopen
already-open clients. Remove the ref_cnt variable; it was just getting
in the way and was not being used consistently.
Change-ID: Ic71af4553b096963ac0c56a997f887c9a4ed162d
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We cannot currently support SCTP in the hardware, and IPV4_FLOW is not used
anywhere by the software so we can go through and drop the functionality
related to these two flow types.
In addition we cannot support masking based on the protocol value so if the
user is expecting a value other than TCP or UDP we should simply return an
error rather then trying to allocate a filter for a rule that will only
partially match what the user requested.
Change-ID: I10d52bb97d8104d76255fe244551814ff9531a63
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The function is not used so there is no need to carry it forward. I have
plans to add a slightly different function that can be inlined to handle
the same kind of functionality.
Change-ID: Ie2dfcb189dc75e5fbc156bac23003e3b4210ae0f
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Incorrect bit mask was used for testing "get link status" response.
Instead of I40E_AQ_LSE_ENABLE (which is actually 0x03) it most probably
should be I40E_AQ_LSE_IS_ENABLED (which is defined as 0x01).
Change-ID: Ia199142906720507f847de3a33a25c61a9781b2f
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We can reorder the busy wait loop at the start of the Flow Director
transmit function to reduce the overall code size while still retaining the
same functionality. As such I am taking advantage of the opportunity to do
so.
Change-ID: I34c403ca001953c6ac9816e65d5305e73d869026
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a problem in the client interface that
was causing random stack traces in RDMA driver load and
unload tests. This patch fixes the problem by checking
for an existing client before trying to open it. Without
this patch, there is a timing related null pointer deref.
Change-ID: Ib73d30671a27f6f9770dd53b3e5292b88d6b62da
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Do this so the sysfs has "device" link correctly set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do this so the sysfs has "device" link correctly set.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, mlxsw_pci.ko is the module that registers PCI table for all
drivers (spectrum and switchx2). That is problematic for example with
dracut. Since mlxsw_spectrum.ko and mlxsw_switchx2.ko are loaded
dynamically from within mlxsw_core.ko, dracut does not have track of
them and avoids them from being included in initramfs.
So make this in an ordinary way and define the PCI tables in individual
driver modules, so it can be properly loaded and included in dracut
initramfs image. As a side effect, this patch could remove no longer
necessary driver "kind" strings which were used to link PCI ids with
individual mlxsw drivers.
Suggested-by: Ivan Vecera <ivecera@redhat.com>
Tested-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pci.h needs to be used for inner function declarations. So move the
original one to more appropriate name, pci_hw.h.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only trees which are in use should be compared to requested prefix usage.
Fixes: 53342023ee ("mlxsw: spectrum_router: Implement LPM trees management")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the prefix bitlist is not saved for LPM trees, causing the
compare to always fail which causes the tree to be destroyed and created
for every inserted and removed FIB entry. So fix this by saving
the bitlist as it should have been done from the very beginning.
Fixes: 53342023ee ("mlxsw: spectrum_router: Implement LPM trees management")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Order of arguments is wrong.
The wrong code has been introduced by commit 7d4f8d871a, but is compiled
only since commit 9df70b6641.
Note that this may break netlink dumps.
Fixes: 9df70b6641 ("i40e: Remove incorrect #ifdef's")
Fixes: 7d4f8d871a ("switchdev; add VLAN support for port's bridge_getlink")
CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If we fail on allocating enough MSI-X interrupts, we should disable
them since they were previously enabled in this point of code.
Not disabling them can lead to WARN_ON() being triggered and subsequent
failure in enabling MSI as a fallback; the below message was shown without
this patch while we played with interrupt allocation in i40e driver:
[ 21.461346] sysfs: cannot create duplicate filename '/devices/pci0007:00/0007:00:00.0/0007:01:00.3/msi_irqs'
[ 21.461459] ------------[ cut here ]------------
[ 21.461514] WARNING: CPU: 64 PID: 1155 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x88/0xc0
Also, we noticed that without this patch, if we modprobe the module without
enough MSI-X interrupts (triggering the above warning), unload the module
and re-load it again, we got a crash on the system.
Signed-off-by: Guilherme G Piccoli <gpiccoli@linux.vnet.ibm.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
in commit a036244c06 a fix
was put into place to avoid a kernel panic when a non-
supported traffic class configuration was put into place
and then lldp was enabled/disabled on the link partner
switch. This fix caused it to be necessary to
unload/reload the driver to reenable DCB once a supported
TC config was in place.
The root cause of the original panic was that the function
i40e_pf_get_default_tc was allowing for a default TC other
than TC 0, and only TC 0 is supported as a default.
This patch removes the get_default_tc function and replaces
it with a #define since there is only one TC supported as
a default.
Change-Id: I448371974e946386d0a7718d73668b450b7c72ef
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Ronald Bynoe <ronald.j.bynoe@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix NULL pointer dereference in the case where a macvlan interface is
brought up while the PF is still down:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa0170fb2>] ixgbe_alloc_rx_buffers+0x42/0x1a0 [ixgbe]
Call Trace:
[<ffffffffa017336b>] ixgbe_configure_rx_ring+0x2eb/0x3d0 [ixgbe]
[<ffffffffa0173811>] ixgbe_fwd_ring_up+0xd1/0x380 [ixgbe]
[<ffffffffa0179709>] ixgbe_fwd_add+0x149/0x230 [ixgbe]
[<ffffffffa0113480>] macvlan_open+0x260/0x2b0 [macvlan]
Reported-by: Matthew Garrett <mjg59@coreos.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
trivial fix to spelling mistake in dev_err message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Jon Mason <jon.mason@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: d894be57ca92('ethernet: use net core MTU range checking in more drivers')
CC: Jarod Wilson <jarod@redhat.com>
CC: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Acked-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This node pointer is returned by of_get_child_by_name() with refcount
incremented in this function. of_node_put() on it before exitting this
function.
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setup_timer() instead of init_timer(), being the preferred/standard
way to set a timer up.
Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
active timer (if the timer is inactive it will be activated).
Use setup_timer and mod_timer to setup and arm a timer, to make the code
cleaner and easier to read.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return error code -ENODEV from the DMA is not supported error
handling case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts being disabled, spin_lock_irqsave()
make sure always in irq disable context. So the kfree_skb()
should be replaced with dev_kfree_skb_irq().
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's not necessary to free memory allocated with devm_kzalloc in the
remove path and using kfree leads to a double free.
Fixes: 84640e27f2 ("net: netcp: Add Keystone NetCP core ethernet
driver")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return error code -EINVAL from the error handling
case instead of 0, as done elsewhere in this function.
Fixes: e420114eef ("rocker: introduce worlds infrastructure")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3ac72b7b63 ("net: fec: align IP header in hardware") breaks
networking on mx28.
There is an erratum on mx28 (ENGR121613 - ENET big endian mode
not compatible with ARM little endian) that requires an additional
byte-swap operation to workaround this problem.
So call swap_buffer() prior to performing the IP header alignment
to restore network functionality on mx28.
Fixes: 3ac72b7b63 ("net: fec: align IP header in hardware")
Reported-and-tested-by: Henri Roosen <henri.roosen@ginzinger.com>
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Time Sync (PTP) implementation uses the divisor/shift value for converting
the clock ticks to nanoseconds. Driver currently defines shift value as 1,
this results in the nanoseconds value to be calculated as half the actual
value. Hence the user application fails to synchronize the device clock
value with the PTP master device clock. Need to use the 'shift' value of 0.
Signed-off-by: Sony.Chacko <Sony.Chacko@cavium.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the number of resources is going to get much bigger, ease up the
addition by simly defining IDs. Convert the existing structure members
to a set array, one for validity, one for values. Introduce a set of
getters and setters for easy access.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Push cmd resource query related defines to cmd.h where they belong.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the MLXSW_REG_DEFINE macro to store register name in string form.
Use this string later on instead of hard coded string values.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Save some code and also prepare to easily carry name in string form.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enforce const for getter buf args.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These should be const, so enforce it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver uses incorrect APIs to unmap DMA memory which were
mapped using dma_map_single(). This patch fixes it to use
appropriate APIs for un-mapping DMA memory.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
qed_dcbx_query_params() implementation populate the values to input
buffer based on the dcbx mode and, the current negotiated state/params,
the caller of this API need to memset the buffer to zero.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rx indirection table entries are in the range [0, (rss_count - 1)]. If
user reduces the rss count, the table entries may not be in the ccorrect
range. Need to reconfigure the table with new rss_count as a basis.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the current default values for Rx path i.e., 8 queues of 8Kb entries
each with 4Kb size, interface will consume 256Mb for Rx. The default values
causing the driver probe to fail when the system memory is low. Based on
the perforamnce results, rx-ring count value of 1Kb gives the comparable
performance with Rx coalesce timeout of 12 seconds. Updating the default
values.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the execution of loopback test, driver may receive the packets which
are not originated by this test, loopback implementation need to skip those
packets.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
RSS configuration is not supported for 100G adapters.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent changes in kernel ethtool implementation requires the driver
callback for get_channels() has to populate the values for max tx/rx
coalesce fields.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>