The bnx2 chips do not support per MSI vector masking. On 5706/5708, new MSI
address/data are stored only when the MSI enable bit is toggled. As a result,
SMP affinity no longer works in the latest kernel. A more serious problem is
that the driver will no longer receive interrupts when the MSI receiving CPU
goes offline.
The workaround in this patch only addresses the problem of CPU going offline.
When that happens, the driver's timer function will detect that it is making
no forward progress on pending interrupt events and will recover from it.
Eric Dumazet reported the problem.
We also found that if an interrupt is internally asserted while MSI and INTA
are disabled, the chip will end up in the same state after MSI is re-enabled.
The same workaround is needed for this problem.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Tested-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move all related timeout constants to the same location. BNX2
prefix is also added to make them more consistent.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The default rx buffer water marks for XOFF/XON are for 1500 MTU. At
larger MTUs, these water marks need to be adjusted for effective
flow control.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Before, the driver would not care about the return codes from pci_map_*
functions. This could be potentially dangerous if a mapping failed.
Now, we will check all pci_map_* calls. On the transmit side, we switch
to use the new function skb_dma_map(). On the receive side, we add
pci_dma_mapping_error().
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bnx2 driver stores/uses the irq value from the pci_dev internally.
But when it stores the irq value, it has been performing an
integer demotion. Because of the recent changes made to
arch/x86/kernel/io_apic.c, the new method in creating the irq value
(using build_irq_for_pci_dev()) has exposed this bug on x86 systems.
Because of this demotion when calling request_irq() from
bnx2_request_irq(), the driver would get a return code of -EINVAL.
This is because the kernel could not find the requested irq descriptor.
By storing the irq value properly, the kernel can find the correct
irq descriptor and the bnx2 driver can operate normally.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The timer_interval field is only assigned once, and never reassigned.
We can safely replace all instances of the timer_interval with a
constant value.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The name of the board is only used during the initialization of
the adapter. We can save the space of a pointer by not storing
this information.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for configuring secondary unicast addresses. There
are 4 additional perfect match filters which can be used for
secondary unicast address support.
* Modified bnx2_set_mac_addr() to be more generic in handling
the setting of the perfect match filters
* Changed bnx2_set_rx_mode() to handle the unicast dev_addr_list
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Negotiate with boot code and ASF firmware to see if it can
support keeping VLAN tags in the RX packets. If supported
by firmware, the VLAN tag will be kept in the RX packet
unless VLAN acceleration is registered.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable multiple rx rings if MSI-X vectors are available. We enable
up to 7 rx rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add hw_tx_cons_ptr and hw_rx_cons_ptr to speed up the retreival of
the tx and rx consumer index, since the MSI-X and default status
blocks have different structures.
Combine status_blk and status_blk_msix into a union. We'll only use
one type of status block for each vector.
Separate the code to detect more rx and tx work from the code to
detect link related work.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for multi-ring support, rx ring variables are now put
in a separate bnx2_rx_ring_info struct. With MSI-X, we can support
multiple rx rings.
The functions to allocate/free rx memory and to initialize rx rings
are now modified to handle multiple rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for multi-ring support, tx ring variables are now put
in a separate bnx2_tx_ring_info struct. Multi tx ring will not be
enabled until it is fully supported by the stack. Only 1 tx ring
will be used at the moment.
The functions to allocate/free tx memory and to initialize tx rings
are now modified to handle multiple rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the RTNL is held when we invoke flush_scheduled_work() we could
deadlock. One such case is linkwatch, it is a work struct which tries
to grab the RTNL semaphore.
The most common case are net driver ->stop() methods. The
simplest conversion is to instead use cancel_{delayed_}work_sync()
explicitly on the various work struct the driver uses.
This is an OK transformation because these work structs are doing
things like resetting the chip, restarting link negotiation, and so
forth. And if we're bringing down the device, we're about to turn the
chip off and reset it anways. So if we cancel a pending work event,
that's fine here.
Some drivers were working around this deadlock by using a msleep()
polling loop of some sort, and those cases are converted to instead
use cancel_{delayed_}work_sync() as well.
Signed-off-by: David S. Miller <davem@davemloft.net>
To make the bnx2 code more consistent, all instances of
RX_COPY_THRESH have been changed to BNX2_RX_COPY_THRESH.
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rx_offset field is set to a constant value and initialized
only once. By replacing all references to the rx_offset field,
we can eliminate rx_offset from the bnx2 structure. This will
save 4 bytes for every bnx2 instance.
[Added parentheses to the definition of BNX2_RX_OFFSET, as noted
by Ben Hutchings.]
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because of some board issues, we need to disable parallel detect on
an HP blade. Without this patch, the link state can become stuck
when it goes into parallel detect mode.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of the programmable high/low water marks in 5709 for
802.3 flow control.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The CTX_WR macro is unnecessary and obfuscates the code.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The REG_WR_IND/REG_RD_IND macros are unnecessary and obfuscate the
code. Many callers to these macros read and write shared memory from
the bp->shmem_base, so we add 2 similar functions that automatically
add the shared memory base.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the tx coalescing setup code independent of the MSIX vector.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Correct the MII expansion serdes control register definition.
2. Check an additional RUDI_INVALID bit when determining 5706S link.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prefix "bp->phy_flags" names with BNX2_PHY_FLAG_* for consistency.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some blade systems using the 5706 serdes, the hardware sometimes
does not properly generate link down interrupts. We add a workaround
in the driver's timer to force a link-down when some PHY registers
report loss of SYNC.
The parallel detect logic is cleaned up slightly to better integrate
the workaround.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip has problem running in this mode and needs to be disabled.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable new tx ring and add new MSIX handler and NAPI poll function
for the new tx ring. Enable MSIX when the hardware supports it.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To separate TX IRQs into a different MSIX vector, we need to
support a new tx ring. The original tx ring will still be used
when not using MSIX.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change bnx2_napi struct into an array and add code to manage multiple
IRQs. MSIX hardware structures and new registers are also added.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rx related fields used in NAPI polling are moved from the main
bnx2 struct to the bnx2_napi struct.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tx related fields used in NAPI polling are moved from the main
bnx2 struct to the bnx2_napi struct.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a bnx2_napi structure that will hold a napi_struct and
other fields to handle NAPI polling for the napi_struct. Various tx
and rx indexes and status block pointers will be moved from the main
bnx2 structure to this bnx2_napi structure.
Most NAPI path functions are modified to be passed this bnx2_napi
struct pointer.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a table to keep track of multiple IRQs and restructure the IRQ
request and free functions so that they can be easily expanded to
handle multiple IRQs.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add function to reuse a page in case of allocation or other errors.
Add code to construct the completed SKB with the additional data in
the pages.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new fields to keep track of the pages and the page rings.
Add functions to allocate and free pages.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Factor out the common functions that will be used to initialize the
normal RX rings and the page rings.
Change the copybreak constant RX_COPY_THRESH to 128. This same
constant will be used for the max. size of the linear SKB when pages
are used. Copybreak will be turned off when pages are used.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define the various ring constants to make the code cleaner.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Packets can be left in the RX ring if the NAPI budget is reached.
This is caused by storing the latest rx index at the beginning of
bnx2_rx_int(). We may not process all the work up to this index
if the budget is reached and so some packets in the RX ring may rot
when we later check for more work using this stored rx index.
The fix is to not store this latest hw index and only store the
processed rx index. We use a new function bnx2_get_hw_rx_cons()
to fetch the latest hw rx index.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the default WoL setting to match the NVRAM's setting. It
always defaulted to WoL disabled before and caused a lot of confusion
for users.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a follow up to the patches from Denys Vlasenkos
<vda.linux@googlemail.com> to further optimize firmware loading.
1. In bnx2_init_cpus(), we allocate memory for decompression once
and use it repeatedly instead of doing this for every firmware image.
2. We eliminate the BSS and SBSS firmware sections in bnx2_fw*.h since
these are always zeros.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch modifies gzip unpacking code in bnx2 driver so that
it does not depend on bnx2 internals. I will move this code
out of the driver and into zlib in follow-on patch.
It can be useful in other drivers which need to store firmwares
or any other relatively big binary blobs - fonts, cursor bitmaps,
whatever.
Patch is run tested by Michael Chan (driver author).
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.
The signature of the ->poll() call back goes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NVRAM interface is slightly modified on the 5709. To properly
support it, we need to change the buffered flag in the flash data
structure into multiple flags to indicate buffered operation, address
translation, and the use of write enable (WREN). The 5709 flash
only requires the buffered operation bit to be set.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>