Pre-allocate a skb at init time to be used for control messages to the HW
if skb allocation fails.
Tolerate failures to send messages initializing some memories at the cost of
parity error detection for these memories.
Retry sending connection id release messages if both alloc_skb(GFP_ATOMIC)
and alloc_skb(GFP_KERNEL) fail.
Do not bring the interface up if messages binding queue set to port fail to
be sent.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use generic MDIO generic values.
Based on Ben Hutchings'review comments.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for the AEL2020 phy.
Add PCI IDs of the boards using this phy.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not call t3_link_fault() under spinlock, as it calls msleep().
Besides, only the access to pi->link_fault needs to be serialized.
Also initialize local variables before checking the link status,
link state fields might otherwise end up containing garbage.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 5e68b772e6
cxgb3: map entire Rx page, feed map+offset to Rx ring.
introduced a regression on platforms defining DECLARE_PCI_UNMAP_ADDR()
and related macros as no-ops.
Rx descriptors are fed with the a page buffer bus address + page chunk offset.
The page buffer bus address is set and retrieved through
pci_unamp_addr_set(), pci_unmap_addr().
These functions being meaningless on x86 (if CONFIG_DMA_API_DEBUG is not set).
The HW ends up with a bogus bus address.
This patch saves the page buffer bus address for all plaftorms.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Followup of commits 9d21493b4b
and 08baf56108
(net: tx scalability works : trans_start)
(net: txq_trans_update() helper)
Now that core network takes care of trans_start updates, dont do it
in drivers themselves, if possible. Multi queue drivers can
avoid one cache miss (on dev->trans_start) in their start_xmit()
handler.
Exceptions are NETIF_F_LLTX drivers (vxge & tehuti)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mdio's dev field needs to be set before mdio ops occur.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Trivial: fixing gcc 4.4 compiler warning:
drivers/net/cxgb3/t3_hw.c: In function ‘t3_prep_adapter’:
drivers/net/cxgb3/t3_hw.c:3782: warning: suggest parentheses around operand of ‘!’ or change ‘|’ to ‘||’ or ‘!’ to ‘~’
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Signed-off-by: David S. Miller <davem@davemloft.net>
EEH attempts to recover up 6 times.
The last attempt leaves all the ports and adapter down.hen
The driver is then unloaded, bringing the adapter down again
unconditionally. The unload will hang.
Check if the adapter is already down before trying to bring it down again.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The fatal error task can be scheduled while processing an offload packet
in NAPI context when the connection handle is bogus. this can race
with the ports being brought down and the cxgb3 workqueue being flushed.
Stop napi processing before flushing the work queue.
The ULP drivers (iSCSI, iWARP) might also schedule a task on keventd_wk
while releasing a connection handle (cxgb3_offload.c::cxgb3_queue_tid_release()).
The driver however does not flush any work on keventd_wq while being unloaded.
This patch also fixes this.
Also call cancel_delayed_work_sync in place of the the deprecated
cancel_rearming_delayed_workqueue.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the existing periodic task to handle link faults.
The link fault interrupt handler is also called in work queue context,
which is wrong and might cause potential deadlocks.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It turns out that copying a 16-byte area at ~800k times a second
can be really expensive :) This patch redesigns the frags GRO
interface to avoid copying that area twice.
The two disciples of the frags interface have been converted.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace all DMA_32BIT_MASK macro with DMA_BIT_MASK(32)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Replace all DMA_64BIT_MASK macro with DMA_BIT_MASK(64)
Signed-off-by: Yang Hongyang<yanghy@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DMA mapping can be expensive in the presence of iommus.
Reduce the Rx iommu activity by mapping an entire page, and provide the H/W
the mapped address + offset of the current page chunk.
Reserve bits at the end of the page to track mapping references, so the page
can be unmapped.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use resource_size_t to declare mmio start and len variables.
Print PEX error register after EEH resumed.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable timestamps, update delayed ack threshold for iSCSI/iWARP traffic
Remove the len flag in Tx requests. It might corrupt offload trace packets.
Update SGE context setup to avoid potential H/W misprogrammation.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Start queue set reclaim timers after the queue sets have been
allocated successfully.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently ignores the local or remote link faults
raised at the mac layer. This patch fixes it.
Our mac however only advertizes link events, so wait for the
phy to stabilize the link, then enable mac link events interrupts.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the heurstics workaround unlocking a hung mac:
- reduce Tx mac toggling by enabling Tx drain before resetting the mac
- Take Tx (lack of) activity in account only
- Update the monitoring counter range to 64 bits
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Under RX pressure, The HW might generate a high load of interrupts
to signal mac fifo or free lists overflow.
Disable the interrupts, and poll the relevant status bits
to maintain stats.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Separate TX and RX reclaim handlers
Don't disable interrupts in RX reclaim handler.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Elmininate a cache miss when accessing the CPL header within
the first aggregated buffer.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update skb truesize correctly for the 2nd buffer from a Jumbo frame
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Release page chunk reference in case we fail to map it.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ring free lists door bell less frequently,
specifically every quarter of the active FL
size.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for adapters with a PCI id equal to 0x35.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this sparse warning:
drivers/net/cxgb3/ael1002.c:1010:60: warning: incorrect type in argument 4 (different signedness)
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Impact: Move variable declaration as close to usage as possible.
Fix this sparse warning:
drivers/net/cxgb3/cxgb3_main.c:1586:21: warning: symbol 'cap' shadows an earlier one
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LRO switch is always set to 1 in the rx processing loop.
It breaks the accelerated iSCSI receive traffic.
Fix its computation.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Set up a notification mechanism to inform upper layer modules
(iWARP, iSCSI) of a chip reset due to an EEH event or a fatal error.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes cxgb3 invoke the GRO hooks instead of LRO. As
GRO has a compatible external interface to LRO this is a very
straightforward replacement.
I've kept the ioctl controls for per-queue LRO switches. However,
we should not encourage anyone to use these.
Because of that, I've also kept the skb construction code in
cxgb3. Hopefully we can phase out those per-queue switches
and then kill this too.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently drops to line interrupt mode
if it did not get all the msi-x vectors it requested.
Allow msi-x settings when a minimal amount of vectors
is provided.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The lro manager's frag_align_pad setting was missing,
leading to misaligned access to the skb passed up
to the stack.
Tested-by: Rick Jones <rick.jones2@hp.com>
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I have a system with a Chelsio adapter (driven by cxgb3) whose ports are
part of a Linux bridge. Recently I updated the kernel and discovered
that things stopped working because cxgb3 was doing LRO on packets that
were passed into the bridge code for forwarding. (Incidentally, this
problem manifested itself in a strange way that made debugging a bit
interesting -- for some reason, the skb_warn_if_lro() check in bridge
didn't trigger and these LROed packets were forwarded out a forcedeth
interface, and caused the forcedeth transmit path to get stuck)
This is because cxgb3 has no way of keeping state for the LRO flag until
the interface is brought up, so if the bridging code disables LRO while
the interface is down, then cxgb3_up() will just reenable LRO, and on my
Debian system at least, the init scripts add interfaces to a bridge
before bringing the interfaces up.
Fix this by keeping track of each interface's LRO state in cxgb3 so that
when bridge disables LRO, it stays disabled in cxgb3_up() when the
interface is brought up. I did this by changing the rx_csum_offload
flag into a pair of bit flags; the effect of this on the rx_eth() fast
path is miniscule enough that it should be fine (eg on x86, a cmpb
instruction becomes a testb instruction).
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update contol path between cxgb3 and ULP modules (iWARP, iSCSI)
to provide access to firware and protocol engine info.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function 'vsc8211_set_speed_duplex' is not used, so comment it
out. For 'vsc8211_set_automdi' the function 'vsc8211_set_speed_duplex'
is the only caller, so comment it out as well.
Fix this (sparse) warning:
drivers/net/cxgb3/vsc8211.c:269: warning: 'vsc8211_set_automdi' defined but not used
drivers/net/cxgb3/vsc8211.c:295:5: warning: symbol 'vsc8211_set_speed_duplex' was not declared. Should it be static?
Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
The accelerated iSCSI traffic could use a private IP address unknown to the OS:
- The IP address is required in both drivers to manage ARP requests and connection set up.
- Added an control call to retrieve the ip address.
- Reply to ARP requests dedicated to the private IP address.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Karen Xie <kxie@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The NIC driver can work with mutliple versions of the FW.
Let the driver load when the embedded FW does not match,
and the FW update mechanism failed.
The iWARP module will make its own loading decision.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement NIC Tx multiqueue.
Bump up driver version.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function is_pure_response() does "ntohl(var) & const" and then
essentially just tests whether the result is 0 or not; this can be done
more efficiently by computing "var & htonl(const)" instead and doing the
byte swap at compile time instead of run time.
This change slightly shrinks the compiled code; eg on x86-64 we save a
couple of bswapl instructions:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-8 (-8)
function old new delta
t3_sge_intr_msix_napi 544 536 -8
and this also has the pleasant side effect of fixing a sparse warning:
drivers/net/cxgb3/sge.c:2313:15: warning: restricted degrades to integer
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update FW loading path to accomodate in-kernel images location
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add consistency in alloc_ring() parameter checking
to avoid potential memory leaks.
alloc_ring() callers are correct fo far.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix t3_eth_xmit() missing into the netdev_ops structure.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves neigh_setup and hard_start_xmit into the network device ops
structure. For bisection, fix all the previously converted drivers as well.
Bonding driver took the biggest hit on this.
Added a prefetch of the hard_start_xmit in the fast path to try and reduce
any impact this would have.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert this driver to network device ops. Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the iw_cxgb3 module's cxgb3_client "add" func gets called by the
cxgb3 module, the iwarp driver ends up calling the ethtool ops get_drvinfo
function in cxgb3 to get the fw version and other info. Currently the
iwarp driver grabs the rtnl lock around this down call to serialize.
As of 2.6.27 or so, things changed such that the rtnl lock is held around
the call to the netdev driver open function. Also the cxgb3_client "add"
function doesn't get called if the device is down.
So, if you load cxgb3, then load iw_cxgb3, then ifconfig up the device,
the iw_cxgb3 add func gets called with the rtnl_lock held. If you
load cxgb3, ifconfig up the device, then load iw_cxgb3, the add func
gets called without the rtnl_lock held. The former causes the deadlock,
the latter does not.
In addition, there are iw_cxgb3 sysfs handlers that also can call
down into cxgb3 to gather the fw and hw versions. These can be called
concurrently on different processors and at any time. Thus we need to
push this serialization down in the cxgb3 driver get_drvinfo func.
The fix is to remove rtnl lock usage, and use a per-device lock in cxgb3.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Protect against invalid phy entries in the eeprom.
Extend eeprom access timeout.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The generic packet receive code takes care of setting
netdev->last_rx when necessary, for the sake of the
bonding ARP monitor.
Drivers need not do it any more.
Some cases had to be skipped over because the drivers
were making use of the ->last_rx value themselves.
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement ethtool's get_flags and set_flags methods.
It enables ethtool to control the LRO settings.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Commit 147e70e6 ("cxgb3: Use SKB list interfaces instead of home-grown
implementation.") causes a crash in t3_l2t_send_slow() when an iWARP
connection request is received. This is because the new l2t_entry.arpq
skb queue is never initialized, and therefore trying to add an skb to
it causes a NULL dereference. With the old code there was no need to
initialize the queues because the l2t_entry structures were zeroed,
and the code used NULL to mean empty.
Fix this by adding __skb_queue_head_init() when all the l2t_entry
structures get allocated.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add a field to the driver versioning info.
Update version to 1.1.0.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for SR PHY.
Auto-detect phy module type, and report type changes.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add generic code to manage interrupt driven PHYs.
Do not reset the phy after link parameters update,
the new values might get lost.
Return early from link change notification
when the link parameters remain unchanged.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Do not require PHY interrupts to be connected to GPIs in ascending order.
Base interrupt availability both on PHYs supporting them and on GPIs being
hooked up. Allows boards to specify interrupt GPIs though the PHYs don't
use them.
Remove spurious PHY interrupts due to clearing T3DBG interrupts before
setting their polarity.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Second step in overall phy layer reorganization.
Clean up the port_type_info structure.
Support coextistence of clause 22 and clause 45 MDIO devices.
Select the type of MDIO transaction on a per transaction basis.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
First step towards overall PHY layering re-organization.
Allow a status return when a PHY is reset.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allocate a queue set per core, up to the maximum of available qsets.
Share the queue sets on multi port adapters.
Rename MSI-X interrupt vectors ethX-N, N being the queue set number.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
when a fatal error occurs, bring ports down, reset the chip,
and bring ports back up.
Factorize code used for both EEH and fatal error recovery.
Fix timer usage when bringing up/resetting sge queue sets.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A SGE queue set timer might access registers while in EEH recovery,
triggering an EEH error loop. Stop all timers early in EEH process.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
__FUNCTION__ is gcc-specific, use __func__
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The generic lro code checks TCP flags/options.
Remove duplicate tests done in the driver.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Starting with FW version 7.0, the driver needs to allow larger images.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Handling the zero STag in receive work request requires some extra
logic in the driver:
- Only set the QP_PRIV bit for kernel mode QPs.
- Add a zero STag build function for recv wrs. The uP needs a PBL
allocated and passed down in the recv WR so it can construct a HW
PBL for the zero STag S/G entries. Note: we need to place a few
restrictions on zero STag usage because of this:
1) all SGEs in a recv WR must either be zero STag or not. No mixing.
2) an individual SGE length cannot exceed 128MB for a zero-stag SGE.
This should be OK since it's not really practical to allocate
such a large chunk of pinned contiguous DMA mapped memory.
- Add an optimized non-zero-STag recv wr format for kernel users.
This is needed to optimize both zero and non-zero STag cracking in
the recv path for kernel users.
- Remove the iwch_ prefix from the static build functions.
- Bump required FW version.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
- Add a new rdma ctl command called RDMA_GET_MIB to the cxgb3 low
level driver to obtain the protocol mib from the rnic hardware.
- Add new iw_cxgb3 provider method to get the MIB from the low level
driver.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Hide struct vlan_dev_info from drivers to prevent them from growing
more creative ways to use it. Provide accessors for the two drivers
that currently use it.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reset the chip when the PCI link goes down.
Preserve the napi structure when a sge qset's resources are freed.
Replay only HW initialization when the chip comes out of reset.
Signed-off-by: Divy Le ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Fix faiures path when ports are stopped and restarted
in EEH recovery.
Signed-off-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Otherwise theoretically at least
CAP_NET_ADMIN
Reload new firmware
Wait..
Firmware patches kernel
So it should be CAY_SYS_RAWIO - not that I suspect this is in fact a
credible attack vector!
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Open MPI, Intel MPI and other applications don't respect the iWARP
requirement that the client (active) side of the connection send the
first RDMA message. This class of application connection setup is
called peer-to-peer. Typically once the connection is setup, _both_
sides want to send data.
This patch enables supporting peer-to-peer over the chelsio RNIC by
enforcing this iWARP requirement in the driver itself as part of RDMA
connection setup.
Connection setup is extended, when the peer2peer module option is 1,
such that the MPA initiator will send a 0B Read (the RTR) just after
connection setup. The MPA responder will suspend SQ processing until
the RTR message is received and reply-to.
In the longer term, this will be handled in a standardized way by
enhancing the MPA negotiation so peers can indicate whether they
want/need the RTR and what type of RTR (0B read, 0B write, or 0B send)
should be sent. This will be done by standardizing a few bits of the
private data in order to negotiate all this. However this patch
enables peer-to-peer applications now and allows most of the required
firmware and driver changes to be done and tested now.
Design:
- Add a module option, peer2peer, to enable this mode.
- New firmware support for peer-to-peer mode:
- a new bit in the rdma_init WR to tell it to do peer-2-peer
and what form of RTR message to send or expect.
- process _all_ preposted recvs before moving the connection
into rdma mode.
- passive side: defer completing the rdma_init WR until all
pre-posted recvs are processed. Suspend SQ processing until
the RTR is received.
- active side: expect and process the 0B read WR on offload TX
queue. Defer completing the rdma_init WR until all
pre-posted recvs are processed. Suspend SQ processing until
the 0B read WR is processed from the offload TX queue.
- If peer2peer is set, driver posts 0B read request on offload TX
queue just after posting the rdma_init WR to the offload TX queue.
- Add CQ poll logic to ignore unsolicitied read responses.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
None of these files use any of the functionality promised by
asm/semaphore.h. It's possible that they rely on it dragging in some
unrelated header file, but I can't build all these files, so we'll have
fix any build failures as they come up.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Fix the warning:
drivers/net/cxgb3/cxgb3_main.c: In function ‘offload_open’:
drivers/net/cxgb3/cxgb3_main.c:936: warning: ignoring return value of
‘sysfs_create_group’, declared with attribute warn_unused_result
Now the return value is checked; if sysfs_create_group() returns failure,
a warning is printed using dev_dbg, and the code continues as before. Use
of dev_dbg ensures printk is not needlessly included unless desired for
debugging.
Signed-off-by: Dan Noe <dpn@isomerica.net>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>