Replace kfree_skb with dev_consume_skb_any in free_tx_desc that can be
called in hard irq and other contexts. dev_consume_skb_any is used
as this function consumes successfully transmitted skbs.
Replace dev_kfree_skb with dev_kfree_skb_any in t4_eth_xmit that can
be called in hard irq and other contexts, on paths that drop the skb.
Replace dev_kfree_skb with dev_consume_skb_any in t4_eth_xmit that can
be called in hard irq and other contexts, on paths that successfully
transmit the skb.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The current logic suffers from a slow response time to disable user DB
usage, and also fails to avoid DB FIFO drops under heavy load. This commit
fixes these deficiencies and makes the avoidance logic more optimal.
This is done by more efficiently notifying the ULDs of potential DB
problems, and implements a smoother flow control algorithm in iw_cxgb4,
which is the ULD that puts the most load on the DB fifo.
Design:
cxgb4:
Direct ULD callback from the DB FULL/DROP interrupt handler. This allows
the ULD to stop doing user DB writes as quickly as possible.
While user DB usage is disabled, the LLD will accumulate DB write events
for its queues. Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count. This reduces the
load put on the DB fifo when reenabling.
iw_cxgb4:
Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps. This allows
iw_cxgb4 to only set this single bit to disable all DB writes for all
user QPs vs traversing the idr of all the active QPs. If the libcxgb4
doesn't support this, then we fall back to the old approach of marking
each QP. Thus we allow the new driver to work with an older libcxgb4.
When the LLD upcalls iw_cxgb4 indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED. As user
processes see that DB writes are disabled, they call into iw_cxgb4
to submit their DB write events. Since the DB state is in STOPPED,
the QP trying to write gets enqueued on a new DB "flow control" list.
As subsequent DB writes are submitted for this flow controlled QP, the
amount of writes are accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request are accumulated.
When the LLD upcalls iw_cxgb4 indicating DB EMPTY, which is in a workq
context, we change the DB state to FLOW_CONTROL, and begin resuming all
the QPs that are on the flow control list. This logic runs on until
the flow control list is empty or we exit FLOW_CONTROL mode (due to
a DB DROP upcall, for example). QPs are removed from this list, and
their accumulated DB write counts written to the DB FIFO. Sets of QPs,
called chunks in the code, are removed at one time. The chunk size is 64.
So 64 QPs are resumed at a time, and before the next chunk is resumed, the
logic waits (blocks) for the DB FIFO to drain. This prevents resuming to
quickly and overflowing the FIFO. Once the flow control list is empty,
the db state transitions back to NORMAL and user QPs are again allowed
to write directly to the user DB register.
The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops. As the load lightens though, we
resume to normal DB writes directly by user applications.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0034b29 ("cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()")
introduced a regression where-in length was calculated wrongly for LSO path,
causing chip hangs.
So, correct the calculation of len.
Fixes: 0034b29 ("cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()")
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and the
extant logic would flag that as an error.
Based on original work by Casey Leedom <leedom@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Also, modify is_eth_imm() to return header length so it doesn't
have to be recomputed in calc_tx_flits().
Based on original work by Mike Werner <werner@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The LE workaround in firmware is not atomic and fw_ofld_connection_wrs must not interleave.
Therefore, when the workaround is enabled, we need to send all ctrlq WRs on a single ctrl queue.
Based on original work by Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c
net/ipv6/ip6_tunnel.c
net/ipv6/ip6_vti.c
ipv6 tunnel statistic bug fixes conflicting with consolidation into
generic sw per-cpu net stats.
qlogic conflict between queue counting bug fix and the addition
of multiple MAC address support.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 52367a763d
("cxgb4/cxgb4vf: Code cleanup to enable T4 Configuration File support"),
we have failures like this during cxgb4 probe:
cxgb4 0000:01:00.4: bad SGE FL page buffer sizes [65536, 65536]
cxgb4: probe of 0000:01:00.4 failed with error -22
This happens whenever software parameters are used, without a
configuration file. That happens when the hardware was already
initialized (after kexec, or after csiostor is loaded).
It happens that these values are acceptable, rendering fl_pg_order equal
to 0, which is the case of a hard init when the page size is equal or
larger than 65536.
Accepting fl_large_pg equal to fl_small_pg solves the issue, and
shouldn't cause any trouble besides a possible performance reduction
when smaller pages are used. And that can be fixed by a configuration
file.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers should call skb_set_hash to set the hash and its type
in an skbuff.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This corrects an regression introduced by "net: Use 16bits for *_headers
fields of struct skbuff" when NET_SKBUFF_DATA_USES_OFFSET is not set. In
that case skb->tail will be a pointer whereas skb->transport_header
will be an offset from head. This is corrected by using wrappers that
ensure that comparisons and calculations are always made using pointers.
Cc: Simon Horman <horms@verge.net.au>
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a protocol argument to the VLAN packet tagging functions. In case of HW
tagging, we need that protocol available in the ndo_start_xmit functions,
so it is stored in a new field in the skb. The new field fits into a hole
(on 64 bit) and doesn't increase the sks's size.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements a low latency Write Combining (aka Write Coalescing) work
request path. PCIE maps User Space Doorbell BAR2 region writes to the new
interface to SGE. SGE pulls a new message from PCIE new interface and if its a
coalesced write work request then pushes it for processing. This patch copies
coalesced work request to memory mapped BAR2 space.
Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch contains updates to firmware/hardware header files shared
between csiostor and cxgb4/cxgb4vf, and the resulting changes to the
cxgb4/cxgb4vf source files.
Signed-off-by: Naresh Kumar Inna <naresh@chelsio.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This patch adds new enums and macros to enable T4 configuration file support. It
also removes duplicate macro definitions.
It fixes the build failure in cxgb4vf driver introduced because of old macro
definition removal.
It also performs SGE initialization based on T4 configuration file is provided
or not. If it is provided then it uses the parameters provided in it otherwise
it uses hard coded values.
Signed-off-by: Jay Hernandez <jay@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removed duplicate definition for SGE_PF_KDOORBELL, SGE_INT_ENABLE3,
PCIE_MEM_ACCESS_OFFSET registers.
Moved the register field definitions around the register definition.
Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Reviewed-by: Sivakumar Subramani <sivasu@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The skb->pfmemalloc flag gets set to true iff during the slab allocation
of data in __alloc_skb that the the PFMEMALLOC reserves were used. If
page splitting is used, it is possible that pages will be allocated from
the PFMEMALLOC reserve without propagating this information to the skb.
This patch propagates page->pfmemalloc from pages allocated for fragments
to the skb.
It works by reintroducing and expanding the skb_alloc_page() API to take
an skb. If the page was allocated from pfmemalloc reserves, it is
automatically copied. If the driver allocates the page before the skb, it
should call skb_propagate_pfmemalloc() after the skb is allocated to
ensure the flag is copied properly.
Failure to do so is not critical. The resulting driver may perform slower
if it is used for swap-over-NBD or swap-over-NFS but it should not result
in failure.
[davem@davemloft.net: API rename and consistency]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Neil Brown <neilb@suse.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.
For example, this cast:
int y;
int *p = (int *)&y;
I used the coccinelle script below to find and remove these
unnecessary casts. I manually removed the conversions this
script produces of casts with __force, __iomem and __user.
@@
type T;
T *p;
@@
- (T *)p
+ p
A function in atl1e_main.c was passed a const pointer
when it actually modified elements of the structure.
Change the argument to a non-const pointer.
A function in stmmac needed a __force to avoid a sparse
warning. Added it.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
recover LLD EQs for DB drop interrupts. This includes adding a new
db_lock, a spin lock disabling BH too, used by the recovery thread and
the ring_tx_db() paths to allow db drop recovery.
Clean up initial DB avoidance code.
Add read_eq_indices() - this allows the LLD to use the PCIe mw to
efficiently read hw eq contexts.
Add cxgb4_sync_txq_pidx() - called by iw_cxgb4 to sync up the sw/hw
pidx value.
Add flush_eq_cache() and cxgb4_flush_eq_cache(). This allows iw_cxgb4
to flush the sge eq context cache before beginning db drop recovery.
Add module parameter, dbfoifo_int_thresh, to allow tuning the db
interrupt threshold value.
Add dbfifo_int_thresh to cxgb4_lld_info so iw_cxgb4 knows the threshold.
Add module parameter, dbfoifo_drain_delay, to allow tuning the amount
of time delay between DB FULL and EMPTY upcalls to iw_cxgb4.
Signed-off-by: Vipul Pandya <vipul@chelsio.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Given we dont use anymore the struct net_device *dev argument, and this
interface brings litle benefit, remove netdev_{alloc|free}_page(), to
debloat include/linux/skbuff.h a bit.
(Some drivers used a mix of these interfaces and alloc_pages())
When allocating a page given to device for DMA transfer (device to
memory), it makes sense to use a cold one (__GFP_COLD)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
CC: Dimitris Michailidis <dm@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These were getting the macros from an implicit module.h
include via device.h, but we are planning to clean that up.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
drivers/net: Add export.h to wireless/brcm80211/brcmfmac/bcmsdh.c
This relatively recently added file uses EXPORT_SYMBOL and hence
needs export.h included so that it is compatible with the module.h
split up work.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Dimitris Michailidis <dm@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.
Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Moves the drivers for the Chelsio chipsets into
drivers/net/ethernet/chelsio/ and the necessary Kconfig and Makefile
changes.
CC: Divy Le Ray <divy@chelsio.com>
CC: Dimitris Michailidis <dm@chelsio.com>
CC: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>