A rare case of no link due to a missed interrupt may occur due to a
race condition between acknowledging the IGU via the BAR and restoring the NIG
interrupt mask via the GRC.
To solve it, we wait for the IGU ack command to finish prior to restoring the
NIG interrupt mask.
Signed-off-by: Yaniv Rosner <yaniv.rosner@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is now possible to enable dropless flow control via nvram.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Taking PHY lock is not required on some older designs, but we are removing this
complication and always taking it since it is always required on newer designs
and does not worth the code complication on the older boards.
Taking PHY lock was initially required only on specific boards which had their
MDC/MDIO bus crossed, but since this lock is now always required, for example,
when NCSI is present, the PHY lock will always be taken.
Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces the BCM_CNIC define with a flag which can change at run-time
and which does not use the CONFIG_CNIC kconfig option.
For the PF/hypervisor driver cnic is always supported, however allocation of
cnic resources and configuration of the HW for offload mode is done only when
the cnic module registers bnx2x.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull networking changes from David Miller:
1) GRE now works over ipv6, from Dmitry Kozlov.
2) Make SCTP more network namespace aware, from Eric Biederman.
3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
4) Make openvswitch network namespace aware, from Pravin B Shelar.
5) IPV6 NAT implementation, from Patrick McHardy.
6) Server side support for TCP Fast Open, from Jerry Chu and others.
7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
Borkmann.
8) Increate the loopback default MTU to 64K, from Eric Dumazet.
9) Use a per-task rather than per-socket page fragment allocator for
outgoing networking traffic. This benefits processes that have very
many mostly idle sockets, which is quite common.
From Eric Dumazet.
10) Use up to 32K for page fragment allocations, with fallbacks to
smaller sizes when higher order page allocations fail. Benefits are
a) less segments for driver to process b) less calls to page
allocator c) less waste of space.
From Eric Dumazet.
11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
12) VXLAN device driver, one way to handle VLAN issues such as the
limitation of 4096 VLAN IDs yet still have some level of isolation.
From Stephen Hemminger.
13) As usual there is a large boatload of driver changes, with the scale
perhaps tilted towards the wireless side this time around.
Fix up various fairly trivial conflicts, mostly caused by the user
namespace changes.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
hyperv: Add buffer for extended info after the RNDIS response message.
hyperv: Report actual status in receive completion packet
hyperv: Remove extra allocated space for recv_pkt_list elements
hyperv: Fix page buffer handling in rndis_filter_send_request()
hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
hyperv: Fix the max_xfer_size in RNDIS initialization
vxlan: put UDP socket in correct namespace
vxlan: Depend on CONFIG_INET
sfc: Fix the reported priorities of different filter types
sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
sfc: Fix loopback self-test with separate_tx_channels=1
sfc: Fix MCDI structure field lookup
sfc: Add parentheses around use of bitfield macro arguments
sfc: Fix null function pointer in efx_sriov_channel_type
vxlan: virtual extensible lan
igmp: export symbol ip_mc_leave_group
netlink: add attributes to fdb interface
tg3: unconditionally select HWMON support when tg3 is enabled.
Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
gre: fix sparse warning
...
Pull the trivial tree from Jiri Kosina:
"Tiny usual fixes all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
doc: fix old config name of kprobetrace
fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
btrfs: fix the commment for the action flags in delayed-ref.h
btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
vfs: fix kerneldoc for generic_fh_to_parent()
treewide: fix comment/printk/variable typos
ipr: fix small coding style issues
doc: fix broken utf8 encoding
nfs: comment fix
platform/x86: fix asus_laptop.wled_type module parameter
mfd: printk/comment fixes
doc: getdelays.c: remember to close() socket on error in create_nl_socket()
doc: aliasing-test: close fd on write error
mmc: fix comment typos
dma: fix comments
spi: fix comment/printk typos in spi
Coccinelle: fix typo in memdup_user.cocci
tmiofb: missing NULL pointer checks
tools: perf: Fix typo in tools/perf
tools/testing: fix comment / output typos
...
Move netif_napi_add for all queues from the probe call to the open call, to
avoid the case that napi objects are added for queues that may eventually not
be initialized and activated. With the former behavior, the driver could crash
when netpoll was calling ndo_poll_controller.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the following compiler warnings:
- drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:2908:3: warning: comparison
of distinct pointer types lacks a cast [enabled by default]
- drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c:1709:7: warning: comparison
of distinct pointer types lacks a cast [enabled by default]
Signed-off-by: Joren Van Onder <joren.vanonder@gmail.com>
Acked-By: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The 57840 boards come in two flavours: 2 x 20G and 4 x 10G.
To better differentiate between the two flavours, a separate device ID
was assigned to each.
The silicon default value is still the currently supported 57840 device ID
(0x168d), and since a user can damage the nvram (e.g., 'ethtool -E')
the driver will still support this device ID to allow the user to amend the
nvram back into a supported configuration.
Notice this patch contains lines longer than 80 characters (strings).
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Put the numbers used for stop/resume queue in a single place and
fix the condition for sanity check.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert doxygen (or similar) formatted comments to kernel-doc or
unformatted comment. Delete a few that are content-free.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
l4_rxhash is set on skb when rxhash is obtained from canonical 4-tuple
over transport ports/addresses.
We can set skb->l4_rxhash for all incoming TCP packets on bnx2x for
free, as cqe status contains a hash type information.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. When FCoE offload driver is registered, copy its capabilities to the chip
scratchpad.
2. Copy FCoE/iSCSI MAC addresses in aligned manner to chip scratchpad.
3. Add FCoE/iSCSI statistics collection support
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change updates the date and version of the bnx2x driver.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In multi-function device, allow configuring dcbx admin params from all drivers
on a single physical port.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ethtool -L/-l for setting and getting the number of RSS queues.
The 'combined' field is used as we don't support separate IRQ for Rx and Tx.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves some fields out of the FP structure to different structures, in
order to minimize size of contigiuous memory allocated.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the CNIC-related L2 CIDs (for sending control FCoE / iSCSI packets)
were at fixed position, according to the maximal number of RSS queues multiplied
by the number of traffic-classes. This change makes the CIDs dynamic, as they
are defined to be right after the highest RSS CID. This decreases the memory
allocated for the context.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the current scheme the transmission queues of traffic-class 0 were 0-15, the
transmission queues of traffic-class 1 were 16-31 and so on. If the number of
RSS queues was smaller than 16, there were gaps in transmission queues
numbering, as well as in CIDs numbering. This is both a waste (especially when
16 is increased to 64), and may causes problems with flushing queues when
reducing the number of RSS queues (using ethtool -L). The new scheme eliminates
the gaps.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With increased number of RSS queues, each multiplied by the number of traffic-
classes, we may have up to 64*3=192 CIDs. The current driver scheme with regard
to context allocation supports only 64 CIDs. The new scheme enables scatter-
gatehr list of pages for the context.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. In multi-function device, show only the online tests in self-test results as
only these test are performed (offline tests cannot be performed as they may
corrupt the traffic of other functions on the same physical port). Note that
multi-function mode cannot change while the driver is up.
2. Check result code in NIC load and act accordingly.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change enables to do self-test with external loopback via ethtool.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x driver incorrectly sets ip_summed to CHECKSUM_UNNECESSARY on
encapsulated segments. TCP stack happily accepts frames with bad
checksums, if they are inside a GRE or IPIP encapsulation.
Our understanding is that if no IP or L4 csum validation was done by the
hardware, we should leave ip_summed as is (CHECKSUM_NONE), since
hardware doesn't provide CHECKSUM_COMPLETE support in its cqe.
Then, if IP/L4 checksumming was done by the hardware, set
CHECKSUM_UNNECESSARY if no error was flagged.
Patch based on findings and analysis from Robert Evans
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Cc: Yaniv Rosner <yanivr@broadcom.com>
Cc: Merav Sicron <meravs@broadcom.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Robert Evans <evansr@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes GRO workaround, as issue is fixed in FW 7.2.51.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following patch adds afex multifunction support to the driver (afex
multifunction is based on vntag header) and updates FW version used to 7.2.51.
Support includes the following:
1. Configure vif parameters in firmware (default vlan, vif id, default
priority, allowed priorities) according to values received from NIC.
2. Configure FW to strip/add default vlan according to afex vlan mode.
3. Notify link up to OS only after vif is fully initialized.
4. Support vif list set/get requests and configure FW accordingly.
5. Supply afex statistics upon request from NIC.
6. Special handling to L2 interface in case of FCoE vif.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch revises the way by which rss are configured, removing
an unnecessary module paramater and unrequired modes.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The congestion management code has migrated into a common location,
allowing all fw writes controlling mf congestion to be made in a
single function in the code. This is a semantic change.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Notice this patch includes lines with over 80 characters, as to not
break strings.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, the bnx2x driver needed at least 2 available msix interrupt
vectors in order to use msix. This patch add the possibility of configuring
msix when only one interrupt vector is available.
Notice this patch contains lines with over 80 characters, as it keeps print
strings in a single line.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flow in which the bnx2x driver starts after a previous driver
has been terminated in an 'unclean' manner has several bugs and
FW risks, which makes it possible for the driver to fail after
boot-from-SAN or kdump.
This patch contains a revised flow which performs a safer
initialization, solving the possible crash scenarios.
Notice this patch contains lines with over 80 characters, as it
keeps print-strings in a single line.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously applied patch making the bnx2x statistics consistent
did not apply to old FWs. This remedies it, extending the consistent
behaviour to all drivers.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This includes changes in macros to better distinguish between the two
protocols, and slightly changed the way their macs are set.
Notice this file contains string print lines with more than 80 characters,
as to not break prints.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We've revised driver prints, changing the mask of existing prints
to allow better control over the debug messages, added prints to
error scenarios, removed unnecessary prints and corrected some spelling.
Please note that this patch contains lines with over 80 characters,
as string messages were kept in a single line.
Signed-off-by: Merav Sicron <meravs@broadcom.com>
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch provides workaround for BUG in FW 7.2.16,
which in GRO mode may miscalculate buffer and
place on SGE one frag less than it could.
It may happen only for some MTUs, we mark these MTUs
with gro_check flag during device initialization or
MTU change.
Next FW should include fix for the issue and the
patch could be reverted.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch integrates FW 7.2.16 HSI and implements driver
part of GRO flow.
FW 7.2.16 adds the ability to aggregate packets for GRO
(and not just LRO) and also fixes some bugs.
1. Added new aggregation mode: GRO. In this mode packets are aggregated
such that the original packets can be reconstructed by the OS.
2. 57712 HW bug workaround - initialized all CAM TM registers to 0x32.
3. Adding the FCoE statistics structures to the BNX2X HSI.
4. Wrong configuration of TX HW input buffer size may cause theoretical
performance effect. Performed configuration fix.
5. FCOE - Arrival of packets beyond task IO size can lead to crash.
Fix firmware data-in flow.
6. iSCSI - In rare cases of on-chip termination the graceful termination
timer hangs, and the termination doesn't complete. Firmware fix to MSL
timer tolerance.
7. iSCSI - Chip hangs when target sends FIN out-of-order or with isles
open at the initiator side. Firmware implementation corrected to drop
FIN received out-of-order or with isles still open.
8. iSCSI - Chip hangs when in case of retransmission not aligned to 4-bytes
from the beginning of iSCSI PDU. Firmware implementation corrected
to support arbitrary aligned retransmissions.
9. iSCSI - Arrival of target-initiated NOP-IN during intense ISCSI traffic
might lead to crash. Firmware fix to relevant flow.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 8304859a "bnx2x: add fan failure event handling" made the function
bnx2x_close() non-static unnecessarily. The function is not called from
other sources. Make it static again.
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently bnx2x statistics are reset by inner driver reload, e.g. by MTU
change. This patch fixes this issue - from now on statistics should only
be reset upon device closure.
Thanks to Michal Schmidt <mschmidt@redhat.com> for his initial patch
regarding this issue.
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
1. Sample mcp pulse and mcp sequence in nic load instead of in init_one
as they may change by the time we want to use them.
2. Allow cnic to access device during nic load (by adding a new "LOADING" state
to recovery flow). This prevents the unnecessary cnic timeout which resulted
by cnic attempting to access because nic is loading, but being blocked because
of the Recovery state.
3. Issue 'fake' driver load command to mcp when last driver unloads to prevent
mcp from taking ownership. When recovery is complete unload fake driver to
allow mcp to initialize the hardware before first driver loads.
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support to send driver capabilities, settings and statistics to
management firmware.
[ Redone using many local variables, removed many unnecessary inlines,
and put #defines at the left margin suggested by Joe Perches ]
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add FCoE statistics support for FCoE capable devices.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add Priority flow control counters for ethtool -S.
Signed-off-by: Barak Witkowski <barak@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
in iSCSI SD mode to bnx2x device assigned single mac address
which is supposted to be iscsi mac. If this mode is recognized
bnx2x will disable LRO, decrease number of queues to 1 and rx ring
size to the minumum allowed by FW, this in order minimize memory use.
It will tranfer mac for iscsi usage and zero primary mac of the netdev.
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x uses following formula to compute its rx_buf_sz :
dev->mtu + 2*L1_CACHE_BYTES + 14 + 8 + 8 + 2
Then core network adds NET_SKB_PAD and SKB_DATA_ALIGN(sizeof(struct
skb_shared_info))
Final allocated size for skb head on x86_64 (L1_CACHE_BYTES = 64,
MTU=1500) : 2112 bytes : SLUB/SLAB round this to 4096 bytes.
Since skb truesize is then bigger than SK_MEM_QUANTUM, we have lot of
false sharing because of mem_reclaim in UDP stack.
One possible way to half truesize is to reduce the need by 64 bytes
(2112 -> 2048 bytes)
Instead of allocating a full cache line at the end of packet for
alignment, we can use the fact that skb_shared_info sits at the end of
skb->head, and we can use this room, if we convert bnx2x to new
build_skb() infrastructure.
skb_shared_info will be initialized after hardware finished its
transfert, so we can eventually overwrite the final padding.
Using build_skb() also reduces cache line misses in the driver, since we
use cache hot skb instead of cold ones. Number of in-flight sk_buff
structures is lower, they are recycled while still hot.
Performance results :
(820.000 pps on a rx UDP monothread benchmark, instead of 720.000 pps)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Eilon Greenstein <eilong@broadcom.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
CC: Tom Herbert <therbert@google.com>
CC: Jamal Hadi Salim <hadi@mojatatu.com>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Thomas Graf <tgraf@infradead.org>
CC: Herbert Xu <herbert@gondor.apana.org.au>
CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>