Add support for displaying and modifying rx and tx ring sizes using
ethtool.
Also, increasing version to 2.3.0.45
Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we are allowing rx ring size modification, reset fetch index
everytime. Otherwise it could have a stale value that can lead to a null
pointer dereference.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: Parvi Kaustubhi <pkaustub@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setup_timer function instead of initializing timer with the
function and data fields.
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use setup_timer function instead of initializing timer with the
function and data fields.
Signed-off-by: Allen Pais <allen.lkml@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
if 'ioread32()' returns 0xFFFFFFF, we have to go through the error
handling path as done everywhere else in this function.
Move the 'err_free_wq' label to better match its name and its location
and add a new label 'err_disable_wq'.
Update the code accordingly.
Fixes: 373fb0873d ("enic: add devcmd2")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
With -Wformat-truncation, gcc throws the following warning.
Fix this by increasing the size of devname to accommodate 15 character
netdev interface name and description.
Remove length format precision for %s. We can fit entire name.
Also increment the version.
drivers/net/ethernet/cisco/enic/enic_main.c: In function ‘enic_open’:
drivers/net/ethernet/cisco/enic/enic_main.c:1740:15: warning: ‘%u’ directive output may be truncated writing between 1 and 2 bytes into a region of size between 1 and 12 [-Wformat-truncation=]
"%.11s-rx-%u", netdev->name, i);
^~
drivers/net/ethernet/cisco/enic/enic_main.c:1740:5: note: directive argument in the range [0, 16]
"%.11s-rx-%u", netdev->name, i);
^~~~~~~~~~~~~
drivers/net/ethernet/cisco/enic/enic_main.c:1738:4: note: ‘snprintf’ output between 6 and 18 bytes into a destination of size 16
snprintf(enic->msix[intr].devname,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
sizeof(enic->msix[intr].devname),
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
"%.11s-rx-%u", netdev->name, i);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case of busy poll, napi_complete_done returns false and does not
dequeue napi. In this case do not unmask the intr. We are guaranteed
napi is called again. This reduces unnecessary iowrites.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define ndo_features_check. Hw supports offload only for ipv4 inner and
ipv4 outer pkt.
Code refactor for setting inner tcp pseudo csum.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Defines enic_udp_tunnel_add/del for configuring vxlan tunnel offload.
enic supports offload of only one ipv4/udp port.
There are two modes that fw supports for vxlan offload.
mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
if checksum of csum is ok. This bit is or of ip_csum_ok and
tcp_udp_csum_ok
mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
BIT(1) is set if outer_ip_csum_ok/
BIT(2) is set if outer_tcp_csum_ok
tcp_udp_csum_ok/ipv4_csum_ok is set if inner csum is OK.
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds devcmds needed for vxlan offload. Implement 3 new devcmd
overlay_offload_ctrl: enable/disable offload
overlay_offload_cfg: update offload udp port number
get_supported_feature_ver: get hw supported offload version. Each
version has different bitmap for csum_ok/encap
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
napi_complete_done() allows to opt-in for gro_flush_timeout,
added back in linux-3.19, commit 3b47d30396
("net: gro: add a per device gro flush timer")
This allows for more efficient GRO aggregation without
sacrifying latencies.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The network device operation for reading statistics is only called
in one place, and it ignores the return value. Having a structure
return value is potentially confusing because some future driver could
incorrectly assume that the return value was used.
Fix all drivers with ndo_get_stats64 to have a void function.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver sets the skb l4/l3 hash based on NIC_CFG_RSS_HASH_TYPE_*,
which is bit mask. This is wrong. Hw actually provides us enum.
Use CQ_ENET_RQ_DESC_RSS_TYPE_* to set l3 and l4 hash type.
Fixes: bf751ba802 ("driver/net: enic: record q_number and rss_hash for skb")
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mostly simple overlapping changes.
For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
When MTU is changed from 9000 to 1500 while there is burst of inbound 9000
bytes packets, adaptor sometimes delivers 9000 bytes packets to 1500 bytes
buffers. This causes memory corruption and sometimes crash.
This is because of a race condition in adaptor between "RQ disable"
clearing descriptor mini-cache and mini-cache valid bit being set by
completion of descriptor fetch. This can result in stale RQ desc being
cached and used when packets arrive. In this case, the stale descriptor
have old MTU value.
Solution is to write RQ->disable twice. The first write will stop any
further desc fetches, allowing the second disable to clear the mini-cache
valid bit without danger of a race.
Also, the check for rq->running becoming 0 after writing rq->enable to 0
is not done properly. When incoming packets are flooding the interface,
rq->running will pulse high for each dropped packet. Since the driver was
waiting for 10us between each poll, it is possible to see rq->running = 1
1000 times in a row, even though it is not actually stuck running.
This results in false failure of vnic_rq_disable(). Fix is to try more
than 1000 time without delay between polls to ensure we do not miss when
running goes low.
In old adaptors rq->enable needs to be re-written to 0 when posted_index
is reset in vnic_rq_clean() in order to keep rq->prefetch_index in sync.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
i is defined as unsigned.
So print it with %u.
Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ethtool api {get|set}_settings is deprecated.
We move the enic driver to new api {get|set}_link_ksettings.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver sets vlan_feature to netdev->features as hardware supports all of
them on vlan interface.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't hide varibles used by the logging macros.
Miscellanea:
o Use the more common ##__VA_ARGS__ extension
o Add missing newlines to formats
o Realign arguments
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Firmware posts the devcmd result in result ring. In case of timeout, driver
does not increment the current result pointer and firmware could post the
result after timeout has occurred. During next devcmd, driver would be
reading the result of previous devcmd.
Fix this by incrementing result even in case of timeout.
Fixes: 373fb0873d ("enic: add devcmd2")
Signed-off-by: Sandeep Pillai <sanpilla@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
NAPI drivers no longer need to observe a particular protocol
to benefit from busy polling (CONFIG_NET_RX_BUSY_POLL=y)
napi_hash_add() and napi_hash_del() are automatically called
from core networking stack, respectively from
netif_napi_add() and netif_napi_del()
This patch depends on free_netdev() and netif_napi_del() being
called from process context, which seems to be the norm.
Drivers might still prefer to call napi_hash_del() on their
own, since they might combine all the rcu grace periods into
a single one, knowing their NAPI structures lifetime, while
core networking stack has no idea of a possible combining.
Once this patch proves to not bring serious regressions,
we will cleanup drivers to either remove napi_hash_del()
or provide appropriate rcu grace periods combining.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The affinity hint is used by the user space daemon, irqbalancer, to
indicate a preferred CPU mask for irqs. This patch sets the irq affinity
hint to local numa core first, when exausted we try non-local numa cores.
Also set tx xps cpus mask bassed on affinity hint.
v2: remove the global affinity policy.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code invokes hang reset in case of error interrupt. We should
hang reset only in case of tx timeout. This because of the way hang reset
is implemented in firmware. Hang reset takes more firmware resources than
soft reset. Adaptor does not generate error interrupt in case of tx
timeout.
Hang reset only in case of tx timeout, in .ndo_tx_timeout. Do soft reset
otherwise. Introduce deferred work, enic_tx_hang_reset, to do hang reset.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the enic adaptors are know to generate spurious interrupts. When
error interrupt is generated, driver just resets the device. This patch
resets the device only when an error is occurred.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The flags argument will allow control of the dissection process (for
instance whether to parse beyond L3).
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
posted_index is RO in firmware. We need not do ioread everytime to get
posted index. Store posted index locally.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
err_out_vnic_unregister is used regardless of whether
SRIOV is enabled or not.
Reported-by: Jesse Brandeburg <jesse.brangeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/built-in.o: In function `.vnic_wq_devcmd2_alloc':
(.text+0x49fe40): multiple definition of `.vnic_wq_devcmd2_alloc'
drivers/scsi/built-in.o:(.text+0xb4318): first defined here
drivers/net/built-in.o:(.opd+0x2af00): multiple definition of `vnic_wq_devcmd2_alloc'
drivers/scsi/built-in.o:(.opd+0xad70): first defined here
drivers/net/built-in.o: In function `.vnic_wq_init_start':
(.text+0x49f9c0): multiple definition of `.vnic_wq_init_start'
drivers/scsi/built-in.o:(.text+0xb3b58): first defined here
drivers/net/built-in.o:(.opd+0x2ae88): multiple definition of `vnic_wq_init_start'
drivers/scsi/built-in.o:(.opd+0xace0): first defined here
Rename these to 'enic_*' to avoid the conflict with the functiosn of
the same name in the snic scsi driver.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
>> drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: sparse: incorrect type in assignment (different address spaces)
drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: expected void *res
drivers/net/ethernet/cisco/enic/vnic_dev.c:1095:13: got void [noderef] <asn:2>*
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
devcmd is an interface for driver to communicate with fw/adaptor. It
involves writing data to hardware registers and waiting for the result.
This mechanism does not scale well. The queuing of "no wait" devcmds is
done in firmware memory rather than on the host. Firmware memory is a
rather more scarce and valuable resource than host memory. A devcmd storm
from one vf can disrupt the service on other pf/vf. The lack of flow
control allows for possible denial of server from one VM to another.
Devcmd2 uses work queue to post the devcmds, just like tx work queue. This
allows better flow control.
Initialize devcmd2, if fails we fall back to devcmd1.
Also change the driver version.
Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devcmd resources to vnic_res_type. Add data types used by devcmd.
Signed-off-by: N V V Satyanarayana Reddy <nalreddy@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
pr_info does not give any details about the interface involved. This patch
uses netdev_info for printing the message. Use dev_info where netdev is not
ready.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the structure definitions are in .c file to make them private to
that file. This patch moves the struct definition to .h file, So that their
definitions are accessible from other files.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Allow setting of adaptive coalescing setting for all types of interrupt.
* In msi & legacy intr, we use single interrupt for rx & tx. In this case
tx_coalesce_usecs is invalid. We should use only rx_coalesce_usecs.
Do not display tx_coal values for msi/intx. And do not allow user to set
this as well.
* Driver supports only tx/rx_coalesce_usec and adaptive coalesce settings.
For other values, driver does not return error. So ethtool succeeds for
unsupported values. Introduce enic_coalesce_valid() function to validate
the coalescing values.
* If user requests for coalesce value greater than what adaptor supports,
driver uses the max value. We should at least log this.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adaptive interrupt coalescing is available for msix. This patch adds the support
for msi poll. Interface for adaptive interrupt coalescing is already added in
driver. We just did not enable it for legacy intr & msi.
enic_calc_int_moderation() & enic_set_int_moderation() are defined as static
after enic_poll. Since enic_poll needs it, move both of these function
definitions above enic_poll. No change in functionality.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In enic_poll, we clean tx and rx queues, when low latency busy socket polling
is happening, enic_poll will only clean tx queue. After cleaning tx, it should
return total budget for re-poll.
There is a small window between vnic_intr_unmask() and enic_poll_unlock_napi().
In this window if an irq occurs and napi is scheduled on different cpu, it tries
to acquire enic_poll_lock_napi() and fails. Unlock napi_poll before unmasking
the interrupt.
v2:
Do not change tx wonk done behaviour. Consider only rx work done for completing
napi.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We use spinlock to access a single flag. We can avoid spin_locks by using
atomic variable and atomic_cmpxchg(). Use atomic_cmpxchg to set the flag
for idle to poll. And a simple atomic_set to unlock (set idle from poll).
In napi poll, if gro is enabled, we call napi_gro_receive() to deliver the
packets. Before we call napi_complete(), i.e while re-polling, if low
latency busy poll is called, we use netif_receive_skb() to deliver the packets.
At this point if there are some skb's held in GRO, busy poll could deliver the
packets out of order. So we call napi_gro_flush() to flush skbs before we
move the napi poll to idle.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This howto made sense in the 1990s when users had to manually configure
ISA cards with jumpers or vendor utilities, but with the implementation
of PCI it became increasingly less and less relevant, to the point where
it has been well over a decade since I last updated it. And there is
no value in anyone else taking over updating it either.
However the references to it continue to spread as boiler plate text
from one Kconfig file into the next. We are not doing end users any
favours by pointing them at this old document, so lets kill it with
fire, once and for all, to hopefully stop any further spread.
No code is changed in this commit, just Kconfig help text.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We do not check the return value of enic_dev_stats_dump(). If allocation
fails, we will hit NULL pointer reference.
Return only if memory allocation fails. For other failures, we return the
previously recorded values.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds full IPv6 addresses into flow_keys and uses them as
input to the flow hash function. The implementation supports either
IPv4 or IPv6 addresses in a union, and selector is used to determine
how may words to input to jhash2.
We also add flow_get_u32_dst and flow_get_u32_src functions which are
used to get a u32 representation of the source and destination
addresses. For IPv6, ipv6_addr_hash is called. These functions retain
getting the legacy values of src and dst in flow_keys.
With this patch, Ethertype and IP protocol are now included in the
flow hash input.
Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
drivers/net/ethernet/rocker/rocker.c
The rocker commit was two overlapping changes, one to rename
the ->vport member to ->pport, and another making the bitmask
expression use '1ULL' instead of plain '1'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the built-in function instead of memset.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should complete notify_check before returning the credits. Once we return the
credits, adaptor may access the notify data.
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch fixes following sparse warnings:
enic_main.c:92:28: warning: symbol 'mod_table' was not declared. Should it be static?
enic_main.c:109:28: warning: symbol 'mod_range' was not declared. Should it be static?
enic_main.c:1306:5: warning: symbol 'enic_busy_poll' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
this patch fixes following sparse warning:
enic_ethtool.c:95:6: warning: symbol 'enic_intr_coal_set_rx' was not declared. Should it be static?
Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Conflicts:
arch/arm/boot/dts/imx6sx-sdb.dts
net/sched/cls_bpf.c
Two simple sets of overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
With the commit d75b1ade56 ("net: less interrupt masking in NAPI") napi repoll
is done only when work_done == budget. When we are in busy_poll we return 0 in
napi_poll. We should return budget.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The same macros are used for rx as well. So rename it.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
When running in kdump kernel, reduce number of resources used by the driver.
This will enable NIC to operate in low memory kdump kernel environment.
Also change the driver version to .83
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When allocation of all RQs fail, we do not free previously allocated buffers,
before returning error. This causes memory leak.
This patch fixes this by calling vnic_rq_clean(), which frees all the rq
buffers.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Removes some functions that are not used anywhere:
enic_dev_enable2_done() enic_dev_enable2() enic_dev_deinit_done()
enic_dev_init_prov2() enic_vnic_dev_deinit()
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds generic statistics for enic. As of now dma_map_error is the only
member. dma_map_erro is incremented every time dma maping error happens.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch checks for pci_dma_mapping_error() after dma mapping the data.
If the dma mapping fails we remove the previously queued frags and return
NETDEV_TX_OK.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes vnic_wq_buf doubly liked list. This is needed for dma_mapping
error check, in case some frag's dma map fails, we need to move back and remove
previously queued buffers.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for setting/getting rss hash key using ethtool.
v2:
respin patch to support RSS hash function changes.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use of well known RSS key might increase attack surface.
Switch to a random one, using generic helper so that all
ports share a common key.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Govindarajulu Varadarajan <_govind@gmx.com>
Cc: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All the access to wq has been moved out of hardirq context. We no longer need to
use spin_lock_irqsave.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
enic_isr_legacy(), enic_isr_msix() & enic_isr_msi() run from hard
interrupt context.
They can use napi_schedule_irqoff() instead of napi_schedule()
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check and update posted_index only when skb->xmit_more is 0 or tx queue is full.
v2:
use txq_map instead of skb_get_queue_mapping(skb)
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With the commit d75b1ade56 ("net: less interrupt masking in NAPI") napi repoll
is done only when work_done == budget. In tx napi poll we always return 0.
So tx napi is not called again and we do not clean up the tx ring.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The following warning is shown when spinlock debug is enabled.
This occurs when enic_flow_may_expire timer function is running and
enic_stop is called on same CPU.
Fix this by using spink_lock_bh().
=================================
[ INFO: inconsistent lock state ]
3.17.0-netnext-05504-g59f35b8 #268 Not tainted
---------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
ifconfig/443 [HC0[0]:SC0[0]:HE1:SE1] takes:
(&(&enic->rfs_h.lock)->rlock){+.?...}, at:
enic_rfs_flw_tbl_free+0x34/0xd0 [enic]
{IN-SOFTIRQ-W} state was registered at:
[<ffffffff810a25af>] __lock_acquire+0x83f/0x21c0
[<ffffffff810a45f2>] lock_acquire+0xa2/0xd0
[<ffffffff814913fc>] _raw_spin_lock+0x3c/0x80
[<ffffffffa029c3d5>] enic_flow_may_expire+0x25/0x130[enic]
[<ffffffff810bcd07>] call_timer_fn+0x77/0x100
[<ffffffff810bd8e3>] run_timer_softirq+0x1e3/0x270
[<ffffffff8105f9ae>] __do_softirq+0x14e/0x280
[<ffffffff8105fdae>] irq_exit+0x8e/0xb0
[<ffffffff8103da0f>] smp_apic_timer_interrupt+0x3f/0x50
[<ffffffff81493742>] apic_timer_interrupt+0x72/0x80
[<ffffffff81018143>] default_idle+0x13/0x20
[<ffffffff81018a6a>] arch_cpu_idle+0xa/0x10
[<ffffffff81097676>] cpu_startup_entry+0x2c6/0x330
[<ffffffff8103b7ad>] start_secondary+0x21d/0x290
irq event stamp: 2997
hardirqs last enabled at (2997): [<ffffffff81491865>] _raw_spin_unlock_irqrestore+0x65/0x90
hardirqs last disabled at (2996): [<ffffffff814915e6>] _raw_spin_lock_irqsave+0x26/0x90
softirqs last enabled at (2968): [<ffffffff813b57a3>] dev_deactivate_many+0x213/0x260
softirqs last disabled at (2966): [<ffffffff813b5783>] dev_deactivate_many+0x1f3/0x260
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&(&enic->rfs_h.lock)->rlock);
<Interrupt>
lock(&(&enic->rfs_h.lock)->rlock);
*** DEADLOCK ***
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the much more common pr_warn instead of pr_warning.
Other miscellanea:
o Typo fixes submiting/submitting
o Coalesce formats
o Realign arguments
o Add missing terminating '\n' to formats
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for setting/getting rx_copybreak using
generic ethtool tunable.
Defines enic_get_tunable() & enic_set_tunable() to get/set rx_copybreak.
As of now, these two function supports only rx_copybreak.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calling dma_map_single()/dma_unmap_single() is quite expensive compared
to copying a small packet. So let's copy short frames and keep the buffers
mapped.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to
meet kernel coding style guidelines. This issue was reported by checkpatch.
A simplified version of the semantic patch that makes this change is as
follows (http://coccinelle.lip6.fr/):
// <smpl>
@@
identifier i;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer z;
@@
- DEFINE_PCI_DEVICE_TABLE(i)
+ const struct pci_device_id i[]
= z;
// </smpl>
[bhelgaas: add semantic patch]
Signed-off-by: Benoit Taine <benoit.taine@lip6.fr>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Remove the now unnecessary memset too.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Sujith Sankar <ssujith@cisco.com>
Cc: Govindarajulu Varadarajan <_govind@gmx.com>
Cc: Neel Patel <neepatel@cisco.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch impliments ethtool_ops->get_rxnfc() to display the classifier
filter added by the driver.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch removes the #ifdef CONFIG_RFS_ACCEL around the classifier filter
structures. This makes the filter structures available when CONFIG_RFS_ACCEL = n.
Introduce enic_rfs_timer_start() & enic_rfs_timer_stop() to start/stop the
timer. These two functions are nop when CONFIG_RFS_ACCEL = n.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
enic_set_coalesce() has two problems.
* It should return -EINVAL and not -EOPNOTSUPP for invalid coalesce values.
* In case of MSIX, enic_set_coalesce return error after applying requested
coalescing setting partially. We should either apply all the setting requeste
and return success or apply non and return error.
* This patch also simplifies the algo.
This was introduced by
'7c2ce6e60f703 enic: Add support for adaptive interrupt coalescing'
These changes were suggested by Ben Hutchings here
http://www.spinics.net/lists/netdev/msg283972.html
Also change enic driver version.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If CONFIG_RFS_ACCEL=n:
drivers/net/ethernet/cisco/enic/enic_main.c: In function 'enic_open':
drivers/net/ethernet/cisco/enic/enic_main.c:1603:2: error: implicit declaration of function 'enic_rfs_flw_tbl_init' [-Werror=implicit-function-declaration]
drivers/net/ethernet/cisco/enic/enic_main.c: In function 'enic_stop':
drivers/net/ethernet/cisco/enic/enic_main.c:1630:2: error: implicit declaration of function 'enic_rfs_flw_tbl_free' [-Werror=implicit-function-declaration]
Introduced in commit a145df23ef ("enic: Add
Accelerated RFS support").
Dummy functions are provided, but their prototypes are missing, causing the
build failure. Provide dummy static inline functions instead to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Till now enic had been doing tx clean in isr.
Using napi infrastructure to move the tx clean up out of isr to softirq.
Now, wq isr schedules napi poll. In enic_poll_msix_wq we clean up the tx queus.
This is applicable only on MSIX. In INTx and MSI we use single napi to clean
both rx & tx queues.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for low latency busy_poll.
* Introduce drivers ndo_busy_poll function enic_busy_poll, which is called by
socket waiting for data.
* Introduce locking between napi_poll nad busy_poll
* enic_busy_poll cleans up all the rx pkts possible. While in busy_poll, rq
holds the state ENIC_POLL_STATE_POLL. While in napi_poll, rq holds the state
ENIC_POLL_STATE_NAPI.
* in napi_poll we return if we are in busy_poll. Incase of INTx & msix, we just
service wq and return if busy_poll is going on.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We were experiencing occasional "BUG: scheduling while atomic" splats
in our testing. Enabling DEBUG_SPINLOCK and DEBUG_LOCKDEP in the kernel
exposed a lockdep in the enic driver.
enic 0000:0b:00.0 eth2: Link UP
======================================================
[ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
3.12.0-rc1.x86_64-dbg+ #2 Tainted: GF W
------------------------------------------------------
NetworkManager/4209 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
(&(&enic->devcmd_lock)->rlock){+.+...}, at: [<ffffffffa026b7e4>] enic_dev_packet_filter+0x44/0x90 [enic]
The fix was to replace spin_lock with spin_lock_bh for the enic
devcmd_lock, so that soft irqs would be disabled while the lock
is held.
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds supports for Accelerated Receive Flow Steering.
When the desired rx is different from current rq, for a flow, kernel calls the
driver function enic_rx_flow_steer(). enic_rx_flow_steer adds a IP-TCP/UDP
hardware filter.
Driver registers a timer function enic_flow_may_expire. This function is called
every HZ/4 seconds. In this function we check if the added filter has expired
by calling rps_may_expire_flow(). If the flow has expired, it removes the hw
filter.
As of now adaptor supports only IPv4 - TCP/UDP filters.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
rx_cpu_rmap provides the reverse irq cpu affinity. This patch allocates and
sets drivers netdev->rx_cpu_rmap accordingly.
rx_cpu_rmap is set in enic_request_intr() which is called by enic_open and
rx_cpu_rmap is freed in enic_free_intr() which is called by enic_stop.
This is used by Accelerated RFS.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds interface to add and delete IP 5 tuple filter. This interface
is used by Accelerated RFS code to steer a flow to corresponding receive
queue.
As of now adaptor supports only ipv4 + tcp/udp packet steering.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Hardware (in readq(&devcmd->args[0])) returns positive number in case of error.
But _vnic_dev_cmd should return a negative value in case of error.
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This change updates the enic driver to make use of __dev_uc_sync and
__dev_mc_sync calls. Previously the driver was doing its own list
management by storing the mc_addr and uc_addr list in a 32 address array.
With this change the sync data is stored in the netdev_addr_list structures
and instead we just track how many addresses we have written to the device.
When we encounter 32 we stop and print a message as occurred previously with
the old approach.
Other than the core change the only other bit needed was to propagate the
constant attribute with the MAC address as there were several spots where
is twas only passed as a u8 * instead of a const u8 *.
This patch is meant to maintain the original functionality without the use
of the mc_addr and uc_addr arrays.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Division of a 32 bit number by a 64 bit number causes the following link
error introduced by
7c2ce6e60f "enic: Add support for adaptive interrupt coalescing"
drivers/built-in.o: In function `enic_poll_msix':
enic_main.c:(.text+0x48710a): undefined reference to `__udivdi3'
make: *** [vmlinux] Error 1
Since numerator is 32 bit, convert denominator to 32 bit accordingly.
Fixes: 7c2ce6e60f ("enic: Add support for adaptive interrupt coalescing")
Reported-by: Jim Davis <jim.epost@gmail.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Sujith Sankar <ssujith@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for adaptive interrupt coalescing.
For small pkts with low pkt rate, we can decrease the coalescing interrupt
dynamically which decreases the latency. This however increases the cpu
utilization. Based on testing with different coal intr and pkt rate we came up
with a table(mod_table) with rx_rate and coalescing interrupt value where we
get low latency without significant increase in cpu. mod_table table stores
the coalescing timer percentage value for different throughputs.
Function enic_calc_int_moderation() calculates the desired coalescing intr timer
value. This function is called in driver rx napi_poll. The actual value is set
by enic_set_int_moderation() which is called when napi_poll is complete. i.e
when we unmask the rx intr.
Adaptive coal intr is support only when driver is using msix intr. Because
intr is not shared.
Struct mod_range is used to store only the default adaptive coalescing intr
value.
Adaptive coal intr calue is calculated by
timer = range_start + ((rx_coal->range_end - range_start) *
mod_table[index].range_percent / 100);
rx_coal->range_end is the rx-usecs-high value set using ethtool.
range_start is rx-usecs-low, set using ethtool, if rx_small_pkt_bytes_cnt is
greater than 2 * rx_large_pkt_bytes_cnt. i.e small pkts are dominant. Else its
rx-usecs-low + 3.
Cc: Christian Benvenuti <benve@cisco.com>
Cc: Neel Patel <neepatel@cisco.com>
Signed-off-by: Sujith Sankar <ssujith@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net: get rid of SET_ETHTOOL_OPS
Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.
Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
- SET_ETHTOOL_OPS(dev, ops);
+ dev->ethtool_ops = ops;
Compile tested only, but I'd seriously wonder if this broke anything.
Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The enic driver fails to build on ARM with:
In file included from drivers/net/ethernet/cisco/enic/enic_res.c:40:0:
drivers/net/ethernet/cisco/enic/enic.h:48:2: error: expected specifier-qualifier-list before 'irqreturn_t'
irqreturn_t (*isr)(int, void *);
^
Nothing in the driver is explicitly including the irq definitions, so we add
an include of linux/irq.h to pick them up.
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace dev_kfree_skb with dev_kfree_skb_any in enic_hard_start_xmit
that can be called in hard irq and other contexts.
enic_hard_start_xmit only frees the skb when dropping it.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Processing any incoming packets with a with a napi budget of 0
is incorrect driver behavior.
This matters as netpoll will shortly call drivers with a budget of 0
to avoid receive packet processing happening in hard irq context.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>