Several conflicts here.
NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.
Parallel additions to net/pkt_cls.h and net/sch_generic.h
A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.
The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is very similar to the Macvlan VEPA mode, however, there is some
difference. IPvlan uses the mac-address of the lower device, so the VEPA
mode has implications of ICMP-redirects for packets destined for its
immediate neighbors sharing same master since the packets will have same
source and dest mac. The external switch/router will send redirect msg.
Having said that, this will be useful tool in terms of debugging
since IPvlan will not switch packets within its slaves and rely completely
on the external entity as intended in 802.1Qbg.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPvlan has always operated in bridge mode. However there are scenarios
where each slave should be able to talk through the master device but
not necessarily across each other. Think of an environment where each
of a namespace is a private and independant customer. In this scenario
the machine which is hosting these namespaces neither want to tell who
their neighbor is nor the individual namespaces care to talk to neighbor
on short-circuited network path.
This patch implements the mode that is very similar to the 'private' mode
in macvlan where individual slaves can send and receive traffic through
the master device, just that they can not talk among slave devices.
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warnings:
drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:224:5: warning:
symbol 'aq_ethtool_get_coalesce' was not declared. Should it be static?
drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:245:5: warning:
symbol 'aq_ethtool_set_coalesce' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bcm_sf2 and b53 replicate the same operations: clear all VLANs and set
their ports to the default VLAN tag (1 for these devices) so export the
b53 function doing just that.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Creating a macvtap interface with the liquidio VF driver as lower device
causes this alarming message to show up in dmesg:
liquidio_link_ctrl_cmd_completion Unknown cmd 27
That's actually a false alarm because cmd 27 is the value of the macro
OCTNET_CMD_SET_UC_LIST which is known. It's a control command sent from
host to NIC firmware to set the unicast MAC address list of the macvtap
lower device.
Make the false alarm go away by adding a case for OCTNET_CMD_SET_UC_LIST
in liquidio_link_ctrl_cmd_completion().
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In some cases, like internal vSwitch, the host doesn't provide
send indirection table updates. This patch sets the table to be
equal weight after subchannels are all open. Otherwise, all workload
will be on one TX channel.
As tested, this patch has largely increased the throughput over
internal vSwitch.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Check for CAP_NET_ADMIN with ns_capable() instead of capable()
to allow usage of ppp in user namespace other than the init one.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2017-10-27
This patchset is a proposal of how the Traffic Control subsystem can
be used to offload the configuration of the Credit Based Shaper
(defined in the IEEE 802.1Q-2014 Section 8.6.8.2) into supported
network devices.
As part of this work, we've assessed previous public discussions
related to TSN enabling: patches from Henrik Austad (Cisco), the
presentation from Eric Mann at Linux Plumbers 2012, patches from
Gangfeng Huang (National Instruments) and the current state of the
OpenAVNU project (https://github.com/AVnu/OpenAvnu/).
Overview
========
Time-sensitive Networking (TSN) is a set of standards that aim to
address resources availability for providing bandwidth reservation and
bounded latency on Ethernet based LANs. The proposal described here
aims to cover mainly what is needed to enable the following standards:
802.1Qat and 802.1Qav.
The initial target of this work is the Intel i210 NIC, but other
controllers' datasheet were also taken into account, like the Renesas
RZ/A1H RZ/A1M group and the Synopsis DesignWare Ethernet QoS
controller.
Proposal
========
Feature-wise, what is covered here is the configuration interfaces for
HW implementations of the Credit-Based shaper (CBS, 802.1Qav). CBS is
a per-queue shaper. Given that this feature is related to traffic
shaping, and that the traffic control subsystem already provides a
queueing discipline that offloads config into the device driver (i.e.
mqprio), designing a new qdisc for the specific purpose of offloading
the config for the CBS shaper seemed like a good fit.
For steering traffic into the correct queues, we use the socket option
SO_PRIORITY and then a mechanism to map priority to traffic classes /
Tx queues. The qdisc mqprio is currently used in our tests.
As for the CBS config interface, this patchset is proposing a new
qdisc called 'cbs'. Its 'tc' cmd line is:
$ tc qdisc add dev IFACE parent ID cbs locredit N hicredit M sendslope S \
idleslope I
Note that the parameters for this qdisc are the ones defined by the
802.1Q-2014 spec, so no hardware specific functionality is exposed here.
Per-stream shaping, as defined by IEEE 802.1Q-2014 Section 34.6.1, is
not yet covered by this proposal.
v2: Merged patch 6 of the original series into patch 4 based on feedback
from David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds checks at approprate places whether *dma_map*() call has
succeeded or not.
Original Work by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirr: In particular with
ethtool -C <ifname> rx-usecs 0 rx-frames 0
now it is possible to disable RX delays when NIC usage requires low-latency.
See this thread for context:
https://www.spinics.net/lists/netdev/msg217665.html
My specific case is that:
We have many computers with gigabit Realtek NICs. For 2 such computers
connected to a gigabit store-and-forward switch the minimum round-trip
time for small pings (`ping -i 0 -w 3 -s 56 -q peer`) is ~ 30μs.
However it turned out that when Ethernet frame length transitions 127 ->
128 bytes (`ping -i 0 -w 3 -s {81 -> 82} -q peer`) the lowest RTT
transitions step-wise to ~ 270μs.
As David Light said this is RX interrupt mitigation done by NIC which creates
the latency. For workloads when low-latency is required with e.g. Intel,
BCM etc NIC drivers one just uses `ethtool -C rx-usecs ...` to reduce
the time NIC delays before interrupting CPU, but it turned out
`ethtool -C` is not supported by r8169 driver.
Like Stéphane ANCELOT I've traced the problem down to IntrMitigate being
hardcoded to != 0 for our chips (we have 8168 based NICs):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n5460
static void rtl_hw_start_8169(struct net_device *dev) {
...
/*
* Undocumented corner. Supposedly:
* (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
*/
RTL_W16(IntrMitigate, 0x0000);
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n6346
static void rtl_hw_start_8168(struct net_device *dev) {
...
RTL_W16(IntrMitigate, 0x5151);
and then I've also found
https://www.spinics.net/lists/netdev/msg217665.html
and original Francois' patch:
https://www.spinics.net/lists/netdev/msg217984.htmlhttps://www.spinics.net/lists/netdev/msg218207.html
So could we please finally get support for tuning r8169 interrupt
coalescing in tree? (so that next poor soul who hits the problem does
not need to go all the way to dig into driver sources and internet
wildly and finally patch locally
-RTL_W16(IntrMitigate, 0x5151);
+RTL_W16(IntrMitigate, 0x5100);
guessing whether it is right or not and also having to care to deploy
the patch everywhere it needs to be used, etc...).
To do so I've took original Francois's patch from 2012 and reworked it a bit:
- updated to latest net-next.git;
- adjusted scaling setup based on feedback from Hayes to pick up scaling
vector depending not only on link speed but also on CPlusCmd[0:1] and to
adjust CPlusCmd[0:1] correspondingly when setting timings;
- improved a bit (I think so) error handling.
I've tested the patch on "RTL8168d/8111d" (XID 083000c0) and with it and
`ethtool -C rx-usecs 0 rx-frames 0` on both ends it improves:
- minimum RTT latency:
~270μs -> ~30μs (small packet),
~330μs -> ~110μs (full 1.5K ethernet frame)
- average RTT latency:
~480μs -> ~50μs (small packet),
~560μs -> ~125μs (full 1.5K ethernet frame)
( before:
root@neo1:# ping -i 0 -w 3 -s 82 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
5906 packets transmitted, 5905 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.274/0.485/0.607/0.026 ms, ipg/ewma 0.508/0.489 ms
root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
5073 packets transmitted, 5073 received, 0% packet loss, time 2999ms
rtt min/avg/max/mdev = 0.330/0.566/0.710/0.028 ms, ipg/ewma 0.591/0.544 ms
after:
root@neo1# ping -i 0 -w 3 -s 82 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
45815 packets transmitted, 45815 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.036/0.051/0.368/0.010 ms, ipg/ewma 0.065/0.053 ms
root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
--- neo2.kirr.nexedi.com ping statistics ---
21250 packets transmitted, 21250 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.112/0.125/0.390/0.007 ms, ipg/ewma 0.141/0.125 ms
the small -> 1.5K latency growth is understandable as it takes ~15μs
to transmit 1.5K on 1Gbps on the wire and with 2 hosts and 1 switch
and ICMP ECHO + ECHO reply the packet has to travel 4 ethernet
segments which is already 60μs;
probably something a bit else is also there as e.g. on Linux, even
with `cpupower frequency-set -g performance`, on some computers I've
noticed the kernel can be spending more time in software-only mode
when incoming packets go in less frequently. E.g. this program can
demonstrate the effect for ICMP ECHO processing:
https://lab.nexedi.com/kirr/bcc/blob/43cfc13b/tools/pinglat.py
(later this was found to be partly due to C-states exit latencies) )
We have this patch running in our testing setup for 1 months already
without any issues observed.
It remains to be clarified whether RX and TX timers use the same base.
For now I've set them equally, but Francois's original patch version
suggests it could be not the same.
I've got no feedback at all to my original posting of this patch and questions
https://www.spinics.net/lists/netdev/msg457173.html
neither from Francois, nor from any people from Realtek during one month.
So I suggest we simply apply it to net-next.git now.
Cc: Francois Romieu <romieu@fr.zoreil.com>
Cc: Hayes Wang <hayeswang@realtek.com>
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Cc: Stéphane ANCELOT <sancelot@free.fr>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 9a393b5d59 ("tap: tap as an independent module") created a
separate tap module that implements tap functionality and exports
interfaces that will be used by macvtap and ipvtap modules to create
create respective tap devices.
However, that patch introduced a regression wherein the modules macvtap
and ipvtap can be removed (through modprobe -r) while there are
applications using the respective /dev/tapX devices. These applications
cause kernel to hold reference to /dev/tapX through 'struct cdev
macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap
modules respectively. So, when the application is later closed the
kernel panics because we are referencing KVA that is present in the
unloaded modules.
----------8<------- Example ----------8<----------
$ sudo ip li add name mv0 link enp7s0 type macvtap
$ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
14:mv0@enp7s0:
$ cat /dev/tap14 &
$ lsmod |egrep -i 'tap|vlan'
macvtap 16384 0
macvlan 24576 1 macvtap
tap 24576 3 macvtap
$ sudo modprobe -r macvtap
$ fg
cat /dev/tap14
^C
<...system panics...>
BUG: unable to handle kernel paging request at ffffffffa038c500
IP: cdev_put+0xf/0x30
----------8<-----------------8<----------
The fix is to set cdev.owner to the module that creates the tap device
(either macvtap or ipvtap). With this set, the operations (in
fs/char_dev.c) on char device holds and releases the module through
cdev_get() and cdev_put() and will not allow the module to unload
prematurely.
Fixes: 9a393b5d59 (tap: tap as an independent module)
Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: Tobias Klauser <tklauser@distanz.ch>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Wei Yongjun <weiyongjun1@huawei.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Matan Barak <matanb@mellanox.com>
Cc: Leon Romanovsky <leonro@mellanox.com>
Cc: netdev@vger.kernel.org
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Roman Yeryomin <leroi.lists@gmail.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Philippe Reynes <tremyfr@gmail.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Denis Kirjanov <kda@linux-powerpc.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Santosh Raspatur <santosh@chelsio.com>
Cc: Ganesh Goudar <ganeshgr@chelsio.com>
Cc: Casey Leedom <leedom@chelsio.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: David Howells <dhowells@redhat.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Allen Pais <allen.lkml@gmail.com>
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly.
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
An unaligned alloc_frag->offset caused by previous allocation will
result an unaligned skb->head. This will lead unaligned
skb_shared_info and then unaligned dataref which requires to be
aligned for accessing on some architecture. Fix this by aligning
alloc_frag->offset before the frag refilling.
Fixes: 0bbd7dad34 ("tun: make tun_build_skb() thread safe")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Wei Wei <dotweiba@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Reported-by: Wei Wei <dotweiba@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently stmmac driver not copying the valid ethernet
MAC address to MAC registers. This patch takes care
of updating the MAC register with MAC address.
Signed-off-by: Bhadram Varka <vbhadram@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add message to inform the VF MAC was changed and the need to restart
the VF driver for the changes to be effective.
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Doing ifconfig down on VF driver in the middle of receiving line rate
traffic causes a kernel panic:
LiquidIO_VF 0000:02:00.3: should not come here should not get rx when poll mode = 0 for vf
BUG: unable to handle kernel NULL pointer dereference at (null)
.
.
.
Call Trace:
<IRQ>
? tasklet_action+0x102/0x120
__do_softirq+0x91/0x292
irq_exit+0xb6/0xc0
do_IRQ+0x4f/0xd0
common_interrupt+0x93/0x93
</IRQ>
RIP: 0010:cpuidle_enter_state+0x142/0x2f0
RSP: 0018:ffffffffa6403e20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff59
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000001f
RDX: 0000000000000000 RSI: 000000002ab7519f RDI: 0000000000000000
RBP: ffffffffa6403e58 R08: 0000000000000084 R09: 0000000000000018
R10: ffffffffa6403df0 R11: 00000000000003c7 R12: 0000000000000003
R13: ffffd27ebd806800 R14: ffffffffa64d40d8 R15: 0000007be072823f
cpuidle_enter+0x17/0x20
call_cpuidle+0x23/0x40
do_idle+0x18c/0x1f0
cpu_startup_entry+0x64/0x70
rest_init+0xa5/0xb0
start_kernel+0x45e/0x46b
x86_64_start_reservations+0x24/0x26
x86_64_start_kernel+0x6f/0x72
secondary_startup_64+0xa5/0xa5
Code: Bad RIP value.
RIP: (null) RSP: ffff9246ed003f28
CR2: 0000000000000000
---[ end trace 92731e80f31b7d7d ]---
Kernel panic - not syncing: Fatal exception in interrupt
Kernel Offset: 0x24000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt
Reason is: in the function assigned to net_device_ops->ndo_stop, the steps
for bringing down the interface are done in the wrong order. The step that
notifies the NIC firmware to stop forwarding packets to host is done too
late. Fix it by moving that step to the beginning.
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix undefined symbols when CONFIG_VLAN_8021Q or CONFIG_INET is not set.
Fixes: 8c95f773b4 ("bnxt_en: add support for Flower based vxlan encap/decap offload")
Reported-by: Jakub Kicinski <kubakici@wp.pl>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for Credit-Based Shaper (CBS) qdisc offload
from Traffic Control system. This support enable us to leverage the
Forwarding and Queuing for Time-Sensitive Streams (FQTSS) features
from Intel i210 Ethernet Controller. FQTSS is the former 802.1Qav
standard which was merged into 802.1Q in 2014. It enables traffic
prioritization and bandwidth reservation via the Credit-Based Shaper
which is implemented in hardware by i210 controller.
The patch introduces the igb_setup_tc() function which implements the
support for CBS qdisc hardware offload in the IGB driver. CBS offload
is the only traffic control offload supported by the driver at the
moment.
FQTSS transmission mode from i210 controller is automatically enabled
by the IGB driver when the CBS is enabled for the first hardware
queue. Likewise, FQTSS mode is automatically disabled when CBS is
disabled for the last hardware queue. Changing FQTSS mode requires NIC
reset.
FQTSS feature is supported by i210 controller only.
Signed-off-by: Andre Guedes <andre.guedes@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Henrik Austad <henrik@austad.us>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Defer ringing the Tx doorbell if skb->xmit_more is set unless the Tx queue
is full or stopped. To keep latency low, use a deferral limit of 8
packets. We chose 8 because Octeon can fetch at most 8 packets in a single
PCI read, and our tests show that 8 results in low latency.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Satanand Burla <satananda.burla@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For all non-fatal reset conditions, the hypervisor will send a failover when
we attempt to initialize the crq and the vnic client is expected to handle
that failover instead of the existing non-fatal reset. To handle this, we
need to return from init with a return code that indicates that we have hit
this case.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update ibmvnic reset infrastructure to include a new reset option that will
allow changing of tunable parameters. There currently is no way to request
different capabilities from the vnic server on the fly so this patch
achieves this by resetting the driver and attempting to log in with the
requested changes. If the reset operation fails, the old values of the
tunable parameters are stored in the "fallback" struct and we attempt to
login with the fallback values.
Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add gro_cells so that rmnet devices can call gro_cells_receive
instead of netif_receive_skb.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement 64 bit per cpu stats.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rmnet device needs to assigned for all packets in the
deaggregation path based on the mux id, so the check is not needed.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since packet is always consumed by rmnet_rx_handler(), we always
return RX_HANDLER_CONSUMED. There is no need to pass on this
value through multiple functions.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2017-10-26
This series contains fixes to e1000, igb, ixgbe and i40e.
Vincenzo Maffione fixes a potential race condition which would result in
the interface being up but transmits are disabled in the hardware.
Colin Ian King fixes a possible NULL pointer dereference in e1000, which
was found by Coverity.
Jean-Philippe Brucker fixes a possible kernel panic when a driver cannot
map a transmit buffer, which is caused by an erroneous test.
Alex provides a fix for ixgbe, which is a partial revert of the commit
ffed21bcee ("ixgbe: Don't bother clearing buffer memory for descriptor rings")
because the previous commit messed up the exception handling path by
adding the count back in when we did not need to. Also fixed a typo,
where the transmit ITR setting was being used to determine if we were
using adaptive receive interrupt moderation or not. Lastly, fixed a
memory leak by including programming descriptors in the cleaned count.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
TC flower is not enabled on VFs and when there's no FW support.
Alloc the tc_info{} struct at init time only when TC flower is being
enabled.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements periodic querying of cfa flow stats
in batches to compute the 'lastused' attribute of TC flow stats.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add routines for issuing the hwrm_cfa_encap_record_alloc/free
and hwrm_cfa_decap_filter_alloc/free FW cmds needed for
supporting vxlan encap/decap offload.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds IPv4 vxlan encap/decap action support to TC-flower
offload.
For vxlan encap, the driver maintains a tunnel encap hash-table.
When a new flow with a tunnel encap action arrives, this table
is looked up; if an encap entry exists, it uses the already
programmed encap_record_handle as the tunnel_handle in the
hwrm_cfa_flow_alloc cmd. Else, a new encap node is added and the
L2 header fields are queried via a route lookup.
hwrm_cfa_encap_record_alloc cmd is used to create a new encap
record and the encap_record_handle is used as the tunnel_handle
while adding the flow.
For vxlan decap, the driver maintains a tunnel decap hash-table.
When a new flow with a tunnel decap action arrives, this table
is looked up; if a decap entry exists, it uses the already
programmed decap_filter_handle as the tunnel_handle in the
hwrm_cfa_flow_alloc cmd. Else, a new decap node is added and
a decap_filter_handle is alloc'd via the hwrm_cfa_decap_filter_alloc
cmd. This handle is used as the tunnel_handle while adding the flow.
The code to issue the HWRM FW cmds is introduced in a follow-up patch.
Signed-off-by: Sathya Perla <sathya.perla@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mapping of the ethtool coalescing parameters to hardware parameters
is now done in bnxt_hwrm_set_coal_params(). The same function can
handle both RX and TX settings. The code is now more clear. Some
adjustments have been made to get better hardware settings. The
coal_frames setting is now accurately set in hardware. The max_timer
is set to coal_ticks value.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current IRQ coalescing logic is a little messy. The ethtool
parameters are mapped to hardware parameters in a way that is difficult
to understand. The first step is to better organize the parameters
by adding the new structure bnxt_coal. The structure is used by both
the RX and TX sets of coalescing parameters.
Adjust the default coal_ticks to 14 us and 28 us for RX and TX.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a firmware internal reset after driver is unloaded.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some NICs have a firmware enforced maximum MTU setting by management
firmware. Set up netdev->max_mtu accordingly.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to call bnxt_approve_mac() which will send a message to the
PF if the MAC address hasn't changed.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code retrieves the firmware package version from firmware
everytime ethtool -i is run. There is no reason to do that as the
firmware will not change while the driver is loaded. Get the version
once at init time.
Also, display the full 4-part firmware version string and remove the
less useful interface spec version.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return -EINVAL if the length is zero and not proceed to do essentially
nothing.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Rob Miller <rmiller@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new PCIe device ID and chip number for bcm58804
Signed-off-by: Ray Jui <ray.jui@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>