This patch tries to implement XDP for tun. The implementation was
split into two parts:
- fast path: small and no gso packet. We try to do XDP at page level
before build_skb(). For XDP_TX, since creating/destroying queues
were completely under control of userspace, it was implemented
through generic XDP helper after skb has been built. This could be
optimized in the future.
- slow path: big or gso packet. We try to do it after skb was created
through generic XDP helpers.
Test were done through pktgen with small packets.
xdp1 test shows ~41.1% improvement:
Before: ~1.7Mpps
After: ~2.3Mpps
xdp_redirect to ixgbe shows ~60% improvement:
Before: ~0.8Mpps
After: ~1.38Mpps
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch tries to export some generic xdp helpers to drivers. This
can let driver to do XDP for a specific skb. This is useful for the
case when the packet is hard to be processed at page level directly
(e.g jumbo/GSO frame).
With this patch, there's no need for driver to forbid the XDP set when
configuration is not suitable. Instead, it can defer the XDP for
packets that is hard to be processed directly after skb is created.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We use tun_alloc_skb() which calls sock_alloc_send_pskb() to allocate
skb in the past. This socket based method is not suitable for high
speed userspace like virtualization which usually:
- ignore sk_sndbuf (INT_MAX) and expect to receive the packet as fast as
possible
- don't want to be block at sendmsg()
To eliminate the above overheads, this patch tries to use build_skb()
for small packet. We will do this only when the following conditions
are all met:
- TAP instead of TUN
- sk_sndbuf is INT_MAX
- caller don't want to be blocked
- zerocopy is not used
- packet size is smaller enough to use build_skb()
Pktgen from guest to host shows ~11% improvement for rx pps of tap:
Before: ~1.70Mpps
After : ~1.88Mpps
What's more important, this makes it possible to implement XDP for tap
before creating skbs.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calls to rtnl_dump_ifinfo() are protected by RTNL lock. So are the
{list,unlist}_netdevice() calls where we bump the net->dev_base_seq
number.
For this reason net->dev_base_seq can't change under out feet while
we're looping over links in rtnl_dump_ifinfo(). So move the check for
net->dev_base_seq change (since the last time we were called) out of the
loop.
This way we avoid giving a wrong impression that there are concurrent
updates to the link list going on while we're iterating over them.
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ptp_enable was a global static variable. Moved this global variable to
octeon_device structure and removed extra device id check.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kernel test robot reports error when running test_xdp_redirect.sh.
Check if ip tool supports xdpgeneric, if not, skip the test.
Signed-off-by: William Tu <u9012063@gmail.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make these structures const as they only stored in the profile field of
a mlxsw_driver structure, which is of type const.
Done using Coccinelle.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
OVS_NLERR already adds a newline so these just add blank
lines to the logging.
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The firmware expects a MAC_REPR control message when a MAC representor
is created. The driver should expect a PORTMOD message to follow which
will provide the link states of the physical port associated with the MAC
representor.
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Because we remove the UFO support, we will also remove the
CHECKSUM_UNNECESSARY check in skb_needs_check().
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use the dma_*() interfaces rather than the pci_*() interfaces.
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver does not check if mapping dma memory succeed.
The patch adds the checks and failure handling.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attempts to connect to a local address with a socket bound
to a device with the local address hangs if there is no listener:
$ ip addr sh dev eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 02:e0:f9:1c:00:37 brd ff:ff:ff:ff:ff:ff
inet 10.100.1.4/24 scope global eth1
valid_lft forever preferred_lft forever
inet6 2001:db8:1::4/120 scope global
valid_lft forever preferred_lft forever
inet6 fe80::e0:f9ff:fe1c:37/64 scope link
valid_lft forever preferred_lft forever
$ vrf-test -I eth1 -r 10.100.1.4
<hangs when there is no server>
(don't let the command name fool you; vrf-test works without vrfs.)
The problem is that the original intended device, eth1 in this case, is
lost when the tcp reset is sent, so the socket lookup does not find a
match for the reset and the connect attempt hangs. Fix by adjusting
orig_oif for local traffic to the device from the fib lookup result.
With this patch you get the more user friendly:
$ vrf-test -I eth1 -r 10.100.1.4
connect failed: 111: Connection refused
orig_oif is saved to the newly created rtable as rt_iif and when set
it is used as the dif for socket lookups. It is set based on flowi4_oif
passed in to ip_route_output_key_hash_rcu and will be set to either
the loopback device, an l3mdev device, nothing (flowi4_oif = 0 which
is the case in the example above) or a netdev index depending on the
lookup path. In each case, resetting orig_oif to the device in the fib
result for the RTN_LOCAL case allows the actual device to be preserved
as the skb tx and rx is done over the loopback or VRF device.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implemented workarounds for the following dTSEC Erratum:
A002, A004, A0012, A0014, A004839 on several operations
that involve MAC CFG register changes: adjust link,
rx pause frames, modify MAC address.
Signed-off-by: Florinel Iordache <florinel.iordache@nxp.com>
Acked-by: Madalin Bucur <madalin.bucur@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These were copy and paste bugs, but I believe they are harmless.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function has a copy and paste bug so it accidentally calls the add
function instead of the delete function.
Fixes: 76ad4f0ee7 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Wu says:
====================
rockchip: Add the integrated PHY support
The rk3228 and rk3328 support integrated PHY inside, let's enable
it to work. And the integrated PHY need to do some special setting,
so register the rockchip integrated PHY driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable the gmac2phy, make the gmac2phy work on
the rk3328-evb board.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The gmac2phy controller of rk3328 is connected to integrated PHY
directly inside, add the node for the integrated PHY support.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables the integrated PHY for rk3228 evb board
by default.
To use the external 1000M PHY on evb board, need to make
some switch of evb board to be on.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two mac controllers in the rk3328, the one connects
to external PHY, and the other one connects to integrated PHY.
Like the mac of external PHY, the integrated PHY's mac also
needs to configure the related mac registers at GRF.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is only one mac controller in rk3228, which could connect to
external PHY or integrated PHY, use the grf_com_mux bit15 to route
external/integrated PHY.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To make integrated PHY work, need to configure the PHY clock,
PHY cru reset and related registers.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the documentation for integrated PHY. A boolean property indicates
the PHY is integrated into the same physical package as the Ethernet
MAC. If needed, muxers should be configured to ensure the integrated
PHY is used. The absence of this property indicates the muxers should
be configured so that the external PHY is used.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is wrong setting for rk3328_set_to_rmii(), so remove it.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the rockchip PHY driver built into the kernel.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Enable the rockchip PHY driver for multi_v7_defconfig builds.
Signed-off-by: David Wu <david.wu@rock-chips.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace init_timer_deferrable with setup_deferrable_timer to simplify
the source code.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Frequency can be adjusted in DT it make sense to
print current used value on driver init.
Signed-off-by: Max Uvarov <muvarov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Polling 14 mdio devices on single mdio bus eats 30% of 1Ghz cpu time
due to busy loop in wait(). Add small delay to relax cpu.
Signed-off-by: Max Uvarov <muvarov@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Hemminger says:
====================
netvsc: minor fixes and improvements
These are non-critical bug fixes, related to functionality now in net-next.
1. delaying the automatic bring up of VF device to allow udev to change name.
2. performance improvement
3. handle MAC address change with VF; mostly propogate the error that VF gives.
4. minor cleanups
5. allow setting send/receive buffer size with ethtool.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add ethtool statistics for case where send chimmeny buffer is
exhausted and driver has to fall back to doing scatter/gather
send. Also, add statistic for case where ring buffer is full and
receive completions are delayed.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Control the size of the buffer areas via ethtool ring settings.
They aren't really traditional hardware rings, but host API breaks
receive and send buffer into chunks. The final size of the chunks are
controlled by the host.
The default value of send and receive buffer area for host DMA
is much larger than it needs to be. Experimentation shows that
4M receive and 1M send is sufficient.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function init_page_array is always called with a valid pointer
to RNDIS header. No check for NULL is needed.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Assignment to a typed pointer is sufficient in C.
No cast is needed.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The send and receive buffers are both per-device (not per-channel).
The associated NUMA node is a property of the CPU which is per-channel
therefore it makes no sense to force the receive/send buffer to be
allocated on a particular node (since it is a shared resource).
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If setting new values fails, and the attempt to restore original
settings fails. Then log an error and leave device down.
This should never happen, but if it does don't go down in flames.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If VF is slaved to synthetic device, then any change to netvsc
MAC address should be propagated to the slave device.
If slave device doesn't support MAC address change then it
should also be an error to attempt to change synthetic NIC MAC
address.
It also fixes the error unwind in the original code.
If give a bad address, the old code would change the device
MAC address anyway.
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When hv_pkt_iter_next() returns NULL, it has already called
hv_pkt_iter_close(). Calling it twice can lead to extra host signal.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When VF device is discovered, delay bring it automatically up in
order to allow userspace to some simple changes (like renaming).
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
"ret" isn't necessarily initialized here.
Fixes: 9525ae8395 ("phylink: add phylink infrastructure")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no restriction on queue size alignment. Hence removing check for
valid queue size.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When deleting a queue, clear its corresponding bit in the qmask, vfree its
memory, clear out the pointer that's pointing to it, and decrement the
queue count.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <fmanlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
net: sched: let the offloader decide what to offload
Currently there is a Qdisc_class_ops->tcf_cl_offload callback
that is called to find out if cls would offload rule or not.
This is only supported by sch_ingress and sch_clsact.
So the Qdisc are to decide. However, the driver knows what is he
able to offload, so move the decision making to drivers completely.
Just pass classid there and provide set of helpers to allow
identification of qdisc.
As a side effect, this actually allows clsact egress rules
offload in mlxsw.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
cops->tcf_cl_offload is no longer needed, as the drivers check what they
can and cannot offload using the classid identify helpers. So remove this.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no longer need to use handle in drivers, so remove it from
tc_cls_common_offload struct.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of checking handle, which does not have the inner class
information and drivers wrongly assume clsact->egress as ingress, use
the newly introduced classid identification helpers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers need classid to decide they support this specific qdisc+class
or not. So propagate it down via the tc_cls_common_offload struct.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>