I noticed that TSQ (TCP Small queues) was less effective when TSO is
turned off, and GSO is on. If BQL is not enabled, TSQ has then no
effect.
It turns out the GSO engine frees the original gso_skb at the time the
fragments are generated and queued to the NIC.
We should instead call the tcp_wfree() destructor for the last fragment,
to keep the flow control as intended in TSQ. This effectively limits
the number of queued packets on qdisc + NIC layers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcf_csum_skb_nextlayer() / pskb_may_pull() can change skb->head, so we
must be careful not keeping pointers to previous headers.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Grégoire Baron <baronchon@n7mm.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
With recent support for GRO, there is no need to keep both LRO and
GRO. This patch therefore removes the deprecated inet_lro support
from mv643xx_eth. This is work is based on an experimental patch
provided by Eric Dumazet and Willy Tarreau.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Based-on-patch-by: Eric Dumazet <eric.dumazet@gmail.com>
Based-on-patch-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes following warning:
drivers/net/vxlan.c:406:6: warning: symbol 'vxlan_fdb_free' was not declared. Should it be static?
drivers/net/vxlan.c:1111:37: warning: Using plain integer as NULL pointer
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Switch to use skb_partial_csum_set() to simplify the codes.
Cc: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 763eff57de.
It causes build regressions, as per Stephen Rothwell:
====================
After merging the final tree, today's linux-next build (powerpc
allyesconfig) failed like this:
net/core/netprio_cgroup.c:250:29: error: static declaration of 'net_prio_subsys' follows non-static declaration
include/linux/cgroup_subsys.h:71:1: note: previous declaration of 'net_prio_subsys' was here
====================
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Convert use of devm_request_and_ioremap() to the newly introduced
devm_ioremap_resource() which provides more consistent error handling.
devm_ioremap_resource() provides its own error messages so all explicit
error messages can be removed from the failure code paths.
This was found with coccinelle.
Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds GRO support to mv643xx_eth by making it invoke
napi_gro_receive instead of netif_receive_skb.
Signed-off-by: Soeren Moch <smoch@web.de>
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The vlan_features was zero which prevents vlan GSO packets to be transmitted to
userspace. This is suboptimal so enable this by initialize vlan_features for
tuntap.
Netperf shows better performance of guest receiving since vlan TSO works for
tuntap:
before:
netperf -H 192.168.5.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.4 ()
port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.01 2786.67
after:
netperf -H 192.168.5.4
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.4 ()
port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 8085.49
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There's nothing that prevent passing the device features of virtio_net to its
vlan device. So this patch simply passes those to vlan device to benefit from
advanced features.
Netperf shows better sending performance for vlan device since TSO can work on
vlan now.
before:
netperf -H 192.168.5.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 4162.35
after:
netperf -H 192.168.5.2
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.5.2 ()
port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
87380 16384 16384 10.00 9365.42
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch moves shared private data kzalloc to managed devm_kzalloc and
cleans now unneccessary kfree and error handling.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds an optional shared block clock to avoid lockups on
clock gated controllers. Besides the new clock, clock handling for
existing clocks is cleaned up and moved to devm_clk_get. Device
tree binding documentation is updated for the new clocks property.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3d604da1e9
("net: mvmdio: get and enable optional clock")
was missing an update of the corresponding device tree binding
documentation. This patch adds the clocks property to mvmdio
binding documentation.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 2b8b328b61 (vhost_net: handle polling
errors when setting backend), we in fact track the polling state through
poll->wqh, so there's no need to duplicate the work with an extra
vhost_net_polling_state. So this patch removes this and make the code simpler.
This patch also removes the all tx starting/stopping code in tx path according
to Michael's suggestion.
Netperf test shows almost the same result in stream test, but gets improvements
on TCP_RR tests (both zerocopy or copy) especially on low load cases.
Tested between multiqueue kvm guest and external host with two direct
connected 82599s.
zerocopy disabled:
sessions|transaction rates|normalize|
before/after/+improvements
1 | 9510.24/11727.29/+23.3% | 693.54/887.68/+28.0% |
25| 192931.50/241729.87/+25.3% | 2376.80/2771.70/+16.6% |
50| 277634.64/291905.76/+5% | 3118.36/3230.11/+3.6% |
zerocopy enabled:
sessions|transaction rates|normalize|
before/after/+improvements
1 | 7318.33/11929.76/+63.0% | 521.86/843.30/+61.6% |
25| 167264.88/242422.15/+44.9% | 2181.60/2788.16/+27.8% |
50| 272181.02/294347.04/+8.1% | 3071.56/3257.85/+6.1% |
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
1) Allow to avoid copying DSCP during encapsulation
by setting a SA flag. From Nicolas Dichtel.
2) Constify the netlink dispatch table, no need to modify it
at runtime. From Mathias Krause.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When setting MTU in SRIOV mode add ETH, VLAN and FCS header length
to the maximum MTU obtained from QUERY_DEV_CAP.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The different steering modes are global to the device, with DMFS
being introduced after SRIOV was merged. Hence, SRIOV guests running
legacy / older Linux kernels or non-Linux drivers may provide
B0 steering directives when the hypervisor is using DMFS and fail.
Under B0 only L2 steering rules are allowed, hence B0 is a subset of DMFS.
Use this fact to enable such legacy guests to run by modifying the SRIOV
B0 steering wrapper to translate guest B0 directives to DMFS ones when
the device uses DMFS. The translated B0 rule has to be kept in the
resource tracker as a B0 object to allow for lookup in case of detach.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A pre-step for supporting guests that use B0 steering over a hypervisor
that runs in DMFS (device managed flow steering mode). Add helper function
which allows to translate L2 attachments / detachments provided in B0 mode
to DMFS rules.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The link change is detected via the interrupt pipe, and bulk
pipes are responsible for transfering packets, so it is reasonable
to stop bulk transfer after link is reported as off.
Two adavantages may be obtained with stopping bulk transfer
after link becomes off:
- USB bus bandwidth is saved(USB bus is shared bus except for
USB3.0), for example, lots of 'IN' token packets and 'NYET'
handshake packets is transfered on 2.0 bus.
- probabaly power might be saved for usb host controller since
cancelling bulk transfer may disable the asynchronous schedule of
host controller.
With this patch, when link becomes off, about ~10% performance
boost can be found on bulk transfer of anther usb device which
is attached to same bus with the usbnet device, see below
test on next-20130410:
- read from usb mass storage(Sandisk Extreme USB 3.0) on pandaboard
with below command after unplugging ethernet cable:
dd if=/dev/sda iflag=direct of=/dev/null bs=1M count=800
- without the patch
1, 838860800 bytes (839 MB) copied, 36.2216 s, 23.2 MB/s
2, 838860800 bytes (839 MB) copied, 35.8368 s, 23.4 MB/s
3, 838860800 bytes (839 MB) copied, 35.823 s, 23.4 MB/s
4, 838860800 bytes (839 MB) copied, 35.937 s, 23.3 MB/s
5, 838860800 bytes (839 MB) copied, 35.7365 s, 23.5 MB/s
average: 23.6MB/s
- with the patch
1, 838860800 bytes (839 MB) copied, 32.3817 s, 25.9 MB/s
2, 838860800 bytes (839 MB) copied, 31.7389 s, 26.4 MB/s
3, 838860800 bytes (839 MB) copied, 32.438 s, 25.9 MB/s
4, 838860800 bytes (839 MB) copied, 32.5492 s, 25.8 MB/s
5, 838860800 bytes (839 MB) copied, 31.6178 s, 26.5 MB/s
average: 26.1MB/s
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use usbnet_link_change to handle link change centrally.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use usbnet_link_change to handle link change centrally.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use usbnet_link_change to handle link change centrally.
Cc: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Peter Korsgaard <jacmet@sunsite.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use usbnet_link_change to handle link change centrally.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use usbnet_link_change to handle link change centrally.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use usbnet_link_change to handle link change centrally.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the introduced usbnet_link_change to handle link change.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch uses the introduced usbnet_link_change() to handle
link change.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver doesn't implement link_reset() callback, so it needn't
to send link reset event.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces the API of usbnet_link_change, so that
usbnet can handle link change centrally, which may help to
implement killing traffic URBs for saving USB bus bandwidth
and host controller power.
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
John W. Linville says:
====================
Please accept this pull request for the 3.10 stream...
Regarding the mac80211 bits, Johannes says:
"Here I have a bunch of minstrel fixes from Felix, per-interface
multicast filtering from Alex, set_tim debouncing from Ilan,
per-interface debugfs cleanups from Stanislaw, an error return fix from
Wei and a number of small improvements and fixes that I made myself."
And for the iwlwifi bits, Johannes says:
"Andrei changed an instance of kmalloc+memdup to kmemdup, Stanislaw
removed the now unused 5ghz_disable module parameter. I also have a
number of fixes from Ilan, Emmanuel and myself, Emmanuel also continued
working on Bluetooth coexistence."
For the sizeable batch of Bluetooth bits, Gustavo says:
"This is our first batch of patches for 3.10. The biggest changes of this pull
request are from Johan Hedberg, he implemented a HCI request framework to make
life easier when we have to send many HCI commands and a block and wait for
all of the to finish, we were able to fix a few issues in stack with the
introduction of this framework.
Other than that Dean Jenkins did a good work cleaning the RFCOMM code, the
refcnt infrastructure was removed and now we use NULL pointer checks to know
when a object was freed or not. That code was buggy and now it looks a way
better.
The rest of changes are clean ups, fixes and small improvements all over the
Bluetooth subsystem."
Regarding the wl12xx bits, Luca says:
"Some patches intended for 3.10. Mostly bug fixes and other small
improvements."
On top of that, there are updates to brcmfmac, brcmsmac, b43, ssb and
bcma, as well as mwifiex, rt2x00, and ath9k and a few others. The most
notable bit is the addition of a new driver in the rtlwifi family.
Please let me know if there are problems!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Using bool can make code more readable.
Convert uses and tests of int to bool.
This also makes a comparison of tg3->link_up
(itself bool) a bool comparison instead of int.
Reorder stack variable declarations to make
bool fit declaration holes where appropriate.
$ size drivers/net/ethernet/broadcom/tg3.o*
text data bss dec hex filename
169958 27249 58896 256103 3e867 drivers/net/ethernet/broadcom/tg3.o.new
169968 27249 58896 256113 3e871 drivers/net/ethernet/broadcom/tg3.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Nithin Nayak Sujir says:
====================
This patch and the following two patches add support for link flap avoidance
by maintaining the link on power down. This feature is required for
management capable devices to have the management connection
uninterrupted on driver reload, reboot and interface up/down.
The other pros of this feature are
- It speeds up boot up time by several seconds as DHCP addresses can be
acquired faster.
- It avoids lengthy Spanning Tree delay.
On powerup the hardware brings up the phy with default settings. If the
link is not up, the management software configures the phy to gigabit
and starts autonegotiate. Subsequently, as long as the link is up, the
driver and management refrain from resetting and/or changing any
configuration that the link depends on.
The LNK_FLAP_AVOID setting is an NVRAM user configurable bit and is
disabled by default. If this setting is enabled, we skip powering down
the phy and resetting it.
A second NVRAM setting is 1G_ON_VAUX_OK (off by default). This adds
support for gigabit link speed when device is on auxiliary power.
====================
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When LFA is enabled, we don't reset the phy. But EEE settings changes
don't take effect until the phy is reset. Add a phy reset when we detect
a changed EEE setting.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Normally on driver load, we set the default settings for speed and flow
control. However, if the default setting is not compatible with the current link
state, we would autonegotiate and cause a link flap. To avoid this, we
pull the current advertised settings into the config.
A second scenario is if a user changes the speed/duplex/fc settings when
the interface is down. In this case we must not pull the settings from
the phy and overwrite user settings. We avoid that by checking the
USER_CONFIGURED flag.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch and the following two patches add support for link flap avoidance
by maintaining the link on power down. This feature is required for
management capable devices to have the management connection
uninterrupted on driver reload, reboot and interface up/down.
The other pros of this feature are
- It speeds up boot up time by several seconds as DHCP addresses can be
acquired faster.
- It avoids lengthy Spanning Tree delay.
On powerup the hardware brings up the phy with default settings. If the
link is not up, the management software configures the phy to gigabit
and starts autonegotiate. Subsequently, as long as the link is up, the
driver and management refrain from resetting and/or changing any
configuration that the link depends on.
The LNK_FLAP_AVOID setting is an NVRAM user configurable bit and is
disabled by default. If this setting is enabled, we skip powering down
the phy and resetting it.
A second NVRAM setting is 1G_ON_VAUX_OK (off by default). This adds
support for gigabit link speed when device is on auxiliary power.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor for use in the next patch that adds sgmii phy support.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the user executes certain ethtool commands such as -s, -A, -G, -L,
-r a phy reset or autonegotiate is performed which results in management
traffic being interrupted.
Add a warning in these cases.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code unnecessarily resets the phy when we use ethtool to
change the ring parameters or flow control settings. Remove the phy
reset.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The STM45PE20 pinstrap on 5762 devices supports multiple sizes. So treat
it just like the ST45_USPT and the size will be read from 0xf0 via
tg3_get_nvram_size().
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In tg3_setup_copper_phy(), if autonegotiation is disabled, we need to
relink only if the speed or duplex does not match the configured
setting. If flow control does not match, a relink is not necessary as
flow control is not a PHY setting. Later on, we'll call
tg3_setup_flow_ctrl() to set up the MAC to the desired flow control
settings if we're in full duplex mode.
Signed-off-by: Nithin Nayak Sujir <nsujir@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces an UAPI header for the SCTP protocol,
so that we can facilitate the maintenance and development of
user land applications or libraries, in particular in terms
of header synchronization.
To not break compatibility, some fragments from lksctp-tools'
netinet/sctp.h have been carefully included, while taking care
that neither kernel nor user land breaks, so both compile fine
with this change (for lksctp-tools I tested with the old
netinet/sctp.h header and with a newly adapted one that includes
the uapi sctp header). lksctp-tools smoke test run through
successfully as well in both cases.
Suggested-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>