Commit Graph

471539 Commits

Author SHA1 Message Date
Florian Fainelli f7d6b96f34 net: dsa: do not call phy_start_aneg
Commit f7f1de51ed ("net: dsa: start and stop the PHY state machine")
add calls to phy_start() in dsa_slave_open() respectively phy_stop() in
dsa_slave_close().

We also call phy_start_aneg() in dsa_slave_create(), and this call is
messing up with the PHY state machine, since we basically start the
auto-negotiation, and later on restart it when calling phy_start().
phy_start() does not currently handle the PHY_FORCING or PHY_AN states
properly, but such a fix would be too invasive for this window.

Fixes: f7f1de51ed ("net: dsa: start and stop the PHY state machine")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:44:44 -04:00
Sébastien Barré dd3619f2ed Removed unused inet6 address state
the inet6 state INET6_IFADDR_STATE_UP only appeared in its definition.

Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sébastien Barré <sebastien.barre@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:37:17 -04:00
Vijay Subramanian c8753d55af net: Cleanup skb cloning by adding SKB_FCLONE_FREE
SKB_FCLONE_UNAVAILABLE has overloaded meaning depending on type of skb.
1: If skb is allocated from head_cache, it indicates fclone is not available.
2: If skb is a companion fclone skb (allocated from fclone_cache), it indicates
it is available to be used.

To avoid confusion for case 2 above, this patch  replaces
SKB_FCLONE_UNAVAILABLE with SKB_FCLONE_FREE where appropriate. For fclone
companion skbs, this indicates it is free for use.

SKB_FCLONE_UNAVAILABLE will now simply indicate skb is from head_cache and
cannot / will not have a companion fclone.

Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:34:25 -04:00
Eric Dumazet 9fab426de7 mlx4: add a new xmit_more counter
ethtool -S reports a new counter, tracking number of time doorbell
was not triggered, because skb->xmit_more was set.

$ ethtool -S eth0 | egrep "tx_packet|xmit_more"
     tx_packets: 2413288400
     xmit_more: 666121277

I merged the tso_packet false sharing avoidance in this patch as well.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-04 20:04:14 -04:00
David S. Miller 6106253e69 Merge branch 'gudp'
Tom Herbert says:

====================
net: Generic UDP Encapsulation

Generic UDP Encapsulation (GUE) is UDP encapsulation protocol which
encapsulates packets of various IP protocols. The GUE protocol is
described in http://tools.ietf.org/html/draft-herbert-gue-01.

The receive path of GUE is implemented in the FOU over UDP module (FOU).
This includes a UDP encap receive function for GUE as well as GUE
specific GRO functions. Management and configuration of GUE ports shares
most of the same code with FOU.

For the transmit path, the previous FOU support for IPIP, sit, and GRE
was simply extended for GUE (when GUE is enabled insert the GUE
header on transmit in addition to UDP header inserted for FOU).

Semantically GUE is the same as FOU in that the encapsulation (UDP
and GUE headers) that are inserted on transmission and removed on
reception so that IP packet is processed with the inner header.

This patch set includes:
 - Some fixes to FOU, removal of IPv4,v6 specific GRO functions
 - Support to configure a GUE receive port
 - Implementation of GUE receive path (normal and GRO)
 - Additions to ip_tunnel netlink to configure GUE
 - GUE header inserion in ip_tunnel transmit path

v2:
 - Include net/gue.h in patch set

Testing:

I ran performance numbers using netperf TCP_RR with 200 streams,
comparing encapsulation without GUE, encapsulation with GUE, and
encapsulation with FOU.

 GRE
    TCP_STREAM
      IPv4, FOU, UDP checksum enabled
        14.04% TX CPU utilization
        13.17% RX CPU utilization
        9211 Mbps
      IPv4, GUE, UDP checksum enabled
        14.99% TX CPU utilization
        13.79% RX CPU utilization
        9185 Mbps
      IPv4, FOU, UDP checksum disabled
        13.14% TX CPU utilization
        23.18% RX CPU utilization
        9277 Mbps
      IPv4, GUE, UDP checksum disabled
        13.66% TX CPU utilization
        23.57% RX CPU utilization
        9184 Mbps
    TCP_RR
      IPv4, FOU, UDP checksum enabled
        94.2% CPU utilization
        155/249/460 90/95/99% latencies
        1.17018e+06 tps
      IPv4, GUE, UDP checksum enabled
        93.9% CPU utilization
        158/253/472 90/95/99% latencies
        1.15045e+06 tps

  IPIP
    TCP_STREAM
      FOU, UDP checksum enabled
        15.28% TX CPU utilization
        13.92% RX CPU utilization
        9342 Mbps
      GUE, UDP checksum enabled
        13.99% TX CPU utilization
        13.34% RX CPU utilization
        9210 Mbps
      FOU, UDP checksum disabled
        15.08% TX CPU utilization
        24.64% RX CPU utilization
        9226 Mbps
      GUE, UDP checksum disabled
        15.90% TX CPU utilization
        24.77% RX CPU utilization
        9197 Mbps
    TCP_RR
      FOU, UDP checksum enabled
        94.23% CPU utilization
        149/237/429 90/95/99% latencies
        1.19553e+06 tps
      GUE, UDP checksum enabled
        93.75% CPU utilization
        152/243/442 90/95/99% latencies
        1.17027e+06 tps

  SIT
    TCP_STREAM
      FOU, UDP checksum enabled
        14.47% TX CPU utilization
        14.58% RX CPU utilization
        9106 Mbps
      GUE, UDP checksum enabled
        15.09% TX CPU utilization
        14.84% RX CPU utilization
        9080 Mbps
      FOU, UDP checksum disabled
        15.70% TX CPU utilization
        27.93% RX CPU utilization
        9097 Mbps
      GUE, UDP checksum disabled
        15.04% TX CPU utilization
        27.54% RX CPU utilization
        9073 Mbps
    TCP_RR
      FOU, UDP checksum enabled
        96.9% CPU utilization
        170/281/581 90/95/99% latencies
        1.03372e+06 tps
      GUE, UDP checksum enabled
        97.16% CPU utilization
        172/286/576 90/95/99% latencies
        1.00469e+06 tps
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:36 -07:00
Tom Herbert bc1fc390e1 ip_tunnel: Add GUE support
This patch allows configuring IPIP, sit, and GRE tunnels to use GUE.
This is very similar to fou excpet that we need to insert the GUE header
in addition to the UDP header on transmit.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:33 -07:00
Tom Herbert 37dd024779 gue: Receive side for Generic UDP Encapsulation
This patch adds support receiving for GUE packets in the fou module. The
fou module now supports direct foo-over-udp (no encapsulation header)
and GUE. To support this a type parameter is added to the fou netlink
parameters.

For a GUE socket we define gue_udp_recv, gue_gro_receive, and
gue_gro_complete to handle the specifics of the GUE protocol. Most
of the code to manage and configure sockets is common with the fou.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:33 -07:00
Tom Herbert efc98d08e1 fou: eliminate IPv4,v6 specific GRO functions
This patch removes fou[46]_gro_receive and fou[46]_gro_complete
functions. The v4 or v6 variants were chosen for the UDP offloads
based on the address family of the socket this is not necessary
or correct. Alternatively, this patch adds is_ipv6 to napi_gro_skb.
This is set in udp6_gro_receive and unset in udp4_gro_receive. In
fou_gro_receive the value is used to select the correct inet_offloads
for the protocol of the outer IP header.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:32 -07:00
Tom Herbert 7371e0221c ip_tunnel: Account for secondary encapsulation header in max_headroom
When adjusting max_header for the tunnel interface based on egress
device we need to account for any extra bytes in secondary encapsulation
(e.g. FOU).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 16:53:32 -07:00
Eric Dumazet 01291202ed net: do not export skb_gro_receive()
skb_gro_receive() is only called from tcp_gro_receive() which is
not in a module.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:54:30 -07:00
Chen Gang ad2a2a6d7c drivers/net/irda/Kconfig: Let SH_IRDA depend on HAS_IOMEM
SH_IRDA needs HAS_IOMEM, so depend on it. The related error(with
allmodconfig under um):

    CC [M]  drivers/net/irda/sh_irda.o
  drivers/net/irda/sh_irda.c: In function ‘sh_irda_probe’:
  drivers/net/irda/sh_irda.c:776:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration]
    self->membase = ioremap_nocache(res->start, resource_size(res));
    ^
  drivers/net/irda/sh_irda.c:776:16: warning: assignment makes pointer from integer without a cast [enabled by default]
    self->membase = ioremap_nocache(res->start, resource_size(res));
                  ^
  drivers/net/irda/sh_irda.c:821:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
    iounmap(self->membase);
    ^

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:52:04 -07:00
Chen Gang 65cb29a4f3 drivers/net/ethernet/marvell/Kconfig: Let PXA168_ETH depend on HAS_IOMEM
PXA168_ETH need HAS_IOMEM, so depend on it, the related error (with
allmodconfig under um):

    CC [M]  drivers/net/ethernet/marvell/pxa168_eth.o
  drivers/net/ethernet/marvell/pxa168_eth.c: In function ‘pxa168_eth_probe’:
  drivers/net/ethernet/marvell/pxa168_eth.c:1605:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
    iounmap(pep->base);
    ^

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:52:04 -07:00
Chen Gang 28b5533a6f drivers/net/dsa/Kconfig: Let NET_DSA_BCM_SF2 depend on HAS_IOMEM
NET_DSA_BCM_SF2 need HAS_IOMEM, so depend on it, the related error (with
allmodconfig under um):

    CC [M]  drivers/net/dsa/bcm_sf2.o
  drivers/net/dsa/bcm_sf2.c: In function ‘bcm_sf2_sw_setup’:
  drivers/net/dsa/bcm_sf2.c:487:3: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
     iounmap(*base);
     ^

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:52:04 -07:00
Chen Gang 9dc8be2816 drivers/net/can/Kconfig: Let CAN_AT91 depend on HAS_IOMEM
CAN_AT91 needs HAS_IOMEM, so depends on it. The related error (with
allmodconfig under um):

    CC [M]  drivers/net/can/at91_can.o
  drivers/net/can/at91_can.c: In function ‘at91_can_probe’:
  drivers/net/can/at91_can.c:1329:2: error: implicit declaration of function ‘ioremap_nocache’ [-Werror=implicit-function-declaration]
  addr = ioremap_nocache(res->start, resource_size(res));
    ^
  drivers/net/can/at91_can.c:1329:7: warning: assignment makes pointer from integer without a cast [enabled by default]
    addr = ioremap_nocache(res->start, resource_size(res));
         ^
  drivers/net/can/at91_can.c:1384:2: error: implicit declaration of function ‘iounmap’ [-Werror=implicit-function-declaration]
    iounmap(addr);
    ^

Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:52:03 -07:00
David S. Miller 579899a9ea Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2014-10-02

This series contains updates to fm10k, igb, ixgbe and i40e.

Alex provides two updates to the fm10k driver.  First reduces the buffer
size to 2k for all page sizes, since most frames only have a 1500 MTU
so supporting a buffer size larger than this is somewhat wasteful.
Second fixes an issue where the number of transmit queues was not being
updated, so added the lines necessary to update the number of transmit
queues.

Rick Jones provides two patches to convert ixgbe, igb and i40e to use
dev_consume_skb_any().

Emil provides two patches for ixgbe, first cleans up a couple of wait
loops on auto-negotiation that were not needed.  Second fixes an issue
reported by Fujitsu/Red Hat, which consolidates the logic behind the
dynamically setting of TXDCTL.WTHRESH depending on interrupt throttle
rate (ITR) setting regardless of BQL.

Ethan Zhao provides a cleanup patch for ixgbe where he noticed a
duplicate define.

Bernhard Kaindl provides a patch for igb to remove a source of latency
spikes by not calling code that uses mdelay() for feeding a PHY stat
while being called with a spinlock held.

Todd bumps the igb version based on the recent changes.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:43:50 -07:00
David S. Miller 48fea861c9 Merge branch 'mlx5-next'
Eli Cohen says:

====================
mlx5 update for 3.18

This series integrates a new mechanism for populating and extracting field values
used in the driver/firmware interaction around command mailboxes.

Changes from V1:
 - Remove unused definition of memcpy_cpu_to_be32()
 - Remove definitions of non_existent_*() and use BUILD_BUG_ON() instead.
 - Added a patch one line patch to add support for ConnectX-4 devices.

Changes from V0:
 - trimmed the auto-generated file to a minimum, as required by the reviewers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:42:37 -07:00
Eli Cohen f832dc820f net/mlx5_core: Add ConnectX-4 to list of supported devices
Add the upcoming ConnectX-4 device to the list of supported devices by then
mlx5 driver.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:42:32 -07:00
Eli Cohen 5903325a64 net/mlx5_core: Identify resources by their type
This patch puts a common part as the first field of mlx5_core_qp. This field is
used to identify which resource generated an event. This is required since upcoming
new resource types such as DC targets are allocated for the same numerical space
as regular QPs and may generate the same events. By searching the resource in the
same table we can then look at the common field to identify the resource.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:42:32 -07:00
Eli Cohen b775516b04 net/mlx5_core: use set/get macros in device caps
Transform device capabilities related commands to use set/get macros to
manipulate command mailboxes.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:42:32 -07:00
Eli Cohen d29b796ada net/mlx5_core: Use hardware registers description header file
Add an auto generated header file that describes hardware registers along with
set of macros that set/get values. The macros do static checks to avoid
overflow, handle endianess, and overall provide a clean way to code commands.
Currently the header file is small and we will add structs as we make use of
the macros.
A few commands were removed from the commands enum since they are not supported
currently and will be added when support is available.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:42:31 -07:00
Eli Cohen c7a08ac7ee net/mlx5_core: Update device capabilities handling
Rearrange struct mlx5_caps so it has a "gen" field to represent the current
capabilities configured for the device. Max capabilities can also be queried
from the device. Also update capabilities struct to contain more fields as per
the latest revision if firmware specification.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:42:31 -07:00
Eric Dumazet 55a93b3ea7 qdisc: validate skb without holding lock
Validation of skb can be pretty expensive :

GSO segmentation and/or checksum computations.

We can do this without holding qdisc lock, so that other cpus
can queue additional packets.

Trick is that requeued packets were already validated, so we carry
a boolean so that sch_direct_xmit() can validate a fresh skb list,
or directly use an old one.

Tested on 40Gb NIC (8 TX queues) and 200 concurrent flows, 48 threads
host.

Turning TSO on or off had no effect on throughput, only few more cpu
cycles. Lock contention on qdisc lock disappeared.

Same if disabling TX checksum offload.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:36:11 -07:00
Tobias Klauser 6a05880a8b net: ethernet: Remove superfluous ether_setup after alloc_etherdev
There is no need to call ether_setup after alloc_ethdev since it was
already called there.

Follow commits c706471b26 ("net: axienet: remove unnecessary
ether_setup after alloc_etherdev") and 3c87dcbfb3 ("net: ll_temac:
Remove unnecessary ether_setup after alloc_etherdev") and fix the
pattern in all remaining ethernet drivers.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 15:31:40 -07:00
David S. Miller c2bf5ec204 Merge branch 'qdisc_bulk_dequeue'
Jesper Dangaard Brouer says:

====================
qdisc: bulk dequeue support

This patchset uses DaveM's recent API changes to dev_hard_start_xmit(),
from the qdisc layer, to implement dequeue bulking.

Patch01: "qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE"
 - Implement basic qdisc dequeue bulking
 - This time, 100% relying on BQL limits, no magic safe-guard constants

Patch02: "qdisc: dequeue bulking also pickup GSO/TSO packets"
 - Extend bulking to bulk several GSO/TSO packets
 - Seperate patch, as it introduce a small regression, see test section.

We do have a patch03, which exports a userspace tunable as a BQL
tunable, that can byte-cap or disable the bulking/bursting.  But we
could not agree on it internally, thus not sending it now.  We
basically strive to avoid adding any new userspace tunable.

Testing patch01:
================
 Demonstrating the performance improvement of qdisc dequeue bulking, is
tricky because the effect only "kicks-in" once the qdisc system have a
backlog. Thus, for a backlog to form, we need either 1) to exceed wirespeed
of the link or 2) exceed the capability of the device driver.

For practical use-cases, the measureable effect of this will be a
reduction in CPU usage

01-TCP_STREAM:
--------------
Testing effect for TCP involves disabling TSO and GSO, because TCP
already benefit from bulking, via TSO and especially for GSO segmented
packets.  This patch view TSO/GSO as a seperate kind of bulking, and
avoid further bulking of these packet types.

The measured perf diff benefit (at 10Gbit/s) for a single netperf
TCP_STREAM were 9.24% less CPU used on calls to _raw_spin_lock()
(mostly from sch_direct_xmit).

If my E5-2695v2(ES) CPU is tuned according to:
 http://netoptimizer.blogspot.dk/2014/04/basic-tuning-for-network-overload.html
Then it is possible that a single netperf TCP_STREAM, with GSO and TSO
disabled, can utilize all bandwidth on a 10Gbit/s link.  This will
then cause a standing backlog queue at the qdisc layer.

Trying to pressure the system some more CPU util wise, I'm starting
24x TCP_STREAMs and monitoring the overall CPU utilization.  This
confirms bulking saves CPU cycles when it "kicks-in".

Tool mpstat, while stressing the system with netperf 24x TCP_STREAM, shows:
 * Disabled bulking: sys:2.58%  soft:8.50%  idle:88.78%
 * Enabled  bulking: sys:2.43%  soft:7.66%  idle:89.79%

02-UDP_STREAM
-------------
The measured perf diff benefit for UDP_STREAM were 6.41% less CPU used
on calls to _raw_spin_lock().  24x UDP_STREAM with packet size -m 1472 (to
avoid sending UDP/IP fragments).

03-trafgen driver test
----------------------
The performance of the 10Gbit/s ixgbe driver is limited due to
updating the HW ring-queue tail-pointer on every packet.  As
previously demonstrated with pktgen.

Using trafgen to send RAW frames from userspace (via AF_PACKET), and
forcing it through qdisc path (with option --qdisc-path and -t0),
sending with 12 CPUs.

I can demonstrate this driver layer limitation:
 * 12.8 Mpps with no qdisc bulking
 * 14.8 Mpps with qdisc bulking (full 10G-wirespeed)

Testing patch02:
================
Testing Bulking several GSO/TSO packets:

Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec for 10G while transmitting
approx 813Kpps).

Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec, for 10G
while transmitting approx 813Kpps).

 Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms.  Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.

Corrosponding to:
 (10000*10^6)*((0.50-0.47)/10^3)/8 = 37500 bytes
 (10000*10^6)*((0.50-0.38)/10^3)/8 = 150000 bytes
 37500/1500  = 25 pkts
 150000/1500 = 100 pkts

 Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms
diff of 0.12ms corrosponding to 1500 bytes at 100Mbit/s. Bulking
several GSOs together, result in a stable high-prio queue delay of
2.23ms.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:37:23 -07:00
Jesper Dangaard Brouer 808e7ac0bd qdisc: dequeue bulking also pickup GSO/TSO packets
The TSO and GSO segmented packets already benefit from bulking
on their own.

The TSO packets have always taken advantage of the only updating
the tailptr once for a large packet.

The GSO segmented packets have recently taken advantage of
bulking xmit_more API, via merge commit 53fda7f7f9 ("Merge
branch 'xmit_list'"), specifically via commit 7f2e870f2a ("net:
Move main gso loop out of dev_hard_start_xmit() into helper.")
allowing qdisc requeue of remaining list.  And via commit
ce93718fb7 ("net: Don't keep around original SKB when we
software segment GSO frames.").

This patch allow further bulking of TSO/GSO packets together,
when dequeueing from the qdisc.

Testing:
 Measuring HoL (Head-of-Line) blocking for TSO and GSO, with
netperf-wrapper. Bulking several TSO show no performance regressions
(requeues were in the area 32 requeues/sec).

Bulking several GSOs does show small regression or very small
improvement (requeues were in the area 8000 requeues/sec).

 Using ixgbe 10Gbit/s with GSO bulking, we can measure some additional
latency. Base-case, which is "normal" GSO bulking, sees varying
high-prio queue delay between 0.38ms to 0.47ms.  Bulking several GSOs
together, result in a stable high-prio queue delay of 0.50ms.

 Using igb at 100Mbit/s with GSO bulking, shows an improvement.
Base-case sees varying high-prio queue delay between 2.23ms to 2.35ms

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:37:06 -07:00
Jesper Dangaard Brouer 5772e9a346 qdisc: bulk dequeue support for qdiscs with TCQ_F_ONETXQUEUE
Based on DaveM's recent API work on dev_hard_start_xmit(), that allows
sending/processing an entire skb list.

This patch implements qdisc bulk dequeue, by allowing multiple packets
to be dequeued in dequeue_skb().

The optimization principle for this is two fold, (1) to amortize
locking cost and (2) avoid expensive tailptr update for notifying HW.
 (1) Several packets are dequeued while holding the qdisc root_lock,
amortizing locking cost over several packet.  The dequeued SKB list is
processed under the TXQ lock in dev_hard_start_xmit(), thus also
amortizing the cost of the TXQ lock.
 (2) Further more, dev_hard_start_xmit() will utilize the skb->xmit_more
API to delay HW tailptr update, which also reduces the cost per
packet.

One restriction of the new API is that every SKB must belong to the
same TXQ.  This patch takes the easy way out, by restricting bulk
dequeue to qdisc's with the TCQ_F_ONETXQUEUE flag, that specifies the
qdisc only have attached a single TXQ.

Some detail about the flow; dev_hard_start_xmit() will process the skb
list, and transmit packets individually towards the driver (see
xmit_one()).  In case the driver stops midway in the list, the
remaining skb list is returned by dev_hard_start_xmit().  In
sch_direct_xmit() this returned list is requeued by dev_requeue_skb().

To avoid overshooting the HW limits, which results in requeuing, the
patch limits the amount of bytes dequeued, based on the drivers BQL
limits.  In-effect bulking will only happen for BQL enabled drivers.

Small amounts for extra HoL blocking (2x MTU/0.24ms) were
measured at 100Mbit/s, with bulking 8 packets, but the
oscillating nature of the measurement indicate something, like
sched latency might be causing this effect. More comparisons
show, that this oscillation goes away occationally. Thus, we
disregard this artifact completely and remove any "magic" bulking
limit.

For now, as a conservative approach, stop bulking when seeing TSO and
segmented GSO packets.  They already benefit from bulking on their own.
A followup patch add this, to allow easier bisect-ability for finding
regressions.

Jointed work with Hannes, Daniel and Florian.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:37:06 -07:00
Mark Einon 38df6492eb et131x: Add PCIe gigabit ethernet driver et131x to drivers/net
This adds the ethernet driver for Agere et131x devices to
drivers/net/ethernet.

The driver being added has been in the staging tree for some time, and will be
removed from there in a seperate patch. This one merely disables the staging
version to prevent two instances being built.

Signed-off-by: Mark Einon <mark.einon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-03 12:22:19 -07:00
Arturo Borrero 8da4cc1b10 netfilter: nft_masq: register/unregister notifiers on module init/exit
We have to register the notifiers in the masquerade expression from
the the module _init and _exit path.

This fixes crashes when removing the masquerade rule with no
ipt_MASQUERADE support in place (which was masking the problem).

Fixes: 9ba1f72 ("netfilter: nf_tables: add new nft_masq expression")
Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-03 14:24:35 +02:00
Larry Finger 3f08e47291 rtlwifi: Fix static checker warnings for various drivers
Indenting errors yielded the following static checker warnings:

drivers/net/wireless/rtlwifi/rtl8192ee/hw.c:533 rtl92ee_set_hw_reg() warn: add curly braces? (if)
drivers/net/wireless/rtlwifi/rtl8192ee/hw.c:539 rtl92ee_set_hw_reg() warn: add curly braces? (if)

An unreleased version of the static checker also reported:

drivers/net/wireless/rtlwifi/rtl8723be/trx.c:550 rtl8723be_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8188ee/trx.c:621 rtl88ee_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8192ee/trx.c:567 rtl92ee_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8821ae/trx.c:758 rtl8821ae_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8723ae/trx.c:494 rtl8723e_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8192se/trx.c:315 rtl92se_rx_query_desc() warn: 'hdr' can't be NULL.
drivers/net/wireless/rtlwifi/rtl8192ce/trx.c:392 rtl92ce_rx_query_desc() warn: 'hdr' can't be NULL.

All of these are fixed.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:32 -04:00
Larry Finger 989377e1cc rtlwifi: Fix Kconfig for RTL8192EE
The driver needs btcoexist, but Kconfig fails to select it. This omission
could cause build errors for some configurations.

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:32 -04:00
Sujith Manoharan e2cba8d759 ath9k: Fix flushing in MCC mode
When we are attempting to switch to a new
channel context, the TX queues are flushed, but
the mac80211 queues are not stopped and traffic
can still come down to the driver.

This patch fixes it by stopping the queues
assigned to the current context/vif before
trying to flush.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:32 -04:00
Sujith Manoharan 5ba8d9d2f0 ath9k: Fix queue handling for channel contexts
When a full chip reset is done, all the queues
across all VIFs are stopped, but if MCC is enabled,
only the queues of the current context is awakened,
when we complete the reset.

This results in unfairness for the inactive context.
Since frames are queued internally in the driver if
there is a context mismatch, we can awaken all the
queues when coming out of a reset.

The VIF-specific queues are still used in flow control,
to ensure fairness when traffic is high.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:32 -04:00
Sujith Manoharan a064eaa10c ath9k: Add ath9k_chanctx_stop_queues()
This can be used when the queues of a context
needs to be stopped.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:32 -04:00
Sujith Manoharan b39031536a ath9k: Pass context to ath9k_chanctx_wake_queues()
Change the ath9k_chanctx_wake_queues() API so
that we can pass the channel context that needs its
queues to be stopped.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:32 -04:00
Sujith Manoharan 4f82eecf73 ath9k: Fix queue handling in flush()
When draining of the TX queues fails, a
full HW reset is done. ath_reset() makes sure
that the queues in mac80211 are restarted,
so there is no need to wake them up again.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:31 -04:00
Sujith Manoharan 60913f4d29 ath9k: Remove duplicate code
ath9k_has_tx_pending() can be used to
check if there are pending frames instead
of having duplicate code.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:31 -04:00
Sujith Manoharan fc1314c75e ath9k: Fix pending frame check
Checking for the queue depth outside of
the TX queue lock is incorrect and in this
case, is not required since it is done inside
ath9k_has_pending_frames().

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:31 -04:00
Sujith Manoharan b736728575 ath9k: Check pending frames properly
There is no need to check if the current
channel context has active ACs queued up
if the TX queue is not empty.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:31 -04:00
Sujith Manoharan 4b60af4ab4 ath9k: Print RoC expiration
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:26:31 -04:00
David S. Miller 739e4a758e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/usb/r8152.c
	net/netfilter/nfnetlink.c

Both r8152 and nfnetlink conflicts were simple overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-02 11:25:43 -07:00
Avinash Patil 030bb75a1d mwifiex: add support for SD8887 chipset
This patch adds SD8887 support to mwifiex.
SD8887 is Marvell's 1x1 11ac solution.

The corresponding firmware image file is located at:
"mrvl/sd8887_uapsta.bin"

Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Frank Huang <frankh@marvell.com>
Signed-off-by: Nishant Sarmukadam <nishants@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:23:15 -04:00
Avinash Patil 554a0113cc mwifiex: few more register offset entries for sdio card structure
This patch adds some more defitions to card specific register structure
and removes static defines for these registers.

Signed-off-by: Avinash Patil <patila@marvell.com>
Signed-off-by: Cathy Luo <cluo@marvell.com>
Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:23:15 -04:00
Vladimir Kondratiev dba4b74d2d wil6210: atomic I/O for the card memory
Introduce netdev IOCTLs, to be used by the debug tools.

Allows to read/write single dword value or
memory block, aligned to dword
Different address modes supported:
- BAR offset
- Firmware "linker" address
- target's AHB bus

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:23:14 -04:00
Vladimir Kondratiev c33407a8c5 wil6210: manual FW error recovery mode
Introduce manual FW recovery mode. It is activated if module parameter
@no_fw_recovery set to true. May be changed at runtime.

Recovery information provided by new "recovery" debugfs file. It prints:

mode = [auto|manual]
state = [idle|pending|running]

In manual mode, after FW error, recovery won't start automatically. Instead,
after notification to user space, recovery waits in "pending" state, as indicated by the
"recovery" debugfs file. User space tools may perform data collection and allow to
continue recovery by writing "run" to the "recovery" debugfs file.
Alternatively, recovery pending may be canceled by stopping network interface
i.e. 'ifconfig wlan0 down'

Signed-off-by: Vladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:23:14 -04:00
Sujith Manoharan e6664dff06 ath: Add support for tracing
Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2014-10-02 14:23:14 -04:00
John W. Linville f6cd071891 Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2014-10-02 13:56:19 -04:00
Simon Horman 07dcc686fa ipvs: Clean up comment style in ip_vs.h
* Consistently use the multi-line comment style for networking code:

  /* This
   * That
   * The other thing
   */

* Use single-line comment style for comments with only one line of text.

* In general follow the leading '*' of each line of a comment with a
  single space and then text.

* Add missing line break between functions, remove double line break,
  align comments to previous lines whenever possible.

Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:58 +02:00
Pablo Neira Ayuso 4b7fd5d97e netfilter: explicit module dependency between br_netfilter and physdev
You can use physdev to match the physical interface enslaved to the
bridge device. This information is stored in skb->nf_bridge and it is
set up by br_netfilter. So, this is only available when iptables is
used from the bridge netfilter path.

Since 34666d4 ("netfilter: bridge: move br_netfilter out of the core"),
the br_netfilter code is modular. To reduce the impact of this change,
we can autoload the br_netfilter if the physdev match is used since
we assume that the users need br_netfilter in place.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:57 +02:00
Pablo Neira Ayuso 36d2af5998 netfilter: nf_tables: allow to filter from prerouting and postrouting
This allows us to emulate the NAT table in ebtables, which is actually
a plain filter chain that hooks at prerouting, output and postrouting.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:56 +02:00
Pablo Neira Ayuso 756c1b1a7f netfilter: nft_compat: remove incomplete 32/64 bits arch compat code
This code was based on the wrong asumption that you can probe based
on the match/target private size that we get from userspace. This
doesn't work at all when you have to dump the info back to userspace
since you don't know what word size the userspace utility is using.

Currently, the extensions that require arch compat are limit match
and the ebt_mark match/target. The standard targets are not used by
the nft-xt compat layer, so they are not affected. We can work around
this limitation with a new revision that uses arch agnostic types.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-10-02 18:30:55 +02:00