Commit Graph

662493 Commits

Author SHA1 Message Date
Philippe Reynes 17400f77b8 net: usb: mcs7830: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: poma <poma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 16:04:00 -07:00
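
This and the other {get|set}_link_ksettings conversions in this series
follow the same pattern; as a rough sketch for a usbnet-based driver
(the ops table name is illustrative and its other fields are elided),
the driver ends up delegating to the usbnet helpers added in commit
8bae3551e9 further down this list:

#include <linux/ethtool.h>
#include <linux/usb/usbnet.h>

/* Old, deprecated hooks being dropped:
 *      .get_settings           = usbnet_get_settings,
 *      .set_settings           = usbnet_set_settings,
 */
static const struct ethtool_ops mcs7830_ethtool_ops = {
        /* ...other callbacks unchanged... */
        .get_link_ksettings     = usbnet_get_link_ksettings,
        .set_link_ksettings     = usbnet_set_link_ksettings,
};
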
Philippe Reynes bcb58f777e net: usb: dm9601: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 16:03:59 -07:00
Philippe Reynes d0b3ab309f net: usb: cdc_ncm: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 16:03:59 -07:00
Philippe Reynes d8c3de2e86 net: usb: sr9800: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 16:03:59 -07:00
Philippe Reynes eaf9a32a44 net: usb: smsc95xx: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 16:03:59 -07:00
Philippe Reynes 8bae3551e9 net: usb: usbnet: add new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We add the new api {get|set}_link_ksettings to this driver.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 16:03:58 -07:00
Florian Fainelli 37a30b435b net: bcmgenet: Track per TX/RX rings statistics
__bcmgenet_tx_reclaim() currently sums TX bytes/packets in a way that is
not SMP friendly: multiple CPUs could run __bcmgenet_tx_reclaim()
independently and still update stats->tx_bytes and stats->tx_packets,
clobbering the other CPUs' statistics.

Fix this by tracking, per RX and TX ring, the number of bytes, packets,
drops and errors, and provide a bcmgenet_get_stats() function which
aggregates everything and returns a consistent output.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 15:30:24 -07:00
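
A hypothetical sketch of the approach (field and function names are
illustrative, not the exact bcmgenet ones): each ring keeps its own
counters, written only from the path servicing that ring, and the stats
callback sums them on demand:

#include <linux/netdevice.h>

struct ring_stats {
        unsigned long packets;
        unsigned long bytes;
        unsigned long dropped;
        unsigned long errors;
};

/* Each ring is only updated from the path servicing it, so writers no
 * longer clobber each other; a consistent view is built by summing. */
static void sum_tx_rings(const struct ring_stats *rings, int num_rings,
                         struct rtnl_link_stats64 *out)
{
        int i;

        for (i = 0; i < num_rings; i++) {
                out->tx_packets += rings[i].packets;
                out->tx_bytes   += rings[i].bytes;
                out->tx_dropped += rings[i].dropped;
                out->tx_errors  += rings[i].errors;
        }
}
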
Nikolay Aleksandrov bf4e0a3db9 net: ipv4: add support for ECMP hash policy choice
This patch adds support for ECMP hash policy choice via a new sysctl
called fib_multipath_hash_policy and also adds support for L4 hashes.
The current values for fib_multipath_hash_policy are:
 0 - layer 3 (default)
 1 - layer 4
If there's an skb hash already set and it matches the chosen policy, it
will be used instead of being calculated (currently only for L4).
In L3 mode we always calculate the hash due to the ICMP error special
case; the flow dissector's field consistency handling takes care of the
address order, so we can remove the address reversals.
If the skb is provided we always use it for the hash calculation;
otherwise we fall back to fl4, i.e. if skb is NULL then fl4 has to be set.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 15:27:19 -07:00
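
For reference, the L4 hash is essentially the flow-dissector hash over
addresses, ports and protocol; a simplified sketch (not the in-tree
helper, and ignoring the skb-hash shortcut and the fl4 fallback
described above):

#include <linux/skbuff.h>
#include <linux/string.h>
#include <net/flow_dissector.h>

/* Layer-4 multipath hash over addresses, ports and protocol. */
static u32 l4_multipath_hash(const struct sk_buff *skb)
{
        struct flow_keys keys;

        memset(&keys, 0, sizeof(keys));
        if (!skb_flow_dissect_flow_keys(skb, &keys, 0))
                return 0;

        return flow_hash_from_keys(&keys);
}
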
Andrey Vagin 88997e4208 net/8021q: create device with all possible features in wanted_features
wanted_features is the set of features which should be enabled if the
hardware allows it.

Currently, when a vlan device is created, its wanted_features is set to
the current features of its base device.

The problem is that the base device can get new features and they are
not propagated to the vlans on top of it.

Bonding devices don't have this problem, and this patch fixes the issue
in the same way it is already handled for bonding devices.

We hit this problem when we try to create a vlan device on top of
a bonding device. When a system is booting, real devices take time to be
initialized, so bonding devices are created without slaves, then vlan
devices are created, and only then are ethernet devices added to the
bonding device. As a result we end up with vlan devices that have
scatter-gather disabled.

* create a bonding device
  $ ip link add bond0 type bond
  $ ethtool -k bond0 | grep scatter
  scatter-gather: off
	tx-scatter-gather: off [requested on]
	tx-scatter-gather-fraglist: off [requested on]

* create a vlan device
  $ ip link add link bond0 name bond0.10 type vlan id 10
  $ ethtool -k bond0.10 | grep scatter
  scatter-gather: off
	tx-scatter-gather: off
	tx-scatter-gather-fraglist: off

* Add a slave device to bond0
  $ ip link set dev eth0 master bond0

Now we can see that the bond0 device has gained the scatter-gather
feature, but bond0.10 has not.
[root@laptop linux-task-diag]# ethtool -k bond0 | grep scatter
scatter-gather: on
	tx-scatter-gather: on
	tx-scatter-gather-fraglist: on
[root@laptop linux-task-diag]# ethtool -k bond0.10 | grep scatter
scatter-gather: off
	tx-scatter-gather: off
	tx-scatter-gather-fraglist: off

With this patch the vlan device will get all new features from the
bonding device.

Here is a call trace showing how the features set in this patch reach
dev->wanted_features.

register_netdevice
   vlan_dev_init
	...
	dev->hw_features = NETIF_F_HW_CSUM | NETIF_F_SG |
		       NETIF_F_FRAGLIST | NETIF_F_GSO_SOFTWARE |
		       NETIF_F_HIGHDMA | NETIF_F_SCTP_CRC |
		       NETIF_F_ALL_FCOE;

	dev->features |= dev->hw_features;
	...
    dev->wanted_features = dev->features & dev->hw_features;
    __netdev_update_features(dev);
        vlan_dev_fix_features
	   ...

Cc: Alexey Kuznetsov <kuznet@virtuozzo.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 15:26:26 -07:00
Ezequiel Lara Gomez b3407c8e5e Clean up some warnings from timestamping code
Following checkpatch.pl recommendations (which include replacing
<asm/io.h> with <linux/io.h>, since linux/io.h includes it).

Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 14:40:02 -07:00
Ezequiel Lara Gomez 6df014cffb Enable tx timestamping on loopback and dummy
This enables developing code that uses SOF_TIMESTAMPING_TX_SOFTWARE
by using localhost addresses (without needing to send packets outside),
as well as enabling unit and functional testing of TX timestamping code
without needing hardware support or network access.

It also fulfills the expectation of software network devices supporting
software-based timestamping.

Tested on qemu using txtimestamping.c from the kernel selftests, and
ethtool -T.

Signed-off-by: Ezequiel Lara Gomez <ezegomez@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 14:40:01 -07:00
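
For reference, the userspace side that this enables on lo/dummy is the
usual SO_TIMESTAMPING setup; a minimal sketch, error handling aside:

#include <sys/socket.h>
#include <linux/net_tstamp.h>

/* Request software TX timestamps; with this patch they are generated
 * even when packets never leave the machine (lo or dummy devices). */
static int enable_sw_tx_timestamps(int fd)
{
        int val = SOF_TIMESTAMPING_TX_SOFTWARE |  /* stamp on transmit */
                  SOF_TIMESTAMPING_SOFTWARE;      /* report to the error queue */

        return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}
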
David S. Miller 406910a8ba Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-03-20

This series contains updates to i40e and i40evf only.

Philippe Reynes updates i40e and i40evf to use the new ethtool API for
{get|set}_link_ksettings.

Jake provides the remaining patches in the series, starting with a fix
for i40e where the firmware expected the port numbers for the offloaded
UDP tunnels in Little Endian format and we were sending them in Big Endian
format, which caused the wrong port number to be put in the UDP tunnel list.
Changed the driver to use __be32 values instead of arrays for
(src|dst)_ip.  Refactored the exit flow of i40e_add_fdir_ethtool() which
removes the dependency on having a non-zero return value.  Fixed a memory
leak by calling kfree() and returning immediately when we fail to add a
flow director filter.  Fixed a potential issue where we could update the
filter count without actually succeeding in adding a filter, by moving
the ATR exit check to after we have sent the TCP/IPv4 filter to the ring
successfully.  Ensured that the fd_tcp_rule count is reset to 0 before
we reprogram the filters so that we do not end up with a stale count
which does not correctly reflect the number of programmed filters.  Added
a check for whether we have TCP/IPv4 filters before re-enabling ATR after
flushing and replaying FDIR filters.  Added counters for each filter
type in preparation for adding code to properly check the mask value.
Fixed potential issues by explicitly checking the flow type at the
start of i40e_add_fdir_ethtool().  To avoid possible memory leaks,
we now unconditionally delete the old filter, even if it is identical to
the new filter, which ensures we will always update the filters as expected.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 14:35:21 -07:00
David S. Miller 41e95736b3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your
net-next tree. A couple of new features for nf_tables, and unsorted
cleanups and incremental updates for the Netfilter tree. More
specifically, they are:

1) Allow checking for TCP option presence via nft_exthdr, patch
   from Phil Sutter.

2) Add symmetric hash support to nft_hash, from Laura Garcia Liebana.

3) Use pr_cont() in ebt_log, from Joe Perches.

4) Remove some dead code in arp_tables reported via static analysis
   tool, from Colin Ian King.

5) Consolidate nf_tables expression validation, from Liping Zhang.

6) Consolidate set lookup via nft_set_lookup().

7) Remove unnecessary rcu read lock side in bridge netfilter, from
   Florian Westphal.

8) Remove unused variable in nf_reject_ipv4, from Tahee Yoo.

9) Pass nft_ctx struct to object initialization indirections, from
   Florian Westphal.

10) Add code to integrate conntrack helper into nf_tables, also from
    Florian.

11) Allow checking whether an interface index or name exists via
    NFTA_FIB_F_PRESENT, from Phil Sutter.

12) Simplify resolve_normal_ct(), from Florian.

13) Use per-limit spinlock in nft_limit and xt_limit, from Liping Zhang.

14) Use rwlock in nft_set_rbtree set, also from Liping Zhang.

15) One patch to remove a useless printk at netns init path in ipvs,
    and several patches to document IPVS knobs.

16) Use refcount_t for reference counter in the Netfilter/IPVS code,
    from Elena Reshetova.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 14:28:08 -07:00
David S. Miller b9974d76f2 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2017-03-17

This series contains updates to mainly igb, with one fix for ixgbe.

Alex does all the changes in the series, starting with adding support
for DMA_ATTR_WEAK_ORDERING to improve performance on some platforms.
Modified igb to use the length of the packet instead of the DD status
bit to determine if a new descriptor is ready to be processed.  Modified
the driver to only go through the region in the receive ring that was
designated to be cleaned up, instead of going through the entire ring
on cleanup.  Cleaned up the transmit side, by clearing the transmit
buffer_info only when resetting the rings.  Added a new upper limit for
receive, which is based on the size of a 2K buffer minus padding, which
will allow us to support build_skb going forward.  Fixed ethtool testing
to only sync on the size of the frame that is being tested, instead of
the entire receive buffer.  Updated the handling of page addresses to
always use a void pointer with the consistent name of "va" to indicate
that we are working with a virtual address.  Added a "chicken bit" so
that we can turn off the new receive allocation feature, in the case
where we need to fall back to the legacy receive path.  Added support for
using 3K buffers in order-1 pages the same way we were using 2K buffers
in 4K pages.  Added support for padded packets; since we limit the size
of the frame, we are able to write to an offset within the buffer instead
of having to write at the very start of the buffer.  This allows us to
leave padding room for things like supporting XDP in the future.
Refactored the receive buffer page management: since there are 2-3 paths
that can be taken depending on what receive modes are enabled, the common
bits are broken out into their own functions to improve maintainability.
Added support for build_skb again.  Lastly, fixed a typo in igb and ixgbe
code comments.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-21 14:19:20 -07:00
Jacob Keller c6da525de7 i40e: always remove old filter when adding new FDir filter
The previous code relied on i40e_match_fdir_input_set when determining
whether to free the old filter. Change this code so that we
simply unconditionally delete the old filter, even if it's identical to
the new filter. This ensures that we don't leak any memory, and that we
always update the filters as expected.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:22 -07:00
Jacob Keller 1ec8deac8c i40e: explicitly fail on extended MAC field for ethtool_rx_flow_spec
Although we will fail the filter later due to checking flow_type which
will have a bogus invalid type, it is possible future refactoring will
remove this hidden failure case. Avoid a possible issue in the future by
explicitly checking the flow type at the start.

Change-Id: Ia98eb26f7b93ccbe38c7141e8f203ef496fc6598
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:22 -07:00
Jacob Keller 097dbf5250 i40e: add counters for UDP/IPv4 and IPv4 filters
In preparation for adding code to properly check the mask values, we
will need to know the number of active filters for each type. Add
counters for each filter type. Rename the already existing fd_tcp_rule
to fd_tcp4_filter_cnt to match the style of other names. To avoid style
warnings, avoid assigning multiple parameters at once, and fix up one
other case where we did so previously.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:22 -07:00
Jacob Keller 510dd4609f i40e: don't re-enable ATR when flushing filters if SB has TCP4/IPv4 rules
When flushing and replaying FDIR filters, it is possible we would
disable ATR, and then re-enable it even though we should have kept
it disabled due to existing TCP/IPv4 filters. Fix this by checking
whether we have TCP4/IPv4 filters before re-enabling.

Alternatively, we could instead restore ATR and then replay filters,
however, this would cause us to rapidly enable and then disable ATR in
some cases.

Change-ID: I076e4cc1e4409bce7f98f3c213295433a4ff43d8
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Avinash Dayanand <avinash.dayanand@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:21 -07:00
Jacob Keller 6d069425f0 i40e: reset fd_tcp_rule count when restoring filters
Since we're about to reprogram the filters, we need to ensure that the
fd_tcp_rule count is correctly reset to 0. Otherwise, we will keep
a stale count that does not accurately reflect the number of programmed
TCPv4 filters.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:21 -07:00
Jacob Keller e122eb7482 i40e: remove redundant check for fd_tcp_rule when restoring filters
i40e_fdir_filter_restore re-adds all existing filters, which already
checks when adding a TCPv4 filter to disable ATR. We don't need to make
the check twice, so remove this redundant code.

Change-ID: Ia0b0690e23523915199d601494557def135c9d7f
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:21 -07:00
Jacob Keller 377cc24980 i40e: exit ATR mode only when adding TCP/IPv4 filter succeeds
Move ATR exit check after we have sent the TCP/IPv4 filter to the ring
successfully. This avoids an issue where we potentially update the
filter count without actually succeeding in adding the filter. Now, we
only increment the fd_tcp_rule after we've succeeded. Additionally, we
will re-enable ATR mode only after deletion of the filter is actually
posted to the FDIR ring.

Change-ID: If5c1dea422081cc5e2de65618b01b4c3bf6bd586
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:20 -07:00
Jacob Keller e5187ee3ee i40e: return immediately when failing to add fdir filter
Instead of setting err=true and checking this to determine when to free
the raw_packet near the end of the function, simply kfree and return
immediately. The resulting code is a bit cleaner and has one less
variable. This also resolves a subtle bug in the ipv4 case which could
fail to add the first filter and then never free the memory, resulting
in a small memory leak.

Change-ID: I7583aac033481dc794b4acaa14445059c8930ff1
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Avinash Dayanand <avinash.dayanand@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:20 -07:00
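
The control-flow change reads roughly like the sketch below (names are
hypothetical, not the actual i40e symbols):

#include <linux/errno.h>
#include <linux/slab.h>

struct fdir_ctx;                                        /* hypothetical */
int program_filter(struct fdir_ctx *ctx, u8 *raw);      /* hypothetical */

static int add_filter(struct fdir_ctx *ctx, size_t raw_len)
{
        u8 *raw_packet;
        int ret;

        raw_packet = kzalloc(raw_len, GFP_KERNEL);
        if (!raw_packet)
                return -ENOMEM;

        ret = program_filter(ctx, raw_packet);
        if (ret) {
                kfree(raw_packet);      /* free at the failure site: no err flag, no leak */
                return ret;
        }

        return 0;
}
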
Jacob Keller 01016da1e5 i40e: rework exit flow of i40e_add_fdir_ethtool
Refactor the exit flow of the i40e_add_fdir_ethtool function. Move the
input_label to the end of the function, removing the dependency on
having a non-zero return value. Add a comment explaining why it is ok
not to free the fdir data structure, because the structure is now stored
in the fdir_filter_list.

Change-Id: I723342181d59cd0c9f3b31140c37961ba37bb242
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:20 -07:00
Jacob Keller 8ce43dce6f i40e: don't use arrays for (src|dst)_ip
The code originally included src_ip and dst_ip with enough space to
support ipv6 filters. However, no actual support for ipv6 filters has
been implemented. Thus, remove the arrays and just use __be32 values.
Should ipv6 support be added in the future, we can replace these with
a union that has sizes for both values.

Change-Id: I1bc04032244a80eb6ebc8a4e6c723a4a665c1dd5
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:20 -07:00
Jacob Keller fe0b0cd97b i40e: send correct port number to AdminQ when enabling UDP tunnels
The firmware expects the port numbers for offloaded UDP tunnels in
Little Endian format. We accidentally sent the value in Big Endian
format which obviously will cause the wrong port number to be put into
the UDP tunnels list. This results in VxLAN and Geneve tunnel Rx
offloads being essentially disabled, unless the port number happens to
be identical after byte swapping. Note that i40e_aq_add_udp_tunnel()
will byteswap the parameter from host order into Little Endian, so we
don't need to worry about passing strictly a __le16 value to the command.

This patch essentially reverts b3f5c7bc88 ("i40e: Fix for extra byte
swap in tunnel setup", 2016-08-24), but in a way that makes the result
much more clear to the reader.

Fixes: b3f5c7bc88 ("i40e: Fix for extra byte swap in tunnel setup", 2016-08-24)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Williams, Mitch A <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:19 -07:00
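
Sketched out (a minimal illustration with the surrounding driver code
elided), the byte-order chain after the fix is: big endian from the
stack, host order into the AdminQ helper, and the helper applies the
final swap to Little Endian itself:

#include <net/udp_tunnel.h>

/* Host-order port for the AdminQ command; i40e_aq_add_udp_tunnel()
 * byteswaps to Little Endian internally, so the driver must not pass
 * the raw big-endian value it gets from the stack. */
static u16 aq_udp_port(const struct udp_tunnel_info *ti)
{
        return ntohs(ti->port);
}
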
Philippe Reynes 48ce88022d i40evf: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:43:30 -07:00
Reshetova, Elena 4485a841be netfilter: fix the warning on unused refcount variable
net/netfilter/nfnetlink_acct.c: In function 'nfnl_acct_try_del':
net/netfilter/nfnetlink_acct.c:329:15: warning: unused variable 'refcount' [-Wunused-variable]
unsigned int refcount;
             ^

Fixes: b54ab92b84 ("netfilter: refcounter conversions")
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-20 10:49:12 +01:00
Philippe Reynes a7f909405b i40e: use new api ethtool_{get|set}_link_ksettings
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 19:18:04 -07:00
Alexander Duyck 3a1eb6d10c igb/ixgbe: Fix typo in igb_build_skb and/or ixgbe_build_skb code comment
There was a typo that I had left in the code comments for the igb and ixgbe
functions that enabled build_skb support.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:55:55 -07:00
Alexander Duyck b1bb2eb0a0 igb: Re-add support for build_skb in igb
This reverts commit f9d40f6a99 ("igb: Revert support for build_skb in
igb") and adds a few changes to update it to work with the latest version
of igb. We are now able to revert the removal because, with the recent
changes to the page count and the use of DMA_ATTR_SKIP_CPU_SYNC, we can
make the pages writable, so we should not be invalidating the additional
data added when we call build_skb.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
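
As a rough sketch of the shape of a build_skb() receive path (headroom
and buffer-size handling simplified; not the literal igb code):

#include <linux/skbuff.h>

static struct sk_buff *rx_build_skb(void *va, unsigned int headroom,
                                    unsigned int pkt_len, unsigned int buf_size)
{
        /* Wrap the (now writable) receive buffer directly instead of
         * allocating a new skb and copying headers into it. */
        struct sk_buff *skb = build_skb(va, buf_size);

        if (unlikely(!skb))
                return NULL;

        skb_reserve(skb, headroom);     /* skip the padding before the frame */
        __skb_put(skb, pkt_len);        /* length written back by the NIC */

        return skb;
}
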
Alexander Duyck e014272672 igb: Break out Rx buffer page management
At this point we have 2 to 3 paths that can be taken depending on what Rx
modes are enabled.  In order to better support that and improve the
maintainability I am breaking out the common bits from those paths and
making them into their own functions.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck e3cdf68d4a igb: Add support for padding packet
With the size of the frame limited we can now write to an offset within the
buffer instead of having to write at the very start of the buffer.  The
advantage to this is that it allows us to leave padding room for things
like supporting XDP in the future.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck 8649aaef40 igb: Add support for using order 1 pages to receive large frames
This patch adds support for using 3K buffers in order-1 pages the same way
we were using 2K buffers in 4K pages.  We are reserving 1K of room for now
to have space available for future headroom and tailroom when we enable
build_skb support.

One side effect of this patch is that we can end up using a larger buffer
if jumbo frames are enabled.  The impact shouldn't be too great, but it
could hurt small packet performance for UDP workloads if jumbo frames are
enabled as the truesize of frames will be larger.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck e08912985b igb: Add support for ethtool private flag to allow use of legacy Rx
Since there are potential drawbacks to the new Rx allocation approach I
thought it best to add a "chicken bit" so that we can turn the feature off
in the event that a problem is found.

It also provides a means of validating the legacy Rx path in the event that
we are forced to fall back.  At some point in the future when we are
convinced we don't need it anymore we might be able to drop the legacy-rx
flag.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
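
The ethtool private-flag plumbing is small; a minimal sketch (the
adapter structure and flag bit are illustrative, and a real driver
would also reconfigure the rings when the flag changes):

#include <linux/bitops.h>
#include <linux/ethtool.h>
#include <linux/netdevice.h>

#define FLAG_LEGACY_RX  BIT(0)

struct example_adapter {
        u32 flags;
};

/* Reported via ETH_SS_PRIV_FLAGS (get_sset_count()/get_strings() not shown). */
static const char example_priv_flag_strings[][ETH_GSTRING_LEN] = {
        "legacy-rx",
};

static u32 example_get_priv_flags(struct net_device *netdev)
{
        struct example_adapter *adapter = netdev_priv(netdev);

        return adapter->flags & FLAG_LEGACY_RX ? 1U : 0;
}

static int example_set_priv_flags(struct net_device *netdev, u32 priv_flags)
{
        struct example_adapter *adapter = netdev_priv(netdev);

        if (priv_flags & 1U)
                adapter->flags |= FLAG_LEGACY_RX;
        else
                adapter->flags &= ~FLAG_LEGACY_RX;

        /* A real driver would also rebuild the Rx rings here. */
        return 0;
}

Userspace then flips the bit with "ethtool --set-priv-flags <dev>
legacy-rx on|off" and reads it back with "ethtool --show-priv-flags".
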
Alexander Duyck 3456fd5342 igb: Use page_address offset from page instead of masking virtual address
Update the handling of page addresses so that we always refer to them using
a void pointer, and try to use the consistent name of va indicating we are
working with a virtual address.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck cb0ef1d1dc igb: Only sync size of expected frame in ethtool testing
We only need to sync the size of the frame that is read to test.  We don't
need to sync the entire Rx buffer.  This way the testing is more consistent
with how we handle things in the receive path.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck cfbc871c21 igb: Limit maximum frame Rx based on MTU
In order to support the use of build_skb going forward it will be necessary
to place a maximum limit on the amount of data we can receive when jumbo
frames are not enabled.  In order to do this I am adding a new upper limit
for receive based on the size of a 2K buffer minus padding.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck 7cc6fd4c60 igb: Don't bother clearing Tx buffer_info in igb_clean_tx_ring
In the case of the Tx rings we need to only clear the Tx buffer_info when
we are resetting the rings.  Ideally we do this when we configure the ring
to bring it back up instead of when we are taking it down in order to avoid
dirtying pages we don't need to.

In addition we don't need to clear the Tx descriptor ring since we will
fully repopulate it when we begin transmitting frames and next_to_watch can
be cleared to prevent the ring from being cleaned beyond that point instead
of needing to touch anything in the Tx descriptor ring.

Finally with these changes we can avoid having to reset the skb member of
the Tx buffer_info structure in the cleanup path since the skb will always
be associated with the first buffer which has next_to_watch set.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck d2bead576e igb: Clear Rx buffer_info in configure instead of clean
This change makes it so that instead of going through the entire ring on Rx
cleanup we only go through the region that was designated to be cleaned up
and stop when we reach the region where new allocations should start.

In addition we can avoid having to perform a memset on the Rx buffer_info
structures until we are about to start using the ring again.  By deferring
this we can avoid dirtying the cache any more than we have to which can
help to improve the time needed to bring the interface down and then back
up again in a reset or suspend/resume cycle.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
Alexander Duyck 7ec0116c91 igb: Use length to determine if descriptor is done
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads, as we don't really even
need the DD bit: the size going from 0 to a non-zero value is enough to
inform us that the packet has been completed.

In addition I have updated the code so that we only reset the Rx descriptor
length for descriptor zero when resetting a ring instead of having to do a
memset with 0 over the entire ring.  By doing this we can save some time on
initialization.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:44 -07:00
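
A sketch of the length-based completion test (assuming the igb advanced
Rx descriptor definitions, union e1000_adv_rx_desc, are in scope; not
the literal driver code):

/* A non-zero written-back length means the hardware has handed the
 * descriptor back, so the DD status bit no longer needs to be read. */
static bool rx_desc_done(const union e1000_adv_rx_desc *rx_desc)
{
        if (!le16_to_cpu(rx_desc->wb.upper.length))
                return false;

        /* Order the length read before any reads of the packet data. */
        dma_rmb();

        return true;
}
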
Alexander Duyck 7bd1759282 igb: Add support for DMA_ATTR_WEAK_ORDERING
Since we are already using DMA attributes in igb for Rx there is no reason
why we can't also apply DMA_ATTR_WEAK_ORDERING which is needed on some
platforms to improve performance.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-17 12:11:43 -07:00
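
The mapping side of this amounts to passing one extra attribute; a
minimal sketch (offset handling and error checking elided):

#include <linux/dma-mapping.h>

static dma_addr_t map_rx_page(struct device *dev, struct page *page)
{
        /* DMA_ATTR_WEAK_ORDERING relaxes ordering on platforms where the
         * strictly ordered mapping is expensive; it is a no-op elsewhere. */
        return dma_map_page_attrs(dev, page, 0, PAGE_SIZE,
                                  DMA_FROM_DEVICE,
                                  DMA_ATTR_WEAK_ORDERING);
}
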
Reshetova, Elena b54ab92b84 netfilter: refcounter conversions
refcount_t type and corresponding API (see include/linux/refcount.h)
should be used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2017-03-17 12:49:43 +01:00
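
Each conversion follows the same shape; a sketch against an
illustrative object (the real patch touches a number of Netfilter and
IPVS structures):

#include <linux/refcount.h>

struct tracked_obj {
        refcount_t ref;                 /* was: atomic_t */
};

static void obj_init(struct tracked_obj *obj)
{
        refcount_set(&obj->ref, 1);     /* was: atomic_set(&obj->ref, 1) */
}

static void obj_get(struct tracked_obj *obj)
{
        refcount_inc(&obj->ref);        /* was: atomic_inc(); saturates and
                                         * WARNs instead of overflowing */
}

static bool obj_put(struct tracked_obj *obj)
{
        /* was: atomic_dec_and_test(); true when the last reference drops */
        return refcount_dec_and_test(&obj->ref);
}
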
Manish Awasthi fe723dff0f liquidio: fix wrong information about link modes reported to ethtool
Information reported to ethtool about link modes is wrong for the 25G NIC.
Fix it by checking for the presence of a 25G NIC, checking the link speed
reported by the NIC firmware, and then assigning proper values to the
ethtool_link_ksettings struct.

Signed-off-by: Manish Awasthi <manish.awasthi@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 21:40:29 -07:00
David S. Miller 513d2d01b7 Merge branch 'netvsc-small-changes'
Stephen Hemminger says:

====================
netvsc: small changes for net-next

One bugfix, and two non-code patches
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 21:39:51 -07:00
stephen hemminger 76f5ed881c netvsc: remove unused #define
Not used anywhere.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 21:39:51 -07:00
stephen hemminger 262b7f142a netvsc: add comments about callbacks and NAPI
Add a short description of how callbacks and NAPI interoperate.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 21:39:51 -07:00
stephen hemminger 6de38af611 netvsc: avoid race with callback
Change the argument to the channel callback from the channel pointer
to the internal data structure containing per-channel info.
This avoids any possible races when the callback happens during
initialization, and makes the IRQ code simpler.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 21:39:50 -07:00
David S. Miller 3a70418b9c Merge branch 'bpf-inline-lookups'
Alexei Starovoitov says:

====================
bpf: inline bpf_map_lookup_elem()

bpf_map_lookup_elem() is one of the most frequently used helper functions.
Improve JITed program performance by inlining this helper.

bpf_map_type	before  after
hash		58M	74M
array		174M	280M

The values are number of lookups per second in ideal conditions
measured by micro-benchmark in patch 6.

The 'perf report' for HASH map type:
before:
    54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
     5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
     1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
     1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

so the cost of htab_map_lookup_elem() and bpf_map_lookup_elem()
is gone after inlining.

'per-cpu' and 'lru' map types can be optimized similarly in the future.

Note that sparse will complain that bpf is addictive ;)
kernel/bpf/hashtab.c:438:19: sparse: subtraction of functions? Share your drugs
kernel/bpf/verifier.c:3342:38: sparse: subtraction of functions? Share your drugs
it's not a new warning, just in new places.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:44:12 -07:00
Alexei Starovoitov 95ff141e52 samples/bpf: add map_lookup microbenchmark
$ map_perf_test 128
speed of HASH bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	46M		58M
after	42M		74M

perf report
before:
    54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
     5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
    18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
     2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
     1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
     1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial
functions, yet they take a sizeable amount of cpu time.
htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts
htab_map_lookup_elem() into three BPF insns, which causes the cpu time
for bpf_prog_da4fc6a3f41761a2() to increase slightly.

$ map_perf_test 256
speed of ARRAY bpf_map_lookup_elem() in lookups per second
	w/o JIT		w/JIT
before	97M		174M
after	64M		280M

before:
    37.33%  map_perf_test  [kernel.kallsyms]  [k] array_map_lookup_elem
    13.95%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
     6.54%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     4.57%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

after:
    32.86%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
     6.54%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler

array_map_gen_lookup() removes calls to array_map_lookup_elem()
and bpf_map_lookup_elem() and replaces them with 7 bpf insns.

The performance without JIT is slower, since executing extra insns
in the interpreter is slower than running native C code,
but with JIT the performance gains are obvious,
since native C->x86 code is replaced with fewer bpf->x86 instructions.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:44:12 -07:00
Alexei Starovoitov 9015d2f595 bpf: inline htab_map_lookup_elem()
Optimize:
bpf_call
  bpf_map_lookup_elem
    map->ops->map_lookup_elem
      htab_map_lookup_elem
        __htab_map_lookup_elem
into:
bpf_call
  __htab_map_lookup_elem

to improve performance of JITed programs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-03-16 20:44:11 -07:00