Commit Graph

824321 Commits

Author SHA1 Message Date
Willem de Bruijn 005edd1656 selftests/bpf: convert bpf tunnel test to BPF_ADJ_ROOM_MAC
Avoid moving the network layer header when prefixing tunnel headers.

This avoids an explicit call to bpf_skb_store_bytes and an implicit
move of the network header bytes in bpf_skb_adjust_room.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn 6c408decbd bpf: Sync bpf.h to tools
Sync include/uapi/linux/bpf.h with tools/

Changes
  v1->v2:
  - BPF_F_ADJ_ROOM_MASK moved, no longer in this commit
  v2->v3:
  - BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved, no longer in this commit

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn 868d523535 bpf: add bpf_skb_adjust_room encap flags
When pushing tunnel headers, annotate skbs in the same way as tunnel
devices.

For GSO packets, the network stack requires certain fields set to
segment packets with tunnel headers. gro_gse_segment depends on
transport and inner mac header, for instance.

Add an option to pass this information.

Remove the restriction on len_diff to network header length, which
is too short, e.g., for GRE protocols.

Changes
  v1->v2:
  - document new flags
  - BPF_F_ADJ_ROOM_MASK moved
  v2->v3:
  - BPF_F_ADJ_ROOM_ENCAP_L3_MASK moved

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn 2278f6cc15 bpf: add bpf_skb_adjust_room flag BPF_F_ADJ_ROOM_FIXED_GSO
bpf_skb_adjust_room adjusts gso_size of gso packets to account for the
pushed or popped header room.

This is not allowed with UDP, where gso_size delineates datagrams. Add
an option to avoid these updates and allow this call for datagrams.

It can also be used with TCP, when MSS is known to allow headroom,
e.g., through MSS clamping or route MTU.

Changes v1->v2:
  - document flag BPF_F_ADJ_ROOM_FIXED_GSO
  - do not expose BPF_F_ADJ_ROOM_MASK through uapi, as it may change.

Link: https://patchwork.ozlabs.org/patch/1052497/
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn 14aa31929b bpf: add bpf_skb_adjust_room mode BPF_ADJ_ROOM_MAC
bpf_skb_adjust_room net allows inserting room in an skb.

Existing mode BPF_ADJ_ROOM_NET inserts room after the network header
by pulling the skb, moving the network header forward and zeroing the
new space.

Add new mode BPF_ADJUST_ROOM_MAC that inserts room after the mac
header. This allows inserting tunnel headers in front of the network
header without having to recreate the network header in the original
space, avoiding two copies.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:45 -07:00
Willem de Bruijn 8142958954 selftests/bpf: extend bpf tunnel test with tso
Segmentation offload takes a longer path. Verify that the feature
works with large packets.

The test succeeds if not setting dodgy in bpf_skb_adjust_room, as veth
TSO is permissive.

If not setting SKB_GSO_DODGY, this enables tunneled TSO offload on
supporting NICs.

The feature sets SKB_GSO_DODGY because the caller is untrusted. As a
result the packets traverse through the gso stack at least up to TCP.
And fail the gso_type validation, such as the skb->encapsulation check
in gre_gso_segment and the gso_type checks introduced in commit
418e897e07 ("gso: validate gso_type on ipip style tunnel").

This will be addressed in a follow-on feature patch. In the meantime,
disable the new gso tests.

Changes v1->v2:
  - not all netcat versions support flag '-q', use timeout instead

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn 7255fade7b selftests/bpf: extend bpf tunnel test with gre
GRE is a commonly used protocol. Add GRE cases for both IPv4 and IPv6.

It also inserts different sized headers, which can expose some
unexpected edge cases.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn ef81bd0549 selftests/bpf: expand bpf tunnel test to ipv6
The test only uses ipv4 so far, expand to ipv6.
This is mostly a boilerplate near copy of the ipv4 path.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn ccd34cd357 selftests/bpf: expand bpf tunnel test with decap
The bpf tunnel test encapsulates using bpf, then decapsulates using
a standard tunnel device to verify correctness.

Once encap is verified, also test decap, by replacing the tunnel
device on decap with another bpf program.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn 98cdabcd07 selftests/bpf: bpf tunnel encap test
Validate basic tunnel encapsulation using ipip.

Set up two namespaces connected by veth. Connect a client and server.
Do this with and without bpf encap.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Willem de Bruijn 908adce646 bpf: in bpf_skb_adjust_room avoid copy in tx fast path
bpf_skb_adjust_room calls skb_cow on grow.

This expensive operation can be avoided in the fast path when the only
other clone has released the header. This is the common case for TCP,
where one headerless clone is kept on the retransmit queue.

It is safe to do so even when touching the gso fields in skb_shinfo.
Regular tunnel encap with iptunnel_handle_offloads takes the same
optimization.

The tcp stack unclones in the unlikely case that it accesses these
fields through headerless clones packets on the retransmit queue (see
__tcp_retransmit_skb).

If any other clones are present, e.g., from packet sockets,
skb_cow_head returns the same value as skb_cow().

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-22 13:52:44 -07:00
Ivan Vecera f682752627 selftests: bpf: modify urandom_read and link it non-statically
After some experiences I found that urandom_read does not need to be
linked statically. When the 'read' syscall call is moved to separate
non-inlined function then bpf_get_stackid() is able to find
the executable in stack trace and extract its build_id from it.

Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 19:37:30 -07:00
Daniel T. Lee ab99e7a8f7 samples: bpf: add xdp_sample_pkts to .gitignore
This commit adds xdp_sample_pkts to .gitignore which is
currently ommited from the ignore file.

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 19:35:36 -07:00
Alexei Starovoitov 2569473816 Merge branch 'bpf_tcp_check_syncookie'
Lorenz Bauer says:

====================
This series adds the necessary helpers to determine wheter a given
(encapsulated) TCP packet belongs to a connection known to the network stack.

* bpf_skc_lookup_tcp gives access to request and timewait sockets
* bpf_tcp_check_syncookie identifies the final 3WHS ACK when syncookies
  are enabled

The goal is to be able to implement load-balancing approaches like
glb-director [1] or Beamer [2] in pure eBPF. Specifically, we'd like to replace
the functionality of the glb-redirect kernel module [3] by an XDP program or
tc classifier.

Changes in v3:
* Fix missing check for ip4->ihl
* Only cast to unsigned long in BPF_CALLs

Changes in v2:
* Rename bpf_sk_check_syncookie to bpf_tcp_check_syncookie.
* Add bpf_skc_lookup_tcp. Without it bpf_tcp_check_syncookie doesn't make sense.
* Check tcp_synq_no_recent_overflow() in bpf_tcp_check_syncookie.
* Check th->syn in bpf_tcp_check_syncookie.
* Require CONFIG_IPV6 to be a built in.

1: https://github.com/github/glb-director
2: https://www.usenix.org/conference/nsdi18/presentation/olteanu
3: https://github.com/github/glb-director/tree/master/src/glb-redirect
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:12 -07:00
Lorenz Bauer bafc0ba826 selftests/bpf: add tests for bpf_tcp_check_syncookie and bpf_skc_lookup_tcp
Add tests which verify that the new helpers work for both IPv4 and
IPv6, by forcing SYN cookies to always on. Use a new network namespace
to avoid clobbering the global SYN cookie settings.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:11 -07:00
Lorenz Bauer 5792d52df1 selftests/bpf: test references to sock_common
Make sure that returning a struct sock_common * reference invokes
the reference tracking machinery in the verifier.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:11 -07:00
Lorenz Bauer dbaf2877e9 selftests/bpf: allow specifying helper for BPF_SK_LOOKUP
Make the BPF_SK_LOOKUP macro take a helper function, to ease
writing tests for new helpers.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:11 -07:00
Lorenz Bauer 253c8dde3c tools: update include/uapi/linux/bpf.h
Pull definitions for bpf_skc_lookup_tcp and bpf_sk_check_syncookie.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer 3990408470 bpf: add helper to check for a valid SYN cookie
Using bpf_skc_lookup_tcp it's possible to ascertain whether a packet
belongs to a known connection. However, there is one corner case: no
sockets are created if SYN cookies are active. This means that the final
ACK in the 3WHS is misclassified.

Using the helper, we can look up the listening socket via
bpf_skc_lookup_tcp and then check whether a packet is a valid SYN
cookie ACK.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer edbf8c01de bpf: add skc_lookup_tcp helper
Allow looking up a sock_common. This gives eBPF programs
access to timewait and request sockets.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer 85a51f8c28 bpf: allow helpers to return PTR_TO_SOCK_COMMON
It's currently not possible to access timewait or request sockets
from eBPF, since there is no way to return a PTR_TO_SOCK_COMMON
from a helper. Introduce RET_PTR_TO_SOCK_COMMON to enable this
behaviour.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Lorenz Bauer 0f3adc288d bpf: track references based on is_acquire_func
So far, the verifier only acquires reference tracking state for
RET_PTR_TO_SOCKET_OR_NULL. Instead of extending this for every
new return type which desires these semantics, acquire reference
tracking state iff the called helper is an acquire function.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-21 18:59:10 -07:00
Adrian Ratiu 48e5d98a0e selftests/bpf: Add arm target register definitions
eBPF "restricted C" code can be compiled with LLVM/clang using target
triplets like armv7l-unknown-linux-gnueabihf and loaded/run with small
cross-compiled gobpf/elf [1] programs without requiring a full BCC
port which is also undesirable on small embedded systems due to its
size footprint. The only missing pieces are these helper macros which
otherwise have to be redefined by each eBPF arm program.

[1] https://github.com/iovisor/gobpf/tree/master/elf

Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-20 20:28:58 -07:00
YueHaibing a534ea30e7 net: isdn: Make isdn_ppp_mp_discard and isdn_ppp_mp_reassembly static
Fix sparse warnings:

drivers/isdn/i4l/isdn_ppp.c:1891:16: warning:
 symbol 'isdn_ppp_mp_discard' was not declared. Should it be static?
drivers/isdn/i4l/isdn_ppp.c:1903:6: warning:
 symbol 'isdn_ppp_mp_reassembly' was not declared. Should it be static?

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 12:25:59 -07:00
YueHaibing 881d7afdff net: hns3: Make hclge_destroy_cmd_queue static
Fix sparse warning:

drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c:414:6:
 warning: symbol 'hclge_destroy_cmd_queue' was not declared. Should it be static?

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:20:12 -07:00
David S. Miller 75d317c409 Merge branch 'net-refactor-ndo_select_queue'
Paolo Abeni says:

====================
net: refactor ndo_select_queue()

Currently, on most devices implementing ndo_select_queue(), we get 2
indirect calls per xmit packet, at least in some scenarios.

We can avoid one of such indirect calls refactoring the ndo_select_queue()
usage so that we don't need anymore the 'fallback' argument.

The first patch renames a helper used later as a public API, the second one
changes the af packet implementation so that it uses the common infrastructure
to select the xmit queue, and the second patch drops the now unneeded argument
from ndo_select_queue().

Alternatively we could use the INDIRECT_CALL_WRAPPER infrastructure to avoid
the fallback indirect call in the common case, but this solution allows also
for some code cleanup.

 v1 -> v2:
  - renamed select queue helpers, as per Eric's and David's suggestions
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:18:55 -07:00
Paolo Abeni a350eccee5 net: remove 'fallback' argument from dev->ndo_select_queue()
After the previous patch, all the callers of ndo_select_queue()
provide as a 'fallback' argument netdev_pick_tx.
The only exceptions are nested calls to ndo_select_queue(),
which pass down the 'fallback' available in the current scope
- still netdev_pick_tx.

We can drop such argument and replace fallback() invocation with
netdev_pick_tx(). This avoids an indirect call per xmit packet
in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
with device drivers implementing such ndo. It also clean the code
a bit.

Tested with ixgbe and CONFIG_FCOE=m

With pktgen using queue xmit:
threads		vanilla 	patched
		(kpps)		(kpps)
1		2334		2428
2		4166		4278
4		7895		8100

 v1 -> v2:
 - rebased after helper's name change

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:18:55 -07:00
Paolo Abeni b71b5837f8 packet: rework packet_pick_tx_queue() to use common code selection
Currently packet_pick_tx_queue() is the only caller of
ndo_select_queue() using a fallback argument other than
netdev_pick_tx.

Leveraging rx queue, we can obtain a similar queue selection
behavior using core helpers. After this change, ndo_select_queue()
is always invoked with netdev_pick_tx() as fallback.
We can change ndo_select_queue() signature in a followup patch,
dropping an indirect call per transmitted packet in some scenarios
(e.g. TCP syn and XDP generic xmit)

This changes slightly how af packet queue selection happens when
PACKET_QDISC_BYPASS is set. It's now more similar to plan dev_queue_xmit()
tacking in account both XPS and TC mapping.

 v1  -> v2:
  - rebased after helper name change
 RFC -> v1:
  - initialize sender_cpu to the expected value

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:18:55 -07:00
Paolo Abeni 4bd97d51a5 net: dev: rename queue selection helpers.
With the following patches, we are going to use __netdev_pick_tx() in
many modules. Rename it to netdev_pick_tx(), to make it clear is
a public API.

Also rename the existing netdev_pick_tx() to netdev_core_pick_tx(),
to avoid name clashes.

Suggested-by: Eric Dumazet <edumazet@google.com>
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:18:54 -07:00
David S. Miller 0b963ef20c Merge branch 'qed-next'
Sudarsana Reddy Kalluru says:

====================
qed* enhancements.

The patch series adds couple of enhancements for qed/qede drivers.
Please consider applying it to 'net-next' tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:12:50 -07:00
Sudarsana Reddy Kalluru 1a3ca25062 qed: Define new MF bit for no_vlan config
The patch introduces a new Multi-Function bit for cases where firmware
shouldn't perform the insertion of vlan-0 tag. The new bit is defined to
abstract the implementation from the actual MF mode.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:12:50 -07:00
Sudarsana Reddy Kalluru a88381dece qede: Populate mbi version in ethtool driver query data.
The patch adds support to display MBI image version in 'ethtool -i' output.

Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com>
Signed-off-by: Ariel Elior <aelior@marvell.com>
Signed-off-by: Michal Kalderon <mkalderon@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:12:50 -07:00
Hangbin Liu 254c0a2bfe macvlan: pass get_ts_info and SIOC[SG]HWTSTAMP ioctl to real device
Similiar to commit a6111d3c93 ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to
real device") and commit 37dd9255b2 ("vlan: Pass ethtool get_ts_info
queries to real device."), add MACVlan HW ptp support.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:04:41 -07:00
Mao Wenan 1bfe45f4ae net: bridge: use eth_broadcast_addr() to assign broadcast address
This patch is to use eth_broadcast_addr() to assign broadcast address
insetad of memset().

Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:02:47 -07:00
Vakul Garg f295b3ae9f net/tls: Add support of AES128-CCM based ciphers
Added support for AES128-CCM based record encryption. AES128-CCM is
similar to AES128-GCM. Both of them have same salt/iv/mac size. The
notable difference between the two is that while invoking AES128-CCM
operation, the salt||nonce (which is passed as IV) has to be prefixed
with a hardcoded value '2'. Further, CCM implementation in kernel
requires IV passed in crypto_aead_request() to be full '16' bytes.
Therefore, the record structure 'struct tls_rec' has been modified to
reserve '16' bytes for IV. This works for both GCM and CCM based cipher.

Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 11:02:05 -07:00
David S. Miller 6a23c0a6af Merge branch 'net-phy-aquantia-add-interface-mode-handling'
Heiner Kallweit says:

====================
net: phy: aquantia: add interface mode handling

These two patches add interface mode handling for the AQR107/AQCS109.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 10:58:17 -07:00
Nikita Yushchenko 1e614b5086 net: phy: aquantia: check for changed interface mode in read_status
Depending on the auto-negotiated speed the PHY may change the interface
mode. Check for new mode and set phydev->interface accordingly.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[hkallweit1@gmail.com: picked from bigger patch and reworked]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 10:58:17 -07:00
Andrew Lunn 570c8a7d53 net: phy: aquantia: check for supported interface modes in config_init
Let config_init check for unsupported interface modes on AQR107/AQCS109.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
[hkallweit1@gmail.com: adjusted for AQR107/AQCS109 specifics]
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 10:58:17 -07:00
Heiner Kallweit 5c5f626bca net: phy: improve handling link_change_notify callback
Currently the Phy driver's link_change_notify callback is called
whenever the state machine is run (every second if polling), no matter
whether the state changed or not. This isn't needed and may confuse
users considering the name of the callback. Actually it contradicts
its kernel-doc description. Therefore let's change the behavior and
call this callback only in case of an actual state change.

This requires changes to the at803x and rockchip drivers.
at803x can be simplified so that it reacts on a state change to
PHY_NOLINK only.
The rockchip driver can also be much simplified. We simply re-init
the AFE/DSP registers whenever we change to PHY_RUNNING and speed
is 100Mbps. This causes very small overhead because we do this even
if the speed was 100Mbps already. But this is negligible and
I think justified by the much simpler code.

Changes are compile-tested only.

A little bit problematic seems to be to find somebody with the
hardware to test the changes to the two PHY drivers. See also [0].
David may be able to test the Rockchip driver.

[0] https://marc.info/?t=153782508800006&r=1&w=2

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 10:49:33 -07:00
David S. Miller 0b8515eddb Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
100GbE Intel Wired LAN Driver Updates 2019-03-19

This series contains updates to ice driver only.

Michal adds support for the pruning enable flag to avoid seeing
broadcast packets on different VLANs.

Akeem fixes an issue with VF queues being disabled and the VF netdev
network carrier being lost after reset. Fixed an issue issue when doing
PFR and CORER resets, where all VF VSIs need to be reset and rebuilt
with the main VSIs before replaying all VSIs.  Resolved an issue to
properly initialize VFs in the guest OS via PCI passthrough.

Bruce adds a local variable to avoid unnecessary de-references
throughout ice_probe().

Brett cleans up the code a bit by removing the need for a local variable
and re-designs the loop to simply return when get a successful result.
Cleans up the code to replace loop calls with a predefined macro to make
the code more consistent.  Updated the driver to ensure ITR granularity
is always 2 usecs. Refactors the calculation of VSIs per PF into a
general function that can calculate per PF allocations for not just VSIs
but across multiple resource types.  Improve the driver performance of
the driver when using the default settings by determining the ring size
and the number of descriptors for transmit and receive based on a
calculation with the PAGE_SIZE, ICE_MAX_NUM_DESC, and
ICE_REQ_DESC_MULTIPLE.

Chinh fixes an issue, where a reserved bit was possibly being set when
it should never be set.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 10:14:10 -07:00
David S. Miller 8d3a3048c3 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
1GbE Intel Wired LAN Driver Updates 2019-03-19

This series contains updates to e100, e1000, e1000e, igb, igc and
ixgbe.

Serhey Popovych fixes the return value for several of our older
drivers for netdev_update_features() to notify of changes applied.

Kai-Heng Feng fixes the WoL setting for system suspend, which should
not set to runtime suspend settings for igb.  Then fixes a power
management issue with e1000e for CNP+ devices.

Colin Ian King fixes whitespace issue (indentation), which helps with
readability.

Sasha provides the remaining changes for igc, including the enabling of
multi-queues to receive.  Added support for displaying and configuring
network flow classification (NFC) via ethtool.  Added additional
statistics and basic counters for igc.  Fixed a typo, so it aligns with
our other drivers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-20 10:12:03 -07:00
Brett Creeley ad71b256ba ice: Determine descriptor count and ring size based on PAGE_SIZE
Currently we set the default number of Tx and Rx descriptors to 128 by
default. For Rx this amounts to a full page (assuming 4K pages) because
each Rx descriptor is 32 Bytes, but for Tx it only amounts to a half
page because each Tx descriptor is 16 Bytes (assuming 4K pages).
Instead of assuming 4K pages, determine the ring size and the number of
descriptors for Tx and Rx based on a calculation using the PAGE_SIZE,
ICE_MAX_NUM_DESC, and ICE_REQ_DESC_MULTIPLE. This change is being made
to improve the performance of the driver when using the default
settings.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 17:24:03 -07:00
Akeem G Abodunrin 544f63d307 ice: Reset all VFs with VFLR during SR-IOV init flow
During SR-IOV initialization, we allocate and setup VFs with reset, and
since we were going to inform Firmware about our intention to do VFLR by
disabling LAN TX Queue, then we really have to complete VF reset flow with
VFLR using appropriate registers - Otherwise, reset status bit for VF in
the Guest OS might returns DEADBEEF.
This resolves issue to properly initialize VFs in the Guest OS via PCI
passthrough.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 17:02:26 -07:00
Brett Creeley 7a1f711175 ice: Get resources per function
ice_get_guar_num_vsi currently calculates the number of VSIs per PF.
Rework this into a general function ice_get_num_per_func, that can
calculate per PF allocations for not just VSIs but across multiple
resource types.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 17:00:54 -07:00
Akeem G Abodunrin 1c44e3bce1 ice: Implement flow to reset VFs with PFR and other resets
All VF VSIs need to be reset and rebuild with the main VSIs before
replaying all VSIs, so that all existing switch filters, scheduler tree
and other configuration could be replayed at once. This fixes issues when
doing PFR and CORER reset.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 17:00:09 -07:00
Brett Creeley 70457520ba ice: configure GLINT_ITR to always have an ITR gran of 2
Instead of hoping that our ITR granularity will be 2 usec program the
GLINT_CTL register to make sure the ITR granularity is always 2 usecs.

Now that we know what the ITR granularity will be get rid of the check
in ice_probe() to verify our previous assumption.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 16:56:10 -07:00
Brett Creeley 80ed404abb ice: use ice_for_each_vsi macro when possible
Replace all instances of:
	for (i = 0; i < pf->num_alloc_vsi; i++)

with the following macro:
	ice_for_each_vsi(pf, i)

This will allow the code to be consistent since there are currently
cases of using both.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 16:54:23 -07:00
Chinh T Cao d8df260af7 ice : Ensure only valid bits are set in ice_aq_set_phy_cfg
In the ice_aq_set_phy_cfg AQ command, the 16.4 bit is reserved. This
patch will make sure that this bit will never be set to 1.

Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 16:52:47 -07:00
Brett Creeley 16c3301b55 ice: remove redundant variable and if condition
In ice_pf_rxq_wait we are using an unnecessary local variable and also
we are checking if the timeout time was reached after the loop. Get rid
of the local variable and return 0 right when we get a successful
result. This makes it so we can return -ETIMEDOUT if we ever exit the
loop because we know the timeout time has been hit.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 16:51:08 -07:00
Bruce Allan 77ed84f49a ice: avoid multiple unnecessary de-references in probe
Add a local variable struct device *dev to avoid unnecessary de-references
throughout ice_probe().

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-03-19 16:49:36 -07:00