When run ethtool cmd(ethtool -t ethx) again and again for a long
time, it will be probabilistically fail. The PHYs' registers may
be on different pages, so it must be switch to the right page
before setting PHYs' registers.
And __lb_up() calls phy_start() to startup the PHYs device, but
this function may change Copper Control Register(Page 0, Register 0)
to an other value. It would cause phy loopback test fail. if we
remove phy_start(), we have to remove the relative phy_stop(),
phy_disconnect() when doing phy loopback to keep the phy stay in
right status.
Reported-by: hejun <hjat2005@huawei.com>
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As Hilink3 and Hilink4 use the same xge training and xge u adaptor for
HNSv2, it needs to select which Hilink to be set before relative serdes
being configed. The hilink_access_sel is the register to do that.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The error info should be printed as "set mask to 64bit fail!" instead of
"set mask to 32bit fail!" in dma_set_mask_and_coherent().
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver uses devm_ioremap_resource, it will unmap the map
automatically, remove the unnecessary the resource free.
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The annotation info for hns_nic_reset_subtask() should be
"for resetting subtask" instead of "for resetting suntask".
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
HNS receives a packet without doing anything, but it should call
skb_reset_mac_header() to initialize the header before using
eth_hdr().
Fixes: 0d6b425a37
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For service port, hns dsaf v1 support to close tx_pause.
However, the port will be invalid when it run command
ethtool to close tx_pause. This patch will fix it.
Signed-off-by: Qianqian Xie <xieqiaqian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bit fileds of PPE reset register are different between HNS v1 and
HNS v2, but the current procedure just only match HNS v1. Here is a
patch to fix it.
Signed-off-by: Kejian Yan <yankejian@huawei.com>
Signed-off-by: Qianqian Xie <xieqianqian@huawei.com>
Signed-off-by: Yisen Zhuang <Yisen.Zhuang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: probe compatible
This patchset factorizes the legacy and new SMI probing and abstracts
the switch register accesses. This simplifies adding support for new
chips or alternative register accesses.
This will allow us to use a compatible chip info to describe how to
access the SMI device and its switch ID register at probe time.
For the legacy probe, we fix the compatible info to 88E6085. For the
MDIO probe, we will use the compatible info from the device node data.
All patches are reviewed.
Changes since v4:
- fix debug printing (was 'val' instead of '*val')
Changes since v3 [3]:
- better register access abstraction using the chip structure
Changes since v2 [2]:
- do not guess compatible model in legacy probe
- add low level SMI API using a chip structure
- allocate before probe and detection
- add 3 cosmetic patches
Changes since v1 [1]:
- merge style fix from Ben Dooks
- add Acked-by/Reviewed-by tags
- drop one compatible string per model
- detect the SMI device based on the compatible info
- add an SMI ops structure
[1] https://lkml.org/lkml/2016/6/8/1201
[2] https://lkml.org/lkml/2016/6/14/671
[3] https://lkml.org/lkml/2016/6/17/995
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the SMI address of the switch chip is zero, the chip assumes to be
the only one on the SMI master bus and thus responds to all its known
SMI devices addresses (port registers, Global2, etc.)
When its SMI address is not zero, some chips (e.g. 88E6352) use an
indirect access through two SMI Command and Data registers.
Other models (e.g. 88E6060) using less than 16 internal SMI addresses
always use a direct access.
Add a capability flag to describe chips supporting the (indirect)
Multi-chip Addressing Mode, and a low-level API to access the registers
via SMI.
Other accesses (like Ethernet management frames) may be added later.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch ID is located at address 0x3 of every Port Registers bank.
But not all Marvell switches have their Port Registers SMI Addresses
starting at 0x10. 88E6060 starts at 0x8 and 88E6390 starts at 0x0.
Add this data in the info structure and use it in the detection code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
After allocating the chip structure, pass it a compatible info pointer.
The compatible info structure will be used later to describe how to
access the switch registers and where to read the switch ID.
For the standard MDIO probe, get it from the device node data. For the
legacy DSA driver probing, pass it the 88E6085 info.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the common detection code which assigns the info structure to
the chip given the read switch ID.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an helper function to isolate SMI specific assignments and checks.
This function will later help choosing the different SMI accesses based
of the compatible info.
Since the chip structure is already allocated in the legacy probe, use
the mv88e6xxx_reg_read access routine instead of __mv88e6xxx_reg_read.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an helper function to allocate the chip structure at the beginning
of the probe functions. It will be used to initialize the SMI access.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip smi_mutex mutex is used to protect the access to the internal
switch registers, not only the Multi-chip Addressing Mode, as commented.
Since we will isolate SMI-specific pieces of code, avoid the confusion
now by renaming smi_mutex to reg_lock. No functional changes here.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mv88e6xxx_table array and the mv88e6xxx_lookup_info function are
static, so remove the table and size arguments from the lookup function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the optional variant to get the reset GPIO line, instead of checking
for the -ENOENT error.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mixed assignments, allocations and registrations in the probe code
make it hard to follow the logic and figure out what is DSA or chip
specific.
Extract the struct dsa_switch related code in a simple
mv88e6xxx_register_switch helper function.
For symmetry in the code, add a mv88e6xxx_unregister_switch function.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MDIO device probe and remove functions are respectively incrementing
and decrementing the bus refcount themselves. Since these bus level
actions are out of the device scope, remove them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the MDIO probing function, dev is already assigned to &mdiodev->dev
and np is already assigned to mdiodev->dev.of_node, so use them.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
The chip->ds and ds->slave_mii_bus assignments are common to both legacy
and new MDIO probing and are already done in the later setup code.
Remove the duplicated assignments from the MDIO probing code.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes 5 style problems reported by checkpatch:
WARNING: suspect code indent for conditional statements (8, 24)
#492: FILE: drivers/net/dsa/mv88e6xxx.c:492:
+ if (phydev->link)
+ reg |= PORT_PCS_CTRL_LINK_UP;
CHECK: Logical continuations should be on the previous line
#1318: FILE: drivers/net/dsa/mv88e6xxx.c:1318:
+ oldstate == PORT_CONTROL_STATE_FORWARDING)
+ && (state == PORT_CONTROL_STATE_DISABLED ||
CHECK: multiple assignments should be avoided
#1662: FILE: drivers/net/dsa/mv88e6xxx.c:1662:
+ vlan->vid_begin = vlan->vid_end = next.vid;
WARNING: line over 80 characters
#2097: FILE: drivers/net/dsa/mv88e6xxx.c:2097:
+ const struct switchdev_obj_port_vlan *vlan,
WARNING: suspect code indent for conditional statements (16, 32)
#2734: FILE: drivers/net/dsa/mv88e6xxx.c:2734:
+ if (mv88e6xxx_6352_family(ps) || mv88e6xxx_6351_family(ps) ||
[...]
+ reg |= PORT_CONTROL_EGRESS_ADD_TAG;
total: 0 errors, 3 warnings, 2 checks, 3805 lines checked
It also rebases and integrates changes sent by Ben Dooks [1]:
The driver has a number of functions that are not exported or
declared elsewhere, so make them static to avoid the following
warnings from sparse:
drivers/net/dsa/mv88e6xxx.c:113:5: warning: symbol 'mv88e6xxx_reg_read' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:167:5: warning: symbol 'mv88e6xxx_reg_write' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:231:5: warning: symbol 'mv88e6xxx_set_addr' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:367:6: warning: symbol 'mv88e6xxx_ppu_state_init' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3157:5: warning: symbol 'mv88e6xxx_phy_page_read' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3169:5: warning: symbol 'mv88e6xxx_phy_page_write' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3583:26: warning: symbol 'mv88e6xxx_switch_driver' was not declared. Should it be static?
drivers/net/dsa/mv88e6xxx.c:3621:5: warning: symbol 'mv88e6xxx_probe' was not declared. Should it be static?
[1] http://patchwork.ozlabs.org/patch/632708/
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Acked-by: Mans Rullgard <mans@mansr.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
ipv6: better traceroute support in sit and gre
In commit ca15a078bd ("sit: generate icmpv6 error when receiving icmpv4
error"), Oussama Ghorbel added minimal support for SIT tunnels to
generate ICMPv6 messages when receiving ICMPv4 messages.
This patch series extends this to GRE tunnels.
It also adds support for ICMPV6_TIME_EXCEED.
Partial support for RFC 4884 is added, to forward ICMP Multi-part
extensions eventually found in the ICMPv4 message.
v2: replaced an overlapping memcpy() by memmove()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When receiving an ICMPv4 message containing extensions as
defined in RFC 4884, and translating it to ICMPv6 at SIT
or GRE tunnel, we need some extra manipulation in order
to properly forward the extensions.
This patch only takes care of Time Exceeded messages as they
are the ones that typically carry information from various
routers in a fabric during a traceroute session.
It also avoids complex skb logic if the data_len is not
a multiple of 8.
RFC states :
The "original datagram" field MUST contain at least 128 octets.
If the original datagram did not contain 128 octets, the
"original datagram" field MUST be zero padded to 128 octets.
In practice routers use 128 bytes of original datagram, not more.
Initial translation was added in commit ca15a078bd
("sit: generate icmpv6 error when receiving icmpv4 error")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ipgre_err() can call ip6_err_gen_icmpv6_unreach() for proper
support of ipv4+gre+icmp+ipv6+... frames, used for example
by traceroute/mtr.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For better traceroute/mtr support for SIT and GRE tunnels,
we translate IPV4 ICMP ICMP_TIME_EXCEEDED to ICMPV6_TIME_EXCEED
We also have to translate the IPv4 source IP address of ICMP
message to IPv6 v4mapped.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We want to use this helper from GRE as well, so this is
the time to move it in net/ipv6/icmp.c
Also add a @nhs parameter, since SIT and GRE have different
values for the header(s) to skip.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SIT or GRE tunnels might want to translate an IPV4 address
into a v4mapped one when translating ICMP to ICMPv6.
This patch adds the parameter to icmp6_send() but
does not change icmpv6_send() signature.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two generics functions phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The private structure contain a pointer to phydev, but the structure
net_device already contain such pointer. So we can remove the pointer
phydev in the private structure, and update the driver to use the
one contained in struct net_device.
Signed-off-by: Philippe Reynes <tremyfr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warnings:
net/rds/tcp.c:59:5: warning:
symbol 'rds_tcp_min_sndbuf' was not declared. Should it be static?
net/rds/tcp.c:60:5: warning:
symbol 'rds_tcp_min_rcvbuf' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove including <linux/version.h> that don't need it.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove including <linux/version.h> that don't need it.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
-----BEGIN PGP SIGNATURE-----
iQEcBAABCgAGBQJXZAIqAAoJED07qiWsqSVqW0YH/RMgC6CDlJUtHr7+B8YLpi0e
BZOzAHH7mQdP+Z2kIXZp8Dnziq5G8heWNjRwTMPLHvOel+oms+0ZK6VY/kJYArdb
ViGGgi/gQ334JGqJYi2utkyIIIRH7ZxwcblF1aaaFVfFy7tZMVuppIWVzR/V0Gje
/5FftT1f04/6iumEq4es+Jb0OC9azoebSs1DUZTIvYOz3XrnCbB1FdmDN+a3xZKC
Qyav6QVnp/m2InzGSN+Kd/W++EP6YckdBp/++2hizsOvSIOfe8GRqkc0r7fLbfZ8
rucDJNi+GrLat4wNza4t3FKf3rBenBFzii14OEUTE0JgpY90fGUF3n/i0duaB/k=
=90Ns
-----END PGP SIGNATURE-----
Merge tag 'linux-can-next-for-4.8-20160617' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2016-06-17
this is a pull request of 14 patches for net-next/master.
Geert Uytterhoeven contributes a patch that adds a file patterns for
CAN device tree bindings to MAINTAINERS. A patch by Alexander Aring
fixes warnings when building without proc support. A patch by me
improves the sample point calculation. Marek Vasut's patch converts
the slcan driver to use CAN_MTU. A patch by William Breathitt Gray
converts the tscan1 driver to use module_isa_driver.
Two patches by Maximilian Schneider for the gs_usb driver fix coding
style and add support for set_phys_id callback. 5 patches by Oliver
Hartkopp add support for CANFD to the bcm. And finally two patches
by Ramesh Shanmugasundaram, which add support for the rcar_canfd
driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk says:
====================
net: ethernet: ti: cpsw: delete rx_descs property
There is no reason in rx_descs property because davinici_cpdma
driver splits pool of descriptors equally between tx and rx channels.
So, this patch series makes driver to use available number of
descriptors for rx channels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no reason to hold s/w dependent parameter in device tree.
Even more, there is no reason in this parameter because davinici_cpdma
driver splits pool of descriptors equally between tx and rx channels
anyway.
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no reason in rx_descs property because davinici_cpdma
driver splits pool of descriptors equally between tx and rx channels.
That is, if number of descriptors 256, 128 of them are for rx
channels. While receiving, the descriptor is freed to the pool and
then allocated with new skb. And if in DT the "rx_descs" is set to
64, then 128 - 64 = 64 descriptors are always in the pool and cannot
be used, for tx, for instance. It's not correct resource usage,
better to set it to half of pool, then the rx pool can be used in
full. It will not have any impact on performance, as anyway, the
"redundant" descriptors were unused.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
"up_map" is a u64 type but we're not using the high 32 bits.
Fixes: 35c55c9877 ('tipc: add neighbor monitoring framework')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern says:
====================
net: vrf: Fix ipv6 source address selection
IPv6 address selection is currently messed up for several use cases such
as unnumbered deployments with global addresses on the VRF device and none
on the enslaved devices.
Update the source address selection to consider the real output route as
opposed to the VRF route that sends packets to the VRF device first (ie.,
implement get_saddr6 similar to the IPv4 method) and update the IPv6
address selection to consider L3 domains and preference for addresses on
the VRF device).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 version of 3f2fb9a834 ("net: l3mdev: address selection should only
consider devices in L3 domain") and the follow up commit, a17b693cdd876
("net: l3mdev: prefer VRF master for source address selection").
That is, if outbound device is given then the address preference order
is an address from that device, an address from the master device if it
is enslaved, and then an address from a device in the same L3 domain.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 source address selection needs to consider the real egress route.
Similar to IPv4 implement a get_saddr6 method which is called if
source address has not been set. The get_saddr6 method does a full
lookup which means pulling a route from the VRF FIB table and properly
considering linklocal/multicast destination addresses. Lookup failures
(eg., unreachable) then cause the source address selection to fail
which gets propagated back to the caller.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VRF driver needs access to ip6_route_get_saddr code. Since it does
little beyond ipv6_dev_get_saddr and ipv6_dev_get_saddr is already
exported for modules move ip6_route_get_saddr to the header as an
inline.
Code move only; no functional change.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The variable i was declared but was never used and we were getting a
build warning for that.
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck says:
====================
Future-proof tunnel offload handlers
These patches are meant to address two things. First we are currently
using the ndo_add/del_vxlan_port calls with VXLAN-GPE tunnels and we
cannot really support that as it is likely to cause more harm than
good since VXLAN-GPE can support tunnels without a MAC address on the
inner header.
As such we need to add a new offload to advertise this, but in doing so it
would mean introducing 3 new functions for the driver to request the ports,
and then for the tunnel to push the changes to add and delete the ports to
the device. However instead of taking that approach I think it would be
much better if we just made one common function for fetching the ports, and
provided a generic means to push the tunnels to the device. So in order to
make this work this patch set does several things.
First it merges the existing VXLAN and GENEVE functionality into one set of
functions and passes an enum in order to specify the type of tunnel we want
to offload. By doing this we only have to extend this enum in the future
if we want to add additional types.
Second it goes through the drivers replacing all of the tunnel specific
offload calls with implementations that support the generic calls so that
we can drop the VXLAN and GENEVE specific calls entirely.
Finally I go through in the last patch and replace the VXLAN specific
offload request that was being used for VXLAN-GPE with one that specifies
if we want to offload VXLAN or VXLAN-GPE so that the hardware can decide if
it can actually support it or not.
I also ended up with some minor clean-up built into the driver patches for
this. Most of it is to either fix misuse of build flags, specifying a type
to ignore instead of the type that should be used, or in the case of ixgbe
I actually moved a rtnl_lock/unlock in order to avoid taking it unless it
was actually needed.
v2:
I did my best to remove the word "offload" from any of the calls or
notifiers as this isn't really an offload. It
is a workaround for the fact that the drivers don't provide basic features
like CHECKSUM_COMPLETE. I also added a disclaimer to the section defining
the function prototypes explaining that these are essentially workarounds.
I ended up going through and stripping all of the VXLAN and GENEVE build
flags from the drivers. There isn't much point in carrying them. In
addition I dropped the use of the vxlan.h or geneve.h header files in favor
of udp_tunnel.h in the cases where a driver didn't need anything from
either of those headers.
I updated the tunnel add/del functions so that they pass a udp_tunnel_info
structure instead of a list of arguments. This way we should be able to
add additional information in the future with little impact on the other
drivers.
I updated bnxt so that it doesn't use a hard-coded port number for GENEVE.
I have been able to test mlx4e, mlx5e, and i40e and verified functionality
on these drivers. Though there are patches for the net tree I submitted
due to unrelated bugs I found in the mlx4e and i40e/i40evf drivers.
v3:
Fixed a typo that caused us to add geneve port when we should have been
deleting it.
Ended up dropping geneve and vxlan wrappers for
udp_tunnel_notify_rx_add/del_port and instead just called them directly.
Updated comments for functions to call out RTNL instead of port lock.
Updated patch description to remove changes that were moved into a second
patch.
Rebased on latest net-next to fix merge conflict on bnxt driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The fact is VXLAN with Generic Protocol Extensions cannot be supported by
the same hardware parsers that support VXLAN. The protocol extensions
allow for things like a Next Protocol field which in turn allows for things
other than Ethernet to be passed over the tunnel. Most existing parsers
will not know how to interpret this.
To resolve this I am giving VXLAN-GPE its own UDP encapsulation offload
type. This way hardware that does support GPE can simply add this type to
the switch statement for VXLAN, and if they don't support it then this will
fix any issues where headers might be interpreted incorrectly.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>