Current code uses global variables, adjusts them and passes pointer down
to devlink. With every other mlxsw_core instance, the previously passed
pointer values are rewritten. Fix this by de-globalize the variables.
Fixes: 7f47b19bd7 ("mlxsw: spectrum_kvdl: Add support for per part occupancy")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a new trap for PIMv6 packets. As PIM already has a designated trap
group [ & rate limiter], simply use the same for PIMv6 as well.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Following previous patches driver is ready to handle notifications
arriving from ip6mr - start processing those when they arrive following
the same manner ipmr currently goes through.
This should enable driver to start offloading ipv6 multicast routes.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Populate the various operation structures meant for IPv6 with logic
unique to that protocol suite.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
spectrum_router and spectrum_mr have several APIs that are used to
manipulate configurations originating from ipmr fib notifications.
Following previous patches all the protocol-specifics that are necessary
for the configuration are hidden within spectrum_mr. This allows us to
clean the API and make sure that other than choosing the mr_table based
on the fib notification family, spectrum_router wouldn't care about the
source of the notification when passing it onward to spectrum_mr.
This would later allow us to leverage the same code for fib
notifications originating from ip6mr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current multicast routing logic in driver assumes it's always meant to
deal with IPv4 multicast routes, leaving several placeholders for
later IPv6 support [currently usually WARN()].
This patch changes the driver's internal multicast route struct into
holding a common mr_mfc instead of the IPv4 mfc_cache.
The various placeholders are grouped into 2:
- Functions that require only the common bits; These remain and the
restriction for IPv4-only is lifted.
- Function that require IPv4-specifics - for handling these functions
we add sets of operations that encapsulate the protocol differences
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A step toward offloading IPv6 routing, this adds an additional
multicast routing table meant for IPv6 [with its underlying TCAM
region] and populates the default rule for IPv6 multicast packets.
Following this, ingress IPv6 multicast packets would be trapped and
delivered to the host CPU.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit c011ec1bbf ("mlxsw: spectrum: Add the multicast routing
offloading logic") spectrum_mr did not populate the protocol portion of
the catcahall_route_params; mr-tcam logic worked correctly for ipv4
since the enum value for MLXSW_SP_L3_PROTO_IPV4 is '0'.
Explicitly fill the protocol as we'll soon need to differentiate between
ipv4 and ipv6.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new fields for the rmft register necessary for setting the IPv6
multicast FIB table. Add a matching wrapper function for filling
the register in the IPv6 scenario.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to what was done in commit 4af5964e58 ("mlxsw: reg:
Configure RIF to forward IPv4 multicast packets by default") we now set
two additional bits to allow IPv6 multicast forwarding.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since ipmr and ip6mr are using the same mr_mfc struct at their core, we
can now refactor the ipmr_cache_{hold,put} logic and apply refcounting
to both ipmr and ip6mr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Like vif notifications, move the notifier struct for MFC as well as its
helpers into a common file; Currently they're only used by ipmr.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prefer the direct use of octal for permissions.
Done with checkpatch -f --types=SYMBOLIC_PERMS --fix-inplace
and some typing.
Miscellanea:
o Whitespace neatening around these conversions.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In net commit 8175f7c4736f ("mlxsw: spectrum: Prevent duplicate
mirrors") we prevented the user from mirroring more than once from a
single binding point (port-direction pair).
The fix was essentially reverted in a merge conflict resolution when net
was merged into net-next. Restore it.
Fixes: 03fe2debbb ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the first element of struct mlxsw_sp_span_parms is a pointer,
to zero-initialize this structure the correct notation is not = {0}, but
rather = {NULL}, as reported by sparse.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update MTU of overlay loopback in accordance with the setting on the
tunnel netdevice.
Fixes: 0063587d35 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the function so that it can be called without forward declaration
from a function that will be added in a follow-up patch.
Fixes: 0063587d35 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fun set of conflict resolutions here...
For the mac80211 stuff, these were fortunately just parallel
adds. Trivially resolved.
In drivers/net/phy/phy.c we had a bug fix in 'net' that moved the
function phy_disable_interrupts() earlier in the file, whilst in
'net-next' the phy_error() call from this function was removed.
In net/ipv4/xfrm4_policy.c, David Ahern's changes to remove the
'rt_table_id' member of rtable collided with a bug fix in 'net' that
added a new struct member "rt_mtu_locked" which needs to be copied
over here.
The mlxsw driver conflict consisted of net-next separating
the span code and definitions into separate files, whilst
a 'net' bug fix made some changes to that moved code.
The mlx5 infiniband conflict resolution was quite non-trivial,
the RDMA tree's merge commit was used as a guide here, and
here are their notes:
====================
Due to bug fixes found by the syzkaller bot and taken into the for-rc
branch after development for the 4.17 merge window had already started
being taken into the for-next branch, there were fairly non-trivial
merge issues that would need to be resolved between the for-rc branch
and the for-next branch. This merge resolves those conflicts and
provides a unified base upon which ongoing development for 4.17 can
be based.
Conflicts:
drivers/infiniband/hw/mlx5/main.c - Commit 42cea83f95
(IB/mlx5: Fix cleanup order on unload) added to for-rc and
commit b5ca15ad7e (IB/mlx5: Add proper representors support)
add as part of the devel cycle both needed to modify the
init/de-init functions used by mlx5. To support the new
representors, the new functions added by the cleanup patch
needed to be made non-static, and the init/de-init list
added by the representors patch needed to be modified to
match the init/de-init list changes made by the cleanup
patch.
Updates:
drivers/infiniband/hw/mlx5/mlx5_ib.h - Update function
prototypes added by representors patch to reflect new function
names as changed by cleanup patch
drivers/infiniband/hw/mlx5/ib_rep.c - Update init/de-init
stage list to match new order from cleanup patch
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In 'auto-neg off' the device have sent AN (auto-negotiation) frames
with the forced speed. Thus, fix it using an_disable_admin field in
Port type and speed (PTYS) register. This field indicates if speed
negotiation frames would be send by the port or not.
Add the field and enable/disable it for 'auto-neg on/off', make the
port to start/stop sending AN (auto-negotiation) frames. Note that for
SwitchX2 the behavior doesn't change (i.e support only AN enabled with
forced speed).
Signed-off-by: Tal Bar <talb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This new firmware contains:
- Support for auto-neg disable mode
Signed-off-by: Tal Bar <talb@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
top_hierarchy arg can be determined by comparing parent_resource_id to
DEVLINK_RESOURCE_ID_PARENT_TOP so it does not need to be a separate
argument.
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new ACL group is created its region (ACL) list is initially
empty. Thus, the call to mlxsw_sp_acl_tcam_group_update() would
basically invalidate an already invalid (non-existent) group.
Remove the unnecessary call and make the function symmetric to its del()
counterpart.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver currently creates empty ACL groups, binds them to the
requested port and then fills them with actual ACLs that point to TCAM
regions.
However, empty ACL groups are considered invalid and upcoming firmware
versions are going to forbid their binding.
Work around this limitation by only performing the binding after the
first ACL was added to the group.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to set some of the fields within 'mbox_config_profile',
since they are reserved and capability mask should be set to zero.
Signed-off-by: Tal Bar <talb@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the opcodes don't use in, out or both mboxes. In such cases, the
mbox address is a reserved field and FW expects it to be zero.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 9ffcc3725f ("mlxsw: spectrum: Allow packets to be trapped
from any PG") I fixed a problem where packets could not be trapped to
the CPU due to exceeded shared buffer quotas. The mentioned commit
explains the problem in detail.
The problem was fixed by assigning a minimum quota for the CPU port and
the traffic class used for scheduling traffic to the CPU.
However, commit 117b0dad2d ("mlxsw: Create a different trap group list
for each device") assigned different traffic classes to different
packet types and rendered the fix useless.
Fix the problem by assigning a minimum quota for the CPU port and all
the traffic classes that are currently in use.
Fixes: 117b0dad2d ("mlxsw: Create a different trap group list for each device")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Eddie Shklaer <eddies@mellanox.com>
Tested-by: Eddie Shklaer <eddies@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warnings:
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:371:5: warning:
symbol 'mlxsw_sp_kvdl_single_occ_get' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:384:5: warning:
symbol 'mlxsw_sp_kvdl_chunks_occ_get' was not declared. Should it be static?
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:397:5: warning:
symbol 'mlxsw_sp_kvdl_large_chunks_occ_get' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw_spectrum supports offloading of a tc action mirred egress mirror
to a gretap or an ip6gretap netdevice, which necessitates calls to
functions defined in ip_gre, ip6_gre and ip6_tunnel modules. Previously
this was enabled by introducing a hard dependency of MLXSW_SPECTRUM on
NET_IPGRE and IPV6_GRE. However the rest of mlxsw is careful about
picking which modules are absolutely required, and therefore the better
approach is to make mlxsw_spectrum tolerant of absence of one or both of
the GRE flavors.
Hence rework the NET_IPGRE and IPV6_GRE dependencies to just guard
matching modularity, and hide the corresponding code in spectrum_span.c
in an #if IS_ENABLED. Mark mlxsw_sp_span_entry_tunnel_parms_common as
maybe unused, to muffle warnings if neither GRE flavor is selected,
which seems cleaner than introducing a composite #if.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move the function next to the rest of gretap4 functions. Thus the
generic functions shared between gretap4 and gretap6 are in one block at
the beginning, followed by a gretap4 block, followed by a gretap6 block.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to abstract away access to the
ipv6.sysctl.multipath_hash_policy variable, which is not available on
systems compiled without IPv6 support, introduce a wrapper function
ip6_multipath_hash_policy() that falls back to 0 on non-IPv6 systems.
Use this wrapper from mlxsw/spectrum_router instead of a direct
reference.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Spectrum ASIC doesn't support mirroring more than once from a single
binding point (which is a port-direction pair). Therefore detect that a
second binding of a given binding point is attempted.
To that end, extend struct mlxsw_sp_span_inspected_port to track whether
a given binding point is bound or not. Extend
mlxsw_sp_span_entry_port_find() to look for ports based on the full
unique key: port number, direction, and boundness.
Besides fixing the overt bug where configured mirrors are not offloaded,
this also fixes a more subtle bug: mlxsw_sp_span_inspected_port_del()
just defers to mlxsw_sp_span_entry_bound_port_find(), and that used to
find the first port with the right number (disregarding the type). Thus
by adding and removing egress and ingress mirrors in the right order,
one could trick the system into believing it has no egress mirrors when
in fact it did have some. That then caused that
mlxsw_sp_span_port_mtu_update() didn't update mirroring buffer when MTU
was changed.
Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For ok GACT action, TERMINATE binding_cmd should be used in action set
passed down to HW.
Fixes: b2925957ec ("mlxsw: spectrum_flower: Offload "ok" termination action")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All of the conflicts were cases of overlapping changes.
In net/core/devlink.c, we have to make care that the
resouce size_params have become a struct member rather
than a pointer to such an object.
Signed-off-by: David S. Miller <davem@davemloft.net>
Until now, we assumed that in case of error when adding FDB entries, the
write operation will fail, but this is not the case. Instead, we need to
check that the number of entries reported in the response is equal to
the number of entries specified in the request.
Fixes: 56ade8fe3f ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reported-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Shalom Toledo <shalomt@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to 28678f07f1 ("mlxsw: spectrum_router: Update multipath hash
parameters upon netevents") for IPv4, make sure the kernel and asic are
using the same hash algorithm for path selection.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename NETEVENT_MULTIPATH_HASH_UPDATE to
NETEVENT_IPV4_MPATH_HASH_UPDATE to denote it relates to a change
in the IPv4 hash policy.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mfc_cache and mfc6_cache are almost identical - the main difference is
in the origin/group addresses and comparison-key. Make a common
structure encapsulating most of the multicast routing logic - mr_mfc
and convert both ipmr and ip6mr into using it.
For easy conversion [casting, in this case] mr_mfc has to be the first
field inside every multicast routing abstraction utilizing it.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One of the basic construct in the device is a port-VLAN pair, which can
be bound to a FID or a RIF in order to direct packets to the bridge or
the router, respectively.
Since not all the netdevs are configured with a VLAN (e.g., sw1p1 vs.
sw1p1.10), VID 1 is used to represent these and thus this VID can be
used by both upper devices of mlxsw ports and by the driver itself.
However, this VID is not reference counted and therefore might be freed
prematurely, which can result in various WARNINGs. For example:
$ ip link add name br0 type bridge vlan_filtering 1
$ teamd -t team0 -d -c '{"runner": {"name": "lacp"}}'
$ ip link set dev team0 master br0
$ ip link set dev enp1s0np1 master team0
$ ip address add 192.0.2.1/24 dev enp1s0np1
The enslavement to team0 will fail because team0 already has an upper
and thus vlan_vids_del_by_dev() will be executed as part of team's error
path which will delete VID 1 from enp1s0np1 (added by br0 as PVID). The
WARNING will be generated when the driver will realize it can't find VID
1 on the port and bind it to a RIF.
Fix this by adding a reference count to the VLAN entries on the port, in
a similar fashion to the reference counting used by the corresponding
'vlan_vid_info' structure in the 8021q driver.
Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Reported-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Tal Bar <talb@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multicast snooping is enabled, the Linux bridge resorts to flooding
unregistered multicast packets to all ports only in case it did not
detect a querier in the network.
The above condition is not reflected to underlying drivers, which is
especially problematic in IPv6 environments, as multicast snooping is
enabled by default and since neighbour solicitation packets might be
treated as unregistered multicast packets in case there is no
corresponding MDB entry.
Until the Linux bridge reflects its querier state to underlying drivers,
simply treat unregistered multicast packets as broadcast and allow them
to reach their destination.
Fixes: 9df552ef3e ("mlxsw: spectrum: Improve IPv6 unregistered multicast flooding")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code uses global variables, adjusts them and passes pointer down
to devlink. With every other mlxsw_core instance, the previously passed
pointer values are rewritten. Fix this by de-globalize the variables and
also memcpy size_params during devlink resource registration.
Also, introduce a convenient size_param_init helper.
Fixes: ef3116e540 ("mlxsw: spectrum: Register KVD resources with devlink")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IP_TTL, IP_ECN and IP_DSCP are using the same offset within the
scratchpad as L4 ports. Fix this by shifting all up.
Fixes: 5f57e09091 ("mlxsw: acl: Add ip ttl acl element")
Fixes: i80d0fe4710c ("mlxsw: acl: Add ip tos acl element")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Handle graft command for an offloaded sch_prio.
Grafting a qdisc to any place other than under its original parent is not
supported by mlxsw and will cause the grafted qdisc to stop being
offloaded.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the number the bands of sch_prio is decreased, child qdiscs on the
deleted bands would get deleted as well.
This change and deletions are being done under sch_tree_lock of the
sch_prio qdisc. Part of the destruction of qdisc is unoffloading it, if
it is offloaded. Un-offloading can't be done inside this lock.
Move the offload command to be done before reducing the number of bands,
so unoffloading of the qdiscs that are about to be deleted could be done
outside of the lock.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sch_prio as root qdisc should count all the drops its children have. Since
it is possible for it to have sch_red children, it needs to count RED early
drops.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing a child qdisc its backlog will be decreased from the parent
backlog. The driver backlog count should do the same.
When the parent changes its configuration, the child might need to clean
its stats. However, the backlog can't be cleaned with the rest of the
stats, because it reflects a momentary value that needs to be synced with
the core, not the history of the qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Priority counters count packets according to their packet priority.
Collect the stats for sch_red based on these counters, so the qdisc bstats
will be the sum of counters matching the priorities marked in the qdisc
priomap.
Changing the mapping of the priorities to bands while traffic is running
can result in losing the stats of the bands qdiscs from their last dump
call to this change, as if the qdisc was unoffloaded and re-offloaded. It
will not affect the traffic behaviour according to sch_red.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add priority map per qdisc, to indicate which priorities are being
directed through this qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add TX packets and bytes counters per switch priority per port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the option to set a qdisc per tclass. Match the qdisc to the tclass by
parent ID. Supported currently for sch_red only.
It allows offloading sch_prio as root qdisc and sch_red as its child.
(However, doing so might corrupt the stats for both parent and child.)
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similarly to mirror-to-gretap, this enables mirroring to IPv6 gretap
netdevice.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a user requests mirror from a mlxsw physical port (possibly based
on an ACL match) to a gretap netdevice, the driver needs to resolve the
request to a particular physical port that the mirrored packets will
egress through, and a suite of configuration keys (importantly, IP and
MAC addresses). That means calling into routing and neighbor kernel code
to simulate the decisions made by the system for packets passing through
a gretap netdevice.
Add a new instance of mlxsw_sp_span_entry_ops to support this.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The check for whether a mirror port (which is a mlxsw front panel port)
belongs to the same mlxsw instance as the mirrored port, is currently
only done in spectrum_acl, even though it's applicable for the matchall
case as well. Thus move it to mlxsw_sp_span_entry_create().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For some netdevices, for which mlxsw offloads mirroring, may have a
complex relationship between the declared intent and low-level
device configuration.
Trying to accurately track which changes might influence offloading
decisions is finicky and error-prone. Instead, this patch introduces a
function mlxsw_sp_span_entry_respin, which re-queries the configuration
anew and, if different, removes the existing offloads and installs new
ones.
Call this function strategically at event handlers that might influence
the mirroring configuration.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support mirroring to different device types, the functions that
partake in configuring the port analyzer need to be extended to admit
non-trivial SPAN types.
Create a structure where all details of SPAN configuration are kept,
struct mlxsw_sp_span_parms. Also create struct mlxsw_sp_span_entry_ops
to keep per-SPAN-type operations.
Instantiate the latter once for MLXSW_REG_MPAT_SPAN_TYPE_LOCAL_ETH, and
once for a suite of NOP callbacks used for invalidated SPAN entry. Put
the formet as a sole member of a new array mlxsw_sp_span_entry_types,
where all known SPAN types are kept. Introduce a new function,
mlxsw_sp_span_entry_ops(), to look up the right ops suite given a
netdevice.
Change mlxsw_sp_span_mirror_add() to use both parms and ops structures.
Change mlxsw_sp_span_entry_get() and mlxsw_sp_span_entry_create() to
take these as arguments. Modify mlxsw_sp_span_entry_configure() and
mlxsw_sp_span_entry_deconfigure() to dispatch to ops.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the only mirror action supported by mlxsw is mirror to another
mlxsw physical port. Correspondingly, span_entry, which tracks each
mlxsw mirror in the system, currently holds a u8 number of the
destination port.
To extend this system to mirror to gretap and ip6gretap netdevices, have
struct mlxsw_sp_span_entry actually hold the destination netdevice
itself.
This change then trickles down in obvious manner to SPAN module API and
mirror-related interfaces in struct mlxsw_afa_ops.
To prevent use of invalid pointer, NETDEV_UNREGISTER needs to be hooked
and the corresponding SPAN entry invalidated.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Configuring the hardware for encapsulated SPAN involves more code than
the simple mirroring case. Extract the related code to a separate
function to separate it from the rest of SPAN entry creation. Extract
deconfigure as well for symmetry, even though disablement is the same
regardless of SPAN type.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is known statically ahead of time which SPAN entry will have which
ID. Just initialize it eagerly in mlxsw_sp_span_init(), don't wait until
the entry is actually created. This simplifies some code in
mlxsw_sp_span_entry_create()
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of removing span_entry by the port number, allow removing by
SPAN id. That simplifies some code right here, and for mirroring to soft
netdevices, avoids problems with netdevice pointer invalidation and
reuse.
Rename mlxsw_sp_span_entry_find() to mlxsw_sp_span_entry_find_by_port()
and keep it--follow-up patches will make use of it.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support encapsulated SPAN, extend mlxsw_reg_mpat_pack() with a field
to set the SPAN type.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
MPAT Register is used to query and configure the Switch Port Analyzer
Table. To configure Port Analyzer to encapsulate mirrored packets,
additional fields need to be specified for the MPAT register.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support mirroring to ip6gretap, the SPAN module needs to be able to
decode IPv6 addresses specified at that tunnel.
Extend mlxsw_sp_ipip_netdev_saddr() and mlxsw_sp_ipip_netdev_daddr() to
support IPv6 addresses. To that end, add and publish a support function
mlxsw_sp_ipip_netdev_parms6().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the logic for determining whether a given IPv4/IPv6 address is
all-zeroes from mlxsw_sp_ipip_tunnel_complete to a separate function.
Make that function public within the module.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
gcc warns that 'resource_id' is not initialized if we don't come though
any of the three 'case' statements before:
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c: In function 'mlxsw_sp_kvdl_part_init':
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:275:8: error: 'resource_id' may be used uninitialized in this function [-Werror=maybe-uninitialized]
In the current code, that won't happen, but it's more robust to explicitly
handle this by returning a failure from mlxsw_sp_kvdl_part_init.
Fixes: 887839e696 ("mlxsw: spectrum_kvdl: Add support for dynamic partition set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Calculating the number of entries now uses 64-bit arithmetic that
causes a link error on 32-bit architectures:
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.o: In function `mlxsw_sp_kvdl_init':
spectrum_kvdl.c:(.text+0x51c): undefined reference to `__aeabi_uldivmod'
We could probably use a 32-bit division here as before, but since this is
not in a performance critical path, div_u64() seems cleaner here.
Fixes: 887839e696 ("mlxsw: spectrum_kvdl: Add support for dynamic partition set")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now we only allowed VLAN devices to be put in a VLAN-unaware
bridge, but some users need the ability to enslave physical ports as
well.
This is achieved by mapping the port and VID 1 to the bridge's vFID,
instead of the port and the VID used by the VLAN device.
The above is valid because as long as the port is not enslaved to a
bridge, VID 1 is guaranteed to be configured as PVID and egress
untagged.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for calculating occupancy for separate kvdl parts.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for dynamic partition set via the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The linear part of the KVD memory is sub-divided into multiple parts. This
patch exposes this internal partitions via the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After adding size validation logic into core cleanup is required.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mlxsw replaces (or deletes) a route it removes the offload
indication from the replaced route. This is problematic for IPv4 routes,
as the offload indication is stored in the fib_info which is usually
shared between multiple routes.
Instead of unconditionally clearing the offload indication, only clear
it if no other route is using the fib_info.
Fixes: 3984d1a89f ("mlxsw: spectrum_router: Provide offload indication using nexthop flags")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use NL_SET_ERR_MSG_MOD helper which adds the module name instead
of specifying the prefix each time.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the upcoming work on SPAN, it makes sense to move the current code
to a module of its own. It already has a well-defined API boundary to
the mirror management (which is used from matchall and ACL code). A
couple more functions need to be exported for the functions that
spectrum.c needs to use for MTU handling and subsystem init/fini.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The member ref_count already determines whether a given SPAN entry is
used, and is as easy to use as a dedicated boolean.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ip_tunnel_parm, where GRE and several other tunnel types hold
information, is IPv4-specific. The current router / ipip code in mlxsw
however uses it as if it were generic.
Make it clear that it's not. Rename many functions from _params_ to
_params4_. mlxsw_sp_ipip_parms_saddr() and _daddr() take a proto
argument to dispatch on it. Move the dispatch logic to
mlxsw_sp_ipip_netdev_saddr() and _daddr(), and replace with
single-protocol functions.
In struct mlxsw_sp_ipip_entry, move the "parms" field to a (for the time
being, singleton) union. Update users throughout.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct ip_tunnel_parm, which is used in spectrum_ipip.h, is defined in
if_tunnel.h. However, the former neglects to include the latter.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since mlxsw_sp_fib_create() and mlxsw_sp_mr_table_create()
use ERR_PTR macro to propagate int err through return of a pointer,
the return value is not NULL in case of failure. So if one
of the calls fails, one of vr->fib4, vr->fib6 or vr->mr4_table
is not NULL and mlxsw_sp_vr_is_used wrongly assumes
that vr is in use which leads to crash like following one:
[ 1293.949291] BUG: unable to handle kernel NULL pointer dereference at 00000000000006c9
[ 1293.952729] IP: mlxsw_sp_mr_table_flush+0x15/0x70 [mlxsw_spectrum]
Fix this by using local variables to hold the pointers and set vr->*
only in case everything went fine.
Fixes: 76610ebbde ("mlxsw: spectrum_router: Refactor virtual router handling")
Fixes: a3d9bc506d ("mlxsw: spectrum_router: Extend virtual routers with IPv6 support")
Fixes: d42b0965b1 ("mlxsw: spectrum_router: Add multicast routes notification handling functionality")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case
ethtool tc offload flag is not set or chain unsupported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver periodically samples all neighbors configured in device
in order to update the kernel regarding their state. When finding
an entry configured in HW that doesn't show in neigh_lookup()
driver logs an error message.
This introduces a race when removing multiple neighbors -
it's possible that a given entry would still be configured in HW
as its removal is still being processed but is already removed
from the kernel's neighbor tables.
Simply remove the error message and gracefully accept such events.
Fixes: c723c735fa ("mlxsw: spectrum_router: Periodically update the kernel's neigh table")
Fixes: 60f040ca11 ("mlxsw: spectrum_router: Periodically dump active IPv6 neighbours")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I tried to make sure only used prefix lengths are
present in the LPM tree shared between all virtual routers.
However, this optimization had to be removed in commit a69518cf0b
("mlxsw: spectrum_router: Avoid expensive lookup during route removal"),
since determining the used prefix lengths required us to traverse all
the active virtual routers, which could result in a hung task depending
on the number of VRFs and whether routes were removed due to abort or
not.
Re-introduce the optimization by moving the prefix usage accounting from
the virtual routers to the LPM tree, as this accounting is only used in
order to determine the tree's structure.
To make the sharing of the trees more explicit, the two trees (for IPv4
and IPv6) are stored in the shared router struct and upon the creation
of a virtual router it is immediately bound to both.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Next patch will try to optimize the LPM tree and make sure only used
prefix lengths are present, to avoid unnecessary look-ups.
Pass the currently removed FIB node to the unlinking function as its
associated prefix length is a potential candidate for removal from the
tree.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, each FIB (IPv4 / IPv6) in a virtual router holds a prefix
usage that is used to choose a matching LPM tree, but also to check if
the FIB is empty, so that the LPM tree could be unbound.
Next patches will remove the reliance on the per-FIB prefix usage for
LPM tree matching. Keeping it only to check if the FIB is empty is a
waste, since we can use the nodes ({Prefix, Length}) list instead.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for mirror action. Only one mirror action can be set per rule.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce extension of mlxsw_afa_ops in order to add/del mirroring and
implement the ops for Spectrum.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extend SPAN API for ACL case. In case of ACL triggering the MPAR register
shouldn't be configured. This patch also export those helpers for
ACL usage.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The patch extends the trap action for mirroring.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, the caller of mlxsw_afa_block_append_counter needed to allocate
counter index by hand. Benefit from the previously introduced resource
infra and counter_index_get/put callbacks, and allocate the counter
index in place where it is needed, inside the action append function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the resource list needs to be used also for other entries different
to fwd_entry_ref, make the list generic. For that purpose, introduce a
resource structure with couple of helpers that the code which need to
store a per-block resource should use.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce extension of mlxsw_afa_ops in order to get/put counter indexes
and implement the ops for Spectrum.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The BPF verifier conflict was some minor contextual issue.
The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_arry into tfile->tx_ring which is a
ptr_ring.
Signed-off-by: David S. Miller <davem@davemloft.net>
During initialization the driver checks whether the flashed FW image
suits its requirements by checking that it's sufficiently new.
However, there's only a weak backward compatibility scheme that is
actually guaranteed by the FW, so driver must also upper bound the
version to prevent compatibility issues between current driver and some
possible future fw.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/mellanox/mlxsw/spectrum_kvdl.c:289:5: warning:
symbol 'mlxsw_sp_kvdl_part_occ' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new LPM tree is created, we try to replace the trees in the
existing virtual routers with it. If we fail, the tree needs to be
freed.
Currently, this does not happen in the unlikely case where we fail to
bind the tree to the first virtual router, since its reference count
never transitions from 1 to 0.
Fix that by taking a reference before binding the tree.
Fixes: fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to convert from mlxsw_sp_port to net_device and back again.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benefit from the prepared TC and in-driver ACL infrastructure and
introduce block sharing offload. For that, a new struct "block" is
introduced in spectrum_acl in order to hold a list of specific
block-port bindings.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead, pass netdev and ingress flag to ruleset unbind op.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to prepare for follow-up changes, make the bind/unbind helpers
very simple. That required move of ht insertion/removal and bind/unbind
calls into mlxsw_sp_acl_ruleset_create/destroy.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 0dfb33a0d7 ("sch_red: report backlog information") copied
child's backlog into RED's backlog. Back then RED did not maintain
its own backlog counts. This has changed after commit 2ccccf5fb4
("net_sched: update hierarchical backlog too") and commit d7f4f332f0
("sch_red: update backlog as well"). Copying is no longer necessary.
Tested:
$ tc -s qdisc show dev veth0
qdisc red 1: root refcnt 2 limit 400000b min 30000b max 30000b ecn
Sent 20942 bytes 221 pkt (dropped 0, overlimits 0 requeues 0)
backlog 1260b 14p requeues 14
marked 0 early 0 pdrop 0 other 0
qdisc tbf 2: parent 1: rate 1Kbit burst 15000b lat 3585.0s
Sent 20942 bytes 221 pkt (dropped 0, overlimits 138 requeues 0)
backlog 1260b 14p requeues 14
Recently RED offload was added. We need to make sure drivers don't
depend on resetting the stats. This means backlog should be treated
like any other statistic:
total_stat = new_hw_stat - prev_hw_stat;
Adjust mlxsw.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Nogah Frankel <nogahf@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes the following sparse warning:
drivers/net/ethernet/mellanox/mlxsw/spectrum_qdisc.c:464:1: warning:
symbol 'mlxsw_sp_qdisc_prio_unoffload' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for hot reload. First, all the driver/core resources are
released but the PCI and devlink instances, then reset is performed
through the PCI interface. Finally the driver performs initialization.
In case of reload failure the driver is left in a partially initialized
state. Special care is taken during the driver removal in order to
properly handle this state.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now the KVD partition was static. This patch introduces the
ability to get the resource sizes via devlink. In case the resource is not
available the default configuration is used.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for getting the kvdl occupancy through the resource interface.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Connect current dpipe tables to resources. The tables are connected
in the following fashion:
1. IPv4 host -> KVD hash single
2. IPv6 host -> KVD hash double
3. Adjacency -> KVD linear
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Register the KVD resources with devlink. The KVD is a memory resource
which is subdivided into three partitions which are the linear, hash
single and hash double.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is a preparation stage before introducing hot reload. During the
reload process the ASIC should be resetted by accessing the PCI BAR due
to unavailability of the mailbox/emad interfaces.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support basic stats for PRIO qdisc, which includes tx packets and bytes
count, drops count and backlog size. The rest of the stats are irrelevant
for this qdisc offload.
Since backlog is not only incremental but reflecting momentary value, in
case of a qdisc that stops being offloaded but is not destroyed, backlog
value needs to be updated about the un-offloading.
For that reason an unoffload function is being added to the ops struct.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for offloading PRIO qdisc as root qdisc.
The support is for up to 8 bands.
Routed packets priority is determined by the DSCP field with the default
translations. Bridged packets priority is determined by the PCP field, if
exist, otherwise it is set to 0.
Since both options have only priorities 0-7, higher priorities mapping are
being ignored.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When routing ip packets, the kernel is setting the SKB's priority
based on the tos field of the packet.
Imitate this behavior in the mlxsw router, having the internal
switch priority of a routed packet determined according to its DS
field.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add rdpm definition - router DSCP to priority mapping register.
This register will be utilized later to align the default mapping between
packet DSCP and switch-priority to the kernel's mapping between
packet priority and skb priority.
This is the first non-bit indexed register where the entries are arranged
in descending order, i.e., entry at offset 0 matches configuration for
dscp[63]. As a result, the item's step is converted into a signed variable
to support descending arrays [where step would be negative].
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit eb789980d0 ("mlxsw: spectrum_router: Populate adjacency
entries according to weights") the driver includes support for
non-equal-cost multipath, but IPv4 nexthops were the only user.
Now that the kernel supports weighted IPv6 nexthops, we can extend the
driver to support it as well.
This is done by assigning each nexthop its configured weight, so that it
will be populated accordingly in the device's adjacency table. The
`weight` parameter is also taken into account when comparing nexthop
groups in order not to consolidate non-identical groups.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
BPF alignment tests got a conflict because the registers
are output as Rn_w instead of just Rn in net-next, and
in net a fixup for a testcase prohibits logical operations
on pointers before using them.
Also, we should attempt to patch BPF call args if JIT always on is
enabled. Instead, if we fail to JIT the subprogs we should pass
an error back up and fail immediately.
Signed-off-by: David S. Miller <davem@davemloft.net>
If a qdisc is being replaced by another qdisc of the same type, it can
simply override over its configuration.
However, if it replaces a qdisc of another type, it needs to be removed
before setting the new qdisc.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Create a generic qdisc replace function.
For that goal, add three functions to the qdisc ops struct:
* check_params: Checks if the given parameters are offloadable.
* replace: Offload the given parameters.
* clean_stats: clean the qdisc stats for the offloaded qdisc.
integrate RED offloading into using the new internal replace API.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a destroy function to the qdiscs ops struct.
Create a generic qdisc destroy function, that clears the qdisc metadata as
well as calling the specific qdisc destroy function.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Qdisc struct have the Qdisc_class_ops struct.
This patch introduces the similar ops struct for the mlxsw_sp_qdisc_ops
struct. It allows better readability as well as code reusability for the
common parts of some functions like destroy.
The first operations to be added are the statistics getters.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Every qdisc op gets the qdisc handle ID as well as its location. Each one
of them, beside replace, checks if the handle doesn't match the qdisc in
the given location, and if so, it returns without running the actual op.
Unite these checks to one comparison function and avoid sending the handle
id to these ops.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tclass number is needed for most of the operations related to the qdisc in
the driver. Create a field for it in the mlxsw_sp_qdisc instead of passing
it to every function as parameter.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Improve readability by changing the clean stats function to handle only
RED. Qdiscs that will be offloaded in the future will have a clean stats
function of their own.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Clean RED offloaded stats and make them more generic by breaking the
generic qdisc stats to a struct of their own.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the value of the xstats requested from the driver for offloaded RED
to be incremental, like the normal stats.
It increases consistency - if a qdisc stops being offloaded its xstats
don't change.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the name of the stats struct to be generic, so it could be used for
other qdisc offload, that will be added in the next patches.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move all the qdisc related data from the spectrum.h to spectrum_qdisc.c.
Create an init and fini functions for the qdiscs.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Resolve the sparse warning:
"sparse: Variable length array is used."
Use 2 arrays for 2 PRM register accesses.
Fixes: 96f17e0776 ("mlxsw: spectrum: Support RED qdisc offload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After performing reset driver polls on HW indication until learning
that the reset is done, but immediately after reset the device becomes
unresponsive which might lead to completion timeout on the first read.
Wait for 100ms before starting the polling.
Fixes: 233fa44bd6 ("mlxsw: pci: Implement reset done check")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 25cc72a338 ("mlxsw: spectrum: Forbid linking to devices that
have uppers") the driver forbids enslavement to netdevs that already
have uppers of their own, as this can result in various ordering
problems.
This requirement proved to be too strict for some users who need to be
able to enslave ports to a bridge that already has uppers. In this case,
we can allow the enslavement if the bridge is already known to us, as
any configuration performed on top of the bridge was already reflected
to the device.
Fixes: 25cc72a338 ("mlxsw: spectrum: Forbid linking to devices that have uppers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When we remove the neighbour associated with a nexthop we should always
refuse to write the nexthop to the adjacency table. Regardless if it is
already present in the table or not.
Otherwise, we risk dereferencing the NULL pointer that was set instead
of the neighbour.
Fixes: a7ff87acd9 ("mlxsw: spectrum_router: Implement next-hop routing")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of overlapping changes. Also on the net-next side
the XDP state management is handled more in the generic
layers so undo the 'net' nfp fix which isn't applicable
in net-next.
Include a necessary change by Jakub Kicinski, with log message:
====================
cls_bpf no longer takes care of offload tracking. Make sure
netdevsim performs necessary checks. This fixes a warning
caused by TC trying to remove a filter it has not added.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This reverts commit 63dd00fa3e.
RAUHT DELETE_ALL seems to trigger a bug in FW. That manifests by later
calls to RAUHT ADD of an IPv6 neighbor to fail with "bad parameter"
error code.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Fixes: 63dd00fa3e ("mlxsw: spectrum_router: Add batch neighbour deletion")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Three sets of overlapping changes, two in the packet scheduler
and one in the meson-gxl PHY driver.
Signed-off-by: David S. Miller <davem@davemloft.net>
Learning is currently enabled for ports which are OVS slaves -
even though OVS doesn't need this indication.
Since we're not associating a fid with the port, HW would continuously
notify driver of learned [& aged] MACs which would be logged as errors.
Fixes: 2b94e58df5 ("mlxsw: spectrum: Allow ports to work under OVS master")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, whenever the NETIF_F_HW_TC feature changes, we silently
always allow it, but we actually do not disable the flows in HW
on disable. That breaks user's expectations. So just forbid
the feature disable in case there are any filters offloaded.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcfm_dev always points to the correct netdev and we already
hold a refcnt, so no need to use tcfm_ifindex to lookup again.
If we would support moving target netdev across netns, using
pointer would be better than ifindex.
This also fixes dumping obsolete ifindex, now after the
target device is gone we just dump 0 as ifindex.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function mlxsw_sp_nexthop_rif_update() walks the list of nexthops
associated with a RIF, and updates the corresponding entries in the
switch. It is used in particular when a tunnel underlay netdevice moves
to a different VRF, and all the nexthops are migrated over to a new RIF.
The problem is that each nexthop holds a reference to its RIF, and that
is not updated. So after the old RIF is gone, further activity on these
nexthops (such as downing the underlay netdevice) dereferences a
dangling pointer.
Fix the issue by updating rif of impacted nexthops before calling
mlxsw_sp_nexthop_rif_update().
Fixes: 0c5f1cd5ba ("mlxsw: spectrum_router: Generalize __mlxsw_sp_ipip_entry_update_tunnel()")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some tunnels that are offloadable on their own can nonetheless be
demoted to slow path if their local address is in conflict with that of
another tunnel. When a route is formed for such a tunnel,
mlxsw_sp_nexthop_ipip_init() fails to find the corresponding IPIP entry,
and that triggers a FIB abort.
Resolve the problem by not assuming that a tunnel for which
mlxsw_sp_ipip_ops.can_offload() holds also automatically has an IPIP
entry.
Fixes: af641713e9 ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlxsw driver currently doesn't offload GRE tunnels if they have the
same local address and use the same underlay VRF. When such a situation
arises, the tunnels in conflict are demoted to slow path.
However, the current code only verifies this condition on tunnel
creation and tunnel change, not when a tunnel is moved to a different
VRF. When the tunnel has no bound device, underlay and overlay are the
same. Thus moving a tunnel moves the underlay as well, and that can
cause local address conflict.
So modify mlxsw_sp_netdevice_ipip_ol_vrf_event() to check if there are
any conflicting tunnels, and demote them if yes.
Fixes: af641713e9 ("mlxsw: spectrum_router: Onload conflicting tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a new local route is added, an IPIP entry is looked up to determine
whether the route should be offloaded as a tunnel decap or as a trap.
That decision should take into account whether the tunnel netdevice in
question is actually IFF_UP, and only install a decap offload if it is.
Fixes: 0063587d35 ("mlxsw: spectrum: Support decap-only IP-in-IP tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On some systems, when we unsplit a port we need to re-create two ports
instead. On other systems, only one needs to be re-created.
Do not try to create a port if during driver initialization it was
assigned a negative module number, which is invalid.
This avoids the following error during unsplit:
[ 941.012478] mlxsw_spectrum 0000:01:00.0: Port 43: Failed to map module
The error is harmless and caused by the fact that a local port is
already mapped to module 0.
Fixes: be94535f95 ("mlxsw: spectrum: Make split flow match firmware requirements")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 4a3c67a6e7 ("mlxsw: spectrum_router: Don't batch neighbour
deletion") I removed the support for batch deletion of neighbours on a
router interface (RIF) since at that time the firmware did not support
it for IPv6 neighbours.
This is now supported by the version enforced by the driver, so there is
no reason to delete neighbours one by one anymore.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to return a negative error code from the VID create error handling
case instead of 0, as done elsewhere in this function.
Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_QDISC_STATS. This call updates the generic qdisc stats from the
cache if the handle ID that is asked for matching the root qdisc ID and
fails otherwise.
Currently doesn't support qlen and rqueues.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_RED_XSTATS. This call returns the RED qdisc xstats from the cache
if the handle ID that is asked for matching the root qdisc ID and fails
otherwise.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add more statistics to be collected from the HW periodically. These stats
are tclass based (beside ECN marked packet, that exist only port based).
They are needed to expose RED qdisc stats and xstats correctly.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds the counter group definitions for 2 new counter groups
which are necessary for gaining ECN & wred counters.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for ndo_setup_tc with enum tc_setup_type value of TC_SETUP_RED.
This call sets RED qdisc on a traffic class.
This patch supports RED qdisc only as a root qdisc and set in on the
default tclass. It can be set with or without ECN.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds 2 new registers:
- Congestion WRED ECN TClass Profile Register [CWTP]
- Congestion WRED ECN TClass and Pool Mapping Register [CWTPM]
These registers would later be needed to offload RED-related
functionality to the HW.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Files removed in 'net-next' had their license header updated
in 'net'. We take the remove from 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the bound device of a tunnel device is down, encapsulated packets
are not egressed anymore, but tunnel decap still works. Extend
mlxsw_sp_nexthop_rif_update() to take IFF_UP into consideration when
deciding whether a given next hop should be offloaded.
Because the new logic was added to mlxsw_sp_nexthop_rif_update(), this
fixes the case where a newly-added tunnel has a down bound device, which
would previously be fully offloaded. Now the down state of the bound
device is noted and next hops forwarding to such tunnel are not
offloaded.
In addition to that, notice NETDEV_UP and NETDEV_DOWN of a bound device
to force refresh of tunnel encap route offloads.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a bound device of an IP-in-IP tunnel changes, such as through
'ip tunnel change name $name dev $dev', the loopback backing the tunnel
needs to be recreated.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Changes to L3 tunnel netdevices (through `ip tunnel change' as well as
`ip link set') lead to NETDEV_CHANGE being generated on the tunnel
device. Because what is relevant for the tunnel in question depends on
the tunnel type, handling of the event is dispatched to the IPIP module
through a newly-added interface mlxsw_sp_ipip_ops.ol_netdev_change().
IPIP tunnels now remember the last set of tunnel parameters in struct
mlxsw_sp_ipip_entry.parms, and use it to figure out what exactly has
changed.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a bound device of a tunnel netdevice changes VRF, the loopback RIF
that backs the tunnel needs to be updated and existing encapsulating
routes need to be refreshed.
Note that several tunnels can share the same bound device, in which case
all the impacted tunnels need to be updated.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The approach for offloading IP tunnels implemented currently by mlxsw
doesn't allow two tunnels that have the same local IP address in the
same (underlay) VRF. Previously, offloads were introduced on demand as
encap routes were formed. When such a route was created that would cause
offload of a conflicting tunnel, mlxsw_sp_ipip_entry_create() would
detect it and return -EEXIST, which would propagate up and cause FIB
abort.
Now however IPIP entries are created as soon as an offloadable netdevice
is created, and the failure prevents creation of such device.
Furthermore, if the driver is installed at the point where such
conflicting tunnels exist, the failure actually prevents successful
modprobe.
Furthermore, follow-up patches implement handling of NETDEV_CHANGE due
to the local address change. However, NETDEV_CHANGE can't be vetoed. The
failure merely means that the offloads weren't updated, but the change
in Linux configuration is not rolled back. It is thus desirable to have
a robust way of handling these conflicts, which can later be reused for
handling NETDEV_CHANGE as well.
To fix this, when a conflicting tunnel is created, instead of failing,
simply pull the old tunnel to slow path and reject offloading the
new one.
Introduce two functions: mlxsw_sp_ipip_entry_demote_tunnel() and
mlxsw_sp_ipip_demote_tunnel_by_saddr() to handle this. Make them both
public, because they will be useful later on in this patchset.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When trying to determine whether there are other offloaded tunnels with
the same local address, mlxsw_sp_ipip_entry_create() should look for a
tunnel with matching UL protocol, matching saddr, in the same VRF.
However instead of taking into account the UL protocol of the tunnel
netdevice (which mlxsw_sp_ipip_entry_saddr_matches() then compares to
the UL protocol of inspected IPIP entry), it deduces the UL protocol
from the inspected IPIP entry (and that's compared to itself).
This is currently immaterial, because only one tunnel type is offloaded,
and therefore the UL protocol always matches, but introducing support
for a tunnel with IPv6 underlay would uncover this error.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The work that needs to be done to update HW configuration in response to
changes is similar to what __mlxsw_sp_ipip_entry_update_tunnel() already
does, but with a number of twists: each change requires a different
subset of things to happen. Extend the function to support all these
uses, and allow finely-grained configuration of what should happen at
each call through a suite of function arguments.
Publish the updated function to allow use from the spectrum_ipip module.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The work that's done by mlxsw_sp_netdevice_ipip_ol_vrf_event() is a good
basis for a more versatile function that would take care of all sorts of
tunnel updates requests: __mlxsw_sp_ipip_entry_update_tunnel(). Extract
that function. Factor out a helper mlxsw_sp_ipip_entry_ol_lb_update() as
well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function mlxsw_sp_rif_create() takes an extack parameter. So far,
for creation of loopback interfaces, NULL was passed. For some events
however the extack can be extracted and passed along. So do that for
NETDEV_CHANGEUPPER handler.
Use the opportunity to update the type of info argument that
mlxsw_sp_netdevice_ipip_ol_event() takes. Follow-up patches will
introduce handling of more changes, and some of them carry an extack as
well, but in an info structure of a different type. Though not strictly
erroneous (the pointer could be cast whichever way), it makes no sense
to pretend the value is always of a certain type, when in fact it isn't.
So change the prototype of the above-mentioned function as well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The piece of logic to promote decap route, if any, is useful for generic
tunnel updates, not just for handling of NETDEV_UP events on tunnel
interfaces. Extract it to a separate function.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This function only ever returns 0, so don't pretend it returns anything
useful and just make it void.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To implement NETDEV_CHANGE notifications on IP-in-IP tunnels, the
handler needs to figure out what actually changed, to understand how
exactly to update the offloads. It will do so by storing struct
ip_tunnel_parm with previous configuration, and comparing that to the
new version.
To facilitate these comparisons, extract the code that operates on
struct ip_tunnel_parm from the existing accessor functions, and make
those a thin wrapper that extracts tunnel parameters and dispatches.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These functions ideologically belong to the IPIP module, and some
follow-up work will benefit from their presence there.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some of the code down the road needs this logic as well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To distinguish between events related to tunnel device itself and its
bound device, rename a number of functions related to handling tunneling
netdevice events to include _ol_ (for "overlay") in the name. That
leaves room in the namespace for underlay-related functions, which would
have _ul_ in the name.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure the device and the kernel are performing the multipath hash
according to the same parameters by updating the device whenever the
relevant netevent is generated.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now we used the hardware's defaults for multipath hash
computation. This patch aligns the hardware's multipath parameters with
the kernel's.
For IPv4 packets, the parameters are determined according to the
'fib_multipath_hash_policy' sysctl during module initialization. In case
L3-mode is requested, only the source and destination IP addresses are
used. There is no special handling of ICMP error packets.
In case L4-mode is requested, a 5-tuple is used: source and destination
IP addresses, source and destination ports and IP protocol. Note that
the layer 4 fields are not considered for fragmented packets.
For IPv6 packets, the source and destination IP addresses are used, as
well as the flow label and the next header fields.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RECRv2 register is used for setting up the router's ECMP hash
configuration.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The struct containing the work item queued from the netevent handler is
named after the only event it is currently used for, which is neighbour
updates.
Use a more appropriate name for the struct, as we are going to use it
for more events.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We are going to need to respond to netevents notifying us about
multipath hash updates by configuring the device's hash parameters.
Embed the netevent notifier in the router struct so that we could
retrieve it upon notifications and use it to configure the device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWfswbQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykvEwCfXU1MuYFQGgMdDmAZXEc+xFXZvqgAoKEcHDNA
6dVh26uchcEQLN/XqUDt
=x306
-----END PGP SIGNATURE-----
Merge tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull initial SPDX identifiers from Greg KH:
"License cleanup: add SPDX license identifiers to some files
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the
'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
binding shorthand, which can be used instead of the full boiler plate
text.
This patch is based on work done by Thomas Gleixner and Kate Stewart
and Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset
of the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to
license had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied
to a file was done in a spreadsheet of side by side results from of
the output of two independent scanners (ScanCode & Windriver)
producing SPDX tag:value files created by Philippe Ombredanne.
Philippe prepared the base worksheet, and did an initial spot review
of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537
files assessed. Kate Stewart did a file by file comparison of the
scanner results in the spreadsheet to determine which SPDX license
identifier(s) to be applied to the file. She confirmed any
determination that was not immediately clear with lawyers working with
the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained
>5 lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that
became the concluded license(s).
- when there was disagreement between the two scanners (one detected
a license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply
(and which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases,
confirmation by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.
The Windriver scanner is based on an older version of FOSSology in
part, so they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot
checks in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect
the correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial
patch version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch
license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
applied SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
License cleanup: add SPDX license identifier to uapi header files with a license
License cleanup: add SPDX license identifier to uapi header files with no license
License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.
By default all files without license information are under the default
license of the kernel, which is GPL version 2.
Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier. The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.
This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.
How this work was done:
Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
- file had no licensing information it it.
- file was a */uapi/* one with no licensing information in it,
- file was a */uapi/* one with existing licensing information,
Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.
The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne. Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.
The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed. Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.
Criteria used to select files for SPDX license identifier tagging was:
- Files considered eligible had to be source code files.
- Make and config files were included as candidates if they contained >5
lines of source
- File already had some variant of a license header in it (even if <5
lines).
All documentation files were explicitly excluded.
The following heuristics were used to determine which SPDX license
identifiers to apply.
- when both scanners couldn't find any license traces, file was
considered to have no license information in it, and the top level
COPYING file license applied.
For non */uapi/* files that summary was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 11139
and resulted in the first patch in this series.
If that file was a */uapi/* path one, it was "GPL-2.0 WITH
Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was:
SPDX license identifier # files
---------------------------------------------------|-------
GPL-2.0 WITH Linux-syscall-note 930
and resulted in the second patch in this series.
- if a file had some form of licensing information in it, and was one
of the */uapi/* ones, it was denoted with the Linux-syscall-note if
any GPL family license was found in the file or had no licensing in
it (per prior point). Results summary:
SPDX license identifier # files
---------------------------------------------------|------
GPL-2.0 WITH Linux-syscall-note 270
GPL-2.0+ WITH Linux-syscall-note 169
((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21
((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17
LGPL-2.1+ WITH Linux-syscall-note 15
GPL-1.0+ WITH Linux-syscall-note 14
((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5
LGPL-2.0+ WITH Linux-syscall-note 4
LGPL-2.1 WITH Linux-syscall-note 3
((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3
((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1
and that resulted in the third patch in this series.
- when the two scanners agreed on the detected license(s), that became
the concluded license(s).
- when there was disagreement between the two scanners (one detected a
license but the other didn't, or they both detected different
licenses) a manual inspection of the file occurred.
- In most cases a manual inspection of the information in the file
resulted in a clear resolution of the license that should apply (and
which scanner probably needed to revisit its heuristics).
- When it was not immediately clear, the license identifier was
confirmed with lawyers working with the Linux Foundation.
- If there was any question as to the appropriate license identifier,
the file was flagged for further research and to be revisited later
in time.
In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.
Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights. The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.
Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.
In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.
Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
- a full scancode scan run, collecting the matched texts, detected
license ids and scores
- reviewing anything where there was a license detected (about 500+
files) to ensure that the applied SPDX license was correct
- reviewing anything where there was no detection but the patch license
was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
SPDX license was correct
This produced a worksheet with 20 files needing minor correction. This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.
These .csv files were then reviewed by Greg. Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected. This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.) Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This restores the original behaviour before the block callbacks were
introduced. Allow the drivers to do binding of block always, no matter
if the NETIF_F_HW_TC feature is on or off. Move the check to the block
callback which is called for rule insertion.
Reported-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smooth Cong Wang's bug fix into 'net-next'. Basically put
the bulk of the tcf_block_put() logic from 'net' into
tcf_block_put_ext(), but after the offload unbind.
Signed-off-by: David S. Miller <davem@davemloft.net>
It fixes a problem for the last chunk where 'chunk_size' is smaller than
MLXSW_I2C_BLK_MAX and data is copied to the wrong offset, overriding
previous data.
Fixes: 6882b0aee1 ("mlxsw: Introduce support for I2C bus")
Signed-off-by: Vadim Pasternak <vadimp@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ASIC has the ability to generate events whenever a sensor indicates
the temperature goes above or below its high or low thresholds,
respectively.
In new firmware versions the firmware enforces a minimum of 5
degrees Celsius difference between both thresholds. Make the driver
conform to this requirement.
Note that this is required even when the events are disabled, as in
certain systems interrupts are generated via GPIO based on these
thresholds.
Fixes: 85926f8770 ("mlxsw: reg: Add definition of temperature management registers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Adding a FIB rule on a spectrum platform silently aborts FIB offload:
$ ip ru add pref 99 from all to 192.168.1.1 table 10
$ dmesg -c
[ 623.144736] mlxsw_spectrum 0000:03:00.0: FIB abort triggered. Note that FIB entries are no longer being offloaded to this device.
This patch reworks FIB rule handling to return a message to the user:
$ ip ru add pref 99 from all to 8.8.8.8 table 11
Error: spectrum: FIB rules not supported. Aborting offload.
spectrum currently only checks whether the fib rule is a default rule or
an l3mdev rule, both of which it knows how to handle. Any other it aborts
FIB offload. Move the processing to check the rule type inline with the
user request. If the rule is an unsupported one, then a work queue entry
is used to abort the offload. Change the rule delete handling to just
return since it does nothing at the moment.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace recurring magic number in PPCNT register with a define.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the HW stats cache to be local. Rename it for better clarity.
It holds the results of the last result of HW stats that are being read
periodically, in order to have answer for stats request immediately.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the spectrum_mr_tcam.c include the spectrum_mr_tcam.h header file.
Cleans up sparse warning:
symbol 'mlxsw_sp_mr_tcam_ops' was not declared. Should it be static?
Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function is only used internally in spectrum_mr.c and is not declared
in the header file, thus make it static.
Cleans up sparse warning:
symbol 'mlxsw_sp_mr_dev_vif_lookup' was not declared. Should it be static?
Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix various endianness issues in comparisons and assignments. The fix is
entirely cosmetic as all the values fixed are endianness-agnostic.
Cleans up sparse warnings:
spectrum_mr.c:156:49: warning: restricted __be32 degrades to integer
spectrum_mr.c:206:26: warning: restricted __be32 degrades to integer
spectrum_mr.c:212:31: warning: incorrect type in assignment (different
base types)
spectrum_mr.c:212:31: expected restricted __be32 [usertype] addr4
spectrum_mr.c:212:31: got unsigned int
spectrum_mr.c:214:32: warning: incorrect type in assignment (different
base types)
spectrum_mr.c:214:32: expected restricted __be32 [usertype] addr4
spectrum_mr.c:214:32: got unsigned int
spectrum_mr.c:461:16: warning: restricted __be32 degrades to integer
spectrum_mr.c:461:49: warning: restricted __be32 degrades to integer
Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the dump the per netlink packet entry counter should be zeroed out
when new packet is created.
Fixes: 190d38a52a ("mlxsw: spectrum_dpipe: Add support for adjacency table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reported-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The KVD linear is currently partitioned into two partitions. One for
single entries and another for groups of 32 entries.
Add another partition consisting of groups of 512 entries which will
allow us to more accurately represent the nexthop weights in non-equal
cost multi-path routing.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The memory region where adjacency entries (nexthops) are stored is
called the KVD linear and is configured during initialization with a
size of 64K.
Extend this area with 32K more entries, that will be partitioned into 64
groups of 0.5K entries, thereby allowing us to support weighted nexthops
with high accuracy.
Change the ratio between both types of hash entries, so as to prevent
reduction in the number of double hash entries, which are used for IPv6
neighbours and routes with a prefix length greater than 64.
Note that the user will be able to control all these sizes once the
devlink resource manager is introduced.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now the driver assumed all the nexthops have an equal weight
and wrote each to a single adjacency entry.
This patch takes the `weight` parameter into account and populates the
adjacency group according to the relative weight of each nexthop.
Specifically, the weights of all the nexthops that should be offloaded
are first normalized and then used to calculate the upper adjacency
index of each nexthop. This is done according to the hash-threshold
algorithm used by the kernel for IPv4 multi-path routing.
Adjacency groups are currently limited to 32 entries which limits the
weights that can be used, but follow-up patches will introduce groups of
512 entries.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The device has certain restrictions regarding the size of an adjacency
group.
Have the router determine the size of the adjacency group according to
available KVDL allocation sizes and these restrictions.
This was not needed until now since only allocations of up 32 entries
were supported and these are all valid sizes for an adjacency group.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the first step towards non-equal-cost multi-path support, store each
nexthop's weight.
For IPv6 nexthops always set the weight to 1, as it only supports ECMP.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current KVDL allocation API allows the user to specify the requested
number of entries, but the user has no way of knowing how many entries
were actually allocated.
This works because existing users (e.g., router) request the exact
number they end up using. With the introduction of large adjacency
groups, this will change, as the router will have the ability to choose
from several allocation sizes, where larger allocations provide higher
accuracy with respect to requested weights and better resilience against
nexthop failures.
One option is to have the router try several allocations of descending
size until one succeeds, but a better way is to simply allow it to query
the actual allocation size and then size its request accordingly.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The KVD linear (KVDL) allocator currently consists of a very large
bitmap that reflects the KVDL's usage. The boundaries of each partition
as well as their allocation size are represented using defines.
This representation requires us to patch all the functions that act on a
partition whenever the partitioning scheme is changed. In addition, it
does not enable the dynamic configuration of the KVDL using the
up-coming resource manager.
Add objects to represent these partitions as well as the accompanying
code that acts on them to perform allocations and de-allocations.
In the following patches, this will allow us to easily add another
partition as well as new operations to act on these partitions.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The adjacency group size is part of the match on the adjacency group and
should therefore be exposed using dpipe.
When non-equal-cost multi-path support will be introduced, the group's
size will help users understand the exact number of adjacency entries
each nexthop occupies, as a nexthop will no longer correspond to a
single entry.
The output for a multi-path route with two nexthops, one with weight 255
and the second 1 will be:
Example:
$ devlink dpipe table dump pci/0000:01:00.0 name mlxsw_adj
pci/0000:01:00.0:
index 0
match_value:
type field_exact header mlxsw_meta field adj_index value 65536
type field_exact header mlxsw_meta field adj_size value 512
type field_exact header mlxsw_meta field adj_hash_index value 0
action_value:
type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:64
type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 3 value 1
index 1
match_value:
type field_exact header mlxsw_meta field adj_index value 65536
type field_exact header mlxsw_meta field adj_size value 512
type field_exact header mlxsw_meta field adj_hash_index value 510
action_value:
type field_modify header ethernet field destination mac value e4:1d:2d:a5:f3:65
type field_modify header mlxsw_meta field erif_port mapping ifindex mapping_value 4 value 2
Thus, the first nexthop occupies 510 adjacency entries and the second 2,
which leads to a ratio of 255 to 1.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There were quite a few overlapping sets of changes here.
Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.
Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly. If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.
In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().
Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.
The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum tunnels do not default to ttl of "inherit" like the Linux ones
do. Configure TIGCR on router init so that the TTL of tunnel packets is
copied from the overlay packets.
Fixes: ee954d1a91 ("mlxsw: spectrum_router: Support GRE tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The TIGCR register is used for setting up the IPinIP Tunnel
configuration.
Fixes: ee954d1a91 ("mlxsw: spectrum_router: Support GRE tunnels")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All drivers are converted to use block callbacks for TC_SETUP_CLS*.
So it is now safe to remove the calls to ndo_setup_tc from cls_*
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Benefit from the newly introduced block callback infrastructure and
convert ndo_setup_tc calls for matchall and flower offloads to block
callbacks.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use container_of to convert the generic fib_notifier_info into
the event specific data structure.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add extack argument down to mlxsw_sp_rif_create and mlxsw_sp_vr_create
to set an error message on RIF or VR overflow. Now on overflow of
either resource the user gets an informative message as opposed to
failing with EBUSY.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for inetaddr_validator and inet6addr_validator. The
notifiers provide a means for validating ipv4 and ipv6 addresses
before the addresses are installed and on failure the error
is propagated back to the user.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When an EMAD is transmitted, a timeout work item is scheduled with a
delay of 200ms, so that another EMAD will be retried until a maximum of
five retries.
In certain situations, it's possible for the function waiting on the
EMAD to be associated with a work item that is queued on the same
workqueue (`mlxsw_core`) as the timeout work item. This results in
flushing a work item on the same workqueue.
According to commit e159489baa ("workqueue: relax lockdep annotation
on flush_work()") the above may lead to a deadlock in case the workqueue
has only one worker active or if the system in under memory pressure and
the rescue worker is in use. The latter explains the very rare and
random nature of the lockdep splats we have been seeing:
[ 52.730240] ============================================
[ 52.736179] WARNING: possible recursive locking detected
[ 52.742119] 4.14.0-rc3jiri+ #4 Not tainted
[ 52.746697] --------------------------------------------
[ 52.752635] kworker/1:3/599 is trying to acquire lock:
[ 52.758378] (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c4fa4>] flush_work+0x3a4/0x5e0
[ 52.767837]
but task is already holding lock:
[ 52.774360] (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[ 52.784495]
other info that might help us debug this:
[ 52.791794] Possible unsafe locking scenario:
[ 52.798413] CPU0
[ 52.801144] ----
[ 52.803875] lock(mlxsw_core_driver_name);
[ 52.808556] lock(mlxsw_core_driver_name);
[ 52.813236]
*** DEADLOCK ***
[ 52.819857] May be due to missing lock nesting notation
[ 52.827450] 3 locks held by kworker/1:3/599:
[ 52.832221] #0: (mlxsw_core_driver_name){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[ 52.842846] #1: ((&(&bridge->fdb_notify.dw)->work)){+.+.}, at: [<ffffffff811c65c4>] process_one_work+0x7d4/0x12f0
[ 52.854537] #2: (rtnl_mutex){+.+.}, at: [<ffffffff822ad8e7>] rtnl_lock+0x17/0x20
[ 52.863021]
stack backtrace:
[ 52.867890] CPU: 1 PID: 599 Comm: kworker/1:3 Not tainted 4.14.0-rc3jiri+ #4
[ 52.875773] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
[ 52.886267] Workqueue: mlxsw_core mlxsw_sp_fdb_notify_work [mlxsw_spectrum]
[ 52.894060] Call Trace:
[ 52.909122] __lock_acquire+0xf6f/0x2a10
[ 53.025412] lock_acquire+0x158/0x440
[ 53.047557] flush_work+0x3c4/0x5e0
[ 53.087571] __cancel_work_timer+0x3ca/0x5e0
[ 53.177051] cancel_delayed_work_sync+0x13/0x20
[ 53.182142] mlxsw_reg_trans_bulk_wait+0x12d/0x7a0 [mlxsw_core]
[ 53.194571] mlxsw_core_reg_access+0x586/0x990 [mlxsw_core]
[ 53.225365] mlxsw_reg_query+0x10/0x20 [mlxsw_core]
[ 53.230882] mlxsw_sp_fdb_notify_work+0x2a3/0x9d0 [mlxsw_spectrum]
[ 53.237801] process_one_work+0x8f1/0x12f0
[ 53.321804] worker_thread+0x1fd/0x10c0
[ 53.435158] kthread+0x28e/0x370
[ 53.448703] ret_from_fork+0x2a/0x40
[ 53.453017] mlxsw_spectrum 0000:01:00.0: EMAD retries (2/5) (tid=bf4549b100000774)
[ 53.453119] mlxsw_spectrum 0000:01:00.0: EMAD retries (5/5) (tid=bf4549b100000770)
[ 53.453132] mlxsw_spectrum 0000:01:00.0: EMAD reg access failed (tid=bf4549b100000770,reg_id=200b(sfn),type=query,status=0(operation performed))
[ 53.453143] mlxsw_spectrum 0000:01:00.0: Failed to get FDB notifications
Fix this by creating another workqueue for EMAD timeouts, thereby
preventing the situation of a work item trying to flush a work item
queued on the same workqueue.
Fixes: caf7297e7a ("mlxsw: core: Introduce support for asynchronous EMAD register access")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Formerly, IPIP entries were created lazily by next hops that referenced
an offloadable IP-in-IP netdevice. However now that they are created
eagerly as a reaction to events on such netdevices, the reference
counting is useless. Hence drop it.
The routes whose next hops reference an offloaded IP-in-IP netdevice
actually linger around a bit after their device is unregistered.
However, mlxsw_sp_ipip_entry_destroy() also destroys the backing
loopback, and mlxsw_sp_rif_destroy() transitively (via
mlxsw_sp_nexthop_rif_gone_sync()) calls mlxsw_sp_nexthop_ipip_fini(),
which unlinks the IPIP entry from a next hop. Thus no dangling pointers
are left behind for the brief window after netdevice is gone, but routes
not yet.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPIP entries are created as soon as an offloadable device is created.
That means that when such a device is later moved to a different VRF,
the loopback device that backs the tunnel is wrong.
Thus when an offloadable encapsulating netdevice moves from one VRF to
another, make sure that the loopback is updated as necessary.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code for offloading IP-in-IP tunneling assumes that there is no
decap without encap. But that's never true for IPv6 overlays, and is not
true for IPv4 ones either, if net.ipv4.conf.*.rp_filter is unset.
To support decap-only tunnels, an IPIP entry is now created as soon as
an offloadable tunneling device is created. When that netdevice is up'd,
a decap route is looked up and possibly offloaded. Thus decap is not
handled implicitly as part of mlxsw_sp_ipip_entry_get() call anymore,
but needs to be done explicitly after the get, if desired.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, all netdevice notifications that the driver cared about were
related to its own ports, and mlxsw_sp could be retrieved from the
netdevice's private data. For IP-in-IP offloading however, the driver
cares about events on foreign netdevices, and getting at mlxsw_sp or
router data structures from the handler is inconvenient.
Therefore move the netdevice notifier blocks from global scope to struct
mlxsw_sp to allow retrieval from the notifier block pointer itself.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Support the SWITCHDEV_ATTR_ID_BRIDGE_MROUTER port attribute switchdev
notification.
To do that, add the mrouter flag to struct mlxsw_sp_bridge_device, which
indicates whether the bridge device was set to be mrouter port. This field
is set when:
- A new bridge is created, where the value is taken from the kernel
bridge value.
- A switchdev SWITCHDEV_ATTR_ID_BRIDGE_MROUTER notification is sent.
In addition, change the bridge MID entries to include the router port when
the bridge device is configured to be mrouter port. The MID entries are
updated in the following cases:
- When a new MID entry is created, update the router port according to the
bridge mrouter state.
- When a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER notification is sent, update all
the bridge's MID entries.
This is aligned with the case where a bridge slave is configured to be
mrouter port.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum, MDB entries point to MID entries, that indicate which ports a
packet should be forwarded to. Add the support in creating MID entries that
forward the packet to the Spectrum router port.
This will be later used to handle the bridge mrouter port switchdev
notifications.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In Spectrum hardware, the router port is a virtual port that is the gateway
to the routing mechanism. Hence, in order for a packet to be L3 forwarded,
it must first be L2 forwarded to the router port inside the hardware.
Further patches in this patchset are going to introduce support in bridge
device used as an mrouter port. In this case, the router port index will be
needed in order to update the MDB entries to include the router port. Thus,
export the mlxsw_sp_router_port function, which returns the index of the
Spectrum router port.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The code that actually takes care of bridge offload introduces a few
more non-trivial constraints with regards to bridge enslavements.
Propagate extack there to indicate the reason.
$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add link enp1s0np1 name enp1s0np1.20 type vlan id 20
$ ip link add name br0 type bridge
$ ip link set dev enp1s0np1.10 master br0
$ ip link set dev enp1s0np1.20 master br0
Error: spectrum: Can not bridge VLAN uppers of the same port.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to physical ports, enslavement of VLAN devices can also fail.
Use extack to indicate why the enslavement failed.
$ ip link add link enp1s0np1 name enp1s0np1.10 type vlan id 10
$ ip link add name bond0 type bond mode 802.3ad
$ ip link set dev enp1s0np1.10 master bond0
Error: spectrum: VLAN devices only support bridge and VRF uppers.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I increased the scale of supported VRFs by having
all of them share the same LPM tree.
In order to avoid look-ups for prefix lengths that don't exist, each
route removal would trigger an aggregation across all the active virtual
routers to see which prefix lengths are in use and which aren't and
structure the tree accordingly.
With the way the data structures are currently laid out, this is a very
expensive operation. When preformed repeatedly - due to the invocation
of the abort mechanism - and with enough VRFs, this can result in a hung
task.
For now, avoid this optimization until it can be properly re-added in
net-next.
Fixes: fc922bb0dd ("mlxsw: spectrum_router: Use one LPM tree for all virtual routers")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
Tested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw fails device enslavement for a number of reasons. Use the extack
facility to return an error message to the user stating why the enslave
is failing.
Messages are prefixed with "spectrum" so users know it is a constraint
imposed by the hardware driver. For example:
$ ip li add br0.11 link br0 type vlan id 11
$ ip li set swp11 master br0
Error: spectrum: Enslaving a port to a device that already has an upper device is not supported.
Signed-off-by: David Ahern <dsahern@gmail.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We accidentally return success if the kmalloc_array() call fails.
Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw_afa_block_create() doesn't return error pointers, it returns NULL
on error.
Fixes: 0e14c7777a ("mlxsw: spectrum: Add the multicast routing hardware logic")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the support of trap-and-forward route action in the multicast routing
offloading logic. A route will be set to trap-and-forward action if one (or
more) of its output interfaces is not offload-able, i.e. does not have a
valid Spectrum RIF.
This way, a route with mixed output VIFs list, which contains both
offload-able and un-offload-able devices can go through partial offloading
in hardware, and the rest will be done in the kernel ipmr module.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition to the current multicast route actions, which include trap
route action and a forward route action, add the trap-and-forward multicast
route action, and implement it in the multicast routing hardware logic.
To implement that, add a trap-and-forward ACL action as the last action in
the route flexible action set. The used trap is the ACL2 trap, which marks
the packets with offload_mr_forward_mark, to prevent the packet from being
forwarded again by the kernel.
Note: At that stage the offloading logic does not support trap-and-forward
multicast routes. This patch adds the support only in the hardware logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a multicast route is configured with trap-and-forward action, the
packets should be marked with skb->offload_mr_fwd_mark, in order to prevent
the packets from being forwarded again by the kernel ipmr module.
Due to this, it is not possible to use the already existing multicast trap
(MLXSW_TRAP_ID_ACL1) as the packet should be marked differently. Add the
MLXSW_TRAP_ID_ACL2 which is for trap-and-forward multicast routes, and set
the offload_mr_fwd_mark skb field in its handler.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use trap/discard flex action to implement trap and forward. The action will
later be used for multicast routing, as the multicast routing mechanism is
done using ACL flexible actions in Spectrum hardware. Using that action, it
will be possible to implement a trap-and-forward route.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When considering whether to set RTNH_F_OFFLOAD flag on an IPv6 route,
mlxsw_sp_fib6_entry_offload_set() looks up the mlxsw_sp_nexthop
corresponding to a given route, and decides based on whether the next
hop's offloaded flag was set. When looking for the matching next hop, it
also takes into account the device of the route, which must match next
hop's RIF.
IPIP next hops however hitherto didn't set the RIF. As a result, IPv6
routes forwarding traffic to IP-in-IP netdevices are never marked as
offloaded, even when they actually are.
Thus track RIF of IPIP next hops the same way as that of ETHERNET next
hops.
Fixes: 8f28a30976 ("mlxsw: spectrum_router: Support IPv6 overlay encap")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When creating a new RIF, bumping RIF count of the containing VR is the
last thing to be done. Symmetrically, when destroying a RIF, RIF count
is first dropped and only then the rest of the cleanup proceeds.
That's a problem for loopback RIFs. Those hold two VR references: one
for overlay and one for underlay. mlxsw_sp_rif_destroy() releases the
overlay one, and the deconfigure() callback the underlay one. But if
both overlay and underlay are the same, and if there are no other
artifacts holding the VR alive, this put actually destroys the VR. Later
on, when mlxsw_sp_rif_destroy() calls mlxsw_sp_vr_put() for the same VR,
the VR will already have been released and the kernel crashes with NULL
pointer dereference.
The underlying problem is that the RIF under destruction ends up
referencing the overlay VR much longer than it claims: all the way until
the call to mlxsw_sp_vr_put(). So line up the reference counting
properly to reflect this. Make corresponding changes in
mlxsw_sp_rif_create() as well for symmetry.
Fixes: 6ddb7426a7 ("mlxsw: spectrum_router: Introduce loopback RIFs")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the unlikely event that mfc->mfc_un.res.ttls[i] is 255 for all
values of i from 0 to MAXIVS-1, the err is not set at all and hence
has a garbage value on the error return at the end of the function,
so initialize it to 0. Also, the error return check on err and goto
to err: inside the for loop makes it impossible for err to be zero
at the end of the for loop, so we can remove the redundant err check
at the end of the loop.
Detected by CoverityScan CID#1457207 ("Unitialized scalar value")
Fixes: c011ec1bbf ("mlxsw: spectrum: Add the multicast routing offloading logic")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make the Spectrum router logic not ignore the RTNL_FAMILY_IPMR FIB
notifications.
Past commits added the IPMR VIF and MFC add/del notifications via the
fib_notifier chain. In addition, a code for handling these notifications in
the Spectrum router logic was added. Make the Spectrum router logic not
ignore these notifications and forward the requests to the Spectrum
multicast router offloading logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to the fact that multicast routes hold the minimum MTU of all the
egress RIFs and trap packets that don't meet it, notify the mulitcast
router code on RIF MTU changes.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add functionality for calling the multicast routing offloading logic upon
MFC and VIF add and delete notifications. In addition, call the multicast
routing upon RIF addition and deletion events.
As the multicast routing offload logic may sleep, the actual calls are done
in a deferred work. To ensure the MFC object is not freed in that interval,
a reference is held to it. In case of a failure, the abort mechanism is
used, which ejects all the routes from the hardware and triggers the
traffic to flow through the kernel.
Note: At that stage, the FIB notifications are still ignored, and will be
enabled in a further patch.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the mlxsw Spectrum driver offloads only either the RT_TABLE_MAIN
FIB table or the VRF tables, so the RT_TABLE_LOCAL table is squashed to the
RT_TABLE_MAIN table to allow local routes to be offloaded too.
By default, multicast MFC routes which are not assigned to any user
requested table are put in the RT_TABLE_DEFAULT table.
Due to the fact that offloading multicast MFC routes support in Spectrum
router logic is going to be introduced soon, squash the default table to
MAIN too.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the multicast routing hardware API introduced in previous patch
for the specific spectrum hardware.
The spectrum hardware multicast routes are written using the RMFT2 register
and point to an ACL flexible action set. The actions used for multicast
routes are:
- Counter action, which allows counting bytes and packets on multicast
routes.
- Multicast route action, which provide RPF check and do the actual packet
duplication to a list of RIFs.
- Trap action, in the case the route action specified by the called is
trap.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the multicast router offloading logic, which is in charge of handling
the VIF and MFC notifications and translating it to the hardware logic API.
The offloading logic has to overcome several obstacles in order to safely
comply with the kernel multicast router user API:
- It must keep track of the mapping between VIFs to netdevices. The user
can add an MFC cache entry pointing to a VIF, delete the VIF and add
re-add it with a different netdevice. The offloading logic has to handle
this in order to be compatible with the kernel logic.
- It must keep track of the mapping between netdevices to spectrum RIFs,
as the current hardware implementation assume having a RIF for every
port in a multicast router.
- It must handle routes pointing to pimreg device to be trapped to the
kernel, as the packet should be delivered to userspace.
- It must handle routes pointing tunnel VIFs. The current implementation
does not support multicast forwarding to tunnels, thus routes that point
to a tunnel should be trapped to the kernel.
- It must be aware of proxy multicast routes, which include both (*,*)
routes and duplicate routes. Currently proxy routes are not offloaded
and trigger the abort mechanism: removal of all routes from hardware and
triggering the traffic to go through the kernel.
The multicast routing offloading logic also updates the counters of the
offloaded MFC routes in a periodic work.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Propagate error instead of doing WARN_ON right away.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for controlling nexthop counters via dpipe.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for adjacency table dump.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for setting counters on nexthops based on dpipe's adjacency
table counter status. This patch also adds the ability for getting the
counter value, which will be used by the dpipe adjacency table dump
implementation in the next patches.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to add the ability for setting counters on nexthops the RATR
register should be extended.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add initial support for router adjacency table. The table does lookup
based on the nexthop-group index and the local nexthop offset. After
locating the nexthop entry it sets the destination MAC address and the
egress RIF.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done as a preparation before introducing the ability to dump the
adjacency table via dpipe, and to count the table size. The current table
implementation avoids tunnel entries, thus a helper for checking if
the nexthop group contains tunnel entries is also provided. The mlxsw's
nexthop representative struct stays private to the router module.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use list_is_last helper to check for last neighbor.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Keep nexthops in a linked list for easy access.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds field for mlxsw's meta header which will be used to
describe the match/action behavior of the adjacency table.
The fields are:
1. Adj_index - The global index of the nexthop group in the adjacency
table.
2. Adj_hash_index - Local index offset which is based on packets hash
mod the nexthop group size.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix indentation in mlxsw_meta header's description.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a mrouter is registered or leaves a mid, don't update the HW.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In mdb flush the port is being removed from all the mids it is registered
to. But if the port is mrouter, all the mids floods to it.
This patch remove mrouter ports from mids it is not registered to in the
mdb flush.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Whenever a port starts / stops being mrouter, update all the mdb entries
in the HW to flood / stop flooding mc packets there.
The change should happen only if the port is not in the mid. (If it is,
the mid should flood mc packets to this port anyway)
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When mc is enabled, whenever a mc packet doesn't hit any mdb entry it is
being flood to the ports marked as mrouters. However, all mc packets should
be flooded to them even if they match an entry in the mdb.
This patch adds the mrouter ports to every mdb entry that is being written
to the HW.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When a port is being removed from a bridge, flush the bridge mdb to remove
the mids of that port.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When multicast is disabled, flood mc packets only to port that are marked
BR_MCAST_FLOOD (instead to all).
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use the generic mc flood function to decide whether to flood mc to a port
when mc is being enabled / disabled.
Move this function in the file to avoid forward declaration.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove all the mdb entries from the HW when mc is being disabled and
re-write them when it is being enabled.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Don't write multicast related data to the HW when mc is disabled.
Also, don't allocate mid id to new mids (so the remove function could know
that they weren't wrote to the HW)
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Break mid deletion into two function, so it will be possible in the future
to delete a mid entry for other reasons then switchdev command (like port
deletion).
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Attach mid getting and releasing mid id to the HW write / remove, and add
a flag to indicate whether the mid is in the HW. It is done because mid id
is also HW index to this mid.
This change allows adding in the following patches the ability to have a
mid in the mdb cache but not in the HW. It will be useful for being able
to disable the multicast.
It means that the mdb is being written / delete to the HW in the mid
allocation / removing function, not after them.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Break the smid write function into two, one that cleans the ports that
might be still written there and one that changes an exiting mid entry.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of saving all the mids in the same list, save them per vlan
device. This change allows a more efficient mid find.
Also, in the next patches, there will be added a lot of loops over all the
mids in bridge device for multicast disable, mrouter change and ndb flush.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since there is a bitmap for the ports registered to each mid, there is no
need for a ref count, since it will always be the number of set bits in
this bitmap. Any check of the ref count was replaced with checking if the
bitmap is empty.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a bitmap of ports to the mid struct to hold the ports that are
registered to this mid.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the naming of mc_router to mrouter to keep consistency.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add three new traps needed for multicast routing:
- PIM: Trap for PIM protocol control packets.
- RPF: Trap for packets that fail the RPF check on a specific hardware
route entry.
- MULTICAST: Generic trap for multicast. It is used for routes that trap
the packets to the CPU.
The RPF and MULTICAST traps have rate limiters as these traps may have
line-rate of packets trapped. The PIM trap has a rate limiter similarly to
other L3 control protocols. The rate limiters are implemented by adding
three new trap groups for the newly introduced traps.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlxsw_sp_rif struct, defined as private struct in spectrum_router.c
will be used in the multicast router source file. Due to the fact that the
dev field will be needed by the multicast router logic, add an access
function to it.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Turn on two bits on the Spectrum RIF configuration:
- IPv4 multicast: when a multicast packet arrives on a RIF, send it to go
through multicast routes lookup.
- IPv4 multicast forwarding enable: when multicast packet arrives on a
RIF, allow it to be forwarded by multicast routes. If this bit is not
set, multicast packets will go through multicast routing lookup but will
be dropped at the egress of the ports.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RRCR register is used for copying and moving TCAM multicast routes
from different offsets. It will be used to allow routes relocation for
parman ops as part of the multicast router offloading logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RMFT-V2 register is used to configure and query the multicast table and
will be used by the multicast router offloading logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The multicast ERIF list entries resource indicates the number of entries
that can be put in one rigr2 register operation. While the register can
hold up to MLXSW_REG_RIGR2_MAX_ERIFS ( = 32) ERIF entries, the actual
number allowed by firmware is indicated with this resource.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RIGR-V2 register is used to add, remove and query egress interface list
of a multicast forwarding entry and it will be used by the multicast
router offloading logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This register is used for allocation of regions in the TCAM table and it
will be used by the multicast router offloading logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The MLXSW_REG_PXXX_FLEX_ACTION_SET_LEN is relevant for the multicast router
registers too, so rename it to have a general name which is not bound to a
specific register.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the trap ACL action to be configured with different traps. This
allows the multicast router offloading code to use that same ACL action
with the multicast router traps. By using different traps, the multicast
router can have different trap policies and can handle the packet
differently.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The Spectrum multicast forwarding is done using an ACL action. Add the
mcrouter ACL action that will be used to offload the multicast router
logic.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A flexible action instance allows, given a set of ops, creating, committing
and sharing a set of ACL action blocks. The flexible action instance in
question is using the spectrum KVD linear space to store the flexible
action sets.
Move this flexible action instance to the common spectrum struct to allow
other users (such as multicast router) to get that functionality.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The multicast router offloading code is going to require the counter_pools
initialization to occur before the router initialization, thus, change the
spectrum initialization order to fix it.
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver doesn't support events from address families other than IPv4
and IPv6, so ignore them. Otherwise, we risk queueing a work item before
it's initialized.
This can happen in case a VRF is configured when MROUTE_MULTIPLE_TABLES
is enabled, as the VRF driver will try to add an l3mdev rule for the
IPMR family.
Fixes: 65e65ec137 ("mlxsw: spectrum_router: Don't ignore IPv6 notifications")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Andreas Rammhold <andreas@rammhold.de>
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When removing the offloading of mirred actions under
matchall classifiers, mlxsw would find the destination port
associated with the offloaded action and utilize it for undoing
the configuration.
Depending on the order by which ports are removed, it's possible that
the destination port would get removed before the source port.
In such a scenario, when actions would be flushed for the source port
mlxsw would perform an illegal dereference as the destination port is
no longer listed.
Since the only item necessary for undoing the configuration on the
destination side is the port-id and that in turn is already maintained
by mlxsw on the source-port, simply stop trying to access the
destination port and use the port-id directly instead.
Fixes: 763b4b70af ("mlxsw: spectrum: Add support in matchall mirror TC offloading")
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The current code does not handle correctly the access to the upper page
in case of SFP/SFP+ EEPROM. In that case the offset should be local
and the I2C address should be changed.
Fixes: 2ea109039c ("mlxsw: spectrum: Add support for access cable info via ethtool")
Reported-by: Florian Klink <flokli@flokli.de>
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces callbacks and tunnel type to offload GRE tunnels.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct mlxsw_sp_rif is a router-private structure, and therefore
everything related to it is as well: parameters, and derived RIF types
including loopbacks. IPIP module needs access to some details of
loopback interfaces, but exporting all the RIF shebang would create too
large an interface.
So instead export just the bare minimum necessary: accessors for RIF
index and underlay VRF ID.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These traps are generated for packets that fail checks for source IP,
encapsulation type, or GRE key. Trap these packets to CPU for follow-up
handling by the kernel, which will send ICMP destination unreachable
responses.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The local route that points at IPIP's underlay device (decap route) can
be present long before the GRE device. Thus when an encap route is
added, it's necessary to look inside the underlay FIB if the decap route
is already present. If so, the current trap offload needs to be
withdrawn and replaced with a decap offload.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike encapsulation, which is represented by a next hop forwarding to
an IPIP tunnel, decapsulation is a type of local route. It is created
for local routes whose prefix corresponds to the local address of one of
offloaded IPIP tunnels. When the tunnel is removed (i.e. all the encap
next hops are removed), the decap offload is migrated back to a trap for
resolution in slow path.
This patch assumes that decap route is already present when encap route
is added. A follow-up patch will fix this issue.
Note that this patch only supports IPv4 underlay. Support for IPv6
underlay will be subject to follow-up work apart from this patchset.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the missing bits to recognize IPv6 next hops as IPIP ones to enable
offloading of IPv6 overlay encapsulation.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces some common code for tracking of offloaded IP-in-IP
tunnels, and support for offloading IPv4 overlay encapsulating routes in
particular. A follow-up patch will introduce IPv6 overlay as well.
Offloaded tunnels are kept in a linked list of mlxsw_sp_ipip_entry
objects hooked up in mlxsw_sp_router. A network device that represents
the tunnel is used as a key to look up the corresponding IPIP entry.
Note that in the future, more general keying mechanism will be needed,
because parts of the tunnel information can be provided by the route.
IPIP entries are reference counted, because several next hops may end up
using the same tunnel, and we only want to offload it once.
Encapsulation path hooks into next hop handling. Routes that forward to
a tunnel are now considered gateway routes, thus giving them the same
treatment that other remote routes get. An IPIP next hop type is
introduced.
Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. If a tunnel type doesn't match any of the
known tunnel types, the next-hop is not considered an IPIP next hop.
The list of IPIP tunnel types is currently empty, follow-up patches will
add support for GRE. Traffic to IPIP tunnel types that are not
explicitly recognized by the driver traps and is handled in slow path.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the router, some next hops may reference an encapsulating netdevice,
such as GRE or IPIP. To properly offload these next hops, mlxsw needs to
keep track of whether a given next hop is a regular Ethernet entry, or
an IP-in-IP tunneling entry.
To facilitate this book-keeping, add a type field to struct
mlxsw_sp_nexthop. There is, as of this patch, only one next hop type:
MLXSW_SP_NEXTHOP_TYPE_ETH. Follow-up patches will introduce the IP-in-IP
variant.
There are several places where next hops are initialized in the IPv4
path. Instead of replicating the logic at every one of them, factor it
out to a function mlxsw_sp_nexthop4_type_init(). The corresponding fini
is actually protocol-neutral, so put it to mlxsw_sp_nexthop_type_fini(),
but create a corresponding protocoled _fini function that dispatches to
the protocol-neutral one.
The IPv6 path is simpler, but for symmetry with IPv4, create the same
suite of functions with corresponding logic.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 counterpart of the previous patch: introduce a function to
determine whether a given route is a gateway route.
The new function takes a mlxsw_sp argument which follow-up patches will
use. Thus mlxsw_sp_fib6_entry_type_set() got that argument as well.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For IPv4 IP-in-IP offload, routes that direct traffic to IP-in-IP
devices need to be considered gateway routes as well. That involves a
bit more logic, so extract the current test to a separate function,
where the logic can be later added.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When offloading L3 tunnels, an adjacency entry is created that loops the
packet back into the underlay router. Loopback interfaces then hold the
corresponding information and are created for IP-in-IP netdevices.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Loopback RIFs, which will be introduced in a follow-up patch, differ
from other RIFs in that they do not have a FID associated with them.
To support this, demote FID allocation from mlxsw_sp_rif_create to
configure op of the existing RIF types, and likewise the FID release
from mlxsw_sp_rif_destroy to deconfigure op.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Details of individual tunnel types are kept in an array of
mlxsw_sp_ipip_ops objects. Follow-up patches will use the list to
determine whether a constructed RIF should be a loopback, and to decide
whether a next hop references a tunnel.
The list is currently empty, follow-up patches will add support for GRE.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The spectrum_ipip module that will be introduced in the follow-up
patches needs to know the data type.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To support IPIP, the driver needs to be able to construct an IPIP
adjacency. Change mlxsw_reg_ratr_pack to take an adjacency type as an
argument. Adjust the one existing caller.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike other interface types, loopback RIFs do not have MAC address. So
drop the corresponding argument from mlxsw_reg_ritr_pack() and move it
to a new function. Call that from callers of mlxsw_reg_ritr_pack.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RTDP register is used for configuring the tunnel decap properties of
NVE and IPinIP.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To implement IP-in-IP decapsulation, Spectrum uses LPM entries of type
IP2ME with tunnel validity bit and tunnel pointer set. The necessary
register fields are already available, so add a function to pack the
RALUE as appropriate.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This enum is used with reg_ratr_trap_id, so move it next to the register
definition.
While at it, drop the enumerator initializers.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, adjacencies have always been of type Ethernet (with value of 0),
and thus there was no need to explicitly support RATR type. However to
support IP-in-IP adjacencies, this type and a suite of IP-in-IP-specific
attributes need to be added.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the register so that loopback RIFs can be created and loopback
properties specified.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the abort mechanism is invoked a default route directing packets to
the CPU is programmed in all the virtual routers currently in use. This
can result in packet loss in case a new VRF is configured.
Upon abort, program the default route in all virtual routers, whether
they are in use or not.
The patch is directed at net-next since post-abort fixes aren't critical
and packet loss due to a missing default route will be insignificant
compared to packet loss caused by the CPU port policer.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I relied on the fact that anycast routes use the loopback device as
their nexthop device to trap packets hitting them to the CPU.
After commit 4832c30d54 ("net: ipv6: put host and anycast routes on
device with address") this is no longer the case and such routes are
programmed with a forward action (note the 'offload' flag):
anycast cafe:: dev enp3s0np7 proto kernel metric 0 offload pref medium
This will prevent the router from locally receiving packets destined to
the Subnet-Router anycast address.
Fix this by specifically programming anycast routes with action trap,
which results in the following output:
anycast cafe:: dev enp3s0np7 proto kernel metric 0 pref medium
Fixes: 4832c30d54 ("net: ipv6: put host and anycast routes on device with address")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
device in case a port is enslaved to a master netdev such as bridge or
bond.
Since the driver ignores events unrelated to its ports and their
uppers, it's possible to engineer situations in which the device's data
path differs from the kernel's.
One example to such a situation is when a port is enslaved to a bond
that is already enslaved to a bridge. When the bond was enslaved the
driver ignored the event - as the bond wasn't one of its uppers - and
therefore a bridge port instance isn't created in the device.
Until such configurations are supported forbid them by checking that the
upper device doesn't have uppers of its own.
Fixes: 0d65fc1304 ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Nogah Frankel <nogahf@mellanox.com>
Tested-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for controlling IPv6 neighbor counters via dpipe.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for setting counters on IPv6 neighbors based on dpipe's host6
table counter status.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for IPv6 host table dump.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the host entry filler helper to be applicable for both IPv4/6
addresses.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add helper for accessing destination IP in case of IPv6 neighbor.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IPv6 host table initial support. The action behavior for both IPv4/6
tables is the same, thus the same action dump op is used. Neighbors with
link local address are ignored.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neighbors with link local addresses are not offloaded to the host table,
yet, the are maintained in the driver for adjacency table usage. When
dumping the IPv6 host neighbors this link local neighbors should be
ignored. This patch exports this helper for dpipe usage.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During the neighbor traversal the neighbors from different families
should be ignored.
Fixes: c58035a74aba ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Makes no sense to have dpipe compiled in when devlink is not enabled,
because the devlink dpipe registation is noop function. So don't compile
it in. This also fixes missing extern structs errors.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Fixes: a86f030915 ("mlxsw: spectrum_dpipe: Add support for IPv4 host table dump")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for controlling neighbor counters via dpipe.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for IPv4 host table dump.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for setting counters on neighbors based on dpipe's host table
counter status. This patch also adds the ability for getting the counter
value, which will be used by the dpipe host table implementation in the
next patches.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done as a preparation before introducing support for neighbor
counters. The flow counter's type enum is used by many registers, yet,
until now it was used only by mgpc and thus it was private. This patch
updates the namespace for more generic usage.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change label name for case of erif table init failure.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is done as a preparation before introducing the ability to dump the
host table via dpipe, and to count the table size. The mlxsw's neighbor
representative struct stays private to the router module.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The entry clear routine can be shared between the drivers, thus it is
moved inside devlink.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Up until now the dpipe table's size was static and known at registration
time. The host table does not have constant size and it is resized in
dynamic manner. In order to support this behavior the size is changed
to be obtained dynamically via an op.
This patch also adjust the current dpipe table for the new API.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix ERIF's table operations name space.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If action is gact goto_chain, offload it to HW by jumping to another
ruleset.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to lookup ruleset in order to offload goto_chain termination
action. This patch adds it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For goto_chain action we need to know group_id of a ruleset to jump to.
Provide infrastructure in order to get it.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Reflect chain index coming down from TC core and create a ruleset per
chain. Note that only chain 0, being the implicit chain, is bound to the
device for processing. The rest of chains have to be "jumped-to" by
actions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update the value of the mrouter flag in struct mlxsw_sp_bridge_port when
it is being changed.
Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I made an embarrassing mistake and used 'IPV6' instead of 'CONFIG_IPV6'
around the function that updates the kernel about IPv6 neighbours
activity. This can be a problem if the kernel has more neighbours than a
certain threshold and it starts deleting those that are supposedly
inactive.
Fixes: b5f3e0d430 ("mlxsw: spectrum_router: Fix build when IPv6 isn't enabled")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 routes currently lack nexthop flags as in IPv4. This has several
implications.
In the forwarding path, it requires us to check the carrier state of the
nexthop device and potentially ignore a linkdown route, instead of
checking for RTNH_F_LINKDOWN.
It also requires capable drivers to use the user facing IPv6-specific
route flags to provide offload indication, instead of using the nexthop
flags as in IPv4.
Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
provide offload indication instead of the RTF_OFFLOAD flag, which is
removed while it's still not part of any official kernel release.
In the near future we would like to use the field for the
RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
not be ready in time for the current cycle.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Due to limited ASIC resources the maximum number of routes is limited by
the nexthop resource. In order to improve the routing scale nexthop
consolidation should be performed.
This patch adds support for IPv6 neighbor consolidation. The hash value
is calculated based on the nexthop set, by performing bitwise xor on the
ifindexs of the nexthops, in a similar way to IPv4's kernel implementation.
In case of collision a full match is performed between the sets which
include address and ifindex comparison.
Non gateway nexthop groups are not inserted to the hash table due to
lack of nexthop device (ifindex).
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch does preparation before introducing IPv6 nexthop group
consolidation. Currently the nexthop group hash table is used only by
IPv4 and uses fixed key size. In order to support the IPv6's variable
length key the current table is changed.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of LPM trees available for lookup is much smaller than the
number of virtual routers, which are used to implement VRFs. In
addition, an LPM tree can only be used by one protocol - either IPv4 or
IPv6.
Therefore, in order to increase the number of supported virtual routers
to the maximum we need to be able to share LPM trees across virtual
routers instead of trying to find an optimized tree for each.
Do that by allocating one LPM tree for each protocol, but make sure it
will only include prefixes that are actually used, so as to not perform
unnecessary lookups.
Since changing the structure of a bound tree isn't recommended, whenever
a new tree it required, it's first created and then bound to each
virtual router, replacing the old one.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of relying on the LPM tree to be assigned to the virtual router
before binding the two, lets pass it explicitly.
This will later allow us to return upon binding error instead of having
to perform a rollback of the assignment.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is no point in returning a value from function whose return value
is never checked.
Even if the return value was checked, there wouldn't be anything to do
about it, as these functions are either called from error or deletion
paths.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make these structures const as they only stored in the profile field of
a mlxsw_driver structure, which is of type const.
Done using Coccinelle.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of checking handle, which does not have the inner class
information and drivers wrongly assume clsact->egress as ingress, use
the newly introduced classid identification helpers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.
The TCP conflict was a case of local variables in a function
being removed from both net and net-next.
In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.
Signed-off-by: David S. Miller <davem@davemloft.net>
Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
prio is not cls_flower specific, but it is meaningful for all
classifiers. Seems that only mlxsw cares about the value. Obviously,
cls offload in other drivers is broken.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To sync-up with the naming in the rest of the driver, rename the cls arg.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Let mlxsw_sp_setup_tc be a splitter for specific setup_tc types and push
out cls_flower and cls_matchall specific codes into separate functions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order to be aligned with the rest of the types, rename
TC_SETUP_MATCHALL to TC_SETUP_CLSMATCHALL.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The rest of the helpers are named tcf_exts_*, so change the name of
the action number helpers to be aligned. While at it, change to inline
functions.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each multicast group (MID) stores a bitmap of ports to which a packet
should be forwarded to in case an MDB entry associated with the MID is
hit.
Since the initial introduction of IGMP snooping in commit 3a49b4fde2
("mlxsw: Adding layer 2 multicast support") the driver didn't correctly
free these multicast groups upon ungraceful situations such as the
removal of the upper bridge device or module removal.
The correct way to fix this is to associate each MID with the bridge
ports member in it and then drop the reference in case the bridge port
is destroyed, but this will result in a lot more code and will be fixed
in net-next.
For now, upon module removal, traverse the MID list and release each
one.
Fixes: 3a49b4fde2 ("mlxsw: Adding layer 2 multicast support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some operations in the bridge driver such as MDB deletion are preformed
in an atomic context and thus deferred to a process context by the
switchdev infrastructure.
Therefore, by the time the operation is performed by the underlying
device driver it's possible the bridge port context is already gone.
This is especially true for removal flows, but theoretically can also be
invoked during addition.
Remove the warnings in such situations and return normally.
Fixes: c57529e1d5 ("mlxsw: spectrum: Replace vPorts with Port-VLAN")
Fixes: 3922285d96 ("net: bridge: Add support for offloading port attributes")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now have all the necessary IPv6 infrastructure in place, so stop
ignoring these notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Without resorting to ACLs, the device performs route lookup solely based
on the destination IP address.
In case source-specific routing is needed, an error is returned and the
abort mechanism is activated, thus allowing the kernel to take over
forwarding decisions.
Instead of aborting, we can trap specific destination prefixes where
source-specific routes are present, but this will result in a lot more
code that is unlikely to ever be used.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In case we got a replace event, then the replaced route must exist. If
the route isn't capable of multipath, then replace first matching
non-multipath capable route.
If the route is capable of multipath and matching multipath capable
route is found, then replace it. Otherwise, replace first matching
non-multipath capable route.
The new route is inserted before the replaced one. In case the replaced
route is currently offloaded, then it's overwritten in the device's table
by the new route and later deleted, thus not impacting routed traffic.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>