Not all of the bits of the LOCAL_PKT_PROC_CNTXT register are valid.
Until IPA v4.5, there are 17 valid bits (though the bottom three
must be zero). Starting with IPA v4.5, 18 bits are valid.
Introduce proc_cntxt_base_addr_encoded() to encode the base address
for use in the register using only the valid bits.
Shorten the name of the register (omit "_BASE") to avoid the need to
wrap the line in the one place it's used.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define the ENDP_INIT_NAT register for setting up the NAT
configuration for an endpoint. We aren't using NAT at this
time, so explicitly set the type to BYPASS for all endpoints.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add IPA version definitions for all IPA v3.x and v4.x. Fix the GSI
version associated with IPA version 4.1.
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Modify conditional tests throughout the IPA code so they do not
assume that IPA v3.5.1 is the oldest version supported. Also remove
assumptions that IPA v4.5 is the newest version of IPA supported.
Augment versions in comments with "+", to be clearer that the
comment applies to a version and subsequent versions. (E.g.,
"present for IPA v4.2+" instead of just "present for v4.2".)
Signed-off-by: Alex Elder <elder@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
After commit 098a697b49 ("tcp_metrics: Use a single hash table
for all network namespaces."), tcpm_hash_bucket is local to
net/ipv4/tcp_metrics.c
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit 73f156a6e8 ("inetpeer: get rid of ip_id_count")
I used a very small hash table that could be abused
by patient attackers to reveal sensitive information.
Switch to a dynamic sizing, depending on RAM size.
Typical big hosts will now use 128x more storage (2 MB)
to get a similar increase in security and reduction
of hash collisions.
As a bonus, use of alloc_large_system_hash() spreads
allocated memory among all NUMA nodes.
Fixes: 73f156a6e8 ("inetpeer: get rid of ip_id_count")
Reported-by: Amit Klein <aksecurity@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cited commit added a new attribute before the existing group reference
count attribute, thereby changing its value and breaking existing
applications on new kernels.
Before:
# psample -l
libpsample ERROR psample_group_foreach: failed to recv message: Operation not supported
After:
# psample -l
Group Num Refcount Group Seq
1 1 0
Fix by restoring the value of the old attribute and remove the
misleading comments from the enumerator to avoid future bugs.
Cc: stable@vger.kernel.org
Fixes: d8bed686ab ("net: psample: Add tunnel support")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reported-by: Adiel Bidani <adielb@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel says:
====================
mlxsw: Add support for resilient nexthop groups
This patchset adds support for resilient nexthop groups in mlxsw. As far
as the hardware is concerned, resilient groups are the same as regular
groups. The differences lie in how mlxsw manages the individual
adjacency entries (nexthop buckets) that make up the group.
The first difference is that unlike regular groups the driver needs to
periodically update the kernel about activity of nexthop buckets so that
the kernel will not treat the buckets as idle, given traffic is
offloaded from the CPU to the ASIC. This is similar to what mlxsw is
already doing with respect to neighbour entries. The update interval is
set to 1 second to allow for short idle timers.
The second difference is that nexthop buckets that correspond to an
unresolved neighbour must be programmed to the device, as the size of
the group must remain fixed. This is achieved by programming such
entries with trap action, in order to trigger neighbour resolution by
the kernel.
The third difference is atomic replacement of individual nexthop
buckets. While the driver periodically updates the kernel about activity
of nexthop buckets, it is possible for a bucket to become active just
before the kernel decides to replace it with a different nexthop. To
avoid such situations and connections being reset, the driver instructs
the device to only replace an adjacency entry if it is inactive.
Failures are propagated back to the nexthop code.
Patchset overview:
Patches #1-#7 gradually add support for resilient nexthop groups
Patch #8 finally enables such groups to be programmed to the device
Patches #9-#10 add mlxsw-specific selftests
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Test that unsupported resilient nexthop group configurations are
rejected and that offload / trap indication is correctly set on nexthop
buckets in a resilient group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The number of nexthop buckets in a resilient nexthop group never
changes, so when the gateway address of a nexthop cannot be resolved,
the nexthop buckets are programmed to trap packets to the CPU in order
to trigger resolution. For example:
# ip nexthop add id 1 via 198.51.100.1 dev swp3
# ip nexthop add id 10 group 1 type resilient buckets 32
# ip nexthop bucket get id 10 index 0
id 10 index 0 idle_time 1.44 nhid 1 trap
Where 198.51.100.1 is a made-up IP.
Test that in this case packets are indeed trapped to the CPU via the
unresolved neigh trap.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that mlxsw supports resilient nexthop groups, allow them to be
programmed after validating that their configuration conforms to the
device's limitations (e.g., number of buckets is within predefined
range).
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The kernel periodically checks the idle time of nexthop buckets to
determine if they are idle and can be re-populated with a new nexthop.
When the resilient nexthop group is offloaded to hardware, the kernel
will not see activity on nexthop buckets unless it is reported from
hardware.
Therefore, periodically (every 1 second) query the hardware for activity
of adjacency entries used as part of a resilient nexthop group and
report it to the nexthop code.
The activity is only queried if resilient nexthop groups are in use. The
delayed work is canceled otherwise.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The RATRAD register is used to dump and optionally clear activity bits
of router adjacency table entries. Will be used by the next patch to
query and clear the activity of nexthop buckets in a resilient nexthop
group.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
So far, mlxsw only updated hardware flags ('offload' / 'trap') on
nexthop objects. For resilient nexthop groups, these flags need to be
updated on individual nexthop buckets as well.
Update these flags whenever updating the flags of the encapsulating
nexthop object and whenever a nexthop bucket is replaced.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace a single nexthop bucket upon receiving a
'NEXTHOP_EVENT_BUCKET_REPLACE' notification.
When the 'force' parameter is not set, instruct the device to only
overwrite an adjacency entry if its activity is cleared, so as not to
break existing flows using the adjacency entry. The device does not
provide feedback if the replacement was successful in this case, so the
contents of the adjacency entry after the replacement are compared with
the replacement request.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Have the caller pass a pointer to the payload of the RATR register to
the function updating a single nexthop / adjacency entry.
In a subsequent patch, this will allow the caller to make sure
replacement was successful by querying the state of the adjacency entry
after replacement and comparing with the initial request.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allow the driver to instruct the device to only overwrite an adjacency
entry if its activity is cleared. Currently, adjacency entry is always
overwritten, regardless of activity.
This will be used by subsequent patches to prevent replacement of active
nexthop buckets.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Parse the configuration of resilient nexthop groups to existing mlxsw
data structures. Unlike non-resilient groups, nexthops without a valid
MAC or router interface (RIF) are programmed with a trap action instead
of not being programmed at all.
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
- The Open Routing (Open/R) network protocol netlink handler uses ID 99
- Will also add to `/etc/iproute2/rt_protos` once this is accepted
- For more information: https://github.com/facebook/openr
Signed-off-by: From: Cooper Lees <me@cooperlees.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When enetc runs out of exact match entries for unicast address
filtering, it switches to an approach based on hash tables, where
multiple MAC addresses might end up in the same bucket.
However, the enetc_set_mac_ht_flt function currently depends on the
system endianness, because it interprets the 64-bit hash value as an
array of two u32 elements. Modify this to use lower_32_bits and
upper_32_bits.
Tested by forcing enetc to go into hash table mode by creating two
macvlan upper interfaces:
ip link add link eno0 address 00:01:02:03:00:00 eno0.0 type macvlan && ip link set eno0.0 up
ip link add link eno0 address 00:01:02:03:00:01 eno0.1 type macvlan && ip link set eno0.1 up
and verified that the same bit values are written to the registers
before and after:
enetc_sync_mac_filters: addr 00:00:80:00:40:10 exact match 0
enetc_sync_mac_filters: addr 00:00:00:00:80:00 exact match 0
enetc_set_mac_ht_flt: hash 0x80008000000000 UMHFR0 0x0 UMHFR1 0x800080
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ENETC has a 64-entry hash table for VLAN RX filtering per Station
Interface, which is accessed through two 32-bit registers: VHFR0 holding
the low portion, and VHFR1 holding the high portion.
The enetc_set_vlan_ht_filter function looks at the pf->vlan_ht_filter
bitmap, which is fundamentally an unsigned long variable, and casts it
to a u32 array of two elements. It puts the first u32 element into VHFR0
and the second u32 element into VHFR1.
It is easy to imagine that this will not work on big endian systems
(although, yes, we have bigger problems, because currently enetc assumes
that the CPU endianness is equal to the controller endianness, aka
little endian - but let's assume that we could add a cpu_to_le32 in
enetc_wd_reg and a le32_to_cpu in enetc_rd_reg).
Let's use lower_32_bits and upper_32_bits which are designed to work
regardless of endianness.
Tested that both the old and the new method produce the same results:
$ ethtool -K eth1 rx-vlan-filter on
$ ip link add link eth1 name eth1.100 type vlan id 100
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x20
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x20
$ ip link add link eth1 name eth1.101 type vlan id 101
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x30
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x30
$ ip link add link eth1 name eth1.34 type vlan id 34
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x0 VHFR1 0x34
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x0 VHFR1 0x34
$ ip link add link eth1 name eth1.1024 type vlan id 1024
enetc_set_vlan_ht_filter: method 1: si_idx 0 VHFR0 0x1 VHFR1 0x34
enetc_set_vlan_ht_filter: method 2: si_idx 0 VHFR0 0x1 VHFR1 0x34
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use memdup_user_nul() helper instead of open-coding to
simplify the code.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Made changes to coding style as suggested by checkpatch.pl
changes are of the type:
space required before the open parenthesis '('
space required after that ','
Signed-off-by: Sai Kalyaan Palla <saikalyaan63@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wong Vee Khee says:
====================
Add support for Clause-45 PHY Loopback
This patch series add support for Clause-45 PHY loopback.
It involves adding a generic API in the PHY framework, which can be
accessed by all C45 PHY drivers using the .set_loopback callback.
Also, enable PHY loopback for the Marvell 88x3310/88x2110 driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for PHY loopback for Marvell 88x2110 and Marvell 88x3310.
This allow user to perform PHY loopback test using ethtool selftest.
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add generic code to enable C45 PHY loopback into the common phy-c45.c
file. This will allow C45 PHY drivers aceess this by setting
.set_loopback.
Suggested-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
sprintf() is declared with a restrict keyword to not allow input and
output to point to the same buffer:
lib/test_rhashtable.c: In function 'print_ht':
lib/test_rhashtable.c:504:4: error: 'sprintf' argument 3 overlaps destination object 'buff' [-Werror=restrict]
504 | sprintf(buff, "%s\nbucket[%d] -> ", buff, i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lib/test_rhashtable.c:489:7: note: destination object referenced by 'restrict'-qualified argument 1 was declared here
489 | char buff[512] = "";
| ^~~~
Rework this function to remember the last offset instead to
avoid the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
When compile testing this driver on a platform on which probe() is
known to fail at compile time, gcc warns about the cgx_lmactype_string[]
array being uninitialized:
In function 'strncpy',
inlined from 'link_status_user_format' at /git/arm-soc/drivers/net/ethernet/marvell/octeontx2/af/cgx.c:838:2,
inlined from 'cgx_link_change_handler' at /git/arm-soc/drivers/net/ethernet/marvell/octeontx2/af/cgx.c:853:2:
include/linux/fortify-string.h:27:30: error: argument 2 null where non-null expected [-Werror=nonnull]
27 | #define __underlying_strncpy __builtin_strncpy
Address this by turning the runtime initialization into a fixed array,
which should also produce better code.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cross timestamping is supported on Integrated Ethernet Controller in
Intel SoC such as EHL and TGL with Always Running Timer.
The hardware cross-timestamp result is made available to
applications through the PTP_SYS_OFFSET_PRECISE ioctl which calls
stmmac_getcrosststamp().
Device time is stored in the MAC Auxiliary register. The 64-bit System
time (ART timestamp) is stored in registers that are only addressable
by using MDIO space.
Signed-off-by: Tan Tee Min <tee.min.tan@intel.com>
Co-developed-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/maintaning/maintaining/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/procdure/procedure/
s/maintanance/maintenance/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
s/preceeds/precedes/ .....two different places
s/rsponse/response/
s/cetain/certain/
s/precison/precision/
Fix a sentence construction as per suggestion.
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
netfilter: flowtable enhancements
[ This is v2 that includes documentation enhancements, including
existing limitations. This is a rebase on top on net-next. ]
The following patchset augments the Netfilter flowtable fastpath to
support for network topologies that combine IP forwarding, bridge,
classic VLAN devices, bridge VLAN filtering, DSA and PPPoE. This
includes support for the flowtable software and hardware datapaths.
The following pictures provides an example scenario:
fast path!
.------------------------.
/ \
| IP forwarding |
| / \ \/
| br0 wan ..... eth0
. / \ host C
-> veth1 veth2
. switch/router
.
.
eth0
host A
The bridge master device 'br0' has an IP address and a DHCP server is
also assumed to be running to provide connectivity to host A which
reaches the Internet through 'br0' as default gateway. Then, packet
enters the IP forwarding path and Netfilter is used to NAT the packets
before they leave through the wan device.
The general idea is to accelerate forwarding by building a fast path
that takes packets from the ingress path of the bridge port and place
them in the egress path of the wan device (and vice versa). Hence,
skipping the classic bridge and IP stack paths.
** Patch from #1 to #6 add the infrastructure which describes the list of
netdevice hops to reach a given destination MAC address in the local
network topology.
Patch #1 adds dev_fill_forward_path() and .ndo_fill_forward_path() to
netdev_ops.
Patch #2 adds .ndo_fill_forward_path for vlan devices, which provides
the next device hop via vlan->real_dev, the vlan ID and the
protocol.
Patch #3 adds .ndo_fill_forward_path for bridge devices, which allows to make
lookups to the FDB to locate the next device hop (bridge port) in the
forwarding path.
Patch #4 extends bridge .ndo_fill_forward_path to support for bridge VLAN
filtering.
Patch #5 adds .ndo_fill_forward_path for PPPoE devices.
Patch #6 adds .ndo_fill_forward_path for DSA.
Patches from #7 to #14 update the flowtable software datapath:
Patch #7 adds the transmit path type field to the flow tuple. Two transmit
paths are supported so far: the neighbour and the xfrm transmit
paths.
Patch #8 and #9 update the flowtable datapath to use dev_fill_forward_path()
to obtain the real ingress/egress device for the flowtable datapath.
This adds the new ethernet xmit direct path to the flowtable.
Patch #10 adds native flowtable VLAN support (up to 2 VLAN tags) through
dev_fill_forward_path(). The flowtable stores the VLAN id and
protocol in the flow tuple.
Patch #11 adds native flowtable bridge VLAN filter support through
dev_fill_forward_path().
Patch #12 adds native flowtable bridge PPPoE through dev_fill_forward_path().
Patch #13 adds DSA support through dev_fill_forward_path().
Patch #14 extends flowtable selftests to cover for flowtable software
datapath enhancements.
** Patches from #15 to #20 update the flowtable hardware offload datapath:
Patch #15 extends the flowtable hardware offload to support for the
direct ethernet xmit path. This also includes VLAN support.
Patch #16 stores the egress real device in the flow tuple. The software
flowtable datapath uses dev_hard_header() to transmit packets,
hence it might refer to VLAN/DSA/PPPoE software device, not
the real ethernet device.
Patch #17 deals with switchdev PVID hardware offload to skip it on
egress.
Patch #18 adds FLOW_ACTION_PPPOE_PUSH to the flow_offload action API.
Patch #19 extends the flowtable hardware offload to support for PPPoE
Patch #20 adds TC_SETUP_FT support for DSA.
** Patches from #20 to #23: Felix Fietkau adds a new driver which support
hardware offload for the mtk PPE engine through the existing flow
offload API which supports for the flowtable enhancements coming in
this batch.
Patch #24 extends the documentation and describe existing limitations.
Please, apply, thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates the flowtable documentation to describe recent
enhancements:
- Offload action is available after the first packets go through the
classic forwarding path.
- IPv4 and IPv6 are supported. Only TCP and UDP layer 4 are supported at
this stage.
- Tuple has been augmented to track VLAN id and PPPoE session id.
- Bridge and IP forwarding integration, including bridge VLAN filtering
support.
- Hardware offload support.
- Describe the [OFFLOAD] and [HW_OFFLOAD] tags in the conntrack table
listing.
- Replace 'flow offload' by 'flow add' in example rulesets (preferred
syntax).
- Describe existing cache limitations.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds support for offloading IPv4 routed flows, including SNAT/DNAT,
one VLAN, PPPoE and DSA.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The PPE (packet processing engine) is used to offload NAT/routed or even
bridged flows. This patch brings up the PPE and uses it to get a packet
hash. It also contains some functionality that will be used to bring up
flow offloading.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When using DSA, set the special tag in GDM ingress control to allow the MAC
to parse packets properly earlier. This affects rx DMA source port reporting.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dsa infrastructure provides a well-defined hierarchy of devices,
pass up the call to set up the flow block to the master device. From the
software dataplane, the netfilter infrastructure uses the dsa slave
devices to refer to the input and output device for the given skbuff.
Similarly, the flowtable definition in the ruleset refers to the dsa
slave port devices.
This patch adds the glue code to call ndo_setup_tc with TC_SETUP_FT
with the master device via the dsa slave devices.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add a PPPoE push action if layer 2 protocol is ETH_P_PPP_SES to add
PPPoE flowtable hardware offload support.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add an action to represent the PPPoE hardware offload support that
includes the session ID.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The switch might have already added the VLAN tag through PVID hardware
offload. Keep this extra VLAN in the flowtable but skip it on egress.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
If there is a forward path to reach an ethernet device and hardware
offload is enabled, then use the direct xmit path.
Moreover, store the real device in the direct xmit path info since
software datapath uses dev_hard_header() to push the layer encapsulation
headers while hardware offload refers to the real device.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the flow tuple xmit_type is set to FLOW_OFFLOAD_XMIT_DIRECT, the
dst_cache pointer is not valid, and the h_source/h_dest/ifidx out fields
need to be used.
This patch also adds the FLOW_ACTION_VLAN_PUSH action to pass the VLAN
tag to the driver.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds two new tests to cover bridge and vlan support:
- Add a bridge device to the Router1 (nsr1) container and attach the
veth0 device to the bridge. Set the IP address to the bridge device
to exercise the bridge forwarding path.
- Add vlan encapsulation between to the bridge device in the Router1 and
one of the sender containers (ns1).
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Replace the master ethernet device by the dsa slave port. Packets coming
in from the software ingress path use the dsa slave port as input
device.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the PPPoE protocol and session id to the flow tuple using the encap
fields to uniquely identify flows from the receive path. For the
transmit path, dev_hard_header() on the vlan device push the headers.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the vlan tag based when PVID is set on.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the vlan id and protocol to the flow tuple to uniquely identify
flows from the receive path. For the transmit path, dev_hard_header() on
the vlan device push the headers. This patch includes support for two
vlan headers (QinQ) from the ingress path.
Add a generic encap field to the flowtable entry which stores the
protocol and the tag id. This allows to reuse these fields in the PPPoE
support coming in a later patch.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The egress device in the tuple is obtained from route. Use
dev_fill_forward_path() instead to provide the real egress device for
this flow whenever this is available.
The new FLOW_OFFLOAD_XMIT_DIRECT type uses dev_queue_xmit() to transmit
ethernet frames. Cache the source and destination hardware address to
use dev_queue_xmit() to transfer packets.
The FLOW_OFFLOAD_XMIT_DIRECT replaces FLOW_OFFLOAD_XMIT_NEIGH if
dev_fill_forward_path() finds a direct transmit path.
In case of topology updates, if peer is moved to different bridge port,
the connection will time out, reconnect will result in a new entry with
the correct path. Snooping fdb updates would allow for cleaning up stale
flowtable entries.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>