// SPDX-License-Identifier: GPL-2.0
/* Copyright 2020 NXP
 */

net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely to IEEE 802.1Qci as possible. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.
The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.
Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). The
key point to remember here is that the schedule engine supports 8
"subschedules" (execution threads that iterate through the global
schedule in parallel), with the restriction that no two hardware threads
may execute a schedule entry at the same time. For tc-taprio, each
egress port used one of these 8 subschedules, leaving a total of 4
subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together before they could have been programmed to hardware. So, to
simplify our life, just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.
The following scenarios were tested:
1. Quantitative testing:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate index 1 base-time 0 \
sched-entry OPEN 1200 -1 -1 \
sched-entry CLOSE 1200 -1 -1 \
action trap
ping 192.168.1.2 -f
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
.............................
--- 192.168.1.2 ping statistics ---
948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
(the ~50% loss is expected here: the gate alternates 1200 us open and
1200 us closed, i.e. a 50% duty cycle)
2. Qualitative testing (with a phase-aligned schedule - the clocks are
synchronized by ptp4l, not shown here):
Receiver (sja1105):
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender (enetc):
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
1425 packets transmitted, 1424 packets received, 0% packet loss
round-trip min/avg/max = 0.322/0.361/0.990 ms
And just for comparison, with the tc-taprio schedule deleted:
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
33 packets transmitted, 19 packets received, 42% packet loss
round-trip min/avg/max = 0.336/0.464/0.597 ms
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
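To make the merge-into-one-subschedule idea above concrete, here is a minimal
standalone sketch (not driver code; the names and the simplified
one-gate-opening-per-cycle model are assumptions made only for illustration) of
how per-rule gate schedules with different cycle times end up replicated inside
the single merged subschedule:

/* Hedged illustration: the merged subschedule uses the longest cycle time, and
 * every entry of a shorter schedule is replicated until it covers that cycle.
 */
#include <stdio.h>

struct toy_schedule {
        long long base_time_ns;
        long long cycle_time_ns;
};

int main(void)
{
        /* Two hypothetical tc-gate rules on different ingress ports */
        struct toy_schedule rules[] = {
                { .base_time_ns = 0, .cycle_time_ns = 200000 }, /* 200 us */
                { .base_time_ns = 0, .cycle_time_ns = 600000 }, /* 600 us */
        };
        long long max_cycle = 0;
        int i;

        /* The merged subschedule runs with the longest cycle time */
        for (i = 0; i < 2; i++)
                if (rules[i].cycle_time_ns > max_cycle)
                        max_cycle = rules[i].cycle_time_ns;

        /* Each rule's gate openings are replicated across the merged cycle */
        for (i = 0; i < 2; i++) {
                long long t;

                printf("rule %d opens its gate at offsets:", i);
                for (t = 0; t < max_cycle; t += rules[i].cycle_time_ns)
                        printf(" %lld", t);
                printf(" (within a %lld ns merged cycle)\n", max_cycle);
        }
        return 0;
}

Replication is what lets rules with shorter cycle times coexist in a
subschedule whose length is dictated by the rule with the longest cycle time.
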
#include <net/tc_act/tc_gate.h>
#include <linux/dsa/8021q.h>
#include "sja1105_vl.h"

#define SJA1105_SIZE_VL_STATUS 8

/* Insert into the global gate list, sorted by gate action time. */
static int sja1105_insert_gate_entry(struct sja1105_gating_config *gating_cfg,
                                     struct sja1105_rule *rule,
                                     u8 gate_state, s64 entry_time,
                                     struct netlink_ext_ack *extack)
{
        struct sja1105_gate_entry *e;
        int rc;

        e = kzalloc(sizeof(*e), GFP_KERNEL);
        if (!e)
                return -ENOMEM;

        e->rule = rule;
        e->gate_state = gate_state;
        e->interval = entry_time;

        if (list_empty(&gating_cfg->entries)) {
                list_add(&e->list, &gating_cfg->entries);
        } else {
                struct sja1105_gate_entry *p;

                list_for_each_entry(p, &gating_cfg->entries, list) {
                        if (p->interval == e->interval) {
                                NL_SET_ERR_MSG_MOD(extack,
                                                   "Gate conflict");
                                rc = -EBUSY;
                                goto err;
                        }

                        if (e->interval < p->interval)
                                break;
                }
                list_add(&e->list, p->list.prev);
        }

        gating_cfg->num_entries++;

        return 0;
err:
        kfree(e);
        return rc;
}
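
A hedged toy illustration of the conflict check above: when two virtual links
would need a gate action at the same absolute time within the merged cycle, the
insertion is refused, which is how overlapping schedules of different rules
surface as an error rather than silently colliding. The names below are
invented for the example and use a plain array instead of the driver's list:

#include <stdio.h>

#define MAX_TOY_ENTRIES 8

/* Toy stand-in for the sorted gate list: refuses duplicate action times */
static int toy_insert_gate_time(long long *times, int *num, long long t)
{
        int i, j;

        for (i = 0; i < *num; i++) {
                if (times[i] == t)
                        return -1;      /* "Gate conflict" */
                if (t < times[i])
                        break;
        }

        if (*num == MAX_TOY_ENTRIES)
                return -1;

        /* Shift later entries up and keep the list sorted by action time */
        for (j = *num; j > i; j--)
                times[j] = times[j - 1];
        times[i] = t;
        (*num)++;

        return 0;
}

int main(void)
{
        long long times[MAX_TOY_ENTRIES];
        int num = 0;

        printf("%d\n", toy_insert_gate_time(times, &num, 200000)); /* 0 */
        printf("%d\n", toy_insert_gate_time(times, &num, 0));      /* 0 */
        printf("%d\n", toy_insert_gate_time(times, &num, 200000)); /* -1, conflict */
        return 0;
}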

net: dsa: sja1105: fix tc-gate schedule with single element
The sja1105_gating_cfg_time_to_interval function does this, as per the
comments:
/* The gate entries contain absolute times in their e->interval field. Convert
 * that to proper intervals (i.e. "0, 5, 10, 15" to "5, 5, 5, 5").
 */
To perform that task, it iterates over gating_cfg->entries, at each step
updating the interval of the _previous_ entry. So one interval remains
to be updated at the end of the loop: the last one (since it isn't
"prev" for anyone else).
But there was an erroneous check that the last element's interval
should not be updated if it is also the only element. I'm not quite sure
why that check was there, but it is clearly incorrect: a tc-gate
schedule with a single element would get an e->interval of zero,
regardless of the duration requested by the user. The switch wouldn't
even consider this configuration valid: it would just drop all traffic
that matches the rule.
Fixes: 834f8933d5dd ("net: dsa: sja1105: implement tc-gate using time-triggered virtual links")
Reported-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

/* The gate entries contain absolute times in their e->interval field. Convert
 * that to proper intervals (i.e. "0, 5, 10, 15" to "5, 5, 5, 5").
 */
static void
sja1105_gating_cfg_time_to_interval(struct sja1105_gating_config *gating_cfg,
                                    u64 cycle_time)
{
        struct sja1105_gate_entry *last_e;
        struct sja1105_gate_entry *e;
        struct list_head *prev;

        list_for_each_entry(e, &gating_cfg->entries, list) {
                struct sja1105_gate_entry *p;

                prev = e->list.prev;

                if (prev == &gating_cfg->entries)
                        continue;

                p = list_entry(prev, struct sja1105_gate_entry, list);
                p->interval = e->interval - p->interval;
        }
        last_e = list_last_entry(&gating_cfg->entries,
                                 struct sja1105_gate_entry, list);
        last_e->interval = cycle_time - last_e->interval;
}
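
As a standalone illustration of the conversion performed by
sja1105_gating_cfg_time_to_interval() above (and of the single-element case
addressed by the fix), the toy helper below applies the same arithmetic to a
plain array instead of the driver's linked list; the names are made up for the
example:

#include <stdio.h>

static void times_to_intervals(long long *t, int n, long long cycle_time)
{
        int i;

        /* Every entry but the last becomes "next time minus my time" */
        for (i = 0; i < n - 1; i++)
                t[i] = t[i + 1] - t[i];
        /* The last entry runs until the end of the cycle */
        t[n - 1] = cycle_time - t[n - 1];
}

int main(void)
{
        long long four[] = { 0, 5, 10, 15 };
        long long one[] = { 0 };
        int i;

        times_to_intervals(four, 4, 20);
        for (i = 0; i < 4; i++)
                printf("%lld ", four[i]);   /* 5 5 5 5 */
        printf("\n");

        times_to_intervals(one, 1, 20);
        printf("%lld\n", one[0]);           /* 20, not 0 */
        return 0;
}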

static void sja1105_free_gating_config(struct sja1105_gating_config *gating_cfg)
{
        struct sja1105_gate_entry *e, *n;

        list_for_each_entry_safe(e, n, &gating_cfg->entries, list) {
                list_del(&e->list);
                kfree(e);
        }
}

static int sja1105_compose_gating_subschedule(struct sja1105_private *priv,
                                              struct netlink_ext_ack *extack)
{
        struct sja1105_gating_config *gating_cfg = &priv->tas_data.gating_cfg;
        struct sja1105_rule *rule;
        s64 max_cycle_time = 0;
        s64 its_base_time = 0;
        int i, rc = 0;

        sja1105_free_gating_config(gating_cfg);

        list_for_each_entry(rule, &priv->flow_block.rules, list) {
                if (rule->type != SJA1105_RULE_VL)
                        continue;
                if (rule->vl.type != SJA1105_VL_TIME_TRIGGERED)
                        continue;

                if (max_cycle_time < rule->vl.cycle_time) {
                        max_cycle_time = rule->vl.cycle_time;
                        its_base_time = rule->vl.base_time;
                }
        }

        if (!max_cycle_time)
                return 0;

        dev_dbg(priv->ds->dev, "max_cycle_time %lld its_base_time %lld\n",
                max_cycle_time, its_base_time);

        gating_cfg->base_time = its_base_time;
        gating_cfg->cycle_time = max_cycle_time;
        gating_cfg->num_entries = 0;

        list_for_each_entry(rule, &priv->flow_block.rules, list) {
                s64 time;
                s64 rbt;

                if (rule->type != SJA1105_RULE_VL)
                        continue;
                if (rule->vl.type != SJA1105_VL_TIME_TRIGGERED)
                        continue;

                /* Calculate the difference between this gating schedule's
                 * base time, and the base time of the gating schedule with the
                 * longest cycle time. We call it the relative base time (rbt).
                 */
                rbt = future_base_time(rule->vl.base_time, rule->vl.cycle_time,
                                       its_base_time);
                rbt -= its_base_time;

                time = rbt;

                for (i = 0; i < rule->vl.num_entries; i++) {
                        u8 gate_state = rule->vl.entries[i].gate_state;
                        s64 entry_time = time;

                        while (entry_time < max_cycle_time) {
                                rc = sja1105_insert_gate_entry(gating_cfg, rule,
                                                               gate_state,
                                                               entry_time,
                                                               extack);
                                if (rc)
                                        goto err;

                                entry_time += rule->vl.cycle_time;
                        }
                        time += rule->vl.entries[i].interval;
                }
        }

        sja1105_gating_cfg_time_to_interval(gating_cfg, max_cycle_time);

        return 0;
err:
        sja1105_free_gating_config(gating_cfg);
        return rc;
}
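
A hedged sketch of the relative base time (rbt) computation used above: it
assumes future_base_time() advances a schedule's base time in whole cycle-time
steps until the result is no earlier than the reference time. The toy helper
and the numbers below are illustrative only, not the driver's implementation:

#include <stdio.h>

static long long toy_future_base_time(long long base_time, long long cycle_time,
                                      long long ref)
{
        long long n;

        if (base_time >= ref)
                return base_time;

        n = (ref - base_time) / cycle_time;
        base_time += n * cycle_time;
        if (base_time < ref)
                base_time += cycle_time;

        return base_time;
}

int main(void)
{
        /* Rule with a 200 us cycle, started 150 us before the merged
         * subschedule's own base time (its_base_time).
         */
        long long base_time = 850000, cycle_time = 200000;
        long long its_base_time = 1000000;
        long long rbt;

        rbt = toy_future_base_time(base_time, cycle_time, its_base_time);
        rbt -= its_base_time;

        /* First occurrence of this rule's cycle inside the merged cycle */
        printf("rbt = %lld ns\n", rbt); /* 50000: 850000 + 200000 = 1050000 */
        return 0;
}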

/* The switch flow classification core implements TTEthernet, which 'thinks' in
 * terms of Virtual Links (VL), a concept borrowed from ARINC 664 part 7.
 * However it also has one other operating mode (VLLUPFORMAT=0) where it acts
 * somewhat closer to a pre-standard implementation of IEEE 802.1Qci
 * (Per-Stream Filtering and Policing), which is what the driver is going to be
 * implementing.
 *
 *                                  VL Lookup
 *        Key = {DMAC && VLANID    +---------+   Key = { (DMAC[47:16] & VLMASK ==
 *               && VLAN PCP       |         |              VLMARKER)
 *               && INGRESS PORT}  +---------+              (both fixed)
 *            (exact match,             |         && DMAC[15:0] == VLID
 *         all specified in rule)       |             (specified in rule)
 *                                      v             && INGRESS PORT }
 *                                ------------
 *                     0 (PSFP)  /            \  1 (ARINC664)
 *                  +-----------/  VLLUPFORMAT \----------+
 *                  |           \    (fixed)   /          |
 *                  |            \            /           |
 *   0 (forwarding) v             ------------            |
 *            ------------                                |
 *           /            \  1 (QoS classification)       |
 *      +---/  ISCRITICAL  \-----------+                  |
 *      |   \  (per rule)  /           |                  |
 *      |    \            /   VLID taken from      VLID taken from
 *      v     ------------    index of rule        contents of rule
 *   select                    that matched         that matched
 * DESTPORTS                       |                      |
 *      |                          +---------+------------+
 *      |                                    |
 *      |                                    v
 *      |                              VL Forwarding
 *      |                            (indexed by VLID)
 *      |                               +---------+
 *      |                +--------------|         |
 *      |                |  select TYPE +---------+
 *      |                v
 *      |    0 (rate     ------------    1 (time
 *      |  constrained) /            \   triggered)
 *      |       +------/     TYPE     \------------+
 *      |       |      \  (per VLID)  /            |
 *      |       v       \            /             v
 *      |   VL Policing  ------------         VL Policing
 *      | (indexed by VLID)               (indexed by VLID)
 *      |   +---------+                     +---------+
 *      |   | TYPE=0  |                     | TYPE=1  |
 *      |   +---------+                     +---------+
 *      |   select SHARINDX               select SHARINDX to
 *      |   to rate-limit               re-enter VL Forwarding
 *      |   groups of VL's              with new VLID for egress
 *      |   to same quota                      |
 *      |       |                              |
 *      |   select MAXLEN -> exceed => drop  select MAXLEN -> exceed => drop
 *      |       |                              |
 *      |       v                              v
 *      |   VL Forwarding                 VL Forwarding
 *      | (indexed by SHARINDX)         (indexed by SHARINDX)
 *      |   +---------+                     +---------+
 *      |   | TYPE=0  |                     | TYPE=1  |
 *      |   +---------+                     +---------+
 *      |   select PRIORITY,              select PRIORITY,
 *      |   PARTITION, DESTPORTS          PARTITION, DESTPORTS
 *      |       |                              |
 *      |       v                              v
 *      |   VL Policing                   VL Policing
 *      | (indexed by SHARINDX)         (indexed by SHARINDX)
 *      |   +---------+                     +---------+
 *      |   | TYPE=0  |                     | TYPE=1  |
 *      |   +---------+                     +---------+
 *      |       |                              |
 *      |       v                              |
 *      |   select BAG, -> exceed => drop      |
 *      |   JITTER                             v
 *      |       |      ----------------------------------------------
 *      |       |     / Reception Window is open for this VL          \
 *      |       |    / (the Schedule Table executes an entry i         \
 *      |       |   /  M <= i < N, for which these conditions hold):    \   no
 *      |       | +----/                                                 \-+
 *      |       | |yes \   WINST[M] == 1 && WINSTINDEX[M] == VLID        /  |
 *      |       | |     \  WINEND[N] == 1 && WINSTINDEX[N] == VLID      /   |
 *      |       | |      \                                             /    |
 *      |       | |       \ (the VL window has opened and not yet closed)   |
 *      |       | |        ----------------------------------------------   |
 *      |       | v                                                         v
 *      |       | dispatch to DESTPORTS when the Schedule Table           drop
 *      |       | executes an entry i with TXEN == 1 && VLINDEX == i
 *      v       v
 *   dispatch immediately to DESTPORTS
 *
 * The per-port classification key is always composed of {DMAC, VID, PCP} and
 * is non-maskable. This 'looks like' the NULL stream identification function
 * from IEEE 802.1CB clause 6, except for the extra VLAN PCP. When the switch
 * ports operate as VLAN-unaware, we do allow the user to not specify the VLAN
 * ID and PCP, and then the port-based defaults will be used.
 *
 * In TTEthernet, routing is something that needs to be done manually for each
 * Virtual Link. So the flow action must always include one of:
 * a. 'redirect', 'trap' or 'drop': select the egress port list
 * Additionally, the following actions may be applied on a Virtual Link,
 * turning it into 'critical' traffic:
 * b. 'police': turn it into a rate-constrained VL, with bandwidth limitation
 *    given by the maximum frame length, bandwidth allocation gap (BAG) and
 *    maximum jitter.
 * c. 'gate': turn it into a time-triggered VL, which can only be received
 *    and forwarded according to a given schedule.
 */

static bool sja1105_vl_key_lower(struct sja1105_vl_lookup_entry *a,
                                 struct sja1105_vl_lookup_entry *b)
{
        if (a->macaddr < b->macaddr)
                return true;
        if (a->macaddr > b->macaddr)
                return false;
        if (a->vlanid < b->vlanid)
                return true;
        if (a->vlanid > b->vlanid)
                return false;
        if (a->port < b->port)
                return true;
        if (a->port > b->port)
                return false;
        if (a->vlanprior < b->vlanprior)
                return true;
        if (a->vlanprior > b->vlanprior)
                return false;
        /* Keys are equal */
        return false;
}

net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging
For VLAN-unaware bridging, tag_8021q uses a scheme perhaps a bit too
tightly tied to the sja1105 switch: each port keeps using the same pvid
it also uses for standalone operation (a unique one from which the
source port and device ID can be retrieved when packets from that port
are forwarded to the CPU). Since each port has a unique pvid when
performing autonomous forwarding, the switch must be configured for
Shared VLAN Learning (SVL) such that the VLAN ID itself is ignored when
performing FDB lookups. Without SVL, packets would always be flooded,
since an FDB lookup in the source port's VLAN would never find any entry.
First of all, to make tag_8021q more palatable to switches which might
not support Shared VLAN Learning, let's just use a common VLAN for all
ports that are under the same bridge.
Secondly, using Shared VLAN Learning means that FDB isolation can never
be enforced. But if all ports under the same VLAN-unaware bridge share
the same VLAN ID, it can.
The disadvantage is that the CPU port can no longer perform precise
source port identification for these packets. But at least we have a
mechanism which has proven to be adequate for that situation: imprecise
RX (dsa_find_designated_bridge_port_by_vid), which is what we use for
termination on VLAN-aware bridges.
The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the
same one as we were previously using for imprecise TX (bridge TX
forwarding offload). It is already allocated, it is just a matter of
using it.
Note that because now all ports under the same bridge share the same
VLAN, the complexity of performing a tag_8021q bridge join decreases
dramatically. We no longer have to install the RX VLAN of a newly
joining port into the port membership of the existing bridge ports.
The newly joining port just becomes a member of the VLAN corresponding
to that bridge, and the other ports are already members of it from when
they joined the bridge themselves. So forwarding works properly.
This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the
cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put
these calls directly into the sja1105 driver.
With this new mode of operation, a port controlled by tag_8021q can have
two pvids whereas before it could only have one. The pvid for standalone
operation is different from the pvid used for VLAN-unaware bridging.
This is done, again, so that FDB isolation can be enforced.
Let tag_8021q manage this by deleting the standalone pvid when a port
joins a bridge, and restoring it when it leaves it.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: tag_8021q: merge RX and TX VLANs
In the old Shared VLAN Learning mode of operation that tag_8021q
previously used for forwarding, we needed to have distinct concepts for
an RX and a TX VLAN.
An RX VLAN could be installed on all ports that were members of a given
bridge, so that autonomous forwarding could still work, while a TX VLAN
was dedicated for precise packet steering, so it just contained the CPU
port and one egress port.
Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX
all over, those lines have been blurred and we no longer have the need
to do precise TX towards a port that is in a bridge. As for standalone
ports, it is fine to use the same VLAN ID for both RX and TX.
This patch changes the tag_8021q format by shifting the VLAN range it
reserves, and halving it. Previously, our DIR bits were encoding the
VLAN direction (RX/TX) and were set to either 1 or 2. This meant that
tag_8021q reserved 2K VLANs, or 50% of the available range.
Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q
reserve only 1K VLANs, in a different range (the last 1K). This is
done so that we leave the old format in place in case we need to return
to it.
In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan
functions go away. Any vid_is_dsa_8021q VID is both a TX and an RX VLAN,
and they are no longer distinct. For example, felix, which did different
things for different VLAN types, now needs to handle the RX and the TX
logic for the same VLAN.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

/* FIXME: this should change when the bridge upper of the port changes. */
static u16 sja1105_port_get_tag_8021q_vid(struct dsa_port *dp)
{
        unsigned long bridge_num;

        if (!dp->bridge)
                return dsa_tag_8021q_standalone_vid(dp);

        bridge_num = dsa_port_bridge_num_get(dp);

        return dsa_tag_8021q_bridge_vid(bridge_num);
}
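
As a side note on the range arithmetic in the "merge RX and TX VLANs" message
above: assuming the DIR field sits in the top two bits of the 12-bit VLAN ID (a
bit-position assumption made only for this illustration, not the tag_8021q
definitions), the old DIR values 1 and 2 together span 2K VIDs (50% of the 4096
available), while the hardcoded DIR of 3 spans only the last 1K:

#include <stdio.h>

int main(void)
{
        unsigned int dir;

        for (dir = 1; dir <= 3; dir++) {
                unsigned int first = dir << 10;   /* assumed: DIR in VID bits 11:10 */
                unsigned int last = first + 1023;

                printf("DIR=%u -> VIDs %u..%u (%u VLANs)\n",
                       dir, first, last, last - first + 1);
        }
        return 0;
}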

static int sja1105_init_virtual_links(struct sja1105_private *priv,
                                      struct netlink_ext_ack *extack)
{
        struct sja1105_vl_policing_entry *vl_policing;
        struct sja1105_vl_forwarding_entry *vl_fwd;
        struct sja1105_vl_lookup_entry *vl_lookup;
        bool have_critical_virtual_links = false;
        struct sja1105_table *table;
        struct sja1105_rule *rule;
        int num_virtual_links = 0;
        int max_sharindx = 0;
        int i, j, k;

        /* Figure out the dimensioning of the problem */
        list_for_each_entry(rule, &priv->flow_block.rules, list) {
                if (rule->type != SJA1105_RULE_VL)
                        continue;
                /* Each VL lookup entry matches on a single ingress port */
                num_virtual_links += hweight_long(rule->port_mask);

                if (rule->vl.type != SJA1105_VL_NONCRITICAL)
                        have_critical_virtual_links = true;
                if (max_sharindx < rule->vl.sharindx)
                        max_sharindx = rule->vl.sharindx;
        }

        if (num_virtual_links > SJA1105_MAX_VL_LOOKUP_COUNT) {
                NL_SET_ERR_MSG_MOD(extack, "Not enough VL entries available");
                return -ENOSPC;
        }

        if (max_sharindx + 1 > SJA1105_MAX_VL_LOOKUP_COUNT) {
                NL_SET_ERR_MSG_MOD(extack, "Policer index out of range");
                return -ENOSPC;
        }

        max_sharindx = max_t(int, num_virtual_links, max_sharindx) + 1;

        /* Discard previous VL Lookup Table */
        table = &priv->static_config.tables[BLK_IDX_VL_LOOKUP];
        if (table->entry_count) {
                kfree(table->entries);
                table->entry_count = 0;
        }
	/* Discard previous VL Policing Table */
	table = &priv->static_config.tables[BLK_IDX_VL_POLICING];
	if (table->entry_count) {
		kfree(table->entries);
		table->entry_count = 0;
	}

	/* Discard previous VL Forwarding Table */
	table = &priv->static_config.tables[BLK_IDX_VL_FORWARDING];
	if (table->entry_count) {
		kfree(table->entries);
		table->entry_count = 0;
	}

	/* Discard previous VL Forwarding Parameters Table */
	table = &priv->static_config.tables[BLK_IDX_VL_FORWARDING_PARAMS];
	if (table->entry_count) {
		kfree(table->entries);
		table->entry_count = 0;
	}

	/* Nothing to do */
	if (!num_virtual_links)
		return 0;

	/* Pre-allocate space in the static config tables */

	/* VL Lookup Table */
	table = &priv->static_config.tables[BLK_IDX_VL_LOOKUP];
	table->entries = kcalloc(num_virtual_links,
				 table->ops->unpacked_entry_size,
				 GFP_KERNEL);
	if (!table->entries)
		return -ENOMEM;
	table->entry_count = num_virtual_links;
	vl_lookup = table->entries;

	k = 0;

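	/* Build one VL Lookup entry per {rule, ingress port} pair. Streams
	 * are matched on DMAC alone, plus VID and PCP when the key is
	 * VLAN-aware. The rule's cookie is stored in flow_cookie so each
	 * entry can be mapped back to its rule after sorting.
	 */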
	list_for_each_entry(rule, &priv->flow_block.rules, list) {
		unsigned long port;

		if (rule->type != SJA1105_RULE_VL)
			continue;

		for_each_set_bit(port, &rule->port_mask, SJA1105_MAX_NUM_PORTS) {
			vl_lookup[k].format = SJA1105_VL_FORMAT_PSFP;
			vl_lookup[k].port = port;
			vl_lookup[k].macaddr = rule->key.vl.dmac;
			if (rule->key.type == SJA1105_KEY_VLAN_AWARE_VL) {
				vl_lookup[k].vlanid = rule->key.vl.vid;
				vl_lookup[k].vlanprior = rule->key.vl.pcp;
			} else {
				/* FIXME */
				struct dsa_port *dp = dsa_to_port(priv->ds, port);
				u16 vid = sja1105_port_get_tag_8021q_vid(dp);

				vl_lookup[k].vlanid = vid;
				vl_lookup[k].vlanprior = 0;
			}
			/* For critical VLs, the DESTPORTS mask is taken from
			 * the VL Forwarding Table, so no point in putting it
			 * in the VL Lookup Table
			 */
			if (rule->vl.type == SJA1105_VL_NONCRITICAL)
				vl_lookup[k].destports = rule->vl.destports;
			else
				vl_lookup[k].iscritical = true;
			vl_lookup[k].flow_cookie = rule->cookie;
			k++;
		}
	}

	/* UM10944.pdf chapter 4.2.3 VL Lookup table:
	 * "the entries in the VL Lookup table must be sorted in ascending
	 * order (i.e. the smallest value must be loaded first) according to
	 * the following sort order: MACADDR, VLANID, PORT, VLANPRIOR."
	 */
	for (i = 0; i < num_virtual_links; i++) {
		struct sja1105_vl_lookup_entry *a = &vl_lookup[i];

		for (j = i + 1; j < num_virtual_links; j++) {
			struct sja1105_vl_lookup_entry *b = &vl_lookup[j];

			if (sja1105_vl_key_lower(b, a)) {
				struct sja1105_vl_lookup_entry tmp = *a;

				*a = *b;
				*b = tmp;
			}
		}
	}

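	/* The VL Policing, VL Forwarding and VL Forwarding Parameters tables
	 * are only needed when at least one critical (time-triggered) VL
	 * exists; non-critical VLs are fully described by the VL Lookup
	 * entries above.
	 */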
	if (!have_critical_virtual_links)
		return 0;

	/* VL Policing Table */
	table = &priv->static_config.tables[BLK_IDX_VL_POLICING];
	table->entries = kcalloc(max_sharindx, table->ops->unpacked_entry_size,
				 GFP_KERNEL);
	if (!table->entries)
		return -ENOMEM;
	table->entry_count = max_sharindx;
	vl_policing = table->entries;

	/* VL Forwarding Table */
	table = &priv->static_config.tables[BLK_IDX_VL_FORWARDING];
	table->entries = kcalloc(max_sharindx, table->ops->unpacked_entry_size,
				 GFP_KERNEL);
	if (!table->entries)
		return -ENOMEM;
	table->entry_count = max_sharindx;
	vl_fwd = table->entries;

	/* VL Forwarding Parameters Table */
	table = &priv->static_config.tables[BLK_IDX_VL_FORWARDING_PARAMS];
	table->entries = kcalloc(1, table->ops->unpacked_entry_size,
				 GFP_KERNEL);
	if (!table->entries)
		return -ENOMEM;
	table->entry_count = 1;

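	/* Program the policing and forwarding entries for each critical VL.
	 * The walk is done over the sorted VL Lookup table, and the owning
	 * rule is found again through the flow_cookie saved earlier. Both
	 * the VL's own index and its gate's sharindx are marked as
	 * time-triggered (type 1).
	 */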
	for (i = 0; i < num_virtual_links; i++) {
		unsigned long cookie = vl_lookup[i].flow_cookie;
		struct sja1105_rule *rule = sja1105_rule_find(priv, cookie);

		if (rule->vl.type == SJA1105_VL_NONCRITICAL)
			continue;
		if (rule->vl.type == SJA1105_VL_TIME_TRIGGERED) {
			int sharindx = rule->vl.sharindx;

			vl_policing[i].type = 1;
			vl_policing[i].sharindx = sharindx;
			vl_policing[i].maxlen = rule->vl.maxlen;
			vl_policing[sharindx].type = 1;

			vl_fwd[i].type = 1;
			vl_fwd[sharindx].type = 1;
			vl_fwd[sharindx].priority = rule->vl.ipv;
			vl_fwd[sharindx].partition = 0;
			vl_fwd[sharindx].destports = rule->vl.destports;
		}
	}

	sja1105_frame_memory_partitioning(priv);

	return 0;
}

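/* Install or update the virtual link behind @cookie so that frames matched
 * on @key and received on @port are forwarded to @destports. With @append,
 * the new destinations are OR-ed into the existing mask instead of
 * replacing it. On error, @port is dropped from the rule again and the rule
 * is freed once no port references it.
 */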
int sja1105_vl_redirect(struct sja1105_private *priv, int port,
			struct netlink_ext_ack *extack, unsigned long cookie,
			struct sja1105_key *key, unsigned long destports,
			bool append)
{
	struct sja1105_rule *rule = sja1105_rule_find(priv, cookie);
	struct dsa_port *dp = dsa_to_port(priv->ds, port);
	bool vlan_aware = dsa_port_is_vlan_filtering(dp);
	int rc;

	if (!vlan_aware && key->type != SJA1105_KEY_VLAN_UNAWARE_VL) {
		NL_SET_ERR_MSG_MOD(extack,
				   "Can only redirect based on DMAC");
		return -EOPNOTSUPP;
	} else if (vlan_aware && key->type != SJA1105_KEY_VLAN_AWARE_VL) {
		NL_SET_ERR_MSG_MOD(extack,
				   "Can only redirect based on {DMAC, VID, PCP}");
		return -EOPNOTSUPP;
	}

	if (!rule) {
		rule = kzalloc(sizeof(*rule), GFP_KERNEL);
		if (!rule)
			return -ENOMEM;

		rule->cookie = cookie;
		rule->type = SJA1105_RULE_VL;
		rule->key = *key;
		list_add(&rule->list, &priv->flow_block.rules);
	}

	rule->port_mask |= BIT(port);
	if (append)
		rule->vl.destports |= destports;
	else
		rule->vl.destports = destports;

	rc = sja1105_init_virtual_links(priv, extack);
	if (rc) {
		rule->port_mask &= ~BIT(port);
		if (!rule->port_mask) {
			list_del(&rule->list);
			kfree(rule);
		}
	}

	return rc;
}

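/* Remove @port from @rule and, if no port uses the rule any longer, free
 * it. Then rebuild the gating subschedule, the VL tables and the schedule,
 * and reload the static config.
 */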
int sja1105_vl_delete(struct sja1105_private *priv, int port,
		      struct sja1105_rule *rule, struct netlink_ext_ack *extack)
{
	int rc;

	rule->port_mask &= ~BIT(port);
	if (!rule->port_mask) {
		list_del(&rule->list);
		kfree(rule);
	}

	rc = sja1105_compose_gating_subschedule(priv, extack);
	if (rc)
		return rc;

	rc = sja1105_init_virtual_links(priv, extack);
	if (rc)
		return rc;

	rc = sja1105_init_scheduling(priv);
	if (rc < 0)
		return rc;

	return sja1105_static_config_reload(priv, SJA1105_VIRTUAL_LINKS);
}

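/* Offload a time-triggered (critical) virtual link: @entries describes the
 * gate open/close intervals, @index selects the shared gate (sharindx), and
 * @base_time and @cycle_time are in nanoseconds and must be multiples of
 * the 200 ns switch tick (sja1105_delta_to_ns(1)).
 */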
int sja1105_vl_gate(struct sja1105_private *priv, int port,
		    struct netlink_ext_ack *extack, unsigned long cookie,
		    struct sja1105_key *key, u32 index, s32 prio,
		    u64 base_time, u64 cycle_time, u64 cycle_time_ext,
		    u32 num_entries, struct action_gate_entry *entries)
{
	struct sja1105_rule *rule = sja1105_rule_find(priv, cookie);
	struct dsa_port *dp = dsa_to_port(priv->ds, port);
	bool vlan_aware = dsa_port_is_vlan_filtering(dp);
	int ipv = -1;
	int i, rc;
	s32 rem;

	if (cycle_time_ext) {
		NL_SET_ERR_MSG_MOD(extack,
				   "Cycle time extension not supported");
		return -EOPNOTSUPP;
	}

	div_s64_rem(base_time, sja1105_delta_to_ns(1), &rem);
	if (rem) {
		NL_SET_ERR_MSG_MOD(extack,
				   "Base time must be multiple of 200 ns");
		return -ERANGE;
	}

	div_s64_rem(cycle_time, sja1105_delta_to_ns(1), &rem);
	if (rem) {
		NL_SET_ERR_MSG_MOD(extack,
				   "Cycle time must be multiple of 200 ns");
		return -ERANGE;
	}

if (!vlan_aware && key->type != SJA1105_KEY_VLAN_UNAWARE_VL) {
|
net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely as possible to IEEE 802.1Qci as possible. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.
The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.
Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). Key
point here, we remember that the schedule engine supports 8
"subschedules" (execution threads that iterate through the global
schedule in parallel, and that no 2 hardware threads must execute a
schedule entry at the same time). For tc-taprio, each egress port used
one of these 8 subschedules, leaving a total of 4 subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together, before they could have been programmed to hardware. So
simplify our life and just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.
The following scenarios were tested:
1. Quantitative testing:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate index 1 base-time 0 \
sched-entry OPEN 1200 -1 -1 \
sched-entry CLOSE 1200 -1 -1 \
action trap
ping 192.168.1.2 -f
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
.............................
--- 192.168.1.2 ping statistics ---
948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
2. Qualitative testing (with a phase-aligned schedule - the clocks are
synchronized by ptp4l, not shown here):
Receiver (sja1105):
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender (enetc):
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
1425 packets transmitted, 1424 packets received, 0% packet loss
round-trip min/avg/max = 0.322/0.361/0.990 ms
And just for comparison, with the tc-taprio schedule deleted:
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
33 packets transmitted, 19 packets received, 42% packet loss
round-trip min/avg/max = 0.336/0.464/0.597 ms
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 03:20:56 +08:00
|
|
|
NL_SET_ERR_MSG_MOD(extack,
|
2020-05-13 01:20:27 +08:00
|
|
|
"Can only gate based on DMAC");
|
net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely as possible to IEEE 802.1Qci as possible. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.
The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.
Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). Key
point here, we remember that the schedule engine supports 8
"subschedules" (execution threads that iterate through the global
schedule in parallel, and that no 2 hardware threads must execute a
schedule entry at the same time). For tc-taprio, each egress port used
one of these 8 subschedules, leaving a total of 4 subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together, before they could have been programmed to hardware. So
simplify our life and just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.
The following scenarios were tested:
1. Quantitative testing:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate index 1 base-time 0 \
sched-entry OPEN 1200 -1 -1 \
sched-entry CLOSE 1200 -1 -1 \
action trap
ping 192.168.1.2 -f
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
.............................
--- 192.168.1.2 ping statistics ---
948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
2. Qualitative testing (with a phase-aligned schedule - the clocks are
synchronized by ptp4l, not shown here):
Receiver (sja1105):
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender (enetc):
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
1425 packets transmitted, 1424 packets received, 0% packet loss
round-trip min/avg/max = 0.322/0.361/0.990 ms
And just for comparison, with the tc-taprio schedule deleted:
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
33 packets transmitted, 19 packets received, 42% packet loss
round-trip min/avg/max = 0.336/0.464/0.597 ms
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 03:20:56 +08:00
|
|
|
return -EOPNOTSUPP;
|
2021-09-23 02:36:55 +08:00
|
|
|
} else if (vlan_aware && key->type != SJA1105_KEY_VLAN_AWARE_VL) {
|
net: dsa: sja1105: implement tc-gate using time-triggered virtual links
Restrict the TTEthernet hardware support on this switch to operate as
closely as possible to IEEE 802.1Qci as possible. This means that it can
perform PTP-time-based ingress admission control on streams identified
by {DMAC, VID, PCP}, which is useful when trying to ensure the
determinism of traffic scheduled via IEEE 802.1Qbv.
The oddity comes from the fact that in hardware (and in TTEthernet at
large), virtual links always need a full-blown action, including not
only the type of policing, but also the list of destination ports. So in
practice, a single tc-gate action will result in all packets getting
dropped. Additional actions (either "trap" or "redirect") need to be
specified in the same filter rule such that the conforming packets are
actually forwarded somewhere.
Apart from the VL Lookup, Policing and Forwarding tables which need to
be programmed for each flow (virtual link), the Schedule engine also
needs to be told to open/close the admission gates for each individual
virtual link. A fairly accurate (and detailed) description of how that
works is already present in sja1105_tas.c, since it is already used to
trigger the egress gates for the tc-taprio offload (IEEE 802.1Qbv). Key
point here, we remember that the schedule engine supports 8
"subschedules" (execution threads that iterate through the global
schedule in parallel, and that no 2 hardware threads must execute a
schedule entry at the same time). For tc-taprio, each egress port used
one of these 8 subschedules, leaving a total of 4 subschedules unused.
In principle we could have allocated 1 subschedule for the tc-gate
offload of each ingress port, but actually the schedules of all virtual
links installed on each ingress port would have needed to be merged
together, before they could have been programmed to hardware. So
simplify our life and just merge the entire tc-gate configuration, for
all virtual links on all ingress ports, into a single subschedule. Be
sure to check that against the usual hardware scheduling conflicts, and
program it to hardware alongside any tc-taprio subschedule that may be
present.
The following scenarios were tested:
1. Quantitative testing:
tc qdisc add dev swp2 clsact
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate index 1 base-time 0 \
sched-entry OPEN 1200 -1 -1 \
sched-entry CLOSE 1200 -1 -1 \
action trap
ping 192.168.1.2 -f
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
.............................
--- 192.168.1.2 ping statistics ---
948 packets transmitted, 467 received, 50.7384% packet loss, time 9671ms
2. Qualitative testing (with a phase-aligned schedule - the clocks are
synchronized by ptp4l, not shown here):
Receiver (sja1105):
tc qdisc add dev swp2 clsact
now=$(phc_ctl /dev/ptp1 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc filter add dev swp2 ingress flower skip_sw \
dst_mac 42:be:24:9b:76:20 \
action gate base-time ${base_time} \
sched-entry OPEN 60000 -1 -1 \
sched-entry CLOSE 40000 -1 -1 \
action trap
Sender (enetc):
now=$(phc_ctl /dev/ptp0 get | awk '/clock time is/ {print $5}') && \
sec=$(echo $now | awk -F. '{print $1}') && \
base_time="$(((sec + 2) * 1000000000))" && \
echo "base time ${base_time}"
tc qdisc add dev eno0 parent root taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time ${base_time} \
sched-entry S 01 50000 \
sched-entry S 00 50000 \
flags 2
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
1425 packets transmitted, 1424 packets received, 0% packet loss
round-trip min/avg/max = 0.322/0.361/0.990 ms
And just for comparison, with the tc-taprio schedule deleted:
ping -A 192.168.1.1
PING 192.168.1.1 (192.168.1.1): 56 data bytes
...
^C
--- 192.168.1.1 ping statistics ---
33 packets transmitted, 19 packets received, 42% packet loss
round-trip min/avg/max = 0.336/0.464/0.597 ms
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-06 03:20:56 +08:00
|
|
|
NL_SET_ERR_MSG_MOD(extack,
|
2020-05-13 01:20:27 +08:00
|
|
|
"Can only gate based on {DMAC, VID, PCP}");
|
		return -EOPNOTSUPP;
	}

	if (!rule) {
		rule = kzalloc(sizeof(*rule), GFP_KERNEL);
		if (!rule)
			return -ENOMEM;

		list_add(&rule->list, &priv->flow_block.rules);
		rule->cookie = cookie;
		rule->type = SJA1105_RULE_VL;
		rule->key = *key;
		rule->vl.type = SJA1105_VL_TIME_TRIGGERED;
		rule->vl.sharindx = index;
		rule->vl.base_time = base_time;
		rule->vl.cycle_time = cycle_time;
		rule->vl.num_entries = num_entries;
		rule->vl.entries = kcalloc(num_entries,
					   sizeof(struct action_gate_entry),
					   GFP_KERNEL);
		if (!rule->vl.entries) {
			rc = -ENOMEM;
			goto out;
		}

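		/* Validate each tc-gate schedule entry against the schedule
		 * engine's 200 ns tick and its maximum per-entry duration
		 * before copying it into the rule.
		 */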
		for (i = 0; i < num_entries; i++) {
			div_s64_rem(entries[i].interval,
				    sja1105_delta_to_ns(1), &rem);
			if (rem) {
				NL_SET_ERR_MSG_MOD(extack,
						   "Interval must be multiple of 200 ns");
				rc = -ERANGE;
				goto out;
			}

			if (!entries[i].interval) {
				NL_SET_ERR_MSG_MOD(extack,
						   "Interval cannot be zero");
				rc = -ERANGE;
				goto out;
			}

			if (ns_to_sja1105_delta(entries[i].interval) >
			    SJA1105_TAS_MAX_DELTA) {
				NL_SET_ERR_MSG_MOD(extack,
						   "Maximum interval is 52 ms");
				rc = -ERANGE;
				goto out;
			}

			if (entries[i].maxoctets != -1) {
				NL_SET_ERR_MSG_MOD(extack,
						   "Cannot offload IntervalOctetMax");
				rc = -EOPNOTSUPP;
				goto out;
			}

			if (ipv == -1) {
				ipv = entries[i].ipv;
			} else if (ipv != entries[i].ipv) {
				NL_SET_ERR_MSG_MOD(extack,
						   "Only support a single IPV per VL");
				rc = -EOPNOTSUPP;
				goto out;
			}

			rule->vl.entries[i] = entries[i];
		}
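
		/* If no gate entry pinned an internal priority value (IPV),
		 * fall back to the key's PCP for VLAN-aware keys, or to 0
		 * otherwise.
		 */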
		if (ipv == -1) {
			if (key->type == SJA1105_KEY_VLAN_AWARE_VL)
				ipv = key->vl.pcp;
			else
				ipv = 0;
		}

		/* TODO: support per-flow MTU */
		rule->vl.maxlen = VLAN_ETH_FRAME_LEN + ETH_FCS_LEN;
		rule->vl.ipv = ipv;
	}

	rule->port_mask |= BIT(port);
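
	/* Recompute the gating schedule with this rule included, reprogram
	 * the virtual link tables, and reject the rule if the result
	 * conflicts with an already offloaded tc-taprio schedule.
	 */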
	rc = sja1105_compose_gating_subschedule(priv, extack);
	if (rc)
		goto out;

	rc = sja1105_init_virtual_links(priv, extack);
	if (rc)
		goto out;

	if (sja1105_gating_check_conflicts(priv, -1, extack)) {
		NL_SET_ERR_MSG_MOD(extack, "Conflict with tc-taprio schedule");
		rc = -ERANGE;
		goto out;
	}

out:
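	/* On failure, drop this port's reference to the rule and free the
	 * rule once no port refers to it anymore.
	 */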
	if (rc) {
		rule->port_mask &= ~BIT(port);
		if (!rule->port_mask) {
			list_del(&rule->list);
			kfree(rule->vl.entries);
			kfree(rule);
		}
	}

	return rc;
}
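
/*
 * A minimal sketch of the time conversions assumed by the interval checks
 * above: the schedule engine counts time in 200 ns ticks, hence the
 * "multiple of 200 ns" requirement. The driver's real helpers,
 * sja1105_delta_to_ns() and ns_to_sja1105_delta(), are defined elsewhere in
 * the driver; the names below are illustrative only.
 *
 *	static inline s64 sketch_delta_to_ns(s64 delta)
 *	{
 *		return delta * 200;
 *	}
 *
 *	static inline s64 sketch_ns_to_delta(s64 ns)
 *	{
 *		return ns / 200;
 *	}
 */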
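
/* Look up the VL Lookup table index of the virtual link matching this
 * {port, key} tuple, so that its status area can be located. Returns -1 if
 * the key is not a virtual link key or no matching entry is found.
 */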
static int sja1105_find_vlid(struct sja1105_private *priv, int port,
			     struct sja1105_key *key)
{
	struct sja1105_vl_lookup_entry *vl_lookup;
	struct sja1105_table *table;
	int i;

	if (WARN_ON(key->type != SJA1105_KEY_VLAN_AWARE_VL &&
		    key->type != SJA1105_KEY_VLAN_UNAWARE_VL))
		return -1;

	table = &priv->static_config.tables[BLK_IDX_VL_LOOKUP];
	vl_lookup = table->entries;

	for (i = 0; i < table->entry_count; i++) {
		if (key->type == SJA1105_KEY_VLAN_AWARE_VL) {
			if (vl_lookup[i].port == port &&
			    vl_lookup[i].macaddr == key->vl.dmac &&
			    vl_lookup[i].vlanid == key->vl.vid &&
			    vl_lookup[i].vlanprior == key->vl.pcp)
				return i;
		} else {
			if (vl_lookup[i].port == port &&
			    vl_lookup[i].macaddr == key->vl.dmac)
				return i;
		}
	}

	return -1;
}
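
/* Report hardware counters for a time-triggered virtual link to the flow
 * offload core. The per-VL status area is read over SPI and only the
 * increment since the previous readout is reported.
 */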
int sja1105_vl_stats(struct sja1105_private *priv, int port,
		     struct sja1105_rule *rule, struct flow_stats *stats,
		     struct netlink_ext_ack *extack)
{
	const struct sja1105_regs *regs = priv->info->regs;
	u8 buf[SJA1105_SIZE_VL_STATUS] = {0};
	u64 unreleased;
	u64 timingerr;
	u64 lengtherr;
	int vlid, rc;
	u64 pkts;

	if (rule->vl.type != SJA1105_VL_TIME_TRIGGERED)
		return 0;

	vlid = sja1105_find_vlid(priv, port, &rule->key);
	if (vlid < 0)
		return 0;

	rc = sja1105_xfer_buf(priv, SPI_READ, regs->vl_status + 2 * vlid, buf,
			      SJA1105_SIZE_VL_STATUS);
	if (rc) {
		NL_SET_ERR_MSG_MOD(extack, "SPI access failed");
		return rc;
	}
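
	/* The per-VL status area packs the UNRELEASED counter in bits 15:0,
	 * TIMINGERR in bits 31:16 and LENGTHERR in bits 47:32; their sum is
	 * reported to the flow core as this rule's packet count.
	 */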
	sja1105_unpack(buf, &timingerr, 31, 16, SJA1105_SIZE_VL_STATUS);
	sja1105_unpack(buf, &unreleased, 15, 0, SJA1105_SIZE_VL_STATUS);
	sja1105_unpack(buf, &lengtherr, 47, 32, SJA1105_SIZE_VL_STATUS);

	pkts = timingerr + unreleased + lengtherr;

	flow_stats_update(stats, 0, pkts - rule->vl.stats.pkts, 0,
			  jiffies - rule->vl.stats.lastused,
			  FLOW_ACTION_HW_STATS_IMMEDIATE);

	rule->vl.stats.pkts = pkts;
	rule->vl.stats.lastused = jiffies;

	return 0;
}
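
/*
 * Illustrative usage, assuming swp2 is the switch port that holds an
 * offloaded gate rule: the counters collected by sja1105_vl_stats() should
 * surface in the gate action's statistics, e.g. via
 *
 *	tc -s filter show dev swp2 ingress
 */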