linux-sg2042/tools
Maxim Mikityanskiy d03b195b5a sch_htb: Hierarchical QoS hardware offload
HTB doesn't scale well because of contention on a single lock, and it
also consumes CPU. This patch adds support for offloading HTB to
hardware that supports hierarchical rate limiting.

In the offload mode, HTB passes control commands to the driver using
ndo_setup_tc. The driver has to replicate the whole hierarchy of classes
and their settings (rate, ceil) in the NIC. Every modification of the
HTB tree caused by the admin results in ndo_setup_tc being called.

After this setup, the HTB algorithm is done completely in the NIC. An SQ
(send queue) is created for every leaf class and attached to the
hierarchy, so that the NIC can calculate and obey aggregated rate
limits, too. In the future, it can be changed, so that multiple SQs will
back a single leaf class.

ndo_select_queue is responsible for selecting the right queue that
serves the traffic class of each packet.

The data path works as follows: a packet is classified by clsact, the
driver selects a hardware queue according to its class, and the packet
is enqueued into this queue's qdisc.

This solution addresses two main problems of scaling HTB:

1. Contention by flow classification. Currently the filters are attached
to the HTB instance as follows:

    # tc filter add dev eth0 parent 1:0 protocol ip flower dst_port 80
    classid 1:10

It's possible to move classification to clsact egress hook, which is
thread-safe and lock-free:

    # tc filter add dev eth0 egress protocol ip flower dst_port 80
    action skbedit priority 1:10

This way classification still happens in software, but the lock
contention is eliminated, and it happens before selecting the TX queue,
allowing the driver to translate the class to the corresponding hardware
queue in ndo_select_queue.

Note that this is already compatible with non-offloaded HTB and doesn't
require changes to the kernel nor iproute2.

2. Contention by handling packets. HTB is not multi-queue, it attaches
to a whole net device, and handling of all packets takes the same lock.
When HTB is offloaded, it registers itself as a multi-queue qdisc,
similarly to mq: HTB is attached to the netdev, and each queue has its
own qdisc.

Some features of HTB may be not supported by some particular hardware,
for example, the maximum number of classes may be limited, the
granularity of rate and ceil parameters may be different, etc. - so, the
offload is not enabled by default, a new parameter is used to enable it:

    # tc qdisc replace dev eth0 root handle 1: htb offload

Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-22 20:41:29 -08:00
..
accounting
arch tools headers UAPI: Synch KVM's svm.h header with the kernel 2020-12-24 09:24:20 -03:00
bootconfig tools/bootconfig: Add tracing_on support to helper scripts 2021-01-14 10:32:20 -05:00
bpf tools/bpftool: Add -Wall when building BPF programs 2021-01-13 19:11:14 -08:00
build Merge remote-tracking branch 'torvalds/master' into perf/core 2020-12-17 14:37:24 -03:00
cgroup blk-iocost: update iocost_monitor.py 2020-09-01 19:38:33 -06:00
debugging docs: Update documentation to reflect what TAINT_CPU_OUT_OF_SPEC means 2020-12-08 10:53:58 -07:00
edid
firewire
firmware
gpio tools: gpio: add option to report wall-clock time to gpio-event-mon 2020-12-05 23:22:48 +01:00
hv
iio iio: add IIO_MOD_O2 modifier 2020-08-22 10:53:12 +01:00
include sch_htb: Hierarchical QoS hardware offload 2021-01-22 20:41:29 -08:00
io_uring tools/io_uring: fix compile breakage 2020-09-21 07:50:58 -06:00
kvm/kvm_stat tools/kvm_stat: Exempt time-based counters 2020-12-11 19:18:51 -05:00
laptop
leds
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-01-20 12:16:11 -08:00
memory-model tools/memory-model: Label MP tests' producers and consumers 2020-11-06 17:25:17 -08:00
objtool Fix a segfault that occurs when built with Clang. 2020-12-27 09:08:23 -08:00
pci
pcmcia
perf perf inject: Correct event attribute sizes 2021-01-15 17:28:28 -03:00
power Kbuild updates for v5.11 2020-12-22 14:02:39 -08:00
scripts tools: Factor HOSTCC, HOSTLD, HOSTAR definitions 2020-11-11 12:18:22 -08:00
spi
testing Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2021-01-20 12:16:11 -08:00
thermal/tmon
time
usb tools: usb: move to tools buildsystem 2020-08-19 14:11:44 +02:00
virtio tools/virtio: add barrier for aarch64 2020-12-18 16:14:30 -05:00
vm mm: Add PG_arch_2 page flag 2020-09-04 12:46:06 +01:00
wmi
Makefile bpf: Compile resolve_btfids tool at kernel compilation start 2020-07-13 10:42:02 -07:00