OpenCloudOS-Kernel


Jakub Kicinski 95d1815f09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

1) Incorrect error check in nft_expr_inner_parse(), from Dan Carpenter.

2) Add DATA_SENT state to SCTP connection tracking helper, from Sriram Yagnaraman.

3) Consolidate nf_confirm for ipv4 and ipv6, from Florian Westphal.

4) Add bitmask support for ipset, from Vishwanath Pai.

5) Handle icmpv6 redirects as RELATED, from Florian Westphal.

6) Add WARN_ON_ONCE() to impossible case in flowtable datapath, from Li Qiong.

7) A large batch of IPVS updates to replace timer-based estimators by kthreads to scale up wrt. CPUs and workload (millions of estimators).

Julian Anastasov says:

This patchset implements stats estimation in kthread context. It replaces the code that runs on single CPU in timer context every 2 seconds and causing latency splats as shown in reports [1], [2], [3]. The solution targets setups with thousands of IPVS services, destinations and multi-CPU boxes.

Spread the estimation on multiple (configured) CPUs and multiple time slots (timer ticks) by using multiple chains organized under RCU rules. When stats are not needed, it is recommended to use run_estimation=0 as already implemented before this change.

RCU Locking:

- As stats are now RCU-locked, tot_stats, svc and dest which hold estimator structures are now always freed from RCU callback. This ensures RCU grace period after the ip_vs_stop_estimator() call.

Kthread data:

- every kthread works over its own data structure and all such structures are attached to array. For now we limit kthreads depending on the number of CPUs.

- even while there can be a kthread structure, its task may not be running, eg. before first service is added or while the sysctl var is set to an empty cpulist or when run_estimation is set to 0 to disable the estimation.

- the allocated kthread context may grow from 1 to 50 allocated structures for timer ticks which saves memory for setups with small number of estimators

- a task and its structure may be released if all estimators are unlinked from its chains, leaving the slot in the array empty

- every kthread data structure allows limited number of estimators. Kthread 0 is also used to initially calculate the max number of estimators to allow in every chain considering a sub-100 microsecond cond_resched rate. This number can be from 1 to hundreds.

- kthread 0 has an additional job of optimizing the adding of estimators: they are first added in temp list (est_temp_list) and later kthread 0 distributes them to other kthreads. The optimization is based on the fact that newly added estimator should be estimated after 2 seconds, so we have the time to offload the adding to chain from controlling process to kthread 0.

- to add new estimators we use the last added kthread context (est_add_ktid). The new estimators are linked to the chains just before the estimated one, based on add_row. This ensures their estimation will start after 2 seconds. If estimators are added in bursts, common case if all services and dests are initially configured, we may spread the estimators to more chains and as result, reducing the initial delay below 2 seconds.

Many thanks to Jiri Wiesner for his valuable comments and for spending a lot of time reviewing and testing the changes on different platforms with 48-256 CPUs and 1-8 NUMA nodes under different cpufreq governors.

The new IPVS estimators do not use workqueue infrastructure because:

- The estimation can take long time when using multiple IPVS rules (eg. millions estimator structures) and especially when box has multiple CPUs due to the for_each_possible_cpu usage that expects packets from any CPU. With est_nice sysctl we have more control how to prioritize the estimation kthreads compared to other processes/kthreads that have latency requirements (such as servers). As a benefit, we can see these kthreads in top and decide if we will need some further control to limit their CPU usage (max number of structure to estimate per kthread).

- with kthreads we run code that is read-mostly, no write/lock operations to process the estimators in 2-second intervals.

- work items are one-shot: as estimators are processed every 2 seconds, they need to be re-added every time. This again loads the timers (add_timer) if we use delayed works, as there are no kthreads to do the timings.

[1] Report from Yunhong Jiang: https://lore.kernel.org/netdev/D25792C1-1B89-45DE-9F10-EC350DC04ADC@gmail.com/
[2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2
[3] Report from Dust: https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  ipvs: run_estimation should control the kthread tasks
  ipvs: add est_cpulist and est_nice sysctl vars
  ipvs: use kthreads for stats estimation
  ipvs: use u64_stats_t for the per-cpu counters
  ipvs: use common functions for stats allocation
  ipvs: add rcu protection to stats
  netfilter: flowtable: add a 'default' case to flowtable datapath
  netfilter: conntrack: set icmpv6 redirects as RELATED
  netfilter: ipset: Add support for new bitmask parameter
  netfilter: conntrack: merge ipv4+ipv6 confirm functions
  netfilter: conntrack: add sctp DATA_SENT state
  netfilter: nft_inner: fix IS_ERR() vs NULL check
====================

Link: https://lore.kernel.org/r/20221211101204.1751-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2022-12-12 14:45:36 -08:00
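The placement rule described in the commit message (new estimators are linked just before the row currently being estimated, and a burst spills into earlier rows) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel code: the names new_kthread_data, add_estimator, ticks_until_estimated and the ROW_CAPACITY value are hypothetical, and each row is modelled as a single flat list rather than RCU-managed chains. Only the figure of 50 tick rows per 2-second period comes from the commit text.

```python
NTICKS = 50        # timer-tick rows per 2-second period, per the commit text
ROW_CAPACITY = 3   # hypothetical limit; the kernel derives its own at run time


def new_kthread_data():
    """One kthread's private structure: a list of estimators per timer tick."""
    return {"rows": [[] for _ in range(NTICKS)], "est_row": 0}


def add_estimator(kd, est_id):
    """Link est_id into the first non-full row, walking backwards from est_row.

    The row currently being estimated is skipped. Returns the chosen row
    index, or None when every other row is full (a caller would then move
    on to another kthread's structure).
    """
    for back in range(1, NTICKS):
        row = (kd["est_row"] - back) % NTICKS
        if len(kd["rows"][row]) < ROW_CAPACITY:
            kd["rows"][row].append(est_id)
            return row
    return None


def ticks_until_estimated(kd, row):
    """Ticks before a kthread that advances one row per tick reaches row."""
    return (row - kd["est_row"]) % NTICKS


kd = new_kthread_data()
placed = [add_estimator(kd, est_id) for est_id in range(7)]
delays = [ticks_until_estimated(kd, row) for row in placed]
print(placed)  # [49, 49, 49, 48, 48, 48, 47]
print(delays)  # [49, 49, 49, 48, 48, 48, 47]
```

In this model the first estimators wait almost a full period (49 of 50 ticks), while later ones in the same burst land in earlier rows and wait slightly less, which mirrors the "initial delay below 2 seconds" remark in the commit message.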
caif	tty: cumulate and document tty_struct::flow* members	2021-05-13 16:57:16 +02:00
device_drivers	net/mlx5: E-Switch, Implement devlink port function cmds to control migratable	2022-12-07 20:09:18 -08:00
devlink	Documentation: devlink: add devlink documentation for the etas_es58x driver	2022-12-12 11:39:13 +01:00
dsa	docs: net: dsa: update information about multiple CPU ports	2022-09-20 10:32:36 +02:00
mac80211_hwsim	docs: net: convert two README files to ReST format	2019-07-31 13:31:56 -06:00
6lowpan.rst	docs: networking: convert 6lowpan.txt to ReST	2020-02-28 14:52:36 +01:00
6pack.rst	docs: networking: convert 6pack.txt to ReST	2020-04-28 14:38:38 -07:00
af_xdp.rst	doc, af_xdp: Fix bind flags option typo	2021-07-12 16:55:01 +02:00
alias.rst	docs: networking: Convert alias.txt to rst	2018-07-18 15:28:27 -07:00
arcnet-hardware.rst	docs: networking: arcnet-hardware.rst: don't duplicate chapter names	2020-05-01 12:24:43 -07:00
arcnet.rst	Documentation: networking: arcnet: drop doubled word	2020-07-04 17:46:21 -07:00
atm.rst	docs: networking: convert atm.txt to ReST	2020-04-28 14:38:38 -07:00
ax25.rst	Documentation: networking: ax25: drop doubled word	2020-07-04 17:46:21 -07:00
bareudp.rst	Documentation: bareudp: Corrected description of bareudp module.	2020-07-28 17:53:03 -07:00
batman-adv.rst	batman-adv: Move IRC channel to hackint.org	2021-08-08 20:05:46 +02:00
bonding.rst	Documentation: bonding: correct xmit hash steps	2022-12-02 10:46:45 +00:00
bridge.rst	docs: networking: Convert bridge.txt to rst	2018-07-18 15:28:27 -07:00
can.rst	can: add termination resistor documentation	2022-10-19 21:33:29 +02:00
can_ucan_protocol.rst	Documentation: networking: can_ucan_protocol: drop doubled words	2020-07-04 17:46:21 -07:00
cdc_mbim.rst	docs: networking: convert cdc_mbim.txt to ReST	2020-04-28 14:38:39 -07:00
checksum-offloads.rst	docs: networking: convert netdev-features.txt to ReST	2020-04-30 12:56:36 -07:00
dccp.rst	net: dccp: Add SIOCOUTQ IOCTL support (send buffer fill)	2020-07-22 17:00:37 -07:00
dctcp.rst	docs: networking: convert dctcp.txt to ReST	2020-04-28 14:38:39 -07:00
dns_resolver.rst	docs: networking: convert dns_resolver.txt to ReST	2020-04-28 14:39:46 -07:00
driver.rst	Documentation: networking: correct possessive "its"	2022-08-31 12:36:08 -07:00
eql.rst	docs: networking: convert eql.txt to ReST	2020-04-28 14:39:46 -07:00
ethtool-netlink.rst	ethtool: add netlink based get rss support	2022-12-05 17:25:00 -08:00
failover.rst	net: Introduce generic failover module	2018-05-28 22:59:54 -04:00
fib_trie.rst	docs: networking: convert fib_trie.txt to ReST	2020-04-28 14:39:46 -07:00
filter.rst	treewide: use get_random_u32() when possible	2022-10-11 17:42:58 -06:00
gen_stats.rst	docs: networking: convert gen_stats.txt to ReST	2020-04-28 14:39:46 -07:00
generic-hdlc.rst	docs: networking: convert generic-hdlc.txt to ReST	2020-04-28 14:39:46 -07:00
generic_netlink.rst	Documentation: networking: Update generic_netlink_howto URL	2022-11-23 17:25:02 -08:00
gtp.rst	docs: networking: convert gtp.txt to ReST	2020-04-28 14:39:46 -07:00
ieee802154.rst	docs: net: ieee802154.rst: fix C expressions	2020-10-15 07:49:41 +02:00
ila.rst	docs: networking: convert ila.txt to ReST	2020-04-28 14:39:47 -07:00
index.rst	Documentation: networking: TC queue based filtering	2022-10-25 10:32:40 +02:00
ioam6-sysctl.rst	ipv6: ioam: Documentation for new IOAM sysctls	2021-07-21 08:14:33 -07:00
ip-sysctl.rst	sctp: add sysctl net.sctp.l3mdev_accept	2022-11-18 11:42:54 +00:00
ip_dynaddr.rst	docs: networking: convert ip_dynaddr.txt to ReST	2020-04-28 14:39:47 -07:00
ipddp.rst	docs: networking: convert ipddp.txt to ReST	2020-04-28 14:39:47 -07:00
ipsec.rst	docs: networking: convert ipsec.txt to ReST	2020-04-28 14:39:47 -07:00
ipv6.rst	docs: networking: convert ipv6.txt to ReST	2020-04-28 14:40:18 -07:00
ipvlan.rst	Documentation: networking: correct possessive "its"	2022-08-31 12:36:08 -07:00
ipvs-sysctl.rst	ipvs: run_estimation should control the kthread tasks	2022-12-10 22:44:43 +01:00
j1939.rst	can: j1939: add tables for the CAN identifier and its fields	2020-11-20 09:43:29 +01:00
kapi.rst	wimax: move out to staging	2020-10-29 19:27:45 +01:00
kcm.rst	docs: networking: convert kcm.txt to ReST	2020-04-28 14:40:19 -07:00
l2tp.rst	Documentation: networking: correct possessive "its"	2022-08-31 12:36:08 -07:00
lapb-module.rst	docs: networking: convert lapb-module.txt to ReST	2020-04-30 12:56:35 -07:00
mac80211-auth-assoc-deauth.txt	…
mac80211-injection.rst	doc: networking: wireless: fix wiki website url	2020-06-08 10:05:53 +02:00
mctp.rst	mctp: Add SIOCMCTP{ALLOC,DROP}TAG ioctls for tag control	2022-02-09 12:00:11 +00:00
mpls-sysctl.rst	docs: networking: convert mpls-sysctl.txt to ReST	2020-04-30 12:56:36 -07:00
mptcp-sysctl.rst	Documentation: mptcp: fix pm_type formatting	2022-09-13 10:18:44 +02:00
msg_zerocopy.rst	docs: use the lore redirector everywhere	2021-10-12 13:58:19 -06:00
multiqueue.rst	docs: networking: convert multiqueue.txt to ReST	2020-04-30 12:56:36 -07:00
net_dim.rst	docs: networking: add full DIM API	2020-04-10 18:11:04 -07:00
net_failover.rst	Documentation: networking: net_failover: Fix documentation	2021-11-17 13:59:49 +00:00
netconsole.rst	docs: networking: convert netconsole.txt to ReST	2020-04-30 12:56:36 -07:00
netdev-features.rst	net: hsr: add offloading support	2021-02-11 13:24:44 -08:00
netdevices.rst	net: bonding: move ioctl handling to private ndo operation	2021-07-27 20:11:45 +01:00
netfilter-sysctl.rst	docs: networking: convert netfilter-sysctl.txt to ReST	2020-04-30 12:56:36 -07:00
netif-msg.rst	docs: networking: convert netif-msg.txt to ReST	2020-04-30 12:56:36 -07:00
nexthop-group-resilient.rst	Documentation: net: Document resilient next-hop groups	2021-03-29 13:51:38 -07:00
nf_conntrack-sysctl.rst	netfilter: conntrack: remove nf_conntrack_helper documentation	2022-09-20 23:50:03 +02:00
nf_flowtable.rst	docs: nf_flowtable: fix compilation and warnings	2021-03-25 17:42:02 -07:00
nfc.rst	docs: networking: nfc: change to rst format	2019-11-23 11:00:19 -08:00
openvswitch.rst	docs: networking: convert openvswitch.txt to ReST	2020-04-30 12:56:36 -07:00
operstates.rst	docs: operstates: document IF_OPER_TESTING	2021-08-02 15:16:04 +01:00
packet_mmap.rst	docs: networking: Replace strncpy() with strscpy()	2021-06-04 11:21:43 -06:00
page_pool.rst	Documentation: update networking/page_pool.rst	2022-03-03 09:55:28 +00:00
phonet.rst	docs: networking: convert phonet.txt to ReST	2020-04-30 12:56:37 -07:00
phy.rst	docs: networking: phy: add missing space	2022-10-05 20:32:39 -07:00
pktgen.rst	pktgen: document the latest pktgen usage options	2021-08-25 13:44:30 +01:00
plip.rst	docs: networking: convert PLIP.txt to ReST	2020-04-30 12:56:37 -07:00
ppp_generic.rst	docs: update ppp_generic.rst to document new ioctls	2020-12-10 13:57:36 -08:00
proc_net_tcp.rst	docs: networking: convert proc_net_tcp.txt to ReST	2020-04-30 12:56:37 -07:00
radiotap-headers.rst	docs: networking: convert radiotap-headers.txt to ReST	2020-04-30 12:56:37 -07:00
rds.rst	Doc: networking: Fix the title's Sphinx overline in rds.rst	2021-11-29 15:18:21 -07:00
regulatory.rst	doc: networking: wireless: fix wiki website url	2020-06-08 10:05:53 +02:00
representors.rst	docs: net: add an explanation of VF (and other) Representors	2022-09-21 07:31:38 -07:00
rxrpc.rst	rxrpc: Remove rxrpc_get_reply_time() which is no longer used	2022-09-01 11:44:13 +01:00
scaling.rst	docs: networking: update XPS to account for netif_set_xps_queue	2020-10-13 16:21:54 -07:00
sctp.rst	docs: networking: convert sctp.txt to ReST	2020-04-30 12:56:38 -07:00
secid.rst	docs: networking: convert secid.txt to ReST	2020-04-30 12:56:38 -07:00
seg6-sysctl.rst	doc: move seg6_flowlabel to seg6-sysctl.rst	2021-04-14 13:13:15 -07:00
segmentation-offloads.rst	networking: : fix typos in code comments	2019-05-20 20:24:34 -04:00
sfp-phylink.rst	doc: sfp-phylink: Fix a broken reference	2022-08-02 21:45:07 -07:00
skbuff.rst	skbuff: render the checksum comment to documentation	2022-05-10 17:48:37 -07:00
smc-sysctl.rst	net/smc: Unbind r/w buffer size from clcsock and make them tunable	2022-09-22 12:58:21 +02:00
snmp_counter.rst	net-next: docs: Fix typos in snmp_counter.rst	2021-01-05 17:07:38 -08:00
statistics.rst	docs: networking: extend the statistics documentation	2021-04-16 16:59:20 -07:00
strparser.rst	docs: networking: convert strparser.txt to ReST	2020-04-30 12:56:38 -07:00
switchdev.rst	docs: net: add an explanation of VF (and other) Representors	2022-09-21 07:31:38 -07:00
sysfs-tagging.rst	Documentation: better locations for sysfs-pci, sysfs-tagging	2020-10-09 09:33:23 -06:00
tc-actions-env-rules.rst	docs: networking: convert tc-actions-env-rules.txt to ReST	2020-04-30 12:56:38 -07:00
tc-queue-filters.rst	Documentation: networking: TC queue based filtering	2022-10-25 10:32:40 +02:00
tcp-thin.rst	docs: networking: convert tcp-thin.txt to ReST	2020-04-30 12:56:38 -07:00
team.rst	docs: networking: convert team.txt to ReST	2020-04-30 12:56:38 -07:00
timestamping.rst	net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP	2022-12-08 19:49:21 -08:00
tipc.rst	Documentation: add more details in tipc.rst	2021-07-01 13:18:18 -07:00
tls-offload-layers.svg	Documentation: add TLS offload documentation	2019-05-22 12:18:20 -07:00
tls-offload-reorder-bad.svg	Documentation: add TLS offload documentation	2019-05-22 12:18:20 -07:00
tls-offload-reorder-good.svg	Documentation: add TLS offload documentation	2019-05-22 12:18:20 -07:00
tls-offload.rst	net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled	2021-01-19 15:58:05 -08:00
tls.rst	tls: rx: add counter for NoPad violations	2022-07-11 19:48:33 -07:00
tproxy.rst	docs: networking: convert tproxy.txt to ReST	2020-04-30 12:56:38 -07:00
tuntap.rst	docs: networking: Replace strncpy() with strscpy()	2021-06-04 11:21:43 -06:00
udplite.rst	docs: networking: convert udplite.txt to ReST	2020-05-01 12:24:40 -07:00
vrf.rst	doc: Document unexpected tcp_l3mdev_accept=1 behavior	2021-08-23 11:53:24 +01:00
vxlan.rst	docs: vxlan: add info about device features	2020-09-28 12:50:12 -07:00
x25-iface.rst	net: x25: Queue received packets in the drivers instead of per-CPU queues	2021-04-05 11:42:12 -07:00
x25.rst	net: x25: Remove unimplemented X.25-over-LLC code stubs	2020-12-12 17:15:33 -08:00
xfrm_device.rst	xfrm: document IPsec packet offload mode	2022-12-05 10:40:29 +01:00
xfrm_proc.rst	docs: networking: convert xfrm_proc.txt to ReST	2020-05-01 12:24:40 -07:00
xfrm_sync.rst	docs: networking: convert xfrm_sync.txt to ReST	2020-05-01 12:24:41 -07:00
xfrm_sysctl.rst	docs: networking: convert xfrm_sysctl.txt to ReST	2020-05-01 12:24:41 -07:00