linux-sg2042/net
Daniel Borkmann c5c6a8ab45 net: tcp: add key management to congestion control
This patch adds necessary infrastructure to the congestion control
framework for later per route congestion control support.

For a per route congestion control possibility, our aim is to store
a unique u32 key identifier into dst metrics, which can then be
mapped into a tcp_congestion_ops struct. We argue that having a
RTAX key entry is the most simple, generic and easy way to manage,
and also keeps the memory footprint of dst entries lower on 64 bit
than with storing a pointer directly, for example. Having a unique
key id also allows for decoupling actual TCP congestion control
module management from the FIB layer, i.e. we don't have to care
about expensive module refcounting inside the FIB at this point.

We first thought of using an IDR store for the realization, which
takes over dynamic assignment of unused key space and also performs
the key to pointer mapping in RCU. While doing so, we stumbled upon
the issue that due to the nature of dynamic key distribution, it
just so happens, arguably in very rare occasions, that excessive
module loads and unloads can lead to a possible reuse of previously
used key space. Thus, previously stale keys in the dst metric are
now being reassigned to a different congestion control algorithm,
which might lead to unexpected behaviour. One way to resolve this
would have been to walk FIBs on the actually rare occasion of a
module unload and reset the metric keys for each FIB in each netns,
but that's just very costly.

Therefore, we argue a better solution is to reuse the unique
congestion control algorithm name member and map that into u32 key
space through jhash. For that, we split the flags attribute (as it
currently uses 2 bits only anyway) into two u32 attributes, flags
and key, so that we can keep the cacheline boundary of 2 cachelines
on x86_64 and cache the precalculated key at registration time for
the fast path. On average we might expect 2 - 4 modules being loaded
worst case perhaps 15, so a key collision possibility is extremely
low, and guaranteed collision-free on LE/BE for all in-tree modules.
Overall this results in much simpler code, and all without the
overhead of an IDR. Due to the deterministic nature, modules can
now be unloaded, the congestion control algorithm for a specific
but unloaded key will fall back to the default one, and on module
reload time it will switch back to the expected algorithm
transparently.

Joint work with Florian Westphal.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-05 22:55:24 -05:00
..
6lowpan net/6lowpan: Remove FSF address from GPL statement. 2014-12-05 12:43:04 +01:00
9p 9p/trans_virtio: enable VQs early 2014-10-15 10:25:04 +10:30
802 net: set name_assign_type in alloc_netdev() 2014-07-15 16:12:48 -07:00
8021q vlan: Add ability to always enable TSO/UFO 2014-12-12 10:58:53 -05:00
appletalk new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
atm put iov_iter into msghdr 2014-12-09 16:29:03 -05:00
ax25 new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
batman-adv batman-adv: avoid NULL dereferences and fix if check 2014-12-23 23:13:37 -05:00
bluetooth Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2015-01-02 15:58:21 -05:00
bridge net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined 2015-01-05 22:52:06 -05:00
caif put iov_iter into msghdr 2014-12-09 16:29:03 -05:00
can can: fix spelling errors 2014-12-07 21:22:05 +01:00
ceph libceph: specify position of extent operation 2014-12-17 20:09:52 +03:00
core net: Do not call ndo_dflt_fdb_dump if ndo_fdb_dump is defined 2015-01-05 22:52:06 -05:00
dcb dcbnl : Disable software interrupts before taking dcb_lock 2014-11-16 14:50:52 -05:00
dccp net: introduce helper macro for_each_cmsghdr 2014-12-10 22:41:55 -05:00
decnet new helper: memcpy_to_msg() 2014-11-24 04:28:51 -05:00
dns_resolver Merge commit 'v3.16' into next 2014-10-01 00:44:04 +10:00
dsa Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
ethernet net: Add Transparent Ethernet Bridging GRO support. 2015-01-02 15:46:41 -05:00
hsr net/hsr: Remove left-over never-true conditional code. 2014-07-11 15:04:40 -07:00
ieee802154 nl802154: introduce support for cca settings 2014-12-19 00:19:23 +01:00
ipv4 net: tcp: add key management to congestion control 2015-01-05 22:55:24 -05:00
ipv6 net: fib6: convert cfg metric to u32 outside of table write lock 2015-01-05 22:55:24 -05:00
ipx switch ipxrtr_route_packet() from iovec to msghdr 2014-11-24 04:28:49 -05:00
irda irda: Convert function pointer arrays and uses to const 2014-12-10 15:33:16 -05:00
iucv net: introduce helper macro for_each_cmsghdr 2014-12-10 22:41:55 -05:00
key new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
l2tp l2tp : multicast notification to the registered listeners 2014-12-31 14:17:20 -05:00
lapb lapb: move EXPORT_SYMBOL after functions. 2014-10-24 15:51:42 -04:00
llc llc: Make llc_sap_action_t function pointer arrays const 2014-12-10 15:21:24 -05:00
mac80211 mac80211: free management frame keys when removing station 2014-12-17 14:00:17 +01:00
mac802154 ieee802154: iface: move multiple node type check 2014-12-20 00:06:55 +01:00
mpls mpls: Fix allowed protocols for mpls gso 2014-12-23 23:57:31 -05:00
netfilter rhashtable: Per bucket locks & deferred expansion/shrinking 2015-01-03 14:32:57 -05:00
netlabel netlabel: kernel-doc warning fix 2014-10-09 01:40:05 -04:00
netlink netlink: Lockless lookup with RCU grace period in socket release 2015-01-03 14:32:57 -05:00
netrom new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
nfc Merge tag 'master-2014-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next 2014-12-09 18:12:03 -05:00
openvswitch genetlink: pass only network namespace to genl_has_listeners() 2014-12-27 02:20:23 -05:00
packet packet: Fixed TPACKET V3 to signal poll when block is closed rather than every packet 2014-12-22 15:41:15 -05:00
phonet new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
rds rds: Fix min() warning in rds_message_inc_copy_to_user() 2014-12-15 11:49:09 -05:00
rfkill Driver core patches for 3.19-rc1 2014-12-14 16:10:09 -08:00
rose new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
rxrpc net: introduce helper macro for_each_cmsghdr 2014-12-10 22:41:55 -05:00
sched Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-12-10 15:48:20 -05:00
sctp net: introduce helper macro for_each_cmsghdr 2014-12-10 22:41:55 -05:00
sunrpc NFS client updates for Linux 3.19 2014-12-10 15:13:13 -08:00
switchdev bridge: call netdev_sw_port_stp_update when bridge port STP status changes 2014-12-02 20:01:22 -08:00
tipc tipc: replace 0 by NULL for pointers 2014-12-31 13:11:39 -05:00
unix put iov_iter into msghdr 2014-12-09 16:29:03 -05:00
vmw_vsock put iov_iter into msghdr 2014-12-09 16:29:03 -05:00
wimax wimax: convert printk to pr_foo() 2014-10-07 20:28:44 -04:00
wireless cfg80211: correctly check ad-hoc channels 2014-12-12 13:40:38 +01:00
x25 new helper: memcpy_from_msg() 2014-11-24 04:28:48 -05:00
xfrm Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next 2014-12-08 21:30:21 -05:00
Kconfig net: introduce generic switch devices support 2014-12-02 20:01:20 -08:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-12-16 15:53:03 -08:00
compat.c put iov_iter into msghdr 2014-12-09 16:29:03 -05:00
socket.c [regression] chunk lost from bd9b51 2014-12-19 07:13:21 -05:00
sysctl_net.c