Although RCU protection would be possible during diag dump, doing
so allows for concurrent table mutations which can render the
in-table offset between individual Netlink messages invalid and
thus cause legitimate sockets to be skipped in the dump.
Since the diag dump is relatively low volume and consistency is
more important than performance, the table mutex is held during
dump.
Reported-by: Andrey Wagin <avagin@gmail.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Fixes: e341694e3e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Signed-off-by: David S. Miller <davem@davemloft.net>
Heavy Netlink users such as Open vSwitch spend a considerable amount of
time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
RCU relieves the lock contention.
Makes use of the new resizable hash table to avoid locking on the
lookup.
The hash table will grow if entries exceeds 75% of table size up to a
total table size of 64K. It will automatically shrink if usage falls
below 30%.
Also splits nl_table_lock into a separate mutex to protect hash table
mutations and allow synchronize_rcu() to sleep while waiting for readers
during expansion and shrinking.
Before:
9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup
6.42% kpktgend_0 [pktgen] [k] mod_cur_headers
6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker
6.23% kpktgend_0 [kernel.kallsyms] [k] memset
4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup
4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy
3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract
2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2
After:
15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup
8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker
7.92% kpktgend_0 [pktgen] [k] mod_cur_headers
5.11% kpktgend_0 [kernel.kallsyms] [k] memset
4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract
4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock
3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2
[...]
0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net/netlink/diag.c: In function 'sk_diag_put_rings_cfg':
net/netlink/diag.c:28:17: error: 'struct netlink_sock' has no member named 'pg_vec_lock'
net/netlink/diag.c:29:29: error: 'struct netlink_sock' has no member named 'rx_ring'
net/netlink/diag.c:31:30: error: 'struct netlink_sock' has no member named 'tx_ring'
net/netlink/diag.c:33:19: error: 'struct netlink_sock' has no member named 'pg_vec_lock'
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The netlink_diag can be built as a module, just like it's done in
unix sockets.
The core dumping message carries the basic info about netlink sockets:
family, type and protocol, portis, dst_group, dst_portid, state.
Groups can be received as an optional parameter NETLINK_DIAG_GROUPS.
Netlink sockets cab be filtered by protocols.
The socket inode number and cookie is reserved for future per-socket info
retrieving. The per-protocol filtering is also reserved for future by
requiring the sdiag_protocol to be zero.
The file /proc/net/netlink doesn't provide enough information for
dumping netlink sockets. It doesn't provide dst_group, dst_portid,
groups above 32.
v2: fix NETLINK_DIAG_MAX. Now it's equal to the last constant.
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>