Sync code to the same with tk4 pub/lts/0017-kabi, except deleted rue
and wujing. Partners can submit pull requests to this branch, and we
can pick the commits to tk4 pub/lts/0017-kabi easly.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Gitee limit the repo's size to 3GB, to reduce the size of the code,
sync codes to ock 5.4.119-20.0009.21 in one commit.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Sync kernel codes to the same with 590eaf1fec ("Init Repo base on
linux 5.4.32 long term, and add base tlinux kernel interfaces."), which
is from tk4, and it is the base of tk4.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Since v5.2 (commit "netlink: re-add parse/validate functions in strict
mode") NL_VALIDATE_STRICT is enabled. Fix the ipset nla_policies which did
not support strict mode and convert from deprecated parsings to verified ones.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Same as commit 1b4a75108d ("netfilter: ipset: Copy the right MAC
address in bitmap:ip,mac and hash:ip,mac sets"), another copy and paste
went wrong in commit 8cc4ccf583 ("netfilter: ipset: Allow matching on
destination MAC address for mac and ipmac sets").
When I fixed this for IPv4 in 1b4a75108d, I didn't realise that
hash:ip,mac sets also support IPv6 as family, and this is covered by a
separate function, hash_ipmac6_kadt().
In hash:ip,mac sets, the first dimension is the IP address, and the
second dimension is the MAC address: check the IPSET_DIM_TWO_SRC flag
in flags while deciding which MAC address to copy, destination or
source.
This way, mixing source and destination matches for the two dimensions
of ip,mac hash type works as expected, also for IPv6. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 2001:db8::1/64 dev veth1
ip -net A addr add 2001:db8::2/64 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_hash hash:ip,mac family inet6
ip netns exec A ipset add test_hash 2001:db8::1,${dst}
ip netns exec A ip6tables -A INPUT -p icmpv6 --icmpv6-type 135 -j ACCEPT
ip netns exec A ip6tables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset now correctly matches a test packet:
# ping -c1 2001:db8::2 >/dev/null
# echo $?
0
Reported-by: Chen, Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("netfilter: ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
The copy_to_user() function returns the number of bytes remaining to be
copied. In this code, that positive return is checked at the end of the
function and we return zero/success. What we should do instead is
return -EFAULT.
Fixes: a7b4f989a6 ("netfilter: ipset: IP set core support")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for net-next:
1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.
2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.
3) More strict validation of IPVS sysctl values, from Junwei Hu.
4) Remove unnecessary spaces after on the right hand side of assignments,
from yangxingwu.
5) Add offload support for bitwise operation.
6) Extend the nft_offload_reg structure to store immediate date.
7) Collapse several ip_set header files into ip_set.h, from
Jeremy Sowden.
8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
from Jeremy Sowden.
9) Fix several sparse warnings due to missing prototypes, from
Valdis Kletnieks.
10) Use static lock initialiser to ensure connlabel spinlock is
initialized on boot time to fix sched/act_ct.c, patch
from Florian Westphal.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
linux/netfilter/ipset/ip_set.h included four other header files:
include/linux/netfilter/ipset/ip_set_comment.h
include/linux/netfilter/ipset/ip_set_counter.h
include/linux/netfilter/ipset/ip_set_skbinfo.h
include/linux/netfilter/ipset/ip_set_timeout.h
Of these the first three were not included anywhere else. The last,
ip_set_timeout.h, was included in a couple of other places, but defined
inline functions which call other inline functions defined in ip_set.h,
so ip_set.h had to be included before it.
Inlined all four into ip_set.h, and updated the other files that
included ip_set_timeout.h.
Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
Acked-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Shijie Luo reported that when stress-testing ipset with multiple concurrent
create, rename, flush, list, destroy commands, it can result
ipset <version>: Broken LIST kernel message: missing DATA part!
error messages and broken list results. The problem was the rename operation
was not properly handled with respect of listing. The patch fixes the issue.
Reported-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I added to the
KADT functions for sets matching on MAC addreses the copy of source or
destination MAC address depending on the configured match.
This was done correctly for hash:mac, but for hash:ip,mac and
bitmap:ip,mac, copying and pasting the same code block presents an
obvious problem: in these two set types, the MAC address is the second
dimension, not the first one, and we are actually selecting the MAC
address depending on whether the first dimension (IP address) specifies
source or destination.
Fix this by checking for the IPSET_DIM_TWO_SRC flag in option flags.
This way, mixing source and destination matches for the two dimensions
of ip,mac set types works as expected. With this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 192.0.2.1/24 dev veth1
ip -net A addr add 192.0.2.2/24 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset create test_bitmap bitmap:ip,mac range 192.0.0.0/16
ip netns exec A ipset add test_bitmap 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_bitmap src,dst -j DROP
ip netns exec A ipset create test_hash hash:ip,mac
ip netns exec A ipset add test_hash 192.0.2.1,${dst}
ip netns exec A iptables -A INPUT -m set ! --match-set test_hash src,dst -j DROP
ipset correctly matches a test packet:
# ping -c1 192.0.2.2 >/dev/null
# echo $?
0
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
In commit 8cc4ccf583 ("ipset: Allow matching on destination MAC address
for mac and ipmac sets"), ipset.git commit 1543514c46a7, I removed the
KADT check that prevents matching on destination MAC addresses for
hash:mac sets, but forgot to remove the same check for hash:ip,mac set.
Drop this check: functionality is now commented in man pages and there's
no reason to restrict to source MAC address matching anymore.
Reported-by: Chen Yi <yiche@redhat.com>
Fixes: 8cc4ccf583 ("ipset: Allow matching on destination MAC address for mac and ipmac sets")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Resolve conflict between d2912cb15b ("treewide: Replace GPLv2
boilerplate/reference with SPDX - rule 500") removing the GPL disclaimer
and fe03d47456 ("Update my email address") which updates Jozsef
Kadlecsik's email.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Based on 2 normalized pattern(s):
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation
this program is free software you can redistribute it and or modify
it under the terms of the gnu general public license version 2 as
published by the free software foundation #
extracted by the scancode license scanner the SPDX license identifier
GPL-2.0-only
has been chosen to replace the boilerplate/reference in 4122 file(s).
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Enrico Weigelt <info@metux.net>
Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Allison Randal <allison@lohutok.net>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
It's better to use my kadlec@netfilter.org email address in
the source code. I might not be able to use
kadlec@blackhole.kfki.hu in the future.
Signed-off-by: Jozsef Kadlecsik <kadlec@netfilter.org>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
If a fresh array block is allocated during resize, the current in-memory
set size should be increased by the size of the block, not replaced by it.
Before the fix, adding entries to a hash set type, leading to a table
resize, caused an inconsistent memory size to be reported. This becomes
more obvious when swapping sets with similar sizes:
# cat hash_ip_size.sh
#!/bin/sh
FAIL_RETRIES=10
tries=0
while [ ${tries} -lt ${FAIL_RETRIES} ]; do
ipset create t1 hash:ip
for i in `seq 1 4345`; do
ipset add t1 1.2.$((i / 255)).$((i % 255))
done
t1_init="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')"
ipset create t2 hash:ip
for i in `seq 1 4360`; do
ipset add t2 1.2.$((i / 255)).$((i % 255))
done
t2_init="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')"
ipset swap t1 t2
t1_swap="$(ipset list t1|sed -n 's/Size in memory: \(.*\)/\1/p')"
t2_swap="$(ipset list t2|sed -n 's/Size in memory: \(.*\)/\1/p')"
ipset destroy t1
ipset destroy t2
tries=$((tries + 1))
if [ ${t1_init} -lt 10000 ] || [ ${t2_init} -lt 10000 ]; then
echo "FAIL after ${tries} tries:"
echo "T1 size ${t1_init}, after swap ${t1_swap}"
echo "T2 size ${t2_init}, after swap ${t2_swap}"
exit 1
fi
done
echo "PASS"
# echo -n 'func hash_ip4_resize +p' > /sys/kernel/debug/dynamic_debug/control
# ./hash_ip_size.sh
[ 2035.018673] attempt to resize set t1 from 10 to 11, t 00000000fe6551fa
[ 2035.078583] set t1 resized from 10 (00000000fe6551fa) to 11 (00000000172a0163)
[ 2035.080353] Table destroy by resize 00000000fe6551fa
FAIL after 4 tries:
T1 size 9064, after swap 71128
T2 size 71128, after swap 9064
Reported-by: NOYB <JunkYardMail1@Frontier.com>
Fixes: 9e41f26a50 ("netfilter: ipset: Count non-static extension memory for userspace")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
In dump_init() the outdated comment was incorrect and we had a missing
validation check of nla_parse_deprecated().
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
When nla_parse fails, we should not use the results (the first
argument). The fix checks if it fails, and if so, returns its error code
upstream.
Signed-off-by: Aditya Pakki <pakki001@umn.edu>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Both functions are using exactly the same code, except the command value
passed to call_ad function.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
One of the memset call is buggy: it does not erase full array, but only pointer size.
Moreover, after a check, first step of nla_parse_nested/nla_parse is to
erase tb array as well. We can remove both calls safely.
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Add SPDX license identifiers to all Make/Kconfig files which:
- Have no license information of any form
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Add SPDX license identifiers to all files which:
- Have no license information of any form
- Have EXPORT_.*_SYMBOL_GPL inside which was used in the
initial scan/conversion to ignore the file
These files fall under the project license, GPL v2 only. The resulting SPDX
license identifier is:
GPL-2.0-only
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We currently have two levels of strict validation:
1) liberal (default)
- undefined (type >= max) & NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
- garbage at end of message accepted
2) strict (opt-in)
- NLA_UNSPEC attributes accepted
- attribute length >= expected accepted
Split out parsing strictness into four different options:
* TRAILING - check that there's no trailing data after parsing
attributes (in message or nested)
* MAXTYPE - reject attrs > max known type
* UNSPEC - reject attributes with NLA_UNSPEC policy entries
* STRICT_ATTRS - strictly validate attribute size
The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().
Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.
We end up with the following renames:
* nla_parse -> nla_parse_deprecated
* nla_parse_strict -> nla_parse_deprecated_strict
* nlmsg_parse -> nlmsg_parse_deprecated
* nlmsg_parse_strict -> nlmsg_parse_deprecated_strict
* nla_parse_nested -> nla_parse_nested_deprecated
* nla_validate_nested -> nla_validate_nested_deprecated
Using spatch, of course:
@@
expression TB, MAX, HEAD, LEN, POL, EXT;
@@
-nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
+nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression NLH, HDRLEN, TB, MAX, POL, EXT;
@@
-nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
+nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
@@
expression TB, MAX, NLA, POL, EXT;
@@
-nla_parse_nested(TB, MAX, NLA, POL, EXT)
+nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
@@
expression START, MAX, POL, EXT;
@@
-nla_validate_nested(START, MAX, POL, EXT)
+nla_validate_nested_deprecated(START, MAX, POL, EXT)
@@
expression NLH, HDRLEN, MAX, POL, EXT;
@@
-nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
+nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.
Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.
Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.
In effect then, this adds fully strict validation for any new command.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
After the previous commit, both ipset_nest_start() and ipset_nest_end() are
just aliases for nla_nest_start() and nla_nest_end() so that there is no
need to keep them.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
1) Support for destination MAC in ipset, from Stefano Brivio.
2) Disallow all-zeroes MAC address in ipset, also from Stefano.
3) Add IPSET_CMD_GET_BYNAME and IPSET_CMD_GET_BYINDEX commands,
introduce protocol version number 7, from Jozsef Kadlecsik.
A follow up patch to fix ip_set_byindex() is also included
in this batch.
4) Honor CTA_MARK_MASK from ctnetlink, from Andreas Jaggi.
5) Statify nf_flow_table_iterate(), from Taehee Yoo.
6) Use nf_flow_table_iterate() to simplify garbage collection in
nf_flow_table logic, also from Taehee Yoo.
7) Don't use _bh variants of call_rcu(), rcu_barrier() and
synchronize_rcu_bh() in Netfilter, from Paul E. McKenney.
8) Remove NFC_* cache definition from the old caching
infrastructure.
9) Remove layer 4 port rover in NAT helpers, use random port
instead, from Florian Westphal.
10) Use strscpy() in ipset, from Qian Cai.
11) Remove NF_NAT_RANGE_PROTO_RANDOM_FULLY branch now that
random port is allocated by default, from Xiaozhou Liu.
12) Ignore NF_NAT_RANGE_PROTO_RANDOM too, from Florian Westphal.
13) Limit port allocation selection routine in NAT to avoid
softlockup splats when most ports are in use, from Florian.
14) Remove unused parameters in nf_ct_l4proto_unregister_sysctl()
from Yafang Shao.
15) Direct call to nf_nat_l4proto_unique_tuple() instead of
indirection, from Florian Westphal.
16) Several patches to remove all layer 4 NAT indirections,
remove nf_nat_l4proto struct, from Florian Westphal.
17) Fix RTP/RTCP source port translation when SNAT is in place,
from Alin Nastac.
18) Selective rule dump per chain, from Phil Sutter.
19) Revisit CLUSTERIP target, this includes a deadlock fix from
netns path, sleep in atomic, remove bogus WARN_ON_ONCE()
and disallow mismatching IP address and MAC address.
Patchset from Taehee Yoo.
20) Update UDP timeout to stream after 2 seconds, from Florian.
21) Shrink UDP established timeout to 120 seconds like TCP timewait.
22) Sysctl knobs to set GRE timeouts, from Yafang Shao.
23) Move seq_print_acct() to conntrack core file, from Florian.
24) Add enum for conntrack sysctl knobs, also from Florian.
25) Place nf_conntrack_acct, nf_conntrack_helper, nf_conntrack_events
and nf_conntrack_timestamp knobs in the core, from Florian Westphal.
As a side effect, shrink netns_ct structure by removing obsolete
sysctl anchors, also from Florian.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To make overflows as obvious as possible and to prevent code from blithely
proceeding with a truncated string. This also has a side-effect to fix a
compilation warning when using GCC 8.2.1.
net/netfilter/ipset/ip_set_core.c: In function 'ip_set_sockfn_get':
net/netfilter/ipset/ip_set_core.c:2027:3: warning: 'strncpy' writing 32 bytes into a region of size 2 overflows the destination [-Wstringop-overflow=]
Signed-off-by: Qian Cai <cai@gmx.us>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
New function added by "Introduction of new commands and protocol
version 7" is not working, since we return skb2 to user
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
In the error handling block, nla_nest_cancel(skb, atd) is called to
cancel the nest operation. But then, ipset_nest_end(skb, atd) is
unexpected called to end the nest operation. This patch calls the
ipset_nest_end only on the branch that nla_nest_cancel is not called.
Fixes: 45040978c8 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that call_rcu()'s callback is not invoked until after bh-disable
regions of code have completed (in addition to explicitly marked
RCU read-side critical sections), call_rcu() can be used in place
of call_rcu_bh(). Similarly, rcu_barrier() can be used in place of
rcu_barrier_bh() and synchronize_rcu() in place of synchronize_rcu_bh().
This commit therefore makes these changes.
Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jozsef Kadlecsik says:
====================
- Introduction of new commands and thus protocol version 7. The
new commands makes possible to eliminate the getsockopt interface
of ipset and use solely netlink to communicate with the kernel.
Due to the strict attribute checking both in user/kernel space,
a new protocol number was introduced. Both the kernel/userspace is
fully backward compatible.
- Make invalid MAC address checks consisten, from Stefano Brivio.
The patch depends on the next one.
- Allow matching on destination MAC address for mac and ipmac sets,
also from Stefano Brivio.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The ip_set() macro is called when either ip_set_ref_lock held only
or no lock/nfnl mutex is held at dumping. Take this into account
properly. Also, use Pablo's suggestion to use rcu_dereference_raw(),
the ref_netlink protects the set.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
ip_set_create() and ip_set_net_init() attempt to allocate physically
contiguous memory for ip_set_list. If memory is fragmented, the
allocations could easily fail:
vzctl: page allocation failure: order:7, mode:0xc0d0
Call Trace:
dump_stack+0x19/0x1b
warn_alloc_failed+0x110/0x180
__alloc_pages_nodemask+0x7bf/0xc60
alloc_pages_current+0x98/0x110
kmalloc_order+0x18/0x40
kmalloc_order_trace+0x26/0xa0
__kmalloc+0x279/0x290
ip_set_net_init+0x4b/0x90 [ip_set]
ops_init+0x3b/0xb0
setup_net+0xbb/0x170
copy_net_ns+0xf1/0x1c0
create_new_namespaces+0xf9/0x180
copy_namespaces+0x8e/0xd0
copy_process+0xb61/0x1a00
do_fork+0x91/0x320
Use kvcalloc() to fallback to 0-order allocations if high order
page isn't available.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Allow /0 as advertised for hash:net,port,net sets.
For "hash:net,port,net", ipset(8) says that "either subnet
is permitted to be a /0 should you wish to match port
between all destinations."
Make that statement true.
Before:
# ipset create cidrzero hash:net,port,net
# ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
ipset v6.34: The value of the CIDR parameter of the IP address is invalid
# ipset create cidrzero6 hash:net,port,net family inet6
# ipset add cidrzero6 ::/0,12345,::/0
ipset v6.34: The value of the CIDR parameter of the IP address is invalid
After:
# ipset create cidrzero hash:net,port,net
# ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
# ipset test cidrzero 192.168.205.129,12345,172.16.205.129
192.168.205.129,tcp:12345,172.16.205.129 is in set cidrzero.
# ipset create cidrzero6 hash:net,port,net family inet6
# ipset add cidrzero6 ::/0,12345,::/0
# ipset test cidrzero6 fe80::1,12345,ff00::1
fe80::1,tcp:12345,ff00::1 is in set cidrzero6.
See also:
https://bugzilla.kernel.org/show_bug.cgi?id=200897df7ff6efb0
Signed-off-by: Eric Westbrook <linux@westbrook.io>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 45040978c8 ("netfilter: ipset: Fix set:list type crash
when flush/dump set in parallel") postponed decreasing set
reference counters to the RCU callback.
An 'ipset del' command can terminate before the RCU grace period
is elapsed, and if sets are listed before then, the reference
counter shown in userspace will be wrong:
# ipset create h hash:ip; ipset create l list:set; ipset add l
# ipset del l h; ipset list h
Name: h
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 88
References: 1
Number of entries: 0
Members:
# sleep 1; ipset list h
Name: h
Type: hash:ip
Revision: 4
Header: family inet hashsize 1024 maxelem 65536
Size in memory: 88
References: 0
Number of entries: 0
Members:
Fix this by making the reference count update synchronous again.
As a result, when sets are listed, ip_set_name_byindex() might
now fetch a set whose reference count is already zero. Instead
of relying on the reference count to protect against concurrent
set renaming, grab ip_set_ref_lock as reader and copy the name,
while holding the same lock in ip_set_rename() as writer
instead.
Reported-by: Li Shuang <shuali@redhat.com>
Fixes: 45040978c8 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Two new commands (IPSET_CMD_GET_BYNAME, IPSET_CMD_GET_BYINDEX) are
introduced. The new commands makes possible to eliminate the getsockopt
operation (in iptables set/SET match/target) and thus use only netlink
communication between userspace and kernel for ipset. With the new
protocol version, userspace can exactly know which functionality is
supported by the running kernel.
Both the kernel and userspace is fully backward compatible.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Set types bitmap:ipmac and hash:ipmac check that MAC addresses
are not all zeroes.
Introduce one missing check, and make the remaining ones
consistent, using is_zero_ether_addr() instead of comparing
against an array containing zeroes.
This was already done for hash:mac sets in commit 26c97c5d8d
("netfilter: ipset: Use is_zero_ether_addr instead of static and
memcmp").
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
There doesn't seem to be any reason to restrict MAC address
matching to source MAC addresses in set types bitmap:ipmac,
hash:ipmac and hash:mac. With this patch, and this setup:
ip netns add A
ip link add veth1 type veth peer name veth2 netns A
ip addr add 192.0.2.1/24 dev veth1
ip -net A addr add 192.0.2.2/24 dev veth2
ip link set veth1 up
ip -net A link set veth2 up
ip netns exec A ipset create test hash:mac
dst=$(ip netns exec A cat /sys/class/net/veth2/address)
ip netns exec A ipset add test ${dst}
ip netns exec A iptables -P INPUT DROP
ip netns exec A iptables -I INPUT -m set --match-set test dst -j ACCEPT
ipset will match packets based on destination MAC address:
# ping -c1 192.0.2.2 >/dev/null
# echo $?
0
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
lockdep_assert_held() is better suited to checking locking requirements,
since it won't get confused when someone else holds the lock. This is
also a step towards possibly removing spin_is_locked().
Signed-off-by: Lance Roy <ldr709@gmail.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: <netfilter-devel@vger.kernel.org>
Cc: <coreteam@netfilter.org>
Cc: <netdev@vger.kernel.org>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Userspace `ipset` command forbids family option for hash:mac type:
ipset create test hash:mac family inet4
ipset v6.30: Unknown argument: `family'
However, this check is not done in kernel itself. When someone use
external netlink applications (pyroute2 python library for example), one
can create hash:mac with invalid family and inconsistant results from
userspace (`ipset` command cannot read set content anymore).
This patch enforce the logic in kernel, and forbids insertion of
hash:mac with a family set.
Since IP_SET_PROTO_UNDEF is defined only for hash:mac, this patch has no
impact on other hash:* sets
Signed-off-by: Florent Fourcot <florent.fourcot@wifirst.fr>
Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Pablo Neira Ayuso says:
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. This batch comes with more input sanitization for xtables to
address bug reports from fuzzers, preparation works to the flowtable
infrastructure and assorted updates. In no particular order, they are:
1) Make sure userspace provides a valid standard target verdict, from
Florian Westphal.
2) Sanitize error target size, also from Florian.
3) Validate that last rule in basechain matches underflow/policy since
userspace assumes this when decoding the ruleset blob that comes
from the kernel, from Florian.
4) Consolidate hook entry checks through xt_check_table_hooks(),
patch from Florian.
5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
very large compat offset arrays, so we have a reasonable upper limit
and fuzzers don't exercise the oom-killer. Patches from Florian.
6) Several WARN_ON checks on xtables mutex helper, from Florian.
7) xt_rateest now has a hashtable per net, from Cong Wang.
8) Consolidate counter allocation in xt_counters_alloc(), from Florian.
9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
from Xin Long.
10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
Felix Fietkau.
11) Consolidate code through flow_offload_fill_dir(), also from Felix.
12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
to remove a dependency with flowtable and ipv6.ko, from Felix.
13) Cache mtu size in flow_offload_tuple object, this is safe for
forwarding as f87c10a8aa describes, from Felix.
14) Rename nf_flow_table.c to nf_flow_table_core.o, to simplify too
modular infrastructure, from Felix.
15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
Ahmed Abdelsalam.
16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.
17) Support for counting only to nf_conncount infrastructure, patch
from Yi-Hung Wei.
18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
to nft_ct.
19) Use boolean as return value from ipt_ah and from IPVS too, patch
from Gustavo A. R. Silva.
20) Remove useless parameters in nfnl_acct_overquota() and
nf_conntrack_broadcast_help(), from Taehee Yoo.
21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.
22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.
23) Fix typo in xt_limit, from Geert Uytterhoeven.
24) Do no use VLAs in Netfilter code, again from Gustavo.
25) Use ADD_COUNTER from ebtables, from Taehee Yoo.
26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.
27) Use pr_*() and add pr_fmt(), from Arushi Singhal.
28) Add synproxy support to ctnetlink.
29) ICMP type and IGMP matching support for ebtables, patches from
Matthias Schiffer.
30) Support for the revision infrastructure to ebtables, from
Bernie Harris.
31) String match support for ebtables, also from Bernie.
32) Documentation for the new flowtable infrastructure.
33) Use generic comparison functions in ebt_stp, from Joe Perches.
34) Demodularize filter chains in nftables.
35) Register conntrack hooks in case nftables NAT chain is added.
36) Merge assignments with return in a couple of spots in the
Netfilter codebase, also from Arushi.
37) Document that xtables percpu counters are stored in the same
memory area, from Ben Hutchings.
38) Revert mark_source_chains() sanity checks that break existing
rulesets, from Florian Westphal.
39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
To make the test a bit clearer and to reduce object size a little.
Miscellanea:
o remove now unnecessary static const array
$ size ip_set_hash_mac.o*
text data bss dec hex filename
22822 4619 64 27505 6b71 ip_set_hash_mac.o.allyesconfig.new
22932 4683 64 27679 6c1f ip_set_hash_mac.o.allyesconfig.old
10443 1040 0 11483 2cdb ip_set_hash_mac.o.defconfig.new
10507 1040 0 11547 2d1b ip_set_hash_mac.o.defconfig.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Synchronous pernet_operations are not allowed anymore.
All are asynchronous. So, drop the structure member.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to watch drivers, which depend on this config.
All of them are in net/netfilter/ipset directory,
except of net/netfilter/xt_set.c. There are no
more drivers, which use ip_set, and all of
the above don't register another pernet_operations.
Also, there are is no indirect users, as header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:
#ifdef CONFIG_IP_SET
extern func(void);
#else
static inline func(void);
#endif
So, there are no more pernet operations, dereferencing
net_generic(net, ip_set_net_id).
ip_set_net_ops are OK to be executed in parallel
for several net, so we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix OOM that syskaller triggers with ipt_replace.size = -1 and
IPT_SO_SET_REPLACE socket option, from Dmitry Vyukov.
2) Check for too long extension name in xt_request_find_{match|target}
that result in out-of-bound reads, from Eric Dumazet.
3) Fix memory exhaustion bug in ipset hash:*net* types when adding ranges
that look like x.x.x.x-255.255.255.255, from Jozsef Kadlecsik.
4) Fix pointer leaks to userspace in x_tables, from Dmitry Vyukov.
5) Insufficient sanity checks in clusterip_tg_check(), also from Dmitry.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix wraparound bug which could lead to memory exhaustion when adding an
x.x.x.x-255.255.255.255 range to any hash:*net* types.
Fixes Netfilter's bugzilla id #1212, reported by Thomas Schwark.
Fixes: 48596a8ddc ("netfilter: ipset: Fix adding an IPv4 range containing more than 2^31 addresses")
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Several reasons for this:
* Several modules maintain internal version numbers, that they print at
boot/module load time, that are not exposed to userspace, as a
primitive mechanism to make revision number control from the earlier
days of Netfilter.
* IPset shows the protocol version at boot/module load time, instead
display this via module description, as Jozsef suggested.
* Remove copyright notice at boot/module load time in two spots, the
Netfilter codebase is a collective development effort, if we would
have to display copyrights for each contributor at boot/module load
time for each extensions we have, we would probably fill up logs with
lots of useless information - from a technical standpoint.
So let's be consistent and remove them all.
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Patch "netfilter: ipset: use nfnl_mutex_is_locked" is added the real
mutex locking check, which revealed the missing locking in ip_set_net_exit().
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Reported-by: syzbot+36b06f219f2439fe62e1@syzkaller.appspotmail.com
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The matching of the counters was not taken into account, fixed.
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Make use of the swap macro and remove unnecessary variables tmp.
This makes the code easier to read and maintain.
This code was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
When sets are extremely large we can get softlockup during ipset -L.
We could fix this by adding cond_resched_rcu() at the right location
during iteration, but this only works if RCU nesting depth is 1.
At this time entire variant->list() is called under under rcu_read_lock_bh.
This used to be a read_lock_bh() but as rcu doesn't really lock anything,
it does not appear to be needed, so remove it (ipset increments set
reference count before this, so a set deletion should not be possible).
Reported-by: Li Shuang <shuali@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>