OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Lorenzo Bianconi	0fabd2aa19	net: netfilter: add bpf_ct_set_nat_info kfunc helper Introduce bpf_ct_set_nat_info kfunc helper in order to set source and destination nat addresses/ports in a new allocated ct entry not inserted in the connection tracking table yet. Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/9567db2fdfa5bebe7b7cc5870f7a34549418b4fc.1663778601.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-21 19:25:26 -07:00
Daniel Xu	fdf214978a	bpf: Move nf_conn extern declarations to filter.h We're seeing the following new warnings on netdev/build_32bit and netdev/build_allmodconfig_warn CI jobs: ../net/core/filter.c:8608:1: warning: symbol 'nf_conn_btf_access_lock' was not declared. Should it be static? ../net/core/filter.c:8611:5: warning: symbol 'nfct_bsa' was not declared. Should it be static? Fix by ensuring extern declaration is present while compiling filter.o. Fixes: `864b656f82` ("bpf: Add support for writing to nf_conn:mark") Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/2bd2e0283df36d8a4119605878edb1838d144174.1663683114.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2022-09-20 14:41:35 -07:00
Daniel Xu	5a090aa350	bpf: Rename nfct_bsa to nfct_btf_struct_access The former name was a little hard to guess. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/73adc72385c8b162391fbfb404f0b6d4c5cc55d7.1663683114.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2022-09-20 14:30:34 -07:00
Daniel Xu	864b656f82	bpf: Add support for writing to nf_conn:mark Support direct writes to nf_conn:mark from TC and XDP prog types. This is useful when applications want to store per-connection metadata. This is also particularly useful for applications that run both bpf and iptables/nftables because the latter can trivially access this metadata. One example use case would be if a bpf prog is responsible for advanced packet classification and iptables/nftables is later used for routing due to pre-existing/legacy code. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/r/ebca06dea366e3e7e861c12f375a548cc4c61108.1662568410.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-09-10 17:27:32 -07:00
Kees Cook	710d21fdff	netlink: Bounds-check struct nlmsgerr creation In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(), switch from __nlmsg_put to nlmsg_put(), and explain the bounds check for dealing with the memcpy() across a composite flexible array struct. Avoids this future run-time warning: memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16) Cc: Jakub Kicinski <kuba@kernel.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: syzbot <syzkaller@googlegroups.com> Cc: netfilter-devel@vger.kernel.org Cc: coreteam@netfilter.org Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org Signed-off-by: David S. Miller <davem@davemloft.net>	2022-09-05 14:45:22 +01:00
Jakub Kicinski	9c5d03d362	genetlink: start to validate reserved header bytes We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-29 12:47:15 +01:00
Jakub Kicinski	880b0dd94f	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/mellanox/mlx5/core/en_fs.c `21234e3a84` ("net/mlx5e: Fix use after free in mlx5e_fs_init()") `c7eafc5ed0` ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer") https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/ https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-25 16:07:42 -07:00
Jakub Kicinski	24c7a64ea4	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net 1) Fix crash with malformed ebtables blob which do not provide all entry points, from Florian Westphal. 2) Fix possible TCP connection clogging up with default 5-days timeout in conntrack, from Florian. 3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian. 4) Do not allow to update implicit chains. 5) Make table handle allocation per-netns to fix data race. 6) Do not truncated payload length and offset, and checksum offset. Instead report EINVAl. 7) Enable chain stats update via static key iff no error occurs. 8) Restrict osf expression to ip, ip6 and inet families. 9) Restrict tunnel expression to netdev family. 10) Fix crash when trying to bind again an already bound chain. 11) Flowtable garbage collector might leave behind pending work to delete entries. This patch comes with a previous preparation patch as dependency. 12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases netfilter: flowtable: fix stuck flows on cleanup due to pending work netfilter: flowtable: add function to invoke garbage collection immediately netfilter: nf_tables: disallow binding to already bound chain netfilter: nft_tunnel: restrict it to netdev family netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families netfilter: nf_tables: do not leave chain stats enabled on error netfilter: nft_payload: do not truncate csum_offset and csum_type netfilter: nft_payload: report ERANGE for too long offset and length netfilter: nf_tables: make table handle allocation per-netns friendly netfilter: nf_tables: disallow updates of implicit chain netfilter: nft_tproxy: restrict to prerouting hook netfilter: conntrack: work around exceeded receive window netfilter: ebtables: reject blobs that don't provide all entry points ==================== Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-24 19:18:10 -07:00
Kuniyuki Iwashima	1227c1771d	net: Fix data-races around sysctl_[rw]mem_(max\|default). While reading sysctl_[rw]mem_(max\|default), they can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: `1da177e4c3` ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-24 13:46:57 +01:00
Pablo Neira Ayuso	9afb4b2734	netfilter: flowtable: fix stuck flows on cleanup due to pending work To clear the flow table on flow table free, the following sequence normally happens in order: 1) gc_step work is stopped to disable any further stats/del requests. 2) All flow table entries are set to teardown state. 3) Run gc_step which will queue HW del work for each flow table entry. 4) Waiting for the above del work to finish (flush). 5) Run gc_step again, deleting all entries from the flow table. 6) Flow table is freed. But if a flow table entry already has pending HW stats or HW add work step 3 will not queue HW del work (it will be skipped), step 4 will wait for the pending add/stats to finish, and step 5 will queue HW del work which might execute after freeing of the flow table. To fix the above, this patch flushes the pending work, then it sets the teardown flag to all flows in the flowtable and it forces a garbage collector run to queue work to remove the flows from hardware, then it flushes this new pending work and (finally) it forces another garbage collector run to remove the entry from the software flowtable. Stack trace: [47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460 [47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704 [47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2 [47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009) [47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table] [47773.889727] Call Trace: [47773.890214] dump_stack+0xbb/0x107 [47773.890818] print_address_description.constprop.0+0x18/0x140 [47773.892990] kasan_report.cold+0x7c/0xd8 [47773.894459] kasan_check_range+0x145/0x1a0 [47773.895174] down_read+0x99/0x460 [47773.899706] nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table] [47773.907137] flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table] [47773.913372] process_one_work+0x8ac/0x14e0 [47773.921325] [47773.921325] Allocated by task 592159: [47773.922031] kasan_save_stack+0x1b/0x40 [47773.922730] __kasan_kmalloc+0x7a/0x90 [47773.923411] tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct] [47773.924363] tcf_ct_init+0x71c/0x1156 [act_ct] [47773.925207] tcf_action_init_1+0x45b/0x700 [47773.925987] tcf_action_init+0x453/0x6b0 [47773.926692] tcf_exts_validate+0x3d0/0x600 [47773.927419] fl_change+0x757/0x4a51 [cls_flower] [47773.928227] tc_new_tfilter+0x89a/0x2070 [47773.936652] [47773.936652] Freed by task 543704: [47773.937303] kasan_save_stack+0x1b/0x40 [47773.938039] kasan_set_track+0x1c/0x30 [47773.938731] kasan_set_free_info+0x20/0x30 [47773.939467] __kasan_slab_free+0xe7/0x120 [47773.940194] slab_free_freelist_hook+0x86/0x190 [47773.941038] kfree+0xce/0x3a0 [47773.941644] tcf_ct_flow_table_cleanup_work Original patch description and stack trace by Paul Blakey. Fixes: `c29f74e0df` ("netfilter: nf_flow_table: hardware offload support") Reported-by: Paul Blakey <paulb@nvidia.com> Tested-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	759eebbcfa	netfilter: flowtable: add function to invoke garbage collection immediately Expose nf_flow_table_gc_run() to force a garbage collector run from the offload infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	e02f0d3970	netfilter: nf_tables: disallow binding to already bound chain Update nft_data_init() to report EINVAL if chain is already bound. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Reported-by: Gwangun Jung <exsociety@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	01e4092d53	netfilter: nft_tunnel: restrict it to netdev family Only allow to use this expression from NFPROTO_NETDEV family. Fixes: `af308b94a2` ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	5f3b7aae14	netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families As it was originally intended, restrict extension to supported families. Fixes: `b96af92d6e` ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	43eb8949cf	netfilter: nf_tables: do not leave chain stats enabled on error Error might occur later in the nf_tables_addchain() codepath, enable static key only after transaction has been created. Fixes: `9f08ea8481` ("netfilter: nf_tables: keep chain counters away from hot path") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	7044ab281f	netfilter: nft_payload: do not truncate csum_offset and csum_type Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support. Fixes: `7ec3f7b47b` ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso	94254f990c	netfilter: nft_payload: report ERANGE for too long offset and length Instead of offset and length are truncation to u8, report ERANGE. Fixes: `96518518cc` ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso	ab482c6b66	netfilter: nf_tables: make table handle allocation per-netns friendly mutex is per-netns, move table_netns to the pernet area. read-write to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0: nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline] nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline] nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921 Fixes: `f102d66b33` ("netfilter: nf_tables: use dedicated mutex to guard transactions") Reported-by: Abhishek Shah <abhishek.shah@columbia.edu> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso	5dc52d83ba	netfilter: nf_tables: disallow updates of implicit chain Updates on existing implicit chain make no sense, disallow this. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-24 07:43:20 +02:00
Florian Westphal	18bbc32133	netfilter: nft_tproxy: restrict to prerouting hook TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this. This fixes a crash (null dereference) when using tproxy from e.g. output. Fixes: `4ed8eb6570` ("netfilter: nf_tables: Add native tproxy support") Reported-by: Shell Chen <xierch@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 21:24:34 +02:00
Florian Westphal	cf97769c76	netfilter: conntrack: work around exceeded receive window When a TCP sends more bytes than allowed by the receive window, all future packets can be marked as invalid. This can clog up the conntrack table because of 5-day default timeout. Sequence of packets: 01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1] 02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8] 03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0 04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68 Window is 5840 starting from 33212 -> 39052. 05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0 06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 This is fine, conntrack will flag the connection as having outstanding data (UNACKED), which lowers the conntrack timeout to 300s. 07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318 Packet 10 is already sending more than permitted, but conntrack doesn't validate this (only seq is tested vs. maxend, not 'seq+len'). 38484 is acceptable, but only up to 39052, so this packet should not have been sent (or only 568 bytes, not 1318). At this point, connection is still in '300s' mode. Next packet however will get flagged: 11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326 nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH .. Now, a couple of replies/acks comes in: 12 initiator > responder: [.], ack 34530, win 4368, [.. irrelevant acks removed ] 16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0 This ack is significant -- this acks the last packet send by the responder that conntrack considered valid. This means that ack == td_end. This will withdraw the 'unacked data' flag, the connection moves back to the 5-day timeout of established conntracks. 17 initiator > responder: ack 40128, win 10030, ... This packet is also flagged as invalid. Because conntrack only updates state based on packets that are considered valid, packet 11 'did not exist' and that gets us: nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG Because this received and processed by the endpoints, the conntrack entry remains in a bad state, no packets will ever be considered valid again: 30 responder > initiator: [F.], seq 40432, ack 2045, win 391, .. 31 initiator > responder: [.], ack 40433, win 11348, .. 32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 .. ... all trigger 'ACK is over bound' test and we end up with non-early-evictable 5-day default timeout. NB: This patch triggers a bunch of checkpatch warnings because of silly indent. I will resend the cleanup series linked below to reduce the indent level once this change has propagated to net-next. I could route the cleanup via nf but that causes extra backport work for stable maintainers. Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8 Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-23 18:26:48 +02:00
Stephen Hemminger	1202cdd665	Remove DECnet support from kernel DECnet is an obsolete network protocol that receives more attention from kernel janitors than users. It belongs in computer protocol history museum not in Linux kernel. It has been "Orphaned" in kernel since 2010. The iproute2 support for DECnet was dropped in 5.0 release. The documentation link on Sourceforge says it is abandoned there as well. Leave the UAPI alone to keep userspace programs compiling. This means that there is still an empty neighbour table for AF_DECNET. The table of /proc/sys/net entries was updated to match current directories and reformatted to be alphabetical. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Acked-by: David Ahern <dsahern@kernel.org> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-08-22 14:26:30 +01:00
Jakub Kicinski	268603d79c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-18 21:17:10 -07:00
Jakub Kicinski	3f5f728a72	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Andrii Nakryiko says: ==================== bpf-next 2022-08-17 We've added 45 non-merge commits during the last 14 day(s) which contain a total of 61 files changed, 986 insertions(+), 372 deletions(-). The main changes are: 1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt Kanzenbach and Jesper Dangaard Brouer. 2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko. 3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov. 4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires. 5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle unsupported names on old kernels, from Hangbin Liu. 6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton, from Hao Luo. 7) Relax libbpf's requirement for shared libs to be marked executable, from Henqgi Chen. 8) Improve bpf_iter internals handling of error returns, from Hao Luo. 9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard. 10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong. 11) bpftool improvements to handle full BPF program names better, from Manu Bretelle. 12) bpftool fixes around libcap use, from Quentin Monnet. 13) BPF map internals clean ups and improvements around memory allocations, from Yafang Shao. 14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup iterator to work on cgroupv1, from Yosry Ahmed. 15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong. 16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) selftests/bpf: Few fixes for selftests/bpf built in release mode libbpf: Clean up deprecated and legacy aliases libbpf: Streamline bpf_attr and perf_event_attr initialization libbpf: Fix potential NULL dereference when parsing ELF selftests/bpf: Tests libbpf autoattach APIs libbpf: Allows disabling auto attach selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm libbpf: Making bpf_prog_load() ignore name if kernel doesn't support selftests/bpf: Update CI kconfig selftests/bpf: Add connmark read test selftests/bpf: Add existing connection bpf_*_ct_lookup() test bpftool: Clear errno after libcap's checks bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation bpftool: Fix a typo in a comment libbpf: Add names for auxiliary maps bpf: Use bpf_map_area_alloc consistently on bpf map creation bpf: Make __GFP_NOWARN consistent in bpf map creation bpf: Use bpf_map_area_free instread of kvfree bpf: Remove unneeded memset in queue_stack_map creation libbpf: preserve errno across pr_warn/pr_info/pr_debug ... ==================== Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-17 20:29:36 -07:00
Geert Uytterhoeven	aa5762c342	netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y NF_CONNTRACK_PROCFS was marked obsolete in commit `54b07dca68` ("netfilter: provide config option to disable ancient procfs parts") in v3.3. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-17 08:46:30 +02:00
Pablo Neira Ayuso	1b6345d416	netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified Since `f3a2181e16` ("netfilter: nf_tables: Support for sets with multiple ranged fields"), it possible to combine intervals and concatenations. Later on, `ef516e8625` ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag for userspace to report that the set stores a concatenation. Make sure NFT_SET_CONCAT is set on if field_count is specified for consistency. Otherwise, if NFT_SET_CONCAT is specified with no field_count, bail out with EINVAL. Fixes: `ef516e8625` ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-15 18:41:21 +02:00
Pablo Neira Ayuso	fc0ae524b5	netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END These flags are mutually exclusive, report EINVAL in this case. Fixes: `aaa31047a6` ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-15 17:54:59 +02:00
Pablo Neira Ayuso	88cccd908d	netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags If the NFT_SET_CONCAT\|NFT_SET_INTERVAL flags are set on, then the netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise, NFTA_SET_ELEM_KEY_END should not be present. For catch-all element, NFTA_SET_ELEM_KEY_END should not be present. The NFT_SET_ELEM_INTERVAL_END is never used with this set flags combination. Fixes: `7b225d0b5c` ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-15 17:54:51 +02:00
Pablo Neira Ayuso	5a2f3dc318	netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag If the NFTA_SET_ELEM_OBJREF netlink attribute is present and NFT_SET_OBJECT flag is set on, report EINVAL. Move existing sanity check earlier to validate that NFT_SET_OBJECT requires NFTA_SET_ELEM_OBJREF. Fixes: `8aeff920dc` ("netfilter: nf_tables: add stateful object reference to set elements") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-12 16:32:16 +02:00
Pablo Neira Ayuso	271c5ca826	netfilter: nf_tables: really skip inactive sets when allocating name While looping to build the bitmap of used anonymous set names, check the current set in the iteration, instead of the one that is being created. Fixes: `37a9cc5255` ("netfilter: nf_tables: add generation mask to sets") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 18:53:48 +02:00
Florian Westphal	0b2f3212b5	netfilter: nfnetlink: re-enable conntrack expectation events To avoid allocation of the conntrack extension area when possible, the default behaviour was changed to only allocate the event extension if a userspace program is subscribed to a notification group. Problem is that while 'conntrack -E' does enable the event allocation behind the scenes, 'conntrack -E expect' does not: no expectation events are delivered unless user sets "net.netfilter.nf_conntrack_events" back to 1 (always on). Fix the autodetection to also consider EXP type group. We need to track the 6 event groups (3+3, new/update/destroy for events and for expectations each) independently, else we'd disable events again if an expectation group becomes empty while there is still an active event group. Fixes: `2794cdb0b9` ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist") Reported-by: Yi Chen <yiche@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2022-08-11 18:09:54 +02:00
Florian Westphal	2024439bd5	netfilter: nf_tables: fix scheduling-while-atomic splat nf_tables_check_loops() can be called from rhashtable list walk so cond_resched() cannot be used here. Fixes: `81ea010667` ("netfilter: nf_tables: add rescheduling points during loop detection walks") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 17:57:28 +02:00
Florian Westphal	976bf59c69	netfilter: nf_ct_irc: cap packet search space to 4k This uses a pseudo-linearization scheme with a 64k global buffer, but BIG TCP arrival means IPv6 TCP stack can generate skbs that exceed this size. In practice, IRC commands are not expected to exceed 512 bytes, plus this is interactive protocol, so we should not see large packets in practice. Given most IRC connections nowadays use TLS so this helper could also be removed in the near future. Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:51:13 +02:00
Florian Westphal	c783a29c7e	netfilter: nf_ct_ftp: prefer skb_linearize This uses a pseudo-linearization scheme with a 64k global buffer, but BIG TCP arrival means IPv6 TCP stack can generate skbs that exceed this size. Use skb_linearize. It should be possible to rewrite this to properly deal with segmented skbs (i.e., only do small chunk-wise accesses), but this is going to be a lot more intrusive than this because every helper function needs to get the sk_buff instead of a pointer to a raw data buffer. In practice, provided we're really looking at FTP control channel packets, there should never be a case where we deal with huge packets. Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:51:01 +02:00
Florian Westphal	f3e124c36f	netfilter: nf_ct_h323: cap packet size at 64k With BIG TCP, packets generated by tcp stack may exceed 64kb. Cap datalen at 64kb. The internal message format uses 16bit fields, so no embedded message can exceed 64k size. Multiple h323 messages in a single superpacket may now result in a message to get treated as incomplete/truncated, but thats better than scribbling past h323_buffer. Another alternative suitable for net tree would be a switch to skb_linearize(). Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:50:49 +02:00
Florian Westphal	a664375da7	netfilter: nf_ct_sane: remove pseudo skb linearization For historical reason this code performs pseudo linearization of skbs via skb_header_pointer and a global 64k buffer. With arrival of BIG TCP, packets generated by TCP stack can exceed 64kb. Rewrite this to only extract the needed header data. This also allows to get rid of the locking. Fixes: `7c4e983c4f` ("net: allow gso_max_size to exceed 65536") Fixes: `0fe79f28bf` ("net: allow gro_max_size to exceed 65536") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-11 16:50:25 +02:00
Pablo Neira Ayuso	c485c35ff6	netfilter: nf_tables: possible module reference underflow in error path dst->ops is set on when nft_expr_clone() fails, but module refcount has not been bumped yet, therefore nft_expr_destroy() leads to module reference underflow. Fixes: `8cfd9b0f85` ("netfilter: nftables: generalize set expressions support") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso	4963674c2e	netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag These are mutually exclusive, actually NFTA_SET_ELEM_KEY_END replaces the flag notation. Fixes: `7b225d0b5c` ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso	3400278328	netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access The generation ID is bumped from the commit path while holding the mutex, however, netlink dump operations rely on RCU. This patch also adds missing cb->base_eq initialization in nf_tables_dump_set(). Fixes: `38e029f14a` ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-10 17:06:05 +02:00
Florian Westphal	580077855a	netfilter: nf_tables: fix null deref due to zeroed list head In nf_tables_updtable, if nf_tables_table_enable returns an error, nft_trans_destroy is called to free the transaction object. nft_trans_destroy() calls list_del(), but the transaction was never placed on a list -- the list head is all zeroes, this results in a null dereference: BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59 Call Trace: nft_trans_destroy+0x26/0x59 nf_tables_newtable+0x4bc/0x9bc [..] Its sane to assume that nft_trans_destroy() can be called on the transaction object returned by nft_trans_alloc(), so make sure the list head is initialised. Fixes: `55dd6f9307` ("netfilter: nf_tables: use new transaction infrastructure to handle table") Reported-by: mingi cho <mgcho.minic@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:30 +02:00
Pablo Neira Ayuso	f323ef3a0d	netfilter: nf_tables: disallow jump to implicit chain from set element Extend struct nft_data_desc to add a flag field that specifies nft_data_init() is being called for set element data. Use it to disallow jump to implicit chain from set element, only jump to chain via immediate expression is allowed. Fixes: `d0e2c7de92` ("netfilter: nf_tables: add NFT_CHAIN_BINDING") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso	341b694160	netfilter: nf_tables: upfront validation of data via nft_data_init() Instead of parsing the data and then validate that type and length are correct, pass a description of the expected data so it can be validated upfront before parsing it to bail out earlier. This patch adds a new .size field to specify the maximum size of the data area. The .len field is optional and it is used as an input/output field, it provides the specific length of the expected data in the input path. If then .len field is not specified, then obtained length from the netlink attribute is stored. This is required by cmp, bitwise, range and immediate, which provide no netlink attribute that describes the data length. The immediate expression uses the destination register type to infer the expected data type. Relying on opencoded validation of the expected data might lead to subtle bugs as described in `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data"). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 20:13:29 +02:00
Thadeu Lima de Souza Cascardo	36d5b29132	netfilter: nf_tables: do not allow RULE_ID to refer to another chain When doing lookups for rules on the same batch by using its ID, a rule from a different chain can be used. If a rule is added to a chain but tries to be positioned next to a rule from a different chain, it will be linked to chain2, but the use counter on chain1 would be the one to be incremented. When looking for rules by ID, use the chain that was used for the lookup by name. The chain used in the context copied to the transaction needs to match that same chain. That way, struct nft_rule does not need to get enlarged with another member. Fixes: `1a94e38d25` ("netfilter: nf_tables: add NFTA_RULE_ID attribute") Fixes: `75dd48e2e4` ("netfilter: nf_tables: Support RULE_ID reference in new rule") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:18 +02:00
Thadeu Lima de Souza Cascardo	95f466d223	netfilter: nf_tables: do not allow CHAIN_ID to refer to another table When doing lookups for chains on the same batch by using its ID, a chain from a different table can be used. If a rule is added to a table but refers to a chain in a different table, it will be linked to the chain in table2, but would have expressions referring to objects in table1. Then, when table1 is removed, the rule will not be removed as its linked to a chain in table2. When expressions in the rule are processed or removed, that will lead to a use-after-free. When looking for chains by ID, use the table that was used for the lookup by name, and only return chains belonging to that same table. Fixes: `837830a4b4` ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:17 +02:00
Thadeu Lima de Souza Cascardo	470ee20e06	netfilter: nf_tables: do not allow SET_ID to refer to another table When doing lookups for sets on the same batch by using its ID, a set from a different table can be used. Then, when the table is removed, a reference to the set may be kept after the set is freed, leading to a potential use-after-free. When looking for sets by ID, use the table that was used for the lookup by name, and only return sets belonging to that same table. This fixes CVE-2022-2586, also reported as ZDI-CAN-17470. Reported-by: Team Orca of Sea Security (@seasecresponse) Fixes: `958bee14d0` ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: <stable@vger.kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:17 +02:00
Pablo Neira Ayuso	34aae2c2fb	netfilter: nf_tables: validate variable length element extension Update template to validate variable length extensions. This patch adds a new .ext_len[id] field to the template to store the expected extension length. This is used to sanity check the initialization of the variable length extension. Use PTR_ERR() in nft_set_elem_init() to report errors since, after this update, there are two reason why this might fail, either because of ENOMEM or insufficient room in the extension field (EINVAL). Kernels up until `7e6bc1f6ca` ("netfilter: nf_tables: stricter validation of element data") allowed to copy more data to the extension than was allocated. This ext_len field allows to validate if the destination has the correct size as additional check. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2022-08-09 19:38:16 +02:00
Kumar Kartikeya Dwivedi	6e116280b4	net: netfilter: Remove ifdefs for code shared by BPF and ctnetlink The current ifdefry for code shared by the BPF and ctnetlink side looks ugly. As per Pablo's request, simplify this by unconditionally compiling in the code. This can be revisited when the shared code between the two grows further. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220725085130.11553-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2022-08-09 09:41:54 -07:00
Pablo Neira Ayuso	b06ada6df9	netfilter: flowtable: fix incorrect Kconfig dependencies Remove default to 'y', this infrastructure is not fundamental for the flowtable operational. Add a missing dependency on CONFIG_NF_FLOW_TABLE. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Fixes: `b038177636` ("netfilter: nf_flow_table: count pending offload workqueue tasks") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-05 18:50:15 -07:00
Florian Westphal	399a14ec79	netfilter: nf_tables: fix crash when nf_trace is enabled do not access info->pkt when info->trace is not 1. nft_traceinfo is not initialized, except when tracing is enabled. The 'nft_trace_enabled' static key cannot be used for this, we must always check info->trace first. Pass nft_pktinfo directly to avoid this. Fixes: `e34b9ed96c` ("netfilter: nf_tables: avoid skb access on nf_stolen") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-08-05 18:50:14 -07:00
Jakub Kicinski	272ac32f56	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-07-28 18:21:16 -07:00

1 2 3 4 5 ...

6128 Commits