Commit Graph

6128 Commits

Author SHA1 Message Date
Lorenzo Bianconi 0fabd2aa19 net: netfilter: add bpf_ct_set_nat_info kfunc helper
Introduce bpf_ct_set_nat_info kfunc helper in order to set source and
destination nat addresses/ports in a new allocated ct entry not inserted
in the connection tracking table yet.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/9567db2fdfa5bebe7b7cc5870f7a34549418b4fc.1663778601.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-21 19:25:26 -07:00
Daniel Xu fdf214978a bpf: Move nf_conn extern declarations to filter.h
We're seeing the following new warnings on netdev/build_32bit and
netdev/build_allmodconfig_warn CI jobs:

    ../net/core/filter.c:8608:1: warning: symbol
    'nf_conn_btf_access_lock' was not declared. Should it be static?
    ../net/core/filter.c:8611:5: warning: symbol 'nfct_bsa' was not
    declared. Should it be static?

Fix by ensuring extern declaration is present while compiling filter.o.

Fixes: 864b656f82 ("bpf: Add support for writing to nf_conn:mark")
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/2bd2e0283df36d8a4119605878edb1838d144174.1663683114.git.dxu@dxuuu.xyz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-20 14:41:35 -07:00
Daniel Xu 5a090aa350 bpf: Rename nfct_bsa to nfct_btf_struct_access
The former name was a little hard to guess.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/73adc72385c8b162391fbfb404f0b6d4c5cc55d7.1663683114.git.dxu@dxuuu.xyz
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2022-09-20 14:30:34 -07:00
Daniel Xu 864b656f82 bpf: Add support for writing to nf_conn:mark
Support direct writes to nf_conn:mark from TC and XDP prog types. This
is useful when applications want to store per-connection metadata. This
is also particularly useful for applications that run both bpf and
iptables/nftables because the latter can trivially access this metadata.

One example use case would be if a bpf prog is responsible for advanced
packet classification and iptables/nftables is later used for routing
due to pre-existing/legacy code.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/ebca06dea366e3e7e861c12f375a548cc4c61108.1662568410.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-09-10 17:27:32 -07:00
Kees Cook 710d21fdff netlink: Bounds-check struct nlmsgerr creation
In preparation for FORTIFY_SOURCE doing bounds-check on memcpy(),
switch from __nlmsg_put to nlmsg_put(), and explain the bounds check
for dealing with the memcpy() across a composite flexible array struct.
Avoids this future run-time warning:

  memcpy: detected field-spanning write (size 32) of single field "&errmsg->msg" at net/netlink/af_netlink.c:2447 (size 16)

Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Jozsef Kadlecsik <kadlec@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: syzbot <syzkaller@googlegroups.com>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220901071336.1418572-1-keescook@chromium.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-09-05 14:45:22 +01:00
Jakub Kicinski 9c5d03d362 genetlink: start to validate reserved header bytes
We had historically not checked that genlmsghdr.reserved
is 0 on input which prevents us from using those precious
bytes in the future.

One use case would be to extend the cmd field, which is
currently just 8 bits wide and 256 is not a lot of commands
for some core families.

To make sure that new families do the right thing by default
put the onus of opting out of validation on existing families.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-29 12:47:15 +01:00
Jakub Kicinski 880b0dd94f Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ethernet/mellanox/mlx5/core/en_fs.c
  21234e3a84 ("net/mlx5e: Fix use after free in mlx5e_fs_init()")
  c7eafc5ed0 ("net/mlx5e: Convert ethtool_steering member of flow_steering struct to pointer")
https://lore.kernel.org/all/20220825104410.67d4709c@canb.auug.org.au/
https://lore.kernel.org/all/20220823055533.334471-1-saeed@kernel.org/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-25 16:07:42 -07:00
Jakub Kicinski 24c7a64ea4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

1) Fix crash with malformed ebtables blob which do not provide all
   entry points, from Florian Westphal.

2) Fix possible TCP connection clogging up with default 5-days
   timeout in conntrack, from Florian.

3) Fix crash in nf_tables tproxy with unsupported chains, also from Florian.

4) Do not allow to update implicit chains.

5) Make table handle allocation per-netns to fix data race.

6) Do not truncated payload length and offset, and checksum offset.
   Instead report EINVAl.

7) Enable chain stats update via static key iff no error occurs.

8) Restrict osf expression to ip, ip6 and inet families.

9) Restrict tunnel expression to netdev family.

10) Fix crash when trying to bind again an already bound chain.

11) Flowtable garbage collector might leave behind pending work to
    delete entries. This patch comes with a previous preparation patch
    as dependency.

12) Allow net.netfilter.nf_conntrack_frag6_high_thresh to be lowered,
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: nf_defrag_ipv6: allow nf_conntrack_frag6_high_thresh increases
  netfilter: flowtable: fix stuck flows on cleanup due to pending work
  netfilter: flowtable: add function to invoke garbage collection immediately
  netfilter: nf_tables: disallow binding to already bound chain
  netfilter: nft_tunnel: restrict it to netdev family
  netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
  netfilter: nf_tables: do not leave chain stats enabled on error
  netfilter: nft_payload: do not truncate csum_offset and csum_type
  netfilter: nft_payload: report ERANGE for too long offset and length
  netfilter: nf_tables: make table handle allocation per-netns friendly
  netfilter: nf_tables: disallow updates of implicit chain
  netfilter: nft_tproxy: restrict to prerouting hook
  netfilter: conntrack: work around exceeded receive window
  netfilter: ebtables: reject blobs that don't provide all entry points
====================

Link: https://lore.kernel.org/r/20220824220330.64283-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-24 19:18:10 -07:00
Kuniyuki Iwashima 1227c1771d net: Fix data-races around sysctl_[rw]mem_(max|default).
While reading sysctl_[rw]mem_(max|default), they can be changed
concurrently.  Thus, we need to add READ_ONCE() to its readers.

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-24 13:46:57 +01:00
Pablo Neira Ayuso 9afb4b2734 netfilter: flowtable: fix stuck flows on cleanup due to pending work
To clear the flow table on flow table free, the following sequence
normally happens in order:

  1) gc_step work is stopped to disable any further stats/del requests.
  2) All flow table entries are set to teardown state.
  3) Run gc_step which will queue HW del work for each flow table entry.
  4) Waiting for the above del work to finish (flush).
  5) Run gc_step again, deleting all entries from the flow table.
  6) Flow table is freed.

But if a flow table entry already has pending HW stats or HW add work
step 3 will not queue HW del work (it will be skipped), step 4 will wait
for the pending add/stats to finish, and step 5 will queue HW del work
which might execute after freeing of the flow table.

To fix the above, this patch flushes the pending work, then it sets the
teardown flag to all flows in the flowtable and it forces a garbage
collector run to queue work to remove the flows from hardware, then it
flushes this new pending work and (finally) it forces another garbage
collector run to remove the entry from the software flowtable.

Stack trace:
[47773.882335] BUG: KASAN: use-after-free in down_read+0x99/0x460
[47773.883634] Write of size 8 at addr ffff888103b45aa8 by task kworker/u20:6/543704
[47773.885634] CPU: 3 PID: 543704 Comm: kworker/u20:6 Not tainted 5.12.0-rc7+ #2
[47773.886745] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)
[47773.888438] Workqueue: nf_ft_offload_del flow_offload_work_handler [nf_flow_table]
[47773.889727] Call Trace:
[47773.890214]  dump_stack+0xbb/0x107
[47773.890818]  print_address_description.constprop.0+0x18/0x140
[47773.892990]  kasan_report.cold+0x7c/0xd8
[47773.894459]  kasan_check_range+0x145/0x1a0
[47773.895174]  down_read+0x99/0x460
[47773.899706]  nf_flow_offload_tuple+0x24f/0x3c0 [nf_flow_table]
[47773.907137]  flow_offload_work_handler+0x72d/0xbe0 [nf_flow_table]
[47773.913372]  process_one_work+0x8ac/0x14e0
[47773.921325]
[47773.921325] Allocated by task 592159:
[47773.922031]  kasan_save_stack+0x1b/0x40
[47773.922730]  __kasan_kmalloc+0x7a/0x90
[47773.923411]  tcf_ct_flow_table_get+0x3cb/0x1230 [act_ct]
[47773.924363]  tcf_ct_init+0x71c/0x1156 [act_ct]
[47773.925207]  tcf_action_init_1+0x45b/0x700
[47773.925987]  tcf_action_init+0x453/0x6b0
[47773.926692]  tcf_exts_validate+0x3d0/0x600
[47773.927419]  fl_change+0x757/0x4a51 [cls_flower]
[47773.928227]  tc_new_tfilter+0x89a/0x2070
[47773.936652]
[47773.936652] Freed by task 543704:
[47773.937303]  kasan_save_stack+0x1b/0x40
[47773.938039]  kasan_set_track+0x1c/0x30
[47773.938731]  kasan_set_free_info+0x20/0x30
[47773.939467]  __kasan_slab_free+0xe7/0x120
[47773.940194]  slab_free_freelist_hook+0x86/0x190
[47773.941038]  kfree+0xce/0x3a0
[47773.941644]  tcf_ct_flow_table_cleanup_work

Original patch description and stack trace by Paul Blakey.

Fixes: c29f74e0df ("netfilter: nf_flow_table: hardware offload support")
Reported-by: Paul Blakey <paulb@nvidia.com>
Tested-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso 759eebbcfa netfilter: flowtable: add function to invoke garbage collection immediately
Expose nf_flow_table_gc_run() to force a garbage collector run from the
offload infrastructure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso e02f0d3970 netfilter: nf_tables: disallow binding to already bound chain
Update nft_data_init() to report EINVAL if chain is already bound.

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Reported-by: Gwangun Jung <exsociety@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso 01e4092d53 netfilter: nft_tunnel: restrict it to netdev family
Only allow to use this expression from NFPROTO_NETDEV family.

Fixes: af308b94a2 ("netfilter: nf_tables: add tunnel support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso 5f3b7aae14 netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
As it was originally intended, restrict extension to supported families.

Fixes: b96af92d6e ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso 43eb8949cf netfilter: nf_tables: do not leave chain stats enabled on error
Error might occur later in the nf_tables_addchain() codepath, enable
static key only after transaction has been created.

Fixes: 9f08ea8481 ("netfilter: nf_tables: keep chain counters away from hot path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso 7044ab281f netfilter: nft_payload: do not truncate csum_offset and csum_type
Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type
is not support.

Fixes: 7ec3f7b47b ("netfilter: nft_payload: add packet mangling support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:21 +02:00
Pablo Neira Ayuso 94254f990c netfilter: nft_payload: report ERANGE for too long offset and length
Instead of offset and length are truncation to u8, report ERANGE.

Fixes: 96518518cc ("netfilter: add nftables")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso ab482c6b66 netfilter: nf_tables: make table handle allocation per-netns friendly
mutex is per-netns, move table_netns to the pernet area.

*read-write* to 0xffffffff883a01e8 of 8 bytes by task 6542 on cpu 0:
 nf_tables_newtable+0x6dc/0xc00 net/netfilter/nf_tables_api.c:1221
 nfnetlink_rcv_batch net/netfilter/nfnetlink.c:513 [inline]
 nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:634 [inline]
 nfnetlink_rcv+0xa6a/0x13a0 net/netfilter/nfnetlink.c:652
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x652/0x730 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x643/0x740 net/netlink/af_netlink.c:1921

Fixes: f102d66b33 ("netfilter: nf_tables: use dedicated mutex to guard transactions")
Reported-by: Abhishek Shah <abhishek.shah@columbia.edu>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:20 +02:00
Pablo Neira Ayuso 5dc52d83ba netfilter: nf_tables: disallow updates of implicit chain
Updates on existing implicit chain make no sense, disallow this.

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-24 07:43:20 +02:00
Florian Westphal 18bbc32133 netfilter: nft_tproxy: restrict to prerouting hook
TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this.
This fixes a crash (null dereference) when using tproxy from e.g. output.

Fixes: 4ed8eb6570 ("netfilter: nf_tables: Add native tproxy support")
Reported-by: Shell Chen <xierch@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-23 21:24:34 +02:00
Florian Westphal cf97769c76 netfilter: conntrack: work around exceeded receive window
When a TCP sends more bytes than allowed by the receive window, all future
packets can be marked as invalid.
This can clog up the conntrack table because of 5-day default timeout.

Sequence of packets:
 01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1]
 02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8]
 03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0
 04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68

Window is 5840 starting from 33212 -> 39052.

 05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0
 06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318

This is fine, conntrack will flag the connection as having outstanding
data (UNACKED), which lowers the conntrack timeout to 300s.

 07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
 08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
 09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
 10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318

Packet 10 is already sending more than permitted, but conntrack doesn't
validate this (only seq is tested vs. maxend, not 'seq+len').

38484 is acceptable, but only up to 39052, so this packet should
not have been sent (or only 568 bytes, not 1318).

At this point, connection is still in '300s' mode.

Next packet however will get flagged:
 11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326

nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH ..

Now, a couple of replies/acks comes in:

 12 initiator > responder: [.], ack 34530, win 4368,
[.. irrelevant acks removed ]
 16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0

This ack is significant -- this acks the last packet send by the
responder that conntrack considered valid.

This means that ack == td_end.  This will withdraw the
'unacked data' flag, the connection moves back to the 5-day timeout
of established conntracks.

 17 initiator > responder: ack 40128, win 10030, ...

This packet is also flagged as invalid.

Because conntrack only updates state based on packets that are
considered valid, packet 11 'did not exist' and that gets us:

nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG

Because this received and processed by the endpoints, the conntrack entry
remains in a bad state, no packets will ever be considered valid again:

 30 responder > initiator: [F.], seq 40432, ack 2045, win 391, ..
 31 initiator > responder: [.], ack 40433, win 11348, ..
 32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 ..

... all trigger 'ACK is over bound' test and we end up with
non-early-evictable 5-day default timeout.

NB: This patch triggers a bunch of checkpatch warnings because of silly
indent.  I will resend the cleanup series linked below to reduce the
indent level once this change has propagated to net-next.

I could route the cleanup via nf but that causes extra backport work for
stable maintainers.

Link: https://lore.kernel.org/netfilter-devel/20220720175228.17880-1-fw@strlen.de/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-23 18:26:48 +02:00
Stephen Hemminger 1202cdd665 Remove DECnet support from kernel
DECnet is an obsolete network protocol that receives more attention
from kernel janitors than users. It belongs in computer protocol
history museum not in Linux kernel.

It has been "Orphaned" in kernel since 2010. The iproute2 support
for DECnet was dropped in 5.0 release. The documentation link on
Sourceforge says it is abandoned there as well.

Leave the UAPI alone to keep userspace programs compiling.
This means that there is still an empty neighbour table
for AF_DECNET.

The table of /proc/sys/net entries was updated to match
current directories and reformatted to be alphabetical.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: David Ahern <dsahern@kernel.org>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-22 14:26:30 +01:00
Jakub Kicinski 268603d79c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-18 21:17:10 -07:00
Jakub Kicinski 3f5f728a72 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Andrii Nakryiko says:

====================
bpf-next 2022-08-17

We've added 45 non-merge commits during the last 14 day(s) which contain
a total of 61 files changed, 986 insertions(+), 372 deletions(-).

The main changes are:

1) New bpf_ktime_get_tai_ns() BPF helper to access CLOCK_TAI, from Kurt
   Kanzenbach and Jesper Dangaard Brouer.

2) Few clean ups and improvements for libbpf 1.0, from Andrii Nakryiko.

3) Expose crash_kexec() as kfunc for BPF programs, from Artem Savkov.

4) Add ability to define sleepable-only kfuncs, from Benjamin Tissoires.

5) Teach libbpf's bpf_prog_load() and bpf_map_create() to gracefully handle
   unsupported names on old kernels, from Hangbin Liu.

6) Allow opting out from auto-attaching BPF programs by libbpf's BPF skeleton,
   from Hao Luo.

7) Relax libbpf's requirement for shared libs to be marked executable, from
   Henqgi Chen.

8) Improve bpf_iter internals handling of error returns, from Hao Luo.

9) Few accommodations in libbpf to support GCC-BPF quirks, from James Hilliard.

10) Fix BPF verifier logic around tracking dynptr ref_obj_id, from Joanne Koong.

11) bpftool improvements to handle full BPF program names better, from Manu
    Bretelle.

12) bpftool fixes around libcap use, from Quentin Monnet.

13) BPF map internals clean ups and improvements around memory allocations,
    from Yafang Shao.

14) Allow to use cgroup_get_from_file() on cgroupv1, allowing BPF cgroup
    iterator to work on cgroupv1, from Yosry Ahmed.

15) BPF verifier internal clean ups, from Dave Marchevsky and Joanne Koong.

16) Various fixes and clean ups for selftests/bpf and vmtest.sh, from Daniel
    Xu, Artem Savkov, Joanne Koong, Andrii Nakryiko, Shibin Koikkara Reeny.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits)
  selftests/bpf: Few fixes for selftests/bpf built in release mode
  libbpf: Clean up deprecated and legacy aliases
  libbpf: Streamline bpf_attr and perf_event_attr initialization
  libbpf: Fix potential NULL dereference when parsing ELF
  selftests/bpf: Tests libbpf autoattach APIs
  libbpf: Allows disabling auto attach
  selftests/bpf: Fix attach point for non-x86 arches in test_progs/lsm
  libbpf: Making bpf_prog_load() ignore name if kernel doesn't support
  selftests/bpf: Update CI kconfig
  selftests/bpf: Add connmark read test
  selftests/bpf: Add existing connection bpf_*_ct_lookup() test
  bpftool: Clear errno after libcap's checks
  bpf: Clear up confusion in bpf_skb_adjust_room()'s documentation
  bpftool: Fix a typo in a comment
  libbpf: Add names for auxiliary maps
  bpf: Use bpf_map_area_alloc consistently on bpf map creation
  bpf: Make __GFP_NOWARN consistent in bpf map creation
  bpf: Use bpf_map_area_free instread of kvfree
  bpf: Remove unneeded memset in queue_stack_map creation
  libbpf: preserve errno across pr_warn/pr_info/pr_debug
  ...
====================

Link: https://lore.kernel.org/r/20220817215656.1180215-1-andrii@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-17 20:29:36 -07:00
Geert Uytterhoeven aa5762c342 netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y
NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68
("netfilter: provide config option to disable ancient procfs parts") in
v3.3.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-17 08:46:30 +02:00
Pablo Neira Ayuso 1b6345d416 netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified
Since f3a2181e16 ("netfilter: nf_tables: Support for sets with
multiple ranged fields"), it possible to combine intervals and
concatenations. Later on, ef516e8625 ("netfilter: nf_tables:
reintroduce the NFT_SET_CONCAT flag") provides the NFT_SET_CONCAT flag
for userspace to report that the set stores a concatenation.

Make sure NFT_SET_CONCAT is set on if field_count is specified for
consistency. Otherwise, if NFT_SET_CONCAT is specified with no
field_count, bail out with EINVAL.

Fixes: ef516e8625 ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-15 18:41:21 +02:00
Pablo Neira Ayuso fc0ae524b5 netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END
These flags are mutually exclusive, report EINVAL in this case.

Fixes: aaa31047a6 ("netfilter: nftables: add catch-all set element support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-15 17:54:59 +02:00
Pablo Neira Ayuso 88cccd908d netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags
If the NFT_SET_CONCAT|NFT_SET_INTERVAL flags are set on, then the
netlink attribute NFTA_SET_ELEM_KEY_END must be specified. Otherwise,
NFTA_SET_ELEM_KEY_END should not be present.

For catch-all element, NFTA_SET_ELEM_KEY_END should not be present.
The NFT_SET_ELEM_INTERVAL_END is never used with this set flags
combination.

Fixes: 7b225d0b5c ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-15 17:54:51 +02:00
Pablo Neira Ayuso 5a2f3dc318 netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag
If the NFTA_SET_ELEM_OBJREF netlink attribute is present and
NFT_SET_OBJECT flag is set on, report EINVAL.

Move existing sanity check earlier to validate that NFT_SET_OBJECT
requires NFTA_SET_ELEM_OBJREF.

Fixes: 8aeff920dc ("netfilter: nf_tables: add stateful object reference to set elements")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-12 16:32:16 +02:00
Pablo Neira Ayuso 271c5ca826 netfilter: nf_tables: really skip inactive sets when allocating name
While looping to build the bitmap of used anonymous set names, check the
current set in the iteration, instead of the one that is being created.

Fixes: 37a9cc5255 ("netfilter: nf_tables: add generation mask to sets")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 18:53:48 +02:00
Florian Westphal 0b2f3212b5 netfilter: nfnetlink: re-enable conntrack expectation events
To avoid allocation of the conntrack extension area when possible,
the default behaviour was changed to only allocate the event extension
if a userspace program is subscribed to a notification group.

Problem is that while 'conntrack -E' does enable the event allocation
behind the scenes, 'conntrack -E expect' does not: no expectation events
are delivered unless user sets
"net.netfilter.nf_conntrack_events" back to 1 (always on).

Fix the autodetection to also consider EXP type group.

We need to track the 6 event groups (3+3, new/update/destroy for events and
for expectations each) independently, else we'd disable events again
if an expectation group becomes empty while there is still an active
event group.

Fixes: 2794cdb0b9 ("netfilter: nfnetlink: allow to detect if ctnetlink listeners exist")
Reported-by: Yi Chen <yiche@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
2022-08-11 18:09:54 +02:00
Florian Westphal 2024439bd5 netfilter: nf_tables: fix scheduling-while-atomic splat
nf_tables_check_loops() can be called from rhashtable list
walk so cond_resched() cannot be used here.

Fixes: 81ea010667 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 17:57:28 +02:00
Florian Westphal 976bf59c69 netfilter: nf_ct_irc: cap packet search space to 4k
This uses a pseudo-linearization scheme with a 64k global buffer,
but BIG TCP arrival means IPv6 TCP stack can generate skbs
that exceed this size.

In practice, IRC commands are not expected to exceed 512 bytes, plus
this is interactive protocol, so we should not see large packets
in practice.

Given most IRC connections nowadays use TLS so this helper could also be
removed in the near future.

Fixes: 7c4e983c4f ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:51:13 +02:00
Florian Westphal c783a29c7e netfilter: nf_ct_ftp: prefer skb_linearize
This uses a pseudo-linearization scheme with a 64k global buffer,
but BIG TCP arrival means IPv6 TCP stack can generate skbs
that exceed this size.

Use skb_linearize.  It should be possible to rewrite this to properly
deal with segmented skbs (i.e., only do small chunk-wise accesses),
but this is going to be a lot more intrusive than this because every
helper function needs to get the sk_buff instead of a pointer to a raw
data buffer.

In practice, provided we're really looking at FTP control channel packets,
there should never be a case where we deal with huge packets.

Fixes: 7c4e983c4f ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:51:01 +02:00
Florian Westphal f3e124c36f netfilter: nf_ct_h323: cap packet size at 64k
With BIG TCP, packets generated by tcp stack may exceed 64kb.
Cap datalen at 64kb.  The internal message format uses 16bit fields,
so no embedded message can exceed 64k size.

Multiple h323 messages in a single superpacket may now result
in a message to get treated as incomplete/truncated, but thats
better than scribbling past h323_buffer.

Another alternative suitable for net tree would be a switch to
skb_linearize().

Fixes: 7c4e983c4f ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:50:49 +02:00
Florian Westphal a664375da7 netfilter: nf_ct_sane: remove pseudo skb linearization
For historical reason this code performs pseudo linearization of skbs
via skb_header_pointer and a global 64k buffer.

With arrival of BIG TCP, packets generated by TCP stack can exceed 64kb.

Rewrite this to only extract the needed header data.  This also allows
to get rid of the locking.

Fixes: 7c4e983c4f ("net: allow gso_max_size to exceed 65536")
Fixes: 0fe79f28bf ("net: allow gro_max_size to exceed 65536")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-11 16:50:25 +02:00
Pablo Neira Ayuso c485c35ff6 netfilter: nf_tables: possible module reference underflow in error path
dst->ops is set on when nft_expr_clone() fails, but module refcount has
not been bumped yet, therefore nft_expr_destroy() leads to module
reference underflow.

Fixes: 8cfd9b0f85 ("netfilter: nftables: generalize set expressions support")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso 4963674c2e netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag
These are mutually exclusive, actually NFTA_SET_ELEM_KEY_END replaces
the flag notation.

Fixes: 7b225d0b5c ("netfilter: nf_tables: add NFTA_SET_ELEM_KEY_END attribute")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-10 17:06:05 +02:00
Pablo Neira Ayuso 3400278328 netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access
The generation ID is bumped from the commit path while holding the
mutex, however, netlink dump operations rely on RCU.

This patch also adds missing cb->base_eq initialization in
nf_tables_dump_set().

Fixes: 38e029f14a ("netfilter: nf_tables: set NLM_F_DUMP_INTR if netlink dumping is stale")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-10 17:06:05 +02:00
Florian Westphal 580077855a netfilter: nf_tables: fix null deref due to zeroed list head
In nf_tables_updtable, if nf_tables_table_enable returns an error,
nft_trans_destroy is called to free the transaction object.

nft_trans_destroy() calls list_del(), but the transaction was never
placed on a list -- the list head is all zeroes, this results in
a null dereference:

BUG: KASAN: null-ptr-deref in nft_trans_destroy+0x26/0x59
Call Trace:
 nft_trans_destroy+0x26/0x59
 nf_tables_newtable+0x4bc/0x9bc
 [..]

Its sane to assume that nft_trans_destroy() can be called
on the transaction object returned by nft_trans_alloc(), so
make sure the list head is initialised.

Fixes: 55dd6f9307 ("netfilter: nf_tables: use new transaction infrastructure to handle table")
Reported-by: mingi cho <mgcho.minic@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:30 +02:00
Pablo Neira Ayuso f323ef3a0d netfilter: nf_tables: disallow jump to implicit chain from set element
Extend struct nft_data_desc to add a flag field that specifies
nft_data_init() is being called for set element data.

Use it to disallow jump to implicit chain from set element, only jump
to chain via immediate expression is allowed.

Fixes: d0e2c7de92 ("netfilter: nf_tables: add NFT_CHAIN_BINDING")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:29 +02:00
Pablo Neira Ayuso 341b694160 netfilter: nf_tables: upfront validation of data via nft_data_init()
Instead of parsing the data and then validate that type and length are
correct, pass a description of the expected data so it can be validated
upfront before parsing it to bail out earlier.

This patch adds a new .size field to specify the maximum size of the
data area. The .len field is optional and it is used as an input/output
field, it provides the specific length of the expected data in the input
path. If then .len field is not specified, then obtained length from the
netlink attribute is stored. This is required by cmp, bitwise, range and
immediate, which provide no netlink attribute that describes the data
length. The immediate expression uses the destination register type to
infer the expected data type.

Relying on opencoded validation of the expected data might lead to
subtle bugs as described in 7e6bc1f6ca ("netfilter: nf_tables:
stricter validation of element data").

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 20:13:29 +02:00
Thadeu Lima de Souza Cascardo 36d5b29132 netfilter: nf_tables: do not allow RULE_ID to refer to another chain
When doing lookups for rules on the same batch by using its ID, a rule from
a different chain can be used. If a rule is added to a chain but tries to
be positioned next to a rule from a different chain, it will be linked to
chain2, but the use counter on chain1 would be the one to be incremented.

When looking for rules by ID, use the chain that was used for the lookup by
name. The chain used in the context copied to the transaction needs to
match that same chain. That way, struct nft_rule does not need to get
enlarged with another member.

Fixes: 1a94e38d25 ("netfilter: nf_tables: add NFTA_RULE_ID attribute")
Fixes: 75dd48e2e4 ("netfilter: nf_tables: Support RULE_ID reference in new rule")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:18 +02:00
Thadeu Lima de Souza Cascardo 95f466d223 netfilter: nf_tables: do not allow CHAIN_ID to refer to another table
When doing lookups for chains on the same batch by using its ID, a chain
from a different table can be used. If a rule is added to a table but
refers to a chain in a different table, it will be linked to the chain in
table2, but would have expressions referring to objects in table1.

Then, when table1 is removed, the rule will not be removed as its linked to
a chain in table2. When expressions in the rule are processed or removed,
that will lead to a use-after-free.

When looking for chains by ID, use the table that was used for the lookup
by name, and only return chains belonging to that same table.

Fixes: 837830a4b4 ("netfilter: nf_tables: add NFTA_RULE_CHAIN_ID attribute")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:17 +02:00
Thadeu Lima de Souza Cascardo 470ee20e06 netfilter: nf_tables: do not allow SET_ID to refer to another table
When doing lookups for sets on the same batch by using its ID, a set from a
different table can be used.

Then, when the table is removed, a reference to the set may be kept after
the set is freed, leading to a potential use-after-free.

When looking for sets by ID, use the table that was used for the lookup by
name, and only return sets belonging to that same table.

This fixes CVE-2022-2586, also reported as ZDI-CAN-17470.

Reported-by: Team Orca of Sea Security (@seasecresponse)
Fixes: 958bee14d0 ("netfilter: nf_tables: use new transaction infrastructure to handle sets")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:17 +02:00
Pablo Neira Ayuso 34aae2c2fb netfilter: nf_tables: validate variable length element extension
Update template to validate variable length extensions. This patch adds
a new .ext_len[id] field to the template to store the expected extension
length. This is used to sanity check the initialization of the variable
length extension.

Use PTR_ERR() in nft_set_elem_init() to report errors since, after this
update, there are two reason why this might fail, either because of
ENOMEM or insufficient room in the extension field (EINVAL).

Kernels up until 7e6bc1f6ca ("netfilter: nf_tables: stricter
validation of element data") allowed to copy more data to the extension
than was allocated. This ext_len field allows to validate if the
destination has the correct size as additional check.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2022-08-09 19:38:16 +02:00
Kumar Kartikeya Dwivedi 6e116280b4 net: netfilter: Remove ifdefs for code shared by BPF and ctnetlink
The current ifdefry for code shared by the BPF and ctnetlink side looks
ugly. As per Pablo's request, simplify this by unconditionally compiling
in the code. This can be revisited when the shared code between the two
grows further.

Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20220725085130.11553-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-08-09 09:41:54 -07:00
Pablo Neira Ayuso b06ada6df9 netfilter: flowtable: fix incorrect Kconfig dependencies
Remove default to 'y', this infrastructure is not fundamental for the
flowtable operational.

Add a missing dependency on CONFIG_NF_FLOW_TABLE.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: b038177636 ("netfilter: nf_flow_table: count pending offload workqueue tasks")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-05 18:50:15 -07:00
Florian Westphal 399a14ec79 netfilter: nf_tables: fix crash when nf_trace is enabled
do not access info->pkt when info->trace is not 1.
nft_traceinfo is not initialized, except when tracing is enabled.

The 'nft_trace_enabled' static key cannot be used for this, we must
always check info->trace first.

Pass nft_pktinfo directly to avoid this.

Fixes: e34b9ed96c ("netfilter: nf_tables: avoid skb access on nf_stolen")
Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-08-05 18:50:14 -07:00
Jakub Kicinski 272ac32f56 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-28 18:21:16 -07:00