Commit Graph

40866 Commits

Author SHA1 Message Date
Ilya Leoshkevich 49f67f393f bpf: btf: Add BTF_FMODEL_SIGNED_ARG flag
s390x eBPF JIT needs to know whether a function return value is signed
and which function arguments are signed, in order to generate code
compliant with the s390x ABI.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-26-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-28 12:45:15 -08:00
Ilya Leoshkevich 0f0e5f5bd5 bpf: iterators: Split iterators.lskel.h into little- and big- endian versions
iterators.lskel.h is little-endian, therefore bpf iterator is currently
broken on big-endian systems. Introduce a big-endian version and add
instructions regarding its generation. Unfortunately bpftool's
cross-endianness capabilities are limited to BTF right now, so the
procedure requires access to a big-endian machine or a configured
emulator.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-25-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-28 12:45:15 -08:00
David Vernet cb4a21ea59 bpf: Build-time assert that cpumask offset is zero
The first element of a struct bpf_cpumask is a cpumask_t. This is done
to allow struct bpf_cpumask to be cast to a struct cpumask. If this
element were ever moved to another field, any BPF program passing a
struct bpf_cpumask * to a kfunc expecting a const struct cpumask * would
immediately fail to load. Add a build-time assertion so this is
assumption is captured and verified.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230128141537.100777-1-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-28 12:21:40 -08:00
Jakub Kicinski 2d104c390f bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY9RqJgAKCRDbK58LschI
 gw2IAP9G5uhFO5abBzYLupp6SY3T5j97MUvPwLfFqUEt7EXmuwEA2lCUEWeW0KtR
 QX+QmzCa6iHxrW7WzP4DUYLue//FJQY=
 =yYqA
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next 2023-01-28

We've added 124 non-merge commits during the last 22 day(s) which contain
a total of 124 files changed, 6386 insertions(+), 1827 deletions(-).

The main changes are:

1) Implement XDP hints via kfuncs with initial support for RX hash and
   timestamp metadata kfuncs, from Stanislav Fomichev and
   Toke Høiland-Jørgensen.
   Measurements on overhead: https://lore.kernel.org/bpf/875yellcx6.fsf@toke.dk

2) Extend libbpf's bpf_tracing.h support for tracing arguments of
   kprobes/uprobes and syscall as a special case, from Andrii Nakryiko.

3) Significantly reduce the search time for module symbols by livepatch
   and BPF, from Jiri Olsa and Zhen Lei.

4) Enable cpumasks to be used as kptrs, which is useful for tracing
   programs tracking which tasks end up running on which CPUs
   in different time intervals, from David Vernet.

5) Fix several issues in the dynptr processing such as stack slot liveness
   propagation, missing checks for PTR_TO_STACK variable offset, etc,
   from Kumar Kartikeya Dwivedi.

6) Various performance improvements, fixes, and introduction of more
   than just one XDP program to XSK selftests, from Magnus Karlsson.

7) Big batch to BPF samples to reduce deprecated functionality,
   from Daniel T. Lee.

8) Enable struct_ops programs to be sleepable in verifier,
   from David Vernet.

9) Reduce pr_warn() noise on BTF mismatches when they are expected under
   the CONFIG_MODULE_ALLOW_BTF_MISMATCH config anyway, from Connor O'Brien.

10) Describe modulo and division by zero behavior of the BPF runtime
    in BPF's instruction specification document, from Dave Thaler.

11) Several improvements to libbpf API documentation in libbpf.h,
    from Grant Seltzer.

12) Improve resolve_btfids header dependencies related to subcmd and add
    proper support for HOSTCC, from Ian Rogers.

13) Add ipip6 and ip6ip decapsulation support for bpf_skb_adjust_room()
    helper along with BPF selftests, from Ziyang Xuan.

14) Simplify the parsing logic of structure parameters for BPF trampoline
    in the x86-64 JIT compiler, from Pu Lehui.

15) Get BTF working for kernels with CONFIG_RUST enabled by excluding
    Rust compilation units with pahole, from Martin Rodriguez Reboredo.

16) Get bpf_setsockopt() working for kTLS on top of TCP sockets,
    from Kui-Feng Lee.

17) Disable stack protection for BPF objects in bpftool given BPF backends
    don't support it, from Holger Hoffstätte.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (124 commits)
  selftest/bpf: Make crashes more debuggable in test_progs
  libbpf: Add documentation to map pinning API functions
  libbpf: Fix malformed documentation formatting
  selftests/bpf: Properly enable hwtstamp in xdp_hw_metadata
  selftests/bpf: Calls bpf_setsockopt() on a ktls enabled socket.
  bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().
  bpf/selftests: Verify struct_ops prog sleepable behavior
  bpf: Pass const struct bpf_prog * to .check_member
  libbpf: Support sleepable struct_ops.s section
  bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
  selftests/bpf: Fix vmtest static compilation error
  tools/resolve_btfids: Alter how HOSTCC is forced
  tools/resolve_btfids: Install subcmd headers
  bpf/docs: Document the nocast aliasing behavior of ___init
  bpf/docs: Document how nested trusted fields may be defined
  bpf/docs: Document cpumask kfuncs in a new file
  selftests/bpf: Add selftest suite for cpumask kfuncs
  selftests/bpf: Add nested trust selftests suite
  bpf: Enable cpumasks to be queried and used as kptrs
  bpf: Disallow NULLable pointers for trusted kfuncs
  ...
====================

Link: https://lore.kernel.org/r/20230128004827.21371-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-28 00:00:14 -08:00
Jakub Kicinski b568d3072a Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Conflicts:

drivers/net/ethernet/intel/ice/ice_main.c
  418e53401e ("ice: move devlink port creation/deletion")
  643ef23bd9 ("ice: Introduce local var for readability")
https://lore.kernel.org/all/20230127124025.0dacef40@canb.auug.org.au/
https://lore.kernel.org/all/20230124005714.3996270-1-anthony.l.nguyen@intel.com/

drivers/net/ethernet/engleder/tsnep_main.c
  3d53aaef43 ("tsnep: Fix TX queue stop/wake for multiple queues")
  25faa6a4c5 ("tsnep: Replace TX spin_lock with __netif_tx_lock")
https://lore.kernel.org/all/20230127123604.36bb3e99@canb.auug.org.au/

net/netfilter/nf_conntrack_proto_sctp.c
  13bd9b31a9 ("Revert "netfilter: conntrack: add sctp DATA_SENT state"")
  a44b765148 ("netfilter: conntrack: unify established states for SCTP paths")
  f71cb8f45d ("netfilter: conntrack: sctp: use nf log infrastructure for invalid packets")
https://lore.kernel.org/all/20230127125052.674281f9@canb.auug.org.au/
https://lore.kernel.org/all/d36076f3-6add-a442-6d4b-ead9f7ffff86@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-27 22:56:18 -08:00
David Vernet 51a52a29eb bpf: Pass const struct bpf_prog * to .check_member
The .check_member field of struct bpf_struct_ops is currently passed the
member's btf_type via const struct btf_type *t, and a const struct
btf_member *member. This allows the struct_ops implementation to check
whether e.g. an ops is supported, but it would be useful to also enforce
that the struct_ops prog being loaded for that member has other
qualities, like being sleepable (or not). This patch therefore updates
the .check_member() callback to also take a const struct bpf_prog *prog
argument.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-4-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 10:25:57 -08:00
David Vernet 1e12d3ef47 bpf: Allow BPF_PROG_TYPE_STRUCT_OPS programs to be sleepable
BPF struct_ops programs currently cannot be marked as sleepable. This
need not be the case -- struct_ops programs can be sleepable, and e.g.
invoke kfuncs that export the KF_SLEEPABLE flag. So as to allow future
struct_ops programs to invoke such kfuncs, this patch updates the
verifier to allow struct_ops programs to be sleepable. A follow-on patch
will add support to libbpf for specifying struct_ops.s as a sleepable
struct_ops program, and then another patch will add testcases to the
dummy_st_ops selftest suite which test sleepable struct_ops behavior.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125164735.785732-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 10:25:57 -08:00
David Vernet bdbda39584 bpf/docs: Document cpumask kfuncs in a new file
Now that we've added a series of new cpumask kfuncs, we should document
them so users can easily use them. This patch adds a new cpumasks.rst
file to document them.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-6-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 07:57:50 -08:00
David Vernet 516f4d3397 bpf: Enable cpumasks to be queried and used as kptrs
Certain programs may wish to be able to query cpumasks. For example, if
a program that is tracing percpu operations wishes to track which tasks
end up running on which CPUs, it could be useful to associate that with
the tasks' cpumasks. Similarly, programs tracking NUMA allocations, CPU
scheduling domains, etc, could potentially benefit from being able to
see which CPUs a task could be migrated to.

This patch enables these types of use cases by introducing a series of
bpf_cpumask_* kfuncs. Amongst these kfuncs, there are two separate
"classes" of operations:

1. kfuncs which allow the caller to allocate and mutate their own
   cpumask kptrs in the form of a struct bpf_cpumask * object. Such
   kfuncs include e.g. bpf_cpumask_create() to allocate the cpumask, and
   bpf_cpumask_or() to mutate it. "Regular" cpumasks such as p->cpus_ptr
   may not be passed to these kfuncs, and the verifier will ensure this
   is the case by comparing BTF IDs.

2. Read-only operations which operate on const struct cpumask *
   arguments. For example, bpf_cpumask_test_cpu(), which tests whether a
   CPU is set in the cpumask. Any trusted struct cpumask * or struct
   bpf_cpumask * may be passed to these kfuncs. The verifier allows
   struct bpf_cpumask * even though the kfunc is defined with struct
   cpumask * because the first element of a struct bpf_cpumask is a
   cpumask_t, so it is safe to cast.

A follow-on patch will add selftests which validate these kfuncs, and
another will document them.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 07:57:49 -08:00
David Vernet caf713c338 bpf: Disallow NULLable pointers for trusted kfuncs
KF_TRUSTED_ARGS kfuncs currently have a subtle and insidious bug in
validating pointers to scalars. Say that you have a kfunc like the
following, which takes an array as the first argument:

bool bpf_cpumask_empty(const struct cpumask *cpumask)
{
	return cpumask_empty(cpumask);
}

...
BTF_ID_FLAGS(func, bpf_cpumask_empty, KF_TRUSTED_ARGS)
...

If a BPF program were to invoke the kfunc with a NULL argument, it would
crash the kernel. The reason is that struct cpumask is defined as a
bitmap, which is itself defined as an array, and is accessed as a memory
address by bitmap operations. So when the verifier analyzes the
register, it interprets it as a pointer to a scalar struct, which is an
array of size 8. check_mem_reg() then sees that the register is NULL and
returns 0, and the kfunc crashes when it passes it down to the cpumask
wrappers.

To fix this, this patch adds a check for KF_ARG_PTR_TO_MEM which
verifies that the register doesn't contain a possibly-NULL pointer if
the kfunc is KF_TRUSTED_ARGS.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230125143816.721952-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-25 07:57:49 -08:00
David Vernet b613d335a7 bpf: Allow trusted args to walk struct when checking BTF IDs
When validating BTF types for KF_TRUSTED_ARGS kfuncs, the verifier
currently enforces that the top-level type must match when calling
the kfunc. In other words, the verifier does not allow the BPF program
to pass a bitwise equivalent struct, despite it being allowed according
to the C standard.

For example, if you have the following type:

struct  nf_conn___init {
	struct nf_conn ct;
};

The C standard stipulates that it would be safe to pass a struct
nf_conn___init to a kfunc expecting a struct nf_conn. The verifier
currently disallows this, however, as semantically kfuncs may want to
enforce that structs that have equivalent types according to the C
standard, but have different BTF IDs, are not able to be passed to
kfuncs expecting one or the other. For example, struct nf_conn___init
may not be queried / looked up, as it is allocated but may not yet be
fully initialized.

On the other hand, being able to pass types that are equivalent
according to the C standard will be useful for other types of kfunc /
kptrs enabled by BPF.  For example, in a follow-on patch, a series of
kfuncs will be added which allow programs to do bitwise queries on
cpumasks that are either allocated by the program (in which case they'll
be a 'struct bpf_cpumask' type that wraps a cpumask_t as its first
element), or a cpumask that was allocated by the main kernel (in which
case it will just be a straight cpumask_t, as in task->cpus_ptr).

Having the two types of cpumasks allows us to distinguish between the
two for when a cpumask is read-only vs. mutatable. A struct bpf_cpumask
can be mutated by e.g. bpf_cpumask_clear(), whereas a regular cpumask_t
cannot be. On the other hand, a struct bpf_cpumask can of course be
queried in the exact same manner as a cpumask_t, with e.g.
bpf_cpumask_test_cpu().

If we were to enforce that top level types match, then a user that's
passing a struct bpf_cpumask to a read-only cpumask_t argument would
have to cast with something like bpf_cast_to_kern_ctx() (which itself
would need to be updated to expect the alias, and currently it only
accommodates a single alias per prog type). Additionally, not specifying
KF_TRUSTED_ARGS is not an option, as some kfuncs take one argument as a
struct bpf_cpumask *, and another as a struct cpumask *
(i.e. cpumask_t).

In order to enable this, this patch relaxes the constraint that a
KF_TRUSTED_ARGS kfunc must have strict type matching, and instead only
enforces strict type matching if a type is observed to be a "no-cast
alias" (i.e., that the type names are equivalent, but one is suffixed
with ___init).

Additionally, in order to try and be conservative and match existing
behavior / expectations, this patch also enforces strict type checking
for acquire kfuncs. We were already enforcing it for release kfuncs, so
this should also improve the consistency of the semantics for kfuncs.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230120192523.3650503-3-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-24 20:15:13 -08:00
David Vernet 57539b1c0a bpf: Enable annotating trusted nested pointers
In kfuncs, a "trusted" pointer is a pointer that the kfunc can assume is
safe, and which the verifier will allow to be passed to a
KF_TRUSTED_ARGS kfunc. Currently, a KF_TRUSTED_ARGS kfunc disallows any
pointer to be passed at a nonzero offset, but sometimes this is in fact
safe if the "nested" pointer's lifetime is inherited from its parent.
For example, the const cpumask_t *cpus_ptr field in a struct task_struct
will remain valid until the task itself is destroyed, and thus would
also be safe to pass to a KF_TRUSTED_ARGS kfunc.

While it would be conceptually simple to enable this by using BTF tags,
gcc unfortunately does not yet support this. In the interim, this patch
enables support for this by using a type-naming convention. A new
BTF_TYPE_SAFE_NESTED macro is defined in verifier.c which allows a
developer to specify the nested fields of a type which are considered
trusted if its parent is also trusted. The verifier is also updated to
account for this. A patch with selftests will be added in a follow-on
change, along with documentation for this feature.

Signed-off-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230120192523.3650503-2-void@manifault.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-24 20:15:01 -08:00
Petr Pavlu 0254127ab9 module: Don't wait for GOING modules
During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded which causes the init function of the new module to
return an error.

Since commit 6e6de3dee5 ("kernel/module.c: Only return -EEXIST for
modules that have finished loading"), the kernel waits for modules in
MODULE_STATE_GOING state to finish unloading before making another
attempt to load the same module.

This creates unnecessary work in the described scenario and delays the
boot. In the worst case, it can prevent udev from loading drivers for
other devices and might cause timeouts of services waiting on them and
subsequently a failed boot.

This patch attempts a different solution for the problem 6e6de3dee5
was trying to solve. Rather than waiting for the unloading to complete,
it returns a different error code (-EBUSY) for modules in the GOING
state. This should avoid the error situation that was described in
6e6de3dee5 (user space attempting to load a dependent module because
the -EEXIST error code would suggest to user space that the first module
had been loaded successfully), while avoiding the delay situation too.

This has been tested on linux-next since December 2022 and passes
all kmod selftests except test 0009 with module compression enabled
but it has been confirmed that this issue has existed and has gone
unnoticed since prior to this commit and can also be reproduced without
module compression with a simple usleep(5000000) on tools/modprobe.c [0].
These failures are caused by hitting the kernel mod_concurrent_max and can
happen either due to a self inflicted kernel module auto-loead DoS somehow
or on a system with large CPU count and each CPU count incorrectly triggering
many module auto-loads. Both of those issues need to be fixed in-kernel.

[0] https://lore.kernel.org/all/Y9A4fiobL6IHp%2F%2FP@bombadil.infradead.org/

Fixes: 6e6de3dee5 ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Co-developed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Cc: stable@vger.kernel.org
Reviewed-by: Petr Mladek <pmladek@suse.com>
[mcgrof: enhance commit log with testing and kmod test result interpretation ]
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-24 12:52:52 -08:00
Toke Høiland-Jørgensen fd7c211d68 bpf: Support consuming XDP HW metadata from fext programs
Instead of rejecting the attaching of PROG_TYPE_EXT programs to XDP
programs that consume HW metadata, implement support for propagating the
offload information. The extension program doesn't need to set a flag or
ifindex, these will just be propagated from the target by the verifier.
We need to create a separate offload object for the extension program,
though, since it can be reattached to a different program later (which
means we can't just inherit the offload information from the target).

An additional check is added on attach that the new target is compatible
with the offload information in the extension prog.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-9-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23 09:38:11 -08:00
Stanislav Fomichev 3d76a4d3d4 bpf: XDP metadata RX kfuncs
Define a new kfunc set (xdp_metadata_kfunc_ids) which implements all possible
XDP metatada kfuncs. Not all devices have to implement them. If kfunc is not
supported by the target device, the default implementation is called instead.
The verifier, at load time, replaces a call to the generic kfunc with a call
to the per-device one. Per-device kfunc pointers are stored in separate
struct xdp_metadata_ops.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-8-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23 09:38:11 -08:00
Stanislav Fomichev 2b3486bc2d bpf: Introduce device-bound XDP programs
New flag BPF_F_XDP_DEV_BOUND_ONLY plus all the infra to have a way
to associate a netdev with a BPF program at load time.

netdevsim checks are dropped in favor of generic check in dev_xdp_attach.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-6-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23 09:38:10 -08:00
Stanislav Fomichev 89bbc53a4d bpf: Reshuffle some parts of bpf/offload.c
To avoid adding forward declarations in the main patch, shuffle
some code around. No functional changes.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-5-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23 09:38:10 -08:00
Stanislav Fomichev f1fc43d039 bpf: Move offload initialization into late_initcall
So we don't have to initialize it manually from several paths.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-4-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23 09:38:10 -08:00
Stanislav Fomichev 9d03ebc71a bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded
BPF offloading infra will be reused to implement
bound-but-not-offloaded bpf programs. Rename existing
helpers for clarity. No functional changes.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-23 09:38:10 -08:00
Linus Torvalds 2475bf0250 - Make sure the scheduler doesn't use stale frequency scaling values when latter
get disabled due to a value error
 
 - Fix a NULL pointer access on UP configs
 
 - Use the proper locking when updating CPU capacity
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmPNKRsACgkQEsHwGGHe
 VUr8NA/9GTqGUpOq0iRQkEOE1oAdT4ZxJ9dQpaWjwWSNv40HT7XJBzjtvzAyiOBF
 LTUVClBct5i1/j0tVcNq1zXEj3im2e24Ki6A1TWukejgGAMT/7siSkuChEDAMg2M
 b79nKCMpuIZMzkJND3qTkW/aMPpAyU82G8BeLCjw7vgPBsbVkjgbxGFKYdHgpLZa
 kdX/GhOufu40jcGeUxA4zWTpXfuXT3OG7JYLrlHeJ/HEdzy9kLCkWH4jHHllzPQw
 c4JwG7UnNkDKD6zkG0Guzi1zQy39egh/kaj7FQmVap9sneq+x69T4ta0BfXdBXnQ
 Vsqc/nnOybUzr9Gjg5W6KRZWk0k6hK9n3+cye88BfRvzMY0KAgxeCiZEn7cuKqZp
 15Agzz77vcwt32QsxjSp+hKQxJtePcBCurDhkfuAZyPELNIBeCW6inrXArprfTIg
 IfEF068GKsvDGu3I/z49VXFZ9YBoerQREHbN/xL1VNeB8VoAxAE227OMUvhMcIu1
 jxVvkESwc9BIybTcJbUjgg2i9t+Fv3gtZBQ6GFfDboHneia4U5a9aTyN6QGjX+5P
 SIXPhePbFCNrhW7JbSGqKBd96MczbFNQinRhgX1lBU0241cAnchY4nH9fUvv7Zrt
 b/QDg58tb2bJkm1Z08L256ZETELOU9nLJVxUbtBNSSO9dfkvoBs=
 =aNSK
 -----END PGP SIGNATURE-----

Merge tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

 - Make sure the scheduler doesn't use stale frequency scaling values
   when latter get disabled due to a value error

 - Fix a NULL pointer access on UP configs

 - Use the proper locking when updating CPU capacity

* tag 'sched_urgent_for_v6.2_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/aperfmperf: Erase stale arch_freq_scale values when disabling frequency invariance readings
  sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
  sched/fair: Fixes for capacity inversion detection
  sched/uclamp: Fix a uninitialized variable warnings
2023-01-22 12:14:58 -08:00
Linus Torvalds c88a311470 Driver core fixes for 6.2-rc5
Here are 3 small driver and kernel core fixes for 6.2-rc5.  They
 include:
   - potential gadget fixup in do_prlimit
   - device property refcount leak fix
   - test_async_probe bugfix for reported problem.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY8wB5g8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yl+5ACfbXPXK7nokMtxvs/9ybhH+IM63X0AmwYXZ5mK
 3dCNVFru/lAZzS7HaR5F
 =4fuA
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are three small driver and kernel core fixes for 6.2-rc5. They
  include:

   - potential gadget fixup in do_prlimit

   - device property refcount leak fix

   - test_async_probe bugfix for reported problem"

* tag 'driver-core-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  prlimit: do_prlimit needs to have a speculation check
  driver core: Fix test_async_probe_init saves device in wrong array
  device property: fix of node refcount leak in fwnode_graph_get_next_endpoint()
2023-01-21 11:17:23 -08:00
Linus Torvalds 83cd5fd014 Kbuild fixes for v6.2 (3rd)
- Hide LDFLAGS_vmlinux from decompressor Makefiles to fix error messages
    when GNU Make 4.4 is used.
 
  - Fix 'make modules' build error when CONFIG_DEBUG_INFO_BTF_MODULES=y.
 
  - Fix warnings emitted by GNU Make 4.4 in scripts/kconfig/Makefile.
 
  - Support GNU Make 4.4 for scripts/jobserver-exec.
 
  - Show clearer error message when kernel/gen_kheaders.sh fails due to
    missing cpio.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmPLnykVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGQ6QQAK+nhDBi+2X2F6D/KP4hIHSawRAx
 oqbrYf+xfVB6sBpcqwlzW1jajqmHgwIYX0OmUMEGOoYsKcJ+ZtmMmGnBaukepXjt
 6KVyLghNNdGPYHGrwMrvNIB2qUHQhrCP82laU701adac+mRnEAnubvIk+nJl00mF
 g2gnlwtxqfH09xO2BICCMYzTnag63bIlNzkIFB4yz2LWGQZ3knHJ7THNOr9J3O3v
 lx5bsQOGJYqq7q8UiTM5Y5GiWKhzupF56Q86ppIduV6LmzD7aj5sQgieGcgbkLW9
 K2xXE/eIVKFPo5tazlDH5i/4oOo0ykjimt0qOd7ya1jHsgU1Qpst2cbe+evJP8fs
 FcorOaizpvGYEM4C5kBh9x4kGdu71Dx9T/+JWHZ1u4vxw78DD4CqhdcZE7sR5cVr
 A5RcbtIurNUka1GTllu27GqVrxLc8splMiyx9456MfHixywyvmpagW6DiU2MgLcx
 wrlwN4VMylCAEKWNHB2FyeHevJqwfZgqfLTXvNGN6xQ4hITuVwTFpO6RdzztXVba
 qIMMK6eK+6PKIidVDPb5dEJpkownlubccE84lYl55qSVo3CgKuweZOH1If78gGQU
 927fFDyVTFtJsf68EEUUGxUS8OgWBQD9daTbNqnK28PLWWG/wtEjgHipycE4/QWN
 lPMHP/qE7x3DLSB9
 =m1Ee
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - Hide LDFLAGS_vmlinux from decompressor Makefiles to fix error
   messages when GNU Make 4.4 is used.

 - Fix 'make modules' build error when CONFIG_DEBUG_INFO_BTF_MODULES=y.

 - Fix warnings emitted by GNU Make 4.4 in scripts/kconfig/Makefile.

 - Support GNU Make 4.4 for scripts/jobserver-exec.

 - Show clearer error message when kernel/gen_kheaders.sh fails due to
   missing cpio.

* tag 'kbuild-fixes-v6.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kheaders: explicitly validate existence of cpio command
  scripts: support GNU make 4.4 in jobserver-exec
  kconfig: Update all declared targets
  scripts: rpm: make clear that mkspec script contains 4.13 feature
  init/Kconfig: fix LOCALVERSION_AUTO help text
  kbuild: fix 'make modules' error when CONFIG_DEBUG_INFO_BTF_MODULES=y
  kbuild: export top-level LDFLAGS_vmlinux only to scripts/Makefile.vmlinux
  init/version-timestamp.c: remove unneeded #include <linux/version.h>
  docs: kbuild: remove mention to dropped $(objtree) feature
2023-01-21 10:56:37 -08:00
Greg Kroah-Hartman 7397906057 prlimit: do_prlimit needs to have a speculation check
do_prlimit() adds the user-controlled resource value to a pointer that
will subsequently be dereferenced.  In order to help prevent this
codepath from being used as a spectre "gadget" a barrier needs to be
added after checking the range.

Reported-by: Jordy Zomer <jordyzomer@google.com>
Tested-by: Jordy Zomer <jordyzomer@google.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-01-21 16:14:17 +01:00
Kumar Kartikeya Dwivedi 1ee72bcbe4 bpf: Avoid recomputing spi in process_dynptr_func
Currently, process_dynptr_func first calls dynptr_get_spi and then
is_dynptr_reg_valid_init and is_dynptr_reg_valid_uninit have to call it
again to obtain the spi value. Instead of doing this twice, reuse the
already obtained value (which is by default 0, and is only set for
PTR_TO_STACK, and only used in that case in aforementioned functions).
The input value for these two functions will either be -ERANGE or >= 1,
and can either be permitted or rejected based on the respective check.

Suggested-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi f5b625e5f8 bpf: Combine dynptr_get_spi and is_spi_bounds_valid
Currently, a check on spi resides in dynptr_get_spi, while others
checking its validity for being within the allocated stack slots happens
in is_spi_bounds_valid. Almost always barring a couple of cases (where
being beyond allocated stack slots is not an error as stack slots need
to be populated), both are used together to make checks. Hence, subsume
the is_spi_bounds_valid check in dynptr_get_spi, and return -ERANGE to
specially distinguish the case where spi is valid but not within
allocated slots in the stack state.

The is_spi_bounds_valid function is still kept around as it is a generic
helper that will be useful for other objects on stack similar to dynptr
in the future.

Suggested-by: Joanne Koong <joannelkoong@gmail.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-7-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi 379d4ba831 bpf: Allow reinitializing unreferenced dynptr stack slots
Consider a program like below:

void prog(void)
{
	{
		struct bpf_dynptr ptr;
		bpf_dynptr_from_mem(...);
	}
	...
	{
		struct bpf_dynptr ptr;
		bpf_dynptr_from_mem(...);
	}
}

Here, the C compiler based on lifetime rules in the C standard would be
well within in its rights to share stack storage for dynptr 'ptr' as
their lifetimes do not overlap in the two distinct scopes. Currently,
such an example would be rejected by the verifier, but this is too
strict. Instead, we should allow reinitializing over dynptr stack slots
and forget information about the old dynptr object.

The destroy_if_dynptr_stack_slot function already makes necessary checks
to avoid overwriting referenced dynptr slots. This is done to present a
better error message instead of forgetting dynptr information on stack
and preserving reference state, leading to an inevitable but
undecipherable error at the end about an unreleased reference which has
to be associated back to its allocating call instruction to make any
sense to the user.

Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi f8064ab90d bpf: Invalidate slices on destruction of dynptrs on stack
The previous commit implemented destroy_if_dynptr_stack_slot. It
destroys the dynptr which given spi belongs to, but still doesn't
invalidate the slices that belong to such a dynptr. While for the case
of referenced dynptr, we don't allow their overwrite and return an error
early, we still allow it and destroy the dynptr for unreferenced dynptr.

To be able to enable precise and scoped invalidation of dynptr slices in
this case, we must be able to associate the source dynptr of slices that
have been obtained using bpf_dynptr_data. When doing destruction, only
slices belonging to the dynptr being destructed should be invalidated,
and nothing else. Currently, dynptr slices belonging to different
dynptrs are indistinguishible.

Hence, allocate a unique id to each dynptr (CONST_PTR_TO_DYNPTR and
those on stack). This will be stored as part of reg->id. Whenever using
bpf_dynptr_data, transfer this unique dynptr id to the returned
PTR_TO_MEM_OR_NULL slice pointer, and store it in a new per-PTR_TO_MEM
dynptr_id register state member.

Finally, after establishing such a relationship between dynptrs and
their slices, implement precise invalidation logic that only invalidates
slices belong to the destroyed dynptr in destroy_if_dynptr_stack_slot.

Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi ef8fc7a07c bpf: Fix partial dynptr stack slot reads/writes
Currently, while reads are disallowed for dynptr stack slots, writes are
not. Reads don't work from both direct access and helpers, while writes
do work in both cases, but have the effect of overwriting the slot_type.

While this is fine, handling for a few edge cases is missing. Firstly,
a user can overwrite the stack slots of dynptr partially.

Consider the following layout:
spi: [d][d][?]
      2  1  0

First slot is at spi 2, second at spi 1.
Now, do a write of 1 to 8 bytes for spi 1.

This will essentially either write STACK_MISC for all slot_types or
STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write
of zeroes). The end result is that slot is scrubbed.

Now, the layout is:
spi: [d][m][?]
      2  1  0

Suppose if user initializes spi = 1 as dynptr.
We get:
spi: [d][d][d]
      2  1  0

But this time, both spi 2 and spi 1 have first_slot = true.

Now, when passing spi 2 to dynptr helper, it will consider it as
initialized as it does not check whether second slot has first_slot ==
false. And spi 1 should already work as normal.

This effectively replaced size + offset of first dynptr, hence allowing
invalid OOB reads and writes.

Make a few changes to protect against this:
When writing to PTR_TO_STACK using BPF insns, when we touch spi of a
STACK_DYNPTR type, mark both first and second slot (regardless of which
slot we touch) as STACK_INVALID. Reads are already prevented.

Second, prevent writing	to stack memory from helpers if the range may
contain any STACK_DYNPTR slots. Reads are already prevented.

For helpers, we cannot allow it to destroy dynptrs from the writes as
depending on arguments, helper may take uninit_mem and dynptr both at
the same time. This would mean that helper may write to uninit_mem
before it reads the dynptr, which would be bad.

PTR_TO_MEM: [?????dd]

Depending on the code inside the helper, it may end up overwriting the
dynptr contents first and then read those as the dynptr argument.

Verifier would only simulate destruction when it does byte by byte
access simulation in check_helper_call for meta.access_size, and
fail to catch this case, as it happens after argument checks.

The same would need to be done for any other non-trivial objects created
on the stack in the future, such as bpf_list_head on stack, or
bpf_rb_root on stack.

A common misunderstanding in the current code is that MEM_UNINIT means
writes, but note that writes may also be performed even without
MEM_UNINIT in case of helpers, in that case the code after handling meta
&& meta->raw_mode will complain when it sees STACK_DYNPTR. So that
invalid read case also covers writes to potential STACK_DYNPTR slots.
The only loophole was in case of meta->raw_mode which simulated writes
through instructions which could overwrite them.

A future series sequenced after this will focus on the clean up of
helper access checks and bugs around that.

Fixes: 97e03f5210 ("bpf: Add verifier support for dynptrs")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi 79168a669d bpf: Fix missing var_off check for ARG_PTR_TO_DYNPTR
Currently, the dynptr function is not checking the variable offset part
of PTR_TO_STACK that it needs to check. The fixed offset is considered
when computing the stack pointer index, but if the variable offset was
not a constant (such that it could not be accumulated in reg->off), we
will end up a discrepency where runtime pointer does not point to the
actual stack slot we mark as STACK_DYNPTR.

It is impossible to precisely track dynptr state when variable offset is
not constant, hence, just like bpf_timer, kptr, bpf_spin_lock, etc.
simply reject the case where reg->var_off is not constant. Then,
consider both reg->off and reg->var_off.value when computing the stack
pointer index.

A new helper dynptr_get_spi is introduced to hide over these details
since the dynptr needs to be located in multiple places outside the
process_dynptr_func checks, hence once we know it's a PTR_TO_STACK, we
need to enforce these checks in all places.

Note that it is disallowed for unprivileged users to have a non-constant
var_off, so this problem should only be possible to trigger from
programs having CAP_PERFMON. However, its effects can vary.

Without the fix, it is possible to replace the contents of the dynptr
arbitrarily by making verifier mark different stack slots than actual
location and then doing writes to the actual stack address of dynptr at
runtime.

Fixes: 97e03f5210 ("bpf: Add verifier support for dynptrs")
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-20 17:55:02 -08:00
Kumar Kartikeya Dwivedi d6fefa1105 bpf: Fix state pruning for STACK_DYNPTR stack slots
The root of the problem is missing liveness marking for STACK_DYNPTR
slots. This leads to all kinds of problems inside stacksafe.

The verifier by default inside stacksafe ignores spilled_ptr in stack
slots which do not have REG_LIVE_READ marks. Since this is being checked
in the 'old' explored state, it must have already done clean_live_states
for this old bpf_func_state. Hence, it won't be receiving any more
liveness marks from to be explored insns (it has received REG_LIVE_DONE
marking from liveness point of view).

What this means is that verifier considers that it's safe to not compare
the stack slot if was never read by children states. While liveness
marks are usually propagated correctly following the parentage chain for
spilled registers (SCALAR_VALUE and PTR_* types), the same is not the
case for STACK_DYNPTR.

clean_live_states hence simply rewrites these stack slots to the type
STACK_INVALID since it sees no REG_LIVE_READ marks.

The end result is that we will never see STACK_DYNPTR slots in explored
state. Even if verifier was conservatively matching !REG_LIVE_READ
slots, very next check continuing the stacksafe loop on seeing
STACK_INVALID would again prevent further checks.

Now as long as verifier stores an explored state which we can compare to
when reaching a pruning point, we can abuse this bug to make verifier
prune search for obviously unsafe paths using STACK_DYNPTR slots
thinking they are never used hence safe.

Doing this in unprivileged mode is a bit challenging. add_new_state is
only set when seeing BPF_F_TEST_STATE_FREQ (which requires privileges)
or when jmps_processed difference is >= 2 and insn_processed difference
is >= 8. So coming up with the unprivileged case requires a little more
work, but it is still totally possible. The test case being discussed
below triggers the heuristic even in unprivileged mode.

However, it no longer works since commit
8addbfc7b3 ("bpf: Gate dynptr API behind CAP_BPF").

Let's try to study the test step by step.

Consider the following program (C style BPF ASM):

0  r0 = 0;
1  r6 = &ringbuf_map;
3  r1 = r6;
4  r2 = 8;
5  r3 = 0;
6  r4 = r10;
7  r4 -= -16;
8  call bpf_ringbuf_reserve_dynptr;
9  if r0 == 0 goto pc+1;
10 goto pc+1;
11 *(r10 - 16) = 0xeB9F;
12 r1 = r10;
13 r1 -= -16;
14 r2 = 0;
15 call bpf_ringbuf_discard_dynptr;
16 r0 = 0;
17 exit;

We know that insn 12 will be a pruning point, hence if we force
add_new_state for it, it will first verify the following path as
safe in straight line exploration:
0 1 3 4 5 6 7 8 9 -> 10 -> (12) 13 14 15 16 17

Then, when we arrive at insn 12 from the following path:
0 1 3 4 5 6 7 8 9 -> 11 (12)

We will find a state that has been verified as safe already at insn 12.
Since register state is same at this point, regsafe will pass. Next, in
stacksafe, for spi = 0 and spi = 1 (location of our dynptr) is skipped
seeing !REG_LIVE_READ. The rest matches, so stacksafe returns true.
Next, refsafe is also true as reference state is unchanged in both
states.

The states are considered equivalent and search is pruned.

Hence, we are able to construct a dynptr with arbitrary contents and use
the dynptr API to operate on this arbitrary pointer and arbitrary size +
offset.

To fix this, first define a mark_dynptr_read function that propagates
liveness marks whenever a valid initialized dynptr is accessed by dynptr
helpers. REG_LIVE_WRITTEN is marked whenever we initialize an
uninitialized dynptr. This is done in mark_stack_slots_dynptr. It allows
screening off mark_reg_read and not propagating marks upwards from that
point.

This ensures that we either set REG_LIVE_READ64 on both dynptr slots, or
none, so clean_live_states either sets both slots to STACK_INVALID or
none of them. This is the invariant the checks inside stacksafe rely on.

Next, do a complete comparison of both stack slots whenever they have
STACK_DYNPTR. Compare the dynptr type stored in the spilled_ptr, and
also whether both form the same first_slot. Only then is the later path
safe.

Fixes: 97e03f5210 ("bpf: Add verifier support for dynptrs")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230121002241.2113993-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-20 17:55:02 -08:00
Jakub Kicinski b3c588cd55 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/ipa/ipa_interrupt.c
drivers/net/ipa/ipa_interrupt.h
  9ec9b2a308 ("net: ipa: disable ipa interrupt during suspend")
  8e461e1f09 ("net: ipa: introduce ipa_interrupt_enable()")
  d50ed35587 ("net: ipa: enable IPA interrupt handlers separate from registration")
https://lore.kernel.org/all/20230119114125.5182c7ab@canb.auug.org.au/
https://lore.kernel.org/all/79e46152-8043-a512-79d9-c3b905462774@tessares.net/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-20 12:28:23 -08:00
Linus Torvalds 5deaa98587 Including fixes from wireless, bluetooth, bpf and netfilter.
Current release - regressions:
 
  - Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
    addrconf", fix nsna_ping mode of team
 
  - wifi: mt76: fix bugs in Rx queue handling and DMA mapping
 
  - eth: mlx5:
    - add missing mutex_unlock in error reporter
    - protect global IPsec ASO with a lock
 
 Current release - new code bugs:
 
  - rxrpc: fix wrong error return in rxrpc_connect_call()
 
 Previous releases - regressions:
 
  - bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2
 
  - wifi:
    - mac80211: fix crashes on Rx due to incorrect initialization of
      rx->link and rx->link_sta
    - mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
      aggregation handling, crashes
    - brcmfmac: fix regression for Broadcom PCIe wifi devices
    - rndis_wlan: prevent buffer overflow in rndis_query_oid
 
  - netfilter: conntrack: handle tcp challenge acks during connection
    reuse
 
  - sched: avoid grafting on htb_destroy_class_offload when destroying
 
  - virtio-net: correctly enable callback during start_xmit, fix stalls
 
  - tcp: avoid the lookup process failing to get sk in ehash table
 
  - ipa: disable ipa interrupt during suspend
 
  - eth: stmmac: enable all safety features by default
 
 Previous releases - always broken:
 
  - bpf:
    - fix pointer-leak due to insufficient speculative store bypass
      mitigation (Spectre v4)
    - skip task with pid=1 in send_signal_common() to avoid a splat
    - fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
      PERF_BPF_EVENT_PROG_UNLOAD events
    - fix potential deadlock in htab_lock_bucket from same bucket index
      but different map_locked index
 
  - bluetooth:
    - fix a buffer overflow in mgmt_mesh_add()
    - hci_qca: fix driver shutdown on closed serdev
    - ISO: fix possible circular locking dependency
    - CIS: hci_event: fix invalid wait context
 
  - wifi: brcmfmac: fixes for survey dump handling
 
  - mptcp: explicitly specify sock family at subflow creation time
 
  - netfilter: nft_payload: incorrect arithmetics when fetching VLAN
    header bits
 
  - tcp: fix rate_app_limited to default to 1
 
  - l2tp: close all race conditions in l2tp_tunnel_register()
 
  - eth: mlx5: fixes for QoS config and eswitch configuration
 
  - eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()
 
  - eth: stmmac: fix invalid call to mdiobus_get_phy()
 
 Misc:
 
  - ethtool: add netlink attr in rss get reply only if the value is
    not empty
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmPKxFAACgkQMUZtbf5S
 Irtdlg/+OUggv3sKhcpUv39SnmMyiIhnNj9KhG+25Iiy7MJxaoNntCGsW3KXXTGo
 JylOGMesociz+hCv4xHl9J61uwmz+qrUPKqqi7hSbEoHlAa3OrubQb+LW4/x0Jp0
 bNJqqAYt04C+txhvsuF9odfZbKtvQ7RIU0XEzqEoES4UXTYQoCHEAwfn5twKDNNS
 /x9OLSZnctAv1pinKZ8QTLjz0IZwHaBAbWNXkLe/HEu9nGrUndFtA5rJjyzrjw10
 ZltTDfV2lr3SWVHsJShnTJ64u+aPBGmJmVzeNw64qRrmnYdFMCpUVoH222IurexO
 aVPY9WUOwgUovetB8fmhPF0+n5Aa6pbTb4toQB1oVZ8X0h7WNrdfXZug1QDQOMbC
 eGzsNdk6hvOeqBhbIKPLQXzaIxbPyXM+KUUbOxi+V4dahG79vG2BaQsrpFymueVs
 cna7pL8dE1S9dR3SEB0KW4nyoWIugukZrzuX0efv1hxovuWn4yNJBt2lp8gQwY6v
 yTk93Ou2LYDrm4yXLrHHWYNXU1u68Pq0o14xbx7tOYGan/evqfaaa1lmAvj7b1bq
 g19FB4IrwA/1ZBoaOIMV8Ml7u5ww9LAFzJRAClEptOopADN4Gro2jgUYWjmxm+uV
 RdlpQ2mI8iEeEH0FOITmdlFy7cbh7TWIkoiXHcCWifgfUE7sxnY=
 =F3be
 -----END PGP SIGNATURE-----

Merge tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
 "Including fixes from wireless, bluetooth, bpf and netfilter.

  Current release - regressions:

   - Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6
     addrconf", fix nsna_ping mode of team

   - wifi: mt76: fix bugs in Rx queue handling and DMA mapping

   - eth: mlx5:
      - add missing mutex_unlock in error reporter
      - protect global IPsec ASO with a lock

  Current release - new code bugs:

   - rxrpc: fix wrong error return in rxrpc_connect_call()

  Previous releases - regressions:

   - bluetooth: hci_sync: fix use of HCI_OP_LE_READ_BUFFER_SIZE_V2

   - wifi:
      - mac80211: fix crashes on Rx due to incorrect initialization of
        rx->link and rx->link_sta
      - mac80211: fix bugs in iTXQ conversion - Tx stalls, incorrect
        aggregation handling, crashes
      - brcmfmac: fix regression for Broadcom PCIe wifi devices
      - rndis_wlan: prevent buffer overflow in rndis_query_oid

   - netfilter: conntrack: handle tcp challenge acks during connection
     reuse

   - sched: avoid grafting on htb_destroy_class_offload when destroying

   - virtio-net: correctly enable callback during start_xmit, fix stalls

   - tcp: avoid the lookup process failing to get sk in ehash table

   - ipa: disable ipa interrupt during suspend

   - eth: stmmac: enable all safety features by default

  Previous releases - always broken:

   - bpf:
      - fix pointer-leak due to insufficient speculative store bypass
        mitigation (Spectre v4)
      - skip task with pid=1 in send_signal_common() to avoid a splat
      - fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
        PERF_BPF_EVENT_PROG_UNLOAD events
      - fix potential deadlock in htab_lock_bucket from same bucket
        index but different map_locked index

   - bluetooth:
      - fix a buffer overflow in mgmt_mesh_add()
      - hci_qca: fix driver shutdown on closed serdev
      - ISO: fix possible circular locking dependency
      - CIS: hci_event: fix invalid wait context

   - wifi: brcmfmac: fixes for survey dump handling

   - mptcp: explicitly specify sock family at subflow creation time

   - netfilter: nft_payload: incorrect arithmetics when fetching VLAN
     header bits

   - tcp: fix rate_app_limited to default to 1

   - l2tp: close all race conditions in l2tp_tunnel_register()

   - eth: mlx5: fixes for QoS config and eswitch configuration

   - eth: enetc: avoid deadlock in enetc_tx_onestep_tstamp()

   - eth: stmmac: fix invalid call to mdiobus_get_phy()

  Misc:

   - ethtool: add netlink attr in rss get reply only if the value is not
     empty"

* tag 'net-6.2-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  Revert "Merge branch 'octeontx2-af-CPT'"
  tcp: fix rate_app_limited to default to 1
  bnxt: Do not read past the end of test names
  net: stmmac: enable all safety features by default
  octeontx2-af: add mbox to return CPT_AF_FLT_INT info
  octeontx2-af: update cpt lf alloc mailbox
  octeontx2-af: restore rxc conf after teardown sequence
  octeontx2-af: optimize cpt pf identification
  octeontx2-af: modify FLR sequence for CPT
  octeontx2-af: add mbox for CPT LF reset
  octeontx2-af: recover CPT engine when it gets fault
  net: dsa: microchip: ksz9477: port map correction in ALU table entry register
  selftests/net: toeplitz: fix race on tpacket_v3 block close
  net/ulp: use consistent error code when blocking ULP
  octeontx2-pf: Fix the use of GFP_KERNEL in atomic context on rt
  tcp: avoid the lookup process failing to get sk in ehash table
  Revert "net: team: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf"
  MAINTAINERS: add networking entries for Willem
  net: sched: gred: prevent races when adding offloads to stats
  l2tp: prevent lockdep issue in l2tp_tunnel_register()
  ...
2023-01-20 09:58:44 -08:00
Jiri Olsa 6a5f2d6ee8 bpf: Change modules resolving for kprobe multi link
We currently use module_kallsyms_on_each_symbol that iterates all
modules/symbols and we try to lookup each such address in user
provided symbols/addresses to get list of used modules.

This fix instead only iterates provided kprobe addresses and calls
__module_address on each to get list of used modules. This turned
out to be simpler and also bit faster.

On my setup with workload (executed 10 times):

   # test_progs -t kprobe_multi_bench_attach/modules

Current code:

 Performance counter stats for './test.sh' (5 runs):

    76,081,161,596      cycles:k                   ( +-  0.47% )

           18.3867 +- 0.0992 seconds time elapsed  ( +-  0.54% )

With the fix:

 Performance counter stats for './test.sh' (5 runs):

    74,079,889,063      cycles:k                   ( +-  0.04% )

           17.8514 +- 0.0218 seconds time elapsed  ( +-  0.12% )

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230116101009.23694-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-19 17:07:40 -08:00
Zhen Lei 07cc2c931e livepatch: Improve the search performance of module_kallsyms_on_each_symbol()
Currently we traverse all symbols of all modules to find the specified
function for the specified module. But in reality, we just need to find
the given module and then traverse all the symbols in it.

Let's add a new parameter 'const char *modname' to function
module_kallsyms_on_each_symbol(), then we can compare the module names
directly in this function and call hook 'fn' after matching. If 'modname'
is NULL, the symbols of all modules are still traversed for compatibility
with other usage cases.

Phase1: mod1-->mod2..(subsequent modules do not need to be compared)
                |
Phase2:          -->f1-->f2-->f3

Assuming that there are m modules, each module has n symbols on average,
then the time complexity is reduced from O(m * n) to O(m) + O(n).

Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Song Liu <song@kernel.org>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20230116101009.23694-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-19 17:07:15 -08:00
Linus Torvalds d368967cb1 Printk fixes for 6.2-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmPJTNMACgkQUqAMR0iA
 lPL3tBAApmvatVLQfjT10AnpXv/RhNqeL3JbTHCOU7EiG9OpmoNrXzJA+GAyhyXz
 FAViF2Xyfrl3XtYsJUe0hVQNUuGm2tz2owqg7OFuTcvcs9AzGhHK7CCoJv+f580P
 MyojbXFaBw/eqk5LMB5lqR6rhjiz5UeGTXL32StdqBBV9KlZTpwt36Rwc59i0Pkn
 uk4QSp2bJQIgHkSZt8pyspRoR2QLZhntVHvAbvqjIsRWVyqkIFvPc7djKVxPfJBM
 4tf9ijEHkbLL1cFjhFvbstjsEuUu0G3lmgNa/0zID2tXbywQXKTfg+sa7AOWCn8o
 EN9bbTdwGSkfbOnH+l9cmmljeSPPyMCskEFvm7c5yaFR1DZHj6G1OjklQnG6Anjc
 l2GD5kYYi2QTRCI6YMtjb7851i/nY4oAOmCMbzDtWlzn4Kxs0vWdXHoaw/KcoBgh
 W7z2crC7k7+N6DUAkEJz0ivIQJtrASCV9X9DyYC/Y03VahFNsNVA3Le8mJGeWZhk
 rAR+M8sp+UCBByWOPHIUT0UknfyGt4MczmtS9eftsbRwVNUPRlagJrOahJCGooKU
 WmXvu8q9in/+h2X3JhLe4thTQd72XmvbKWXYJthYtMDZMPwkezTYFnDi3n21bPV7
 o+pl80LJG0F1KJHLHt2tO+9ul8f1mvuIyG5MXOMkT0CQCBDNQQQ=
 =aYZ6
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

Pull printk fixes from Petr Mladek:

 - Prevent a potential deadlock when configuring kgdb console

 - Fix a kernel doc warning

* tag 'printk-for-6.2-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
  kernel/printk/printk.c: Fix W=1 kernel-doc warning
  tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()
2023-01-19 12:32:07 -08:00
Petr Mladek 21493c6e96 Merge branch 'rework/console-list-lock' into for-linus 2023-01-19 14:56:38 +01:00
Thomas Weißschuh 13e1df0928 kheaders: explicitly validate existence of cpio command
If the cpio command is not available the error emitted by
gen_kheaders.so is not clear as all output of the call to cpio is
discarded:

GNU make 4.4:

  GEN     kernel/kheaders_data.tar.xz
find: 'standard output': Broken pipe
find: write error
make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127
make[1]: *** [scripts/Makefile.build:504: kernel] Error 2

GNU make < 4.4:

  GEN     kernel/kheaders_data.tar.xz
make[2]: *** [kernel/Makefile:157: kernel/kheaders_data.tar.xz] Error 127
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [scripts/Makefile.build:504: kernel] Error 2

Add an explicit check that will trigger a clear message about the issue:

  CHK     kernel/kheaders_data.tar.xz
./kernel/gen_kheaders.sh: line 17: type: cpio: not found

The other commands executed by gen_kheaders.sh are part of a standard
installation, so they are not checked.

Reported-by: Amy Parker <apark0006@student.cerritos.edu>
Link: https://lore.kernel.org/lkml/CAPOgqxFva=tOuh1UitCSN38+28q3BNXKq19rEsVNPRzRqKqZ+g@mail.gmail.com/
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2023-01-18 16:34:04 +09:00
Jakub Kicinski 423c1d363c bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY8WHEgAKCRDbK58LschI
 g+gDAP9K24FwikbCHy145i/KHFaIk4ZDSIfjff8uyKDq73h9QwEAvBvrxko7d+dh
 EHdhJGoqufV8n5wilYOrOGN7ShMwFAg=
 =aFQl
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
bpf 2023-01-16

We've added 6 non-merge commits during the last 8 day(s) which contain
a total of 6 files changed, 22 insertions(+), 24 deletions(-).

The main changes are:

1) Mitigate a Spectre v4 leak in unprivileged BPF from speculative
   pointer-as-scalar type confusion, from Luis Gerhorst.

2) Fix a splat when pid 1 attaches a BPF program that attempts to
   send killing signal to itself, from Hao Sun.

3) Fix BPF program ID information in BPF_AUDIT_UNLOAD as well as
   PERF_BPF_EVENT_PROG_UNLOAD events, from Paul Moore.

4) Fix BPF verifier warning triggered from invalid kfunc call in
   backtrack_insn, also from Hao Sun.

5) Fix potential deadlock in htab_lock_bucket from same bucket index
   but different map_locked index, from Tonghao Zhang.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation
  bpf: hash map, avoid deadlock with suitable hash mask
  bpf: remove the do_idr_lock parameter from bpf_prog_free_id()
  bpf: restore the ebpf program ID for BPF_AUDIT_UNLOAD and PERF_BPF_EVENT_PROG_UNLOAD
  bpf: Skip task with pid=1 in send_signal_common()
  bpf: Skip invalid kfunc call in backtrack_insn
====================

Link: https://lore.kernel.org/r/20230116230745.21742-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-17 19:13:02 -08:00
Jiri Olsa 700e6f853e bpf: Do not allow to load sleepable BPF_TRACE_RAW_TP program
Currently we allow to load any tracing program as sleepable,
but BPF_TRACE_RAW_TP can't sleep. Making the check explicit
for tracing programs attach types, so sleepable BPF_TRACE_RAW_TP
will fail to load.

Updating the verifier error to mention iter programs as well.

Acked-by: Song Liu <song@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230117223705.440975-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-01-17 16:56:04 -08:00
Anuradha Weeraman 4fe59a130c kernel/printk/printk.c: Fix W=1 kernel-doc warning
Fix W=1 kernel-doc warning:

kernel/printk/printk.c:
 - Include function parameter in console_lock_spinning_disable_and_check()

Signed-off-by: Anuradha Weeraman <anuradha@debian.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230116125635.374567-1-anuradha@debian.org
2023-01-16 16:59:17 +01:00
John Ogness 3ef5abd9b5 tty: serial: kgdboc: fix mutex locking order for configure_kgdboc()
Several mutexes are taken while setting up console serial ports. In
particular, the tty_port->mutex and @console_mutex are taken:

  serial_pnp_probe
    serial8250_register_8250_port
      uart_add_one_port (locks tty_port->mutex)
        uart_configure_port
          register_console (locks @console_mutex)

In order to synchronize kgdb's tty_find_polling_driver() with
register_console(), commit 6193bc9084 ("tty: serial: kgdboc:
synchronize tty_find_polling_driver() and register_console()") takes
the @console_mutex. However, this leads to the following call chain
(with locking):

  platform_probe
    kgdboc_probe
      configure_kgdboc (locks @console_mutex)
        tty_find_polling_driver
          uart_poll_init (locks tty_port->mutex)
            uart_set_options

This is clearly deadlock potential due to the reverse lock ordering.

Since uart_set_options() requires holding @console_mutex in order to
serialize early initialization of the serial-console lock, take the
@console_mutex in uart_poll_init() instead of configure_kgdboc().

Since configure_kgdboc() was using @console_mutex for safe traversal
of the console list, change it to use the SRCU iterator instead.

Add comments to uart_set_options() kerneldoc mentioning that it
requires holding @console_mutex (aka the console_list_lock).

Fixes: 6193bc9084 ("tty: serial: kgdboc: synchronize tty_find_polling_driver() and register_console()")
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
[pmladek@suse.com: Export console_srcu_read_lock_is_held() to fix build kgdboc as a module.]
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230112161213.1434854-1-john.ogness@linutronix.de
2023-01-16 16:44:53 +01:00
Waiman Long 5657c11678 sched/core: Fix NULL pointer access fault in sched_setaffinity() with non-SMP configs
The kernel commit 9a5418bc48 ("sched/core: Use kfree_rcu() in
do_set_cpus_allowed()") introduces a bug for kernels built with non-SMP
configs. Calling sched_setaffinity() on such a uniprocessor kernel will
cause cpumask_copy() to be called with a NULL pointer leading to general
protection fault. This is not really a problem in real use cases as
there aren't that many uniprocessor kernel configs in use and calling
sched_setaffinity() on such a uniprocessor system doesn't make sense.

Fix this problem by making sure cpumask_copy() will not be called in
such a case.

Fixes: 9a5418bc48 ("sched/core: Use kfree_rcu() in do_set_cpus_allowed()")
Reported-by: kernel test robot <yujie.liu@intel.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230115193122.563036-1-longman@redhat.com
2023-01-16 10:07:25 +01:00
Linus Torvalds 8b7be52f3f modules-6.2-rc4
Just one fix for modules by Nick.
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmPB5S4SHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoin0OUQALx+uch7cZvl3aznH4CAkF1U2mKuESLT
 unQsnSlxIF0BETTAdPbhcCVEtKi102Xq4R6ffCSilB33b0IGM6qY6ZUQb/GMGhg5
 qIur7EDiFksjCBmBuE1iNaAknkiKeirkJTEC4lEOl8OLXprmnuMeeR+E0BDi5sH3
 YIxexLuhyKeF9Ke4bjAJ7A4WKvh2R7yamISCUcP6GL3w7U9urSjo9FHKcUmFH9nr
 qCX/1zE0fz7iJzb9YtBNVhdgKNOYNOxa7TDNXQCVuLcZQfmQqDMBVgKdOkj6TAWX
 6L2CHT4N9IjPt9+DrlEUf6bSSwP4N4aFdyMAo6UlVXcgvEbTS/kdoMSqFOAQSAdb
 G/lzsvS3fD76VcCdZkwAFYnEhUTJ4xWTS0oaI++tu0EFX5lvRuQv2DRt0WULlD/u
 L0paUwmjtVajcSIATxRZkjoMiVD4btDRz30kaIUU/xoc1Gg/EADrSLHESaZ9eZVL
 EJ40aqLLIRBXGZrVEzvf97HIzuQiKfaPzywNvbMpxG3m0tV2pn3Z4ts/A8aO7c+O
 mBDnTURiZN6pT+xsnJBvqWrlXwPRUGwI+NjRcdPZhUyfgj5MHpEI8PAcUWy6TTUn
 H2P6x2iC3/nypqhnwjoixSptjaUWcak3R6UgwVS2YqfjePCqaq0wg9DAFDQH0Yx3
 awAOoum0Ubin
 =aa3U
 -----END PGP SIGNATURE-----

Merge tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull module fix from Luis Chamberlain:
 "Just one fix for modules by Nick"

* tag 'modules-6.2-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
  kallsyms: Fix scheduling with interrupts disabled in self-test
2023-01-14 08:17:27 -06:00
Nicholas Piggin da35048f26 kallsyms: Fix scheduling with interrupts disabled in self-test
kallsyms_on_each* may schedule so must not be called with interrupts
disabled. The iteration function could disable interrupts, but this
also changes lookup_symbol() to match the change to the other timing
code.

Reported-by: Erhard F. <erhard_f@mailbox.org>
Link: https://lore.kernel.org/all/bug-216902-206035@https.bugzilla.kernel.org%2F/
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212251728.8d0872ff-oliver.sang@intel.com
Fixes: 30f3bb0977 ("kallsyms: Add self-test facility")
Tested-by: "Erhard F." <erhard_f@mailbox.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-01-13 15:09:08 -08:00
Luis Gerhorst e4f4db4779 bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation
To mitigate Spectre v4, 2039f26f3a ("bpf: Fix leakage due to
insufficient speculative store bypass mitigation") inserts lfence
instructions after 1) initializing a stack slot and 2) spilling a
pointer to the stack.

However, this does not cover cases where a stack slot is first
initialized with a pointer (subject to sanitization) but then
overwritten with a scalar (not subject to sanitization because
the slot was already initialized). In this case, the second write
may be subject to speculative store bypass (SSB) creating a
speculative pointer-as-scalar type confusion. This allows the
program to subsequently leak the numerical pointer value using,
for example, a branch-based cache side channel.

To fix this, also sanitize scalars if they write a stack slot
that previously contained a pointer. Assuming that pointer-spills
are only generated by LLVM on register-pressure, the performance
impact on most real-world BPF programs should be small.

The following unprivileged BPF bytecode drafts a minimal exploit
and the mitigation:

  [...]
  // r6 = 0 or 1 (skalar, unknown user input)
  // r7 = accessible ptr for side channel
  // r10 = frame pointer (fp), to be leaked
  //
  r9 = r10 # fp alias to encourage ssb
  *(u64 *)(r9 - 8) = r10 // fp[-8] = ptr, to be leaked
  // lfence added here because of pointer spill to stack.
  //
  // Ommitted: Dummy bpf_ringbuf_output() here to train alias predictor
  // for no r9-r10 dependency.
  //
  *(u64 *)(r10 - 8) = r6 // fp[-8] = scalar, overwrites ptr
  // 2039f26f3aca: no lfence added because stack slot was not STACK_INVALID,
  // store may be subject to SSB
  //
  // fix: also add an lfence when the slot contained a ptr
  //
  r8 = *(u64 *)(r9 - 8)
  // r8 = architecturally a scalar, speculatively a ptr
  //
  // leak ptr using branch-based cache side channel:
  r8 &= 1 // choose bit to leak
  if r8 == 0 goto SLOW // no mispredict
  // architecturally dead code if input r6 is 0,
  // only executes speculatively iff ptr bit is 1
  r8 = *(u64 *)(r7 + 0) # encode bit in cache (0: slow, 1: fast)
SLOW:
  [...]

After running this, the program can time the access to *(r7 + 0) to
determine whether the chosen pointer bit was 0 or 1. Repeat this 64
times to recover the whole address on amd64.

In summary, sanitization can only be skipped if one scalar is
overwritten with another scalar. Scalar-confusion due to speculative
store bypass can not lead to invalid accesses because the pointer
bounds deducted during verification are enforced using branchless
logic. See 979d63d50c ("bpf: prevent out of bounds speculation on
pointer arithmetic") for details.

Do not make the mitigation depend on !env->allow_{uninit_stack,ptr_leaks}
because speculative leaks are likely unexpected if these were enabled.
For example, leaking the address to a protected log file may be acceptable
while disabling the mitigation might unintentionally leak the address
into the cached-state of a map that is accessible to unprivileged
processes.

Fixes: 2039f26f3a ("bpf: Fix leakage due to insufficient speculative store bypass mitigation")
Signed-off-by: Luis Gerhorst <gerhorst@cs.fau.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Henriette Hofmeier <henriette.hofmeier@rub.de>
Link: https://lore.kernel.org/bpf/edc95bad-aada-9cfc-ffe2-fa9bb206583c@cs.fau.de
Link: https://lore.kernel.org/bpf/20230109150544.41465-1-gerhorst@cs.fau.de
2023-01-13 17:18:35 +01:00
Qais Yousef da07d2f9c1 sched/fair: Fixes for capacity inversion detection
Traversing the Perf Domains requires rcu_read_lock() to be held and is
conditional on sched_energy_enabled(). Ensure right protections applied.

Also skip capacity inversion detection for our own pd; which was an
error.

Fixes: 44c7b80bff ("sched/fair: Detect capacity inversion")
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-3-qyousef@layalina.io
2023-01-13 11:40:21 +01:00
Qais Yousef e26fd28db8 sched/uclamp: Fix a uninitialized variable warnings
Addresses the following warnings:

> config: riscv-randconfig-m031-20221111
> compiler: riscv64-linux-gcc (GCC) 12.1.0
>
> smatch warnings:
> kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_min'.
> kernel/sched/fair.c:7263 find_energy_efficient_cpu() error: uninitialized symbol 'util_max'.

Fixes: 244226035a ("sched/uclamp: Fix fits_capacity() check in feec()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230112122708.330667-2-qyousef@layalina.io
2023-01-13 11:40:21 +01:00
Jakub Kicinski a99da46ac0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/usb/r8152.c
  be53771c87 ("r8152: add vendor/device ID pair for Microsoft Devkit")
  ec51fbd1b8 ("r8152: add USB device driver for config selection")
https://lore.kernel.org/all/20230113113339.658c4723@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-01-12 19:59:56 -08:00
Tonghao Zhang 9f907439dc bpf: hash map, avoid deadlock with suitable hash mask
The deadlock still may occur while accessed in NMI and non-NMI
context. Because in NMI, we still may access the same bucket but with
different map_locked index.

For example, on the same CPU, .max_entries = 2, we update the hash map,
with key = 4, while running bpf prog in NMI nmi_handle(), to update
hash map with key = 20, so it will have the same bucket index but have
different map_locked index.

To fix this issue, using min mask to hash again.

Fixes: 20b6cc34ea ("bpf: Avoid hashtab deadlock with map_locked")
Signed-off-by: Tonghao Zhang <tong@infragraf.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Hou Tao <houtao1@huawei.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230111092903.92389-1-tong@infragraf.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-01-12 18:55:42 -08:00
Linus Torvalds 772d0e9144 - Fix various DocBook formatting errors in kernel/time/ that
generated (justified) warnings during a kernel-doc build.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPAGrkRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1i+txAAuYNFi2RntHyD/0GIkgcyeS91gRE4WlTA
 WvCmg0I56oCiFY/TjRJSGOetEDAMs6CIeGmy5u4Al2Drnmw1LtSFwaJZoENJgqMG
 ytANTNqB4tYmYHVxm6U1eQSxse9vFb0EP/DrK15v+NapkYWXeQUnnGGrfnheeXiE
 Vg/PlaRkYncAzz424BHBvzK8YAmWiC9CfWAGxmX2PBNq8EL2P4BoFG/Tb0QpvWOL
 svR0QewSG7A+WnN9i42KaciMKo1pvtPdZPrxu4Mr5yT6t6LzdGLEhqRK+oejzwB0
 En3kdI2uMv4Bn0YFwuXtASKpcRHF5qxfpMFRCNhZS0YzwGIJH68frGVhGVS2OFbW
 UbsD0DBFd30/D/BTtG7OpYBkSC/9DT0SsPpbpc0mqolSSAG/fXy960T5FJ9FzD1b
 S2TWnY6ZoYpKkrOXZ6DA6jHdc7GV7B1HmQxlVs0S5nzJ9bRCqI/qtUfY3/9pXq6e
 UyuovcWpQJlaGrDe1TpgzdEvw71+/qokYsuB/V0sViRLDg7VYkYUtQxQYcg+0Jhy
 J1F7WzxNkkuAgttZqMDKyBCZXFTbCAnihx/aleDnwGzPlF8Rz4l8nU++VvbV9NZp
 j0hX3GwqhO8fpk9ilRHuFjzzr3ZhhQg5OumZfMWeYvp13hUxoej4e5ZSj7VWhkV5
 cENcjdvspG8=
 =nO/P
 -----END PGP SIGNATURE-----

Merge tag 'timers-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer doc fixes from Ingo Molnar:

 - Fix various DocBook formatting errors in kernel/time/ that generated
   (justified) warnings during a kernel-doc build.

* tag 'timers-urgent-2023-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  time: Fix various kernel-doc problems
2023-01-12 16:53:39 -06:00