- Move -Wcast-align to W=3, which tends to be false-positive and there
is no tree-wide solution.
- Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a preprocessor
option and makes sense for .S files as well.
- Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.
- Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.
- Fix undesirable line breaks in *.mod files.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl/MzyMVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGKJ8P/2kLq296XAPjqC90/LWMja8dsXO/
Wgaq8zC819x0JFuGdBKlwlFe3AvFYRtts9V5+mzjxvsOjH/6+xzyrXjRPCwZYqlj
XKC3ZwuS2SGDPFCriI1edwTUp5tyDnG/VBjqbf3ybQnz0LAShidXBD9IlM/XX9Rz
BlWqd7Uib50Pq8AfM2JVokrSmkkvhqxocIsmjTa0wvRjRAw7+aVkGNCWXqnTho7y
YuHmTWbmUQIROF3Bzs1fkGp+qaQofPRfA1tTwaTVvgmt8rEqyzXi11y6kj56INfg
/pq4O1KrplKtJFdrcjj4/eptqHG3I+Jq56qCHVescF6+bH6cc6BUL8qDdAzFZQai
e/pWCzREqFDKchEmT2d0Uzik8Zfxi5Cw68Otpzb4LqTUUxXSoRx1R9Of/Ei5QZum
6b6s9Q41UwH983UQCOOSGjXGZYP6fZG1a0XejbduYo7TL4KEECAO/FlLBWGttYH3
0i3aKz3aDKb/fo7hDbbqg+o6F0mShEraqxMmWgIvgGt+k76j0O0wS2KryqpTd7Vv
xg72suGM7f9QBA50lZ0r32fm86XnlqwQAm9ZMaSXR1Ii7j4F9UNRmR/FUYq7dPwa
COkuHr+9LqzV/tkluWi2rjLIGPaCuEVeSCcQ/wIDdp2iOyb54CbozwK0Yi2dxxus
jVFKwSaMUDHrkSj6
=/ysh
-----END PGP SIGNATURE-----
Merge tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild fixes from Masahiro Yamada:
- Move -Wcast-align to W=3, which tends to be false-positive and there
is no tree-wide solution.
- Pass -fmacro-prefix-map to KBUILD_CPPFLAGS because it is a
preprocessor option and makes sense for .S files as well.
- Disable -gdwarf-2 for Clang's integrated assembler to avoid warnings.
- Disable --orphan-handling=warn for LLD 10.0.1 to avoid warnings.
- Fix undesirable line breaks in *.mod files.
* tag 'kbuild-fixes-v5.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: avoid split lines in .mod files
kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
kbuild: Hoist '--orphan-handling' into Kconfig
Kbuild: do not emit debug info for assembly with LLVM_IAS=1
kbuild: use -fmacro-prefix-map for .S sources
Makefile.extrawarn: move -Wcast-align to W=3
ld.lld 10.0.1 spews a bunch of various warnings about .rela sections,
along with a few others. Newer versions of ld.lld do not have these
warnings. As a result, do not add '--orphan-handling=warn' to
LDFLAGS_vmlinux if ld.lld's version is not new enough.
Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Link: https://github.com/ClangBuiltLinux/linux/issues/1193
Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Currently, '--orphan-handling=warn' is spread out across four different
architectures in their respective Makefiles, which makes it a little
unruly to deal with in case it needs to be disabled for a specific
linker version (in this case, ld.lld 10.0.1).
To make it easier to control this, hoist this warning into Kconfig and
the main Makefile so that disabling it is simpler, as the warning will
only be enabled in a couple places (main Makefile and a couple of
compressed boot folders that blow away LDFLAGS_vmlinx) and making it
conditional is easier due to Kconfig syntax. One small additional
benefit of this is saving a call to ld-option on incremental builds
because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.
To keep the list of supported architectures the same, introduce
CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
gain this automatically after all of the sections are specified and size
asserted. A special thanks to Kees Cook for the help text on this
config.
Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
While allmodconfig and allyesconfig build for s390 there are also
various bots running compile tests with randconfig, where PCI is
disabled. This reveals that a lot of drivers should actually depend on
HAS_IOMEM.
Adding this to each device driver would be a never ending story,
therefore just disable COMPILE_TEST for s390.
The reasoning is more or less the same as described in
commit bc083a64b6 ("init/Kconfig: make COMPILE_TEST depend on !UML").
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Currently, LOG_BUF_SHIFT defaults to 17, which is 2 ^ 17 bytes = 128 KB,
and LOG_CPU_MAX_BUF_SHIFT defaults to 12, which is 2 ^ 12 bytes = 4 KB.
Half of 128 KB is 64 KB, so more than 16 CPUs are required for the value
to be used, as then the sum of contributions is greater than 64 KB for
the first time. My guess is, that the description was written with the
configuration values used in the SUSE in mind.
Fixes: 23b2899f7f ("printk: allow increasing the ring buffer depending on the number of CPUs")
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200811092924.6256-1-pmenzel@molgen.mpg.de
Add redirect_neigh() BPF packet redirect helper, allowing to limit stack
traversal in common container configs and improving TCP back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
Expand netlink policy support and improve policy export to user space.
(Ge)netlink core performs request validation according to declared
policies. Expand the expressiveness of those policies (min/max length
and bitmasks). Allow dumping policies for particular commands.
This is used for feature discovery by user space (instead of kernel
version parsing or trial and error).
Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge.
Allow more than 255 IPv4 multicast interfaces.
Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
In Multi-patch TCP (MPTCP) support concurrent transmission of data
on multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
Support SMC-Dv2 version of SMC, which enables multi-subnet deployments.
Allow more calls to same peer in RxRPC.
Support two new Controller Area Network (CAN) protocols -
CAN-FD and ISO 15765-2:2016.
Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
Add TC actions for implementing MPLS L2 VPNs.
Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary notifications
and make it easier to offload nexthops into HW by converting
to a blocking notifier.
Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific
TCP option use.
Reorganize TCP congestion control (CC) initialization to simplify life
of TCP CC implemented in BPF.
Add support for shipping BPF programs with the kernel and loading them
early on boot via the User Mode Driver mechanism, hence reusing all the
user space infra we have.
Support sleepable BPF programs, initially targeting LSM and tracing.
Add bpf_d_path() helper for returning full path for given 'struct path'.
Make bpf_tail_call compatible with bpf-to-bpf calls.
Allow BPF programs to call map_update_elem on sockmaps.
Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
Support listing and getting information about bpf_links via the bpf
syscall.
Enhance kernel interfaces around NIC firmware update. Allow specifying
overwrite mask to control if settings etc. are reset during update;
report expected max time operation may take to users; support firmware
activation without machine reboot incl. limits of how much impact
reset may have (e.g. dropping link or not).
Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
Adopt or extend devlink use for debug, monitoring, fw update
in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw,
mv88e6xxx, dpaa2-eth).
In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
Add XDP support for Intel's igb driver.
Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
Improve performance for packets which don't require much offloads
on recent Mellanox NICs by 20% by making multiple packets share
a descriptor entry.
Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto
subtree to drivers/net. Move MDIO drivers out of the phy directory.
Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAl+ItRwACgkQMUZtbf5S
IrtTMg//UxpdR/MirT1DatBU0K/UGAZY82hV7F/UC8tPgjfHZeHvWlDFxfi3YP81
PtPKbhRZ7DhwBXefUp6nY3UdvjftrJK2lJm8prJUPSsZRye8Wlcb7y65q7/P2y2U
Efucyopg6RUrmrM0DUsIGYGJgylQLHnMYUl/keCsD4t5Bp4ksyi9R2t5eitGoWzh
r3QGdbSa0AuWx4iu0i+tqp6Tj0ekMBMXLVb35dtU1t0joj2KTNEnSgABN3prOa8E
iWYf2erOau68Ogp3yU3miCy0ZU4p/7qGHTtzbcp677692P/ekak6+zmfHLT9/Pjy
2Stq2z6GoKuVxdktr91D9pA3jxG4LxSJmr0TImcGnXbvkMP3Ez3g9RrpV5fn8j6F
mZCH8TKZAoD5aJrAJAMkhZmLYE1pvDa7KolSk8WogXrbCnTEb5Nv8FHTS1Qnk3yl
wSKXuvutFVNLMEHCnWQLtODbTST9DI/aOi6EctPpuOA/ZyL1v3pl+gfp37S+LUTe
owMnT/7TdvKaTD0+gIyU53M6rAWTtr5YyRQorX9awIu/4Ha0F0gYD7BJZQUGtegp
HzKt59NiSrFdbSH7UdyemdBF4LuCgIhS7rgfeoUXMXmuPHq7eHXyHZt5dzPPa/xP
81P0MAvdpFVwg8ij2yp2sHS7sISIRKq17fd1tIewUabxQbjXqPc=
=bc1U
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAl+EN+oACgkQUqAMR0iA
lPK/gA//WXBjC4FSPNr0j7kPFKQhADS3cUcp+GfuI4rYkYcJHV0yJn1kvctg1rUC
Je+Hc+Hy5Nk93lwejj5BvQoc31zOeoPDyMje5zi5te4H2NQkaoGXHOMvUnaLcNeo
g+HJvx+NU9MDjuc5amtK8YD69jzErD+eqrHpQOg4UToMXXcBXLafTThIi9vT1fzP
9uwWBRlpdQyY7tYbbwFiDuu33PyoWlc6Ksp8qKdLBLz2AmGd1Rvaq+ePsq8b9tHJ
pfv1agW0GTpzoN2pm5gFXOoYniHB/ooB1L0QLq7ylaociEyb8WbTtkn4v++EjxW8
aGsO1WdO0MQeIWDxXQR5DYD3s+Me2DMhFPDqUc2/s0q2SGWUPFcsmCsvMAOx/clA
HDfTWkyzB4FarZOTv0gZ7jYNOVukFzUQ1IBTtWpJifC9fT0xrRkKmKE1UgmWv0ei
Hx5VFQyQGsDh3sUcRLhW91p4sqJCs7l01zw1A/0rb7a+QTHAqZRtbz5hyTjlViiT
57XiyXynXW8N4Q5U6uAxCbkFFi+nP/XVQ5ggZ/QLn/4hfWWUcu0vt2bOGkRwryAT
zYmDqViraEVWKIom74UzZ0nrIBtdhvtbFQIYuyiCQKpKMwytWXUQbUASZL2mfBZi
h5eJx7etV6f5to5mNRsj8bbN5buX9UheEd0QFD9NJdS6aadqTac=
=9vEl
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
"The big new thing is the fully lockless ringbuffer implementation,
including the support for continuous lines. It will allow to store and
read messages in any situation wihtout the risk of deadlocks and
without the need of temporary per-CPU buffers.
The access is still serialized by logbuf_lock. It synchronizes few
more operations, for example, temporary buffer for formatting the
message, syslog and kmsg_dump operations. The lock removal is being
discussed and should be ready for the next release.
The continuous lines are handled exactly the same way as before to
avoid regressions in user space. It means that they are appended to
the last message when the caller is the same. Only the last message
can be extended.
The data ring includes plain text of the messages. Except for an
integer at the beginning of each message that points back to the
descriptor ring with other metadata.
The dictionary has to stay. journalctl uses it to filter the log. It
allows to show messages related to a given device. The dictionary
values are stored in the descriptor ring with the other metadata.
This is the first part of the printk rework as discussed at Plumbers
2019, see https://lore.kernel.org/r/87k1acz5rx.fsf@linutronix.de. The
next big step will be handling consoles by kthreads during the normal
system operation. It will require special handling of situations when
the kthreads could not get scheduled, for example, early boot,
suspend, panic.
Other changes:
- Add John Ogness as a reviewer for printk subsystem. He is author of
the rework and is familiar with the code and history.
- Fix locking in serial8250_do_startup() to prevent lockdep report.
- Few code cleanups"
* tag 'printk-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (27 commits)
printk: Use fallthrough pseudo-keyword
printk: reduce setup_text_buf size to LOG_LINE_MAX
printk: avoid and/or handle record truncation
printk: remove dict ring
printk: move dictionary keys to dev_printk_info
printk: move printk_info into separate array
printk: reimplement log_cont using record extension
printk: ringbuffer: add finalization/extension support
printk: ringbuffer: change representation of states
printk: ringbuffer: clear initial reserved fields
printk: ringbuffer: add BLK_DATALESS() macro
printk: ringbuffer: relocate get_data()
printk: ringbuffer: avoid memcpy() on state_var
printk: ringbuffer: fix setting state in desc_read()
kernel.h: Move oops_in_progress to printk.h
scripts/gdb: update for lockless printk ringbuffer
scripts/gdb: add utils.read_ulong()
docs: vmcoreinfo: add lockless printk ringbuffer vmcoreinfo
printk: reduce LOG_BUF_SHIFT range for H8300
printk: ringbuffer: support dataless records
...
nommu-mmap.rst was moved to Documentation/admin-guide/mm; this patch
updates the remaining stale references to Documentation/mm.
Fixes: 800c02f5d0 ("docs: move nommu-mmap.txt to admin-guide and rename to ReST")
Signed-off-by: Stephen Kitt <steve@sk2.org>
Link: https://lore.kernel.org/r/20200812092230.27541-1-steve@sk2.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
The .bss section for the h8300 is relatively small. A value of
CONFIG_LOG_BUF_SHIFT that is larger than 19 will create a static
printk ringbuffer that is too large. Limit the range appropriately
for the H8300.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20200812073122.25412-1-john.ogness@linutronix.de
Introduce sleepable BPF programs that can request such property for themselves
via BPF_F_SLEEPABLE flag at program load time. In such case they will be able
to use helpers like bpf_copy_from_user() that might sleep. At present only
fentry/fexit/fmod_ret and lsm programs can request to be sleepable and only
when they are attached to kernel functions that are known to allow sleeping.
The non-sleepable programs are relying on implicit rcu_read_lock() and
migrate_disable() to protect life time of programs, maps that they use and
per-cpu kernel structures used to pass info between bpf programs and the
kernel. The sleepable programs cannot be enclosed into rcu_read_lock().
migrate_disable() maps to preempt_disable() in non-RT kernels, so the progs
should not be enclosed in migrate_disable() as well. Therefore
rcu_read_lock_trace is used to protect the life time of sleepable progs.
There are many networking and tracing program types. In many cases the
'struct bpf_prog *' pointer itself is rcu protected within some other kernel
data structure and the kernel code is using rcu_dereference() to load that
program pointer and call BPF_PROG_RUN() on it. All these cases are not touched.
Instead sleepable bpf programs are allowed with bpf trampoline only. The
program pointers are hard-coded into generated assembly of bpf trampoline and
synchronize_rcu_tasks_trace() is used to protect the life time of the program.
The same trampoline can hold both sleepable and non-sleepable progs.
When rcu_read_lock_trace is held it means that some sleepable bpf program is
running from bpf trampoline. Those programs can use bpf arrays and preallocated
hash/lru maps. These map types are waiting on programs to complete via
synchronize_rcu_tasks_trace();
Updates to trampoline now has to do synchronize_rcu_tasks_trace() and
synchronize_rcu_tasks() to wait for sleepable progs to finish and for
trampoline assembly to finish.
This is the first step of introducing sleepable progs. Eventually dynamically
allocated hash maps can be allowed and networking program types can become
sleepable too.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: KP Singh <kpsingh@google.com>
Link: https://lore.kernel.org/bpf/20200827220114.69225-3-alexei.starovoitov@gmail.com
Add kernel module with user mode driver that populates bpffs with
BPF iterators.
$ mount bpffs /my/bpffs/ -t bpf
$ ls -la /my/bpffs/
total 4
drwxrwxrwt 2 root root 0 Jul 2 00:27 .
drwxr-xr-x 19 root root 4096 Jul 2 00:09 ..
-rw------- 1 root root 0 Jul 2 00:27 maps.debug
-rw------- 1 root root 0 Jul 2 00:27 progs.debug
The user mode driver will load BPF Type Formats, create BPF maps, populate BPF
maps, load two BPF programs, attach them to BPF iterators, and finally send two
bpf_link IDs back to the kernel.
The kernel will pin two bpf_links into newly mounted bpffs instance under
names "progs.debug" and "maps.debug". These two files become human readable.
$ cat /my/bpffs/progs.debug
id name attached
11 dump_bpf_map bpf_iter_bpf_map
12 dump_bpf_prog bpf_iter_bpf_prog
27 test_pkt_access
32 test_main test_pkt_access test_pkt_access
33 test_subprog1 test_pkt_access_subprog1 test_pkt_access
34 test_subprog2 test_pkt_access_subprog2 test_pkt_access
35 test_subprog3 test_pkt_access_subprog3 test_pkt_access
36 new_get_skb_len get_skb_len test_pkt_access
37 new_get_skb_ifindex get_skb_ifindex test_pkt_access
38 new_get_constant get_constant test_pkt_access
The BPF program dump_bpf_prog() in iterators.bpf.c is printing this data about
all BPF programs currently loaded in the system. This information is unstable
and will change from kernel to kernel as ".debug" suffix conveys.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200819042759.51280-4-alexei.starovoitov@gmail.com
Patch series "mm: Expand CONFIG_SLAB_FREELIST_HARDENED to include SLAB"
In reviewing Vlastimil Babka's latest slub debug series, I realized[1]
that several checks under CONFIG_SLAB_FREELIST_HARDENED weren't being
applied to SLAB. Fix this by expanding the Kconfig coverage, and adding a
simple double-free test for SLAB.
This patch (of 2):
Include SLAB caches when performing kmem_cache pointer verification. A
defense against such corruption[1] should be applied to all the
allocators. With this added, the "SLAB_FREE_CROSS" and "SLAB_FREE_PAGE"
LKDTM tests now pass on SLAB:
lkdtm: Performing direct entry SLAB_FREE_CROSS
lkdtm: Attempting cross-cache slab free ...
------------[ cut here ]------------
cache_from_obj: Wrong slab cache. lkdtm-heap-b but object is from lkdtm-heap-a
WARNING: CPU: 2 PID: 2195 at mm/slab.h:530 kmem_cache_free+0x8d/0x1d0
...
lkdtm: Performing direct entry SLAB_FREE_PAGE
lkdtm: Attempting non-Slab slab free ...
------------[ cut here ]------------
virt_to_cache: Object is not a Slab page!
WARNING: CPU: 1 PID: 2202 at mm/slab.h:489 kmem_cache_free+0x196/0x1d0
Additionally clean up neighboring Kconfig entries for clarity,
readability, and redundant option removal.
[1] https://github.com/ThomasKing2014/slides/raw/master/Building%20universal%20Android%20rooting%20with%20a%20type%20confusion%20vulnerability.pdf
Fixes: 598a0717a8 ("mm/slab: validate cache membership under freelist hardening")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexander Popov <alex.popov@linux.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Matthew Garrett <mjg59@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Vijayanand Jitta <vjitta@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Link: http://lkml.kernel.org/r/20200625215548.389774-1-keescook@chromium.org
Link: http://lkml.kernel.org/r/20200625215548.389774-2-keescook@chromium.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again for a
while. Unless we decide to switch to asciidoc or something...:)
- Lots of typo fixes, warning fixes, and more.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl8oVkwPHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YoW8H/jJ/xnXFn7tkgVPQAlL3k5HCnK7A5nDP9RVR
cg1pTx1cEFdjzxPlJyExU6/v+AImOvtweHXC+JDK7YcJ6XFUNYXJI3LxL5KwUXbY
BL/xRFszDSXH2C7SJF5GECcFYp01e/FWSLN3yWAh+g+XwsKiTJ8q9+CoIDkHfPGO
7oQsHKFu6s36Af0LfSgxk4sVB7EJbo8e4psuPsP5SUrl+oXRO43Put0rXkR4yJoH
9oOaB51Do5fZp8I4JVAqGXvpXoExyLMO4yw0mASm6YSZ3KyjR8Fae+HD9Cq4ZuwY
0uzb9K+9NEhqbfwtyBsi99S64/6Zo/MonwKwevZuhtsDTK4l4iU=
=JQLZ
-----END PGP SIGNATURE-----
Merge tag 'docs-5.9' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"It's been a busy cycle for documentation - hopefully the busiest for a
while to come. Changes include:
- Some new Chinese translations
- Progress on the battle against double words words and non-HTTPS
URLs
- Some block-mq documentation
- More RST conversions from Mauro. At this point, that task is
essentially complete, so we shouldn't see this kind of churn again
for a while. Unless we decide to switch to asciidoc or
something...:)
- Lots of typo fixes, warning fixes, and more"
* tag 'docs-5.9' of git://git.lwn.net/linux: (195 commits)
scripts/kernel-doc: optionally treat warnings as errors
docs: ia64: correct typo
mailmap: add entry for <alobakin@marvell.com>
doc/zh_CN: add cpu-load Chinese version
Documentation/admin-guide: tainted-kernels: fix spelling mistake
MAINTAINERS: adjust kprobes.rst entry to new location
devices.txt: document rfkill allocation
PCI: correct flag name
docs: filesystems: vfs: correct flag name
docs: filesystems: vfs: correct sync_mode flag names
docs: path-lookup: markup fixes for emphasis
docs: path-lookup: more markup fixes
docs: path-lookup: fix HTML entity mojibake
CREDITS: Replace HTTP links with HTTPS ones
docs: process: Add an example for creating a fixes tag
doc/zh_CN: add Chinese translation prefer section
doc/zh_CN: add clearing-warn-once Chinese version
doc/zh_CN: add admin-guide index
doc:it_IT: process: coding-style.rst: Correct __maybe_unused compiler label
futex: MAINTAINERS: Re-add selftests directory
...
kernel and initrd images.
ZSTD has a very fast decompressor, yet it compresses better than gzip.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl8oNX0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jdcg/9GaPGjmNgMqi3tbfzU3z11OrbraRBgMj5
jHIZ89DuzwsqU+jbwGHGiF45ge85iPK6i2ovR3ePzL0LAlLYT3gqzPcl3kkog4E9
0E0JAddx974uW4toc8cGFEHNf4vXtvvi45FL2yvDoap9xLEcpJsQRdu9upPB4U3s
+qotO6wJitM74g4l2WdbStzCAcL4ZXFA/ix19nUyLh4QlFBDqUHwufIhW1G0ciL4
txMXJ23L7e+b6FUvGyK3vFhba1isPdz5xQdQTy2DCK20rQhGu1IBsqzymEibbgIp
/j4yHfUKSpxdblFcpZfknI1VM1mbt/WN5dKDKm9UnYBhA/R/2PN0klfrAQAT4SOS
sP3bxXqTRXBjmop0NjOLCdjGCySYnPLFPlB6REIrMcvs6LYUSTqMZEusj7McwD7h
IqS4zGEMa5A+c6Q4160Qz+zrXIyh/n/bTR/6uOKUktkUQaJ+079P64NK9RtCYZTk
dkIHJChjmWZGxxXHEbo+4e7bM8gAMHDmX2pdWE5u72oYJRqBv7PVyl+SHBk+onxM
crtKvqOp8Q8coirlfjx5UynZeZmH1VuIFjpvnwlAtqxZGvuTWZ0ojq3E3Y/XwHQj
bVejr9AQ1gS9ZBTKwwd5cf7mnOuiXrHrBP3E7buoRw8bWtL+yqHyybqccZnSOUVN
lGFshs+7J5o=
=bARW
-----END PGP SIGNATURE-----
Merge tag 'x86-boot-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:
"The main change in this cycle was to add support for ZSTD-compressed
kernel and initrd images.
ZSTD has a very fast decompressor, yet it compresses better than gzip"
* tag 'x86-boot-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation: dontdiff: Add zstd compressed files
.gitignore: Add ZSTD-compressed files
x86: Add support for ZSTD compressed kernel
x86: Bump ZO_z_extra_bytes margin for zstd
usr: Add support for zstd compressed initramfs
init: Add support for zstd compressed kernel
lib: Add zstd support to decompress
lib: Prepare zstd for preboot environment, improve performance
- Add the zstd and zstd22 cmds to scripts/Makefile.lib
- Add the HAVE_KERNEL_ZSTD and KERNEL_ZSTD options
Architecture specific support is still needed for decompression.
Signed-off-by: Nick Terrell <terrelln@fb.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20200730190841.2071656-4-nickrterrell@gmail.com
Qian reported that the current setup forgoes the Kconfig dependencies and
results in warnings such as:
WARNING: unmet direct dependencies detected for SCHED_THERMAL_PRESSURE
Depends on [n]: SMP [=y] && CPU_FREQ_THERMAL [=n]
Selected by [y]:
- ARM64 [=y]
Revert commit
e17ae7fea8 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE")
and re-implement it by making the option default to 'y' for arm64 and arm,
which respects Kconfig dependencies (i.e. will remain 'n' if
CPU_FREQ_THERMAL=n).
Fixes: e17ae7fea8 ("arm, arm64: Select CONFIG_SCHED_THERMAL_PRESSURE")
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200729135718.1871-1-valentin.schneider@arm.com
As Russell pointed out [1], this option is severely lacking in the
documentation department, and figuring out if one has the required
dependencies to benefit from turning it on is not straightforward.
Make it non user-visible, and add a bit of help to it. While at it, make it
depend on CPU_FREQ_THERMAL.
[1]: https://lkml.kernel.org/r/20200603173150.GB1551@shell.armlinux.org.uk
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200712165917.9168-3-valentin.schneider@arm.com
scripts/cc-can-link.sh tests if the compiler can link userspace
programs.
When $(CC) is GCC, it is checked against the target architecture
because the toolchain prefix is specified as a part of $(CC).
When $(CC) is Clang, it is checked against the host architecture
because --target option is missing.
Pass $(CLANG_FLAGS) to scripts/cc-can-link.sh to evaluate the link
capability for the target architecture.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
The nommu-mmap.txt file provides description of user visible
behaviuour. So, move it to the admin-guide.
As it is already at the ReST, also rename it.
Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/3a63d1833b513700755c85bf3bda0a6c4ab56986.1592918949.git.mchehab+huawei@kernel.org
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
- fix build rules in binderfs sample
- fix build errors when Kbuild recurses to the top Makefile
- covert '---help---' in Kconfig to 'help'
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7lBuYVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGHvIP/3iErjPshpg/phwH8NTCS4SFkiti
BZRM+2lupSn7Qs53BTpVzIkXoHBJQZlJxlQ5HY8ScO+fiz28rKZr+b40us+je1Q+
SkvSPfwZzxjEg7lAZutznG4KgItJLWJKmDyh9T8Y8TAuG4f8WO0hKnXoAp3YorS2
zppEIxso8O5spZPjp+fF/fPbxPjIsabGK7Jp2LpSVFR5pVDHI/ycTlKQS+MFpMEx
6JIpdFRw7TkvKew1dr5uAWT5btWHatEqjSR3JeyVHv3EICTGQwHmcHK67cJzGInK
T51+DT7/CpKtmRgGMiTEu/INfMzzoQAKl6Fcu+vMaShTN97Hk9DpdtQyvA6P/h3L
8GA4UBct05J7fjjIB7iUD+GYQ0EZbaFujzRXLYk+dQqEJRbhcCwvdzggGp0WvGRs
1f8/AIpgnQv8JSL/bOMgGMS5uL2dSLsgbzTdr6RzWf1jlYdI1i4u7AZ/nBrwWP+Z
iOBkKsVceEoJrTbaynl3eoYqFLtWyDau+//oBc2gUvmhn8ioM5dfqBRiJjxJnPG9
/giRj6xRIqMMEw8Gg8PCG7WebfWxWyaIQwlWBbPok7DwISURK5mvOyakZL+Q25/y
6MBr2H8NEJsf35q0GTINpfZnot7NX4JXrrndJH8NIRC7HEhwd29S041xlQJdP0rs
E76xsOr3hrAmBu4P
=1NIT
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull more Kbuild updates from Masahiro Yamada:
- fix build rules in binderfs sample
- fix build errors when Kbuild recurses to the top Makefile
- covert '---help---' in Kconfig to 'help'
* tag 'kbuild-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
treewide: replace '---help---' in Kconfig files with 'help'
kbuild: fix broken builds because of GZIP,BZIP2,LZOP variables
samples: binderfs: really compile this sample and fix build issues
Since commit 84af7a6194 ("checkpatch: kconfig: prefer 'help' over
'---help---'"), the number of '---help---' has been gradually
decreasing, but there are still more than 2400 instances.
This commit finishes the conversion. While I touched the lines,
I also fixed the indentation.
There are a variety of indentation styles found.
a) 4 spaces + '---help---'
b) 7 spaces + '---help---'
c) 8 spaces + '---help---'
d) 1 space + 1 tab + '---help---'
e) 1 tab + '---help---' (correct indentation)
f) 1 tab + 1 space + '---help---'
g) 1 tab + 2 spaces + '---help---'
In order to convert all of them to 1 tab + 'help', I ran the
following commend:
$ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl7U/i8ACgkQ+7dXa6fL
C2u2eg/+Oy6ybq0hPovYVkFI9WIG7ZCz7w9Q6BEnfYMqqn3dnfJxKQ3l4pnQEOWw
f4QfvpvevsYfMtOJkYcG6s66rQgbFdqc5TEyBBy0QNp3acRolN7IXkcopvv9xOpQ
JxedpbFG1PTFLWjvBpyjlrUPouwLzq2FXAf1Ox0ZIMw6165mYOMWoli1VL8dh0A0
Ai7JUB0WrvTNbrwhV413obIzXT/rPCdcrgbQcgrrLPex8lQ47ZAE9bq6k4q5HiwK
KRzEqkQgnzId6cCNTFBfkTWsx89zZunz7jkfM5yx30MvdAtPSxvvpfIPdZRZkXsP
E2K9Fk1/6OQZTC0Op3Pi/bt+hVG/mD1p0sQUDgo2MO3qlSS+5mMkR8h3mJEgwK12
72P4YfOJkuAy2z3v4lL0GYdUDAZY6i6G8TMxERKu/a9O3VjTWICDOyBUS6F8YEAK
C7HlbZxAEOKTVK0BTDTeEUBwSeDrBbvH6MnRlZCG5g1Fos2aWP0udhjiX8IfZLO7
GN6nWBvK1fYzfsUczdhgnoCzQs3suoDo04HnsTPGJ8De52T4x2RsjV+gPx0nrNAq
eWChl1JvMWsY2B3GLnl9XQz4NNN+EreKEkk+PULDGllrArrPsp5Vnhb9FJO1PVCU
hMDJHohPiXnKbc8f4Bd78OhIvnuoGfJPdM5MtNe2flUKy2a2ops=
=YTGf
-----END PGP SIGNATURE-----
Merge tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull notification queue from David Howells:
"This adds a general notification queue concept and adds an event
source for keys/keyrings, such as linking and unlinking keys and
changing their attributes.
Thanks to Debarshi Ray, we do have a pull request to use this to fix a
problem with gnome-online-accounts - as mentioned last time:
https://gitlab.gnome.org/GNOME/gnome-online-accounts/merge_requests/47
Without this, g-o-a has to constantly poll a keyring-based kerberos
cache to find out if kinit has changed anything.
[ There are other notification pending: mount/sb fsinfo notifications
for libmount that Karel Zak and Ian Kent have been working on, and
Christian Brauner would like to use them in lxc, but let's see how
this one works first ]
LSM hooks are included:
- A set of hooks are provided that allow an LSM to rule on whether or
not a watch may be set. Each of these hooks takes a different
"watched object" parameter, so they're not really shareable. The
LSM should use current's credentials. [Wanted by SELinux & Smack]
- A hook is provided to allow an LSM to rule on whether or not a
particular message may be posted to a particular queue. This is
given the credentials from the event generator (which may be the
system) and the watch setter. [Wanted by Smack]
I've provided SELinux and Smack with implementations of some of these
hooks.
WHY
===
Key/keyring notifications are desirable because if you have your
kerberos tickets in a file/directory, your Gnome desktop will monitor
that using something like fanotify and tell you if your credentials
cache changes.
However, we also have the ability to cache your kerberos tickets in
the session, user or persistent keyring so that it isn't left around
on disk across a reboot or logout. Keyrings, however, cannot currently
be monitored asynchronously, so the desktop has to poll for it - not
so good on a laptop. This facility will allow the desktop to avoid the
need to poll.
DESIGN DECISIONS
================
- The notification queue is built on top of a standard pipe. Messages
are effectively spliced in. The pipe is opened with a special flag:
pipe2(fds, O_NOTIFICATION_PIPE);
The special flag has the same value as O_EXCL (which doesn't seem
like it will ever be applicable in this context)[?]. It is given up
front to make it a lot easier to prohibit splice&co from accessing
the pipe.
[?] Should this be done some other way? I'd rather not use up a new
O_* flag if I can avoid it - should I add a pipe3() system call
instead?
The pipe is then configured::
ioctl(fds[1], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
ioctl(fds[1], IOC_WATCH_QUEUE_SET_FILTER, &filter);
Messages are then read out of the pipe using read().
- It should be possible to allow write() to insert data into the
notification pipes too, but this is currently disabled as the
kernel has to be able to insert messages into the pipe *without*
holding pipe->mutex and the code to make this work needs careful
auditing.
- sendfile(), splice() and vmsplice() are disabled on notification
pipes because of the pipe->mutex issue and also because they
sometimes want to revert what they just did - but one or more
notification messages might've been interleaved in the ring.
- The kernel inserts messages with the wait queue spinlock held. This
means that pipe_read() and pipe_write() have to take the spinlock
to update the queue pointers.
- Records in the buffer are binary, typed and have a length so that
they can be of varying size.
This allows multiple heterogeneous sources to share a common
buffer; there are 16 million types available, of which I've used
just a few, so there is scope for others to be used. Tags may be
specified when a watchpoint is created to help distinguish the
sources.
- Records are filterable as types have up to 256 subtypes that can be
individually filtered. Other filtration is also available.
- Notification pipes don't interfere with each other; each may be
bound to a different set of watches. Any particular notification
will be copied to all the queues that are currently watching for it
- and only those that are watching for it.
- When recording a notification, the kernel will not sleep, but will
rather mark a queue as having lost a message if there's
insufficient space. read() will fabricate a loss notification
message at an appropriate point later.
- The notification pipe is created and then watchpoints are attached
to it, using one of:
keyctl_watch_key(KEY_SPEC_SESSION_KEYRING, fds[1], 0x01);
watch_mount(AT_FDCWD, "/", 0, fd, 0x02);
watch_sb(AT_FDCWD, "/mnt", 0, fd, 0x03);
where in both cases, fd indicates the queue and the number after is
a tag between 0 and 255.
- Watches are removed if either the notification pipe is destroyed or
the watched object is destroyed. In the latter case, a message will
be generated indicating the enforced watch removal.
Things I want to avoid:
- Introducing features that make the core VFS dependent on the
network stack or networking namespaces (ie. usage of netlink).
- Dumping all this stuff into dmesg and having a daemon that sits
there parsing the output and distributing it as this then puts the
responsibility for security into userspace and makes handling
namespaces tricky. Further, dmesg might not exist or might be
inaccessible inside a container.
- Letting users see events they shouldn't be able to see.
TESTING AND MANPAGES
====================
- The keyutils tree has a pipe-watch branch that has keyctl commands
for making use of notifications. Proposed manual pages can also be
found on this branch, though a couple of them really need to go to
the main manpages repository instead.
If the kernel supports the watching of keys, then running "make
test" on that branch will cause the testing infrastructure to spawn
a monitoring process on the side that monitors a notifications pipe
for all the key/keyring changes induced by the tests and they'll
all be checked off to make sure they happened.
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/keyutils.git/log/?h=pipe-watch
- A test program is provided (samples/watch_queue/watch_test) that
can be used to monitor for keyrings, mount and superblock events.
Information on the notifications is simply logged to stdout"
* tag 'notifications-20200601' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
smack: Implement the watch_key and post_notification hooks
selinux: Implement the watch_key security hook
keys: Make the KEY_NEED_* perms an enum rather than a mask
pipe: Add notification lossage handling
pipe: Allow buffers to be marked read-whole-or-error for notifications
Add sample notification program
watch_queue: Add a key/keyring notification facility
security: Add hooks to rule on setting a watch
pipe: Add general notification queue support
pipe: Add O_NOTIFICATION_PIPE
security: Add a hook for the point of notification insertion
uapi: General notification queue definitions
Pull READ/WRITE_ONCE rework from Will Deacon:
"This the READ_ONCE rework I've been working on for a while, which
bumps the minimum GCC version and improves code-gen on arm64 when
stack protector is enabled"
[ Side note: I'm _really_ tempted to raise the minimum gcc version to
4.9, so that we can just say that we require _Generic() support.
That would allow us to more cleanly handle a lot of the cases where we
depend on very complex macros with 'sizeof' or __builtin_choose_expr()
with __builtin_types_compatible_p() etc.
This branch has a workaround for sparse not handling _Generic(),
either, but that was already fixed in the sparse development branch,
so it's really just gcc-4.9 that we'd require. - Linus ]
* 'rwonce/rework' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
compiler_types.h: Use unoptimized __unqual_scalar_typeof for sparse
compiler_types.h: Optimize __unqual_scalar_typeof compilation time
compiler.h: Enforce that READ_ONCE_NOCHECK() access size is sizeof(long)
compiler-types.h: Include naked type in __pick_integer_type() match
READ_ONCE: Fix comment describing 2x32-bit atomicity
gcov: Remove old GCC 3.4 support
arm64: barrier: Use '__unqual_scalar_typeof' for acquire/release macros
locking/barriers: Use '__unqual_scalar_typeof' for load-acquire macros
READ_ONCE: Drop pointer qualifiers when reading from scalar types
READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses
READ_ONCE: Simplify implementations of {READ,WRITE}_ONCE()
arm64: csum: Disable KASAN for do_csum()
fault_inject: Don't rely on "return value" from WRITE_ONCE()
net: tls: Avoid assigning 'const' pointer to non-const pointer
netfilter: Avoid assigning 'const' pointer to non-const pointer
compiler/gcc: Raise minimum GCC version for kernel builds to 4.8
- fix warnings in 'make clean' for ARCH=um, hexagon, h8300, unicore32
- ensure to rebuild all objects when the compiler is upgraded
- exclude system headers from dependency tracking and fixdep processing
- fix potential bit-size mismatch between the kernel and BPF user-mode
helper
- add the new syntax 'userprogs' to build user-space programs for the
target architecture (the same arch as the kernel)
- compile user-space sample code under samples/ for the target arch
instead of the host arch
- make headers_install fail if a CONFIG option is leaked to user-space
- sanitize the output format of scripts/checkstack.pl
- handle ARM 'push' instruction in scripts/checkstack.pl
- error out before modpost if a module name conflict is found
- error out when multiple directories are passed to M= because this
feature is broken for a long time
- add CONFIG_DEBUG_INFO_COMPRESSED to support compressed debug info
- a lot of cleanups of modpost
- dump vmlinux symbols out into vmlinux.symvers, and reuse it in the
second pass of modpost
- do not run the second pass of modpost if nothing in modules is updated
- install modules.builtin(.modinfo) by 'make install' as well as by
'make modules_install' because it is useful even when CONFIG_MODULES=n
- add new command line variables, GZIP, BZIP2, LZOP, LZMA, LZ4, and XZ
to allow users to use alternatives such as pigz, pbzip2, etc.
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl7brm0VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGjeEP/Rrf8H9cp/Tq+ALQCBycI3W5ZEHg
n2EqprZkVP2MlOV0d+8b9t4PdZf6E5Wmfv26sMaBAhl6X1KQI/0NgPMnTINvy5jJ
Q2SMhj9y8Gwr3XKFu9Hd/0U+Sax5rz+LmY84tdF95dXzPIUWjAEVnbmN+ofY6T++
sNf2YGNFSR6iiqr3uCYA0hHZmpKlfhVgDPAdncWa5aadSsuQb79nZQWefGeVEsuD
HrISpwnkhBc0qY1xyWry6agE92xWmkNkdjKq6A7peguZL02XySWLRWjyHoiiaPOB
6U4urKs/NSXqPgxGxwZthhwERHryC3+g4s8wRBDKE6ISRWKBBA2ruHpgdF5h/utu
re1ZP2qRcAt8NBFynr4MEb2AU0mYkv7iEgfLJ7NUCRlMOtqrn5RFwnS4r8ReyQp5
1UM11RbPhYgYjM5g9hBHJ7nK944/kfvy1/4jF4I1+M5O7QL6f00pu3r2bBIa/65g
DWrNOpIliKG27GgnRlxi7HgLfxs9etFcXTpHO0ymgnMmlz+7FQsdceR9qqybGU9o
yBWw6zculMQjb3E+k0DTnE5kLWsycbua921wxM9ABSxRmJi7WciNF73RdLUIBoAY
VUbwrP2aIpdL+2uyX6RqdTaWzEBpW8omszr46aQ96pX+RiqMrPvJRLaA/tr3ZH8g
tdHenJPWdHSaOcO4
=GKe5
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- fix warnings in 'make clean' for ARCH=um, hexagon, h8300, unicore32
- ensure to rebuild all objects when the compiler is upgraded
- exclude system headers from dependency tracking and fixdep processing
- fix potential bit-size mismatch between the kernel and BPF user-mode
helper
- add the new syntax 'userprogs' to build user-space programs for the
target architecture (the same arch as the kernel)
- compile user-space sample code under samples/ for the target arch
instead of the host arch
- make headers_install fail if a CONFIG option is leaked to user-space
- sanitize the output format of scripts/checkstack.pl
- handle ARM 'push' instruction in scripts/checkstack.pl
- error out before modpost if a module name conflict is found
- error out when multiple directories are passed to M= because this
feature is broken for a long time
- add CONFIG_DEBUG_INFO_COMPRESSED to support compressed debug info
- a lot of cleanups of modpost
- dump vmlinux symbols out into vmlinux.symvers, and reuse it in the
second pass of modpost
- do not run the second pass of modpost if nothing in modules is
updated
- install modules.builtin(.modinfo) by 'make install' as well as by
'make modules_install' because it is useful even when
CONFIG_MODULES=n
- add new command line variables, GZIP, BZIP2, LZOP, LZMA, LZ4, and XZ
to allow users to use alternatives such as pigz, pbzip2, etc.
* tag 'kbuild-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (96 commits)
kbuild: add variables for compression tools
Makefile: install modules.builtin even if CONFIG_MODULES=n
mksysmap: Fix the mismatch of '.L' symbols in System.map
kbuild: doc: rename LDFLAGS to KBUILD_LDFLAGS
modpost: change elf_info->size to size_t
modpost: remove is_vmlinux() helper
modpost: strip .o from modname before calling new_module()
modpost: set have_vmlinux in new_module()
modpost: remove mod->skip struct member
modpost: add mod->is_vmlinux struct member
modpost: remove is_vmlinux() call in check_for_{gpl_usage,unused}()
modpost: remove mod->is_dot_o struct member
modpost: move -d option in scripts/Makefile.modpost
modpost: remove -s option
modpost: remove get_next_text() and make {grab,release_}file static
modpost: use read_text_file() and get_line() for reading text files
modpost: avoid false-positive file open error
modpost: fix potential mmap'ed file overrun in get_src_version()
modpost: add read_text_file() and get_line() helpers
modpost: do not call get_modinfo() for vmlinux(.o)
...
This allows C code to make use of compilers with support for output
variables along the fallthrough path via preprocessor define:
CONFIG_CC_HAS_ASM_GOTO_OUTPUT
[ This is not used anywhere yet, and currently released compilers don't
support this yet, but it's coming, and I have some local experimental
patches to take advantage of it when it does - Linus ]
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some init systems (eg. systemd) have init at their own paths, for
example, /usr/lib/systemd/systemd. A compatibility symlink to one of the
hardcoded init paths is provided by another package, usually named
something like systemd-sysvcompat or similar.
Currently distro maintainers who are hands-off on the bootloader are more
or less required to include those compatibility links as part of their
base distribution, because it's hard to migrate away from them since
there's a risk some users will not get the message to set init= on the
kernel command line appropriately.
Moreover, for distributions where the init system is something the
distribution itself is opinionated about (eg. Arch, which has systemd in
the required `base` package), we could usually reasonably configure this
ahead of time when building the distribution kernel. However, we
currently simply don't have any way to configure the kernel to do this.
Here's an example discussion where removing sysvcompat was discussed by
distro maintainers[0].
This patch adds a new Kconfig tunable, CONFIG_DEFAULT_INIT, which if set
is tried before the hardcoded fallback list. So the order of precedence
is now thus:
1. init= on command line (on failure: panic)
2. CONFIG_DEFAULT_INIT (on failure: try #3)
3. Hardcoded fallback list (on failure: panic)
This new config parameter will allow distribution maintainers to move away
from these compatibility links safely, without having to worry that their
users might not have the right init=.
There are also two other benefits of this over having the distribution
maintain a symlink:
1. One of the value propositions over simply having distributions
maintain a /sbin/init symlink via a package is that it also frees
distributions which have a preferred default, but not mandatory, init
system from having their package manager fight with their users for
control of /{s,}bin/init. Instead, the distribution simply makes
their preference known in CONFIG_DEFAULT_INIT, and if the user
installs another init system and uninstalls the default one they can
still make use of /{s,}bin/init and friends for their own uses. This
makes more cases Just Work(tm) without the user having to perform
extra configuration via init=.
2. Since before this we don't know which path the distribution actually
_intends_ to serve init from, we don't pr_err if it is simply
missing, and usually will just silently put the user in a /bin/sh
shell. Now that the distribution can make a declaration of intent, we
can be more vocal when this init system fails to launch for any
reason, even if it's simply because no file exists at that location,
speeding up the palaver of init/mount dependency/etc debugging a bit.
[0]: https://lists.archlinux.org/pipermail/arch-dev-public/2019-January/029435.html
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: http://lkml.kernel.org/r/20200522160234.GA1487022@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge more updates from Andrew Morton:
"More mm/ work, plenty more to come
Subsystems affected by this patch series: slub, memcg, gup, kasan,
pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
thp, mmap, kconfig"
* akpm: (131 commits)
arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
riscv: support DEBUG_WX
mm: add DEBUG_WX support
drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
powerpc/mm: drop platform defined pmd_mknotpresent()
mm: thp: don't need to drain lru cache when splitting and mlocking THP
hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
sparc32: register memory occupied by kernel as memblock.memory
include/linux/memblock.h: fix minor typo and unclear comment
mm, mempolicy: fix up gup usage in lookup_node
tools/vm/page_owner_sort.c: filter out unneeded line
mm: swap: memcg: fix memcg stats for huge pages
mm: swap: fix vmstats for huge pages
mm: vmscan: limit the range of LRU type balancing
mm: vmscan: reclaim writepage is IO cost
mm: vmscan: determine anon/file pressure balance at the reclaim root
mm: balance LRU lists based on relative thrashing
mm: only count actual rotations as LRU reclaim cost
...
Without swap page tracking, users that are otherwise memory controlled can
easily escape their containment and allocate significant amounts of memory
that they're not being charged for. That's because swap does readahead,
but without the cgroup records of who owned the page at swapout, readahead
pages don't get charged until somebody actually faults them into their
page table and we can identify an owner task. This can be maliciously
exploited with MADV_WILLNEED, which triggers arbitrary readahead
allocations without charging the pages.
Make swap swap page tracking an integral part of memcg and remove the
Kconfig options. In the first place, it was only made configurable to
allow users to save some memory. But the overhead of tracking cgroup
ownership per swap page is minimal - 2 byte per page, or 512k per 1G of
swap, or 0.04%. Saving that at the expense of broken containment
semantics is not something we should present as a coequal option.
The swapaccount=0 boot option will continue to exist, and it will
eliminate the page_counter overhead and hide the swap control files, but
it won't disable swap slot ownership tracking.
This patch makes sure we always have the cgroup records at swapin time;
the next patch will fix the actual bug by charging readahead swap pages at
swapin time rather than at fault time.
v2: fix double swap charge bug in cgroup1/cgroup2 code gating
[hannes@cmpxchg.org: fix crash with cgroup_disable=memory]
Link: http://lkml.kernel.org/r/20200521215855.GB815153@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
Link: http://lkml.kernel.org/r/20200508183105.225460-16-hannes@cmpxchg.org
Debugged-by: Hugh Dickins <hughd@google.com>
Debugged-by: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- added support for MIPSr5 and P5600 cores
- converted Loongson PCI driver into a PCI host driver using the generic
PCI framework
- added emulation of CPUCFG command for Loogonson64 cpus
- removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
- ioremap cleanup
- fix for a race between two threads faulting the same page
- various cleanups and fixes
-----BEGIN PGP SIGNATURE-----
iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAl7WK54aHHRzYm9nZW5k
QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHAbjA/9EEFeqNg9UNUH6/TS18QV
qkxKp0+LC4Jk+SduzLyYsYy6l/dSaKYl8m9jyJsWjM6BvBZTcMJJOnzIPRafI0s+
MK8GCSZunAkm25DsDvfobQUkbQ/UHjY/fuRpNslbDcsYqIKv90hUMd21ccXY6KC5
RY+aMlpjgksg1X8JJ7k1Rs05sXyUPqpESteyqehF1b/+Iyv7H2L3v5EvQwvPDs6f
TyVgNJU2B3RCU6/uAcWmHdVLxXd+Y8fM0vC8DCO0pg0rGf4be0FbZztHmeq6r2wy
g7wsO7acKWGzulFQD5ftVSQ6i8KHIDNDePmDMtU5oFcXkzUDdGvd3j3Gst19/nve
ZftNmQHOY1JqGUOhdq1fDG/4M3Vc5bvh3W6eMG22TuMLEWsOF8teY8uUa/vxOb+B
2NsJ9q6ylRS7RDWWOrApJWfFYPvhr5wlLxT+azWNa9y3bjV8vDLjNdU0mRLA1nsu
yLzYMwIhtWfZhkJZ+xJVSmQ6LjAHDN5TF/LEx/9itLg5t9wrEosFPAtOv8V15hy4
KBNvvWeoy7RRmBTNuKh7r9Ui4jw7GgxL4D1OwzCsF//GAiGyuuh0zMuUE8EXA6K5
MpdGt+bSOcLl8ILTtGir8e4MXLawDH8n94f8QWLb9FcOvU4KHUjRKU7EQ6dyD5dk
a7xskGLXWdVO3IJ/Xvxcaeo=
=eAtN
-----END PGP SIGNATURE-----
Merge tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS updates from Thomas Bogendoerfer:
- added support for MIPSr5 and P5600 cores
- converted Loongson PCI driver into a PCI host driver using the
generic PCI framework
- added emulation of CPUCFG command for Loogonson64 cpus
- removed of LASAT, PMC MSP71xx and NEC MARKEINS/EMMA
- ioremap cleanup
- fix for a race between two threads faulting the same page
- various cleanups and fixes
* tag 'mips_5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (143 commits)
MIPS: ralink: drop ralink_clk_init for mt7621
MIPS: ralink: bootrom: mark a function as __init to save some memory
MIPS: Loongson64: Reorder CPUCFG model match arms
MIPS: Expose Loongson CPUCFG availability via HWCAP
MIPS: Loongson64: Guard against future cores without CPUCFG
MIPS: Fix build warning about "PTR_STR" redefinition
MIPS: Loongson64: Remove not used pci.c
MIPS: Loongson64: Define PCI_IOBASE
MIPS: CPU_LOONGSON2EF need software to maintain cache consistency
MIPS: DTS: Fix build errors used with various configs
MIPS: Loongson64: select NO_EXCEPT_FILL
MIPS: Fix IRQ tracing when call handle_fpe() and handle_msa_fpe()
MIPS: mm: add page valid judgement in function pte_modify
mm/memory.c: Add memory read privilege on page fault handling
mm/memory.c: Update local TLB if PTE entry exists
MIPS: Do not flush tlb page when updating PTE entry
MIPS: ingenic: Default to a generic board
MIPS: ingenic: Add support for GCW Zero prototype
MIPS: ingenic: DTS: Add memory info of GCW Zero
MIPS: Loongson64: Switch to generic PCI driver
...
Make it possible to have a general notification queue built on top of a
standard pipe. Notifications are 'spliced' into the pipe and then read
out. splice(), vmsplice() and sendfile() are forbidden on pipes used for
notifications as post_one_notification() cannot take pipe->mutex. This
means that notifications could be posted in between individual pipe
buffers, making iov_iter_revert() difficult to effect.
The way the notification queue is used is:
(1) An application opens a pipe with a special flag and indicates the
number of messages it wishes to be able to queue at once (this can
only be set once):
pipe2(fds, O_NOTIFICATION_PIPE);
ioctl(fds[0], IOC_WATCH_QUEUE_SET_SIZE, queue_depth);
(2) The application then uses poll() and read() as normal to extract data
from the pipe. read() will return multiple notifications if the
buffer is big enough, but it will not split a notification across
buffers - rather it will return a short read or EMSGSIZE.
Notification messages include a length in the header so that the
caller can split them up.
Each message has a header that describes it:
struct watch_notification {
__u32 type:24;
__u32 subtype:8;
__u32 info;
};
The type indicates the source (eg. mount tree changes, superblock events,
keyring changes, block layer events) and the subtype indicates the event
type (eg. mount, unmount; EIO, EDQUOT; link, unlink). The info field
indicates a number of things, including the entry length, an ID assigned to
a watchpoint contributing to this buffer and type-specific flags.
Supplementary data, such as the key ID that generated an event, can be
attached in additional slots. The maximum message size is 127 bytes.
Messages may not be padded or aligned, so there is no guarantee, for
example, that the notification type will be on a 4-byte bounary.
Signed-off-by: David Howells <dhowells@redhat.com>
On Fedora, linking static glibc requires the glibc-static RPM package,
which is not part of the glibc-devel package.
CONFIG_CC_CAN_LINK does not check the capability of static linking,
so you can enable CONFIG_BPFILTER_UMH, then fail to build:
HOSTLD net/bpfilter/bpfilter_umh
/usr/bin/ld: cannot find -lc
collect2: error: ld returned 1 exit status
Add CONFIG_CC_CAN_LINK_STATIC, and make CONFIG_BPFILTER_UMH depend
on it.
Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Alexei Starovoitov <ast@kernel.org>
bpfilter_umh is built for the default machine bit of the compiler,
which may not match to the bit size of the kernel.
This happens in the scenario below:
You can use biarch GCC that defaults to 64-bit for building the 32-bit
kernel. In this case, Kbuild passes -m32 to teach the compiler to
produce 32-bit kernel space objects. However, it is missing when
building bpfilter_umh. It is built as a 64-bit ELF, and then embedded
into the 32-bit kernel.
The 32-bit kernel and 64-bit umh is a bad combination.
In theory, we can have 32-bit umh running on 64-bit kernel, but we do
not have a good reason to support such a usecase.
The best is to match the bit size between them.
Pass -m32 or -m64 to the umh build command if it is found in
$(KBUILD_CFLAGS). Evaluate CC_CAN_LINK against the kernel bit-size.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull networking fixes from David Miller:
1) Fix sk_psock reference count leak on receive, from Xiyu Yang.
2) CONFIG_HNS should be invisible, from Geert Uytterhoeven.
3) Don't allow locking route MTUs in ipv6, RFCs actually forbid this,
from Maciej Żenczykowski.
4) ipv4 route redirect backoff wasn't actually enforced, from Paolo
Abeni.
5) Fix netprio cgroup v2 leak, from Zefan Li.
6) Fix infinite loop on rmmod in conntrack, from Florian Westphal.
7) Fix tcp SO_RCVLOWAT hangs, from Eric Dumazet.
8) Various bpf probe handling fixes, from Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (68 commits)
selftests: mptcp: pm: rm the right tmp file
dpaa2-eth: properly handle buffer size restrictions
bpf: Restrict bpf_trace_printk()'s %s usage and add %pks, %pus specifier
bpf: Add bpf_probe_read_{user, kernel}_str() to do_refine_retval_range
bpf: Restrict bpf_probe_read{, str}() only to archs where they work
MAINTAINERS: Mark networking drivers as Maintained.
ipmr: Add lockdep expression to ipmr_for_each_table macro
ipmr: Fix RCU list debugging warning
drivers: net: hamradio: Fix suspicious RCU usage warning in bpqether.c
net: phy: broadcom: fix BCM54XX_SHD_SCR3_TRDDAPD value for BCM54810
tcp: fix error recovery in tcp_zerocopy_receive()
MAINTAINERS: Add Jakub to networking drivers.
MAINTAINERS: another add of Karsten Graul for S390 networking
drivers: ipa: fix typos for ipa_smp2p structure doc
pppoe: only process PADT targeted at local interfaces
selftests/bpf: Enforce returning 0 for fentry/fexit programs
bpf: Enforce returning 0 for fentry/fexit progs
net: stmmac: fix num_por initialization
security: Fix the default value of secid_to_secctx hook
libbpf: Fix register naming in PT_REGS s390 macros
...
Given the legacy bpf_probe_read{,str}() BPF helpers are broken on archs
with overlapping address ranges, we should really take the next step to
disable them from BPF use there.
To generally fix the situation, we've recently added new helper variants
bpf_probe_read_{user,kernel}() and bpf_probe_read_{user,kernel}_str().
For details on them, see 6ae08ae3de ("bpf: Add probe_read_{user, kernel}
and probe_read_{user,kernel}_str helpers").
Given bpf_probe_read{,str}() have been around for ~5 years by now, there
are plenty of users at least on x86 still relying on them today, so we
cannot remove them entirely w/o breaking the BPF tracing ecosystem.
However, their use should be restricted to archs with non-overlapping
address ranges where they are working in their current form. Therefore,
move this behind a CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE and
have x86, arm64, arm select it (other archs supporting it can follow-up
on it as well).
For the remaining archs, they can workaround easily by relying on the
feature probe from bpftool which spills out defines that can be used out
of BPF C code to implement the drop-in replacement for old/new kernels
via: bpftool feature probe macro
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/bpf/20200515101118.6508-2-daniel@iogearbox.net
Similarly to the CC_IS_CLANG config, add LD_IS_LLD to avoid GNU ld
specific logic such as ld-version or ld-ifversion and gain the
ability to select potential features that depend on the linker at
configuration time such as LTO.
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
[nc: Reword commit message]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Reviewed-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Commit 21c54b7747 ("kconfig: show compiler version text in the top
comment") added the environment variable, CC_VERSION_TEXT in the comment
of the top Kconfig file. It can detect the compiler update, and invoke
the syncconfig because all environment variables referenced in Kconfig
files are recorded in include/config/auto.conf.cmd
This commit makes it a CONFIG option in order to ensure the full rebuild
when the compiler is updated.
This works like follows:
include/config/kconfig.h contains "CONFIG_CC_VERSION_TEXT" in the comment
block.
The top Makefile specifies "-include $(srctree)/include/linux/kconfig.h"
to guarantee it is included from all kernel source files.
fixdep parses every source file and all headers included from it,
searching for words prefixed with "CONFIG_". Then, fixdep finds
CONFIG_CC_VERSION_TEXT in include/config/kconfig.h and adds
include/config/cc/version/text.h into every .*.cmd file.
When the compiler is updated, syncconfig is invoked because init/Kconfig
contains the reference to the environment variable CC_VERTION_TEXT.
CONFIG_CC_VERSION_TEXT is updated to the new version string, and
include/config/cc/version/text.h is touched.
In the next rebuild, Make will rebuild every files since the timestamp
of include/config/cc/version/text.h is newer than that of target.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
The result of '$(CC) --version | head -n 1' has already been computed
by the top Makefile, and stored in the environment variable,
CC_VERSION_TEXT.
'echo' is cheaper than the two commands $(CC) and 'head' although this
optimization is not noticeable level.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Nathan Chancellor <natechancellor@gmail.com>
We have some rather random rules about when we accept the
"maybe-initialized" warnings, and when we don't.
For example, we consider it unreliable for gcc versions < 4.9, but also
if -O3 is enabled, or if optimizing for size. And then various kernel
config options disabled it, because they know that they trigger that
warning by confusing gcc sufficiently (ie PROFILE_ALL_BRANCHES).
And now gcc-10 seems to be introducing a lot of those warnings too, so
it falls under the same heading as 4.9 did.
At the same time, we have a very straightforward way to _enable_ that
warning when wanted: use "W=2" to enable more warnings.
So stop playing these ad-hoc games, and just disable that warning by
default, with the known and straight-forward "if you want to work on the
extra compiler warnings, use W=123".
Would it be great to have code that is always so obvious that it never
confuses the compiler whether a variable is used initialized or not?
Yes, it would. In a perfect world, the compilers would be smarter, and
our source code would be simpler.
That's currently not the world we live in, though.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is very rare to see versions of GCC prior to 4.8 being used to build
the mainline kernel. These old compilers are also know to have codegen
issues which can lead to silent miscompilation:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145
Raise the minimum GCC version for kernel build to 4.8 and remove some
tautological Kconfig dependencies as a consequence.
Cc: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Will Deacon <will@kernel.org>
- Ensure that the compiler and linker versions are aligned so that ld
doesn't complain about not understanding a .note.gnu.property section
(emitted when pointer authentication is enabled).
- Force -mbranch-protection=none when the feature is not enabled, in
case a compiler may choose a different default value.
- Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and rarely
enabled.
- Fix checking 16-bit Thumb-2 instructions checking mask in the
emulation of the SETEND instruction (it could match the bottom half of
a 32-bit Thumb-2 instruction).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAl6PUYAACgkQa9axLQDI
XvH83g/7B5v0RFqjqVW4/cQKoN1rii7qSA8pBfNgGiCMJKtoGvliAlp3xWEtlW0h
nYJ4gCvey946r5kvZrjdBXC/Ulo2CcGYtX0n8d+8IB6wXAnGcQ0DUBUFZ4+fAU9Z
F7+R7its24dma9R1wIFHFmQUdlO+EgQTfQFvhQKYMSNVaFQF73Sp/vk3oKhJ2E0x
QevgDBQSmmcX3DFxhUW7BdcdboBgtTDUGdhcImdorgp7QmI1r40espJKX4VMKvmb
pfzwg+i7KM6N1RDhRfA2oFMegXwI3rvM3XesqYaua8+xWD5vJuIQfq+ysEq9F9x/
Hnu+W9nbcN8RKQ9JToiqkE7ifuOBTvaIJaqsgIXYSqtYjatuPAh85MkrorHi9Ji2
9i7fc0GMTgtgYDo/93++l8SmmRJMX+h+9KtGtxx39+UqGjToJMCnPGjwBSwe4wdK
lKOAgj488HHsNwTlrRUnq1hXjNjd1w+ON7JM2L3IyRNX/eWN60VxwzwHkZMByCOj
jlcY4ISWquigW4w9Sp4nxEhLF9dWT1+OrE33Xh3CUxPU94jSEvgcDHcxuGeGOlrA
QjN1B2APZFox8XbOsLgeG2kKe5C3Fui90SEn0GyA0ncVLsXDI78VnVJR9uz5+6Pd
ALVQKkJxswhSDPQFlH+7CmQAcr8jWyLEEvyXXaZsoJmewzCpEPM=
=pHRG
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Ensure that the compiler and linker versions are aligned so that ld
doesn't complain about not understanding a .note.gnu.property section
(emitted when pointer authentication is enabled).
- Force -mbranch-protection=none when the feature is not enabled, in
case a compiler may choose a different default value.
- Remove CONFIG_DEBUG_ALIGN_RODATA. It was never in defconfig and
rarely enabled.
- Fix checking 16-bit Thumb-2 instructions checking mask in the
emulation of the SETEND instruction (it could match the bottom half
of a 32-bit Thumb-2 instruction).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: armv8_deprecated: Fix undef_hook mask for thumb setend
arm64: remove CONFIG_DEBUG_ALIGN_RODATA feature
arm64: Always force a branch protection mode when the compiler has one
arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch
init/kconfig: Add LD_VERSION Kconfig
Accurate userfaultfd WP tracking is possible by tracking exactly which
virtual memory ranges were writeprotected by userland. We can't relay
only on the RW bit of the mapped pagetable because that information is
destroyed by fork() or KSM or swap. If we were to relay on that, we'd
need to stay on the safe side and generate false positive wp faults for
every swapped out page.
[peterx@redhat.com: append _PAGE_UFD_WP to _PAGE_CHG_MASK]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Shaohua Li <shli@fb.com>
Link: http://lkml.kernel.org/r/20200220163112.11409-4-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
core:
- Support for cgroup tracking in samples to allow cgroup based
analysis
tools:
- Support for cgroup analysis
- Commandline option and hotkey for perf top to change the sort order
- A set of fixes all over the place
- Various build system related improvements
- Updates of the X86 pmu event JSON data
- Documentation updates
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6J2iATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoXMEEACy6WaNabX7EzkLUnX0WCgxnZVSryNR
4EnxLJSg5lChEe4q2mOS3mRMpzlXHQieWcFxlwda7kjIbFlgQvjJQUiYlAvxRRO/
giY/GwCtTi/Flcb+7wKxTgMYmtAUOZDWeQbBZUlFLi9vyeCHVkjget9EyVsgbe/W
EmRsrPuKOVMUTeEwm3zpIE051DObpiWLNge++My70q5W/yNsS94PbNydgKO7osqk
pX37YVyBFpI2IQxMGzaE3yK7OxXRjYljZaz1tONFDMEYOX9gmxpDsCCflsP1ZOzL
4/P4faRvAOElwxtYBelKmRl8eboqhRpTEK0Et0TI0LYbUZrE2nisDi0LTKPWQb0k
Om2Qi6AfZs67PVzx9htlx6rfee72+sUluz5BDKOGH0pNJ8CFy70ns8InLsZqbBZ7
SgFVNjx6bHxB58VuVE9WEzr/KVs6zI/SuJlH7WG7FLXbm5j0cETfCjg40JlDpSNh
CZs+Epky1zyytrVJ9Gc7KnRlw8VB2eWEQ2cQ0sqj5w7WxhfrnsCmQf1zk4sofhOF
iz3Dvb9Llz/pYWGZbEiQAuI+8bo0psJptidzpmpbIXs/woKDhW49w1FxqowJQMWe
+lGWcauSfo3gjgEnTkOWAx3yiH4i9rvRChX8Jh0Z07d6Kwf19YYfxcuhlkx0Wutj
eKaaErWDtWAxpA==
=2UUD
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more perf updates from Thomas Gleixner:
"Perf updates all over the place:
core:
- Support for cgroup tracking in samples to allow cgroup based
analysis
tools:
- Support for cgroup analysis
- Commandline option and hotkey for perf top to change the sort order
- A set of fixes all over the place
- Various build system related improvements
- Updates of the X86 pmu event JSON data
- Documentation updates"
* tag 'perf-urgent-2020-04-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits)
perf python: Fix clang detection to strip out options passed in $CC
perf tools: Support Python 3.8+ in Makefile
perf script: Fix invalid read of directory entry after closedir()
perf script report: Fix SEGFAULT when using DWARF mode
perf script: add -S/--symbols documentation
perf pmu-events x86: Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
perf events parser: Add missing Intel CPU events to parser
perf script: Allow --symbol to accept hexadecimal addresses
perf report/top TUI: Fix title line formatting
perf top: Support hotkey to change sort order
perf top: Support --group-sort-idx to change the sort order
perf symbols: Fix arm64 gap between kernel start and module end
perf build-test: Honour JOBS to override detection of number of cores
perf script: Add --show-cgroup-events option
perf top: Add --all-cgroups option
perf record: Add --all-cgroups option
perf record: Support synthesizing cgroup events
perf report: Add 'cgroup' sort key
perf cgroup: Maintain cgroup hierarchy
perf tools: Basic support for CGROUP event
...
perf python:
Arnaldo Carvalho de Melo:
- Fix clang detection to strip out options passed in $CC.
build:
He Zhe:
- Normalize gcc parameter when generating arch errno table, fixing
the build by removing options from $(CC).
Sam Lunt:
- Support Python 3.8+ in Makefile.
perf report/top:
Arnaldo Carvalho de Melo:
- Fix title line formatting.
perf script:
Andreas Gerstmayr:
- Fix SEGFAULT when using DWARF mode.
- Fix invalid read of directory entry after closedir(), found with valgrind.
Hagen Paul Pfeifer:
- Introduce --deltatime option.
Stephane Eranian:
- Allow --symbol to accept hexadecimal addresses.
Ian Rogers:
- Add -S/--symbols documentation
Namhyung Kim:
- Add --show-cgroup-events option.
perf python:
Arnaldo Carvalho de Melo:
- Include rwsem.c in the python binding, needed by the cgroups improvements.
build-test:
Arnaldo Carvalho de Melo:
- Honour JOBS to override detection of number of cores
perf top:
Jin Yao:
- Support --group-sort-idx to change the sort order
- perf top: Support hotkey to change sort order
perf pmu-events x86:
Jin Yao:
- Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
perf symbols arm64:
Kemeng Shi:
- Fix arm64 gap between kernel start and module end
kernel perf subsystem:
Namhyung Kim:
- Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
to allow cgroup tracking, saving a link between cgroup path and
its id number.
perf cgroup:
Namhyung Kim:
- Maintain cgroup hierarchy.
perf report:
Namhyung Kim:
- Add 'cgroup' sort key.
perf record:
Namhyung Kim:
- Support synthesizing cgroup events for pre-existing cgroups.
- Add --all-cgroups option
Documentation:
Tony Jones:
- Update docs regarding kernel/user space unwinding.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXodMtwAKCRCyPKLppCJ+
J+leAP0Ws0dGzMSIwcMVc7zvK1IsYOTlZ8lYXJePxD+Po/YPdAEA6Squf4gwZ2wm
b9R7w50dlCkMJ9LaueCeZZjh/4asFwQ=
=auOb
-----END PGP SIGNATURE-----
Merge tag 'perf-urgent-for-mingo-5.7-20200403' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
Pull perf/urgent fixes and improvements from Arnaldo Carvalho de Melo:
perf python:
Arnaldo Carvalho de Melo:
- Fix clang detection to strip out options passed in $CC.
build:
He Zhe:
- Normalize gcc parameter when generating arch errno table, fixing
the build by removing options from $(CC).
Sam Lunt:
- Support Python 3.8+ in Makefile.
perf report/top:
Arnaldo Carvalho de Melo:
- Fix title line formatting.
perf script:
Andreas Gerstmayr:
- Fix SEGFAULT when using DWARF mode.
- Fix invalid read of directory entry after closedir(), found with valgrind.
Hagen Paul Pfeifer:
- Introduce --deltatime option.
Stephane Eranian:
- Allow --symbol to accept hexadecimal addresses.
Ian Rogers:
- Add -S/--symbols documentation
Namhyung Kim:
- Add --show-cgroup-events option.
perf python:
Arnaldo Carvalho de Melo:
- Include rwsem.c in the python binding, needed by the cgroups improvements.
build-test:
Arnaldo Carvalho de Melo:
- Honour JOBS to override detection of number of cores
perf top:
Jin Yao:
- Support --group-sort-idx to change the sort order
- perf top: Support hotkey to change sort order
perf pmu-events x86:
Jin Yao:
- Use CPU_CLK_UNHALTED.THREAD in Kernel_Utilization metric
perf symbols arm64:
Kemeng Shi:
- Fix arm64 gap between kernel start and module end
kernel perf subsystem:
Namhyung Kim:
- Add PERF_RECORD_CGROUP event and Add PERF_SAMPLE_CGROUP feature,
to allow cgroup tracking, saving a link between cgroup path and
its id number.
perf cgroup:
Namhyung Kim:
- Maintain cgroup hierarchy.
perf report:
Namhyung Kim:
- Add 'cgroup' sort key.
perf record:
Namhyung Kim:
- Support synthesizing cgroup events for pre-existing cgroups.
- Add --all-cgroups option
Documentation:
Tony Jones:
- Update docs regarding kernel/user space unwinding.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This option can be used in Kconfig files to compare the ld version
and enable/disable incompatible config options if required.
This option is used in the subsequent patch along with GCC_VERSION to
filter out an incompatible feature.
Signed-off-by: Amit Daniel Kachhap <amit.kachhap@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Pull networking updates from David Miller:
"Highlights:
1) Fix the iwlwifi regression, from Johannes Berg.
2) Support BSS coloring and 802.11 encapsulation offloading in
hardware, from John Crispin.
3) Fix some potential Spectre issues in qtnfmac, from Sergey
Matyukevich.
4) Add TTL decrement action to openvswitch, from Matteo Croce.
5) Allow paralleization through flow_action setup by not taking the
RTNL mutex, from Vlad Buslov.
6) A lot of zero-length array to flexible-array conversions, from
Gustavo A. R. Silva.
7) Align XDP statistics names across several drivers for consistency,
from Lorenzo Bianconi.
8) Add various pieces of infrastructure for offloading conntrack, and
make use of it in mlx5 driver, from Paul Blakey.
9) Allow using listening sockets in BPF sockmap, from Jakub Sitnicki.
10) Lots of parallelization improvements during configuration changes
in mlxsw driver, from Ido Schimmel.
11) Add support to devlink for generic packet traps, which report
packets dropped during ACL processing. And use them in mlxsw
driver. From Jiri Pirko.
12) Support bcmgenet on ACPI, from Jeremy Linton.
13) Make BPF compatible with RT, from Thomas Gleixnet, Alexei
Starovoitov, and your's truly.
14) Support XDP meta-data in virtio_net, from Yuya Kusakabe.
15) Fix sysfs permissions when network devices change namespaces, from
Christian Brauner.
16) Add a flags element to ethtool_ops so that drivers can more simply
indicate which coalescing parameters they actually support, and
therefore the generic layer can validate the user's ethtool
request. Use this in all drivers, from Jakub Kicinski.
17) Offload FIFO qdisc in mlxsw, from Petr Machata.
18) Support UDP sockets in sockmap, from Lorenz Bauer.
19) Fix stretch ACK bugs in several TCP congestion control modules,
from Pengcheng Yang.
20) Support virtual functiosn in octeontx2 driver, from Tomasz
Duszynski.
21) Add region operations for devlink and use it in ice driver to dump
NVM contents, from Jacob Keller.
22) Add support for hw offload of MACSEC, from Antoine Tenart.
23) Add support for BPF programs that can be attached to LSM hooks,
from KP Singh.
24) Support for multiple paths, path managers, and counters in MPTCP.
From Peter Krystad, Paolo Abeni, Florian Westphal, Davide Caratti,
and others.
25) More progress on adding the netlink interface to ethtool, from
Michal Kubecek"
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2121 commits)
net: ipv6: rpl_iptunnel: Fix potential memory leak in rpl_do_srh_inline
cxgb4/chcr: nic-tls stats in ethtool
net: dsa: fix oops while probing Marvell DSA switches
net/bpfilter: remove superfluous testing message
net: macb: Fix handling of fixed-link node
net: dsa: ksz: Select KSZ protocol tag
netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
net: stmmac: add EHL 2.5Gbps PCI info and PCI ID
net: stmmac: add EHL PSE0 & PSE1 1Gbps PCI info and PCI ID
net: stmmac: create dwmac-intel.c to contain all Intel platform
net: dsa: bcm_sf2: Support specifying VLAN tag egress rule
net: dsa: bcm_sf2: Add support for matching VLAN TCI
net: dsa: bcm_sf2: Move writing of CFP_DATA(5) into slicing functions
net: dsa: bcm_sf2: Check earlier for FLOW_EXT and FLOW_MAC_EXT
net: dsa: bcm_sf2: Disable learning for ASP port
net: dsa: b53: Deny enslaving port 7 for 7278 into a bridge
net: dsa: b53: Prevent tagged VLAN on port 7 for 7278
net: dsa: b53: Restore VLAN entries upon (re)configuration
net: dsa: bcm_sf2: Fix overflow checks
hv_netvsc: Remove unnecessary round_up for recv_completion_cnt
...
[Build system]
- add CONFIG_UNUSED_KSYMS_WHITELIST, which will be useful to define
a fixed set of export symbols for Generic Kernel Image (GKI)
- allow to run 'make dt_binding_check' without .config
- use full schema for checking DT examples in *.yaml files
- make modpost fail for missing MODULE_IMPORT_NS(), which makes more
sense because we know the produced modules are never loadable
- Remove unused 'AS' variable
[Kconfig]
- sanitize DEFCONFIG_LIST, and remove ARCH_DEFCONFIG from Kconfig files
- relax the 'imply' behavior so that symbols implied by y can become m
- make 'imply' obey 'depends on' in order to make 'imply' really weak
[Misc]
- add documentation on building the kernel with Clang/LLVM
- revive __HAVE_ARCH_STRLEN for 32bit sparc to use optimized strlen()
- fix warning from deb-pkg builds when CONFIG_DEBUG_INFO=n
- various script and Makefile cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl6DbP8VHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGAfkQALZqMCqtX9cAJej04+lnBCzwVPep
6s8/s6vW6PF92sHv+SJtHvKSnDekcZT2xT8dkPDaVmuOye8xhENs5dFZ4tSKO5D0
F8YkkM17mu/cylNZ2UCy/8weh6/TjsD7pa+mFqWo/++30JiXm12v3mVFR568KPXI
kFau/3ALvY1NIr2wUAI2SOd6A4v/Epzpk0ltnFg3f5iWVFKlE03MGueAF+YZzq7v
UrU73HdUxF/SBW2Jz3UtV9XY8P38uQmmtoDE8SZikG4PjW03q9w6pnhntDBl/H2b
dZFg40eG7SHXN4L+OOI32ae9jePHvKpsnjeaeNoT/DZpwpuuxXu7C2EmUy+wCAnM
Rw4+kiAVNppRMRH1GTdp1XjLY6PwPqizzZGmufwX+W3MI8oZdlLSUJLbrO73P/aF
QR3MgkJkjvgmRVPP9fr8SNcZ39tDGI4KqLdWvjVVSC/s86aDnw/34puEfw0lj4vs
gCi923iJQ7Y/QWX63TYZhy96pnedlwE2s6aR1InVER3+XMH9K1nW34CDaKQsp1CB
6zyrd40+K5ETOKo3OAjq4FttlhRkEpX9nIsffCzOz6tybysHTSrCzYhfjpIAzzYj
Et5HpXbegHShIqN44yqBumt6YkTZac6Aub9FzInW2LPzZgiofDaNesDQmnQmIZOa
JlUyBrjXRfwkvCH0
=wT8A
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
"Build system:
- add CONFIG_UNUSED_KSYMS_WHITELIST, which will be useful to define a
fixed set of export symbols for Generic Kernel Image (GKI)
- allow to run 'make dt_binding_check' without .config
- use full schema for checking DT examples in *.yaml files
- make modpost fail for missing MODULE_IMPORT_NS(), which makes more
sense because we know the produced modules are never loadable
- Remove unused 'AS' variable
Kconfig:
- sanitize DEFCONFIG_LIST, and remove ARCH_DEFCONFIG from Kconfig
files
- relax the 'imply' behavior so that symbols implied by 'y' can
become 'm'
- make 'imply' obey 'depends on' in order to make 'imply' really weak
Misc:
- add documentation on building the kernel with Clang/LLVM
- revive __HAVE_ARCH_STRLEN for 32bit sparc to use optimized strlen()
- fix warning from deb-pkg builds when CONFIG_DEBUG_INFO=n
- various script and Makefile cleanups"
* tag 'kbuild-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
Makefile: Update kselftest help information
kbuild: deb-pkg: fix warning when CONFIG_DEBUG_INFO is unset
kbuild: add outputmakefile to no-dot-config-targets
kbuild: remove AS variable
net: wan: wanxl: refactor the firmware rebuild rule
net: wan: wanxl: use $(M68KCC) instead of $(M68KAS) for rebuilding firmware
net: wan: wanxl: use allow to pass CROSS_COMPILE_M68k for rebuilding firmware
kbuild: add comment about grouped target
kbuild: add -Wall to KBUILD_HOSTCXXFLAGS
kconfig: remove unused variable in qconf.cc
sparc: revive __HAVE_ARCH_STRLEN for 32bit sparc
kbuild: refactor Makefile.dtbinst more
kbuild: compute the dtbs_install destination more simply
Makefile: disallow data races on gcc-10 as well
kconfig: make 'imply' obey the direct dependency
kconfig: allow symbols implied by y to become m
net: drop_monitor: use IS_REACHABLE() to guard net_dm_hw_report()
modpost: return error if module is missing ns imports and MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS=n
modpost: rework and consolidate logging interface
kbuild: allow to run dt_binding_check without kernel configuration
...
Pull scheduler updates from Ingo Molnar:
"The main changes in this cycle are:
- Various NUMA scheduling updates: harmonize the load-balancer and
NUMA placement logic to not work against each other. The intended
result is better locality, better utilization and fewer migrations.
- Introduce Thermal Pressure tracking and optimizations, to improve
task placement on thermally overloaded systems.
- Implement frequency invariant scheduler accounting on (some) x86
CPUs. This is done by observing and sampling the 'recent' CPU
frequency average at ~tick boundaries. The CPU provides this data
via the APERF/MPERF MSRs. This hopefully makes our capacity
estimates more precise and keeps tasks on the same CPU better even
if it might seem overloaded at a lower momentary frequency. (As
usual, turbo mode is a complication that we resolve by observing
the maximum frequency and renormalizing to it.)
- Add asymmetric CPU capacity wakeup scan to improve capacity
utilization on asymmetric topologies. (big.LITTLE systems)
- PSI fixes and optimizations.
- RT scheduling capacity awareness fixes & improvements.
- Optimize the CONFIG_RT_GROUP_SCHED constraints code.
- Misc fixes, cleanups and optimizations - see the changelog for
details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
threads: Update PID limit comment according to futex UAPI change
sched/fair: Fix condition of avg_load calculation
sched/rt: cpupri_find: Trigger a full search as fallback
kthread: Do not preempt current task if it is going to call schedule()
sched/fair: Improve spreading of utilization
sched: Avoid scale real weight down to zero
psi: Move PF_MEMSTALL out of task->flags
MAINTAINERS: Add maintenance information for psi
psi: Optimize switching tasks inside shared cgroups
psi: Fix cpu.pressure for cpu.max and competing cgroups
sched/core: Distribute tasks within affinity masks
sched/fair: Fix enqueue_task_fair warning
thermal/cpu-cooling, sched/core: Move the arch_set_thermal_pressure() API to generic scheduler code
sched/rt: Remove unnecessary push for unfit tasks
sched/rt: Allow pulling unfitting task
sched/rt: Optimize cpupri_find() on non-heterogenous systems
sched/rt: Re-instate old behavior in select_task_rq_rt()
sched/rt: cpupri_find: Implement fallback mechanism for !fit case
sched/fair: Fix reordering of enqueue/dequeue_task_fair()
sched/fair: Fix runnable_avg for throttled cfs
...
LSM and tracing programs share their helpers with bpf_tracing_func_proto
which is only defined (in bpf_trace.c) when BPF_EVENTS is enabled.
Instead of adding __weak symbol, make BPF_LSM depend on BPF_EVENTS so
that both tracing and LSM programs can actually share helpers.
Fixes: fc611f47f2 ("bpf: Introduce BPF_PROG_TYPE_LSM")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200330204059.13024-1-kpsingh@chromium.org
Introduce types and configs for bpf programs that can be attached to
LSM hooks. The programs can be enabled by the config option
CONFIG_BPF_LSM.
Signed-off-by: KP Singh <kpsingh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Brendan Jackman <jackmanb@google.com>
Reviewed-by: Florent Revest <revest@google.com>
Reviewed-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: James Morris <jamorris@linux.microsoft.com>
Link: https://lore.kernel.org/bpf/20200329004356.27286-2-kpsingh@chromium.org
The PERF_SAMPLE_CGROUP bit is to save (perf_event) cgroup information in
the sample. It will add a 64-bit id to identify current cgroup and it's
the file handle in the cgroup file system. Userspace should use this
information with PERF_RECORD_CGROUP event to match which cgroup it
belongs.
I put it before PERF_SAMPLE_AUX for simplicity since it just needs a
64-bit word. But if we want bigger samples, I can work on that
direction too.
Committer testing:
$ pahole perf_sample_data | grep -w cgroup -B5 -A5
/* --- cacheline 4 boundary (256 bytes) was 56 bytes ago --- */
struct perf_regs regs_intr; /* 312 16 */
/* --- cacheline 5 boundary (320 bytes) was 8 bytes ago --- */
u64 stack_user_size; /* 328 8 */
u64 phys_addr; /* 336 8 */
u64 cgroup; /* 344 8 */
/* size: 384, cachelines: 6, members: 22 */
/* padding: 32 */
};
$
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lore.kernel.org/lkml/20200325124536.2800725-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The support for __uint128_t is dependent on the target bit size.
GCC that defaults to the 32-bit can still build the 64-bit kernel
with -m64 flag passed.
However, $(cc-option,-D__SIZEOF_INT128__=0) is evaluated against the
default machine bit, which may not match to the kernel it is building.
Theoretically, this could be evaluated separately for 64BIT/32BIT.
config CC_HAS_INT128
bool
default !$(cc-option,$(m64-flag) -D__SIZEOF_INT128__=0) if 64BIT
default !$(cc-option,$(m32-flag) -D__SIZEOF_INT128__=0)
I simplified it more because the 32-bit compiler is unlikely to support
__uint128_t.
Fixes: c12d3362a7 ("int128: move __uint128_t compiler test to Kconfig")
Reported-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: George Spelvin <lkml@sdf.org>
Extrapolating on the existing framework to track rt/dl utilization using
pelt signals, add a similar mechanism to track thermal pressure. The
difference here from rt/dl utilization tracking is that, instead of
tracking time spent by a CPU running a RT/DL task through util_avg, the
average thermal pressure is tracked through load_avg. This is because
thermal pressure signal is weighted time "delta" capacity unlike util_avg
which is binary. "delta capacity" here means delta between the actual
capacity of a CPU and the decreased capacity a CPU due to a thermal event.
In order to track average thermal pressure, a new sched_avg variable
avg_thermal is introduced. Function update_thermal_load_avg can be called
to do the periodic bookkeeping (accumulate, decay and average) of the
thermal pressure.
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20200222005213.3873-2-thara.gopinath@linaro.org
CONFIG_TRIM_UNUSED_KSYMS currently removes all unused exported symbols
from ksymtab. This works really well when using in-tree drivers, but
cannot be used in its current form if some of them are out-of-tree.
Indeed, even if the list of symbols required by out-of-tree drivers is
known at compile time, the only solution today to guarantee these don't
get trimmed is to set CONFIG_TRIM_UNUSED_KSYMS=n. This not only wastes
space, but also makes it difficult to control the ABI usable by vendor
modules in distribution kernels such as Android. Being able to control
the kernel ABI surface is particularly useful to ship a unique Generic
Kernel Image (GKI) for all vendors, which is a first step in the
direction of getting all vendors to contribute their code upstream.
As such, attempt to improve the situation by enabling users to specify a
symbol 'whitelist' at compile time. Any symbol specified in this
whitelist will be kept exported when CONFIG_TRIM_UNUSED_KSYMS is set,
even if it has no in-tree user. The whitelist is defined as a simple
text file, listing symbols, one per line.
Acked-by: Jessica Yu <jeyu@kernel.org>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Tested-by: Matthias Maennich <maennich@google.com>
Reviewed-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Most of the Kconfig commands (except defconfig and all*config) read
the .config file as a base set of CONFIG options.
When it does not exist, the files in DEFCONFIG_LIST are searched in
this order and loaded if found.
I do not see much sense in the last two lines in DEFCONFIG_LIST.
[1] ARCH_DEFCONFIG
The entry for DEFCONFIG_LIST is guarded by 'depends on !UML'. So, the
ARCH_DEFCONFIG definition in arch/x86/um/Kconfig is meaningless.
arch/{sh,sparc,x86}/Kconfig define ARCH_DEFCONFIG depending on 32 or
64 bit variant symbols. This is a little bit strange; ARCH_DEFCONFIG
should be a fixed string because the base config file is loaded before
the symbol evaluation stage.
Using KBUILD_DEFCONFIG makes more sense because it is fixed before
Kconfig is invoked. Fortunately, arch/{sh,sparc,x86}/Makefile define it
in the same way, and it works as expected. Hence, replace ARCH_DEFCONFIG
with "arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)".
[2] arch/$(ARCH)/defconfig
This file path is no longer valid. The defconfig files are always located
in the arch configs/ directories.
$ find arch -name defconfig | sort
arch/alpha/configs/defconfig
arch/arm64/configs/defconfig
arch/csky/configs/defconfig
arch/nds32/configs/defconfig
arch/riscv/configs/defconfig
arch/s390/configs/defconfig
arch/unicore32/configs/defconfig
The path arch/*/configs/defconfig is already covered by
"arch/$(SRCARCH)/configs/$(KBUILD_DEFCONFIG)". So, this file path is
not necessary.
I moved the default KBUILD_DEFCONFIG to the top Makefile. Otherwise,
the 7 architectures listed above would end up with endless loop of
syncconfig.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Change in API of bootconfig (before it comes live in a release)
- Have a magic value "BOOTCONFIG" in initrd to know a bootconfig exists
- Set CONFIG_BOOT_CONFIG to 'n' by default
- Show error if "bootconfig" on cmdline but not compiled in
- Prevent redefining the same value
- Have a way to append values
- Added a SELECT BLK_DEV_INITRD to fix a build failure
Synthetic event fixes:
- Switch to raw_smp_processor_id() for recording CPU value in preempt
section. (No care for what the value actually is)
- Fix samples always recording u64 values
- Fix endianess
- Check number of values matches number of fields
- Fix a printing bug
Fix of trace_printk() breaking postponed start up tests
Make a function static that is only used in a single file.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXlW4vxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qtioAP0WLEm3dWO0z3321h/a0DSshC+Bslu3
HDPTsGVGrXmvggEA/lr1ikRHd8PsO7zW8BfaZMxoXaTqXiuSrzEWxnMlFw0=
=O8PM
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing and bootconfig updates:
"Fixes and changes to bootconfig before it goes live in a release.
Change in API of bootconfig (before it comes live in a release):
- Have a magic value "BOOTCONFIG" in initrd to know a bootconfig
exists
- Set CONFIG_BOOT_CONFIG to 'n' by default
- Show error if "bootconfig" on cmdline but not compiled in
- Prevent redefining the same value
- Have a way to append values
- Added a SELECT BLK_DEV_INITRD to fix a build failure
Synthetic event fixes:
- Switch to raw_smp_processor_id() for recording CPU value in preempt
section. (No care for what the value actually is)
- Fix samples always recording u64 values
- Fix endianess
- Check number of values matches number of fields
- Fix a printing bug
Fix of trace_printk() breaking postponed start up tests
Make a function static that is only used in a single file"
* tag 'trace-v5.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
bootconfig: Fix CONFIG_BOOTTIME_TRACING dependency issue
bootconfig: Add append value operator support
bootconfig: Prohibit re-defining value on same key
bootconfig: Print array as multiple commands for legacy command line
bootconfig: Reject subkey and value on same parent key
tools/bootconfig: Remove unneeded error message silencer
bootconfig: Add bootconfig magic word for indicating bootconfig explicitly
bootconfig: Set CONFIG_BOOT_CONFIG=n by default
tracing: Clear trace_state when starting trace
bootconfig: Mark boot_config_checksum() static
tracing: Disable trace_printk() on post poned tests
tracing: Have synthetic event test use raw_smp_processor_id()
tracing: Fix number printing bug in print_synth_event()
tracing: Check that number of vals matches number of synth event fields
tracing: Make synth_event trace functions endian-correct
tracing: Make sure synth_event_trace() example always uses u64
Since commit d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by
default") also changed the CONFIG_BOOTTIME_TRACING to select
CONFIG_BOOT_CONFIG to show the boot-time tracing on the menu,
it introduced wrong dependencies with BLK_DEV_INITRD as below.
WARNING: unmet direct dependencies detected for BOOT_CONFIG
Depends on [n]: BLK_DEV_INITRD [=n]
Selected by [y]:
- BOOTTIME_TRACING [=y] && TRACING_SUPPORT [=y] && FTRACE [=y] && TRACING [=y]
This makes the CONFIG_BOOT_CONFIG selects CONFIG_BLK_DEV_INITRD to
fix this error and make CONFIG_BOOTTIME_TRACING=n by default, so
that both boot-time tracing and boot configuration off but those
appear on the menu list.
Link: http://lkml.kernel.org/r/158264140162.23842.11237423518607465535.stgit@devnote2
Fixes: d8a953ddde ("bootconfig: Set CONFIG_BOOT_CONFIG=n by default")
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Compiled-tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Add bootconfig magic word to the end of bootconfig on initrd
image for indicating explicitly the bootconfig is there.
Also tools/bootconfig treats wrong size or wrong checksum or
parse error as an error, because if there is a bootconfig magic
word, there must be a bootconfig.
The bootconfig magic word is "#BOOTCONFIG\n", 12 bytes word.
Thus the block image of the initrd file with bootconfig is
as follows.
[Initrd][bootconfig][size][csum][#BOOTCONFIG\n]
Link: http://lkml.kernel.org/r/158220112263.26565.3944814205960612841.stgit@devnote2
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Set CONFIG_BOOT_CONFIG=n by default. This also warns
user if CONFIG_BOOT_CONFIG=n but "bootconfig" is given
in the kernel command line.
Link: http://lkml.kernel.org/r/158220111291.26565.9036889083940367969.stgit@devnote2
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Fix an uninitialized variable
- Fix compile bug to bootconfig userspace tool (in tools directory)
- Suppress some error messages of bootconfig userspace tool
- Remove unneded CONFIG_LIBXBC from bootconfig
- Allocate bootconfig xbc_nodes dynamically.
To ease complaints about taking up static memory at boot up
- Use of parse_args() to parse bootconfig instead of strstr() usage
Prevents issues of double quotes containing the interested string
- Fix missing ring_buffer_nest_end() on synthetic event error path
- Return zero not -EINVAL on soft disabled synthetic event
(soft disabling must be the same as hard disabling, which returns zero)
- Consolidate synthetic event code (remove duplicate code)
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXkMTwRQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qnJQAQD5aKiD6jx+zGtLsFTuZMcEGvhhUuJ6
oUaSvhKO3UqezwD/V5avKuuC0wWt//gOWDY+0+4QNjmvn1GLnaNYohs6fg0=
=NLT9
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various fixes:
- Fix an uninitialized variable
- Fix compile bug to bootconfig userspace tool (in tools directory)
- Suppress some error messages of bootconfig userspace tool
- Remove unneded CONFIG_LIBXBC from bootconfig
- Allocate bootconfig xbc_nodes dynamically. To ease complaints about
taking up static memory at boot up
- Use of parse_args() to parse bootconfig instead of strstr() usage
Prevents issues of double quotes containing the interested string
- Fix missing ring_buffer_nest_end() on synthetic event error path
- Return zero not -EINVAL on soft disabled synthetic event (soft
disabling must be the same as hard disabling, which returns zero)
- Consolidate synthetic event code (remove duplicate code)"
* tag 'trace-v5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Consolidate trace() functions
tracing: Don't return -EINVAL when tracing soft disabled synth events
tracing: Add missing nest end to synth_event_trace_start() error case
tools/bootconfig: Suppress non-error messages
bootconfig: Allocate xbc_nodes array dynamically
bootconfig: Use parse_args() to find bootconfig and '--'
tracing/kprobe: Fix uninitialized variable bug
bootconfig: Remove unneeded CONFIG_LIBXBC
tools/bootconfig: Fix wrong __VA_ARGS__ usage
Since there is no user except CONFIG_BOOT_CONFIG and no plan
to use it from other functions, CONFIG_LIBXBC can be removed
and we can use CONFIG_BOOT_CONFIG directly.
Link: http://lkml.kernel.org/r/158098769281.939.16293492056419481105.stgit@devnote2
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
- Added new "bootconfig".
Looks for a file appended to initrd to add boot config options.
This has been discussed thoroughly at Linux Plumbers.
Very useful for adding kprobes at bootup.
Only enabled if "bootconfig" is on the real kernel command line.
- Created dynamic event creation.
Merges common code between creating synthetic events and
kprobe events.
- Rename perf "ring_buffer" structure to "perf_buffer"
- Rename ftrace "ring_buffer" structure to "trace_buffer"
Had to rename existing "trace_buffer" to "array_buffer"
- Allow trace_printk() to work withing (some) tracing code.
- Sort of tracing configs to be a little better organized
- Fixed bug where ftrace_graph hash was not being protected properly
- Various other small fixes and clean ups
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCXjtAURQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qshOAQDzopQmvAVrrI6oogghr8JQA30Z2yqT
i+Ld7vPWL2MV9wEA1S+zLGDSYrj8f/vsCq6BxRYT1ApO+YtmY6LTXiUejwg=
=WNds
-----END PGP SIGNATURE-----
Merge tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt:
- Added new "bootconfig".
This looks for a file appended to initrd to add boot config options,
and has been discussed thoroughly at Linux Plumbers.
Very useful for adding kprobes at bootup.
Only enabled if "bootconfig" is on the real kernel command line.
- Created dynamic event creation.
Merges common code between creating synthetic events and kprobe
events.
- Rename perf "ring_buffer" structure to "perf_buffer"
- Rename ftrace "ring_buffer" structure to "trace_buffer"
Had to rename existing "trace_buffer" to "array_buffer"
- Allow trace_printk() to work withing (some) tracing code.
- Sort of tracing configs to be a little better organized
- Fixed bug where ftrace_graph hash was not being protected properly
- Various other small fixes and clean ups
* tag 'trace-v5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (88 commits)
bootconfig: Show the number of nodes on boot message
tools/bootconfig: Show the number of bootconfig nodes
bootconfig: Add more parse error messages
bootconfig: Use bootconfig instead of boot config
ftrace: Protect ftrace_graph_hash with ftrace_sync
ftrace: Add comment to why rcu_dereference_sched() is open coded
tracing: Annotate ftrace_graph_notrace_hash pointer with __rcu
tracing: Annotate ftrace_graph_hash pointer with __rcu
bootconfig: Only load bootconfig if "bootconfig" is on the kernel cmdline
tracing: Use seq_buf for building dynevent_cmd string
tracing: Remove useless code in dynevent_arg_pair_add()
tracing: Remove check_arg() callbacks from dynevent args
tracing: Consolidate some synth_event_trace code
tracing: Fix now invalid var_ref_vals assumption in trace action
tracing: Change trace_boot to use synth_event interface
tracing: Move tracing selftests to bottom of menu
tracing: Move mmio tracer config up with the other tracers
tracing: Move tracing test module configs together
tracing: Move all function tracing configs together
tracing: Documentation for in-kernel synthetic event API
...
- Fix for time travel mode
- Disable CONFIG_CONSTRUCTORS again
- A new command line option to have an non-raw serial line
- Preparations to remove obsolete UML network drivers
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl4k2EYWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wTe2EACDEsoWZvvKnocFH/umFfZdxciU
Ys5noEPElnILVIwV+Gm9SHq/RQWzG8BqSOirfOn1iGhEqWjDTPzwqPuqFGxKtRVp
VoaYDA506oDH903i4vj1OuGDHxgModEmR/GFqU9uEtXUws2qbeZQcG0COkquJU8X
URMz4XB+KLqDI2TvOTnbWevjJnslwLIqRuDdZ2q0d685J1XhRhuq/srgZGMiUpGn
4H/E4k0UxlC082oh9QWRFYYyc6vhyvlguupphzBgICZQmP4P4ck3pe23OT+vOWBl
+e2ti9MlB9/Tv3dGhzmq2180U0D74RvtHIi7RjUdaTcEoOkgDwXqKsZ1CY4kCV78
mxrXHCE6YUMvsQcTBxobXYD/zUXeqXtlSHyGQ4MUATCvI6ag8vWKWjGXV/kDVWdf
FEeL0O6AHjruTrPxi1aSJ3TFG+JerXCGZpSt2DG67sCcWJ/RqYnrs45DF4U6ywf4
BQ/nA0bpdZouLrhtCS6yBRvPiA5TVXHmrQMpK/LsOpBD4sKCV+MXghbYoWAwcSoM
H+RSpf1em3zQrlRcuNPW8XGVkqOmUKn9pFzT9ybWv0h2hVhrDiutjJEPgbpJooIr
yB0G/MVTtk3Xrok2lq8TT+Hp13TWCTFynsmKYvgv4s37p5jA5fvKL0vhdhIlAxHE
FCyGsZIkAcMLfjvC3Q==
=yi/o
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Anton Ivanov:
"I am sending this on behalf of Richard who is traveling.
This contains the following changes for UML:
- Fix for time travel mode
- Disable CONFIG_CONSTRUCTORS again
- A new command line option to have an non-raw serial line
- Preparations to remove obsolete UML network drivers"
* tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: Fix time-travel=inf-cpu with xor/raid6
Revert "um: Enable CONFIG_CONSTRUCTORS"
um: Mark non-vector net transports as obsolete
um: Add an option to make serial driver non-raw
Pull networking updates from David Miller:
1) Add WireGuard
2) Add HE and TWT support to ath11k driver, from John Crispin.
3) Add ESP in TCP encapsulation support, from Sabrina Dubroca.
4) Add variable window congestion control to TIPC, from Jon Maloy.
5) Add BCM84881 PHY driver, from Russell King.
6) Start adding netlink support for ethtool operations, from Michal
Kubecek.
7) Add XDP drop and TX action support to ena driver, from Sameeh
Jubran.
8) Add new ipv4 route notifications so that mlxsw driver does not have
to handle identical routes itself. From Ido Schimmel.
9) Add BPF dynamic program extensions, from Alexei Starovoitov.
10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes.
11) Add support for macsec HW offloading, from Antoine Tenart.
12) Add initial support for MPTCP protocol, from Christoph Paasch,
Matthieu Baerts, Florian Westphal, Peter Krystad, and many others.
13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu
Cherian, and others.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits)
net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC
udp: segment looped gso packets correctly
netem: change mailing list
qed: FW 8.42.2.0 debug features
qed: rt init valid initialization changed
qed: Debug feature: ilt and mdump
qed: FW 8.42.2.0 Add fw overlay feature
qed: FW 8.42.2.0 HSI changes
qed: FW 8.42.2.0 iscsi/fcoe changes
qed: Add abstraction for different hsi values per chip
qed: FW 8.42.2.0 Additional ll2 type
qed: Use dmae to write to widebus registers in fw_funcs
qed: FW 8.42.2.0 Parser offsets modified
qed: FW 8.42.2.0 Queue Manager changes
qed: FW 8.42.2.0 Expose new registers and change windows
qed: FW 8.42.2.0 Internal ram offsets modifications
MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver
Documentation: net: octeontx2: Add RVU HW and drivers overview
octeontx2-pf: ethtool RSS config support
octeontx2-pf: Add basic ethtool support
...
Pull objtool updates from Ingo Molnar:
"The main changes are to move the ORC unwind table sorting from early
init to build-time - this speeds up booting.
No change in functionality intended"
* 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/unwind/orc: Fix !CONFIG_MODULES build warning
x86/unwind/orc: Remove boot-time ORC unwind tables sorting
scripts/sorttable: Implement build-time ORC unwind table sorting
scripts/sorttable: Rename 'sortextable' to 'sorttable'
scripts/sortextable: Refactor the do_func() function
scripts/sortextable: Remove dead code
scripts/sortextable: Clean up the code to meet the kernel coding style better
scripts/sortextable: Rewrite error/success handling
Fix Kconfig help message since the bootconfig file is
only available to be appended to initramfs. And also
add a reference to the documentation.
Link: http://lkml.kernel.org/r/157949058031.25888.18399447161895787505.stgit@devnote2
Reported-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
This reverts commit 786b2384bf ("um: Enable CONFIG_CONSTRUCTORS").
There are two issues with this commit, uncovered by Anton in tests
on some (Debian) systems:
1) I completely forgot to call any constructors if CONFIG_CONSTRUCTORS
isn't set. Don't recall now if it just wasn't needed on my system, or
if I never tested this case.
2) With that fixed, it works - with CONFIG_CONSTRUCTORS *unset*. If I
set CONFIG_CONSTRUCTORS, it fails again, which isn't totally
unexpected since whatever wanted to run is likely to have to run
before the kernel init etc. that calls the constructors in this case.
Basically, some constructors that gcc emits (libc has?) need to run
very early during init; the failure mode otherwise was that the ptrace
fork test already failed:
----------------------
$ ./linux mem=512M
Core dump limits :
soft - 0
hard - NONE
Checking that ptrace can change system call numbers...check_ptrace : child exited with exitcode 6, while expecting 0; status 0x67f
Aborted
----------------------
Thinking more about this, it's clear that we simply cannot support
CONFIG_CONSTRUCTORS in UML. All the cases we need now (gcov, kasan)
involve not use of the __attribute__((constructor)), but instead
some constructor code/entry generated by gcc. Therefore, we cannot
distinguish between kernel constructors and system constructors.
Thus, revert this commit.
Cc: stable@vger.kernel.org [5.4+]
Fixes: 786b2384bf ("um: Enable CONFIG_CONSTRUCTORS")
Reported-by: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Anton Ivanov <anton.ivanov@cambridgegreys.co.uk>
Signed-off-by: Richard Weinberger <richard@nod.at>
To support time namespaces in the vdso with a minimal impact on regular non
time namespace affected tasks, the namespace handling needs to be hidden in
a slow path.
The most obvious place is vdso_seq_begin(). If a task belongs to a time
namespace then the VVAR page which contains the system wide vdso data is
replaced with a namespace specific page which has the same layout as the
VVAR page. That page has vdso_data->seq set to 1 to enforce the slow path
and vdso_data->clock_mode set to VCLOCK_TIMENS to enforce the time
namespace handling path.
The extra check in the case that vdso_data->seq is odd, e.g. a concurrent
update of the vdso data is in progress, is not really affecting regular
tasks which are not part of a time namespace as the task is spin waiting
for the update to finish and vdso_data->seq to become even again.
If a time namespace task hits that code path, it invokes the corresponding
time getter function which retrieves the real VVAR page, reads host time
and then adds the offset for the requested clock which is stored in the
special VVAR page.
If VDSO time namespace support is disabled the whole magic is compiled out.
Initial testing shows that the disabled case is almost identical to the
host case which does not take the slow timens path. With the special timens
page installed the performance hit is constant time and in the range of
5-7%.
For the vdso functions which are not using the sequence count an
unconditional check for vdso_data->clock_mode is added which switches to
the real vdso when the clock_mode is VCLOCK_TIMENS.
[avagin: Make do_hres_timens() work with raw clocks too: choose vdso_data
pointer by CS_RAW offset.]
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20191112012724.250792-21-dima@arista.com
Time Namespace isolates clock values.
The kernel provides access to several clocks CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME, etc.
CLOCK_REALTIME
System-wide clock that measures real (i.e., wall-clock) time.
CLOCK_MONOTONIC
Clock that cannot be set and represents monotonic time since
some unspecified starting point.
CLOCK_BOOTTIME
Identical to CLOCK_MONOTONIC, except it also includes any time
that the system is suspended.
For many users, the time namespace means the ability to changes date and
time in a container (CLOCK_REALTIME). Providing per namespace notions of
CLOCK_REALTIME would be complex with a massive overhead, but has a dubious
value.
But in the context of checkpoint/restore functionality, monotonic and
boottime clocks become interesting. Both clocks are monotonic with
unspecified starting points. These clocks are widely used to measure time
slices and set timers. After restoring or migrating processes, it has to be
guaranteed that they never go backward. In an ideal case, the behavior of
these clocks should be the same as for a case when a whole system is
suspended. All this means that it is required to set CLOCK_MONOTONIC and
CLOCK_BOOTTIME clocks, which can be achieved by adding per-namespace
offsets for clocks.
A time namespace is similar to a pid namespace in the way how it is
created: unshare(CLONE_NEWTIME) system call creates a new time namespace,
but doesn't set it to the current process. Then all children of the process
will be born in the new time namespace, or a process can use the setns()
system call to join a namespace.
This scheme allows setting clock offsets for a namespace, before any
processes appear in it.
All available clone flags have been used, so CLONE_NEWTIME uses the highest
bit of CSIGNAL. It means that it can be used only with the unshare() and
the clone3() system calls.
[ tglx: Adjusted paragraph about clone3() to reality and massaged the
changelog a bit. ]
Co-developed-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://criu.org/Time_namespace
Link: https://lists.openvz.org/pipermail/criu/2018-June/041504.html
Link: https://lore.kernel.org/r/20191112012724.250792-4-dima@arista.com
Load the extended boot config data from the tail of initrd
image. If there is an SKC data there, it has
[(u32)size][(u32)checksum] header (in really, this is a
footer) at the end of initrd. If the checksum (simple sum
of bytes) is match, this starts parsing it from there.
Link: http://lkml.kernel.org/r/157867222435.17873.9936667353335606867.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Extra Boot Config (XBC) allows admin to pass a tree-structured
boot configuration file when boot up the kernel. This extends
the kernel command line in an efficient way.
Boot config will contain some key-value commands, e.g.
key.word = value1
another.key.word = value2
It can fold same keys with braces, also you can write array
data. For example,
key {
word1 {
setting1 = data
setting2
}
word2.array = "val1", "val2"
}
User can access these key-value pair and tree structure via
SKC APIs.
Link: http://lkml.kernel.org/r/157867221257.17873.1775090991929862549.stgit@devnote2
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Use a more generic name for additional table sorting usecases,
such as the upcoming ORC table sorting feature. This tool is
not tied to exception table sorting anymore.
No functional changes intended.
[ mingo: Rewrote the changelog. ]
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: linux-kbuild@vger.kernel.org
Link: https://lkml.kernel.org/r/20191204004633.88660-6-shile.zhang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
After Spectre 2 fix via 290af86629 ("bpf: introduce BPF_JIT_ALWAYS_ON
config") most major distros use BPF_JIT_ALWAYS_ON configuration these days
which compiles out the BPF interpreter entirely and always enables the
JIT. Also given recent fix in e1608f3fa8 ("bpf: Avoid setting bpf insns
pages read-only when prog is jited"), we additionally avoid fragmenting
the direct map for the BPF insns pages sitting in the general data heap
since they are not used during execution. Latter is only needed when run
through the interpreter.
Since both x86 and arm64 JITs have seen a lot of exposure over the years,
are generally most up to date and maintained, there is more downside in
!BPF_JIT_ALWAYS_ON configurations to have the interpreter enabled by default
rather than the JIT. Add a ARCH_WANT_DEFAULT_BPF_JIT config which archs can
use to set the bpf_jit_{enable,kallsyms} to 1. Back in the days the
bpf_jit_kallsyms knob was set to 0 by default since major distros still
had /proc/kallsyms addresses exposed to unprivileged user space which is
not the case anymore. Hence both knobs are set via BPF_JIT_DEFAULT_ON which
is set to 'y' in case of BPF_JIT_ALWAYS_ON or ARCH_WANT_DEFAULT_BPF_JIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/f78ad24795c2966efcc2ee19025fa3459f622185.1575903816.git.daniel@iogearbox.net
- remove unneeded asm headers from hexagon, ia64
- add 'dir-pkg' target, which works like 'tar-pkg' but skips archiving
- add 'helpnewconfig' target, which shows help for new CONFIG options
- support 'make nsdeps' for external modules
- make rebuilds faster by deleting $(wildcard $^) checks
- remove compile tests for kernel-space headers
- refactor modpost to simplify modversion handling
- make single target builds faster
- optimize and clean up scripts/kallsyms.c
- refactor various Makefiles and scripts
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl3lKCUVHG1hc2FoaXJv
eUBrZXJuZWwub3JnAAoJED2LAQed4NsGu9sP/iTW/RjDxbAsu0aP8jFqzLK/xKB/
NQn/+dD76TjEmjgew9AXszf2rJL+ixKVymGM08FV59Bbguvi8XmAB/QXK21Sjb5j
rVl3N97TWNkvXM+QJyly23G2UtbubRSPo3g+e70BZrw3lcmrsK+sAmTOL5KtIrNX
9BHM803JwqsMJyvBwTBBw3UFeeBqb38Qx6gmigfSihuDf6pvjoVDKskpsDno3wX7
rdiXYxAsKQLQ/P2ym/bV/Oqe90RqRtV/2/WCpLshlwHkiM9huflv6GjgCkkbAx5H
N3TSptlS7l/2B/XKHgA5ALjHjUlxTGBzLLoevarCd8loKcQXFlgx+vd3nM/WJlHJ
x9UpTklDwGP9eUBsa9W980tEyUVsFGMAC8EcTdW6NN2IRtuCOSA5N2FYYt8/SDd0
2b3PhElTJIp4pTWSYN6JZxB1R8n/YBgxLqOJ6N2U6B9CdKFUCHlwGH23QfN89km/
WEMP85bsaab/dnyxbwelkoYYYyPgUHsC13AbpkHdrDxMbAGO+G1PwpHxC6ErF2en
wRGrcUxWTfHRykO5aJIQtCB9b1fv73134mTzB5fTYd6GtjepGBSBCO9xb2Iy4sc9
Y+nHVVDUrihvSOpJgqh677PcLDutOZR8fFCoc1ZMDAbBsDvrb0Qsee6oEidj98xc
5kXp9YZh/tdh/tdo
=zUaB
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- remove unneeded asm headers from hexagon, ia64
- add 'dir-pkg' target, which works like 'tar-pkg' but skips archiving
- add 'helpnewconfig' target, which shows help for new CONFIG options
- support 'make nsdeps' for external modules
- make rebuilds faster by deleting $(wildcard $^) checks
- remove compile tests for kernel-space headers
- refactor modpost to simplify modversion handling
- make single target builds faster
- optimize and clean up scripts/kallsyms.c
- refactor various Makefiles and scripts
* tag 'kbuild-v5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (59 commits)
MAINTAINERS: update Kbuild/Kconfig maintainer's email address
scripts/kallsyms: remove redundant initializers
scripts/kallsyms: put check_symbol_range() calls close together
scripts/kallsyms: make check_symbol_range() void function
scripts/kallsyms: move ignored symbol types to is_ignored_symbol()
scripts/kallsyms: move more patterns to the ignored_prefixes array
scripts/kallsyms: skip ignored symbols very early
scripts/kallsyms: add const qualifiers where possible
scripts/kallsyms: make find_token() return (unsigned char *)
scripts/kallsyms: replace prefix_underscores_count() with strspn()
scripts/kallsyms: add sym_name() to mitigate cast ugliness
scripts/kallsyms: remove unneeded length check for prefix matching
scripts/kallsyms: remove redundant is_arm_mapping_symbol()
scripts/kallsyms: set relative_base more effectively
scripts/kallsyms: shrink table before sorting it
scripts/kallsyms: fix definitely-lost memory leak
scripts/kallsyms: remove unneeded #ifndef ARRAY_SIZE
kbuild: make single target builds even faster
modpost: respect the previous export when 'exported twice' is warned
modpost: do not set ->preloaded for symbols from Module.symvers
...
Pull sysctl system call removal from Eric Biederman:
"As far as I can tell we have reached the point where no one enables
the sysctl system call anymore. It still is enabled in a few
defconfigs but they are mostly the rarely used one and in asking
people about that it was more cut & paste enabled than anything else.
This is single commit that just deletes code. Leaving just enough code
so that the deprecated sysctl warning continues to be printed. If my
analysis turns out to be wrong and someone actually cares it will be
easy to revert this commit and have the system call again.
There was one new xtensa defconfig in linux-next that enabled the
system call this cycle and when asked about it the maintainer of the
code replied that it was not enabled on purpose. As of today's
linux-next tree that defconfig no longer enables the system call.
What we saw in the review discussion was that if we go a step farther
than my patch and mess with uapi headers there are pieces of code that
won't compile, but nothing minds the system call actually disappearing
from the kernel"
Link: https://lore.kernel.org/lkml/201910011140.EA0181F13@keescook/
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sysctl: Remove the sysctl system call
This system call has been deprecated almost since it was introduced, and
in a survey of the linux distributions I can no longer find any of them
that enable CONFIG_SYSCTL_SYSCALL. The only indication that I can find
that anyone might care is that a few of the defconfigs in the kernel
enable CONFIG_SYSCTL_SYSCALL. However this appears in only 31 of 414
defconfigs in the kernel, so I suspect this symbols presence is simply
because it is harmless to include rather than because it is necessary.
As there appear to be no users of the sysctl system call, remove the
code. As this removes one of the few uses of the internal kernel mount
of proc I hope this allows for even more simplifications of the proc
filesystem.
Cc: Alex Smith <alex.smith@imgtec.com>
Cc: Anders Berg <anders.berg@lsi.com>
Cc: Apelete Seketeli <apelete@seketeli.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Chee Nouk Phoon <cnphoon@altera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christian Ruppert <christian.ruppert@abilis.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Harvey Hunt <harvey.hunt@imgtec.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Hongliang Tao <taohl@lemote.com>
Cc: Hua Yan <yanh@lemote.com>
Cc: Huacai Chen <chenhc@lemote.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: Jonas Jensen <jonas.jensen@gmail.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Jun Nie <jun.nie@linaro.org>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Kevin Wells <kevin.wells@nxp.com>
Cc: Kumar Gala <galak@codeaurora.org>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Phil Edworthy <phil.edworthy@renesas.com>
Cc: Pierrick Hascoet <pierrick.hascoet@abilis.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Roland Stigge <stigge@antcom.de>
Cc: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Scott Telford <stelford@cadence.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: Tanmay Inamdar <tinamdar@apm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Wolfram Sang <w.sang@pengutronix.de>
Acked-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
In order to use 128-bit integer arithmetic in C code, the architecture
needs to have declared support for it by setting ARCH_SUPPORTS_INT128,
and it requires a version of the toolchain that supports this at build
time. This is why all existing tests for ARCH_SUPPORTS_INT128 also test
whether __SIZEOF_INT128__ is defined, since this is only the case for
compilers that can support 128-bit integers.
Let's fold this additional test into the Kconfig declaration of
ARCH_SUPPORTS_INT128 so that we can also use the symbol in Makefiles,
e.g., to decide whether a certain object needs to be included in the
first place.
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
There are both positive and negative options about this feature.
At first, I thought it was a good idea, but actually Linus stated a
negative opinion (https://lkml.org/lkml/2019/9/29/227). I admit it
is ugly and annoying.
The baseline I'd like to keep is the compile-test of uapi headers.
(Otherwise, kernel developers have no way to ensure the correctness
of the exported headers.)
I will maintain a small build rule in usr/include/Makefile.
Remove the other header test functionality.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Drop various work-arounds we have for workqueues:
- We no longer need the async_list for tracking sequential IO.
- We don't have to maintain our own mm tracking/setting.
- We don't need a separate workqueue for buffered writes. This didn't
even work that well to begin with, as it was suboptimal for multiple
buffered writers on multiple files.
- We can properly cancel pending interruptible work. This fixes
deadlocks with particularly socket IO, where we cannot cancel them
when the io_uring is closed. Hence the ring will wait forever for
these requests to complete, which may never happen. This is different
from disk IO where we know requests will complete in a finite amount
of time.
- Due to being able to cancel work interruptible work that is already
running, we can implement file table support for work. We need that
for supporting system calls that add to a process file table.
- It gets us one step closer to adding async support for any system
call.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull kernel lockdown mode from James Morris:
"This is the latest iteration of the kernel lockdown patchset, from
Matthew Garrett, David Howells and others.
From the original description:
This patchset introduces an optional kernel lockdown feature,
intended to strengthen the boundary between UID 0 and the kernel.
When enabled, various pieces of kernel functionality are restricted.
Applications that rely on low-level access to either hardware or the
kernel may cease working as a result - therefore this should not be
enabled without appropriate evaluation beforehand.
The majority of mainstream distributions have been carrying variants
of this patchset for many years now, so there's value in providing a
doesn't meet every distribution requirement, but gets us much closer
to not requiring external patches.
There are two major changes since this was last proposed for mainline:
- Separating lockdown from EFI secure boot. Background discussion is
covered here: https://lwn.net/Articles/751061/
- Implementation as an LSM, with a default stackable lockdown LSM
module. This allows the lockdown feature to be policy-driven,
rather than encoding an implicit policy within the mechanism.
The new locked_down LSM hook is provided to allow LSMs to make a
policy decision around whether kernel functionality that would allow
tampering with or examining the runtime state of the kernel should be
permitted.
The included lockdown LSM provides an implementation with a simple
policy intended for general purpose use. This policy provides a coarse
level of granularity, controllable via the kernel command line:
lockdown={integrity|confidentiality}
Enable the kernel lockdown feature. If set to integrity, kernel features
that allow userland to modify the running kernel are disabled. If set to
confidentiality, kernel features that allow userland to extract
confidential information from the kernel are also disabled.
This may also be controlled via /sys/kernel/security/lockdown and
overriden by kernel configuration.
New or existing LSMs may implement finer-grained controls of the
lockdown features. Refer to the lockdown_reason documentation in
include/linux/security.h for details.
The lockdown feature has had signficant design feedback and review
across many subsystems. This code has been in linux-next for some
weeks, with a few fixes applied along the way.
Stephen Rothwell noted that commit 9d1f8be5cf ("bpf: Restrict bpf
when kernel lockdown is in confidentiality mode") is missing a
Signed-off-by from its author. Matthew responded that he is providing
this under category (c) of the DCO"
* 'next-lockdown' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (31 commits)
kexec: Fix file verification on S390
security: constify some arrays in lockdown LSM
lockdown: Print current->comm in restriction messages
efi: Restrict efivar_ssdt_load when the kernel is locked down
tracefs: Restrict tracefs when the kernel is locked down
debugfs: Restrict debugfs when the kernel is locked down
kexec: Allow kexec_file() with appropriate IMA policy when locked down
lockdown: Lock down perf when in confidentiality mode
bpf: Restrict bpf when kernel lockdown is in confidentiality mode
lockdown: Lock down tracing and perf kprobes when in confidentiality mode
lockdown: Lock down /proc/kcore
x86/mmiotrace: Lock down the testmmiotrace module
lockdown: Lock down module params that specify hardware parameters (eg. ioport)
lockdown: Lock down TIOCSSERIAL
lockdown: Prohibit PCMCIA CIS storage when the kernel is locked down
acpi: Disable ACPI table override if the kernel is locked down
acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down
ACPI: Limit access to custom_method when the kernel is locked down
x86/msr: Restrict MSR access when the kernel is locked down
x86: Lock down IO port access when the kernel is locked down
...
Pull integrity updates from Mimi Zohar:
"The major feature in this time is IMA support for measuring and
appraising appended file signatures. In addition are a couple of bug
fixes and code cleanup to use struct_size().
In addition to the PE/COFF and IMA xattr signatures, the kexec kernel
image may be signed with an appended signature, using the same
scripts/sign-file tool that is used to sign kernel modules.
Similarly, the initramfs may contain an appended signature.
This contained a lot of refactoring of the existing appended signature
verification code, so that IMA could retain the existing framework of
calculating the file hash once, storing it in the IMA measurement list
and extending the TPM, verifying the file's integrity based on a file
hash or signature (eg. xattrs), and adding an audit record containing
the file hash, all based on policy. (The IMA support for appended
signatures patch set was posted and reviewed 11 times.)
The support for appended signature paves the way for adding other
signature verification methods, such as fs-verity, based on a single
system-wide policy. The file hash used for verifying the signature and
the signature, itself, can be included in the IMA measurement list"
* 'next-integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: ima_api: Use struct_size() in kzalloc()
ima: use struct_size() in kzalloc()
sefltest/ima: support appended signatures (modsig)
ima: Fix use after free in ima_read_modsig()
MODSIGN: make new include file self contained
ima: fix freeing ongoing ahash_request
ima: always return negative code for error
ima: Store the measurement again when appraising a modsig
ima: Define ima-modsig template
ima: Collect modsig
ima: Implement support for module-style appended signatures
ima: Factor xattr_verify() out of ima_appraise_measurement()
ima: Add modsig appraise_type option for module-style appended signatures
integrity: Select CONFIG_KEYS instead of depending on it
PKCS#7: Introduce pkcs7_get_digest()
PKCS#7: Refactor verify_pkcs7_signature()
MODSIGN: Export module signature definitions
ima: initialize the "template" field with the default template
Summary of modules changes for the 5.4 merge window:
- Introduce exported symbol namespaces.
This new feature allows subsystem maintainers to partition and
categorize their exported symbols into explicit namespaces. Module
authors are now required to import the namespaces they need.
Some of the main motivations of this feature include: allowing kernel
developers to better manage the export surface, allow subsystem
maintainers to explicitly state that usage of some exported symbols
should only be limited to certain users (think: inter-module or
inter-driver symbols, debugging symbols, etc), as well as more easily
limiting the availability of namespaced symbols to other parts of the
kernel. With the module import requirement, it is also easier to spot
the misuse of exported symbols during patch review. Two new macros are
introduced: EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL(). The API is
thoroughly documented in Documentation/kbuild/namespaces.rst.
- Some small code and kbuild cleanups here and there.
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJdh3n8AAoJEMBFfjjOO8Fy94kP+QHZF39QDvLbxAzEYAETAS+o
CFu6wix/DrAwFkTU/kX1eAsAwDBEz0xkMciR4BsLX3sIafUVERxtDXVAui/dA1+6
zfw2c3ObyVwPEk6aUPFprgkj+08gxujsJFlYTsQQUhtRbmxg6R7hD6t6ANxiHaY2
AQe5TzOWXoIa2hHO+7rPMqf8l6qiFCaL0s3v5jrmBXa5mHmc4PVy95h1J6xQVw2u
b+SlvKeylHv+OtCtvthkAJS3hfS35J/1TNb/RNYIvh60IfEguEuFsGuQ9JiSSAZS
pv1cJ+I5d4v8Y/md1rZpdjTJL9gCrq/UUC67+UkejCOn0C+7XM2eR4Bu/jWvdMSn
ZQDHcPhFSIfmP7FaKomPogaBbw1sI1FvM5930pPJzHnyO9+cefBXe7rWaaB+y0At
GAxOtmk1dKh01BT7YO/C0oVuX87csWd74NHypVsbs0TgQo5jBFdZRheyDrq5YB+s
tVK+5H0nqQrCcfo/TvhcsZlgITTGtgTPenaW99/i7qNa9mRUtxC/VkE+aob6HNRF
1iBxxopOTxGN8akyKOVumtkuTQH3EJfouZee//pWbXLzyDmScg/k67vuao8kxbyq
NA1piFAGJAHFsHATxrbvNOq6jZ5bfUT8pwSTs83JppuR++8Hxk7zaShS3/LvsvHt
6ist/epOwTZ7oiNQ04nj
=72Uy
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull modules updates from Jessica Yu:
"The main bulk of this pull request introduces a new exported symbol
namespaces feature. The number of exported symbols is increasingly
growing with each release (we're at about 31k exports as of 5.3-rc7)
and we currently have no way of visualizing how these symbols are
"clustered" or making sense of this huge export surface.
Namespacing exported symbols allows kernel developers to more
explicitly partition and categorize exported symbols, as well as more
easily limiting the availability of namespaced symbols to other parts
of the kernel. For starters, we have introduced the USB_STORAGE
namespace to demonstrate the API's usage. I have briefly summarized
the feature and its main motivations in the tag below.
Summary:
- Introduce exported symbol namespaces.
This new feature allows subsystem maintainers to partition and
categorize their exported symbols into explicit namespaces. Module
authors are now required to import the namespaces they need.
Some of the main motivations of this feature include: allowing
kernel developers to better manage the export surface, allow
subsystem maintainers to explicitly state that usage of some
exported symbols should only be limited to certain users (think:
inter-module or inter-driver symbols, debugging symbols, etc), as
well as more easily limiting the availability of namespaced symbols
to other parts of the kernel.
With the module import requirement, it is also easier to spot the
misuse of exported symbols during patch review.
Two new macros are introduced: EXPORT_SYMBOL_NS() and
EXPORT_SYMBOL_NS_GPL(). The API is thoroughly documented in
Documentation/kbuild/namespaces.rst.
- Some small code and kbuild cleanups here and there"
* tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Remove leftover '#undef' from export header
module: remove unneeded casts in cmp_name()
module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES
module: remove redundant 'depends on MODULES'
module: Fix link failure due to invalid relocation on namespace offset
usb-storage: export symbols in USB_STORAGE namespace
usb-storage: remove single-use define for debugging
docs: Add documentation for Symbol Namespaces
scripts: Coccinelle script for namespace dependencies.
modpost: add support for generating namespace dependencies
export: allow definition default namespaces in Makefiles or sources
module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS
modpost: add support for symbol namespaces
module: add support for symbol namespaces.
export: explicitly align struct kernel_symbol
module: support reading multiple values per modinfo tag
- virtio support
- Fixes for our new time travel mode
- Various improvements to make lockdep and kasan work better
- SPDX header updates
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAl2Fx9kWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wa2kD/9UJ5JOe6yBeMfPO5Vv8vpJRc10
0gS8qDbzfutrWddq1wUvEaNCIQY4NOf4tsqjauYHpTUA/0AWwruz++iyI9u3XWEQ
0b+ZMhKXkws3UgPwWIxrgLr0106wz6Xuz6d36nqpAc6F4MJhC3LqUCC9yEp3hxMd
pSF65ueQXp7NKfOAqqKU1m3FnfmyBTpsL5PpA6OEZn//kt/Qz5PhIjHpC3JwIBQb
z0OUhE/6mmWb66wtqHIx4Zd2ybLLnsfby24q+1e8J2B+gcORxhubvgCIGY+PU98o
EW3N4aMevUdgG9MJbnlZUgWeZ1bsByail2z8aFElRKefT2xkEnjxfQZgKahI6LnO
jzLm9pk3RjTiZxvYkEbgRAjBkZD514M6FvOlyrHtLxMDfWE6/z71VKDqFjEyeIHQ
QpDjwEjdJTxVHr4Ol+VnZe1lE5zXLNuCFT5qdPQBqyr8g151T7jwYXnGK2SqGo2D
UQ6/KnaN+pgM7BaqcNtwciKk3Xjng0BDLfdZs7z8F3bzv53rg2mpQt5iPm+nWFPa
aNt4B3FKXv3+YnjuSbi5NlvKKK9alRcvZTOk8jFjwOVmFJXlvMCzegZnuTxtqU+j
XpwmUlsT6aMV7vPZN2ta7y1bjOijzZIjL0O7rP4Obxwfp3dTGGYX/T6vW8F2o9V6
evyx/KSD6nqlY1bvwQ==
=oxpp
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- virtio support
- fixes for our new time travel mode
- various improvements to make lockdep and kasan work better
- SPDX header updates
* tag 'for-linus-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: (25 commits)
um: irq: Fix LAST_IRQ usage in init_IRQ()
um: Add SPDX headers for files in arch/um/include
um: Add SPDX headers for files in arch/um/os-Linux
um: Add SPDX headers to files in arch/um/kernel/
um: Add SPDX headers for files in arch/um/drivers
um: virtio: Implement VHOST_USER_PROTOCOL_F_REPLY_ACK
um: virtio: Implement VHOST_USER_PROTOCOL_F_SLAVE_REQ
um: drivers: Add virtio vhost-user driver
um: Use real DMA barriers
um: Don't use generic barrier.h
um: time-travel: Restrict time update in IRQ handler
um: time-travel: Fix periodic timers
um: Enable CONFIG_CONSTRUCTORS
um: Place (soft)irq text with macros
um: Fix VDSO compiler warning
um: Implement TRACE_IRQFLAGS_SUPPORT
um: Remove misleading #define ARCh_IRQ_ENABLED
um: Avoid using uninitialized regs
um: Remove sig_info[SIGALRM]
um: Error handling fixes in vector drivers
...
gcc 9+ (and gcc 8.3, 7.5) provides a way to override the otherwise
crude heuristic that gcc uses to estimate the size of the code
represented by an asm() statement. From the gcc docs
If you use 'asm inline' instead of just 'asm', then for inlining
purposes the size of the asm is taken as the minimum size, ignoring
how many instructions GCC thinks it is.
For compatibility with older compilers, we obviously want a
#if [understands asm inline]
#define asm_inline asm inline
#else
#define asm_inline asm
#endif
But since we #define the identifier inline to attach some attributes,
we have to use an alternate spelling of that keyword. gcc provides
both __inline__ and __inline, and we currently #define both to inline,
so they all have the same semantics. We have to free up one of
__inline__ and __inline, and the latter is by far the easiest.
The two x86 changes cause smaller code gen differences than I'd
expect, but I think we do want the asm_inline thing available sooner
or later, so this is just to get the ball rolling.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAl2D9iAACgkQGXyLc2ht
IW31DxAA1ajlhPor7NccuhJrz3hXMkPfEBEaTEYp98a5XYGARqrRwMEDzB7j9O/t
6BZ/ZD8/IehuoNjNP3E5IOnDfvP+a94WL25/hjpTN4aBuZKWJz0X7As8TJJ+bwQc
v2Hyo+yqzcSEhCI7Mc34uo+TuPaFYEoKHvg+hhSXi4h7c5eqtGKCaB2286iEkk/6
bAo7n6ogYN64wXjbVXePmpQWgVJG2tsz/blG0hHMISW5UTzWkK/hYZkSf6jdFGSN
aft1l9EMGx5skAQwFfnDOgf805/TE4jliD2nvaZzT0f/UtYkGjx7C77dcFlPY9Mf
9R1M01rCQ0KfxR5BqHPN/DbTrd+GzBvKswjrTIB+TwopEfy8yVY2uRIq1lP+aobD
V3HOtdicgw3wB/fF40pjqoCp3ByawKLlzRpQuqdSmqHs0kS6ForbfLClg8riru6a
VnJqOXkLlZXU/VysdxuCPbAqN/Rvf9YV3H25x8BOJJWsXa0RdotSbyPIpyPcUo5K
NycxR/vrLJ8hQqEdOY9iKJP+IMGmAfOJD8fppG/Xsnav8c+NtHPAQlW1c8ee/x2g
Dbm2AWPCZnrSrFiQ1mDUdc921d/sLTUC/nOHzR8FiZEJGmYVmYSSw3PekFjQW1aV
TIqvghk15XLB7Ye8JE/eKmoNIBtOsftXpYBzI3716tX46bnPu+4=
=+BVw
-----END PGP SIGNATURE-----
Merge tag 'compiler-attributes-for-linus-v5.4' of git://github.com/ojeda/linux
Pull asm inline support from Miguel Ojeda:
"Make use of gcc 9's "asm inline()" (Rasmus Villemoes):
gcc 9+ (and gcc 8.3, 7.5) provides a way to override the otherwise
crude heuristic that gcc uses to estimate the size of the code
represented by an asm() statement. From the gcc docs
If you use 'asm inline' instead of just 'asm', then for inlining
purposes the size of the asm is taken as the minimum size, ignoring
how many instructions GCC thinks it is.
For compatibility with older compilers, we obviously want a
#if [understands asm inline]
#define asm_inline asm inline
#else
#define asm_inline asm
#endif
But since we #define the identifier inline to attach some attributes,
we have to use an alternate spelling of that keyword. gcc provides
both __inline__ and __inline, and we currently #define both to inline,
so they all have the same semantics.
We have to free up one of __inline__ and __inline, and the latter is
by far the easiest.
The two x86 changes cause smaller code gen differences than I'd
expect, but I think we do want the asm_inline thing available sooner
or later, so this is just to get the ball rolling"
* tag 'compiler-attributes-for-linus-v5.4' of git://github.com/ojeda/linux:
x86: bug.h: use asm_inline in _BUG_FLAGS definitions
x86: alternative.h: use asm_inline for all alternative variants
compiler-types.h: add asm_inline definition
compiler_types.h: don't #define __inline
lib/zstd/mem.h: replace __inline by inline
staging: rtl8723bs: replace __inline by inline
- add modpost warn exported symbols marked as 'static' because 'static'
and EXPORT_SYMBOL is an odd combination
- break the build early if gold linker is used
- optimize the Bison rule to produce .c and .h files by a single
pattern rule
- handle PREEMPT_RT in the module vermagic and UTS_VERSION
- warn CONFIG options leaked to the user-space except existing ones
- make single targets work properly
- rebuild modules when module linker scripts are updated
- split the module final link stage into scripts/Makefile.modfinal
- fix the missed error code in merge_config.sh
- improve the error message displayed on the attempt of the O= build
in unclean source tree
- remove 'clean-dirs' syntax
- disable -Wimplicit-fallthrough warning for Clang
- add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC
- remove ARCH_{CPP,A,C}FLAGS variables
- add $(BASH) to run bash scripts
- change *CFLAGS_<basetarget>.o to take the relative path to $(obj)
instead of the basename
- stop suppressing Clang's -Wunused-function warnings when W=1
- fix linux/export.h to avoid genksyms calculating CRC of trimmed
exported symbols
- misc cleanups
-----BEGIN PGP SIGNATURE-----
iQJSBAABCgA8FiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl1+OnoeHHlhbWFkYS5t
YXNhaGlyb0Bzb2Npb25leHQuY29tAAoJED2LAQed4NsGoKEQAKcid9lDacMe5KWT
4Ic93hANMFKZ9Qy8WoxivnOr1a93NcloZ0Bhka96QUt7hYUkLmDCs99eMbxKuMfP
m/ViHepojOBPzq+VtAGWOiIyPMCA7XDrTPph4wcPDKeOURTreK1PZ20fxDoAR4to
+qaqKZJGdRcNf2DpJN1yIosz8Wj0Sa2LQrRi9jgUHi3bzgvLfL7P9WM2xyZMggAc
GaSktCEFL0UzMFlMpYyDrKh2EV6ryOnN8+bVAKbmWP89tuU3njutycKdWOoL+bsj
tH2kjFThxQyIcZGNHS1VzNunYAFE2q5nj2q47O1EDN6sjTYUoRn5cHwPam6x3Kly
NH88xDEtJ7sUUc9GZEIXADWWD0f08QIhAH5x+jxFg3529lNgyrNHRSQ2XceYNAnG
i/GnMJ0EhODOFKusXw7sNlWFKtukep+8/pwnvfTXWQu6plEm5EQ3a3RL5SESubVo
mHzXsQDFCE0x/UrsJxEAww+3YO3pQEelfVi74W9z0cckpbRF8FuUq/69ltOT15l4
X+gCz80lXMWBKw/kNoR4GQoAJo3KboMEociawwoj72HXEHTPLJnCdUOsAf3n+opj
xuz/UPZ4WYSgKdnbmmDbJ+1POA1NqtARZZXpMVyKVVCOiLafbJkLQYwLKEpE2mOO
TP9igzP1i3/jPWec8cJ6Fa8UwuGh
=VGqV
-----END PGP SIGNATURE-----
Merge tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- add modpost warn exported symbols marked as 'static' because 'static'
and EXPORT_SYMBOL is an odd combination
- break the build early if gold linker is used
- optimize the Bison rule to produce .c and .h files by a single
pattern rule
- handle PREEMPT_RT in the module vermagic and UTS_VERSION
- warn CONFIG options leaked to the user-space except existing ones
- make single targets work properly
- rebuild modules when module linker scripts are updated
- split the module final link stage into scripts/Makefile.modfinal
- fix the missed error code in merge_config.sh
- improve the error message displayed on the attempt of the O= build in
unclean source tree
- remove 'clean-dirs' syntax
- disable -Wimplicit-fallthrough warning for Clang
- add CONFIG_CC_OPTIMIZE_FOR_SIZE_O3 for ARC
- remove ARCH_{CPP,A,C}FLAGS variables
- add $(BASH) to run bash scripts
- change *CFLAGS_<basetarget>.o to take the relative path to $(obj)
instead of the basename
- stop suppressing Clang's -Wunused-function warnings when W=1
- fix linux/export.h to avoid genksyms calculating CRC of trimmed
exported symbols
- misc cleanups
* tag 'kbuild-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (63 commits)
genksyms: convert to SPDX License Identifier for lex.l and parse.y
modpost: use __section in the output to *.mod.c
modpost: use MODULE_INFO() for __module_depends
export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols
export.h: remove defined(__KERNEL__), which is no longer needed
kbuild: allow Clang to find unused static inline functions for W=1 build
kbuild: rename KBUILD_ENABLE_EXTRA_GCC_CHECKS to KBUILD_EXTRA_WARN
kbuild: refactor scripts/Makefile.extrawarn
merge_config.sh: ignore unwanted grep errors
kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj)
modpost: add NOFAIL to strndup
modpost: add guid_t type definition
kbuild: add $(BASH) to run scripts with bash-extension
kbuild: remove ARCH_{CPP,A,C}FLAGS
kbuild,arc: add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3 for ARC
kbuild: Do not enable -Wimplicit-fallthrough for clang for now
kbuild: clean up subdir-ymn calculation in Makefile.clean
kbuild: remove unneeded '+' marker from cmd_clean
kbuild: remove clean-dirs syntax
kbuild: check clean srctree even earlier
...
Pull scheduler updates from Ingo Molnar:
- MAINTAINERS: Add Mark Rutland as perf submaintainer, Juri Lelli and
Vincent Guittot as scheduler submaintainers. Add Dietmar Eggemann,
Steven Rostedt, Ben Segall and Mel Gorman as scheduler reviewers.
As perf and the scheduler is getting bigger and more complex,
document the status quo of current responsibilities and interests,
and spread the review pain^H^H^H^H fun via an increase in the Cc:
linecount generated by scripts/get_maintainer.pl. :-)
- Add another series of patches that brings the -rt (PREEMPT_RT) tree
closer to mainline: split the monolithic CONFIG_PREEMPT dependencies
into a new CONFIG_PREEMPTION category that will allow the eventual
introduction of CONFIG_PREEMPT_RT. Still a few more hundred patches
to go though.
- Extend the CPU cgroup controller with uclamp.min and uclamp.max to
allow the finer shaping of CPU bandwidth usage.
- Micro-optimize energy-aware wake-ups from O(CPUS^2) to O(CPUS).
- Improve the behavior of high CPU count, high thread count
applications running under cpu.cfs_quota_us constraints.
- Improve balancing with SCHED_IDLE (SCHED_BATCH) tasks present.
- Improve CPU isolation housekeeping CPU allocation NUMA locality.
- Fix deadline scheduler bandwidth calculations and logic when cpusets
rebuilds the topology, or when it gets deadline-throttled while it's
being offlined.
- Convert the cpuset_mutex to percpu_rwsem, to allow it to be used from
setscheduler() system calls without creating global serialization.
Add new synchronization between cpuset topology-changing events and
the deadline acceptance tests in setscheduler(), which were broken
before.
- Rework the active_mm state machine to be less confusing and more
optimal.
- Rework (simplify) the pick_next_task() slowpath.
- Improve load-balancing on AMD EPYC systems.
- ... and misc cleanups, smaller fixes and improvements - please see
the Git log for more details.
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
sched/psi: Correct overly pessimistic size calculation
sched/fair: Speed-up energy-aware wake-ups
sched/uclamp: Always use 'enum uclamp_id' for clamp_id values
sched/uclamp: Update CPU's refcount on TG's clamp changes
sched/uclamp: Use TG's clamps to restrict TASK's clamps
sched/uclamp: Propagate system defaults to the root group
sched/uclamp: Propagate parent clamps
sched/uclamp: Extend CPU's cgroup controller
sched/topology: Improve load balancing on AMD EPYC systems
arch, ia64: Make NUMA select SMP
sched, perf: MAINTAINERS update, add submaintainers and reviewers
sched/fair: Use rq_lock/unlock in online_fair_sched_group
cpufreq: schedutil: fix equation in comment
sched: Rework pick_next_task() slow-path
sched: Allow put_prev_task() to drop rq->lock
sched/fair: Expose newidle_balance()
sched: Add task_struct pointer to sched_class::set_curr_task
sched: Rework CPU hotplug task selection
sched/{rt,deadline}: Fix set_next_task vs pick_next_task
sched: Fix kerneldoc comment for ia64_set_curr_task
...
We do need to call the constructors for *modules*, and
at least for KASAN in the future, we must call even the
kernel constructors only later when the kernel has been
initialized.
Instead of relying on libc to call them, emit an empty
section for libc and let the kernel's CONSTRUCTORS code
do the rest of the job.
Tested that it indeed doesn't work in modules, and does
work after the fixes in both, with a few functions with
__attribute__((constructor)) in both dynamic and static
builds.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
This adds an asm_inline macro which expands to "asm inline" [1] when
the compiler supports it. This is currently gcc 9.1+, gcc 8.3
and (once released) gcc 7.5 [2]. It expands to just "asm" for other
compilers.
Using asm inline("foo") instead of asm("foo") overrules gcc's
heuristic estimate of the size of the code represented by the asm()
statement, and makes gcc use the minimum possible size instead. That
can in turn affect gcc's inlining decisions.
I wasn't sure whether to make this a function-like macro or not - this
way, it can be combined with volatile as
asm_inline volatile()
but perhaps we'd prefer to spell that
asm_inline_volatile()
anyway.
The Kconfig logic is taken from an RFC patch by Masahiro Yamada [3].
[1] Technically, asm __inline, since both inline and __inline__
are macros that attach various attributes, making gcc barf if one
literally does "asm inline()". However, the third spelling __inline is
available for referring to the bare keyword.
[2] https://lore.kernel.org/lkml/20190907001411.GG9749@gate.crashing.org/
[3] https://lore.kernel.org/lkml/1544695154-15250-1-git-send-email-yamada.masahiro@socionext.com/
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
When CONFIG_MODULES is disabled, CONFIG_UNUSED_SYMBOLS is pointless,
thus it should be invisible.
Instead of adding "depends on MODULES", I moved it to the sub-menu
"Enable loadable module support", which is a better fit. I put it
close to TRIM_UNUSED_KSYMS because it depends on !UNUSED_SYMBOLS.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
These are located in the 'if MODULES' ... 'endif' block.
Remove the redundant dependencies.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
If MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS is enabled (default=n), the
requirement for modules to import all namespaces that are used by
the module is relaxed.
Enabling this option effectively allows (invalid) modules to be loaded
while only a warning is emitted.
Disabling this option keeps the enforcement at module loading time and
loading is denied if the module's imports are not satisfactory.
Reviewed-by: Martijn Coenen <maco@android.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Matthias Maennich <maennich@google.com>
Signed-off-by: Jessica Yu <jeyu@kernel.org>
arch/arc/Makefile overrides -O2 with -O3. This is the only user of
ARCH_CFLAGS. There is no user of ARCH_CPPFLAGS or ARCH_AFLAGS.
My plan is to remove ARCH_{CPP,A,C}FLAGS after refactoring the ARC
Makefile.
Currently, ARC has no way to enable -Wmaybe-uninitialized because both
-O3 and -Os disable it. Enabling it will be useful for compile-testing.
This commit allows allmodconfig (, which defaults to -O2) to enable it.
Add CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE_O3=y to all the defconfig files
in arch/arc/configs/ in order to keep the current config settings.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
The cgroup CPU bandwidth controller allows to assign a specified
(maximum) bandwidth to the tasks of a group. However this bandwidth is
defined and enforced only on a temporal base, without considering the
actual frequency a CPU is running on. Thus, the amount of computation
completed by a task within an allocated bandwidth can be very different
depending on the actual frequency the CPU is running that task.
The amount of computation can be affected also by the specific CPU a
task is running on, especially when running on asymmetric capacity
systems like Arm's big.LITTLE.
With the availability of schedutil, the scheduler is now able
to drive frequency selections based on actual task utilization.
Moreover, the utilization clamping support provides a mechanism to
bias the frequency selection operated by schedutil depending on
constraints assigned to the tasks currently RUNNABLE on a CPU.
Giving the mechanisms described above, it is now possible to extend the
cpu controller to specify the minimum (or maximum) utilization which
should be considered for tasks RUNNABLE on a cpu.
This makes it possible to better defined the actual computational
power assigned to task groups, thus improving the cgroup CPU bandwidth
controller which is currently based just on time constraints.
Extend the CPU controller with a couple of new attributes uclamp.{min,max}
which allow to enforce utilization boosting and capping for all the
tasks in a group.
Specifically:
- uclamp.min: defines the minimum utilization which should be considered
i.e. the RUNNABLE tasks of this group will run at least at a
minimum frequency which corresponds to the uclamp.min
utilization
- uclamp.max: defines the maximum utilization which should be considered
i.e. the RUNNABLE tasks of this group will run up to a
maximum frequency which corresponds to the uclamp.max
utilization
These attributes:
a) are available only for non-root nodes, both on default and legacy
hierarchies, while system wide clamps are defined by a generic
interface which does not depends on cgroups. This system wide
interface enforces constraints on tasks in the root node.
b) enforce effective constraints at each level of the hierarchy which
are a restriction of the group requests considering its parent's
effective constraints. Root group effective constraints are defined
by the system wide interface.
This mechanism allows each (non-root) level of the hierarchy to:
- request whatever clamp values it would like to get
- effectively get only up to the maximum amount allowed by its parent
c) have higher priority than task-specific clamps, defined via
sched_setattr(), thus allowing to control and restrict task requests.
Add two new attributes to the cpu controller to collect "requested"
clamp values. Allow that at each non-root level of the hierarchy.
Keep it simple by not caring now about "effective" values computation
and propagation along the hierarchy.
Update sysctl_sched_uclamp_handler() to use the newly introduced
uclamp_mutex so that we serialize system default updates with cgroup
relate updates.
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Michal Koutny <mkoutny@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alessio Balsini <balsini@android.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Todd Kjos <tkjos@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Link: https://lkml.kernel.org/r/20190822132811.31294-2-patrick.bellasi@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
CONFIG_CC_OPTIMIZE_FOR_SIZE was originally an independent boolean
option, but commit 877417e6ff ("Kbuild: change CC_OPTIMIZE_FOR_SIZE
definition") turned it into a choice between _PERFORMANCE and _SIZE.
The phrase "If unsure, say N." sounds like an independent option.
Reword the help text to make it appropriate for the choice menu.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Add CONFIG_ASM_MODVERSIONS. This allows to remove one if-conditional
nesting in scripts/Makefile.build.
scripts/Makefile.build is run every time Kbuild descends into a
sub-directory. So, I want to avoid $(wildcard ...) evaluation
where possible although computing $(wildcard ...) is so cheap that
it may not make measurable performance difference.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
This reverts commit 71c67a31f0.
Commit 117acf5c29 ("powerpc/Makefile: Always pass --synthetic to nm if
supported") removed the only conditional definition of $(NM), so we can
revert our temporary bodge to avoid Kconfig recursion and go back to
passing $(NM) through to the 'tools-support-relr.sh' when detecting
support for RELR relocations.
Signed-off-by: Will Deacon <will@kernel.org>