We expect no warnings to be issued when we specify __GFP_NOWARN, but
currently in paths like alloc_pages() and kmalloc(), there are still some
warnings printed, fix it.
But for some warnings that report usage problems, we don't deal with them.
If such warnings are printed, then we should fix the usage problems.
Such as the following case:
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
[zhengqi.arch@bytedance.com: v2]
Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jiri Slaby <jirislaby@kernel.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
At many places in kernel, It is necessary to convert sysfs input to
corresponding bool value e.g. "false" or "0" need to be converted to bool
false, "true" or "1" need to be converted to bool true, places where such
conversion is needed currently check the input string manually,
kstrtobool() can be utilized at such places but currently it doesn't have
support to accept "false"/"true".
Add support to accept "false"/"true" as valid string in kstrtobool().
[akpm@linux-foundation.org: undo s/iff/if/, per Matthew]
Link: https://lkml.kernel.org/r/20220426180203.70782-1-jvgediya@linux.ibm.com
Signed-off-by: Jagdish Gediya <jvgediya@linux.ibm.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Richard Fitzgerald <rf@opensource.cirrus.com>
Cc: Petr Mladek <pmladek@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
If we pass too short string to "hex2bin" (and the string size without
the terminating NUL character is even), "hex2bin" reads one byte after
the terminating NUL character. This patch fixes it.
Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on
error - so we can't just return the variable "hi" or "lo" on error.
This inconsistency may be fixed in the next merge window, but for the
purpose of fixing this bug, we just preserve the existing behavior and
return -1 and -EINVAL.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Fixes: b78049831f ("lib: add error checking to hex2bin")
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The function hex2bin is used to load cryptographic keys into device
mapper targets dm-crypt and dm-integrity. It should take constant time
independent on the processed data, so that concurrently running
unprivileged code can't infer any information about the keys via
microarchitectural convert channels.
This patch changes the function hex_to_bin so that it contains no
branches and no memory accesses.
Note that this shouldn't cause performance degradation because the size
of the new function is the same as the size of the old function (on
x86-64) - and the new function causes no branch misprediction penalties.
I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64
i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32
sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64
powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are
no branches in the generated code.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is a race between xas_split() and xas_load() which can result in
the wrong page being returned, and thus data corruption. Fortunately,
it's hard to hit (syzbot took three months to find it) and often guarded
with VM_BUG_ON().
The anatomy of this race is:
thread A thread B
order-9 page is stored at index 0x200
lookup of page at index 0x274
page split starts
load of sibling entry at offset 9
stores nodes at offsets 8-15
load of entry at offset 8
The entry at offset 8 turns out to be a node, and so we descend into it,
and load the page at index 0x234 instead of 0x274. This is hard to fix
on the split side; we could replace the entire node that contains the
order-9 page instead of replacing the eight entries. Fixing it on
the lookup side is easier; just disallow sibling entries that point
to nodes. This cannot ever be a useful thing as the descent would not
know the correct offset to use within the new node.
The test suite continues to pass, but I have not added a new test for
this bug.
Reported-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
Tested-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com
Fixes: 6b24ca4a1a ("mm: Use multi-index entries in the page cache")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Here are 2 small driver core changes for 5.18-rc2.
They are the final bits in the removal of the default_attrs field in
struct kobj_type. I had to wait until after 5.18-rc1 for all of the
changes to do this came in through different development trees, and then
one new user snuck in. So this series has 2 changes:
- removal of the default_attrs field in the powerpc/pseries/vas
code. Change has been acked by the PPC maintainers to come
through this tree
- removal of default_attrs from struct kobj_type now that all
in-kernel users are removed. This cleans up the kobject code
a little bit and removes some duplicated functionality that
confused people (now there is only one way to do default
groups.)
All of these have been in linux-next for all of this week with no
reported problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYlLRHg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yn+9gCfXN0OvKmw5QD55z8YGp/jIycK0ToAnifJ/OX+
sU2V8ZQfNbV8xw7iXfc2
=L+Uc
-----END PGP SIGNATURE-----
Merge tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here are two small driver core changes for 5.18-rc2.
They are the final bits in the removal of the default_attrs field in
struct kobj_type. I had to wait until after 5.18-rc1 for all of the
changes to do this came in through different development trees, and
then one new user snuck in. So this series has two changes:
- removal of the default_attrs field in the powerpc/pseries/vas code.
The change has been acked by the PPC maintainers to come through
this tree
- removal of default_attrs from struct kobj_type now that all
in-kernel users are removed.
This cleans up the kobject code a little bit and removes some
duplicated functionality that confused people (now there is only
one way to do default groups)
Both of these have been in linux-next for all of this week with no
reported problems"
* tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
kobject: kobj_type: remove default_attrs
powerpc/pseries/vas: use default_groups in kobj_type
When partialDecoding, it is EOF if we've either filled the output buffer
or can't proceed with reading an offset for following match.
In some extreme corner cases when compressed data is suitably corrupted,
UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial
may lead to read out of bound problem during decoding. lz4 upstream has
fixed it [2] and this issue has been disscussed here [3] before.
current decompression routine was ported from lz4 v1.8.3, bumping
lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd
better fix it first.
[1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/
[2] c5d6f8a8be#
[3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/
Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com
Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com
Signed-off-by: Guo Xuenan <guoxuenan@huawei.com>
Reviewed-by: Nick Terrell <terrelln@fb.com>
Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Cc: Yann Collet <cyan@fb.com>
Cc: Chengyang Fan <cy.fan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that all in-kernel users of default_attrs for the kobj_type are gone
and converted to properly use the default_groups pointer instead, it can
be safely removed.
There is one standard way to create sysfs files in a kobj_type, and not
two like before, causing confusion as to which should be used.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJHUe0QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpvpNEAC1bxwOgI8Kbi7j37pPClrB2aQRgp1WsTkA
z56rU7BTPApaKGjfObv0CvmUIBcyG6uJhTSr9QGvg0mZDCDDJz58ESIYomvfw+Ob
tfdBLykxL6ad2/JAVTslTH/UUzfyZj5/+JT5KmldOMh1q6KDRQJt022AAKI5Lkdu
XKkAvCV9ZQFwcfzVROb/ribYUkokRHjtQVv8nqyJ7CJ5OEYoI0ghQJNr7/Va9MXA
6YbHJHErbQUsJbxDqqScqkQ3H9upUnJg/CIDKyuptUPT3vDzDkRT9yPvrOhzEk9E
8VEufNO8v/0P26xw/thqPwn8poXTVd61i8HZMvmclofTqL9kqoii1+v4OPgl9uws
7liR2j2HLF/Xd5uceVP/RYvRGzdujdpdj4MgQK6AcPz2LivWY9vMekG/FW0+LxBY
AvILmpSvPAhbRW94lZU6AU/mdqYBolWrz97pke0zPVHSv9OopaYca5pzXWytszPT
o633R3Au/0tUQj4be/v7JZNnK1ESj8KZD7aon/cRH2aejIN87bCLo4BZLELVliPZ
cBdizPJu2tzhhAZyEuaz4IyftL69tCxi2NCiN4mER43mIsDVMxauz7LhDwO0527q
oBHIs7fAObOuNCtXOe9/BiMicGgCp+yil/6EdYexQmyNkVkSOejj9kyI/UAVpgQe
NZSNBuD9UQ==
=QzvG
-----END PGP SIGNATURE-----
Merge tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Either fixes or a few additions that got missed in the initial merge
window pull. In detail:
- List iterator fix to avoid leaking value post loop (Jakob)
- One-off fix in minor count (Christophe)
- Fix for a regression in how io priority setting works for an
exiting task (Jiri)
- Fix a regression in this merge window with blkg_free() being called
in an inappropriate context (Ming)
- Misc fixes (Ming, Tom)"
* tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block:
blk-wbt: remove wbt_track stub
block: use dedicated list iterator variable
block: Fix the maximum minor value is blk_alloc_ext_minor()
block: restore the old set_task_ioprio() behaviour wrt PF_EXITING
block: avoid calling blkg_free() in atomic context
lib/sbitmap: allocate sb->map via kvzalloc_node
- Documentation update
- Fix test-suite build after move of bitmap.h
- Fix xas_create_range() when a large entry is already present
- Fix xas_split() of a shadow entry
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmJHBfoACgkQDpNsjXcp
gj4eGggAlBsHZCBDT1wY45hQjaZA+GlI1Q7M8/x+MkaK3CN6O3FMdNcbUx/KVkMJ
YItwoh9X5VywsMD4ASxPqT/3t2lJFV7ldNvwQpLr1eVSP34XsVxprYDgT09a/CXS
JEwLoyy18FMCZJTWPdszGvazrtAaQmvEMwcz3Y9km93qVx5o+dvninGsKWfOuu+O
b/+VIv0wHG0RfsXVrC10BfzMlqe50YMrLOWVrb66+XDdjtITeZ2M7PXRtsa5iOtG
TDFzngSrOl59gqqhvDrhZOHY2S+wJnuCaXiG6w6rBLDRucZ5p2x4WWYeqtZGQlDk
nLi6wMAp3fTt6+JlbXPtT01RHWZEyw==
=xrXd
-----END PGP SIGNATURE-----
Merge tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray
Pull XArray updates from Matthew Wilcox:
- Documentation update
- Fix test-suite build after move of bitmap.h
- Fix xas_create_range() when a large entry is already present
- Fix xas_split() of a shadow entry
* tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray:
XArray: Update the LRU list in xas_split()
XArray: Fix xas_create_range() when multi-order entry present
XArray: Include bitmap.h from xarray.h
XArray: Document the locking requirement for the xa_state
- Devicetree support (for testing)
- Various cleanups and fixes: UBD, port_user, uml_mconsole
- Maintainer update
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmJFwUMWHHJpY2hhcmRA
c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wQqBD/9gLyeiVp2eu1YFVir64IASgVjK
lNdlAfUwfebtEsw65JcfY8K64910ahw6TvkjTT2A+QGeJIYaVwmw69bLXJUvQq31
C7ZDsMHptuNiZrHDL9SoA0DfwqRdJx3tgGzDnSkhX+2T7Zs5n1nLRMBmn/NJV9Qy
CmxG9fLH1VsU0p6RI76WST3GPLOqWa3jCeHK1vMGZNXI+eo5prHc59lkOcT7lEy7
M4vJRaAV6pCDDYMQdDOYr1PDEeG7/h49EqdKylkOhonDyYB649rL6Lc9nRBvSts3
NXX/qYy1Sj1AlOSR5IOon6QCyk1hap9kr85QoCtz3VMabD/yLlBovZzLOLaF+0S6
dQWgKg806g8QYQGxN03Ph0Pb5cA6hAjr8nVmAuICJDWgmY6Oo74pEvhI8toofFzk
NJzwa6G99xNhfggeTcGdG0ddQDT8N3enKspDPkzpN127GzU5cgvI1Z8wnZXB7JDM
zLMCxzwehocCSrFlh9aQDFK1XJfEWuP66xEPl5cX46//IMKqsrXEOjNlCTRUmA5F
OhU4qqb01OW3K4HPaAkBcGPZ0HhFn6JREUFyNW07dg6s73IWzf0CaNKeYJS7abln
tdvfPg3OPNXCjHd3aCW22EzuB9R/K8BNMkva3QQZxtUa+tOjBdBd9JBJ+vHGA1MN
7/k60wl1dt8/N9yHFg==
=YsK8
-----END PGP SIGNATURE-----
Merge tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Devicetree support (for testing)
- Various cleanups and fixes: UBD, port_user, uml_mconsole
- Maintainer update
* tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: run_helper: Write error message to kernel log on exec failure on host
um: port_user: Improve error handling when port-helper is not found
um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar
um: port_user: Search for in.telnetd in PATH
um: clang: Strip out -mno-global-merge from USER_CFLAGS
docs: UML: Mention telnetd for port channel
um: Remove unused timeval_to_ns() function
um: Fix uml_mconsole stop/go
um: Cleanup syscall_handler_t definition/cast, fix warning
uml: net: vector: fix const issue
um: Fix WRITE_ZEROES in the UBD Driver
um: Migrate vector drivers to NAPI
um: Fix order of dtb unflatten/early init
um: fix and optimize xor select template for CONFIG64 and timetravel mode
um: Document dtb command line option
lib/logic_iomem: correct fallback config references
um: Remove duplicated include in syscalls_64.c
MAINTAINERS: Update UserModeLinux entry
When splitting a value entry, we may need to add the new nodes to the LRU
list and remove the parent node from the LRU list. The WARN_ON checks
in shadow_lru_isolate() catch this oversight. This bug was latent
until we stopped splitting folios in shrink_page_list() with commit
820c4e2e6f ("mm/vmscan: Free non-shmem folios without splitting them").
That allows the creation of large shadow entries, and subsequently when
trying to page in a small page, we will split the large shadow entry
in __filemap_add_folio().
Fixes: 8fc75643c5 ("XArray: add xas_split")
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
The "test_dev" pointer is freed but then returned to the caller.
Fixes: d9c6a72d6f ("kmod: add test driver to stress test the module loader")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
If there is already an entry present that is of order >= XA_CHUNK_SHIFT
when we call xas_create_range(), xas_create_range() will misinterpret
that entry as a node and dereference xa_node->parent, generally leading
to a crash that looks something like this:
general protection fault, probably for non-canonical address 0xdffffc0000000001:
0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0
RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline]
RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725
It's deterministically reproducable once you know what the problem is,
but producing it in a live kernel requires khugepaged to hit a race.
While the problem has been present since xas_create_range() was
introduced, I'm not aware of a way to hit it before the page cache was
converted to use multi-index entries.
Fixes: 6b24ca4a1a ("mm: Use multi-index entries in the page cache")
Reported-by: syzbot+0d2b0bf32ca5cfd09f2e@syzkaller.appspotmail.com
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
- Enable strict FORTIFY_SOURCE compile-time validation of memcpy buffers
- Add Clang features needed for FORTIFY_SOURCE support
- Enable FORTIFY_SOURCE for Clang where possible
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmI+NxwWHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhnPEACI1AUB9OHzL+VbLhX6zzvPuFRm
7MC11PWyPTa4tkhKGTlVvYbHKwrfcJyAG85rKpz5euWVlzVFkifouT4YAG959CYK
OGUj9WXPRpQ3IIPXXazZOtds4T5sP/m6dSts2NaRIX4w0NKOo3p2mlxUaYoagH1Z
j178epRJ+lbUwPdBmGsSGceb5qDKqubz/sXh51lY3YoLdMZGiom6FLva4STenzZq
SBEJqD2AM0tPWSkrue4OCRig7IsiLhzLvP8jC303suLLHn3eVTvoIT+RRBvwFqXo
MX9B6i3DdCjbWoOg9gA0Jhc6+2+kP7MU1MO6WfWP6IVZh2V1pk4Avmgxy6ypxfwU
fMNqH7CrFmojKOWqF55/1zfrQNNLqnHD3HiDAHpCtATN8kpcZGZXMUb3kT4FIij1
2Mcf6mBQOSqZTg4OvgKzPWGZYJe3KJp5lup5zhWmcOSV0o2gNhFCwXHEmhlNRLzw
idnbghjqBE74UcThQQjyWNBldzdPWVAjgaD696CnziRDCtHiTsrQaIrRsjx9P8NX
3GpoIp0vqDFG4SjFkuGishmlyMWXb3B2Ij7s2WCCSYRHLgOUJQgkhkw5wNZ7F2zD
qjEXaRZXecG5W/gwA4Ak9I2o6oKaK5HPMhNxYp7mlbceYcnuw9gSqeqRAgqX9LJA
kg7orn733jgfMrGhHw==
=8qRJ
-----END PGP SIGNATURE-----
Merge tag 'memcpy-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull FORTIFY_SOURCE updates from Kees Cook:
"This series consists of two halves:
- strict compile-time buffer size checking under FORTIFY_SOURCE for
the memcpy()-family of functions (for extensive details and
rationale, see the first commit)
- enabling FORTIFY_SOURCE for Clang, which has had many overlapping
bugs that we've finally worked past"
* tag 'memcpy-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fortify: Add Clang support
fortify: Make sure strlen() may still be used as a constant expression
fortify: Use __diagnose_as() for better diagnostic coverage
fortify: Make pointer arguments const
Compiler Attributes: Add __diagnose_as for Clang
Compiler Attributes: Add __overloadable for Clang
Compiler Attributes: Add __pass_object_size for Clang
fortify: Replace open-coded __gnu_inline attribute
fortify: Update compile-time tests for Clang 14
fortify: Detect struct member overflows in memset() at compile-time
fortify: Detect struct member overflows in memmove() at compile-time
fortify: Detect struct member overflows in memcpy() at compile-time
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI92rYQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpkAJD/9PvRN61YnNRjjAiHgslwMc2fy9lkxwYF4j
+DYqFwnhHgiADO/3Y3wsqHxmDJrhq7vxHM3btxUzkKxg2mVoOI/Bm6rhqEPhNkok
nlpMWHXR+9Jvl85IO5jHg9GHZ/PZfaDMn9naVXVpHVgycdJ06tr7T1tMtoAtsEzA
atEkwpc+r8E2NlxkcTPAQhJzmkrHVdxgtWxlKL/RkmivmBXu3/fj2pLHYyPcvqm1
8LxDn1DIoUHlpce10Qf7r+hf1sXiKNv+nltl9aWxdoSOM8OYHjQcp4K1qe+VYVzC
XbXqg3ZWaGKSnieyawN2yXtFkZSzgyCy+TCTHnf8NwGfgYYk86twh2clP5t6lE58
/TC8CmrBHIy8+79BvpSlTh7LlGip0snY3IVbZhR5EHJV3nDVtg/vdDwiSSQ6VdCM
FM3tkY7KvZDb42IvKzD/NKmAzKv/XMri1MmQB2f/VvbwN3OK5EQOJT1DYFdiohUQ
1YIb81HiGvlogB783HFXXAcHu/qQNZGDK4EDjNFHThPtmYqtLuOixIo0KG6BJnuV
sl/YhtDSe3FRnvcDZ4xki9CpBqHFG7vK85H05NXXdC1ddBdQ+N+yLS1/jONUlkGc
vJphI6FPr+DcPX8o/QuapQpNfg+HXY/h4u83jFJ8VRAyraxSarZ/19at0DM2wdvR
IhKlNfOHlA==
=RAVX
-----END PGP SIGNATURE-----
Merge tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-block
Pull block layer 64-bit data integrity support from Jens Axboe:
"This adds support for 64-bit data integrity in the block layer and in
NVMe"
* tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-block:
crypto: fix crc64 testmgr digest byte order
nvme: add support for enhanced metadata
block: add pi for extended integrity
crypto: add rocksoft 64b crc guard tag framework
lib: add rocksoft model crc64
linux/kernel: introduce lower_48_bits function
asm-generic: introduce be48 unaligned accessors
nvme: allow integrity on extended metadata formats
block: support pi with extended metadata
Merge yet more updates from Andrew Morton:
"This is the material which was staged after willystuff in linux-next.
Subsystems affected by this patch series: mm (debug, selftests,
pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise),
and selftests"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits)
selftests: kselftest framework: provide "finished" helper
mm: madvise: MADV_DONTNEED_LOCKED
mm: fix race between MADV_FREE reclaim and blkdev direct IO read
mm: generalize ARCH_HAS_FILTER_PGPROT
mm: unmap_mapping_range_tree() with i_mmap_rwsem shared
mm: warn on deleting redirtied only if accounted
mm/huge_memory: remove stale locking logic from __split_huge_pmd()
mm/huge_memory: remove stale page_trans_huge_mapcount()
mm/swapfile: remove stale reuse_swap_page()
mm/khugepaged: remove reuse_swap_page() usage
mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page()
mm: streamline COW logic in do_swap_page()
mm: slightly clarify KSM logic in do_swap_page()
mm: optimize do_wp_page() for fresh pages in local LRU pagevecs
mm: optimize do_wp_page() for exclusive pages in the swapcache
mm/huge_memory: make is_transparent_hugepage() static
userfaultfd/selftests: enable hugetlb remap and remove event testing
selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
mm: enable MADV_DONTNEED for hugetlb mappings
kasan: disable LOCKDEP when printing reports
...
The function kasan_global_oob was renamed to kasan_global_oob_right, but
the comments referring to it were not updated. Do so.
Link: https://linux-review.googlesource.com/id/I20faa90126937bbee77d9d44709556c3dd4b40be
Link: https://lkml.kernel.org/r/20220219012433.890941-1-pcc@google.com
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Async mode support has already been implemented in commit e80a76aa1a
("kasan, arm64: tests supports for HW_TAGS async mode") but then got
accidentally broken in commit 99734b535d ("kasan: detect false-positives
in tests").
Restore the changes removed by the latter patch and adapt them for asymm
mode: add a sync_fault flag to kunit_kasan_expectation that only get set
if the MTE fault was synchronous, and reenable MTE on such faults in
tests.
Also rename kunit_kasan_expectation to kunit_kasan_status and move its
definition to mm/kasan/kasan.h from include/linux/kasan.h, as this
structure is only internally used by KASAN. Also put the structure
definition under IS_ENABLED(CONFIG_KUNIT).
Link: https://lkml.kernel.org/r/133970562ccacc93ba19d754012c562351d4a8c8.1645033139.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update the existing vmalloc_oob() test to account for the specifics of the
tag-based modes. Also add a few new checks and comments.
Add new vmalloc-related tests:
- vmalloc_helpers_tags() to check that exported vmalloc helpers can
handle tagged pointers.
- vmap_tags() to check that SW_TAGS mode properly tags vmap() mappings.
- vm_map_ram_tags() to check that SW_TAGS mode properly tags
vm_map_ram() mappings.
- vmalloc_percpu() to check that SW_TAGS mode tags regions allocated
for __alloc_percpu(). The tagging of per-cpu mappings is best-effort;
proper tagging is tracked in [1].
[1] https://bugzilla.kernel.org/show_bug.cgi?id=215019
[sfr@canb.auug.org.au: similar to "kasan: test: fix compatibility with FORTIFY_SOURCE"]
Link: https://lkml.kernel.org/r/20220128144801.73f5ced0@canb.auug.org.au
Link: https://lkml.kernel.org/r/865c91ba49b90623ab50c7526b79ccb955f544f0.1644950160.git.andreyknvl@google.com
[andreyknvl@google.com: set_memory_rw/ro() are not exported to modules]
Link: https://lkml.kernel.org/r/019ac41602e0c4a7dfe96dc8158a95097c2b2ebd.1645554036.git.andreyknvl@google.com
[akpm@linux-foundation.org: fix build]
Cc: Andrey Konovalov <andreyknvl@gmail.com>
[andreyknvl@google.com: vmap_tags() and vm_map_ram_tags() pass invalid page array size]
Link: https://lkml.kernel.org/r/bbdc1c0501c5275e7f26fdb8e2a7b14a40a9f36b.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow enabling CONFIG_KASAN_VMALLOC with SW_TAGS and HW_TAGS KASAN modes.
Also adjust CONFIG_KASAN_VMALLOC description:
- Mention HW_TAGS support.
- Remove unneeded internal details: they have no place in Kconfig
description and are already explained in the documentation.
Link: https://lkml.kernel.org/r/bfa0fdedfe25f65e5caa4e410f074ddbac7a0b59.1643047180.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Acked-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "mm/page_owner: Extend page_owner to show memcg information", v4.
While debugging the constant increase in percpu memory consumption on a
system that spawned large number of containers, it was found that a lot
of offline mem_cgroup structures remained in place without being freed.
Further investigation indicated that those mem_cgroup structures were
pinned by some pages.
In order to find out what those pages are, the existing page_owner
debugging tool is extended to show memory cgroup information and whether
those memcgs are offline or not. With the enhanced page_owner tool, the
following is a typical page that pinned the mem_cgroup structure in my
test case:
Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns
PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff)
prep_new_page+0xac/0xe0
get_page_from_freelist+0x1327/0x14d0
__alloc_pages+0x191/0x340
alloc_pages_vma+0x84/0x250
shmem_alloc_page+0x3f/0x90
shmem_alloc_and_acct_page+0x76/0x1c0
shmem_getpage_gfp+0x281/0x940
shmem_write_begin+0x36/0xe0
generic_perform_write+0xed/0x1d0
__generic_file_write_iter+0xdc/0x1b0
generic_file_write_iter+0x5d/0xb0
new_sync_write+0x11f/0x1b0
vfs_write+0x1ba/0x2a0
ksys_write+0x59/0xd0
do_syscall_64+0x37/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8.
So the page was not freed because it was part of a shmem segment. That
is useful information that can help users to diagnose similar problems.
With cgroup v1, /proc/cgroups can be read to find out the total number
of memory cgroups (online + offline). With cgroup v2, the cgroup.stat
of the root cgroup can be read to find the number of dying cgroups (most
likely pinned by dying memcgs).
The page_owner feature is not supposed to be enabled for production
system due to its memory overhead. However, if it is suspected that
dying memcgs are increasing over time, a test environment with
page_owner enabled can then be set up with appropriate workload for
further analysis on what may be causing the increasing number of dying
memcgs.
This patch (of 4):
For *scnprintf(), vsnprintf() is always called even if the input size is
0. That is a waste of time, so just return 0 in this case.
Note that vsnprintf() will never return -1 to indicate an error. So
skipping the call to vsnprintf() when size is 0 will have no functional
impact at all.
Link: https://lkml.kernel.org/r/20220202203036.744010-1-longman@redhat.com
Link: https://lkml.kernel.org/r/20220202203036.744010-2-longman@redhat.com
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add a driver for 'struct cxl_memdev' objects responsible for CXL.mem
operation as distinct from 'cxl_pci' mailbox operations. Its primary
responsibility is enumerating an endpoint 'struct cxl_port' and all the
'struct cxl_port' instances between an endpoint and the CXL platform
root.
- Add a driver for 'struct cxl_port' objects responsible for enumerating
and operating all Host-managed Device Memory (HDM) decoder resources
between the platform-level CXL memory description, all intervening host
bridges / switches, and the HDM resources in endpoints.
- Update the cxl_pci driver to validate CXL.mem operation precursors to
HDM decoder operation like ready-polling, and legacy CXL 1.1 DVSEC
based CXL.mem configuration.
- Add basic lockdep coverage for usage of device_lock() on CXL subsystem
objects similar to what exists for LIBNVDIMM. Include a compile-time
switch for which subsystem to validate at run-time.
- Update cxl_test to emulate a one level switch topology.
- Document a "Theory of Operation" for the subsystem.
- Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs
- Include miscellaneous fixes for spec / QEMU CXL emulation
compatibility and static analysis reports.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYjpX6AAKCRDfioYZHlFs
ZzyxAQCztxAXj7mzkm1Qt5zZz4e7p/6sR49B03jBTfPtrEF9kQEAl9R15WVt6U+o
Ooof1XhRic3kT6e8zS3ZVKHzGduYxwM=
=mR94
-----END PGP SIGNATURE-----
Merge tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull CXL (Compute Express Link) updates from Dan Williams:
"This development cycle extends the subsystem to discover CXL resources
throughout a CXL/PCIe switch topology and respond to hot add/remove
events anywhere in that topology.
This is more foundational infrastructure in preparation for dynamic
memory region provisioning support. Recall that CXL memory regions, as
the new "Theory of Operation" section of
Documentation/driver-api/cxl/memory-devices.rst describes, bring
storage volume striping semantics to memory.
The hot add/remove behavior is validated with extensions to the
cxl_test unit test environment and this test in the cxl-cli test
suite:
https://github.com/pmem/ndctl/blob/djbw/for-74/cxl/test/cxl-topology.sh
Summary:
- Add a driver for 'struct cxl_memdev' objects responsible for
CXL.mem operation as distinct from 'cxl_pci' mailbox operations.
Its primary responsibility is enumerating an endpoint 'struct
cxl_port' and all the 'struct cxl_port' instances between an
endpoint and the CXL platform root.
- Add a driver for 'struct cxl_port' objects responsible for
enumerating and operating all Host-managed Device Memory (HDM)
decoder resources between the platform-level CXL memory
description, all intervening host bridges / switches, and the HDM
resources in endpoints.
- Update the cxl_pci driver to validate CXL.mem operation precursors
to HDM decoder operation like ready-polling, and legacy CXL 1.1
DVSEC based CXL.mem configuration.
- Add basic lockdep coverage for usage of device_lock() on CXL
subsystem objects similar to what exists for LIBNVDIMM. Include a
compile-time switch for which subsystem to validate at run-time.
- Update cxl_test to emulate a one level switch topology.
- Document a "Theory of Operation" for the subsystem.
- Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs
- Include miscellaneous fixes for spec / QEMU CXL emulation
compatibility and static analysis reports"
* tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (48 commits)
cxl/core/port: Fix NULL but dereferenced coccicheck error
cxl/port: Hold port reference until decoder release
cxl/port: Fix endpoint refcount leak
cxl/core: Fix cxl_device_lock() class detection
cxl/core/port: Fix unregister_port() lock assertion
cxl/regs: Fix size of CXL Capability Header Register
cxl/core/port: Handle invalid decoders
cxl/core/port: Fix / relax decoder target enumeration
tools/testing/cxl: Add a physical_node link
tools/testing/cxl: Enumerate mock decoders
tools/testing/cxl: Mock one level of switches
tools/testing/cxl: Fix root port to host bridge assignment
tools/testing/cxl: Mock dvsec_ranges()
cxl/core/port: Add endpoint decoders
cxl/core: Move target_list out of base decoder attributes
cxl/mem: Add the cxl_mem driver
cxl/core/port: Add switch port enumeration
cxl/memdev: Add numa_node attribute
cxl/pci: Emit device serial number
cxl/pci: Implement wait for media active
...
Merge more updates from Andrew Morton:
"Various misc subsystems, before getting into the post-linux-next
material.
41 patches.
Subsystems affected by this patch series: procfs, misc, core-kernel,
lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump,
taskstats, panic, kcov, resource, and ubsan"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits)
Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang"
kernel/resource: fix kfree() of bootmem memory again
kcov: properly handle subsequent mmap calls
kcov: split ioctl handling into locked and unlocked parts
panic: move panic_print before kmsg dumpers
panic: add option to dump all CPUs backtraces in panic_print
docs: sysctl/kernel: add missing bit to panic_print
taskstats: remove unneeded dead assignment
kasan: no need to unset panic_on_warn in end_report()
ubsan: no need to unset panic_on_warn in ubsan_epilogue()
panic: unset panic_on_warn inside panic()
docs: kdump: add scp example to write out the dump file
docs: kdump: update description about sysfs file system support
arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef
kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible
cgroup: use irqsave in cgroup_rstat_flush_locked().
fat: use pointer to simple type in put_user()
minix: fix bug when opening a file with O_DIRECT
...
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout
the stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower
iTLB pressure and prevents identity mapping huge pages from
getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop
the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called
from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling
of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S
IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS
FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+
AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE
KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3
/DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J
6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw
7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H
K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY
cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo
krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy
dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw=
=ELpe
-----END PGP SIGNATURE-----
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"The sprinkling of SPI drivers is because we added a new one and Mark
sent us a SPI driver interface conversion pull request.
Core
----
- Introduce XDP multi-buffer support, allowing the use of XDP with
jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little.
Remove unnecessary per-netns sockets. Scope some lists to a netns.
Cut down RCU syncing. Use batch methods. Allow netdev registration
to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and
maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the
stack.
- Switch netdev error counters from an atomic to dynamically
allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting
sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF
---
- Introduce "packing allocator" for BPF JIT images. JITed code is
marked read only, and used to be allocated at page granularity.
Custom allocator allows for more efficient memory use, lower iTLB
pressure and prevents identity mapping huge pages from getting
split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce
the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the
user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling
its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if
called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch
bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF
kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels
without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols
---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency
links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable,
via sysctl and setsockopt. Distinguish between server and client
behavior.
- VxLAN support to "collect metadata" devices to terminate only
configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where
given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS.
Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs,
doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent
neighbor solicitation and advertisement from using the wrong port.
Support NS/NA monitor selection similar to existing ARP monitor.
- SMC
- improve performance with TCP_CORK and sendfile()
- support auto-corking
- support TCP_NODELAY
- MCTP (Management Component Transport Protocol)
- add user space tag control interface
- I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth:
- handle MSFT Monitor Device Event
- add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP:
- add support for the SO_SNDTIMEO socket option
- lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API
----------
- Add HW counters for SW netdevs, a mechanism for devices which
offload packet forwarding to report packet statistics back to
software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of
physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of
drivers to use that lock directly instead of creating their own
which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of
TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow
reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture):
- replay and offload of host VLAN entries
- offload of static and local FDB entries on LAG interfaces
- FDB isolation and unicast filtering
New hardware / drivers
----------------------
- Ethernet:
- LAN937x T1 PHYs
- Davicom DM9051 SPI NIC driver
- Realtek RTL8367S, RTL8367RB-VB switch and MDIO
- Microchip ksz8563 switches
- Netronome NFP3800 SmartNICs
- Fungible SmartNICs
- MediaTek MT8195 switches
- WiFi:
- mt76: MediaTek mt7916
- mt76: MediaTek mt7921u USB adapters
- brcmfmac: Broadcom BCM43454/6
- Mobile:
- iosm: Intel M.2 7360 WWAN card
Drivers
-------
- Convert many drivers to the new phylink API built for split PCS
designs but also simplifying other cases.
- Intel Ethernet NICs:
- add TTY for GNSS module for E810T device
- improve AF_XDP performance
- GTP-C and GTP-U filter offload
- QinQ VLAN support
- Mellanox Ethernet NICs (mlx5):
- support xdp->data_meta
- multi-buffer XDP
- offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp):
- flow-independent tc action hardware offload (police / meter)
- AF_XDP
- Other Ethernet NICs:
- at803x: fiber and SFP support
- xgmac: mdio: preamble suppression and custom MDC frequencies
- r8169: enable ASPM L1.2 if system vendor flags it as safe
- macb/gem: ZynqMP SGMII
- hns3: add TX push mode
- dpaa2-eth: software TSO
- lan743x: multi-queue, mdio, SGMII, PTP
- axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw):
- source and dest IP address rewrites
- RJ45 ports
- Marvell Ethernet switches (prestera):
- basic routing offload
- multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix):
- PTP over UDP with the ocelot-8021q DSA tagging protocol
- basic QoS classification on Felix DSA switch using dcbnl
- port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5):
- offloading of bridge port flooding flags
- PTP Hardware Clock
- Other embedded switches:
- lan966x: PTP Hardward Clock
- qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k):
- add LDPC FEC type and 802.11ax High Efficiency data in radiotap
- enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi):
- UHB TAS enablement via BIOS
- band disablement via BIOS
- channel switch offload
- 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76):
- background radar detection
- thermal management improvements on mt7915
- SAR support for more mt76 platforms
- MBSSID and 6 GHz band on mt7915
- RealTek WiFi:
- rtw89: AP mode
- rtw89: 160 MHz channels and 6 GHz band
- rtw89: hardware scan
- Bluetooth:
- mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd):
- multiple RX-FIFOs and runtime configurable RX/TX rings
- internal PLL, runtime PM handling simplification
- improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits)
llc: fix netdevice reference leaks in llc_ui_bind()
drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool
ice: don't allow to run ice_send_event_to_aux() in atomic ctx
ice: fix 'scheduling while atomic' on aux critical err interrupt
net/sched: fix incorrect vlan_push_eth dest field
net: bridge: mst: Restrict info size queries to bridge ports
net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init()
drivers: net: xgene: Fix regression in CRC stripping
net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT
net: dsa: fix missing host-filtered multicast addresses
net/mlx5e: Fix build warning, detected write beyond size of field
iwlwifi: mvm: Don't fail if PPAG isn't supported
selftests/bpf: Fix kprobe_multi test.
Revert "rethook: x86: Add rethook x86 implementation"
Revert "arm64: rethook: Add arm64 rethook implementation"
Revert "powerpc: Add rethook support"
Revert "ARM: rethook: Add rethook arm implementation"
netdevice: add missing dm_private kdoc
net: bridge: mst: prevent NULL deref in br_mst_info_size()
selftests: forwarding: Use same VRF for port and VLAN upper
...
This reverts commit ea91a1d45d.
Since df05c0e949 ("Documentation: Raise the minimum supported version
of LLVM to 11.0.0") the minimum Clang version is now 11.0, which fixed
the UBSAN/KCSAN vs. KCOV incompatibilities.
Link: https://bugs.llvm.org/show_bug.cgi?id=45831
Link: https://lkml.kernel.org/r/YaodyZzu0MTCJcvO@elver.google.com
Link: https://lkml.kernel.org/r/20220128105631.509772-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
panic_on_warn is unset inside panic(), so no need to unset it before
calling panic() in ubsan_epilogue().
Link: https://lkml.kernel.org/r/1644324666-15947-5-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Xuefeng Li <lixuefeng@loongson.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix kernel-doc warings in lib/bitmap.c:
lib/bitmap.c:498: warning: Function parameter or member 'buf' not described in 'bitmap_print_to_buf'
lib/bitmap.c:498: warning: Function parameter or member 'maskp' not described in 'bitmap_print_to_buf'
lib/bitmap.c:498: warning: Function parameter or member 'nmaskbits' not described in 'bitmap_print_to_buf'
lib/bitmap.c:498: warning: Function parameter or member 'off' not described in 'bitmap_print_to_buf'
lib/bitmap.c:498: warning: Function parameter or member 'count' not described in 'bitmap_print_to_buf'
lib/bitmap.c:561: warning: contents before sections
lib/bitmap.c:606: warning: Function parameter or member 'buf' not described in 'bitmap_print_list_to_buf'
lib/bitmap.c:606: warning: Function parameter or member 'maskp' not described in 'bitmap_print_list_to_buf'
lib/bitmap.c:606: warning: Function parameter or member 'nmaskbits' not described in 'bitmap_print_list_to_buf'
lib/bitmap.c:606: warning: Function parameter or member 'off' not described in 'bitmap_print_list_to_buf'
lib/bitmap.c:606: warning: Function parameter or member 'count' not described in 'bitmap_print_list_to_buf'
lib/bitmap.c:819: warning: missing initial short description on line:
* bitmap_parselist_user()
This still leaves 15 warnings for function return values not described,
similar to this one:
bitmap.c:890: warning: No description found for return value of 'bitmap_parse'
Link: https://lkml.kernel.org/r/20220306065823.5153-1-rdunlap@infradead.org
Fixes: 1fae562983 ("cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list")
Fixes: 4b060420a5 ("bitmap, irq: add smp_affinity_list interface to /proc/irq")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Yury Norov <yury.norov@gmail.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Tian Tao <tiantao6@hisilicon.com>
Cc: Mike Travis <mike.travis@hpe.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0Day robots reported there is compiling issue for 'csky' ARCH when
CONFIG_DEBUG_FORCE_DATA_SECTION_ALIGNED is enabled [1]:
All errors (new ones prefixed by >>):
{standard input}: Assembler messages:
>> {standard input}:2277: Error: pcrel offset for branch to .LS000B too far (0x3c)
Which was discussed in [2]. And as there is no solution for csky yet, add
some dependency for this config to limit it to several ARCHs which have no
compiling issue so far.
[1]. https://lore.kernel.org/lkml/202202271612.W32UJAj2-lkp@intel.com/
[2]. https://www.spinics.net/lists/linux-kbuild/msg30298.html
Link: https://lkml.kernel.org/r/20220304021100.GN4548@shbuild999.sh.intel.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Cc: Guo Ren <guoren@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently it's not possible to enable DEBUG_INFO for an all*config
build, since it is marked as "depends on !COMPILE_TEST".
This generally makes sense because a debug build of an all*config target
ends up taking much longer and the output is much larger. Having this
be "default off" makes sense.
However, there are cases where enabling DEBUG_INFO for such builds is
useful for doing treewide A/B comparisons of build options, etc.
Make DEBUG_INFO selectable from any of the DWARF version choice options,
with DEBUG_INFO_NONE being the default for COMPILE_TEST.
The mutually exclusive relationship between DWARF5 and BTF must be
inverted, but the result remains the same. Additionally moves
DEBUG_KERNEL and DEBUG_MISC up to the top of the menu because they were
enabling features _above_ it, making it weird to navigate menuconfig.
[keescook@chromium.org: make DEBUG_INFO always default=n]
Link: https://lkml.kernel.org/r/20220128214131.580131-1-keescook@chromium.org
Link: https://lore.kernel.org/lkml/YfRY6+CaQxX7O8vF@dev-arch.archlinux-ax161
Link: https://lkml.kernel.org/r/20220125075126.891825-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good. This
was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly
tricky and error-prone code.
There is a small merge conflict against a parisc cleanup, the
solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel. The
hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
There are some obvious conflicts against changes to the removed
files.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmI69BsACgkQmmx57+YA
GNn/zA//f4d5VTT0ThhRxRWTu9BdThGHoB8TUcY7iOhbsWu0X/913NItRC3UeWNl
IdmisaXgVtirg1dcC2pWUmrcHdoWOCEGfK4+Zr2NhSWfuZDWvODHK9pGWk4WLnhe
cQgUNBvIuuAMryGtrOBwHPO4TpfCyy2ioeVP36ZfcsWXdDxTrqfaq/56mk3sxIP6
sUTk1UEjut9NG4C9xIIvcSU50R3l6LryQE/H9kyTLtaSvfvTOvprcVYCq0GPmSzo
DtQ1Wwa9zbJ+4EqoMiP5RrgQwWvOTg2iRByLU8ytwlX3e/SEF0uihvMv1FQbL8zG
G8RhGUOKQSEhaBfc3lIkm8GpOVPh0uHzB6zhn7daVmAWtazRD2Nu59BMjipa+ims
a8Z58iHH7jRAnKeEkVZqXKb1CEiUxaQx/IeVPzN4QlwMhDtwrI76LY7ZJ1zCqTGY
ENG0yRLav1XselYBslOYXGtOEWcY5EZPWqLyWbp4P9vz2g0Fe0gZxoIOvPmNQc89
QnfXpCt7vm/DGkyO255myu08GOLeMkisVqUIzLDB9avlym5mri7T7vk9abBa2YyO
CRpTL5gl1/qKPWuH1UI5mvhT+sbbBE2SUHSuy84btns39ZKKKynwCtdu+hSQkKLE
h9pV30Gf1cLTD4JAE0RWlUgOmbBLVp34loTOexQj4MrLM1noOnw=
=vtCN
-----END PGP SIGNATURE-----
Merge tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull asm-generic updates from Arnd Bergmann:
"There are three sets of updates for 5.18 in the asm-generic tree:
- The set_fs()/get_fs() infrastructure gets removed for good.
This was already gone from all major architectures, but now we can
finally remove it everywhere, which loses some particularly tricky
and error-prone code. There is a small merge conflict against a
parisc cleanup, the solution is to use their new version.
- The nds32 architecture ends its tenure in the Linux kernel.
The hardware is still used and the code is in reasonable shape, but
the mainline port is not actively maintained any more, as all
remaining users are thought to run vendor kernels that would never
be updated to a future release.
- A series from Masahiro Yamada cleans up some of the uapi header
files to pass the compile-time checks"
* tag 'asm-generic-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (27 commits)
nds32: Remove the architecture
uaccess: remove CONFIG_SET_FS
ia64: remove CONFIG_SET_FS support
sh: remove CONFIG_SET_FS support
sparc64: remove CONFIG_SET_FS support
lib/test_lockup: fix kernel pointer check for separate address spaces
uaccess: generalize access_ok()
uaccess: fix type mismatch warnings from access_ok()
arm64: simplify access_ok()
m68k: fix access_ok for coldfire
MIPS: use simpler access_ok()
MIPS: Handle address errors for accesses above CPU max virtual user address
uaccess: add generic __{get,put}_kernel_nofault
nios2: drop access_ok() check from __put_user()
x86: use more conventional access_ok() definition
x86: remove __range_not_ok()
sparc64: add __{get,put}_kernel_nofault()
nds32: fix access_ok() checks in get/put_user
uaccess: fix nios2 and microblaze get_user_8()
sparc64: fix building assembly files
...
This KUnit update for Linux 5.18-rc1 consists of:
- changes to decrease macro layering string, integer, EQ/NE asserts
- remove unused macros
- several cleanups and fixes
- new list tests for list_del_init_careful(), list_is_head() and
list_entry_is_head()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmI5KcsACgkQCwJExA0N
Qxy37BAA4NKkZHOpIk3P+aHbqE/S+Utg+gHsFOS7srp8wTeM1nSVMCP7MYefBiRs
4+R6RViCAvd5skK5/4UkYp53KePOww4Qo5zZKfN5J+479juMk+8CJtk3QwgY0IAu
jaI3nZlvo+WW+2OdIXdYNNScLR5mKHVSxpoLs1KtJZXm62RQgycoGCrIEtiAKYTk
w2mMUxG4X0upIF08xTfb5UDQyyMjqWMZJZ0l65xsJr4bgU+It0HoYCmPzqufpGza
ZgTWac8Iai1sEzxPXaTMLCM6V3QlbESIaIB6J13BWS+OvKs7cbcIADnG79Nvh7eH
v8v9fXTojlS6vSNJUqxA8S0f2kGJ2mVmePg11ZeOh2oqaF6l1bs7iFJQPc3PidRl
/dobIMBGlEI2yi9vaRz6/roDp44K56OlbthtSlaEc1NLyI/+nGuG7hzXuXkmoNiX
LloMfTmcCtrWGUnZH80K18l03T1swEiKzLuYMlzNvVz7jiIoZhXw4YG8H2FHJrpf
9LOJFEJgVcCp5JmDTk19HwN1OogH8TcbaJkQE0EthxExb2LW5BfO9cXzQ/n+uapl
QoN+5ig1x2ozyplVOhz/6VbmKxf7EDEOiYr1F1Kbc5qdSm1kdRQQTrMaWJkQ+KzT
bo+yWr/2zkAqrCns5lbUERfhBSx9jZqcnmUPcdcXLd7qse0cnKc=
=e1/u
-----END PGP SIGNATURE-----
Merge tag 'linux-kselftest-kunit-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit updates from Shuah Khan:
- changes to decrease macro layering string, integer, EQ/NE asserts
- remove unused macros
- several cleanups and fixes
- new list tests for list_del_init_careful(), list_is_head() and
list_entry_is_head()
* tag 'linux-kselftest-kunit-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
list: test: Add a test for list_entry_is_head()
list: test: Add a test for list_is_head()
list: test: Add test for list_del_init_careful()
kunit: cleanup assertion macro internal variables
kunit: factor out str constants from binary assertion structs
kunit: consolidate KUNIT_INIT_BINARY_ASSERT_STRUCT macros
kunit: remove va_format from kunit_assert
kunit: tool: drop mostly unused KunitResult.result field
kunit: decrease macro layering for EQ/NE asserts
kunit: decrease macro layering for integer asserts
kunit: reduce layering in string assertion macros
kunit: drop unused intermediate macros for ptr inequality checks
kunit: make KUNIT_EXPECT_EQ() use KUNIT_EXPECT_EQ_MSG(), etc.
kunit: drop unused assert_type from kunit_assert and clean up macros
kunit: split out part of kunit_assert into a static const
kunit: factor out kunit_base_assert_format() call into kunit_fail()
kunit: drop unused kunit* field in kunit_assert
kunit: move check if assertion passed into the macros
kunit: add example test case showing off all the expect macros
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAmI4ggsACgkQUqAMR0iA
lPLrlA//R12HGCGSzdpdyynl+5wByIqcHe8RANHOAj9f9qxBtmYv2ZK69mzSvhHO
6kAGdb3vBtxo1NCHeqxlXpds9GP/zOGEWmEJP2P7pIZ8ci8QtwrXCtQ8XIW9UGhJ
WHzXpXkfzcIDsRZs6B1pxN5cRXuW2VVzfgxyu6L+hvNV0o0PPO4A48ptzNBZh8rj
URAid+n/aGs9SOXM0h8SRjjBYEqjiB2RZ3gLg5XGZmcATtitBO135LGZnBR2fwnX
RZKckbdA/fBzqS4Njsp2rV5Rqldwj7mHzQbcsQm4YDrxSdl8d78XxQdAA5sNyaCD
ToDw6/DeegXzgtPJpuBH/ymF9RczIu4l3eawO1FBMCB5EPq56zVHWErxry8qaTgi
yQFqhBgifNN5NqfQCn7dyF10usmsvImFczre7ZxJvL7vmzqDsYYqdZG5oouLudR4
iOphFwX71v4X+RsxbOXqEt+mS3AwqEJc1SZl5rrDc4TSUOE1qCd+ncLTAuAf3Wfm
1xaZ+siomahcZAKrgmSw6AcD5bU+JJpr6FktKAddiO7J1+nIdT1lYEbpUsfWZ/p8
Kx8A2M2ula+whJ6CgtnGTsbsacsFi+j/MioMTGZIU+Fubkig3XEeIp3QUO6sEN+9
/sUQ6Wj6c95miWdttff9o6ap8py9NbfuKIw/HMOesfLVKP82rmw=
=EUpJ
-----END PGP SIGNATURE-----
Merge tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Make %pK behave the same as %p for kptr_restrict == 0 also with
no_hash_pointers parameter
- Ignore the default console in the device tree also when console=null
or console="" is used on the command line
- Document console=null and console="" behavior
- Prevent a deadlock and a livelock caused by console_lock in panic()
- Make console_lock available for panicking CPU
- Fast query for the next to-be-used sequence number
- Use the expected return values in printk.devkmsg __setup handler
- Use the correct atomic operations in wake_up_klogd() irq_work handler
- Avoid possible unaligned access when handling %4cc printing format
* tag 'printk-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux:
printk: fix return value of printk.devkmsg __setup handler
vsprintf: Fix %pK with kptr_restrict == 0
printk: make suppress_panic_printk static
printk: Set console_set_on_cmdline=1 when __add_preferred_console() is called with user_specified == true
Docs: printk: add 'console=null|""' to admin/kernel-parameters
printk: use atomic updates for klogd work
printk: Drop console_sem during panic
printk: Avoid livelock with heavy printk during panic
printk: disable optimistic spin during panic
printk: Add panic_in_progress helper
vsprintf: Move space out of string literals in fourcc_string()
vsprintf: Fix potential unaligned access
printk: ringbuffer: Improve prb_next_seq() performance
- Rewrite how munlock works to massively reduce the contention
on i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
=n9Ad
-----END PGP SIGNATURE-----
Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
Allow the use of a deferrable timer, which does not force CPU wake-ups
when the system is idle. A consequence is that the sample interval
becomes very unpredictable, to the point that it is not guaranteed that
the KFENCE KUnit test still passes.
Nevertheless, on power-constrained systems this may be preferable, so
let's give the user the option should they accept the above trade-off.
Link: https://lkml.kernel.org/r/20220308141415.3168078-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In function kunit_test_timeout, it is declared "300 * MSEC_PER_SEC"
represent 5min. However, it is wrong when dealing with arm64 whose
default HZ = 250, or some other situations. Use msecs_to_jiffies to fix
this, and kunit_test_timeout will work as desired.
Link: https://lkml.kernel.org/r/20220309083753.1561921-3-liupeng256@huawei.com
Fixes: 5f3e062089 ("kunit: test: add support for test abort")
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Wang Kefeng <wangkefeng.wang@huawei.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Patch series "kunit: fix a UAF bug and do some optimization", v2.
This series is to fix UAF (use after free) when running kfence test case
test_gfpzero, which is time costly. This UAF bug can be easily triggered
by setting CONFIG_KFENCE_NUM_OBJECTS = 65535. Furthermore, some
optimization for kunit tests has been done.
This patch (of 3):
Kunit will create a new thread to run an actual test case, and the main
process will wait for the completion of the actual test thread until
overtime. The variable "struct kunit test" has local property in function
kunit_try_catch_run, and will be used in the test case thread. Task
kunit_try_catch_run will free "struct kunit test" when kunit runs
overtime, but the actual test case is still run and an UAF bug will be
triggered.
The above problem has been both observed in a physical machine and qemu
platform when running kfence kunit tests. The problem can be triggered
when setting CONFIG_KFENCE_NUM_OBJECTS = 65535. Under this setting, the
test case test_gfpzero will cost hours and kunit will run to overtime.
The follows show the panic log.
BUG: unable to handle page fault for address: ffffffff82d882e9
Call Trace:
kunit_log_append+0x58/0xd0
...
test_alloc.constprop.0.cold+0x6b/0x8a [kfence_test]
test_gfpzero.cold+0x61/0x8ab [kfence_test]
kunit_try_run_case+0x4c/0x70
kunit_generic_run_threadfn_adapter+0x11/0x20
kthread+0x166/0x190
ret_from_fork+0x22/0x30
Kernel panic - not syncing: Fatal exception
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
To solve this problem, the test case thread should be stopped when the
kunit frame runs overtime. The stop signal will send in function
kunit_try_catch_run, and test_gfpzero will handle it.
Link: https://lkml.kernel.org/r/20220309083753.1561921-1-liupeng256@huawei.com
Link: https://lkml.kernel.org/r/20220309083753.1561921-2-liupeng256@huawei.com
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Marco Elver <elver@google.com>
Reviewed-by: Brendan Higgins <brendanhiggins@google.com>
Tested-by: Brendan Higgins <brendanhiggins@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Wang Kefeng <wangkefeng.wang@huawei.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: David Gow <davidgow@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The workingset will add the xa_node to the shadow_nodes list. So the
allocation of xa_node should be done by kmem_cache_alloc_lru(). Using
xas_set_lru() to pass the list_lru which we want to insert xa_node into to
set up the xa_node reclaim context correctly.
Link: https://lkml.kernel.org/r/20220228122126.37293-9-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Alex Shi <alexs@kernel.org>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Fam Zheng <fam.zheng@bytedance.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Kari Argillander <kari.argillander@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Convert overflow selftest to KUnit
- Convert stackinit selftest to KUnit
- Implement size_t saturating arithmetic helpers
- Allow struct_size() to be used in initializers
-----BEGIN PGP SIGNATURE-----
iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmI4l80WHGtlZXNjb29r
QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjsSEACqmwsnvyQXI+fKBr/wsqGRGdTx
cURccVT/mhQSaAAJMoYjWqOQZVs63dwtoM9leVA9rZuAFNFyiGKrK5r/KhpOijYu
AlIOPJzxDnPDu/jHHtAnDgsUeTHPhDnqLPK5j+oz1gPkyHBLyBFvEqDNrlAiTbvV
JLkssdcYPEv8QiLBkqX5ossOexxHksvxixmXts1Vc85I/anyuvtbpq/u7HsUrbcO
+f/qj7ekB114VgREPJZu5wc2pB+iJMA8jEGqrNLWCOqRIFXJOWLWky/wmATjwXST
Pi1kwzII7XZQMrVlMOK0P4YxepLKv5wnJGxZIi6JwJswd0a6oc8NLDTXrtHEq0jq
5Vqq+nPCyW2+OLWF5sNLYzlArI3G6tIPWQSxJcLfcnXLP/tz1+KiW4aa46V16N+D
MBQBCK1xei61kWFixn5qGVydOoaTTXgDhMWenxEk55EuU+S9XmiC1Nwvodsl65dv
RVGEYfk/7AlRGGTdasn35+6cmrFaCrElGz8+ZfDTaZZbbr6FfWpXRB4xQYwmqwDh
YGoyXNQdqlxtGaH5lutmsK5l+q2NlD0u8qRk6pti07hHMAJEyb0i6o3lNsUyw38T
gjoglwZUYOUwGOaWk6IOA7Gc3vCycdzP5t2njjBx/54PrCI9tq1oCN9bE6eAtRcA
4BoHC368qhuPttUaWA==
=eRcK
-----END PGP SIGNATURE-----
Merge tag 'overflow-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull overflow updates from Kees Cook:
"These changes come in roughly two halves: support of Gustavo A. R.
Silva's struct_size() work via additional helpers for catching
overflow allocation size calculations, and conversions of selftests to
KUnit (which includes some tweaks for UML + Clang):
- Convert overflow selftest to KUnit
- Convert stackinit selftest to KUnit
- Implement size_t saturating arithmetic helpers
- Allow struct_size() to be used in initializers"
* tag 'overflow-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib: stackinit: Convert to KUnit
um: Allow builds with Clang
lib: overflow: Convert to Kunit
overflow: Provide constant expression struct_size
overflow: Implement size_t saturating arithmetic helpers
test_overflow: Regularize test reporting output
sbitmap has been used in scsi for replacing atomic operations on
sdev->device_busy, so IOPS on some fast scsi storage can be improved.
However, sdev->device_busy can be changed in fast path, so we have to
allocate the sb->map statically. sdev->device_busy has been capped to 1024,
but some drivers may configure the default depth as < 8, then
cause each sbitmap word to hold only one bit. Finally 1024 * 128(
sizeof(sbitmap_word)) bytes is needed for sb->map, given it is order 5
allocation, sometimes it may fail.
Avoid the issue by using kvzalloc_node() for allocating sb->map.
Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220316012708.354668-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0/QUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpn8GEACRVxJaJV5qjZfoFAQKoAWJEtquwjeARyB+
0V8ROWHDWHSacdug9wBytayiS1lz2zmUHJ6YXyts2dn0v6CrK4s8yGzk5G/RgH6+
6M3GmBKjj+r1DfE8L3OoQWkDR1JFPuFxXTG/uBd7fBY2Excih1Z0D2lpspMleIRf
w8zBrlWrWH8lZlm6HF3fadjEoiWhOM5F4Ofz3eg/PAQrHuD06z8hjQgMeR0jQVzw
bWF9jrdNIplxRjNWIwCTsQRM+z5KQhUGwDODJjIwdQtVaKSt9D99ZbeKTudlslQ2
zrizsCq8P1RjBPcrA45FV6QnT9DIRRGrYzHD63qC6fDae34rbzdSHUwRMP2XSxo8
+hT1AzGypiBauODTPzHFtTskaQ0KibLznEanChh/ThySmNYcEVAljSx3Z5Vo81J+
IqJYK2m3RESCFruy9w3U/P7qiXZmqYldPfjxAKq8ucg6x1PU3XRAVm7SI/i4l75D
Crk1ujj2LJgsyxL6qMrK3XUavl1SJdzWeFSarcCt3m4m11EWWfYzmG8Yn8OE2CEZ
a2CAyDsRi8CZ3hvkaMwigL4wBJjrrig8vyIgok3VrfCmYlNNqMQqM5Rw7vzjR3v1
cKewI3rQjkFXEaveIXyGPTI/0Da4cT0DOfn/Mws9MDUXNPlFMNEDUZkPuzMywiTB
2SWDLTe77g==
=993h
-----END PGP SIGNATURE-----
Merge tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- NVMe updates via Christoph:
- add vectored-io support for user-passthrough (Kanchan Joshi)
- add verbose error logging (Alan Adamson)
- support buffered I/O on block devices in nvmet (Chaitanya
Kulkarni)
- central discovery controller support (Martin Belanger)
- fix and extended the globally unique idenfier validation
(Christoph)
- move away from the deprecated IDA APIs (Sagi Grimberg)
- misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
Chaitanya Kulkarni)
- add lockdep annotations for in-kernel sockets (Chris Leech)
- use vmalloc for ANA log buffer (Hannes Reinecke)
- kerneldoc fixes (Chaitanya Kulkarni)
- cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph)
- warn about shared namespaces without multipathing (Christoph)
- MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul,
Erik, Dirk)
- loop cleanups and queue depth configuration (Chaitanya)
- null_blk cleanups and fixes (Chaitanya)
- Use descriptive init/exit names in virtio_blk (Randy)
- Use bvec_kmap_local() in drivers (Christoph)
- bcache fixes (Mingzhe)
- xen blk-front persistent grant speedups (Juergen)
- rnbd fix and cleanup (Gioh)
- Misc fixes (Christophe, Colin)
* tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits)
virtio_blk: eliminate anonymous module_init & module_exit
nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
nvme: remove nvme_alloc_request and nvme_alloc_request_qid
nvme: cleanup how disk->disk_name is assigned
nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
nvmet: use snprintf() with PAGE_SIZE in configfs
nvmet: don't fold lines
nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
nvme-tcp: lockdep: annotate in-kernel sockets
nvme-tcp: don't fold the line
nvme-tcp: don't initialize ret variable
nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
nvme-multipath: use vmalloc for ANA log buffer
xen/blkfront: speed up purge_persistent_grants()
raid5: initialize the stripe_head embeeded bios as needed
raid5-cache: statically allocate the recovery ra bio
raid5-cache: fully initialize flush_bio when needed
raid5-ppl: fully initialize the bio in ppl_new_iounit
...
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmIzwtEACgkQSfxwEqXe
A67NCBAA1+U01HXx4ethmmy1m2pXHAIwngI7PP0QzyZtmoloWockdN1lRfQ1C0uJ
Whk/9Hc9G7iujznsxOnCS+LeNwRzd7CjtFbTgK+yGIRKwL9GFcVwA5nrifP9TjqZ
FWmTIomjjmA06YRYsNOdNSQdN6DdpQz8xLw0EqVOZerI4ITFErYlW8lLqOOKY99N
f9glQK75kh41SUgo+K3JSn46fhB95HldL6dYSZzjQ6QsVKBQuQTDE9ryfrH2XZDw
xI2nf/ycXPUBv7Bb+0op+7ES++CoDigM2nIyxapEj3ZkpplxL4M+cCIHq3Juzfwm
jDdbZbs5SqDszOQM/dvCJSR+S/D3QIKdv3fwwWHDTigByZdgpudT3rr9k7dY60Z8
aNvOzNWOzGH9/0boLl55WysF6cBQnazbgtzeWpzeuWFhAyfxN/DJx2sf8U+TmN6n
3bDUafamAvmkkIOoHUzOXfjo2lhXxlmRZ40rWVNX5JvcJj5+5jRmTawrQj+9fn8/
MhiIZ6KBDV1OxPwJzG6jm++JP6rgXfXsxduomO7cIEWs10itf/cE8WD9qJrtZTtg
kfjYUguFOd/QyzY0A1w6FD865vy8YhATk71Ywgwj9AI+cfH8QUajpDkXOutjop8x
8HBxIGx6Itgzilfuo5jpJxlVhNO3G6v1fX/A+mUMAfHufkmnfiQ=
=cyDR
-----END PGP SIGNATURE-----
Merge tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
"There have been a few important changes to the RNG's crypto, but the
intent for 5.18 has been to shore up the existing design as much as
possible with modern cryptographic functions and proven constructions,
rather than actually changing up anything fundamental to the RNG's
design.
So it's still the same old RNG at its core as before: it still counts
entropy bits, and collects from the various sources with the same
heuristics as before, and so forth. However, the cryptographic
algorithms that transform that entropic data into safe random numbers
have been modernized.
Just as important, if not more, is that the code has been cleaned up
and re-documented. As one of the first drivers in Linux, going back to
1.3.30, its general style and organization was showing its age and
becoming both a maintenance burden and an auditability impediment.
Hopefully this provides a more solid foundation to build on for the
future. I encourage you to open up the file in full, and maybe you'll
remark, "oh, that's what it's doing," and enjoy reading it. That, at
least, is the eventual goal, which this pull begins working toward.
Here's a summary of the various patches in this pull:
- /dev/urandom and /dev/random now do the same thing, per the patch
we discussed on the list. I think this is worth trying out. If it
does appear problematic, I've made sure to keep it standalone and
revertible without any conflicts.
- Fixes and cleanups for numerous integer type problems, locking
issues, and general code quality concerns.
- The input pool's LFSR has been replaced with a cryptographically
secure hash function, which has security and performance benefits
alike, and consequently allows us to count entropy bits linearly.
- The pre-init injection now uses a real hash function too, instead
of an LFSR or vanilla xor.
- The interrupt handler's fast_mix() function now uses one round of
SipHash, rather than the fake crypto that was there before.
- All additions of RDRAND and RDSEED now go through the input pool's
hash function, in part to mitigate ridiculous hypothetical CPU
backdoors, but more so to have a consistent interface for ingesting
entropy that's easy to analyze, making everything happen one way,
instead of a potpourri of different ways.
- The crng now works on per-cpu data, while also being in accordance
with the actual "fast key erasure RNG" design. This allows us to
fix several boot-time race complications associated with the prior
dynamically allocated model, eliminates much locking, and makes our
backtrack protection more robust.
- Batched entropy now erases doled out values so that it's backtrack
resistant.
- Working closely with Sebastian, the interrupt handler no longer
needs to take any locks at all, as we punt the
synchronized/expensive operations to a workqueue. This is
especially nice for PREEMPT_RT, where taking spinlocks in irq
context is problematic. It also makes the handler faster for the
rest of us.
- Also working with Sebastian, we now do the right thing on CPU
hotplug, so that we don't use stale entropy or fail to accumulate
new entropy when CPUs come back online.
- We handle virtual machines that fork / clone / snapshot, using the
"vmgenid" ACPI specification for retrieving a unique new RNG seed,
which we can use to also make WireGuard (and in the future, other
things) safe across VM forks.
- Around boot time, we now try to reseed more often if enough entropy
is available, before settling on the usual 5 minute schedule.
- Last, but certainly not least, the documentation in the file has
been updated considerably"
* tag 'random-5.18-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (60 commits)
random: check for signal and try earlier when generating entropy
random: reseed more often immediately after booting
random: make consistent usage of crng_ready()
random: use SipHash as interrupt entropy accumulator
wireguard: device: clear keys on VM fork
random: provide notifier for VM fork
random: replace custom notifier chain with standard one
random: do not export add_vmfork_randomness() unless needed
virt: vmgenid: notify RNG of VM fork and supply generation ID
ACPI: allow longer device IDs
random: add mechanism for VM forks to reinitialize crng
random: don't let 644 read-only sysctls be written to
random: give sysctl_random_min_urandom_seed a more sensible value
random: block in /dev/urandom
random: do crng pre-init loading in worker rather than irq
random: unify cycles_t and jiffies usage and types
random: cleanup UUID handling
random: only wake up writers after zap if threshold was passed
random: round-robin registers as ulong, not u32
random: clear fast pool, crng, and batches in cpuhp bring up
...
Convert stackinit unit tests to KUnit, for better integration
into the kernel self test framework. Includes a rename of
test_stackinit.c to stackinit_kunit.c, and CONFIG_TEST_STACKINIT to
CONFIG_STACKINIT_KUNIT_TEST.
Adjust expected test results based on which stack initialization method
was chosen:
$ CMD="./tools/testing/kunit/kunit.py run stackinit --raw_output \
--arch=x86_64 --kconfig_add"
$ $CMD | grep stackinit:
# stackinit: pass:36 fail:0 skip:29 total:65
$ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_USER=y | grep stackinit:
# stackinit: pass:37 fail:0 skip:28 total:65
$ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF=y | grep stackinit:
# stackinit: pass:55 fail:0 skip:10 total:65
$ $CMD CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y | grep stackinit:
# stackinit: pass:62 fail:0 skip:3 total:65
$ $CMD CONFIG_INIT_STACK_ALL_PATTERN=y --make_option LLVM=1 | grep stackinit:
# stackinit: pass:60 fail:0 skip:5 total:65
$ $CMD CONFIG_INIT_STACK_ALL_ZERO=y --make_option LLVM=1 | grep stackinit:
# stackinit: pass:60 fail:0 skip:5 total:65
Temporarily remove the userspace-build mode, which will be restored in a
later patch.
Expand the size of the pre-case switch variable so it doesn't get
accidentally cleared.
Cc: David Gow <davidgow@google.com>
Cc: Daniel Latypov <dlatypov@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v1: https://lore.kernel.org/lkml/20220224055145.1853657-1-keescook@chromium.org
v2:
- split "userspace KUnit stub" into separate header and patch (Daniel)
- Improve commit log and comments (David)
- Provide mapping of expected XFAIL tests to CONFIGs (David)
Adding support to have priv pointer in swap callback function.
Following the initial change on cmp callback functions [1]
and adding SWAP_WRAPPER macro to identify sort call of sort_r.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://lore.kernel.org/bpf/20220316122419.933957-2-jolsa@kernel.org
[1] 4333fb96ca ("media: lib/sort.c: implement sort() variant taking context argument")