Previously, we have two mechanisms to cache & submit small discards:
a) set max small discard number in /sys/fs/f2fs/vdb/max_small_discards,
and checkpoint will cache small discard candidates w/ configured maximum
number.
b) call FITRIM ioctl, also, checkpoint in f2fs_trim_fs() will cache small
discard candidates w/ configured discard granularity, but w/o limitation
of number. FSTRIM interface is asynchronized, so it won't submit discard
directly.
Finally, discard thread will submit them in background periodically.
However, after commit 9ac00e7cef ("f2fs: do not issue small discard
commands during checkpoint"), the mechanism a) is broken, since no matter
how we configure the sysfs entry /sys/fs/f2fs/vdb/max_small_discards,
checkpoint will not cache small discard candidates any more.
echo 0 > /sys/fs/f2fs/vdb/max_small_discards
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
echo 100 > /sys/fs/f2fs/vdb/max_small_discards
rm /mnt/f2fs/file
xfs_io -f /mnt/f2fs/file -c "pwrite 0 2m" -c "fsync"
xfs_io /mnt/f2fs/file -c "fpunch 0 4k"
sync
cat /proc/fs/f2fs/vdb/discard_plist_info |head -2
Before the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 3 1 . . . . . .
After the patch:
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
Discard pend list(Show diacrd_cmd count on each entry, .:not exist):
0 . . . . . . . .
This patch reverts commit 9ac00e7cef ("f2fs: do not issue small discard
commands during checkpoint") in order to fix this issue.
Fixes: 9ac00e7cef ("f2fs: do not issue small discard commands during checkpoint")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The description of max_small_discards is out-of-update in below two
aspects, fix it.
- it is disabled by default
- small discards will be issued during checkpoint
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The sending interval of discard and GC should also
consider direct write requests; filesystem is not
idle if there is direct write.
Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As reported, status debugfs entry shows inconsistent GC stats as below:
GC calls: 6008 (BG: 6161)
- data segments : 3053 (BG: 3053)
- node segments : 2955 (BG: 2955)
Total GC calls is larger than BGGC calls, the reason is:
- f2fs_stat_info.call_count accounts total migrated section count
by f2fs_gc()
- f2fs_stat_info.bg_gc accounts total call times of f2fs_gc() from
background gc_thread
Another issue is gc_foreground_calls sysfs entry shows total GC call
count rather than FGGC call count.
This patch changes as below for fix:
- account GC calls and migrated segment count separately
- support to account migrated section count if it enables large section
mode
- fix to show correct value in gc_foreground_calls sysfs entry
Fixes: fc7100ea2a ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It has checked return value of write_all_xattrs(), remove unneeded
following check condition.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
generic/728 - output mismatch (see /media/fstests/results//generic/728.out.bad)
--- tests/generic/728.out 2023-07-19 07:10:48.362711407 +0000
+++ /media/fstests/results//generic/728.out.bad 2023-07-19 08:39:57.000000000 +0000
QA output created by 728
+Expected ctime to change after setxattr.
+Expected ctime to change after removexattr.
Silence is golden
...
(Run 'diff -u /media/fstests/tests/generic/728.out /media/fstests/results//generic/728.out.bad' to see the entire diff)
generic/729 1s
It needs to update i_ctime after {set,remove}xattr, fix it.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
syzbot reports a f2fs bug as below:
UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3275:19
index 1409 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x1e7/0x2d0 lib/dump_stack.c:106
ubsan_epilogue lib/ubsan.c:217 [inline]
__ubsan_handle_out_of_bounds+0x11c/0x150 lib/ubsan.c:348
inline_data_addr fs/f2fs/f2fs.h:3275 [inline]
__recover_inline_status fs/f2fs/inode.c:113 [inline]
do_read_inode fs/f2fs/inode.c:480 [inline]
f2fs_iget+0x4730/0x48b0 fs/f2fs/inode.c:604
f2fs_fill_super+0x640e/0x80c0 fs/f2fs/super.c:4601
mount_bdev+0x276/0x3b0 fs/super.c:1391
legacy_get_tree+0xef/0x190 fs/fs_context.c:611
vfs_get_tree+0x8c/0x270 fs/super.c:1519
do_new_mount+0x28f/0xae0 fs/namespace.c:3335
do_mount fs/namespace.c:3675 [inline]
__do_sys_mount fs/namespace.c:3884 [inline]
__se_sys_mount+0x2d9/0x3c0 fs/namespace.c:3861
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The issue was bisected to:
commit d48a7b3a72
Author: Chao Yu <chao@kernel.org>
Date: Mon Jan 9 03:49:20 2023 +0000
f2fs: fix to do sanity check on extent cache correctly
The root cause is we applied both v1 and v2 of the patch, v2 is the right
fix, so it needs to revert v1 in order to fix reported issue.
v1:
commit d48a7b3a72 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230109034920.492914-1-chao@kernel.org/
v2:
commit 269d119481 ("f2fs: fix to do sanity check on extent cache correctly")
https://lore.kernel.org/lkml/20230207134808.1827869-1-chao@kernel.org/
Reported-by: syzbot+601018296973a481f302@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/000000000000fcf0690600e4d04d@google.com/
Fixes: d48a7b3a72 ("f2fs: fix to do sanity check on extent cache correctly")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Simplify code pattern of 'folio->index + folio_nr_pages(folio)' by using
the existing helper folio_next_index().
Signed-off-by: Minjie Du <duminjie@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now f2fs support four block allocation modes: lfs, adaptive,
fragment:segment, fragment:block. Only lfs mode is allowed with zoned block
device feature.
Fixes: 6691d940b0 ("f2fs: introduce fragment allocation mode mount option")
Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
The commit 25f9080576 ("f2fs: add async reset zone command support")
introduced "async reset zone commands" by calling
__submit_zone_reset_cmd() in async discard operations. However,
__submit_zone_reset_cmd() is called regardless of zone type of discard
target zone. When devices have conventional zones, zone reset commands
are sent to the conventional zones and cause I/O errors.
Avoid the I/O errors by checking that the discard target zone type is
sequential write required. If not, handle the discard operation in same
manner as non-zoned, regular block devices. For that purpose, add a new
helper function f2fs_bdev_index() which gets index of the zone reset
target device.
Fixes: 25f9080576 ("f2fs: add async reset zone command support")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs won't compress non-full cluster in tail of file, let's skip
dirtying and rewrite such cluster during f2fs_ioc_{,de}compress_file.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch allows f2fs_ioc_{,de}compress_file() to be interrupted, so that,
userspace won't be blocked when manual {,de}compression on large file is
interrupted by signal.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_scan_devices reopens the main device since the very beginning, which
has always been useless, and also means that we don't pass the right
holder for the reopen, which now leads to a warning as the core super.c
holder ops aren't passed in for the reopen.
Fixes: 3c62be17d4 ("f2fs: support multiple devices")
Fixes: 0718afd47f ("block: introduce holder ops")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Compression option in inode should not be changed after they have
been used, however, it may happen in below race case:
Thread A Thread B
- f2fs_ioc_set_compress_option
- check f2fs_is_mmap_file()
- check get_dirty_pages()
- check F2FS_HAS_BLOCKS()
- f2fs_file_mmap
- set_inode_flag(FI_MMAP_FILE)
- fault
- do_page_mkwrite
- f2fs_vm_page_mkwrite
- f2fs_get_block_locked
- fault_dirty_shared_page
- set_page_dirty
- update i_compress_algorithm
- update i_log_cluster_size
- update i_cluster_size
Avoid such race condition by covering f2fs_file_mmap() w/ i_sem lock,
meanwhile add mmap file check condition in f2fs_may_compress() as well.
Fixes: e1e8debec6 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
https://bugzilla.kernel.org/show_bug.cgi?id=216050
Somehow we're getting a page which has a different mapping.
Let's avoid the infinite loop.
Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_compress_alloc_page() uses mempool to allocate memory, it never
fail, don't handle error case in its callers.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit bfd4766239.
Shinichiro Kawasaki reported:
When I ran workloads on f2fs using v6.5-rcX with fixes [1][2] and a zoned block
devices with 4kb logical block size, I observe mount failure as follows. When
I revert this commit, the failure goes away.
[ 167.781975][ T1555] F2FS-fs (dm-0): IO Block Size: 4 KB
[ 167.890728][ T1555] F2FS-fs (dm-0): Found nat_bits in checkpoint
[ 171.482588][ T1555] F2FS-fs (dm-0): Zone without valid block has non-zero write pointer. Reset the write pointer: wp[0x1300,0x8]
[ 171.496000][ T1555] F2FS-fs (dm-0): (0) : Unaligned zone reset attempted (block 280000 + 80000)
[ 171.505037][ T1555] F2FS-fs (dm-0): Discard zone failed: (errno=-5)
The patch replaced "sbi->log_blocksize - SECTOR_SHIFT" with
"sbi->log_sectors_per_block". However, I think these two are not equal when the
device has 4k logical block size. The former uses Linux kernel sector size 512
byte. The latter use 512b sector size or 4kb sector size depending on the
device. mkfs.f2fs obtains logical block size via BLKSSZGET ioctl from the device
and reflects it to the value sbi->log_sector_size_per_block. This causes
unexpected write pointer calculations in check_zone_write_pointer(). This
resulted in unexpected zone reset and the mount failure.
[1] https://lkml.kernel.org/linux-f2fs-devel/20230711050101.GA19128@lst.de/
[2] https://lore.kernel.org/linux-f2fs-devel/20230804091556.2372567-1-shinichiro.kawasaki@wdc.com/
Cc: stable@vger.kernel.org
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: bfd4766239 ("f2fs: clean up w/ sbi->log_sectors_per_block")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats: 80e62bc848 ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- swiotlb area sizing fixes (Petr Tesarik)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmSq57sLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOh0xAAkwklIxxzXvNlgjvy2hdgPWImPS8tGPDSIsqA9TDD
WDZq89nz/ndnchdPObDvyJXmfBgqa0qCHqopBVPqMKv5a1pKZhrRXYlbajGQQwji
MIqefTLZ/VGw7bDpEivOt+yadwQ3KxuxWWs7/JPKLLReSJ22H8P+jkrK7P7kBL5X
YaMtZG9d86fvFHnQHKdAOlF1iCvnoZDHPcvaZbI6m5mMSZ+HIYqK5pP1MTUEAbIU
MX4ZSI7/mL0q67+kZuM/NCSeq1pH0Cd0D2DGm+8k/y87G81GS6E5Wgg+xO7vYiXf
5ChuwlAO9K9KhH7NIRkKhkad/Ii89ENXSyv5gmPRoKYK5FXajnHSlJTUrZojV6XC
Pbsd9ATVzV0rY61EPyh6G1a+Ttp/pwMp+W0I2fi032GVAePQ/vhB9x9O+2O/3QiC
v80nUSatkciZncWqkguhp3NONsRmLKep3CCQnEAA/gLs27B0avdQeslnqbOOUQKd
Si+djIoel8ONjQ+mW8eFCsVYMH1xFSo0aGsgGe0y2cyBE3DN1AW9eRnOXWT4C1PR
UyDlx8ACs87ojec+YRQFYk2/PbsU7CQiH1pteXvBHcbhiVUAvrtXtg6ANQ+7066P
IIduAZmlHcHb1BhyrSQbAtRllVLIp/l9IAkCSY9SvL0tjh/B5CaRBD5m2Taow5I/
KUI=
=4Lfc
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
boot reordering work
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSqal4ACgkQEsHwGGHe
VUr+pg/+JyqfzyymWYAaPUfwaFH7V8425p8thrZL+OSnDDoAZt5UnPpLB4lYZKWW
u2SlphNSLhuclZ7Wly1zkkPO1J8O88FRCFFBxONtnrQ4WqH2P7f2E6cHgzD4dQRF
RX/pNuLQ1TNYiOHvNvJ3xJvVdAGrcXFBbqupfSig+dQMBKIyuzGu/Jn7Cm0Q+HJK
j9WJWiGNJ+f8WOEbHiTdI89OFcPmUMe2nhtK/I/QIUoCBIiyp3jQ2RilZwY2V7Wu
U5kSQChqp7N+e275TLlOCFGvNW2htCZ5GPc2/nCOkfmnTDTwjVGX8jQr+EqC1pj1
WcueoTjBMw2Drs4/V9ItkGXYqmUE4CK03nGp6uZ2hA5Qo8mSAdzr59A3+I7BbHur
ulbm1i6ZZ0ip9Co080E0JS0F1CIL7ROIQ6HDQz4BUGQ1BbmIhNBmdj7yBJ20nTrr
L7EmwgDsOF2NhKpg5USGrPxJWBvc9ma72CAlHAiPVUgzFIR6Z5DN9TM8aWgZZPDt
RULC1/L/SI2FQmrMnCYhjO7Om0qJFk422cWCVjOA3D/lRo3toFEJ/XopxxXz9FZs
guAIJuFLjDun13hxS9PCGvRCkg2cdVsCykkg1ydAbg2ux99rPDAmmnwYPG7pvxiP
2W0gq43dbQAZlYjRx3gV5sHpUtPCsF+1Lz5jXkldRZJNXD1v1Fk=
=RZFV
-----END PGP SIGNATURE-----
Merge tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
On shutdown or kexec, the kernel tries to park the non-boot CPUs with an
INIT IPI. But the same code path is also used by the crash utility. If the
CPU which panics is not the boot CPU then it sends an INIT IPI to the boot
CPU which resets the machine. Prevent this by validating that the CPU which
runs the stop mechanism is the boot CPU. If not, leave the other CPUs in
HLT.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSqoEcTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYNSEACwo5zgibek27qeMvJGfNztm0qRa4mw
wN0qV31yaNcEfhqL8bMU8n3wvEA+pZBqhaU5fyalY+yxc29jI/j9eda5zR+Fi9e5
kVyFT2M0rVSDLFraoQeD+T/tSSK2MJtswF12ytY5mHzHMCb6Uy9fNCpUiQlB+i81
AcnlKQk9ifZXFdMJPj5E+E6l776T8NZPoYEdFgJloxaYOGTdFJDWDlryx4LD7Urz
Fx/ec8Ug/FYSPl2XzXHugvHjNefxKoomcZ3v3CSZonBcav7Gz6F06HAR5vVRWSHx
4Dlh6zdy+60YKBmkvpb+RJIBMo8aXclwT+tntaoJvGHZ+PNASO6JVz9PvmoNgfWK
Oy2n1K687qIOY6d+yxUZgbZpwXX5bG6kc0xbicUNigGagrYTfd83G5RAfwxNkqsY
23Qw4Ue8uxve4M8iM/FfxKIShuDBiLCIDDIrWDjEkvIAnr1pd+NPUv7kqOTI7Kz/
srNgcOwalypzuS93lgaN1yjRv1mmaPXhdhjy0DwGbC54bKgNzfq+7z75Ibn0dSFF
JUPFVjztB+ymnM6PJ1dR77SvPi+xOi60nw7L+Qu9US4yKkW0NeGiIWVsggNorbU6
UPFSE5gxwFD0w1EZ9W+IDeOZUNhjJUINZsn8txm+tb+oEqTIGRPHPOo0C1dBmLW9
AmDIeHljj0iWIw==
=DOCF
-----END PGP SIGNATURE-----
Merge tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
* Fix an uninitialized variable warning.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZKjUjwAKCRBKO3ySh0YR
pn92AQC4gY9GOyKcc/aiAd/t1u8gGxnFtcN06xh4TdVArMM4/AD/UtEKx9LYuaSF
pyhw5SfzxI555HfXkA8ci/D+BxguVQs=
=/vX1
-----END PGP SIGNATURE-----
Merge tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fix from Darrick Wong:
"Nothing exciting here, just getting rid of a gcc warning that I got
tired of seeing when I turn on gcov"
* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix uninit warning in xfs_growfs_data
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmSqNkIACgkQiiy9cAdy
T1GXsAwAhYUyjlXZLDsmO+9PjKhM9WRM1IO5myy3P396R0Tzq741f8LM7Lx08qc+
D1701gsnhIrvprem1HjtW6DZzCVnLdpBIYUEnwUr8eDqMpk1VFKug3xSVhIRMih3
Y30dHTgQ0aCLrrh5XHOWhBHJbpq7Wdlh3q0oi8I36Of8e6tGFNo2wI4ud7no4aIj
N222dWOs56FXtVAmgEAuc7U2A40ztMOp7FXrbzhK4FwD5kO+pFkqJcLjG6Bk10ph
Tyg3Wh2TnX+MviOY0xUaN0X50dSoSJPkSUYGkccrIcfVPwEoH7l6j0LNgAVyhG7K
f5EUbM7Td51a1Znj9wX6U9N0UfO/IOZRDFZ7ACckLBBBEzfKYCgYY5dWJ6aVxZHb
bB336f1ObvDiocEabS1SMa//sXUjpOy3Tg8etLCYJpqjWYE8nO7lERoBWGWXkUqy
xO86pGQjYLzkw16R11tzbplv+1HxoGwIuQnOubivv2prn++NZ4Zr2ohBeDlyJc1/
WwF42UfM
=F8D0
-----END PGP SIGNATURE-----
Merge tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZKmgXAAKCRDdBJ7gKXxA
joqDAP0V520Jy0cyJrRMvaQRFMqtVeDOdTpAue7ZOQHSi/LZnAD9EEAxDpYF/V4x
PO27ixXQ4Glm2iYgH7bDX7J73WiA3wg=
=JsYW
-----END PGP SIGNATURE-----
Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"16 hotfixes. Six are cc:stable and the remainder address post-6.4
issues"
The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.
* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: dhry: fix sleeping allocations inside non-preemptable section
kasan, slub: fix HW_TAGS zeroing with slub_debug
kasan: fix type cast in memory_is_poisoned_n
mailmap: add entries for Heiko Stuebner
mailmap: update manpage link
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
MAINTAINERS: add linux-next info
mailmap: add Markus Schneider-Pargmann
writeback: account the number of pages written back
mm: call arch_swap_restore() from do_swap_page()
squashfs: fix cache race with migration
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
docs: update ocfs2-devel mailing list address
MAINTAINERS: update ocfs2-devel mailing list address
mm: disable CONFIG_PER_VMA_LOCK until its fixed
fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few late arriving patches that missed the initial pull request.
It's mostly bug fixes (the dt-bindings is a fix for the initial pull).
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZKmvwSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishResAQCPDbBh
omMRBE+W+Vx2TgOJGjo/F+T1D2JjBhLIGpNVggEApJtgrQutAToiCU/qIP9GOTl7
evetzh5boMMuyD2s7ak=
=pi4v
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"A few late arriving patches that missed the initial pull request. It's
mostly bug fixes (the dt-bindings is a fix for the initial pull)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove unused function declaration
scsi: target: docs: Remove tcm_mod_builder.py
scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
scsi: dt-bindings: ufs: qcom: Fix ICE phandle
scsi: core: Simplify scsi_cdl_check_cmd()
scsi: isci: Fix comment typo
scsi: smartpqi: Replace one-element arrays with flexible-array members
scsi: target: tcmu: Replace strlcpy() with strscpy()
scsi: ncr53c8xx: Replace strlcpy() with strscpy()
scsi: lpfc: Fix lpfc_name struct packing
* xiic patch should have been in part 1 but slipped through
* mpc patch fixes a build regression from part 1
* nomadik is a fix which needed a rebase after part 1
-----BEGIN PGP SIGNATURE-----
iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmSoblYPHHdzYUBrZXJu
ZWwub3JnAAoJEBQN5MwUoCm2iG0P/RG3yGeV0PzQ2XbKPwIMAQURQ7NiqVVrAnUP
kkduOkPJ9QU+9Qjb/4SRCTxWFVFbqruyOZWB+5hmwHtv4MS3H1ulKaH9bat5bQsF
F28o5pjMVQEhjl+Y1jAtnSOlhprbTds6B1NdgbMFopnQ7ExlB1/RAt0rvZdgsfkt
WeKRQMlne0mpQ7mybp6qsN8ldYBSDu8AmZ0S/RSnCwRwl0AxYYxk5roQAfg0tbaU
N1zWUerNO7lAVoJnFsdeo0CMxK7t3QGHPSZGayargEF0KJHmxfzU14oZ6l28j6wq
mSazjyK5wTeqqa4mU835RO9Ko6zE42u5nM+ok8Ui6a4xWjvafNrjnZkIClMg2eC9
7ESZoz+Rtd3kCu3LiWXqPhs9cplHrRtA4M8E4gbTbsdMB8kFRaaiocjHfPksO8SG
QtFXHGV5AgtoBkCC3womZWvY17XOZf40S6NUFafZ8kZFg5WGCWP3rwUZhz/+frSM
QiIdxM+cER6WC6ADnZHdQyFl1fjLc4EdbwoN54E7DRBqB4s0wy+1vU+/qf9JdIuR
zIN0I2OFxnwFEiTfAlTX1RmYy7L/3dtP2Gk8og+W9CxHSwJsURhqIfz/MjNuoh7U
c11jS9QfETBj2GDjRoRxE0zj4Q2rhjdTwFhQfuYp0eI6UsDfYHHB7ql2FwO1+EW1
VkmkX38T
=xwp6
-----END PGP SIGNATURE-----
Merge tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- xiic patch should have been in the original pull but slipped through
- mpc patch fixes a build regression
- nomadik cleanup
* tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mpc: Drop unused variable
i2c: nomadik: Remove a useless call in the remove function
i2c: xiic: Don't try to handle more interrupt events after error
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Build:
- Allow to generate vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in the
debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZKb4mwAKCRCMstVUGiXM
g1QqAPwKZow/DhAzyN7KvzdNd+SojRGpUMl6RkVphY/9ntDqPAD+L3V5aXLTiC1L
8kUzdpRX5VMjqdR9U7TycUOi4QU40QA=
=dEF1
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next
Pull more perf tools updates from Namhyung Kim:
"These are remaining changes and fixes for this cycle.
Build:
- Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
and skip if the vmlinux has no BTF.
- Replace deprecated clang -target xxx option by --target=xxx.
perf record:
- Print event attributes with well known type and config symbols in
the debug output like below:
# perf record -e cycles,cpu-clock -C0 -vv true
<SNIP>
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1 (PERF_TYPE_SOFTWARE)
size 136
config 0 (PERF_COUNT_SW_CPU_CLOCK)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
read_format ID
disabled 1
inherit 1
freq 1
sample_id_all 1
exclude_guest 1
- Update AMD IBS event error message since it now support per-process
profiling but no priviledge filters.
$ sudo perf record -e ibs_op//k -C 0
Error:
AMD IBS doesn't support privilege filtering. Try again without
the privilege modifiers (like 'k') at the end.
perf lock contention:
- Support CSV style output using -x option
$ sudo perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
- Add --output option to save the data to a file not to be interfered
by other debug messages.
Test:
- Fix event parsing test on ARM where there's no raw PMU nor supports
PERF_PMU_CAP_EXTENDED_HW_TYPE.
- Update the lock contention test case for CSV output.
- Fix a segfault in the daemon command test.
Vendor events (JSON):
- Add has_event() to check if the given event is available on system
at runtime. On Intel machines, some transaction events may not be
present when TSC extensions are disabled.
- Update Intel event metrics.
Misc:
- Sort symbols by name using an external array of pointers instead of
a rbtree node in the symbol. This will save 16-bytes or 24-bytes
per symbol whether the sorting is actually requested or not.
- Fix unwinding DWARF callstacks using libdw when --symfs option is
used"
* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
perf test: Fix event parsing test on Arm
perf evsel amd: Fix IBS error message
perf: unwind: Fix symfs with libdw
perf symbol: Fix uninitialized return value in symbols__find_by_name()
perf test: Test perf lock contention CSV output
perf lock contention: Add --output option
perf lock contention: Add -x option for CSV style output
perf lock: Remove stale comments
perf vendor events intel: Update tigerlake to 1.13
perf vendor events intel: Update skylakex to 1.31
perf vendor events intel: Update skylake to 57
perf vendor events intel: Update sapphirerapids to 1.14
perf vendor events intel: Update icelakex to 1.21
perf vendor events intel: Update icelake to 1.19
perf vendor events intel: Update cascadelakex to 1.19
perf vendor events intel: Update meteorlake to 1.03
perf vendor events intel: Add rocketlake events/metrics
perf vendor metrics intel: Make transaction metrics conditional
perf jevents: Support for has_event function
...
Fixes for different bitmap pieces:
- lib/test_bitmap: increment failure counter properly
The tests that don't use expect_eq() macro to determine that a test is
failured must increment failed_tests explicitly.
- lib/bitmap: drop optimization of bitmap_{from,to}_arr64
bitmap_{from,to}_arr64() optimization is overly optimistic on 32-bit LE
architectures when it's wired to bitmap_copy_clear_tail().
- nodemask: Drop duplicate check in for_each_node_mask()
As the return value type of first_node() became unsigned, the node >= 0
became unnecessary.
- cpumask: fix function description kernel-doc notation
- MAINTAINERS: Add bits.h to the BITMAP API record
- MAINTAINERS: Add bitfield.h to the BITMAP API record
Add linux/bits.h and linux/bitfield.h for visibility
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmShrd0ACgkQsUSA/Tof
vsg5RAv/YyedOccCkLtd4D+l+jF/a1qd64u6HfONkYzdoSeKIBiAlKukpHOjJjky
ii5vSwEMzW34MA6wBylmY4078OXEwEu9SUsF/+5jGvqJ1ABCZkwFoMuAavRHkHhc
xE/es6XvpSbZ4U4GHnyaUbQCuEkpW21U644QdSWqjVPDHmo32EgtNG7hTaZfCRNh
VNa8H+tbgi5rQUMBeyxppJjyynGKwut6pX4OcOi2lq1onz8zzHx8H4AJC2v8Yg0r
2ZRS1FfAuk2Pvp7JFQOExcxf0x/Ph3zp8dqIgfo2pSMAi/0ZjnTkAPxCYIGsWG5u
TbfO3rHviHtKAN0q5/6xp939G69flW5xSica+IvtuXSIS/JbllwDN4mrexYxppWH
Euj/ixJe79Dn+jtOgG17p4cp70yAcM1IQfyWH25Q8Jej+C/gN8WB0UXD4K7jTXrn
9rGID28WbuclAWzJDmFQTpZutG1GHjM7rytit+if08gz065vUhTllhJC+vCHQojY
clS4E2DW
=nZF4
-----END PGP SIGNATURE-----
Merge tag 'bitmap-6.5-rc1' of https://github.com/norov/linux
Pull bitmap updates from Yury Norov:
"Fixes for different bitmap pieces:
- lib/test_bitmap: increment failure counter properly
The tests that don't use expect_eq() macro to determine that a test
is failured must increment failed_tests explicitly.
- lib/bitmap: drop optimization of bitmap_{from,to}_arr64
bitmap_{from,to}_arr64() optimization is overly optimistic
on 32-bit LE architectures when it's wired to
bitmap_copy_clear_tail().
- nodemask: Drop duplicate check in for_each_node_mask()
As the return value type of first_node() became unsigned, the node
>= 0 became unnecessary.
- cpumask: fix function description kernel-doc notation
- MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record
Add linux/bits.h and linux/bitfield.h for visibility"
* tag 'bitmap-6.5-rc1' of https://github.com/norov/linux:
MAINTAINERS: Add bitfield.h to the BITMAP API record
MAINTAINERS: Add bits.h to the BITMAP API record
cpumask: fix function description kernel-doc notation
nodemask: Drop duplicate check in for_each_node_mask()
lib/bitmap: drop optimization of bitmap_{from,to}_arr64
lib/test_bitmap: increment failure counter properly
The Smatch static checker reports the following warnings:
lib/dhry_run.c:38 dhry_benchmark() warn: sleeping in atomic context
lib/dhry_run.c:43 dhry_benchmark() warn: sleeping in atomic context
Indeed, dhry() does sleeping allocations inside the non-preemptable
section delimited by get_cpu()/put_cpu().
Fix this by using atomic allocations instead.
Add error handling, as atomic these allocations may fail.
Link: https://lkml.kernel.org/r/bac6d517818a7cd8efe217c1ad649fffab9cc371.1688568764.git.geert+renesas@glider.be
Fixes: 13684e966d ("lib: dhry: fix unstable smp_processor_id(_) usage")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/0469eb3a-02eb-4b41-b189-de20b931fa56@moroto.mountain
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit 946fa0dbf2 ("mm/slub: extend redzone check to extra allocated
kmalloc space than requested") added precise kmalloc redzone poisoning to
the slub_debug functionality.
However, this commit didn't account for HW_TAGS KASAN fully initializing
the object via its built-in memory initialization feature. Even though
HW_TAGS KASAN memory initialization contains special memory initialization
handling for when slub_debug is enabled, it does not account for in-object
slub_debug redzones. As a result, HW_TAGS KASAN can overwrite these
redzones and cause false-positive slub_debug reports.
To fix the issue, avoid HW_TAGS KASAN memory initialization when
slub_debug is enabled altogether. Implement this by moving the
__slub_debug_enabled check to slab_post_alloc_hook. Common slab code
seems like a more appropriate place for a slub_debug check anyway.
Link: https://lkml.kernel.org/r/678ac92ab790dba9198f9ca14f405651b97c8502.1688561016.git.andreyknvl@google.com
Fixes: 946fa0dbf2 ("mm/slub: extend redzone check to extra allocated kmalloc space than requested")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: kasan-dev@googlegroups.com
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Commit bb6e04a173 ("kasan: use internal prototypes matching gcc-13
builtins") introduced a bug into the memory_is_poisoned_n implementation:
it effectively removed the cast to a signed integer type after applying
KASAN_GRANULE_MASK.
As a result, KASAN started failing to properly check memset, memcpy, and
other similar functions.
Fix the bug by adding the cast back (through an additional signed integer
variable to make the code more readable).
Link: https://lkml.kernel.org/r/8c9e0251c2b8b81016255709d4ec42942dcaf018.1688431866.git.andreyknvl@google.com
Fixes: bb6e04a173 ("kasan: use internal prototypes matching gcc-13 builtins")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
I am going to lose my vrull.eu address at the end of july, and while
adding it to mailmap I also realised that there are more old addresses
from me dangling, so update .mailmap for all of them.
Link: https://lkml.kernel.org/r/20230704163919.1136784-3-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Patch series "Update .mailmap for my work address and fix manpage".
While updating mailmap for the going-away address, I also found that on
current systems the manpage linked from the header comment changed.
And in fact it looks like the git mailmap feature got its own manpage.
This patch (of 2):
On recent systems the git-shortlog manpage only tells people to
See gitmailmap(5)
So instead of sending people on a scavenger hunt, put that info into the
header directly. Though keep the old reference around for older systems.
Link: https://lkml.kernel.org/r/20230704163919.1136784-1-heiko@sntech.de
Link: https://lkml.kernel.org/r/20230704163919.1136784-2-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
commit dd0ff4d12d ("bootmem: remove the vmemmap pages from kmemleak in
put_page_bootmem") fix an overlaps existing problem of kmemleak. But the
problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in
this case, free_bootmem_page() will call free_reserved_page() directly.
Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when
HAVE_BOOTMEM_INFO_NODE is disabled.
Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com
Fixes: f41f2ed43c ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Add linux-next info to MAINTAINERS for ease of finding this data.
Link: https://lkml.kernel.org/r/20230704054410.12527-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1. Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.
Dave added:
: XFS is the only filesystem this would affect, right? AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
:
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.
Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
Fixes: 793917d997 ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>