OpenCloudOS-Kernel/fs
Yang Shi 16380f52b7 mm: page_ref: remove folio_try_get_rcu()
commit fa2690af573dfefb47ba6eef888797a64b6b5f3c upstream.

The below bug was reported on a non-SMP kernel:

[  275.267158][ T4335] ------------[ cut here ]------------
[  275.267949][ T4335] kernel BUG at include/linux/page_ref.h:275!
[  275.268526][ T4335] invalid opcode: 0000 [#1] KASAN PTI
[  275.269001][ T4335] CPU: 0 PID: 4335 Comm: trinity-c3 Not tainted 6.7.0-rc4-00061-gefa7df3e3bb5 #1
[  275.269787][ T4335] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  275.270679][ T4335] RIP: 0010:try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[  275.272813][ T4335] RSP: 0018:ffffc90005dcf650 EFLAGS: 00010202
[  275.273346][ T4335] RAX: 0000000000000246 RBX: ffffea00066e0000 RCX: 0000000000000000
[  275.274032][ T4335] RDX: fffff94000cdc007 RSI: 0000000000000004 RDI: ffffea00066e0034
[  275.274719][ T4335] RBP: ffffea00066e0000 R08: 0000000000000000 R09: fffff94000cdc006
[  275.275404][ T4335] R10: ffffea00066e0037 R11: 0000000000000000 R12: 0000000000000136
[  275.276106][ T4335] R13: ffffea00066e0034 R14: dffffc0000000000 R15: ffffea00066e0008
[  275.276790][ T4335] FS:  00007fa2f9b61740(0000) GS:ffffffff89d0d000(0000) knlGS:0000000000000000
[  275.277570][ T4335] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  275.278143][ T4335] CR2: 00007fa2f6c00000 CR3: 0000000134b04000 CR4: 00000000000406f0
[  275.278833][ T4335] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  275.279521][ T4335] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  275.280201][ T4335] Call Trace:
[  275.280499][ T4335]  <TASK>
[ 275.280751][ T4335] ? die (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:434 arch/x86/kernel/dumpstack.c:447)
[ 275.281087][ T4335] ? do_trap (arch/x86/kernel/traps.c:112 arch/x86/kernel/traps.c:153)
[ 275.281463][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.281884][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.282300][ T4335] ? do_error_trap (arch/x86/kernel/traps.c:174)
[ 275.282711][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.283129][ T4335] ? handle_invalid_op (arch/x86/kernel/traps.c:212)
[ 275.283561][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.283990][ T4335] ? exc_invalid_op (arch/x86/kernel/traps.c:264)
[ 275.284415][ T4335] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568)
[ 275.284859][ T4335] ? try_get_folio (include/linux/page_ref.h:275 (discriminator 3) mm/gup.c:79 (discriminator 3))
[ 275.285278][ T4335] try_grab_folio (mm/gup.c:148)
[ 275.285684][ T4335] __get_user_pages (mm/gup.c:1297 (discriminator 1))
[ 275.286111][ T4335] ? __pfx___get_user_pages (mm/gup.c:1188)
[ 275.286579][ T4335] ? __pfx_validate_chain (kernel/locking/lockdep.c:3825)
[ 275.287034][ T4335] ? mark_lock (kernel/locking/lockdep.c:4656 (discriminator 1))
[ 275.287416][ T4335] __gup_longterm_locked (mm/gup.c:1509 mm/gup.c:2209)
[ 275.288192][ T4335] ? __pfx___gup_longterm_locked (mm/gup.c:2204)
[ 275.288697][ T4335] ? __pfx_lock_acquire (kernel/locking/lockdep.c:5722)
[ 275.289135][ T4335] ? __pfx___might_resched (kernel/sched/core.c:10106)
[ 275.289595][ T4335] pin_user_pages_remote (mm/gup.c:3350)
[ 275.290041][ T4335] ? __pfx_pin_user_pages_remote (mm/gup.c:3350)
[ 275.290545][ T4335] ? find_held_lock (kernel/locking/lockdep.c:5244 (discriminator 1))
[ 275.290961][ T4335] ? mm_access (kernel/fork.c:1573)
[ 275.291353][ T4335] process_vm_rw_single_vec+0x142/0x360
[ 275.291900][ T4335] ? __pfx_process_vm_rw_single_vec+0x10/0x10
[ 275.292471][ T4335] ? mm_access (kernel/fork.c:1573)
[ 275.292859][ T4335] process_vm_rw_core+0x272/0x4e0
[ 275.293384][ T4335] ? hlock_class (arch/x86/include/asm/bitops.h:227 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 kernel/locking/lockdep.c:228)
[ 275.293780][ T4335] ? __pfx_process_vm_rw_core+0x10/0x10
[ 275.294350][ T4335] process_vm_rw (mm/process_vm_access.c:284)
[ 275.294748][ T4335] ? __pfx_process_vm_rw (mm/process_vm_access.c:259)
[ 275.295197][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
[ 275.295634][ T4335] __x64_sys_process_vm_readv (mm/process_vm_access.c:291)
[ 275.296139][ T4335] ? syscall_enter_from_user_mode (kernel/entry/common.c:94 kernel/entry/common.c:112)
[ 275.296642][ T4335] do_syscall_64 (arch/x86/entry/common.c:51 (discriminator 1) arch/x86/entry/common.c:82 (discriminator 1))
[ 275.297032][ T4335] ? __task_pid_nr_ns (include/linux/rcupdate.h:306 (discriminator 1) include/linux/rcupdate.h:780 (discriminator 1) kernel/pid.c:504 (discriminator 1))
[ 275.297470][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
[ 275.297988][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.298389][ T4335] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4300 kernel/locking/lockdep.c:4359)
[ 275.298906][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.299304][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.299703][ T4335] ? do_syscall_64 (arch/x86/include/asm/cpufeature.h:171 arch/x86/entry/common.c:97)
[ 275.300115][ T4335] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129)

This BUG is the VM_BUG_ON(!in_atomic() && !irqs_disabled()) assertion in
folio_ref_try_add_rcu() for non-SMP kernel.

The process_vm_readv() calls GUP to pin the THP. An optimization for
pinning THP instroduced by commit 57edfcfd34 ("mm/gup: accelerate thp
gup even for "pages != NULL"") calls try_grab_folio() to pin the THP,
but try_grab_folio() is supposed to be called in atomic context for
non-SMP kernel, for example, irq disabled or preemption disabled, due to
the optimization introduced by commit e286781d5f ("mm: speculative
page references").

The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") is not actually the root cause although it was bisected to.
It just makes the problem exposed more likely.

The follow up discussion suggested the optimization for non-SMP kernel
may be out-dated and not worth it anymore [1].  So removing the
optimization to silence the BUG.

However calling try_grab_folio() in GUP slow path actually is
unnecessary, so the following patch will clean this up.

[1] https://lore.kernel.org/linux-mm/821cf1d6-92b9-4ac4-bacc-d8f2364ac14f@paulmck-laptop/

Link: https://lkml.kernel.org/r/20240625205350.1777481-1-yang@os.amperecomputing.com
Fixes: 57edfcfd34 ("mm/gup: accelerate thp gup even for "pages != NULL"")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Cc: <stable@vger.kernel.org>	[6.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-07-25 09:50:56 +02:00
..
9p 9p: add missing locking around taking dentry fid list 2024-06-16 13:47:37 +02:00
adfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
affs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
afs mm: page_ref: remove folio_try_get_rcu() 2024-07-25 09:50:56 +02:00
autofs v6.6-vfs.autofs 2023-08-28 11:39:14 -07:00
befs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
bfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
btrfs btrfs: qgroup: fix quota root leak after quota disable failure 2024-07-25 09:50:51 +02:00
cachefiles cachefiles: make on-demand read killable 2024-07-25 09:50:45 +02:00
ceph ceph: redirty page before returning AOP_WRITEPAGE_ACTIVATE 2024-04-27 17:11:29 +02:00
coda v6.6-vfs.ctime 2023-08-28 09:31:32 -07:00
configfs configfs: convert to ctime accessor functions 2023-07-13 10:28:05 +02:00
cramfs v6.6-vfs.super 2023-08-28 11:04:18 -07:00
crypto
debugfs debugfs: fix automount d_fsdata usage 2024-01-20 11:51:37 +01:00
devpts v6.6-vfs.misc 2023-08-28 10:17:14 -07:00
dlm dlm: fix user space lock decision to copy lvb 2024-06-12 11:11:38 +02:00
ecryptfs ecryptfs: Fix buffer size for tag 66 packet 2024-06-12 11:11:31 +02:00
efivarfs efivarfs: Request at most 512 bytes for variable names 2024-03-06 14:48:41 +00:00
efs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
erofs erofs: ensure m_llen is reset to 0 if metadata is invalid 2024-07-25 09:50:54 +02:00
exfat exfat: support handle zero-size directory 2023-11-28 17:19:44 +00:00
exportfs exportfs: remove kernel-doc warnings in exportfs 2023-08-29 17:45:22 -04:00
ext2 quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:19:46 -04:00
ext4 ext4: avoid ptr null pointer dereference 2024-07-18 13:21:25 +02:00
f2fs f2fs: Add inline to f2fs_build_fault_attr() stub 2024-07-11 12:49:15 +02:00
fat fat: fix uninitialized field in nostale filehandles 2024-04-03 15:28:20 +02:00
freevxfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
fscache netfs, fscache: Prevent Oops in fscache_put_cache() 2024-01-31 16:19:01 -08:00
fuse fuse: fix leaked ENOSYS error on first statx call 2024-04-27 17:11:42 +02:00
gfs2 gfs2: Fix NULL pointer dereference in gfs2_log_flush 2024-07-05 09:33:52 +02:00
hfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
hfsplus hfsplus: fix uninit-value in copy_name 2024-07-25 09:50:56 +02:00
hostfs hostfs: convert to ctime accessor functions 2023-07-24 10:30:00 +02:00
hpfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
hugetlbfs mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE 2024-02-23 09:25:16 +01:00
iomap iomap: Fix iomap_adjust_read_range for plen calculation 2024-07-25 09:50:46 +02:00
isofs isofs: handle CDs with bad root inode but good Joliet root directory 2024-04-13 13:07:34 +02:00
jbd2 jbd2: fix soft lockup in journal_finish_inode_data_buffers() 2024-01-20 11:51:43 +01:00
jffs2 jffs2: Fix potential illegal address access in jffs2_free_inode 2024-07-11 12:49:09 +02:00
jfs jfs: xattr: fix buffer overflow for invalid xattr 2024-06-21 14:38:24 +02:00
kernfs kernfs: RCU protect kernfs_nodes and avoid kernfs_idr_lock in kernfs_find_and_get_node_by_id() 2024-04-13 13:07:38 +02:00
lockd SUNRPC: Add enum svc_auth_status 2023-08-29 17:45:22 -04:00
minix for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
netfs netfs: Only call folio_start_fscache() one time for each folio 2023-09-18 12:03:46 -07:00
nfs nfs: don't invalidate dentries on transient errors 2024-07-25 09:50:45 +02:00
nfs_common
nfsd knfsd: LOOKUP can return an illegal error value 2024-06-21 14:38:40 +02:00
nilfs2 nilfs2: fix kernel bug on rename operation of broken directory 2024-07-18 13:21:25 +02:00
nls nls: Hide new NLS_UCS2_UTILS 2023-08-31 12:07:34 -05:00
notify fanotify: limit reporting of event with non-decodeable file handles 2023-10-19 16:19:20 +02:00
ntfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
ntfs3 fs/ntfs3: Mark volume as dirty if xattr is broken 2024-07-11 12:49:19 +02:00
ocfs2 ocfs2: fix DIO failure due to insufficient transaction credits 2024-07-05 09:33:54 +02:00
omfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
openpromfs openpromfs: finish conversion to the new mount API 2024-06-12 11:11:30 +02:00
orangefs orangefs: fix out-of-bounds fsid access 2024-07-11 12:49:08 +02:00
overlayfs ovl: fix encoding fid for lower only root 2024-06-27 13:49:12 +02:00
proc fs/proc: fix softlockup in __read_vmcore 2024-06-21 14:38:41 +02:00
pstore pstore/zone: Add a null pointer check to the psz_kmsg_read 2024-04-13 13:07:31 +02:00
qnx4 for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
qnx6 for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
quota quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:19:46 -04:00
ramfs ramfs: convert to ctime accessor functions 2023-07-24 10:30:04 +02:00
reiserfs quota: Properly annotate i_dquot arrays with __rcu 2024-03-26 18:19:46 -04:00
romfs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
smb mm: page_ref: remove folio_try_get_rcu() 2024-07-25 09:50:56 +02:00
squashfs Squashfs: check the inode number is not the invalid value of zero 2024-05-02 16:32:41 +02:00
sysfs fs: sysfs: Fix reference leak in sysfs_break_active_protection() 2024-04-27 17:11:41 +02:00
sysv sysv: don't call sb_bread() with pointers_lock held 2024-04-13 13:07:34 +02:00
tracefs eventfs: Update all the eventfs_inodes from the events descriptor 2024-06-21 14:38:22 +02:00
ubifs ubifs: Set page uptodate in the correct place 2024-04-03 15:28:20 +02:00
udf udf: udftime: prevent overflow in udf_disk_stamp_to_time() 2024-06-27 13:49:04 +02:00
ufs for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
unicode
vboxsf vboxsf: explicitly deny setlease attempts 2024-05-17 12:02:13 +02:00
verity fsverity: use register_sysctl_init() to avoid kmemleak warning 2024-06-16 13:47:33 +02:00
xfs xfs: allow cross-linking special files without project quota 2024-06-21 14:38:45 +02:00
zonefs zonefs: Improve error handling 2024-02-23 09:25:13 +01:00
Kconfig for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
Kconfig.binfmt riscv: support the elf-fdpic binfmt loader 2023-08-23 14:17:43 -07:00
Makefile fs: add CONFIG_BUFFER_HEAD 2023-08-02 09:13:09 -06:00
aio.c fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion 2024-04-03 15:28:44 +02:00
anon_inodes.c
attr.c v6.6-vfs.misc 2023-08-28 10:17:14 -07:00
bad_inode.c fs: drop the timespec64 argument from update_time 2023-08-11 09:04:57 +02:00
binfmt_elf.c Merge branch 'expand-stack' 2023-06-28 20:35:21 -07:00
binfmt_elf_fdpic.c fs: binfmt_elf_efpic: fix personality for ELF-FDPIC 2023-09-29 17:20:45 -07:00
binfmt_elf_test.c
binfmt_flat.c
binfmt_misc.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
binfmt_script.c
buffer.c iomap: add a workaround for racy i_size updates on block devices 2023-09-25 08:55:00 -07:00
char_dev.c
compat_binfmt_elf.c
coredump.c v6.5/vfs.misc 2023-06-26 09:50:21 -07:00
d_path.c
dax.c mm: convert DAX lock/unlock page to lock/unlock folio 2024-01-10 17:16:53 +01:00
dcache.c fs: better handle deep ancestor chains in is_subdir() 2024-07-25 09:50:54 +02:00
direct-io.c - Yosry Ahmed brought back some cgroup v1 stats in OOM logs. 2023-06-28 10:28:11 -07:00
drop_caches.c fs: drop_caches: draining pages before dropping caches 2023-08-18 10:12:11 -07:00
eventfd.c eventfd: prevent underflow for eventfd semaphores 2023-07-11 11:41:34 +02:00
eventpoll.c epoll: be better about file lifetimes 2024-06-12 11:11:30 +02:00
exec.c exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack() 2024-04-03 15:28:55 +02:00
fcntl.c fs: Fix rw_hint validation 2024-03-26 18:19:17 -04:00
fhandle.c do_sys_name_to_handle(): use kzalloc() to fix kernel-infoleak 2024-03-26 18:19:15 -04:00
file.c fs/file: fix the check in find_next_fd() 2024-07-25 09:50:45 +02:00
file_table.c fs: use __fput_sync in close(2) 2023-08-08 19:36:51 +02:00
filesystems.c
fs-writeback.c fs/writeback: bail out if there is no more inodes for IO and queued once 2024-06-27 13:49:00 +02:00
fs_context.c fs: factor out vfs_parse_monolithic_sep() helper 2023-10-12 18:53:36 +03:00
fs_parser.c
fs_pin.c
fs_struct.c kill do_each_thread() 2023-08-21 13:46:25 -07:00
fs_types.c
fsopen.c fs: add FSCONFIG_CMD_CREATE_EXCL 2023-08-14 18:48:02 +02:00
init.c
inode.c filemap: add a per-mapping stable writes flag 2023-12-03 07:33:03 +01:00
internal.h for-6.6/block-2023-08-28 2023-08-29 20:21:42 -07:00
ioctl.c lsm: new security_file_ioctl_compat() hook 2024-01-31 16:18:54 -08:00
kernel_read_file.c fs: Fix kernel-doc warnings 2023-08-19 12:12:12 +02:00
libfs.c fs: new accessor methods for atime and mtime 2024-01-05 15:19:40 +01:00
locks.c filelock: Remove locks reliably when fcntl/close race is detected 2024-07-25 09:50:39 +02:00
mbcache.c
mnt_idmapping.c
mount.h
mpage.c
namei.c rename(): fix the locking of subdirectories 2024-01-31 16:18:57 -08:00
namespace.c fs: relax mount_setattr() permission checks 2024-02-23 09:25:15 +01:00
nsfs.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
open.c ftruncate: pass a signed offset 2024-07-05 09:34:04 +02:00
pipe.c fs/pipe: Fix lockdep false-positive in watchqueue pipe_write() 2024-04-10 16:35:57 +02:00
pnode.c
pnode.h
posix_acl.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
proc_namespace.c
read_write.c fs: Fix one kernel-doc comment 2023-08-15 08:32:45 +02:00
readdir.c vfs: get rid of old '->iterate' directory operation 2023-08-06 15:08:35 +02:00
remap_range.c
select.c fs/select: rework stack allocation hack for clang 2024-03-26 18:19:17 -04:00
seq_file.c
signalfd.c
splice.c - Some swap cleanups from Ma Wupeng ("fix WARN_ON in add_to_avail_list") 2023-08-29 14:25:26 -07:00
stack.c fs: convert to ctime accessor functions 2023-07-13 10:28:04 +02:00
stat.c fs: Pass AT_GETATTR_NOSEC flag to getattr interface function 2023-12-03 07:33:03 +01:00
statfs.c
super.c fs: export sget_dev() 2023-08-31 12:47:15 +02:00
sync.c
sysctls.c
timerfd.c
userfaultfd.c Fix userfaultfd_api to return EINVAL as expected 2024-07-18 13:21:22 +02:00
utimes.c
xattr.c tmpfs,xattr: GFP_KERNEL_ACCOUNT for simple xattrs 2023-08-22 10:57:46 +02:00