Commit Graph

739 Commits

Author SHA1 Message Date
Jeff Layton c62ebd3501 f2fs: convert to ctime accessor functions
In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-41-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-24 10:29:58 +02:00
Linus Torvalds 73a3fcdaa7 f2fs update for 6.5-rc1
In this cycle, we've mainly investigated the zoned block device support along
 with patches such as correcting write pointers between f2fs and storage, adding
 asynchronous zone reset flow, and managing the number of open zones. Other than
 them, f2fs adds another mount option, "errors=x" to specify how to handle when
 it detects an unexpected behavior at runtime.
 
 Enhancement:
  - support errors=remount-ro|continue|panic mountoption
  - enforce some inode flag policies
  - allow .tmp compression given extensions
  - add some ioctls to manage the f2fs compression
  - improve looped node chain flow
  - avoid issuing small-sized discard commands during checkpoint
  - implement an asynchronous zone reset
 
 Bug fix:
  - fix deadlock in xattr and inode page lock
  - fix and add sanity check in some error paths
  - fix to avoid NULL pointer dereference f2fs_write_end_io() along with put_super
  - set proper flags to quota files
  - fix potential deadlock due to unpaired node_write lock use
  - fix over-estimating free section during FG GC
  - fix the wrong condition to determine atomic context
 
 As usual, also there are a number of patches having code refactoring and minor
 clean-ups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmSlracACgkQQBSofoJI
 UNJETA//RhFkVGbKlZ7cAuB6xwaLPmRi+aUSn3/rBbdq6CjBOslClEYwW981Fe19
 q6VSIX7Zt5RKZg2+meHLhE28rGknLaZ4wFC/F4BDjoykN119g+KXkypjf+OVbatK
 zLImOzXVKjAWJevpYCENcUQY/xI2kNtg3csp6kq9GQoZCD2US/wzzI0QF6xCQ1q3
 WZpHHy2fi6Lry2xGEbDLKlxg9e5nDqhOSp6S/taF+w+RyUMlTcoIU2fIWT0iZ2kI
 taFe+PWHHkJg2oJcgZ+hx5ZACGb1BWdHg8n1N0MK/IiNbA6CWe+iSM9T0TtctM5V
 Gkz4233bFR976O27Bu2znqil+AH3ECZ/F2HbRbh5vTFJIhUMwNQ1GJTgDqE5Vf8h
 R4W/ejbBSLM/g4TK5boyWrrKpAAVmhFhSj+OBboIlFFKchP4RKaqL7Az+m0tGSD3
 0uCVAFtzmctBmgJ9ko+dxwwFwbbGiG6MDYeGIODMdqFQMQrcGqPoOrcWbj2hPoaW
 LRz/OzA8N0fQUfvCH6E31Ypd6cBFrG++FgtFA4lsv10KNsJUOx4MbEWxF5HobV3t
 axpcRsOOb95hAmqenY3sWyiR+IcIEAYlzx6D6QEbNPe47fuLxr9jx52Ollw6RDfj
 RvkaxzAIDeESMWWNftzJ8r8Wt6RzoBv5JjBkFIMbCz28V3v9R44=
 =TAG3
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, we've mainly investigated the zoned block device
  support along with patches such as correcting write pointers between
  f2fs and storage, adding asynchronous zone reset flow, and managing
  the number of open zones.

  Other than them, f2fs adds another mount option, "errors=x" to specify
  how to handle when it detects an unexpected behavior at runtime.

  Enhancements:
   - support 'errors=remount-ro|continue|panic' mount option
   - enforce some inode flag policies
   - allow .tmp compression given extensions
   - add some ioctls to manage the f2fs compression
   - improve looped node chain flow
   - avoid issuing small-sized discard commands during checkpoint
   - implement an asynchronous zone reset

  Bug fixes:
   - fix deadlock in xattr and inode page lock
   - fix and add sanity check in some error paths
   - fix to avoid NULL pointer dereference f2fs_write_end_io() along
     with put_super
   - set proper flags to quota files
   - fix potential deadlock due to unpaired node_write lock use
   - fix over-estimating free section during FG GC
   - fix the wrong condition to determine atomic context

  As usual, also there are a number of patches with code refactoring and
  minor clean-ups"

* tag 'f2fs-for-6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
  f2fs: fix to do sanity check on direct node in truncate_dnode()
  f2fs: only set release for file that has compressed data
  f2fs: fix compile warning in f2fs_destroy_node_manager()
  f2fs: fix error path handling in truncate_dnode()
  f2fs: fix deadlock in i_xattr_sem and inode page lock
  f2fs: remove unneeded page uptodate check/set
  f2fs: update mtime and ctime in move file range method
  f2fs: compress tmp files given extension
  f2fs: refactor struct f2fs_attr macro
  f2fs: convert to use sbi directly
  f2fs: remove redundant assignment to variable err
  f2fs: do not issue small discard commands during checkpoint
  f2fs: check zone write pointer points to the end of zone
  f2fs: add f2fs_ioc_get_compress_blocks
  f2fs: cleanup MIN_INLINE_XATTR_SIZE
  f2fs: add helper to check compression level
  f2fs: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method
  f2fs: do more sanity check on inode
  f2fs: compress: fix to check validity of i_compress_flag field
  f2fs: add sanity compress level check for compressed file
  ...
2023-07-05 14:14:37 -07:00
Sheng Yong dde38c03b3 f2fs: cleanup MIN_INLINE_XATTR_SIZE
Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26 06:07:10 -07:00
Sheng Yong c571fbb5b5 f2fs: add helper to check compression level
This patch adds a helper function to check if compression level is
valid.

Signed-off-by: Sheng Yong <shengyong@oppo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26 06:07:09 -07:00
Jaegeuk Kim 00e120b5e4 f2fs: assign default compression level
Let's avoid any confusion from assigning compress_level=0 for LZ4HC and ZSTD.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26 06:07:09 -07:00
Chao Yu ccf3ff2b30 f2fs: introduce F2FS_QUOTA_DEFAULT_FL for cleanup
This patch adds F2FS_QUOTA_DEFAULT_FL to include two default flags:
F2FS_NOATIME_FL and F2FS_IMMUTABLE_FL, and use it to clean up codes.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-26 06:07:08 -07:00
Chao Yu 5079e1c0c8 f2fs: avoid dead loop in f2fs_issue_checkpoint()
generic/082 reports a bug as below:

__schedule+0x332/0xf60
schedule+0x6f/0xf0
schedule_timeout+0x23b/0x2a0
wait_for_completion+0x8f/0x140
f2fs_issue_checkpoint+0xfe/0x1b0
f2fs_sync_fs+0x9d/0xb0
sync_filesystem+0x87/0xb0
dquot_load_quota_sb+0x41b/0x460
dquot_load_quota_inode+0xa5/0x130
dquot_quota_on+0x4b/0x60
f2fs_quota_on+0xe3/0x1b0
do_quotactl+0x483/0x700
__x64_sys_quotactl+0x15c/0x310
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc

The root casue is race case as below:

Thread A			Kworker			IRQ
- write()
: write data to quota.user file

				- writepages
				 - f2fs_submit_page_write
				  - __is_cp_guaranteed return false
				  - inc_page_count(F2FS_WB_DATA)
				 - submit_bio
- quotactl(Q_QUOTAON)
 - f2fs_quota_on
  - dquot_quota_on
   - dquot_load_quota_inode
    - vfs_setup_quota_inode
    : inode->i_flags |= S_NOQUOTA
							- f2fs_write_end_io
							 - __is_cp_guaranteed return true
							 - dec_page_count(F2FS_WB_CP_DATA)
    - dquot_load_quota_sb
     - f2fs_sync_fs
      - f2fs_issue_checkpoint
       - do_checkpoint
        - f2fs_wait_on_all_pages(F2FS_WB_CP_DATA)
        : loop due to F2FS_WB_CP_DATA count is negative

Calling filemap_fdatawrite() and filemap_fdatawait() to keep all data
clean before quota file setup.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12 13:37:36 -07:00
Chao Yu 20872584b8 f2fs: fix to drop all dirty meta/node pages during umount()
For cp error case, there will be dirty meta/node pages remained after
f2fs_write_checkpoint() in f2fs_put_super(), drop them explicitly, and
do sanity check on reference count of dirty pages and inflight IOs.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12 13:04:09 -07:00
Chao Yu 901c12d144 f2fs: flush error flags in workqueue
In IRQ context, it wakes up workqueue to record errors into on-disk
superblock fields rather than in-memory fields.

Fixes: 1aa161e431 ("f2fs: fix scheduling while atomic in decompression path")
Fixes: 95fa90c9e5 ("f2fs: support recording errors into superblock")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12 13:04:08 -07:00
Chao Yu 458c15dfbc f2fs: don't reset unchangable mount option in f2fs_remount()
syzbot reports a bug as below:

general protection fault, probably for non-canonical address 0xdffffc0000000009: 0000 [#1] PREEMPT SMP KASAN
RIP: 0010:__lock_acquire+0x69/0x2000 kernel/locking/lockdep.c:4942
Call Trace:
 lock_acquire+0x1e3/0x520 kernel/locking/lockdep.c:5691
 __raw_write_lock include/linux/rwlock_api_smp.h:209 [inline]
 _raw_write_lock+0x2e/0x40 kernel/locking/spinlock.c:300
 __drop_extent_tree+0x3ac/0x660 fs/f2fs/extent_cache.c:1100
 f2fs_drop_extent_tree+0x17/0x30 fs/f2fs/extent_cache.c:1116
 f2fs_insert_range+0x2d5/0x3c0 fs/f2fs/file.c:1664
 f2fs_fallocate+0x4e4/0x6d0 fs/f2fs/file.c:1838
 vfs_fallocate+0x54b/0x6b0 fs/open.c:324
 ksys_fallocate fs/open.c:347 [inline]
 __do_sys_fallocate fs/open.c:355 [inline]
 __se_sys_fallocate fs/open.c:353 [inline]
 __x64_sys_fallocate+0xbd/0x100 fs/open.c:353
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is race condition as below:
- since it tries to remount rw filesystem, so that do_remount won't
call sb_prepare_remount_readonly to block fallocate, there may be race
condition in between remount and fallocate.
- in f2fs_remount(), default_options() will reset mount option to default
one, and then update it based on result of parse_options(), so there is
a hole which race condition can happen.

Thread A			Thread B
- f2fs_fill_super
 - parse_options
  - clear_opt(READ_EXTENT_CACHE)

- f2fs_remount
 - default_options
  - set_opt(READ_EXTENT_CACHE)
				- f2fs_fallocate
				 - f2fs_insert_range
				  - f2fs_drop_extent_tree
				   - __drop_extent_tree
				    - __may_extent_tree
				     - test_opt(READ_EXTENT_CACHE) return true
				    - write_lock(&et->lock) access NULL pointer
 - parse_options
  - clear_opt(READ_EXTENT_CACHE)

Cc: <stable@vger.kernel.org>
Reported-by: syzbot+d015b6c2fbb5c383bf08@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/20230522124203.3838360-1-chao@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12 13:04:08 -07:00
Chao Yu 90b7c4b748 f2fs: fix to set noatime and immutable flag for quota file
We should set noatime bit for quota files, since no one cares about
atime of quota file, and we should set immutalbe bit as well, due to
nobody should write to the file through exported interfaces.

Meanwhile this patch use inode_lock to avoid race condition during
inode->i_flags, f2fs_inode->i_flags update.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-06-12 13:04:08 -07:00
Christoph Hellwig 05bdb99653 block: replace fmode_t with a block-specific type for block open flags
The only overlap between the block open flags mapped into the fmode_t and
other uses of fmode_t are FMODE_READ and FMODE_WRITE.  Define a new
blk_mode_t instead for use in blkdev_get_by_{dev,path}, ->open and
->ioctl and stop abusing fmode_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-28-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:05 -06:00
Christoph Hellwig 81b1fb7d17 fs: remove sb->s_mode
There is no real need to store the open mode in the super_block now.
It is only used by f2fs, which can easily recalculate it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230608110258.189493-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Christoph Hellwig 2736e8eeb0 block: use the holder as indication for exclusive opens
The current interface for exclusive opens is rather confusing as it
requires both the FMODE_EXCL flag and a holder.  Remove the need to pass
FMODE_EXCL and just key off the exclusive open off a non-NULL holder.

For blkdev_put this requires adding the holder argument, which provides
better debug checking that only the holder actually releases the hold,
but at the same time allows removing the now superfluous mode argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Acked-by: David Sterba <dsterba@suse.com>		[btrfs]
Acked-by: Jack Wang <jinpu.wang@ionos.com>		[rnbd]
Link: https://lore.kernel.org/r/20230608110258.189493-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-12 08:04:04 -06:00
Christoph Hellwig 0718afd47f block: introduce holder ops
Add a new blk_holder_ops structure, which is passed to blkdev_get_by_* and
installed in the block_device for exclusive claims.  It will be used to
allow the block layer to call back into the user of the block device for
thing like notification of a removed device or a device resize.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Link: https://lore.kernel.org/r/20230601094459.1350643-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-06-05 10:53:04 -06:00
Chao Yu b62e71be21 f2fs: support errors=remount-ro|continue|panic mountoption
This patch supports errors=remount-ro|continue|panic mount option
for f2fs.

f2fs behaves as below in three different modes:
mode			continue	remount-ro	panic
access ops		normal		noraml		N/A
syscall errors		-EIO		-EROFS		N/A
mount option		rw		ro		N/A
pending dir write	keep		keep		N/A
pending non-dir write	drop		keep		N/A
pending node write	drop		keep		N/A
pending meta write	keep		keep		N/A

By default it uses "continue" mode.

[Yangtao helps to clean up function's name]
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-05-08 11:18:04 -07:00
Jaegeuk Kim 2e2c6e9b72 f2fs: remove power-of-two limitation of zoned device
In f2fs, there's no reason to force po2.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-24 11:03:10 -07:00
Chao Yu e1bb7d3d9c f2fs: fix to recover quota data correctly
With -O quota mkfs option, xfstests generic/417 fails due to fsck detects
data corruption on quota inodes.

[ASSERT] (fsck_chk_quota_files:2051)  --> Quota file is missing or invalid quota file content found.

The root cause is there is a hole f2fs doesn't hold quota inodes,
so all recovered quota data will be dropped due to SBI_POR_DOING
flag was set.
- f2fs_fill_super
 - f2fs_recover_orphan_inodes
  - f2fs_enable_quota_files
  - f2fs_quota_off_umount
<--- quota inodes were dropped --->
 - f2fs_recover_fsync_data
  - f2fs_enable_quota_files
  - f2fs_quota_off_umount

This patch tries to eliminate the hole by holding quota inodes
during entire recovery flow as below:
- f2fs_fill_super
 - f2fs_recover_quota_begin
 - f2fs_recover_orphan_inodes
 - f2fs_recover_fsync_data
 - f2fs_recover_quota_end

Then, recovered quota data can be persisted after SBI_POR_DOING
is cleared.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-12 20:00:36 -07:00
Chao Yu d78dfefcde f2fs: fix to check readonly condition correctly
With below case, it can mount multi-device image w/ rw option, however
one of secondary device is set as ro, later update will cause panic, so
let's introduce f2fs_dev_is_readonly(), and check multi-devices rw status
in f2fs_remount() w/ it in order to avoid such inconsistent mount status.

mkfs.f2fs -c /dev/zram1 /dev/zram0 -f
blockdev --setro /dev/zram1
mount -t f2fs dev/zram0 /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
mount -t f2fs -o remount,rw mnt/f2fs
dd if=/dev/zero  of=/mnt/f2fs/file bs=1M count=8192

kernel BUG at fs/f2fs/inline.c:258!
RIP: 0010:f2fs_write_inline_data+0x23e/0x2d0 [f2fs]
Call Trace:
  f2fs_write_single_data_page+0x26b/0x9f0 [f2fs]
  f2fs_write_cache_pages+0x389/0xa60 [f2fs]
  __f2fs_write_data_pages+0x26b/0x2d0 [f2fs]
  f2fs_write_data_pages+0x2e/0x40 [f2fs]
  do_writepages+0xd3/0x1b0
  __writeback_single_inode+0x5b/0x420
  writeback_sb_inodes+0x236/0x5a0
  __writeback_inodes_wb+0x56/0xf0
  wb_writeback+0x2a3/0x490
  wb_do_writeback+0x2b2/0x330
  wb_workfn+0x6a/0x260
  process_one_work+0x270/0x5e0
  worker_thread+0x52/0x3e0
  kthread+0xf4/0x120
  ret_from_fork+0x29/0x50

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-12 20:00:36 -07:00
Chao Yu 68f0453dab f2fs: use f2fs_hw_is_readonly() instead of bdev_read_only()
f2fs has supported multi-device feature, to check devices' rw status,
it should use f2fs_hw_is_readonly() rather than bdev_read_only(), fix
it.

Meanwhile, it removes f2fs_hw_is_readonly() check condition in:
- f2fs_write_checkpoint()
- f2fs_convert_inline_inode()
As it has checked f2fs_readonly() condition, and if f2fs' devices
were readonly, f2fs_readonly() must be true.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10 11:00:31 -07:00
Yangtao Li c2c14ca5b1 f2fs: set default compress option only when sb_has_compression
If the compress feature is not enabled, there is no need to set
compress-related parameters.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-10 10:58:45 -07:00
Yangtao Li d4998b7895 f2fs: add compression feature check for all compress mount opt
Opt_compress_chksum, Opt_compress_mode and Opt_compress_cache
lack the necessary check to see if the image supports compression,
let's add it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-04-04 13:57:29 -07:00
Jaegeuk Kim 1aa161e431 f2fs: fix scheduling while atomic in decompression path
[   16.945668][    C0] Call trace:
[   16.945678][    C0]  dump_backtrace+0x110/0x204
[   16.945706][    C0]  dump_stack_lvl+0x84/0xbc
[   16.945735][    C0]  __schedule_bug+0xb8/0x1ac
[   16.945756][    C0]  __schedule+0x724/0xbdc
[   16.945778][    C0]  schedule+0x154/0x258
[   16.945793][    C0]  bit_wait_io+0x48/0xa4
[   16.945808][    C0]  out_of_line_wait_on_bit+0x114/0x198
[   16.945824][    C0]  __sync_dirty_buffer+0x1f8/0x2e8
[   16.945853][    C0]  __f2fs_commit_super+0x140/0x1f4
[   16.945881][    C0]  f2fs_commit_super+0x110/0x28c
[   16.945898][    C0]  f2fs_handle_error+0x1f4/0x2f4
[   16.945917][    C0]  f2fs_decompress_cluster+0xc4/0x450
[   16.945942][    C0]  f2fs_end_read_compressed_page+0xc0/0xfc
[   16.945959][    C0]  f2fs_handle_step_decompress+0x118/0x1cc
[   16.945978][    C0]  f2fs_read_end_io+0x168/0x2b0
[   16.945993][    C0]  bio_endio+0x25c/0x2c8
[   16.946015][    C0]  dm_io_dec_pending+0x3e8/0x57c
[   16.946052][    C0]  clone_endio+0x134/0x254
[   16.946069][    C0]  bio_endio+0x25c/0x2c8
[   16.946084][    C0]  blk_update_request+0x1d4/0x478
[   16.946103][    C0]  scsi_end_request+0x38/0x4cc
[   16.946129][    C0]  scsi_io_completion+0x94/0x184
[   16.946147][    C0]  scsi_finish_command+0xe8/0x154
[   16.946164][    C0]  scsi_complete+0x90/0x1d8
[   16.946181][    C0]  blk_done_softirq+0xa4/0x11c
[   16.946198][    C0]  _stext+0x184/0x614
[   16.946214][    C0]  __irq_exit_rcu+0x78/0x144
[   16.946234][    C0]  handle_domain_irq+0xd4/0x154
[   16.946260][    C0]  gic_handle_irq.33881+0x5c/0x27c
[   16.946281][    C0]  call_on_irq_stack+0x40/0x70
[   16.946298][    C0]  do_interrupt_handler+0x48/0xa4
[   16.946313][    C0]  el1_interrupt+0x38/0x68
[   16.946346][    C0]  el1h_64_irq_handler+0x20/0x30
[   16.946362][    C0]  el1h_64_irq+0x78/0x7c
[   16.946377][    C0]  finish_task_switch+0xc8/0x3d8
[   16.946394][    C0]  __schedule+0x600/0xbdc
[   16.946408][    C0]  preempt_schedule_common+0x34/0x5c
[   16.946423][    C0]  preempt_schedule+0x44/0x48
[   16.946438][    C0]  process_one_work+0x30c/0x550
[   16.946456][    C0]  worker_thread+0x414/0x8bc
[   16.946472][    C0]  kthread+0x16c/0x1e0
[   16.946486][    C0]  ret_from_fork+0x10/0x20

Fixes: bff139b49d ("f2fs: handle decompress only post processing in softirq")
Fixes: 95fa90c9e5 ("f2fs: support recording errors into superblock")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-29 15:17:40 -07:00
Yangtao Li 447286ebad f2fs: convert to use bitmap API
Let's use BIT() and GENMASK() instead of open it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-03-29 15:17:37 -07:00
Linus Torvalds 103830683c f2fs-for-6.3-rc1
In this round, we've got a huge number of patches that improve code readability
 along with minor bug fixes, while we've mainly fixed some critical issues in
 recently-added per-block age-based extent_cache, atomic write support, and some
 folio cases.
 
 Enhancement:
  - add sysfs nodes to set last_age_weight and manage discard_io_aware_gran
  - show ipu policy in debugfs
  - reduce stack memory cost by using bitfield in struct f2fs_io_info
  - introduce trace_f2fs_replace_atomic_write_block
  - enhance iostat support and adds flush commands
 
 Bug fix:
  - revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
  - fix kernel crash on the atomic write abort flow
  - call clear_page_private_reference in .{release,invalid}_folio
  - support .migrate_folio for compressed inode
  - fix cgroup writeback accounting with fs-layer encryption
  - retry to update the inode page given data corruption
  - fix kernel crash due to null io->bio
  - fix some bugs in per-block age-based extent_cache:
     a. wrong calculation of block age
     b. update age extent in f2fs_do_zero_range()
     c. update age extent correctly during truncation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmP9M/cACgkQQBSofoJI
 UNIX1Q//Yp+nDeY91H3IO6aMSHPqRoBBnVTr8ERtLUF0fuQbBkzcQE+t8cMSoYDM
 88sxoC+F7UnovNr84VeuKlHN4hYyAXuxj8OZetXI3XX+yiO+auEPtljGA0BaGwqL
 93lIg8nIQ2ing/oZ/4+h4dvpYCPrhKOQS6h1sHhIWlql6Wxwxq01uA47i0Ni6y/o
 D23JPFaDQfumN8qy1bm5xfLRhTQmaE35n5NhcBJpUD/rGK92NPXv7RLKPgZc3tKN
 tJmL+NXa3NNEx5e5TSP7JX+rhD7KlL5XlB/m8LbpPIx338I0pt0uzAc7nTIOlzM3
 DI0q3HXe9U3+JBHi+rKxkIniiRDmvhPx3NzdgcsYg75EnwdrNazsRHulBUEAXB4v
 ghHbx53OxA2uSnUVbkY4HXNYf7cYjrx5vbX0oqEx48btBCC8KGFIcHtI72tIBee3
 xdCxoM3e2AWijkFBOCkThXNuNNbdifQzn2e7xR7W+o0L9hwdR5t7tHhHT+cqG9Ox
 6UKWoZZUjYUAV/YQT5Qh6570GsGncM8gHAUMz7DVTIOB9wYkHtb0Q9tUmoQccwf5
 46r84c4/jxUQHt8SkXBBl1aiqHmR7EF17YvM5AXIdG3DqqOy70NWkM48hMOyIy2G
 eY/wTVJwpd6oDveQPtxaHn7Wo3jSPNUlidOfAYQ7Itm34VXY/zc=
 =yXLl
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've got a huge number of patches that improve code
  readability along with minor bug fixes, while we've mainly fixed some
  critical issues in recently-added per-block age-based extent_cache,
  atomic write support, and some folio cases.

  Enhancements:

   - add sysfs nodes to set last_age_weight and manage
     discard_io_aware_gran

   - show ipu policy in debugfs

   - reduce stack memory cost by using bitfield in struct f2fs_io_info

   - introduce trace_f2fs_replace_atomic_write_block

   - enhance iostat support and adds flush commands

  Bug fixes:

   - revert "f2fs: truncate blocks in batch in __complete_revoke_list()"

   - fix kernel crash on the atomic write abort flow

   - call clear_page_private_reference in .{release,invalid}_folio

   - support .migrate_folio for compressed inode

   - fix cgroup writeback accounting with fs-layer encryption

   - retry to update the inode page given data corruption

   - fix kernel crash due to NULL io->bio

   - fix some bugs in per-block age-based extent_cache:
       - wrong calculation of block age
       - update age extent in f2fs_do_zero_range()
       - update age extent correctly during truncation"

* tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
  f2fs: drop unnecessary arg for f2fs_ioc_*()
  f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
  f2fs: synchronize atomic write aborts
  f2fs: fix wrong segment count
  f2fs: replace si->sbi w/ sbi in stat_show()
  f2fs: export ipu policy in debugfs
  f2fs: make kobj_type structures constant
  f2fs: fix to do sanity check on extent cache correctly
  f2fs: add missing description for ipu_policy node
  f2fs: fix to set ipu policy
  f2fs: fix typos in comments
  f2fs: fix kernel crash due to null io->bio
  f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx()
  f2fs: add sysfs nodes to set last_age_weight
  f2fs: fix f2fs_show_options to show nogc_merge mount option
  f2fs: fix cgroup writeback accounting with fs-layer encryption
  f2fs: fix wrong calculation of block age
  f2fs: fix to update age extent in f2fs_do_zero_range()
  f2fs: fix to update age extent correctly during truncation
  f2fs: fix to avoid potential memory corruption in __update_iostat_latency()
  ...
2023-02-27 16:18:51 -08:00
Daeho Jeong a46bebd502 f2fs: synchronize atomic write aborts
To fix a race condition between atomic write aborts, I use the inode
lock and make COW inode to be re-usable thoroughout the whole
atomic file inode lifetime.

Reported-by: syzbot+823000d23b3400619f7c@syzkaller.appspotmail.com
Fixes: 3db1de0e58 ("f2fs: change the current atomic write way")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-14 10:08:59 -08:00
Eric Biggers 1ad2a62676 f2fs: stop calling fscrypt_add_test_dummy_key()
Now that fs/crypto/ adds the test dummy encryption key on-demand when
it's needed, there's no need for individual filesystems to call
fscrypt_add_test_dummy_key().  Remove the call to it from f2fs.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230208062107.199831-4-ebiggers@kernel.org
2023-02-07 22:30:30 -08:00
Yangtao Li c5bf834833 f2fs: fix to set ipu policy
For LFS mode, it should update outplace and no need inplace update.
When using LFS mode for small-volume devices, IPU will not be used,
and the OPU writing method is actually used, but F2FS_IPU_FORCE can
be read from the ipu_policy node, which is different from the actual
situation. And remount to lfs mode should be disallowed when
f2fs ipu is enabled, let's fix it.

Fixes: 84b89e5d94 ("f2fs: add auto tuning for small devices")
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-07 10:45:00 -08:00
Yangtao Li 04d7a7ae43 f2fs: fix f2fs_show_options to show nogc_merge mount option
Commit 5911d2d1d1 ("f2fs: introduce gc_merge mount option") forgot
to show nogc_merge option, let's fix it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-02-05 19:38:28 -08:00
Yangtao Li b1c5ef26e4 f2fs: return true if all cmd were issued or no cmd need to be issued for f2fs_issue_discard_timeout()
f2fs_issue_discard_timeout() returns whether discard cmds are dropped,
which does not match the meaning of the function. Let's change it to
return whether all discard cmd are issued.

After commit 4d67490498 ("f2fs: Don't create discard thread when
device doesn't support realtime discard"), f2fs_issue_discard_timeout()
is alse called by f2fs_remount(). Since the comments of
f2fs_issue_discard_timeout() doesn't make much sense, let's update it.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-31 10:47:45 -08:00
Yangtao Li 2381a68556 f2fs: fix to show discard_unit mount opt
Convert to show discard_unit only when has DISCARD opt.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-31 10:47:45 -08:00
Yangtao Li c40e15a9a5 f2fs: merge f2fs_show_injection_info() into time_to_inject()
There is no need to additionally use f2fs_show_injection_info()
to output information. Concatenate time_to_inject() and
__time_to_inject() via a macro. In the new __time_to_inject()
function, pass in the caller function name and parent function.

In this way, we no longer need the f2fs_show_injection_info() function,
and let's remove it.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-11 11:15:19 -08:00
Yangtao Li b5a711acab f2fs: judge whether discard_unit is section only when have CONFIG_BLK_DEV_ZONED
The current logic, regardless of whether CONFIG_BLK_DEV_ZONED
is enabled or not, will judge whether discard_unit is SECTION,
when f2fs_sb_has_blkzoned.

In fact, when CONFIG_BLK_DEV_ZONED is not enabled, this judgment
is a path that will never be accessed. At this time, -EINVAL will
be returned in the parse_options function, accompanied by the
message "Zoned block device support is not enabled".

Let's wrap this discard_unit judgment with CONFIG_BLK_DEV_ZONED.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-11 11:15:18 -08:00
Yangtao Li fdb7ccc3f9 f2fs: introduce IS_F2FS_IPU_* macro
IS_F2FS_IPU_* macro can be used to identify whether
f2fs ipu related policies are enabled.

BTW, convert to use BIT() instead of open code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-01-06 15:13:41 -08:00
Yangtao Li 25547439f1 f2fs: don't call f2fs_issue_discard_timeout() when discard_cmd_cnt is 0 in f2fs_put_super()
No need to call f2fs_issue_discard_timeout() in f2fs_put_super,
when no discard command requires issue. Since the caller of
f2fs_issue_discard_timeout() usually judges the number of discard
commands before using it. Let's move this logic to
f2fs_issue_discard_timeout().

By the way, use f2fs_realtime_discard_enable to simplify the code.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-12 14:59:38 -08:00
Colin Ian King db8dcd25ec f2fs: Fix spelling mistake in label: free_bio_enrty_cache -> free_bio_entry_cache
There is a spelling mistake in a label name. Fix it.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-12 14:59:32 -08:00
Jaegeuk Kim 71644dff48 f2fs: add block_age-based extent cache
This patch introduces a runtime hot/cold data separation method
for f2fs, in order to improve the accuracy for data temperature
classification, reduce the garbage collection overhead after
long-term data updates.

Enhanced hot/cold data separation can record data block update
frequency as "age" of the extent per inode, and take use of the age
info to indicate better temperature type for data block allocation:
 - It records total data blocks allocated since mount;
 - When file extent has been updated, it calculate the count of data
blocks allocated since last update as the age of the extent;
 - Before the data block allocated, it searches for the age info and
chooses the suitable segment for allocation.

Test and result:
 - Prepare: create about 30000 files
  * 3% for cold files (with cold file extension like .apk, from 3M to 10M)
  * 50% for warm files (with random file extension like .FcDxq, from 1K
to 4M)
  * 47% for hot files (with hot file extension like .db, from 1K to 256K)
 - create(5%)/random update(90%)/delete(5%) the files
  * total write amount is about 70G
  * fsync will be called for .db files, and buffered write will be used
for other files

The storage of test device is large enough(128G) so that it will not
switch to SSR mode during the test.

Benefit: dirty segment count increment reduce about 14%
 - before: Dirty +21110
 - after:  Dirty +18286

Signed-off-by: qixiaoyu1 <qixiaoyu1@xiaomi.com>
Signed-off-by: xiongping1 <xiongping1@xiaomi.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-12 14:53:56 -08:00
Jaegeuk Kim 12607c1ba7 f2fs: specify extent cache for read explicitly
Let's descrbie it's read extent cache.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-12 14:53:55 -08:00
Yangtao Li ed8ac22b6b f2fs: introduce f2fs_is_readonly() for readability
Introduce f2fs_is_readonly() and use it to simplify code.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-12 14:53:48 -08:00
Yangtao Li 870af777da f2fs: do some cleanup for f2fs module init
Just for cleanup, no functional changes.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-12-08 09:32:20 -08:00
Yangtao Li 1cd2e6d544 f2fs: define MIN_DISCARD_GRANULARITY macro
Do cleanup in f2fs_tuning_parameters() and __init_discard_policy(),
let's use macro instead of number.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28 12:50:58 -08:00
Yuwei Guan 66aee5aaa2 f2fs: change type for 'sbi->readdir_ra'
Before this patch, the varibale 'readdir_ra' takes effect if it's equal
to '1' or not, so we can change type for it from 'int' to 'bool'.

Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28 12:46:33 -08:00
Yuwei Guan 777cd95b80 f2fs: cleanup for 'f2fs_tuning_parameters' function
A cleanup patch for 'f2fs_tuning_parameters' function.

Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28 12:46:33 -08:00
Yuwei Guan b7ad23cec2 f2fs: fix to alloc_mode changed after remount on a small volume device
The commit 84b89e5d94 ("f2fs: add auto tuning for small devices") add
tuning for small volume device, now support to tune alloce_mode to 'reuse'
if it's small size. But the alloc_mode will change to 'default' when do
remount on this small size dievce. This patch fo fix alloc_mode changed
when do remount for a small volume device.

Signed-off-by: Yuwei Guan <Yuwei.Guan@zeekrlife.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-28 12:46:33 -08:00
Yangtao Li 967eaad1fe f2fs: fix to set flush_merge opt and show noflush_merge
Some minor modifications to flush_merge and related parameters:

  1.The FLUSH_MERGE opt is set by default only in non-ro mode.
  2.When ro and merge are set at the same time, an error is reported.
  3.Display noflush_merge mount opt.

Suggested-by: Chao Yu <chao@kernel.org>
Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11 09:48:25 -08:00
Tetsuo Handa 92b4cf5b48 f2fs: initialize locks earlier in f2fs_fill_super()
syzbot is reporting lockdep warning at f2fs_handle_error() [1], for
spin_lock(&sbi->error_lock) is called before spin_lock_init() is called.
For safe locking in error handling, move initialization of locks (and
obvious structures) in f2fs_fill_super() to immediately after memory
allocation.

Link: https://syzkaller.appspot.com/bug?extid=40642be9b7e0bb28e0df [1]
Reported-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot <syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11 09:48:24 -08:00
Chao Yu cc249e4cba f2fs: fix to avoid accessing uninitialized spinlock
syzbot reports a kernel bug:

 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1e3/0x2cb lib/dump_stack.c:106
 assign_lock_key+0x22a/0x240 kernel/locking/lockdep.c:981
 register_lock_class+0x287/0x9b0 kernel/locking/lockdep.c:1294
 __lock_acquire+0xe4/0x1f60 kernel/locking/lockdep.c:4934
 lock_acquire+0x1a7/0x400 kernel/locking/lockdep.c:5668
 __raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
 _raw_spin_lock+0x2a/0x40 kernel/locking/spinlock.c:154
 spin_lock include/linux/spinlock.h:350 [inline]
 f2fs_save_errors fs/f2fs/super.c:3868 [inline]
 f2fs_handle_error+0x29/0x230 fs/f2fs/super.c:3896
 f2fs_iget+0x215/0x4bb0 fs/f2fs/inode.c:516
 f2fs_fill_super+0x47d3/0x7b50 fs/f2fs/super.c:4222
 mount_bdev+0x26c/0x3a0 fs/super.c:1401
 legacy_get_tree+0xea/0x180 fs/fs_context.c:610
 vfs_get_tree+0x88/0x270 fs/super.c:1531
 do_new_mount+0x289/0xad0 fs/namespace.c:3040
 do_mount fs/namespace.c:3383 [inline]
 __do_sys_mount fs/namespace.c:3591 [inline]
 __se_sys_mount+0x2e3/0x3d0 fs/namespace.c:3568
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

F2FS-fs (loop1): Failed to read F2FS meta data inode

The root cause is if sbi->error_lock may be accessed before
its initialization, fix it.

Link: https://lore.kernel.org/linux-f2fs-devel/0000000000007edb6605ecbb6442@google.com/T/#u
Reported-by: syzbot+40642be9b7e0bb28e0df@syzkaller.appspotmail.com
Fixes: 95fa90c9e5 ("f2fs: support recording errors into superblock")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-11 09:48:24 -08:00
Yangtao Li e5a0db6a9e f2fs: replace gc_urgent_high_remaining with gc_remaining_trials
The user can set the trial count limit for GC urgent and
idle mode with replaced gc_remaining_trials.. If GC thread gets
to the limit, the mode will turn back to GC normal mode finally.

It was applied only to GC_URGENT, while this patch expands it for
GC_IDLE.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-01 17:56:04 -07:00
Chao Yu 7b02b22018 f2fs: fix to destroy sbi->post_read_wq in error path of f2fs_fill_super()
In error path of f2fs_fill_super(), this patch fixes to call
f2fs_destroy_post_read_wq() once if we fail in f2fs_start_ckpt_thread().

Fixes: 261eeb9c15 ("f2fs: introduce checkpoint_merge mount option")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-01 17:56:04 -07:00
Yangtao Li 6047de5482 f2fs: add barrier mount option
This patch adds a mount option, barrier, in f2fs.
The barrier option is the opposite of nobarrier.
If this option is set, cache_flush commands are allowed to be issued.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-11-01 17:56:02 -07:00