f2fs-for-5.15-rc1

In this cycle, we've addressed some performance issues such as lock contention,
 misbehaving compress_cache, allowing extent_cache for compressed files, and new
 sysfs to adjust ra_size for fadvise. In order to diagnose the performance issues
 quickly, we also added an iostat which shows the IO latencies periodically. On
 the stability side, we've found two memory leakage cases in the error path in
 compression flow. And, we've also fixed various corner cases in fiemap, quota,
 checkpoint=disable, zstd, and so on.
 
 Enhancement:
  - avoid long checkpoint latency by releasing nat_tree_lock
  - collect and show iostats periodically
  - support extent_cache for compressed files
  - add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL)
  - report f2fs GC status via sysfs
  - add discard_unit=%s in mount option to handle zoned device
 
 Bug fix:
  - fix two memory leakages when an error happens in the compressed IO flow
  - fix compress_cache to get the right LBA
  - fix fiemap to deal with compressed case correctly
  - fix wrong EIO returns due to SBI_NEED_FSCK
  - fix missing writes when enabling checkpoint back
  - fix quota deadlock
  - fix zstd level mount option
 
 In addition to the above major updates, we've cleaned up several code paths such
 as dio, unnecessary operations, debugfs/f2fs/status, sanity check, and typos.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmEyw1sACgkQQBSofoJI
 UNLJmA/+NHUgwUjLMcHvmLyp6QYpQDZtKj93/sRDo+YHOYNdYFjWWUb329PYTKWS
 kEdzApCP+KHfVxeSkiL/x3qWP+RlTkIf96P0kR3/BKi0tjg25G2riFWztusDDFpt
 xi+AW5sUFDvIx1tFumvQHAQedSwBgcZ96ovT5EwxEuONkljhZC9phEC6vSXz9nOR
 e2EQIyezbC5O21np1KSeqSgqRMpVkJkVcEHy4VmpMBCLMOOYPepWwKw+yPaV/jR/
 zUXdo2/53vma50M5LCDPCtjCtWQgLoeNeGLxyjfzQuTJU6TmtPY65JObLPt6pUSj
 fRW6qIziTZbVYXzOWBD0EYilv2N4c3BNJdhQCpx2Vyjw9/LLxzqKPOUyzBoa1kjY
 eZVvmaLXVCKsoJdHDSi7OH/4BqS6SuSZE8eO/nGkgswqiErHZ0Vwl3bFCWC7r/Bk
 r2U5spJx/83XO6c9H1bzeWEies1DRtwnCDIRRuw35RtJ4uHZaqCfkuJ7rOBwC90X
 4SpaAKdUxP2RWc3GKELBIhaqPn7vyMy9ile6VU14PjM8UcY5hyE87T2azqR8gGut
 nVjRL4cbMGTPj6m1Qj8KqBRSaLuShe6AncUy7bNGiM+JlcLcdB6OJ1ZYLl9hjx2r
 TbIouXThgcZ4SIK0DEaBLKz2b9/0TfaO9gw1XzpRma+bWA1pApM=
 =W67o
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this cycle, we've addressed some performance issues such as lock
  contention, misbehaving compress_cache, allowing extent_cache for
  compressed files, and new sysfs to adjust ra_size for fadvise.

  In order to diagnose the performance issues quickly, we also added an
  iostat which shows the IO latencies periodically.

  On the stability side, we've found two memory leakage cases in the
  error path in compression flow. And, we've also fixed various corner
  cases in fiemap, quota, checkpoint=disable, zstd, and so on.

  Enhancements:
   - avoid long checkpoint latency by releasing nat_tree_lock
   - collect and show iostats periodically
   - support extent_cache for compressed files
   - add a sysfs entry to manage ra_size given fadvise(POSIX_FADV_SEQUENTIAL)
   - report f2fs GC status via sysfs
   - add discard_unit=%s in mount option to handle zoned device

  Bug fixes:
   - fix two memory leakages when an error happens in the compressed IO flow
   - fix compress_cache to get the right LBA
   - fix fiemap to deal with compressed case correctly
   - fix wrong EIO returns due to SBI_NEED_FSCK
   - fix missing writes when enabling checkpoint back
   - fix quota deadlock
   - fix zstd level mount option

  In addition to the above major updates, we've cleaned up several code
  paths such as dio, unnecessary operations, debugfs/f2fs/status, sanity
  check, and typos"

* tag 'f2fs-for-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (46 commits)
  f2fs: should put a page beyond EOF when preparing a write
  f2fs: deallocate compressed pages when error happens
  f2fs: enable realtime discard iff device supports discard
  f2fs: guarantee to write dirty data when enabling checkpoint back
  f2fs: fix to unmap pages from userspace process in punch_hole()
  f2fs: fix unexpected ENOENT comes from f2fs_map_blocks()
  f2fs: fix to account missing .skipped_gc_rwsem
  f2fs: adjust unlock order for cleanup
  f2fs: Don't create discard thread when device doesn't support realtime discard
  f2fs: rebuild nat_bits during umount
  f2fs: introduce periodic iostat io latency traces
  f2fs: separate out iostat feature
  f2fs: compress: do sanity check on cluster
  f2fs: fix description about main_blkaddr node
  f2fs: convert S_IRUGO to 0444
  f2fs: fix to keep compatibility of fault injection interface
  f2fs: support fault injection for f2fs_kmem_cache_alloc()
  f2fs: compress: allow write compress released file after truncate to zero
  f2fs: correct comment in segment.h
  f2fs: improve sbi status info in debugfs/f2fs/status
  ...
commit 6abaa83c73 (Linus Torvalds, 2021-09-04 10:48:47 -07:00)
23 changed files with 1499 additions and 463 deletions


@@ -41,8 +41,7 @@ Description:	This parameter controls the number of prefree segments to be
 What:		/sys/fs/f2fs/<disk>/main_blkaddr
 Date:		November 2019
 Contact:	"Ramon Pantin" <pantin@google.com>
-Description:
-		Shows first block address of MAIN area.
+Description:	Shows first block address of MAIN area.
 
 What:		/sys/fs/f2fs/<disk>/ipu_policy
 Date:		November 2013
@@ -493,3 +492,23 @@ Contact:	"Chao Yu" <yuchao0@huawei.com>
 Description:	When ATGC is on, it controls age threshold to bypass GCing young
		candidates whose age is not beyond the threshold, by default it was
		initialized as 604800 seconds (equals to 7 days).
+
+What:		/sys/fs/f2fs/<disk>/gc_reclaimed_segments
+Date:		July 2021
+Contact:	"Daeho Jeong" <daehojeong@google.com>
+Description:	Show how many segments have been reclaimed by GC during a specific
+		GC mode (0: GC normal, 1: GC idle CB, 2: GC idle greedy,
+		3: GC idle AT, 4: GC urgent high, 5: GC urgent low)
+		You can re-initialize this value to "0".
+
+What:		/sys/fs/f2fs/<disk>/gc_segment_mode
+Date:		July 2021
+Contact:	"Daeho Jeong" <daehojeong@google.com>
+Description:	You can control for which gc mode the "gc_reclaimed_segments" node shows.
+		Refer to the description of the modes in "gc_reclaimed_segments".
+
+What:		/sys/fs/f2fs/<disk>/seq_file_ra_mul
+Date:		July 2021
+Contact:	"Daeho Jeong" <daehojeong@google.com>
+Description:	You can control the multiplier value of bdi device readahead window size
+		between 2 (default) and 256 for POSIX_FADV_SEQUENTIAL advise option.
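
A quick sketch of how these three nodes fit together from user space (illustrative only: "sdb" stands in for the real <disk> directory and error handling is minimal):

	#include <stdio.h>

	int main(void)
	{
		FILE *f;
		unsigned long reclaimed;

		/* 2 == "GC idle greedy" per the mode list documented above */
		f = fopen("/sys/fs/f2fs/sdb/gc_segment_mode", "w");
		if (!f)
			return 1;
		fprintf(f, "2\n");
		fclose(f);

		/* read back how many segments that GC mode has reclaimed */
		f = fopen("/sys/fs/f2fs/sdb/gc_reclaimed_segments", "r");
		if (!f)
			return 1;
		if (fscanf(f, "%lu", &reclaimed) == 1)
			printf("segments reclaimed by idle-greedy GC: %lu\n", reclaimed);
		fclose(f);
		return 0;
	}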


@@ -185,6 +185,7 @@ fault_type=%d		 Support configuring fault injection type, should be
			 FAULT_KVMALLOC		  0x000000002
			 FAULT_PAGE_ALLOC	  0x000000004
			 FAULT_PAGE_GET		  0x000000008
+			 FAULT_ALLOC_BIO	  0x000000010 (obsolete)
			 FAULT_ALLOC_NID	  0x000000020
			 FAULT_ORPHAN		  0x000000040
			 FAULT_BLOCK		  0x000000080
@@ -195,6 +196,7 @@ fault_type=%d		 Support configuring fault injection type, should be
			 FAULT_CHECKPOINT	  0x000001000
			 FAULT_DISCARD		  0x000002000
			 FAULT_WRITE_IO		  0x000004000
+			 FAULT_SLAB_ALLOC	  0x000008000
			 ===================	  ===========
 mode=%s		 Control block allocation mode which supports "adaptive"
			 and "lfs". In "lfs" mode, there should be no random
@@ -312,6 +314,14 @@ inlinecrypt		 When possible, encrypt/decrypt the contents of encrypted
			 Documentation/block/inline-encryption.rst.
 atgc			 Enable age-threshold garbage collection, it provides high
			 effectiveness and efficiency on background GC.
+discard_unit=%s	 Control discard unit, the argument can be "block", "segment"
+			 and "section", issued discard command's offset/size will be
+			 aligned to the unit, by default, "discard_unit=block" is set,
+			 so that small discard functionality is enabled.
+			 For blkzoned device, "discard_unit=section" will be set by
+			 default, it is helpful for large sized SMR or ZNS devices to
+			 reduce memory cost by getting rid of fs metadata supports small
+			 discard.
 ======================== ============================================================
 
 Debugfs Entries
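
For a zoned device the new option would typically be passed at mount time; a minimal mount(2) sketch (the device and mount-point paths are placeholders):

	#include <stdio.h>
	#include <sys/mount.h>

	int main(void)
	{
		/* ask f2fs to align discard commands to section granularity */
		if (mount("/dev/nvme0n1", "/mnt/f2fs", "f2fs", 0,
			  "discard,discard_unit=section") != 0) {
			perror("mount");
			return 1;
		}
		return 0;
	}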
@@ -857,8 +867,11 @@ Compression implementation
 directly in order to guarantee potential data updates later to the space.
 Instead, the main goal is to reduce data writes to flash disk as much as
 possible, resulting in extending disk life time as well as relaxing IO
-congestion. Alternatively, we've added ioctl interface to reclaim compressed
-space and show it to user after putting the immutable bit.
+congestion. Alternatively, we've added ioctl(F2FS_IOC_RELEASE_COMPRESS_BLOCKS)
+interface to reclaim compressed space and show it to user after putting the
+immutable bit. Immutable bit, after release, it doesn't allow writing/mmaping
+on the file, until reserving compressed space via
+ioctl(F2FS_IOC_RESERVE_COMPRESS_BLOCKS) or truncating filesize to zero.
 
 Compress metadata layout::
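
From user space the release/reserve cycle described above looks roughly like the following (a sketch that assumes the F2FS_IOC_RELEASE_COMPRESS_BLOCKS / F2FS_IOC_RESERVE_COMPRESS_BLOCKS definitions exported in the uapi <linux/f2fs.h> header; error handling is abbreviated):

	#include <stdio.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <sys/ioctl.h>
	#include <linux/types.h>
	#include <linux/f2fs.h>

	int main(void)
	{
		__u64 blocks = 0;
		int fd = open("/mnt/f2fs/compressed_file", O_RDWR);

		if (fd < 0)
			return 1;

		/* give compressed-away blocks back to free space; the file
		 * becomes immutable until the space is reserved again */
		if (ioctl(fd, F2FS_IOC_RELEASE_COMPRESS_BLOCKS, &blocks) == 0)
			printf("released %llu blocks\n", (unsigned long long)blocks);

		/* ... later, before the file may be written or mmaped again ... */
		if (ioctl(fd, F2FS_IOC_RESERVE_COMPRESS_BLOCKS, &blocks) == 0)
			printf("reserved %llu blocks\n", (unsigned long long)blocks);

		close(fd);
		return 0;
	}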


@@ -105,6 +105,13 @@ config F2FS_FS_LZO
	help
	  Support LZO compress algorithm, if unsure, say Y.
 
+config F2FS_FS_LZORLE
+	bool "LZO-RLE compression support"
+	depends on F2FS_FS_LZO
+	default y
+	help
+	  Support LZO-RLE compress algorithm, if unsure, say Y.
+
 config F2FS_FS_LZ4
	bool "LZ4 compression support"
	depends on F2FS_FS_COMPRESSION
@@ -114,7 +121,6 @@ config F2FS_FS_LZ4
 
 config F2FS_FS_LZ4HC
	bool "LZ4HC compression support"
-	depends on F2FS_FS_COMPRESSION
	depends on F2FS_FS_LZ4
	default y
	help
@@ -128,10 +134,11 @@ config F2FS_FS_ZSTD
	help
	  Support ZSTD compress algorithm, if unsure, say Y.
 
-config F2FS_FS_LZORLE
-	bool "LZO-RLE compression support"
-	depends on F2FS_FS_COMPRESSION
-	depends on F2FS_FS_LZO
-	default y
-	help
-	  Support LZO-RLE compress algorithm, if unsure, say Y.
+config F2FS_IOSTAT
+	bool "F2FS IO statistics information"
+	depends on F2FS_FS
+	default y
+	help
+	  Support getting IO statistics through sysfs and printing out periodic
+	  IO statistics tracepoint events. You have to turn on "iostat_enable"
+	  sysfs node to enable this feature.
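
On a kernel built with F2FS_IOSTAT, the feature is driven entirely from sysfs; a minimal sketch of turning it on and shortening the reporting period (the "sdb" directory name is a placeholder, and the iostat_enable / iostat_period_ms node names are the knobs this series refers to):

	#include <stdio.h>

	static int write_sysfs(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");

		if (!f)
			return -1;
		fprintf(f, "%s\n", val);
		return fclose(f);
	}

	int main(void)
	{
		/* enable collection and emit periodic iostat/latency traces
		 * every 1000 ms instead of the default period */
		write_sysfs("/sys/fs/f2fs/sdb/iostat_enable", "1");
		write_sysfs("/sys/fs/f2fs/sdb/iostat_period_ms", "1000");
		return 0;
	}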


@@ -9,3 +9,4 @@ f2fs-$(CONFIG_F2FS_FS_XATTR) += xattr.o
 f2fs-$(CONFIG_F2FS_FS_POSIX_ACL) += acl.o
 f2fs-$(CONFIG_FS_VERITY) += verity.o
 f2fs-$(CONFIG_F2FS_FS_COMPRESSION) += compress.o
+f2fs-$(CONFIG_F2FS_IOSTAT) += iostat.o


@@ -18,6 +18,7 @@
 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
+#include "iostat.h"
 #include <trace/events/f2fs.h>
 
 #define DEFAULT_CHECKPOINT_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
@@ -465,16 +466,29 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino,
					unsigned int devidx, int type)
 {
	struct inode_management *im = &sbi->im[type];
-	struct ino_entry *e, *tmp;
+	struct ino_entry *e = NULL, *new = NULL;
 
-	tmp = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS);
+	if (type == FLUSH_INO) {
+		rcu_read_lock();
+		e = radix_tree_lookup(&im->ino_root, ino);
+		rcu_read_unlock();
+	}
+
+retry:
+	if (!e)
+		new = f2fs_kmem_cache_alloc(ino_entry_slab,
						GFP_NOFS, true, NULL);
 
	radix_tree_preload(GFP_NOFS | __GFP_NOFAIL);
 
	spin_lock(&im->ino_lock);
	e = radix_tree_lookup(&im->ino_root, ino);
	if (!e) {
-		e = tmp;
+		if (!new) {
+			spin_unlock(&im->ino_lock);
+			goto retry;
+		}
+		e = new;
		if (unlikely(radix_tree_insert(&im->ino_root, ino, e)))
			f2fs_bug_on(sbi, 1);
@@ -492,8 +506,8 @@ static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino,
	spin_unlock(&im->ino_lock);
	radix_tree_preload_end();
 
-	if (e != tmp)
-		kmem_cache_free(ino_entry_slab, tmp);
+	if (new && e != new)
+		kmem_cache_free(ino_entry_slab, new);
 }
 
 static void __remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
@@ -1289,12 +1303,20 @@ static void update_ckpt_flags(struct f2fs_sb_info *sbi, struct cp_control *cpc)
	struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
	unsigned long flags;
 
-	spin_lock_irqsave(&sbi->cp_lock, flags);
+	if (cpc->reason & CP_UMOUNT) {
+		if (le32_to_cpu(ckpt->cp_pack_total_block_count) >
+			sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks) {
+			clear_ckpt_flags(sbi, CP_NAT_BITS_FLAG);
+			f2fs_notice(sbi, "Disable nat_bits due to no space");
+		} else if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG) &&
+						f2fs_nat_bitmap_enabled(sbi)) {
+			f2fs_enable_nat_bits(sbi);
+			set_ckpt_flags(sbi, CP_NAT_BITS_FLAG);
+			f2fs_notice(sbi, "Rebuild and enable nat_bits");
+		}
+	}
 
-	if ((cpc->reason & CP_UMOUNT) &&
-		le32_to_cpu(ckpt->cp_pack_total_block_count) >
-			sbi->blocks_per_seg - NM_I(sbi)->nat_bits_blocks)
-		disable_nat_bits(sbi, false);
+	spin_lock_irqsave(&sbi->cp_lock, flags);
 
	if (cpc->reason & CP_TRIMMED)
		__set_ckpt_flags(ckpt, CP_TRIMMED_FLAG);
@@ -1480,7 +1502,8 @@ static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
	start_blk = __start_cp_next_addr(sbi);
 
	/* write nat bits */
-	if (enabled_nat_bits(sbi, cpc)) {
+	if ((cpc->reason & CP_UMOUNT) &&
+			is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG)) {
		__u64 cp_ver = cur_cp_version(ckpt);
		block_t blk;
@@ -1639,8 +1662,11 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
 
	/* write cached NAT/SIT entries to NAT/SIT area */
	err = f2fs_flush_nat_entries(sbi, cpc);
-	if (err)
+	if (err) {
+		f2fs_err(sbi, "f2fs_flush_nat_entries failed err:%d, stop checkpoint", err);
+		f2fs_bug_on(sbi, !f2fs_cp_error(sbi));
		goto stop;
+	}
 
	f2fs_flush_sit_entries(sbi, cpc);
@@ -1648,10 +1674,13 @@ int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
	f2fs_save_inmem_curseg(sbi);
 
	err = do_checkpoint(sbi, cpc);
-	if (err)
+	if (err) {
+		f2fs_err(sbi, "do_checkpoint failed err:%d, stop checkpoint", err);
+		f2fs_bug_on(sbi, !f2fs_cp_error(sbi));
		f2fs_release_discard_addrs(sbi);
-	else
+	} else {
		f2fs_clear_prefree_segments(sbi, cpc);
+	}
 
	f2fs_restore_inmem_curseg(sbi);
stop:


@@ -28,7 +28,8 @@ static void *page_array_alloc(struct inode *inode, int nr)
	unsigned int size = sizeof(struct page *) * nr;
 
	if (likely(size <= sbi->page_array_slab_size))
-		return kmem_cache_zalloc(sbi->page_array_slab, GFP_NOFS);
+		return f2fs_kmem_cache_alloc(sbi->page_array_slab,
					GFP_F2FS_ZERO, false, F2FS_I_SB(inode));
	return f2fs_kzalloc(sbi, size, GFP_NOFS);
 }
@@ -898,6 +899,54 @@ static bool cluster_has_invalid_data(struct compress_ctx *cc)
	return false;
 }
 
+bool f2fs_sanity_check_cluster(struct dnode_of_data *dn)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
+	unsigned int cluster_size = F2FS_I(dn->inode)->i_cluster_size;
+	bool compressed = dn->data_blkaddr == COMPRESS_ADDR;
+	int cluster_end = 0;
+	int i;
+	char *reason = "";
+
+	if (!compressed)
+		return false;
+
+	/* [..., COMPR_ADDR, ...] */
+	if (dn->ofs_in_node % cluster_size) {
+		reason = "[*|C|*|*]";
+		goto out;
+	}
+
+	for (i = 1; i < cluster_size; i++) {
+		block_t blkaddr = data_blkaddr(dn->inode, dn->node_page,
+							dn->ofs_in_node + i);
+
+		/* [COMPR_ADDR, ..., COMPR_ADDR] */
+		if (blkaddr == COMPRESS_ADDR) {
+			reason = "[C|*|C|*]";
+			goto out;
+		}
+		if (compressed) {
+			if (!__is_valid_data_blkaddr(blkaddr)) {
+				if (!cluster_end)
+					cluster_end = i;
+				continue;
+			}
+			/* [COMPR_ADDR, NULL_ADDR or NEW_ADDR, valid_blkaddr] */
+			if (cluster_end) {
+				reason = "[C|N|N|V]";
+				goto out;
+			}
+		}
+	}
+	return false;
+out:
+	f2fs_warn(sbi, "access invalid cluster, ino:%lu, nid:%u, ofs_in_node:%u, reason:%s",
+			dn->inode->i_ino, dn->nid, dn->ofs_in_node, reason);
+	set_sbi_flag(sbi, SBI_NEED_FSCK);
+	return true;
+}
+
 static int __f2fs_cluster_blocks(struct inode *inode,
				unsigned int cluster_idx, bool compr)
 {
@@ -915,6 +964,11 @@ static int __f2fs_cluster_blocks(struct inode *inode,
		goto fail;
	}
 
+	if (f2fs_sanity_check_cluster(&dn)) {
+		ret = -EFSCORRUPTED;
+		goto fail;
+	}
+
	if (dn.data_blkaddr == COMPRESS_ADDR) {
		int i;
@@ -1228,7 +1282,7 @@ static int f2fs_write_compressed_pages(struct compress_ctx *cc,
 
	fio.version = ni.version;
 
-	cic = kmem_cache_zalloc(cic_entry_slab, GFP_NOFS);
+	cic = f2fs_kmem_cache_alloc(cic_entry_slab, GFP_F2FS_ZERO, false, sbi);
	if (!cic)
		goto out_put_dnode;
@@ -1340,12 +1394,6 @@ out_destroy_crypt:
 
	for (--i; i >= 0; i--)
		fscrypt_finalize_bounce_page(&cc->cpages[i]);
-	for (i = 0; i < cc->nr_cpages; i++) {
-		if (!cc->cpages[i])
-			continue;
-		f2fs_compress_free_page(cc->cpages[i]);
-		cc->cpages[i] = NULL;
-	}
 out_put_cic:
	kmem_cache_free(cic_entry_slab, cic);
 out_put_dnode:
@@ -1356,6 +1404,12 @@ out_unlock_op:
	else
		f2fs_unlock_op(sbi);
 out_free:
+	for (i = 0; i < cc->nr_cpages; i++) {
+		if (!cc->cpages[i])
+			continue;
+		f2fs_compress_free_page(cc->cpages[i]);
+		cc->cpages[i] = NULL;
+	}
	page_array_free(cc->inode, cc->cpages, cc->nr_cpages);
	cc->cpages = NULL;
	return -EAGAIN;
@@ -1506,7 +1560,8 @@ struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc)
	pgoff_t start_idx = start_idx_of_cluster(cc);
	int i;
 
-	dic = kmem_cache_zalloc(dic_entry_slab, GFP_NOFS);
+	dic = f2fs_kmem_cache_alloc(dic_entry_slab, GFP_F2FS_ZERO,
+					false, F2FS_I_SB(cc->inode));
	if (!dic)
		return ERR_PTR(-ENOMEM);
@@ -1666,6 +1721,30 @@ void f2fs_put_page_dic(struct page *page)
	f2fs_put_dic(dic);
 }
 
+/*
+ * check whether cluster blocks are contiguous, and add extent cache entry
+ * only if cluster blocks are logically and physically contiguous.
+ */
+unsigned int f2fs_cluster_blocks_are_contiguous(struct dnode_of_data *dn)
+{
+	bool compressed = f2fs_data_blkaddr(dn) == COMPRESS_ADDR;
+	int i = compressed ? 1 : 0;
+	block_t first_blkaddr = data_blkaddr(dn->inode, dn->node_page,
+						dn->ofs_in_node + i);
+
+	for (i += 1; i < F2FS_I(dn->inode)->i_cluster_size; i++) {
+		block_t blkaddr = data_blkaddr(dn->inode, dn->node_page,
+						dn->ofs_in_node + i);
+
+		if (!__is_valid_data_blkaddr(blkaddr))
+			break;
+		if (first_blkaddr + i - (compressed ? 1 : 0) != blkaddr)
+			return 0;
+	}
+
+	return compressed ? i - 1 : i;
+}
+
 const struct address_space_operations f2fs_compress_aops = {
	.releasepage = f2fs_release_page,
	.invalidatepage = f2fs_invalidate_page,


@@ -25,6 +25,7 @@
 #include "f2fs.h"
 #include "node.h"
 #include "segment.h"
+#include "iostat.h"
 #include <trace/events/f2fs.h>
 
 #define NUM_PREALLOC_POST_READ_CTXS	128
@@ -116,6 +117,7 @@ struct bio_post_read_ctx {
	struct f2fs_sb_info *sbi;
	struct work_struct work;
	unsigned int enabled_steps;
+	block_t fs_blkaddr;
 };
 
 static void f2fs_finish_read_bio(struct bio *bio)
@@ -228,7 +230,7 @@ static void f2fs_handle_step_decompress(struct bio_post_read_ctx *ctx)
	struct bio_vec *bv;
	struct bvec_iter_all iter_all;
	bool all_compressed = true;
-	block_t blkaddr = SECTOR_TO_BLOCK(ctx->bio->bi_iter.bi_sector);
+	block_t blkaddr = ctx->fs_blkaddr;
 
	bio_for_each_segment_all(bv, ctx->bio, iter_all) {
		struct page *page = bv->bv_page;
@@ -269,7 +271,10 @@ static void f2fs_post_read_work(struct work_struct *work)
 static void f2fs_read_end_io(struct bio *bio)
 {
	struct f2fs_sb_info *sbi = F2FS_P_SB(bio_first_page_all(bio));
-	struct bio_post_read_ctx *ctx = bio->bi_private;
+	struct bio_post_read_ctx *ctx;
+
+	iostat_update_and_unbind_ctx(bio, 0);
+	ctx = bio->bi_private;
 
	if (time_to_inject(sbi, FAULT_READ_IO)) {
		f2fs_show_injection_info(sbi, FAULT_READ_IO);
@@ -291,10 +296,13 @@ static void f2fs_read_end_io(struct bio *bio)
 
 static void f2fs_write_end_io(struct bio *bio)
 {
-	struct f2fs_sb_info *sbi = bio->bi_private;
+	struct f2fs_sb_info *sbi;
	struct bio_vec *bvec;
	struct bvec_iter_all iter_all;
 
+	iostat_update_and_unbind_ctx(bio, 1);
+	sbi = bio->bi_private;
+
	if (time_to_inject(sbi, FAULT_WRITE_IO)) {
		f2fs_show_injection_info(sbi, FAULT_WRITE_IO);
		bio->bi_status = BLK_STS_IOERR;
@@ -398,6 +406,8 @@ static struct bio *__bio_alloc(struct f2fs_io_info *fio, int npages)
		bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi,
						fio->type, fio->temp);
	}
+	iostat_alloc_and_bind_ctx(sbi, bio, NULL);
+
	if (fio->io_wbc)
		wbc_init_bio(fio->io_wbc, bio);
@@ -479,6 +489,8 @@ submit_io:
		trace_f2fs_submit_read_bio(sbi->sb, type, bio);
	else
		trace_f2fs_submit_write_bio(sbi->sb, type, bio);
+
+	iostat_update_submit_ctx(bio, type);
	submit_bio(bio);
 }
@@ -723,7 +735,7 @@ static void add_bio_entry(struct f2fs_sb_info *sbi, struct bio *bio,
	struct f2fs_bio_info *io = sbi->write_io[DATA] + temp;
	struct bio_entry *be;
 
-	be = f2fs_kmem_cache_alloc(bio_entry_slab, GFP_NOFS);
+	be = f2fs_kmem_cache_alloc(bio_entry_slab, GFP_NOFS, true, NULL);
	be->bio = bio;
	bio_get(bio);
@@ -970,7 +982,7 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
 {
	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
	struct bio *bio;
-	struct bio_post_read_ctx *ctx;
+	struct bio_post_read_ctx *ctx = NULL;
	unsigned int post_read_steps = 0;
 
	bio = bio_alloc_bioset(for_write ? GFP_NOIO : GFP_KERNEL,
@@ -1003,8 +1015,10 @@ static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
		ctx->bio = bio;
		ctx->sbi = sbi;
		ctx->enabled_steps = post_read_steps;
+		ctx->fs_blkaddr = blkaddr;
		bio->bi_private = ctx;
	}
+	iostat_alloc_and_bind_ctx(sbi, bio, ctx);
 
	return bio;
 }
@@ -1133,7 +1147,7 @@ int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index)
 
 int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index)
 {
-	struct extent_info ei = {0, 0, 0};
+	struct extent_info ei = {0, };
	struct inode *inode = dn->inode;
 
	if (f2fs_lookup_extent_cache(inode, index, &ei)) {
@@ -1150,7 +1164,7 @@ struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
	struct address_space *mapping = inode->i_mapping;
	struct dnode_of_data dn;
	struct page *page;
-	struct extent_info ei = {0,0,0};
+	struct extent_info ei = {0, };
	int err;
 
	page = f2fs_grab_cache_page(mapping, index, for_write);
@@ -1448,7 +1462,7 @@ int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
	int err = 0, ofs = 1;
	unsigned int ofs_in_node, last_ofs_in_node;
	blkcnt_t prealloc;
-	struct extent_info ei = {0,0,0};
+	struct extent_info ei = {0, };
	block_t blkaddr;
	unsigned int start_pgofs;
@@ -1490,7 +1504,21 @@ next_dnode:
	if (err) {
		if (flag == F2FS_GET_BLOCK_BMAP)
			map->m_pblk = 0;
+
		if (err == -ENOENT) {
+			/*
+			 * There is one exceptional case that read_node_page()
+			 * may return -ENOENT due to filesystem has been
+			 * shutdown or cp_error, so force to convert error
+			 * number to EIO for such case.
+			 */
+			if (map->m_may_create &&
+				(is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN) ||
+				f2fs_cp_error(sbi))) {
+				err = -EIO;
+				goto unlock_out;
+			}
+
			err = 0;
			if (map->m_next_pgofs)
				*map->m_next_pgofs =
@@ -1550,6 +1578,13 @@ next_block:
			map->m_flags |= F2FS_MAP_NEW;
			blkaddr = dn.data_blkaddr;
		} else {
+			if (f2fs_compressed_file(inode) &&
+					f2fs_sanity_check_cluster(&dn) &&
+					(flag != F2FS_GET_BLOCK_FIEMAP ||
+					IS_ENABLED(CONFIG_F2FS_CHECK_FS))) {
+				err = -EFSCORRUPTED;
+				goto sync_out;
+			}
			if (flag == F2FS_GET_BLOCK_BMAP) {
				map->m_pblk = 0;
				goto sync_out;
@@ -1843,8 +1878,9 @@ int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
	u64 logical = 0, phys = 0, size = 0;
	u32 flags = 0;
	int ret = 0;
-	bool compr_cluster = false;
+	bool compr_cluster = false, compr_appended;
	unsigned int cluster_size = F2FS_I(inode)->i_cluster_size;
+	unsigned int count_in_cluster = 0;
	loff_t maxbytes;
 
	if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
@@ -1892,15 +1928,17 @@ next:
	map.m_next_pgofs = &next_pgofs;
	map.m_seg_type = NO_CHECK_TYPE;
 
-	if (compr_cluster)
-		map.m_len = cluster_size - 1;
+	if (compr_cluster) {
+		map.m_lblk += 1;
+		map.m_len = cluster_size - count_in_cluster;
+	}
 
	ret = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_FIEMAP);
	if (ret)
		goto out;
 
	/* HOLE */
-	if (!(map.m_flags & F2FS_MAP_FLAGS)) {
+	if (!compr_cluster && !(map.m_flags & F2FS_MAP_FLAGS)) {
		start_blk = next_pgofs;
 
		if (blks_to_bytes(inode, start_blk) < blks_to_bytes(inode,
@@ -1910,6 +1948,14 @@ next:
			flags |= FIEMAP_EXTENT_LAST;
	}
 
+	compr_appended = false;
+	/* In a case of compressed cluster, append this to the last extent */
+	if (compr_cluster && ((map.m_flags & F2FS_MAP_UNWRITTEN) ||
+			!(map.m_flags & F2FS_MAP_FLAGS))) {
+		compr_appended = true;
+		goto skip_fill;
+	}
+
	if (size) {
		flags |= FIEMAP_EXTENT_MERGED;
		if (IS_ENCRYPTED(inode))
@@ -1926,39 +1972,37 @@ next:
	if (start_blk > last_blk)
		goto out;
 
-	if (compr_cluster) {
-		compr_cluster = false;
-
-		logical = blks_to_bytes(inode, start_blk - 1);
-		phys = blks_to_bytes(inode, map.m_pblk);
-		size = blks_to_bytes(inode, cluster_size);
-
-		flags |= FIEMAP_EXTENT_ENCODED;
-
-		start_blk += cluster_size - 1;
-
-		if (start_blk > last_blk)
-			goto out;
-
-		goto prep_next;
-	}
-
+skip_fill:
	if (map.m_pblk == COMPRESS_ADDR) {
		compr_cluster = true;
-		start_blk++;
-		goto prep_next;
-	}
-
-	logical = blks_to_bytes(inode, start_blk);
-	phys = blks_to_bytes(inode, map.m_pblk);
-	size = blks_to_bytes(inode, map.m_len);
-	flags = 0;
-	if (map.m_flags & F2FS_MAP_UNWRITTEN)
-		flags = FIEMAP_EXTENT_UNWRITTEN;
-
-	start_blk += bytes_to_blks(inode, size);
+		count_in_cluster = 1;
+	} else if (compr_appended) {
+		unsigned int appended_blks = cluster_size -
+						count_in_cluster + 1;
+		size += blks_to_bytes(inode, appended_blks);
+		start_blk += appended_blks;
+		compr_cluster = false;
+	} else {
+		logical = blks_to_bytes(inode, start_blk);
+		phys = __is_valid_data_blkaddr(map.m_pblk) ?
+			blks_to_bytes(inode, map.m_pblk) : 0;
+		size = blks_to_bytes(inode, map.m_len);
+		flags = 0;
+
+		if (compr_cluster) {
+			flags = FIEMAP_EXTENT_ENCODED;
+			count_in_cluster += map.m_len;
+			if (count_in_cluster == cluster_size) {
+				compr_cluster = false;
+				size += blks_to_bytes(inode, 1);
+			}
+		} else if (map.m_flags & F2FS_MAP_UNWRITTEN) {
+			flags = FIEMAP_EXTENT_UNWRITTEN;
+		}
+
+		start_blk += bytes_to_blks(inode, size);
+	}
 
 prep_next:
	cond_resched();
	if (fatal_signal_pending(current))
@@ -2115,6 +2159,8 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
	sector_t last_block_in_file;
	const unsigned blocksize = blks_to_bytes(inode, 1);
	struct decompress_io_ctx *dic = NULL;
+	struct extent_info ei = {0, };
+	bool from_dnode = true;
	int i;
	int ret = 0;
@@ -2137,6 +2183,8 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
				continue;
			}
			unlock_page(page);
+			if (for_write)
+				put_page(page);
			cc->rpages[i] = NULL;
			cc->nr_rpages--;
		}
@@ -2145,6 +2193,12 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
	if (f2fs_cluster_is_empty(cc))
		goto out;
 
+	if (f2fs_lookup_extent_cache(inode, start_idx, &ei))
+		from_dnode = false;
+
+	if (!from_dnode)
+		goto skip_reading_dnode;
+
	set_new_dnode(&dn, inode, NULL, NULL, 0);
	ret = f2fs_get_dnode_of_data(&dn, start_idx, LOOKUP_NODE);
	if (ret)
@@ -2152,11 +2206,13 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
 
	f2fs_bug_on(sbi, dn.data_blkaddr != COMPRESS_ADDR);
 
+skip_reading_dnode:
	for (i = 1; i < cc->cluster_size; i++) {
		block_t blkaddr;
 
-		blkaddr = data_blkaddr(dn.inode, dn.node_page,
-					dn.ofs_in_node + i);
+		blkaddr = from_dnode ? data_blkaddr(dn.inode, dn.node_page,
+					dn.ofs_in_node + i) :
+					ei.blk + i - 1;
 
		if (!__is_valid_data_blkaddr(blkaddr))
			break;
@@ -2166,6 +2222,9 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
			goto out_put_dnode;
		}
		cc->nr_cpages++;
+
+		if (!from_dnode && i >= ei.c_len)
+			break;
	}
 
	/* nothing to decompress */
@@ -2185,8 +2244,9 @@ int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
		block_t blkaddr;
		struct bio_post_read_ctx *ctx;
 
-		blkaddr = data_blkaddr(dn.inode, dn.node_page,
-						dn.ofs_in_node + i + 1);
+		blkaddr = from_dnode ? data_blkaddr(dn.inode, dn.node_page,
+					dn.ofs_in_node + i + 1) :
+					ei.blk + i;
 
		f2fs_wait_on_block_writeback(inode, blkaddr);
@@ -2220,7 +2280,7 @@ submit_and_realloc:
		if (bio_add_page(bio, page, blocksize, 0) < blocksize)
			goto submit_and_realloc;
 
-		ctx = bio->bi_private;
+		ctx = get_post_read_ctx(bio);
		ctx->enabled_steps |= STEP_DECOMPRESS;
		refcount_inc(&dic->refcnt);
@@ -2231,13 +2291,15 @@ submit_and_realloc:
		*last_block_in_bio = blkaddr;
	}
 
-	f2fs_put_dnode(&dn);
+	if (from_dnode)
+		f2fs_put_dnode(&dn);
 
	*bio_ret = bio;
	return 0;
 
 out_put_dnode:
-	f2fs_put_dnode(&dn);
+	if (from_dnode)
+		f2fs_put_dnode(&dn);
 out:
	for (i = 0; i < cc->cluster_size; i++) {
		if (cc->rpages[i]) {
@@ -2272,6 +2334,7 @@ static int f2fs_mpage_readpages(struct inode *inode,
		.nr_rpages = 0,
		.nr_cpages = 0,
	};
+	pgoff_t nc_cluster_idx = NULL_CLUSTER;
 #endif
	unsigned nr_pages = rac ? readahead_count(rac) : 1;
	unsigned max_nr_pages = nr_pages;
@@ -2304,12 +2367,23 @@ static int f2fs_mpage_readpages(struct inode *inode,
				if (ret)
					goto set_error_page;
			}
-			ret = f2fs_is_compressed_cluster(inode, page->index);
-			if (ret < 0)
-				goto set_error_page;
-			else if (!ret)
-				goto read_single_page;
+			if (cc.cluster_idx == NULL_CLUSTER) {
+				if (nc_cluster_idx ==
+					page->index >> cc.log_cluster_size) {
+					goto read_single_page;
+				}
+
+				ret = f2fs_is_compressed_cluster(inode, page->index);
+				if (ret < 0)
+					goto set_error_page;
+				else if (!ret) {
+					nc_cluster_idx =
+						page->index >> cc.log_cluster_size;
+					goto read_single_page;
+				}
+
+				nc_cluster_idx = NULL_CLUSTER;
+			}
			ret = f2fs_init_compress_ctx(&cc);
			if (ret)
				goto set_error_page;
@@ -2498,6 +2572,8 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
		return true;
	if (f2fs_is_atomic_file(inode))
		return true;
+	if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
+		return true;
 
	/* swap file is migrating in aligned write mode */
	if (is_inode_flag_set(inode, FI_ALIGNED_WRITE))
@@ -2530,7 +2606,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
	struct page *page = fio->page;
	struct inode *inode = page->mapping->host;
	struct dnode_of_data dn;
-	struct extent_info ei = {0,0,0};
+	struct extent_info ei = {0, };
	struct node_info ni;
	bool ipu_force = false;
	int err = 0;
@@ -3176,9 +3252,8 @@ static int f2fs_write_data_pages(struct address_space *mapping,
			FS_CP_DATA_IO : FS_DATA_IO);
 }
 
-static void f2fs_write_failed(struct address_space *mapping, loff_t to)
+static void f2fs_write_failed(struct inode *inode, loff_t to)
 {
-	struct inode *inode = mapping->host;
	loff_t i_size = i_size_read(inode);
 
	if (IS_NOQUOTA(inode))
@@ -3187,12 +3262,12 @@ static void f2fs_write_failed(struct inode *inode, loff_t to)
	/* In the fs-verity case, f2fs_end_enable_verity() does the truncate */
	if (to > i_size && !f2fs_verity_in_progress(inode)) {
		down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
-		filemap_invalidate_lock(mapping);
+		filemap_invalidate_lock(inode->i_mapping);
 
		truncate_pagecache(inode, i_size);
		f2fs_truncate_blocks(inode, i_size, true);
 
-		filemap_invalidate_unlock(mapping);
+		filemap_invalidate_unlock(inode->i_mapping);
		up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
	}
 }
@@ -3206,7 +3281,7 @@ static int prepare_write_begin(struct f2fs_sb_info *sbi,
	struct dnode_of_data dn;
	struct page *ipage;
	bool locked = false;
-	struct extent_info ei = {0,0,0};
+	struct extent_info ei = {0, };
	int err = 0;
	int flag;
@@ -3328,6 +3403,9 @@ static int f2fs_write_begin(struct file *file, struct address_space *mapping,
 
		*fsdata = NULL;
 
+		if (len == PAGE_SIZE)
+			goto repeat;
+
		ret = f2fs_prepare_compress_overwrite(inode, pagep,
							index, fsdata);
		if (ret < 0) {
@@ -3410,7 +3488,7 @@ repeat:
 
 fail:
	f2fs_put_page(page, 1);
-	f2fs_write_failed(mapping, pos + len);
+	f2fs_write_failed(inode, pos + len);
	if (drop_atomic)
		f2fs_drop_inmem_pages_all(sbi, false);
	return err;
@@ -3552,7 +3630,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
	if (f2fs_force_buffered_io(inode, iocb, iter))
		return 0;
 
-	do_opu = allow_outplace_dio(inode, iocb, iter);
+	do_opu = rw == WRITE && f2fs_lfs_mode(sbi);
 
	trace_f2fs_direct_IO_enter(inode, offset, count, rw);
@@ -3600,7 +3678,7 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
			f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
					count - iov_iter_count(iter));
		} else if (err < 0) {
-			f2fs_write_failed(mapping, offset + count);
+			f2fs_write_failed(inode, offset + count);
		}
	} else {
		if (err > 0)


@@ -323,11 +323,27 @@ get_cache:
 #endif
 }
 
+static char *s_flag[] = {
+	[SBI_IS_DIRTY]		= " fs_dirty",
+	[SBI_IS_CLOSE]		= " closing",
+	[SBI_NEED_FSCK]		= " need_fsck",
+	[SBI_POR_DOING]		= " recovering",
+	[SBI_NEED_SB_WRITE]	= " sb_dirty",
+	[SBI_NEED_CP]		= " need_cp",
+	[SBI_IS_SHUTDOWN]	= " shutdown",
+	[SBI_IS_RECOVERED]	= " recovered",
+	[SBI_CP_DISABLED]	= " cp_disabled",
+	[SBI_CP_DISABLED_QUICK]	= " cp_disabled_quick",
+	[SBI_QUOTA_NEED_FLUSH]	= " quota_need_flush",
+	[SBI_QUOTA_SKIP_FLUSH]	= " quota_skip_flush",
+	[SBI_QUOTA_NEED_REPAIR]	= " quota_need_repair",
+	[SBI_IS_RESIZEFS]	= " resizefs",
+};
+
 static int stat_show(struct seq_file *s, void *v)
 {
	struct f2fs_stat_info *si;
-	int i = 0;
-	int j;
+	int i = 0, j = 0;
 
	mutex_lock(&f2fs_stat_mutex);
	list_for_each_entry(si, &f2fs_stat_list, stat_list) {
@@ -337,7 +353,13 @@ static int stat_show(struct seq_file *s, void *v)
			si->sbi->sb->s_bdev, i++,
			f2fs_readonly(si->sbi->sb) ? "RO": "RW",
			is_set_ckpt_flags(si->sbi, CP_DISABLED_FLAG) ?
-			"Disabled": (f2fs_cp_error(si->sbi) ? "Error": "Good"));
+			"Disabled" : (f2fs_cp_error(si->sbi) ? "Error" : "Good"));
+
+		if (si->sbi->s_flag) {
+			seq_puts(s, "[SBI:");
+			for_each_set_bit(j, &si->sbi->s_flag, 32)
+				seq_puts(s, s_flag[j]);
+			seq_puts(s, "]\n");
+		}
		seq_printf(s, "[SB: 1] [CP: 2] [SIT: %d] [NAT: %d] ",
			   si->sit_area_segs, si->nat_area_segs);
		seq_printf(s, "[SSA: %d] [MAIN: %d",
@@ -450,6 +472,15 @@ static int stat_show(struct seq_file *s, void *v)
				si->data_segs, si->bg_data_segs);
		seq_printf(s, "  - node segments : %d (%d)\n",
				si->node_segs, si->bg_node_segs);
+		seq_printf(s, "  - Reclaimed segs : Normal (%d), Idle CB (%d), "
+				"Idle Greedy (%d), Idle AT (%d), "
+				"Urgent High (%d), Urgent Low (%d)\n",
+				si->sbi->gc_reclaimed_segs[GC_NORMAL],
+				si->sbi->gc_reclaimed_segs[GC_IDLE_CB],
+				si->sbi->gc_reclaimed_segs[GC_IDLE_GREEDY],
+				si->sbi->gc_reclaimed_segs[GC_IDLE_AT],
+				si->sbi->gc_reclaimed_segs[GC_URGENT_HIGH],
+				si->sbi->gc_reclaimed_segs[GC_URGENT_LOW]);
		seq_printf(s, "Try to move %d blocks (BG: %d)\n", si->tot_blks,
				si->bg_data_blks + si->bg_node_blks);
		seq_printf(s, "  - data blocks : %d (%d)\n", si->data_blks,
@@ -611,7 +642,7 @@ void __init f2fs_create_root_stats(void)
 #ifdef CONFIG_DEBUG_FS
	f2fs_debugfs_root = debugfs_create_dir("f2fs", NULL);
 
-	debugfs_create_file("status", S_IRUGO, f2fs_debugfs_root, NULL,
+	debugfs_create_file("status", 0444, f2fs_debugfs_root, NULL,
			    &stat_fops);
 #endif
 }


@@ -83,8 +83,8 @@ int f2fs_init_casefolded_name(const struct inode *dir,
	struct super_block *sb = dir->i_sb;
 
	if (IS_CASEFOLDED(dir)) {
-		fname->cf_name.name = kmem_cache_alloc(f2fs_cf_name_slab,
-					GFP_NOFS);
+		fname->cf_name.name = f2fs_kmem_cache_alloc(f2fs_cf_name_slab,
					GFP_NOFS, false, F2FS_SB(sb));
		if (!fname->cf_name.name)
			return -ENOMEM;
		fname->cf_name.len = utf8_casefold(sb->s_encoding,
@@ -1000,6 +1000,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
	struct f2fs_sb_info *sbi = F2FS_I_SB(d->inode);
	struct blk_plug plug;
	bool readdir_ra = sbi->readdir_ra == 1;
+	bool found_valid_dirent = false;
	int err = 0;
 
	bit_pos = ((unsigned long)ctx->pos % d->max);
@@ -1014,13 +1015,15 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
 
		de = &d->dentry[bit_pos];
		if (de->name_len == 0) {
+			if (found_valid_dirent || !bit_pos) {
+				printk_ratelimited(
+					"%sF2FS-fs (%s): invalid namelen(0), ino:%u, run fsck to fix.",
+					KERN_WARNING, sbi->sb->s_id,
+					le32_to_cpu(de->ino));
+				set_sbi_flag(sbi, SBI_NEED_FSCK);
+			}
			bit_pos++;
			ctx->pos = start_pos + bit_pos;
-			printk_ratelimited(
-				"%sF2FS-fs (%s): invalid namelen(0), ino:%u, run fsck to fix.",
-				KERN_WARNING, sbi->sb->s_id,
-				le32_to_cpu(de->ino));
-			set_sbi_flag(sbi, SBI_NEED_FSCK);
			continue;
		}
@@ -1063,6 +1066,7 @@ int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
			f2fs_ra_node_page(sbi, le32_to_cpu(de->ino));
 
		ctx->pos = start_pos + bit_pos;
+		found_valid_dirent = true;
	}
 out:
	if (readdir_ra)


@@ -239,7 +239,7 @@ static struct extent_node *__attach_extent_node(struct f2fs_sb_info *sbi,
 {
	struct extent_node *en;
 
-	en = kmem_cache_alloc(extent_node_slab, GFP_ATOMIC);
+	en = f2fs_kmem_cache_alloc(extent_node_slab, GFP_ATOMIC, false, sbi);
	if (!en)
		return NULL;
@@ -292,7 +292,8 @@ static struct extent_tree *__grab_extent_tree(struct inode *inode)
	mutex_lock(&sbi->extent_tree_lock);
	et = radix_tree_lookup(&sbi->extent_tree_root, ino);
	if (!et) {
-		et = f2fs_kmem_cache_alloc(extent_tree_slab, GFP_NOFS);
+		et = f2fs_kmem_cache_alloc(extent_tree_slab,
					GFP_NOFS, true, NULL);
		f2fs_radix_tree_insert(&sbi->extent_tree_root, ino, et);
		memset(et, 0, sizeof(struct extent_tree));
		et->ino = ino;
@@ -661,6 +662,47 @@ static void f2fs_update_extent_tree_range(struct inode *inode,
		f2fs_mark_inode_dirty_sync(inode, true);
 }
 
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+void f2fs_update_extent_tree_range_compressed(struct inode *inode,
+				pgoff_t fofs, block_t blkaddr, unsigned int llen,
+				unsigned int c_len)
+{
+	struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+	struct extent_tree *et = F2FS_I(inode)->extent_tree;
+	struct extent_node *en = NULL;
+	struct extent_node *prev_en = NULL, *next_en = NULL;
+	struct extent_info ei;
+	struct rb_node **insert_p = NULL, *insert_parent = NULL;
+	bool leftmost = false;
+
+	trace_f2fs_update_extent_tree_range(inode, fofs, blkaddr, llen);
+
+	/* it is safe here to check FI_NO_EXTENT w/o et->lock in ro image */
+	if (is_inode_flag_set(inode, FI_NO_EXTENT))
+		return;
+
+	write_lock(&et->lock);
+
+	en = (struct extent_node *)f2fs_lookup_rb_tree_ret(&et->root,
+				(struct rb_entry *)et->cached_en, fofs,
+				(struct rb_entry **)&prev_en,
+				(struct rb_entry **)&next_en,
+				&insert_p, &insert_parent, false,
+				&leftmost);
+	if (en)
+		goto unlock_out;
+
+	set_extent_info(&ei, fofs, blkaddr, llen);
+	ei.c_len = c_len;
+
+	if (!__try_merge_extent_node(sbi, et, &ei, prev_en, next_en))
+		__insert_extent_tree(sbi, et, &ei,
+				insert_p, insert_parent, leftmost);
+unlock_out:
+	write_unlock(&et->lock);
+}
+#endif
+
 unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink)
 {
	struct extent_tree *et, *next;


@@ -43,6 +43,7 @@ enum {
	FAULT_KVMALLOC,
	FAULT_PAGE_ALLOC,
	FAULT_PAGE_GET,
+	FAULT_ALLOC_BIO,	/* it's obsolete due to bio_alloc() will never fail */
	FAULT_ALLOC_NID,
	FAULT_ORPHAN,
	FAULT_BLOCK,
@@ -53,6 +54,7 @@ enum {
	FAULT_CHECKPOINT,
	FAULT_DISCARD,
	FAULT_WRITE_IO,
+	FAULT_SLAB_ALLOC,
	FAULT_MAX,
 };
@@ -139,6 +141,11 @@ struct f2fs_mount_info {
	int fsync_mode;			/* fsync policy */
	int fs_mode;			/* fs mode: LFS or ADAPTIVE */
	int bggc_mode;			/* bggc mode: off, on or sync */
+	int discard_unit;		/*
+					 * discard command's offset/size should
+					 * be aligned to this unit: block,
+					 * segment or section
+					 */
	struct fscrypt_dummy_policy dummy_enc_policy; /* test dummy encryption */
	block_t unusable_cap_perc;	/* percentage for cap */
	block_t unusable_cap;		/* Amount of space allowed to be
@@ -542,7 +549,7 @@ enum {
	 */
 };
 
-#define DEFAULT_RETRY_IO_COUNT	8	/* maximum retry read IO count */
+#define DEFAULT_RETRY_IO_COUNT	8	/* maximum retry read IO or flush count */
 
 /* congestion wait timeout value, default: 20ms */
 #define	DEFAULT_IO_TIMEOUT	(msecs_to_jiffies(20))
@@ -575,6 +582,9 @@ struct extent_info {
	unsigned int fofs;		/* start offset in a file */
	unsigned int len;		/* length of the extent */
	u32 blk;			/* start block address of the extent */
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+	unsigned int c_len;		/* physical extent length of compressed blocks */
+#endif
 };
 
 struct extent_node {
@@ -793,6 +803,9 @@ static inline void set_extent_info(struct extent_info *ei, unsigned int fofs,
	ei->fofs = fofs;
	ei->blk = blk;
	ei->len = len;
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+	ei->c_len = 0;
+#endif
 }
 
 static inline bool __is_discard_mergeable(struct discard_info *back,
@@ -817,6 +830,12 @@ static inline bool __is_discard_front_mergeable(struct discard_info *cur,
 static inline bool __is_extent_mergeable(struct extent_info *back,
				struct extent_info *front)
 {
+#ifdef CONFIG_F2FS_FS_COMPRESSION
+	if (back->c_len && back->len != back->c_len)
+		return false;
+	if (front->c_len && front->len != front->c_len)
+		return false;
+#endif
	return (back->fofs + back->len == front->fofs &&
			back->blk + back->len == front->blk);
 }
@@ -1252,6 +1271,7 @@ enum {
	GC_IDLE_AT,
	GC_URGENT_HIGH,
	GC_URGENT_LOW,
+	MAX_GC_MODE,
 };
 
 enum {
@@ -1297,6 +1317,12 @@ enum {
	 */
 };
 
+enum {
+	DISCARD_UNIT_BLOCK,	/* basic discard unit is block */
+	DISCARD_UNIT_SEGMENT,	/* basic discard unit is segment */
+	DISCARD_UNIT_SECTION,	/* basic discard unit is section */
+};
+
 static inline int f2fs_test_bit(unsigned int nr, char *addr);
 static inline void f2fs_set_bit(unsigned int nr, char *addr);
 static inline void f2fs_clear_bit(unsigned int nr, char *addr);
@@ -1686,14 +1712,6 @@ struct f2fs_sb_info {
 #endif
	spinlock_t stat_lock;			/* lock for stat operations */
 
-	/* For app/fs IO statistics */
-	spinlock_t iostat_lock;
-	unsigned long long rw_iostat[NR_IO_TYPE];
-	unsigned long long prev_rw_iostat[NR_IO_TYPE];
-	bool iostat_enable;
-	unsigned long iostat_next_period;
-	unsigned int iostat_period_ms;
-
	/* to attach REQ_META|REQ_FUA flags */
	unsigned int data_io_flag;
	unsigned int node_io_flag;
@@ -1732,6 +1750,12 @@ struct f2fs_sb_info {
	struct kmem_cache *inline_xattr_slab;	/* inline xattr entry */
	unsigned int inline_xattr_slab_size;	/* default inline xattr slab size */
 
+	/* For reclaimed segs statistics per each GC mode */
+	unsigned int gc_segment_mode;		/* GC state for reclaimed segments */
+	unsigned int gc_reclaimed_segs[MAX_GC_MODE];	/* Reclaimed segs for each mode */
+
+	unsigned long seq_file_ra_mul;		/* multiplier for ra_pages of seq. files in fadvise */
+
 #ifdef CONFIG_F2FS_FS_COMPRESSION
	struct kmem_cache *page_array_slab;	/* page array entry */
	unsigned int page_array_slab_size;	/* default page array slab size */
@@ -1747,6 +1771,20 @@ struct f2fs_sb_info {
	unsigned int compress_watermark;	/* cache page watermark */
	atomic_t compress_page_hit;		/* cache hit count */
 #endif
+
+#ifdef CONFIG_F2FS_IOSTAT
+	/* For app/fs IO statistics */
+	spinlock_t iostat_lock;
+	unsigned long long rw_iostat[NR_IO_TYPE];
+	unsigned long long prev_rw_iostat[NR_IO_TYPE];
+	bool iostat_enable;
+	unsigned long iostat_next_period;
+	unsigned int iostat_period_ms;
+
+	/* For io latency related statistics info in one iostat period */
+	spinlock_t iostat_lat_lock;
+	struct iostat_lat_info *iostat_io_lat;
+#endif
 };
 
 struct f2fs_private_dio {
@@ -2034,36 +2072,6 @@ static inline void clear_ckpt_flags(struct f2fs_sb_info *sbi, unsigned int f)
	spin_unlock_irqrestore(&sbi->cp_lock, flags);
 }
 
-static inline void disable_nat_bits(struct f2fs_sb_info *sbi, bool lock)
-{
-	unsigned long flags;
-	unsigned char *nat_bits;
-
-	/*
-	 * In order to re-enable nat_bits we need to call fsck.f2fs by
-	 * set_sbi_flag(sbi, SBI_NEED_FSCK). But it may give huge cost,
-	 * so let's rely on regular fsck or unclean shutdown.
-	 */
-	if (lock)
-		spin_lock_irqsave(&sbi->cp_lock, flags);
-	__clear_ckpt_flags(F2FS_CKPT(sbi), CP_NAT_BITS_FLAG);
-	nat_bits = NM_I(sbi)->nat_bits;
-	NM_I(sbi)->nat_bits = NULL;
-	if (lock)
-		spin_unlock_irqrestore(&sbi->cp_lock, flags);
-
-	kvfree(nat_bits);
-}
-
-static inline bool enabled_nat_bits(struct f2fs_sb_info *sbi,
-					struct cp_control *cpc)
-{
-	bool set = is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG);
-
-	return (cpc) ? (cpc->reason & CP_UMOUNT) && set : set;
-}
-
 static inline void f2fs_lock_op(struct f2fs_sb_info *sbi)
 {
	down_read(&sbi->cp_rwsem);
@@ -2587,7 +2595,7 @@ static inline struct kmem_cache *f2fs_kmem_cache_create(const char *name,
	return kmem_cache_create(name, size, 0, SLAB_RECLAIM_ACCOUNT, NULL);
 }
 
-static inline void *f2fs_kmem_cache_alloc(struct kmem_cache *cachep,
+static inline void *f2fs_kmem_cache_alloc_nofail(struct kmem_cache *cachep,
						gfp_t flags)
 {
	void *entry;
@@ -2598,6 +2606,20 @@ static inline void *f2fs_kmem_cache_alloc_nofail(struct kmem_cache *cachep,
	return entry;
 }
 
+static inline void *f2fs_kmem_cache_alloc(struct kmem_cache *cachep,
+			gfp_t flags, bool nofail, struct f2fs_sb_info *sbi)
+{
+	if (nofail)
+		return f2fs_kmem_cache_alloc_nofail(cachep, flags);
+
+	if (time_to_inject(sbi, FAULT_SLAB_ALLOC)) {
+		f2fs_show_injection_info(sbi, FAULT_SLAB_ALLOC);
+		return NULL;
+	}
+
+	return kmem_cache_alloc(cachep, flags);
+}
+
 static inline bool is_inflight_io(struct f2fs_sb_info *sbi, int type)
 {
	if (get_pages(sbi, F2FS_RD_DATA) || get_pages(sbi, F2FS_RD_NODE) ||
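
The wrapper above gives callers two behaviours: nofail=true keeps the old retry-forever semantics, while nofail=false routes the request through FAULT_SLAB_ALLOC injection and may return NULL. A condensed caller-side sketch (a hypothetical helper, only illustrating the pattern the converted call sites in this series follow):

	/* hypothetical helper showing the two call patterns */
	static struct ino_entry *example_alloc(struct f2fs_sb_info *sbi, bool critical)
	{
		if (critical)
			/* must not fail: old f2fs_kmem_cache_alloc() behaviour */
			return f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS,
							true, NULL);

		/* failable path: fault injection applies, caller handles NULL */
		return f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS, false, sbi);
	}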
@@ -3210,47 +3232,6 @@ static inline int get_inline_xattr_addrs(struct inode *inode)
					sizeof((f2fs_inode)->field))	\
		<= (F2FS_OLD_ATTRIBUTE_SIZE + (extra_isize)))	\
 
-#define DEFAULT_IOSTAT_PERIOD_MS	3000
-#define MIN_IOSTAT_PERIOD_MS		100
-/* maximum period of iostat tracing is 1 day */
-#define MAX_IOSTAT_PERIOD_MS		8640000
-
-static inline void f2fs_reset_iostat(struct f2fs_sb_info *sbi)
-{
-	int i;
-
-	spin_lock(&sbi->iostat_lock);
-	for (i = 0; i < NR_IO_TYPE; i++) {
-		sbi->rw_iostat[i] = 0;
-		sbi->prev_rw_iostat[i] = 0;
-	}
-	spin_unlock(&sbi->iostat_lock);
-}
-
-extern void f2fs_record_iostat(struct f2fs_sb_info *sbi);
-
-static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi,
-			enum iostat_type type, unsigned long long io_bytes)
-{
-	if (!sbi->iostat_enable)
-		return;
-
-	spin_lock(&sbi->iostat_lock);
-	sbi->rw_iostat[type] += io_bytes;
-
-	if (type == APP_WRITE_IO || type == APP_DIRECT_IO)
-		sbi->rw_iostat[APP_BUFFERED_IO] =
-			sbi->rw_iostat[APP_WRITE_IO] -
-			sbi->rw_iostat[APP_DIRECT_IO];
-
-	if (type == APP_READ_IO || type == APP_DIRECT_READ_IO)
-		sbi->rw_iostat[APP_BUFFERED_READ_IO] =
-			sbi->rw_iostat[APP_READ_IO] -
-			sbi->rw_iostat[APP_DIRECT_READ_IO];
-
-	spin_unlock(&sbi->iostat_lock);
-
-	f2fs_record_iostat(sbi);
-}
-
 #define __is_large_section(sbi)		((sbi)->segs_per_sec > 1)
 
 #define __is_meta_io(fio)	(PAGE_TYPE_OF_BIO((fio)->type) == META)
@ -3417,6 +3398,7 @@ int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from);
int f2fs_truncate_xattr_node(struct inode *inode); int f2fs_truncate_xattr_node(struct inode *inode);
int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi,
unsigned int seq_id); unsigned int seq_id);
bool f2fs_nat_bitmap_enabled(struct f2fs_sb_info *sbi);
int f2fs_remove_inode_page(struct inode *inode); int f2fs_remove_inode_page(struct inode *inode);
struct page *f2fs_new_inode_page(struct inode *inode); struct page *f2fs_new_inode_page(struct inode *inode);
struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs); struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs);
@ -3441,6 +3423,7 @@ int f2fs_recover_xattr_data(struct inode *inode, struct page *page);
int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page); int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page);
int f2fs_restore_node_summary(struct f2fs_sb_info *sbi, int f2fs_restore_node_summary(struct f2fs_sb_info *sbi,
unsigned int segno, struct f2fs_summary_block *sum); unsigned int segno, struct f2fs_summary_block *sum);
void f2fs_enable_nat_bits(struct f2fs_sb_info *sbi);
int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc); int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
int f2fs_build_node_manager(struct f2fs_sb_info *sbi); int f2fs_build_node_manager(struct f2fs_sb_info *sbi);
void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi); void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi);
@ -3464,6 +3447,7 @@ int f2fs_flush_device_cache(struct f2fs_sb_info *sbi);
void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free); void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free);
void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr); void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr);
bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr); bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr);
int f2fs_start_discard_thread(struct f2fs_sb_info *sbi);
void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi); void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi);
void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi); void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi);
bool f2fs_issue_discard_timeout(struct f2fs_sb_info *sbi); bool f2fs_issue_discard_timeout(struct f2fs_sb_info *sbi);
@ -3986,6 +3970,9 @@ void f2fs_destroy_extent_cache(void);
/* /*
* sysfs.c * sysfs.c
*/ */
#define MIN_RA_MUL 2
#define MAX_RA_MUL 256
int __init f2fs_init_sysfs(void); int __init f2fs_init_sysfs(void);
void f2fs_exit_sysfs(void); void f2fs_exit_sysfs(void);
int f2fs_register_sysfs(struct f2fs_sb_info *sbi); int f2fs_register_sysfs(struct f2fs_sb_info *sbi);
@ -4040,18 +4027,23 @@ void f2fs_end_read_compressed_page(struct page *page, bool failed,
block_t blkaddr); block_t blkaddr);
bool f2fs_cluster_is_empty(struct compress_ctx *cc); bool f2fs_cluster_is_empty(struct compress_ctx *cc);
bool f2fs_cluster_can_merge_page(struct compress_ctx *cc, pgoff_t index); bool f2fs_cluster_can_merge_page(struct compress_ctx *cc, pgoff_t index);
bool f2fs_sanity_check_cluster(struct dnode_of_data *dn);
void f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page); void f2fs_compress_ctx_add_page(struct compress_ctx *cc, struct page *page);
int f2fs_write_multi_pages(struct compress_ctx *cc, int f2fs_write_multi_pages(struct compress_ctx *cc,
int *submitted, int *submitted,
struct writeback_control *wbc, struct writeback_control *wbc,
enum iostat_type io_type); enum iostat_type io_type);
int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index); int f2fs_is_compressed_cluster(struct inode *inode, pgoff_t index);
void f2fs_update_extent_tree_range_compressed(struct inode *inode,
pgoff_t fofs, block_t blkaddr, unsigned int llen,
unsigned int c_len);
int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret, int f2fs_read_multi_pages(struct compress_ctx *cc, struct bio **bio_ret,
unsigned nr_pages, sector_t *last_block_in_bio, unsigned nr_pages, sector_t *last_block_in_bio,
bool is_readahead, bool for_write); bool is_readahead, bool for_write);
struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc); struct decompress_io_ctx *f2fs_alloc_dic(struct compress_ctx *cc);
void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed); void f2fs_decompress_end_io(struct decompress_io_ctx *dic, bool failed);
void f2fs_put_page_dic(struct page *page); void f2fs_put_page_dic(struct page *page);
unsigned int f2fs_cluster_blocks_are_contiguous(struct dnode_of_data *dn);
int f2fs_init_compress_ctx(struct compress_ctx *cc); int f2fs_init_compress_ctx(struct compress_ctx *cc);
void f2fs_destroy_compress_ctx(struct compress_ctx *cc, bool reuse); void f2fs_destroy_compress_ctx(struct compress_ctx *cc, bool reuse);
void f2fs_init_compress_info(struct f2fs_sb_info *sbi); void f2fs_init_compress_info(struct f2fs_sb_info *sbi);
@ -4106,6 +4098,8 @@ static inline void f2fs_put_page_dic(struct page *page)
{ {
WARN_ON_ONCE(1); WARN_ON_ONCE(1);
} }
static inline unsigned int f2fs_cluster_blocks_are_contiguous(struct dnode_of_data *dn) { return 0; }
static inline bool f2fs_sanity_check_cluster(struct dnode_of_data *dn) { return false; }
static inline int f2fs_init_compress_inode(struct f2fs_sb_info *sbi) { return 0; } static inline int f2fs_init_compress_inode(struct f2fs_sb_info *sbi) { return 0; }
static inline void f2fs_destroy_compress_inode(struct f2fs_sb_info *sbi) { } static inline void f2fs_destroy_compress_inode(struct f2fs_sb_info *sbi) { }
static inline int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi) { return 0; } static inline int f2fs_init_page_array_cache(struct f2fs_sb_info *sbi) { return 0; }
@ -4121,6 +4115,9 @@ static inline bool f2fs_load_compressed_page(struct f2fs_sb_info *sbi,
static inline void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi, static inline void f2fs_invalidate_compress_pages(struct f2fs_sb_info *sbi,
nid_t ino) { } nid_t ino) { }
#define inc_compr_inode_stat(inode) do { } while (0) #define inc_compr_inode_stat(inode) do { } while (0)
static inline void f2fs_update_extent_tree_range_compressed(struct inode *inode,
pgoff_t fofs, block_t blkaddr, unsigned int llen,
unsigned int c_len) { }
#endif #endif
static inline void set_compress_context(struct inode *inode) static inline void set_compress_context(struct inode *inode)
@ -4136,7 +4133,8 @@ static inline void set_compress_context(struct inode *inode)
1 << COMPRESS_CHKSUM : 0; 1 << COMPRESS_CHKSUM : 0;
F2FS_I(inode)->i_cluster_size = F2FS_I(inode)->i_cluster_size =
1 << F2FS_I(inode)->i_log_cluster_size; 1 << F2FS_I(inode)->i_log_cluster_size;
-    if (F2FS_I(inode)->i_compress_algorithm == COMPRESS_LZ4 &&
+    if ((F2FS_I(inode)->i_compress_algorithm == COMPRESS_LZ4 ||
+            F2FS_I(inode)->i_compress_algorithm == COMPRESS_ZSTD) &&
            F2FS_OPTION(sbi).compress_level)
F2FS_I(inode)->i_compress_flag |= F2FS_I(inode)->i_compress_flag |=
F2FS_OPTION(sbi).compress_level << F2FS_OPTION(sbi).compress_level <<
@ -4304,16 +4302,6 @@ static inline int block_unaligned_IO(struct inode *inode,
return align & blocksize_mask; return align & blocksize_mask;
} }
static inline int allow_outplace_dio(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int rw = iov_iter_rw(iter);
return (f2fs_lfs_mode(sbi) && (rw == WRITE) &&
!block_unaligned_IO(inode, iocb, iter));
}
static inline bool f2fs_force_buffered_io(struct inode *inode, static inline bool f2fs_force_buffered_io(struct inode *inode,
struct kiocb *iocb, struct iov_iter *iter) struct kiocb *iocb, struct iov_iter *iter)
{ {
@ -4368,6 +4356,11 @@ static inline bool is_journalled_quota(struct f2fs_sb_info *sbi)
return false; return false;
} }
static inline bool f2fs_block_unit_discard(struct f2fs_sb_info *sbi)
{
return F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_BLOCK;
}
#define EFSBADCRC EBADMSG /* Bad CRC detected */ #define EFSBADCRC EBADMSG /* Bad CRC detected */
#define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */ #define EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */


@ -23,6 +23,7 @@
#include <linux/nls.h> #include <linux/nls.h>
#include <linux/sched/signal.h> #include <linux/sched/signal.h>
#include <linux/fileattr.h> #include <linux/fileattr.h>
#include <linux/fadvise.h>
#include "f2fs.h" #include "f2fs.h"
#include "node.h" #include "node.h"
@ -30,6 +31,7 @@
#include "xattr.h" #include "xattr.h"
#include "acl.h" #include "acl.h"
#include "gc.h" #include "gc.h"
#include "iostat.h"
#include <trace/events/f2fs.h> #include <trace/events/f2fs.h>
#include <uapi/linux/f2fs.h> #include <uapi/linux/f2fs.h>
@@ -258,8 +260,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
    };
    unsigned int seq_id = 0;

-    if (unlikely(f2fs_readonly(inode->i_sb) ||
-                is_sbi_flag_set(sbi, SBI_CP_DISABLED)))
+    if (unlikely(f2fs_readonly(inode->i_sb)))
        return 0;

    trace_f2fs_sync_file_enter(inode);
@@ -273,7 +274,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
    ret = file_write_and_wait_range(file, start, end);
    clear_inode_flag(inode, FI_NEED_IPU);

-    if (ret) {
+    if (ret || is_sbi_flag_set(sbi, SBI_CP_DISABLED)) {
        trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret);
        return ret;
    }
@ -298,6 +299,18 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
f2fs_exist_written_data(sbi, ino, UPDATE_INO)) f2fs_exist_written_data(sbi, ino, UPDATE_INO))
goto flush_out; goto flush_out;
goto out; goto out;
} else {
/*
* for OPU case, during fsync(), node can be persisted before
 * data when lower device doesn't support write barrier, resulting
 * in data corruption after SPO.
 * So for strict fsync mode, force to use atomic write semantics
* to keep write order in between data/node and last node to
* avoid potential data corruption.
*/
if (F2FS_OPTION(sbi).fsync_mode ==
FSYNC_MODE_STRICT && !atomic)
atomic = true;
} }
go_write: go_write:
/* /*
@ -737,6 +750,14 @@ int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock)
return err; return err;
#ifdef CONFIG_F2FS_FS_COMPRESSION #ifdef CONFIG_F2FS_FS_COMPRESSION
/*
 * For a compressed file, after its compressed blocks are released, don't
 * allow direct writes; however, direct writes should be allowed again once
 * the file has been truncated to zero.
*/
if (f2fs_compressed_file(inode) && !free_from
&& is_inode_flag_set(inode, FI_COMPRESS_RELEASED))
clear_inode_flag(inode, FI_COMPRESS_RELEASED);
if (from != free_from) { if (from != free_from) {
err = f2fs_truncate_partial_cluster(inode, from, lock); err = f2fs_truncate_partial_cluster(inode, from, lock);
if (err) if (err)
@ -1082,7 +1103,6 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
} }
if (pg_start < pg_end) { if (pg_start < pg_end) {
struct address_space *mapping = inode->i_mapping;
loff_t blk_start, blk_end; loff_t blk_start, blk_end;
struct f2fs_sb_info *sbi = F2FS_I_SB(inode); struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@ -1092,16 +1112,15 @@ static int punch_hole(struct inode *inode, loff_t offset, loff_t len)
blk_end = (loff_t)pg_end << PAGE_SHIFT; blk_end = (loff_t)pg_end << PAGE_SHIFT;
down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
filemap_invalidate_lock(mapping); filemap_invalidate_lock(inode->i_mapping);
truncate_inode_pages_range(mapping, blk_start, truncate_pagecache_range(inode, blk_start, blk_end - 1);
blk_end - 1);
f2fs_lock_op(sbi); f2fs_lock_op(sbi);
ret = f2fs_truncate_hole(inode, pg_start, pg_end); ret = f2fs_truncate_hole(inode, pg_start, pg_end);
f2fs_unlock_op(sbi); f2fs_unlock_op(sbi);
filemap_invalidate_unlock(mapping); filemap_invalidate_unlock(inode->i_mapping);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]); up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
} }
} }
@ -3473,8 +3492,8 @@ static int f2fs_release_compress_blocks(struct file *filp, unsigned long arg)
released_blocks += ret; released_blocks += ret;
} }
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
filemap_invalidate_unlock(inode->i_mapping); filemap_invalidate_unlock(inode->i_mapping);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
out: out:
inode_unlock(inode); inode_unlock(inode);
@ -3626,8 +3645,8 @@ static int f2fs_reserve_compress_blocks(struct file *filp, unsigned long arg)
reserved_blocks += ret; reserved_blocks += ret;
} }
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
filemap_invalidate_unlock(inode->i_mapping); filemap_invalidate_unlock(inode->i_mapping);
up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
if (ret >= 0) { if (ret >= 0) {
clear_inode_flag(inode, FI_COMPRESS_RELEASED); clear_inode_flag(inode, FI_COMPRESS_RELEASED);
@ -4290,7 +4309,7 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
* back to buffered IO. * back to buffered IO.
*/ */
        if (!f2fs_force_buffered_io(inode, iocb, from) &&
-                allow_outplace_dio(inode, iocb, from))
+                f2fs_lfs_mode(F2FS_I_SB(inode)))
            goto write;
} }
preallocated = true; preallocated = true;
@ -4330,6 +4349,34 @@ out:
return ret; return ret;
} }
static int f2fs_file_fadvise(struct file *filp, loff_t offset, loff_t len,
int advice)
{
struct inode *inode;
struct address_space *mapping;
struct backing_dev_info *bdi;
if (advice == POSIX_FADV_SEQUENTIAL) {
inode = file_inode(filp);
if (S_ISFIFO(inode->i_mode))
return -ESPIPE;
mapping = filp->f_mapping;
if (!mapping || len < 0)
return -EINVAL;
bdi = inode_to_bdi(mapping->host);
filp->f_ra.ra_pages = bdi->ra_pages *
F2FS_I_SB(inode)->seq_file_ra_mul;
spin_lock(&filp->f_lock);
filp->f_mode &= ~FMODE_RANDOM;
spin_unlock(&filp->f_lock);
return 0;
}
return generic_fadvise(filp, offset, len, advice);
}
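/*
 * Editorial sketch, not part of the patch: userspace view of the new fadvise
 * behaviour.  With the handler above, POSIX_FADV_SEQUENTIAL on an f2fs file
 * scales the per-file readahead window to bdi->ra_pages * seq_file_ra_mul
 * (the multiplier being the new per-filesystem sysfs knob mentioned in the
 * pull-request text) instead of falling through to generic_fadvise().  The
 * file path below is a placeholder.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/mnt/f2fs/bigfile", O_RDONLY);
    int err;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    err = posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    if (err)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
    close(fd);
    return 0;
}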
#ifdef CONFIG_COMPAT #ifdef CONFIG_COMPAT
struct compat_f2fs_gc_range { struct compat_f2fs_gc_range {
u32 sync; u32 sync;
@ -4458,4 +4505,5 @@ const struct file_operations f2fs_file_operations = {
#endif #endif
.splice_read = generic_file_splice_read, .splice_read = generic_file_splice_read,
.splice_write = iter_file_splice_write, .splice_write = iter_file_splice_write,
.fadvise = f2fs_file_fadvise,
}; };


@ -19,6 +19,7 @@
#include "node.h" #include "node.h"
#include "segment.h" #include "segment.h"
#include "gc.h" #include "gc.h"
#include "iostat.h"
#include <trace/events/f2fs.h> #include <trace/events/f2fs.h>
static struct kmem_cache *victim_entry_slab; static struct kmem_cache *victim_entry_slab;
@ -371,7 +372,8 @@ static struct victim_entry *attach_victim_entry(struct f2fs_sb_info *sbi,
struct atgc_management *am = &sbi->am; struct atgc_management *am = &sbi->am;
struct victim_entry *ve; struct victim_entry *ve;
-    ve = f2fs_kmem_cache_alloc(victim_entry_slab, GFP_NOFS);
+    ve = f2fs_kmem_cache_alloc(victim_entry_slab,
+                GFP_NOFS, true, NULL);
ve->mtime = mtime; ve->mtime = mtime;
ve->segno = segno; ve->segno = segno;
@ -849,7 +851,8 @@ static void add_gc_inode(struct gc_inode_list *gc_list, struct inode *inode)
iput(inode); iput(inode);
return; return;
} }
-    new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab, GFP_NOFS);
+    new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab,
+                GFP_NOFS, true, NULL);
new_ie->inode = inode; new_ie->inode = inode;
f2fs_radix_tree_insert(&gc_list->iroot, inode->i_ino, new_ie); f2fs_radix_tree_insert(&gc_list->iroot, inode->i_ino, new_ie);
@ -1497,8 +1500,10 @@ next_step:
int err; int err;
        if (S_ISREG(inode->i_mode)) {
-            if (!down_write_trylock(&fi->i_gc_rwsem[READ]))
+            if (!down_write_trylock(&fi->i_gc_rwsem[READ])) {
+                sbi->skipped_gc_rwsem++;
                continue;
+            }
if (!down_write_trylock( if (!down_write_trylock(
&fi->i_gc_rwsem[WRITE])) { &fi->i_gc_rwsem[WRITE])) {
sbi->skipped_gc_rwsem++; sbi->skipped_gc_rwsem++;
@ -1646,6 +1651,7 @@ static int do_garbage_collect(struct f2fs_sb_info *sbi,
force_migrate); force_migrate);
stat_inc_seg_count(sbi, type, gc_type); stat_inc_seg_count(sbi, type, gc_type);
sbi->gc_reclaimed_segs[sbi->gc_mode]++;
migrated++; migrated++;
freed: freed:
@ -1747,7 +1753,7 @@ gc_more:
round++; round++;
} }
-    if (gc_type == FG_GC && seg_freed)
+    if (gc_type == FG_GC)
sbi->cur_victim_sec = NULL_SEGNO; sbi->cur_victim_sec = NULL_SEGNO;
if (sync) if (sync)

fs/f2fs/iostat.c (new file, 287 lines)

@ -0,0 +1,287 @@
// SPDX-License-Identifier: GPL-2.0
/*
* f2fs iostat support
*
* Copyright 2021 Google LLC
* Author: Daeho Jeong <daehojeong@google.com>
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
#include <linux/seq_file.h>
#include "f2fs.h"
#include "iostat.h"
#include <trace/events/f2fs.h>
#define NUM_PREALLOC_IOSTAT_CTXS 128
static struct kmem_cache *bio_iostat_ctx_cache;
static mempool_t *bio_iostat_ctx_pool;
int __maybe_unused iostat_info_seq_show(struct seq_file *seq, void *offset)
{
struct super_block *sb = seq->private;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
time64_t now = ktime_get_real_seconds();
if (!sbi->iostat_enable)
return 0;
seq_printf(seq, "time: %-16llu\n", now);
/* print app write IOs */
seq_puts(seq, "[WRITE]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_IO]);
/* print fs write IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GC_DATA_IO]);
seq_printf(seq, "fs gc node: %-16llu\n",
sbi->rw_iostat[FS_GC_NODE_IO]);
seq_printf(seq, "fs cp data: %-16llu\n",
sbi->rw_iostat[FS_CP_DATA_IO]);
seq_printf(seq, "fs cp node: %-16llu\n",
sbi->rw_iostat[FS_CP_NODE_IO]);
seq_printf(seq, "fs cp meta: %-16llu\n",
sbi->rw_iostat[FS_CP_META_IO]);
/* print app read IOs */
seq_puts(seq, "[READ]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_READ_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_READ_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_READ_IO]);
/* print fs read IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_READ_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GDATA_READ_IO]);
seq_printf(seq, "fs compr_data: %-16llu\n",
sbi->rw_iostat[FS_CDATA_READ_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_READ_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_READ_IO]);
/* print other IOs */
seq_puts(seq, "[OTHER]\n");
seq_printf(seq, "fs discard: %-16llu\n",
sbi->rw_iostat[FS_DISCARD]);
return 0;
}
static inline void __record_iostat_latency(struct f2fs_sb_info *sbi)
{
int io, idx = 0;
unsigned int cnt;
struct f2fs_iostat_latency iostat_lat[MAX_IO_TYPE][NR_PAGE_TYPE];
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
spin_lock_irq(&sbi->iostat_lat_lock);
for (idx = 0; idx < MAX_IO_TYPE; idx++) {
for (io = 0; io < NR_PAGE_TYPE; io++) {
cnt = io_lat->bio_cnt[idx][io];
iostat_lat[idx][io].peak_lat =
jiffies_to_msecs(io_lat->peak_lat[idx][io]);
iostat_lat[idx][io].cnt = cnt;
iostat_lat[idx][io].avg_lat = cnt ?
jiffies_to_msecs(io_lat->sum_lat[idx][io]) / cnt : 0;
io_lat->sum_lat[idx][io] = 0;
io_lat->peak_lat[idx][io] = 0;
io_lat->bio_cnt[idx][io] = 0;
}
}
spin_unlock_irq(&sbi->iostat_lat_lock);
trace_f2fs_iostat_latency(sbi, iostat_lat);
}
static inline void f2fs_record_iostat(struct f2fs_sb_info *sbi)
{
unsigned long long iostat_diff[NR_IO_TYPE];
int i;
if (time_is_after_jiffies(sbi->iostat_next_period))
return;
/* Need double check under the lock */
spin_lock(&sbi->iostat_lock);
if (time_is_after_jiffies(sbi->iostat_next_period)) {
spin_unlock(&sbi->iostat_lock);
return;
}
sbi->iostat_next_period = jiffies +
msecs_to_jiffies(sbi->iostat_period_ms);
for (i = 0; i < NR_IO_TYPE; i++) {
iostat_diff[i] = sbi->rw_iostat[i] -
sbi->prev_rw_iostat[i];
sbi->prev_rw_iostat[i] = sbi->rw_iostat[i];
}
spin_unlock(&sbi->iostat_lock);
trace_f2fs_iostat(sbi, iostat_diff);
__record_iostat_latency(sbi);
}
void f2fs_reset_iostat(struct f2fs_sb_info *sbi)
{
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
int i;
spin_lock(&sbi->iostat_lock);
for (i = 0; i < NR_IO_TYPE; i++) {
sbi->rw_iostat[i] = 0;
sbi->prev_rw_iostat[i] = 0;
}
spin_unlock(&sbi->iostat_lock);
spin_lock_irq(&sbi->iostat_lat_lock);
memset(io_lat, 0, sizeof(struct iostat_lat_info));
spin_unlock_irq(&sbi->iostat_lat_lock);
}
void f2fs_update_iostat(struct f2fs_sb_info *sbi,
enum iostat_type type, unsigned long long io_bytes)
{
if (!sbi->iostat_enable)
return;
spin_lock(&sbi->iostat_lock);
sbi->rw_iostat[type] += io_bytes;
if (type == APP_WRITE_IO || type == APP_DIRECT_IO)
sbi->rw_iostat[APP_BUFFERED_IO] =
sbi->rw_iostat[APP_WRITE_IO] -
sbi->rw_iostat[APP_DIRECT_IO];
if (type == APP_READ_IO || type == APP_DIRECT_READ_IO)
sbi->rw_iostat[APP_BUFFERED_READ_IO] =
sbi->rw_iostat[APP_READ_IO] -
sbi->rw_iostat[APP_DIRECT_READ_IO];
spin_unlock(&sbi->iostat_lock);
f2fs_record_iostat(sbi);
}
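/*
 * Editorial sketch, not part of the patch: a typical accounting call, as an
 * assumption of how the write path (data.c/segment.c, not shown in this
 * excerpt) feeds the counters printed by iostat_info_seq_show() above.
 */
static inline void example_account_app_write(struct f2fs_sb_info *sbi,
                        size_t bytes)
{
    /* APP_BUFFERED_IO is then derived as APP_WRITE_IO - APP_DIRECT_IO */
    f2fs_update_iostat(sbi, APP_WRITE_IO, bytes);
    f2fs_update_iostat(sbi, FS_DATA_IO, bytes);
}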
static inline void __update_iostat_latency(struct bio_iostat_ctx *iostat_ctx,
int rw, bool is_sync)
{
unsigned long ts_diff;
unsigned int iotype = iostat_ctx->type;
unsigned long flags;
struct f2fs_sb_info *sbi = iostat_ctx->sbi;
struct iostat_lat_info *io_lat = sbi->iostat_io_lat;
int idx;
if (!sbi->iostat_enable)
return;
ts_diff = jiffies - iostat_ctx->submit_ts;
if (iotype >= META_FLUSH)
iotype = META;
if (rw == 0) {
idx = READ_IO;
} else {
if (is_sync)
idx = WRITE_SYNC_IO;
else
idx = WRITE_ASYNC_IO;
}
spin_lock_irqsave(&sbi->iostat_lat_lock, flags);
io_lat->sum_lat[idx][iotype] += ts_diff;
io_lat->bio_cnt[idx][iotype]++;
if (ts_diff > io_lat->peak_lat[idx][iotype])
io_lat->peak_lat[idx][iotype] = ts_diff;
spin_unlock_irqrestore(&sbi->iostat_lat_lock, flags);
}
void iostat_update_and_unbind_ctx(struct bio *bio, int rw)
{
struct bio_iostat_ctx *iostat_ctx = bio->bi_private;
bool is_sync = bio->bi_opf & REQ_SYNC;
if (rw == 0)
bio->bi_private = iostat_ctx->post_read_ctx;
else
bio->bi_private = iostat_ctx->sbi;
__update_iostat_latency(iostat_ctx, rw, is_sync);
mempool_free(iostat_ctx, bio_iostat_ctx_pool);
}
void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
struct bio *bio, struct bio_post_read_ctx *ctx)
{
struct bio_iostat_ctx *iostat_ctx;
/* Due to the mempool, this never fails. */
iostat_ctx = mempool_alloc(bio_iostat_ctx_pool, GFP_NOFS);
iostat_ctx->sbi = sbi;
iostat_ctx->submit_ts = 0;
iostat_ctx->type = 0;
iostat_ctx->post_read_ctx = ctx;
bio->bi_private = iostat_ctx;
}
int __init f2fs_init_iostat_processing(void)
{
bio_iostat_ctx_cache =
kmem_cache_create("f2fs_bio_iostat_ctx",
sizeof(struct bio_iostat_ctx), 0, 0, NULL);
if (!bio_iostat_ctx_cache)
goto fail;
bio_iostat_ctx_pool =
mempool_create_slab_pool(NUM_PREALLOC_IOSTAT_CTXS,
bio_iostat_ctx_cache);
if (!bio_iostat_ctx_pool)
goto fail_free_cache;
return 0;
fail_free_cache:
kmem_cache_destroy(bio_iostat_ctx_cache);
fail:
return -ENOMEM;
}
void f2fs_destroy_iostat_processing(void)
{
mempool_destroy(bio_iostat_ctx_pool);
kmem_cache_destroy(bio_iostat_ctx_cache);
}
int f2fs_init_iostat(struct f2fs_sb_info *sbi)
{
/* init iostat info */
spin_lock_init(&sbi->iostat_lock);
spin_lock_init(&sbi->iostat_lat_lock);
sbi->iostat_enable = false;
sbi->iostat_period_ms = DEFAULT_IOSTAT_PERIOD_MS;
sbi->iostat_io_lat = f2fs_kzalloc(sbi, sizeof(struct iostat_lat_info),
GFP_KERNEL);
if (!sbi->iostat_io_lat)
return -ENOMEM;
return 0;
}
void f2fs_destroy_iostat(struct f2fs_sb_info *sbi)
{
kfree(sbi->iostat_io_lat);
}

fs/f2fs/iostat.h (new file, 84 lines)

@ -0,0 +1,84 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* Copyright 2021 Google LLC
* Author: Daeho Jeong <daehojeong@google.com>
*/
#ifndef __F2FS_IOSTAT_H__
#define __F2FS_IOSTAT_H__
struct bio_post_read_ctx;
#ifdef CONFIG_F2FS_IOSTAT
#define DEFAULT_IOSTAT_PERIOD_MS 3000
#define MIN_IOSTAT_PERIOD_MS 100
/* maximum period of iostat tracing is 1 day */
#define MAX_IOSTAT_PERIOD_MS 8640000
enum {
READ_IO,
WRITE_SYNC_IO,
WRITE_ASYNC_IO,
MAX_IO_TYPE,
};
struct iostat_lat_info {
unsigned long sum_lat[MAX_IO_TYPE][NR_PAGE_TYPE]; /* sum of io latencies */
unsigned long peak_lat[MAX_IO_TYPE][NR_PAGE_TYPE]; /* peak io latency */
unsigned int bio_cnt[MAX_IO_TYPE][NR_PAGE_TYPE]; /* bio count */
};
extern int __maybe_unused iostat_info_seq_show(struct seq_file *seq,
void *offset);
extern void f2fs_reset_iostat(struct f2fs_sb_info *sbi);
extern void f2fs_update_iostat(struct f2fs_sb_info *sbi,
enum iostat_type type, unsigned long long io_bytes);
struct bio_iostat_ctx {
struct f2fs_sb_info *sbi;
unsigned long submit_ts;
enum page_type type;
struct bio_post_read_ctx *post_read_ctx;
};
static inline void iostat_update_submit_ctx(struct bio *bio,
enum page_type type)
{
struct bio_iostat_ctx *iostat_ctx = bio->bi_private;
iostat_ctx->submit_ts = jiffies;
iostat_ctx->type = type;
}
static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
{
struct bio_iostat_ctx *iostat_ctx = bio->bi_private;
return iostat_ctx->post_read_ctx;
}
extern void iostat_update_and_unbind_ctx(struct bio *bio, int rw);
extern void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
struct bio *bio, struct bio_post_read_ctx *ctx);
extern int f2fs_init_iostat_processing(void);
extern void f2fs_destroy_iostat_processing(void);
extern int f2fs_init_iostat(struct f2fs_sb_info *sbi);
extern void f2fs_destroy_iostat(struct f2fs_sb_info *sbi);
#else
static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi,
enum iostat_type type, unsigned long long io_bytes) {}
static inline void iostat_update_and_unbind_ctx(struct bio *bio, int rw) {}
static inline void iostat_alloc_and_bind_ctx(struct f2fs_sb_info *sbi,
struct bio *bio, struct bio_post_read_ctx *ctx) {}
static inline void iostat_update_submit_ctx(struct bio *bio,
enum page_type type) {}
static inline struct bio_post_read_ctx *get_post_read_ctx(struct bio *bio)
{
return bio->bi_private;
}
static inline int f2fs_init_iostat_processing(void) { return 0; }
static inline void f2fs_destroy_iostat_processing(void) {}
static inline int f2fs_init_iostat(struct f2fs_sb_info *sbi) { return 0; }
static inline void f2fs_destroy_iostat(struct f2fs_sb_info *sbi) {}
#endif
#endif /* __F2FS_IOSTAT_H__ */
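/*
 * Editorial sketch, not part of the patch: intended lifecycle of the per-bio
 * iostat context declared above.  The real wiring lives in data.c (not shown
 * in this excerpt); the submit/completion helpers below are illustrative
 * assumptions.
 */
static void example_bind_and_submit(struct f2fs_sb_info *sbi, struct bio *bio,
                    struct bio_post_read_ctx *ctx)
{
    iostat_alloc_and_bind_ctx(sbi, bio, ctx);    /* bio->bi_private -> iostat ctx */
    iostat_update_submit_ctx(bio, DATA);         /* stamp submit time and page type */
    submit_bio(bio);
}

static void example_read_end_io(struct bio *bio)
{
    /* restores bi_private and folds the bio latency into iostat_io_lat */
    iostat_update_and_unbind_ctx(bio, 0 /* rw == 0 means read */);
}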


@ -17,6 +17,7 @@
#include "node.h" #include "node.h"
#include "segment.h" #include "segment.h"
#include "xattr.h" #include "xattr.h"
#include "iostat.h"
#include <trace/events/f2fs.h> #include <trace/events/f2fs.h>
#define on_f2fs_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock) #define on_f2fs_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock)
@ -162,14 +163,13 @@ static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid)
return dst_page; return dst_page;
} }
-static struct nat_entry *__alloc_nat_entry(nid_t nid, bool no_fail)
+static struct nat_entry *__alloc_nat_entry(struct f2fs_sb_info *sbi,
+                        nid_t nid, bool no_fail)
{
    struct nat_entry *new;

-    if (no_fail)
-        new = f2fs_kmem_cache_alloc(nat_entry_slab, GFP_F2FS_ZERO);
-    else
-        new = kmem_cache_alloc(nat_entry_slab, GFP_F2FS_ZERO);
+    new = f2fs_kmem_cache_alloc(nat_entry_slab,
+                    GFP_F2FS_ZERO, no_fail, sbi);
if (new) { if (new) {
nat_set_nid(new, nid); nat_set_nid(new, nid);
nat_reset_flag(new); nat_reset_flag(new);
@ -242,7 +242,8 @@ static struct nat_entry_set *__grab_nat_entry_set(struct f2fs_nm_info *nm_i,
head = radix_tree_lookup(&nm_i->nat_set_root, set); head = radix_tree_lookup(&nm_i->nat_set_root, set);
if (!head) { if (!head) {
head = f2fs_kmem_cache_alloc(nat_entry_set_slab, GFP_NOFS); head = f2fs_kmem_cache_alloc(nat_entry_set_slab,
GFP_NOFS, true, NULL);
INIT_LIST_HEAD(&head->entry_list); INIT_LIST_HEAD(&head->entry_list);
INIT_LIST_HEAD(&head->set_list); INIT_LIST_HEAD(&head->set_list);
@ -329,7 +330,8 @@ static unsigned int f2fs_add_fsync_node_entry(struct f2fs_sb_info *sbi,
unsigned long flags; unsigned long flags;
unsigned int seq_id; unsigned int seq_id;
fn = f2fs_kmem_cache_alloc(fsync_node_entry_slab, GFP_NOFS); fn = f2fs_kmem_cache_alloc(fsync_node_entry_slab,
GFP_NOFS, true, NULL);
get_page(page); get_page(page);
fn->page = page; fn->page = page;
@ -428,7 +430,7 @@ static void cache_nat_entry(struct f2fs_sb_info *sbi, nid_t nid,
struct f2fs_nm_info *nm_i = NM_I(sbi); struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *new, *e; struct nat_entry *new, *e;
new = __alloc_nat_entry(nid, false); new = __alloc_nat_entry(sbi, nid, false);
if (!new) if (!new)
return; return;
@ -451,7 +453,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
{ {
struct f2fs_nm_info *nm_i = NM_I(sbi); struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *e; struct nat_entry *e;
struct nat_entry *new = __alloc_nat_entry(ni->nid, true); struct nat_entry *new = __alloc_nat_entry(sbi, ni->nid, true);
down_write(&nm_i->nat_tree_lock); down_write(&nm_i->nat_tree_lock);
e = __lookup_nat_cache(nm_i, ni->nid); e = __lookup_nat_cache(nm_i, ni->nid);
@ -552,7 +554,7 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
int i; int i;
ni->nid = nid; ni->nid = nid;
retry:
/* Check nat cache */ /* Check nat cache */
down_read(&nm_i->nat_tree_lock); down_read(&nm_i->nat_tree_lock);
e = __lookup_nat_cache(nm_i, nid); e = __lookup_nat_cache(nm_i, nid);
@ -564,10 +566,19 @@ int f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
return 0; return 0;
} }
-    memset(&ne, 0, sizeof(struct f2fs_nat_entry));
-
-    /* Check current segment summary */
-    down_read(&curseg->journal_rwsem);
+    /*
+     * Check current segment summary by trying to grab journal_rwsem first.
+     * This sem is on the critical path on the checkpoint requiring the above
+     * nat_tree_lock. Therefore, we should retry, if we failed to grab here
+     * while not bothering checkpoint.
+     */
+    if (!rwsem_is_locked(&sbi->cp_global_sem)) {
+        down_read(&curseg->journal_rwsem);
+    } else if (!down_read_trylock(&curseg->journal_rwsem)) {
+        up_read(&nm_i->nat_tree_lock);
+        goto retry;
+    }
i = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0); i = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0);
if (i >= 0) { if (i >= 0) {
ne = nat_in_journal(journal, i); ne = nat_in_journal(journal, i);
@ -832,6 +843,26 @@ int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode)
dn->ofs_in_node = offset[level]; dn->ofs_in_node = offset[level];
dn->node_page = npage[level]; dn->node_page = npage[level];
dn->data_blkaddr = f2fs_data_blkaddr(dn); dn->data_blkaddr = f2fs_data_blkaddr(dn);
if (is_inode_flag_set(dn->inode, FI_COMPRESSED_FILE) &&
f2fs_sb_has_readonly(sbi)) {
unsigned int c_len = f2fs_cluster_blocks_are_contiguous(dn);
block_t blkaddr;
if (!c_len)
goto out;
blkaddr = f2fs_data_blkaddr(dn);
if (blkaddr == COMPRESS_ADDR)
blkaddr = data_blkaddr(dn->inode, dn->node_page,
dn->ofs_in_node + 1);
f2fs_update_extent_tree_range_compressed(dn->inode,
index, blkaddr,
F2FS_I(dn->inode)->i_cluster_size,
c_len);
}
out:
return 0; return 0;
release_pages: release_pages:
@ -1321,7 +1352,8 @@ static int read_node_page(struct page *page, int op_flags)
if (err) if (err)
return err; return err;
-    if (unlikely(ni.blk_addr == NULL_ADDR) ||
+    /* NEW_ADDR can be seen, after cp_error drops some dirty node pages */
+    if (unlikely(ni.blk_addr == NULL_ADDR || ni.blk_addr == NEW_ADDR) ||
            is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN)) {
ClearPageUptodate(page); ClearPageUptodate(page);
return -ENOENT; return -ENOENT;
@ -2181,6 +2213,24 @@ static void __move_free_nid(struct f2fs_sb_info *sbi, struct free_nid *i,
} }
} }
bool f2fs_nat_bitmap_enabled(struct f2fs_sb_info *sbi)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
unsigned int i;
bool ret = true;
down_read(&nm_i->nat_tree_lock);
for (i = 0; i < nm_i->nat_blocks; i++) {
if (!test_bit_le(i, nm_i->nat_block_bitmap)) {
ret = false;
break;
}
}
up_read(&nm_i->nat_tree_lock);
return ret;
}
static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid, static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
bool set, bool build) bool set, bool build)
{ {
@ -2222,7 +2272,7 @@ static bool add_free_nid(struct f2fs_sb_info *sbi,
if (unlikely(f2fs_check_nid_range(sbi, nid))) if (unlikely(f2fs_check_nid_range(sbi, nid)))
return false; return false;
i = f2fs_kmem_cache_alloc(free_nid_slab, GFP_NOFS); i = f2fs_kmem_cache_alloc(free_nid_slab, GFP_NOFS, true, NULL);
i->nid = nid; i->nid = nid;
i->state = FREE_NID; i->state = FREE_NID;
@ -2812,7 +2862,7 @@ static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
ne = __lookup_nat_cache(nm_i, nid); ne = __lookup_nat_cache(nm_i, nid);
if (!ne) { if (!ne) {
ne = __alloc_nat_entry(nid, true); ne = __alloc_nat_entry(sbi, nid, true);
__init_nat_entry(nm_i, ne, &raw_ne, true); __init_nat_entry(nm_i, ne, &raw_ne, true);
} }
@ -2852,7 +2902,23 @@ add_out:
list_add_tail(&nes->set_list, head); list_add_tail(&nes->set_list, head);
} }
+static void __update_nat_bits(struct f2fs_nm_info *nm_i, unsigned int nat_ofs,
+        unsigned int valid)
+{
+    if (valid == 0) {
+        __set_bit_le(nat_ofs, nm_i->empty_nat_bits);
+        __clear_bit_le(nat_ofs, nm_i->full_nat_bits);
+        return;
+    }
+
+    __clear_bit_le(nat_ofs, nm_i->empty_nat_bits);
+    if (valid == NAT_ENTRY_PER_BLOCK)
+        __set_bit_le(nat_ofs, nm_i->full_nat_bits);
+    else
+        __clear_bit_le(nat_ofs, nm_i->full_nat_bits);
+}
+
-static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
+static void update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
        struct page *page)
{ {
struct f2fs_nm_info *nm_i = NM_I(sbi); struct f2fs_nm_info *nm_i = NM_I(sbi);
@ -2861,7 +2927,7 @@ static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
int valid = 0; int valid = 0;
int i = 0; int i = 0;
if (!enabled_nat_bits(sbi, NULL)) if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG))
return; return;
if (nat_index == 0) { if (nat_index == 0) {
@ -2872,17 +2938,36 @@ static void __update_nat_bits(struct f2fs_sb_info *sbi, nid_t start_nid,
if (le32_to_cpu(nat_blk->entries[i].block_addr) != NULL_ADDR) if (le32_to_cpu(nat_blk->entries[i].block_addr) != NULL_ADDR)
valid++; valid++;
} }
-    if (valid == 0) {
-        __set_bit_le(nat_index, nm_i->empty_nat_bits);
-        __clear_bit_le(nat_index, nm_i->full_nat_bits);
-        return;
-    }
-
-    __clear_bit_le(nat_index, nm_i->empty_nat_bits);
-    if (valid == NAT_ENTRY_PER_BLOCK)
-        __set_bit_le(nat_index, nm_i->full_nat_bits);
-    else
-        __clear_bit_le(nat_index, nm_i->full_nat_bits);
+    __update_nat_bits(nm_i, nat_index, valid);
}

+void f2fs_enable_nat_bits(struct f2fs_sb_info *sbi)
+{
+    struct f2fs_nm_info *nm_i = NM_I(sbi);
+    unsigned int nat_ofs;
+
+    down_read(&nm_i->nat_tree_lock);
+
+    for (nat_ofs = 0; nat_ofs < nm_i->nat_blocks; nat_ofs++) {
+        unsigned int valid = 0, nid_ofs = 0;
+
+        /* handle nid zero due to it should never be used */
+        if (unlikely(nat_ofs == 0)) {
+            valid = 1;
+            nid_ofs = 1;
+        }
+
+        for (; nid_ofs < NAT_ENTRY_PER_BLOCK; nid_ofs++) {
+            if (!test_bit_le(nid_ofs,
+                    nm_i->free_nid_bitmap[nat_ofs]))
+                valid++;
+        }
+
+        __update_nat_bits(nm_i, nat_ofs, valid);
+    }
+
+    up_read(&nm_i->nat_tree_lock);
+}
static int __flush_nat_entry_set(struct f2fs_sb_info *sbi, static int __flush_nat_entry_set(struct f2fs_sb_info *sbi,
@ -2901,7 +2986,7 @@ static int __flush_nat_entry_set(struct f2fs_sb_info *sbi,
* #1, flush nat entries to journal in current hot data summary block. * #1, flush nat entries to journal in current hot data summary block.
* #2, flush nat entries to nat page. * #2, flush nat entries to nat page.
*/ */
-    if (enabled_nat_bits(sbi, cpc) ||
+    if ((cpc->reason & CP_UMOUNT) ||
!__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL)) !__has_cursum_space(journal, set->entry_cnt, NAT_JOURNAL))
to_journal = false; to_journal = false;
@ -2948,7 +3033,7 @@ static int __flush_nat_entry_set(struct f2fs_sb_info *sbi,
if (to_journal) { if (to_journal) {
up_write(&curseg->journal_rwsem); up_write(&curseg->journal_rwsem);
} else { } else {
-        __update_nat_bits(sbi, start_nid, page);
+        update_nat_bits(sbi, start_nid, page);
f2fs_put_page(page, 1); f2fs_put_page(page, 1);
} }
@ -2979,7 +3064,7 @@ int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
* during unmount, let's flush nat_bits before checking * during unmount, let's flush nat_bits before checking
* nat_cnt[DIRTY_NAT]. * nat_cnt[DIRTY_NAT].
*/ */
if (enabled_nat_bits(sbi, cpc)) { if (cpc->reason & CP_UMOUNT) {
down_write(&nm_i->nat_tree_lock); down_write(&nm_i->nat_tree_lock);
remove_nats_in_journal(sbi); remove_nats_in_journal(sbi);
up_write(&nm_i->nat_tree_lock); up_write(&nm_i->nat_tree_lock);
@ -2995,7 +3080,7 @@ int f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
* entries, remove all entries from journal and merge them * entries, remove all entries from journal and merge them
* into nat entry set. * into nat entry set.
*/ */
if (enabled_nat_bits(sbi, cpc) || if (cpc->reason & CP_UMOUNT ||
!__has_cursum_space(journal, !__has_cursum_space(journal,
nm_i->nat_cnt[DIRTY_NAT], NAT_JOURNAL)) nm_i->nat_cnt[DIRTY_NAT], NAT_JOURNAL))
remove_nats_in_journal(sbi); remove_nats_in_journal(sbi);
@ -3032,15 +3117,18 @@ static int __get_nat_bitmaps(struct f2fs_sb_info *sbi)
__u64 cp_ver = cur_cp_version(ckpt); __u64 cp_ver = cur_cp_version(ckpt);
block_t nat_bits_addr; block_t nat_bits_addr;
if (!enabled_nat_bits(sbi, NULL))
return 0;
nm_i->nat_bits_blocks = F2FS_BLK_ALIGN((nat_bits_bytes << 1) + 8); nm_i->nat_bits_blocks = F2FS_BLK_ALIGN((nat_bits_bytes << 1) + 8);
nm_i->nat_bits = f2fs_kvzalloc(sbi, nm_i->nat_bits = f2fs_kvzalloc(sbi,
nm_i->nat_bits_blocks << F2FS_BLKSIZE_BITS, GFP_KERNEL); nm_i->nat_bits_blocks << F2FS_BLKSIZE_BITS, GFP_KERNEL);
if (!nm_i->nat_bits) if (!nm_i->nat_bits)
return -ENOMEM; return -ENOMEM;
nm_i->full_nat_bits = nm_i->nat_bits + 8;
nm_i->empty_nat_bits = nm_i->full_nat_bits + nat_bits_bytes;
if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG))
return 0;
nat_bits_addr = __start_cp_addr(sbi) + sbi->blocks_per_seg - nat_bits_addr = __start_cp_addr(sbi) + sbi->blocks_per_seg -
nm_i->nat_bits_blocks; nm_i->nat_bits_blocks;
for (i = 0; i < nm_i->nat_bits_blocks; i++) { for (i = 0; i < nm_i->nat_bits_blocks; i++) {
@ -3057,13 +3145,12 @@ static int __get_nat_bitmaps(struct f2fs_sb_info *sbi)
cp_ver |= (cur_cp_crc(ckpt) << 32); cp_ver |= (cur_cp_crc(ckpt) << 32);
    if (cpu_to_le64(cp_ver) != *(__le64 *)nm_i->nat_bits) {
-        disable_nat_bits(sbi, true);
+        clear_ckpt_flags(sbi, CP_NAT_BITS_FLAG);
+        f2fs_notice(sbi, "Disable nat_bits due to incorrect cp_ver (%llu, %llu)",
+            cp_ver, le64_to_cpu(*(__le64 *)nm_i->nat_bits));
        return 0;
    }

-    nm_i->full_nat_bits = nm_i->nat_bits + 8;
-    nm_i->empty_nat_bits = nm_i->full_nat_bits + nat_bits_bytes;
f2fs_notice(sbi, "Found nat_bits in checkpoint"); f2fs_notice(sbi, "Found nat_bits in checkpoint");
return 0; return 0;
} }
@ -3074,7 +3161,7 @@ static inline void load_free_nid_bitmap(struct f2fs_sb_info *sbi)
unsigned int i = 0; unsigned int i = 0;
nid_t nid, last_nid; nid_t nid, last_nid;
if (!enabled_nat_bits(sbi, NULL)) if (!is_set_ckpt_flags(sbi, CP_NAT_BITS_FLAG))
return; return;
for (i = 0; i < nm_i->nat_blocks; i++) { for (i = 0; i < nm_i->nat_blocks; i++) {


@ -91,7 +91,8 @@ static struct fsync_inode_entry *add_fsync_inode(struct f2fs_sb_info *sbi,
goto err_out; goto err_out;
} }
entry = f2fs_kmem_cache_alloc(fsync_entry_slab, GFP_F2FS_ZERO); entry = f2fs_kmem_cache_alloc(fsync_entry_slab,
GFP_F2FS_ZERO, true, NULL);
entry->inode = inode; entry->inode = inode;
list_add_tail(&entry->list, head); list_add_tail(&entry->list, head);


@ -20,6 +20,7 @@
#include "segment.h" #include "segment.h"
#include "node.h" #include "node.h"
#include "gc.h" #include "gc.h"
#include "iostat.h"
#include <trace/events/f2fs.h> #include <trace/events/f2fs.h>
#define __reverse_ffz(x) __reverse_ffs(~(x)) #define __reverse_ffz(x) __reverse_ffs(~(x))
@ -188,7 +189,8 @@ void f2fs_register_inmem_page(struct inode *inode, struct page *page)
set_page_private_atomic(page); set_page_private_atomic(page);
new = f2fs_kmem_cache_alloc(inmem_entry_slab, GFP_NOFS); new = f2fs_kmem_cache_alloc(inmem_entry_slab,
GFP_NOFS, true, NULL);
/* add atomic page indices to the list */ /* add atomic page indices to the list */
new->page = page; new->page = page;
@ -776,11 +778,22 @@ int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
return 0; return 0;
for (i = 1; i < sbi->s_ndevs; i++) { for (i = 1; i < sbi->s_ndevs; i++) {
int count = DEFAULT_RETRY_IO_COUNT;
if (!f2fs_test_bit(i, (char *)&sbi->dirty_device)) if (!f2fs_test_bit(i, (char *)&sbi->dirty_device))
continue; continue;
-        ret = __submit_flush_wait(sbi, FDEV(i).bdev);
-        if (ret)
+        do {
+            ret = __submit_flush_wait(sbi, FDEV(i).bdev);
+            if (ret)
+                congestion_wait(BLK_RW_ASYNC,
+                        DEFAULT_IO_TIMEOUT);
+        } while (ret && --count);
+
+        if (ret) {
+            f2fs_stop_checkpoint(sbi, false);
            break;
+        }
spin_lock(&sbi->dev_lock); spin_lock(&sbi->dev_lock);
f2fs_clear_bit(i, (char *)&sbi->dirty_device); f2fs_clear_bit(i, (char *)&sbi->dirty_device);
@ -990,7 +1003,7 @@ static struct discard_cmd *__create_discard_cmd(struct f2fs_sb_info *sbi,
pend_list = &dcc->pend_list[plist_idx(len)]; pend_list = &dcc->pend_list[plist_idx(len)];
dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS); dc = f2fs_kmem_cache_alloc(discard_cmd_slab, GFP_NOFS, true, NULL);
INIT_LIST_HEAD(&dc->list); INIT_LIST_HEAD(&dc->list);
dc->bdev = bdev; dc->bdev = bdev;
dc->lstart = lstart; dc->lstart = lstart;
@ -1893,7 +1906,8 @@ static int f2fs_issue_discard(struct f2fs_sb_info *sbi,
se = get_seg_entry(sbi, GET_SEGNO(sbi, i)); se = get_seg_entry(sbi, GET_SEGNO(sbi, i));
offset = GET_BLKOFF_FROM_SEG0(sbi, i); offset = GET_BLKOFF_FROM_SEG0(sbi, i);
if (!f2fs_test_and_set_bit(offset, se->discard_map)) if (f2fs_block_unit_discard(sbi) &&
!f2fs_test_and_set_bit(offset, se->discard_map))
sbi->discard_blks--; sbi->discard_blks--;
} }
@ -1918,7 +1932,8 @@ static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc,
struct list_head *head = &SM_I(sbi)->dcc_info->entry_list; struct list_head *head = &SM_I(sbi)->dcc_info->entry_list;
int i; int i;
if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi)) if (se->valid_blocks == max_blocks || !f2fs_hw_support_discard(sbi) ||
!f2fs_block_unit_discard(sbi))
return false; return false;
if (!force) { if (!force) {
@ -1949,7 +1964,7 @@ static bool add_discard_addrs(struct f2fs_sb_info *sbi, struct cp_control *cpc,
if (!de) { if (!de) {
de = f2fs_kmem_cache_alloc(discard_entry_slab, de = f2fs_kmem_cache_alloc(discard_entry_slab,
GFP_F2FS_ZERO); GFP_F2FS_ZERO, true, NULL);
de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start); de->start_blkaddr = START_BLOCK(sbi, cpc->trim_start);
list_add_tail(&de->list, head); list_add_tail(&de->list, head);
} }
@ -2003,14 +2018,18 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
unsigned int start = 0, end = -1; unsigned int start = 0, end = -1;
unsigned int secno, start_segno; unsigned int secno, start_segno;
bool force = (cpc->reason & CP_DISCARD); bool force = (cpc->reason & CP_DISCARD);
bool need_align = f2fs_lfs_mode(sbi) && __is_large_section(sbi); bool section_alignment = F2FS_OPTION(sbi).discard_unit ==
DISCARD_UNIT_SECTION;
if (f2fs_lfs_mode(sbi) && __is_large_section(sbi))
section_alignment = true;
mutex_lock(&dirty_i->seglist_lock); mutex_lock(&dirty_i->seglist_lock);
while (1) { while (1) {
int i; int i;
if (need_align && end != -1) if (section_alignment && end != -1)
end--; end--;
start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1); start = find_next_bit(prefree_map, MAIN_SEGS(sbi), end + 1);
if (start >= MAIN_SEGS(sbi)) if (start >= MAIN_SEGS(sbi))
@ -2018,7 +2037,7 @@ void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi), end = find_next_zero_bit(prefree_map, MAIN_SEGS(sbi),
start + 1); start + 1);
if (need_align) { if (section_alignment) {
start = rounddown(start, sbi->segs_per_sec); start = rounddown(start, sbi->segs_per_sec);
end = roundup(end, sbi->segs_per_sec); end = roundup(end, sbi->segs_per_sec);
} }
@ -2056,6 +2075,9 @@ next:
} }
mutex_unlock(&dirty_i->seglist_lock); mutex_unlock(&dirty_i->seglist_lock);
if (!f2fs_block_unit_discard(sbi))
goto wakeup;
/* send small discards */ /* send small discards */
list_for_each_entry_safe(entry, this, head, list) { list_for_each_entry_safe(entry, this, head, list) {
unsigned int cur_pos = 0, next_pos, len, total_len = 0; unsigned int cur_pos = 0, next_pos, len, total_len = 0;
@ -2089,12 +2111,29 @@ skip:
dcc->nr_discards -= total_len; dcc->nr_discards -= total_len;
} }
wakeup:
wake_up_discard_thread(sbi, false); wake_up_discard_thread(sbi, false);
} }
int f2fs_start_discard_thread(struct f2fs_sb_info *sbi)
{
dev_t dev = sbi->sb->s_bdev->bd_dev;
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
int err = 0;
if (!f2fs_realtime_discard_enable(sbi))
return 0;
dcc->f2fs_issue_discard = kthread_run(issue_discard_thread, sbi,
"f2fs_discard-%u:%u", MAJOR(dev), MINOR(dev));
if (IS_ERR(dcc->f2fs_issue_discard))
err = PTR_ERR(dcc->f2fs_issue_discard);
return err;
}
static int create_discard_cmd_control(struct f2fs_sb_info *sbi) static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
{ {
dev_t dev = sbi->sb->s_bdev->bd_dev;
struct discard_cmd_control *dcc; struct discard_cmd_control *dcc;
int err = 0, i; int err = 0, i;
@ -2108,6 +2147,11 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
return -ENOMEM; return -ENOMEM;
dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY; dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SEGMENT)
dcc->discard_granularity = sbi->blocks_per_seg;
else if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SECTION)
dcc->discard_granularity = BLKS_PER_SEC(sbi);
INIT_LIST_HEAD(&dcc->entry_list); INIT_LIST_HEAD(&dcc->entry_list);
for (i = 0; i < MAX_PLIST_NUM; i++) for (i = 0; i < MAX_PLIST_NUM; i++)
INIT_LIST_HEAD(&dcc->pend_list[i]); INIT_LIST_HEAD(&dcc->pend_list[i]);
@ -2127,13 +2171,10 @@ static int create_discard_cmd_control(struct f2fs_sb_info *sbi)
init_waitqueue_head(&dcc->discard_wait_queue); init_waitqueue_head(&dcc->discard_wait_queue);
SM_I(sbi)->dcc_info = dcc; SM_I(sbi)->dcc_info = dcc;
init_thread: init_thread:
dcc->f2fs_issue_discard = kthread_run(issue_discard_thread, sbi, err = f2fs_start_discard_thread(sbi);
"f2fs_discard-%u:%u", MAJOR(dev), MINOR(dev)); if (err) {
if (IS_ERR(dcc->f2fs_issue_discard)) {
err = PTR_ERR(dcc->f2fs_issue_discard);
kfree(dcc); kfree(dcc);
SM_I(sbi)->dcc_info = NULL; SM_I(sbi)->dcc_info = NULL;
return err;
} }
return err; return err;
@ -2255,7 +2296,8 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
del = 0; del = 0;
} }
if (!f2fs_test_and_set_bit(offset, se->discard_map)) if (f2fs_block_unit_discard(sbi) &&
!f2fs_test_and_set_bit(offset, se->discard_map))
sbi->discard_blks--; sbi->discard_blks--;
/* /*
@ -2297,7 +2339,8 @@ static void update_sit_entry(struct f2fs_sb_info *sbi, block_t blkaddr, int del)
} }
} }
if (f2fs_test_and_clear_bit(offset, se->discard_map)) if (f2fs_block_unit_discard(sbi) &&
f2fs_test_and_clear_bit(offset, se->discard_map))
sbi->discard_blks++; sbi->discard_blks++;
} }
if (!f2fs_test_bit(offset, se->ckpt_valid_map)) if (!f2fs_test_bit(offset, se->ckpt_valid_map))
@ -3563,7 +3606,7 @@ int f2fs_inplace_write_data(struct f2fs_io_info *fio)
goto drop_bio; goto drop_bio;
} }
-    if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) || f2fs_cp_error(sbi)) {
+    if (f2fs_cp_error(sbi)) {
err = -EIO; err = -EIO;
goto drop_bio; goto drop_bio;
} }
@ -4071,7 +4114,8 @@ static struct page *get_next_sit_page(struct f2fs_sb_info *sbi,
static struct sit_entry_set *grab_sit_entry_set(void) static struct sit_entry_set *grab_sit_entry_set(void)
{ {
struct sit_entry_set *ses = struct sit_entry_set *ses =
f2fs_kmem_cache_alloc(sit_entry_set_slab, GFP_NOFS); f2fs_kmem_cache_alloc(sit_entry_set_slab,
GFP_NOFS, true, NULL);
ses->entry_cnt = 0; ses->entry_cnt = 0;
INIT_LIST_HEAD(&ses->set_list); INIT_LIST_HEAD(&ses->set_list);
@ -4282,6 +4326,7 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
unsigned int sit_segs, start; unsigned int sit_segs, start;
char *src_bitmap, *bitmap; char *src_bitmap, *bitmap;
unsigned int bitmap_size, main_bitmap_size, sit_bitmap_size; unsigned int bitmap_size, main_bitmap_size, sit_bitmap_size;
unsigned int discard_map = f2fs_block_unit_discard(sbi) ? 1 : 0;
/* allocate memory for SIT information */ /* allocate memory for SIT information */
sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL); sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL);
@ -4304,9 +4349,9 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
return -ENOMEM; return -ENOMEM;
#ifdef CONFIG_F2FS_CHECK_FS #ifdef CONFIG_F2FS_CHECK_FS
bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 4; bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (3 + discard_map);
#else #else
bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * 3; bitmap_size = MAIN_SEGS(sbi) * SIT_VBLOCK_MAP_SIZE * (2 + discard_map);
#endif #endif
sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL); sit_i->bitmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
if (!sit_i->bitmap) if (!sit_i->bitmap)
@ -4326,8 +4371,10 @@ static int build_sit_info(struct f2fs_sb_info *sbi)
bitmap += SIT_VBLOCK_MAP_SIZE; bitmap += SIT_VBLOCK_MAP_SIZE;
#endif #endif
sit_i->sentries[start].discard_map = bitmap; if (discard_map) {
bitmap += SIT_VBLOCK_MAP_SIZE; sit_i->sentries[start].discard_map = bitmap;
bitmap += SIT_VBLOCK_MAP_SIZE;
}
} }
sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL); sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
@ -4489,17 +4536,19 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
if (IS_NODESEG(se->type)) if (IS_NODESEG(se->type))
total_node_blocks += se->valid_blocks; total_node_blocks += se->valid_blocks;
/* build discard map only one time */ if (f2fs_block_unit_discard(sbi)) {
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { /* build discard map only one time */
memset(se->discard_map, 0xff, if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
SIT_VBLOCK_MAP_SIZE); memset(se->discard_map, 0xff,
} else { SIT_VBLOCK_MAP_SIZE);
memcpy(se->discard_map, } else {
se->cur_valid_map, memcpy(se->discard_map,
SIT_VBLOCK_MAP_SIZE); se->cur_valid_map,
sbi->discard_blks += SIT_VBLOCK_MAP_SIZE);
sbi->blocks_per_seg - sbi->discard_blks +=
se->valid_blocks; sbi->blocks_per_seg -
se->valid_blocks;
}
} }
if (__is_large_section(sbi)) if (__is_large_section(sbi))
@ -4535,13 +4584,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
if (IS_NODESEG(se->type)) if (IS_NODESEG(se->type))
total_node_blocks += se->valid_blocks; total_node_blocks += se->valid_blocks;
if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) { if (f2fs_block_unit_discard(sbi)) {
memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE); if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
} else { memset(se->discard_map, 0xff, SIT_VBLOCK_MAP_SIZE);
memcpy(se->discard_map, se->cur_valid_map, } else {
SIT_VBLOCK_MAP_SIZE); memcpy(se->discard_map, se->cur_valid_map,
sbi->discard_blks += old_valid_blocks; SIT_VBLOCK_MAP_SIZE);
sbi->discard_blks -= se->valid_blocks; sbi->discard_blks += old_valid_blocks;
sbi->discard_blks -= se->valid_blocks;
}
} }
if (__is_large_section(sbi)) { if (__is_large_section(sbi)) {
@ -5159,7 +5210,7 @@ int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC; sm_info->ipu_policy = 1 << F2FS_IPU_FSYNC;
sm_info->min_ipu_util = DEF_MIN_IPU_UTIL; sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS; sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
-    sm_info->min_seq_blocks = sbi->blocks_per_seg * sbi->segs_per_sec;
+    sm_info->min_seq_blocks = sbi->blocks_per_seg;
sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS; sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
sm_info->min_ssr_sections = reserved_sections(sbi); sm_info->min_ssr_sections = reserved_sections(sbi);


@ -142,7 +142,7 @@ enum {
}; };
/* /*
- * In the victim_sel_policy->alloc_mode, there are two block allocation modes.
+ * In the victim_sel_policy->alloc_mode, there are three block allocation modes.
* LFS writes data sequentially with cleaning operations. * LFS writes data sequentially with cleaning operations.
* SSR (Slack Space Recycle) reuses obsolete space without cleaning operations. * SSR (Slack Space Recycle) reuses obsolete space without cleaning operations.
* AT_SSR (Age Threshold based Slack Space Recycle) merges fragments into * AT_SSR (Age Threshold based Slack Space Recycle) merges fragments into
@ -155,7 +155,7 @@ enum {
}; };
/* /*
- * In the victim_sel_policy->gc_mode, there are two gc, aka cleaning, modes.
+ * In the victim_sel_policy->gc_mode, there are three gc, aka cleaning, modes.
* GC_CB is based on cost-benefit algorithm. * GC_CB is based on cost-benefit algorithm.
* GC_GREEDY is based on greedy algorithm. * GC_GREEDY is based on greedy algorithm.
* GC_AT is based on age-threshold algorithm. * GC_AT is based on age-threshold algorithm.


@ -33,6 +33,7 @@
#include "segment.h" #include "segment.h"
#include "xattr.h" #include "xattr.h"
#include "gc.h" #include "gc.h"
#include "iostat.h"
#define CREATE_TRACE_POINTS #define CREATE_TRACE_POINTS
#include <trace/events/f2fs.h> #include <trace/events/f2fs.h>
@ -56,6 +57,7 @@ const char *f2fs_fault_name[FAULT_MAX] = {
[FAULT_CHECKPOINT] = "checkpoint error", [FAULT_CHECKPOINT] = "checkpoint error",
[FAULT_DISCARD] = "discard error", [FAULT_DISCARD] = "discard error",
[FAULT_WRITE_IO] = "write IO error", [FAULT_WRITE_IO] = "write IO error",
[FAULT_SLAB_ALLOC] = "slab alloc",
}; };
void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate, void f2fs_build_fault_attr(struct f2fs_sb_info *sbi, unsigned int rate,
@ -155,6 +157,7 @@ enum {
Opt_atgc, Opt_atgc,
Opt_gc_merge, Opt_gc_merge,
Opt_nogc_merge, Opt_nogc_merge,
Opt_discard_unit,
Opt_err, Opt_err,
}; };
@ -231,6 +234,7 @@ static match_table_t f2fs_tokens = {
{Opt_atgc, "atgc"}, {Opt_atgc, "atgc"},
{Opt_gc_merge, "gc_merge"}, {Opt_gc_merge, "gc_merge"},
{Opt_nogc_merge, "nogc_merge"}, {Opt_nogc_merge, "nogc_merge"},
{Opt_discard_unit, "discard_unit=%s"},
{Opt_err, NULL}, {Opt_err, NULL},
}; };
@ -657,10 +661,14 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
return -EINVAL; return -EINVAL;
break; break;
case Opt_discard: case Opt_discard:
if (!f2fs_hw_support_discard(sbi)) {
f2fs_warn(sbi, "device does not support discard");
break;
}
set_opt(sbi, DISCARD); set_opt(sbi, DISCARD);
break; break;
case Opt_nodiscard: case Opt_nodiscard:
-        if (f2fs_sb_has_blkzoned(sbi)) {
+        if (f2fs_hw_should_discard(sbi)) {
f2fs_warn(sbi, "discard is required for zoned block devices"); f2fs_warn(sbi, "discard is required for zoned block devices");
return -EINVAL; return -EINVAL;
} }
@ -1173,6 +1181,25 @@ static int parse_options(struct super_block *sb, char *options, bool is_remount)
case Opt_nogc_merge: case Opt_nogc_merge:
clear_opt(sbi, GC_MERGE); clear_opt(sbi, GC_MERGE);
break; break;
case Opt_discard_unit:
name = match_strdup(&args[0]);
if (!name)
return -ENOMEM;
if (!strcmp(name, "block")) {
F2FS_OPTION(sbi).discard_unit =
DISCARD_UNIT_BLOCK;
} else if (!strcmp(name, "segment")) {
F2FS_OPTION(sbi).discard_unit =
DISCARD_UNIT_SEGMENT;
} else if (!strcmp(name, "section")) {
F2FS_OPTION(sbi).discard_unit =
DISCARD_UNIT_SECTION;
} else {
kfree(name);
return -EINVAL;
}
kfree(name);
break;
default: default:
f2fs_err(sbi, "Unrecognized mount option \"%s\" or missing value", f2fs_err(sbi, "Unrecognized mount option \"%s\" or missing value",
p); p);
@ -1211,6 +1238,14 @@ default_check:
return -EINVAL; return -EINVAL;
} }
#endif #endif
if (f2fs_sb_has_blkzoned(sbi)) {
if (F2FS_OPTION(sbi).discard_unit !=
DISCARD_UNIT_SECTION) {
f2fs_info(sbi, "Zoned block device doesn't need small discard, set discard_unit=section by default");
F2FS_OPTION(sbi).discard_unit =
DISCARD_UNIT_SECTION;
}
}
#ifdef CONFIG_F2FS_FS_COMPRESSION #ifdef CONFIG_F2FS_FS_COMPRESSION
if (f2fs_test_compress_extension(sbi)) { if (f2fs_test_compress_extension(sbi)) {
@@ -1271,7 +1306,8 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
{
struct f2fs_inode_info *fi;
fi = kmem_cache_alloc(f2fs_inode_cachep, GFP_F2FS_ZERO);
fi = f2fs_kmem_cache_alloc(f2fs_inode_cachep,
GFP_F2FS_ZERO, false, F2FS_SB(sb));
if (!fi)
return NULL;
@@ -1541,6 +1577,7 @@ static void f2fs_put_super(struct super_block *sb)
#endif
fscrypt_free_dummy_policy(&F2FS_OPTION(sbi).dummy_enc_policy);
destroy_percpu_info(sbi);
f2fs_destroy_iostat(sbi);
for (i = 0; i < NR_PAGE_TYPE; i++)
kvfree(sbi->write_io[i]);
#ifdef CONFIG_UNICODE
@@ -1924,6 +1961,14 @@ static int f2fs_show_options(struct seq_file *seq, struct dentry *root)
if (test_opt(sbi, ATGC))
seq_puts(seq, ",atgc");
if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_BLOCK)
seq_printf(seq, ",discard_unit=%s", "block");
else if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SEGMENT)
seq_printf(seq, ",discard_unit=%s", "segment");
else if (F2FS_OPTION(sbi).discard_unit == DISCARD_UNIT_SECTION)
seq_printf(seq, ",discard_unit=%s", "section");
return 0;
}
@@ -1959,11 +2004,15 @@ static void default_options(struct f2fs_sb_info *sbi)
F2FS_OPTION(sbi).unusable_cap = 0;
sbi->sb->s_flags |= SB_LAZYTIME;
set_opt(sbi, FLUSH_MERGE);
set_opt(sbi, DISCARD);
if (f2fs_sb_has_blkzoned(sbi))
F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;
else
F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE;
if (f2fs_hw_support_discard(sbi) || f2fs_hw_should_discard(sbi))
set_opt(sbi, DISCARD);
if (f2fs_sb_has_blkzoned(sbi)) {
F2FS_OPTION(sbi).fs_mode = FS_MODE_LFS;
F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_SECTION;
} else {
F2FS_OPTION(sbi).fs_mode = FS_MODE_ADAPTIVE;
F2FS_OPTION(sbi).discard_unit = DISCARD_UNIT_BLOCK;
}
#ifdef CONFIG_F2FS_FS_XATTR
set_opt(sbi, XATTR_USER);
@@ -2038,8 +2087,17 @@ restore_flag:
static void f2fs_enable_checkpoint(struct f2fs_sb_info *sbi)
{
int retry = DEFAULT_RETRY_IO_COUNT;
/* we should flush all the data to keep data consistency */
sync_inodes_sb(sbi->sb);
do {
sync_inodes_sb(sbi->sb);
cond_resched();
congestion_wait(BLK_RW_ASYNC, DEFAULT_IO_TIMEOUT);
} while (get_pages(sbi, F2FS_DIRTY_DATA) && retry--);
if (unlikely(retry < 0))
f2fs_warn(sbi, "checkpoint=enable has some unwritten data.");
down_write(&sbi->gc_lock);
f2fs_dirty_to_prefree(sbi);
@@ -2060,12 +2118,15 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
bool need_restart_gc = false, need_stop_gc = false;
bool need_restart_ckpt = false, need_stop_ckpt = false;
bool need_restart_flush = false, need_stop_flush = false;
bool need_restart_discard = false, need_stop_discard = false;
bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE);
bool disable_checkpoint = test_opt(sbi, DISABLE_CHECKPOINT);
bool enable_checkpoint = !test_opt(sbi, DISABLE_CHECKPOINT);
bool no_io_align = !F2FS_IO_ALIGNED(sbi);
bool no_atgc = !test_opt(sbi, ATGC);
bool no_discard = !test_opt(sbi, DISCARD);
bool no_compress_cache = !test_opt(sbi, COMPRESS_CACHE);
bool checkpoint_changed;
bool block_unit_discard = f2fs_block_unit_discard(sbi);
struct discard_cmd_control *dcc;
#ifdef CONFIG_QUOTA
int i, j;
#endif
@@ -2110,8 +2171,6 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
err = parse_options(sb, data, true);
if (err)
goto restore_opts;
checkpoint_changed =
disable_checkpoint != test_opt(sbi, DISABLE_CHECKPOINT);
/*
* Previous and new state of filesystem is RO,
@@ -2168,6 +2227,12 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
goto restore_opts;
}
if (block_unit_discard != f2fs_block_unit_discard(sbi)) {
err = -EINVAL;
f2fs_warn(sbi, "switch discard_unit option is not allowed");
goto restore_opts;
}
if ((*flags & SB_RDONLY) && test_opt(sbi, DISABLE_CHECKPOINT)) {
err = -EINVAL;
f2fs_warn(sbi, "disabling checkpoint not compatible with read-only");
@@ -2233,11 +2298,26 @@ static int f2fs_remount(struct super_block *sb, int *flags, char *data)
need_stop_flush = true;
}
if (checkpoint_changed) {
if (no_discard == !!test_opt(sbi, DISCARD)) {
if (test_opt(sbi, DISCARD)) {
err = f2fs_start_discard_thread(sbi);
if (err)
goto restore_flush;
need_stop_discard = true;
} else {
dcc = SM_I(sbi)->dcc_info;
f2fs_stop_discard_thread(sbi);
if (atomic_read(&dcc->discard_cmd_cnt))
f2fs_issue_discard_timeout(sbi);
need_restart_discard = true;
}
}
if (enable_checkpoint == !!test_opt(sbi, DISABLE_CHECKPOINT)) {
if (test_opt(sbi, DISABLE_CHECKPOINT)) {
err = f2fs_disable_checkpoint(sbi);
if (err)
goto restore_flush;
goto restore_discard;
} else {
f2fs_enable_checkpoint(sbi);
}
@@ -2257,6 +2337,13 @@ skip:
adjust_unusable_cap_perc(sbi);
*flags = (*flags & ~SB_LAZYTIME) | (sb->s_flags & SB_LAZYTIME);
return 0;
restore_discard:
if (need_restart_discard) {
if (f2fs_start_discard_thread(sbi))
f2fs_warn(sbi, "discard has been stopped");
} else if (need_stop_discard) {
f2fs_stop_discard_thread(sbi);
}
restore_flush:
if (need_restart_flush) {
if (f2fs_create_flush_cmd_control(sbi))
@@ -2517,6 +2604,33 @@ static int f2fs_enable_quotas(struct super_block *sb)
return 0;
}
static int f2fs_quota_sync_file(struct f2fs_sb_info *sbi, int type)
{
struct quota_info *dqopt = sb_dqopt(sbi->sb);
struct address_space *mapping = dqopt->files[type]->i_mapping;
int ret = 0;
ret = dquot_writeback_dquots(sbi->sb, type);
if (ret)
goto out;
ret = filemap_fdatawrite(mapping);
if (ret)
goto out;
/* if we are using journalled quota */
if (is_journalled_quota(sbi))
goto out;
ret = filemap_fdatawait(mapping);
truncate_inode_pages(&dqopt->files[type]->i_data, 0);
out:
if (ret)
set_sbi_flag(sbi, SBI_QUOTA_NEED_REPAIR);
return ret;
}
int f2fs_quota_sync(struct super_block *sb, int type)
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
@@ -2524,57 +2638,42 @@ int f2fs_quota_sync(struct super_block *sb, int type)
int cnt;
int ret;
/*
* do_quotactl
* f2fs_quota_sync
* down_read(quota_sem)
* dquot_writeback_dquots()
* f2fs_dquot_commit
* block_operation
* down_read(quota_sem)
*/
f2fs_lock_op(sbi);
down_read(&sbi->quota_sem);
ret = dquot_writeback_dquots(sb, type);
if (ret)
goto out;
/*
* Now when everything is written we can discard the pagecache so
* that userspace sees the changes.
*/
for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
struct address_space *mapping;
if (type != -1 && cnt != type)
continue;
if (!sb_has_quota_active(sb, cnt))
continue;
mapping = dqopt->files[cnt]->i_mapping;
if (!sb_has_quota_active(sb, type))
return 0;
ret = filemap_fdatawrite(mapping);
if (ret)
goto out;
/* if we are using journalled quota */
if (is_journalled_quota(sbi))
continue;
ret = filemap_fdatawait(mapping);
if (ret)
set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
inode_lock(dqopt->files[cnt]);
truncate_inode_pages(&dqopt->files[cnt]->i_data, 0);
/*
* do_quotactl
* f2fs_quota_sync
* down_read(quota_sem)
* dquot_writeback_dquots()
* f2fs_dquot_commit
* block_operation
* down_read(quota_sem)
*/
f2fs_lock_op(sbi);
down_read(&sbi->quota_sem);
ret = f2fs_quota_sync_file(sbi, cnt);
up_read(&sbi->quota_sem);
f2fs_unlock_op(sbi);
inode_unlock(dqopt->files[cnt]);
if (ret)
break;
}
out:
if (ret)
set_sbi_flag(F2FS_SB(sb), SBI_QUOTA_NEED_REPAIR);
up_read(&sbi->quota_sem);
f2fs_unlock_op(sbi);
return ret;
}
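The comment block kept inside the new loop documents why the locks are nested this way: the rewritten per-quota path appears to take f2fs_lock_op() before quota_sem on every iteration, keeping one consistent acquisition order between the quota-sync and checkpoint paths. A loose userspace analogy of that rule, using pthread mutexes (names invented for the illustration, not kernel API):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t op_lock = PTHREAD_MUTEX_INITIALIZER;	/* stands in for f2fs_lock_op() */
static pthread_mutex_t quota_lock = PTHREAD_MUTEX_INITIALIZER;	/* stands in for quota_sem */

/* Every path takes op_lock first, then quota_lock, so no two threads can
 * each end up holding the lock the other one is waiting for. */
static void *sync_one_quota(void *arg)
{
	pthread_mutex_lock(&op_lock);
	pthread_mutex_lock(&quota_lock);
	puts("quota file synced under both locks");
	pthread_mutex_unlock(&quota_lock);
	pthread_mutex_unlock(&op_lock);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, sync_one_quota, NULL);
	sync_one_quota(NULL);		/* second path, same order */
	pthread_join(t, NULL);
	return 0;
}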
@@ -3207,11 +3306,13 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi,
return -EFSCORRUPTED;
}
if (le32_to_cpu(raw_super->cp_payload) >
(blocks_per_seg - F2FS_CP_PACKS)) {
f2fs_info(sbi, "Insane cp_payload (%u > %u)",
le32_to_cpu(raw_super->cp_payload),
blocks_per_seg - F2FS_CP_PACKS);
if (le32_to_cpu(raw_super->cp_payload) >=
(blocks_per_seg - F2FS_CP_PACKS -
NR_CURSEG_PERSIST_TYPE)) {
f2fs_info(sbi, "Insane cp_payload (%u >= %u)",
le32_to_cpu(raw_super->cp_payload),
blocks_per_seg - F2FS_CP_PACKS -
NR_CURSEG_PERSIST_TYPE);
return -EFSCORRUPTED;
}
@@ -3247,6 +3348,7 @@ int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi)
unsigned int cp_pack_start_sum, cp_payload;
block_t user_block_count, valid_user_blocks;
block_t avail_node_count, valid_node_count;
unsigned int nat_blocks, nat_bits_bytes, nat_bits_blocks;
int i, j;
total = le32_to_cpu(raw_super->segment_count);
@@ -3377,6 +3479,17 @@ skip_cross:
return 1;
}
nat_blocks = nat_segs << log_blocks_per_seg;
nat_bits_bytes = nat_blocks / BITS_PER_BYTE;
nat_bits_blocks = F2FS_BLK_ALIGN((nat_bits_bytes << 1) + 8);
if (__is_set_ckpt_flags(ckpt, CP_NAT_BITS_FLAG) &&
(cp_payload + F2FS_CP_PACKS +
NR_CURSEG_PERSIST_TYPE + nat_bits_blocks >= blocks_per_seg)) {
f2fs_warn(sbi, "Insane cp_payload: %u, nat_bits_blocks: %u)",
cp_payload, nat_bits_blocks);
return -EFSCORRUPTED;
}
if (unlikely(f2fs_cp_error(sbi))) {
f2fs_err(sbi, "A bug case: need to run fsck");
return 1;
@@ -3409,6 +3522,7 @@ static void init_sb_info(struct f2fs_sb_info *sbi)
sbi->next_victim_seg[FG_GC] = NULL_SEGNO;
sbi->max_victim_search = DEF_MAX_VICTIM_SEARCH;
sbi->migration_granularity = sbi->segs_per_sec;
sbi->seq_file_ra_mul = MIN_RA_MUL;
sbi->dir_level = DEF_DIR_LEVEL;
sbi->interval_time[CP_TIME] = DEF_CP_INTERVAL;
@@ -3768,7 +3882,8 @@ static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
/* adjust parameters according to the volume size */
if (sm_i->main_segments <= SMALL_VOLUME_SEGMENTS) {
F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
sm_i->dcc_info->discard_granularity = 1;
if (f2fs_block_unit_discard(sbi))
sm_i->dcc_info->discard_granularity = 1;
sm_i->ipu_policy = 1 << F2FS_IPU_FORCE;
}
@@ -3889,11 +4004,6 @@ try_onemore:
set_sbi_flag(sbi, SBI_POR_DOING);
spin_lock_init(&sbi->stat_lock);
/* init iostat info */
spin_lock_init(&sbi->iostat_lock);
sbi->iostat_enable = false;
sbi->iostat_period_ms = DEFAULT_IOSTAT_PERIOD_MS;
for (i = 0; i < NR_PAGE_TYPE; i++) {
int n = (i == META) ? 1 : NR_TEMP_TYPE;
int j;
@@ -3924,10 +4034,14 @@ try_onemore:
init_waitqueue_head(&sbi->cp_wait);
init_sb_info(sbi);
err = init_percpu_info(sbi);
err = f2fs_init_iostat(sbi);
if (err)
goto free_bio_info;
err = init_percpu_info(sbi);
if (err)
goto free_iostat;
if (F2FS_IO_ALIGNED(sbi)) {
sbi->write_io_dummy =
mempool_create_page_pool(2 * (F2FS_IO_SIZE(sbi) - 1), 0);
@@ -4259,6 +4373,8 @@ free_io_dummy:
mempool_destroy(sbi->write_io_dummy);
free_percpu:
destroy_percpu_info(sbi);
free_iostat:
f2fs_destroy_iostat(sbi);
free_bio_info:
for (i = 0; i < NR_PAGE_TYPE; i++)
kvfree(sbi->write_io[i]);
@@ -4401,9 +4517,12 @@ static int __init init_f2fs_fs(void)
err = f2fs_init_post_read_processing();
if (err)
goto free_root_stats;
err = f2fs_init_bio_entry_cache();
err = f2fs_init_iostat_processing();
if (err)
goto free_post_read;
err = f2fs_init_bio_entry_cache();
if (err)
goto free_iostat;
err = f2fs_init_bioset();
if (err)
goto free_bio_enrty_cache;
@@ -4425,6 +4544,8 @@ free_bioset:
f2fs_destroy_bioset();
free_bio_enrty_cache:
f2fs_destroy_bio_entry_cache();
free_iostat:
f2fs_destroy_iostat_processing();
free_post_read:
f2fs_destroy_post_read_processing();
free_root_stats:
@@ -4459,6 +4580,7 @@ static void __exit exit_f2fs_fs(void)
f2fs_destroy_compress_mempool();
f2fs_destroy_bioset();
f2fs_destroy_bio_entry_cache();
f2fs_destroy_iostat_processing();
f2fs_destroy_post_read_processing();
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);


@@ -17,6 +17,7 @@
#include "f2fs.h"
#include "segment.h"
#include "gc.h"
#include "iostat.h"
#include <trace/events/f2fs.h>
static struct proc_dir_entry *f2fs_proc_root;
@@ -307,6 +308,14 @@ static ssize_t f2fs_sbi_show(struct f2fs_attr *a,
return sysfs_emit(buf, "%u\n", sbi->compr_new_inode);
#endif
if (!strcmp(a->attr.name, "gc_segment_mode"))
return sysfs_emit(buf, "%u\n", sbi->gc_segment_mode);
if (!strcmp(a->attr.name, "gc_reclaimed_segments")) {
return sysfs_emit(buf, "%u\n",
sbi->gc_reclaimed_segs[sbi->gc_segment_mode]);
}
ui = (unsigned int *)(ptr + a->offset);
return sprintf(buf, "%u\n", *ui);
@@ -343,7 +352,7 @@ static ssize_t __sbi_store(struct f2fs_attr *a,
set = false;
}
if (strlen(name) >= F2FS_EXTENSION_LEN)
if (!strlen(name) || strlen(name) >= F2FS_EXTENSION_LEN)
return -EINVAL;
down_write(&sbi->sb_lock);
@@ -420,6 +429,8 @@ out:
if (!strcmp(a->attr.name, "discard_granularity")) {
if (t == 0 || t > MAX_PLIST_NUM)
return -EINVAL;
if (!f2fs_block_unit_discard(sbi))
return -EINVAL;
if (t == *ui)
return count;
*ui = t;
@@ -467,6 +478,7 @@ out:
return count;
}
#ifdef CONFIG_F2FS_IOSTAT
if (!strcmp(a->attr.name, "iostat_enable")) {
sbi->iostat_enable = !!t;
if (!sbi->iostat_enable)
@@ -482,6 +494,7 @@ out:
spin_unlock(&sbi->iostat_lock);
return count;
}
#endif
#ifdef CONFIG_F2FS_FS_COMPRESSION
if (!strcmp(a->attr.name, "compr_written_block") ||
@@ -515,6 +528,29 @@ out:
return count;
}
if (!strcmp(a->attr.name, "gc_segment_mode")) {
if (t < MAX_GC_MODE)
sbi->gc_segment_mode = t;
else
return -EINVAL;
return count;
}
if (!strcmp(a->attr.name, "gc_reclaimed_segments")) {
if (t != 0)
return -EINVAL;
sbi->gc_reclaimed_segs[sbi->gc_segment_mode] = 0;
return count;
}
if (!strcmp(a->attr.name, "seq_file_ra_mul")) {
if (t >= MIN_RA_MUL && t <= MAX_RA_MUL)
sbi->seq_file_ra_mul = t;
else
return -EINVAL;
return count;
}
*ui = (unsigned int)t;
return count;
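These store handlers sit behind ordinary sysfs attributes, so the new knobs can be exercised with plain file IO from userspace; a small sketch follows (the "sda1" directory name is only an example, and 4 is an arbitrary in-range multiplier).

#include <stdio.h>

/* Hypothetical f2fs device directory under sysfs. */
#define SYSFS_DIR "/sys/fs/f2fs/sda1/"

static int write_attr(const char *name, const char *val)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), SYSFS_DIR "%s", name);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

static void read_attr(const char *name)
{
	char path[256], buf[64];
	FILE *f;

	snprintf(path, sizeof(path), SYSFS_DIR "%s", name);
	f = fopen(path, "r");
	if (f && fgets(buf, sizeof(buf), f))
		printf("%s = %s", name, buf);
	if (f)
		fclose(f);
}

int main(void)
{
	write_attr("gc_segment_mode", "0");		/* choose which GC mode's reclaim counter to expose */
	write_attr("gc_reclaimed_segments", "0");	/* only 0 is accepted: reset the counter */
	read_attr("gc_reclaimed_segments");
	write_attr("seq_file_ra_mul", "4");		/* readahead multiplier for POSIX_FADV_SEQUENTIAL files */
	return 0;
}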
@@ -667,8 +703,10 @@ F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, discard_idle_interval,
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_idle_interval, interval_time[GC_TIME]);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info,
umount_discard_timeout, interval_time[UMOUNT_DISCARD_TIMEOUT]);
#ifdef CONFIG_F2FS_IOSTAT
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_enable, iostat_enable);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_period_ms, iostat_period_ms);
#endif
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, max_io_bytes, max_io_bytes);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
@@ -740,6 +778,10 @@ F2FS_RW_ATTR(ATGC_INFO, atgc_management, atgc_candidate_count, max_candidate_cou
F2FS_RW_ATTR(ATGC_INFO, atgc_management, atgc_age_weight, age_weight);
F2FS_RW_ATTR(ATGC_INFO, atgc_management, atgc_age_threshold, age_threshold);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, seq_file_ra_mul, seq_file_ra_mul);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_segment_mode, gc_segment_mode);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_reclaimed_segments, gc_reclaimed_segs);
#define ATTR_LIST(name) (&f2fs_attr_##name.attr)
static struct attribute *f2fs_attrs[] = {
ATTR_LIST(gc_urgent_sleep_time),
@@ -770,8 +812,10 @@ static struct attribute *f2fs_attrs[] = {
ATTR_LIST(discard_idle_interval),
ATTR_LIST(gc_idle_interval),
ATTR_LIST(umount_discard_timeout),
#ifdef CONFIG_F2FS_IOSTAT
ATTR_LIST(iostat_enable),
ATTR_LIST(iostat_period_ms),
#endif
ATTR_LIST(readdir_ra),
ATTR_LIST(max_io_bytes),
ATTR_LIST(gc_pin_file_thresh),
@@ -812,6 +856,9 @@ static struct attribute *f2fs_attrs[] = {
ATTR_LIST(atgc_candidate_count),
ATTR_LIST(atgc_age_weight),
ATTR_LIST(atgc_age_threshold),
ATTR_LIST(seq_file_ra_mul),
ATTR_LIST(gc_segment_mode),
ATTR_LIST(gc_reclaimed_segments),
NULL,
};
ATTRIBUTE_GROUPS(f2fs);
@@ -1036,101 +1083,6 @@ static int __maybe_unused segment_bits_seq_show(struct seq_file *seq,
return 0;
}
void f2fs_record_iostat(struct f2fs_sb_info *sbi)
{
unsigned long long iostat_diff[NR_IO_TYPE];
int i;
if (time_is_after_jiffies(sbi->iostat_next_period))
return;
/* Need double check under the lock */
spin_lock(&sbi->iostat_lock);
if (time_is_after_jiffies(sbi->iostat_next_period)) {
spin_unlock(&sbi->iostat_lock);
return;
}
sbi->iostat_next_period = jiffies +
msecs_to_jiffies(sbi->iostat_period_ms);
for (i = 0; i < NR_IO_TYPE; i++) {
iostat_diff[i] = sbi->rw_iostat[i] -
sbi->prev_rw_iostat[i];
sbi->prev_rw_iostat[i] = sbi->rw_iostat[i];
}
spin_unlock(&sbi->iostat_lock);
trace_f2fs_iostat(sbi, iostat_diff);
}
static int __maybe_unused iostat_info_seq_show(struct seq_file *seq,
void *offset)
{
struct super_block *sb = seq->private;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
time64_t now = ktime_get_real_seconds();
if (!sbi->iostat_enable)
return 0;
seq_printf(seq, "time: %-16llu\n", now);
/* print app write IOs */
seq_puts(seq, "[WRITE]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_IO]);
/* print fs write IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GC_DATA_IO]);
seq_printf(seq, "fs gc node: %-16llu\n",
sbi->rw_iostat[FS_GC_NODE_IO]);
seq_printf(seq, "fs cp data: %-16llu\n",
sbi->rw_iostat[FS_CP_DATA_IO]);
seq_printf(seq, "fs cp node: %-16llu\n",
sbi->rw_iostat[FS_CP_NODE_IO]);
seq_printf(seq, "fs cp meta: %-16llu\n",
sbi->rw_iostat[FS_CP_META_IO]);
/* print app read IOs */
seq_puts(seq, "[READ]\n");
seq_printf(seq, "app buffered: %-16llu\n",
sbi->rw_iostat[APP_BUFFERED_READ_IO]);
seq_printf(seq, "app direct: %-16llu\n",
sbi->rw_iostat[APP_DIRECT_READ_IO]);
seq_printf(seq, "app mapped: %-16llu\n",
sbi->rw_iostat[APP_MAPPED_READ_IO]);
/* print fs read IOs */
seq_printf(seq, "fs data: %-16llu\n",
sbi->rw_iostat[FS_DATA_READ_IO]);
seq_printf(seq, "fs gc data: %-16llu\n",
sbi->rw_iostat[FS_GDATA_READ_IO]);
seq_printf(seq, "fs compr_data: %-16llu\n",
sbi->rw_iostat[FS_CDATA_READ_IO]);
seq_printf(seq, "fs node: %-16llu\n",
sbi->rw_iostat[FS_NODE_READ_IO]);
seq_printf(seq, "fs meta: %-16llu\n",
sbi->rw_iostat[FS_META_READ_IO]);
/* print other IOs */
seq_puts(seq, "[OTHER]\n");
seq_printf(seq, "fs discard: %-16llu\n",
sbi->rw_iostat[FS_DISCARD]);
return 0;
}
static int __maybe_unused victim_bits_seq_show(struct seq_file *seq,
void *offset)
{
@@ -1213,13 +1165,15 @@ int f2fs_register_sysfs(struct f2fs_sb_info *sbi)
sbi->s_proc = proc_mkdir(sb->s_id, f2fs_proc_root);
if (sbi->s_proc) {
proc_create_single_data("segment_info", S_IRUGO, sbi->s_proc,
proc_create_single_data("segment_info", 0444, sbi->s_proc,
segment_info_seq_show, sb);
proc_create_single_data("segment_bits", S_IRUGO, sbi->s_proc,
proc_create_single_data("segment_bits", 0444, sbi->s_proc,
segment_bits_seq_show, sb);
proc_create_single_data("iostat_info", S_IRUGO, sbi->s_proc,
#ifdef CONFIG_F2FS_IOSTAT
proc_create_single_data("iostat_info", 0444, sbi->s_proc,
iostat_info_seq_show, sb);
proc_create_single_data("victim_bits", S_IRUGO, sbi->s_proc,
#endif
proc_create_single_data("victim_bits", 0444, sbi->s_proc,
victim_bits_seq_show, sb);
}
return 0;
@@ -1238,7 +1192,9 @@ put_sb_kobj:
void f2fs_unregister_sysfs(struct f2fs_sb_info *sbi)
{
if (sbi->s_proc) {
#ifdef CONFIG_F2FS_IOSTAT
remove_proc_entry("iostat_info", sbi->s_proc);
#endif
remove_proc_entry("segment_info", sbi->s_proc);
remove_proc_entry("segment_bits", sbi->s_proc);
remove_proc_entry("victim_bits", sbi->s_proc);


@@ -27,7 +27,8 @@ static void *xattr_alloc(struct f2fs_sb_info *sbi, int size, bool *is_inline)
{
if (likely(size == sbi->inline_xattr_slab_size)) {
*is_inline = true;
return kmem_cache_zalloc(sbi->inline_xattr_slab, GFP_NOFS);
return f2fs_kmem_cache_alloc(sbi->inline_xattr_slab,
GFP_F2FS_ZERO, false, sbi);
}
*is_inline = false;
return f2fs_kzalloc(sbi, size, GFP_NOFS);


@@ -1818,6 +1818,7 @@ DEFINE_EVENT(f2fs_zip_end, f2fs_decompress_pages_end,
TP_ARGS(inode, cluster_idx, compressed_size, ret)
);
#ifdef CONFIG_F2FS_IOSTAT
TRACE_EVENT(f2fs_iostat,
TP_PROTO(struct f2fs_sb_info *sbi, unsigned long long *iostat),
@@ -1894,6 +1895,102 @@ TRACE_EVENT(f2fs_iostat,
__entry->fs_cdrio, __entry->fs_nrio, __entry->fs_mrio)
);
#ifndef __F2FS_IOSTAT_LATENCY_TYPE
#define __F2FS_IOSTAT_LATENCY_TYPE
struct f2fs_iostat_latency {
unsigned int peak_lat;
unsigned int avg_lat;
unsigned int cnt;
};
#endif /* __F2FS_IOSTAT_LATENCY_TYPE */
TRACE_EVENT(f2fs_iostat_latency,
TP_PROTO(struct f2fs_sb_info *sbi, struct f2fs_iostat_latency (*iostat_lat)[NR_PAGE_TYPE]),
TP_ARGS(sbi, iostat_lat),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(unsigned int, d_rd_peak)
__field(unsigned int, d_rd_avg)
__field(unsigned int, d_rd_cnt)
__field(unsigned int, n_rd_peak)
__field(unsigned int, n_rd_avg)
__field(unsigned int, n_rd_cnt)
__field(unsigned int, m_rd_peak)
__field(unsigned int, m_rd_avg)
__field(unsigned int, m_rd_cnt)
__field(unsigned int, d_wr_s_peak)
__field(unsigned int, d_wr_s_avg)
__field(unsigned int, d_wr_s_cnt)
__field(unsigned int, n_wr_s_peak)
__field(unsigned int, n_wr_s_avg)
__field(unsigned int, n_wr_s_cnt)
__field(unsigned int, m_wr_s_peak)
__field(unsigned int, m_wr_s_avg)
__field(unsigned int, m_wr_s_cnt)
__field(unsigned int, d_wr_as_peak)
__field(unsigned int, d_wr_as_avg)
__field(unsigned int, d_wr_as_cnt)
__field(unsigned int, n_wr_as_peak)
__field(unsigned int, n_wr_as_avg)
__field(unsigned int, n_wr_as_cnt)
__field(unsigned int, m_wr_as_peak)
__field(unsigned int, m_wr_as_avg)
__field(unsigned int, m_wr_as_cnt)
),
TP_fast_assign(
__entry->dev = sbi->sb->s_dev;
__entry->d_rd_peak = iostat_lat[0][DATA].peak_lat;
__entry->d_rd_avg = iostat_lat[0][DATA].avg_lat;
__entry->d_rd_cnt = iostat_lat[0][DATA].cnt;
__entry->n_rd_peak = iostat_lat[0][NODE].peak_lat;
__entry->n_rd_avg = iostat_lat[0][NODE].avg_lat;
__entry->n_rd_cnt = iostat_lat[0][NODE].cnt;
__entry->m_rd_peak = iostat_lat[0][META].peak_lat;
__entry->m_rd_avg = iostat_lat[0][META].avg_lat;
__entry->m_rd_cnt = iostat_lat[0][META].cnt;
__entry->d_wr_s_peak = iostat_lat[1][DATA].peak_lat;
__entry->d_wr_s_avg = iostat_lat[1][DATA].avg_lat;
__entry->d_wr_s_cnt = iostat_lat[1][DATA].cnt;
__entry->n_wr_s_peak = iostat_lat[1][NODE].peak_lat;
__entry->n_wr_s_avg = iostat_lat[1][NODE].avg_lat;
__entry->n_wr_s_cnt = iostat_lat[1][NODE].cnt;
__entry->m_wr_s_peak = iostat_lat[1][META].peak_lat;
__entry->m_wr_s_avg = iostat_lat[1][META].avg_lat;
__entry->m_wr_s_cnt = iostat_lat[1][META].cnt;
__entry->d_wr_as_peak = iostat_lat[2][DATA].peak_lat;
__entry->d_wr_as_avg = iostat_lat[2][DATA].avg_lat;
__entry->d_wr_as_cnt = iostat_lat[2][DATA].cnt;
__entry->n_wr_as_peak = iostat_lat[2][NODE].peak_lat;
__entry->n_wr_as_avg = iostat_lat[2][NODE].avg_lat;
__entry->n_wr_as_cnt = iostat_lat[2][NODE].cnt;
__entry->m_wr_as_peak = iostat_lat[2][META].peak_lat;
__entry->m_wr_as_avg = iostat_lat[2][META].avg_lat;
__entry->m_wr_as_cnt = iostat_lat[2][META].cnt;
),
TP_printk("dev = (%d,%d), "
"iotype [peak lat.(ms)/avg lat.(ms)/count], "
"rd_data [%u/%u/%u], rd_node [%u/%u/%u], rd_meta [%u/%u/%u], "
"wr_sync_data [%u/%u/%u], wr_sync_node [%u/%u/%u], "
"wr_sync_meta [%u/%u/%u], wr_async_data [%u/%u/%u], "
"wr_async_node [%u/%u/%u], wr_async_meta [%u/%u/%u]",
show_dev(__entry->dev),
__entry->d_rd_peak, __entry->d_rd_avg, __entry->d_rd_cnt,
__entry->n_rd_peak, __entry->n_rd_avg, __entry->n_rd_cnt,
__entry->m_rd_peak, __entry->m_rd_avg, __entry->m_rd_cnt,
__entry->d_wr_s_peak, __entry->d_wr_s_avg, __entry->d_wr_s_cnt,
__entry->n_wr_s_peak, __entry->n_wr_s_avg, __entry->n_wr_s_cnt,
__entry->m_wr_s_peak, __entry->m_wr_s_avg, __entry->m_wr_s_cnt,
__entry->d_wr_as_peak, __entry->d_wr_as_avg, __entry->d_wr_as_cnt,
__entry->n_wr_as_peak, __entry->n_wr_as_avg, __entry->n_wr_as_cnt,
__entry->m_wr_as_peak, __entry->m_wr_as_avg, __entry->m_wr_as_cnt)
);
#endif
TRACE_EVENT(f2fs_bmap,
TP_PROTO(struct inode *inode, sector_t lblock, sector_t pblock),