Commit Graph

737796 Commits

Author SHA1 Message Date
Eric Biggers 53fedcc09c f2fs: reserve bits for fs-verity
Reserve an F2FS feature flag and inode flag for fs-verity.  This is an
in-development feature that is planned be discussed at LSF/MM 2018 [1].
It will provide file-based integrity and authenticity for read-only
files.  Most code will be in a filesystem-independent module, with
smaller changes needed to individual filesystems that opt-in to
supporting the feature.  An early prototype supporting F2FS is available
[2].  Reserving the F2FS on-disk bits for fs-verity will prevent users
of the prototype from conflicting with other new F2FS features.

Note that we're reserving the inode flag in f2fs_inode.i_advise, which
isn't really appropriate since it's not a hint or advice.  But
->i_advise is already being used to hold the 'encrypt' flag; and F2FS's
->i_flags uses the generic FS_* values, so it seems ->i_flags can't be
used for an F2FS-specific flag without additional work to remove the
assumption that ->i_flags uses the generic flags namespace.

[1] https://marc.info/?l=linux-fsdevel&m=151690752225644
[2] https://git.kernel.org/pub/scm/linux/kernel/git/mhalcrow/linux.git/log/?h=fs-verity-dev

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-28 12:56:49 -07:00
Yunlei He d21b0f238a f2fs: Add a segment type check in inplace write
This patch add a segment type check in IPU, in
case of something wrong with blkadd in dnode.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:13:54 -07:00
Yunlong Song 2f52052929 f2fs: no need to initialize zero value for GFP_F2FS_ZERO
Since f2fs_inode_info is allocated with flag GFP_F2FS_ZERO, so we do not
need to initialize zero value for its member any more.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:12:53 -07:00
Chao Yu 780de47cf6 f2fs: don't track new nat entry in nat set
Nat entry set is used only in checkpoint(), and during checkpoint() we
won't flush new nat entry with unallocated address, so we don't need to
add new nat entry into nat set, then nat_entry_set::entry_cnt can
indicate actual entry count we need to flush in checkpoint().

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:10:29 -07:00
Chao Yu df033caf51 f2fs: clean up with F2FS_BLK_ALIGN
Clean up F2FS_BYTES_TO_BLK(x + F2FS_BLKSIZE - 1) with F2FS_BLK_ALIGN(x).

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-27 20:09:32 -07:00
Yunlei He 0833721ec3 f2fs: check blkaddr more accuratly before issue a bio
This patch check blkaddr more accuratly before issue a
write or read bio.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-18 23:33:50 -07:00
Ritesh Harjani 02117b8ae9 f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
Quota code itself is serializing the operations by taking mutex_lock.
It seems a below deadlock can happen if GF_NOFS is not used in
f2fs_quota_read

__switch_to+0x88
__schedule+0x5b0
schedule+0x78
schedule_preempt_disabled+0x20
__mutex_lock_slowpath+0xdc   		//mutex owner is itself
mutex_lock+0x2c
dquot_commit+0x30			//mutex_lock(&dqopt->dqio_mutex);
dqput+0xe0
__dquot_drop+0x80
dquot_drop+0x48
f2fs_evict_inode+0x218
evict+0xa8
dispose_list+0x3c
prune_icache_sb+0x58
super_cache_scan+0xf4
do_shrink_slab+0x208
shrink_slab.part.40+0xac
shrink_zone+0x1b0
do_try_to_free_pages+0x25c
try_to_free_pages+0x164
__alloc_pages_nodemask+0x534
do_read_cache_page+0x6c
read_cache_page+0x14
f2fs_quota_read+0xa4
read_blk+0x54
find_tree_dqentry+0xe4
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
find_tree_dqentry+0xb8
qtree_read_dquot+0x68
v2_read_dquot+0x24
dquot_acquire+0x5c			// mutex_lock(&dqopt->dqio_mutex);
dqget+0x238
__dquot_initialize+0xd4
dquot_initialize+0x10
dquot_file_open+0x34
f2fs_file_open+0x6c
do_dentry_open+0x1e4
vfs_open+0x6c
path_openat+0xa20
do_filp_open+0x4c
do_sys_open+0x178

Signed-off-by: Ritesh Harjani <riteshh@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-18 23:31:28 -07:00
Sheng Yong ff62af200b f2fs: introduce a new mount option test_dummy_encryption
This patch introduces a new mount option `test_dummy_encryption'
to allow fscrypt to create a fake fscrypt context. This is used
by xfstests.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 14:02:57 +09:00
Sheng Yong b7c409deda f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
This patch introduces a new feature, F2FS_FEATURE_LOST_FOUND, which
is set by mkfs. mkfs creates a directory named lost+found, which saves
unreachable files. If fsck finds a file which has no parent, or its
parent is removed by fsck, the file will be placed under lost+found
directory by fsck.

lost+found directory could not be encrypted. As a result, the root
directory cannot be encrypted too. So if LOST_FOUND feature is enabled,
let's avoid to encrypt root directory.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 14:02:47 +09:00
Qiuyang Sun da5ce874f8 f2fs: release locks before return in f2fs_ioc_gc_range()
Currently, we will leave the kernel with locks still held when the gc_range
is invalid. This patch fixes the bug.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:58:16 +09:00
Jaegeuk Kim bb1105e479 f2fs: align memory boundary for bitops
For example, in arm64, free_nid_bitmap should be aligned to word size in order
to use bit operations.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:39 +09:00
Chao Yu c56675750d f2fs: remove unneeded set_cold_node()
When setting COLD_BIT_SHIFT flag in node block, we only need to call
set_cold_node() in new_node_page() and recover_inode_page() during
node page initialization. So remove unneeded set_cold_node() in other
places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:33 +09:00
Hyunchul Lee b91050a80c f2fs: add nowait aio support
This patch adds nowait aio support[1].

Return EAGAIN if any of the following checks fail for direct I/O:
  - i_rwsem is not lockable
  - Blocks are not allocated at the write location

And xfstests generic/471 is passed.

 [1]: 6be96d "Introduce RWF_NOWAIT and FMODE_AIO_NOWAIT"

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:32 +09:00
Chao Yu 63189b7859 f2fs: wrap all options with f2fs_sb_info.mount_opt
This patch merges miscellaneous mount options into struct f2fs_mount_info,
After this patch, once we add new mount option, we don't need to worry
about recovery of it in remount_fs(), since we will recover the
f2fs_sb_info.mount_opt including all options.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:30 +09:00
Yunlei He 5d7881cadf f2fs: Don't overwrite all types of node to keep node chain
Currently, we enable node SSR by default, and mixed
different types of node segment to do SSR more intensively.
Although reuse warm node is not allowed, warm node chain
will be destroyed by errors introduced by other types
node chain. So we'd better forbid reusing all types
of node to keep warm node chain.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:29 +09:00
Junling Zheng 93cf93f17c f2fs: introduce mount option for fsync mode
Commit "0a007b97aad6"(f2fs: recover directory operations by fsync)
fixed xfstest generic/342 case, but it also increased the written
data and caused the performance degradation. In most cases, there's
no need to do so heavy fsync actually.

So we introduce new mount option "fsync_mode={posix,strict}" to
control the policy of fsync. "fsync_mode=posix" is set by default,
and means that f2fs uses a light fsync, which follows POSIX semantics.
And "fsync_mode=strict" means that it's a heavy fsync, which behaves
in line with xfs, ext4 and btrfs, where generic/342 will pass, but
the performance will regress.

Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:28 +09:00
Chao Yu eea52882d8 f2fs: fix to restore old mount option in ->remount_fs
This patch fixes to restore old mount option once we encounter failure
in ->remount_fs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:27 +09:00
Chao Yu deeedd71e3 f2fs: wrap sb_rdonly with f2fs_readonly
Use f2fs_readonly to wrap sb_rdonly for cleanup, and spread it in
all places.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:57:26 +09:00
Jaegeuk Kim 162b27aec9 f2fs: avoid selinux denial on CAP_SYS_RESOURCE
This fixes CAP_SYS_RESOURCE denial of selinux when using resgid, since it
seems selinux reports it at the first place, but mostly we don't need to
check this condition first.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-17 13:55:43 +09:00
Chao Yu b6a06cbbb5 f2fs: support hot file extension
This patch supports to recognize hot file extension in f2fs, so that we
can allocate proper hot segment location for its data, which can lead to
better hot/cold seperation in filesystem.

In addition, we changes a bit on query/add/del operation method for
extension_list sysfs entry as below:

- Query: cat /sys/fs/f2fs/<disk>/extension_list
- Add: echo 'extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '!extension' > /sys/fs/f2fs/<disk>/extension_list
- Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
- Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
- [h] means add/del hot file extension
- [c] means add/del cold file extension

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:57 +09:00
Chao Yu 1dc0f8991d f2fs: fix to avoid race in between atomic write and background GC
Sqlite user			Background GC
				- move_data_block
				  : move page #1
				 - f2fs_is_atomic_file
- f2fs_ioc_start_atomic_write
- f2fs_ioc_commit_atomic_write
 - commit_inmem_pages
   : commit page #1 & set node #2 dirty
				 - f2fs_submit_page_write
				  - f2fs_update_data_blkaddr
				   - set_page_dirty
				     : set node #2 dirty
 - f2fs_do_sync_file
  - fsync_node_pages
   : commit node #1 & node #2, then sudden power-cut

In a race case, we may check FI_ATOMIC_FILE flag before starting atomic
write flow, then we will commit meta data before data with reversed
order, after a sudden pow-cut, database transaction will be inconsistent.

So we'd better to exclude gc/atomic_write to each other by using lock
instead of flag checking.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:55 +09:00
Jaegeuk Kim b27bc8091c f2fs: do gc in greedy mode for whole range if gc_urgent mode is set
Otherwise, f2fs conducts GC on 8GB range only based on slow cost-benefit.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:54 +09:00
Jaegeuk Kim dee02f0d62 f2fs: issue discard aggressively in the gc_urgent mode
This patch avoids to skip discard commands when user sets gc_urgent mode.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:53 +09:00
Jaegeuk Kim 782911f491 f2fs: set readdir_ra by default
It gives general readdir improvement.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:52 +09:00
Jaegeuk Kim 84b89e5d94 f2fs: add auto tuning for small devices
If f2fs is running on top of very small devices, it's worth to avoid abusing
free LBAs. In order to achieve that, this patch introduces some parameter
tuning.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:51 +09:00
Jaegeuk Kim 079396270b f2fs: add mount option for segment allocation policy
This patch adds an mount option, "alloc_mode=%s" having two options, "default"
and "reuse".

In "alloc_mode=reuse" case, f2fs starts to allocate segments from 0'th segment
all the time to reassign segments. It'd be useful for small-sized eMMC parts.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:50 +09:00
Jaegeuk Kim 69babac019 f2fs: don't stop GC if GC is contended
Let's do GC as much as possible, while gc_urgent is set.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:49 +09:00
Chao Yu 846ae671ad f2fs: expose extension_list sysfs entry
This patch adds a sysfs entry 'extension_list' to support
query/add/del item in extension list.

Query:
cat /sys/fs/f2fs/<device>/extension_list

Add:
echo 'extension' > /sys/fs/f2fs/<device>/extension_list

Del:
echo '!extension' > /sys/fs/f2fs/<device>/extension_list

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:48 +09:00
Chao Yu 17cd07ae95 f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
As Jayashree Mohan reported:

A simple workload to reproduce this would be :
1. create foo
2. Write (8K - 16K)  // foo size = 16K now
3. fsync()
4. falloc zero_range , keep_size (4202496 - 4210688) // foo size must be 16K
5. fdatasync()
Crash now

On recovery, we see that the file size is 4210688 and not 16K, which
violates the semantics of keep_size flag. We have a test case to
reproduce this using CrashMonkey on 4.15 kernel. Try this out by
simply running :
 ./c_harness -f /dev/sda -d /dev/cow_ram0 -t f2fs -e 102400  -P -v
 tests/generic_468_zero.so

The root cause is that we miss to set KEEP_SIZE bit correctly in zero_range
when zeroing block cross EOF with FALLOC_FL_KEEP_SIZE, let's fix this
missing case.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:47 +09:00
Chao Yu d0d3f1b329 f2fs: introduce sb_lock to make encrypt pwsalt update exclusive
f2fs_super_block.encrypt_pw_salt can be udpated and persisted
concurrently, result in getting different pwsalt in separated
threads, so let's introduce sb_lock to exclude concurrent
accessers.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:46 +09:00
Colin Ian King 8fe326cb99 f2fs: remove redundant initialization of pointer 'p'
Pointer p is initialized with a value that is never read and is later
re-assigned a new value, hence the initialization is redundant and can
be removed.

Cleans up clang warning:
fs/f2fs/extent_cache.c:463:19: warning: Value stored to 'p' during
its initialization is never read

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:45 +09:00
Gao Xiang 46706d5917 f2fs: flush cp pack except cp pack 2 page at first
Previously, we attempt to flush the whole cp pack in a single bio,
however, when suddenly powering off at this time, we could get into
an extreme scenario that cp pack 1 page and cp pack 2 page are updated
and latest, but payload or current summaries are still partially
outdated. (see reliable write in the UFS specification)

This patch submits the whole cp pack except cp pack 2 page at first,
and then writes the cp pack 2 page with an extra independent
bio with pre-io barrier.

Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:43 +09:00
Sheng Yong ccd31cb28f f2fs: clean up f2fs_sb_has_xxx functions
This patch introduces F2FS_FEATURE_FUNCS to clean up the definitions of
different f2fs_sb_has_xxx functions.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:42 +09:00
Tiezhu Yang 3bb09a0e7e f2fs: remove redundant check of page type when submit bio
This patch removes redundant check of page type when submit bio to
make the logic more clear.

Signed-off-by: Tiezhu Yang <kernelpatch@126.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:41 +09:00
Chao Yu fb0e72c8b9 f2fs: fix to handle looped node chain during recovery
There is no checksum in node block now, so bit-transition from hardware
can make node_footer.next_blkaddr being corrupted w/o any detection,
result in node chain becoming looped one.

For this condition, during recovery, in order to avoid running into dead
loop, let's detect it and just skip out.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:40 +09:00
Jaegeuk Kim 0f9ec2a8f6 f2fs: handle quota for orphan inodes
This is to detect dquot_initialize errors early from evict_inode
for orphan inodes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:38 +09:00
Hyunchul Lee 8b3a0ca0fd f2fs: Add the 'whint_mode' mount option to f2fs documentation
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Add the write-hint policy in f2fs doc.]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:37 +09:00
Hyunchul Lee f2e703f9a3 f2fs: support passing down write hints to block layer with F2FS policy
Add 'whint_mode=fs-based' mount option. In this mode, F2FS passes
down write hints with its policy.

* whint_mode=fs-based. F2FS passes down hints with its policy.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_MEDIUM;
                      HOT_NODE                 WRITE_LIFE_NOT_SET
                      WARM_NODE                "
                      COLD_NODE                WRITE_LIFE_NONE
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_LONG
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:36 +09:00
Hyunchul Lee 0cdd319539 f2fs: support passing down write hints given by users to block layer
Add the 'whint_mode' mount option that controls which write
hints are passed down to block layer. There are "off" and
"user-based" mode. The default mode is "off".

1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.

2) whint_mode=user-based. F2FS tries to pass down hints given
by users.

User                  F2FS                     Block
----                  ----                     -----
                      META                     WRITE_LIFE_NOT_SET
                      HOT_NODE                 "
                      WARM_NODE                "
                      COLD_NODE                "
ioctl(COLD)           COLD_DATA                WRITE_LIFE_EXTREME
extension list        "                        "

-- buffered io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        "
WRITE_LIFE_MEDIUM     "                        "
WRITE_LIFE_LONG       "                        "

-- direct io
WRITE_LIFE_EXTREME    COLD_DATA                WRITE_LIFE_EXTREME
WRITE_LIFE_SHORT      HOT_DATA                 WRITE_LIFE_SHORT
WRITE_LIFE_NOT_SET    WARM_DATA                WRITE_LIFE_NOT_SET
WRITE_LIFE_NONE       "                        WRITE_LIFE_NONE
WRITE_LIFE_MEDIUM     "                        WRITE_LIFE_MEDIUM
WRITE_LIFE_LONG       "                        WRITE_LIFE_LONG

Many thanks to Chao Yu and Jaegeuk Kim for comments to
implement this patch.

Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid build warning]
[Chao Yu: fix to restore whint_mode in ->remount_fs]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:35 +09:00
Chao Yu cd36d7a17f f2fs: fix to clear CP_TRIMMED_FLAG
Once CP_TRIMMED_FLAG is set, after a reboot, we will never issue discard
before LBA becomes invalid again, fix it by clearing the flag in
checkpoint without CP_TRIMMED reason.

Fixes: 1f43e2ad7b ("f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:34 +09:00
Chao Yu 199bc3fef2 f2fs: support large nat bitmap
Previously, we will store all nat version bitmap in checkpoint pack block,
so our total node entry number has a limitation which caused total node
number can not exceed (3900 * 8) block * 455 node/block = 14196000. So
that once user wants to create more nodes in large size image, it becomes
a bottleneck, that's unreasonable.

This patch detects the new layout of nat/sit version bitmap in image in
order to enable supporting large nat bitmap.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:33 +09:00
Chao Yu bf617f7a92 f2fs: fix to check extent cache in f2fs_drop_extent_tree
If noextent_cache mount option is on, we will never initialize extent tree
in inode, but still we're going to access it in f2fs_drop_extent_tree,
result in kernel panic as below:

 BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
 IP: _raw_write_lock+0xc/0x30
 Call Trace:
  ? f2fs_drop_extent_tree+0x41/0x70 [f2fs]
  f2fs_fallocate+0x5a0/0xdd0 [f2fs]
  ? common_file_perm+0x47/0xc0
  ? apparmor_file_permission+0x1a/0x20
  vfs_fallocate+0x15b/0x290
  SyS_fallocate+0x44/0x70
  do_syscall_64+0x6e/0x160
  entry_SYSCALL64_slow_path+0x25/0x25

This patch fixes to check extent cache status before using in
f2fs_drop_extent_tree.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:32 +09:00
Chao Yu 4d817ae099 f2fs: restrict inline_xattr_size configuration
This patch limits to enable inline_xattr_size mount option only if
both extra_attr and flexible_inline_xattr feature is on in current
image.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:31 +09:00
Yunlong Song b94929d975 f2fs: fix heap mode to reset it back
Commit 7a20b8a61e ("f2fs: allocate node
and hot data in the beginning of partition") introduces another mount
option, heap, to reset it back. But it does not do anything for heap
mode, so fix it.

Cc: stable@vger.kernel.org
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:30 +09:00
Sheng Yong 0964fc1a82 f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
sb_getblk does not guarantee the buffer head is uptodate. If bh is not
uptodate, the data (may be used as boot code) in area before
F2FS_SUPER_OFFSET may get corrupted when super block is committed.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:29 +09:00
Yunlong Song bdbc90fa55 f2fs: don't put dentry page in pagecache into highmem
Previous dentry page uses highmem, which will cause panic in platforms
using highmem (such as arm), since the address space of dentry pages
from highmem directly goes into the decryption path via the function
fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not
from highmem, and then cause panic since it doesn't call kmap_high but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids to put dentry page in pagecache into highmem.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-03-13 08:05:03 +09:00
Linus Torvalds 3664ce2d93 powerpc fixes for 4.16 #4
Add handling for a missing instruction in our 32-bit BPF JIT so that it can be
 used for seccomp filtering.
 
 Add a missing NULL pointer check before a function call in new EEH code.
 
 Fix an error path in the new ocxl driver to correctly return EFAULT.
 
 The support for the new ibm,drc-info device tree property turns out to need
 several fixes, so for now we just stop advertising to firmware that we support
 it until the bugs can be ironed out.
 
 One fix for the new drmem code which was incorrectly modifying the device tree
 in place.
 
 Finally two fixes for the RFI flush support, so that firmware can advertise to
 us that it should be disabled entirely so as not to affect performance.
 
 Thanks to:
   Bharata B Rao, Frederic Barrat, Juan J. Alvarez, Mark Lord, Michael Bringmann.
 -----BEGIN PGP SIGNATURE-----
 
 iQIwBAABCAAaBQJakNObExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYAy
 5A//RHFnwaXrOmcQ7h6fJim9lh+CsbhyHkf+hEGqJ4gp/ZOxjgWLjxgqB8O09hLb
 CZUvbqc3phip431E+qCw1KVVoPzKXVVqEQD1mckvpVwVL7vVxOvcNfjzRH3oMsjC
 jLQAfEvZMO6Lf14tb83vC5YBYOrWjojyOqJ/DbFzedw9hLLt1JTI12KfcgRZo5xr
 3WhyevTHa4JJSgLhY9r34Q2hsf7CS2yTfhHa2+iHFPg2fINYb+Ld8V5JNj3JiFtx
 fd6JiPFjQSyK+xXwK55eQGqMXZdEJU0iSwnYWBLUnFGaK7MQpEIJdv7FiRjD7Hcp
 b3wpBuCSsN2mQIYpsoBY3GUm0thtAIbLDTRq1nAWdULbnOi1QBQy694Vgn4f8A79
 T10Y1QuuwL2nInPy9qb/c6sukGz/czu9txFyqO/PwtRs99A6i4hRAtfKLUeW0cbG
 QxI3GMfxst31SThqjVQifyhy7omXEjU8Oc6o1VMp8Vm1eH1QviwzcfmKA5UQuq3k
 Y+ninT4wjo8MceAZ6F3w767DQmTK9e9lcgLU1ZZd/jntcnDzTr6AF54TFNmmicRb
 o0arHMneMDo1IHEUfeski+0j0uZGuA/reQ6A/mDGSxum09voPwtWEH+dlAr9oKSQ
 lgETCP6GN7rWluEWTYidgGGhV77QsfLNZyJYbGJXQ8cValo=
 =mTqU
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - Add handling for a missing instruction in our 32-bit BPF JIT so that
   it can be used for seccomp filtering.

 - Add a missing NULL pointer check before a function call in new EEH
   code.

 - Fix an error path in the new ocxl driver to correctly return EFAULT.

 - The support for the new ibm,drc-info device tree property turns out
   to need several fixes, so for now we just stop advertising to
   firmware that we support it until the bugs can be ironed out.

 - One fix for the new drmem code which was incorrectly modifying the
   device tree in place.

 - Finally two fixes for the RFI flush support, so that firmware can
   advertise to us that it should be disabled entirely so as not to
   affect performance.

Thanks to: Bharata B Rao, Frederic Barrat, Juan J. Alvarez, Mark Lord,
Michael Bringmann.

* tag 'powerpc-4.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Support firmware disable of RFI flush
  powerpc/pseries: Support firmware disable of RFI flush
  powerpc/mm/drmem: Fix unexpected flag value in ibm,dynamic-memory-v2
  powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
  powerpc/pseries: Revert support for ibm,drc-info devtree property
  powerpc/pseries: Fix duplicate firmware feature for DRC_INFO
  ocxl: Fix potential bad errno on irq allocation
  powerpc/eeh: Fix crashes in eeh_report_resume()
2018-02-24 16:05:50 -08:00
Linus Torvalds 9cb9c07d6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Fix TTL offset calculation in mac80211 mesh code, from Peter Oh.

 2) Fix races with procfs in ipt_CLUSTERIP, from Cong Wang.

 3) Memory leak fix in lpm_trie BPF map code, from Yonghong Song.

 4) Need to use GFP_ATOMIC in BPF cpumap allocations, from Jason Wang.

 5) Fix potential deadlocks in netfilter getsockopt() code paths, from
    Paolo Abeni.

 6) Netfilter stackpointer size checks really are needed to validate
    user input, from Florian Westphal.

 7) Missing timer init in x_tables, from Paolo Abeni.

 8) Don't use WQ_MEM_RECLAIM in mac80211 hwsim, from Johannes Berg.

 9) When an ibmvnic device is brought down then back up again, it can be
    sent queue entries from a previous session, handle this properly
    instead of crashing. From Thomas Falcon.

10) Fix TCP checksum on LRO buffers in mlx5e, from Gal Pressman.

11) When we are dumping filters in cls_api, the output SKB is empty, and
    the filter we are dumping is too large for the space in the SKB, we
    should return -EMSGSIZE like other netlink dump operations do.
    Otherwise userland has no signal that is needs to increase the size
    of its read buffer. From Roman Kapl.

12) Several XDP fixes for virtio_net, from Jesper Dangaard Brouer.

13) Module refcount leak in netlink when a dump start fails, from Jason
    Donenfeld.

14) Handle sub-optimal GSO sizes better in TCP BBR congestion control,
    from Eric Dumazet.

15) Releasing bpf per-cpu arraymaps can take a long time, add a
    condtional scheduling point. From Eric Dumazet.

16) Implement retpolines for tail calls in x64 and arm64 bpf JITs. From
    Daniel Borkmann.

17) Fix page leak in gianfar driver, from Andy Spencer.

18) Missed clearing of estimator scratch buffer, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
  net_sched: gen_estimator: fix broken estimators based on percpu stats
  gianfar: simplify FCS handling and fix memory leak
  ipv6 sit: work around bogus gcc-8 -Wrestrict warning
  macvlan: fix use-after-free in macvlan_common_newlink()
  bpf, arm64: fix out of bounds access in tail call
  bpf, x64: implement retpoline for tail call
  rxrpc: Fix send in rxrpc_send_data_packet()
  net: aquantia: Fix error handling in aq_pci_probe()
  bpf: fix rcu lockdep warning for lpm_trie map_free callback
  bpf: add schedule points in percpu arrays management
  regulatory: add NUL to request alpha2
  ibmvnic: Fix early release of login buffer
  net/smc9194: Remove bogus CONFIG_MAC reference
  net: ipv4: Set addr_type in hash_keys for forwarded case
  tcp_bbr: better deal with suboptimal GSO
  smsc75xx: fix smsc75xx_set_features()
  netlink: put module reference if dump start fails
  selftests/bpf/test_maps: exit child process without error in ENOMEM case
  selftests/bpf: update gitignore with test_libbpf_open
  selftests/bpf: tcpbpf_kern: use in6_* macros from glibc
  ..
2018-02-23 15:14:17 -08:00
Linus Torvalds 2eb02aa94f Merge branch 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris:

 - keys fixes via David Howells:
      "A collection of fixes for Linux keyrings, mostly thanks to Eric
       Biggers:

        - Fix some PKCS#7 verification issues.

        - Fix handling of unsupported crypto in X.509.

        - Fix too-large allocation in big_key"

 - Seccomp updates via Kees Cook:
      "These are fixes for the get_metadata interface that landed during
       -rc1. While the new selftest is strictly not a bug fix, I think
       it's in the same spirit of avoiding bugs"

 - an IMA build fix from Randy Dunlap

* 'fixes-v4.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  integrity/security: fix digsig.c build error with header file
  KEYS: Use individual pages in big_key for crypto buffers
  X.509: fix NULL dereference when restricting key with unsupported_sig
  X.509: fix BUG_ON() when hash algorithm is unsupported
  PKCS#7: fix direct verification of SignerInfo signature
  PKCS#7: fix certificate blacklisting
  PKCS#7: fix certificate chain verification
  seccomp: add a selftest for get_metadata
  ptrace, seccomp: tweak get_metadata behavior slightly
  seccomp, ptrace: switch get_metadata types to arch independent
2018-02-23 15:04:24 -08:00
Linus Torvalds 65738c6b46 arm64 fixes:
- Compilation error when accessing MPIDR_HWID_BITMASK from .S
 
 - CTR_EL0 field definitions
 
 - Remove/disable some kernel messages on user faults (unhandled signals,
   unimplemented syscalls)
 
 - Kernel page fault in unwind_frame() with function graph tracing
 
 - perf sleeping while atomic errors when booting with ACPI
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlqQYbQACgkQa9axLQDI
 XvEk0Q//clUp5MdD/2hBMtHQuoJ2HuXEHe5zMnZO7YQqzcUj+syF/HQ/r5U7Repc
 C+rNgjNl7ILrZK6T7LFKeDt6TCzTzTsudcdGyrZ2kXIaJ6lwIOlr8pcv+EsVsPAX
 nEdNdZpJ3+N7tKqoNNWgmWN3rOhROEMPWaZc+b/zKz6VGs2K5axpfi/eKIoUeedA
 7p1PTE0m7E16f5iUasimHXCJh5IvbEZN3u1H1588wetApA/wKG8HZaK5yTbgblyH
 Cmg83pIMjDQTikvhDk9wNgn8G/N0qIcu0/h3YazgIyzFIf7Pie4aPcfa+uHjYsAT
 aSUyC7KoeETTMRYreRHpcXzCzZzsvl+1SY27cMdrIZQwsQ5H3V5+hHXXV8S7UH3g
 1QgDua6bp9ZCCB7jYqOQupP+hs64EIetlSSufdpPWcJ3MWO3zZ8uael3tpxYjzNW
 F447ytBaAaNOhA6JxpJERChi8EkdbOfZCfzMes9Pdcce3ACGC0k7FNFwgApDtRlN
 Dsbua9OZfRrafoMv5BWGprdCczcwZMNefoOZ7FwMRWmFuEos6eB21MEelLf5Heh3
 hNERKj22LDhiq24wCle19EnmQTf+6KTu1FOX4bZz6QTGn+nNWh+VNy4xKArzFZR3
 AGv05QY9pehekjBF52O2RISftU9flJluZxMNpsVm5pnt6yFIL3o=
 =MQ2t
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 "arm64 and perf fixes:

   - build error when accessing MPIDR_HWID_BITMASK from .S

   - fix CTR_EL0 field definitions

   - remove/disable some kernel messages on user faults (unhandled
     signals, unimplemented syscalls)

   - fix kernel page fault in unwind_frame() with function graph tracing

   - fix perf sleeping while atomic errors when booting with ACPI"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: fix unwind_frame() for filtered out fn for function graph tracing
  arm64: Enforce BBM for huge IO/VMAP mappings
  arm64: perf: correct PMUVer probing
  arm_pmu: acpi: request IRQs up-front
  arm_pmu: note IRQs and PMUs per-cpu
  arm_pmu: explicitly enable/disable SPIs at hotplug
  arm_pmu: acpi: check for mismatched PPIs
  arm_pmu: add armpmu_alloc_atomic()
  arm_pmu: fold platform helpers into platform code
  arm_pmu: kill arm_pmu_platdata
  ARM: ux500: remove PMU IRQ bouncer
  arm64: __show_regs: Only resolve kernel symbols when running at EL1
  arm64: Remove unimplemented syscall log message
  arm64: Disable unhandled signal log messages by default
  arm64: cpufeature: Fix CTR_EL0 field definitions
  arm64: uaccess: Formalise types for access_ok()
  arm64: Fix compilation error while accessing MPIDR_HWID_BITMASK from .S files
2018-02-23 15:01:01 -08:00