When truncating cached inline_data, we don't need to allocate a new page
all the time. Instead, it must check its page cache only.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We don't need to make zeros beyond i_size, since we already wrote that through
NEW_ADDR case.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Some applications may create multimeida file with temporary name like
'*.jpg.tmp' or '*.mp4.tmp', then rename to '*.jpg' or '*.mp4'.
Now, f2fs can only detect multimedia filename with specified format:
"filename + '.' + extension", so it will make f2fs missing to detect
multimedia file with special temporary name, result in failing to set
cold flag on file.
This patch enhances detection flow for enabling lookup extension in the
middle of temporary filename.
Reported-by: Xue Liu <liuxueliu.liu@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add roll-forward recovery process for encrypted dentry, so the first fsync
issued to an encrypted file does not need writing checkpoint.
This improves the performance of the following test at thousands of small
files: open -> write -> fsync -> close
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify kernel message to show encrypted names]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch enhances the xattr consistency of dirs from suddern power-cuts.
Possible scenario would be:
1. dir->setxattr used by per-file encryption
2. file->setxattr goes into inline_xattr
3. file->fsync
In that case, we should do checkpoint for #1.
Otherwise we'd lose dir's key information for the file given #2.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Like most filesystems, f2fs will issue discard command synchronously, so
when user trigger fstrim through ioctl, multiple discard commands will be
issued serially with sync mode, which makes poor performance.
In this patch we try to support async discard, so that all discard
commands can be issued and be waited for endio in batch to improve
performance.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch sets encryption name flag in the add inline entry path
if filename is encrypted.
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When decrypting dirents in ->readdir, fscrypt_fname_disk_to_usr won't
change content of original encrypted dirent, we don't need to allocate
additional buffer for storing mirror of it, so get rid of it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When creating new inode, security_inode_init_security will be called for
initializing security info related to the inode, and filename is passed to
security module, it helps security module such as SElinux to know which
rule or label could be applied for the inode with specified name.
Previously, if new inode is created as an encrypted one, f2fs will transfer
encrypted filename to security module which may fail the check of security
policy belong to the inode. So in order to this issue, alter to transfer
original unencrypted filename instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In batch discard approach of fstrim will grab/release gc_mutex lock
repeatly, it makes contention of the lock becoming more intensive.
So after one batch discards were issued in checkpoint and the lock
was released, it's better to do schedule() to increase opportunity
of grabbing gc_mutex lock for other competitors.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Make inline_dentry as default mount option to improve space usage and
IO performance in scenario of numerous small directory.
It adds noinline_dentry mount option, instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the following scenario,
1) we don't have the key and doing a lookup for encrypted file,
2) and the encrypted filename is big name
we should use fname->hash as name hash value instead of what is
calculated by fname->disk_name. Because in such case,
fname->disk_name is empty.
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In write_begin(), we skip checking dnode block for preallocating block
when whole block needs to be updated since we preallocated its block in
f2fs_preallocate_blocks, for partial updated block, we will still try
to lock its node and do preallocation in write_begin(), so in
f2fs_preallocate_blocks we should not preallocate its block.
But previously, the calculation of preallocating block number is
incorrect, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Fixes the following sparse warning:
fs/f2fs/data.c:969:12: warning:
symbol 'f2fs_grab_bio' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
`flags' is used to save value from userspace, there is no need to
initialize it, and FS_FL_USER_VISIBLE is the mask for getflags.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In available_free_memory, there are two same judgement conditions which
is used for checking NAT excess, remove one of them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During fstrim, if one of multiple write_checkpoint failed, break off and
return error number to caller.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If we preallocate blocks with f2fs_reserve_blocks in f2fs_map_blocks, we
should call f2fs_balance_fs for checking and reclaiming space, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When building each sit entry in cache, firstly, we will load it from
sit page, and then check all entries in sit journal, if there is one
updated entry in journal, cover cached entry with the journaled one.
Actually, most of check operation is unneeded since we only need
to update cached entries with journaled entries in batch, so
changing the flow as below for more efficient:
1. load all sit entries into cache from sit pages;
2. update sit entries with journal.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch changes to check valid block number of one GCed section
directly instead of checking the number in all segments of section
one by one in order to clean up codes of foreground GC.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We don't guarantee integrity of user data after checkpoint, since we only
guarantee meta data integrity for data consistency of filesystem.
Due to above reason, we only need to set fs as dirty when meta data is
updated, so that we can skip writing checkpoint in some case of non-meta
data is updated.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch enables to do fstrim without checkpoint, if there is no fs
change.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch add discard block count to sys entry of f2fs status
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Changes in this update
- regression fixes for XFS changes introduce in 4.8-rc1
- buffer IO accounting assert failure
- ENOSPC block accounting reservation issue
- DAX IO path page cache invalidation fix
- rmapbt on-disk block count in agf
- correct classification of rmap block type when updating AGFL.
- iomap support for attribute fork mapping
- regression fixes for iomap infrastructure in 4.8-rc1
- fiemap: honor FIEMAP_FLAG_SYNC
- fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
- make mark_page_accessed and pagefault_disable usage consistent with
other IO paths
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXtpPqAAoJEK3oKUf0dfoduBkQAKxy6ETxDUd5OqlFdc0NlZYL
cZYefWzaX/X+eO/SCgw9zB/HE9o0/zyCF/OcGF0Inb1uySPkERIfV5qPGmmHvqdm
86bfV3PRkRYsoXI289ci5y64hwFFbev65ZAm6pEbFCbkAYCPBZg6w4Cg+80dG9Cp
o8o82oUW6WINVfySpyqsrO5Uje15Bz/Dx/tD9gkhQdWWaOHQhi8C9tf0uJQgLKWN
MC/SXDoNafDIDs5rxJB2n8Nu66i09OgoP3wk1ID8GYwHOBi1QqSGWoLZFpLdgMoi
GJnAAzl6yolq7exZuk/1LD/Vsu4mvtuK/5hA+pRf0KBLim+BLnv7tVYCRM9BHD7m
s2ddgk2ZW9MNlYNB4K1GQf2WfjnNb7qrzupswEBtBXArgsA6v9TXjHz2sizF04MO
EYcHbhAl48usuBiibLqDbo9w2bsAOE4BhLReU4SUSPD1/C6Ujicx32hj/IPKbUxV
tIiAr6zf9PThvCI+flaFN6ztWTi1rZN1NPC/boxoUYh3FvEMmJZ6WFYB9SfvlCF/
dd8ybry8wwIswdVA7R6GqWTWOEvY70QOsDZFLiJ/nivEKLpnifTGVAy6mBSIBiqG
HsVktXj25454j+hv+2YPkmI1Th8E59ABZvX0oam+ZdtpjDnLefkrdaTKtARpzW3L
xNKHeXjpODATzlVaWKfC
=h8VB
-----END PGP SIGNATURE-----
Merge tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs and iomap fixes from Dave Chinner:
"Changes in this update:
Regression fixes for XFS changes introduce in 4.8-rc1:
- buffer IO accounting assert failure
- ENOSPC block accounting reservation issue
- DAX IO path page cache invalidation fix
- rmapbt on-disk block count in agf
- correct classification of rmap block type when updating AGFL.
- iomap support for attribute fork mapping
Regression fixes for iomap infrastructure in 4.8-rc1:
- fiemap: honor FIEMAP_FLAG_SYNC
- fiemap: implement FIEMAP_FLAG_XATTR support to fix XFS regression
- make mark_page_accessed and pagefault_disable usage consistent with
other IO paths"
* tag 'xfs-iomap-for-linus-4.8-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: remove OWN_AG rmap when allocating a block from the AGFL
xfs: (re-)implement FIEMAP_FLAG_XATTR
xfs: simplify xfs_file_iomap_begin
iomap: mark ->iomap_end as optional
iomap: prepare iomap_fiemap for attribute mappings
iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag
iomap: remove superflous pagefault_disable from iomap_write_actor
iomap: remove superflous mark_page_accessed from iomap_write_actor
xfs: store rmapbt block count in the AGF
xfs: don't invalidate whole file on DAX read/write
xfs: fix bogus space reservation in xfs_iomap_write_allocate
xfs: don't assert fail on non-async buffers on ioacct decrement
Thread A Thread B
- inode_lock fileA
- inode_lock fileB
- inode_lock fileA
- inode_lock fileB
We may encounter above potential deadlock during moving file range in
concurrent scenario. This patch fixes the issue by using inode_trylock
instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Only if two input files are regular files, we allow copying data in
range of them, otherwise, deny it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit a2ee0a3003.
When testing with generic/032 of xfstest suit, failure message will be
reported as below:
generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
--- tests/generic/032.out 2015-01-11 16:52:27.643681072 +0800
+++ results/generic/032.out.bad 2016-08-06 13:44:43.861330500 +0800
@@ -1,5 +1,5 @@
QA output created by 032
-100 iterations
-0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
-*
-0100000
+1: [768..775]: unwritten
+Unwritten extents found!
...
(Run 'diff -u tests/generic/032.out results/generic/032.out.bad' to see the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests
In write_end(), we should update i_size of inode before unlock page,
otherwise, we will lose newly updated data in following race condition.
Thread A Thread B
- write_end
- unlock page
- writepages
- lock_page
- writepage
if page is out-of-range of file size,
we will skip writting the page.
- update i_size
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
LKP reported -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] has also slight regression for DWAL.
[1] https://github.com/sslab-gatech/fxmark
This reverts commit ec795418c4.
Pull networking fixes from David Miller:
1) Buffers powersave frame test is reversed in cfg80211, fix from Felix
Fietkau.
2) Remove bogus WARN_ON in openvswitch, from Jarno Rajahalme.
3) Fix some tg3 ethtool logic bugs, and one that would cause no
interrupts to be generated when rx-coalescing is set to 0. From
Satish Baddipadige and Siva Reddy Kallam.
4) QLCNIC mailbox corruption and napi budget handling fix from Manish
Chopra.
5) Fix fib_trie logic when walking the trie during /proc/net/route
output than can access a stale node pointer. From David Forster.
6) Several sctp_diag fixes from Phil Sutter.
7) PAUSE frame handling fixes in mlxsw driver from Ido Schimmel.
8) Checksum fixup fixes in bpf from Daniel Borkmann.
9) Memork leaks in nfnetlink, from Liping Zhang.
10) Use after free in rxrpc, from David Howells.
11) Use after free in new skb_array code of macvtap driver, from Jason
Wang.
12) Calipso resource leak, from Colin Ian King.
13) mediatek bug fixes (missing stats sync init, etc.) from Sean Wang.
14) Fix bpf non-linear packet write helpers, from Daniel Borkmann.
15) Fix lockdep splats in macsec, from Sabrina Dubroca.
16) hv_netvsc bug fixes from Vitaly Kuznetsov, mostly to do with VF
handling.
17) Various tc-action bug fixes, from CONG Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (116 commits)
net_sched: allow flushing tc police actions
net_sched: unify the init logic for act_police
net_sched: convert tcf_exts from list to pointer array
net_sched: move tc offload macros to pkt_cls.h
net_sched: fix a typo in tc_for_each_action()
net_sched: remove an unnecessary list_del()
net_sched: remove the leftover cleanup_a()
mlxsw: spectrum: Allow packets to be trapped from any PG
mlxsw: spectrum: Unmap 802.1Q FID before destroying it
mlxsw: spectrum: Add missing rollbacks in error path
mlxsw: reg: Fix missing op field fill-up
mlxsw: spectrum: Trap loop-backed packets
mlxsw: spectrum: Add missing packet traps
mlxsw: spectrum: Mark port as active before registering it
mlxsw: spectrum: Create PVID vPort before registering netdevice
mlxsw: spectrum: Remove redundant errors from the code
mlxsw: spectrum: Don't return upon error in removal path
i40e: check for and deal with non-contiguous TCs
ixgbe: Re-enable ability to toggle VLAN filtering
ixgbe: Force VLNCTRL.VFE to be set in all VMDq paths
...
When we're really tight on space, xfs_alloc_ag_vextent_small() can
allocate a block from the AGFL and give it to the caller. Since the
caller is never the AGFL-fixing method, we must remove the OWN_AG
reverse mapping because it will clash with whatever rmap the caller
wants to set up. This bug was discovered by running generic/299
repeatedly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Use a special read-only iomap_ops implementation to support fiemap on
the attr fork.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
We'll never get nimap == 0 for a successful return from xfs_bmapi_read,
so don't try to handle it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
No need to implement it for read-only mappings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
By bassing through an -ENOENT, similar to the old XFS implementation of
FIEMAP_FLAG_XATTR.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The flag is checked as supported, but then we do an unconditional
sync of the file, regardless of whether the flag is set or not. Make
the sync conditional on having the FIEMAP_FLAG_SYNC flag set.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
iov_iter_copy_from_user_atomic disables page faults internally, no need to
do it around the call. This also brings the iomap code in line with
the original filemap version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This catches up with commit 2457ae ("mm: non-atomically mark page
accessed during page cache allocation where possible"), which
moved the initial access marking into the pagecache allocator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Track the number of blocks used for the rmapbt in the AGF. When we
get to the AG reservation code we need this counter to quickly
make our reservation during mount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
When we do DAX IO, we try to invalidate the entire page cache held
on the file. This is incorrect as it will trash the entire mapping
tree that now tracks dirty state in exceptional entries in the radix
tree slots.
What we are trying to do is remove cached pages (e.g from reads
into holes) that sit in the radix tree over the range we are about
to write to. Hence we should just limit the invalidation to the
range we are about to overwrite.
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The space reservations was without an explaination in commit
"Add error reporting calls in error paths that return EFSCORRUPTED"
back in 2003. There is no reason to reserve disk blocks in the
transaction when allocating blocks for delalloc space as we already
reserved the space when creating the delalloc extent.
With this fix we stop running out of the reserved pool in
generic/229, which has happened for long time with small blocksize
file systems, and has increased in severity with the new buffered
write path.
[ dchinner: we still need to pass the block reservation into
xfs_bmapi_write() to ensure we don't deadlock during AG selection.
See commit dbd5c8c ("xfs: pass total block res. as total
xfs_bmapi_write() parameter") for more details on why this is
necessary. ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>