fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback

Commit 23d0127096 ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE
writeback") claims that the sync_file_range(2) syscall was "created for
userspace to be able to issue background writeout and so waiting for
in-flight IO is undesirable there" and changes the writeback (back) to
WB_SYNC_NONE.

This claim is only partially true.  It is true for users that use the flag
SYNC_FILE_RANGE_WRITE by itself, as does PostgreSQL, the user that was the
reason for changing to WB_SYNC_NONE writeback.

However, that claim is not true for users that use the flag combination
SYNC_FILE_RANGE_{WAIT_BEFORE|WRITE|WAIT_AFTER}.  Those users explicitly
requested to wait for in-flight IO as well as to write back dirty pages.

Re-brand that flag combination as SYNC_FILE_RANGE_WRITE_AND_WAIT and use
WB_SYNC_ALL writeback to perform the full range sync request.
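
The mode selection described above can be sketched in plain C. This is a minimal illustration rather than the kernel code: the DEMO_* names and demo_pick_sync_mode() are hypothetical stand-ins, with flag values copied from the header hunk of this commit.

```c
#include <assert.h>

/* Flag values as they appear in the header hunk of this commit. */
#define DEMO_WAIT_BEFORE	1u
#define DEMO_WRITE		2u
#define DEMO_WAIT_AFTER		4u
#define DEMO_WRITE_AND_WAIT	(DEMO_WRITE | DEMO_WAIT_BEFORE | DEMO_WAIT_AFTER)

/* Hypothetical stand-ins for the kernel's WB_SYNC_NONE and WB_SYNC_ALL. */
enum demo_sync_mode { DEMO_SYNC_NONE, DEMO_SYNC_ALL };

/*
 * Only the full three-flag combination selects the data-integrity mode;
 * any other flag set that includes WRITE keeps background-style writeback.
 * In the kernel this choice is only made when SYNC_FILE_RANGE_WRITE is set.
 */
static enum demo_sync_mode demo_pick_sync_mode(unsigned int flags)
{
	if ((flags & DEMO_WRITE_AND_WAIT) == DEMO_WRITE_AND_WAIT)
		return DEMO_SYNC_ALL;
	return DEMO_SYNC_NONE;
}

/* Usage example: the cases discussed in the commit message. */
static void demo_mode_self_check(void)
{
	/* WRITE by itself, the PostgreSQL-style use. */
	assert(demo_pick_sync_mode(DEMO_WRITE) == DEMO_SYNC_NONE);
	/* Start-write-for-data-integrity is not upgraded by this patch. */
	assert(demo_pick_sync_mode(DEMO_WAIT_BEFORE | DEMO_WRITE) == DEMO_SYNC_NONE);
	/* The full range sync request. */
	assert(demo_pick_sync_mode(DEMO_WRITE_AND_WAIT) == DEMO_SYNC_ALL);
}
```

Comparing the masked flags against the whole mask, instead of testing for any overlap, is what keeps the WRITE-only callers on WB_SYNC_NONE.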

Link: http://lkml.kernel.org/r/20190409114922.30095-1-amir73il@gmail.com
Link: http://lkml.kernel.org/r/20190419072938.31320-1-amir73il@gmail.com
Fixes: 23d0127096 ("fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Jan Kara <jack@suse.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amir Goldstein 2019-05-13 17:22:30 -07:00 committed by Linus Torvalds
parent 5326905798
commit c553ea4fdf
2 changed files with 18 additions and 6 deletions


@@ -292,8 +292,14 @@ int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
 	}
 
 	if (flags & SYNC_FILE_RANGE_WRITE) {
+		int sync_mode = WB_SYNC_NONE;
+
+		if ((flags & SYNC_FILE_RANGE_WRITE_AND_WAIT) ==
+			     SYNC_FILE_RANGE_WRITE_AND_WAIT)
+			sync_mode = WB_SYNC_ALL;
+
 		ret = __filemap_fdatawrite_range(mapping, offset, endbyte,
-						 WB_SYNC_NONE);
+						 sync_mode);
 		if (ret < 0)
 			goto out;
 	}
@@ -306,9 +312,9 @@ out:
 }
 
 /*
- * sys_sync_file_range() permits finely controlled syncing over a segment of
+ * ksys_sync_file_range() permits finely controlled syncing over a segment of
  * a file in the range offset .. (offset+nbytes-1) inclusive.  If nbytes is
- * zero then sys_sync_file_range() will operate from offset out to EOF.
+ * zero then ksys_sync_file_range() will operate from offset out to EOF.
  *
  * The flag bits are:
  *
@@ -325,7 +331,7 @@ out:
  * Useful combinations of the flag bits are:
  *
  * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE: ensures that all pages
- * in the range which were dirty on entry to sys_sync_file_range() are placed
+ * in the range which were dirty on entry to ksys_sync_file_range() are placed
  * under writeout.  This is a start-write-for-data-integrity operation.
  *
  * SYNC_FILE_RANGE_WRITE: start writeout of all dirty pages in the range which
@@ -337,10 +343,13 @@ out:
  * earlier SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE operation to wait
  * for that operation to complete and to return the result.
  *
- * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER:
+ * SYNC_FILE_RANGE_WAIT_BEFORE|SYNC_FILE_RANGE_WRITE|SYNC_FILE_RANGE_WAIT_AFTER
+ * (a.k.a. SYNC_FILE_RANGE_WRITE_AND_WAIT):
  * a traditional sync() operation.  This is a write-for-data-integrity operation
  * which will ensure that all pages in the range which were dirty on entry to
- * sys_sync_file_range() are committed to disk.
+ * ksys_sync_file_range() are written to disk.  It should be noted that disk
+ * caches are not flushed by this call, so there are no guarantees here that the
+ * data will be available on disk after a crash.
  *
  *
  * SYNC_FILE_RANGE_WAIT_BEFORE and SYNC_FILE_RANGE_WAIT_AFTER will detect any
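
The updated comment says that the full flag combination writes pages out but does not flush disk caches. From userspace, a caller that needs crash durability would therefore likely pair the call with fdatasync(2). The following is a minimal sketch under that assumption: demo_sync_range_durably() is a hypothetical helper, not part of this patch, and the three flags are spelled out because the C library headers may not expose the combined SYNC_FILE_RANGE_WRITE_AND_WAIT macro.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* needed for the sync_file_range() declaration */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write back the dirty pages in the byte range
 * starting at offset, wait for that writeback, then ask for a flush.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int demo_sync_range_durably(int fd, off64_t offset, off64_t nbytes)
{
	const unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
				   SYNC_FILE_RANGE_WRITE |
				   SYNC_FILE_RANGE_WAIT_AFTER;

	/* With this patch, this combination is treated as a data-integrity write. */
	if (sync_file_range(fd, offset, nbytes, flags) < 0)
		return -1;

	/*
	 * sync_file_range() alone gives no guarantee after a crash, so
	 * fall back to fdatasync() for the actual durability point.
	 */
	return fdatasync(fd);
}

/* Usage example: an invalid descriptor is reported as an error. */
static void demo_durable_self_check(void)
{
	assert(demo_sync_range_durably(-1, 0, 0) == -1);
}
```

Whether the extra fdatasync() is worth its cost depends on the workload; it is shown here only to make the limitation in the comment concrete.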


@@ -320,6 +320,9 @@ struct fscrypt_key {
 #define SYNC_FILE_RANGE_WAIT_BEFORE	1
 #define SYNC_FILE_RANGE_WRITE		2
 #define SYNC_FILE_RANGE_WAIT_AFTER	4
+#define SYNC_FILE_RANGE_WRITE_AND_WAIT	(SYNC_FILE_RANGE_WRITE | \
+					 SYNC_FILE_RANGE_WAIT_BEFORE | \
+					 SYNC_FILE_RANGE_WAIT_AFTER)
 
 /*
  * Flags for preadv2/pwritev2: