Commit Graph

56086 Commits

Author SHA1 Message Date
Long Li ff526d8605 CIFS: SMBD: Do not call ib_dereg_mr on invalidated memory registration
It is not necessary to deregister a memory registration after it has been
successfully invalidated.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:05 -05:00
Long Li 6d3adb23be CIFS: pass page offsets on SMB1 read/write
When issuing SMB1 read/write, pass the page offset to transport.

Signed-off-by: Long Li <longli@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:05 -05:00
Garry McNulty ef2298a06d fs/cifs: fix uninitialised variable warnings
In some error conditions, resp_buftype can be passed uninitialised to
free_rsp_buf(), potentially resulting in a spurious debug message.
If resp_buftype randomly had the value 1 (CIFS_SMALL_BUFFER) then this
would log a debug message.
The rsp pointer is initialised to NULL so there is no other side-effect.

Detected by CoverityScan, CID 1438585 ("Uninitialized scalar variable")
Detected by CoverityScan, CID 1438667 ("Uninitialized scalar variable")
Detected by CoverityScan, CID 1438764 ("Uninitialized scalar variable")

Signed-off-by: Garry McNulty <garrmcnu@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-10-23 21:16:05 -05:00
Steve French 179e44d49c smb3: add tracepoint for sending lease break responses to server
Be able to log a ftrace message on success and/or failure of
sending a lease break response to the server.

Example output:

           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
             | |       |   ||||       |         |
     kworker/1:1-5681  [001] .... 11123.530457: smb3_lease_done: sid=0x291e3e0f tid=0x8ba43071 lease_key=0x1852ca0d3ecd9b55847750a86716fde lease_state=0x0

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-10-23 21:16:05 -05:00
Steve French 9b9c5bea0b cifs: do not return atime less than mtime
In network file system it is fairly easy for server and client
atime vs. mtime to get confused (and atime updated less frequently)
which we noticed broke some apps which expect atime >= mtime

Also ignore relatime mount option (rather than error on it) since
relatime is basically what some network server fs are doing
(relatime).

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-10-23 21:16:05 -05:00
Steve French 3d621230b8 smb3: update default requested iosize to 4MB from 1MB for recent dialects
Modern servers often support 8MB as maximum i/o size, and we see some
performance benefits (my testing showed 1 to 13% on write paths,
and 1 to 3% on read paths for increasing the default to 4MB). If server
doesn't support larger i/o size, during negotiate protocol it is already
set correctly to the server's maximum if lower than 4MB.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-10-23 21:16:05 -05:00
Steve French 6e4d3bbe92 smb3: Add debug message later in smb2/smb3 reconnect path
As we reset credits later in the reconnect path, useful
to have optional (cifsFYI) debug message.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-23 21:16:05 -05:00
Aurelien Aptel 8393072bab CIFS: make 'nodfs' mount opt a superblock flag
tcon->Flags is only used by SMB1 code and changing it is not permanent
(you lose the setting on tcon reconnect).

* Move the setting to superblock flags (per mount-points).
* Make automount callback exit early when flag present
* Make dfs resolving happening in mount syscall exit early if flag present

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-23 21:16:05 -05:00
Steve French 9e1a37dad4 smb3: track the instance of each session for debugging
Each time we reconnect to the same server, bump an instance
counter (and display in /proc/fs/cifs/DebugData) to make it
easier to debug.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-10-23 21:16:04 -05:00
Steve French 37e6a70576 smb3: minor missing defines relating to reparse points
Previously reserved dpen response field changed in smb3

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-10-23 21:16:04 -05:00
Steve French 00778e2294 smb3: add way to control slow response threshold for logging and stats
/proc/fs/cifs/Stats when CONFIG_CIFS_STATS2 is enabled logs 'slow'
responses, but depending on the server you are debugging a
one second timeout may be too fast, so allow setting it to
a larger number of seconds via new module parameter

/sys/module/cifs/parameters/slow_rsp_threshold

or via modprobe:

slow_rsp_threshold:Amount of time (in seconds) to wait before
logging that a response is delayed.
Default: 1 (if set to 0 disables msg). (uint)

Recommended values are 0 (disabled) to 32767 (9 hours) with
the default remaining as 1 second.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-10-23 21:16:04 -05:00
Steve French 1c3a13a38a cifs: minor updates to module description for cifs.ko
note smb3 (and common more modern servers) in the module description

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-10-23 21:16:04 -05:00
Steve French 5a519bead4 cifs: protect against server returning invalid file system block size
For a network file system we generally prefer large i/o, but
if the server returns invalid file system block/sector sizes
in cifs (vers=1.0) QFSInfo then set block size to a default
of a reasonable minimum (4K).

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-10-23 21:16:04 -05:00
Steve French 2c887635cd smb3: allow stats which track session and share reconnects to be reset
Currently, "echo 0 > /proc/fs/cifs/Stats" resets all of the stats
except the session and share reconnect counts.  Fix it to
reset those as well.

CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-10-23 21:16:04 -05:00
Steve French 4d5bdf2869 SMB3: Backup intent flag missing from compounded ops
When "backup intent" is requested on the mount (e.g. backupuid or
backupgid mount options), the corresponding flag was missing from
some of the new compounding operations as well (now that
open_query_close is gone).

Related to kernel bugzilla #200953

Reported-and-tested-by: <whh@rubrik.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg 14e562ada2 cifs: create a define for the max number of iov we need for a SMB2 set_info
So we don't overflow the io vector arrays accidentally

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg bb435512ce cifs: change SMB2_OP_RENAME and SMB2_OP_HARDLINK to use compounding
Get rid of smb2_open_op_close() as all operations are now migrated
to smb2_compound_op().

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg 3764cbd179 cifs: remove the is_falloc argument to SMB2_set_eof
We never pass is_falloc==true here anyway and if we ever need to support
is_falloc in the future, SMB2_set_eof is such a trivial wrapper around
send_set_info() that we can/should just create a differently named wrapper
for that new functionality.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg dcbf910357 cifs: change SMB2_OP_SET_INFO to use compounding
Cuts number of network roundtrips significantly for some common syscalls

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg f7bfe04bf0 cifs: change SMB2_OP_SET_EOF to use compounding
This changes SMB2_OP_SET_EOF to use compounding in some situations.
This is part of the path based API to truncate a file.
Most of the time this will however not be invoked for SMB2 since
cifs_set_file_size() will as far as I can tell almost always just
open the file synchronously and switch to the handle based truncate
code path, thus bypassing the compounding we add here.

Rewriting cifs_set_file_size() and make that whole pile of code more
compounding friendly, and also easier to read and understand, is a
different project though and not for this patch.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg c2e0fe3f5a cifs: make rmdir() use compounding
This and previous patches drop the number of roundtrips we need for rmdir()
from 6 to 2.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg ba8ca11685 cifs: create helpers for SMB2_set_info_init/free()
so that we can use these later for compounded set-info calls.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg 47dd9597df cifs: change unlink to use a compound
This,and previous patches, drops the number of roundtrips from five to two
for unlink()

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg f733e3936d cifs: change mkdir to use a compound
This with the previous patch changes mkdir() from needing 6 roundtrips to
just 3.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg c5a5f38f07 cifs: add a smb2_compound_op and change QUERY_INFO to use it
This turns most open/query-info/close patterns in cifs.ko
to become compounds.

This changes stat from using 3 roundtrips to just a single one.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Ronnie Sahlberg cb5c2e6394 cifs: fix a credits leak for compund commands
When processing the mids for compounds we would only add credits based on
the last successful mid in the compound which would leak credits and
eventually triggering a re-connect.

Fix this by splitting the mid processing part into two loops instead of one
where the first loop just waits for all mids and then counts how many
credits we were granted for the whole compound.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:04 -05:00
Steve French b340a4d4aa smb3: add tracepoint to catch cases where credit refund of failed op overlaps reconnect
Add tracepoint to catch potential cases where a pending operation overlapping a
reconnect could fail and incorrectly refund its credits causing the client
to think it has more credits available than the server thinks it does.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-10-23 21:16:03 -05:00
YueHaibing ce7fb50f92 cifs: remove set but not used variable 'cifs_sb'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/ioctl.c: In function 'cifs_ioctl':
fs/cifs/ioctl.c:164:23: warning:
 variable 'cifs_sb' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:03 -05:00
YueHaibing d034feeb44 cifs: Use kmemdup rather than duplicating its implementation in smb311_posix_mkdir()
Use kmemdup rather than duplicating its implementation

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2018-10-23 21:16:03 -05:00
Steve French d42c8a87d1 smb3: do not display confusing message on mount to Azure servers
Some servers (e.g. Azure) return "STATUS_NOT_IMPLEMENTED" rather
than "STATUS_INVALID_DEVICE_REQUEST" on query network interface
info at mount.  This shouldn't cause us to log a warning message
automatically.  Don't log this unless noisier cifsFYI is enabled.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2018-10-23 21:16:03 -05:00
Frank Sorenson 86bbd7422a NFS: change sign of nfs_fh length
The filehandle has a length which is defined as a 32-bit
"unsigned integer".  Change sign of the length appropriately.

Signed-off-by: Frank Sorenson <sorenson@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-23 12:22:21 -04:00
Linus Torvalds 99792e0cea Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "Lots of changes in this cycle:

   - Lots of CPA (change page attribute) optimizations and related
     cleanups (Thomas Gleixner, Peter Zijstra)

   - Make lazy TLB mode even lazier (Rik van Riel)

   - Fault handler cleanups and improvements (Dave Hansen)

   - kdump, vmcore: Enable kdumping encrypted memory with AMD SME
     enabled (Lianbo Jiang)

   - Clean up VM layout documentation (Baoquan He, Ingo Molnar)

   - ... plus misc other fixes and enhancements"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()
  x86/mm: Kill stray kernel fault handling comment
  x86/mm: Do not warn about PCI BIOS W+X mappings
  resource: Clean it up a bit
  resource: Fix find_next_iomem_res() iteration issue
  resource: Include resource end in walk_*() interfaces
  x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
  x86/mm: Remove spurious fault pkey check
  x86/mm/vsyscall: Consider vsyscall page part of user address space
  x86/mm: Add vsyscall address helper
  x86/mm: Fix exception table comments
  x86/mm: Add clarifying comments for user addr space
  x86/mm: Break out user address space handling
  x86/mm: Break out kernel address space handling
  x86/mm: Clarify hardware vs. software "error_code"
  x86/mm/tlb: Make lazy TLB mode lazier
  x86/mm/tlb: Add freed_tables element to flush_tlb_info
  x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range
  smp,cpumask: introduce on_each_cpu_cond_mask
  smp: use __cpumask_set_cpu in on_each_cpu_cond
  ...
2018-10-23 17:05:28 +01:00
Linus Torvalds 0200fbdd43 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and misc x86 updates from Ingo Molnar:
 "Lots of changes in this cycle - in part because locking/core attracted
  a number of related x86 low level work which was easier to handle in a
  single tree:

   - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
     McKenney, Andrea Parri)

   - lockdep scalability improvements and micro-optimizations (Waiman
     Long)

   - rwsem improvements (Waiman Long)

   - spinlock micro-optimization (Matthew Wilcox)

   - qspinlocks: Provide a liveness guarantee (more fairness) on x86.
     (Peter Zijlstra)

   - Add support for relative references in jump tables on arm64, x86
     and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)

   - Be a lot less permissive on weird (kernel address) uaccess faults
     on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
     Horn)

   - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
     Amit)

   - ... and a handful of other smaller changes as well"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  locking/lockdep: Make global debug_locks* variables read-mostly
  locking/lockdep: Fix debug_locks off performance problem
  locking/pvqspinlock: Extend node size when pvqspinlock is configured
  locking/qspinlock_stat: Count instances of nested lock slowpaths
  locking/qspinlock, x86: Provide liveness guarantee
  x86/asm: 'Simplify' GEN_*_RMWcc() macros
  locking/qspinlock: Rework some comments
  locking/qspinlock: Re-order code
  locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
  x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
  futex: Replace spin_is_locked() with lockdep
  locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
  x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
  x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
  x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
  x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
  x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
  x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
  x86/refcount: Work around GCC inlining bug
  x86/objtool: Use asm macros to work around GCC inlining bugs
  ...
2018-10-23 13:08:53 +01:00
Chao Yu 7813081969 f2fs: fix to keep project quota consistent
This patch does below changes to keep consistence of project quota data
in sudden power-cut case:
- update inode.i_projid and project quota atomically under lock_op() in
f2fs_ioc_setproject()
- recover inode.i_projid and project quota in recover_inode()

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:48 -07:00
Chao Yu af033b2aa8 f2fs: guarantee journalled quota data by checkpoint
For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
 a) flush dquot metadata into quota file.
 b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
 a) checkpoint will skip syncing dquot metadata.
 b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
    hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Sheng Yong 26b5a07919 f2fs: cleanup dirty pages if recover failed
During recover, we will try to create new dentries for inodes with
dentry_mark. But if the parent is missing (e.g. killed by fsck),
recover will break. But those recovered dirty pages are not cleanup.
This will hit f2fs_bug_on:

[   53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
[   53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
[   53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
[   53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
[   53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
[   53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546174] Modules linked in:
[   53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
[   53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
[   53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
[   53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
[   53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
[   53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
[   53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
[   53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
[   53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
[   53.546226] FS:  00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
[   53.546229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
[   53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   53.546242] Call Trace:
[   53.546248]  f2fs_submit_page_bio+0x95/0x740
[   53.546253]  read_node_page+0x161/0x1e0
[   53.546271]  ? truncate_node+0x650/0x650
[   53.546283]  ? add_to_page_cache_lru+0x12c/0x170
[   53.546288]  ? pagecache_get_page+0x262/0x2d0
[   53.546292]  __get_node_page+0x200/0x660
[   53.546302]  f2fs_update_inode_page+0x4a/0x160
[   53.546306]  f2fs_write_inode+0x86/0xb0
[   53.546317]  __writeback_single_inode+0x49c/0x620
[   53.546322]  writeback_single_inode+0xe4/0x1e0
[   53.546326]  sync_inode_metadata+0x93/0xd0
[   53.546330]  ? sync_inode+0x10/0x10
[   53.546342]  ? do_raw_spin_unlock+0xed/0x100
[   53.546347]  f2fs_sync_inode_meta+0xe0/0x130
[   53.546351]  f2fs_fill_super+0x287d/0x2d10
[   53.546367]  ? vsnprintf+0x742/0x7a0
[   53.546372]  ? f2fs_commit_super+0x180/0x180
[   53.546379]  ? up_write+0x20/0x40
[   53.546385]  ? set_blocksize+0x5f/0x140
[   53.546391]  ? f2fs_commit_super+0x180/0x180
[   53.546402]  mount_bdev+0x181/0x200
[   53.546406]  mount_fs+0x94/0x180
[   53.546411]  vfs_kern_mount+0x6c/0x1e0
[   53.546415]  do_mount+0xe5e/0x1510
[   53.546420]  ? fs_reclaim_release+0x9/0x30
[   53.546424]  ? copy_mount_string+0x20/0x20
[   53.546428]  ? fs_reclaim_acquire+0xd/0x30
[   53.546435]  ? __might_sleep+0x2c/0xc0
[   53.546440]  ? ___might_sleep+0x53/0x170
[   53.546453]  ? __might_fault+0x4c/0x60
[   53.546468]  ? _copy_from_user+0x95/0xa0
[   53.546474]  ? memdup_user+0x39/0x60
[   53.546478]  ksys_mount+0x88/0xb0
[   53.546482]  __x64_sys_mount+0x5d/0x70
[   53.546495]  do_syscall_64+0x65/0x130
[   53.546503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   53.547639] ---[ end trace b804d1ea2fec893e ]---

So if recover fails, we need to drop all recovered data.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Sahitya Tummala 1e78e8bd9d f2fs: fix data corruption issue with hardware encryption
Direct IO can be used in case of hardware encryption. The following
scenario results into data corruption issue in this path -

Thread A -                          Thread B-
-> write file#1 in direct IO
                                    -> GC gets kicked in
                                    -> GC submitted bio on meta mapping
				       for file#1, but pending completion
-> write file#1 again with new data
   in direct IO
                                    -> GC bio gets completed now
                                    -> GC writes old data to the new
                                       location and thus file#1 is
				       corrupted.

Fix this by submitting and waiting for pending io on meta mapping
for direct IO case in f2fs_map_blocks().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Chao Yu 0c093b590e f2fs: fix to recover inode->i_flags of inode block during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +a /mnt/f2fs/file
6. xfs_io -a /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. xfs_io /mnt/f2fs/file

There is no error when opening this file w/o O_APPEND, but actually,
we expect the correct result should be:

/mnt/f2fs/file: Operation not permitted

The root cause is, in recover_inode(), we recover inode->i_flags more
than F2FS_I(inode)->i_flags, so fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Chao Yu 9149a5eb60 f2fs: spread f2fs_set_inode_flags()
This patch changes codes as below:
- use f2fs_set_inode_flags() to update i_flags atomically to avoid
potential race.
- synchronize F2FS_I(inode)->i_flags to inode->i_flags in
f2fs_new_inode().
- use f2fs_set_inode_flags() to simply codes in f2fs_quota_{on,off}.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:47 -07:00
Chao Yu 2baf078185 f2fs: fix to spread clear_cold_data()
We need to drop PG_checked flag on page as well when we clear PG_uptodate
flag, in order to avoid treating the page as GCing one later.

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:46 -07:00
Jaegeuk Kim 164a63fa6b Revert "f2fs: fix to clear PG_checked flag in set_page_dirty()"
This reverts commit 66110abc4c.

If we clear the cold data flag out of the writeback flow, we can miscount
-1 by end_io, which incurs a deadlock caused by all I/Os being blocked during
heavy GC.

Balancing F2FS Async:
 - IO (CP:    1, Data:   -1, Flush: (   0    0    1), Discard: (   ...

GC thread:                              IRQ
- move_data_page()
 - set_page_dirty()
  - clear_cold_data()
                                        - f2fs_write_end_io()
                                         - type = WB_DATA_TYPE(page);
                                           here, we get wrong type
                                         - dec_page_count(sbi, type);
 - f2fs_wait_on_page_writeback()

Cc: <stable@vger.kernel.org>
Reported-and-Tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:46 -07:00
Jaegeuk Kim 5f9abab42b f2fs: account read IOs and use IO counts for is_idle
This patch adds issued read IO counts which is under block layer.

Chao modified a bit, since:

Below race can cause reversed reference on F2FS_RD_DATA, there is
the same issue in f2fs_submit_page_bio(), fix them by relocate
__submit_bio() and inc_page_count.

Thread A			Thread B
- f2fs_write_begin
 - f2fs_submit_page_read
 - __submit_bio
				- f2fs_read_end_io
				 - __read_end_io
				 - dec_page_count(, F2FS_RD_DATA)
 - inc_page_count(, F2FS_RD_DATA)

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:46 -07:00
Chao Yu 78efac537d f2fs: fix to account IO correctly for cgroup writeback
Now, we have supported cgroup writeback, it depends on correctly IO
account of specified filesystem.

But in commit d1b3e72d54 ("f2fs: submit bio of in-place-update pages"),
we split write paths from f2fs_submit_page_mbio() to two:
- f2fs_submit_page_bio() for IPU path
- f2fs_submit_page_bio() for OPU path

But still we account write IO only in f2fs_submit_page_mbio(), result in
incorrect IO account, fix it by adding missing IO account in IPU path.

Fixes: d1b3e72d54 ("f2fs: submit bio of in-place-update pages")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:54:39 -07:00
Chao Yu 4c58ed0768 f2fs: fix to account IO correctly
Below race can cause reversed reference on dirty count, fix it by
relocating __submit_bio() and inc_page_count().

Thread A				Thread B
- f2fs_inplace_write_data
 - f2fs_submit_page_bio
  - __submit_bio
					- f2fs_write_end_io
					 - dec_page_count
  - inc_page_count

Cc: <stable@vger.kernel.org>
Fixes: d1b3e72d54 ("f2fs: submit bio of in-place-update pages")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-22 17:53:48 -07:00
Linus Torvalds a36cf68651 SPI NOR changes:
Core changes:
   * Support non-uniform erase size
   * Support controllers with limited TX fifo size
 
  Driver changes:
   * m25p80: Re-issue a WREN command after each write access
   * cadence: Pass a proper dir value to dma_[un]map_single()
   * fsl-qspi: Check fsl_qspi_get_seqid() return val make sure 4B
     addressing opcodes are properly handled
   * intel-spi: Add a new PCI entry for Ice Lake
 
 NAND changes:
  Raw NAND core changes:
  - Two batchs of cleanups of the NAND API, including:
    * Deprecating a lot of interfaces (now replaced by ->exec_op()).
    * Moving code in separate drivers (JEDEC, ONFI), in private files
      (internals), in platform drivers, etc.
    * Functions/structures reordering.
    * Exclusive use of the nand_chip structure instead of the MTD one
      all across the subsystem.
  - Addition of the nand_wait_readrdy/rdy_op() helpers.
 
  Raw NAND controllers drivers changes:
  - Various coccinelle patches.
  - Marvell:
    * Use regmap_update_bits() for syscon access.
    * More documentation.
    * BCH failure path rework.
    * More layouts to be supported.
    * IRQ handler complete() condition fixed.
  - Fsl_ifc:
    * SRAM initialization fixed for newer controller versions.
  - Denali:
    * Fix licenses mismatch and use a SPDX tag.
    * Set SPARE_AREA_SKIP_BYTES register to 8 if unset.
  - Qualcomm:
    * Do not include dma-direct.h.
  - Docg4:
    * Removed.
  - Ams-delta:
    * Use of a GPIO lookup table
    * Internal machinery changes.
 
  Raw NAND chip drivers changes:
  - Toshiba:
    * Add support for Toshiba memory BENAND
    * Pass a single nand_chip object to the status helper.
  - ESMT:
    * New driver to retrieve the ECC requirements from the 5th ID byte.
 
 MTD changes:
  * physmap cleanups/fixe
  * gpio-addr-flash cleanups/fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJQBAABCgA6FiEEKmCqpbOU668PNA69Ze02AX4ItwAFAlvNmJocHGJvcmlzLmJy
 ZXppbGxvbkBib290bGluLmNvbQAKCRBl7TYBfgi3AKr+D/4pH0gYSGDUstZsqUHW
 gZsPRU4jmOA+OCRbSxE03bOZOYR0UPgdoUYLfAKhZ5qxc7wHd3b47wykP0kUEviu
 lC8QhSs4DUA+EuMVPDVS4FlXRT0e3dMV7jh/IK6nInshD2YkaovyCqa6GvgwFEcM
 7BCizxRhtHV8+fo7GVQWuMLb9ZfbEvz42D0sYOu6hIsF1SnRDvHOvfdDUFEXpJoF
 a2mC9ove65ChEzc2iZ/r260iZ4aoJYb9XJRJKWxmYeITZSmLNmcrUyGqAaNQ7NRc
 AIuPXASbeHGjrIuEfXpKYW07AE5MV1nJFSk3v4u5FjgoohOoobPp7npk+Lz/qwe3
 y8/uW9qOQ/iEOsRnkvMGNu4Yjhw41L1a+J5wVvUvzmwHy1xMCrRHYB/8gXoZzemR
 A7XmCPwjAFVWv1WeKV2cs5MLW9yZq9QNMtGLlNs5OgFR6CccLk67/tHIkYLsJ/l9
 IDXFhd/YKhTeF361u6Iimmgb427TwM71P9N+OMpv4nk46DxurpxkgGle3nslzKNU
 DOFNFlMGiPIx3h4X96AKER6u7cxiOXnLGq+XeHa/y8tIrziy3jg03YoTh0RSwKxV
 J3EXwh1sFLaeebwWEgpE3Ag1LOxpRCqJ2ED71SPzS/DR/938HBVsEoPqUEHNf1in
 jwsUB3cRzNwf+XX68DRCVT5GFA==
 =7w4w
 -----END PGP SIGNATURE-----

Merge tag 'mtd/for-4.20' of git://git.infradead.org/linux-mtd

Pull mtd updates from Boris Brezillon:
 "SPI NOR core changes:
   - Support non-uniform erase size
   - Support controllers with limited TX fifo size

 Driver changes:
   - m25p80: Re-issue a WREN command after each write access
   - cadence: Pass a proper dir value to dma_[un]map_single()
   - fsl-qspi: Check fsl_qspi_get_seqid() return val make sure 4B
     addressing opcodes are properly handled
   - intel-spi: Add a new PCI entry for Ice Lake

 Raw NAND core changes:
   - Two batchs of cleanups of the NAND API, including:
      * Deprecating a lot of interfaces (now replaced by ->exec_op()).
      * Moving code in separate drivers (JEDEC, ONFI), in private files
        (internals), in platform drivers, etc.
      * Functions/structures reordering.
      * Exclusive use of the nand_chip structure instead of the MTD one
        all across the subsystem.
   - Addition of the nand_wait_readrdy/rdy_op() helpers.

 Raw NAND controllers drivers changes:
   - Various coccinelle patches.
   - Marvell:
      * Use regmap_update_bits() for syscon access.
      * More documentation.
      * BCH failure path rework.
      * More layouts to be supported.
      * IRQ handler complete() condition fixed.
   - Fsl_ifc:
      * SRAM initialization fixed for newer controller versions.
   - Denali:
      * Fix licenses mismatch and use a SPDX tag.
      * Set SPARE_AREA_SKIP_BYTES register to 8 if unset.
   - Qualcomm:
      * Do not include dma-direct.h.
   - Docg4:
      * Removed.
   - Ams-delta:
      * Use of a GPIO lookup table
      * Internal machinery changes.

 Raw NAND chip drivers changes:
   - Toshiba:
      * Add support for Toshiba memory BENAND
      * Pass a single nand_chip object to the status helper.
   - ESMT:
      * New driver to retrieve the ECC requirements from the 5th ID
        byte.

  MTD changes:
   - physmap cleanups/fixe
   - gpio-addr-flash cleanups/fixes"

* tag 'mtd/for-4.20' of git://git.infradead.org/linux-mtd: (93 commits)
  jffs2: free jffs2_sb_info through jffs2_kill_sb()
  mtd: spi-nor: fsl-quadspi: fix read error for flash size larger than 16MB
  mtd: spi-nor: intel-spi: Add support for Intel Ice Lake SPI serial flash
  mtd: maps: gpio-addr-flash: Convert to gpiod
  mtd: maps: gpio-addr-flash: Replace array with an integer
  mtd: maps: gpio-addr-flash: Use order instead of size
  mtd: spi-nor: fsl-quadspi: Don't let -EINVAL on the bus
  mtd: devices: m25p80: Make sure WRITE_EN is issued before each write
  mtd: spi-nor: Support controllers with limited TX FIFO size
  mtd: spi-nor: cadence-quadspi: Use proper enum for dma_[un]map_single
  mtd: spi-nor: parse SFDP Sector Map Parameter Table
  mtd: spi-nor: add support to non-uniform SFDP SPI NOR flash memories
  mtd: rawnand: marvell: fix the IRQ handler complete() condition
  mtd: rawnand: denali: set SPARE_AREA_SKIP_BYTES register to 8 if unset
  mtd: rawnand: r852: fix spelling mistake "card_registred" -> "card_registered"
  mtd: rawnand: toshiba: Pass a single nand_chip object to the status helper
  mtd: maps: gpio-addr-flash: Use devm_* functions
  mtd: maps: gpio-addr-flash: Fix ioremapped size
  mtd: maps: gpio-addr-flash: Replace custom printk
  mtd: physmap_of: Release resources on error
  ...
2018-10-23 01:09:22 +01:00
Linus Torvalds 6ab9e09238 for-4.20/block-20181021
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlvNQKgQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgps+8D/9Iy6YIeoPwN10gYsqIh0P2fS3wKzL3kiww
 3vFsWO78PzgLxUlNmB7teLtNFc/R5mi8becZmAdvs9za5YFZk56o3Ifv1x+e+z00
 VY1/gxhiJD6suLeJ6lECnERGDaiWOZVRMo2TE17vxYGW6GGaa0Ts6PUUXmpla1u5
 WKctgt0Qv9WVNyiIdLdeHqzKJwsSSwNTt8fK7eFhy3x8e0CwJr+GtXckbbW3LFkY
 lug0npsTli3EmEPMovZhd25SjZmTk5GTM+ADZQ7Tnv5KXoDWB9jn6TcCSAi3G+5d
 5WUVwfnDyYJiH8qvlg5tRJ690muIy3xMOmpr7QBQ0YnR/LQ3EW+1CVfqD+qimgLH
 TXzlREXQpBP3YlxSDS5nddz4o5z84GZmC9B/43ujPaZKIQ6eBXYdkmQH7tPtSugm
 C6VGomR5tHotjxIiAsexh/5hAus+wW8bObKGTPTyINT0ub3XNclwCKLh26CgI9ie
 WvbS9g3j/KPvu/7s6weZpgD+cks0YdWe/XdXXxiHwsGI9h3J2aJna5RQt1rKWDm5
 wGCgbc/B8eSwiWx+GXlqdB9/Dy/bGXOnSTDnKpEVl1f5zNjeLwUKXbjvkMefWs4m
 jEIcquuDETORY+ZYEfa5YbmS4Lhskr0kzMVTVkZ++81tAWpSCU9Xh3IHrR8TNpt+
 J0oh0FHBDg==
 =LRTT
 -----END PGP SIGNATURE-----

Merge tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
 "This is the main pull request for block changes for 4.20. This
  contains:

   - Series enabling runtime PM for blk-mq (Bart).

   - Two pull requests from Christoph for NVMe, with items such as;
      - Better AEN tracking
      - Multipath improvements
      - RDMA fixes
      - Rework of FC for target removal
      - Fixes for issues identified by static checkers
      - Fabric cleanups, as prep for TCP transport
      - Various cleanups and bug fixes

   - Block merging cleanups (Christoph)

   - Conversion of drivers to generic DMA mapping API (Christoph)

   - Series fixing ref count issues with blkcg (Dennis)

   - Series improving BFQ heuristics (Paolo, et al)

   - Series improving heuristics for the Kyber IO scheduler (Omar)

   - Removal of dangerous bio_rewind_iter() API (Ming)

   - Apply single queue IPI redirection logic to blk-mq (Ming)

   - Set of fixes and improvements for bcache (Coly et al)

   - Series closing a hotplug race with sysfs group attributes (Hannes)

   - Set of patches for lightnvm:
      - pblk trace support (Hans)
      - SPDX license header update (Javier)
      - Tons of refactoring patches to cleanly abstract the 1.2 and 2.0
        specs behind a common core interface. (Javier, Matias)
      - Enable pblk to use a common interface to retrieve chunk metadata
        (Matias)
      - Bug fixes (Various)

   - Set of fixes and updates to the blk IO latency target (Josef)

   - blk-mq queue number updates fixes (Jianchao)

   - Convert a bunch of drivers from the old legacy IO interface to
     blk-mq. This will conclude with the removal of the legacy IO
     interface itself in 4.21, with the rest of the drivers (me, Omar)

   - Removal of the DAC960 driver. The SCSI tree will introduce two
     replacement drivers for this (Hannes)"

* tag 'for-4.20/block-20181021' of git://git.kernel.dk/linux-block: (204 commits)
  block: setup bounce bio_sets properly
  blkcg: reassociate bios when make_request() is called recursively
  blkcg: fix edge case for blk_get_rl() under memory pressure
  nvme-fabrics: move controller options matching to fabrics
  nvme-rdma: always have a valid trsvcid
  mtip32xx: fully switch to the generic DMA API
  rsxx: switch to the generic DMA API
  umem: switch to the generic DMA API
  sx8: switch to the generic DMA API
  sx8: remove dead IF_64BIT_DMA_IS_POSSIBLE code
  skd: switch to the generic DMA API
  ubd: remove use of blk_rq_map_sg
  nvme-pci: remove duplicate check
  drivers/block: Remove DAC960 driver
  nvme-pci: fix hot removal during error handling
  nvmet-fcloop: suppress a compiler warning
  nvme-core: make implicit seed truncation explicit
  nvmet-fc: fix kernel-doc headers
  nvme-fc: rework the request initialization code
  nvme-fc: introduce struct nvme_fcp_op_w_sgl
  ...
2018-10-22 17:46:08 +01:00
Kees Cook 1227daa43b pstore/ram: Clarify resource reservation labels
When ramoops reserved a memory region in the kernel, it had an unhelpful
label of "persistent_memory". When reading /proc/iomem, it would be
repeated many times, did not hint that it was ramoops in particular,
and didn't clarify very much about what each was used for:

400000000-407ffffff : Persistent Memory (legacy)
  400000000-400000fff : persistent_memory
  400001000-400001fff : persistent_memory
...
  4000ff000-4000fffff : persistent_memory

Instead, this adds meaningful labels for how the various regions are
being used:

400000000-407ffffff : Persistent Memory (legacy)
  400000000-400000fff : ramoops:dump(0/252)
  400001000-400001fff : ramoops:dump(1/252)
...
  4000fc000-4000fcfff : ramoops:dump(252/252)
  4000fd000-4000fdfff : ramoops:console
  4000fe000-4000fe3ff : ramoops:ftrace(0/3)
  4000fe400-4000fe7ff : ramoops:ftrace(1/3)
  4000fe800-4000febff : ramoops:ftrace(2/3)
  4000fec00-4000fefff : ramoops:ftrace(3/3)
  4000ff000-4000fffff : ramoops:pmsg

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
2018-10-22 07:11:58 -07:00
Kees Cook 95047b0519 pstore: Refactor compression initialization
This refactors compression initialization slightly to better handle
getting potentially called twice (via early pstore_register() calls
and later pstore_init()) and improves the comments and reporting to be
more verbose.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
2018-10-22 07:11:58 -07:00
Joel Fernandes (Google) 416031653e pstore: Allocate compression during late_initcall()
ramoops's call of pstore_register() was recently moved to run during
late_initcall() because the crypto backend may not have been ready during
postcore_initcall(). This meant early-boot crash dumps were not getting
caught by pstore any more.

Instead, lets allow calls to pstore_register() earlier, and once crypto
is ready we can initialize the compression.

Reported-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Tested-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Fixes: cb3bee0369 ("pstore: Use crypto compress API")
[kees: trivial rebase]
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
2018-10-22 07:11:58 -07:00
Kees Cook cb095afd44 pstore: Centralize init/exit routines
In preparation for having additional actions during init/exit, this moves
the init/exit into platform.c, centralizing the logic to make call outs
to the fs init/exit.

Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
2018-10-22 07:11:58 -07:00
David S. Miller 2e2d6f0342 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
net/sched/cls_api.c has overlapping changes to a call to
nlmsg_parse(), one (from 'net') added rtm_tca_policy instead of NULL
to the 5th argument, and another (from 'net-next') added cb->extack
instead of NULL to the 6th argument.

net/ipv4/ipmr_base.c is a case of a bug fix in 'net' being done to
code which moved (to mr_table_dump)) in 'net-next'.  Thanks to David
Ahern for the heads up.

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-19 11:03:06 -07:00
Bob Peterson 8e31582a9a gfs2: Fix minor typo: couln't versus couldn't.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-19 11:34:04 -05:00
Boris Brezillon 042c1a5a60 NAND core changes:
- Two batchs of cleanups of the NAND API, including:
   * Deprecating a lot of interfaces (now replaced by ->exec_op()).
   * Moving code in separate drivers (JEDEC, ONFI), in private files
     (internals), in platform drivers, etc.
   * Functions/structures reordering.
   * Exclusive use of the nand_chip structure instead of the MTD one
     all across the subsystem.
 - Addition of the nand_wait_readrdy/rdy_op() helpers.
 
 Raw NAND controllers drivers changes:
 - Various coccinelle patches.
 - Marvell:
   * Use regmap_update_bits() for syscon access.
   * More documentation.
   * BCH failure path rework.
   * More layouts to be supported.
   * IRQ handler complete() condition fixed.
 - Fsl_ifc:
   * SRAM initialization fixed for newer controller versions.
 - Denali:
   * Fix licenses mismatch and use a SPDX tag.
   * Set SPARE_AREA_SKIP_BYTES register to 8 if unset.
 - Qualcomm:
   * Do not include dma-direct.h.
 - Docg4:
   * Removed.
 - Ams-delta:
   * Use of a GPIO lookup table
   * Internal machinery changes.
 
 Raw NAND chip drivers changes:
 - Toshiba:
   * Add support for Toshiba memory BENAND
   * Pass a single nand_chip object to the status helper.
 - ESMT:
   * New driver to retrieve the ECC requirements from the 5th ID byte.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEE9HuaYnbmDhq/XIDIJWrqGEe9VoQFAlvDdvgACgkQJWrqGEe9
 VoQ0zQf/QfXYjLDbC/1FXckEkEeVZPRbXQMf+ptdTo0inJefy9LMOvhRKmuABCxn
 wwPa99aqQ+aHESnU/s2d8FUrhNRUf35lmRIAP512lJk79tBfgo73l5WSc6UTg6hO
 w3KbigOvTi7wCKQYGDFOBZuoFFCvpNsIEtWFuodCbYG1GmeAaL7L2fhbri9JbcE2
 zPqmCv0w5BUrPR7i/taI5gD+KDtRdmCEyfb7wQoJy2XZVO0SKqGsBhDhYWqwCpck
 eP2UzQ4cFT8X1S0/AfUpbowc6cftdqIwW9VRY4inx3ALQ+sKxrkJ+0E3gL9lnZ+F
 p/is0gTdvDmnOjXfwGcL7GXGy0nN8Q==
 =EKv1
 -----END PGP SIGNATURE-----

Merge tag 'nand/for-4.20' of git://git.infradead.org/linux-mtd into mtd/next

NAND core changes:
- Two batchs of cleanups of the NAND API, including:
  * Deprecating a lot of interfaces (now replaced by ->exec_op()).
  * Moving code in separate drivers (JEDEC, ONFI), in private files
    (internals), in platform drivers, etc.
  * Functions/structures reordering.
  * Exclusive use of the nand_chip structure instead of the MTD one
    all across the subsystem.
- Addition of the nand_wait_readrdy/rdy_op() helpers.

Raw NAND controllers drivers changes:
- Various coccinelle patches.
- Marvell:
  * Use regmap_update_bits() for syscon access.
  * More documentation.
  * BCH failure path rework.
  * More layouts to be supported.
  * IRQ handler complete() condition fixed.
- Fsl_ifc:
  * SRAM initialization fixed for newer controller versions.
- Denali:
  * Fix licenses mismatch and use a SPDX tag.
  * Set SPARE_AREA_SKIP_BYTES register to 8 if unset.
- Qualcomm:
  * Do not include dma-direct.h.
- Docg4:
  * Removed.
- Ams-delta:
  * Use of a GPIO lookup table
  * Internal machinery changes.

Raw NAND chip drivers changes:
- Toshiba:
  * Add support for Toshiba memory BENAND
  * Pass a single nand_chip object to the status helper.
- ESMT:
  * New driver to retrieve the ECC requirements from the 5th ID byte.
2018-10-19 09:20:09 +02:00
Benjamin Coddington fc187514d8 nfs: remove redundant call to nfs_context_set_write_error()
We don't need to call this in the direct, read, or pnfs resend paths and
the only other caller is the write path in nfs_page_async_flush() which
already checks and sets the pg_error on the context.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-18 17:20:57 -04:00
Benjamin Coddington fdbd1a2e4a nfs: Fix a missed page unlock after pg_doio()
We must check pg_error and call error_cleanup after any call to pg_doio.
Currently, we are skipping the unlock of a page if we encounter an error in
nfs_pageio_complete() before handing off the work to the RPC layer.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2018-10-18 17:20:57 -04:00
Mike Marshall 22fc9db296 orangefs: no need to check for service_operation returns > 0
service_operation returns > 0 is undefined.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 14:05:46 -04:00
Mike Marshall 34e6148a2c orangefs: some error code paths missed kmem_cache_free
If a slab cache object is allocated, it needs to be freed eventually,
certainly before anyone unloads the module that allocated it.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 13:58:25 -04:00
Mike Marshall b5d72cdc53 orangefs: don't let orangefs_iget return NULL.
Suggested by Dan Carpenter.

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 13:52:23 -04:00
Mike Marshall 56249998b2 orangefs: don't let orangefs_new_inode return NULL
Suggested by Dan Carpenter

Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2018-10-18 13:47:16 -04:00
Eric Sandeen fa520c47ea fscache: Fix out of bound read in long cookie keys
fscache_set_key() can incur an out-of-bounds read, reported by KASAN:

 BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x5b3/0x680 [fscache]
 Read of size 4 at addr ffff88084ff056d4 by task mount.nfs/32615

and also reported by syzbot at https://lkml.org/lkml/2018/7/8/236

  BUG: KASAN: slab-out-of-bounds in fscache_set_key fs/fscache/cookie.c:120 [inline]
  BUG: KASAN: slab-out-of-bounds in fscache_alloc_cookie+0x7a9/0x880 fs/fscache/cookie.c:171
  Read of size 4 at addr ffff8801d3cc8bb4 by task syz-executor907/4466

This happens for any index_key_len which is not divisible by 4 and is
larger than the size of the inline key, because the code allocates exactly
index_key_len for the key buffer, but the hashing loop is stepping through
it 4 bytes (u32) at a time in the buf[] array.

Fix this by calculating how many u32 buffers we'll need by using
DIV_ROUND_UP, and then using kcalloc() to allocate a precleared allocation
buffer to hold the index_key, then using that same count as the hashing
index limit.

Fixes: ec0328e46d ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:32:21 +02:00
David Howells 1ff22883b0 fscache: Fix incomplete initialisation of inline key space
The inline key in struct rxrpc_cookie is insufficiently initialized,
zeroing only 3 of the 4 slots, therefore an index_key_len between 13 and 15
bytes will end up hashing uninitialized memory because the memcpy only
partially fills the last buf[] element.

Fix this by clearing fscache_cookie objects on allocation rather than using
the slab constructor to initialise them.  We're going to pretty much fill
in the entire struct anyway, so bringing it into our dcache writably
shouldn't incur much overhead.

This removes the need to do clearance in fscache_set_key() (where we aren't
doing it correctly anyway).

Also, we don't need to set cookie->key_len in fscache_set_key() as we
already did it in the only caller, so remove that.

Fixes: ec0328e46d ("fscache: Maintain a catalogue of allocated cookies")
Reported-by: syzbot+a95b989b2dde8e806af8@syzkaller.appspotmail.com
Reported-by: Eric Sandeen <sandeen@redhat.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:32:21 +02:00
Al Viro 169b803397 cachefiles: fix the race between cachefiles_bury_object() and rmdir(2)
the victim might've been rmdir'ed just before the lock_rename();
unlike the normal callers, we do not look the source up after the
parents are locked - we know it beforehand and just recheck that it's
still the child of what used to be its parent.  Unfortunately,
the check is too weak - we don't spot a dead directory since its
->d_parent is unchanged, dentry is positive, etc.  So we sail all
the way to ->rename(), with hosting filesystems _not_ expecting
to be asked renaming an rmdir'ed subdirectory.

The fix is easy, fortunately - the lock on parent is sufficient for
making IS_DEADDIR() on child safe.

Cc: stable@vger.kernel.org
Fixes: 9ae326a690 (CacheFiles: A cache that backs onto a mounted filesystem)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-18 11:32:21 +02:00
Christoph Hellwig 96987eea53 xfs: cancel COW blocks before swapext
We need to make sure we have no outstanding COW blocks before we swap
extents, as there is nothing preventing us from having preallocated COW
delalloc on either inode that swapext is called on.  That case can
easily be reproduced by running generic/324 in always_cow mode:

[  620.760572] XFS: Assertion failed: tip->i_delayed_blks == 0, file: fs/xfs/xfs_bmap_util.c, line: 1669
[  620.761608] ------------[ cut here ]------------
[  620.762171] kernel BUG at fs/xfs/xfs_message.c:102!
[  620.762732] invalid opcode: 0000 [#1] SMP PTI
[  620.763272] CPU: 0 PID: 24153 Comm: xfs_fsr Tainted: G        W         4.19.0-rc1+ #4182
[  620.764203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
[  620.765202] RIP: 0010:assfail+0x20/0x28
[  620.765646] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
[  620.767758] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
[  620.768359] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
[  620.769174] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
[  620.769982] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
[  620.770788] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
[  620.771638] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
[  620.772504] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
[  620.773475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  620.774168] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
[  620.774978] Call Trace:
[  620.775274]  xfs_swap_extent_forks+0x2a0/0x2e0
[  620.775792]  xfs_swap_extents+0x38b/0xab0
[  620.776256]  xfs_ioc_swapext+0x121/0x140
[  620.776709]  xfs_file_ioctl+0x328/0xc90
[  620.777154]  ? rcu_read_lock_sched_held+0x50/0x60
[  620.777694]  ? xfs_iunlock+0x233/0x260
[  620.778127]  ? xfs_setattr_nonsize+0x3be/0x6a0
[  620.778647]  do_vfs_ioctl+0x9d/0x680
[  620.779071]  ? ksys_fchown+0x47/0x80
[  620.779552]  ksys_ioctl+0x35/0x70
[  620.780040]  __x64_sys_ioctl+0x11/0x20
[  620.780530]  do_syscall_64+0x4b/0x190
[  620.780927]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  620.781467] RIP: 0033:0x7fdc364d0f07
[  620.781900] Code: b3 66 90 48 8b 05 81 5f 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 28
[  620.784044] RSP: 002b:00007ffe2a766038 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  620.784896] RAX: ffffffffffffffda RBX: 0000000000000025 RCX: 00007fdc364d0f07
[  620.785667] RDX: 0000560296ca2fc0 RSI: 00000000c0c0586d RDI: 0000000000000005
[  620.786398] RBP: 0000000000000025 R08: 0000000000001200 R09: 0000000000000000
[  620.787283] R10: 0000000000000432 R11: 0000000000000246 R12: 0000000000000005
[  620.788051] R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000006
[  620.788927] Modules linked in:
[  620.789340] ---[ end trace 9503b7417ffdbdb0 ]---
[  620.790065] RIP: 0010:assfail+0x20/0x28
[  620.790642] Code: 31 ff e8 83 fc ff ff 0f 0b c3 48 89 f1 41 89 d0 48 c7 c6 48 ca 8d 82 48 89 fa 38
[  620.793038] RSP: 0018:ffffc9000898bc10 EFLAGS: 00010202
[  620.793609] RAX: 0000000000000000 RBX: ffff88012f14ba40 RCX: 0000000000000000
[  620.794317] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828560d9
[  620.795025] RBP: ffff88012f14b300 R08: 0000000000000000 R09: 0000000000000000
[  620.795778] R10: 000000000000000a R11: f000000000000000 R12: ffffc9000898bc98
[  620.796675] R13: ffffc9000898bc9c R14: ffff880130b5e2b8 R15: ffff88012a1fa2a8
[  620.797782] FS:  00007fdc36e0fbc0(0000) GS:ffff88013ba00000(0000) knlGS:0000000000000000
[  620.798908] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  620.799594] CR2: 00007fdc3604d000 CR3: 0000000132afc000 CR4: 00000000000006f0
[  620.800424] Kernel panic - not syncing: Fatal exception
[  620.801191] Kernel Offset: disabled
[  620.801597] ---[ end Kernel panic - not syncing: Fatal exception ]---

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:21:55 +11:00
Brian Foster efc3289cf8 xfs: clear ail delwri queued bufs on unmount of shutdown fs
In the typical unmount case, the AIL is forced out by the unmount
sequence before the xfsaild task is stopped. Since AIL items are
removed on writeback completion, this means that the AIL
->ail_buf_list delwri queue has been drained. This is not always
true in the shutdown case, however.

It's possible for buffers to sit on a delwri queue for a period of
time across submission attempts if said items are locked or have
been relogged and pinned since first added to the queue. If the
attempt to log such an item results in a log I/O error, the error
processing can shutdown the fs, remove the item from the AIL, stale
the buffer (dropping the LRU reference) and clear its delwri queue
state. The latter bit means the buffer will be released from a
delwri queue on the next submission attempt, but this might never
occur if the filesystem has shutdown and the AIL is empty.

This means that such buffers are held indefinitely by the AIL delwri
queue across destruction of the AIL. Aside from being a memory leak,
these buffers can also hold references to in-core perag structures.
The latter problem manifests as a generic/475 failure, reproducing
the following asserts at unmount time:

  XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
	file: fs/xfs/xfs_mount.c, line: 151
  XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0,
	file: fs/xfs/xfs_mount.c, line: 132

To prevent this problem, clear the AIL delwri queue as a final step
before xfsaild() exit. The !empty state should never occur in the
normal case, so add an assert to catch unexpected problems going
forward.

[dgc: add comment explaining need for xfs_buf_delwri_cancel() after
 calling xfs_buf_delwri_submit_nowait().]

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:21:49 +11:00
Carlos Maiolino 26ca39015e xfs: use offsetof() in place of offset macros for __xfsstats
Most offset macro mess is used in xfs_stats_format() only, and we can
simply get the right offsets using offsetof(), instead of several macros
to mark the offsets inside __xfsstats structure.

Replace all XFSSTAT_END_* macros by a single helper macro to get the
right offset into __xfsstats, and use this helper in xfs_stats_format()
directly.

The quota stats code, still looks a bit cleaner when using XFSSTAT_*
macros, so, this patch also defines XFSSTAT_START_XQMSTAT and
XFSSTAT_END_XQMSTAT locally to that code. This also should prevent
offset mistakes when updates are done into __xfsstats.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:21:39 +11:00
Carlos Maiolino 41657e5507 xfs: Fix xqmstats offsets in /proc/fs/xfs/xqmstat
The addition of FIBT, RMAP and REFCOUNT changed the offsets into
__xfssats structure.

This caused xqmstat_proc_show() to display garbage data via
/proc/fs/xfs/xqmstat, once it relies on the offsets marked via macros.

Fix it.

Fixes: 00f4e4f9 xfs: add rmap btree stats infrastructure
Fixes: aafc3c24 xfs: support the XFS_BTNUM_FINOBT free inode btree type
Fixes: 46eeb521 xfs: introduce refcount btree definitions
Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:21:34 +11:00
Dave Chinner 37fd167824 xfs: fix use-after-free race in xfs_buf_rele
When looking at a 4.18 based KASAN use after free report, I noticed
that racing xfs_buf_rele() may race on dropping the last reference
to the buffer and taking the buffer lock. This was the symptom
displayed by the KASAN report, but the actual issue that was
reported had already been fixed in 4.19-rc1 by commit e339dd8d8b
("xfs: use sync buffer I/O for sync delwri queue submission").

Despite this, I think there is still an issue with xfs_buf_rele()
in this code:

        release = atomic_dec_and_lock(&bp->b_hold, &pag->pag_buf_lock);
        spin_lock(&bp->b_lock);
        if (!release) {
.....

If two threads race on the b_lock after both dropping a reference
and one getting dropping the last reference so release = true, we
end up with:

CPU 0				CPU 1
atomic_dec_and_lock()
				atomic_dec_and_lock()
				spin_lock(&bp->b_lock)
spin_lock(&bp->b_lock)
<spins>
				<release = true bp->b_lru_ref = 0>
				<remove from lists>
				freebuf = true
				spin_unlock(&bp->b_lock)
				xfs_buf_free(bp)
<gets lock, reading and writing freed memory>
<accesses freed memory>
spin_unlock(&bp->b_lock) <reads/writes freed memory>

IOWs, we can't safely take bp->b_lock after dropping the hold
reference because the buffer may go away at any time after we
drop that reference. However, this can be fixed simply by taking the
bp->b_lock before we drop the reference.

It is safe to nest the pag_buf_lock inside bp->b_lock as the
pag_buf_lock is only used to serialise against lookup in
xfs_buf_find() and no other locks are held over or under the
pag_buf_lock there. Make this clear by documenting the buffer lock
orders at the top of the file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:21:29 +11:00
Allison Henderson 068f985a9e xfs: Add attibute remove and helper functions
This patch adds xfs_attr_remove_args. These sub-routines remove
the attributes specified in @args. We will use this later for setting
parent pointers as a deferred attribute operation.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:21:23 +11:00
Allison Henderson 2f3cd80919 xfs: Add attibute set and helper functions
This patch adds xfs_attr_set_args and xfs_bmap_set_attrforkoff.
These sub-routines set the attributes specified in @args.
We will use this later for setting parent pointers as a deferred
attribute operation.

[dgc: remove attr fork init code from xfs_attr_set_args().]
[dgc: xfs_attr_try_sf_addname() NULLs args.trans after commit.]
[dgc: correct sf add error handling.]

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:21:16 +11:00
Allison Henderson 4c74a56b9d xfs: Add helper function xfs_attr_try_sf_addname
This patch adds a subroutine xfs_attr_try_sf_addname
used by xfs_attr_set.  This subrotine will attempt to
add the attribute name specified in args in shortform,
as well and perform error handling previously done in
xfs_attr_set.

This patch helps to pre-simplify xfs_attr_set for reviewing
purposes and reduce indentation.  New function will be added
in the next patch.

[dgc: moved commit to helper function, too.]

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:50 +11:00
Allison Henderson e2421f0b5f xfs: Move fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
This patch moves fs/xfs/xfs_attr.h to fs/xfs/libxfs/xfs_attr.h
since xfs_attr.c is in libxfs.  We will need these later in
xfsprogs.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:45 +11:00
Dave Chinner 56668a5cc4 xfs: issue log message on user force shutdown
The kernel only issues a log message that it's been shut down when
the filesystem triggers a shutdown itself. Hence there is no trace
in the log when a shutdown is triggered manually from userspace.
This can make it hard to see sequence of events in the log when
things go wrong, so make sure we always log a message when a
shutdown is run.

While there, clean up the logic flow so we don't have to continually
check if the shutdown trigger was user initiated before logging
shutdown messages.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:39 +11:00
Darrick J. Wong 38b6238eb6 xfs: fix buffer state management in xrep_findroot_block
We don't handle buffer state properly in online repair's findroot
routine.  If a buffer already has b_ops set, we don't ever want to touch
that, and we don't want to call the read verifiers on a buffer that
could be dirty (CRCs are only recomputed during log checkpoints).

Therefore, be more careful about what we do with a buffer -- if someone
else already attached ops that are not the ones for this btree type,
just ignore the buffer.  We only attach our btree type's buf ops if it
matches the magic/uuid and structure checks.

We also modify xfs_buf_read_map to allow callers to set buffer ops on a
DONE buffer with NULL ops so that repair doesn't leave behind buffers
which won't have buffers attached to them.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:35 +11:00
Darrick J. Wong 1aff5696f3 xfs: always assign buffer verifiers when one is provided
If a caller supplies buffer ops when trying to read a buffer and the
buffer doesn't already have buf ops assigned, ensure that the ops are
assigned to the buffer and the verifier is run on that buffer.

Note that current XFS code is careful to assign buffer ops after a
xfs_{trans_,}buf_read call in which ops were not supplied.  However, we
should apply ops defensively in case there is ever a coding mistake; and
an upcoming repair patch will need to be able to read a buffer without
assigning buf ops.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:30 +11:00
Darrick J. Wong 1002ff45ef xfs: xrep_findroot_block should reject root blocks with siblings
In xrep_findroot_block, if we find a candidate root block with sibling
pointers or sibling blocks on the same tree level, we should not return
that block as a tree root because root blocks cannot have siblings.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:26 +11:00
Adam Borowski dddde68b8f xfs: add a define for statfs magic to uapi
Needed by userspace programs that call fstatfs().

It'd be natural to publish XFS_SB_MAGIC in uapi, but while these two
have identical values, they have different semantic meaning: one is
an enum cookie meant for statfs, the other a signature of the
on-disk format.

Signed-off-by: Adam Borowski <kilobyte@angband.pl>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:19 +11:00
Christoph Hellwig 4831822ff1 xfs: print dangling delalloc extents
Instead of just asserting that we have no delalloc space dangling
in an inode that gets freed print the actual offenders for debug
mode.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:20:11 +11:00
Christoph Hellwig 032dc923b2 xfs: fix fork selection in xfs_find_trim_cow_extent
We should want to write directly into the data fork for blocks that don't
have an extent in the COW fork covering them yet.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:19:58 +11:00
Christoph Hellwig d392bc81bb xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:19:48 +11:00
Christoph Hellwig fc439464e3 xfs: remove the unused shared argument to xfs_reflink_reserve_cow
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:19:37 +11:00
Christoph Hellwig 0365c5d6c3 xfs: handle zeroing in xfs_file_iomap_begin_delay
We only need to allocate blocks for zeroing for reflink inodes,
and for we currently have a special case for reflink files in
the otherwise direct I/O path that I'd like to get rid of.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:19:26 +11:00
Christoph Hellwig daa79baefc xfs: remove suport for filesystems without unwritten extent flag
The option to enable unwritten extents was made default in 2003,
removed from mkfs in 2007, and cannot be disabled in v5.  We also
rely on it for a lot of common functionality, so filesystems without
it will run a completely untested and buggy code path.  Enabling the
support also is a simple bit flip using xfs_db, so legacy file
systems can still be brought forward.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:18:58 +11:00
Christoph Hellwig 97e5a6e6dc xfs: remove XFS_IO_INVALID
The invalid state isn't any different from a hole, so merge the two
states.  Use the more descriptive hole name, but keep it as the first
value of the enum to catch uninitialized fields.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2018-10-18 17:17:50 +11:00
Jens Axboe b93f654d73 f2fs: remove request_list check in is_idle()
This doesn't work on stacked devices, and it doesn't work on
blk-mq devices. The request_list is only used on legacy, which
we don't have much of anymore, and soon won't have any of.

Kill the check.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 19:09:39 -07:00
Jaegeuk Kim 730746ce88 f2fs: allow to mount, if quota is failed
Since we can use the filesystem without quotas till next boot.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:37:00 -07:00
Sahitya Tummala 6390398ec7 f2fs: update REQ_TIME in f2fs_cross_rename()
Update REQ_TIME in the missing path - f2fs_cross_rename().

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
[Jaegeuk Kim: add it in f2fs_rename()]
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:37:00 -07:00
Sahitya Tummala c75f2feb80 f2fs: do not update REQ_TIME in case of error conditions
The REQ_TIME should be updated only in case of success cases
as followed at all other places in the file system.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:37:00 -07:00
Chao Yu 3b30eb19dc f2fs: remove unneeded disable_nat_bits()
Commit 7735730d39 ("f2fs: fix to propagate error from __get_meta_page()")
added disable_nat_bits() in error path of __get_nat_bitmaps(), but it's
unneeded, beause we will fail mount, we won't have chance to change nid
usage status w/o nat full/empty bitmaps.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu 850971b23f f2fs: remove unused sbi->trigger_ssr_threshold
Commit a2a12b679f ("f2fs: export SSR allocation threshold") introduced
two threshold .min_ssr_sections and .trigger_ssr_threshold, but only
.min_ssr_sections is used, so just remove redundant one for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu ed15ba1415 f2fs: shrink sbi->sb_lock coverage in set_file_temperature()
file_set_{cold,hot} doesn't need holding sbi->sb_lock, so moving them
out of the lock.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu 4dada3fd70 f2fs: use rb_*_cached friends
As rbtree supports caching leftmost node natively, update f2fs codes
to use rb_*_cached helpers to speed up leftmost node visiting.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu ef2a007134 f2fs: fix to recover cold bit of inode block during POR
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. chattr -A /mnt/f2fs/file
11. xfs_io -f /mnt/f2fs/file -c "fsync"
12. umount /mnt/f2fs
13. mount -t f2fs /dev/sdd /mnt/f2fs
14. lsattr /mnt/f2fs/file

-----------------N- /mnt/f2fs/file

But actually, we expect the corrct result is:

-------A---------N- /mnt/f2fs/file

The reason is in step 9) we missed to recover cold bit flag in inode
block, so later, in fsync, we will skip write inode block due to below
condition check, result in lossing data in another SPOR.

f2fs_fsync_node_pages()
	if (!IS_DNODE(page) || !is_cold_node(page))
		continue;

Note that, I guess that some non-dir inode has already lost cold bit
during POR, so in order to reenable recovery for those inode, let's
try to recover cold bit in f2fs_iget() to save more fsynced data.

Fixes: c56675750d ("f2fs: remove unneeded set_cold_node()")
Cc: <stable@vger.kernel.org> 4.17+
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Chao Yu 48018b4cfd f2fs: submit cached bio to avoid endless PageWriteback
When migrating encrypted block from background GC thread, we only add
them into f2fs inner bio cache, but forget to submit the cached bio, it
may cause potential deadlock when we are waiting page writebacked, fix
it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:59 -07:00
Daniel Rosenberg 4354994f09 f2fs: checkpoint disabling
Note that, it requires "f2fs: return correct errno in f2fs_gc".

This adds a lightweight non-persistent snapshotting scheme to f2fs.

To use, mount with the option checkpoint=disable, and to return to
normal operation, remount with checkpoint=enable. If the filesystem
is shut down before remounting with checkpoint=enable, it will revert
back to its apparent state when it was first mounted with
checkpoint=disable. This is useful for situations where you wish to be
able to roll back the state of the disk in case of some critical
failure.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
[Jaegeuk Kim: use SB_RDONLY instead of MS_RDONLY]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-10-16 09:36:39 -07:00
Hou Tao 92e2921f7e jffs2: free jffs2_sb_info through jffs2_kill_sb()
When an invalid mount option is passed to jffs2, jffs2_parse_options()
will fail and jffs2_sb_info will be freed, but then jffs2_sb_info will
be used (use-after-free) and freeed (double-free) in jffs2_kill_sb().

Fix it by removing the buggy invocation of kfree() when getting invalid
mount options.

Fixes: 92abc475d8 ("jffs2: implement mount option parsing and compression overriding")
Cc: stable@kernel.org
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
2018-10-16 10:34:28 +02:00
Bob Peterson c9e58fb2aa gfs2: write revokes should traverse sd_ail1_list in reverse
All the other functions that deal with the sd_ail_list run the list
from the tail back to the head, iow, in reverse. We should do the
same while writing revokes, otherwise we might miss removing entries
properly from the list when we hit the limit of how many revokes we
can write at one time (based on block size, which determines how
many block pointers will fit in the revoke block).

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-10-15 12:17:30 -05:00
Lu Fengqi d9352794da btrfs: switch return_bigger to bool in find_ref_head
Using bool is more suitable than int here, and add the comment about the
return_bigger.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:41 +02:00
Lu Fengqi 7c8616278b btrfs: remove fs_info from btrfs_should_throttle_delayed_refs
The avg_delayed_ref_runtime can be referenced from the transaction
handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:41 +02:00
Lu Fengqi af9b8a0e20 btrfs: remove fs_info from btrfs_check_space_for_delayed_refs
It can be referenced from the transaction handle.

Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:41 +02:00
Lu Fengqi 9e920a6f03 btrfs: delayed-ref: pass delayed_refs directly to btrfs_delayed_ref_lock
Since trans is only used for referring to delayed_refs, there is no need
to pass it instead of delayed_refs to btrfs_delayed_ref_lock().

No functional change.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2018-10-15 17:23:41 +02:00