Commit Graph

75750 Commits

Author SHA1 Message Date
Shyam Prasad N 5752bf645f cifs: avoid parallel session setups on same channel
After allowing channels to reconnect in parallel, it now
becomes important to take care that multiple processes do not
call negotiate/session setup in parallel on the same channel.

This change avoids that by marking a channel as "in_reconnect".
During session setup if the channel in question has this flag
set, we return immediately.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-24 14:16:32 -05:00
Shyam Prasad N dd3cd8709e cifs: use new enum for ses_status
ses->status today shares statusEnum with server->tcpStatus.
This has been confusing, and tcon->status has deviated to use
a new enum. Follow suit and use new enum for ses_status as well.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-24 14:11:17 -05:00
Shyam Prasad N 1a6a41d4ce cifs: do not use tcpStatus after negotiate completes
Recent changes to multichannel to allow channel reconnects to
work in parallel and independent of each other did so by
making use of tcpStatus for the connection, and status for the
session. However, this did not take into account the multiuser
scenario, where same connection is used by multiple connections.

However, tcpStatus should be tracked only till the end of
negotiate exchange, and not used for session setup. This change
fixes this.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-24 14:08:25 -05:00
Steve French 52832252dd smb3: add mount parm nosparse
To reduce risk of applications breaking that mount to servers
with only partial sparse file support, add optional mount parm
"nosparse" which disables setting files sparse (and thus
will return EOPNOTSUPP on certain fallocate operations).

Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-23 23:32:54 -05:00
Steve French 9ccfc23a72 smb3: don't set rc when used and unneeded in query_info_compound
rc is not checked so should not be set coming back from open_cached_dir
(the cfid pointer is checked instead to see if open_cached_dir failed)

Addresses-Coverity: 1518021 ("Code maintainability issues  (UNUSED_VALUE)")
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-23 21:02:45 -05:00
Steve French bbdf6cf56c smb3: check for null tcon
Although unlikely to be null, it is confusing to use a pointer
before checking for it to be null so move the use down after
null check.

Addresses-Coverity: 1517586 ("Null pointer dereferences  (REVERSE_INULL)")
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-23 20:50:38 -05:00
Steve French 93ed91c020 cifs: fix minor compile warning
Add ifdef around nodfs variable from patch:
  "cifs: don't call cifs_dfs_query_info_nonascii_quirk() if nodfs was set"
which is unused when CONFIG_DFS_UPCALL is not set.

Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-23 20:24:12 -05:00
Steve French a42078b9e8 Add various fsctl structs
Add missing structure definition for various newer fsctl operations
  - duplicate_extents_ex
  - get_integrity_information
  - query_file_regions
  - query_on_disk_volume_info

And move some fsctl defintions to smbfs_common

Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-23 20:24:12 -05:00
Steve French 22c5b91336 Add defines for various newer FSCTLs
Checking MS-FSCC section 2.3 found six FSCTL defines
that were missing

Reviewed-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-23 20:23:57 -05:00
Steve French 35a2b533a2 smb3: add trace point for oplock not found
In order to debug problems with server potentially
sending us an oplock that we don't recognize (or a race
with close and oplock break) it would be helpful to have
a dynamic trace point for this case.  New tracepoint
is called trace_smb3_oplock_not_found

Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-22 00:46:08 -05:00
ChenXiaoSong 2b058acecf cifs: return the more nuanced writeback error on close()
As filemap_check_errors() only report -EIO or -ENOSPC, we return more nuanced
writeback error -(file->f_mapping->wb_err & MAX_ERRNO).

  filemap_write_and_wait
    filemap_write_and_wait_range
      filemap_check_errors
        -ENOSPC or -EIO
  filemap_check_wb_err
    errseq_check
      return -(file->f_mapping->wb_err & MAX_ERRNO)

Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-22 00:01:06 -05:00
Steve French fb253d5ba3 smb3: add trace point for lease not found issue
When trying to debug problems with server sending us a
lease we don't recognize, it would be helpful to have
a dynamic trace point for this case.  New tracepoint
is called trace_smb3_lease_not_found

Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21 23:56:16 -05:00
Julia Lawall fb64f7f105 cifs: smbd: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21 23:35:43 -05:00
Ronnie Sahlberg c9fc5ca454 cifs: set the CREATE_NOT_FILE when opening the directory in use_cached_dir()
This enforces that we can only do this for directories and not normal files
or else the server will return an error.
This means that we will have conditionally check IF the path refers
to a directory or not in all the call-sites where we are unsure.
Right now this check is for "" i.e. root.

Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21 12:23:24 -05:00
Ronnie Sahlberg 198bf836df cifs: check for smb1 in open_cached_dir()
Check protocol version in open_cached_dir() and return not supported
for SMB1.  This allows us to call open_cached_dir() from code that
is common to both smb1 and smb2/3 in future patches without having to
do this check in the call-site.
At the same time, add a check if tcon is valid or not for the same reason.

Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21 12:23:08 -05:00
Ronnie Sahlberg f695b28935 cifs: move definition of cifs_fattr earlier in cifsglob.h
This only moves these definitions to come earlier in the file
but not change the definition itself.
This is done to reduce the amount of changes in future patches.

Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-21 12:22:57 -05:00
Enzo Matsumiya 71081e7ac1 cifs: print TIDs as hex
Makes these debug messages easier to read

Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-20 17:46:22 -05:00
Enzo Matsumiya 337b8b0e43 cifs: return ENOENT for DFS lookup_cache_entry()
EEXIST didn't make sense to use when dfs_cache_find() couldn't find a
cache entry nor retrieve a referral target.

It also doesn't make sense cifs_dfs_query_info_nonascii_quirk() to
emulate ENOENT anymore.

Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-20 17:44:34 -05:00
Enzo Matsumiya 421ef3d565 cifs: don't call cifs_dfs_query_info_nonascii_quirk() if nodfs was set
Also return EOPNOTSUPP if path is remote but nodfs was set.

Fixes: a2809d0e16 ("cifs: quirk for STATUS_OBJECT_NAME_INVALID returned for non-ASCII dfs refs")
Cc: stable@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-20 17:38:11 -05:00
Paulo Alcantara d80c69846d cifs: fix signed integer overflow when fl_end is OFFSET_MAX
This fixes the following when running xfstests generic/504:

[  134.394698] CIFS: Attempting to mount \\win16.vm.test\Share
[  134.420905] CIFS: VFS: generate_smb3signingkey: dumping generated
AES session keys
[  134.420911] CIFS: VFS: Session Id    05 00 00 00 00 c4 00 00
[  134.420914] CIFS: VFS: Cipher type   1
[  134.420917] CIFS: VFS: Session Key   ea 0b d9 22 2e af 01 69 30 1b
15 74 bf 87 41 11
[  134.420920] CIFS: VFS: Signing Key   59 28 43 5c f0 b6 b1 6f f5 7b
65 f2 9f 9e 58 7d
[  134.420923] CIFS: VFS: ServerIn Key  eb aa 58 c8 95 01 9a f7 91 98
e4 fa bc d8 74 f1
[  134.420926] CIFS: VFS: ServerOut Key 08 5b 21 e5 2e 4e 86 f6 05 c2
58 e0 af 53 83 e7
[  134.771946]
================================================================================
[  134.771953] UBSAN: signed-integer-overflow in fs/cifs/file.c:1706:19
[  134.771957] 9223372036854775807 + 1 cannot be represented in type
'long long int'
[  134.771960] CPU: 4 PID: 2773 Comm: flock Not tainted 5.11.22 #1
[  134.771964] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  134.771966] Call Trace:
[  134.771970]  dump_stack+0x8d/0xb5
[  134.771981]  ubsan_epilogue+0x5/0x50
[  134.771988]  handle_overflow+0xa3/0xb0
[  134.771997]  ? lockdep_hardirqs_on_prepare+0xe8/0x1b0
[  134.772006]  cifs_setlk+0x63c/0x680 [cifs]
[  134.772085]  ? _get_xid+0x5f/0xa0 [cifs]
[  134.772085]  cifs_flock+0x131/0x400 [cifs]
[  134.772085]  __x64_sys_flock+0xfc/0x120
[  134.772085]  do_syscall_64+0x33/0x40
[  134.772085]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  134.772085] RIP: 0033:0x7fea4f83b3fb
[  134.772085] Code: ff 48 8b 15 8f 1a 0d 00 f7 d8 64 89 02 b8 ff ff
ff ff eb da e8 16 0b 02 00 66 0f 1f 44 00 00 f3 0f 1e fa b8 49 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5d 1a 0d 00 f7 d8 64 89
01 48

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-19 10:54:41 -05:00
Steve French 0a55cf74ff SMB3: EBADF/EIO errors in rename/open caused by race condition in smb2_compound_op
There is  a race condition in smb2_compound_op:

after_close:
	num_rqst++;

	if (cfile) {
		cifsFileInfo_put(cfile); // sends SMB2_CLOSE to the server
		cfile = NULL;

This is triggered by smb2_query_path_info operation that happens during
revalidate_dentry. In smb2_query_path_info, get_readable_path is called to
load the cfile, increasing the reference counter. If in the meantime, this
reference becomes the very last, this call to cifsFileInfo_put(cfile) will
trigger a SMB2_CLOSE request sent to the server just before sending this compound
request – and so then the compound request fails either with EBADF/EIO depending
on the timing at the server, because the handle is already closed.

In the first scenario, the race seems to be happening between smb2_query_path_info
triggered by the rename operation, and between “cleanup” of asynchronous writes – while
fsync(fd) likely waits for the asynchronous writes to complete, releasing the writeback
structures can happen after the close(fd) call. So the EBADF/EIO errors will pop up if
the timing is such that:
1) There are still outstanding references after close(fd) in the writeback structures
2) smb2_query_path_info successfully fetches the cfile, increasing the refcounter by 1
3) All writeback structures release the same cfile, reducing refcounter to 1
4) smb2_compound_op is called with that cfile

In the second scenario, the race seems to be similar – here open triggers the
smb2_query_path_info operation, and if all other threads in the meantime decrease the
refcounter to 1 similarly to the first scenario, again SMB2_CLOSE will be sent to the
server just before issuing the compound request. This case is harder to reproduce.

See https://bugzilla.samba.org/show_bug.cgi?id=15051

Cc: stable@vger.kernel.org
Fixes: 8de9e86c67 ("cifs: create a helper to find a writeable handle by path name")
Signed-off-by: Ondrej Hubsch <ohubsch@purestorage.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-17 13:47:27 -05:00
Linus Torvalds d928e8f3af gfs2 fixes
- Fix filesystem block deallocation for short writes.
 - Stop using glock holder auto-demotion for now.
 - Get rid of buffered writes inefficiencies due to page
   faults being disabled.
 - Minor other cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmJ+wL4UHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqNkRAAuBc4oJ3Wkp/dkfF6MDY9DXqCzhQo
 TodFIdEQvOUtncYBljE6DZQ9MT1sRvwxDe0nIjErFQzHYcW88zczILWOBRhrhlci
 kANL6JtjSvtE3kecvR9I3nZX44aDETJliV3FX8n7vDSNMTIwjzW38d0XmDLX9t8F
 bTEFv3rKsUzgcGaxpeQe7mzoQPi910WFPO/pos2ghuZwB1QEpdBrCESz4XB+OwKM
 +V+8nooHvYp8T+2AzrBM6hBgsYelrBPRXlz6cYEjWY2FQuvQk/thX+zO2dvXCsPA
 uNWJTCKJEsufWflPNI2ugZ9TVneIU1umGACoEHteeRvG6Qsg6K4Kf0EJEFf6Y0Tg
 PKUJLUcdi0suS8UuUCBTAVAgCv9+ueLhIbbFeRkbHjxSjET7NQl2gbkfY2V1RsFt
 yNFLMGU01xsb1YylncY4xQc9WVMDbPsdv1KGDF75njchuHZXhfg00ezPQys4Uy5R
 0EUwqoPYNePV6VsoHLbU+kImf936VawP916yDiyflyz6UFSi5vNg7SwMqDrXpIxM
 T8nNgrTC+Npv7T2xc8JhSFGv9T86nZXWjQTpDzV8onPvxdCLCT1cmm352aHqXd7+
 wY9ZFJZ4iMoinYUkEzgySQW00+wK/AThQKQ6ImhLEvwxMUc6dJUnesVGkzLDGh9Z
 KSfqgYzmlb2YdKY=
 =FJq+
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v5.18-rc4-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fixes from Andreas Gruenbacher:
 "We've finally identified commit dc732906c2 ("gfs2: Introduce flag
  for glock holder auto-demotion") to be the other cause of the
  filesystem corruption we've been seeing. This feature isn't strictly
  necessary anymore, so we've decided to stop using it for now.

  With this and the gfs_iomap_end rounding fix you've already seen
  ("gfs2: Fix filesystem block deallocation for short writes" in this
  pull request), we're corruption free again now.

   - Fix filesystem block deallocation for short writes.

   - Stop using glock holder auto-demotion for now.

   - Get rid of buffered writes inefficiencies due to page faults being
     disabled.

   - Minor other cleanups"

* tag 'gfs2-v5.18-rc4-fix3' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  gfs2: Stop using glock holder auto-demotion for now
  gfs2: buffered write prefaulting
  gfs2: Align read and write chunks to the page cache
  gfs2: Pull return value test out of should_fault_in_pages
  gfs2: Clean up use of fault_in_iov_iter_{read,write}able
  gfs2: Variable rename
  gfs2: Fix filesystem block deallocation for short writes
2022-05-13 14:32:53 -07:00
Andreas Gruenbacher e1fa9ea85c gfs2: Stop using glock holder auto-demotion for now
We're having unresolved issues with the glock holder auto-demotion mechanism
introduced in commit dc732906c2.  This mechanism was assumed to be essential
for avoiding frequent short reads and writes until commit 296abc0d91
("gfs2: No short reads or writes upon glock contention").  Since then,
when the inode glock is lost, it is simply re-acquired and the operation
is resumed.  This means that apart from the performance penalty, we
might as well drop the inode glock before faulting in pages, and
re-acquire it afterwards.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-13 22:32:52 +02:00
Andreas Gruenbacher fa5dfa645d gfs2: buffered write prefaulting
In gfs2_file_buffered_write, to increase the likelihood that all the
user memory we're trying to write will be resident in memory, carry out
the write in chunks and fault in each chunk of user memory before trying
to write it.  Otherwise, some workloads will trigger frequent short
"internal" writes, causing filesystem blocks to be allocated and then
partially deallocated again when writing into holes, which is wasteful
and breaks reservations.

Neither the chunked writes nor any of the short "internal" writes are
user visible.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-13 22:32:52 +02:00
Andreas Gruenbacher 324d116c5a gfs2: Align read and write chunks to the page cache
Align the chunks that reads and writes are carried out in to the page
cache rather than the user buffers.  This will be more efficient in
general, especially for allocating writes.  Optimizing the case that the
user buffer is gfs2 backed isn't very useful; we only need to make sure
we won't deadlock.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-13 22:00:22 +02:00
Andreas Gruenbacher 7238226450 gfs2: Pull return value test out of should_fault_in_pages
Pull the return value test of the previous read or write operation out
of should_fault_in_pages().  In a following patch, we'll fault in pages
before the I/O and there will be no return value to check.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-13 22:00:22 +02:00
Andreas Gruenbacher 6d22ff4710 gfs2: Clean up use of fault_in_iov_iter_{read,write}able
No need to store the return value of the fault_in functions in separate
variables.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-13 22:00:22 +02:00
Andreas Gruenbacher 42e4c3bdca gfs2: Variable rename
Instead of counting the number of bytes read from the filesystem,
functions gfs2_file_direct_read and gfs2_file_read_iter count the number
of bytes written into the user buffer.  Conversely, functions
gfs2_file_direct_write and gfs2_file_buffered_write count the number of
bytes read from the user buffer.  This is nothing but confusing, so
change the read functions to count how many bytes they have read, and
the write functions to count how many bytes they have written.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-13 22:00:22 +02:00
Andreas Gruenbacher d031a8866e gfs2: Fix filesystem block deallocation for short writes
When a write cannot be carried out in full, gfs2_iomap_end() releases
blocks that have been allocated for this write but haven't been used.

To compute the end of the allocation, gfs2_iomap_end() incorrectly
rounded the end of the attempted write down to the next block boundary
to arrive at the end of the allocation.  It would have to round up, but
the end of the allocation is also available as iomap->offset +
iomap->length, so just use that instead.

In addition, use round_up() for computing the start of the unused range.

Fixes: 64bc06bb32 ("gfs2: iomap buffered write support")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2022-05-13 21:59:18 +02:00
Linus Torvalds c3f5e692bf Two fixes to properly maintain xattrs on async creates and thus
preserve SELinux context on newly created files and to avoid improper
 usage of folio->private field which triggered BUG_ONs, both marked for
 stable.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmJ+kC4THGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi0dJB/9tCG5VsOa8wyh/eR6HZyAHAGtXEnPB
 qUn+bOvsRFw65GorwvbQw5fgXrQSaxzEgYuOEuCKOCA1KUxpYTGwYMhZIGNS0ZQm
 251MyfGl3+aahe3H2pAkbuk8HnUq+989BPiRzwpTi0vZMDZNy1nyuE7VK+Nw/77w
 E+swcFeyvmimKSlXURH6LQFzamxcnTdFoGRL1cc+97xOFSmbq3JSS+ZhrHhlZEyi
 7C5tIfz8BtNGswCh2HfHhmFWxBAe0GWll6SqvKq2go/1Cg4UUiCn5BMsGCAYnL3G
 eFnt3n7b3NARfSdoI+OCWa5yVERRDy7c4zxzyqdIxLKI1LVhXRhHqdaR
 =73K4
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.18-rc7' of https://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "Two fixes to properly maintain xattrs on async creates and thus
  preserve SELinux context on newly created files and to avoid improper
  usage of folio->private field which triggered BUG_ONs.

  Both marked for stable"

* tag 'ceph-for-5.18-rc7' of https://github.com/ceph/ceph-client:
  ceph: check folio PG_private bit instead of folio->private
  ceph: fix setting of xattrs on async created inodes
2022-05-13 11:12:04 -07:00
Linus Torvalds 6dd5884d1d NFS client bugfixes for Linux 5.18
Highlights include:
 
 Stable fixes:
 - SUNRPC: Ensure that the gssproxy client can start in a connected state
 
 Bugfixes:
 - Revert "SUNRPC: Ensure gss-proxy connects on setup"
 - nfs: fix broken handling of the softreval mount option
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJ+j5kACgkQZwvnipYK
 APIQag/+JXtDt9xo0SsGTMJ2PJArnRoyd3QcZjJtoabaZTylZyILZ20sxvt/jgby
 miOKrI+bbsykbrwQijRLF/Yys1G+iMBzSWy4lpJ9eH4AVkfjr5qqauRbPobo/ZiI
 i5fDLR84FlzKYWBY1Nv6YE5VREukIlXbrq7KWe/HoS7/hSAamhkUv+a0M8iNLGO4
 QtH7+M0iBZTI9yM4gFAcMAANV21SxvjqP1z62kbCp00qO2mL0PgF/2pxC9WgX24/
 EZX037ykzKFkjgWfzT8+/eIfCQGIPi/9e6Ir4Bc99psVFOYd1YxkTLBycNwm1VOz
 5RLORbURDVMPQH2/qZ57u7B0gJF76UZM4pH3gv+i9nFhoUqf3kFZAOy48MZEFz3X
 sPiQZLck63mvO9Bd3QX6pFZc0datiYmhuXRknjxV6Bz/Y41y3NzeJeX4r88s4q7l
 tisZDmaIm0Q+H07QOTL0aCk456amP6XLnO1+nu/PSR/3ImwJSaOpypHx/BxDJUTG
 TvpY1ouBZ8irfT6JrbfVnSdbedIZx4c+6btVw6mlT40edZF6M3r+3s8s2TU7x+co
 uiBMB4Qj/C19zqcf/DziiL1PZEJPm3lk0fBHc6JIWCV4I3eYxQ5J4nFJsy8/jPm1
 lTgreoHWfnYC3WhlxUk+N3+9X/9+iFEJUxqGYfANf8GFaGqcYVQ=
 =/Yg+
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "One more pull request. There was a bug in the fix to ensure that gss-
  proxy continues to work correctly after we fixed the AF_LOCAL socket
  leak in the RPC code. This therefore reverts that broken patch, and
  replaces it with one that works correctly.

  Stable fixes:

   - SUNRPC: Ensure that the gssproxy client can start in a connected
     state

  Bugfixes:

   - Revert "SUNRPC: Ensure gss-proxy connects on setup"

   - nfs: fix broken handling of the softreval mount option"

* tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  nfs: fix broken handling of the softreval mount option
  SUNRPC: Ensure that the gssproxy client can start in a connected state
  Revert "SUNRPC: Ensure gss-proxy connects on setup"
2022-05-13 11:04:37 -07:00
Linus Torvalds 364a453ab9 hotfixes for 5.18-rc7
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYnvwxgAKCRDdBJ7gKXxA
 jhymAQDvHnFT3F5ydvBqApbzrQRUk/+fkkQSrF/xYawknZNgkAEA6Tnh9XqYplJN
 bbmml6HTVvDjprEOCGakY/Kyz7qmdQ0=
 =SMJQ
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Seven MM fixes, three of which address issues added in the most recent
  merge window, four of which are cc:stable.

  Three non-MM fixes, none very serious"

[ And yes, that's a real pull request from Andrew, not me creating a
  branch from emailed patches. Woo-hoo! ]

* tag 'mm-hotfixes-stable-2022-05-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  MAINTAINERS: add a mailing list for DAMON development
  selftests: vm: Makefile: rename TARGETS to VMTARGETS
  mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool
  mailmap: add entry for martyna.szapar-mudlaw@intel.com
  arm[64]/memremap: don't abuse pfn_valid() to ensure presence of linear map
  procfs: prevent unprivileged processes accessing fdinfo dir
  mm: mremap: fix sign for EFAULT error return value
  mm/hwpoison: use pr_err() instead of dump_page() in get_any_page()
  mm/huge_memory: do not overkill when splitting huge_zero_page
  Revert "mm/memory-failure.c: skip huge_zero_page in memory_failure()"
2022-05-13 10:22:37 -07:00
Linus Torvalds c37dba6ae4 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmJ9O8cACgkQnJ2qBz9k
 QNk2sQf/amL1My1giNxtKwWJsHxljHqQH3W695mH1Q4fs1V9zKLbRv+ZLCcgIUBq
 qUq1EvOT5aUfLugKh1St6WyrfZwqgSWILHYeLhB8WS08roqGrWNB09c+ik2YW1Fz
 xGsLhwc9CBOn6A79+woN33kRUAyWH+TTDt7rr4ongAwGDPEpmmbTnbDvDY7xgVh/
 AOXg75x5AhwVg/DxiLRPhaXk9ME8OT1Sc8mKVfboQsT/yz6rRHZCoUJwiVAXPNrN
 rG4pqNn+ncrvR1/tMQENWbSp+nIFN5PG5RzKf71sQrSfn2Voo6Bz8zoHYn7EulTT
 3uNfXgR6aqOB17b5ptAOCCcjiWrBEQ==
 =//1B
 -----END PGP SIGNATURE-----

Merge tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fs fixes from Jan Kara:
 "Three fixes that I'd still like to get to 5.18:

   - add a missing sanity check in the fanotify FAN_RENAME feature
     (added in 5.17, let's fix it before it gets wider usage in
     userspace)

   - udf fix for recently introduced filesystem corruption issue

   - writeback fix for a race in inode list handling that can lead to
     delayed writeback and possible dirty throttling stalls"

* tag 'fixes_for_v5.18-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  udf: Avoid using stale lengthOfImpUse
  writeback: Avoid skipping inode writeback
  fanotify: do not allow setting dirent events in mask of non-dir
2022-05-12 10:21:44 -07:00
Jan Kara c1ad35dd05 udf: Avoid using stale lengthOfImpUse
udf_write_fi() uses lengthOfImpUse of the entry it is writing to.
However this field has not yet been initialized so it either contains
completely bogus value or value from last directory entry at that place.
In either case this is wrong and can lead to filesystem corruption or
kernel crashes.

Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
CC: stable@vger.kernel.org
Fixes: 979a6e28dd ("udf: Get rid of 0-length arrays in struct fileIdentDesc")
Signed-off-by: Jan Kara <jack@suse.cz>
2022-05-10 13:30:32 +02:00
Jing Xia 846a3351dd writeback: Avoid skipping inode writeback
We have run into an issue that a task gets stuck in
balance_dirty_pages_ratelimited() when perform I/O stress testing.
The reason we observed is that an I_DIRTY_PAGES inode with lots
of dirty pages is in b_dirty_time list and standard background
writeback cannot writeback the inode.
After studing the relevant code, the following scenario may lead
to the issue:

task1                                   task2
-----                                   -----
fuse_flush
 write_inode_now //in b_dirty_time
  writeback_single_inode
   __writeback_single_inode
                                 fuse_write_end
                                  filemap_dirty_folio
                                   __xa_set_mark:PAGECACHE_TAG_DIRTY
    lock inode->i_lock
    if mapping tagged PAGECACHE_TAG_DIRTY
    inode->i_state |= I_DIRTY_PAGES
    unlock inode->i_lock
                                   __mark_inode_dirty:I_DIRTY_PAGES
                                      lock inode->i_lock
                                      -was dirty,inode stays in
                                      -b_dirty_time
                                      unlock inode->i_lock

   if(!(inode->i_state & I_DIRTY_All))
      -not true,so nothing done

This patch moves the dirty inode to b_dirty list when the inode
currently is not queued in b_io or b_more_io list at the end of
writeback_single_inode.

Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
CC: stable@vger.kernel.org
Fixes: 0ae45f63d4 ("vfs: add support for a lazytime mount option")
Signed-off-by: Jing Xia <jing.xia@unisoc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220510023514.27399-1-jing.xia@unisoc.com
2022-05-10 12:04:21 +02:00
Xiubo Li 642d51fb07 ceph: check folio PG_private bit instead of folio->private
The pages in the file mapping maybe reclaimed and reused by other
subsystems and the page->private maybe used as flags field or
something else, if later that pages are used by page caches again
the page->private maybe not cleared as expected.

Here will check the PG_private bit instead of the folio->private.

Cc: stable@vger.kernel.org
URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Luis Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-10 09:48:31 +02:00
Jeff Layton 620239d9a3 ceph: fix setting of xattrs on async created inodes
Currently when we create a file, we spin up an xattr buffer to send
along with the create request. If we end up doing an async create
however, then we currently pass down a zero-length xattr buffer.

Fix the code to send down the xattr buffer in req->r_pagelist. If the
xattrs span more than a page, however give up and don't try to do an
async create.

Cc: stable@vger.kernel.org
URL: https://bugzilla.redhat.com/show_bug.cgi?id=2063929
Fixes: 9a8d03ca2e ("ceph: attempt to do async create when possible")
Reported-by: John Fortin <fortinj66@gmail.com>
Reported-by: Sri Ramanujam <sri@ramanujam.io>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-10 09:48:31 +02:00
Kalesh Singh 1927e498ae procfs: prevent unprivileged processes accessing fdinfo dir
The file permissions on the fdinfo dir from were changed from
S_IRUSR|S_IXUSR to S_IRUGO|S_IXUGO, and a PTRACE_MODE_READ check was added
for opening the fdinfo files [1].  However, the ptrace permission check
was not added to the directory, allowing anyone to get the open FD numbers
by reading the fdinfo directory.

Add the missing ptrace permission check for opening the fdinfo directory.

[1] https://lkml.kernel.org/r/20210308170651.919148-1-kaleshsingh@google.com

Link: https://lkml.kernel.org/r/20210713162008.1056986-1-kaleshsingh@google.com
Fixes: 7bc3fa0172 ("procfs: allow reading fdinfo with PTRACE_MODE_READ")
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Hridya Valsaraju <hridya@google.com>
Cc: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09 17:34:28 -07:00
Dan Aloni 085d16d5f9 nfs: fix broken handling of the softreval mount option
Turns out that ever since this mount option was added, passing
`softreval` in NFS mount options cancelled all other flags while not
affecting the underlying flag `NFS_MOUNT_SOFTREVAL`.

Fixes: c74dfe97c1 ("NFS: Add mount option 'softreval'")
Signed-off-by: Dan Aloni <dan.aloni@vastdata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-05-09 13:02:54 -04:00
Amir Goldstein ceaf69f8ea fanotify: do not allow setting dirent events in mask of non-dir
Dirent events (create/delete/move) are only reported on watched
directory inodes, but in fanotify as well as in legacy inotify, it was
always allowed to set them on non-dir inode, which does not result in
any meaningful outcome.

Until kernel v5.17, dirent events in fanotify also differed from events
"on child" (e.g. FAN_OPEN) in the information provided in the event.
For example, FAN_OPEN could be set in the mask of a non-dir or the mask
of its parent and event would report the fid of the child regardless of
the marked object.
By contrast, FAN_DELETE is not reported if the child is marked and the
child fid was not reported in the events.

Since kernel v5.17, with fanotify group flag FAN_REPORT_TARGET_FID, the
fid of the child is reported with dirent events, like events "on child",
which may create confusion for users expecting the same behavior as
events "on child" when setting events in the mask on a child.

The desired semantics of setting dirent events in the mask of a child
are not clear, so for now, deny this action for a group initialized
with flag FAN_REPORT_TARGET_FID and for the new event FAN_RENAME.
We may relax this restriction in the future if we decide on the
semantics and implement them.

Fixes: d61fd650e9 ("fanotify: introduce group flag FAN_REPORT_TARGET_FID")
Fixes: 8cc3b1ccd9 ("fanotify: wire up FAN_RENAME event")
Link: https://lore.kernel.org/linux-fsdevel/20220505133057.zm5t6vumc4xdcnsg@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220507080028.219826-1-amir73il@gmail.com
2022-05-09 11:49:09 +02:00
Linus Torvalds b366bd7d96 io_uring-5.18-2022-05-06
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJ10okQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprhNEADcc6I72dvyd5RVh0bAfWIVlk4QUStGG5rf
 AP+DdefmQBZaipjdkZ3MRUJ5brr7BD2Ioizo6vShDq5uymfWVS5EP3vKYkLSrBO1
 7cmPtNG1qEwnwNEVtDDKKrZwFe5qM/3ayrlBcGQG3SNSLhYe44+Plc3TmrFNzCSi
 bTHphi06+iBujBG/o3G1mlo307Y9GJboxGEEVUkzyvUZ2wB+NLMAJ04fpSrMPGIa
 be/Y2PLcM4FOlM8HnUEcjfjQUuygBW7iarcGZyckkfKJOMkRcHwMvl9Z3Dyi4byH
 srBn2KQINlsLOVbn4q0U76dgqHlJ1lpARNHuDOEBp5DbcF2XX/ZrEaBmLFD1LNpb
 0U+uZ9H82lBj2bH5iaz4se2/oGbhAlo77eGGshSMzlLE9BVd62JBmdIqYJhNlgxi
 CKM2WZZJAJSS4ftiSyJIkZ0ty1iM3vgvj0sbLe7gIZYpu9MhIUVICESXTim2VCd4
 lAJtKJTIL2ad//WLgfTRWzSUOVCzQIpOMezIkz4lJWFZ+xOh8WBl5l7wXvVb21Ld
 JHOXcq/OFZJfEIRE2k0qWzQPjqOohwASDi5hI3ceNM2ZjjgmDtZmcdgajT9u4KFl
 q9dLW6e5DdMAIjAOne/1mVnq38xj4jhrV3rtIAcjU4ESXUACTThThbLUcRNzEz+4
 Hpo1FNnltQ==
 =1wM2
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-block

Pull io_uring fix from Jens Axboe:
 "Just a single file assignment fix this week"

* tag 'io_uring-5.18-2022-05-06' of git://git.kernel.dk/linux-block:
  io_uring: assign non-fixed early for async work
2022-05-07 10:41:41 -07:00
Linus Torvalds 4b97bac075 for-5.18-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmJ1X+YACgkQxWXV+ddt
 WDs75g/8DsrDBe/Sbn/bDLRw30D0StC5xRhaVJLNcAhSfIyyoBo0EKifd2nXdNSn
 rdc3pvRToPh2X1YQ/5FmtxuYW12N4pVOfWtXKdoFFXabMVJetDBSnS8KNzxBd9Ys
 OkiKj2qb8H+1uNwWVLxfbFBNWWPyCQLe/2DDmePxOAszs4YUPvwxffyA8oclyHb5
 wW0qOJAC76Lz6y5Wo/GJrtAdlvWwb3t8+IDUKgGPT1BEIOE/fna+MhvEwqvCdY9F
 PPU5sWIXkAv3addgrcBHVX5HdAzRC0jAv2lHsttfu4dEOmTzw8dCTUh/IzSRa3fj
 IVy7AIGR+ZR++d4tOhAEZBe1KAqntY3UVmXY19cKTHOLLWFjv4XkKw8KJzsDiHUt
 rczedAFdgRRo9QIgwSsAU9Zi8DSG56BSenxsqFzqiL5BDDX1bUFXCegNYR42GfB/
 8E89eYkBCXxP6XeA1+44EOalCdqLKuQOyEijTUxn0UDGdqHHB1Gd/IUQDU5fpCqo
 kX6gdgMhNmjGJ/zETfOdFrYZaHYJUiiIRc6z8SCgM3JIPBgVw0FAa99Hs8Pl+eJn
 idmpfnHn6Xvmq46FISVRWgolkuj7VzTEOM65rNgOY3889Vk9Qt2qjI/DvYPGDs9Y
 9PQI1FY+2E+ZqbNY8dZpXDii+6y36aAmGR1B10x3545+C36CA/0=
 =S4KK
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Regression fixes in zone activation:

   - move a loop invariant out of the loop to avoid checking space
     status

   - properly handle unlimited activation

  Other fixes:

   - for subpage, force the free space v2 mount to avoid a warning and
     make it easy to switch a filesystem on different page size systems

   - export sysfs status of exclusive operation 'balance paused', so the
     user space tools can recognize it and allow adding a device with
     paused balance

   - fix assertion failure when logging directory key range item"

* tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: sysfs: export the balance paused state of exclusive operation
  btrfs: fix assertion failure when logging directory key range item
  btrfs: zoned: activate block group properly on unlimited active zone device
  btrfs: zoned: move non-changing condition check out of the loop
  btrfs: force v2 space cache usage for subpage mount
2022-05-06 14:32:16 -07:00
Linus Torvalds adcffc1716 NFS client bugfixes for Linux 5.18
Highlights include:
 
 Stable fixes:
 - Fix a socket leak when setting up an AF_LOCAL RPC client
 - Ensure that knfsd connects to the gss-proxy daemon on setup
 
 Bugfixes:
 - Fix a refcount leak when migrating a task off an offlined transport
 - Don't gratuitously invalidate inode attributes on delegation return
 - Don't leak sockets in xs_local_connect()
 - Ensure timely close of disconnected AF_LOCAL sockets
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmJ1f3wACgkQZwvnipYK
 APKKIRAAmVUswcfRQ9wSz5wW6DCFU9hdsN9JD4pAcPvWAYGo8fqmn3I3qe/iaBCf
 rrJF38SfQVygtthmAY4CBBwOiVxm2fvqanML2lta+ZUU15MqoH2px3kMYemRyulJ
 9/2yP25AUSgkmwdEmm69hIXJkEJa3dsjg+LajQZ5X01DgKSfpObS5s9t/upM9kve
 Wqz5QRr+aJnZuuYYJWxNmXZ4XQEkzHccg3aSswB6bEsEGNXKo8NnWryrSMnWTW1y
 rQCb0e+gxpoVFgV3ngP1r9xT2l2ISbJIIhTPoj5hSjSVlFvQlIEyHtGA2vuIEZH9
 hPJAnaSc7Xb+QER6XfZkTxjW+jtMl5OmMKkWUcUmHiYv2KIM8dUAd3ANnbDBCUvw
 C5bGF907Qjqs5d2VdfsbisT9ikyn+xw6SFxcr9HYyH2T3dIsC1A8P9uUvn/afwUQ
 EPfQIsIEDeufo6O8KLfF+gCO9kbk9rdaP8Bv3B2H94aRs1yYde9bJpa7QABncGbA
 otWehkX/AbrIa4Zjp1ELzcVJxlIl+/AtxzCdGY2me1Ds388U/RKsyDWwXuGynLP6
 98ycdtHWVyoJ48L5kZowuj8/3tEB998En5hh0HSuAd0DYkAuGxaSGb+iuwKi/M0H
 +D1wZxef49r2ggQkEOsllTEjJKSHcq1+vCVASZ8ITEbcVUSiO90=
 =LSoH
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - Fix a socket leak when setting up an AF_LOCAL RPC client

   - Ensure that knfsd connects to the gss-proxy daemon on setup

  Bugfixes:

   - Fix a refcount leak when migrating a task off an offlined transport

   - Don't gratuitously invalidate inode attributes on delegation return

   - Don't leak sockets in xs_local_connect()

   - Ensure timely close of disconnected AF_LOCAL sockets"

* tag 'nfs-for-5.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  Revert "SUNRPC: attempt AF_LOCAL connect on setup"
  SUNRPC: Ensure gss-proxy connects on setup
  SUNRPC: Ensure timely close of disconnected AF_LOCAL sockets
  SUNRPC: Don't leak sockets in xs_local_connect()
  NFSv4: Don't invalidate inode attributes on delegation return
  SUNRPC release the transport of a relocated task with an assigned transport
2022-05-06 13:19:11 -07:00
David Sterba 3e1ad19638 btrfs: sysfs: export the balance paused state of exclusive operation
The new state allowing device addition with paused balance is not
exported to user space so it can't recognize it and actually start the
operation.

Fixes: efc0e69c2f ("btrfs: introduce exclusive operation BALANCE_PAUSED state")
CC: stable@vger.kernel.org # 5.17
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Filipe Manana 750ee45490 btrfs: fix assertion failure when logging directory key range item
When inserting a key range item (BTRFS_DIR_LOG_INDEX_KEY) while logging
a directory, we don't expect the insertion to fail with -EEXIST, because
we are holding the directory's log_mutex and we have dropped all existing
BTRFS_DIR_LOG_INDEX_KEY keys from the log tree before we started to log
the directory. However it's possible that during the logging we attempt
to insert the same BTRFS_DIR_LOG_INDEX_KEY key twice, but for this to
happen we need to race with insertions of items from other inodes in the
subvolume's tree while we are logging a directory. Here's how this can
happen:

1) We are logging a directory with inode number 1000 that has its items
   spread across 3 leaves in the subvolume's tree:

   leaf A - has index keys from the range 2 to 20 for example. The last
   item in the leaf corresponds to a dir item for index number 20. All
   these dir items were created in a past transaction.

   leaf B - has index keys from the range 22 to 100 for example. It has
   no keys from other inodes, all its keys are dir index keys for our
   directory inode number 1000. Its first key is for the dir item with
   a sequence number of 22. All these dir items were also created in a
   past transaction.

   leaf C - has index keys for our directory for the range 101 to 120 for
   example. This leaf also has items from other inodes, and its first
   item corresponds to the dir item for index number 101 for our directory
   with inode number 1000;

2) When we finish processing the items from leaf A at log_dir_items(),
   we log a BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and a last
   offset of 21, meaning the log is authoritative for the index range
   from 21 to 21 (a single sequence number). At this point leaf B was
   not yet modified in the current transaction;

3) When we return from log_dir_items() we have released our read lock on
   leaf B, and have set *last_offset_ret to 21 (index number of the first
   item on leaf B minus 1);

4) Some other task inserts an item for other inode (inode number 1001 for
   example) into leaf C. That resulted in pushing some items from leaf C
   into leaf B, in order to make room for the new item, so now leaf B
   has dir index keys for the sequence number range from 22 to 102 and
   leaf C has the dir items for the sequence number range 103 to 120;

5) At log_directory_changes() we call log_dir_items() again, passing it
   a 'min_offset' / 'min_key' value of 22 (*last_offset_ret from step 3
   plus 1, so 21 + 1). Then btrfs_search_forward() leaves us at slot 0
   of leaf B, since leaf B was modified in the current transaction.

   We have also initialized 'last_old_dentry_offset' to 20 after calling
   btrfs_previous_item() at log_dir_items(), as it left us at the last
   item of leaf A, which refers to the dir item with sequence number 20;

6) We then call process_dir_items_leaf() to process the dir items of
   leaf B, and when we process the first item, corresponding to slot 0,
   sequence number 22, we notice the dir item was created in a past
   transaction and its sequence number is greater than the value of
   *last_old_dentry_offset + 1 (20 + 1), so we decide to log again a
   BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and an end range
   of 21 (key.offset - 1 == 22 - 1 == 21), which results in an -EEXIST
   error from insert_dir_log_key(), as we have already inserted that
   key at step 2, triggering the assertion at process_dir_items_leaf().

The trace produced in dmesg is like the following:

assertion failed: ret != -EEXIST, in fs/btrfs/tree-log.c:3857
[198255.980839][ T7460] ------------[ cut here ]------------
[198255.981666][ T7460] kernel BUG at fs/btrfs/ctree.h:3617!
[198255.983141][ T7460] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI
[198255.984080][ T7460] CPU: 0 PID: 7460 Comm: repro-ghost-dir Not tainted 5.18.0-5314c78ac373-misc-next+
[198255.986027][ T7460] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[198255.988600][ T7460] RIP: 0010:assertfail.constprop.0+0x1c/0x1e
[198255.989465][ T7460] Code: 8b 4c 89 (...)
[198255.992599][ T7460] RSP: 0018:ffffc90007387188 EFLAGS: 00010282
[198255.993414][ T7460] RAX: 000000000000003d RBX: 0000000000000065 RCX: 0000000000000000
[198255.996056][ T7460] RDX: 0000000000000001 RSI: ffffffff8b62b180 RDI: fffff52000e70e24
[198255.997668][ T7460] RBP: ffffc90007387188 R08: 000000000000003d R09: ffff8881f0e16507
[198255.999199][ T7460] R10: ffffed103e1c2ca0 R11: 0000000000000001 R12: 00000000ffffffef
[198256.000683][ T7460] R13: ffff88813befc630 R14: ffff888116c16e70 R15: ffffc90007387358
[198256.007082][ T7460] FS:  00007fc7f7c24640(0000) GS:ffff8881f0c00000(0000) knlGS:0000000000000000
[198256.009939][ T7460] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[198256.014133][ T7460] CR2: 0000560bb16d0b78 CR3: 0000000140b34005 CR4: 0000000000170ef0
[198256.015239][ T7460] Call Trace:
[198256.015674][ T7460]  <TASK>
[198256.016313][ T7460]  log_dir_items.cold+0x16/0x2c
[198256.018858][ T7460]  ? replay_one_extent+0xbf0/0xbf0
[198256.025932][ T7460]  ? release_extent_buffer+0x1d2/0x270
[198256.029658][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.031114][ T7460]  ? lock_acquired+0xbe/0x660
[198256.032633][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.034386][ T7460]  ? lock_release+0xcf/0x8a0
[198256.036152][ T7460]  log_directory_changes+0xf9/0x170
[198256.036993][ T7460]  ? log_dir_items+0xba0/0xba0
[198256.037661][ T7460]  ? do_raw_write_unlock+0x7d/0xe0
[198256.038680][ T7460]  btrfs_log_inode+0x233b/0x26d0
[198256.041294][ T7460]  ? log_directory_changes+0x170/0x170
[198256.042864][ T7460]  ? btrfs_attach_transaction_barrier+0x60/0x60
[198256.045130][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.046568][ T7460]  ? lock_release+0xcf/0x8a0
[198256.047504][ T7460]  ? lock_downgrade+0x420/0x420
[198256.048712][ T7460]  ? ilookup5_nowait+0x81/0xa0
[198256.049747][ T7460]  ? lock_downgrade+0x420/0x420
[198256.050652][ T7460]  ? do_raw_spin_unlock+0xa9/0x100
[198256.051618][ T7460]  ? __might_resched+0x128/0x1c0
[198256.052511][ T7460]  ? __might_sleep+0x66/0xc0
[198256.053442][ T7460]  ? __kasan_check_read+0x11/0x20
[198256.054251][ T7460]  ? iget5_locked+0xbd/0x150
[198256.054986][ T7460]  ? run_delayed_iput_locked+0x110/0x110
[198256.055929][ T7460]  ? btrfs_iget+0xc7/0x150
[198256.056630][ T7460]  ? btrfs_orphan_cleanup+0x4a0/0x4a0
[198256.057502][ T7460]  ? free_extent_buffer+0x13/0x20
[198256.058322][ T7460]  btrfs_log_inode+0x2654/0x26d0
[198256.059137][ T7460]  ? log_directory_changes+0x170/0x170
[198256.060020][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.060930][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.061905][ T7460]  ? lock_contended+0x770/0x770
[198256.062682][ T7460]  ? btrfs_log_inode_parent+0xd04/0x1750
[198256.063582][ T7460]  ? lock_downgrade+0x420/0x420
[198256.064432][ T7460]  ? preempt_count_sub+0x18/0xc0
[198256.065550][ T7460]  ? __mutex_lock+0x580/0xdc0
[198256.066654][ T7460]  ? stack_trace_save+0x94/0xc0
[198256.068008][ T7460]  ? __kasan_check_write+0x14/0x20
[198256.072149][ T7460]  ? __mutex_unlock_slowpath+0x12a/0x430
[198256.073145][ T7460]  ? mutex_lock_io_nested+0xcd0/0xcd0
[198256.074341][ T7460]  ? wait_for_completion_io_timeout+0x20/0x20
[198256.075345][ T7460]  ? lock_downgrade+0x420/0x420
[198256.076142][ T7460]  ? lock_contended+0x770/0x770
[198256.076939][ T7460]  ? do_raw_spin_lock+0x1c0/0x1c0
[198256.078401][ T7460]  ? btrfs_sync_file+0x5e6/0xa40
[198256.080598][ T7460]  btrfs_log_inode_parent+0x523/0x1750
[198256.081991][ T7460]  ? wait_current_trans+0xc8/0x240
[198256.083320][ T7460]  ? lock_downgrade+0x420/0x420
[198256.085450][ T7460]  ? btrfs_end_log_trans+0x70/0x70
[198256.086362][ T7460]  ? rcu_read_lock_sched_held+0x16/0x80
[198256.087544][ T7460]  ? lock_release+0xcf/0x8a0
[198256.088305][ T7460]  ? lock_downgrade+0x420/0x420
[198256.090375][ T7460]  ? dget_parent+0x8e/0x300
[198256.093538][ T7460]  ? do_raw_spin_lock+0x1c0/0x1c0
[198256.094918][ T7460]  ? lock_downgrade+0x420/0x420
[198256.097815][ T7460]  ? do_raw_spin_unlock+0xa9/0x100
[198256.101822][ T7460]  ? dget_parent+0xb7/0x300
[198256.103345][ T7460]  btrfs_log_dentry_safe+0x48/0x60
[198256.105052][ T7460]  btrfs_sync_file+0x629/0xa40
[198256.106829][ T7460]  ? start_ordered_ops.constprop.0+0x120/0x120
[198256.109655][ T7460]  ? __fget_files+0x161/0x230
[198256.110760][ T7460]  vfs_fsync_range+0x6d/0x110
[198256.111923][ T7460]  ? start_ordered_ops.constprop.0+0x120/0x120
[198256.113556][ T7460]  __x64_sys_fsync+0x45/0x70
[198256.114323][ T7460]  do_syscall_64+0x5c/0xc0
[198256.115084][ T7460]  ? syscall_exit_to_user_mode+0x3b/0x50
[198256.116030][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.116768][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.117555][ T7460]  ? do_syscall_64+0x69/0xc0
[198256.118324][ T7460]  ? sysvec_call_function_single+0x57/0xc0
[198256.119308][ T7460]  ? asm_sysvec_call_function_single+0xa/0x20
[198256.120363][ T7460]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[198256.121334][ T7460] RIP: 0033:0x7fc7fe97b6ab
[198256.122067][ T7460] Code: 0f 05 48 (...)
[198256.125198][ T7460] RSP: 002b:00007fc7f7c23950 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[198256.126568][ T7460] RAX: ffffffffffffffda RBX: 00007fc7f7c239f0 RCX: 00007fc7fe97b6ab
[198256.127942][ T7460] RDX: 0000000000000002 RSI: 000056167536bcf0 RDI: 0000000000000004
[198256.129302][ T7460] RBP: 0000000000000004 R08: 0000000000000000 R09: 000000007ffffeb8
[198256.130670][ T7460] R10: 00000000000001ff R11: 0000000000000293 R12: 0000000000000001
[198256.132046][ T7460] R13: 0000561674ca8140 R14: 00007fc7f7c239d0 R15: 000056167536dab8
[198256.133403][ T7460]  </TASK>

Fix this by treating -EEXIST as expected at insert_dir_log_key() and have
it update the item with an end offset corresponding to the maximum between
the previously logged end offset and the new requested end offset. The end
offsets may be different due to dir index key deletions that happened as
part of unlink operations while we are logging a directory (triggered when
fsyncing some other inode parented by the directory) or during renames
which always attempt to log a single dir index deletion.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Link: https://lore.kernel.org/linux-btrfs/YmyefE9mc2xl5ZMz@hungrycats.org/
Fixes: 732d591a5d ("btrfs: stop copying old dir items when logging a directory")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Naohiro Aota ceb4f60830 btrfs: zoned: activate block group properly on unlimited active zone device
btrfs_zone_activate() checks if it activated all the underlying zones in
the loop. However, that check never hit on an unlimited activate zone
device (max_active_zones == 0).

Fortunately, it still works without ENOSPC because btrfs_zone_activate()
returns true in the end, even if block_group->zone_is_active == 0. But, it
is confusing to have non zone_is_active block group still usable for
allocation. Also, we are wasting CPU time to iterate the loop every time
btrfs_zone_activate() is called for the blog groups.

Since error case in the loop is handled by out_unlock, we can just set
zone_is_active and do the list stuff after the loop.

Fixes: f9a912a3c4 ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Naohiro Aota 549577127a btrfs: zoned: move non-changing condition check out of the loop
btrfs_zone_activate() checks if block_group->alloc_offset ==
block_group->zone_capacity every time it iterates the loop. But, it is
not depending on the index. Move out the check and do it only once.

Fixes: f9a912a3c4 ("btrfs: zoned: make zone activation multi stripe capable")
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Qu Wenruo 9f73f1aef9 btrfs: force v2 space cache usage for subpage mount
[BUG]
For a 4K sector sized btrfs with v1 cache enabled and only mounted on
systems with 4K page size, if it's mounted on subpage (64K page size)
systems, it can cause the following warning on v1 space cache:

 BTRFS error (device dm-1): csum mismatch on free space cache
 BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now

Although not a big deal, as kernel can rebuild it without problem, such
warning will bother end users, especially if they want to switch the
same btrfs seamlessly between different page sized systems.

[CAUSE]
V1 free space cache is still using fixed PAGE_SIZE for various bitmap,
like BITS_PER_BITMAP.

Such hard-coded PAGE_SIZE usage will cause various mismatch, from v1
cache size to checksum.

Thus kernel will always reject v1 cache with a different PAGE_SIZE with
csum mismatch.

[FIX]
Although we should fix v1 cache, it's already going to be marked
deprecated soon.

And we have v2 cache based on metadata (which is already fully subpage
compatible), and it has almost everything superior than v1 cache.

So just force subpage mount to use v2 cache on mount.

Reported-by: Matt Corallo <blnxfsl@bluematt.me>
CC: stable@vger.kernel.org # 5.15+
Link: https://lore.kernel.org/linux-btrfs/61aa27d1-30fc-c1a9-f0f4-9df544395ec3@bluematt.me/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-05-05 21:05:56 +02:00
Linus Torvalds 9050ba3a61 for-5.18-rc5-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmJwBoQACgkQxWXV+ddt
 WDu/FQ/+L2LpN5Zu1NkjOAh2Lcvz5RYZjcVext4bbPUW2yhXYH3e6836R/feLOCG
 RxRICOAQhJ7I6ct/N1aToI2AbjWnMSJK+IgageA1UdIS8McbcSP/qJOYwJ/+2Xhl
 AvK5psoj+qwGbTI9e0luNe6b+UWTGIMyYRjmN0SlkBOdg9/xqQpBMQxfKJumMvEc
 3ZwLpcNjhUUwdFKHvHZNCOQhZiZwloKFeq9MLaEL5LO30wKSY6ShiCA3pafFoVN0
 mvEEVtIGgZUsgeQTzSzD8UhGDvZtZ1+aaX0dcNMRzwI2h2pkBmPkd5QtFM9Qs0xP
 hGibSN9bC/SzQyE9v4cKohwS+g4dE4r+dUWFNpdZLIOpBt5PJBDA0tjcjxquFtMr
 6JX77coAl9kt0jspBmHVPb3qmIc1Xo3Iw2PrVgTK14QUo46XwF5Rga68wKOfNt0u
 LbD9+KCLnwxoOhvXh/LJX6nvPT8tuMrT/5AOXULI2oMCnpCY6Vl+kX8zFlAfbDk1
 d4/jy42bgHCso60j2vAcdQmZB+/snpboOhKJwkE2FqOTs+hBR8PBln6BqEt5xkHZ
 q2mBfYDujnmZWaiAU6+ETjOcCooWQioi3335tp/C9TdOrIkFDij1ztYRxhgP0g0X
 w6d6wlbkcol1Zubb/zEiBiwe+6GR/KtNYc4PBEfVe5Vw0npFreU=
 =7k9f
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "A few more fixes mostly around how some file attributes could be set.

   - fix handling of compression property:
      - don't allow setting it on anything else than regular file or
        directory
      - do not allow setting it on nodatacow files via properties

   - improved error handling when setting xattr

   - make sure symlinks are always properly logged"

* tag 'for-5.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: skip compression property for anything other than files and dirs
  btrfs: do not BUG_ON() on failure to update inode when setting xattr
  btrfs: always log symlinks in full mode
  btrfs: do not allow compression on nodatacow files
  btrfs: export a helper for compression hard check
2022-05-02 10:09:02 -07:00
Jens Axboe a196c78b54 io_uring: assign non-fixed early for async work
We defer file assignment to ensure that fixed files work with links
between a direct accept/open and the links that follow it. But this has
the side effect that normal file assignment is then not complete by the
time that request submission has been done.

For deferred execution, if the file is a regular file, assign it when
we do the async prep anyway.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-02 08:09:39 -06:00