Commit Graph

80907 Commits

Author SHA1 Message Date
Namjae Jeon 728f14c72b ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATION
If vfs objects = streams_xattr in ksmbd.conf FILE_NAMED_STREAMS should
be set to Attributes in FS_ATTRIBUTE_INFORMATION. MacOS client show
"Format: SMB (Unknown)" on faked NTFS and no streams support.

Cc: stable@vger.kernel.org
Reported-by: Miao Lihua <441884205@qq.com>
Tested-by: Miao Lihua <441884205@qq.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:32:50 -05:00
Namjae Jeon 7a891d4b62 ksmbd: fix wrong signingkey creation when encryption is AES256
MacOS and Win11 support AES256 encrytion and it is included in the cipher
array of encryption context. Especially on macOS, The most preferred
cipher is AES256. Connecting to ksmbd fails on newer MacOS clients that
support AES256 encryption. MacOS send disconnect request after receiving
final session setup response from ksmbd. Because final session setup is
signed with signing key was generated incorrectly.
For signging key, 'L' value should be initialized to 128 if key size is
16bytes.

Cc: stable@vger.kernel.org
Reported-by: Miao Lihua <441884205@qq.com>
Tested-by: Miao Lihua <441884205@qq.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:32:50 -05:00
Linus Torvalds a2eaf246f5 nfsd-6.3 fixes:
- Fix a crash during NFS READs from certain client implementations
 - Address a minor kbuild regression in v6.3
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQaEJgACgkQM2qzM29m
 f5ceYBAAiGHz+ISIIirg2+B5DttDL5ssPoqbgfZ16JZfBgaDduOg8oK7pbILv1nf
 4ufZQRcLpJDSnzhqNXxQE/L4Vsei8+tqkbwILSqFsCBZk5ei2AYSsZMu7Ptf8X8X
 Y+KZ55Bko0uTYAsglnPiEaDEGeq0ZYwfUc5/bnw7ZzXdAHhlwgjoAads6Q25A9f1
 6ZuKQ6nIis/ok9q88GdLf7x51IdKrXTrkC1iYAeVTXCgR3qXPezqmPOzW44rZs0t
 TMzUgDuyPgvB5eT/N+pwXoI9Bli9v1PFoXZjm2NchhJ/Iy6Gt1eEEB93J/qRSU/x
 ziJLkk0jnJEPWJ0Ht5li4+60b58t2OoWCTJTr3FzHCcF5Iqr7YjenSi2U8IT84ct
 EeFdqR2nZpJ4Y1wbAhTjQ2w6rw5WpaXDLHCRRdE7OF/D121n3FkSNo8DtSn+fbR8
 sFDpOruZn4tjgpf3AjCR1zbyKw9baC6clhnLrW6AE6T4ht9XwfE6eCUSy48FSMzg
 4IZKLcofQh2UEvlVtCvGOjtcK151eFKocsq+p6/yhYE6BM20Z3dBqhw9iBzqOAPg
 0qZk60AWl9W5h7M+ovkGAZpwXv4djsdWf7LbFuYOQ0kodzXOQ/5sFQm4zC5F3I1Z
 8WPPA9SkYyNiNSBUt65tca4JNw5h6x9A3scPWtuHB7Lz3fuGTnE=
 =frHb
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a crash during NFS READs from certain client implementations

 - Address a minor kbuild regression in v6.3

* tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: don't replace page in rq_pages if it's a continuation of last page
  NFS & NFSD: Update GSS dependencies
2023-03-21 14:48:38 -07:00
Linus Torvalds 17214b70a1 fsverity fixes for v6.3-rc4
Fix two significant performance issues with fsverity.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZBjIvRQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK2uuAP9I9h+KjU8cSpF0fS3vHhwmDtqc/vW9
 wfHniylcwK2YnwD/REgCv8yKlJva0+IUHMCGNWUzd0CERfuUFJy1Z7xvCgo=
 =rjIA
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity fixes from Eric Biggers:
 "Fix two significant performance issues with fsverity"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
  fsverity: Remove WQ_UNBOUND from fsverity read workqueue
2023-03-20 15:20:33 -07:00
Linus Torvalds 4f1e308df8 fscrypt fixes for v6.3-rc4
Fix a bug where when a filesystem was being unmounted, the fscrypt
 keyring was destroyed before inodes have been released by the Landlock
 LSM.  This bug was found by syzbot.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZBjHAxQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK1THAP91M4UXEsVTPw1QI03+72pPy0U4iArf
 j8jhQVL6vjXHnQEA4fmwg3tE5g9F5Qe5mjFOxvoa/Z9xqupPOybOR3z1zQI=
 =S80B
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt fix from Eric Biggers:
 "Fix a bug where when a filesystem was being unmounted, the fscrypt
  keyring was destroyed before inodes have been released by the Landlock
  LSM.

  This bug was found by syzbot"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()
  fscrypt: improve fscrypt_destroy_keyring() documentation
  fscrypt: destroy keyring after security_sb_delete()
2023-03-20 15:12:55 -07:00
Damien Le Moal 88b170088a zonefs: Fix error message in zonefs_file_dio_append()
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.

Fixes: a608da3bd7 ("zonefs: Detect append writes at invalid locations")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2023-03-21 06:36:43 +09:00
Damien Le Moal d7e673c2a9 zonefs: Prevent uninitialized symbol 'size' warning
In zonefs_file_dio_append(), initialize the variable size to 0 to
prevent compilation and static code analizers warning such as:

New smatch warnings:
fs/zonefs/file.c:441 zonefs_file_dio_append() error: uninitialized
symbol 'size'.

The warning is a false positive as size is never actually used
uninitialized.

No functional change.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202303191227.GL8Dprbi-lkp@intel.com/
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2023-03-21 06:36:34 +09:00
Linus Torvalds 3d3699bde4 NFS Client Bugfixes for Linux 6.3-rc
Bugfixes:
   * Fix /proc/PID/io read_bytes accounting
   * Fix setting NLM file_lock start and end during decoding testargs
   * Fix timing for setting access cache timestamps
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmQYnFoACgkQ18tUv7Cl
 QOv86RAAuDaOVfnPHuM1EMgTiBG+EZnCjMz+ktEuYhKuHdtS2cplq0S5Uygx1Zpl
 Kt9/Ww77k2jOPvsk3R1ejmFv1aiyNz+S9eEHrPp6nfwRhhQBw04fvo+y3mjrkNLq
 p4Y0Wes+9/0SnYftcsrcO/ghsc0t5DLm9eiGZMICes21cnrAO8iY1PY5DX2p7ipb
 O4qEnfCA67Nt+rY2T6lbMoI8kMyN+J4FaFCwPaMv8UDP/RqdG1KR429V46JwrKxR
 1FbG1nKH03U3apmaaGcRVr8meswPetQNWfJnmjsAHur6z8pCZ233oIggj4gnBIdy
 jGDWwQ3lT55+6XvkW4yrgGPUT2oDYiYwW4uk401I+xWJIbNvRMXKPIQWehHMTR4v
 0Xo8rhefPkgHrJkhaoBjvnMeKl79QwGUCSw5ECluxKm9nkDGeVa5kRBFCfkeWgfR
 dGJNUOqd6P3vVdGVvTEYP/q5mrDpr+4ZTyggm2IbP0nKNnYgwPoHVn+Prwsht+wh
 XGmXpu5C07U5fex7uzfmK/S8JWZDkEoNOZJQ8AE+NFm8MZFl8wUIfuyaZS4ItSEk
 odIGQ+PSMbEyfo24WhSk3BLt0cqzWxeweENUxP00PVOyqcYmj5duRP7OIVVHmm/w
 dDwh5tQfykrF7opNMoJBXXAGlwI9O0t0UwBvNasBUiINctrapOw=
 =n/JC
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

 - Fix /proc/PID/io read_bytes accounting

 - Fix setting NLM file_lock start and end during decoding testargs

 - Fix timing for setting access cache timestamps

* tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Correct timing for assigning access cache timestamp
  lockd: set file_lock start and end when decoding nlm4 testargs
  NFS: Fix /proc/PID/io read_bytes for buffered reads
2023-03-20 10:52:10 -07:00
Darrick J. Wong 3cfb9290da xfs: test dir/attr hash when loading module
Back in the 6.2-rc1 days, Eric Whitney reported a fstests regression in
ext4 against generic/454.  The cause of this test failure was the
unfortunate combination of setting an xattr name containing UTF8 encoded
emoji, an xattr hash function that accepted a char pointer with no
explicit signedness, signed type extension of those chars to an int, and
the 6.2 build tools maintainers deciding to mandate -funsigned-char
across the board.  As a result, the ondisk extended attribute structure
written out by 6.1 and 6.2 were not the same.

This discrepancy, in fact, had been noticeable if a filesystem with such
an xattr were moved between any two architectures that don't employ the
same signedness of a raw "char" declaration.  The only reason anyone
noticed is that x86 gcc defaults to signed, and no such -funsigned-char
update was made to e2fsprogs, so e2fsck immediately started reporting
data corruption.

After a day and a half of discussing how to handle this use case (xattrs
with bit 7 set anywhere in the name) without breaking existing users,
Linus merged his own patch and didn't tell the maintainer.  None of the
ext4 developers realized this until AUTOSEL announced that the commit
had been backported to stable.

In the end, this problem could have been detected much earlier if there
had been any useful tests of hash function(s) in use inside ext4 to make
sure that they always produce the same outputs given the same inputs.

The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's
vulnerable to this problem.  However, let's avoid all this drama by
adding our own self test to check that the da hash produces the same
outputs for a static pile of inputs on various platforms.  This enables
us to fix any breakage that may result in a controlled fashion.  The
buffer and test data are identical to the patches submitted to xfsprogs.

Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/
Link: https://lore.kernel.org/linux-xfs/ZBUKCRR7xvIqPrpX@destitution/T/#md38272cc684e2c0d61494435ccbb91f022e8dee4
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-19 09:55:49 -07:00
Darrick J. Wong e6fbb7167e xfs: add tracepoints for each of the externally visible allocators
There are now five separate space allocator interfaces exposed to the
rest of XFS for five different strategies to find space.  Add
tracepoints for each of them so that I can tell from a trace dump
exactly which ones got called and what happened underneath them.  Add a
sixth so it's more obvious if an allocation actually happened.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-19 09:55:49 -07:00
Darrick J. Wong 9eb775968b xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags
Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
want us to perform a non-blocking scan of the AGs for free space.  There
are no ordering constraints for non-blocking AGF lock acquisition, so
the scan can freely start over at AG 0 even when minimum_agno > 0.

This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
pointer patchset applied and the realtime volume enabled.  I observed
the following sequence as part of an xfs_dir_createname call:

0. Fragment the free space, then allocate nearly all the free space in
   all AGs except AG 0.

1. Create a directory in AG 2 and let it grow for a while.

2. Try to allocate 2 blocks to expand the dirent part of a directory.
   The space will be allocated out of AG 0, but the allocation will not
   be contiguous.  This (I think) activates the LOWMODE allocator.

3. The bmapi call decides to convert from extents to bmbt format and
   tries to allocate 1 block.  This allocation request calls
   xfs_alloc_vextent_start_ag with the inode number, which starts the
   scan at AG 2.  We ignore AG 0 (with all its free space) and instead
   scrape AG 2 and 3 for more space.  We find one block, but this now
   kicks t_highest_agno to 3.

4. The createname call decides it needs to split the dabtree.  It tries
   to allocate even more space with xfs_alloc_vextent_start_ag, but now
   we're constrained to AG 3, and we don't find the space.  The
   createname returns ENOSPC and the filesystem shuts down.

This change fixes the problem by making the trylock scan wrap around to
AG 0 if it doesn't like the AGs that it finds.  Since the current
transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
we take space from the AG that has plenty.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-19 09:55:48 -07:00
Linus Torvalds 995bba436b Fix a double unlock bug on an error path in ext4, found by smatch and
syzkaller.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmQWfYcACgkQ8vlZVpUN
 gaNvhAgAqIv3PLgt3SJIC/klXmEsXJn2q1EqAUKk6XHAzppHoUMrnsJJ2Sz+u4NT
 eKx7FGz+dIzJK8lLoX+XNnLgdkkJwklD8jIxAjbnkgHaGJbjdLH3Su4elQvGwLla
 CprPysjMtgny4rogLwz2D+At3w8o7/JACbzlh0QNhaZ5IIP4Kj5QutgyftF1m4Qa
 AfC9KcpJ+poTQTI4m4oOrJCrH6kwJdVItuS5/mk54DTChNNYtJJy2M6eho7IQ9iG
 ENBnt06/uWhdHQz4OYmYn3x1iXYOAdAmeVGtREKqOstUd31VO2TCC2lE+7tSnO2d
 177m3jfTHdv5znU+LfyrS1Y1u1nGBA==
 =AXq7
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
 "Fix a double unlock bug on an error path in ext4, found by smatch and
  syzkaller"

* tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix possible double unlock when moving a directory
2023-03-19 09:38:26 -07:00
Eric Biggers 4bcf6f827a fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()
It is a bug for fscrypt_put_master_key_activeref() to see a NULL
keyring.  But it used to be possible due to the bug, now fixed, where
fscrypt_destroy_keyring() was called before security_sb_delete().  To be
consistent with how fscrypt_destroy_keyring() uses WARN_ON for the same
issue, WARN and leak the fscrypt_master_key if the keyring is NULL
instead of dereferencing the NULL pointer.

This is a robustness improvement, not a fix.

Link: https://lore.kernel.org/r/20230313221231.272498-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-18 21:08:03 -07:00
Eric Biggers 43e5f1d592 fscrypt: improve fscrypt_destroy_keyring() documentation
Document that fscrypt_destroy_keyring() must be called after all
potentially-encrypted inodes have been evicted.

Link: https://lore.kernel.org/r/20230313221231.272498-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-18 21:08:00 -07:00
Theodore Ts'o 70e42feab2 ext4: fix possible double unlock when moving a directory
Fixes: 0813299c58 ("ext4: Fix possible corruption when moving a directory")
Link: https://lore.kernel.org/r/5efbe1b9-ad8b-4a4f-b422-24824d2b775c@kili.mountain
Reported-by: Dan Carpenter <error27@gmail.com>
Reported-by: syzbot+0c73d1d8b952c5f3d714@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-17 21:53:52 -04:00
Jeff Layton 27c934dd88 nfsd: don't replace page in rq_pages if it's a continuation of last page
The splice read calls nfsd_splice_actor to put the pages containing file
data into the svc_rqst->rq_pages array. It's possible however to get a
splice result that only has a partial page at the end, if (e.g.) the
filesystem hands back a short read that doesn't cover the whole page.

nfsd_splice_actor will plop the partial page into its rq_pages array and
return. Then later, when nfsd_splice_actor is called again, the
remainder of the page may end up being filled out. At this point,
nfsd_splice_actor will put the page into the array _again_ corrupting
the reply. If this is done enough times, rq_next_page will overrun the
array and corrupt the trailing fields -- the rq_respages and
rq_next_page pointers themselves.

If we've already added the page to the array in the last pass, don't add
it to the array a second time when dealing with a splice continuation.
This was originally handled properly in nfsd_splice_actor, but commit
91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()") removed the check
for it.

Fixes: 91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Dario Lesca <d.lesca@solinos.it>
Tested-by: David Critch <dcritch@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-17 18:18:15 -04:00
Shyam Prasad N 2f0e4f0342 cifs: check only tcon status on tcon related functions
We had a couple of checks for session in cifs_tree_connect
and cifs_mark_open_files_invalid, which were unnecessary.
And that was done with ses_lock. Changed that to tc_lock too.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-17 13:22:22 -05:00
Linus Torvalds 38e04b3e42 seven cifs/smb3 client fixes, all also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmQTfDQACgkQiiy9cAdy
 T1FXOQwAsqPc6ctQnl5V5UuZgfr0DZPurLgbrC35XgEYpHiopa4myZvl4Dc2DDTs
 f+TdLkd3fMBuopjOrO8RBN5JioruLeOLuTILlWwmcFqx9cXdM6aaMmGVAMU5JhhL
 ahgPjSgF2cft/zLBsiLb4N4T1WOctGRSpcAgX7AfQXqufBQ/auqMVldVcgPaMRD/
 Si9R9qem8BXHTR8ZgBXrqzUKova0CN9JQICoV8yp9crK//0npHvgHkIIdflgxLwJ
 IE6D73MYo+xMrzlmV867kRO8xHeq03Er19WF73X1cGsVLJtsmxdbeFd8fXFIrDwf
 4JRn5vvXiuHPIPqtMpdqcSU1/W1YP4R609cSketlpbqTWDvuGU1myN9tUqYYnYGv
 KyNSOQi+p4f/rC/8UdzCgbSAlt0wqoQJm/nV97uom5OWEjS3hgB8Tqh1akZPrwzc
 7tXmJ4lETdUFSZCApmCAqHRwoG7WvnchCFxqNG85PCTybVuIEKDMjMkBl19pD8d1
 ZfRHDulA
 =iLy+
 -----END PGP SIGNATURE-----

Merge tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Seven cifs/smb3 client fixes, all also for stable:

   - four DFS fixes

   - multichannel reconnect fix

   - fix smb1 stats for cancel command

   - fix for set file size error path"

* tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: use DFS root session instead of tcon ses
  cifs: return DFS root session id in DebugData
  cifs: fix use-after-free bug in refresh_cache_worker()
  cifs: set DFS root session in cifs_get_smb_ses()
  cifs: generate signkey for the channel that's reconnecting
  cifs: Fix smb2_set_path_size()
  cifs: Move the in_send statistic to __smb_send_rqst()
2023-03-16 15:06:16 -07:00
Darrick J. Wong 6de4b1ab47 xfs: try to idiot-proof the allocators
In porting his development branch to 6.3-rc1, yours truly has
repeatedly screwed up the args->pag being fed to the xfs_alloc_vextent*
functions.  Add some debugging assertions to test the preconditions
required of the callers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-16 09:32:18 -07:00
Eric Biggers a075bacde2 fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity.  It was
intended to solve a race condition where unverified pages might be left
in the pagecache.  But actually it doesn't solve it fully.

Since the incomplete solution for this race condition has too much
performance impact for it to be worth it, let's remove it for now.

Fixes: 3fda4c617e ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable@vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-15 22:50:41 -07:00
Naohiro Aota e15acc2588 btrfs: zoned: drop space_info->active_total_bytes
The space_info->active_total_bytes is no longer necessary as we now
count the region of newly allocated block group as zone_unusable. Drop
its usage.

Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:07 +01:00
Naohiro Aota fa2068d7e9 btrfs: zoned: count fresh BG region as zone unusable
The naming of space_info->active_total_bytes is misleading. It counts
not only active block groups but also full ones which are previously
active but now inactive. That confusion results in a bug not counting
the full BGs into active_total_bytes on mount time.

For a background, there are three kinds of block groups in terms of
activation.

  1. Block groups never activated
  2. Block groups currently active
  3. Block groups previously active and currently inactive (due to fully
     written or zone finish)

What we really wanted to exclude from "total_bytes" is the total size of
BGs #1. They seem empty and allocatable but since they are not activated,
we cannot rely on them to do the space reservation.

And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.

OTOH, BGs #3 can be counted in the "total", since they are already full
we cannot allocate from them anyway. For them, "total_bytes == used +
reserved + pinned + zone_unusable" should hold.

Tracking #2 and #3 as "active_total_bytes" (current implementation) is
confusing. And, tracking #1 and subtract that properly from "total_bytes"
every time you need space reservation is cumbersome.

Instead, we can count the whole region of a newly allocated block group as
zone_unusable. Then, once that block group is activated, release
[0 ..  zone_capacity] from the zone_unusable counters. With this, we can
eliminate the confusing ->active_total_bytes and the code will be common
among regular and the zoned mode. Also, no additional counter is needed
with this approach.

Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:07 +01:00
Josef Bacik df384da5a4 btrfs: use temporary variable for space_info in btrfs_update_block_group
We do

  cache->space_info->counter += num_bytes;

everywhere in here.  This is makes the lines longer than they need to
be, and will be especially noticeable when we add the active tracking in,
so add a temp variable for the space_info so this is cleaner.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Josef Bacik bf1f1fec27 btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
This flag only gets set when we're doing active zone tracking, and we're
going to need to use this flag for things related to this behavior.
Rename the flag to represent what it actually means for the file system
so it can be used in other ways and still make sense.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Naohiro Aota 9e1cdf0c35 btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
btrfs_can_activate_zone() returns true if at least one device has one zone
available for activation. This is OK for the single profile, but not OK for
DUP profile. We need two zones to create a DUP block group. Fix it by
properly handling the case with the profile flags.

Fixes: 265f7237dd ("btrfs: zoned: allow DUP on meta-data block groups")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Sweet Tea Dorminy 10a8857a1b btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
Commit 1ec49744ba ("btrfs: turn on -Wmaybe-uninitialized") exposed
that on SPARC and PA-RISC, gcc is unaware that fscrypt_setup_filename()
only returns negative error values or 0. This ultimately results in a
maybe-uninitialized warning in btrfs_lookup_dentry().

Change to only return negative error values or 0 from
fscrypt_setup_filename() at the relevant call site, and assert that no
positive error codes are returned (which would have wider implications
involving other users).

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Qu Wenruo 1c3ab6dfa0 btrfs: handle missing chunk mapping more gracefully
[BUG]
During my scrub rework, I did a stupid thing like this:

        bio->bi_iter.bi_sector = stripe->logical;
        btrfs_submit_bio(fs_info, bio, stripe->mirror_num);

Above bi_sector assignment is using logical address directly, which
lacks ">> SECTOR_SHIFT".

This results a read on a range which has no chunk mapping.

This results the following crash:

  BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
  assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387

Sure this is all my fault, but this shows a possible problem in real
world, that some bit flip in file extents/tree block can point to
unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
is not configured, cause invalid pointer access.

[PROBLEMS]
In the above call chain, we just don't handle the possible error from
btrfs_get_chunk_map() inside __btrfs_map_block().

[FIX]
The fix is straightforward, replace the ASSERT() with proper error
handling (callers handle errors already).

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:05 +01:00
Paulo Alcantara 6284e46bdd cifs: use DFS root session instead of tcon ses
Use DFS root session whenever possible to get new DFS referrals
otherwise we might end up with an IPC tcon (tcon->ses->tcon_ipc) that
doesn't respond to them.  It should be safe accessing
@ses->dfs_root_ses directly in cifs_inval_name_dfs_link_error() as it
has same lifetime as of @tcon.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 22:48:53 -05:00
Paulo Alcantara f446a63080 cifs: return DFS root session id in DebugData
Return the DFS root session id in /proc/fs/cifs/DebugData to make it
easier to track which IPC tcon was used to get new DFS referrals for a
specific connection, and aids in debugging.

A simple output of it would be

  Sessions:
  1) Address: 192.168.1.13 Uses: 1 Capability: 0x300067   Session Status: 1
  Security type: RawNTLMSSP  SessionId: 0xd80000000009
  User: 0 Cred User: 0
  DFS root session id: 0x128006c000035

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 21:43:23 -05:00
Paulo Alcantara 396935de14 cifs: fix use-after-free bug in refresh_cache_worker()
The UAF bug occurred because we were putting DFS root sessions in
cifs_umount() while DFS cache refresher was being executed.

Make DFS root sessions have same lifetime as DFS tcons so we can avoid
the use-after-free bug is DFS cache refresher and other places that
require IPCs to get new DFS referrals on.  Also, get rid of mount
group handling in DFS cache as we no longer need it.

This fixes below use-after-free bug catched by KASAN

[ 379.946955] BUG: KASAN: use-after-free in __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.947642] Read of size 8 at addr ffff888018f57030 by task kworker/u4:3/56
[ 379.948096]
[ 379.948208] CPU: 0 PID: 56 Comm: kworker/u4:3 Not tainted 6.2.0-rc7-lku #23
[ 379.948661] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[ 379.949368] Workqueue: cifs-dfscache refresh_cache_worker [cifs]
[ 379.949942] Call Trace:
[ 379.950113] <TASK>
[ 379.950260] dump_stack_lvl+0x50/0x67
[ 379.950510] print_report+0x16a/0x48e
[ 379.950759] ? __virt_addr_valid+0xd8/0x160
[ 379.951040] ? __phys_addr+0x41/0x80
[ 379.951285] kasan_report+0xdb/0x110
[ 379.951533] ? __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.952056] ? __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.952585] __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.953096] ? __pfx___refresh_tcon.isra.0+0x10/0x10 [cifs]
[ 379.953637] ? __pfx___mutex_lock+0x10/0x10
[ 379.953915] ? lock_release+0xb6/0x720
[ 379.954167] ? __pfx_lock_acquire+0x10/0x10
[ 379.954443] ? refresh_cache_worker+0x34e/0x6d0 [cifs]
[ 379.954960] ? __pfx_wb_workfn+0x10/0x10
[ 379.955239] refresh_cache_worker+0x4ad/0x6d0 [cifs]
[ 379.955755] ? __pfx_refresh_cache_worker+0x10/0x10 [cifs]
[ 379.956323] ? __pfx_lock_acquired+0x10/0x10
[ 379.956615] ? read_word_at_a_time+0xe/0x20
[ 379.956898] ? lockdep_hardirqs_on_prepare+0x12/0x220
[ 379.957235] process_one_work+0x535/0x990
[ 379.957509] ? __pfx_process_one_work+0x10/0x10
[ 379.957812] ? lock_acquired+0xb7/0x5f0
[ 379.958069] ? __list_add_valid+0x37/0xd0
[ 379.958341] ? __list_add_valid+0x37/0xd0
[ 379.958611] worker_thread+0x8e/0x630
[ 379.958861] ? __pfx_worker_thread+0x10/0x10
[ 379.959148] kthread+0x17d/0x1b0
[ 379.959369] ? __pfx_kthread+0x10/0x10
[ 379.959630] ret_from_fork+0x2c/0x50
[ 379.959879] </TASK>

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 21:07:44 -05:00
Paulo Alcantara b56bce502f cifs: set DFS root session in cifs_get_smb_ses()
Set the DFS root session pointer earlier when creating a new SMB
session to prevent racing with smb2_reconnect(), cifs_reconnect_tcon()
and DFS cache refresher.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 21:05:53 -05:00
Linus Torvalds 26e2878b3e Eleven hotfixes. Four of these are cc:stable and the remainder address
post-6.2 issues or aren't considered suitable for backporting.  Seven of
 these fixes are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZBEItgAKCRDdBJ7gKXxA
 jsF/AP0ToKnDwmZ1SJOGK3pFiVGVy7VSaq1THrnLQoC57l8jTAD+PReSZMNXaxhB
 8701hVQcxKAiu9wAvowSd+lOvpwHMwQ=
 =IANU
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Eleven hotfixes.

  Four of these are cc:stable and the remainder address post-6.2 issues
  or aren't considered suitable for backporting.

  Seven of these fixes are for MM"

* tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate()
  mm/damon/paddr: fix folio_size() call after folio_put() in damon_pa_young()
  ocfs2: fix data corruption after failed write
  migrate_pages: try migrate in batch asynchronously firstly
  migrate_pages: move split folios processing out of migrate_pages_batch()
  migrate_pages: fix deadlock in batched migration
  .mailmap: add Alexandre Ghiti personal email address
  mailmap: correct Dikshita Agarwal's Qualcomm email address
  mailmap: updates for Jarkko Sakkinen
  mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage
  mm: teach mincore_hugetlb about pte markers
2023-03-14 17:13:58 -07:00
Nathan Huckleberry f959325e6a fsverity: Remove WQ_UNBOUND from fsverity read workqueue
WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
is problematic for latency sensitive workloads, like I/O
post-processing.

Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
scheduler latency and improves app cold startup times by ~30ms.
WQ_UNBOUND was also removed from the dm-verity workqueue for the same
reason [1].

This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the runnable state.

Before
Total workqueue scheduler latency: 553800us
After
Total workqueue scheduler latency: 18962us

[1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Fixes: 8a1d0f9cac ("fs-verity: add data verification hooks for ->readpages()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-14 16:16:36 -07:00
Shyam Prasad N 05ce0448c3 cifs: generate signkey for the channel that's reconnecting
Before my changes to how multichannel reconnects work, the
primary channel was always used to do a non-binding session
setup. With my changes, that is not the case anymore.
Missed this place where channel at index 0 was forcibly
updated with the signing key.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 17:36:49 -05:00
Volker Lendecke 211baef0ea cifs: Fix smb2_set_path_size()
If cifs_get_writable_path() finds a writable file, smb2_compound_op()
must use that file's FID and not the COMPOUND_FID.

Cc: stable@vger.kernel.org
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 17:36:44 -05:00
Chengen Du 21fd9e8700 NFS: Correct timing for assigning access cache timestamp
When the user's login time is newer than the cache's timestamp,
the original entry in the RB-tree will be replaced by a new entry.
Currently, the timestamp is only set if the entry is not found in
the RB-tree, which can cause the timestamp to be undefined when
the entry exists. This may result in a significant increase in
ACCESS operations if the timestamp is set to zero.

Signed-off-by: Chengen Du <chengen.du@canonical.com>
Fixes: 0eb43812c0 ("NFS: Clear the file access cache upon login”)
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-14 15:19:44 -04:00
Jeff Layton 7ff84910c6 lockd: set file_lock start and end when decoding nlm4 testargs
Commit 6930bcbfb6 dropped the setting of the file_lock range when
decoding a nlm_lock off the wire. This causes the client side grant
callback to miss matching blocks and reject the lock, only to rerequest
it 30s later.

Add a helper function to set the file_lock range from the start and end
values that the protocol uses, and have the nlm_lock decoder call that to
set up the file_lock args properly.

Fixes: 6930bcbfb6 ("lockd: detect and reject lock arguments that overflow")
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org #6.0
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-14 14:00:55 -04:00
Dave Wysochanski 9c88ea00fe NFS: Fix /proc/PID/io read_bytes for buffered reads
Prior to commit 8786fde842 ("Convert NFS from readpages to
readahead"), nfs_readpages() used the old mm interface read_cache_pages()
which called task_io_account_read() for each NFS page read.  After
this commit, nfs_readpages() is converted to nfs_readahead(), which
now uses the new mm interface readahead_page().  The new interface
requires callers to call task_io_account_read() themselves.
In addition, to nfs_readahead() task_io_account_read() should also
be called from nfs_read_folio().

Fixes: 8786fde842 ("Convert NFS from readpages to readahead")
Link: https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-14 13:47:43 -04:00
Eric Biggers ccb820dc7d fscrypt: destroy keyring after security_sb_delete()
fscrypt_destroy_keyring() must be called after all potentially-encrypted
inodes were evicted; otherwise it cannot safely destroy the keyring.
Since inodes that are in-use by the Landlock LSM don't get evicted until
security_sb_delete(), this means that fscrypt_destroy_keyring() must be
called *after* security_sb_delete().

This fixes a WARN_ON followed by a NULL dereference, only possible if
Landlock was being used on encrypted files.

Fixes: d7e7b9af10 ("fscrypt: stop using keyrings subsystem for fscrypt_master_key")
Cc: stable@vger.kernel.org
Reported-by: syzbot+93e495f6a4f748827c88@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000044651705f6ca1e30@google.com
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230313221231.272498-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-14 10:30:30 -07:00
Linus Torvalds 2e545d69bd Fixes for 6.3-rc1:
* Fix a crash if mount time quotacheck fails when there are inodes
    queued for garbage collection.
  * Fix an off by one error when discarding folios after writeback
    failure.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZAYvQwAKCRBKO3ySh0YR
 pjMXAP9X9HozNYESlg/cMq6nY2XfbHIR2qvNOfopiRpWby5xQAEAqBiEhafIJ0A1
 mTt+0TqQxDsH+uxr/QEUm76Q7F3f1gE=
 =0zU0
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

 - Fix a crash if mount time quotacheck fails when there are inodes
   queued for garbage collection.

 - Fix an off by one error when discarding folios after writeback
   failure.

* tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix off-by-one-block in xfs_discard_folio()
  xfs: quotacheck failure can race with background inode inactivation
2023-03-12 09:47:08 -07:00
Linus Torvalds 3b11717f95 vfs.misc.v6.3-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZA2yXAAKCRCRxhvAZXjc
 opK7AP0fqkk75P1bRZL36iNOgCV0RDiSN/Ynk/oMYpsOyBndlAD7BKCEZFF2OKzP
 aeJrY0F+guwL67X+18X+yiLZrk2rag4=
 =2Wa/
 -----END PGP SIGNATURE-----

Merge tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs fixes from Christian Brauner:

 - When allocating pages for a watch queue failed, we didn't return an
   error causing userspace to proceed even though all subsequent
   notifcations would be lost. Make sure to return an error.

 - Fix a misformed tree entry for the idmapping maintainers entry.

 - When setting file leases from an idmapped mount via
   generic_setlease() we need to take the idmapping into account
   otherwise taking a lease would fail from an idmapped mount.

 - Remove two redundant assignments, one in splice code and the other in
   locks code, that static checkers complained about.

* tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  filelocks: use mount idmapping for setlease permission check
  fs/locks: Remove redundant assignment to cmd
  splice: Remove redundant assignment to ret
  MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS
  watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths
2023-03-12 09:00:54 -07:00
Linus Torvalds 40d0c0901e Bug fixes and regressions for ext4, the most serious of which is a
potential deadlock during directory renames that was introduced during
 the merge window discovered by a combination of syzbot and lockdep.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmQNVwIACgkQ8vlZVpUN
 gaMwmgf/ZAasXZEMV0zaQZa8zP4KvMKZjWe6azkcJg4sb/HG9Q7JzeJDCurhhWUj
 8+QnyUcuKTyWKYWjGf0f5CZaYEM5AZYij41UJzu2qMkz5hVXSqBVuY8KywxuiJv5
 kfuIvQh0Onv0Yrg2qAc52/kZkq1lu2sl/F5ertBWjdpTUXdBUdrCxkUk+1BgQWAj
 vNwi1/+gNuX7RxMboHqYmwXFP39vECd+wteNdsiK1hR8bLqL68duLLq8xQdHt4gS
 sbVmJKR4j2Giw4ZnlYi9RiwKIO0beqocanp+cfOPulyj5mTM8X1lr0uvaLZgx2AF
 lqrS3/5ksp45cRT70qCIz8je70hTSg==
 =nN3T
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Bug fixes and regressions for ext4, the most serious of which is a
  potential deadlock during directory renames that was introduced during
  the merge window discovered by a combination of syzbot and lockdep"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: zero i_disksize when initializing the bootloader inode
  ext4: make sure fs error flag setted before clear journal error
  ext4: commit super block if fs record error when journal record without error
  ext4, jbd2: add an optimized bmap for the journal inode
  ext4: fix WARNING in ext4_update_inline_data
  ext4: move where set the MAY_INLINE_DATA flag is set
  ext4: Fix deadlock during directory rename
  ext4: Fix comment about the 64BIT feature
  docs: ext4: modify the group desc size to 64
  ext4: fix another off-by-one fsmap error on 1k block filesystems
  ext4: fix RENAME_WHITEOUT handling for inline directories
  ext4: make kobj_type structures constant
  ext4: fix cgroup writeback accounting with fs-layer encryption
2023-03-12 08:55:55 -07:00
Zhihao Cheng f5361da1e6 ext4: zero i_disksize when initializing the bootloader inode
If the boot loader inode has never been used before, the
EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the
i_size to 0.  However, if the "never before used" boot loader has a
non-zero i_size, then i_disksize will be non-zero, and the
inconsistency between i_size and i_disksize can trigger a kernel
warning:

 WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
 RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
 Call Trace:
  vfs_write+0x3b1/0x5c0
  ksys_write+0x77/0x160
  __x64_sys_write+0x22/0x30
  do_syscall_64+0x39/0x80

Reproducer:
 1. create corrupted image and mount it:
       mke2fs -t ext4 /tmp/foo.img 200
       debugfs -wR "sif <5> size 25700" /tmp/foo.img
       mount -t ext4 /tmp/foo.img /mnt
       cd /mnt
       echo 123 > file
 2. Run the reproducer program:
       posix_memalign(&buf, 1024, 1024)
       fd = open("file", O_RDWR | O_DIRECT);
       ioctl(fd, EXT4_IOC_SWAP_BOOT);
       write(fd, buf, 1024);

Fix this by setting i_disksize as well as i_size to zero when
initiaizing the boot loader inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Ye Bin f57886ca16 ext4: make sure fs error flag setted before clear journal error
Now, jounral error number maybe cleared even though ext4_commit_super()
failed. This may lead to error flag miss, then fsck will miss to check
file system deeply.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-3-yebin@huaweicloud.com
2023-03-11 00:44:24 -05:00
Ye Bin eee00237fa ext4: commit super block if fs record error when journal record without error
Now, 'es->s_state' maybe covered by recover journal. And journal errno
maybe not recorded in journal sb as IO error. ext4_update_super() only
update error information when 'sbi->s_add_error_count' large than zero.
Then 'EXT4_ERROR_FS' flag maybe lost.
To solve above issue just recover 'es->s_state' error flag after journal
replay like error info.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-2-yebin@huaweicloud.com
2023-03-11 00:44:24 -05:00
Theodore Ts'o 62913ae96d ext4, jbd2: add an optimized bmap for the journal inode
The generic bmap() function exported by the VFS takes locks and does
checks that are not necessary for the journal inode.  So allow the
file system to set a journal-optimized bmap function in
journal->j_bmap.

Reported-by: syzbot+9543479984ae9e576000@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf0
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Ye Bin 2b96b4a5d9 ext4: fix WARNING in ext4_update_inline_data
Syzbot found the following issue:
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none.
fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-aesni"
fscrypt: AES-256-XTS using implementation "xts-aes-aesni"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5071 at mm/page_alloc.c:5525 __alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
Modules linked in:
CPU: 1 PID: 5071 Comm: syz-executor263 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:__alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
RSP: 0018:ffffc90003c2f1c0 EFLAGS: 00010246
RAX: ffffc90003c2f220 RBX: 0000000000000014 RCX: 0000000000000000
RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffffc90003c2f248
RBP: ffffc90003c2f2d8 R08: dffffc0000000000 R09: ffffc90003c2f220
R10: fffff52000785e49 R11: 1ffff92000785e44 R12: 0000000000040d40
R13: 1ffff92000785e40 R14: dffffc0000000000 R15: 1ffff92000785e3c
FS:  0000555556c0d300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f95d5e04138 CR3: 00000000793aa000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __alloc_pages_node include/linux/gfp.h:237 [inline]
 alloc_pages_node include/linux/gfp.h:260 [inline]
 __kmalloc_large_node+0x95/0x1e0 mm/slab_common.c:1113
 __do_kmalloc_node mm/slab_common.c:956 [inline]
 __kmalloc+0xfe/0x190 mm/slab_common.c:981
 kmalloc include/linux/slab.h:584 [inline]
 kzalloc include/linux/slab.h:720 [inline]
 ext4_update_inline_data+0x236/0x6b0 fs/ext4/inline.c:346
 ext4_update_inline_dir fs/ext4/inline.c:1115 [inline]
 ext4_try_add_inline_entry+0x328/0x990 fs/ext4/inline.c:1307
 ext4_add_entry+0x5a4/0xeb0 fs/ext4/namei.c:2385
 ext4_add_nondir+0x96/0x260 fs/ext4/namei.c:2772
 ext4_create+0x36c/0x560 fs/ext4/namei.c:2817
 lookup_open fs/namei.c:3413 [inline]
 open_last_lookups fs/namei.c:3481 [inline]
 path_openat+0x12ac/0x2dd0 fs/namei.c:3711
 do_filp_open+0x264/0x4f0 fs/namei.c:3741
 do_sys_openat2+0x124/0x4e0 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_openat fs/open.c:1342 [inline]
 __se_sys_openat fs/open.c:1337 [inline]
 __x64_sys_openat+0x243/0x290 fs/open.c:1337
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Above issue happens as follows:
ext4_iget
   ext4_find_inline_data_nolock ->i_inline_off=164 i_inline_size=60
ext4_try_add_inline_entry
   __ext4_mark_inode_dirty
      ext4_expand_extra_isize_ea ->i_extra_isize=32 s_want_extra_isize=44
         ext4_xattr_shift_entries
	 ->after shift i_inline_off is incorrect, actually is change to 176
ext4_try_add_inline_entry
  ext4_update_inline_dir
    get_max_inline_xattr_value_size
      if (EXT4_I(inode)->i_inline_off)
	entry = (struct ext4_xattr_entry *)((void *)raw_inode +
			EXT4_I(inode)->i_inline_off);
        free += EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size));
	->As entry is incorrect, then 'free' may be negative
   ext4_update_inline_data
      value = kzalloc(len, GFP_NOFS);
      -> len is unsigned int, maybe very large, then trigger warning when
         'kzalloc()'

To resolve the above issue we need to update 'i_inline_off' after
'ext4_xattr_shift_entries()'.  We do not need to set
EXT4_STATE_MAY_INLINE_DATA flag here, since ext4_mark_inode_dirty()
already sets this flag if needed.  Setting EXT4_STATE_MAY_INLINE_DATA
when it is needed may trigger a BUG_ON in ext4_writepages().

Reported-by: syzbot+d30838395804afc2fa6f@syzkaller.appspotmail.com
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307015253.2232062-3-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Ye Bin 1dcdce5919 ext4: move where set the MAY_INLINE_DATA flag is set
The only caller of ext4_find_inline_data_nolock() that needs setting of
EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode().  In
ext4_write_inline_data_end() we just need to update inode->i_inline_off.
Since we are going to add one more caller that does not need to set
EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
out to ext4_iget_extra_inode().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Linus Torvalds 4831f76247 pick_file() speculation fix + fix for alpha mis(merge,cherry-pick)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZAuQ7gAKCRBZ7Krx/gZQ
 6xSmAPsFPc3ykvOWwCl7eTGS65gHZpK80e5lX9kZB8KIa5JjaAEA551vgRWi34+D
 PWvDDpN1QUFL6HHL+FR7heLJr2SKIwA=
 =RsPH
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc fixes from Al Viro:
 "pick_file() speculation fix + fix for alpha mis(merge,cherry-pick)

  The fs/file.c one is a genuine missing speculation barrier in
  pick_file() (reachable e.g. via close(2)). The alpha one is strictly
  speaking not a bug fix, but only because confusion between
  preempt_enable() and preempt_disable() is harmless on architecture
  without CONFIG_PREEMPT.

  Looks like alpha.git picked the wrong version of patch - that braino
  used to be there in early versions, but it had been fixed quite a
  while ago..."

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: prevent out-of-bounds array speculation when closing a file descriptor
  alpha: fix lazy-FPU mis(merged/applied/whatnot)
2023-03-10 19:04:10 -08:00
Linus Torvalds 388a810192 Changes since last update:
- Fix LZMA decompression failure on HIGHMEM platforms;
 
  - Revert an inproper fix since it is actually an implementation
    issue of vmalloc();
 
  - Avoid a wrong DBG_BUGON since it could be triggered with -EINTR;
 
  - Minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZAtOhBEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBKamAQC4njqAPMt1SPJU1HYACnS8TuNIC0CO2eT6
 gU11ja+AZwEAwwyjucoEirD1xCDBlTOUD2fPm3W87RVNr2juiZg2OA8=
 =Dj3V
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.3-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "The most important one reverts an improper fix which can cause an
  unexpected warning more often on specific images, and another one
  fixes LZMA decompression on 32-bit platforms. The others are minor
  fixes and cleanups.

   - Fix LZMA decompression failure on HIGHMEM platforms

   - Revert an inproper fix since it is actually an implementation issue
     of vmalloc()

   - Avoid a wrong DBG_BUGON since it could be triggered with -EINTR

   - Minor cleanups"

* tag 'erofs-for-6.3-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: use wrapper i_blocksize() in erofs_file_read_iter()
  erofs: get rid of a useless DBG_BUGON
  erofs: Revert "erofs: fix kvcalloc() misuse with __GFP_NOFAIL"
  erofs: fix wrong kunmap when using LZMA on HIGHMEM platforms
  erofs: mark z_erofs_lzma_init/erofs_pcpubuf_init w/ __init
2023-03-10 08:51:57 -08:00
Linus Torvalds 92cadfcffa nfsd-6.3 fixes:
- Protect NFSD writes against filesystem freezing
 - Fix a potential memory leak during server shutdown
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQLPvQACgkQM2qzM29m
 f5crgRAAiuZvYynBppTxeEZ9ITLRJCgeYhTCJDBFngh3NZtW0UxrTUfYIry/MJTm
 j520mvdzx8+OVyasUJCMoy+dY/1toeCaCspfEdyul47JRVrxuzJokbYzyV5fFZav
 q3vDNHdBXOkrJkXvJd+VtbdZa2iYBMIQ1pTc6vG2FbUaxfNqZyg7uKug/R2Q/Ipj
 tF587jXQuMvpPwlXOeFol3fhLQCO9KjTrZF/d4+m5+KTSQ4sgzrgoWuVD/39Is4Q
 bVx5hBX74PFNHaFMqviIjtIEFjrOfe6wVjdNgbmpzbEiUm9VGZ6zJ2k+LsnawMIw
 8W3178Yjp+SEb21X+b+pe7/efIqD01SOX36zUOETL9gOvX3aMHbTuDGERQlCBUdZ
 9pcUwWfzeGzQSsnEvEi05HlnrDQpeavAc4tVYWgiEvO2S8cngXgsuoGhUO2ov8qY
 scq4PQGclpmR6IMYDJLL6JbOvfBDIMoFHGbDbdkjpkkhoRHgtQgnRebOgB20bxom
 D3XPk39U2CbVX9UJqgPN0yvA8L4biG4RPNZZNOjFtOup6oggpzwgN2M9h1ZhuulG
 bTOjHWl9NJmSNIYUDzOyH7m98UCI1pEXcH1kkV7vjjAIAPbn3ovXgzMjSBqDVvNj
 tyqGujS4snRb5/NI8g6O+VyTbMSnimu+tZpUvKH+4oSBH9ZAUVI=
 =z6NN
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Protect NFSD writes against filesystem freezing

 - Fix a potential memory leak during server shutdown

* tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix a server shutdown leak
  NFSD: Protect against filesystem freezing
2023-03-10 08:45:30 -08:00
Linus Torvalds ae195ca1a8 for-6.3-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmQKUxwACgkQxWXV+ddt
 WDtPMg//RHAnHYRm+sHkXfRhz/+kWhipPo1OskLE5aYZaP1MSpk0NfNc1c6ZYwcg
 FQNeNQOooqBIYFpLeery14vw/FpFc/tivw7OP4XmtH9Jeyj6mwgAQpP5Gho8jDmm
 u90jf2UMwA+7qo57e9qfioufiZPGMsNnmK1BwdrcbuUZIz5UEZZ6u6BVhVFnEDGa
 y08Uv03t9g5F7msXfh4iBaPeJRgdWL7kiZfhFyCa6OHKiGOT39hYXn0ov1pET/yG
 IMECrX+BKiunABExHDN9VbW1AVWGmsvGjFYpZQnAWCm37cr3Mc7ngIz1FBF8hm+L
 9Cd07GhBOPaKzFI+uAzVJrA0QkKnI8Wgd1YT3LWWT0qj5gpPA5YL4G0V4KLzPBOt
 TBe4dW7g4o4EXsYBJzYwiLjHILZyydkPKEQ78Bt2mwjdGs4PYNBGwyl0I2bV/pV+
 dKGv+KOsiX2euPFtwVaIG5u8gEBCCoiKSO+HwphtfWyxnEE5/uvw0fdSJlKNt1Yj
 28f+qyzN9WuNK/aSxI+KfW4PAXvkoLi7w8tjyJp3vpj6VnSmaFf2EtGiKtGSmLVn
 3uSY8WZ24FdOHNV5QaliABGt/SaLG0rbLC8uPocryh0aW9xkMpvVVYPfTJmyWmxy
 kc5dfDhUinp5I0wLTtjRH407bB0CdukgpxOrN6GELqPufm7YvQk=
 =rJlY
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "First batch of fixes. Among them there are two updates to sysfs and
  ioctl which are not strictly fixes but are used for testing so there's
  no reason to delay them.

   - fix block group item corruption after inserting new block group

   - fix extent map logging bit not cleared for split maps after
     dropping range

   - fix calculation of unusable block group space reporting bogus
     values due to 32/64b division

   - fix unnecessary increment of read error stat on write error

   - improve error handling in inode update

   - export per-device fsid in DEV_INFO ioctl to distinguish seeding
     devices, needed for testing

   - allocator size classes:
      - fix potential dead lock in size class loading logic
      - print sysfs stats for the allocation classes"

* tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix block group item corruption after inserting new block group
  btrfs: fix extent map logging bit not cleared for split maps after dropping range
  btrfs: fix percent calculation for bg reclaim message
  btrfs: fix unnecessary increment of read error stat on write error
  btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode
  btrfs: ioctl: return device fsid from DEV_INFO ioctl
  btrfs: fix potential dead lock in size class loading logic
  btrfs: sysfs: add size class stats
2023-03-10 08:39:13 -08:00
Chuck Lever e57d065277 NFS & NFSD: Update GSS dependencies
Geert reports that:
> On v6.2, "make ARCH=m68k defconfig" gives you
> CONFIG_RPCSEC_GSS_KRB5=m
> On v6.3, it became builtin, due to dropping the dependencies on
> the individual crypto modules.
>
> $ grep -E "CRYPTO_(MD5|DES|CBC|CTS|ECB|HMAC|SHA1|AES)" .config
> CONFIG_CRYPTO_AES=y
> CONFIG_CRYPTO_AES_TI=m
> CONFIG_CRYPTO_DES=m
> CONFIG_CRYPTO_CBC=m
> CONFIG_CRYPTO_CTS=m
> CONFIG_CRYPTO_ECB=m
> CONFIG_CRYPTO_HMAC=m
> CONFIG_CRYPTO_MD5=m
> CONFIG_CRYPTO_SHA1=m

This behavior is triggered by the "default y" in the definition of
RPCSEC_GSS.

The "default y" was added in 2010 by commit df486a2590 ("NFS: Fix
the selection of security flavours in Kconfig"). However,
svc_gss_principal was removed in 2012 by commit 03a4e1f6dd
("nfsd4: move principal name into svc_cred"), so the 2010 fix is
no longer necessary. We can safely change the NFS_V4 and NFSD_V4
dependencies back to RPCSEC_GSS_KRB5 to get the nicer v6.2
behavior back.

Selecting KRB5 symbolically represents the true requirement here:
that all spec-compliant NFSv4 implementations must have Kerberos
available to use.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: dfe9a12345 ("SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-10 09:38:47 -05:00
Theodore Ts'o 609d544414 fs: prevent out-of-bounds array speculation when closing a file descriptor
Google-Bug-Id: 114199369
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-03-09 22:46:21 -05:00
Seth Forshee 42d0c4bdf7
filelocks: use mount idmapping for setlease permission check
A user should be allowed to take out a lease via an idmapped mount if
the fsuid matches the mapped uid of the inode. generic_setlease() is
checking the unmapped inode uid, causing these operations to be denied.

Fix this by comparing against the mapped inode uid instead of the
unmapped uid.

Fixes: 9caccd4154 ("fs: introduce MOUNT_ATTR_IDMAP")
Cc: stable@vger.kernel.org
Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09 22:36:12 +01:00
Yue Hu 3993f4f456 erofs: use wrapper i_blocksize() in erofs_file_read_iter()
linux/fs.h has a wrapper for this operation.

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230306075527.1338-1-zbestahu@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 23:36:04 +08:00
Gao Xiang 9ff471800b erofs: get rid of a useless DBG_BUGON
`err` could be -EINTR and it should not be the case.  Actually such
DBG_BUGON is useless.

Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230309053148.9223-2-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 23:36:01 +08:00
Gao Xiang 647dd2c3f0 erofs: Revert "erofs: fix kvcalloc() misuse with __GFP_NOFAIL"
Let's revert commit 12724ba389 ("erofs: fix kvcalloc() misuse with
__GFP_NOFAIL") since kvmalloc() already supports __GFP_NOFAIL in commit
a421ef3030 ("mm: allow !GFP_KERNEL allocations for kvmalloc").  So
the original fix was wrong.

Actually there was some issue as [1] discussed, so before that mm fix
is landed, the warn could still happen but applying this commit first
will cause less.

[1] https://lore.kernel.org/r/20230305053035.1911-1-hsiangkao@linux.alibaba.com

Fixes: 12724ba389 ("erofs: fix kvcalloc() misuse with __GFP_NOFAIL")
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230309053148.9223-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 23:36:01 +08:00
Gao Xiang 8f121dfb15 erofs: fix wrong kunmap when using LZMA on HIGHMEM platforms
As the call trace shown, the root cause is kunmap incorrect pages:

 BUG: kernel NULL pointer dereference, address: 00000000
 CPU: 1 PID: 40 Comm: kworker/u5:0 Not tainted 6.2.0-rc5 #4
 Workqueue: erofs_worker z_erofs_decompressqueue_work
 EIP: z_erofs_lzma_decompress+0x34b/0x8ac
  z_erofs_decompress+0x12/0x14
  z_erofs_decompress_queue+0x7e7/0xb1c
  z_erofs_decompressqueue_work+0x32/0x60
  process_one_work+0x24b/0x4d8
  ? process_one_work+0x1a4/0x4d8
  worker_thread+0x14c/0x3fc
  kthread+0xe6/0x10c
  ? rescuer_thread+0x358/0x358
  ? kthread_complete_and_exit+0x18/0x18
  ret_from_fork+0x1c/0x28
 ---[ end trace 0000000000000000 ]---

The bug is trivial and should be fixed now.  It has no impact on
!HIGHMEM platforms.

Fixes: 622ceaddb7 ("erofs: lzma compression support")
Cc: <stable@vger.kernel.org> # 5.16+
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230305134455.88236-1-hsiangkao@linux.alibaba.com
2023-03-09 22:49:57 +08:00
Yangtao Li a279adedbb erofs: mark z_erofs_lzma_init/erofs_pcpubuf_init w/ __init
They are used during the erofs module init phase. Let's mark it as
__init like any other function.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230303063731.66760-1-frank.li@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 22:49:30 +08:00
Jiapeng Chong dc592190a5
fs/locks: Remove redundant assignment to cmd
Variable 'cmd' set but not used.

fs/locks.c:2428:3: warning: Value stored to 'cmd' is never read.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4439
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09 10:17:04 +01:00
Jiapeng Chong c3a4aec055
splice: Remove redundant assignment to ret
The variable ret belongs to redundant assignment and can be deleted.

fs/splice.c:940:2: warning: Value stored to 'ret' is never read.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4406
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09 10:10:31 +01:00
Jan Kara 3c92792da8 ext4: Fix deadlock during directory rename
As lockdep properly warns, we should not be locking i_rwsem while having
transactions started as the proper lock ordering used by all directory
handling operations is i_rwsem -> transaction start. Fix the lock
ordering by moving the locking of the directory earlier in
ext4_rename().

Reported-by: syzbot+9d16c39efb5fade84574@syzkaller.appspotmail.com
Fixes: 0813299c58 ("ext4: Fix possible corruption when moving a directory")
Link: https://syzkaller.appspot.com/bug?extid=9d16c39efb5fade84574
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230301141004.15087-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 21:45:38 -05:00
Tudor Ambarus 7fc1f5c28a ext4: Fix comment about the 64BIT feature
64BIT is part of the incompatible feature set, update the comment
accordingly.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20230301133842.671821-1-tudor.ambarus@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:27:54 -05:00
Darrick J. Wong c993799baf ext4: fix another off-by-one fsmap error on 1k block filesystems
Apparently syzbot figured out that issuing this FSMAP call:

struct fsmap_head cmd = {
	.fmh_count	= ...;
	.fmh_keys	= {
		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
	},
...
};
ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd);

Produces this crash if the underlying filesystem is a 1k-block ext4
filesystem:

kernel BUG at fs/ext4/ext4.h:3331!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G        W  O       6.2.0-rc8-achx
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4]
RSP: 0018:ffffc90007c03998 EFLAGS: 00010246
RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000
RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11
RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400
R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001
R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398
FS:  00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0
Call Trace:
 <TASK>
 ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 __ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 __x64_sys_ioctl+0x82/0xa0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fdf20558aff
RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff
RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003
RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010
R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000

For GETFSMAP calls, the caller selects a physical block device by
writing its block number into fsmap_head.fmh_keys[01].fmr_device.
To query mappings for a subrange of the device, the starting byte of the
range is written to fsmap_head.fmh_keys[0].fmr_physical and the last
byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical.

IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd
set the inputs as follows:

	fmh_keys[0] = { .fmr_device = major(8, 0), .fmr_physical = 3},
	fmh_keys[1] = { .fmr_device = major(8, 0), .fmr_physical = 14},

Which would return you whatever is mapped in the 12 bytes starting at
physical offset 3.

The crash is due to insufficient range validation of keys[1] in
ext4_getfsmap_datadev.  On 1k-block filesystems, block 0 is not part of
the filesystem, which means that s_first_data_block is nonzero.
ext4_get_group_no_and_offset subtracts this quantity from the blocknr
argument before cracking it into a group number and a block number
within a group.  IOWs, block group 0 spans blocks 1-8192 (1-based)
instead of 0-8191 (0-based) like what happens with larger blocksizes.

The net result of this encoding is that blocknr < s_first_data_block is
not a valid input to this function.  The end_fsb variable is set from
the keys that are copied from userspace, which means that in the above
example, its value is zero.  That leads to an underflow here:

	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);

The division then operates on -1:

	offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
		EXT4_SB(sb)->s_cluster_bits;

Leaving an impossibly large group number (2^32-1) in blocknr.
ext4_getfsmap_check_keys checked that keys[0].fmr_physical and
keys[1].fmr_physical are in increasing order, but
ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least
s_first_data_block.  This implies that we have to check it again after
the adjustment, which is the piece that I forgot.

Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
Fixes: 4a4956249d ("ext4: fix off-by-one fsmap error on 1k block filesystems")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Cc: stable@vger.kernel.org
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:20:48 -05:00
Eric Whitney c9f62c8b2d ext4: fix RENAME_WHITEOUT handling for inline directories
A significant number of xfstests can cause ext4 to log one or more
warning messages when they are run on a test file system where the
inline_data feature has been enabled.  An example:

"EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode
 #16385: comm fsstress: No space for directory leaf checksum. Please
run e2fsck -D."

The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068,
070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585,
589, 626, 631, and 650.

In this situation, the warning message indicates a bug in the code that
performs the RENAME_WHITEOUT operation on a directory entry that has
been stored inline.  It doesn't detect that the directory is stored
inline, and incorrectly attempts to compute a dirent block checksum on
the whiteout inode when creating it.  This attempt fails as a result
of the integrity checking in get_dirent_tail (usually due to a failure
to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message
is then emitted.

Fix this by simply collecting the inlined data state at the time the
search for the source directory entry is performed.  Existing code
handles the rest, and this is sufficient to eliminate all spurious
warning messages produced by the tests above.  Go one step further
and do the same in the code that resets the source directory entry in
the event of failure.  The inlined state should be present in the
"old" struct, but given the possibility of a race there's no harm
in taking a conservative approach and getting that information again
since the directory entry is being reread anyway.

Fixes: b7ff91fd03 ("ext4: find old entry again if failed to rename whiteout")
Cc: stable@kernel.org
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230210173244.679890-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:20:48 -05:00
Thomas Weißschuh 60db00e0a1 ext4: make kobj_type structures constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230209-kobj_type-ext4-v1-1-6865fb05c1f8@weissschuh.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:20:48 -05:00
Eric Biggers ffec85d53d ext4: fix cgroup writeback accounting with fs-layer encryption
When writing a page from an encrypted file that is using
filesystem-layer encryption (not inline encryption), ext4 encrypts the
pagecache page into a bounce page, then writes the bounce page.

It also passes the bounce page to wbc_account_cgroup_owner().  That's
incorrect, because the bounce page is a newly allocated temporary page
that doesn't have the memory cgroup of the original pagecache page.
This makes wbc_account_cgroup_owner() not account the I/O to the owner
of the pagecache page as it should.

Fix this by always passing the pagecache page to
wbc_account_cgroup_owner().

Fixes: 001e4a8775 ("ext4: implement cgroup writeback support")
Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203005503.141557-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:12:30 -05:00
Jan Kara via Ocfs2-devel 90410bcf87 ocfs2: fix data corruption after failed write
When buffered write fails to copy data into underlying page cache page,
ocfs2_write_end_nolock() just zeroes out and dirties the page.  This can
leave dirty page beyond EOF and if page writeback tries to write this page
before write succeeds and expands i_size, page gets into inconsistent
state where page dirty bit is clear but buffer dirty bits stay set
resulting in page data never getting written and so data copied to the
page is lost.  Fix the problem by invalidating page beyond EOF after
failed write.

Link: https://lkml.kernel.org/r/20230302153843.18499-1-jack@suse.cz
Fixes: 6dbf7bb555 ("fs: Don't invalidate page buffers in block_write_full_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-07 17:04:55 -08:00
Filipe Manana 675dfe1223 btrfs: fix block group item corruption after inserting new block group
We can often end up inserting a block group item, for a new block group,
with a wrong value for the used bytes field.

This happens if for the new allocated block group, in the same transaction
that created the block group, we have tasks allocating extents from it as
well as tasks removing extents from it.

For example:

1) Task A creates a metadata block group X;

2) Two extents are allocated from block group X, so its "used" field is
   updated to 32K, and its "commit_used" field remains as 0;

3) Transaction commit starts, by some task B, and it enters
   btrfs_start_dirty_block_groups(). There it tries to update the block
   group item for block group X, which currently has its "used" field with
   a value of 32K. But that fails since the block group item was not yet
   inserted, and so on failure update_block_group_item() sets the
   "commit_used" field of the block group back to 0;

4) The block group item is inserted by task A, when for example
   btrfs_create_pending_block_groups() is called when releasing its
   transaction handle. This results in insert_block_group_item() inserting
   the block group item in the extent tree (or block group tree), with a
   "used" field having a value of 32K, but without updating the
   "commit_used" field in the block group, which remains with value of 0;

5) The two extents are freed from block X, so its "used" field changes
   from 32K to 0;

6) The transaction commit by task B continues, it enters
   btrfs_write_dirty_block_groups() which calls update_block_group_item()
   for block group X, and there it decides to skip the block group item
   update, because "used" has a value of 0 and "commit_used" has a value
   of 0 too.

   As a result, we end up with a block item having a 32K "used" field but
   no extents allocated from it.

When this issue happens, a btrfs check reports an error like this:

   [1/7] checking root items
   [2/7] checking extents
   block group [1104150528 1073741824] used 39796736 but extent items used 0
   ERROR: errors found in extent allocation tree or chunk allocation
   (...)

Fix this by making insert_block_group_item() update the block group's
"commit_used" field.

Fixes: 7248e0cebb ("btrfs: skip update of block group item if used bytes are the same")
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-08 01:14:01 +01:00
Chuck Lever fd9a2e1d51 NFSD: Protect against filesystem freezing
Flole observes this WARNING on occasion:

[1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0

Reported-by: <flole@flole.de>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123
Fixes: 73da852e38 ("nfsd: use vfs_iter_read/write")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-07 09:33:42 -05:00
Filipe Manana e4cc1483f3 btrfs: fix extent map logging bit not cleared for split maps after dropping range
At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING
bit on a 'flags' variable that was not initialized. This makes static
checkers complain about it, so initialize the 'flags' variable before
clearing the bit.

In practice this has no consequences, because EXTENT_FLAG_LOGGING should
not be set when btrfs_drop_extent_map_range() is called, as an fsync locks
the inode in exclusive mode, locks the inode's mmap semaphore in exclusive
mode too and it always flushes all delalloc.

Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the
flags of the split extent map.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/
Fixes: db21370bff ("btrfs: drop extent map range more efficiently")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Johannes Thumshirn 95cd356ca2 btrfs: fix percent calculation for bg reclaim message
We have a report, that the info message for block-group reclaim is
crossing the 100% used mark.

This is happening as we were truncating the divisor for the division
(the block_group->length) to a 32bit value.

Fix this by using div64_u64() to not truncate the divisor.

In the worst case, it can lead to a div by zero error and should be
possible to trigger on 4 disks RAID0, and each device is large enough:

  $ mkfs.btrfs  -f /dev/test/scratch[1234] -m raid1 -d raid0
  btrfs-progs v6.1
  [...]
  Filesystem size:    40.00GiB
  Block group profiles:
    Data:             RAID0             4.00GiB <<<
    Metadata:         RAID1           256.00MiB
    System:           RAID1             8.00MiB

Reported-by: Forza <forza@tnonline.net>
Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/
Fixes: 5f93e776c6 ("btrfs: zoned: print unusable percentage when reclaiming block groups")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add Qu's note ]
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Naohiro Aota 98e8d36a26 btrfs: fix unnecessary increment of read error stat on write error
Current btrfs_log_dev_io_error() increases the read error count even if the
erroneous IO is a WRITE request. This is because it forget to use "else
if", and all the error WRITE requests counts as READ error as there is (of
course) no REQ_RAHEAD bit set.

Fixes: c3a62baf21 ("btrfs: use chained bios when cloning")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
void0red c06016a02a btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode
Even if the slot is already read out, we may still need to re-balance
the tree, thus it can cause error in that btrfs_del_item() call and we
need to handle it properly.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: void0red <void0red@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Qu Wenruo 2943868a90 btrfs: ioctl: return device fsid from DEV_INFO ioctl
Currently user space utilizes dev info ioctl to grab the info of a
certain devid, this includes its device uuid.  But the returned info is
not enough to determine if a device is a seed.

Commit a26d60dedf ("btrfs: sysfs: add devinfo/fsid to retrieve actual
fsid from the device") exports the same value in sysfs so this is for
parity with ioctl.  Add a new member, fsid, into
btrfs_ioctl_dev_info_args, and populate the member with fsid value.

This should not cause any compatibility problem, following the
combinations:

- Old user space, old kernel
- Old user space, new kernel
  User space tool won't even check the new member.

- New user space, old kernel
  The kernel won't touch the new member, and user space tool should
  zero out its argument, thus the new member is all zero.

  User space tool can then know the kernel doesn't support this fsid
  reporting, and falls back to whatever they can.

- New user space, new kernel
  Go as planned.

  Would find the fsid member is no longer zero, and trust its value.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Boris Burkov 12148367d7 btrfs: fix potential dead lock in size class loading logic
As reported by Filipe, there's a potential deadlock caused by
using btrfs_search_forward on commit_root. The locking there is
unconditional, even if ->skip_locking and ->search_commit_root is set.
It's not meant to be used for commit roots, so it always needs to do
locking.

So if another task is COWing a child node of the same root node and
then needs to wait for block group caching to complete when trying to
allocate a metadata extent, it deadlocks.

For example:

[539604.239315] sysrq: Show Blocked State
[539604.240133] task:kworker/u16:6   state:D stack:0     pid:2119594 ppid:2      flags:0x00004000
[539604.241613] Workqueue: btrfs-cache btrfs_work_helper [btrfs]
[539604.242673] Call Trace:
[539604.243129]  <TASK>
[539604.243925]  __schedule+0x41d/0xee0
[539604.244797]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.245399]  ? rwsem_down_read_slowpath+0x185/0x490
[539604.246111]  schedule+0x5d/0xf0
[539604.246593]  rwsem_down_read_slowpath+0x2da/0x490
[539604.247290]  ? rcu_barrier_tasks_trace+0x10/0x20
[539604.248090]  __down_read_common+0x3d/0x150
[539604.248702]  down_read_nested+0xc3/0x140
[539604.249280]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
[539604.250097]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
[539604.250915]  btrfs_search_forward+0x59/0x460 [btrfs]
[539604.251781]  ? btrfs_global_root+0x50/0x70 [btrfs]
[539604.252476]  caching_thread+0x1be/0x920 [btrfs]
[539604.253167]  btrfs_work_helper+0xf6/0x400 [btrfs]
[539604.253848]  process_one_work+0x24f/0x5a0
[539604.254476]  worker_thread+0x52/0x3b0
[539604.255166]  ? __pfx_worker_thread+0x10/0x10
[539604.256047]  kthread+0xf0/0x120
[539604.256591]  ? __pfx_kthread+0x10/0x10
[539604.257212]  ret_from_fork+0x29/0x50
[539604.257822]  </TASK>
[539604.258233] task:btrfs-transacti state:D stack:0     pid:2236474 ppid:2      flags:0x00004000
[539604.259802] Call Trace:
[539604.260243]  <TASK>
[539604.260615]  __schedule+0x41d/0xee0
[539604.261205]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.262000]  ? rwsem_down_read_slowpath+0x185/0x490
[539604.262822]  schedule+0x5d/0xf0
[539604.263374]  rwsem_down_read_slowpath+0x2da/0x490
[539604.266228]  ? lock_acquire+0x160/0x310
[539604.266917]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.267996]  ? lock_contended+0x19e/0x500
[539604.268720]  __down_read_common+0x3d/0x150
[539604.269400]  down_read_nested+0xc3/0x140
[539604.270057]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
[539604.271129]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
[539604.272372]  btrfs_search_slot+0x143/0xf70 [btrfs]
[539604.273295]  update_block_group_item+0x9e/0x190 [btrfs]
[539604.274282]  btrfs_start_dirty_block_groups+0x1c4/0x4f0 [btrfs]
[539604.275381]  ? __mutex_unlock_slowpath+0x45/0x280
[539604.276390]  btrfs_commit_transaction+0xee/0xed0 [btrfs]
[539604.277391]  ? lock_acquire+0x1a4/0x310
[539604.278080]  ? start_transaction+0xcb/0x6c0 [btrfs]
[539604.279099]  transaction_kthread+0x142/0x1c0 [btrfs]
[539604.279996]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[539604.280673]  kthread+0xf0/0x120
[539604.281050]  ? __pfx_kthread+0x10/0x10
[539604.281496]  ret_from_fork+0x29/0x50
[539604.281966]  </TASK>
[539604.282255] task:fsstress        state:D stack:0     pid:2236483 ppid:1      flags:0x00004006
[539604.283897] Call Trace:
[539604.284700]  <TASK>
[539604.285088]  __schedule+0x41d/0xee0
[539604.285660]  schedule+0x5d/0xf0
[539604.286175]  btrfs_wait_block_group_cache_progress+0xf2/0x170 [btrfs]
[539604.287342]  ? __pfx_autoremove_wake_function+0x10/0x10
[539604.288450]  find_free_extent+0xd93/0x1750 [btrfs]
[539604.289256]  ? _raw_spin_unlock+0x29/0x50
[539604.289911]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
[539604.290843]  btrfs_reserve_extent+0x147/0x290 [btrfs]
[539604.291943]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
[539604.292903]  __btrfs_cow_block+0x138/0x580 [btrfs]
[539604.293773]  btrfs_cow_block+0x10e/0x240 [btrfs]
[539604.294595]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
[539604.295585]  btrfs_update_device+0x71/0x1b0 [btrfs]
[539604.296459]  btrfs_chunk_alloc_add_chunk_item+0xe0/0x340 [btrfs]
[539604.297489]  btrfs_chunk_alloc+0x1bf/0x490 [btrfs]
[539604.298335]  find_free_extent+0x6fa/0x1750 [btrfs]
[539604.299174]  ? _raw_spin_unlock+0x29/0x50
[539604.299950]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
[539604.300918]  btrfs_reserve_extent+0x147/0x290 [btrfs]
[539604.301797]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
[539604.303017]  ? lock_release+0x224/0x4a0
[539604.303855]  __btrfs_cow_block+0x138/0x580 [btrfs]
[539604.304789]  btrfs_cow_block+0x10e/0x240 [btrfs]
[539604.305611]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
[539604.306682]  ? btrfs_global_root+0x50/0x70 [btrfs]
[539604.308198]  lookup_inline_extent_backref+0x17b/0x7a0 [btrfs]
[539604.309254]  lookup_extent_backref+0x43/0xd0 [btrfs]
[539604.310122]  __btrfs_free_extent+0xf8/0x810 [btrfs]
[539604.310874]  ? lock_release+0x224/0x4a0
[539604.311724]  ? btrfs_merge_delayed_refs+0x17b/0x1d0 [btrfs]
[539604.313023]  __btrfs_run_delayed_refs+0x2ba/0x1260 [btrfs]
[539604.314271]  btrfs_run_delayed_refs+0x8f/0x1c0 [btrfs]
[539604.315445]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.316706]  btrfs_commit_transaction+0xa2/0xed0 [btrfs]
[539604.317855]  ? do_raw_spin_unlock+0x4b/0xa0
[539604.318544]  ? _raw_spin_unlock+0x29/0x50
[539604.319240]  create_subvol+0x53d/0x6e0 [btrfs]
[539604.320283]  btrfs_mksubvol+0x4f5/0x590 [btrfs]
[539604.321220]  __btrfs_ioctl_snap_create+0x11b/0x180 [btrfs]
[539604.322307]  btrfs_ioctl_snap_create_v2+0xc6/0x150 [btrfs]
[539604.323295]  btrfs_ioctl+0x9f7/0x33e0 [btrfs]
[539604.324331]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.325137]  ? lock_release+0x224/0x4a0
[539604.325808]  ? __x64_sys_ioctl+0x87/0xc0
[539604.326467]  __x64_sys_ioctl+0x87/0xc0
[539604.327109]  do_syscall_64+0x38/0x90
[539604.327875]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[539604.328792] RIP: 0033:0x7f05a7babaeb

This needs to use regular btrfs_search_slot() with some skip and stop
logic.

Since we only consider five samples (five search slots), don't bother
with the complexity of looking for commit_root_sem contention. If
necessary, it can be added to the load function in between samples.

Reported-by: Filipe Manana <fdmanana@kernel.org>
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H7eKMD44Z1+=Kb-1RFMMeZpAm2fwyO59yeBwCcSOU80Pg@mail.gmail.com/
Fixes: c7eec3d9aa ("btrfs: load block group size class when caching")
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Jan Kara 63bceed808 udf: Warn if block mapping is done for in-ICB files
Now that address space operations are merge dfor in-ICB and normal
files, it is more likely some code mistakenly tries to map blocks for
in-ICB files. WARN and return error instead of silently returning
garbage.

Signed-off-by: Jan Kara <jack@suse.cz>
2023-03-06 16:38:25 +01:00
Jan Kara cecb1f0654 udf: Fix reading of in-ICB files
After merging address space operations of normal and in-ICB files,
readahead could get called for in-ICB files which resulted in
udf_get_block() being called for these files. udf_get_block() is not
prepared to be called for in-ICB files and ends up returning garbage
results as it interprets file data as extent list. Fix the problem by
skipping readahead for in-ICB files.

Fixes: 37a8a39f7a ("udf: Switch to single address_space_operations")
Signed-off-by: Jan Kara <jack@suse.cz>
2023-03-06 16:38:25 +01:00
Jan Kara 49854d3ccc udf: Fix lost writes in udf_adinicb_writepage()
The patch converting udf_adinicb_writepage() to avoid manually kmapping
the page used memcpy_to_page() however that copies in the wrong
direction (effectively overwriting file data with the old contents).
What we should be using is memcpy_from_page() to copy data from the page
into the inode and then mark inode dirty to store the data.

Fixes: 5cfc45321a ("udf: Convert udf_adinicb_writepage() to memcpy_to_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
2023-03-06 16:38:25 +01:00
Zhang Xiaoxu d0dc411199 cifs: Move the in_send statistic to __smb_send_rqst()
When send SMB_COM_NT_CANCEL and RFC1002_SESSION_REQUEST, the
in_send statistic was lost.

Let's move the in_send statistic to the send function to avoid
this scenario.

Fixes: 7ee1af765d ("[CIFS]")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-05 17:50:38 -06:00
Dave Chinner 8ac5b996bf xfs: fix off-by-one-block in xfs_discard_folio()
The recent writeback corruption fixes changed the code in
xfs_discard_folio() to calculate a byte range to for punching
delalloc extents. A mistake was made in using round_up(pos) for the
end offset, because when pos points at the first byte of a block, it
does not get rounded up to point to the end byte of the block. hence
the punch range is short, and this leads to unexpected behaviour in
certain cases in xfs_bmap_punch_delalloc_range.

e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
there is no previous extent and it rounds up the punch to the end of
the delalloc extent it found at offset 0, not the end of the range
given to xfs_bmap_punch_delalloc_range().

Fix this by handling the zero block offset case correctly.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217030
Link: https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/
Fixes: 7348b32233 ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Found-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-05 15:13:23 -08:00
Dave Chinner 0c7273e494 xfs: quotacheck failure can race with background inode inactivation
The background inode inactivation can attached dquots to inodes, but
this can race with a foreground quotacheck failure that leads to
disabling quotas and freeing the mp->m_quotainfo structure. The
background inode inactivation then tries to allocate a quota, tries
to dereference mp->m_quotainfo, and crashes like so:

XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas.
xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff)
BUG: kernel NULL pointer dereference, address: 00000000000002a8
....
CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker
RIP: 0010:xfs_dquot_alloc+0x95/0x1e0
....
Call Trace:
 <TASK>
 xfs_qm_dqread+0x46/0x440
 xfs_qm_dqget_inode+0x154/0x500
 xfs_qm_dqattach_one+0x142/0x3c0
 xfs_qm_dqattach_locked+0x14a/0x170
 xfs_qm_dqattach+0x52/0x80
 xfs_inactive+0x186/0x340
 xfs_inodegc_worker+0xd3/0x430
 process_one_work+0x3b1/0x960
 worker_thread+0x52/0x660
 kthread+0x161/0x1a0
 ret_from_fork+0x29/0x50
 </TASK>
....

Prevent this race by flushing all the queued background inode
inactivations pending before purging all the cached dquots when
quotacheck fails.

Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-05 15:13:22 -08:00
Linus Torvalds 20fdfd55ab 17 hotfixes. Eight are for MM and seven are for other parts of the
kernel.  Seven are cc:stable and eight address post-6.3 issues or were
 judged unsuitable for -stable backporting.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZAO0bAAKCRDdBJ7gKXxA
 jo73AP0Sbgd+E0u5Hs+aACHW28FpxleVRdyexc5chXD5QsyLKgEAwjntE7jfHHYK
 GkUKsoWQJblgjm3ksRxdLbVkDSQ8sQE=
 =CQ0B
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "17 hotfixes.

  Eight are for MM and seven are for other parts of the kernel. Seven
  are cc:stable and eight address post-6.3 issues or were judged
  unsuitable for -stable backporting"

* tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: map Dikshita Agarwal's old address to his current one
  mailmap: map Vikash Garodia's old address to his current one
  fs/cramfs/inode.c: initialize file_ra_state
  fs: hfsplus: fix UAF issue in hfsplus_put_super
  panic: fix the panic_print NMI backtrace setting
  lib: parser: update documentation for match_NUMBER functions
  kasan, x86: don't rename memintrinsics in uninstrumented files
  kasan: test: fix test for new meminstrinsic instrumentation
  kasan: treat meminstrinsic as builtins in uninstrumented files
  kasan: emit different calls for instrumentable memintrinsics
  ocfs2: fix non-auto defrag path not working issue
  ocfs2: fix defrag path triggering jbd2 ASSERT
  mailmap: map Georgi Djakov's old Linaro address to his current one
  mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
  lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH
  mm/damon/paddr: fix missing folio_put()
  mm/mremap: fix dup_anon_vma() in vma_merge() case 4
2023-03-04 13:32:50 -08:00
Linus Torvalds 3162745aad 9 cifs/smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmQCU+4ACgkQiiy9cAdy
 T1HuQAwArSaTpdZfgnd08HKz783fpGtDxicv47mqyQHOEzpReu7ZsVY7ybJgaIZy
 lR96qzTcZWsHi9WF69uF9hShDz4z4G15UfJXoa1AzoQUR9Kb6jsInEXodOeZQZBU
 57sioJqr7CU66i4yIWb30900iC7cPmmJ4YdOUiSIbeb7edW5IHQ59IyxuYRkJP3W
 EIWrKQ5wJQS49YiaM0C3LSpLgIRLhFU3Fkar66wHY8SwDHJunR0BzuiJn5EVgcVY
 Vf4R4K1F8s2H2svg9fwbqZhihUjrr0Je3T/1zrDg1xUom0uWYqA42iSIBhocSuRz
 J3ipIwpQwSiud4dvegq3QHbK61jgDsNa2VQu5+njGp6wX5pYefwbN50aTZ72xg6z
 j0tnpwPtw0ZW0TO57Q5wdwbQ+yjC4GEJwLcD9WPtUKDsGakVY7fnMdwfE2SAe+xg
 u5StlRUegYybMH1wCd3yHuAR17JisE0JyWlP9QyLCwL1Xa5ii1cB+YqHH/DTOp/l
 YDD69Hzi
 =wYW2
 -----END PGP SIGNATURE-----

Merge tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:

 - xfstest generic/208 fix (memory leak)

 - minor netfs fix (to address smatch warning)

 - a DFS fix for stable

 - a reconnect race fix

 - two multichannel fixes

 - RDMA (smbdirect) fix

 - two additional writeback fixes from David

* tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix memory leak in direct I/O
  cifs: prevent data race in cifs_reconnect_tcon()
  cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID
  iov: Fix netfs_extract_user_to_sg()
  cifs: Fix cifs_write_back_from_locked_folio()
  cifs: reuse cifs_match_ipaddr for comparison of dstaddr too
  cifs: match even the scope id for ipv6 addresses
  cifs: Fix an uninitialised variable
  cifs: Add some missing xas_retry() calls
2023-03-03 16:26:43 -08:00
Andrew Morton 3e35102666 fs/cramfs/inode.c: initialize file_ra_state
file_ra_state_init() assumes that the file_ra_state has been zeroed out. 
Fixes a KMSAN used-unintialized issue (at least).

Fixes: cf948cbc35 ("cramfs: read_mapping_page() is synchronous")
Reported-by: syzbot <syzbot+8ce7f8308d91e6b8bbe2@syzkaller.appspotmail.com>
  Link: https://lkml.kernel.org/r/0000000000008f74e905f56df987@google.com
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-02 21:54:23 -08:00
Dongliang Mu 07db5e247a fs: hfsplus: fix UAF issue in hfsplus_put_super
The current hfsplus_put_super first calls hfs_btree_close on
sbi->ext_tree, then invokes iput on sbi->hidden_dir, resulting in an
use-after-free issue in hfsplus_release_folio.

As shown in hfsplus_fill_super, the error handling code also calls iput
before hfs_btree_close.

To fix this error, we move all iput calls before hfsplus_btree_close.

Note that this patch is tested on Syzbot.

Link: https://lkml.kernel.org/r/20230226124948.3175736-1-mudongliangabcd@gmail.com
Reported-by: syzbot+57e3e98f7e3b80f64d56@syzkaller.appspotmail.com
Tested-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-02 21:54:23 -08:00
Linus Torvalds c3f9b9fa10 Two small fixes from Xiubo and myself, marked for stable.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmQA0uETHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi92pB/4yZ7Go/7j2zb84N9nEYPCHV23v1vED
 YGZIiWHYv6X3dJyTYpcU7Mn9TF00naTGDKi9NpTZjKOUIkibXPFJfbG7Dh4T2HhN
 TKw9EbldCaXE1mR7o+g/mrVQFM1PIR1VbtIeszL3eD2qO0aXEGyBMvPfUNqFX/M7
 lNWVjuglIaYUL235Uid/wt0zfmPDvtGD24fjpN0e22UQh/aBFnodIDpa/AapsFKp
 yifzqe/ADbvgnHwOhMiEMG1gRFd3vywVfPDQmQ41oSMnf7yTtLWE9t47wTfyoTY5
 IwZY2K1H51QJej/mObYJmClp/y81xSLXEydFdQ571MqZbDeDfQeM23/7
 =cWWl
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two small fixes from Xiubo and myself, marked for stable"

* tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-client:
  rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() fails
  ceph: update the time stamps and try to drop the suid/sgid
2023-03-02 10:48:30 -08:00
David Howells 71562809e4 cifs: Fix memory leak in direct I/O
When __cifs_readv() and __cifs_writev() extract pages from a user-backed
iterator into a BVEC-type iterator, they set ->bv_need_unpin to note
whether they need to unpin the pages later.  However, in both cases they
examine the BVEC-type iterator and not the source iterator - and so
bv_need_unpin doesn't get set and the pages are leaked.

I think this may be responsible for the generic/208 xfstest failing
occasionally with:

	WARNING: CPU: 0 PID: 3064 at mm/gup.c:218 try_grab_page+0x65/0x100
	RIP: 0010:try_grab_page+0x65/0x100
	follow_page_pte+0x1a7/0x570
	__get_user_pages+0x1a2/0x650
	__gup_longterm_locked+0xdc/0xb50
	internal_get_user_pages_fast+0x17f/0x310
	pin_user_pages_fast+0x46/0x60
	iov_iter_extract_pages+0xc9/0x510
	? __kmalloc_large_node+0xb1/0x120
	? __kmalloc_node+0xbe/0x130
	netfs_extract_user_iter+0xbf/0x200 [netfs]
	__cifs_writev+0x150/0x330 [cifs]
	vfs_write+0x2a8/0x3c0
	ksys_pwrite64+0x65/0xa0

with the page refcount going negative.  This is less unlikely than it seems
because the page is being pinned, not simply got, and so the refcount
increased by 1024 each time, and so only needs to be called around ~2097152
for the refcount to go negative.

Further, the test program (aio-dio-invalidate-failure) uses a 32MiB static
buffer and all the PTEs covering it refer to the same page because it's
never written to.

The warning in try_grab_page():

	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
		return -ENOMEM;

then trips and prevents us ever using the page again for DIO at least.

Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Link: https://lore.kernel.org/r/CAH2r5mvaTsJ---n=265a4zqRA7pP+o4MJ36WCQUS6oPrOij8cw@mail.gmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
Paulo Alcantara 1bcd548d93 cifs: prevent data race in cifs_reconnect_tcon()
Make sure to get an up-to-date TCP_Server_Info::nr_targets value prior
to waiting the server to be reconnected in cifs_reconnect_tcon().  It
is set in cifs_tcp_ses_needs_reconnect() and protected by
TCP_Server_Info::srv_lock.

Create a new cifs_wait_for_server_reconnect() helper that can be used
by both SMB2+ and CIFS reconnect code.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
Paulo Alcantara b9ee2e307c cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID
Do not map STATUS_OBJECT_NAME_INVALID to -EREMOTE under non-DFS
shares, or 'nodfs' mounts or CONFIG_CIFS_DFS_UPCALL=n builds.
Otherwise, in the slow path, get a referral to figure out whether it
is an actual DFS link.

This could be simply reproduced under a non-DFS share by running the
following

  $ mount.cifs //srv/share /mnt -o ...
  $ cat /mnt/$(printf '\U110000')
  cat: '/mnt/'$'\364\220\200\200': Object is remote

Fixes: c877ce47e1 ("cifs: reduce roundtrips on create/qinfo requests")
CC: stable@vger.kernel.org # 6.2
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
David Howells 4c0421fa6d iov: Fix netfs_extract_user_to_sg()
Fix the loop check in netfs_extract_user_to_sg() for extraction from
user-backed iterators to do the body if npages > 0, not if npages < 0
(which it can never be).

This isn't currently used by cifs, which only ever extracts data from BVEC,
KVEC and XARRAY iterators at this level, user-backed iterators having being
decanted into BVEC iterators at a higher level to accommodate the work
being done in a kernel thread.

Found by smatch:
	fs/netfs/iterator.c:139 netfs_extract_user_to_sg() warn: unsigned 'npages' is never less than zero.

Fixes: 0185846975 ("netfs: Add a function to extract an iterator into a scatterlist")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302261115.P3TQi1ZO-lkp@intel.com/
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y/yYnAhoAYDBKixX@kili
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
David Howells 0268792f77 cifs: Fix cifs_write_back_from_locked_folio()
cifs_write_back_from_locked_folio() should return the number of bytes read,
but returns the result of ->async_writev(), which will be 0 on success.  As
it happens, this doesn't prevent cifs_writepages_region() from working as
it will then examine and ignore the pages that are no longer dirty rather
than just skipping over them.

Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
Shyam Prasad N 410612b072 cifs: reuse cifs_match_ipaddr for comparison of dstaddr too
We have two pieces of code that does pretty much the same
comparison. This change reuses cifs_match_ipaddr within
match_address.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:24 -06:00
Shyam Prasad N a21be1f5df cifs: match even the scope id for ipv6 addresses
match_address function matches the scope id for ipv6 addresses,
but cifs_match_ipaddr (which is another function used for comparison)
does not use scope id. Doing so with this change.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:24 -06:00
David Howells 4bd3e4c7e4 cifs: Fix an uninitialised variable
Fix an uninitialised variable introduced in cifs.

Fixes: 3d78fe73fa ("cifs: Build the RDMA SGE list directly from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-rdma@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:17:36 -06:00
David Howells 1907e0fec1 cifs: Add some missing xas_retry() calls
The xas_for_each loops added into fs/cifs/file.c need to go round again if
indicated by xas_retry().

Fixes: b8713c4dbf ("cifs: Add some helper functions")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:17:16 -06:00
Boris Burkov fcd9531b30 btrfs: sysfs: add size class stats
Make it possible to see the distribution of size classes for block
groups. Helpful for testing and debugging the allocator w.r.t. to size
classes.

The new stats can be found at the path:

  /sys/fs/btrfs/<FSID>/allocation/<bg-type>/size_class

but they will only be non-zero for bg-type = data.

Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-01 19:27:20 +01:00
Linus Torvalds f122a08b19 capability: just use a 'u64' instead of a 'u32[2]' array
Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an
array of two words.  It was then obfuscated by hiding the "2" behind two
macro expansions, with the reasoning being that maybe it gets extended
further some day.

That reasoning may have been valid at the time, but the last thing we
want to do is to extend the capability set any more.  And the array of
values not only causes source code oddities (with loops to deal with
it), but also results in worse code generation.  It's a lose-lose
situation.

So just change the 'u32[2]' into a 'u64' and be done with it.

We still have to deal with the fact that the user space interface is
designed around an array of these 32-bit values, but that was the case
before too, since the array layouts were different (ie user space
doesn't use an array of 32-bit values for individual capability masks,
but an array of 32-bit slices of multiple masks).

So that marshalling of data is actually simplified too, even if it does
remain somewhat obscure and odd.

This was all triggered by my reaction to the new "cap_isidentical()"
introduced recently.  By just using a saner data structure, it went from

	unsigned __capi;
	CAP_FOR_EACH_U32(__capi) {
		if (a.cap[__capi] != b.cap[__capi])
			return false;
	}
	return true;

to just being

	return a.val == b.val;

instead.  Which is rather more obvious both to humans and to compilers.

Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-01 10:01:22 -08:00
Linus Torvalds 64e851689e This pull request contains the following changes for UML:
- Add support for rust (yay!)
 - Add support for LTO
 - Add platform bus support to virtio-pci
 - Various virtio fixes
 - Coding style, spelling cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmP/AVYWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wUQvEADRVll8yCg+a4C1BORSefv0FQ/I
 z+3jiUtzq/ABGBf9S0TYEfaGcOn1LXFzlEgtqf4kd3vmzyIVG0pHUt3BxaqLSSTU
 IjDZkbtZ2GC0i7EK8D3iAC+lvew+TjBMfgEjhrNyni3VeYNDBh8EkUseWIbrNX8Q
 pqoUQwYlVdjY6PedtcdGDko70Fiy4OMIK7lat7JDtuoL5pvMEadbR1D7ClfiYRIh
 NGg5mSnfBBTGc20ochDkHUhubzagtSCDHvNe2SiYDBrM5sjeSANecsICmymWS+Pm
 aJtYybwpC4tl9J25O4OTD3SyfKxZ93mKwK89Tw9ryqQV2rmXG+3qB2hbZ0zpi75I
 vpgDrfv+VC+4daC0Wp8zjoeSE6zUbCKVE1s307EC5fjYyIoHjSUAsPE9GrNaJi5K
 91WVe1x8Dnpfq9/ZO8o3sLqftBo3aVH21dGVuqi5qS6OjRqDMkFkaY31nUjVXELV
 tEBj6n9UoyqPFzcgvsQfTcCjjlMiVL+JU+sl2L7dQXTNev1/RReTJngdK/vv7Epo
 BjLpGfn+1I/8dlbsyjLt0FOIwhIbUf+8RbWpezENGVgKP81iqEQPOdax/yBEhCKm
 NWduDz2itQn94KNMcUoPq+G3xoG3dOW7lXlEzXP6ZbLkcuIyXpeNZeJNlvq7J/z/
 2PBy61Ngs/DDUK/doQ==
 =q2ks
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Add support for rust (yay!)

 - Add support for LTO

 - Add platform bus support to virtio-pci

 - Various virtio fixes

 - Coding style, spelling cleanups

* tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (27 commits)
  Documentation: rust: Fix arch support table
  uml: vector: Remove unused definitions VECTOR_{WRITE,HEADERS}
  um: virt-pci: properly remove PCI device from bus
  um: virtio_uml: move device breaking into workqueue
  um: virtio_uml: mark device as unregistered when breaking it
  um: virtio_uml: free command if adding to virtqueue failed
  UML: define RUNTIME_DISCARD_EXIT
  virt-pci: add platform bus support
  um-virt-pci: Make max delay configurable
  um: virt-pci: implement pcibios_get_phb_of_node()
  um: Support LTO
  um: put power options in a menu
  um: Use CFLAGS_vmlinux
  um: Prevent building modules incompatible with MODVERSIONS
  um: Avoid pcap multiple definition errors
  um: Make the definition of cpu_data more compatible
  x86: um: vdso: Add '%rcx' and '%r11' to the syscall clobber list
  rust: arch/um: Add support for CONFIG_RUST under x86_64 UML
  rust: arch/um: Disable FP/SIMD instruction to match x86
  rust: arch/um: Use 'pie' relocation mode under UML
  ...
2023-03-01 09:13:00 -08:00