Commit Graph

80971 Commits

Author SHA1 Message Date
Linus Torvalds 1e760fa359 gfs2 fix
- Reinstate commit 970343cd49 ("GFS2: free disk inode which is deleted
   by remote node -V2") as reverting that commit could cause
   gfs2_put_super() to hang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmQcnPwUHGFncnVlbmJh
 QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTrNsQ/+IMRh864N8MbIIksCC+vFqqaTSjNR
 oC1FJgTK5YEivAMH5BOMKx7QybFNdM3H04oidrcFRtSL1vcadUBH46LPUfcIcAxf
 Bqhgp1BgYLfPNB1jISmXHy/JfyPwIHUdjwnKAOZmg3IOozfGBnQSWmoaL/iFwcbS
 28nzDC3I1r5hfwbFYbKUs3b5DIC3MILdxTKFX4oMbCmn6829nG9oLkyoAbmEquXW
 ETnS4fJwwUpJ3iYfaOWvXfkH1MaBxdJWhWmqMZ+OlSu3w49tVjPEpgtm3fxdRLXC
 y1P5lb9YL3XjFoW2jcvmeIpVQbYjyq6AneyDT0KK8P+jrNLFnx87/4LDvRv9OS/U
 wxixf2khB8/WeLXTkyJ/uqs2IpIA4OhoNuaTG9ChC7MjGNoDDT8xTFHJ0yLB+SAw
 8Qb4ZN4n1LcFF7Lt0dlv1LuPBOFmizyBW4PK+6wwwXzlcLkvx6ORA7ygaABNkapa
 ZtYnKPTBg4WV1zWBLqp7s5zdZgVZGtldZLMzFS93vtBRbbMc3pKWyqO4uwrp13Rf
 3kU+TAsuBw6V7m9sfIY/xXRQ/7Gr4klf6tu9CF9cL+wS/YSm6M/LCI9PfxNubbcy
 6zeS61pzEzGFEMq4RcbxDhFtBsM3aK1dz0QNnr0kMfnEthHakng8tdYe2uBXpwLL
 aq562MqusCarEZ4=
 =+iCq
 -----END PGP SIGNATURE-----

Merge tag 'gfs2-v6.3-rc3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

 - Reinstate commit 970343cd49 ("GFS2: free disk inode which is
   deleted by remote node -V2") as reverting that commit could cause
   gfs2_put_super() to hang.

* tag 'gfs2-v6.3-rc3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  Reinstate "GFS2: free disk inode which is deleted by remote node -V2"
2023-03-23 15:25:49 -07:00
Bob Peterson 260595b439 Reinstate "GFS2: free disk inode which is deleted by remote node -V2"
It turns out that reverting commit 970343cd49 ("GFS2: free disk inode
which is deleted by remote node -V2") causes a regression related to
evicting inodes that were unlinked on a different cluster node.

We could also have simply added a call to d_mark_dontcache() to function
gfs2_try_evict(), but the original pre-revert code is better tested and
proven.

This reverts commit 445cb1277e.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-03-23 19:37:56 +01:00
Linus Torvalds 9fd6ba5420 zonefs fixes for 6.3-rc4
* Silence a false positive smatch warning about an uninitialized
    variable.
 
  * Fix an error message to provide more useful information about invalid
    zone append write results.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZBwrTQAKCRDdoc3SxdoY
 dpbpAP4xZjKspoUZO71oheqK4Of0WboSDTtBCBzBuYojHwCk7wEAukflXNFHxU3y
 Ng0Y8EmEqM64SMdL3GZlwTh6hZkDWQU=
 =JlQ1
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

 - Silence a false positive smatch warning about an uninitialized
   variable

 - Fix an error message to provide more useful information about invalid
   zone append write results

* tag 'zonefs-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: Fix error message in zonefs_file_dio_append()
  zonefs: Prevent uninitialized symbol 'size' warning
2023-03-23 11:05:08 -07:00
Shyam Prasad N 175b54abc4 cifs: print session id while listing open files
In the output of /proc/fs/cifs/open_files, we only print
the tree id for the tcon of each open file. It becomes
difficult to know which tcon these files belong to with
just the tree id.

This change dumps ses id in addition to all other data today.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23 11:19:42 -05:00
Shyam Prasad N d12bc6d26f cifs: dump pending mids for all channels in DebugData
Currently, we only dump the pending mid information only
on the primary channel in /proc/fs/cifs/DebugData.
If multichannel is active, we do not print the pending MID
list on secondary channels.

This change will dump the pending mids for all the channels
based on server->conn_id.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23 11:19:42 -05:00
Shyam Prasad N 896cd316b8 cifs: empty interface list when server doesn't support query interfaces
When querying server interfaces returns -EOPNOTSUPP,
clear the list of interfaces. Assumption is that multichannel
would be disabled too.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23 11:19:42 -05:00
Shyam Prasad N 072a28c890 cifs: do not poll server interfaces too regularly
We have the server interface list hanging off the tcon
structure today for reasons unknown. So each tcon which is
connected to a file server can query them separately,
which is really unnecessary. To avoid this, in the query
function, we will check the time of last update of the
interface list, and avoid querying the server if it is
within a certain range.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-23 11:19:42 -05:00
Shyam Prasad N 2f4e429c84 cifs: lock chan_lock outside match_session
Coverity had rightly indicated a possible deadlock
due to chan_lock being done inside match_session.
All callers of match_* functions should pick up the
necessary locks and call them.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Fixes: 724244cdb3 ("cifs: protect session channel fields with chan_lock")
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 23:15:40 -05:00
Namjae Jeon b53e8cfec3 ksmbd: return STATUS_NOT_SUPPORTED on unsupported smb2.0 dialect
ksmbd returned "Input/output error" when mounting with vers=2.0 to
ksmbd. It should return STATUS_NOT_SUPPORTED on unsupported smb2.0
dialect.

Cc: stable@vger.kernel.org
Reported-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:38:33 -05:00
Namjae Jeon be6f42fad5 ksmbd: don't terminate inactive sessions after a few seconds
Steve reported that inactive sessions are terminated after a few
seconds. ksmbd terminate when receiving -EAGAIN error from
kernel_recvmsg(). -EAGAIN means there is no data available in timeout.
So ksmbd should keep connection with unlimited retries instead of
terminating inactive sessions.

Cc: stable@vger.kernel.org
Reported-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:38:33 -05:00
ChenXiaoSong 2624b44554 ksmbd: fix possible refcount leak in smb2_open()
Reference count of acls will leak when memory allocation fails. Fix this
by adding the missing posix_acl_release().

Fixes: e2f34481b2 ("cifsd: add server-side procedures for SMB3")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:38:33 -05:00
Namjae Jeon 342edb60dc ksmbd: add low bound validation to FSCTL_QUERY_ALLOCATED_RANGES
Smatch static checker warning:
 fs/ksmbd/vfs.c:1040 ksmbd_vfs_fqar_lseek() warn: no lower bound on 'length'
 fs/ksmbd/vfs.c:1041 ksmbd_vfs_fqar_lseek() warn: no lower bound on 'start'

Fix unexpected result that could caused from negative start and length.

Fixes: f441584858 ("cifsd: add file operations")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:38:33 -05:00
Namjae Jeon 2d74ec9713 ksmbd: add low bound validation to FSCTL_SET_ZERO_DATA
Smatch static checker warning:
 fs/ksmbd/smb2pdu.c:7759 smb2_ioctl()
 warn: no lower bound on 'off'

Fix unexpected result that could caused from negative off and bfz.

Fixes: b5e5f9dfc9 ("ksmbd: check invalid FileOffset and BeyondFinalZero in FSCTL_ZERO_DATA")
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:38:33 -05:00
Namjae Jeon 728f14c72b ksmbd: set FILE_NAMED_STREAMS attribute in FS_ATTRIBUTE_INFORMATION
If vfs objects = streams_xattr in ksmbd.conf FILE_NAMED_STREAMS should
be set to Attributes in FS_ATTRIBUTE_INFORMATION. MacOS client show
"Format: SMB (Unknown)" on faked NTFS and no streams support.

Cc: stable@vger.kernel.org
Reported-by: Miao Lihua <441884205@qq.com>
Tested-by: Miao Lihua <441884205@qq.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:32:50 -05:00
Namjae Jeon 7a891d4b62 ksmbd: fix wrong signingkey creation when encryption is AES256
MacOS and Win11 support AES256 encrytion and it is included in the cipher
array of encryption context. Especially on macOS, The most preferred
cipher is AES256. Connecting to ksmbd fails on newer MacOS clients that
support AES256 encryption. MacOS send disconnect request after receiving
final session setup response from ksmbd. Because final session setup is
signed with signing key was generated incorrectly.
For signging key, 'L' value should be initialized to 128 if key size is
16bytes.

Cc: stable@vger.kernel.org
Reported-by: Miao Lihua <441884205@qq.com>
Tested-by: Miao Lihua <441884205@qq.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-22 16:32:50 -05:00
Trond Myklebust 6165a16a5a NFSv4: Fix hangs when recovering open state after a server reboot
When we're using a cached open stateid or a delegation in order to avoid
sending a CLAIM_PREVIOUS open RPC call to the server, we don't have a
new open stateid to present to update_open_stateid().
Instead rely on nfs4_try_open_cached(), just as if we were doing a
normal open.

Fixes: d2bfda2e7a ("NFSv4: don't reprocess cached open CLAIM_PREVIOUS")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-22 16:22:35 -04:00
Linus Torvalds a2eaf246f5 nfsd-6.3 fixes:
- Fix a crash during NFS READs from certain client implementations
 - Address a minor kbuild regression in v6.3
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQaEJgACgkQM2qzM29m
 f5ceYBAAiGHz+ISIIirg2+B5DttDL5ssPoqbgfZ16JZfBgaDduOg8oK7pbILv1nf
 4ufZQRcLpJDSnzhqNXxQE/L4Vsei8+tqkbwILSqFsCBZk5ei2AYSsZMu7Ptf8X8X
 Y+KZ55Bko0uTYAsglnPiEaDEGeq0ZYwfUc5/bnw7ZzXdAHhlwgjoAads6Q25A9f1
 6ZuKQ6nIis/ok9q88GdLf7x51IdKrXTrkC1iYAeVTXCgR3qXPezqmPOzW44rZs0t
 TMzUgDuyPgvB5eT/N+pwXoI9Bli9v1PFoXZjm2NchhJ/Iy6Gt1eEEB93J/qRSU/x
 ziJLkk0jnJEPWJ0Ht5li4+60b58t2OoWCTJTr3FzHCcF5Iqr7YjenSi2U8IT84ct
 EeFdqR2nZpJ4Y1wbAhTjQ2w6rw5WpaXDLHCRRdE7OF/D121n3FkSNo8DtSn+fbR8
 sFDpOruZn4tjgpf3AjCR1zbyKw9baC6clhnLrW6AE6T4ht9XwfE6eCUSy48FSMzg
 4IZKLcofQh2UEvlVtCvGOjtcK151eFKocsq+p6/yhYE6BM20Z3dBqhw9iBzqOAPg
 0qZk60AWl9W5h7M+ovkGAZpwXv4djsdWf7LbFuYOQ0kodzXOQ/5sFQm4zC5F3I1Z
 8WPPA9SkYyNiNSBUt65tca4JNw5h6x9A3scPWtuHB7Lz3fuGTnE=
 =frHb
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Fix a crash during NFS READs from certain client implementations

 - Address a minor kbuild regression in v6.3

* tag 'nfsd-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: don't replace page in rq_pages if it's a continuation of last page
  NFS & NFSD: Update GSS dependencies
2023-03-21 14:48:38 -07:00
Linus Torvalds 17214b70a1 fsverity fixes for v6.3-rc4
Fix two significant performance issues with fsverity.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZBjIvRQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK2uuAP9I9h+KjU8cSpF0fS3vHhwmDtqc/vW9
 wfHniylcwK2YnwD/REgCv8yKlJva0+IUHMCGNWUzd0CERfuUFJy1Z7xvCgo=
 =rjIA
 -----END PGP SIGNATURE-----

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity fixes from Eric Biggers:
 "Fix two significant performance issues with fsverity"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
  fsverity: Remove WQ_UNBOUND from fsverity read workqueue
2023-03-20 15:20:33 -07:00
Linus Torvalds 4f1e308df8 fscrypt fixes for v6.3-rc4
Fix a bug where when a filesystem was being unmounted, the fscrypt
 keyring was destroyed before inodes have been released by the Landlock
 LSM.  This bug was found by syzbot.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCZBjHAxQcZWJpZ2dlcnNA
 Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK1THAP91M4UXEsVTPw1QI03+72pPy0U4iArf
 j8jhQVL6vjXHnQEA4fmwg3tE5g9F5Qe5mjFOxvoa/Z9xqupPOybOR3z1zQI=
 =S80B
 -----END PGP SIGNATURE-----

Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux

Pull fscrypt fix from Eric Biggers:
 "Fix a bug where when a filesystem was being unmounted, the fscrypt
  keyring was destroyed before inodes have been released by the Landlock
  LSM.

  This bug was found by syzbot"

* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux:
  fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()
  fscrypt: improve fscrypt_destroy_keyring() documentation
  fscrypt: destroy keyring after security_sb_delete()
2023-03-20 15:12:55 -07:00
Damien Le Moal 88b170088a zonefs: Fix error message in zonefs_file_dio_append()
Since the expected write location in a sequential file is always at the
end of the file (append write), when an invalid write append location is
detected in zonefs_file_dio_append(), print the invalid written location
instead of the expected write location.

Fixes: a608da3bd7 ("zonefs: Detect append writes at invalid locations")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2023-03-21 06:36:43 +09:00
Damien Le Moal d7e673c2a9 zonefs: Prevent uninitialized symbol 'size' warning
In zonefs_file_dio_append(), initialize the variable size to 0 to
prevent compilation and static code analizers warning such as:

New smatch warnings:
fs/zonefs/file.c:441 zonefs_file_dio_append() error: uninitialized
symbol 'size'.

The warning is a false positive as size is never actually used
uninitialized.

No functional change.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/202303191227.GL8Dprbi-lkp@intel.com/
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
2023-03-21 06:36:34 +09:00
Linus Torvalds 3d3699bde4 NFS Client Bugfixes for Linux 6.3-rc
Bugfixes:
   * Fix /proc/PID/io read_bytes accounting
   * Fix setting NLM file_lock start and end during decoding testargs
   * Fix timing for setting access cache timestamps
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmQYnFoACgkQ18tUv7Cl
 QOv86RAAuDaOVfnPHuM1EMgTiBG+EZnCjMz+ktEuYhKuHdtS2cplq0S5Uygx1Zpl
 Kt9/Ww77k2jOPvsk3R1ejmFv1aiyNz+S9eEHrPp6nfwRhhQBw04fvo+y3mjrkNLq
 p4Y0Wes+9/0SnYftcsrcO/ghsc0t5DLm9eiGZMICes21cnrAO8iY1PY5DX2p7ipb
 O4qEnfCA67Nt+rY2T6lbMoI8kMyN+J4FaFCwPaMv8UDP/RqdG1KR429V46JwrKxR
 1FbG1nKH03U3apmaaGcRVr8meswPetQNWfJnmjsAHur6z8pCZ233oIggj4gnBIdy
 jGDWwQ3lT55+6XvkW4yrgGPUT2oDYiYwW4uk401I+xWJIbNvRMXKPIQWehHMTR4v
 0Xo8rhefPkgHrJkhaoBjvnMeKl79QwGUCSw5ECluxKm9nkDGeVa5kRBFCfkeWgfR
 dGJNUOqd6P3vVdGVvTEYP/q5mrDpr+4ZTyggm2IbP0nKNnYgwPoHVn+Prwsht+wh
 XGmXpu5C07U5fex7uzfmK/S8JWZDkEoNOZJQ8AE+NFm8MZFl8wUIfuyaZS4ItSEk
 odIGQ+PSMbEyfo24WhSk3BLt0cqzWxeweENUxP00PVOyqcYmj5duRP7OIVVHmm/w
 dDwh5tQfykrF7opNMoJBXXAGlwI9O0t0UwBvNasBUiINctrapOw=
 =n/JC
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

 - Fix /proc/PID/io read_bytes accounting

 - Fix setting NLM file_lock start and end during decoding testargs

 - Fix timing for setting access cache timestamps

* tag 'nfs-for-6.3-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS: Correct timing for assigning access cache timestamp
  lockd: set file_lock start and end when decoding nlm4 testargs
  NFS: Fix /proc/PID/io read_bytes for buffered reads
2023-03-20 10:52:10 -07:00
Darrick J. Wong 3cfb9290da xfs: test dir/attr hash when loading module
Back in the 6.2-rc1 days, Eric Whitney reported a fstests regression in
ext4 against generic/454.  The cause of this test failure was the
unfortunate combination of setting an xattr name containing UTF8 encoded
emoji, an xattr hash function that accepted a char pointer with no
explicit signedness, signed type extension of those chars to an int, and
the 6.2 build tools maintainers deciding to mandate -funsigned-char
across the board.  As a result, the ondisk extended attribute structure
written out by 6.1 and 6.2 were not the same.

This discrepancy, in fact, had been noticeable if a filesystem with such
an xattr were moved between any two architectures that don't employ the
same signedness of a raw "char" declaration.  The only reason anyone
noticed is that x86 gcc defaults to signed, and no such -funsigned-char
update was made to e2fsprogs, so e2fsck immediately started reporting
data corruption.

After a day and a half of discussing how to handle this use case (xattrs
with bit 7 set anywhere in the name) without breaking existing users,
Linus merged his own patch and didn't tell the maintainer.  None of the
ext4 developers realized this until AUTOSEL announced that the commit
had been backported to stable.

In the end, this problem could have been detected much earlier if there
had been any useful tests of hash function(s) in use inside ext4 to make
sure that they always produce the same outputs given the same inputs.

The XFS dirent/xattr name hash takes a uint8_t*, so I don't think it's
vulnerable to this problem.  However, let's avoid all this drama by
adding our own self test to check that the da hash produces the same
outputs for a static pile of inputs on various platforms.  This enables
us to fix any breakage that may result in a controlled fashion.  The
buffer and test data are identical to the patches submitted to xfsprogs.

Link: https://lore.kernel.org/linux-ext4/Y8bpkm3jA3bDm3eL@debian-BULLSEYE-live-builder-AMD64/
Link: https://lore.kernel.org/linux-xfs/ZBUKCRR7xvIqPrpX@destitution/T/#md38272cc684e2c0d61494435ccbb91f022e8dee4
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-19 09:55:49 -07:00
Darrick J. Wong e6fbb7167e xfs: add tracepoints for each of the externally visible allocators
There are now five separate space allocator interfaces exposed to the
rest of XFS for five different strategies to find space.  Add
tracepoints for each of them so that I can tell from a trace dump
exactly which ones got called and what happened underneath them.  Add a
sixth so it's more obvious if an allocation actually happened.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-19 09:55:49 -07:00
Darrick J. Wong 9eb775968b xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags
Callers of xfs_alloc_vextent_iterate_ags that pass in the TRYLOCK flag
want us to perform a non-blocking scan of the AGs for free space.  There
are no ordering constraints for non-blocking AGF lock acquisition, so
the scan can freely start over at AG 0 even when minimum_agno > 0.

This manifests fairly reliably on xfs/294 on 6.3-rc2 with the parent
pointer patchset applied and the realtime volume enabled.  I observed
the following sequence as part of an xfs_dir_createname call:

0. Fragment the free space, then allocate nearly all the free space in
   all AGs except AG 0.

1. Create a directory in AG 2 and let it grow for a while.

2. Try to allocate 2 blocks to expand the dirent part of a directory.
   The space will be allocated out of AG 0, but the allocation will not
   be contiguous.  This (I think) activates the LOWMODE allocator.

3. The bmapi call decides to convert from extents to bmbt format and
   tries to allocate 1 block.  This allocation request calls
   xfs_alloc_vextent_start_ag with the inode number, which starts the
   scan at AG 2.  We ignore AG 0 (with all its free space) and instead
   scrape AG 2 and 3 for more space.  We find one block, but this now
   kicks t_highest_agno to 3.

4. The createname call decides it needs to split the dabtree.  It tries
   to allocate even more space with xfs_alloc_vextent_start_ag, but now
   we're constrained to AG 3, and we don't find the space.  The
   createname returns ENOSPC and the filesystem shuts down.

This change fixes the problem by making the trylock scan wrap around to
AG 0 if it doesn't like the AGs that it finds.  Since the current
transaction itself holds AGF 0, the trylock of AGF 0 will succeed, and
we take space from the AG that has plenty.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-19 09:55:48 -07:00
Linus Torvalds 995bba436b Fix a double unlock bug on an error path in ext4, found by smatch and
syzkaller.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmQWfYcACgkQ8vlZVpUN
 gaNvhAgAqIv3PLgt3SJIC/klXmEsXJn2q1EqAUKk6XHAzppHoUMrnsJJ2Sz+u4NT
 eKx7FGz+dIzJK8lLoX+XNnLgdkkJwklD8jIxAjbnkgHaGJbjdLH3Su4elQvGwLla
 CprPysjMtgny4rogLwz2D+At3w8o7/JACbzlh0QNhaZ5IIP4Kj5QutgyftF1m4Qa
 AfC9KcpJ+poTQTI4m4oOrJCrH6kwJdVItuS5/mk54DTChNNYtJJy2M6eho7IQ9iG
 ENBnt06/uWhdHQz4OYmYn3x1iXYOAdAmeVGtREKqOstUd31VO2TCC2lE+7tSnO2d
 177m3jfTHdv5znU+LfyrS1Y1u1nGBA==
 =AXq7
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fix from Ted Ts'o:
 "Fix a double unlock bug on an error path in ext4, found by smatch and
  syzkaller"

* tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix possible double unlock when moving a directory
2023-03-19 09:38:26 -07:00
Eric Biggers 4bcf6f827a fscrypt: check for NULL keyring in fscrypt_put_master_key_activeref()
It is a bug for fscrypt_put_master_key_activeref() to see a NULL
keyring.  But it used to be possible due to the bug, now fixed, where
fscrypt_destroy_keyring() was called before security_sb_delete().  To be
consistent with how fscrypt_destroy_keyring() uses WARN_ON for the same
issue, WARN and leak the fscrypt_master_key if the keyring is NULL
instead of dereferencing the NULL pointer.

This is a robustness improvement, not a fix.

Link: https://lore.kernel.org/r/20230313221231.272498-4-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-18 21:08:03 -07:00
Eric Biggers 43e5f1d592 fscrypt: improve fscrypt_destroy_keyring() documentation
Document that fscrypt_destroy_keyring() must be called after all
potentially-encrypted inodes have been evicted.

Link: https://lore.kernel.org/r/20230313221231.272498-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-18 21:08:00 -07:00
Theodore Ts'o 70e42feab2 ext4: fix possible double unlock when moving a directory
Fixes: 0813299c58 ("ext4: Fix possible corruption when moving a directory")
Link: https://lore.kernel.org/r/5efbe1b9-ad8b-4a4f-b422-24824d2b775c@kili.mountain
Reported-by: Dan Carpenter <error27@gmail.com>
Reported-by: syzbot+0c73d1d8b952c5f3d714@syzkaller.appspotmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-17 21:53:52 -04:00
Jeff Layton 27c934dd88 nfsd: don't replace page in rq_pages if it's a continuation of last page
The splice read calls nfsd_splice_actor to put the pages containing file
data into the svc_rqst->rq_pages array. It's possible however to get a
splice result that only has a partial page at the end, if (e.g.) the
filesystem hands back a short read that doesn't cover the whole page.

nfsd_splice_actor will plop the partial page into its rq_pages array and
return. Then later, when nfsd_splice_actor is called again, the
remainder of the page may end up being filled out. At this point,
nfsd_splice_actor will put the page into the array _again_ corrupting
the reply. If this is done enough times, rq_next_page will overrun the
array and corrupt the trailing fields -- the rq_respages and
rq_next_page pointers themselves.

If we've already added the page to the array in the last pass, don't add
it to the array a second time when dealing with a splice continuation.
This was originally handled properly in nfsd_splice_actor, but commit
91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()") removed the check
for it.

Fixes: 91e23b1c39 ("NFSD: Clean up nfsd_splice_actor()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Reported-by: Dario Lesca <d.lesca@solinos.it>
Tested-by: David Critch <dcritch@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2150630
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-17 18:18:15 -04:00
Shyam Prasad N 2f0e4f0342 cifs: check only tcon status on tcon related functions
We had a couple of checks for session in cifs_tree_connect
and cifs_mark_open_files_invalid, which were unnecessary.
And that was done with ses_lock. Changed that to tc_lock too.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-17 13:22:22 -05:00
Linus Torvalds 38e04b3e42 seven cifs/smb3 client fixes, all also for stable
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmQTfDQACgkQiiy9cAdy
 T1FXOQwAsqPc6ctQnl5V5UuZgfr0DZPurLgbrC35XgEYpHiopa4myZvl4Dc2DDTs
 f+TdLkd3fMBuopjOrO8RBN5JioruLeOLuTILlWwmcFqx9cXdM6aaMmGVAMU5JhhL
 ahgPjSgF2cft/zLBsiLb4N4T1WOctGRSpcAgX7AfQXqufBQ/auqMVldVcgPaMRD/
 Si9R9qem8BXHTR8ZgBXrqzUKova0CN9JQICoV8yp9crK//0npHvgHkIIdflgxLwJ
 IE6D73MYo+xMrzlmV867kRO8xHeq03Er19WF73X1cGsVLJtsmxdbeFd8fXFIrDwf
 4JRn5vvXiuHPIPqtMpdqcSU1/W1YP4R609cSketlpbqTWDvuGU1myN9tUqYYnYGv
 KyNSOQi+p4f/rC/8UdzCgbSAlt0wqoQJm/nV97uom5OWEjS3hgB8Tqh1akZPrwzc
 7tXmJ4lETdUFSZCApmCAqHRwoG7WvnchCFxqNG85PCTybVuIEKDMjMkBl19pD8d1
 ZfRHDulA
 =iLy+
 -----END PGP SIGNATURE-----

Merge tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Seven cifs/smb3 client fixes, all also for stable:

   - four DFS fixes

   - multichannel reconnect fix

   - fix smb1 stats for cancel command

   - fix for set file size error path"

* tag '6.3-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: use DFS root session instead of tcon ses
  cifs: return DFS root session id in DebugData
  cifs: fix use-after-free bug in refresh_cache_worker()
  cifs: set DFS root session in cifs_get_smb_ses()
  cifs: generate signkey for the channel that's reconnecting
  cifs: Fix smb2_set_path_size()
  cifs: Move the in_send statistic to __smb_send_rqst()
2023-03-16 15:06:16 -07:00
Darrick J. Wong 6de4b1ab47 xfs: try to idiot-proof the allocators
In porting his development branch to 6.3-rc1, yours truly has
repeatedly screwed up the args->pag being fed to the xfs_alloc_vextent*
functions.  Add some debugging assertions to test the preconditions
required of the callers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-03-16 09:32:18 -07:00
Eric Biggers a075bacde2 fsverity: don't drop pagecache at end of FS_IOC_ENABLE_VERITY
The full pagecache drop at the end of FS_IOC_ENABLE_VERITY is causing
performance problems and is hindering adoption of fsverity.  It was
intended to solve a race condition where unverified pages might be left
in the pagecache.  But actually it doesn't solve it fully.

Since the incomplete solution for this race condition has too much
performance impact for it to be worth it, let's remove it for now.

Fixes: 3fda4c617e ("fs-verity: implement FS_IOC_ENABLE_VERITY ioctl")
Cc: stable@vger.kernel.org
Reviewed-by: Victor Hsieh <victorhsieh@google.com>
Link: https://lore.kernel.org/r/20230314235332.50270-1-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-15 22:50:41 -07:00
Naohiro Aota e15acc2588 btrfs: zoned: drop space_info->active_total_bytes
The space_info->active_total_bytes is no longer necessary as we now
count the region of newly allocated block group as zone_unusable. Drop
its usage.

Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:07 +01:00
Naohiro Aota fa2068d7e9 btrfs: zoned: count fresh BG region as zone unusable
The naming of space_info->active_total_bytes is misleading. It counts
not only active block groups but also full ones which are previously
active but now inactive. That confusion results in a bug not counting
the full BGs into active_total_bytes on mount time.

For a background, there are three kinds of block groups in terms of
activation.

  1. Block groups never activated
  2. Block groups currently active
  3. Block groups previously active and currently inactive (due to fully
     written or zone finish)

What we really wanted to exclude from "total_bytes" is the total size of
BGs #1. They seem empty and allocatable but since they are not activated,
we cannot rely on them to do the space reservation.

And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.

OTOH, BGs #3 can be counted in the "total", since they are already full
we cannot allocate from them anyway. For them, "total_bytes == used +
reserved + pinned + zone_unusable" should hold.

Tracking #2 and #3 as "active_total_bytes" (current implementation) is
confusing. And, tracking #1 and subtract that properly from "total_bytes"
every time you need space reservation is cumbersome.

Instead, we can count the whole region of a newly allocated block group as
zone_unusable. Then, once that block group is activated, release
[0 ..  zone_capacity] from the zone_unusable counters. With this, we can
eliminate the confusing ->active_total_bytes and the code will be common
among regular and the zoned mode. Also, no additional counter is needed
with this approach.

Fixes: 6a921de589 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:07 +01:00
Josef Bacik df384da5a4 btrfs: use temporary variable for space_info in btrfs_update_block_group
We do

  cache->space_info->counter += num_bytes;

everywhere in here.  This is makes the lines longer than they need to
be, and will be especially noticeable when we add the active tracking in,
so add a temp variable for the space_info so this is cleaner.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Josef Bacik bf1f1fec27 btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
This flag only gets set when we're doing active zone tracking, and we're
going to need to use this flag for things related to this behavior.
Rename the flag to represent what it actually means for the file system
so it can be used in other ways and still make sense.

Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Naohiro Aota 9e1cdf0c35 btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
btrfs_can_activate_zone() returns true if at least one device has one zone
available for activation. This is OK for the single profile, but not OK for
DUP profile. We need two zones to create a DUP block group. Fix it by
properly handling the case with the profile flags.

Fixes: 265f7237dd ("btrfs: zoned: allow DUP on meta-data block groups")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Sweet Tea Dorminy 10a8857a1b btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
Commit 1ec49744ba ("btrfs: turn on -Wmaybe-uninitialized") exposed
that on SPARC and PA-RISC, gcc is unaware that fscrypt_setup_filename()
only returns negative error values or 0. This ultimately results in a
maybe-uninitialized warning in btrfs_lookup_dentry().

Change to only return negative error values or 0 from
fscrypt_setup_filename() at the relevant call site, and assert that no
positive error codes are returned (which would have wider implications
involving other users).

Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/481b19b5-83a0-4793-b4fd-194ad7b978c3@roeck-us.net/
Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:06 +01:00
Qu Wenruo 1c3ab6dfa0 btrfs: handle missing chunk mapping more gracefully
[BUG]
During my scrub rework, I did a stupid thing like this:

        bio->bi_iter.bi_sector = stripe->logical;
        btrfs_submit_bio(fs_info, bio, stripe->mirror_num);

Above bi_sector assignment is using logical address directly, which
lacks ">> SECTOR_SHIFT".

This results a read on a range which has no chunk mapping.

This results the following crash:

  BTRFS critical (device dm-1): unable to find logical 11274289152 length 65536
  assertion failed: !IS_ERR(em), in fs/btrfs/volumes.c:6387

Sure this is all my fault, but this shows a possible problem in real
world, that some bit flip in file extents/tree block can point to
unmapped ranges, and trigger above ASSERT(), or if CONFIG_BTRFS_ASSERT
is not configured, cause invalid pointer access.

[PROBLEMS]
In the above call chain, we just don't handle the possible error from
btrfs_get_chunk_map() inside __btrfs_map_block().

[FIX]
The fix is straightforward, replace the ASSERT() with proper error
handling (callers handle errors already).

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-15 20:51:05 +01:00
Paulo Alcantara 6284e46bdd cifs: use DFS root session instead of tcon ses
Use DFS root session whenever possible to get new DFS referrals
otherwise we might end up with an IPC tcon (tcon->ses->tcon_ipc) that
doesn't respond to them.  It should be safe accessing
@ses->dfs_root_ses directly in cifs_inval_name_dfs_link_error() as it
has same lifetime as of @tcon.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 22:48:53 -05:00
Paulo Alcantara f446a63080 cifs: return DFS root session id in DebugData
Return the DFS root session id in /proc/fs/cifs/DebugData to make it
easier to track which IPC tcon was used to get new DFS referrals for a
specific connection, and aids in debugging.

A simple output of it would be

  Sessions:
  1) Address: 192.168.1.13 Uses: 1 Capability: 0x300067   Session Status: 1
  Security type: RawNTLMSSP  SessionId: 0xd80000000009
  User: 0 Cred User: 0
  DFS root session id: 0x128006c000035

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 21:43:23 -05:00
Paulo Alcantara 396935de14 cifs: fix use-after-free bug in refresh_cache_worker()
The UAF bug occurred because we were putting DFS root sessions in
cifs_umount() while DFS cache refresher was being executed.

Make DFS root sessions have same lifetime as DFS tcons so we can avoid
the use-after-free bug is DFS cache refresher and other places that
require IPCs to get new DFS referrals on.  Also, get rid of mount
group handling in DFS cache as we no longer need it.

This fixes below use-after-free bug catched by KASAN

[ 379.946955] BUG: KASAN: use-after-free in __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.947642] Read of size 8 at addr ffff888018f57030 by task kworker/u4:3/56
[ 379.948096]
[ 379.948208] CPU: 0 PID: 56 Comm: kworker/u4:3 Not tainted 6.2.0-rc7-lku #23
[ 379.948661] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
[ 379.949368] Workqueue: cifs-dfscache refresh_cache_worker [cifs]
[ 379.949942] Call Trace:
[ 379.950113] <TASK>
[ 379.950260] dump_stack_lvl+0x50/0x67
[ 379.950510] print_report+0x16a/0x48e
[ 379.950759] ? __virt_addr_valid+0xd8/0x160
[ 379.951040] ? __phys_addr+0x41/0x80
[ 379.951285] kasan_report+0xdb/0x110
[ 379.951533] ? __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.952056] ? __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.952585] __refresh_tcon.isra.0+0x10b/0xc10 [cifs]
[ 379.953096] ? __pfx___refresh_tcon.isra.0+0x10/0x10 [cifs]
[ 379.953637] ? __pfx___mutex_lock+0x10/0x10
[ 379.953915] ? lock_release+0xb6/0x720
[ 379.954167] ? __pfx_lock_acquire+0x10/0x10
[ 379.954443] ? refresh_cache_worker+0x34e/0x6d0 [cifs]
[ 379.954960] ? __pfx_wb_workfn+0x10/0x10
[ 379.955239] refresh_cache_worker+0x4ad/0x6d0 [cifs]
[ 379.955755] ? __pfx_refresh_cache_worker+0x10/0x10 [cifs]
[ 379.956323] ? __pfx_lock_acquired+0x10/0x10
[ 379.956615] ? read_word_at_a_time+0xe/0x20
[ 379.956898] ? lockdep_hardirqs_on_prepare+0x12/0x220
[ 379.957235] process_one_work+0x535/0x990
[ 379.957509] ? __pfx_process_one_work+0x10/0x10
[ 379.957812] ? lock_acquired+0xb7/0x5f0
[ 379.958069] ? __list_add_valid+0x37/0xd0
[ 379.958341] ? __list_add_valid+0x37/0xd0
[ 379.958611] worker_thread+0x8e/0x630
[ 379.958861] ? __pfx_worker_thread+0x10/0x10
[ 379.959148] kthread+0x17d/0x1b0
[ 379.959369] ? __pfx_kthread+0x10/0x10
[ 379.959630] ret_from_fork+0x2c/0x50
[ 379.959879] </TASK>

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 21:07:44 -05:00
Paulo Alcantara b56bce502f cifs: set DFS root session in cifs_get_smb_ses()
Set the DFS root session pointer earlier when creating a new SMB
session to prevent racing with smb2_reconnect(), cifs_reconnect_tcon()
and DFS cache refresher.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org # 6.2
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 21:05:53 -05:00
Linus Torvalds 26e2878b3e Eleven hotfixes. Four of these are cc:stable and the remainder address
post-6.2 issues or aren't considered suitable for backporting.  Seven of
 these fixes are for MM.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZBEItgAKCRDdBJ7gKXxA
 jsF/AP0ToKnDwmZ1SJOGK3pFiVGVy7VSaq1THrnLQoC57l8jTAD+PReSZMNXaxhB
 8701hVQcxKAiu9wAvowSd+lOvpwHMwQ=
 =IANU
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "Eleven hotfixes.

  Four of these are cc:stable and the remainder address post-6.2 issues
  or aren't considered suitable for backporting.

  Seven of these fixes are for MM"

* tag 'mm-hotfixes-stable-2023-03-14-16-51' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate()
  mm/damon/paddr: fix folio_size() call after folio_put() in damon_pa_young()
  ocfs2: fix data corruption after failed write
  migrate_pages: try migrate in batch asynchronously firstly
  migrate_pages: move split folios processing out of migrate_pages_batch()
  migrate_pages: fix deadlock in batched migration
  .mailmap: add Alexandre Ghiti personal email address
  mailmap: correct Dikshita Agarwal's Qualcomm email address
  mailmap: updates for Jarkko Sakkinen
  mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage
  mm: teach mincore_hugetlb about pte markers
2023-03-14 17:13:58 -07:00
Nathan Huckleberry f959325e6a fsverity: Remove WQ_UNBOUND from fsverity read workqueue
WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
is problematic for latency sensitive workloads, like I/O
post-processing.

Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
scheduler latency and improves app cold startup times by ~30ms.
WQ_UNBOUND was also removed from the dm-verity workqueue for the same
reason [1].

This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the runnable state.

Before
Total workqueue scheduler latency: 553800us
After
Total workqueue scheduler latency: 18962us

[1]: https://lore.kernel.org/all/20230202012348.885402-1-nhuck@google.com/

Signed-off-by: Nathan Huckleberry <nhuck@google.com>
Fixes: 8a1d0f9cac ("fs-verity: add data verification hooks for ->readpages()")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230310193325.620493-1-nhuck@google.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-14 16:16:36 -07:00
Shyam Prasad N 05ce0448c3 cifs: generate signkey for the channel that's reconnecting
Before my changes to how multichannel reconnects work, the
primary channel was always used to do a non-binding session
setup. With my changes, that is not the case anymore.
Missed this place where channel at index 0 was forcibly
updated with the signing key.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 17:36:49 -05:00
Volker Lendecke 211baef0ea cifs: Fix smb2_set_path_size()
If cifs_get_writable_path() finds a writable file, smb2_compound_op()
must use that file's FID and not the COMPOUND_FID.

Cc: stable@vger.kernel.org
Signed-off-by: Volker Lendecke <vl@samba.org>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-14 17:36:44 -05:00
Chengen Du 21fd9e8700 NFS: Correct timing for assigning access cache timestamp
When the user's login time is newer than the cache's timestamp,
the original entry in the RB-tree will be replaced by a new entry.
Currently, the timestamp is only set if the entry is not found in
the RB-tree, which can cause the timestamp to be undefined when
the entry exists. This may result in a significant increase in
ACCESS operations if the timestamp is set to zero.

Signed-off-by: Chengen Du <chengen.du@canonical.com>
Fixes: 0eb43812c0 ("NFS: Clear the file access cache upon login”)
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-14 15:19:44 -04:00
Jeff Layton 7ff84910c6 lockd: set file_lock start and end when decoding nlm4 testargs
Commit 6930bcbfb6 dropped the setting of the file_lock range when
decoding a nlm_lock off the wire. This causes the client side grant
callback to miss matching blocks and reject the lock, only to rerequest
it 30s later.

Add a helper function to set the file_lock range from the start and end
values that the protocol uses, and have the nlm_lock decoder call that to
set up the file_lock args properly.

Fixes: 6930bcbfb6 ("lockd: detect and reject lock arguments that overflow")
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org #6.0
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-14 14:00:55 -04:00
Dave Wysochanski 9c88ea00fe NFS: Fix /proc/PID/io read_bytes for buffered reads
Prior to commit 8786fde842 ("Convert NFS from readpages to
readahead"), nfs_readpages() used the old mm interface read_cache_pages()
which called task_io_account_read() for each NFS page read.  After
this commit, nfs_readpages() is converted to nfs_readahead(), which
now uses the new mm interface readahead_page().  The new interface
requires callers to call task_io_account_read() themselves.
In addition, to nfs_readahead() task_io_account_read() should also
be called from nfs_read_folio().

Fixes: 8786fde842 ("Convert NFS from readpages to readahead")
Link: https://lore.kernel.org/linux-nfs/CAPt2mGNEYUk5u8V4abe=5MM5msZqmvzCVrtCP4Qw1n=gCHCnww@mail.gmail.com/
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-03-14 13:47:43 -04:00
Eric Biggers ccb820dc7d fscrypt: destroy keyring after security_sb_delete()
fscrypt_destroy_keyring() must be called after all potentially-encrypted
inodes were evicted; otherwise it cannot safely destroy the keyring.
Since inodes that are in-use by the Landlock LSM don't get evicted until
security_sb_delete(), this means that fscrypt_destroy_keyring() must be
called *after* security_sb_delete().

This fixes a WARN_ON followed by a NULL dereference, only possible if
Landlock was being used on encrypted files.

Fixes: d7e7b9af10 ("fscrypt: stop using keyrings subsystem for fscrypt_master_key")
Cc: stable@vger.kernel.org
Reported-by: syzbot+93e495f6a4f748827c88@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/00000000000044651705f6ca1e30@google.com
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/r/20230313221231.272498-2-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
2023-03-14 10:30:30 -07:00
Linus Torvalds 2e545d69bd Fixes for 6.3-rc1:
* Fix a crash if mount time quotacheck fails when there are inodes
    queued for garbage collection.
  * Fix an off by one error when discarding folios after writeback
    failure.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZAYvQwAKCRBKO3ySh0YR
 pjMXAP9X9HozNYESlg/cMq6nY2XfbHIR2qvNOfopiRpWby5xQAEAqBiEhafIJ0A1
 mTt+0TqQxDsH+uxr/QEUm76Q7F3f1gE=
 =0zU0
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:

 - Fix a crash if mount time quotacheck fails when there are inodes
   queued for garbage collection.

 - Fix an off by one error when discarding folios after writeback
   failure.

* tag 'xfs-6.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix off-by-one-block in xfs_discard_folio()
  xfs: quotacheck failure can race with background inode inactivation
2023-03-12 09:47:08 -07:00
Linus Torvalds 3b11717f95 vfs.misc.v6.3-rc2
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCZA2yXAAKCRCRxhvAZXjc
 opK7AP0fqkk75P1bRZL36iNOgCV0RDiSN/Ynk/oMYpsOyBndlAD7BKCEZFF2OKzP
 aeJrY0F+guwL67X+18X+yiLZrk2rag4=
 =2Wa/
 -----END PGP SIGNATURE-----

Merge tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping

Pull vfs fixes from Christian Brauner:

 - When allocating pages for a watch queue failed, we didn't return an
   error causing userspace to proceed even though all subsequent
   notifcations would be lost. Make sure to return an error.

 - Fix a misformed tree entry for the idmapping maintainers entry.

 - When setting file leases from an idmapped mount via
   generic_setlease() we need to take the idmapping into account
   otherwise taking a lease would fail from an idmapped mount.

 - Remove two redundant assignments, one in splice code and the other in
   locks code, that static checkers complained about.

* tag 'vfs.misc.v6.3-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  filelocks: use mount idmapping for setlease permission check
  fs/locks: Remove redundant assignment to cmd
  splice: Remove redundant assignment to ret
  MAINTAINERS: repair a malformed T: entry in IDMAPPED MOUNTS
  watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths
2023-03-12 09:00:54 -07:00
Linus Torvalds 40d0c0901e Bug fixes and regressions for ext4, the most serious of which is a
potential deadlock during directory renames that was introduced during
 the merge window discovered by a combination of syzbot and lockdep.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmQNVwIACgkQ8vlZVpUN
 gaMwmgf/ZAasXZEMV0zaQZa8zP4KvMKZjWe6azkcJg4sb/HG9Q7JzeJDCurhhWUj
 8+QnyUcuKTyWKYWjGf0f5CZaYEM5AZYij41UJzu2qMkz5hVXSqBVuY8KywxuiJv5
 kfuIvQh0Onv0Yrg2qAc52/kZkq1lu2sl/F5ertBWjdpTUXdBUdrCxkUk+1BgQWAj
 vNwi1/+gNuX7RxMboHqYmwXFP39vECd+wteNdsiK1hR8bLqL68duLLq8xQdHt4gS
 sbVmJKR4j2Giw4ZnlYi9RiwKIO0beqocanp+cfOPulyj5mTM8X1lr0uvaLZgx2AF
 lqrS3/5ksp45cRT70qCIz8je70hTSg==
 =nN3T
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
 "Bug fixes and regressions for ext4, the most serious of which is a
  potential deadlock during directory renames that was introduced during
  the merge window discovered by a combination of syzbot and lockdep"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: zero i_disksize when initializing the bootloader inode
  ext4: make sure fs error flag setted before clear journal error
  ext4: commit super block if fs record error when journal record without error
  ext4, jbd2: add an optimized bmap for the journal inode
  ext4: fix WARNING in ext4_update_inline_data
  ext4: move where set the MAY_INLINE_DATA flag is set
  ext4: Fix deadlock during directory rename
  ext4: Fix comment about the 64BIT feature
  docs: ext4: modify the group desc size to 64
  ext4: fix another off-by-one fsmap error on 1k block filesystems
  ext4: fix RENAME_WHITEOUT handling for inline directories
  ext4: make kobj_type structures constant
  ext4: fix cgroup writeback accounting with fs-layer encryption
2023-03-12 08:55:55 -07:00
Zhihao Cheng f5361da1e6 ext4: zero i_disksize when initializing the bootloader inode
If the boot loader inode has never been used before, the
EXT4_IOC_SWAP_BOOT inode will initialize it, including setting the
i_size to 0.  However, if the "never before used" boot loader has a
non-zero i_size, then i_disksize will be non-zero, and the
inconsistency between i_size and i_disksize can trigger a kernel
warning:

 WARNING: CPU: 0 PID: 2580 at fs/ext4/file.c:319
 CPU: 0 PID: 2580 Comm: bb Not tainted 6.3.0-rc1-00004-g703695902cfa
 RIP: 0010:ext4_file_write_iter+0xbc7/0xd10
 Call Trace:
  vfs_write+0x3b1/0x5c0
  ksys_write+0x77/0x160
  __x64_sys_write+0x22/0x30
  do_syscall_64+0x39/0x80

Reproducer:
 1. create corrupted image and mount it:
       mke2fs -t ext4 /tmp/foo.img 200
       debugfs -wR "sif <5> size 25700" /tmp/foo.img
       mount -t ext4 /tmp/foo.img /mnt
       cd /mnt
       echo 123 > file
 2. Run the reproducer program:
       posix_memalign(&buf, 1024, 1024)
       fd = open("file", O_RDWR | O_DIRECT);
       ioctl(fd, EXT4_IOC_SWAP_BOOT);
       write(fd, buf, 1024);

Fix this by setting i_disksize as well as i_size to zero when
initiaizing the boot loader inode.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217159
Cc: stable@kernel.org
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Link: https://lore.kernel.org/r/20230308032643.641113-1-chengzhihao1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Ye Bin f57886ca16 ext4: make sure fs error flag setted before clear journal error
Now, jounral error number maybe cleared even though ext4_commit_super()
failed. This may lead to error flag miss, then fsck will miss to check
file system deeply.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-3-yebin@huaweicloud.com
2023-03-11 00:44:24 -05:00
Ye Bin eee00237fa ext4: commit super block if fs record error when journal record without error
Now, 'es->s_state' maybe covered by recover journal. And journal errno
maybe not recorded in journal sb as IO error. ext4_update_super() only
update error information when 'sbi->s_add_error_count' large than zero.
Then 'EXT4_ERROR_FS' flag maybe lost.
To solve above issue just recover 'es->s_state' error flag after journal
replay like error info.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-2-yebin@huaweicloud.com
2023-03-11 00:44:24 -05:00
Theodore Ts'o 62913ae96d ext4, jbd2: add an optimized bmap for the journal inode
The generic bmap() function exported by the VFS takes locks and does
checks that are not necessary for the journal inode.  So allow the
file system to set a journal-optimized bmap function in
journal->j_bmap.

Reported-by: syzbot+9543479984ae9e576000@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=e4aaa78795e490421c79f76ec3679006c8ff4cf0
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Ye Bin 2b96b4a5d9 ext4: fix WARNING in ext4_update_inline_data
Syzbot found the following issue:
EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 without journal. Quota mode: none.
fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-aesni"
fscrypt: AES-256-XTS using implementation "xts-aes-aesni"
------------[ cut here ]------------
WARNING: CPU: 0 PID: 5071 at mm/page_alloc.c:5525 __alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
Modules linked in:
CPU: 1 PID: 5071 Comm: syz-executor263 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:__alloc_pages+0x30a/0x560 mm/page_alloc.c:5525
RSP: 0018:ffffc90003c2f1c0 EFLAGS: 00010246
RAX: ffffc90003c2f220 RBX: 0000000000000014 RCX: 0000000000000000
RDX: 0000000000000028 RSI: 0000000000000000 RDI: ffffc90003c2f248
RBP: ffffc90003c2f2d8 R08: dffffc0000000000 R09: ffffc90003c2f220
R10: fffff52000785e49 R11: 1ffff92000785e44 R12: 0000000000040d40
R13: 1ffff92000785e40 R14: dffffc0000000000 R15: 1ffff92000785e3c
FS:  0000555556c0d300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f95d5e04138 CR3: 00000000793aa000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __alloc_pages_node include/linux/gfp.h:237 [inline]
 alloc_pages_node include/linux/gfp.h:260 [inline]
 __kmalloc_large_node+0x95/0x1e0 mm/slab_common.c:1113
 __do_kmalloc_node mm/slab_common.c:956 [inline]
 __kmalloc+0xfe/0x190 mm/slab_common.c:981
 kmalloc include/linux/slab.h:584 [inline]
 kzalloc include/linux/slab.h:720 [inline]
 ext4_update_inline_data+0x236/0x6b0 fs/ext4/inline.c:346
 ext4_update_inline_dir fs/ext4/inline.c:1115 [inline]
 ext4_try_add_inline_entry+0x328/0x990 fs/ext4/inline.c:1307
 ext4_add_entry+0x5a4/0xeb0 fs/ext4/namei.c:2385
 ext4_add_nondir+0x96/0x260 fs/ext4/namei.c:2772
 ext4_create+0x36c/0x560 fs/ext4/namei.c:2817
 lookup_open fs/namei.c:3413 [inline]
 open_last_lookups fs/namei.c:3481 [inline]
 path_openat+0x12ac/0x2dd0 fs/namei.c:3711
 do_filp_open+0x264/0x4f0 fs/namei.c:3741
 do_sys_openat2+0x124/0x4e0 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_openat fs/open.c:1342 [inline]
 __se_sys_openat fs/open.c:1337 [inline]
 __x64_sys_openat+0x243/0x290 fs/open.c:1337
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Above issue happens as follows:
ext4_iget
   ext4_find_inline_data_nolock ->i_inline_off=164 i_inline_size=60
ext4_try_add_inline_entry
   __ext4_mark_inode_dirty
      ext4_expand_extra_isize_ea ->i_extra_isize=32 s_want_extra_isize=44
         ext4_xattr_shift_entries
	 ->after shift i_inline_off is incorrect, actually is change to 176
ext4_try_add_inline_entry
  ext4_update_inline_dir
    get_max_inline_xattr_value_size
      if (EXT4_I(inode)->i_inline_off)
	entry = (struct ext4_xattr_entry *)((void *)raw_inode +
			EXT4_I(inode)->i_inline_off);
        free += EXT4_XATTR_SIZE(le32_to_cpu(entry->e_value_size));
	->As entry is incorrect, then 'free' may be negative
   ext4_update_inline_data
      value = kzalloc(len, GFP_NOFS);
      -> len is unsigned int, maybe very large, then trigger warning when
         'kzalloc()'

To resolve the above issue we need to update 'i_inline_off' after
'ext4_xattr_shift_entries()'.  We do not need to set
EXT4_STATE_MAY_INLINE_DATA flag here, since ext4_mark_inode_dirty()
already sets this flag if needed.  Setting EXT4_STATE_MAY_INLINE_DATA
when it is needed may trigger a BUG_ON in ext4_writepages().

Reported-by: syzbot+d30838395804afc2fa6f@syzkaller.appspotmail.com
Cc: stable@kernel.org
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307015253.2232062-3-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Ye Bin 1dcdce5919 ext4: move where set the MAY_INLINE_DATA flag is set
The only caller of ext4_find_inline_data_nolock() that needs setting of
EXT4_STATE_MAY_INLINE_DATA flag is ext4_iget_extra_inode().  In
ext4_write_inline_data_end() we just need to update inode->i_inline_off.
Since we are going to add one more caller that does not need to set
EXT4_STATE_MAY_INLINE_DATA, just move setting of EXT4_STATE_MAY_INLINE_DATA
out to ext4_iget_extra_inode().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307015253.2232062-2-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-11 00:44:24 -05:00
Linus Torvalds 4831f76247 pick_file() speculation fix + fix for alpha mis(merge,cherry-pick)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCZAuQ7gAKCRBZ7Krx/gZQ
 6xSmAPsFPc3ykvOWwCl7eTGS65gHZpK80e5lX9kZB8KIa5JjaAEA551vgRWi34+D
 PWvDDpN1QUFL6HHL+FR7heLJr2SKIwA=
 =RsPH
 -----END PGP SIGNATURE-----

Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc fixes from Al Viro:
 "pick_file() speculation fix + fix for alpha mis(merge,cherry-pick)

  The fs/file.c one is a genuine missing speculation barrier in
  pick_file() (reachable e.g. via close(2)). The alpha one is strictly
  speaking not a bug fix, but only because confusion between
  preempt_enable() and preempt_disable() is harmless on architecture
  without CONFIG_PREEMPT.

  Looks like alpha.git picked the wrong version of patch - that braino
  used to be there in early versions, but it had been fixed quite a
  while ago..."

* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: prevent out-of-bounds array speculation when closing a file descriptor
  alpha: fix lazy-FPU mis(merged/applied/whatnot)
2023-03-10 19:04:10 -08:00
Linus Torvalds 388a810192 Changes since last update:
- Fix LZMA decompression failure on HIGHMEM platforms;
 
  - Revert an inproper fix since it is actually an implementation
    issue of vmalloc();
 
  - Avoid a wrong DBG_BUGON since it could be triggered with -EINTR;
 
  - Minor cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCZAtOhBEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBKamAQC4njqAPMt1SPJU1HYACnS8TuNIC0CO2eT6
 gU11ja+AZwEAwwyjucoEirD1xCDBlTOUD2fPm3W87RVNr2juiZg2OA8=
 =Dj3V
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.3-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:
 "The most important one reverts an improper fix which can cause an
  unexpected warning more often on specific images, and another one
  fixes LZMA decompression on 32-bit platforms. The others are minor
  fixes and cleanups.

   - Fix LZMA decompression failure on HIGHMEM platforms

   - Revert an inproper fix since it is actually an implementation issue
     of vmalloc()

   - Avoid a wrong DBG_BUGON since it could be triggered with -EINTR

   - Minor cleanups"

* tag 'erofs-for-6.3-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: use wrapper i_blocksize() in erofs_file_read_iter()
  erofs: get rid of a useless DBG_BUGON
  erofs: Revert "erofs: fix kvcalloc() misuse with __GFP_NOFAIL"
  erofs: fix wrong kunmap when using LZMA on HIGHMEM platforms
  erofs: mark z_erofs_lzma_init/erofs_pcpubuf_init w/ __init
2023-03-10 08:51:57 -08:00
Linus Torvalds 92cadfcffa nfsd-6.3 fixes:
- Protect NFSD writes against filesystem freezing
 - Fix a potential memory leak during server shutdown
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmQLPvQACgkQM2qzM29m
 f5crgRAAiuZvYynBppTxeEZ9ITLRJCgeYhTCJDBFngh3NZtW0UxrTUfYIry/MJTm
 j520mvdzx8+OVyasUJCMoy+dY/1toeCaCspfEdyul47JRVrxuzJokbYzyV5fFZav
 q3vDNHdBXOkrJkXvJd+VtbdZa2iYBMIQ1pTc6vG2FbUaxfNqZyg7uKug/R2Q/Ipj
 tF587jXQuMvpPwlXOeFol3fhLQCO9KjTrZF/d4+m5+KTSQ4sgzrgoWuVD/39Is4Q
 bVx5hBX74PFNHaFMqviIjtIEFjrOfe6wVjdNgbmpzbEiUm9VGZ6zJ2k+LsnawMIw
 8W3178Yjp+SEb21X+b+pe7/efIqD01SOX36zUOETL9gOvX3aMHbTuDGERQlCBUdZ
 9pcUwWfzeGzQSsnEvEi05HlnrDQpeavAc4tVYWgiEvO2S8cngXgsuoGhUO2ov8qY
 scq4PQGclpmR6IMYDJLL6JbOvfBDIMoFHGbDbdkjpkkhoRHgtQgnRebOgB20bxom
 D3XPk39U2CbVX9UJqgPN0yvA8L4biG4RPNZZNOjFtOup6oggpzwgN2M9h1ZhuulG
 bTOjHWl9NJmSNIYUDzOyH7m98UCI1pEXcH1kkV7vjjAIAPbn3ovXgzMjSBqDVvNj
 tyqGujS4snRb5/NI8g6O+VyTbMSnimu+tZpUvKH+4oSBH9ZAUVI=
 =z6NN
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:

 - Protect NFSD writes against filesystem freezing

 - Fix a potential memory leak during server shutdown

* tag 'nfsd-6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Fix a server shutdown leak
  NFSD: Protect against filesystem freezing
2023-03-10 08:45:30 -08:00
Linus Torvalds ae195ca1a8 for-6.3-rc1-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmQKUxwACgkQxWXV+ddt
 WDtPMg//RHAnHYRm+sHkXfRhz/+kWhipPo1OskLE5aYZaP1MSpk0NfNc1c6ZYwcg
 FQNeNQOooqBIYFpLeery14vw/FpFc/tivw7OP4XmtH9Jeyj6mwgAQpP5Gho8jDmm
 u90jf2UMwA+7qo57e9qfioufiZPGMsNnmK1BwdrcbuUZIz5UEZZ6u6BVhVFnEDGa
 y08Uv03t9g5F7msXfh4iBaPeJRgdWL7kiZfhFyCa6OHKiGOT39hYXn0ov1pET/yG
 IMECrX+BKiunABExHDN9VbW1AVWGmsvGjFYpZQnAWCm37cr3Mc7ngIz1FBF8hm+L
 9Cd07GhBOPaKzFI+uAzVJrA0QkKnI8Wgd1YT3LWWT0qj5gpPA5YL4G0V4KLzPBOt
 TBe4dW7g4o4EXsYBJzYwiLjHILZyydkPKEQ78Bt2mwjdGs4PYNBGwyl0I2bV/pV+
 dKGv+KOsiX2euPFtwVaIG5u8gEBCCoiKSO+HwphtfWyxnEE5/uvw0fdSJlKNt1Yj
 28f+qyzN9WuNK/aSxI+KfW4PAXvkoLi7w8tjyJp3vpj6VnSmaFf2EtGiKtGSmLVn
 3uSY8WZ24FdOHNV5QaliABGt/SaLG0rbLC8uPocryh0aW9xkMpvVVYPfTJmyWmxy
 kc5dfDhUinp5I0wLTtjRH407bB0CdukgpxOrN6GELqPufm7YvQk=
 =rJlY
 -----END PGP SIGNATURE-----

Merge tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "First batch of fixes. Among them there are two updates to sysfs and
  ioctl which are not strictly fixes but are used for testing so there's
  no reason to delay them.

   - fix block group item corruption after inserting new block group

   - fix extent map logging bit not cleared for split maps after
     dropping range

   - fix calculation of unusable block group space reporting bogus
     values due to 32/64b division

   - fix unnecessary increment of read error stat on write error

   - improve error handling in inode update

   - export per-device fsid in DEV_INFO ioctl to distinguish seeding
     devices, needed for testing

   - allocator size classes:
      - fix potential dead lock in size class loading logic
      - print sysfs stats for the allocation classes"

* tag 'for-6.3-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix block group item corruption after inserting new block group
  btrfs: fix extent map logging bit not cleared for split maps after dropping range
  btrfs: fix percent calculation for bg reclaim message
  btrfs: fix unnecessary increment of read error stat on write error
  btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode
  btrfs: ioctl: return device fsid from DEV_INFO ioctl
  btrfs: fix potential dead lock in size class loading logic
  btrfs: sysfs: add size class stats
2023-03-10 08:39:13 -08:00
Chuck Lever e57d065277 NFS & NFSD: Update GSS dependencies
Geert reports that:
> On v6.2, "make ARCH=m68k defconfig" gives you
> CONFIG_RPCSEC_GSS_KRB5=m
> On v6.3, it became builtin, due to dropping the dependencies on
> the individual crypto modules.
>
> $ grep -E "CRYPTO_(MD5|DES|CBC|CTS|ECB|HMAC|SHA1|AES)" .config
> CONFIG_CRYPTO_AES=y
> CONFIG_CRYPTO_AES_TI=m
> CONFIG_CRYPTO_DES=m
> CONFIG_CRYPTO_CBC=m
> CONFIG_CRYPTO_CTS=m
> CONFIG_CRYPTO_ECB=m
> CONFIG_CRYPTO_HMAC=m
> CONFIG_CRYPTO_MD5=m
> CONFIG_CRYPTO_SHA1=m

This behavior is triggered by the "default y" in the definition of
RPCSEC_GSS.

The "default y" was added in 2010 by commit df486a2590 ("NFS: Fix
the selection of security flavours in Kconfig"). However,
svc_gss_principal was removed in 2012 by commit 03a4e1f6dd
("nfsd4: move principal name into svc_cred"), so the 2010 fix is
no longer necessary. We can safely change the NFS_V4 and NFSD_V4
dependencies back to RPCSEC_GSS_KRB5 to get the nicer v6.2
behavior back.

Selecting KRB5 symbolically represents the true requirement here:
that all spec-compliant NFSv4 implementations must have Kerberos
available to use.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Fixes: dfe9a12345 ("SUNRPC: Enable rpcsec_gss_krb5.ko to be built without CRYPTO_DES")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-10 09:38:47 -05:00
Theodore Ts'o 609d544414 fs: prevent out-of-bounds array speculation when closing a file descriptor
Google-Bug-Id: 114199369
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2023-03-09 22:46:21 -05:00
Seth Forshee 42d0c4bdf7
filelocks: use mount idmapping for setlease permission check
A user should be allowed to take out a lease via an idmapped mount if
the fsuid matches the mapped uid of the inode. generic_setlease() is
checking the unmapped inode uid, causing these operations to be denied.

Fix this by comparing against the mapped inode uid instead of the
unmapped uid.

Fixes: 9caccd4154 ("fs: introduce MOUNT_ATTR_IDMAP")
Cc: stable@vger.kernel.org
Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09 22:36:12 +01:00
Yue Hu 3993f4f456 erofs: use wrapper i_blocksize() in erofs_file_read_iter()
linux/fs.h has a wrapper for this operation.

Signed-off-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230306075527.1338-1-zbestahu@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 23:36:04 +08:00
Gao Xiang 9ff471800b erofs: get rid of a useless DBG_BUGON
`err` could be -EINTR and it should not be the case.  Actually such
DBG_BUGON is useless.

Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230309053148.9223-2-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 23:36:01 +08:00
Gao Xiang 647dd2c3f0 erofs: Revert "erofs: fix kvcalloc() misuse with __GFP_NOFAIL"
Let's revert commit 12724ba389 ("erofs: fix kvcalloc() misuse with
__GFP_NOFAIL") since kvmalloc() already supports __GFP_NOFAIL in commit
a421ef3030 ("mm: allow !GFP_KERNEL allocations for kvmalloc").  So
the original fix was wrong.

Actually there was some issue as [1] discussed, so before that mm fix
is landed, the warn could still happen but applying this commit first
will cause less.

[1] https://lore.kernel.org/r/20230305053035.1911-1-hsiangkao@linux.alibaba.com

Fixes: 12724ba389 ("erofs: fix kvcalloc() misuse with __GFP_NOFAIL")
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230309053148.9223-1-hsiangkao@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 23:36:01 +08:00
Gao Xiang 8f121dfb15 erofs: fix wrong kunmap when using LZMA on HIGHMEM platforms
As the call trace shown, the root cause is kunmap incorrect pages:

 BUG: kernel NULL pointer dereference, address: 00000000
 CPU: 1 PID: 40 Comm: kworker/u5:0 Not tainted 6.2.0-rc5 #4
 Workqueue: erofs_worker z_erofs_decompressqueue_work
 EIP: z_erofs_lzma_decompress+0x34b/0x8ac
  z_erofs_decompress+0x12/0x14
  z_erofs_decompress_queue+0x7e7/0xb1c
  z_erofs_decompressqueue_work+0x32/0x60
  process_one_work+0x24b/0x4d8
  ? process_one_work+0x1a4/0x4d8
  worker_thread+0x14c/0x3fc
  kthread+0xe6/0x10c
  ? rescuer_thread+0x358/0x358
  ? kthread_complete_and_exit+0x18/0x18
  ret_from_fork+0x1c/0x28
 ---[ end trace 0000000000000000 ]---

The bug is trivial and should be fixed now.  It has no impact on
!HIGHMEM platforms.

Fixes: 622ceaddb7 ("erofs: lzma compression support")
Cc: <stable@vger.kernel.org> # 5.16+
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230305134455.88236-1-hsiangkao@linux.alibaba.com
2023-03-09 22:49:57 +08:00
Yangtao Li a279adedbb erofs: mark z_erofs_lzma_init/erofs_pcpubuf_init w/ __init
They are used during the erofs module init phase. Let's mark it as
__init like any other function.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20230303063731.66760-1-frank.li@vivo.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2023-03-09 22:49:30 +08:00
Jiapeng Chong dc592190a5
fs/locks: Remove redundant assignment to cmd
Variable 'cmd' set but not used.

fs/locks.c:2428:3: warning: Value stored to 'cmd' is never read.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4439
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09 10:17:04 +01:00
Jiapeng Chong c3a4aec055
splice: Remove redundant assignment to ret
The variable ret belongs to redundant assignment and can be deleted.

fs/splice.c:940:2: warning: Value stored to 'ret' is never read.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4406
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2023-03-09 10:10:31 +01:00
Jan Kara 3c92792da8 ext4: Fix deadlock during directory rename
As lockdep properly warns, we should not be locking i_rwsem while having
transactions started as the proper lock ordering used by all directory
handling operations is i_rwsem -> transaction start. Fix the lock
ordering by moving the locking of the directory earlier in
ext4_rename().

Reported-by: syzbot+9d16c39efb5fade84574@syzkaller.appspotmail.com
Fixes: 0813299c58 ("ext4: Fix possible corruption when moving a directory")
Link: https://syzkaller.appspot.com/bug?extid=9d16c39efb5fade84574
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230301141004.15087-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 21:45:38 -05:00
Tudor Ambarus 7fc1f5c28a ext4: Fix comment about the 64BIT feature
64BIT is part of the incompatible feature set, update the comment
accordingly.

Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20230301133842.671821-1-tudor.ambarus@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:27:54 -05:00
Darrick J. Wong c993799baf ext4: fix another off-by-one fsmap error on 1k block filesystems
Apparently syzbot figured out that issuing this FSMAP call:

struct fsmap_head cmd = {
	.fmh_count	= ...;
	.fmh_keys	= {
		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
	},
...
};
ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd);

Produces this crash if the underlying filesystem is a 1k-block ext4
filesystem:

kernel BUG at fs/ext4/ext4.h:3331!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G        W  O       6.2.0-rc8-achx
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4]
RSP: 0018:ffffc90007c03998 EFLAGS: 00010246
RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000
RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11
RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400
R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001
R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398
FS:  00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0
Call Trace:
 <TASK>
 ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 __ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 __x64_sys_ioctl+0x82/0xa0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fdf20558aff
RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff
RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003
RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010
R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000

For GETFSMAP calls, the caller selects a physical block device by
writing its block number into fsmap_head.fmh_keys[01].fmr_device.
To query mappings for a subrange of the device, the starting byte of the
range is written to fsmap_head.fmh_keys[0].fmr_physical and the last
byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical.

IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd
set the inputs as follows:

	fmh_keys[0] = { .fmr_device = major(8, 0), .fmr_physical = 3},
	fmh_keys[1] = { .fmr_device = major(8, 0), .fmr_physical = 14},

Which would return you whatever is mapped in the 12 bytes starting at
physical offset 3.

The crash is due to insufficient range validation of keys[1] in
ext4_getfsmap_datadev.  On 1k-block filesystems, block 0 is not part of
the filesystem, which means that s_first_data_block is nonzero.
ext4_get_group_no_and_offset subtracts this quantity from the blocknr
argument before cracking it into a group number and a block number
within a group.  IOWs, block group 0 spans blocks 1-8192 (1-based)
instead of 0-8191 (0-based) like what happens with larger blocksizes.

The net result of this encoding is that blocknr < s_first_data_block is
not a valid input to this function.  The end_fsb variable is set from
the keys that are copied from userspace, which means that in the above
example, its value is zero.  That leads to an underflow here:

	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);

The division then operates on -1:

	offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
		EXT4_SB(sb)->s_cluster_bits;

Leaving an impossibly large group number (2^32-1) in blocknr.
ext4_getfsmap_check_keys checked that keys[0].fmr_physical and
keys[1].fmr_physical are in increasing order, but
ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least
s_first_data_block.  This implies that we have to check it again after
the adjustment, which is the piece that I forgot.

Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
Fixes: 4a4956249d ("ext4: fix off-by-one fsmap error on 1k block filesystems")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Cc: stable@vger.kernel.org
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:20:48 -05:00
Eric Whitney c9f62c8b2d ext4: fix RENAME_WHITEOUT handling for inline directories
A significant number of xfstests can cause ext4 to log one or more
warning messages when they are run on a test file system where the
inline_data feature has been enabled.  An example:

"EXT4-fs warning (device vdc): ext4_dirblock_csum_set:425: inode
 #16385: comm fsstress: No space for directory leaf checksum. Please
run e2fsck -D."

The xfstests include: ext4/057, 058, and 307; generic/013, 051, 068,
070, 076, 078, 083, 232, 269, 270, 390, 461, 475, 476, 482, 579, 585,
589, 626, 631, and 650.

In this situation, the warning message indicates a bug in the code that
performs the RENAME_WHITEOUT operation on a directory entry that has
been stored inline.  It doesn't detect that the directory is stored
inline, and incorrectly attempts to compute a dirent block checksum on
the whiteout inode when creating it.  This attempt fails as a result
of the integrity checking in get_dirent_tail (usually due to a failure
to match the EXT4_FT_DIR_CSUM magic cookie), and the warning message
is then emitted.

Fix this by simply collecting the inlined data state at the time the
search for the source directory entry is performed.  Existing code
handles the rest, and this is sufficient to eliminate all spurious
warning messages produced by the tests above.  Go one step further
and do the same in the code that resets the source directory entry in
the event of failure.  The inlined state should be present in the
"old" struct, but given the possibility of a race there's no harm
in taking a conservative approach and getting that information again
since the directory entry is being reread anyway.

Fixes: b7ff91fd03 ("ext4: find old entry again if failed to rename whiteout")
Cc: stable@kernel.org
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230210173244.679890-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:20:48 -05:00
Thomas Weißschuh 60db00e0a1 ext4: make kobj_type structures constant
Since commit ee6d3dd4ed ("driver core: make kobj_type constant.")
the driver core allows the usage of const struct kobj_type.

Take advantage of this to constify the structure definitions to prevent
modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230209-kobj_type-ext4-v1-1-6865fb05c1f8@weissschuh.net
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:20:48 -05:00
Eric Biggers ffec85d53d ext4: fix cgroup writeback accounting with fs-layer encryption
When writing a page from an encrypted file that is using
filesystem-layer encryption (not inline encryption), ext4 encrypts the
pagecache page into a bounce page, then writes the bounce page.

It also passes the bounce page to wbc_account_cgroup_owner().  That's
incorrect, because the bounce page is a newly allocated temporary page
that doesn't have the memory cgroup of the original pagecache page.
This makes wbc_account_cgroup_owner() not account the I/O to the owner
of the pagecache page as it should.

Fix this by always passing the pagecache page to
wbc_account_cgroup_owner().

Fixes: 001e4a8775 ("ext4: implement cgroup writeback support")
Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230203005503.141557-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-03-07 20:12:30 -05:00
Jan Kara via Ocfs2-devel 90410bcf87 ocfs2: fix data corruption after failed write
When buffered write fails to copy data into underlying page cache page,
ocfs2_write_end_nolock() just zeroes out and dirties the page.  This can
leave dirty page beyond EOF and if page writeback tries to write this page
before write succeeds and expands i_size, page gets into inconsistent
state where page dirty bit is clear but buffer dirty bits stay set
resulting in page data never getting written and so data copied to the
page is lost.  Fix the problem by invalidating page beyond EOF after
failed write.

Link: https://lkml.kernel.org/r/20230302153843.18499-1-jack@suse.cz
Fixes: 6dbf7bb555 ("fs: Don't invalidate page buffers in block_write_full_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-07 17:04:55 -08:00
Filipe Manana 675dfe1223 btrfs: fix block group item corruption after inserting new block group
We can often end up inserting a block group item, for a new block group,
with a wrong value for the used bytes field.

This happens if for the new allocated block group, in the same transaction
that created the block group, we have tasks allocating extents from it as
well as tasks removing extents from it.

For example:

1) Task A creates a metadata block group X;

2) Two extents are allocated from block group X, so its "used" field is
   updated to 32K, and its "commit_used" field remains as 0;

3) Transaction commit starts, by some task B, and it enters
   btrfs_start_dirty_block_groups(). There it tries to update the block
   group item for block group X, which currently has its "used" field with
   a value of 32K. But that fails since the block group item was not yet
   inserted, and so on failure update_block_group_item() sets the
   "commit_used" field of the block group back to 0;

4) The block group item is inserted by task A, when for example
   btrfs_create_pending_block_groups() is called when releasing its
   transaction handle. This results in insert_block_group_item() inserting
   the block group item in the extent tree (or block group tree), with a
   "used" field having a value of 32K, but without updating the
   "commit_used" field in the block group, which remains with value of 0;

5) The two extents are freed from block X, so its "used" field changes
   from 32K to 0;

6) The transaction commit by task B continues, it enters
   btrfs_write_dirty_block_groups() which calls update_block_group_item()
   for block group X, and there it decides to skip the block group item
   update, because "used" has a value of 0 and "commit_used" has a value
   of 0 too.

   As a result, we end up with a block item having a 32K "used" field but
   no extents allocated from it.

When this issue happens, a btrfs check reports an error like this:

   [1/7] checking root items
   [2/7] checking extents
   block group [1104150528 1073741824] used 39796736 but extent items used 0
   ERROR: errors found in extent allocation tree or chunk allocation
   (...)

Fix this by making insert_block_group_item() update the block group's
"commit_used" field.

Fixes: 7248e0cebb ("btrfs: skip update of block group item if used bytes are the same")
CC: stable@vger.kernel.org # 6.2+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-08 01:14:01 +01:00
Chuck Lever fd9a2e1d51 NFSD: Protect against filesystem freezing
Flole observes this WARNING on occasion:

[1210423.486503] WARNING: CPU: 8 PID: 1524732 at fs/ext4/ext4_jbd2.c:75 ext4_journal_check_start+0x68/0xb0

Reported-by: <flole@flole.de>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217123
Fixes: 73da852e38 ("nfsd: use vfs_iter_read/write")
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2023-03-07 09:33:42 -05:00
Filipe Manana e4cc1483f3 btrfs: fix extent map logging bit not cleared for split maps after dropping range
At btrfs_drop_extent_map_range() we are clearing the EXTENT_FLAG_LOGGING
bit on a 'flags' variable that was not initialized. This makes static
checkers complain about it, so initialize the 'flags' variable before
clearing the bit.

In practice this has no consequences, because EXTENT_FLAG_LOGGING should
not be set when btrfs_drop_extent_map_range() is called, as an fsync locks
the inode in exclusive mode, locks the inode's mmap semaphore in exclusive
mode too and it always flushes all delalloc.

Also add a comment about why we clear EXTENT_FLAG_LOGGING on a copy of the
flags of the split extent map.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/Y%2FyipSVozUDEZKow@kili/
Fixes: db21370bff ("btrfs: drop extent map range more efficiently")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Johannes Thumshirn 95cd356ca2 btrfs: fix percent calculation for bg reclaim message
We have a report, that the info message for block-group reclaim is
crossing the 100% used mark.

This is happening as we were truncating the divisor for the division
(the block_group->length) to a 32bit value.

Fix this by using div64_u64() to not truncate the divisor.

In the worst case, it can lead to a div by zero error and should be
possible to trigger on 4 disks RAID0, and each device is large enough:

  $ mkfs.btrfs  -f /dev/test/scratch[1234] -m raid1 -d raid0
  btrfs-progs v6.1
  [...]
  Filesystem size:    40.00GiB
  Block group profiles:
    Data:             RAID0             4.00GiB <<<
    Metadata:         RAID1           256.00MiB
    System:           RAID1             8.00MiB

Reported-by: Forza <forza@tnonline.net>
Link: https://lore.kernel.org/linux-btrfs/e99483.c11a58d.1863591ca52@tnonline.net/
Fixes: 5f93e776c6 ("btrfs: zoned: print unusable percentage when reclaiming block groups")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
[ add Qu's note ]
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Naohiro Aota 98e8d36a26 btrfs: fix unnecessary increment of read error stat on write error
Current btrfs_log_dev_io_error() increases the read error count even if the
erroneous IO is a WRITE request. This is because it forget to use "else
if", and all the error WRITE requests counts as READ error as there is (of
course) no REQ_RAHEAD bit set.

Fixes: c3a62baf21 ("btrfs: use chained bios when cloning")
CC: stable@vger.kernel.org # 6.1+
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
void0red c06016a02a btrfs: handle btrfs_del_item errors in __btrfs_update_delayed_inode
Even if the slot is already read out, we may still need to re-balance
the tree, thus it can cause error in that btrfs_del_item() call and we
need to handle it properly.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: void0red <void0red@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Qu Wenruo 2943868a90 btrfs: ioctl: return device fsid from DEV_INFO ioctl
Currently user space utilizes dev info ioctl to grab the info of a
certain devid, this includes its device uuid.  But the returned info is
not enough to determine if a device is a seed.

Commit a26d60dedf ("btrfs: sysfs: add devinfo/fsid to retrieve actual
fsid from the device") exports the same value in sysfs so this is for
parity with ioctl.  Add a new member, fsid, into
btrfs_ioctl_dev_info_args, and populate the member with fsid value.

This should not cause any compatibility problem, following the
combinations:

- Old user space, old kernel
- Old user space, new kernel
  User space tool won't even check the new member.

- New user space, old kernel
  The kernel won't touch the new member, and user space tool should
  zero out its argument, thus the new member is all zero.

  User space tool can then know the kernel doesn't support this fsid
  reporting, and falls back to whatever they can.

- New user space, new kernel
  Go as planned.

  Would find the fsid member is no longer zero, and trust its value.

Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Boris Burkov 12148367d7 btrfs: fix potential dead lock in size class loading logic
As reported by Filipe, there's a potential deadlock caused by
using btrfs_search_forward on commit_root. The locking there is
unconditional, even if ->skip_locking and ->search_commit_root is set.
It's not meant to be used for commit roots, so it always needs to do
locking.

So if another task is COWing a child node of the same root node and
then needs to wait for block group caching to complete when trying to
allocate a metadata extent, it deadlocks.

For example:

[539604.239315] sysrq: Show Blocked State
[539604.240133] task:kworker/u16:6   state:D stack:0     pid:2119594 ppid:2      flags:0x00004000
[539604.241613] Workqueue: btrfs-cache btrfs_work_helper [btrfs]
[539604.242673] Call Trace:
[539604.243129]  <TASK>
[539604.243925]  __schedule+0x41d/0xee0
[539604.244797]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.245399]  ? rwsem_down_read_slowpath+0x185/0x490
[539604.246111]  schedule+0x5d/0xf0
[539604.246593]  rwsem_down_read_slowpath+0x2da/0x490
[539604.247290]  ? rcu_barrier_tasks_trace+0x10/0x20
[539604.248090]  __down_read_common+0x3d/0x150
[539604.248702]  down_read_nested+0xc3/0x140
[539604.249280]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
[539604.250097]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
[539604.250915]  btrfs_search_forward+0x59/0x460 [btrfs]
[539604.251781]  ? btrfs_global_root+0x50/0x70 [btrfs]
[539604.252476]  caching_thread+0x1be/0x920 [btrfs]
[539604.253167]  btrfs_work_helper+0xf6/0x400 [btrfs]
[539604.253848]  process_one_work+0x24f/0x5a0
[539604.254476]  worker_thread+0x52/0x3b0
[539604.255166]  ? __pfx_worker_thread+0x10/0x10
[539604.256047]  kthread+0xf0/0x120
[539604.256591]  ? __pfx_kthread+0x10/0x10
[539604.257212]  ret_from_fork+0x29/0x50
[539604.257822]  </TASK>
[539604.258233] task:btrfs-transacti state:D stack:0     pid:2236474 ppid:2      flags:0x00004000
[539604.259802] Call Trace:
[539604.260243]  <TASK>
[539604.260615]  __schedule+0x41d/0xee0
[539604.261205]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.262000]  ? rwsem_down_read_slowpath+0x185/0x490
[539604.262822]  schedule+0x5d/0xf0
[539604.263374]  rwsem_down_read_slowpath+0x2da/0x490
[539604.266228]  ? lock_acquire+0x160/0x310
[539604.266917]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.267996]  ? lock_contended+0x19e/0x500
[539604.268720]  __down_read_common+0x3d/0x150
[539604.269400]  down_read_nested+0xc3/0x140
[539604.270057]  __btrfs_tree_read_lock+0x24/0x100 [btrfs]
[539604.271129]  btrfs_read_lock_root_node+0x48/0x60 [btrfs]
[539604.272372]  btrfs_search_slot+0x143/0xf70 [btrfs]
[539604.273295]  update_block_group_item+0x9e/0x190 [btrfs]
[539604.274282]  btrfs_start_dirty_block_groups+0x1c4/0x4f0 [btrfs]
[539604.275381]  ? __mutex_unlock_slowpath+0x45/0x280
[539604.276390]  btrfs_commit_transaction+0xee/0xed0 [btrfs]
[539604.277391]  ? lock_acquire+0x1a4/0x310
[539604.278080]  ? start_transaction+0xcb/0x6c0 [btrfs]
[539604.279099]  transaction_kthread+0x142/0x1c0 [btrfs]
[539604.279996]  ? __pfx_transaction_kthread+0x10/0x10 [btrfs]
[539604.280673]  kthread+0xf0/0x120
[539604.281050]  ? __pfx_kthread+0x10/0x10
[539604.281496]  ret_from_fork+0x29/0x50
[539604.281966]  </TASK>
[539604.282255] task:fsstress        state:D stack:0     pid:2236483 ppid:1      flags:0x00004006
[539604.283897] Call Trace:
[539604.284700]  <TASK>
[539604.285088]  __schedule+0x41d/0xee0
[539604.285660]  schedule+0x5d/0xf0
[539604.286175]  btrfs_wait_block_group_cache_progress+0xf2/0x170 [btrfs]
[539604.287342]  ? __pfx_autoremove_wake_function+0x10/0x10
[539604.288450]  find_free_extent+0xd93/0x1750 [btrfs]
[539604.289256]  ? _raw_spin_unlock+0x29/0x50
[539604.289911]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
[539604.290843]  btrfs_reserve_extent+0x147/0x290 [btrfs]
[539604.291943]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
[539604.292903]  __btrfs_cow_block+0x138/0x580 [btrfs]
[539604.293773]  btrfs_cow_block+0x10e/0x240 [btrfs]
[539604.294595]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
[539604.295585]  btrfs_update_device+0x71/0x1b0 [btrfs]
[539604.296459]  btrfs_chunk_alloc_add_chunk_item+0xe0/0x340 [btrfs]
[539604.297489]  btrfs_chunk_alloc+0x1bf/0x490 [btrfs]
[539604.298335]  find_free_extent+0x6fa/0x1750 [btrfs]
[539604.299174]  ? _raw_spin_unlock+0x29/0x50
[539604.299950]  ? btrfs_get_alloc_profile+0x127/0x2a0 [btrfs]
[539604.300918]  btrfs_reserve_extent+0x147/0x290 [btrfs]
[539604.301797]  btrfs_alloc_tree_block+0xcb/0x3e0 [btrfs]
[539604.303017]  ? lock_release+0x224/0x4a0
[539604.303855]  __btrfs_cow_block+0x138/0x580 [btrfs]
[539604.304789]  btrfs_cow_block+0x10e/0x240 [btrfs]
[539604.305611]  btrfs_search_slot+0x7f3/0xf70 [btrfs]
[539604.306682]  ? btrfs_global_root+0x50/0x70 [btrfs]
[539604.308198]  lookup_inline_extent_backref+0x17b/0x7a0 [btrfs]
[539604.309254]  lookup_extent_backref+0x43/0xd0 [btrfs]
[539604.310122]  __btrfs_free_extent+0xf8/0x810 [btrfs]
[539604.310874]  ? lock_release+0x224/0x4a0
[539604.311724]  ? btrfs_merge_delayed_refs+0x17b/0x1d0 [btrfs]
[539604.313023]  __btrfs_run_delayed_refs+0x2ba/0x1260 [btrfs]
[539604.314271]  btrfs_run_delayed_refs+0x8f/0x1c0 [btrfs]
[539604.315445]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.316706]  btrfs_commit_transaction+0xa2/0xed0 [btrfs]
[539604.317855]  ? do_raw_spin_unlock+0x4b/0xa0
[539604.318544]  ? _raw_spin_unlock+0x29/0x50
[539604.319240]  create_subvol+0x53d/0x6e0 [btrfs]
[539604.320283]  btrfs_mksubvol+0x4f5/0x590 [btrfs]
[539604.321220]  __btrfs_ioctl_snap_create+0x11b/0x180 [btrfs]
[539604.322307]  btrfs_ioctl_snap_create_v2+0xc6/0x150 [btrfs]
[539604.323295]  btrfs_ioctl+0x9f7/0x33e0 [btrfs]
[539604.324331]  ? rcu_read_lock_sched_held+0x12/0x70
[539604.325137]  ? lock_release+0x224/0x4a0
[539604.325808]  ? __x64_sys_ioctl+0x87/0xc0
[539604.326467]  __x64_sys_ioctl+0x87/0xc0
[539604.327109]  do_syscall_64+0x38/0x90
[539604.327875]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[539604.328792] RIP: 0033:0x7f05a7babaeb

This needs to use regular btrfs_search_slot() with some skip and stop
logic.

Since we only consider five samples (five search slots), don't bother
with the complexity of looking for commit_root_sem contention. If
necessary, it can be added to the load function in between samples.

Reported-by: Filipe Manana <fdmanana@kernel.org>
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H7eKMD44Z1+=Kb-1RFMMeZpAm2fwyO59yeBwCcSOU80Pg@mail.gmail.com/
Fixes: c7eec3d9aa ("btrfs: load block group size class when caching")
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-06 19:28:19 +01:00
Jan Kara 63bceed808 udf: Warn if block mapping is done for in-ICB files
Now that address space operations are merge dfor in-ICB and normal
files, it is more likely some code mistakenly tries to map blocks for
in-ICB files. WARN and return error instead of silently returning
garbage.

Signed-off-by: Jan Kara <jack@suse.cz>
2023-03-06 16:38:25 +01:00
Jan Kara cecb1f0654 udf: Fix reading of in-ICB files
After merging address space operations of normal and in-ICB files,
readahead could get called for in-ICB files which resulted in
udf_get_block() being called for these files. udf_get_block() is not
prepared to be called for in-ICB files and ends up returning garbage
results as it interprets file data as extent list. Fix the problem by
skipping readahead for in-ICB files.

Fixes: 37a8a39f7a ("udf: Switch to single address_space_operations")
Signed-off-by: Jan Kara <jack@suse.cz>
2023-03-06 16:38:25 +01:00
Jan Kara 49854d3ccc udf: Fix lost writes in udf_adinicb_writepage()
The patch converting udf_adinicb_writepage() to avoid manually kmapping
the page used memcpy_to_page() however that copies in the wrong
direction (effectively overwriting file data with the old contents).
What we should be using is memcpy_from_page() to copy data from the page
into the inode and then mark inode dirty to store the data.

Fixes: 5cfc45321a ("udf: Convert udf_adinicb_writepage() to memcpy_to_page()")
Signed-off-by: Jan Kara <jack@suse.cz>
2023-03-06 16:38:25 +01:00
Zhang Xiaoxu d0dc411199 cifs: Move the in_send statistic to __smb_send_rqst()
When send SMB_COM_NT_CANCEL and RFC1002_SESSION_REQUEST, the
in_send statistic was lost.

Let's move the in_send statistic to the send function to avoid
this scenario.

Fixes: 7ee1af765d ("[CIFS]")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-05 17:50:38 -06:00
Dave Chinner 8ac5b996bf xfs: fix off-by-one-block in xfs_discard_folio()
The recent writeback corruption fixes changed the code in
xfs_discard_folio() to calculate a byte range to for punching
delalloc extents. A mistake was made in using round_up(pos) for the
end offset, because when pos points at the first byte of a block, it
does not get rounded up to point to the end byte of the block. hence
the punch range is short, and this leads to unexpected behaviour in
certain cases in xfs_bmap_punch_delalloc_range.

e.g. pos = 0 means we call xfs_bmap_punch_delalloc_range(0,0), so
there is no previous extent and it rounds up the punch to the end of
the delalloc extent it found at offset 0, not the end of the range
given to xfs_bmap_punch_delalloc_range().

Fix this by handling the zero block offset case correctly.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217030
Link: https://lore.kernel.org/linux-xfs/Y+vOfaxIWX1c%2Fyy9@bfoster/
Fixes: 7348b32233 ("xfs: xfs_bmap_punch_delalloc_range() should take a byte range")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Found-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-05 15:13:23 -08:00
Dave Chinner 0c7273e494 xfs: quotacheck failure can race with background inode inactivation
The background inode inactivation can attached dquots to inodes, but
this can race with a foreground quotacheck failure that leads to
disabling quotas and freeing the mp->m_quotainfo structure. The
background inode inactivation then tries to allocate a quota, tries
to dereference mp->m_quotainfo, and crashes like so:

XFS (loop1): Quotacheck: Unsuccessful (Error -5): Disabling quotas.
xfs filesystem being mounted at /root/syzkaller.qCVHXV/0/file0 supports timestamps until 2038 (0x7fffffff)
BUG: kernel NULL pointer dereference, address: 00000000000002a8
....
CPU: 0 PID: 161 Comm: kworker/0:4 Not tainted 6.2.0-c9c3395d5e3d #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: xfs-inodegc/loop1 xfs_inodegc_worker
RIP: 0010:xfs_dquot_alloc+0x95/0x1e0
....
Call Trace:
 <TASK>
 xfs_qm_dqread+0x46/0x440
 xfs_qm_dqget_inode+0x154/0x500
 xfs_qm_dqattach_one+0x142/0x3c0
 xfs_qm_dqattach_locked+0x14a/0x170
 xfs_qm_dqattach+0x52/0x80
 xfs_inactive+0x186/0x340
 xfs_inodegc_worker+0xd3/0x430
 process_one_work+0x3b1/0x960
 worker_thread+0x52/0x660
 kthread+0x161/0x1a0
 ret_from_fork+0x29/0x50
 </TASK>
....

Prevent this race by flushing all the queued background inode
inactivations pending before purging all the cached dquots when
quotacheck fails.

Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2023-03-05 15:13:22 -08:00
Linus Torvalds 20fdfd55ab 17 hotfixes. Eight are for MM and seven are for other parts of the
kernel.  Seven are cc:stable and eight address post-6.3 issues or were
 judged unsuitable for -stable backporting.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZAO0bAAKCRDdBJ7gKXxA
 jo73AP0Sbgd+E0u5Hs+aACHW28FpxleVRdyexc5chXD5QsyLKgEAwjntE7jfHHYK
 GkUKsoWQJblgjm3ksRxdLbVkDSQ8sQE=
 =CQ0B
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc fixes from Andrew Morton:
 "17 hotfixes.

  Eight are for MM and seven are for other parts of the kernel. Seven
  are cc:stable and eight address post-6.3 issues or were judged
  unsuitable for -stable backporting"

* tag 'mm-hotfixes-stable-2023-03-04-13-12' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  mailmap: map Dikshita Agarwal's old address to his current one
  mailmap: map Vikash Garodia's old address to his current one
  fs/cramfs/inode.c: initialize file_ra_state
  fs: hfsplus: fix UAF issue in hfsplus_put_super
  panic: fix the panic_print NMI backtrace setting
  lib: parser: update documentation for match_NUMBER functions
  kasan, x86: don't rename memintrinsics in uninstrumented files
  kasan: test: fix test for new meminstrinsic instrumentation
  kasan: treat meminstrinsic as builtins in uninstrumented files
  kasan: emit different calls for instrumentable memintrinsics
  ocfs2: fix non-auto defrag path not working issue
  ocfs2: fix defrag path triggering jbd2 ASSERT
  mailmap: map Georgi Djakov's old Linaro address to his current one
  mm/hwpoison: convert TTU_IGNORE_HWPOISON to TTU_HWPOISON
  lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH
  mm/damon/paddr: fix missing folio_put()
  mm/mremap: fix dup_anon_vma() in vma_merge() case 4
2023-03-04 13:32:50 -08:00
Linus Torvalds 3162745aad 9 cifs/smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmQCU+4ACgkQiiy9cAdy
 T1HuQAwArSaTpdZfgnd08HKz783fpGtDxicv47mqyQHOEzpReu7ZsVY7ybJgaIZy
 lR96qzTcZWsHi9WF69uF9hShDz4z4G15UfJXoa1AzoQUR9Kb6jsInEXodOeZQZBU
 57sioJqr7CU66i4yIWb30900iC7cPmmJ4YdOUiSIbeb7edW5IHQ59IyxuYRkJP3W
 EIWrKQ5wJQS49YiaM0C3LSpLgIRLhFU3Fkar66wHY8SwDHJunR0BzuiJn5EVgcVY
 Vf4R4K1F8s2H2svg9fwbqZhihUjrr0Je3T/1zrDg1xUom0uWYqA42iSIBhocSuRz
 J3ipIwpQwSiud4dvegq3QHbK61jgDsNa2VQu5+njGp6wX5pYefwbN50aTZ72xg6z
 j0tnpwPtw0ZW0TO57Q5wdwbQ+yjC4GEJwLcD9WPtUKDsGakVY7fnMdwfE2SAe+xg
 u5StlRUegYybMH1wCd3yHuAR17JisE0JyWlP9QyLCwL1Xa5ii1cB+YqHH/DTOp/l
 YDD69Hzi
 =wYW2
 -----END PGP SIGNATURE-----

Merge tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more cifs updates from Steve French:

 - xfstest generic/208 fix (memory leak)

 - minor netfs fix (to address smatch warning)

 - a DFS fix for stable

 - a reconnect race fix

 - two multichannel fixes

 - RDMA (smbdirect) fix

 - two additional writeback fixes from David

* tag '6.3-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Fix memory leak in direct I/O
  cifs: prevent data race in cifs_reconnect_tcon()
  cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID
  iov: Fix netfs_extract_user_to_sg()
  cifs: Fix cifs_write_back_from_locked_folio()
  cifs: reuse cifs_match_ipaddr for comparison of dstaddr too
  cifs: match even the scope id for ipv6 addresses
  cifs: Fix an uninitialised variable
  cifs: Add some missing xas_retry() calls
2023-03-03 16:26:43 -08:00
Andrew Morton 3e35102666 fs/cramfs/inode.c: initialize file_ra_state
file_ra_state_init() assumes that the file_ra_state has been zeroed out. 
Fixes a KMSAN used-unintialized issue (at least).

Fixes: cf948cbc35 ("cramfs: read_mapping_page() is synchronous")
Reported-by: syzbot <syzbot+8ce7f8308d91e6b8bbe2@syzkaller.appspotmail.com>
  Link: https://lkml.kernel.org/r/0000000000008f74e905f56df987@google.com
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Nicolas Pitre <nico@fluxnic.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-02 21:54:23 -08:00
Dongliang Mu 07db5e247a fs: hfsplus: fix UAF issue in hfsplus_put_super
The current hfsplus_put_super first calls hfs_btree_close on
sbi->ext_tree, then invokes iput on sbi->hidden_dir, resulting in an
use-after-free issue in hfsplus_release_folio.

As shown in hfsplus_fill_super, the error handling code also calls iput
before hfs_btree_close.

To fix this error, we move all iput calls before hfsplus_btree_close.

Note that this patch is tested on Syzbot.

Link: https://lkml.kernel.org/r/20230226124948.3175736-1-mudongliangabcd@gmail.com
Reported-by: syzbot+57e3e98f7e3b80f64d56@syzkaller.appspotmail.com
Tested-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-03-02 21:54:23 -08:00
Linus Torvalds c3f9b9fa10 Two small fixes from Xiubo and myself, marked for stable.
-----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmQA0uETHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi92pB/4yZ7Go/7j2zb84N9nEYPCHV23v1vED
 YGZIiWHYv6X3dJyTYpcU7Mn9TF00naTGDKi9NpTZjKOUIkibXPFJfbG7Dh4T2HhN
 TKw9EbldCaXE1mR7o+g/mrVQFM1PIR1VbtIeszL3eD2qO0aXEGyBMvPfUNqFX/M7
 lNWVjuglIaYUL235Uid/wt0zfmPDvtGD24fjpN0e22UQh/aBFnodIDpa/AapsFKp
 yifzqe/ADbvgnHwOhMiEMG1gRFd3vywVfPDQmQ41oSMnf7yTtLWE9t47wTfyoTY5
 IwZY2K1H51QJej/mObYJmClp/y81xSLXEydFdQ571MqZbDeDfQeM23/7
 =cWWl
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-client

Pull ceph fixes from Ilya Dryomov:
 "Two small fixes from Xiubo and myself, marked for stable"

* tag 'ceph-for-6.3-rc1' of https://github.com/ceph/ceph-client:
  rbd: avoid use-after-free in do_rbd_add() when rbd_dev_create() fails
  ceph: update the time stamps and try to drop the suid/sgid
2023-03-02 10:48:30 -08:00
David Howells 71562809e4 cifs: Fix memory leak in direct I/O
When __cifs_readv() and __cifs_writev() extract pages from a user-backed
iterator into a BVEC-type iterator, they set ->bv_need_unpin to note
whether they need to unpin the pages later.  However, in both cases they
examine the BVEC-type iterator and not the source iterator - and so
bv_need_unpin doesn't get set and the pages are leaked.

I think this may be responsible for the generic/208 xfstest failing
occasionally with:

	WARNING: CPU: 0 PID: 3064 at mm/gup.c:218 try_grab_page+0x65/0x100
	RIP: 0010:try_grab_page+0x65/0x100
	follow_page_pte+0x1a7/0x570
	__get_user_pages+0x1a2/0x650
	__gup_longterm_locked+0xdc/0xb50
	internal_get_user_pages_fast+0x17f/0x310
	pin_user_pages_fast+0x46/0x60
	iov_iter_extract_pages+0xc9/0x510
	? __kmalloc_large_node+0xb1/0x120
	? __kmalloc_node+0xbe/0x130
	netfs_extract_user_iter+0xbf/0x200 [netfs]
	__cifs_writev+0x150/0x330 [cifs]
	vfs_write+0x2a8/0x3c0
	ksys_pwrite64+0x65/0xa0

with the page refcount going negative.  This is less unlikely than it seems
because the page is being pinned, not simply got, and so the refcount
increased by 1024 each time, and so only needs to be called around ~2097152
for the refcount to go negative.

Further, the test program (aio-dio-invalidate-failure) uses a 32MiB static
buffer and all the PTEs covering it refer to the same page because it's
never written to.

The warning in try_grab_page():

	if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
		return -ENOMEM;

then trips and prevents us ever using the page again for DIO at least.

Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Reported-by: Murphy Zhou <jencce.kernel@gmail.com>
Link: https://lore.kernel.org/r/CAH2r5mvaTsJ---n=265a4zqRA7pP+o4MJ36WCQUS6oPrOij8cw@mail.gmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
Paulo Alcantara 1bcd548d93 cifs: prevent data race in cifs_reconnect_tcon()
Make sure to get an up-to-date TCP_Server_Info::nr_targets value prior
to waiting the server to be reconnected in cifs_reconnect_tcon().  It
is set in cifs_tcp_ses_needs_reconnect() and protected by
TCP_Server_Info::srv_lock.

Create a new cifs_wait_for_server_reconnect() helper that can be used
by both SMB2+ and CIFS reconnect code.

Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
Paulo Alcantara b9ee2e307c cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID
Do not map STATUS_OBJECT_NAME_INVALID to -EREMOTE under non-DFS
shares, or 'nodfs' mounts or CONFIG_CIFS_DFS_UPCALL=n builds.
Otherwise, in the slow path, get a referral to figure out whether it
is an actual DFS link.

This could be simply reproduced under a non-DFS share by running the
following

  $ mount.cifs //srv/share /mnt -o ...
  $ cat /mnt/$(printf '\U110000')
  cat: '/mnt/'$'\364\220\200\200': Object is remote

Fixes: c877ce47e1 ("cifs: reduce roundtrips on create/qinfo requests")
CC: stable@vger.kernel.org # 6.2
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
David Howells 4c0421fa6d iov: Fix netfs_extract_user_to_sg()
Fix the loop check in netfs_extract_user_to_sg() for extraction from
user-backed iterators to do the body if npages > 0, not if npages < 0
(which it can never be).

This isn't currently used by cifs, which only ever extracts data from BVEC,
KVEC and XARRAY iterators at this level, user-backed iterators having being
decanted into BVEC iterators at a higher level to accommodate the work
being done in a kernel thread.

Found by smatch:
	fs/netfs/iterator.c:139 netfs_extract_user_to_sg() warn: unsigned 'npages' is never less than zero.

Fixes: 0185846975 ("netfs: Add a function to extract an iterator into a scatterlist")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302261115.P3TQi1ZO-lkp@intel.com/
Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/Y/yYnAhoAYDBKixX@kili
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
David Howells 0268792f77 cifs: Fix cifs_write_back_from_locked_folio()
cifs_write_back_from_locked_folio() should return the number of bytes read,
but returns the result of ->async_writev(), which will be 0 on success.  As
it happens, this doesn't prevent cifs_writepages_region() from working as
it will then examine and ignore the pages that are no longer dirty rather
than just skipping over them.

Fixes: d08089f649 ("cifs: Change the I/O paths to use an iterator rather than a page list")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Reviewed-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:25 -06:00
Shyam Prasad N 410612b072 cifs: reuse cifs_match_ipaddr for comparison of dstaddr too
We have two pieces of code that does pretty much the same
comparison. This change reuses cifs_match_ipaddr within
match_address.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:24 -06:00
Shyam Prasad N a21be1f5df cifs: match even the scope id for ipv6 addresses
match_address function matches the scope id for ipv6 addresses,
but cifs_match_ipaddr (which is another function used for comparison)
does not use scope id. Doing so with this change.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:18:24 -06:00
David Howells 4bd3e4c7e4 cifs: Fix an uninitialised variable
Fix an uninitialised variable introduced in cifs.

Fixes: 3d78fe73fa ("cifs: Build the RDMA SGE list directly from an iterator")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-rdma@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:17:36 -06:00
David Howells 1907e0fec1 cifs: Add some missing xas_retry() calls
The xas_for_each loops added into fs/cifs/file.c need to go round again if
indicated by xas_retry().

Fixes: b8713c4dbf ("cifs: Add some helper functions")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Tom Talpey <tom@talpey.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2023-03-01 18:17:16 -06:00
Boris Burkov fcd9531b30 btrfs: sysfs: add size class stats
Make it possible to see the distribution of size classes for block
groups. Helpful for testing and debugging the allocator w.r.t. to size
classes.

The new stats can be found at the path:

  /sys/fs/btrfs/<FSID>/allocation/<bg-type>/size_class

but they will only be non-zero for bg-type = data.

Signed-off-by: Boris Burkov <boris@bur.io>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-03-01 19:27:20 +01:00
Linus Torvalds f122a08b19 capability: just use a 'u64' instead of a 'u32[2]' array
Back in 2008 we extended the capability bits from 32 to 64, and we did
it by extending the single 32-bit capability word from one word to an
array of two words.  It was then obfuscated by hiding the "2" behind two
macro expansions, with the reasoning being that maybe it gets extended
further some day.

That reasoning may have been valid at the time, but the last thing we
want to do is to extend the capability set any more.  And the array of
values not only causes source code oddities (with loops to deal with
it), but also results in worse code generation.  It's a lose-lose
situation.

So just change the 'u32[2]' into a 'u64' and be done with it.

We still have to deal with the fact that the user space interface is
designed around an array of these 32-bit values, but that was the case
before too, since the array layouts were different (ie user space
doesn't use an array of 32-bit values for individual capability masks,
but an array of 32-bit slices of multiple masks).

So that marshalling of data is actually simplified too, even if it does
remain somewhat obscure and odd.

This was all triggered by my reaction to the new "cap_isidentical()"
introduced recently.  By just using a saner data structure, it went from

	unsigned __capi;
	CAP_FOR_EACH_U32(__capi) {
		if (a.cap[__capi] != b.cap[__capi])
			return false;
	}
	return true;

to just being

	return a.val == b.val;

instead.  Which is rather more obvious both to humans and to compilers.

Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Serge Hallyn <serge@hallyn.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Paul Moore <paul@paul-moore.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-03-01 10:01:22 -08:00
Linus Torvalds 64e851689e This pull request contains the following changes for UML:
- Add support for rust (yay!)
 - Add support for LTO
 - Add platform bus support to virtio-pci
 - Various virtio fixes
 - Coding style, spelling cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmP/AVYWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wUQvEADRVll8yCg+a4C1BORSefv0FQ/I
 z+3jiUtzq/ABGBf9S0TYEfaGcOn1LXFzlEgtqf4kd3vmzyIVG0pHUt3BxaqLSSTU
 IjDZkbtZ2GC0i7EK8D3iAC+lvew+TjBMfgEjhrNyni3VeYNDBh8EkUseWIbrNX8Q
 pqoUQwYlVdjY6PedtcdGDko70Fiy4OMIK7lat7JDtuoL5pvMEadbR1D7ClfiYRIh
 NGg5mSnfBBTGc20ochDkHUhubzagtSCDHvNe2SiYDBrM5sjeSANecsICmymWS+Pm
 aJtYybwpC4tl9J25O4OTD3SyfKxZ93mKwK89Tw9ryqQV2rmXG+3qB2hbZ0zpi75I
 vpgDrfv+VC+4daC0Wp8zjoeSE6zUbCKVE1s307EC5fjYyIoHjSUAsPE9GrNaJi5K
 91WVe1x8Dnpfq9/ZO8o3sLqftBo3aVH21dGVuqi5qS6OjRqDMkFkaY31nUjVXELV
 tEBj6n9UoyqPFzcgvsQfTcCjjlMiVL+JU+sl2L7dQXTNev1/RReTJngdK/vv7Epo
 BjLpGfn+1I/8dlbsyjLt0FOIwhIbUf+8RbWpezENGVgKP81iqEQPOdax/yBEhCKm
 NWduDz2itQn94KNMcUoPq+G3xoG3dOW7lXlEzXP6ZbLkcuIyXpeNZeJNlvq7J/z/
 2PBy61Ngs/DDUK/doQ==
 =q2ks
 -----END PGP SIGNATURE-----

Merge tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux

Pull UML updates from Richard Weinberger:

 - Add support for rust (yay!)

 - Add support for LTO

 - Add platform bus support to virtio-pci

 - Various virtio fixes

 - Coding style, spelling cleanups

* tag 'uml-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (27 commits)
  Documentation: rust: Fix arch support table
  uml: vector: Remove unused definitions VECTOR_{WRITE,HEADERS}
  um: virt-pci: properly remove PCI device from bus
  um: virtio_uml: move device breaking into workqueue
  um: virtio_uml: mark device as unregistered when breaking it
  um: virtio_uml: free command if adding to virtqueue failed
  UML: define RUNTIME_DISCARD_EXIT
  virt-pci: add platform bus support
  um-virt-pci: Make max delay configurable
  um: virt-pci: implement pcibios_get_phb_of_node()
  um: Support LTO
  um: put power options in a menu
  um: Use CFLAGS_vmlinux
  um: Prevent building modules incompatible with MODVERSIONS
  um: Avoid pcap multiple definition errors
  um: Make the definition of cpu_data more compatible
  x86: um: vdso: Add '%rcx' and '%r11' to the syscall clobber list
  rust: arch/um: Add support for CONFIG_RUST under x86_64 UML
  rust: arch/um: Disable FP/SIMD instruction to match x86
  rust: arch/um: Use 'pie' relocation mode under UML
  ...
2023-03-01 09:13:00 -08:00
Linus Torvalds e31b283a58 This pull request contains updates for JFFS2, UBI and UBIFS
JFFS2:
 	- Fix memory corruption in error path
 	- Spelling and coding style fixes
 
 UBI:
 	- Switch to BLK_MQ_F_BLOCKING in ubiblock
 	- Wire up partent device (for sysfs)
 	- Multiple UAF bugfixes
 	- Fix for an infinite loop in WL error path
 
 UBIFS:
 	- Fix for multiple memory leaks in error paths
 	- Fixes for wrong space accounting
 	- Minor cleanups
 	- Spelling and coding style fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmP/BSMWHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wX5QEADnPsAW87AIi44UlkbAsJEjkEqU
 yQrI9UPsdqfE7K5OR7Vb2tOti3MLfaiJ5MTG64xNaTEgmrbcqo4GgENk4Pe6zJ9q
 azO/xWTr6r8G4aE70KhsUBc9Vc/99Ok48rxLHJS+u7s+FvOJ//WirGXjyNZEVDyx
 TgvH80rhUlfj1ExqEXLcUfQ53WUnR3PefOw+zIi29ldtgaI/TFmnWObl2A/XtsGr
 0ExIdqDnoTfSsYecfGP71jVbYH8u2mLFe2zNOYW1pvrHYhcet6Q6e+69fgnyNRf9
 5s5dbDPmmWN2Qdz63AWwHfC4mncoQdc7HbnnKIk6bTm3v6gqXQSrgy/hZEX3LmWr
 G41j7RLhBMZ2mljRfFQH97V71n9TL010T6jZ5zurvqErXtVdFImmInGmqY3AR13I
 fzUWnJf81xKgcMv6My2/nEVGDtrGLFdILoT8j6VIXQO6teqtF7Vip7UfamrwL399
 57fy0Iwbx2aume/IdwI3TZYacBQ2d8GvQxTZshJ+qx+gCvhb6EyiSIn3AOgtrbAo
 srXMy+6xXJecKNHR7Bl1C5BLei/25HDIYjk/yiAH/IhhV20c5eiySSvA+q720Gcz
 reBLFh7EtcP51B2GXMdSCOgYY8wGaMl5zlbiLYzY79ptgnnBJ7luuYjc3c02MGB9
 25qVLnpvC/ZREzZqLg==
 =fTYq
 -----END PGP SIGNATURE-----

Merge tag 'ubifs-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull jffs2, ubi and ubifs updates from Richard Weinberger:
 "JFFS2:
   - Fix memory corruption in error path
   - Spelling and coding style fixes

  UBI:
   - Switch to BLK_MQ_F_BLOCKING in ubiblock
   - Wire up partent device (for sysfs)
   - Multiple UAF bugfixes
   - Fix for an infinite loop in WL error path

  UBIFS:
   - Fix for multiple memory leaks in error paths
   - Fixes for wrong space accounting
   - Minor cleanups
   - Spelling and coding style fixes"

* tag 'ubifs-for-linus-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: (36 commits)
  ubi: block: Fix a possible use-after-free bug in ubiblock_create()
  ubifs: make kobj_type structures constant
  mtd: ubi: block: wire-up device parent
  mtd: ubi: wire-up parent MTD device
  ubi: use correct names in function kernel-doc comments
  ubi: block: set BLK_MQ_F_BLOCKING
  jffs2: Fix list_del corruption if compressors initialized failed
  jffs2: Use function instead of macro when initialize compressors
  jffs2: fix spelling mistake "neccecary"->"necessary"
  ubifs: Fix kernel-doc
  ubifs: Fix some kernel-doc comments
  UBI: Fastmap: Fix kernel-doc
  ubi: ubi_wl_put_peb: Fix infinite loop when wear-leveling work failed
  ubi: Fix UAF wear-leveling entry in eraseblk_count_seq_show()
  ubi: fastmap: Fix missed fm_anchor PEB in wear-leveling after disabling fastmap
  ubifs: ubifs_releasepage: Remove ubifs_assert(0) to valid this process
  ubifs: ubifs_writepage: Mark page dirty after writing inode failed
  ubifs: dirty_cow_znode: Fix memleak in error handling path
  ubifs: Re-statistic cleaned znode count if commit failed
  ubi: Fix permission display of the debugfs files
  ...
2023-03-01 09:06:51 -08:00
Linus Torvalds 3808330b20 9p patches for 6.3 merge window (part 1)
Here is the 9p patches for the 6.3 merge window combining
 the tested and reviewed patches from both Dominique's
 for-next tree and my for-next tree.  Most of these
 patches have been in for-next since December with only
 some reword in the description:
 
 - some fixes and cleanup setting up for a larger set
   of performance patches I've been working on
 - a contributed fixes relating to 9p/rdma
 - some contributed fixes relating to 9p/xen
 
 I've marked this as part 1, I'm not sure I'll be
 submitting part 2.  There were several performance
 patches that I wanted to get in, but the revisions
 after review only went out last week so while they
 have been tested, I haven't received reviews on the
 revisions.
 
 Its been about a decade since I've submitted a pull
 request, sorry if I messed anything up.
 
        -eric
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEElpbw0ZalkJikytFRiP/V+0pf/5gFAmP/frIACgkQiP/V+0pf
 /5jb2w//dccypyaurk441RSdbsIVZUqf8aiGDzRSyX/iZBrUU4nmavdx8+qycpvc
 MVdD3WBiv7+PvdyYnt6EUGZYweGbI7vbVrpUpf20aq1q5A5oDWIR8K1EfrTUJQAd
 j69ALc4Qbq4FmX7kBOKYuj+qUvnnrcyDFQwTHTQ6o8d0MtpmW7MZr2kUiCDKeN3m
 8K2uemR83Azjb2m5TvhWRSjb+61cf03W/Jw4+8PsbtfzX8MyGblqUsgODi35IdiJ
 c2aO+JF0Cw4y9ulw7PR9SkX0yi6CY4Ll/pGV9xW0liIs6E6FQboiwczr+SZNzQoc
 TUC3S7bGSyodiCNTp685sK3KfHaxj5QHBvilUL/t7cgjOWwWSkl8neg9sbI8Bc7P
 WxQ4p6cqnMXMJiHAxk70QfIvCsabLSfTjBhN5zAUwjLmjQsxboFWHloKwndea49v
 32NHEhGwEt7QSE1WHdJ4m6xfuNRSayriOoQ28Yxyg8ekpK+7bWkI8CXGS3Cq9kGQ
 SGhX/UZqT5J1YCvBAwh9s7d5hhHUBxaVz74Pssvbd/PkeKI0CiXFoJM5cu32F+p2
 4gsY67NcyUjJZOq0NsUxFTqQXw4jdrnkcrnGD4istRFyDMf/3hhuV/F+SvPhBn0l
 GdMDwXjEkZLSYQi14TADp/V5DfDzRGRpquo++XbqikQyAgoNcKI=
 =Cvvf
 -----END PGP SIGNATURE-----

Merge tag '9p-6.3-for-linus-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs

Pull 9p updates from Eric Van Hensbergen:

 - some fixes and cleanup setting up for a larger set of performance
   patches I've been working on

 - a contributed fixes relating to 9p/rdma

 - some contributed fixes relating to 9p/xen

* tag '9p-6.3-for-linus-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  fs/9p: fix error reporting in v9fs_dir_release
  net/9p: fix bug in client create for .L
  9p/rdma: unmap receive dma buffer in rdma_request()/post_recv()
  9p/xen: fix connection sequence
  9p/xen: fix version parsing
  fs/9p: Expand setup of writeback cache to all levels
  net/9p: Adjust maximum MSIZE to account for p9 header
2023-03-01 08:52:49 -08:00
Linus Torvalds 6e110580bc Just one simple sanity check
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmP/dFwACgkQNqiEXrVA
 jGQEWQ/+K+4qaDXvdXnHFYgcdtaltzdlq5TCKSuU+TxSqVpkuX87zzh1NcE2iQOg
 HwzOISdXHuqLIdrs4U1ei7h+EFyi3+VMtKm9DYjznWyc9cARFauVdnnY9TiSwahW
 hUd80Wt1Fs0llbczChSIpvkE8JLhLe3t/oaB4TuNmZDuiLT1CLtVlVAZ/pQLpprr
 UYhBz1nScFbICgX+RdnO2/BGKzsUfiG6YsKGE0cw5O4xa2hUTBT0o9O7hTd/TcSa
 AcILIrrbMqd6zq0jTO0zjzfEDZrgDg+hvHB+Niwy9Rsuj2qOqx+RYs9+MW88c+XE
 BCo90JSZxK1NWVjm/COiDStlkAYbdcnABl9u7cUDbe9CARnGSEaeYPLbqDxJbkSX
 224oYeRbY56t9McTIWghBz4UAYathXUNt9w7d8IYlpck7Jkg4RUBl5BINpaS0oLw
 aYz5/TLSA91Gc0WM4QkzyeahP/ZlkAWpxvLfbKPVc3DbzoUvelmvNKcgjuf5suj1
 bLFhIMz3Ygk0mIix2L73E1yMfdMQ0wmeGDml6PN/GCE6K0Qwjjpj6vEKFaw41UQX
 fOIlRP85/L3cM6lUDPVzi1JFfTQcXKvrkPm+/pp7lw+tpASOLpzndbtQH4rDJm1H
 et4NstnZhSyGlx6JRUGlktOxRBaOTYr/o4ZZU15+otg5DyRxWdA=
 =T8bI
 -----END PGP SIGNATURE-----

Merge tag 'jfs-6.3' of https://github.com/kleikamp/linux-shaggy

Pull jfs update from Dave Kleikamp:
 "Just one simple sanity check"

* tag 'jfs-6.3' of https://github.com/kleikamp/linux-shaggy:
  fs/jfs: fix shift exponent db_agl2size negative
2023-03-01 08:47:19 -08:00
Linus Torvalds e103ecedce Description for this pull request:
- Handle vendor extension and allocation entries as unrecognized benign secondary entries.
 - Fix wrong ->i_blocks on devices with non-512 byte sector.
 - Add the check to avoid returning -EIO from exfat_readdir() at current position exceeding the directory size.
 - Fix a bug that reach the end of the directory stream at a position not aligned with the dentry size.
 - Redefine DIR_DELETED as 0xFFFFFFF7, the bad cluster number.
 - Two cleanup fixes and fix cluster leakage in error handling.
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmP/MaMWHGxpbmtpbmpl
 b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCI8WEADJpjyqM7eZ4FI2+16Ws7C8DaIG
 Kk77yXhI9c79+fSLLCcP7vneZYDPAsu4ELuFaHukT6yQ90t87RyWDJOBO/qJhibn
 jf89EbaO8U7L5AwU8ETsSsQ46MKi6ZdU5EUzqvcm4Tz5ZBCbMd6o+4uwMJ6VD/e/
 uj+gyZi5XSNWxbEq/sj4oGugFjz7hfDapvlseE9gqHpX4lZftwt7qpGlX3WRWiGS
 QJpqi4C4QwP6+62LI107fWFAGF6SWmzPYUPhsWZmjeIoKTwFWOD3KPfOxlTBh73M
 RI8QxJHF9g5slDHjLDW0NmeXPCDTUxCkMXRFljXBtmsiejIutEpdJS24RbBpxNGX
 69W6k/kg+EQRyzMBqLKAqrITZZTUSEDeneHfOLBly1sRIOhwxHa9JZ4Y/9TH9Eaj
 /e9KN1jbKzQDovDLaBdVJXb5zU1QC1S+D0ZEMX3BUNSR+lgnfE8jJVoQm3fXCYAo
 9GqkRFeItD5vJW4zYf05yIRFS/4TpNytGCetogBbfB2XGq4X66bwOuDi9QvUidwK
 MKuEL7Y22cMxe7tLPeVndUF3r1FuAprrOyrzNuqQDINitEEzwq4hNwCYkL2WClmT
 5ZfVWW5vMI+hn4XbzRIN20shBAPV1pXzpC4/2kbNQ64xXGAL/97JUjntVfhWgGkr
 TDRr0bNuYLIyBiWQaQ==
 =AbjH
 -----END PGP SIGNATURE-----

Merge tag 'exfat-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat updates from Namjae Jeon:

 - Handle vendor extension and allocation entries as unrecognized benign
   secondary entries

 - Fix wrong ->i_blocks on devices with non-512 byte sector

 - Add the check to avoid returning -EIO from exfat_readdir() at current
   position exceeding the directory size

 - Fix a bug that reach the end of the directory stream at a position
   not aligned with the dentry size

 - Redefine DIR_DELETED as 0xFFFFFFF7, the bad cluster number

 - Two cleanup fixes and fix cluster leakage in error handling

* tag 'exfat-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
  exfat: fix the newly allocated clusters are not freed in error handling
  exfat: don't print error log in normal case
  exfat: remove unneeded code from exfat_alloc_cluster()
  exfat: handle unreconized benign secondary entries
  exfat: fix inode->i_blocks for non-512 byte sector size device
  exfat: redefine DIR_DELETED as the bad cluster number
  exfat: fix reporting fs error when reading dir beyond EOF
  exfat: fix unexpected EOF while reading dir
2023-03-01 08:42:27 -08:00
Linus Torvalds c0927a7a53 New code for 6.3-rc1, part 2:
* Fix a deadlock in the free space allocator due to the AG-walking
    algorithm forgetting to follow AG-order locking rules.
  * Make the inode allocator prefer existing free inodes instead of
    failing to allocate new inode chunks when free space is low.
  * Set minleft correctly when setting allocator parameters for bmap
    changes.
  * Fix uninitialized variable access in the getfsmap code.
  * Make a distinction between active and passive per-AG structure
    references.  For now, active references are taken to perform some
    work in an AG on behalf of a high level operation; passive references
    are used by lower level code to finish operations started by other
    threads.  Eventually this will become part of online shrink.
  * Split out all the different allocator strategies into separate
    functions to move us away from design antipattern of filling out a
    huge structure for various differentish things and issuing a single
    function multiplexing call.
  * Various cleanups in the filestreams allocator code, which we might
    very well want to deprecate instead of continuing.
  * Fix a bug with the agi rotor code that was introduced earlier in this
    series.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCY/zgqgAKCRBKO3ySh0YR
 plIkAQDIscqdqXGH01gF19/ncqG2GUaXY+/zeOReuk1Iv3VEVgD+MVXf+QvHk7LD
 /LTWNl2K6NQmE/9RtaBt0aFNDzvIAgU=
 =k7r8
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull moar xfs updates from Darrick Wong:
 "This contains a fix for a deadlock in the allocator. It continues the
  slow march towards being able to offline AGs, and it refactors the
  interface to the xfs allocator to be less indirection happy.

  Summary:

   - Fix a deadlock in the free space allocator due to the AG-walking
     algorithm forgetting to follow AG-order locking rules

   - Make the inode allocator prefer existing free inodes instead of
     failing to allocate new inode chunks when free space is low

   - Set minleft correctly when setting allocator parameters for bmap
     changes

   - Fix uninitialized variable access in the getfsmap code

   - Make a distinction between active and passive per-AG structure
     references. For now, active references are taken to perform some
     work in an AG on behalf of a high level operation; passive
     references are used by lower level code to finish operations
     started by other threads. Eventually this will become part of
     online shrink

   - Split out all the different allocator strategies into separate
     functions to move us away from design antipattern of filling out a
     huge structure for various differentish things and issuing a single
     function multiplexing call

   - Various cleanups in the filestreams allocator code, which we might
     very well want to deprecate instead of continuing

   - Fix a bug with the agi rotor code that was introduced earlier in
     this series"

* tag 'xfs-6.3-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (44 commits)
  xfs: restore old agirotor behavior
  xfs: fix uninitialized variable access
  xfs: refactor the filestreams allocator pick functions
  xfs: return a referenced perag from filestreams allocator
  xfs: pass perag to filestreams tracing
  xfs: use for_each_perag_wrap in xfs_filestream_pick_ag
  xfs: track an active perag reference in filestreams
  xfs: factor out MRU hit case in xfs_filestream_select_ag
  xfs: remove xfs_filestream_select_ag() longest extent check
  xfs: merge new filestream AG selection into xfs_filestream_select_ag()
  xfs: merge filestream AG lookup into xfs_filestream_select_ag()
  xfs: move xfs_bmap_btalloc_filestreams() to xfs_filestreams.c
  xfs: use xfs_bmap_longest_free_extent() in filestreams
  xfs: get rid of notinit from xfs_bmap_longest_free_extent
  xfs: factor out filestreams from xfs_bmap_btalloc_nullfb
  xfs: convert trim to use for_each_perag_range
  xfs: convert xfs_alloc_vextent_iterate_ags() to use perag walker
  xfs: move the minimum agno checks into xfs_alloc_vextent_check_args
  xfs: fold xfs_alloc_ag_vextent() into callers
  xfs: move allocation accounting to xfs_alloc_vextent_set_fsbno()
  ...
2023-02-28 16:08:30 -08:00
Linus Torvalds b07ce43db6 Improve performance for ext4 by allowing multiple process to perform
direct I/O writes to preallocated blocks by using a shared inode lock
 instead of taking an exclusive lock.
 
 In addition, multiple bug fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAmP9gYkACgkQ8vlZVpUN
 gaNN0AgAqwS873C9QX7QQK8tE+VvKT7iteNaJ68c/CMymSP7o5RdalbQRiAsSy/Q
 88PjBFVFQOsIa1d7OAUr50RHQODjOuOz6SJpitKKPnVC89gAzDt7Pk1AQzABjR37
 GY7nneHTQs6fGXLMUz/SlsU+7a08Bz5BeAxVBQxzkRL6D28/sbpT6Iw1tDhUUsug
 0o3kz/RolEopCzjhmH/Fpxt5RlBnTya5yX8IgmfEV3y7CfQ+XcTWgRebqDXxVCBE
 /VCZOl2cv5n4PFlRH8eUihmyO5iu7p9W9ro6HbLEuxQXwcRNY7skONidceim2EYh
 KzWZt59/JAs0DyvRWqZ9irtPDkuYqA==
 =OIYo
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
 "Improve performance for ext4 by allowing multiple process to perform
  direct I/O writes to preallocated blocks by using a shared inode lock
  instead of taking an exclusive lock.

  In addition, multiple bug fixes and cleanups"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix incorrect options show of original mount_opt and extend mount_opt2
  ext4: Fix possible corruption when moving a directory
  ext4: init error handle resource before init group descriptors
  ext4: fix task hung in ext4_xattr_delete_inode
  jbd2: fix data missing when reusing bh which is ready to be checkpointed
  ext4: update s_journal_inum if it changes after journal replay
  ext4: fail ext4_iget if special inode unallocated
  ext4: fix function prototype mismatch for ext4_feat_ktype
  ext4: remove unnecessary variable initialization
  ext4: fix inode tree inconsistency caused by ENOMEM
  ext4: refuse to create ea block when umounted
  ext4: optimize ea_inode block expansion
  ext4: remove dead code in updating backup sb
  ext4: dio take shared inode lock when overwriting preallocated blocks
  ext4: don't show commit interval if it is zero
  ext4: use ext4_fc_tl_mem in fast-commit replay path
  ext4: improve xattr consistency checking and error reporting
2023-02-28 09:05:47 -08:00
Yuezhang Mo d5c514b6a0 exfat: fix the newly allocated clusters are not freed in error handling
In error handling 'free_cluster', before num_alloc clusters allocated,
p_chain->size will not updated and always 0, thus the newly allocated
clusters are not freed.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-28 20:01:40 +09:00
Yuezhang Mo 3ce937cb8c exfat: don't print error log in normal case
When allocating a new cluster, exFAT first allocates from the
next cluster of the last cluster of the file. If the last cluster
of the file is the last cluster of the volume, allocate from the
first cluster. This is a normal case, but the following error log
will be printed. It makes users confused, so this commit removes
the error log.

[1960905.181545] exFAT-fs (sdb1): hint_cluster is invalid (262130)

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-28 20:01:36 +09:00
Yuezhang Mo 8d2909eeca exfat: remove unneeded code from exfat_alloc_cluster()
In the removed code, num_clusters is 0, nothing is done in
exfat_chain_cont_cluster(), so it is unneeded, remove it.

Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-28 20:01:22 +09:00
Heming Zhao via Ocfs2-devel 236b9254f8 ocfs2: fix non-auto defrag path not working issue
This fixes three issues on move extents ioctl without auto defrag:

a) In ocfs2_find_victim_alloc_group(), we have to convert bits to block
   first in case of global bitmap.

b) In ocfs2_probe_alloc_group(), when finding enough bits in block
   group bitmap, we have to back off move_len to start pos as well,
   otherwise it may corrupt filesystem.

c) In ocfs2_ioctl_move_extents(), set me_threshold both for non-auto
   and auto defrag paths.  Otherwise it will set move_max_hop to 0 and
   finally cause unexpectedly ENOSPC error.

Currently there are no tools triggering the above issues since
defragfs.ocfs2 enables auto defrag by default.  Tested with manually
changing defragfs.ocfs2 to run non auto defrag path.

Link: https://lkml.kernel.org/r/20230220050526.22020-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-27 17:00:15 -08:00
Heming Zhao via Ocfs2-devel 60eed1e3d4 ocfs2: fix defrag path triggering jbd2 ASSERT
code path:

ocfs2_ioctl_move_extents
 ocfs2_move_extents
  ocfs2_defrag_extent
   __ocfs2_move_extent
    + ocfs2_journal_access_di
    + ocfs2_split_extent  //sub-paths call jbd2_journal_restart
    + ocfs2_journal_dirty //crash by jbs2 ASSERT

crash stacks:

PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
 #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
 #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
 #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
 #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
 #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
 #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
 #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
    [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
    RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
    RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
    RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
    RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
    R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
    R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
 #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
 #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]

Analysis

This bug has the same root cause of 'commit 7f27ec978b ("ocfs2: call
ocfs2_journal_access_di() before ocfs2_journal_dirty() in
ocfs2_write_end_nolock()")'.  For this bug, jbd2_journal_restart() is
called by ocfs2_split_extent() during defragmenting.

How to fix

For ocfs2_split_extent() can handle journal operations totally by itself. 
Caller doesn't need to call journal access/dirty pair, and caller only
needs to call journal start/stop pair.  The fix method is to remove
journal access/dirty from __ocfs2_move_extent().

The discussion for this patch:
https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html

Link: https://lkml.kernel.org/r/20230217003717.32469-1-heming.zhao@suse.com
Signed-off-by: Heming Zhao <heming.zhao@suse.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-02-27 17:00:15 -08:00
Mateusz Guzik 981ee95cc1 vfs: avoid duplicating creds in faccessat if possible
access(2) remains commonly used, for example on exec:
access("/etc/ld.so.preload", R_OK)

or when running gcc: strace -c gcc empty.c

  % time     seconds  usecs/call     calls    errors syscall
  ------ ----------- ----------- --------- --------- ----------------
    0.00    0.000000           0        42        26 access

It falls down to do_faccessat without the AT_EACCESS flag, which in turn
results in allocation of new creds in order to modify fsuid/fsgid and
caps.  This is a very expensive process single-threaded and most notably
multi-threaded, with numerous structures getting refed and unrefed on
imminent new cred destruction.

Turns out for typical consumers the resulting creds would be identical
and this can be checked upfront, avoiding the hard work.

An access benchmark plugged into will-it-scale running on Cascade Lake
shows:

    test     proc     before       after
    access1     1    1310582     2908735    (+121%) # distinct files
    access1    24    4716491    63822173   (+1353%) # distinct files
    access2    24    2378041     5370335    (+125%) # same file

The above benchmarks are not integrated into will-it-scale, but can be
found in a pull request:

  https://github.com/antonblanchard/will-it-scale/pull/36/files

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-27 16:39:19 -08:00
Linus Torvalds 103830683c f2fs-for-6.3-rc1
In this round, we've got a huge number of patches that improve code readability
 along with minor bug fixes, while we've mainly fixed some critical issues in
 recently-added per-block age-based extent_cache, atomic write support, and some
 folio cases.
 
 Enhancement:
  - add sysfs nodes to set last_age_weight and manage discard_io_aware_gran
  - show ipu policy in debugfs
  - reduce stack memory cost by using bitfield in struct f2fs_io_info
  - introduce trace_f2fs_replace_atomic_write_block
  - enhance iostat support and adds flush commands
 
 Bug fix:
  - revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
  - fix kernel crash on the atomic write abort flow
  - call clear_page_private_reference in .{release,invalid}_folio
  - support .migrate_folio for compressed inode
  - fix cgroup writeback accounting with fs-layer encryption
  - retry to update the inode page given data corruption
  - fix kernel crash due to null io->bio
  - fix some bugs in per-block age-based extent_cache:
     a. wrong calculation of block age
     b. update age extent in f2fs_do_zero_range()
     c. update age extent correctly during truncation
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmP9M/cACgkQQBSofoJI
 UNIX1Q//Yp+nDeY91H3IO6aMSHPqRoBBnVTr8ERtLUF0fuQbBkzcQE+t8cMSoYDM
 88sxoC+F7UnovNr84VeuKlHN4hYyAXuxj8OZetXI3XX+yiO+auEPtljGA0BaGwqL
 93lIg8nIQ2ing/oZ/4+h4dvpYCPrhKOQS6h1sHhIWlql6Wxwxq01uA47i0Ni6y/o
 D23JPFaDQfumN8qy1bm5xfLRhTQmaE35n5NhcBJpUD/rGK92NPXv7RLKPgZc3tKN
 tJmL+NXa3NNEx5e5TSP7JX+rhD7KlL5XlB/m8LbpPIx338I0pt0uzAc7nTIOlzM3
 DI0q3HXe9U3+JBHi+rKxkIniiRDmvhPx3NzdgcsYg75EnwdrNazsRHulBUEAXB4v
 ghHbx53OxA2uSnUVbkY4HXNYf7cYjrx5vbX0oqEx48btBCC8KGFIcHtI72tIBee3
 xdCxoM3e2AWijkFBOCkThXNuNNbdifQzn2e7xR7W+o0L9hwdR5t7tHhHT+cqG9Ox
 6UKWoZZUjYUAV/YQT5Qh6570GsGncM8gHAUMz7DVTIOB9wYkHtb0Q9tUmoQccwf5
 46r84c4/jxUQHt8SkXBBl1aiqHmR7EF17YvM5AXIdG3DqqOy70NWkM48hMOyIy2G
 eY/wTVJwpd6oDveQPtxaHn7Wo3jSPNUlidOfAYQ7Itm34VXY/zc=
 =yXLl
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've got a huge number of patches that improve code
  readability along with minor bug fixes, while we've mainly fixed some
  critical issues in recently-added per-block age-based extent_cache,
  atomic write support, and some folio cases.

  Enhancements:

   - add sysfs nodes to set last_age_weight and manage
     discard_io_aware_gran

   - show ipu policy in debugfs

   - reduce stack memory cost by using bitfield in struct f2fs_io_info

   - introduce trace_f2fs_replace_atomic_write_block

   - enhance iostat support and adds flush commands

  Bug fixes:

   - revert "f2fs: truncate blocks in batch in __complete_revoke_list()"

   - fix kernel crash on the atomic write abort flow

   - call clear_page_private_reference in .{release,invalid}_folio

   - support .migrate_folio for compressed inode

   - fix cgroup writeback accounting with fs-layer encryption

   - retry to update the inode page given data corruption

   - fix kernel crash due to NULL io->bio

   - fix some bugs in per-block age-based extent_cache:
       - wrong calculation of block age
       - update age extent in f2fs_do_zero_range()
       - update age extent correctly during truncation"

* tag 'f2fs-for-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (81 commits)
  f2fs: drop unnecessary arg for f2fs_ioc_*()
  f2fs: Revert "f2fs: truncate blocks in batch in __complete_revoke_list()"
  f2fs: synchronize atomic write aborts
  f2fs: fix wrong segment count
  f2fs: replace si->sbi w/ sbi in stat_show()
  f2fs: export ipu policy in debugfs
  f2fs: make kobj_type structures constant
  f2fs: fix to do sanity check on extent cache correctly
  f2fs: add missing description for ipu_policy node
  f2fs: fix to set ipu policy
  f2fs: fix typos in comments
  f2fs: fix kernel crash due to null io->bio
  f2fs: use iostat_lat_type directly as a parameter in the iostat_update_and_unbind_ctx()
  f2fs: add sysfs nodes to set last_age_weight
  f2fs: fix f2fs_show_options to show nogc_merge mount option
  f2fs: fix cgroup writeback accounting with fs-layer encryption
  f2fs: fix wrong calculation of block age
  f2fs: fix to update age extent in f2fs_do_zero_range()
  f2fs: fix to update age extent correctly during truncation
  f2fs: fix to avoid potential memory corruption in __update_iostat_latency()
  ...
2023-02-27 16:18:51 -08:00
Linus Torvalds 11c7052998 ARM: SoC drivers for 6.3
As usual, there are lots of minor driver changes across SoC platforms
 from  NXP, Amlogic, AMD Zynq, Mediatek, Qualcomm, Apple and Samsung.
 These usually add support for additional chip variations in existing
 drivers, but also add features or bugfixes.
 
 The SCMI firmware subsystem gains a unified raw userspace interface
 through debugfs, which can be used for validation purposes.
 
 Newly added drivers include:
 
  - New power management drivers for StarFive JH7110, Allwinner D1 and
    Renesas RZ/V2M
 
  - A driver for Qualcomm battery and power supply status
 
  - A SoC device driver for identifying Nuvoton WPCM450 chips
 
  - A regulator coupler driver for Mediatek MT81xxv
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmPtSN8ACgkQmmx57+YA
 GNkOSw/+JS5tElm/ZP7c3uWYp6uwvcb0jUlKW/U3aCtPiPEcYDLEqIEXwcNdaDMh
 m4rW3GYlW0IRL3FsyuYkSLx+EIIUIfs40wldYXJOqRDj0XasndiloIwltOQJGfd9
 C/UVM0FpJdxMJrcBMFgwLLQCIbAVnhHP34i6ppDRgxW/MfTeiCaaG6fnS70iv6mC
 oh2N7FoZSKDtTrFtlR5TqFiK5v/W1CgNJVuglkFB0ceFpjyBpp/8AT0FGS887xCz
 IYSTqm4Q/79vaZXI1Y2oog257cgdwsVqgPrnK5CuSFhTnAcJMCekiFelHq8Yhyuk
 Rw7j/B3KO3AOaxmR75c6SZdeZ+VHgUMRC/RKe3fay0sm3Zea2kAIPXA6Zn+r/cxb
 8M94V59qBz+f8XmpXRTK1UR3s3EbwFIuNyuDIkeorMtpSKtvqJXmZxGDwNIfXr2F
 /voo++MKjzdtdxdW/D/5Tc9DC0Pyb4HLi0EYj2QCzA03njmfLDF1w73NfzMec+GD
 R1zAd3FEbiJQx8Hin0PSPjYXpfMnkjkGAEcE9N9Ralg4ewNWAxfOFsAhHKTZNssL
 pitTAvHR/+dXtvkX7FUi2l/6fqn8nJUrg/xRazPPp3scRbpuk8m6P4MNr3/lsaHk
 HTQ/hYwDdecWLvKXjw5y9yIr3yhLmPPcloTVIIFFjsM0t8b+d9E=
 =p6Xp
 -----END PGP SIGNATURE-----

Merge tag 'soc-drivers-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc

Pull ARM SoC driver updates from Arnd Bergmann:
 "As usual, there are lots of minor driver changes across SoC platforms
  from NXP, Amlogic, AMD Zynq, Mediatek, Qualcomm, Apple and Samsung.
  These usually add support for additional chip variations in existing
  drivers, but also add features or bugfixes.

  The SCMI firmware subsystem gains a unified raw userspace interface
  through debugfs, which can be used for validation purposes.

  Newly added drivers include:

   - New power management drivers for StarFive JH7110, Allwinner D1 and
     Renesas RZ/V2M

   - A driver for Qualcomm battery and power supply status

   - A SoC device driver for identifying Nuvoton WPCM450 chips

   - A regulator coupler driver for Mediatek MT81xxv"

* tag 'soc-drivers-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (165 commits)
  power: supply: Introduce Qualcomm PMIC GLINK power supply
  soc: apple: rtkit: Do not copy the reg state structure to the stack
  soc: sunxi: SUN20I_PPU should depend on PM
  memory: renesas-rpc-if: Remove redundant division of dummy
  soc: qcom: socinfo: Add IDs for IPQ5332 and its variant
  dt-bindings: arm: qcom,ids: Add IDs for IPQ5332 and its variant
  dt-bindings: power: qcom,rpmpd: add RPMH_REGULATOR_LEVEL_LOW_SVS_L1
  firmware: qcom_scm: Move qcom_scm.h to include/linux/firmware/qcom/
  MAINTAINERS: Update qcom CPR maintainer entry
  dt-bindings: firmware: document Qualcomm SM8550 SCM
  dt-bindings: firmware: qcom,scm: add qcom,scm-sa8775p compatible
  soc: qcom: socinfo: Add Soc IDs for IPQ8064 and variants
  dt-bindings: arm: qcom,ids: Add Soc IDs for IPQ8064 and variants
  soc: qcom: socinfo: Add support for new field in revision 17
  soc: qcom: smd-rpm: Add IPQ9574 compatible
  soc: qcom: pmic_glink: remove redundant calculation of svid
  soc: qcom: stats: Populate all subsystem debugfs files
  dt-bindings: soc: qcom,rpmh-rsc: Update to allow for generic nodes
  soc: qcom: pmic_glink: add CONFIG_NET/CONFIG_OF dependencies
  soc: qcom: pmic_glink: Introduce altmode support
  ...
2023-02-27 10:04:49 -08:00
Linus Torvalds d40b2f4c94 fuse update for 6.3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCY/ysqAAKCRDh3BK/laaZ
 PLSfAP9c4z/KOZj/Am8nQ0mqI8Ss0Ei+hyu8Vow6NoyJkR4NvwD/SxEeI2+rpXj7
 TGOA+6k4chLc7QIFBsIwGvQgeXld1gA=
 =s4b/
 -----END PGP SIGNATURE-----

Merge tag 'fuse-update-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse

Pull fuse updates from Miklos Szeredi:

 - Fix regression in fileattr permission checking

 - Fix possible hang during PID namespace destruction

 - Add generic support for request extensions

 - Add supplementary group list extension

 - Add limited support for supplying supplementary groups in create
   requests

 - Documentation fixes

* tag 'fuse-update-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: add inode/permission checks to fileattr_get/fileattr_set
  fuse: fix all W=1 kernel-doc warnings
  fuse: in fuse_flush only wait if someone wants the return code
  fuse: optional supplementary group in create requests
  fuse: add request extension
2023-02-27 09:53:58 -08:00
Darrick J. Wong 6e2985c938 xfs: restore old agirotor behavior
Prior to the removal of xfs_ialloc_next_ag, we would increment the agi
rotor and return the *old* value.  atomic_inc_return returns the new
value, which causes mkfs to allocate the root directory in AG 1.  Put
back the old behavior (at least for mkfs) by subtracting 1 here.

Fixes: 20a5eab49d ("xfs: convert xfs_ialloc_next_ag() to an atomic")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2023-02-27 08:53:45 -08:00
Namjae Jeon 8258ef2800 exfat: handle unreconized benign secondary entries
Sony PXW-Z280 camera add vendor allocation entries to directory of
pictures. Currently, linux exfat does not support it and the file is
not visible. This patch handle vendor extension and allocation entries
as unreconized benign secondary entries. As described in the specification,
it is recognized but ignored, and when deleting directory entry set,
the associated clusters allocation are removed as well as benign secondary
directory entries.

Reported-by: Barócsi Dénes <admin@tveger.hu>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27 21:14:46 +09:00
Yuezhang Mo 39c1ce8eaf exfat: fix inode->i_blocks for non-512 byte sector size device
inode->i_blocks is not real number of blocks, but 512 byte ones.

Fixes: 98d917047e ("exfat: add file operations")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27 21:14:45 +09:00
Sungjong Seo bdaadfd343 exfat: redefine DIR_DELETED as the bad cluster number
When a file or a directory is deleted, the hint for the cluster of
its parent directory in its in-memory inode is set as DIR_DELETED.
Therefore, DIR_DELETED must be one of invalid cluster numbers. According
to the exFAT specification, a volume can have at most 2^32-11 clusters.
However, DIR_DELETED is wrongly defined as 0xFFFF0321, which could be
a valid cluster number. To fix it, let's redefine DIR_DELETED as
0xFFFFFFF7, the bad cluster number.

Fixes: 1acf1a564b ("exfat: add in-memory and on-disk structures and headers")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27 21:14:45 +09:00
Yuezhang Mo 706fdcac00 exfat: fix reporting fs error when reading dir beyond EOF
Since seekdir() does not check whether the position is valid, the
position may exceed the size of the directory. We found that for
a directory with discontinuous clusters, if the position exceeds
the size of the directory and the excess size is greater than or
equal to the cluster size, exfat_readdir() will return -EIO,
causing a file system error and making the file system unavailable.

Reproduce this bug by:

seekdir(dir, dir_size + cluster_size);
dirent = readdir(dir);

The following log will be printed if mount with 'errors=remount-ro'.

[11166.712896] exFAT-fs (sdb1): error, invalid access to FAT (entry 0xffffffff)
[11166.712905] exFAT-fs (sdb1): Filesystem has been set read-only

Fixes: 1e5654de0f ("exfat: handle wrong stream entry size in exfat_readdir()")
Cc: stable@vger.kernel.org # v5.7+
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27 21:14:45 +09:00
Yuezhang Mo 6cb5d1a16a exfat: fix unexpected EOF while reading dir
If the position is not aligned with the dentry size, the return
value of readdir() will be NULL and errno is 0, which means the
end of the directory stream is reached.

If the position is aligned with dentry size, but there is no file
or directory at the position, exfat_readdir() will continue to
get dentry from the next dentry. So the dentry gotten by readdir()
may not be at the position.

After this commit, if the position is not aligned with the dentry
size, round the position up to the dentry size and continue to get
the dentry.

Fixes: ca06197382 ("exfat: add directory operations")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
Reviewed-by: Andy Wu <Andy.Wu@sony.com>
Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com>
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2023-02-27 21:14:44 +09:00
Linus Torvalds 498a1cf902 Kbuild updates for v6.3
- Change V=1 option to print both short log and full command log.
 
  - Allow V=1 and V=2 to be combined as V=12.
 
  - Make W=1 detect wrong .gitignore files.
 
  - Tree-wide cleanups for unused command line arguments passed to Clang.
 
  - Stop using -Qunused-arguments with Clang.
 
  - Make scripts/setlocalversion handle only correct release tags instead
    of any arbitrary annotated tag.
 
  - Create Debian and RPM source packages without cleaning the source tree.
 
  - Various cleanups for packaging.
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmP7iHoVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGL/cQAK9q5rsNL5a2LgTbm89ORA+UV+ST
 hrAoGo5DkJHUbVH53oPzyLynFBZPvUzLK8yjApjXkyAzy2hXYnj+vbTs0s+JVCFL
 owS4NB0YP+tpHGuy8bGpWI0GMZSMwmspUteqxk86zuH8uQVAhnCaeV1/Cr6Aqj1h
 2jk1FZid3/h7qEkEgu5U8soeyFnV6VhAT6Ie5yfZ2O2RdsSqPUh6vfKrgdyW4RWz
 gito0SOUwvjIDfSmTnIIacUibisPRv2OW29OvmDp1aXj5rMhe3UfOznVE3NR86yl
 ZbWDAIm6KYT8V1ASOoAUR80qent9IPKytThLK9BVEQCT6bsujCZMvhYhhEvO30TF
 Lzsdr+FrES//xag3+hgc63FEied2xxWGQG1cRtzAhfRL9tJ03+mY1omoW6SyKqW/
 Gc9PIcTgQbCIrkeL0HuAI1q3I1vkvHXInJKtGkoHh1J9aJ8v5gQpwGA+DDRUnA+A
 LQSeEbT2Hf3MoF4CqZRnConvfhlMuLI+j5v54YPrhokxXmv7u807kjfwMFTiZ/+m
 CJFlEMf9YRv3pi8g/AYyGAg5ZQigCwzOCRUC5kguFqzZdgnjiI907GEL804lm1Mg
 lpx/HtYPyxwWEd2XyU6/C9AEIl3gm7MBd6b1tD54Tb/VmE+AvjS/O9jFYXZqnAnM
 Llv4BfK/cQKwHb6o
 =HpFZ
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild updates from Masahiro Yamada:

 - Change V=1 option to print both short log and full command log

 - Allow V=1 and V=2 to be combined as V=12

 - Make W=1 detect wrong .gitignore files

 - Tree-wide cleanups for unused command line arguments passed to Clang

 - Stop using -Qunused-arguments with Clang

 - Make scripts/setlocalversion handle only correct release tags instead
   of any arbitrary annotated tag

 - Create Debian and RPM source packages without cleaning the source
   tree

 - Various cleanups for packaging

* tag 'kbuild-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (74 commits)
  kbuild: rpm-pkg: remove unneeded KERNELRELEASE from modules/headers_install
  docs: kbuild: remove description of KBUILD_LDS_MODULE
  .gitattributes: use 'dts' diff driver for *.dtso files
  kbuild: deb-pkg: improve the usability of source package
  kbuild: deb-pkg: fix binary-arch and clean in debian/rules
  kbuild: tar-pkg: use tar rules in scripts/Makefile.package
  kbuild: make perf-tar*-src-pkg work without relying on git
  kbuild: deb-pkg: switch over to source format 3.0 (quilt)
  kbuild: deb-pkg: make .orig tarball a hard link if possible
  kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile
  kbuild: srcrpm-pkg: create source package without cleaning
  kbuild: rpm-pkg: build binary packages from source rpm
  kbuild: deb-pkg: create source package without cleaning
  kbuild: add a tool to list files ignored by git
  Documentation/llvm: add Chimera Linux, Google and Meta datacenters
  setlocalversion: use only the correct release tag for git-describe
  setlocalversion: clean up the construction of version output
  .gitignore: ignore *.cover and *.mbx
  kbuild: remove --include-dir MAKEFLAG from top Makefile
  kbuild: fix trivial typo in comment
  ...
2023-02-26 11:53:25 -08:00
Xiubo Li e027253c4b ceph: update the time stamps and try to drop the suid/sgid
The fallocate will try to clear the suid/sgid if a unprevileged user
changed the file.

There is no POSIX item requires that we should clear the suid/sgid
in fallocate code path but this is the default behaviour for most of
the filesystems and the VFS layer. And also the same for the write
code path, which have already support it.

And also we need to update the time stamps since the fallocate will
change the file contents.

Cc: stable@vger.kernel.org
Link: https://tracker.ceph.com/issues/58054
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2023-02-26 20:03:14 +01:00
Zhang Yi e3645d72f8 ext4: fix incorrect options show of original mount_opt and extend mount_opt2
Current _ext4_show_options() do not distinguish MOPT_2 flag, so it mixed
extend sbi->s_mount_opt2 options with sbi->s_mount_opt, it could lead to
show incorrect options, e.g. show fc_debug_force if we mount with
errors=continue mode and miss it if we set.

  $ mkfs.ext4 /dev/pmem0
  $ mount -o errors=remount-ro /dev/pmem0 /mnt
  $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
    #empty
  $ mount -o remount,errors=continue /mnt
  $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
    fc_debug_force
  $ mount -o remount,errors=remount-ro,fc_debug_force /mnt
  $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
    #empty

Fixes: 995a3ed67f ("ext4: add fast_commit feature and handling for extended mount options")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230129034939.3702550-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-25 15:39:08 -05:00
Jan Kara 0813299c58 ext4: Fix possible corruption when moving a directory
When we are renaming a directory to a different directory, we need to
update '..' entry in the moved directory. However nothing prevents moved
directory from being modified and even converted from the inline format
to the normal format. When such race happens the rename code gets
confused and we crash. Fix the problem by locking the moved directory.

CC: stable@vger.kernel.org
Fixes: 32f7f22c0b ("ext4: let ext4_rename handle inline dir")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230126112221.11866-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-25 15:39:07 -05:00
Ye Bin 172e344e6f ext4: init error handle resource before init group descriptors
Now, 's_err_report' timer is init after ext4_group_desc_init() when fill
super. Theoretically, ext4_group_desc_init() may access to error handle
as follows:
__ext4_fill_super
  ext4_group_desc_init
    ext4_check_descriptors
      ext4_get_group_desc
        ext4_error
          ext4_handle_error
            ext4_commit_super
              ext4_update_super
                if (!es->s_error_count)
                  mod_timer(&sbi->s_err_report, jiffies + 24*60*60*HZ);
		  --> Accessing Uninitialized Variables
timer_setup(&sbi->s_err_report, print_daily_error_info, 0);

Maybe above issue is just theoretical, as ext4_check_descriptors() didn't
judge 'gpd' which get from ext4_get_group_desc(), if access to error handle
ext4_get_group_desc() will return NULL, then will trigger null-ptr-deref in
ext4_check_descriptors().
However, from the perspective of pure code, it is better to initialize
resource that may need to be used first.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230119013711.86680-1-yebin@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2023-02-25 15:39:07 -05:00
Linus Torvalds 489fa31ea8 Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs updates from Al Viro:
 "Assorted stuff that didn't fit anywhere else"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nsfs: repair kernel-doc for ns_match()
  nsfs: add compat ioctl handler
  fs/cramfs: Convert kmap() to kmap_local_data()
2023-02-24 19:27:55 -08:00
Linus Torvalds 3df88c6a17 Merge branch 'work.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull ipc namespace update from Al Viro:
 "Rik's patches reducing the amount of synchronize_rcu() triggered by
  ipc namespace destruction.

  I've some pending stuff reducing that on the normal umount side, but
  it's nowhere near ready and Rik's stuff shouldn't be held back due to
  conflicts - I'll just redo the parts of my series that stray into
  ipc/*"

* 'work.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ipc,namespace: batch free ipc_namespace structures
  ipc,namespace: make ipc namespace allocation wait for pending free
2023-02-24 19:20:07 -08:00
Linus Torvalds d6b9cf417c Merge branch 'work.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull sysv updates from Al Viro:
 "Fabio's 'switch to kmap_local_page()' patchset (originally after the
  ext2 counterpart, with a lot of cleaning up done to it; as the matter
  of fact, ext2 side is in need of similar cleanups - calling
  conventions there are bloody awful).

  Plus the equivalents of minix stuff..."

* 'work.sysv' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  sysv: fix handling of delete_entry and set_link failures
  fs/sysv: Replace kmap() with kmap_local_page()
  fs/sysv: Use dir_put_page() in sysv_rename()
  fs/sysv: Change the signature of dir_get_page()
  fs/sysv: Use the offset_in_page() helper
  sysv: don't flush page immediately for DIRSYNC directories
2023-02-24 19:03:26 -08:00
Linus Torvalds 397aa6b63f Merge branch 'work.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull minix updates from Al Viro:
 "Assorted fixes - mostly Christoph's"

* 'work.minix' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  minix_rename(): minix_delete_entry() might fail
  minix: don't flush page immediately for DIRSYNC directories
  minix: fix error handling in minix_set_link
  minix: fix error handling in minix_delete_entry
  minix: move releasing pages into unlink and rename
  minix: make minix_new_inode() return error as ERR_PTR(-E...)
2023-02-24 19:01:15 -08:00
Linus Torvalds a93e884edf Driver core changes for 6.3-rc1
Here is the large set of driver core changes for 6.3-rc1.
 
 There's a lot of changes this development cycle, most of the work falls
 into two different categories:
   - fw_devlink fixes and updates.  This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.
   - driver core changes to work to make struct bus_type able to be moved
     into read-only memory (i.e. const)  The recent work with Rust has
     pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only making
     things safer overall.  This is the contuation of that work (started
     last release with kobject changes) in moving struct bus_type to be
     constant.  We didn't quite make it for this release, but the
     remaining patches will be finished up for the release after this
     one, but the groundwork has been laid for this effort.
 
 Other than that we have in here:
   - debugfs memory leak fixes in some subsystems
   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.
   - cacheinfo rework and fixes
   - Other tiny fixes, full details are in the shortlog
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
 6oeFOjD3JDju3cQsfGgd
 =Su6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.3-rc1.

  There's a lot of changes this development cycle, most of the work
  falls into two different categories:

   - fw_devlink fixes and updates. This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.

   - driver core changes to work to make struct bus_type able to be
     moved into read-only memory (i.e. const) The recent work with Rust
     has pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only
     making things safer overall. This is the contuation of that work
     (started last release with kobject changes) in moving struct
     bus_type to be constant. We didn't quite make it for this release,
     but the remaining patches will be finished up for the release after
     this one, but the groundwork has been laid for this effort.

  Other than that we have in here:

   - debugfs memory leak fixes in some subsystems

   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.

   - cacheinfo rework and fixes

   - Other tiny fixes, full details are in the shortlog

  All of these have been in linux-next for a while with no reported
  problems"

[ Geert Uytterhoeven points out that that last sentence isn't true, and
  that there's a pending report that has a fix that is queued up - Linus ]

* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
  debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
  OPP: fix error checking in opp_migrate_dentry()
  debugfs: update comment of debugfs_rename()
  i3c: fix device.h kernel-doc warnings
  dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
  driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
  Revert "driver core: add error handling for devtmpfs_create_node()"
  Revert "devtmpfs: add debug info to handle()"
  Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
  driver core: cpu: don't hand-override the uevent bus_type callback.
  devtmpfs: remove return value of devtmpfs_delete_node()
  devtmpfs: add debug info to handle()
  driver core: add error handling for devtmpfs_create_node()
  driver core: bus: update my copyright notice
  driver core: bus: add bus_get_dev_root() function
  driver core: bus: constify bus_unregister()
  driver core: bus: constify some internal functions
  driver core: bus: constify bus_get_kset()
  driver core: bus: constify bus_register/unregister_notifier()
  driver core: remove private pointer from struct bus_type
  ...
2023-02-24 12:58:55 -08:00
David Howells ab7362d04d cifs: Fix cifs_writepages_region()
Fix the cifs_writepages_region() to just jump over members of the batch
that have been cleaned up rather than counting them as skipped.

Unlike the other "skip_write" cases, this situation happens even for
WB_SYNC_ALL, simply because the page has either been cleaned by somebody
else, or was truncated.

So in this case we're not "skipping" the write, we simply no longer need
any write at all, so it's very different from the other skip_write cases.

And we definitely shouldn't stop writing the rest just because of too
many of these cases (or because we want to be rescheduled).

Fixes: 3822a7c409 ("Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/lkml/2213409.1677249075@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-02-24 11:01:58 -08:00
Eric Van Hensbergen 89c58cb395
fs/9p: fix error reporting in v9fs_dir_release
Checking the p9_fid_put value allows us to pass back errors
involved if we end up clunking the fid as part of dir_release.

This can help with more graceful response to errors in writeback
among other things.

Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
Reviewed-by: Dominique Martinet <asmadeus@codewreck.org>
2023-02-24 13:42:40 +00:00
Linus Torvalds d2980d8d82 There is no particular theme here - mainly quick hits all over the tree.
Most notable is a set of zlib changes from Mikhail Zaslonko which enhances
 and fixes zlib's use of S390 hardware support: "lib/zlib: Set of s390
 DFLTCC related patches for kernel zlib".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/QC4QAKCRDdBJ7gKXxA
 jtKdAQCbDCBdY8H45d1fONzQW2UDqCPnOi77MpVUxGL33r+1SAEA807C7rvDEmlf
 yP1Ft+722fFU5jogVU8ZFh+vapv2/gI=
 =Q9YK
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull non-MM updates from Andrew Morton:
 "There is no particular theme here - mainly quick hits all over the
  tree.

  Most notable is a set of zlib changes from Mikhail Zaslonko which
  enhances and fixes zlib's use of S390 hardware support: 'lib/zlib: Set
  of s390 DFLTCC related patches for kernel zlib'"

* tag 'mm-nonmm-stable-2023-02-20-15-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (55 commits)
  Update CREDITS file entry for Jesper Juhl
  sparc: allow PM configs for sparc32 COMPILE_TEST
  hung_task: print message when hung_task_warnings gets down to zero.
  arch/Kconfig: fix indentation
  scripts/tags.sh: fix the Kconfig tags generation when using latest ctags
  nilfs2: prevent WARNING in nilfs_dat_commit_end()
  lib/zlib: remove redundation assignement of avail_in dfltcc_gdht()
  lib/Kconfig.debug: do not enable DEBUG_PREEMPT by default
  lib/zlib: DFLTCC always switch to software inflate for Z_PACKET_FLUSH option
  lib/zlib: DFLTCC support inflate with small window
  lib/zlib: Split deflate and inflate states for DFLTCC
  lib/zlib: DFLTCC not writing header bits when avail_out == 0
  lib/zlib: fix DFLTCC ignoring flush modes when avail_in == 0
  lib/zlib: fix DFLTCC not flushing EOBS when creating raw streams
  lib/zlib: implement switching between DFLTCC and software
  lib/zlib: adjust offset calculation for dfltcc_state
  nilfs2: replace WARN_ONs for invalid DAT metadata block requests
  scripts/spelling.txt: add "exsits" pattern and fix typo instances
  fs: gracefully handle ->get_block not mapping bh in __mpage_writepage
  cramfs: Kconfig: fix spelling & punctuation
  ...
2023-02-23 17:55:40 -08:00
Linus Torvalds 3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds 06e1a81c48 A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
 
 - Preparatory refactoring and cleanup work in the efivar layer by Johan,
   which is needed to accommodate the Snapdragon arm64 laptops that
   expose their EFI variable store via a TEE secure world API.
 
 - Enhancements to the EFI memory map handling so that Xen dom0 can
   safely access EFI configuration tables (Demi Marie)
 
 - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
   table, so that firmware that is generated with ENDBR/BTI landing pads
   will be mapped with enforcement enabled.
 
 - Clean up how we check and print the EFI revision exposed by the
   firmware.
 
 - Incorporate EFI memory attributes protocol definition contributed by
   Evgeniy and wire it up in the EFI zboot code. This ensures that these
   images can execute under new and stricter rules regarding the default
   memory permissions for EFI page allocations. (More work is in progress
   here)
 
 - CPER header cleanup by Dan Williams
 
 - Use a raw spinlock to protect the EFI runtime services stack on arm64
   to ensure the correct semantics under -rt. (Pierre)
 
 - EFI framebuffer quirk for Lenovo Ideapad by Darrell.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
 jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
 pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
 QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
 zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
 3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
 YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
 CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
 zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
 f/tLec86
 =+pOz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "A healthy mix of EFI contributions this time:

   - Performance tweaks for efifb earlycon (Andy)

   - Preparatory refactoring and cleanup work in the efivar layer, which
     is needed to accommodate the Snapdragon arm64 laptops that expose
     their EFI variable store via a TEE secure world API (Johan)

   - Enhancements to the EFI memory map handling so that Xen dom0 can
     safely access EFI configuration tables (Demi Marie)

   - Wire up the newly introduced IBT/BTI flag in the EFI memory
     attributes table, so that firmware that is generated with ENDBR/BTI
     landing pads will be mapped with enforcement enabled

   - Clean up how we check and print the EFI revision exposed by the
     firmware

   - Incorporate EFI memory attributes protocol definition and wire it
     up in the EFI zboot code (Evgeniy)

     This ensures that these images can execute under new and stricter
     rules regarding the default memory permissions for EFI page
     allocations (More work is in progress here)

   - CPER header cleanup (Dan Williams)

   - Use a raw spinlock to protect the EFI runtime services stack on
     arm64 to ensure the correct semantics under -rt (Pierre)

   - EFI framebuffer quirk for Lenovo Ideapad (Darrell)"

* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
  arm64: efi: Make efi_rt_lock a raw_spinlock
  efi: Add mixed-mode thunk recipe for GetMemoryAttributes
  efi: x86: Wire up IBT annotation in memory attributes table
  efi: arm64: Wire up BTI annotation in memory attributes table
  efi: Discover BTI support in runtime services regions
  efi/cper, cxl: Remove cxl_err.h
  efi: Use standard format for printing the EFI revision
  efi: Drop minimum EFI version check at boot
  efi: zboot: Use EFI protocol to remap code/data with the right attributes
  efi/libstub: Add memory attribute protocol definitions
  efi: efivars: prevent double registration
  efi: verify that variable services are supported
  efivarfs: always register filesystem
  efi: efivars: add efivars printk prefix
  efi: Warn if trying to reserve memory under Xen
  efi: Actually enable the ESRT under Xen
  efi: Apply allowlist to EFI configuration tables when running under Xen
  efi: xen: Implement memory descriptor lookup based on hypercall
  efi: memmap: Disregard bogus entries instead of returning them
  ...
2023-02-23 14:41:48 -08:00