Commit Graph

77024 Commits

Author SHA1 Message Date
Xiang wangx 1f3ddff375 ext4: fix a doubled word "need" in a comment
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Link: https://lore.kernel.org/r/20220605091503.12513-1-wangxiang@cdjrlc.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:36:20 -04:00
Zhang Yi b55c3cd102 ext4: add reserved GDT blocks check
We capture a NULL pointer issue when resizing a corrupt ext4 image which
is freshly clear resize_inode feature (not run e2fsck). It could be
simply reproduced by following steps. The problem is because of the
resize_inode feature was cleared, and it will convert the filesystem to
meta_bg mode in ext4_resize_fs(), but the es->s_reserved_gdt_blocks was
not reduced to zero, so could we mistakenly call reserve_backup_gdb()
and passing an uninitialized resize_inode to it when adding new group
descriptors.

 mkfs.ext4 /dev/sda 3G
 tune2fs -O ^resize_inode /dev/sda #forget to run requested e2fsck
 mount /dev/sda /mnt
 resize2fs /dev/sda 8G

 ========
 BUG: kernel NULL pointer dereference, address: 0000000000000028
 CPU: 19 PID: 3243 Comm: resize2fs Not tainted 5.18.0-rc7-00001-gfde086c5ebfd #748
 ...
 RIP: 0010:ext4_flex_group_add+0xe08/0x2570
 ...
 Call Trace:
  <TASK>
  ext4_resize_fs+0xbec/0x1660
  __ext4_ioctl+0x1749/0x24e0
  ext4_ioctl+0x12/0x20
  __x64_sys_ioctl+0xa6/0x110
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f2dd739617b
 ========

The fix is simple, add a check in ext4_resize_begin() to make sure that
the es->s_reserved_gdt_blocks is zero when the resize_inode feature is
disabled.

Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220601092717.763694-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:36:08 -04:00
Ding Xiang bc75a6eb85 ext4: make variable "count" signed
Since dx_make_map() may return -EFSCORRUPTED now, so change "count" to
be a signed integer so we can correctly check for an error code returned
by dx_make_map().

Fixes: 46c116b920 ("ext4: verify dir block before splitting it")
Cc: stable@kernel.org
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20220530100047.537598-1-dingxiang@cmss.chinamobile.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:35:57 -04:00
Baokun Li cf4ff938b4 ext4: correct the judgment of BUG in ext4_mb_normalize_request
ext4_mb_normalize_request() can move logical start of allocated blocks
to reduce fragmentation and better utilize preallocation. However logical
block requested as a start of allocation (ac->ac_o_ex.fe_logical) should
always be covered by allocated blocks so we should check that by
modifying and to or in the assertion.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-3-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:35:57 -04:00
Baokun Li a08f789d2a ext4: fix bug_on ext4_mb_use_inode_pa
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/mballoc.c:3211!
[...]
RIP: 0010:ext4_mb_mark_diskspace_used.cold+0x85/0x136f
[...]
Call Trace:
 ext4_mb_new_blocks+0x9df/0x5d30
 ext4_ext_map_blocks+0x1803/0x4d80
 ext4_map_blocks+0x3a4/0x1a10
 ext4_writepages+0x126d/0x2c30
 do_writepages+0x7f/0x1b0
 __filemap_fdatawrite_range+0x285/0x3b0
 file_write_and_wait_range+0xb1/0x140
 ext4_sync_file+0x1aa/0xca0
 vfs_fsync_range+0xfb/0x260
 do_fsync+0x48/0xa0
[...]
==================================================================

Above issue may happen as follows:
-------------------------------------
do_fsync
 vfs_fsync_range
  ext4_sync_file
   file_write_and_wait_range
    __filemap_fdatawrite_range
     do_writepages
      ext4_writepages
       mpage_map_and_submit_extent
        mpage_map_one_extent
         ext4_map_blocks
          ext4_mb_new_blocks
           ext4_mb_normalize_request
            >>> start + size <= ac->ac_o_ex.fe_logical
           ext4_mb_regular_allocator
            ext4_mb_simple_scan_group
             ext4_mb_use_best_found
              ext4_mb_new_preallocation
               ext4_mb_new_inode_pa
                ext4_mb_use_inode_pa
                 >>> set ac->ac_b_ex.fe_len <= 0
           ext4_mb_mark_diskspace_used
            >>> BUG_ON(ac->ac_b_ex.fe_len <= 0);

we can easily reproduce this problem with the following commands:
	`fallocate -l100M disk`
	`mkfs.ext4 -b 1024 -g 256 disk`
	`mount disk /mnt`
	`fsstress -d /mnt -l 0 -n 1000 -p 1`

The size must be smaller than or equal to EXT4_BLOCKS_PER_GROUP.
Therefore, "start + size <= ac->ac_o_ex.fe_logical" may occur
when the size is truncated. So start should be the start position of
the group where ac_o_ex.fe_logical is located after alignment.
In addition, when the value of fe_logical or EXT4_BLOCKS_PER_GROUP
is very large, the value calculated by start_off is more accurate.

Cc: stable@kernel.org
Fixes: cd648b8a8f ("ext4: trim allocation requests to group size")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220528110017.354175-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:35:43 -04:00
Eric Biggers 85456054e1 ext4: fix up test_dummy_encryption handling for new mount API
Since ext4 was converted to the new mount API, the test_dummy_encryption
mount option isn't being handled entirely correctly, because the needed
fscrypt_set_test_dummy_encryption() helper function combines
parsing/checking/applying into one function.  That doesn't work well
with the new mount API, which split these into separate steps.

This was sort of okay anyway, due to the parsing logic that was copied
from fscrypt_set_test_dummy_encryption() into ext4_parse_param(),
combined with an additional check in ext4_check_test_dummy_encryption().
However, these overlooked the case of changing the value of
test_dummy_encryption on remount, which isn't allowed but ext4 wasn't
detecting until ext4_apply_options() when it's too late to fail.
Another bug is that if test_dummy_encryption was specified multiple
times with an argument, memory was leaked.

Fix this up properly by using the new helper functions that allow
splitting up the parse/check/apply steps for test_dummy_encryption.

Fixes: cebe85d570 ("ext4: switch to the new mount api")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20220526040412.173025-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:35:43 -04:00
Shuqi Zhang 4efd9f0d12 ext4: use kmemdup() to replace kmalloc + memcpy
Replace kmalloc + memcpy with kmemdup()

Signed-off-by: Shuqi Zhang <zhangshuqi3@huawei.com>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525030120.803330-1-zhangshuqi3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:35:43 -04:00
Ye Bin 9b6641dd95 ext4: fix super block checksum incorrect after mount
We got issue as follows:
[home]# mount  /dev/sda  test
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[home]# dmesg
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda): Errors on filesystem, clearing orphan list.
EXT4-fs (sda): recovery complete
EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none.
[home]# debugfs /dev/sda
debugfs 1.46.5 (30-Dec-2021)
Checksum errors in superblock!  Retrying...

Reason is ext4_orphan_cleanup will reset ‘s_last_orphan’ but not update
super block checksum.

To solve above issue, defer update super block checksum after
ext4_orphan_cleanup.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525012904.1604737-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-18 19:35:24 -04:00
Shyam Prasad N 5d24968f5b cifs: when a channel is not found for server, log its connection id
cifs_ses_get_chan_index gets the index for a given server pointer.
When a match is not found, we warn about a possible bug.
However, printing details about the non-matching server could be
more useful to debug here.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-18 14:55:06 -05:00
Xiang wangx 93a8c044b9 tracefs: Fix syntax errors in comments
Delete the redundant word 'to'.

Link: https://lkml.kernel.org/r/20220605092729.13010-1-wangxiang@cdjrlc.com

Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-06-17 19:01:28 -04:00
Linus Torvalds 4b35035bcf NFS Client Fixes for Linux 5.19-rc
- Bugfixes:
   - Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail
   - Fix trunking detection & cl_max_connect setting
   - Avoid pnfs_update_layout() livelocks
   - Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmKs2ZwACgkQ18tUv7Cl
 QOsnGxAA6g0M7IU6g375rfqxq9a/XqSlvwIeuSOb14WNybh7D6qXOa+iInsHFIl7
 d7coORg846SOdUl218hVcp8Ba1OTsj/XAruJllsrHQnB50raBJ/nUbIbQlrxGYKn
 WzMtVnyLQ8Ml+rvERINPqVcgUBJ0PGWMiRi/h8OcUlWylV5ZI/irpkaZiuXamzhe
 O0Wa04N/82bUU03dEmQ0ZuPuhMn5JbOMaSzciRvHEV8nLvqvRAhGIVBtvrrYF0VB
 UfZx/4DTRXDD5/RA65hX2vgixZh7/cLTv4pr26wmfDBofo1zDiFsQpPS8QaZo5bt
 Sw+UQK1c15kW/EJS+au90mazBmFnk5UX2BOyfN+Cg5/GlHjE0YHV1f2ejbHycsyh
 Rcsu8nxNa5T82mg2EjOCqK2YWy8mGHYr5MJTYftL/uE8NdqP/DSgNpqNSQhQD2Bm
 vzsG0wP5RP+i3pRWWQnXOlZE+GdaKxtXKtg2ZjHx1Wkb2QIUbgbxST59q/U5QnIN
 MJKS1nhbxQA0dGo3ClzYNe76S9It7DDE0A3mdvIzPwRSheQhgmF4UlzyTWgSdyfw
 lnT5EK3pQat6cvdaszjMn1f6vx4BvTuhXUE9eH35opVMYykvAU6hz4ypllxKoR/h
 BES6KMJPKXn77ICJJVC3RR5w2v756DRNMeOfkjCi18TiuTVFXek=
 =nTJk
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

 - Add FMODE_CAN_ODIRECT support to NFSv4 so opens don't fail

 - Fix trunking detection & cl_max_connect setting

 - Avoid pnfs_update_layout() livelocks

 - Don't keep retrying pNFS if the server replies with NFS4ERR_UNAVAILABLE

* tag 'nfs-for-5.19-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file
  sunrpc: set cl_max_connect when cloning an rpc_clnt
  pNFS: Avoid a live lock condition in pnfs_update_layout()
  pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE
2022-06-17 15:17:57 -05:00
Linus Torvalds f8e174c307 io_uring-5.19-2022-06-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKsc6oQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgGxD/9YB9O3Dw2WOlzE+bnbadDEL0/XdaMVSZQX
 t5pfz1YTBUf/KgF+HWo6cGgvNupNjo6a2FAGJiGIXaEx2lZbKw7gUEPXohqY18h6
 alLPzt881whWESXTjpsDtc57PfCVY/K5/5ebqN5AhoXCgtl6CvlePJZH8uzBMq5F
 vGjcBgdofum677uNPSEpn6AzIGVtd9jI6Rg8r/a4iRdzeAJlkp1ifVh424qGgWtQ
 fQuoV83EPut/RTUodXZwJ/2XrdJwNDex98LEmp1Pi78IprGawrQ5F9JzsypQR2ie
 8ajLe6xn4wiXuWFr3pE9paow3c1APuftJ/PRXqBHoh2X6sMI4G2B2UNDkKrlK6DD
 9r5INcKzpMY390nN6GnSD1BSWBGNuglu9mASXDKFXL/JK+XNi6nYlaXdPn4uAhyR
 Cp41xx3gGf3r8aq8Pv+YNRej3kpNSi8oHKhYPToxn+EwPX8TpTdexnQC4ZKWNMbZ
 Mg1hY5Z0NxuhEyvKlTXZmOF8dlf2dTZYJoqHHeYhvcoZT9dWwjrINXqJvqsCyywB
 2fPOPjdn1SuBwsugSkYkMlsbLm4rlyLCLnEL2SgcbzyQ2rubN5UFcp3ouJOEt5Nz
 HDZi4s7LBOZTGmnmtev5GOA7kDCQ2EqOcRZQOdWPSa5g5pOL11ahxRW0KESSsPik
 1pTBDjTfxg==
 =55JE
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "Bigger than usual at this time, both because we missed -rc2, but also
  because of some reverts that we chose to do. In detail:

   - Adjust mapped buffer API while we still can (Dylan)

   - Mapped buffer fixes (Dylan, Hao, Pavel, me)

   - Fix for uring_cmd wrong API usage for task_work (Dylan)

   - Fix for bug introduced in fixed file closing (Hao)

   - Fix race in buffer/file resource handling (Pavel)

   - Revert the NOP support for CQE32 and buffer selection that was
     brought up during the merge window (Pavel)

   - Remove IORING_CLOSE_FD_AND_FILE_SLOT introduced in this merge
     window. The API needs further refining, so just yank it for now and
     we'll revisit for a later kernel.

   - Series cleaning up the CQE32 support added in this merge window,
     making it more integrated rather than sitting on the side (Pavel)"

* tag 'io_uring-5.19-2022-06-16' of git://git.kernel.dk/linux-block: (21 commits)
  io_uring: recycle provided buffer if we punt to io-wq
  io_uring: do not use prio task_work_add in uring_cmd
  io_uring: commit non-pollable provided mapped buffers upfront
  io_uring: make io_fill_cqe_aux honour CQE32
  io_uring: remove __io_fill_cqe() helper
  io_uring: fix ->extra{1,2} misuse
  io_uring: fill extra big cqe fields from req
  io_uring: unite fill_cqe and the 32B version
  io_uring: get rid of __io_fill_cqe{32}_req()
  io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT
  Revert "io_uring: add buffer selection support to IORING_OP_NOP"
  Revert "io_uring: support CQE32 for nop operation"
  io_uring: limit size of provided buffer ring
  io_uring: fix types in provided buffer ring
  io_uring: fix index calculation
  io_uring: fix double unlock for pbuf select
  io_uring: kbuf: fix bug of not consuming ring buffer in partial io case
  io_uring: openclose: fix bug of closing wrong fixed file
  io_uring: fix not locked access to fixed buf table
  io_uring: fix races with buffer table unregister
  ...
2022-06-17 11:14:07 -07:00
Linus Torvalds 5c0cd3d4a9 Merge tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull writeback and ext2 fixes from Jan Kara:
 "A fix for writeback bug which prevented machines with kdevtmpfs from
  booting and also one small ext2 bugfix in IO error handling"

* tag 'fs_for_v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  init: Initialize noop_backing_dev_info early
  ext2: fix fs corruption when trying to remove a non-empty directory with IO error
2022-06-17 10:09:24 -07:00
Jens Axboe 6436c770f1 io_uring: recycle provided buffer if we punt to io-wq
io_arm_poll_handler() will recycle the buffer appropriately if we end
up arming poll (or if we're ready to retry), but not for the io-wq case
if we have attempted poll first.

Explicitly recycle the buffer to avoid both hanging on to it too long,
but also to avoid multiple reads grabbing the same one. This can happen
for ring mapped buffers, since it hasn't necessarily been committed.

Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Link: https://github.com/axboe/liburing/issues/605
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-17 06:24:26 -06:00
Mike Kravetz 68d32527d3 hugetlbfs: zero partial pages during fallocate hole punch
hugetlbfs fallocate support was originally added with commit 70c3547e36
("hugetlbfs: add hugetlbfs_fallocate()").  Initial support only operated
on whole hugetlb pages.  This makes sense for populating files as other
interfaces such as mmap and truncate require hugetlb page size alignment. 
Only operating on whole hugetlb pages for the hole punch case was a
simplification and there was no compelling use case to zero partial pages.

In a recent discussion[1] it was assumed that hugetlbfs hole punch would
zero partial hugetlb pages as that is in line with the man page
description saying 'partial filesystem blocks are zeroed'.  However, the
hugetlbfs hole punch code actually does this:

        hole_start = round_up(offset, hpage_size);
        hole_end = round_down(offset + len, hpage_size);

Modify code to zero partial hugetlb pages in hole punch range.  It is
possible that application code could note a change in behavior.  However,
that would imply the code is passing in an unaligned range and expecting
only whole pages be removed.  This is unlikely as the fallocate
documentation states the opposite.

The current hugetlbfs fallocate hole punch behavior is tested with the
libhugetlbfs test fallocate_align[2].  This test will be updated to
validate partial page zeroing.

[1] https://lore.kernel.org/linux-mm/20571829-9d3d-0b48-817c-b6b15565f651@redhat.com/
[2] https://github.com/libhugetlbfs/libhugetlbfs/blob/master/tests/fallocate_align.c

Link: https://lkml.kernel.org/r/YqeiMlZDKI1Kabfe@monkey
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-06-16 19:11:32 -07:00
Steve French 7c05eae8db smb3: add trace point for SMB2_set_eof
In order to debug problems with file size being reported incorrectly
temporarily (in this case xfstest generic/584 intermittent failure)
we need to add trace point for the non-compounded code path where
we set the file size (SMB2_set_eof).  The new trace point is:
   "smb3_set_eof"

Here is sample output from the tracepoint:

            TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
              | |         |   |||||     |         |
          xfs_io-75403   [002] ..... 95219.189835: smb3_set_eof: xid=221 sid=0xeef1cbd2 tid=0x27079ee6 fid=0x52edb58c offset=0x100000
 aio-dio-append--75418   [010] ..... 95219.242402: smb3_set_eof: xid=226 sid=0xeef1cbd2 tid=0x27079ee6 fid=0xae89852d offset=0x0

Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-16 18:07:10 -05:00
Dominique Martinet b0017602fd 9p: fix EBADF errors in cached mode
cached operations sometimes need to do invalid operations (e.g. read
on a write only file)
Historic fscache had added a "writeback fid", a special handle opened
RW as root, for this. The conversion to new fscache missed that bit.

This commit reinstates a slightly lesser variant of the original code
that uses the writeback fid for partial pages backfills if the regular
user fid had been open as WRONLY, and thus would lack read permissions.

Link: https://lkml.kernel.org/r/20220614033802.1606738-1-asmadeus@codewreck.org
Fixes: eb497943fa ("9p: Convert to using the netfs helper lib to do reads and caching")
Cc: stable@vger.kernel.org
Cc: David Howells <dhowells@redhat.com>
Reported-By: Christian Schoenebeck <linux_oss@crudebyte.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Tested-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-06-17 06:03:30 +09:00
Jan Kara 8d5459c11f ext4: improve write performance with disabled delalloc
When delayed allocation is disabled (either through mount option or
because we are running low on free space), ext4_write_begin() allocates
blocks with EXT4_GET_BLOCKS_IO_CREATE_EXT flag. With this flag extent
merging is disabled and since ext4_write_begin() is called for each page
separately, we end up with a *lot* of 1 block extents in the extent tree
and following writeback is writing 1 block at a time which results in
very poor write throughput (4 MB/s instead of 200 MB/s). These days when
ext4_get_block_unwritten() is used only by ext4_write_begin(),
ext4_page_mkwrite() and inline data conversion, we can safely allow
extent merging to happen from these paths since following writeback will
happen on different boundaries anyway. So use
EXT4_GET_BLOCKS_CREATE_UNRIT_EXT instead which restores the performance.

Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220520111402.4252-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-16 12:17:56 -04:00
Zhang Yi 15baa7dcad ext4: fix warning when submitting superblock in ext4_commit_super()
We have already check the io_error and uptodate flag before submitting
the superblock buffer, and re-set the uptodate flag if it has been
failed to write out. But it was lockless and could be raced by another
ext4_commit_super(), and finally trigger '!uptodate' WARNING when
marking buffer dirty. Fix it by submit buffer directly.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220520023216.3065073-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-16 11:50:48 -04:00
Dylan Yudaken 32fc810b36 io_uring: do not use prio task_work_add in uring_cmd
io_req_task_prio_work_add has a strict assumption that it will only be
used with io_req_task_complete. There is a codepath that assumes this is
the case and will not even call the completion function if it is hit.

For uring_cmd with an arbitrary completion function change the call to the
correct non-priority version.

Fixes: ee692a21e9 ("fs,io_uring: add infrastructure for uring-cmd")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220616135011.441980-1-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-16 09:10:26 -06:00
Wang Jianjian 48e02e6113 ext4: fix incorrect comment in ext4_bio_write_page()
Signed-off-by: Wang Jianjian <wangjianjian3@huawei.com>
Link: https://lore.kernel.org/r/20220520022255.2120576-1-wangjianjian3@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-16 11:03:16 -04:00
Yang Li 4f5bf12732 fs: fix jbd2_journal_try_to_free_buffers() kernel-doc comment
Add the description of @folio and remove @page in function kernel-doc
comment to remove warnings found by running scripts/kernel-doc, which
is caused by using 'make W=1'.

fs/jbd2/transaction.c:2149: warning: Function parameter or member
'folio' not described in 'jbd2_journal_try_to_free_buffers'
fs/jbd2/transaction.c:2149: warning: Excess function parameter 'page'
description in 'jbd2_journal_try_to_free_buffers'

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220512075432.31763-1-yang.lee@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2022-06-16 10:36:09 -04:00
Jens Axboe a76c0b31ee io_uring: commit non-pollable provided mapped buffers upfront
For recv/recvmsg, IO either completes immediately or gets queued for a
retry. This isn't the case for read/readv, if eg a normal file or a block
device is used. Here, an operation can get queued with the block layer.
If this happens, ring mapped buffers must get committed immediately to
avoid that the next read can consume the same buffer.

Check if we're dealing with pollable file, when getting a new ring mapped
provided buffer. If it's not, commit it immediately rather than wait post
issue. If we don't wait, we can race with completions coming in, or just
plain buffer reuse by committing after a retry where others could have
grabbed the same buffer.

Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-16 07:14:44 -06:00
Ye Bin 27cfa25895 ext2: fix fs corruption when trying to remove a non-empty directory with IO error
We got issue as follows:
[home]# mount  /dev/sdd  test
[home]# cd test
[test]# ls
dir1  lost+found
[test]# rmdir  dir1
ext2_empty_dir: inject fault
[test]# ls
lost+found
[test]# cd ..
[home]# umount test
[home]# fsck.ext2 -fn  /dev/sdd
e2fsck 1.42.9 (28-Dec-2013)
Pass 1: Checking inodes, blocks, and sizes
Inode 4065, i_size is 0, should be 1024.  Fix? no

Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Unconnected directory inode 4065 (/???)
Connect to /lost+found? no

'..' in ... (4065) is / (2), should be <The NULL inode> (0).
Fix? no

Pass 4: Checking reference counts
Inode 2 ref count is 3, should be 4.  Fix? no

Inode 4065 ref count is 2, should be 3.  Fix? no

Pass 5: Checking group summary information

/dev/sdd: ********** WARNING: Filesystem still has errors **********

/dev/sdd: 14/128016 files (0.0% non-contiguous), 18477/512000 blocks

Reason is same with commit 7aab5c84a0. We can't assume directory
is empty when read directory entry failed.

Link: https://lore.kernel.org/r/20220615090010.1544152-1-yebin10@huawei.com
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-06-16 10:55:45 +02:00
Darrick J. Wong e89ab76d7e xfs: preserve DIFLAG2_NREXT64 when setting other inode attributes
It is vitally important that we preserve the state of the NREXT64 inode
flag when we're changing the other flags2 fields.

Fixes: 9b7d16e34b ("xfs: Introduce XFS_DIFLAG2_NREXT64 and associated helpers")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2022-06-15 23:13:33 -07:00
Darrick J. Wong 10930b254d xfs: fix variable state usage
The variable @args is fed to a tracepoint, and that's the only place
it's used.  This is fine for the kernel, but for userspace, tracepoints
are #define'd out of existence, which results in this warning on gcc
11.2:

xfs_attr.c: In function ‘xfs_attr_node_try_addname’:
xfs_attr.c:1440:42: warning: unused variable ‘args’ [-Wunused-variable]
 1440 |         struct xfs_da_args              *args = attr->xattri_da_args;
      |                                          ^~~~

Clean this up.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2022-06-15 23:13:32 -07:00
Darrick J. Wong f4288f0182 xfs: fix TOCTOU race involving the new logged xattrs control knob
I found a race involving the larp control knob, aka the debugging knob
that lets developers enable logging of extended attribute updates:

Thread 1			Thread 2

echo 0 > /sys/fs/xfs/debug/larp
				setxattr(REPLACE)
				xfs_has_larp (returns false)
				xfs_attr_set

echo 1 > /sys/fs/xfs/debug/larp

				xfs_attr_defer_replace
				xfs_attr_init_replace_state
				xfs_has_larp (returns true)
				xfs_attr_init_remove_state

				<oops, wrong DAS state!>

This isn't a particularly severe problem right now because xattr logging
is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
what they're doing.

However, the eventual intent is that callers should be able to ask for
the assistance of the log in persisting xattr updates.  This capability
might not be required for /all/ callers, which means that dynamic
control must work correctly.  Once an xattr update has decided whether
or not to use logged xattrs, it needs to stay in that mode until the end
of the operation regardless of what subsequent parallel operations might
do.

Therefore, it is an error to continue sampling xfs_globals.larp once
xfs_attr_change has made a decision about larp, and it was not correct
for me to have told Allison that ->create_intent functions can sample
the global log incompat feature bitfield to decide to elide a log item.

Instead, create a new op flag for the xfs_da_args structure, and convert
all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
the attr update state machine to look for the operations flag.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
2022-06-15 23:13:32 -07:00
Dave Wysochanski 5ee3d10f84 NFSv4: Add FMODE_CAN_ODIRECT after successful open of a NFS4.x file
Commit a2ad63daa8 ("VFS: add FMODE_CAN_ODIRECT file flag")
added the FMODE_CAN_ODIRECT flag for NFSv3 but neglected to add
it for NFSv4.x.  This causes direct io on NFSv4.x to fail open
with EINVAL:
  mount -o vers=4.2 127.0.0.1:/export /mnt/nfs4
  dd if=/dev/zero of=/mnt/nfs4/file.bin bs=128k count=1 oflag=direct
  dd: failed to open '/mnt/nfs4/file.bin': Invalid argument
  dd of=/dev/null if=/mnt/nfs4/file.bin bs=128k count=1 iflag=direct
  dd: failed to open '/mnt/dir1/file1.bin': Invalid argument

Fixes: a2ad63daa8 ("VFS: add FMODE_CAN_ODIRECT file flag")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-06-15 15:03:12 -04:00
Linus Torvalds 979086f5e0 fs.fixes.v5.19-rc3
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYqmpKwAKCRCRxhvAZXjc
 ogvLAQCsgqKYjmqx1s9ta8PXH9qiTWLQh1/s3ONCAvSBe0rYRAD9HPwbUoxguqxr
 T2RzjuX2+rqzA5qTErjQqVEftn7DgAo=
 =+P6m
 -----END PGP SIGNATURE-----

Merge tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull vfs idmapping fix from Christian Brauner:
 "This fixes an issue where we fail to change the group of a file when
  the caller owns the file and is a member of the group to change to.

  This is only relevant on idmapped mounts.

  There's a detailed description in the commit message and regression
  tests have been added to xfstests"

* tag 'fs.fixes.v5.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  fs: account for group membership
2022-06-15 09:04:55 -07:00
Pavel Begunkov c5595975b5 io_uring: make io_fill_cqe_aux honour CQE32
Don't let io_fill_cqe_aux() post 16B cqes for CQE32 rings, neither the
kernel nor the userspace expect this to happen.

Fixes: 76c68fbf1a ("io_uring: enable CQE32")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/64fae669fae1b7083aa15d0cd807f692b0880b9a.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-15 05:06:56 -06:00
Pavel Begunkov cd94903d3b io_uring: remove __io_fill_cqe() helper
In preparation for the following patch, inline __io_fill_cqe(), there is
only one user.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/71dab9afc3cde3f8b64d26f20d3b60bdc40726ff.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-15 05:06:42 -06:00
Pavel Begunkov 2caf9822f0 io_uring: fix ->extra{1,2} misuse
We don't really know the state of req->extra{1,2] fields in
__io_fill_cqe_req(), if an opcode handler is not aware of CQE32 option,
it never sets them up properly. Track the state of those fields with a
request flag.

Fixes: 76c68fbf1a ("io_uring: enable CQE32")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4b3e5be512fbf4debec7270fd485b8a3b014d464.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-15 05:06:09 -06:00
Pavel Begunkov 29ede2014c io_uring: fill extra big cqe fields from req
The only user of io_req_complete32()-like functions is cmd
requests. Instead of keeping the whole complete32 family, remove them
and provide the extras in already added for inline completions
req->extra{1,2}. When fill_cqe_res() finds CQE32 option enabled
it'll use those fields to fill a 32B cqe.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/af1319eb661b1f9a0abceb51cbbf72b8002e019d.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-15 05:06:09 -06:00
Pavel Begunkov f43de1f888 io_uring: unite fill_cqe and the 32B version
We want just one function that will handle both normal cqes and 32B
cqes. Combine __io_fill_cqe_req() and __io_fill_cqe_req32(). It's still
not entirely correct yet, but saves us from cases when we fill an CQE of
a wrong size.

Fixes: 76c68fbf1a ("io_uring: enable CQE32")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8085c5b2f74141520f60decd45334f87e389b718.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-15 05:06:09 -06:00
Pavel Begunkov 91ef75a7db io_uring: get rid of __io_fill_cqe{32}_req()
There are too many cqe filling helpers, kill __io_fill_cqe{32}_req(),
use __io_fill_cqe{32}_req_filled() instead, and then rename it. It'll
simplify fixing in following patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c18e0d191014fb574f24721245e4e3fddd0b6917.1655287457.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-15 05:06:09 -06:00
Tyler Hicks 2a3dcbccd6 9p: Fix refcounting during full path walks for fid lookups
Decrement the refcount of the parent dentry's fid after walking
each path component during a full path walk for a lookup. Failure to do
so can lead to fids that are not clunked until the filesystem is
unmounted, as indicated by this warning:

 9pnet: found fid 3 not clunked

The improper refcounting after walking resulted in open(2) returning
-EIO on any directories underneath the mount point when using the virtio
transport. When using the fd transport, there's no apparent issue until
the filesytem is unmounted and the warning above is emitted to the logs.

In some cases, the user may not yet be attached to the filesystem and a
new root fid, associated with the user, is created and attached to the
root dentry before the full path walk is performed. Increment the new
root fid's refcount to two in that situation so that it can be safely
decremented to one after it is used for the walk operation. The new fid
will still be attached to the root dentry when
v9fs_fid_lookup_with_uid() returns so a final refcount of one is
correct/expected.

Link: https://lkml.kernel.org/r/20220527000003.355812-2-tyhicks@linux.microsoft.com
Link: https://lkml.kernel.org/r/20220612085330.1451496-4-asmadeus@codewreck.org
Fixes: 6636b6dcc3 ("9p: add refcount to p9_fid struct")
Cc: stable@vger.kernel.org
Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
[Dominique: fix clunking fid multiple times discussed in second link]
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-06-15 12:05:33 +09:00
Dominique Martinet e5690f2632 9p: fix fid refcount leak in v9fs_vfs_get_link
we check for protocol version later than required, after a fid has
been obtained. Just move the version check earlier.

Link: https://lkml.kernel.org/r/20220612085330.1451496-3-asmadeus@codewreck.org
Fixes: 6636b6dcc3 ("9p: add refcount to p9_fid struct")
Cc: stable@vger.kernel.org
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-06-15 12:05:29 +09:00
Dominique Martinet beca774fc5 9p: fix fid refcount leak in v9fs_vfs_atomic_open_dotl
We need to release directory fid if we fail halfway through open

This fixes fid leaking with xfstests generic 531

Link: https://lkml.kernel.org/r/20220612085330.1451496-2-asmadeus@codewreck.org
Fixes: 6636b6dcc3 ("9p: add refcount to p9_fid struct")
Cc: stable@vger.kernel.org
Reported-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
2022-06-15 12:05:21 +09:00
Pavel Begunkov d884b6498d io_uring: remove IORING_CLOSE_FD_AND_FILE_SLOT
This partially reverts a7c41b4687

Even though IORING_CLOSE_FD_AND_FILE_SLOT might save cycles for some
users, but it tries to do two things at a time and it's not clear how to
handle errors and what to return in a single result field when one part
fails and another completes well. Kill it for now.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/837c745019b3795941eee4fcfd7de697886d645b.1655224415.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-14 10:57:40 -06:00
Pavel Begunkov aa165d6d2b Revert "io_uring: add buffer selection support to IORING_OP_NOP"
This reverts commit 3d200242a6.

Buffer selection with nops was used for debugging and benchmarking but
is useless in real life. Let's revert it before it's released.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c5012098ca6b51dfbdcb190f8c4e3c0bf1c965dc.1655224415.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-14 10:57:40 -06:00
Pavel Begunkov 8899ce4b2f Revert "io_uring: support CQE32 for nop operation"
This reverts commit 2bb04df7c2.

CQE32 nops were used for debugging and benchmarking but it doesn't
target any real use case. Revert it, we can return it back if someone
finds a good way to use it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5ff623d84ccb4b3f3b92a3ea41cdcfa612f3d96f.1655224415.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-14 10:57:40 -06:00
Christian Brauner 168f912893
fs: account for group membership
When calling setattr_prepare() to determine the validity of the
attributes the ia_{g,u}id fields contain the value that will be written
to inode->i_{g,u}id. This is exactly the same for idmapped and
non-idmapped mounts and allows callers to pass in the values they want
to see written to inode->i_{g,u}id.

When group ownership is changed a caller whose fsuid owns the inode can
change the group of the inode to any group they are a member of. When
searching through the caller's groups we need to use the gid mapped
according to the idmapped mount otherwise we will fail to change
ownership for unprivileged users.

Consider a caller running with fsuid and fsgid 1000 using an idmapped
mount that maps id 65534 to 1000 and 65535 to 1001. Consequently, a file
owned by 65534:65535 in the filesystem will be owned by 1000:1001 in the
idmapped mount.

The caller now requests the gid of the file to be changed to 1000 going
through the idmapped mount. In the vfs we will immediately map the
requested gid to the value that will need to be written to inode->i_gid
and place it in attr->ia_gid. Since this idmapped mount maps 65534 to
1000 we place 65534 in attr->ia_gid.

When we check whether the caller is allowed to change group ownership we
first validate that their fsuid matches the inode's uid. The
inode->i_uid is 65534 which is mapped to uid 1000 in the idmapped mount.
Since the caller's fsuid is 1000 we pass the check.

We now check whether the caller is allowed to change inode->i_gid to the
requested gid by calling in_group_p(). This will compare the passed in
gid to the caller's fsgid and search the caller's additional groups.

Since we're dealing with an idmapped mount we need to pass in the gid
mapped according to the idmapped mount. This is akin to checking whether
a caller is privileged over the future group the inode is owned by. And
that needs to take the idmapped mount into account. Note, all helpers
are nops without idmapped mounts.

New regression test sent to xfstests.

Link: https://github.com/lxc/lxd/issues/10537
Link: https://lore.kernel.org/r/20220613111517.2186646-1-brauner@kernel.org
Fixes: 2f221d6f7b ("attr: handle idmapped mounts")
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@vger.kernel.org # 5.15+
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
2022-06-14 12:18:47 +02:00
Jens Axboe feaf625e70 Merge branch 'io_uring/io_uring-5.19' of https://github.com/isilence/linux into io_uring-5.19
Pull io_uring fixes from Pavel.

* 'io_uring/io_uring-5.19' of https://github.com/isilence/linux:
  io_uring: fix double unlock for pbuf select
  io_uring: kbuf: fix bug of not consuming ring buffer in partial io case
  io_uring: openclose: fix bug of closing wrong fixed file
  io_uring: fix not locked access to fixed buf table
  io_uring: fix races with buffer table unregister
  io_uring: fix races with file table unregister
2022-06-13 06:52:52 -06:00
Dylan Yudaken f9437ac0f8 io_uring: limit size of provided buffer ring
The type of head and tail do not allow more than 2^15 entries in a
provided buffer ring, so do not allow this.
At 2^16 while each entry can be indexed, there is no way to
disambiguate full vs empty.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220613101157.3687-4-dylany@fb.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-13 05:13:33 -06:00
Dylan Yudaken c6e9fa5c0a io_uring: fix types in provided buffer ring
The type of head needs to match that of tail in order for rollover and
comparisons to work correctly.

Without this change the comparison of tail to head might incorrectly allow
io_uring to use a buffer that userspace had not given it.

Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220613101157.3687-3-dylany@fb.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-13 05:13:31 -06:00
Dylan Yudaken 97da4a5379 io_uring: fix index calculation
When indexing into a provided buffer ring, do not subtract 1 from the
index.

Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220613101157.3687-2-dylany@fb.com
Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-13 05:13:09 -06:00
Pavel Begunkov fc9375e3f7 io_uring: fix double unlock for pbuf select
io_buffer_select(), which is the only caller of io_ring_buffer_select(),
fully handles locking, mutex unlock in io_ring_buffer_select() will lead
to double unlock.

Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2022-06-13 11:37:41 +01:00
Hao Xu 42db0c00e2 io_uring: kbuf: fix bug of not consuming ring buffer in partial io case
When we use ring-mapped provided buffer, we should consume it before
arm poll if partial io has been done. Otherwise the buffer may be used
by other requests and thus we lost the data.

Fixes: c7fb19428d ("io_uring: add support for ring mapped supplied buffers")
Signed-off-by: Hao Xu <howeyxu@tencent.com>
[pavel: 5.19 rebase]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2022-06-13 11:37:30 +01:00
Hao Xu e71d7c56dd io_uring: openclose: fix bug of closing wrong fixed file
Don't update ret until fixed file is closed, otherwise the file slot
becomes the error code.

Fixes: a7c41b4687 ("io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots")
Signed-off-by: Hao Xu <howeyxu@tencent.com>
[pavel: 5.19 rebase]
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2022-06-13 11:37:03 +01:00
Pavel Begunkov 05b538c176 io_uring: fix not locked access to fixed buf table
We can look inside the fixed buffer table only while holding
->uring_lock, however in some cases we don't do the right async prep for
IORING_OP_{WRITE,READ}_FIXED ending up with NULL req->imu forcing making
an io-wq worker to try to resolve the fixed buffer without proper
locking.

Move req->imu setup into early req init paths, i.e. io_prep_rw(), which
is called unconditionally for rw requests and under uring_lock.

Fixes: 634d00df5e ("io_uring: add full-fledged dynamic buffers support")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2022-06-13 09:53:41 +01:00
Pavel Begunkov d11d31fc5d io_uring: fix races with buffer table unregister
Fixed buffer table quiesce might unlock ->uring_lock, potentially
letting new requests to be submitted, don't allow those requests to
use the table as they will race with unregistration.

Reported-and-tested-by: van fantasy <g1042620637@gmail.com>
Fixes: bd54b6fe33 ("io_uring: implement fixed buffers registration similar to fixed files")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2022-06-13 09:53:27 +01:00
Pavel Begunkov b0380bf6da io_uring: fix races with file table unregister
Fixed file table quiesce might unlock ->uring_lock, potentially letting
new requests to be submitted, don't allow those requests to use the
table as they will race with unregistration.

Reported-and-tested-by: van fantasy <g1042620637@gmail.com>
Fixes: 05f3fb3c53 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2022-06-13 09:53:07 +01:00
Linus Torvalds 2275c6babf 3 smb3 reconnect fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmKkwyEACgkQiiy9cAdy
 T1EqyAv/d8aDey0rQzGy918wzJd91gZrNFOJUpzVhIs3O5MakBgeoYn+S6rySl1+
 xs6lXTQdSEyiL0edqTIq8iqA+iuhLPCBW2BWa/Zw089yHM/Ho3tjc5gBl5w38OcF
 7NpFUInkg+yoBYWY9cCwjL83YaPxhcLKGY7S6WWptUxzf5Sg6eUqXCkMS7eUV6hb
 YniMa5uWZSJtqY4F6qw/NOw90QekodEmfL4lLU/GXOnDxlJ8v5Ztf3aGHITWNwsd
 ovhutUSai/tZz9fYHp6yOZYDcl4i0brOa3dIyU2tr52TdtzS73he8rE+Th4bu+uM
 XTXvrDCTwsnOTiRFyyBJcaVDF+6LpqPEcURqLEbVOf0xXHyoEQ4zEwVFQJIBYOP4
 Oy8XeXQePRxCnToI2cFZaw85IkLikoZ+4PggFkbsaFdJfkboR7b+XVhkfGVr5jnn
 A6Unwrn3f6LS6MLbsDAjUpfzftwRyhbcvYCukeYKWz836xAx5tyOBIZA3m6keXah
 LyhX4qSb
 =mHWJ
 -----END PGP SIGNATURE-----

Merge tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Three reconnect fixes, all for stable as well.

  One of these three reconnect fixes does address a problem with
  multichannel reconnect, but this does not include the additional
  fix (still being tested) for dynamically detecting multichannel
  adapter changes which will improve those reconnect scenarios even
  more"

* tag '5.19-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: populate empty hostnames for extra channels
  cifs: return errors during session setup during reconnects
  cifs: fix reconnect on smb3 mount types
2022-06-12 11:05:44 -07:00
Christophe JAILLET 06ee1c0aeb ksmbd: smbd: Remove useless license text when SPDX-License-Identifier is already used
An SPDX-License-Identifier is already in place. There is no need to
duplicate part of the corresponding license.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-11 11:18:26 -05:00
Namjae Jeon fe0fde09e1 ksmbd: use SOCK_NONBLOCK type for kernel_accept()
I found that normally it is O_NONBLOCK but there are different value
for some arch.

/include/linux/net.h:
#ifndef SOCK_NONBLOCK
#define SOCK_NONBLOCK   O_NONBLOCK
#endif

/arch/alpha/include/asm/socket.h:
#define SOCK_NONBLOCK   0x40000000

Use SOCK_NONBLOCK instead of O_NONBLOCK for kernel_accept().

Suggested-by: David Howells <dhowells@redhat.com>
Signed-off-by: Namjae Jeon <linkinjeon@kerne.org>
Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-11 11:18:26 -05:00
Linus Torvalds 0885eacdc8 Notable changes:
- There is now a backup maintainer for NFSD
 
 Notable fixes:
 - Prevent array overruns in svc_rdma_build_writes()
 - Prevent buffer overruns when encoding NFSv3 READDIR results
 - Fix a potential UAF in nfsd_file_put()
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKjhbsACgkQM2qzM29m
 f5cgig/6A9gC2c9v4lR2fH6ufiCWvJBfuVaBbToubwktJHaDLvqH56JcvS3s/gKL
 PKGmbQTI/6lgmVgJqQSxJUnfe6wzHx8G1MdjlZEIwi3pUeiV+LpiprZz9TOhgYYV
 YDnXRGhb4wKOm75w8rb6X8k106XdKwdBaRQwb88FDawffWoEY0XNYrlmNmmWi8To
 ELlOlIwRCBbKJoJ6yEEWQRrBuVBXapbsn29tipZXbdo58g+vL0yDQq9s97b0mHhi
 C2apAN2+k18FiBJsA7b7pW1l/P6k9FNEeetvgWyN8OSMpPNmt0vz1HvKaIstPgg1
 BX6rgWe5eQBFEk2KNvSGHrV3R+wAp7jeuVpHUMjxXvzmfj4exJV/H8lu+qZJNDGN
 ybCJatomR4APFxk+s1kptlzNo7zfyPz15L80HmWIngYJ/lrBOoKPIIi3bwPQcBwW
 q2Rc+SlvpqbJvEcomgF/lqQN6inmx44J+KpOSA/S8qSIdSkz0iaZsDahFxgZNe82
 h+X/i1maRtnSIvWdGMR7O6kEFT5jky35WlTv/VutTOsUwA4mUU9vZUnufBBHJH07
 nOdLMi/QS/O5GOnlyegrODtN75wi+IeKt+WMNmnN+JB8Tsg0kZwjOsc/dbQfyJqP
 PrQJ5AUP0TMm90B1873z8yKmhWeXtB71vgAI/d53aadBG4ZEstU=
 =6ugk
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd fixes from Chuck Lever:
 "Notable changes:

   - There is now a backup maintainer for NFSD

  Notable fixes:

   - Prevent array overruns in svc_rdma_build_writes()

   - Prevent buffer overruns when encoding NFSv3 READDIR results

   - Fix a potential UAF in nfsd_file_put()"

* tag 'nfsd-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  SUNRPC: Remove pointer type casts from xdr_get_next_encode_buffer()
  SUNRPC: Clean up xdr_get_next_encode_buffer()
  SUNRPC: Clean up xdr_commit_encode()
  SUNRPC: Optimize xdr_reserve_space()
  SUNRPC: Fix the calculation of xdr->end in xdr_get_next_encode_buffer()
  SUNRPC: Trap RDMA segment overflows
  NFSD: Fix potential use-after-free in nfsd_file_put()
  MAINTAINERS: reciprocal co-maintainership for file locking and nfsd
2022-06-10 17:28:43 -07:00
Shyam Prasad N 4c14d7043f cifs: populate empty hostnames for extra channels
Currently, the secondary channels of a multichannel session
also get hostname populated based on the info in primary channel.
However, this will end up with a wrong resolution of hostname to
IP address during reconnect.

This change fixes this by not populating hostname info for all
secondary channels.

Fixes: 5112d80c16 ("cifs: populate server_hostname for extra channels")
Cc: stable@vger.kernel.org
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-10 18:55:02 -05:00
David Howells 40a8110120 netfs: Rename the netfs_io_request cleanup op and give it an op pointer
The netfs_io_request cleanup op is now always in a position to be given a
pointer to a netfs_io_request struct, so this can be passed in instead of
the mapping and private data arguments (both of which are included in the
struct).

So rename the ->cleanup op to ->free_request (to match ->init_request) and
pass in the I/O pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
cc: linux-cachefs@redhat.com
2022-06-10 20:55:21 +01:00
Linus Torvalds e81fb4198e netfs: Further cleanups after struct netfs_inode wrapper introduced
Change the signature of netfs helper functions to take a struct netfs_inode
pointer rather than a struct inode pointer where appropriate, thereby
relieving the need for the network filesystem to convert its internal inode
format down to the VFS inode only for netfslib to bounce it back up.  For
type safety, it's better not to do that (and it's less typing too).

Give netfs_write_begin() an extra argument to pass in a pointer to the
netfs_inode struct rather than deriving it internally from the file
pointer.  Note that the ->write_begin() and ->write_end() ops are intended
to be replaced in the future by netfslib code that manages this without the
need to call in twice for each page.

netfs_readpage() and similar are intended to be pointed at directly by the
address_space_operations table, so must stick to the signature dictated by
the function pointers there.

Changes
=======
- Updated the kerneldoc comments and documentation [DH].

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/
2022-06-10 20:55:21 +01:00
David Howells 102d841055 afs: Fix some checker issues
Remove an unused global variable and make another static as reported by
make C=1.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
2022-06-10 20:55:21 +01:00
Linus Torvalds ad6e076498 zonefs fixes for 5.19-rc2
* Fix handling of the explicit-open mount option, and in particular the
   conditions under which this option can be ignored.
 
 * Fix a problem with zonefs iomap_begin method, causing a hang in
   iomap_readahead() when a readahead request reaches the end of a file.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCYqMh/wAKCRDdoc3SxdoY
 dvd+AP4jNRFhAedXl0mIutoP4k0XwblSz9RwrXLOYzkOtgpXGQD+Lps42w6EQliE
 wWuuL4syVgKamolj0WGcPLarGZC7LQA=
 =neot
 -----END PGP SIGNATURE-----

Merge tag 'zonefs-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fixes from Damien Le Moal:

 - Fix handling of the explicit-open mount option, and in particular the
   conditions under which this option can be ignored.

 - Fix a problem with zonefs iomap_begin method, causing a hang in
   iomap_readahead() when a readahead request reaches the end of a file.

* tag 'zonefs-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
  zonefs: fix zonefs_iomap_begin() for reads
  zonefs: Do not ignore explicit_open with active zone limit
  zonefs: fix handling of explicit_open option on mount
2022-06-10 10:56:28 -07:00
David Howells 874c8ca1e6 netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context
While randstruct was satisfied with using an open-coded "void *" offset
cast for the netfs_i_context <-> inode casting, __builtin_object_size() as
used by FORTIFY_SOURCE was not as easily fooled.  This was causing the
following complaint[1] from gcc v12:

  In file included from include/linux/string.h:253,
                   from include/linux/ceph/ceph_debug.h:7,
                   from fs/ceph/inode.c:2:
  In function 'fortify_memset_chk',
      inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2,
      inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2:
  include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    242 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by embedding a struct inode into struct netfs_i_context (which
should perhaps be renamed to struct netfs_inode).  The struct inode
vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode
structs and vfs_inode is then simply changed to "netfs.inode" in those
filesystems.

Further, rename netfs_i_context to netfs_inode, get rid of the
netfs_inode() function that converted a netfs_i_context pointer to an
inode pointer (that can now be done with &ctx->inode) and rename the
netfs_i_context() function to netfs_inode() (which is now a wrapper
around container_of()).

Most of the changes were done with:

  perl -p -i -e 's/vfs_inode/netfs.inode/'g \
        `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]`

Kees suggested doing it with a pair structure[2] and a special
declarator to insert that into the network filesystem's inode
wrapper[3], but I think it's cleaner to embed it - and then it doesn't
matter if struct randomisation reorders things.

Dave Chinner suggested using a filesystem-specific VFS_I() function in
each filesystem to convert that filesystem's own inode wrapper struct
into the VFS inode struct[4].

Version #2:
 - Fix a couple of missed name changes due to a disabled cifs option.
 - Rename nfs_i_context to nfs_inode
 - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper
   structs.

[ This also undoes commit 507160f46c ("netfs: gcc-12: temporarily
  disable '-Wattribute-warning' for now") that is no longer needed ]

Fixes: bc899ee1c8 ("netfs: Add a netfs inode context")
Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
cc: Jonathan Corbet <corbet@lwn.net>
cc: Eric Van Hensbergen <ericvh@gmail.com>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Steve French <smfrench@gmail.com>
cc: William Kucharski <william.kucharski@oracle.com>
cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
cc: Dave Chinner <david@fromorbit.com>
cc: linux-doc@vger.kernel.org
cc: v9fs-developer@lists.sourceforge.net
cc: linux-afs@lists.infradead.org
cc: ceph-devel@vger.kernel.org
cc: linux-cifs@vger.kernel.org
cc: samba-technical@lists.samba.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-hardening@vger.kernel.org
Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1]
Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2]
Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3]
Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4]
Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09 13:55:00 -07:00
Linus Torvalds 3d9f55c57b \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmKiO9UACgkQnJ2qBz9k
 QNk9+Af/RjaJEozyj/He7nqj1xncN6bIJzeyOqQVJNkHBsKYt7oDFvSuYI1Kbzk+
 x7/x8dRtVR3kRZCO6VarETkzGp6Nw10RdzFKqT2FRmQ66wVZaXPQeqVZqwXSKdtR
 qgU892e9S2SqUH9EyUwk3D/HwLr1VNKKp6B0N+By7EwKmZdyTg5siFJ26+z+QpJQ
 wo84nN/m6GgHSm+c8kMFa+cs635tMY3+vP4nviUKyuDTxW3Yu6maIa5973WLiFqo
 EZSLtSfXYasjoOl5fN3AaO0dAl8fRJIh6wsgbeQI/NeUYMIqKWslW+5esq1SwreS
 r1+Xig8MmxDJ/1I3i/L/aDM7FipY9A==
 =kMe8
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull ext2, writeback, and quota fixes and cleanups from Jan Kara:
 "A fix for race in writeback code and two cleanups in quota and ext2"

* tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  quota: Prevent memory allocation recursion while holding dq_lock
  writeback: Fix inode->i_io_list not be protected by inode->i_lock error
  fs: Fix syntax errors in comments
2022-06-09 12:26:05 -07:00
Linus Torvalds 507160f46c netfs: gcc-12: temporarily disable '-Wattribute-warning' for now
This is a pure band-aid so that I can continue merging stuff from people
while some of the gcc-12 fallout gets sorted out.

In particular, gcc-12 is very unhappy about the kinds of pointer
arithmetic tricks that netfs does, and that makes the fortify checks
trigger in afs and ceph:

  In function ‘fortify_memset_chk’,
      inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2,
      inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2,
      inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2:
  include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning]
    258 |                         __write_overflow_field(p_size_field, size);
        |                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

and the reason is that netfs_i_context_init() is passed a 'struct inode'
pointer, and then it does

        struct netfs_i_context *ctx = netfs_i_context(inode);

        memset(ctx, 0, sizeof(*ctx));

where that netfs_i_context() function just does pointer arithmetic on
the inode pointer, knowing that the netfs_i_context is laid out
immediately after it in memory.

This is all truly disgusting, since the whole "netfs_i_context is laid
out immediately after it in memory" is not actually remotely true in
general, but is just made to be that way for afs and ceph.

See for example fs/cifs/cifsglob.h:

  struct cifsInodeInfo {
        struct {
                /* These must be contiguous */
                struct inode    vfs_inode;      /* the VFS's inode record */
                struct netfs_i_context netfs_ctx; /* Netfslib context */
        };
	[...]

and realize that this is all entirely wrong, and the pointer arithmetic
that netfs_i_context() is doing is also very very wrong and wouldn't
give the right answer if netfs_ctx had different alignment rules from a
'struct inode', for example).

Anyway, that's just a long-winded way to say "the gcc-12 warning is
actually quite reasonable, and our code happens to work but is pretty
disgusting".

This is getting fixed properly, but for now I made the mistake of
thinking "the week right after the merge window tends to be calm for me
as people take a breather" and I did a sustem upgrade.  And I got gcc-12
as a result, so to continue merging fixes from people and not have the
end result drown in warnings, I am fixing all these gcc-12 issues I hit.

Including with these kinds of temporary fixes.

Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/all/AEEBCF5D-8402-441D-940B-105AA718C71F@chromium.org/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-09 11:29:36 -07:00
Sungjong Seo 204e6ceaa1 exfat: use updated exfat_chain directly during renaming
In order for a file to access its own directory entry set,
exfat_inode_info(ei) has two copied values. One is ei->dir, which is
a snapshot of exfat_chain of the parent directory, and the other is
ei->entry, which is the offset of the start of the directory entry set
in the parent directory.

Since the parent directory can be updated after the snapshot point,
it should be used only for accessing one's own directory entry set.

However, as of now, during renaming, it could try to traverse or to
allocate clusters via snapshot values, it does not make sense.

This potential problem has been revealed when exfat_update_parent_info()
was removed by commit d8dad2588a ("exfat: fix referencing wrong parent
directory information after renaming"). However, I don't think it's good
idea to bring exfat_update_parent_info() back.

Instead, let's use the updated exfat_chain of parent directory diectly.

Fixes: d8dad2588a ("exfat: fix referencing wrong parent directory information after renaming")
Reported-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Tested-by: Wang Yugui <wangyugui@e16-tech.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2022-06-09 21:26:32 +09:00
Damien Le Moal c1c1204c0d zonefs: fix zonefs_iomap_begin() for reads
If a readahead is issued to a sequential zone file with an offset
exactly equal to the current file size, the iomap type is set to
IOMAP_UNWRITTEN, which will prevent an IO, but the iomap length is
calculated as 0. This causes a WARN_ON() in iomap_iter():

[17309.548939] WARNING: CPU: 3 PID: 2137 at fs/iomap/iter.c:34 iomap_iter+0x9cf/0xe80
[...]
[17309.650907] RIP: 0010:iomap_iter+0x9cf/0xe80
[...]
[17309.754560] Call Trace:
[17309.757078]  <TASK>
[17309.759240]  ? lock_is_held_type+0xd8/0x130
[17309.763531]  iomap_readahead+0x1a8/0x870
[17309.767550]  ? iomap_read_folio+0x4c0/0x4c0
[17309.771817]  ? lockdep_hardirqs_on_prepare+0x400/0x400
[17309.778848]  ? lock_release+0x370/0x750
[17309.784462]  ? folio_add_lru+0x217/0x3f0
[17309.790220]  ? reacquire_held_locks+0x4e0/0x4e0
[17309.796543]  read_pages+0x17d/0xb60
[17309.801854]  ? folio_add_lru+0x238/0x3f0
[17309.807573]  ? readahead_expand+0x5f0/0x5f0
[17309.813554]  ? policy_node+0xb5/0x140
[17309.819018]  page_cache_ra_unbounded+0x27d/0x450
[17309.825439]  filemap_get_pages+0x500/0x1450
[17309.831444]  ? filemap_add_folio+0x140/0x140
[17309.837519]  ? lock_is_held_type+0xd8/0x130
[17309.843509]  filemap_read+0x28c/0x9f0
[17309.848953]  ? zonefs_file_read_iter+0x1ea/0x4d0 [zonefs]
[17309.856162]  ? trace_contention_end+0xd6/0x130
[17309.862416]  ? __mutex_lock+0x221/0x1480
[17309.868151]  ? zonefs_file_read_iter+0x166/0x4d0 [zonefs]
[17309.875364]  ? filemap_get_pages+0x1450/0x1450
[17309.881647]  ? __mutex_unlock_slowpath+0x15e/0x620
[17309.888248]  ? wait_for_completion_io_timeout+0x20/0x20
[17309.895231]  ? lock_is_held_type+0xd8/0x130
[17309.901115]  ? lock_is_held_type+0xd8/0x130
[17309.906934]  zonefs_file_read_iter+0x356/0x4d0 [zonefs]
[17309.913750]  new_sync_read+0x2d8/0x520
[17309.919035]  ? __x64_sys_lseek+0x1d0/0x1d0

Furthermore, this causes iomap_readahead() to loop forever as
iomap_readahead_iter() always returns 0, making no progress.

Fix this by treating reads after the file size as access to holes,
setting the iomap type to IOMAP_HOLE, the iomap addr to IOMAP_NULL_ADDR
and using the length argument as is for the iomap length. To simplify
the code with this change, zonefs_iomap_begin() is split into the read
variant, zonefs_read_iomap_begin() and zonefs_read_iomap_ops, and the
write variant, zonefs_write_iomap_begin() and zonefs_write_iomap_ops.

Reported-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
Fixes: 8dcc1a9d90 ("fs: New zonefs file system")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Jorgen Hansen <Jorgen.Hansen@wdc.com>
2022-06-08 19:13:55 +09:00
Damien Le Moal 96eca145cb zonefs: Do not ignore explicit_open with active zone limit
A zoned device may have no limit on the number of open zones but may
have a limit on the number of active zones it can support. In such
case, the explicit_open mount option should not be ignored to ensure
that the open() system call activates the zone with an explicit zone
open command, thus guaranteeing that the zone can be written.

Enforce this by ignoring the explicit_open mount option only for
devices that have both the open and active zone limits equal to 0.

Fixes: 87c9ce3ffe ("zonefs: Add active seq file accounting")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-06-08 15:38:44 +09:00
Damien Le Moal a2a513be71 zonefs: fix handling of explicit_open option on mount
Ignoring the explicit_open mount option on mount for devices that do not
have a limit on the number of open zones must be done after the mount
options are parsed and set in s_mount_opts. Move the check to ignore
the explicit_open option after the call to zonefs_parse_options() in
zonefs_fill_super().

Fixes: b5c00e9757 ("zonefs: open/close zone on file open/close")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
2022-06-08 15:38:42 +09:00
David Sterba e3a4167c88 btrfs: add error messages to all unrecognized mount options
Almost none of the errors stemming from a valid mount option but wrong
value prints a descriptive message which would help to identify why
mount failed. Like in the linked report:

  $ uname -r
  v4.19
  $ mount -o compress=zstd /dev/sdb /mnt
  mount: /mnt: wrong fs type, bad option, bad superblock on
  /dev/sdb, missing codepage or helper program, or other error.
  $ dmesg
  ...
  BTRFS error (device sdb): open_ctree failed

Errors caused by memory allocation failures are left out as it's not a
user error so reporting that would be confusing.

Link: https://lore.kernel.org/linux-btrfs/9c3fec36-fc61-3a33-4977-a7e207c3fa4e@gmx.de/
CC: stable@vger.kernel.org # 4.9+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-07 17:29:50 +02:00
Shyam Prasad N 8ea21823aa cifs: return errors during session setup during reconnects
During reconnects, we check the return value from
cifs_negotiate_protocol, and have handlers for both success
and failures. But if that passes, and cifs_setup_session
returns any errors other than -EACCES, we do not handle
that. This fix adds a handler for that, so that we don't
go ahead and try a tree_connect on a failed session.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-06 18:23:38 -05:00
Trond Myklebust 880265c77a pNFS: Avoid a live lock condition in pnfs_update_layout()
If we're about to send the first layoutget for an empty layout, we want
to make sure that we drain out the existing pending layoutget calls
first. The reason is that these layouts may have been already implicitly
returned to the server by a recall to which the client gave a
NFS4ERR_NOMATCHING_LAYOUT response.

The problem is that wait_var_event_killable() could in principle see the
plh_outstanding count go back to '1' when the first process to wake up
starts sending a new layoutget. If it fails to get a layout, then this
loop can continue ad infinitum...

Fixes: 0b77f97a7e ("NFSv4/pnfs: Fix layoutget behaviour after invalidation")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-06-06 11:53:55 -04:00
Trond Myklebust fe44fb23d6 pNFS: Don't keep retrying if the server replied NFS4ERR_LAYOUTUNAVAILABLE
If the server tells us that a pNFS layout is not available for a
specific file, then we should not keep pounding it with further
layoutget requests.

Fixes: 183d9e7b11 ("pnfs: rework LAYOUTGET retry handling")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-06-06 11:53:54 -04:00
Qu Wenruo 0591f04036 btrfs: prevent remounting to v1 space cache for subpage mount
Upstream commit 9f73f1aef9 ("btrfs: force v2 space cache usage for
subpage mount") forces subpage mount to use v2 cache, to avoid
deprecated v1 cache which doesn't support subpage properly.

But there is a loophole that user can still remount to v1 cache.

The existing check will only give users a warning, but does not really
prevent to do the remount.

Although remounting to v1 will not cause any problems since the v1 cache
will always be marked invalid when mounted with a different page size,
it's still better to prevent v1 cache at all for subpage mounts.

Fixes: 9f73f1aef9 ("btrfs: force v2 space cache usage for subpage mount")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-06 16:18:59 +02:00
Filipe Manana 31e70e5278 btrfs: fix hang during unmount when block group reclaim task is running
When we start an unmount, at close_ctree(), if we have the reclaim task
running and in the middle of a data block group relocation, we can trigger
a deadlock when stopping an async reclaim task, producing a trace like the
following:

[629724.498185] task:kworker/u16:7   state:D stack:    0 pid:681170 ppid:     2 flags:0x00004000
[629724.499760] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[629724.501267] Call Trace:
[629724.501759]  <TASK>
[629724.502174]  __schedule+0x3cb/0xed0
[629724.502842]  schedule+0x4e/0xb0
[629724.503447]  btrfs_wait_on_delayed_iputs+0x7c/0xc0 [btrfs]
[629724.504534]  ? prepare_to_wait_exclusive+0xc0/0xc0
[629724.505442]  flush_space+0x423/0x630 [btrfs]
[629724.506296]  ? rcu_read_unlock_trace_special+0x20/0x50
[629724.507259]  ? lock_release+0x220/0x4a0
[629724.507932]  ? btrfs_get_alloc_profile+0xb3/0x290 [btrfs]
[629724.508940]  ? do_raw_spin_unlock+0x4b/0xa0
[629724.509688]  btrfs_async_reclaim_metadata_space+0x139/0x320 [btrfs]
[629724.510922]  process_one_work+0x252/0x5a0
[629724.511694]  ? process_one_work+0x5a0/0x5a0
[629724.512508]  worker_thread+0x52/0x3b0
[629724.513220]  ? process_one_work+0x5a0/0x5a0
[629724.514021]  kthread+0xf2/0x120
[629724.514627]  ? kthread_complete_and_exit+0x20/0x20
[629724.515526]  ret_from_fork+0x22/0x30
[629724.516236]  </TASK>
[629724.516694] task:umount          state:D stack:    0 pid:719055 ppid:695412 flags:0x00004000
[629724.518269] Call Trace:
[629724.518746]  <TASK>
[629724.519160]  __schedule+0x3cb/0xed0
[629724.519835]  schedule+0x4e/0xb0
[629724.520467]  schedule_timeout+0xed/0x130
[629724.521221]  ? lock_release+0x220/0x4a0
[629724.521946]  ? lock_acquired+0x19c/0x420
[629724.522662]  ? trace_hardirqs_on+0x1b/0xe0
[629724.523411]  __wait_for_common+0xaf/0x1f0
[629724.524189]  ? usleep_range_state+0xb0/0xb0
[629724.524997]  __flush_work+0x26d/0x530
[629724.525698]  ? flush_workqueue_prep_pwqs+0x140/0x140
[629724.526580]  ? lock_acquire+0x1a0/0x310
[629724.527324]  __cancel_work_timer+0x137/0x1c0
[629724.528190]  close_ctree+0xfd/0x531 [btrfs]
[629724.529000]  ? evict_inodes+0x166/0x1c0
[629724.529510]  generic_shutdown_super+0x74/0x120
[629724.530103]  kill_anon_super+0x14/0x30
[629724.530611]  btrfs_kill_super+0x12/0x20 [btrfs]
[629724.531246]  deactivate_locked_super+0x31/0xa0
[629724.531817]  cleanup_mnt+0x147/0x1c0
[629724.532319]  task_work_run+0x5c/0xa0
[629724.532984]  exit_to_user_mode_prepare+0x1a6/0x1b0
[629724.533598]  syscall_exit_to_user_mode+0x16/0x40
[629724.534200]  do_syscall_64+0x48/0x90
[629724.534667]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[629724.535318] RIP: 0033:0x7fa2b90437a7
[629724.535804] RSP: 002b:00007ffe0b7e4458 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[629724.536912] RAX: 0000000000000000 RBX: 00007fa2b9182264 RCX: 00007fa2b90437a7
[629724.538156] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000555d6cf20dd0
[629724.539053] RBP: 0000555d6cf20ba0 R08: 0000000000000000 R09: 00007ffe0b7e3200
[629724.539956] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[629724.540883] R13: 0000555d6cf20dd0 R14: 0000555d6cf20cb0 R15: 0000000000000000
[629724.541796]  </TASK>

This happens because:

1) Before entering close_ctree() we have the async block group reclaim
   task running and relocating a data block group;

2) There's an async metadata (or data) space reclaim task running;

3) We enter close_ctree() and park the cleaner kthread;

4) The async space reclaim task is at flush_space() and runs all the
   existing delayed iputs;

5) Before the async space reclaim task calls
   btrfs_wait_on_delayed_iputs(), the block group reclaim task which is
   doing the data block group relocation, creates a delayed iput at
   replace_file_extents() (called when COWing leaves that have file extent
   items pointing to relocated data extents, during the merging phase
   of relocation roots);

6) The async reclaim space reclaim task blocks at
   btrfs_wait_on_delayed_iputs(), since we have a new delayed iput;

7) The task at close_ctree() then calls cancel_work_sync() to stop the
   async space reclaim task, but it blocks since that task is waiting for
   the delayed iput to be run;

8) The delayed iput is never run because the cleaner kthread is parked,
   and no one else runs delayed iputs, resulting in a hang.

So fix this by stopping the async block group reclaim task before we
park the cleaner kthread.

Fixes: 18bb8bbf13 ("btrfs: zoned: automatically reclaim zones")
CC: stable@vger.kernel.org # 5.15+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-06-06 16:18:52 +02:00
Matthew Wilcox (Oracle) 537e11cdc7 quota: Prevent memory allocation recursion while holding dq_lock
As described in commit 02117b8ae9 ("f2fs: Set GF_NOFS in
read_cache_page_gfp while doing f2fs_quota_read"), we must not enter
filesystem reclaim while holding the dq_lock.  Prevent this more generally
by using memalloc_nofs_save() while holding the lock.

Link: https://lore.kernel.org/r/20220605143815.2330891-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-06-06 10:08:10 +02:00
Jchao Sun 10e1407310 writeback: Fix inode->i_io_list not be protected by inode->i_lock error
Commit b35250c081 ("writeback: Protect inode->i_io_list with
inode->i_lock") made inode->i_io_list not only protected by
wb->list_lock but also inode->i_lock, but inode_io_list_move_locked()
was missed. Add lock there and also update comment describing
things protected by inode->i_lock. This also fixes a race where
__mark_inode_dirty() could move inode under flush worker's hands
and thus sync(2) could miss writing some inodes.

Fixes: b35250c081 ("writeback: Protect inode->i_io_list with inode->i_lock")
Link: https://lore.kernel.org/r/20220524150540.12552-1-sunjunchao2870@gmail.com
CC: stable@vger.kernel.org
Signed-off-by: Jchao Sun <sunjunchao2870@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-06-06 09:54:30 +02:00
Xiang wangx 2aab03b867 fs: Fix syntax errors in comments
Delete the redundant word 'not'.

Link: https://lore.kernel.org/r/20220605125509.14837-1-wangxiang@cdjrlc.com
Signed-off-by: Xiang wangx <wangxiang@cdjrlc.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2022-06-06 09:53:03 +02:00
Paulo Alcantara c36ee7dab7 cifs: fix reconnect on smb3 mount types
cifs.ko defines two file system types: cifs & smb3, and
__cifs_get_super() was not including smb3 file system type when
looking up superblocks, therefore failing to reconnect tcons in
cifs_tree_connect().

Fix this by calling iterate_supers_type() on both file system types.

Link: https://lore.kernel.org/r/CAFrh3J9soC36+BVuwHB=g9z_KB5Og2+p2_W+BBoBOZveErz14w@mail.gmail.com
Cc: stable@vger.kernel.org
Tested-by: Satadru Pramanik <satadru@gmail.com>
Reported-by: Satadru Pramanik <satadru@gmail.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-06 01:04:12 -05:00
Linus Torvalds 6684cf4290 fix for breakage in #work.fd this window
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpz/IQAKCRBZ7Krx/gZQ
 666mAPwKOC/voemjl2m+RpSruxAbdlRvKei3IY8YxLfyv+rmUQD9HKLJJtQX2VRF
 QTFmQ3p7kx30ydwSbyY8Kpw3VMCDxgs=
 =1ZKm
 -----END PGP SIGNATURE-----

Merge tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file descriptor fix from Al Viro:
 "Fix for breakage in #work.fd this window"

* tag 'pull-work.fd-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix the breakage in close_fd_get_file() calling conventions change
2022-06-05 17:14:03 -07:00
Al Viro 40a1926022 fix the breakage in close_fd_get_file() calling conventions change
It used to grab an extra reference to struct file rather than
just transferring to caller the one it had removed from descriptor
table.  New variant doesn't, and callers need to be adjusted.

Reported-and-tested-by: syzbot+47dd250f527cb7bebf24@syzkaller.appspotmail.com
Fixes: 6319194ec5 ("Unify the primitives for file descriptor closing")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-06-05 15:03:03 -04:00
Linus Torvalds 952923ddc0 Several cleanups in fs/namei.c.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpvrrgAKCRBZ7Krx/gZQ
 6+eZAP9r0c8E1UnUxRI32kQrzndJO2z9mKq6rI9D7GgJe+MtogEAlfynzklqbHzo
 fFvczeD0tzLu4aDExtGn6GNGf5gSpw4=
 =1blQ
 -----END PGP SIGNATURE-----

Merge tag 'pull-18-rc1-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs pathname updates from Al Viro:
 "Several cleanups in fs/namei.c"

* tag 'pull-18-rc1-work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  namei: cleanup double word in comment
  get rid of dead code in legitimize_root()
  fs/namei.c:reserve_stack(): tidy up the call of try_to_unlazy()
2022-06-04 19:07:15 -07:00
Linus Torvalds cbd76edeab Cleanups (and one fix) around struct mount handling.
The fix is usermode_driver.c one - once you've done kern_mount(), you
 must kern_unmount(); simple mntput() will end up with a leak.  Several
 failure exits in there messed up that way...  In practice you won't
 hit those particular failure exits without fault injection, though.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpvrWQAKCRBZ7Krx/gZQ
 6z29AP9EZVSyIvnwXleehpa2mEZhsp+KAKgV/ENaKHMn7jiH0wD/bfgnhxIDNuc5
 108E2R5RWEYTynW5k7nnP5PsTsMq5Qc=
 =b3Wc
 -----END PGP SIGNATURE-----

Merge tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull mount handling updates from Al Viro:
 "Cleanups (and one fix) around struct mount handling.

  The fix is usermode_driver.c one - once you've done kern_mount(), you
  must kern_unmount(); simple mntput() will end up with a leak. Several
  failure exits in there messed up that way... In practice you won't hit
  those particular failure exits without fault injection, though"

* tag 'pull-18-rc1-work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  move mount-related externs from fs.h to mount.h
  blob_to_mnt(): kern_unmount() is needed to undo kern_mount()
  m->mnt_root->d_inode->i_sb is a weird way to spell m->mnt_sb...
  linux/mount.h: trim includes
  uninline may_mount() and don't opencode it in fspick(2)/fsopen(2)
2022-06-04 19:00:05 -07:00
Linus Torvalds dbe0ee4661 Descriptor handling cleanups
-----BEGIN PGP SIGNATURE-----
 
 iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCYpwEZAAKCRBZ7Krx/gZQ
 691uAP0QnwO0lOYOa41MfQ6QnzPbiYcffqtUuTJBWyfs8+WnugD2NNGQP7Zjtin9
 q0wcv2KA6yY7qgu7RCCbxU/7J0YdCg==
 =ywuh
 -----END PGP SIGNATURE-----

Merge tag 'pull-18-rc1-work.fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull file descriptor updates from Al Viro.

 - Descriptor handling cleanups

* tag 'pull-18-rc1-work.fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  Unify the primitives for file descriptor closing
  fs: remove fget_many and fput_many interface
  io_uring_enter(): don't leave f.flags uninitialized
2022-06-04 18:52:00 -07:00
Linus Torvalds d66016c5cd Nine cifs/smb3 client fixes. Includes DFS fixes, some cleanup of leagcy SMB1 code, duplicated message cleanup and a double free and deadlock fix
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmKb4dcACgkQiiy9cAdy
 T1G0HAwArA5ZwygidbWP+mZC+iuvFFDJczxKi+z3VteqfJCV1L7s4VQHVGMwjCbe
 duiIQ+wD5dUKwM13C5jVO4TeRixR+3eQXppVlq0fCmufyH+t05nZMXU+/vRgIbuH
 S3Tg8t6H7TJI+HbBYEH6FgnC2Gb18VAehckw119a6NYRR8czNRkEqcmyXPAhYaV7
 u6eRJIiJGnki85f1oGKT+LvAtmXWf9CKWkVy2KPzkWxMPm5GYDdL04wRHqSYpAAv
 oA+Z9t76tYhK8e4KBLhD16OzAgUMyYyHoJSZFZHH4jtTnGz4u1SRHxh0qQy/4WnJ
 3t8xSW2VsSil5N7jeXbF59GoZrbmspZ5Lu5kwaUiRjOGRKdByXU/8NFz4hS3C2Fb
 XPV5ZbgxLMHT857t0aoSeiiT0OXC86mLSDfdITF+ImftPjsmyfw5BL2AEf2x0oEO
 CiYIQyiZaGFmh8dlLhf9ysUKOETsBmX0stKs+uQ46SR78oXBDKg8vo/wnFhXItkv
 o9N5ucfM
 =Jhdy
 -----END PGP SIGNATURE-----

Merge tag '5.19-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client fixes from Steve French:
 "Nine cifs/smb3 client fixes.

  Includes DFS fixes, some cleanup of leagcy SMB1 code, duplicated
  message cleanup and a double free and deadlock fix"

* tag '5.19-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: fix uninitialized pointer in error case in dfs_cache_get_tgt_share
  cifs: skip trailing separators of prefix paths
  cifs: update internal module number
  cifs: version operations for smb20 unneeded when legacy support disabled
  cifs: do not build smb1ops if legacy support is disabled
  cifs: fix potential deadlock in direct reclaim
  cifs: when extending a file with falloc we should make files not-sparse
  cifs: remove repeated debug message on cifs_put_smb_ses()
  cifs: fix potential double free during failed mount
2022-06-04 17:42:33 -07:00
Steve French ee3c8019cc cifs: fix uninitialized pointer in error case in dfs_cache_get_tgt_share
Set default value of ppath to null.

Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-04 13:33:42 -05:00
Linus Torvalds 1f95267583 Ntfs3 for 5.19
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmKY2fUACgkQqbAzH4Mk
 B7ZUaw/8CIuns8LuGG4a14Q2eN/smIBdBhnTFAhk0pjZPYHaBUauor6bR1QxJwVV
 Vbr4QBmrPHp+MJEZOd5FDC1Fd79TCVhe5d9SzbVn3tG0DHfe8bp5YZfkrAOxGURb
 pG2GaQ5Pq4AiDt1d2nia8pwxt7sCVPx8v5Bvi25WPCdrrSmlxduJRBftZx4RNXLv
 p6iDfakGq9p1LxgJX+YkvJPqKMloY52DoB0po4JgHkQTME2tKnnzZucxRNNPs1Jd
 8hBfM9g33wmETvlNffHpoI9JuAAxdKCcuU0/n6e2R9WQZqAZrrAdb5Jj2GRzV0em
 F/bezX0qa09v+siMgjxfHVKreKn4pea98Gu6GWI683CN4lvilIvrIsqu7z5HrX4r
 gSrfm+Wf5MUXMXwaKU2e/1kpP5cmUEjo7eFQS0sJzi7cO0nNqUnYYP8LwMHZw1jJ
 USvRhlJdGJMzRBsTXHvlYl42IVNwPP+KPlQuwgitk6oIl81L95fKivom0A5lMaEs
 bJrNtutz2Y9Pd4ZffJtgzMtqGvqfvSOxco7AHQYoSq2jrm8b735x1s2xhzNcdhv3
 OZbTZeKfpMw8Vxnv43yJ8zpHlZ3EgD3+T0UHaBcJEXjCvLbD+ibLzfMaQa9mqC5t
 TE5SaEhFWf+BskVufhrgG4VfwvJGj84VhF5u9k4roJkw8+yJRdY=
 =ut2G
 -----END PGP SIGNATURE-----

Merge tag 'ntfs3_for_5.19' of https://github.com/Paragon-Software-Group/linux-ntfs3

Pull ntfs3 updates from Konstantin Komarov:

 - fix some memory leaks and panic

 - fixed xfstests (tested on x86_64): generic/092 generic/099
   generic/228 generic/240 generic/307 generic/444

 - fix some typos, dead code, etc

* tag 'ntfs3_for_5.19' of https://github.com/Paragon-Software-Group/linux-ntfs3:
  fs/ntfs3: provide block_invalidate_folio to fix memory leak
  fs/ntfs3: Fix invalid free in log_replay
  fs/ntfs3: Update valid size if -EIOCBQUEUED
  fs/ntfs3: Check new size for limits
  fs/ntfs3: Fix fiemap + fix shrink file size (to remove preallocated space)
  fs/ntfs3: In function ntfs_set_acl_ex do not change inode->i_mode if called from function ntfs_init_acl
  fs/ntfs3: Optimize locking in ntfs_save_wsl_perm
  fs/ntfs3: Update i_ctime when xattr is added
  fs/ntfs3: Restore ntfs_xattr_get_acl and ntfs_xattr_set_acl functions
  fs/ntfs3: Keep preallocated only if option prealloc enabled
  fs/ntfs3: Fix some memory leaks in an error handling path of 'log_replay()'
2022-06-03 16:57:16 -07:00
Linus Torvalds 1ec6574a3c This set of changes updates init and user mode helper tasks to be
ordinary user mode tasks.
 
 In commit 40966e316f ("kthread: Ensure struct kthread is present for
 all kthreads") caused init and the user mode helper threads that call
 kernel_execve to have struct kthread allocated for them.  This struct
 kthread going away during execve in turned made a use after free of
 struct kthread possible.
 
 The commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
 init and umh") is enough to fix the use after free and is simple enough
 to be backportable.
 
 The rest of the changes pass struct kernel_clone_args to clean things
 up and cause the code to make sense.
 
 In making init and the user mode helpers tasks purely user mode tasks
 I ran into two complications.  The function task_tick_numa was
 detecting tasks without an mm by testing for the presence of
 PF_KTHREAD.  The initramfs code in populate_initrd_image was using
 flush_delayed_fput to ensuere the closing of all it's file descriptors
 was complete, and flush_delayed_fput does not work in a userspace thread.
 
 I have looked and looked and more complications and in my code review
 I have not found any, and neither has anyone else with the code sitting
 in linux-next.
 
 Link: https://lkml.kernel.org/r/87mtfu4up3.fsf@email.froward.int.ebiederm.org
 
 Eric W. Biederman (8):
       kthread: Don't allocate kthread_struct for init and umh
       fork: Pass struct kernel_clone_args into copy_thread
       fork: Explicity test for idle tasks in copy_thread
       fork: Generalize PF_IO_WORKER handling
       init: Deal with the init process being a user mode process
       fork: Explicitly set PF_KTHREAD
       fork: Stop allowing kthreads to call execve
       sched: Update task_tick_numa to ignore tasks without an mm
 
  arch/alpha/kernel/process.c      | 13 ++++++------
  arch/arc/kernel/process.c        | 13 ++++++------
  arch/arm/kernel/process.c        | 12 ++++++-----
  arch/arm64/kernel/process.c      | 12 ++++++-----
  arch/csky/kernel/process.c       | 15 ++++++-------
  arch/h8300/kernel/process.c      | 10 ++++-----
  arch/hexagon/kernel/process.c    | 12 ++++++-----
  arch/ia64/kernel/process.c       | 15 +++++++------
  arch/m68k/kernel/process.c       | 12 ++++++-----
  arch/microblaze/kernel/process.c | 12 ++++++-----
  arch/mips/kernel/process.c       | 13 ++++++------
  arch/nios2/kernel/process.c      | 12 ++++++-----
  arch/openrisc/kernel/process.c   | 12 ++++++-----
  arch/parisc/kernel/process.c     | 18 +++++++++-------
  arch/powerpc/kernel/process.c    | 15 +++++++------
  arch/riscv/kernel/process.c      | 12 ++++++-----
  arch/s390/kernel/process.c       | 12 ++++++-----
  arch/sh/kernel/process_32.c      | 12 ++++++-----
  arch/sparc/kernel/process_32.c   | 12 ++++++-----
  arch/sparc/kernel/process_64.c   | 12 ++++++-----
  arch/um/kernel/process.c         | 15 +++++++------
  arch/x86/include/asm/fpu/sched.h |  2 +-
  arch/x86/include/asm/switch_to.h |  8 +++----
  arch/x86/kernel/fpu/core.c       |  4 ++--
  arch/x86/kernel/process.c        | 18 +++++++++-------
  arch/xtensa/kernel/process.c     | 17 ++++++++-------
  fs/exec.c                        |  8 ++++---
  include/linux/sched/task.h       |  8 +++++--
  init/initramfs.c                 |  2 ++
  init/main.c                      |  2 +-
  kernel/fork.c                    | 46 +++++++++++++++++++++++++++++++++-------
  kernel/sched/fair.c              |  2 +-
  kernel/umh.c                     |  6 +++---
  33 files changed, 234 insertions(+), 160 deletions(-)
 
 Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEgjlraLDcwBA2B+6cC/v6Eiajj0AFAmKaR/MACgkQC/v6Eiaj
 j0Aayg/7Bx66872d9c6igkJ+MPCTuh+v9QKCGwiYEmiU4Q5sVAFB0HPJO27qC14u
 630X0RFNZTkPzNNEJNIW4kw6Dj8s8YRKf+FgQAVt4SzdRwT7eIPDjk1nGraopPJ3
 O04pjvuTmUyidyViRyFcf2ptx/pnkrwP8jUSc+bGTgfASAKAgAokqKE5ecjewbBc
 Y/EAkQ6QW7KxPjeSmpAHwI+t3BpBev9WEC4PbhRhsBCQFO2+PJiklvqdhVNBnIjv
 qUezll/1xv9UYgniB15Q4Nb722SmnWSU3r8as1eFPugzTHizKhufrrpyP+KMK1A0
 tdtEJNs5t2DZF7ZbGTFSPqJWmyTYLrghZdO+lOmnaSjHxK4Nda1d4NzbefJ0u+FE
 tutewowvHtBX6AFIbx+H3O+DOJM2IgNMf+ReQDU/TyNyVf3wBrTbsr9cLxypIJIp
 zze8npoLMlB7B4yxVo5ES5e63EXfi3iHl0L3/1EhoGwriRz1kWgVLUX/VZOUpscL
 RkJHsW6bT8sqxPWAA5kyWjEN+wNR2PxbXi8OE4arT0uJrEBMUgDCzydzOv5tJB00
 mSQdytxH9LVdsmxBKAOBp5X6WOLGA4yb1cZ6E/mEhlqXMpBDF1DaMfwbWqxSYi4q
 sp5zU3SBAW0qceiZSsWZXInfbjrcQXNV/DkDRDO9OmzEZP4m1j0=
 =x6fy
 -----END PGP SIGNATURE-----

Merge tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull kthread updates from Eric Biederman:
 "This updates init and user mode helper tasks to be ordinary user mode
  tasks.

  Commit 40966e316f ("kthread: Ensure struct kthread is present for
  all kthreads") caused init and the user mode helper threads that call
  kernel_execve to have struct kthread allocated for them. This struct
  kthread going away during execve in turned made a use after free of
  struct kthread possible.

  Here, commit 343f4c49f2 ("kthread: Don't allocate kthread_struct for
  init and umh") is enough to fix the use after free and is simple
  enough to be backportable.

  The rest of the changes pass struct kernel_clone_args to clean things
  up and cause the code to make sense.

  In making init and the user mode helpers tasks purely user mode tasks
  I ran into two complications. The function task_tick_numa was
  detecting tasks without an mm by testing for the presence of
  PF_KTHREAD. The initramfs code in populate_initrd_image was using
  flush_delayed_fput to ensuere the closing of all it's file descriptors
  was complete, and flush_delayed_fput does not work in a userspace
  thread.

  I have looked and looked and more complications and in my code review
  I have not found any, and neither has anyone else with the code
  sitting in linux-next"

* tag 'kthread-cleanups-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  sched: Update task_tick_numa to ignore tasks without an mm
  fork: Stop allowing kthreads to call execve
  fork: Explicitly set PF_KTHREAD
  init: Deal with the init process being a user mode process
  fork: Generalize PF_IO_WORKER handling
  fork: Explicity test for idle tasks in copy_thread
  fork: Pass struct kernel_clone_args into copy_thread
  kthread: Don't allocate kthread_struct for init and umh
2022-06-03 16:03:05 -07:00
Linus Torvalds 744983d878 This pull request contains fixes for JFFS2, UBI and UBIFS
JFFS2:
         - Fixes for a memory leak
 
 UBI:
         - Fixes for fastmap (UAF, high CPU usage)
 
 UBIFS:
         - Minor cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCAA0FiEEdgfidid8lnn52cLTZvlZhesYu8EFAmKZ0F0WHHJpY2hhcmRA
 c2lnbWEtc3Rhci5hdAAKCRBm+VmF6xi7wVOsD/wMI9/RR3NQukIdj1besXnYWCGr
 44PLTZsX+fwn/ndj/IDqtyZiStILQOX3gNK9bZ540bVQGkMxgYerQUjZH2A1CaBd
 BwisUhU9duv6t6ObB8IudnqYB3nYTM7mRGQQOnkw0ddGdCoN7S8WB7ed7ce6bLF2
 X4cbzpIfBhuXghKzZ0d9k/uNEj4ZcTsTov+yiX2F9M0xBgfFIoIrP51mk+5l7PAj
 BmEqDFRfuTPm5u1WMUpThPX/RhSEu5PkZDLztSmD8WfiaF20KPFrWIMtKQmn/3vT
 IkRzJoHZSX8KmjoD1eBHssblJwcF6l5OGZNrS2NGqJMpVtRJ2uFWq3lx/WjgN88W
 /eqIoryYod8NTogTzl//y/mtDhSkaGK6LdDKjk5Z1gP1ferJNBw8k1+icR/YP4sj
 /dBXJTI7nIrb79EegQHIGNVAfE+Oi+rjEB/r8cqG9z5sJQvCK9p30+cVJSuK0/Na
 N6wX107HtgDh5kLaUKwTQjZBi/1TpV0dQCdAOkqgyn5YlzxXKFKISBrRIPuFtGXM
 a6fBwd+RSd9ccWGPE4R+6JQELFOYDpOj6cunwheb0B1YLaDGoV+mjAcO6uGdExXy
 +iErtxV1v+lXqslQKR09uO+N2gApQYF7B01TjDjyxWsKWABe55Tl3z8lGpPxGIcb
 FvkFMzuuIE4FlyUY6g==
 =QSbc
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

Pull JFFS2, UBI and UBIFS updates from Richard Weinberger:
 "JFFS2:
   - Fixes for a memory leak

  UBI:
   - Fixes for fastmap (UAF, high CPU usage)

  UBIFS:
   - Minor cleanups"

* tag 'for-linus-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
  ubi: ubi_create_volume: Fix use-after-free when volume creation failed
  ubi: fastmap: Check wl_pool for free peb before wear leveling
  ubi: fastmap: Fix high cpu usage of ubi_bgt by making sure wl_pool not empty
  ubifs: Use NULL instead of using plain integer as pointer
  ubifs: Simplify the return expression of run_gc()
  jffs2: fix memory leak in jffs2_do_fill_super
  jffs2: Use kzalloc instead of kmalloc/memset
2022-06-03 14:42:24 -07:00
Paulo Alcantara ef605e8682 cifs: skip trailing separators of prefix paths
During DFS failover, prefix paths may change, so make sure to not
leave trailing separators when parsing thew in
dfs_cache_get_tgt_share().  The separators of prefix paths are already
handled by build_path_from_dentry_optional_prefix().

Consider the following DFS link:

  //dom/dfs/link: [\srv1\share\dir1, \srv2\share\dir1]

Before commit:

  mount.cifs //dom/dfs/link
  tree connect to \\srv1\share; prefix_path=dir1
  disconnect srv1; failover to srv2
  tree connect to \\srv2\share; prefix_path=dir1\
  mv foo bar

  ...
  SMB2 430 Create Request File: dir1\\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 582 Create Response File: dir1\\foo;GetInfo Response;Close Response
  SMB2 430 Create Request File: dir1\\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND
  SMB2 462 Create Request File: dir1\\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\\bar;Close Request
  SMB2 478 Create Response File: dir1\\foo;SetInfo Response, Error: STATUS_OBJECT_NAME_INVALID;Close Response

After commit:

  mount.cifs //dom/dfs/link
  tree connect to \\srv1\share; prefix_path=dir1
  disconnect srv1; failover to srv2
  tree connect to \\srv2\share; prefix_path=dir1
  mv foo bar

  ...
  SMB2 430 Create Request File: dir1\foo;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 582 Create Response File: dir1\foo;GetInfo Response;Close Response
  SMB2 430 Create Request File: dir1\bar;GetInfo Request FILE_INFO/SMB2_FILE_ALL_INFO;Close Request
  SMB2 286 Create Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;GetInfo Response, Error: STATUS_OBJECT_NAME_NOT_FOUND;Close Response, Error: STATUS_OBJECT_NAME_NOT_FOUND
  SMB2 462 Create Request File: dir1\foo;SetInfo Request FILE_INFO/SMB2_FILE_RENAME_INFO NewName:dir1\bar;Close Request
  SMB2 478 Create Response File: dir1\foo;SetInfo Response;Close Response

Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-03 14:14:58 -05:00
Linus Torvalds 500a434fc5 Driver core changes for 5.19-rc1
Here is the set of driver core changes for 5.19-rc1.
 
 Note, I'm not really happy with this pull request as-is, see below for
 details, but overall this is all good for everything but a small set of
 systems, which we have a fix for already.
 
 Lots of tiny driver core changes and cleanups happened this cycle,
 but the two major things were:
 
 	- firmware_loader reorganization and additions including the
 	  ability to have XZ compressed firmware images and the ability
 	  for userspace to initiate the firmware load when it needs to,
 	  instead of being always initiated by the kernel. FPGA devices
 	  specifically want this ability to have their firmware changed
 	  over the lifetime of the system boot, and this allows them to
 	  work without having to come up with yet-another-custom-uapi
 	  interface for loading firmware for them.
 	- physical location support added to sysfs so that devices that
 	  know this information, can tell userspace where they are
 	  located in a common way.  Some ACPI devices already support
 	  this today, and more bus types should support this in the
 	  future.
 
 Smaller changes included:
 	- driver_override api cleanups and fixes
 	- error path cleanups and fixes
 	- get_abi script fixes
 	- deferred probe timeout changes.
 
 It's that last change that I'm the most worried about.  It has been
 reported to cause boot problems for a number of systems, and I have a
 tested patch series that resolves this issue.  But I didn't get it
 merged into my tree before 5.18-final came out, so it has not gotten any
 linux-next testing.
 
 I'll send the fixup patches (there are 2) as a follow-on series to this
 pull request if you want to take them directly, _OR_ I can just revert
 the probe timeout changes and they can wait for the next -rc1 merge
 cycle.  Given that the fixes are tested, and pretty simple, I'm leaning
 toward that choice.  Sorry this all came at the end of the merge window,
 I should have resolved this all 2 weeks ago, that's my fault as it was
 in the middle of some travel for me.
 
 All have been tested in linux-next for weeks, with no reported issues
 other than the above-mentioned boot time outs.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYpnv/A8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk/fACgvmenbo5HipqyHnOmTQlT50xQ9EYAn2eTq6ai
 GkjLXBGNWOPBa5cU52qf
 =yEi/
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the set of driver core changes for 5.19-rc1.

  Lots of tiny driver core changes and cleanups happened this cycle, but
  the two major things are:

   - firmware_loader reorganization and additions including the ability
     to have XZ compressed firmware images and the ability for userspace
     to initiate the firmware load when it needs to, instead of being
     always initiated by the kernel. FPGA devices specifically want this
     ability to have their firmware changed over the lifetime of the
     system boot, and this allows them to work without having to come up
     with yet-another-custom-uapi interface for loading firmware for
     them.

   - physical location support added to sysfs so that devices that know
     this information, can tell userspace where they are located in a
     common way. Some ACPI devices already support this today, and more
     bus types should support this in the future.

  Smaller changes include:

   - driver_override api cleanups and fixes

   - error path cleanups and fixes

   - get_abi script fixes

   - deferred probe timeout changes.

  It's that last change that I'm the most worried about. It has been
  reported to cause boot problems for a number of systems, and I have a
  tested patch series that resolves this issue. But I didn't get it
  merged into my tree before 5.18-final came out, so it has not gotten
  any linux-next testing.

  I'll send the fixup patches (there are 2) as a follow-on series to this
  pull request.

  All have been tested in linux-next for weeks, with no reported issues
  other than the above-mentioned boot time-outs"

* tag 'driver-core-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
  driver core: fix deadlock in __device_attach
  kernfs: Separate kernfs_pr_cont_buf and rename_lock.
  topology: Remove unused cpu_cluster_mask()
  driver core: Extend deferred probe timeout on driver registration
  MAINTAINERS: add Russ Weight as a firmware loader maintainer
  driver: base: fix UAF when driver_attach failed
  test_firmware: fix end of loop test in upload_read_show()
  driver core: location: Add "back" as a possible output for panel
  driver core: location: Free struct acpi_pld_info *pld
  driver core: Add "*" wildcard support to driver_async_probe cmdline param
  driver core: location: Check for allocations failure
  arch_topology: Trace the update thermal pressure
  kernfs: Rename kernfs_put_open_node to kernfs_unlink_open_file.
  export: fix string handling of namespace in EXPORT_SYMBOL_NS
  rpmsg: use local 'dev' variable
  rpmsg: Fix calling device_lock() on non-initialized device
  firmware_loader: describe 'module' parameter of firmware_upload_register()
  firmware_loader: Move definitions from sysfs_upload.h to sysfs.h
  firmware_loader: Fix configs for sysfs split
  selftests: firmware: Add firmware upload selftests
  ...
2022-06-03 11:48:47 -07:00
Linus Torvalds 04d93b2b8b SPDX changes for 5.19-rc1
Here are some SPDX (i.e. licensing) changes for 5.19-rc1
 
 The SPDX-labeling effort has started to pick up again, so here are some
 changes for various parts of the tree that are related to this effort.
 
 Included in here are:
 	- freevxfs license updates
 	- spihash.c license cleanups
 	- spdxcheck script updates to make things easier to work with
 	  going forward
 
 All of the license updates came from the original authors/copyright
 holders of the code involved.
 
 All of these have been in linux-next for weeks with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYpngmg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yl41wCgzt9M0/9hLjVV9UIW2l2phyJQZPQAoK7u0RUU
 tYRRT2gSUwAHlu3khZSS
 =fjdf
 -----END PGP SIGNATURE-----

Merge tag 'spdx-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx

Pull SPDX updates from Greg KH:
 "Here are some SPDX license marker changes.

  The SPDX-labeling effort has started to pick up again, so here are
  some changes for various parts of the tree that are related to this
  effort.

  Included in here are:

   - freevxfs license updates

   - spihash.c license cleanups

   - spdxcheck script updates to make things easier to work with going
     forward

  All of the license updates came from the original authors/copyright
  holders of the code involved.

  All of these have been in linux-next for weeks with no reported
  issues"

* tag 'spdx-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/spdx:
  siphash: add SPDX tags as sole licensing authority
  scripts/spdxcheck: Exclude top-level README
  scripts/spdxcheck: Exclude MAINTAINERS/CREDITS
  scripts/spdxcheck: Exclude config directories
  scripts/spdxcheck: Put excluded files and directories into a separate file
  scripts/spdxcheck: Add option to display files without SPDX
  scripts/spdxcheck: Add [sub]directory statistics
  scripts/spdxcheck: Add directory statistics
  scripts/spdxcheck: Add percentage to statistics
  freevxfs: relicense to GPLv2 only
2022-06-03 10:34:34 -07:00
Linus Torvalds 5ac8bdb9ad io_uring-5.19-2022-06-02
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKZmm8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpka7EAC8aWPkU+s5qiRVD3pJRjQQb0kAIUHXwlgv
 Y4c7CdBqjIYcRbMkxZg5lJaBHr8cYh67X0RSrkxcgO4uEtF3DTMZcrlG7ZaV8jnB
 e44zjBNsWuI6//ef6vlACuTLqsb/ZRTxzDB2haPlqAZ/uMjK5TdUIQfFc83IDLAE
 UxrUWE25R49hvg84L3Apbn79kLsLpqbv2vANuctDhOE0bH10S0SS987PVTg4TUA+
 tkIzTyqgB9vWFJwsCkARp13uGreW+XroHwSh7KwKXJR55lu2f1vD4Spg0UvgigQQ
 I6vSphlvd4GL7oqQM0pHUyuraYOQ/WChXcUN3Jitgv92S1W0+s73hp/RgBFZBRZM
 E1m3Qu465QlNziwfRxV5gpPC4TpVH7CeXE8RpaH76KhACsQSzYtDbcl+gYaZJnOz
 6pp0kAbJ8Wrfn4P0bYwpCz/aPi1+P2cKNhzeUIF/wqz3yt8CsQkbTyrQ+TLQ4XtX
 VnMF10opDl6yt2ELP9KPt3vjJnzIbxEmlqdg1yj4rB5FWWj8CkBdoaeqFdnqcL2r
 AuLjB/yjtzxJTAOMZYV/gCnArRiGgGb/7JTbyymW+iXNEdnYhmG225HfEG9c7ytj
 XHsu1vdNonNsk+cXqrGRZ+AUm8eHDX4hZX4CG06UXUuPEBoOJ5qfEZTiXLIwOnGJ
 Q0IGHDGgaw==
 =OkKj
 -----END PGP SIGNATURE-----

Merge tag 'io_uring-5.19-2022-06-02' of git://git.kernel.dk/linux-block

Pull more io_uring updates from Jens Axboe:

 - A small series with some prep patches for the upcoming 5.20 split of
   the io_uring.c file. No functional changes here, just minor bits that
   are nice to get out of the way now (me)

 - Fix for a memory leak in high numbered provided buffer groups,
   introduced in the merge window (me)

 - Wire up the new socket opcode for allocated direct descriptors,
   making it consistent with the other opcodes that can instantiate a
   descriptor (me)

 - Fix for the inflight tracking, should go into 5.18-stable as well
   (me)

 - Fix for a deadlock for io-wq offloaded file slot allocations (Pavel)

 - Direct descriptor failure fput leak fix (Xiaoguang)

 - Fix for the direct descriptor allocation hinting in case of
   unsuccessful install (Xiaoguang)

* tag 'io_uring-5.19-2022-06-02' of git://git.kernel.dk/linux-block:
  io_uring: reinstate the inflight tracking
  io_uring: fix deadlock on iowq file slot alloc
  io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots
  io_uring: defer alloc_hint update to io_file_bitmap_set()
  io_uring: ensure fput() called correspondingly when direct install fails
  io_uring: wire up allocated direct descriptors for socket
  io_uring: fix a memory leak of buffer group list on exit
  io_uring: move shutdown under the general net section
  io_uring: unify calling convention for async prep handling
  io_uring: add io_op_defs 'def' pointer in req init and issue
  io_uring: make prep and issue side of req handlers named consistently
  io_uring: make timeout prep handlers consistent with other prep handlers
2022-06-03 10:10:38 -07:00
Chuck Lever b6c71c66b0 NFSD: Fix potential use-after-free in nfsd_file_put()
nfsd_file_put_noref() can free @nf, so don't dereference @nf
immediately upon return from nfsd_file_put_noref().

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 999397926a ("nfsd: Clean up nfsd_file_put()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-06-02 13:05:58 -04:00
Linus Torvalds 17d8e3d90b A big pile of assorted fixes and improvements for the filesystem with
nothing in particular standing out, except perhaps that the fact that
 the MDS never really maintained atime was made official and thus it's
 no longer updated on the client either.
 
 We also have a MAINTAINERS update: Jeff is transitioning his filesystem
 maintainership duties to Xiubo.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmKYs1wTHGlkcnlvbW92
 QGdtYWlsLmNvbQAKCRBKf944AhHzi+PvCACIj47W4FapO672xcIkQ4920ZT1Jw/o
 2BfKXUtNyVLpGgBlweJWSTd1tfXp0tl9MFg00t/zbVarHH0SGAgF1z6e/tM7rjA/
 vyCkFQXJDuwB0kCbCtZ9xt5XIQkkvPPJOmyLSKYl7RqImch7pTRd5IwxgKGWqXDx
 FraVXqFqvr8L+szV/JCopdxdMVTFixWRD48z5pPlOReaOXiGjtTMoFIBIPp7GqVL
 UB7wyOtDmyzcGnUsRNqMQFrkUBsBW1IEDKf/yVtQNDjUxmr3uXm8vugeISpMOGBO
 cCkZACDeO0lpgHrXSo4UCf46bg3/HujxZu0nTc9HqPDiFdOmKmf58N4n
 =MAi2
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client

Pull ceph updates from Ilya Dryomov:
 "A big pile of assorted fixes and improvements for the filesystem with
  nothing in particular standing out, except perhaps that the fact that
  the MDS never really maintained atime was made official and thus it's
  no longer updated on the client either.

  We also have a MAINTAINERS update: Jeff is transitioning his
  filesystem maintainership duties to Xiubo"

* tag 'ceph-for-5.19-rc1' of https://github.com/ceph/ceph-client: (23 commits)
  MAINTAINERS: move myself from ceph "Maintainer" to "Reviewer"
  ceph: fix decoding of client session messages flags
  ceph: switch TASK_INTERRUPTIBLE to TASK_KILLABLE
  ceph: remove redundant variable ino
  ceph: try to queue a writeback if revoking fails
  ceph: fix statfs for subdir mounts
  ceph: fix possible deadlock when holding Fwb to get inline_data
  ceph: redirty the page for writepage on failure
  ceph: try to choose the auth MDS if possible for getattr
  ceph: disable updating the atime since cephfs won't maintain it
  ceph: flush the mdlog for filesystem sync
  ceph: rename unsafe_request_wait()
  libceph: use swap() macro instead of taking tmp variable
  ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check
  ceph: no need to invalidate the fscache twice
  ceph: replace usage of found with dedicated list iterator variable
  ceph: use dedicated list iterator variable
  ceph: update the dlease for the hashed dentry when removing
  ceph: stop retrying the request when exceeding 256 times
  ceph: stop forwarding the request when exceeding 256 times
  ...
2022-06-02 08:59:39 -07:00
Jens Axboe 9cae36a094 io_uring: reinstate the inflight tracking
After some debugging, it was realized that we really do still need the
old inflight tracking for any file type that has io_uring_fops assigned.
If we don't, then trivial circular references will mean that we never get
the ctx cleaned up and hence it'll leak.

Just bring back the inflight tracking, which then also means we can
eliminate the conditional dropping of the file when task_work is queued.

Fixes: d5361233e9 ("io_uring: drop the old style inflight file tracking")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-01 23:57:02 -06:00
Steve French 096c956b0d cifs: update internal module number
To 2.37

Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-01 23:23:09 -05:00
Steve French 7ef93ffccd cifs: version operations for smb20 unneeded when legacy support disabled
We should not be including unused smb20 specific code when legacy
support is disabled (CONFIG_CIFS_ALLOW_INSECURE_LEGACY turned
off).  For example smb2_operations and smb2_values aren't used
in that case.  Over time we can move more and more SMB1/CIFS and SMB2.0
code into the insecure legacy ifdefs

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-01 22:30:36 -05:00
Steve French 387ba9bf4c cifs: do not build smb1ops if legacy support is disabled
We should not be including unused SMB1/CIFS functions when legacy
support is disabled (CONFIG_CIFS_ALLOW_INSECURE_LEGACY turned
off), but especially obvious is not needing to build smb1ops.c
at all when legacy support is disabled. Over time we can move
more SMB1/CIFS and SMB2.0 legacy functions into ifdefs but this
is a good start (and shrinks the module size a few percent).

Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-01 21:49:27 -05:00
Linus Torvalds 0e5ab8dd87 xfs: Changes for 5.19-rc1 [2nd set]
This update includes:
 - fix refcount leak in xfs_ifree()
 - fix xfs_buf_cancel structure leaks in log recovery
 - fix dquot leak after failed quota check
 - fix a couple of problematic ASSERTS
 - fix small aim7 perf regression in from new btree sibling
   validation
 - clean up log incompat feature marking for new logged attribute
   feature
 - disallow logged attributes on legacy V4 filesystem formats.
 - fix da state leak when freeing attr intents
 - improve validation of the attr log items in recovery
 - use slab caches for commonly used attr structures
 - fix leaks of attr name/value buffer and reduce copying overhead
   during intent logging
 - remove some dead debug code from log recovery
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmKX4ZUUHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h06gQ//X9786aR6rfeMprvrWLqY0Ui6mGz4
 qI7s1BhsEyh6VMMzjVa0AzjX7R565ISTr4SdxLNewdPPAvro+avd2K4t+FdfFTG0
 9cA4kgC5MoURljHZmflYB8EKGsLXQ2fuzDmih6Ozu4pmKhKc5QU3XpsLn2HzLded
 KrNc08GX2JKvBxjdImk0pTxUq2xZ5CPWvpjdrfxnN2bNPHdJJtqBh/lhX1r73bqA
 Tz0RLwUqbL7fUZfIeslDlu2rU/MlZDXhT7C81y6tnyg7ObNN35NXuZX/UfQKFIWR
 pXUiPZTurso9Z7g7leEJ2Uco7Aeivs36mqes60Mv4YvN5ilv/Ja07kFZlfdaYkhJ
 YYSeIod1QLH3aOJOImPjYpOFOjyHrXmdG5KS5iLqADokywCPfgDMxCVWKeKxtLCC
 /1jBEQnKDWdZtAHup+vQ4PC1YP0rsLhXfNQNjYau8pwhEaN8nl2MOWMmQOLMyoES
 VAsBV9zrCa60sPT5IdYgnkRG3C+QV7nwLoLluguS+XvWtBgB0zxqjSZG5jFYYgCr
 v8VfW5esnvs+hF8YD3RmWpKxnoTuCXaftbc7ZdxneKZJyDPzWqr81zySCeBVCbt/
 wWrkl5E3Mdhq+LHDcbnrRZ63W377aRiNAh5D+aIeJUm0HZoEP+VLqBRVnWOuv/LC
 AfIuZcQi24PIZPw=
 =OLD4
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.19-for-linus-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull more xfs updates from Dave Chinner:
 "This update is largely bug fixes and cleanups for all the code merged
  in the first pull request. The majority of them are to the new logged
  attribute code, but there are also a couple of fixes for other log
  recovery and memory leaks that have recently been found.

  Summary:

   - fix refcount leak in xfs_ifree()

   - fix xfs_buf_cancel structure leaks in log recovery

   - fix dquot leak after failed quota check

   - fix a couple of problematic ASSERTS

   - fix small aim7 perf regression in from new btree sibling validation

   - clean up log incompat feature marking for new logged attribute
     feature

   - disallow logged attributes on legacy V4 filesystem formats.

   - fix da state leak when freeing attr intents

   - improve validation of the attr log items in recovery

   - use slab caches for commonly used attr structures

   - fix leaks of attr name/value buffer and reduce copying overhead
     during intent logging

   - remove some dead debug code from log recovery"

* tag 'xfs-5.19-for-linus-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (33 commits)
  xfs: fix xfs_ifree() error handling to not leak perag ref
  xfs: move xfs_attr_use_log_assist usage out of libxfs
  xfs: move xfs_attr_use_log_assist out of xfs_log.c
  xfs: warn about LARP once per mount
  xfs: implement per-mount warnings for scrub and shrink usage
  xfs: don't log every time we clear the log incompat flags
  xfs: convert buf_cancel_table allocation to kmalloc_array
  xfs: don't leak xfs_buf_cancel structures when recovery fails
  xfs: refactor buffer cancellation table allocation
  xfs: don't leak btree cursor when insrec fails after a split
  xfs: purge dquots after inode walk fails during quotacheck
  xfs: assert in xfs_btree_del_cursor should take into account error
  xfs: don't assert fail on perag references on teardown
  xfs: avoid unnecessary runtime sibling pointer endian conversions
  xfs: share xattr name and value buffers when logging xattr updates
  xfs: do not use logged xattr updates on V4 filesystems
  xfs: Remove duplicate include
  xfs: reduce IOCB_NOWAIT judgment for retry exclusive unaligned DIO
  xfs: Remove dead code
  xfs: fix typo in comment
  ...
2022-06-01 17:23:53 -07:00
Linus Torvalds 8171acb8bc Changes since last update:
- Leave compressed inodes unsupported in fscache mode for now;
 
  - Avoid crash when using tracepoint cachefiles_prep_read;
 
  - Fix `backmost' behavior due to a recent cleanup;
 
  - Update documentation for better description of recent new features;
 
  - Several decompression cleanups w/o logical change.
 -----BEGIN PGP SIGNATURE-----
 
 iIcEABYIAC8WIQThPAmQN9sSA0DVxtI5NzHcH7XmBAUCYpeFXxEceGlhbmdAa2Vy
 bmVsLm9yZwAKCRA5NzHcH7XmBC9eAQC8YSePEG+YCGbmOCGadSuBsgU+OXzKGpCV
 KxPyy3SmPQEAyNCDk11HoaYDRywS8TbMPntlyRfXvtEGSxbRe+5d1Qc=
 =4RnO
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull more erofs updates from Gao Xiang:
 "This is a follow-up to the main updates, including some fixes of
  fscache mode related to compressed inodes and a cachefiles tracepoint.
  There is also a patch to fix an unexpected decompression strategy
  change due to a cleanup in the past. All the fixes are quite small.

  Apart from these, documentation is also updated for a better
  description of recent new features.

  In addition, this has some trivial cleanups without actual code logic
  changes, so I could have a more recent codebase to work on folios and
  avoiding the PG_error page flag for the next cycle.

  Summary:

   - Leave compressed inodes unsupported in fscache mode for now

   - Avoid crash when using tracepoint cachefiles_prep_read

   - Fix `backmost' behavior due to a recent cleanup

   - Update documentation for better description of recent new features

   - Several decompression cleanups w/o logical change"

* tag 'erofs-for-5.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix 'backmost' member of z_erofs_decompress_frontend
  erofs: simplify z_erofs_pcluster_readmore()
  erofs: get rid of label `restart_now'
  erofs: get rid of `struct z_erofs_collection'
  erofs: update documentation
  erofs: fix crash when enable tracepoint cachefiles_prep_read
  erofs: leave compressed inodes unsupported in fscache mode for now
2022-06-01 11:54:29 -07:00
Linus Torvalds e5b0208713 fourteen smb3 server fixes, including some important RDMA (smbdirect) and crediting ones
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmKW9QcACgkQiiy9cAdy
 T1HVFgv/e0eeV061PXnxXRrKcL65T3iJhkrmmobTksjrXMAku6KtTG7AwD+CbXkR
 TlYxsCDQhaip+iaMX0cm9DUdxjiuETCWUWho5HTvjh4VcXXbi2NBm9b9dVk5HFTD
 gxzVCPkHrkYidAhBhXNfXmppJudiq2rj1o3ksmL9MMlI+iw5Q8WSuPg89E8lKEB4
 WPvNDhuu9/volCk9p3QkUnh2LW9z+xjTvrYlX5kGFeZYTgwWEO1mym2FF+zgt7Bd
 wjlP9ip4HfzBmRr8DGzO8E4wIgUmmMIj5yMqn0f5ArisMj3pZYMSPHywO4EfktIw
 mhWJDsPWyuh78Cx7JBFGZoxCSAiHHvxfOvHeL9T0k+d1dNudYwFjMUmO0SEkpd3W
 u40lwU4nnyBn5/oHz7JN+D87zOe4AaHDbemLjQLXFPqOmmR2McRRTKYHNJF8zJby
 VKz/5WbnHZi6L0Hl/ZEm5ZqtEcWQQkOuVXlhw7q1KlI1glHzqBnJfeSVDZVmnj30
 ykFgfDOl
 =KsXc
 -----END PGP SIGNATURE-----

Merge tag '5.19-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull ksmbd server updates from Steve French:

 - rdma (smbdirect) fixes, cleanup and optimizations

 - crediting (flow control) fix for mounts from Windows client

 - ACL fix

 - Windows client query dir fix

 - write validation fix

 - cleanups

* tag '5.19-rc-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: smbd: relax the count of sges required
  ksmbd: fix outstanding credits related bugs
  ksmbd: smbd: fix connection dropped issue
  ksmbd: Fix some kernel-doc comments
  ksmbd: fix wrong smbd max read/write size check
  ksmbd: add smbd max io size parameter
  ksmbd: handle smb2 query dir request for OutputBufferLength that is too small
  ksmbd: smbd: handle multiple Buffer descriptors
  ksmbd: smbd: change the return value of get_sg_list
  ksmbd: smbd: simplify tracking pending packets
  ksmbd: smbd: introduce read/write credits for RDMA read/write
  ksmbd: smbd: change prototypes of RDMA read/write related functions
  ksmbd: validate length in smb2_write()
  ksmbd: fix reference count leak in smb_check_perm_dacl()
2022-06-01 11:17:24 -07:00
David Howells 17eabd4256 afs: Fix infinite loop found by xfstest generic/676
In AFS, a directory is handled as a file that the client downloads and
parses locally for the purposes of performing lookup and getdents
operations.  The in-kernel afs filesystem has a number of functions that
do this.

A directory file is arranged as a series of 2K blocks divided into
32-byte slots, where a directory entry occupies one or more slots, plus
each block starts with one or more metadata blocks.

When parsing a block, if the last slots are occupied by a dirent that
occupies more than a single slot and the file position points at a slot
that's not the initial one, the logic in afs_dir_iterate_block() that
skips over it won't advance the file pointer to the end of it.  This
will cause an infinite loop in getdents() as it will keep retrying that
block and failing to advance beyond the final entry.

Fix this by advancing the file pointer if the next entry will be beyond
it when we skip a block.

This was found by the generic/676 xfstest but can also be triggered with
something like:

	~/xfstests-dev/src/t_readdir_3 /xfstest.test/z 4000 1

Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: http://lore.kernel.org/r/165391973497.110268.2939296942213894166.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-06-01 11:11:51 -07:00
Pavel Begunkov 61c1b44a21 io_uring: fix deadlock on iowq file slot alloc
io_fixed_fd_install() can grab uring_lock in the slot allocation path
when called from io-wq, and then call into io_install_fixed_file(),
which will lock it again. Pull all locking out of
io_install_fixed_file() into io_fixed_fd_install().

Fixes: 1339f24b33 ("io_uring: allow allocated fixed files for openat/openat2")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/64116172a9d0b85b85300346bb280f3657aafc26.1654087283.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-06-01 10:12:41 -06:00
Mikulas Patocka 724bbe49c5
fs/ntfs3: provide block_invalidate_folio to fix memory leak
The ntfs3 filesystem lacks the 'invalidate_folio' method and it causes
memory leak. If you write to the filesystem and then unmount it, the
cached written data are not freed and they are permanently leaked.
Fixes: 7ba13abbd3 ("fs: Turn block_invalidatepage into block_invalidate_folio")

Reported-by: José Luis Lara Carrascal <manualinux@yahoo.es>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: stable@vger.kernel.org  # v5.18
2022-06-01 13:10:19 +03:00
Vincent Whitchurch cc391b694f cifs: fix potential deadlock in direct reclaim
The srv_mutex is used during writeback so cifs should ensure that
allocations done when that mutex is held are done with GFP_NOFS, to
avoid having direct reclaim ending up waiting for the same mutex and
causing a deadlock.  This is detected by lockdep with the splat below:

 ======================================================
 WARNING: possible circular locking dependency detected
 5.18.0 #70 Not tainted
 ------------------------------------------------------
 kswapd0/49 is trying to acquire lock:
 ffff8880195782e0 (&tcp_ses->srv_mutex){+.+.}-{3:3}, at: compound_send_recv

 but task is already holding lock:
 ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (fs_reclaim){+.+.}-{0:0}:
        fs_reclaim_acquire
        kmem_cache_alloc_trace
        __request_module
        crypto_alg_mod_lookup
        crypto_alloc_tfm_node
        crypto_alloc_shash
        cifs_alloc_hash
        smb311_crypto_shash_allocate
        smb311_update_preauth_hash
        compound_send_recv
        cifs_send_recv
        SMB2_negotiate
        smb2_negotiate
        cifs_negotiate_protocol
        cifs_get_smb_ses
        cifs_mount
        cifs_smb3_do_mount
        smb3_get_tree
        vfs_get_tree
        path_mount
        __x64_sys_mount
        do_syscall_64
        entry_SYSCALL_64_after_hwframe

 -> #0 (&tcp_ses->srv_mutex){+.+.}-{3:3}:
        __lock_acquire
        lock_acquire
        __mutex_lock
        mutex_lock_nested
        compound_send_recv
        cifs_send_recv
        SMB2_write
        smb2_sync_write
        cifs_write
        cifs_writepage_locked
        cifs_writepage
        shrink_page_list
        shrink_lruvec
        shrink_node
        balance_pgdat
        kswapd
        kthread
        ret_from_fork

 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(fs_reclaim);
                                lock(&tcp_ses->srv_mutex);
                                lock(fs_reclaim);
   lock(&tcp_ses->srv_mutex);

  *** DEADLOCK ***

 1 lock held by kswapd0/49:
  #0: ffffffffa98e66c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat

 stack backtrace:
 CPU: 2 PID: 49 Comm: kswapd0 Not tainted 5.18.0 #70
 Call Trace:
  <TASK>
  dump_stack_lvl
  dump_stack
  print_circular_bug.cold
  check_noncircular
  __lock_acquire
  lock_acquire
  __mutex_lock
  mutex_lock_nested
  compound_send_recv
  cifs_send_recv
  SMB2_write
  smb2_sync_write
  cifs_write
  cifs_writepage_locked
  cifs_writepage
  shrink_page_list
  shrink_lruvec
  shrink_node
  balance_pgdat
  kswapd
  kthread
  ret_from_fork
  </TASK>

Fix this by using the memalloc_nofs_save/restore APIs around the places
where the srv_mutex is held.  Do this in a wrapper function for the
lock/unlock of the srv_mutex, and rename the srv_mutex to avoid missing
call sites in the conversion.

Note that there is another lockdep warning involving internal crypto
locks, which was masked by this problem and is visible after this fix,
see the discussion in this thread:

 https://lore.kernel.org/all/20220523123755.GA13668@axis.com/

Link: https://lore.kernel.org/r/CANT5p=rqcYfYMVHirqvdnnca4Mo+JQSw5Qu12v=kPfpk5yhhmg@mail.gmail.com/
Reported-by: Shyam Prasad N <nspmangalore@gmail.com>
Suggested-by: Lars Persson <larper@axis.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-06-01 00:03:18 -05:00
Linus Torvalds 700170bf6b NFS Client Updates for Linux 5.18
- New Features:
   - Add support for 'dacl' and 'sacl' attributes
 
 - Bugfixes and Cleanups:
   - Fixes for reporting mapping errors
   - Fixes for memory allocation errors
   - Improve warning message when locks are lost
   - Update documentation for the nfs4_unique_id parameter
   - Add an explanation of NFSv4 client identifiers
   - Ensure the i_size attribute is written to the fscache storage
   - Fix freeing uninitialized nfs4_labels
   - Better handling when xprtrdma bc_serv is NULL
   - Marke qualified async operations as MOVEABLE tasks
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmKWhFQACgkQ18tUv7Cl
 QOszjRAAllmtKLbzOkwQwcT3e5ljh9NEJ8NL+Nv1SXjozpFY+1fuXc0ivT4rniU6
 68ZHz+faK2UtLytwOO94M0jo2RCAlYS5rfnts89CpdfP3bqmGPAj0Ytw/c/vg+Qf
 4eQbAzz++T35DgU7cdeKKZKg9Wtwbq7g0kYv1W8QCiCbxakSjnc/V9Ll5XhS/CAC
 1WqKD90TRKUkX0Y1NNsNdXB1dJn/6QAq9B6JTjan+2Rhn7/NCTU8p98mEZGcVD7r
 cPHyXTqkPF4IH7lgjEMIRf6eXEzDDZNIs98QLdHJ2Gk0LxW7p7IL7VW8TKzYunvl
 coA1bZfYhUZBUJ+eDrrKZ5hHMSn/+eNR5iiIcfqtyU8o3J0NXAXGlLh/iJSGsxIH
 PjyjWSfpCgoZVPc4dG3lxR9Iu7UZeAuuB2ZoiNakUkd+UNKK5U5PpaPnYT6adaIp
 TegivZclCmgyLQiAdPRifDzhaL5J2pp6kVb5iMY6oX+ObyclW/UcqzKMqIKSt3R8
 6+JAmZ6633ojS4r3xFsw/dlEUWuuVq7kYwXK209LqiBn5vvjWNa/WgH4MaSfnJe9
 rlw+fs8Aky0w59IhzRJMMVCJ/Q2EYDKmtQLQgYVw80RBFiFgBpMW0wDqMGiddTcu
 1IZ2c5+t1GxfASpyu8miexQjRJW6A2MTp0gfHGiHarxdCpaAycA=
 =0ccI
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add support for 'dacl' and 'sacl' attributes

  Bugfixes and Cleanups:
   - Fixes for reporting mapping errors
   - Fixes for memory allocation errors
   - Improve warning message when locks are lost
   - Update documentation for the nfs4_unique_id parameter
   - Add an explanation of NFSv4 client identifiers
   - Ensure the i_size attribute is written to the fscache storage
   - Fix freeing uninitialized nfs4_labels
   - Better handling when xprtrdma bc_serv is NULL
   - Mark qualified async operations as MOVEABLE tasks"

* tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4.1 mark qualified async operations as MOVEABLE tasks
  xprtrdma: treat all calls not a bcall when bc_serv is NULL
  NFSv4: Fix free of uninitialized nfs4_label on referral lookup.
  NFS: Pass i_size to fscache_unuse_cookie() when a file is released
  Documentation: Add an explanation of NFSv4 client identifiers
  NFS: update documentation for the nfs4_unique_id parameter
  NFS: Improve warning message when locks are lost.
  NFSv4.1: Enable access to the NFSv4.1 'dacl' and 'sacl' attributes
  NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes
  NFSv4: Specify the type of ACL to cache
  NFSv4: Don't hold the layoutget locks across multiple RPC calls
  pNFS/files: Fall back to I/O through the MDS on non-fatal layout errors
  NFS: Further fixes to the writeback error handling
  NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
  NFS: Memory allocation failures are not server fatal errors
  NFS: Don't report errors from nfs_pageio_complete() more than once
  NFS: Do not report flush errors in nfs_write_end()
  NFS: Don't report ENOSPC write errors twice
  NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS
  NFS: Do not report EINTR/ERESTARTSYS as mapping errors
2022-05-31 16:58:24 -07:00
Linus Torvalds 1501f707d2 f2fs-for-5.19
In this round, we've refactored the existing atomic write support implemented
 by in-memory operations to have storing data in disk temporarily, which can give
 us a benefit to accept more atomic writes. At the same time, we removed the
 existing volatile write support. We've also revisited the file pinning and GC
 flows and found some corner cases which contributeed abnormal system behaviours.
 As usual, there're several minor code refactoring for readability, sanity check,
 and clean ups.
 
 Enhancement
  - allow compression for mmap files in compress_mode=user
  - kill volatile write support
  - change the current atomic write way
  - give priority to select unpinned section for foreground GC
  - introduce data read/write showing path info
  - remove unnecessary f2fs_lock_op in f2fs_new_inode
 
 Bug fix
  - fix the file pinning flow during checkpoint=disable and GCs
  - fix foreground and background GCs to select the right victims and get free
    sections on time
  - fix GC flags on defragmenting pages
  - avoid an infinite loop to flush node pages
  - fix fallocate to use file_modified to update permissions consistently
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAmKWfyEACgkQQBSofoJI
 UNJaAQ/9Hs3aGIyriGV8CMbarklRuQ24o3khQKdia5gHseFVsydMfba8tyvl7vYV
 fZnHKp9rnEV1emxWn7hHLaGOvPV8leajZqMLhqG384BIb0yoTnRipnK5t0JkoiJX
 53XC5yfxQd01dwS+J4uOSu2jW0Gs6iBLD6H9ahOs86OE6jF1TeQ/fqjsrhm9I8Zr
 GsNON6zxafPn248sYyVBB3Y5GjPBPf+USif3ZEidAWimW/TIGbXLUT1hA0B79YoX
 DRAmN3tYS75yXauQvFPerMbOmP2gwCPcvdCI/PZ4U/ApsEPP7k1SbOZYAjjGUB30
 Qn8cSMxzPZ1cHvzIC96vwJk8XPdcDhICfzROb7jJdeznD8cWTDv0E+Vd33HUf/mG
 pi5Lkpc4STvYD+KUaKpdnHVg6ARWw4HOnUtW43MF3OsfuyGEEPlROs6lBVYnk/Hz
 smlrgnnLMTOpH9y2JyuyExeHEJ3EAgWbJ8aRpq7Ua7FvKF45Yj1lIytWlvWXSnRf
 rp+A5QJhVtYvT+y2Rk2h5oTRj/9l3+pR0X7CTOfSivJuf6aH5XVgI0EmxT2iBTCp
 4SDBjLC+nXXP3EK1HamLiz1mU23Qg1Qwvx3Wc4xgdwQf3s+jyYxki9tIjzdwJCCZ
 adjd3fc/GrD9UPDmJDXlD5QSoOJ94K/NOwYpu1L1/Q+dVwkl+IE=
 =ta8Y
 -----END PGP SIGNATURE-----

Merge tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've refactored the existing atomic write support
  implemented by in-memory operations to have storing data in disk
  temporarily, which can give us a benefit to accept more atomic writes.

  At the same time, we removed the existing volatile write support.

  We've also revisited the file pinning and GC flows and found some
  corner cases which contributeed abnormal system behaviours.

  As usual, there're several minor code refactoring for readability,
  sanity check, and clean ups.

  Enhancements:

   - allow compression for mmap files in compress_mode=user

   - kill volatile write support

   - change the current atomic write way

   - give priority to select unpinned section for foreground GC

   - introduce data read/write showing path info

   - remove unnecessary f2fs_lock_op in f2fs_new_inode

  Bug fixes:

   - fix the file pinning flow during checkpoint=disable and GCs

   - fix foreground and background GCs to select the right victims and
     get free sections on time

   - fix GC flags on defragmenting pages

   - avoid an infinite loop to flush node pages

   - fix fallocate to use file_modified to update permissions
     consistently"

* tag 'f2fs-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
  f2fs: fix to tag gcing flag on page during file defragment
  f2fs: replace F2FS_I(inode) and sbi by the local variable
  f2fs: add f2fs_init_write_merge_io function
  f2fs: avoid unneeded error handling for revoke_entry_slab allocation
  f2fs: allow compression for mmap files in compress_mode=user
  f2fs: fix typo in comment
  f2fs: make f2fs_read_inline_data() more readable
  f2fs: fix to do sanity check for inline inode
  f2fs: fix fallocate to use file_modified to update permissions consistently
  f2fs: don't use casefolded comparison for "." and ".."
  f2fs: do not stop GC when requiring a free section
  f2fs: keep wait_ms if EAGAIN happens
  f2fs: introduce f2fs_gc_control to consolidate f2fs_gc parameters
  f2fs: reject test_dummy_encryption when !CONFIG_FS_ENCRYPTION
  f2fs: kill volatile write support
  f2fs: change the current atomic write way
  f2fs: don't need inode lock for system hidden quota
  f2fs: stop allocating pinned sections if EAGAIN happens
  f2fs: skip GC if possible when checkpoint disabling
  f2fs: give priority to select unpinned section for foreground GC
  ...
2022-05-31 16:52:59 -07:00
Ronnie Sahlberg f66f8b94e7 cifs: when extending a file with falloc we should make files not-sparse
as this is the only way to make sure the region is allocated.
Fix the conditional that was wrong and only tried to make already
non-sparse files non-sparse.

Cc: stable@vger.kernel.org
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-31 18:04:06 -05:00
Linus Torvalds 35b51afd23 RISC-V Patches for the 5.19 Merge Window, Part 1
* Support for the Svpbmt extension, which allows memory attributes to be
   encoded in pages.
 * Support for the Allwinner D1's implementation of page-based memory
   attributes.
 * Support for running rv32 binaries on rv64 systems, via the compat
   subsystem.
 * Support for kexec_file().
 * Support for the new generic ticket-based spinlocks, which allows us to
   also move to qrwlock.  These should have already gone in through the
   asm-geneic tree as well.
 * A handful of cleanups and fixes, include some larger ones around
   atomics and XIP.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmKWOx8THHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYieAiEADAUdP7ctoaSQwk5skd/fdA3b4KJuKn
 1Zjl+Br32WP0DlbirYBYWRUQZnCCsvABbTiwSJMcG7NBpU5pyQ5XDtB3OA5kJswO
 Fdp8Nd53//+GK1M5zdEM9OdgvT9fbfTZ3qTu8bKsROOQhGwnYL+Csc9KjFRqEmzN
 oQii0jlb3n5PM4FL3GsbV4uMn9zzkP9mnVAPQktcock2EKFEK/Fy3uNYMQiO2KPi
 n8O6bIDaeRdQ6SurzWOuOkt0cro0tEF85ilzT04mynQsOU0el5oGqCxnOhNH3VWg
 ndqPT6Yafw12hZOtbKJeP+nF8IIR6aJLP3jOtRwEVgcfbXYAw4QwbAV8kQZISefN
 ipn8JGY7GX9Y9TYU692OUGkcmAb3/dxb6c0WihBdvJ0M6YyLD5X+YKHNuG2onLgK
 ss43C5Mxsu629rsjdu/PV91B1+pve3rG9siVmF+g4eo0x9rjMq6/JB0Kal/8SLI1
 Je5T55d5ujV1a2XxhZLQOSD5owrK7J1M9owb0bloTnr9nVwFTWDrfEQEU82o3kP+
 Xm+FfXktnz9ai55NjkMbbEur5D++dKJhBavwCTnBcTrJmMtEH0R45GTK9ZehP+WC
 rNVrRXjIsS18wsTfJxnkZeFQA38as6VBKTzvwHvOgzTrrZU1/xk3lpkouYtAO6BG
 gKacHshVilmUuA==
 =Loi6
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the Svpbmt extension, which allows memory attributes to
   be encoded in pages

 - Support for the Allwinner D1's implementation of page-based memory
   attributes

 - Support for running rv32 binaries on rv64 systems, via the compat
   subsystem

 - Support for kexec_file()

 - Support for the new generic ticket-based spinlocks, which allows us
   to also move to qrwlock. These should have already gone in through
   the asm-geneic tree as well

 - A handful of cleanups and fixes, include some larger ones around
   atomics and XIP

* tag 'riscv-for-linus-5.19-mw0' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (51 commits)
  RISC-V: Prepare dropping week attribute from arch_kexec_apply_relocations[_add]
  riscv: compat: Using seperated vdso_maps for compat_vdso_info
  RISC-V: Fix the XIP build
  RISC-V: Split out the XIP fixups into their own file
  RISC-V: ignore xipImage
  RISC-V: Avoid empty create_*_mapping definitions
  riscv: Don't output a bogus mmu-type on a no MMU kernel
  riscv: atomic: Add custom conditional atomic operation implementation
  riscv: atomic: Optimize dec_if_positive functions
  riscv: atomic: Cleanup unnecessary definition
  RISC-V: Load purgatory in kexec_file
  RISC-V: Add purgatory
  RISC-V: Support for kexec_file on panic
  RISC-V: Add kexec_file support
  RISC-V: use memcpy for kexec_file mode
  kexec_file: Fix kexec_file.c build error for riscv platform
  riscv: compat: Add COMPAT Kbuild skeletal support
  riscv: compat: ptrace: Add compat_arch_ptrace implement
  riscv: compat: signal: Add rt_frame implementation
  riscv: add memory-type errata for T-Head
  ...
2022-05-31 14:10:54 -07:00
Olga Kornievskaia 118f09eda2 NFSv4.1 mark qualified async operations as MOVEABLE tasks
Mark async operations such as RENAME, REMOVE, COMMIT MOVEABLE
for the nfsv4.1+ sessions.

Fixes: 85e39feead ("NFSv4.1 identify and mark RPC tasks that can move between transports")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-31 17:09:30 -04:00
Benjamin Coddington c3ed222745 NFSv4: Fix free of uninitialized nfs4_label on referral lookup.
Send along the already-allocated fattr along with nfs4_fs_locations, and
drop the memcpy of fattr.  We end up growing two more allocations, but this
fixes up a crash as:

PID: 790    TASK: ffff88811b43c000  CPU: 0   COMMAND: "ls"
 #0 [ffffc90000857920] panic at ffffffff81b9bfde
 #1 [ffffc900008579c0] do_trap at ffffffff81023a9b
 #2 [ffffc90000857a10] do_error_trap at ffffffff81023b78
 #3 [ffffc90000857a58] exc_stack_segment at ffffffff81be1f45
 #4 [ffffc90000857a80] asm_exc_stack_segment at ffffffff81c009de
 #5 [ffffc90000857b08] nfs_lookup at ffffffffa0302322 [nfs]
 #6 [ffffc90000857b70] __lookup_slow at ffffffff813a4a5f
 #7 [ffffc90000857c60] walk_component at ffffffff813a86c4
 #8 [ffffc90000857cb8] path_lookupat at ffffffff813a9553
 #9 [ffffc90000857cf0] filename_lookup at ffffffff813ab86b

Suggested-by: Trond Myklebust <trondmy@hammerspace.com>
Fixes: 9558a007db ("NFS: Remove the label from the nfs4_lookup_res struct")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-05-31 17:09:24 -04:00
Enzo Matsumiya 0d5106a80e cifs: remove repeated debug message on cifs_put_smb_ses()
Similar message is printed a few lines later in the same function

Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-31 15:09:51 -05:00
Weizhao Ouyang 4398d3c31b erofs: fix 'backmost' member of z_erofs_decompress_frontend
Initialize 'backmost' to true in DECOMPRESS_FRONTEND_INIT.

Fixes: 5c6dcc57e2 ("erofs: get rid of `struct z_erofs_collector'")
Signed-off-by: Weizhao Ouyang <o451686892@gmail.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Yue Hu <huyue2@coolpad.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220530075114.918874-1-o451686892@gmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-31 23:15:30 +08:00
Gao Xiang aa793b46bb erofs: simplify z_erofs_pcluster_readmore()
Get rid of unnecessary label `skip'. No logic changes.

Link: https://lore.kernel.org/r/20220529055425.226363-4-xiang@kernel.org
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-31 23:15:21 +08:00
Gao Xiang 39397a46cf erofs: get rid of label `restart_now'
Simplify this part of code. No logic changes.

Link: https://lore.kernel.org/r/20220529055425.226363-3-xiang@kernel.org
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-31 23:14:58 +08:00
Gao Xiang 87ca34a706 erofs: get rid of `struct z_erofs_collection'
It was incompletely introduced for deduplication between different
logical extents backed with the same pcluster.

We will have a better in-memory representation in the next release
cycle for this, as well as partial memory folios support. So get rid
of it instead.

No logic changes.

Link: https://lore.kernel.org/r/20220529055425.226363-2-xiang@kernel.org
Acked-by: Chao Yu <chao@kernel.org>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-31 23:13:59 +08:00
Namjae Jeon f26967b9f7
fs/ntfs3: Fix invalid free in log_replay
log_read_rst() returns ENOMEM error when there is not enough memory.
In this case, if info is returned without initialization,
it attempts to kfree the uninitialized info->r_page pointer. This patch
moves the memset initialization code to before log_read_rst() is called.

Reported-by: Gerald Lee <sundaywind2004@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2022-05-31 17:29:29 +03:00
Konstantin Komarov 03ab8e6297 Linux 5.18
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmKKlIAeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGC3oH/iPm/fLG2sJut8My
 sU0RC9K+6ESV5h2Qy6k00/lqKstlu4EvBjw4V8vYpx3Q2+hbSFMn2SeWqqqT3Lkk
 Zb8KINCFuuyMtdCBb42PV0zhUf5pCQF7ocm/Ae4jllDHtPmqk3WJ6IGtZBK5JBlw
 z6RR/wKt0y0MRj9eZyPyYjOee2L2vuVh4tgnexK/4L8g2ZtMMRThhvUzSMWG4zxR
 STYYNp0uFcfT1Vt85+ODevFH4TvdECAj+SqAegN+seHLM17YY7M0/WiIYpxGRv8P
 lIpDQl4PBU8EBkpI5hkpJ/3qPincbuVOMLsYfxFtpcjjG12vGjFp2krGpS3TedZQ
 3mvaJ7c=
 =vLke
 -----END PGP SIGNATURE-----
gpgsig -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEh0DEKNP0I9IjwfWEqbAzH4MkB7YFAmKWFH8ACgkQqbAzH4Mk
 B7YsNA/+JEUIX0XGcDICBtIHAsy6qYrkJmAi2kyBBdx+rDOqZg/vu/32d+/jSGLK
 QThrMqFw1K09bjuprgeD1ncZE/65hhnTdsu2m3CD7sSsKYvaEuWvdBy2//nGMWxA
 28jtzpcyQDSGFsQnL+YqiNBUc+YAScKq8JpXC0MLShT+MabdXJtaLeqfnJYKGAG0
 zZV+449LYf87VHnW1ngObDpXOY7Tu4u6Oz43ipb5mCRRk3E8eDBSTcMhRNLHzqfS
 8UaF/cJzoGen/SmwL5wyMitTT9LJtDEU2IqTVGS+BlpO5+PyJAfyRoQ09CdWekst
 4aHvpqv7nHPTzgxqQqtTxswOHtYqjXX1UvCNs04dYNbtvikshRPYLsjngj6vFo8s
 VdXykS7m3UbW+te6okfFYVztk7ZDbrOoBXCDK0xvZeQuEWO44PNUhfJOYWooF53Y
 VbqpCT868jpVLsiw0+/vtcEZM0lFF2QmeBDzHkjcbrEMbIzzd6s+0czHmWB2TLhE
 UqcN6nMj542hIKs2cD5hrVHQBS92Xr6LK4HQQp0XG/K5k8O5cvCfiwOcA8nYeDBu
 0GmvzV7dPTg0kUhPIyqyPeoB+rIdukp27Iwr+H2345dYdP/BTeJVV3lLcxvuIUM3
 8SESbYltJYw1V6rbUUPFZHmHBtxOsc2V34wvd6S9UZwr64196qE=
 =wXwb
 -----END PGP SIGNATURE-----

Merge tag 'v5.18'

Linux 5.18
2022-05-31 16:13:23 +03:00
Xiaoguang Wang a7c41b4687 io_uring: let IORING_OP_FILES_UPDATE support choosing fixed file slots
One big issue with the file registration feature is that it needs user
space apps to maintain free slot info about io_uring's fixed file table,
which really is a burden for development. io_uring now supports choosing
free file slot for user space apps by using IORING_FILE_INDEX_ALLOC flag
in accept, open, and socket operations, but they need the app to use
direct accept or direct open, which not all apps are prepared to use yet.

To support apps that still need real fds, make use of the registration
feature easier. Let IORING_OP_FILES_UPDATE support choosing fixed file
slots, which will store picked fixed files slots in fd array and let cqe
return the number of slots allocated.

Suggested-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
[axboe: move flag to uapi io_uring header, change goto to break, init]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:06 -06:00
Xiaoguang Wang 4278a0deb1 io_uring: defer alloc_hint update to io_file_bitmap_set()
io_file_bitmap_get() returns a free bitmap slot, but if it isn't
used later, such as io_queue_rsrc_removal() returns error, in this
case, we should not update alloc_hint at all, which still should
be considered as a valid candidate for next io_file_bitmap_get()
calls.

To fix this issue, only update alloc_hint in io_file_bitmap_set().

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220528015109.48039-1-xiaoguang.wang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:06 -06:00
Xiaoguang Wang 8c71fe7502 io_uring: ensure fput() called correspondingly when direct install fails
io_fixed_fd_install() may fail for short of free fixed file bitmap,
in this case, need to call fput() correspondingly.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220527025400.51048-1-xiaoguang.wang@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:06 -06:00
Jens Axboe fa82dd105b io_uring: wire up allocated direct descriptors for socket
The socket support was merged in an earlier branch that didn't yet
have support for allocating direct descriptors, hence only open
and accept got support for that.

Do the one-liner to enable it now, so we have consistent support for
any request that can instantiate a file/direct descriptor.

Reviewed-by: Hao Xu <howeyxu@tencent.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:06 -06:00
Jens Axboe 21870e02fc io_uring: fix a memory leak of buffer group list on exit
If we use a buffer group ID that is large enough to require io_uring
to allocate it, then we don't correctly free it if the cleanup is
deferred to the ring exit. The explicit removal paths are fine.

Fixes: 9cfc7e94e4 ("io_uring: get rid of hashed provided buffer groups")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:06 -06:00
Jens Axboe 1151a7cccb io_uring: move shutdown under the general net section
Gets rid of some ifdefs and enables use of the net defines for when
CONFIG_NET isn't set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:06 -06:00
Jens Axboe 157dc813b4 io_uring: unify calling convention for async prep handling
Make them consistent in preparation for defining a req async prep
handler. The readv/writev requests share a prep handler, move it one
level down so the initial one is consistent with the others.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:06 -06:00
Jens Axboe fcde59feb1 io_uring: add io_op_defs 'def' pointer in req init and issue
Define and set it when appropriate, and use it consistently in the
function rather than using io_op_defs[opcode].

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-31 02:50:02 -06:00
Ronnie Sahlberg 8378a51e3f cifs: fix potential double free during failed mount
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=2088799

Cc: stable@vger.kernel.org
Signed-off-by: Roberto Bergantinos <rbergant@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-30 23:04:58 -05:00
Linus Torvalds 2c5ca23f74 overlayfs update for 5.19
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCYo+nGwAKCRDh3BK/laaZ
 PBouAP0VBH/jygclzc42jlRkKjp+wJnF1FifpWOJEtTPiYqhtAD/UWjR/2Sy4TMT
 fRsw9N9/FXxcXShjg3U42fpCNSVEqgM=
 =oY4z
 -----END PGP SIGNATURE-----

Merge tag 'ovl-update-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:

 - Support idmapped layers in overlayfs (Christian Brauner)

 - Add a fix to exportfs that is relevant to open_by_handle_at(2) as
   well

 - Introduce new lookup helpers that allow passing mnt_userns into
   inode_permission()

* tag 'ovl-update-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
  ovl: support idmapped layers
  ovl: handle idmappings in ovl_xattr_{g,s}et()
  ovl: handle idmappings in layer open helpers
  ovl: handle idmappings in ovl_permission()
  ovl: use ovl_copy_{real,upper}attr() wrappers
  ovl: store lower path in ovl_inode
  ovl: handle idmappings for layer lookup
  ovl: handle idmappings for layer fileattrs
  ovl: use ovl_path_getxattr() wrapper
  ovl: use ovl_lookup_upper() wrapper
  ovl: use ovl_do_notify_change() wrapper
  ovl: pass layer mnt to ovl_open_realfile()
  ovl: pass ofs to setattr operations
  ovl: handle idmappings in creation operations
  ovl: add ovl_upper_mnt_userns() wrapper
  ovl: pass ofs to creation operations
  ovl: use wrappers to all vfs_*xattr() calls
  exportfs: support idmapped mounts
  fs: add two trivial lookup helpers
2022-05-30 11:19:16 -07:00
Linus Torvalds 2d2da475ac m68knommu: changes for linux 5.19
. correctly set up ZERO_PAGE pointer
 . drop ISA_DMA_API support
 . fix comment typos
 . fixes for undefined symbols
 . remove unused code and variables
 . elf-fdpic loader support for m68k
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEmsfM6tQwfNjBOxr3TiQVqaG9L4AFAmKURewACgkQTiQVqaG9
 L4BMZQ/+NkDZREfDz8mvWfZfuBbsc2+jEm1MpEZkj0JuHbJxoewOzUqBvI8ya/jD
 Q9pz1WQzDe2X0kU60vwH1AmdoJZ7ZuK/44JCKaxSlUVFuvOGwtOfbaAb51RkhLhh
 P6ds2w32X/DgGIUgrvUwkRytG80lFrnw0YHPOFokfLsfZYdVSOsPsbEDZG1g0mWR
 yD4kKuUoChUa5RxmlhJNWByTyXgJQQB0qaMr04ovx8gRHGIDNjW0kBq1fBS6i2ua
 6ZD00PkgWpDiEpiBkcpr80CDnjEv1Aaz79lltkS+lsAkey0EVEFiZN9GhHuejYgf
 703E1vXgRmB8iYc2IVR+KhGquqVH4aKny1MxVCOPrRmchhq094x/t3hmfsyOSlVk
 MVngtJ8SQCNYE8tMNVC8CuZWn14vpHZzg9cLiMAxh+WWOTxgZgAr+6iIUWcd1q+Q
 z/qILRIZbva5Le3gq03+vRzW+BDsqgIsq0Py4q3xvOo+TG99C2LCZSoV+mwaod6G
 g3ive8SmacwJLM7VrEYbElykFxasN02K+DZDuvq4M2/CP5FsZb7fbAJA4L7y/1B8
 /LH2LQA4uwxnRME4KRCY2MDcRYz5O/1q9U3eE4MzbsvMWCCFYyyp7XGDTL3HCtWM
 QyhwvTYfX9YkgqZHCt97mdyBBWfADVcgnGGtWFIziFaLczU3zpg=
 =O3zh
 -----END PGP SIGNATURE-----

Merge tag 'm68knommu-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu

Pull m68knommu updates from Greg Ungerer:
 "A collection of changes to add elf-fdpic loader support for m68k.

  Also a collection of various fixes. They include typo corrections,
  undefined symbol compilation fixes, removal of the ISA_DMA_API support
  and removal of unused code.

  Summary:

   - correctly set up ZERO_PAGE pointer

   - drop ISA_DMA_API support

   - fix comment typos

   - fixes for undefined symbols

   - remove unused code and variables

   - elf-fdpic loader support for m68k"

* tag 'm68knommu-for-v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
  m68knommu: fix 68000 CPU link with no platform selected
  m68k: removed unused "mach_get_ss"
  m68knommu: fix undefined reference to `mach_get_rtc_pll'
  m68knommu: fix undefined reference to `_init_sp'
  m68knommu: allow elf_fdpic loader to be selected
  m68knommu: add definitions to support elf_fdpic program loader
  m68knommu: implement minimal regset support
  m68knommu: use asm-generic/mmu.h for nommu setups
  m68k: fix typos in comments
  m68k: coldfire: drop ISA_DMA_API support
  m68knommu: set ZERO_PAGE() to the allocated zeroed page
2022-05-30 10:56:18 -07:00
Dave Chinner 7146bda743 Merge branch 'guilt/xfs-5.19-larp-cleanups' into xfs-5.19-for-next
This series contains a two key cleanups for the new LARP code.  Most
of it is refactoring and tweaking the code that creates kernel log
messages about enabling and disabling features -- we should be
warning about LARP being turned on once per mount, instead of once
per insmod cycle; we shouldn't be spamming the logs so aggressively
about turning *off* log incompat features.

The second part of the series refactors the LARP code responsible
for getting (and releasing) permission to use xattr log items.  The
implementation code doesn't belong in xfs_log.c, and calls to
logging functions don't belong in libxfs -- they really should be
done by the VFS implementation functions before they start calling
into libraries.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-30 10:58:59 +10:00
Dave Chinner 621dc801df Merge branch 'guilt/xfs-5.19-recovery-buf-cancel' into xfs-5.19-for-next
As part of solving the memory leaks and UAF problems in the new LARP
code, kmemleak also reported that log recovery will leak the table
used to hash buffer cancellations if the recovery fails.  Fix this
problem by creating alloc/free helpers that initialize and free the
hashtable contents correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-30 10:58:13 +10:00
Brian Foster 6f5097e336 xfs: fix xfs_ifree() error handling to not leak perag ref
For some reason commit 9a5280b312 ("xfs: reorder iunlink remove
operation in xfs_ifree") replaced a jump to the exit path in the
event of an xfs_difree() error with a direct return, which skips
releasing the perag reference acquired at the top of the function.
Restore the original code to drop the reference on error.

Fixes: 9a5280b312 ("xfs: reorder iunlink remove operation in xfs_ifree")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-30 10:56:33 +10:00
Xin Yin b5cb79dcfd erofs: fix crash when enable tracepoint cachefiles_prep_read
RIP: 0010:trace_event_raw_event_cachefiles_prep_read+0x88/0xe0
[cachefiles]
Call Trace:
  <TASK>
  cachefiles_prepare_read+0x1d7/0x3a0 [cachefiles]
  erofs_fscache_read_folios+0x188/0x220 [erofs]
  erofs_fscache_meta_readpage+0x106/0x160 [erofs]
  do_read_cache_folio+0x42a/0x590
  ? bdi_register_va.part.14+0x1a7/0x210
  ? super_setup_bdi_name+0x76/0xe0
  erofs_bread+0x5b/0x170 [erofs]
  erofs_fc_fill_super+0x12b/0xc50 [erofs]

This tracepoint uses rreq->inode, should set it when allocating.

Fixes: d435d53228 ("erofs: change to use asynchronous io for fscache readpage/readahead")
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220527101800.22360-1-yinxin.x@bytedance.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-29 15:36:04 +08:00
Jeffle Xu 0130e4e8e4 erofs: leave compressed inodes unsupported in fscache mode for now
erofs over fscache doesn't support the compressed layout yet. It will
cause NULL crash if there are compressed inodes contained when working
in fscache mode.

So far in the erofs based container image distribution scenarios
(RAFS v6), the compressed RAFS v6 images are downloaded and then
decompressed on demand as an uncompressed erofs image. Then the erofs
image is mounted in fscache mode for containers to use. IOWs, currently
compressed data is decompressed on the userspace side instead and
uncompressed erofs images will be finally cached.

The fscache support for the compressed layout is still under
development and it will be used for runtime decompression feature.
Anyway, to avoid the potential crash, let's leave the compressed inodes
unsupported in fscache mode until we support it later.

Fixes: 1442b02b66 ("erofs: implement fscache-based data read for non-inline layout")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/r/20220526010344.118493-1-jefflexu@linux.alibaba.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2022-05-29 15:34:54 +08:00
Linus Torvalds 6112bd00e8 powerpc updates for 5.19
- Convert to the generic mmap support (ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT).
 
  - Add support for outline-only KASAN with 64-bit Radix MMU (P9 or later).
 
  - Increase SIGSTKSZ and MINSIGSTKSZ and add support for AT_MINSIGSTKSZ.
 
  - Enable the DAWR (Data Address Watchpoint) on POWER9 DD2.3 or later.
 
  - Drop support for system call instruction emulation.
 
  - Many other small features and fixes.
 
 Thanks to: Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Bagas Sanjaya, Bjorn
 Helgaas, Bo Liu, Chen Huang, Christophe Leroy, Colin Ian King, Daniel Axtens, Dwaipayan
 Ray, Fabiano Rosas, Finn Thain, Frank Rowand, Fuqian Huang, Guilherme G. Piccoli, Hangyu
 Hua, Haowen Bai, Haren Myneni, Hari Bathini, He Ying, Jason Wang, Jiapeng Chong, Jing
 Yangyang, Joel Stanley, Julia Lawall, Kajol Jain, Kevin Hao, Krzysztof Kozlowski, Laurent
 Dufour, Lv Ruyi, Madhavan Srinivasan, Magali Lemes, Miaoqian Lin, Minghao Chi, Nathan
 Chancellor, Naveen N. Rao, Nicholas Piggin, Oliver O'Halloran, Oscar Salvador, Pali Rohár,
 Paul Mackerras, Peng Wu, Qing Wang, Randy Dunlap, Reza Arbab, Russell Currey, Sohaib
 Mohamed, Vaibhav Jain, Vasant Hegde, Wang Qing, Wang Wensheng, Xiang wangx, Xiaomeng Tong,
 Xu Wang, Yang Guang, Yang Li, Ye Bin, YueHaibing, Yu Kuai, Zheng Bin, Zou Wei, Zucheng
 Zheng.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmKSEgETHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgJpLEACee7mu2I00Z7VWtW5ckT4RFbAXYZcM
 Hv5DbTnVB2ItoQMRHvG52DNbR73j9HnYrz8kpwfTBVk90udxVP14L/swXDs3xbT4
 riXEYtJ1DRVc/bLiOK637RLPWNrmmZStWZme7k0Y9Ki5Aif8i1Erjjq7EIy47m9j
 j1MTcwp3ND7IsBON2nZ3PkttEHhevKvOwCPb/BWtPMDV0OhyQUFKB2SNegrlCrkT
 wshDgdQcYqbIix98PoGa2ZfUVgFQD3JVLzXa4sLpqouzGD+HvEFStOFa2Gq/ZEvV
 zunaeXDdZUCjlib6KvA8+aumBbIQ1s/urrDbxd+3BuYxZ094vNP1B428NT1AWVtl
 3bEZQIN8GSx0v9aHxZ8HePsAMXgG9d2o0xC9EMQ430+cqroN+6UHP7lkekwkprb7
 U9EpZCG9U8jV6SDcaMigW3tooEjn657we0R8nZG2NgUNssdSHVh/JYxGDALPXIAk
 awL3NQrR0tYF3Y3LJm5AxdQrK1hJH8E+hZFCZvIpUXGsr/uf9Gemy/62pD1rhrr/
 niULpxIneRGkJiXB5qdGy8pRu27ED53k7Ky6+8MWSEFQl1mUsHSryYACWz939D8c
 DydhBwQqDTl6Ozs41a5TkVjIRLOCrZADUd/VZM6A4kEOqPJ5t2Gz22Bn8ya1z6Ks
 5Sx6vrGH7GnDjA==
 =15oQ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Convert to the generic mmap support (ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT)

 - Add support for outline-only KASAN with 64-bit Radix MMU (P9 or later)

 - Increase SIGSTKSZ and MINSIGSTKSZ and add support for AT_MINSIGSTKSZ

 - Enable the DAWR (Data Address Watchpoint) on POWER9 DD2.3 or later

 - Drop support for system call instruction emulation

 - Many other small features and fixes

Thanks to Alexey Kardashevskiy, Alistair Popple, Andy Shevchenko, Bagas
Sanjaya, Bjorn Helgaas, Bo Liu, Chen Huang, Christophe Leroy, Colin Ian
King, Daniel Axtens, Dwaipayan Ray, Fabiano Rosas, Finn Thain, Frank
Rowand, Fuqian Huang, Guilherme G. Piccoli, Hangyu Hua, Haowen Bai,
Haren Myneni, Hari Bathini, He Ying, Jason Wang, Jiapeng Chong, Jing
Yangyang, Joel Stanley, Julia Lawall, Kajol Jain, Kevin Hao, Krzysztof
Kozlowski, Laurent Dufour, Lv Ruyi, Madhavan Srinivasan, Magali Lemes,
Miaoqian Lin, Minghao Chi, Nathan Chancellor, Naveen N. Rao, Nicholas
Piggin, Oliver O'Halloran, Oscar Salvador, Pali Rohár, Paul Mackerras,
Peng Wu, Qing Wang, Randy Dunlap, Reza Arbab, Russell Currey, Sohaib
Mohamed, Vaibhav Jain, Vasant Hegde, Wang Qing, Wang Wensheng, Xiang
wangx, Xiaomeng Tong, Xu Wang, Yang Guang, Yang Li, Ye Bin, YueHaibing,
Yu Kuai, Zheng Bin, Zou Wei, and Zucheng Zheng.

* tag 'powerpc-5.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (200 commits)
  powerpc/64: Include cache.h directly in paca.h
  powerpc/64s: Only set HAVE_ARCH_UNMAPPED_AREA when CONFIG_PPC_64S_HASH_MMU is set
  powerpc/xics: Include missing header
  powerpc/powernv/pci: Drop VF MPS fixup
  powerpc/fsl_book3e: Don't set rodata RO too early
  powerpc/microwatt: Add mmu bits to device tree
  powerpc/powernv/flash: Check OPAL flash calls exist before using
  powerpc/powermac: constify device_node in of_irq_parse_oldworld()
  powerpc/powermac: add missing g5_phy_disable_cpu1() declaration
  selftests/powerpc/pmu: fix spelling mistake "mis-match" -> "mismatch"
  powerpc: Enable the DAWR on POWER9 DD2.3 and above
  powerpc/64s: Add CPU_FTRS_POWER10 to ALWAYS mask
  powerpc/64s: Add CPU_FTRS_POWER9_DD2_2 to CPU_FTRS_ALWAYS mask
  powerpc: Fix all occurences of "the the"
  selftests/powerpc/pmu/ebb: remove fixed_instruction.S
  powerpc/platforms/83xx: Use of_device_get_match_data()
  powerpc/eeh: Drop redundant spinlock initialization
  powerpc/iommu: Add missing of_node_put in iommu_init_early_dart
  powerpc/pseries/vas: Call misc_deregister if sysfs init fails
  powerpc/papr_scm: Fix leaking nvdimm_events_map elements
  ...
2022-05-28 11:27:17 -07:00
Hyunchul Lee 621433b7e2 ksmbd: smbd: relax the count of sges required
Remove the condition that the count of sges
must be greater than or equal to
SMB_DIRECT_MAX_SEND_SGES(8).
Because ksmbd needs sges only for SMB direct
header, SMB2 transform header, SMB2 response,
and optional payload.

Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-27 21:31:20 -05:00
Linus Torvalds bf272460d7 Twenty four cifs/smb3 client fixes, including multichannel but does not include the iov_iter ones
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmKRBV4ACgkQiiy9cAdy
 T1FYcgv7BWeF/72rw2qxuLUnj9B2aCnjCkpb2r7sN0951gTgFV9Iw4Bg5KyCym1A
 Pjl7H3hj0R/djIwzSTbPmsIUZxEzAB56MyKgaoBbkg0N0AfwHYqEOHpTa7c9NaqT
 CkbgJxtqcFBl3uNLMW9qyAD7MFDqF8OkSFCv01HYUukaQKBgzUnuoLmhvNQYeN50
 DhxSIk+6+ekyUpuTKitHclldbk8IbUDRO5jRZrhXjP7SObWID1EMVBz4QNyrw3Du
 G3Mi4K/FbVkrHe4OTcyMMc4rTVbaOwaHJmvgBFM5Qb1buaplbGEo7lTxus0PUVzd
 aWyaj2duXNuKjFZuov/ZCsnSJMvl2TG21Bku/uLNGKsnIQn7UhYCLcDyZa/UCnRE
 zPd5M2PD/L8uKONSg/6IVlVIzNMmvYRpyqqGg/4CZpu1Qhs53MkLdnZqSB+NyzV7
 O2I6CIGVbp64f8YyBFZ6bhdxBwyXeoiF3RkYeKYrtCp4Z0RfQYjyMb5t2NDcpVo/
 gL0tho/Q
 =bR0w
 -----END PGP SIGNATURE-----

Merge tag '5.19-rc-smb3-client-fixes-updated' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs client updates from Steve French:

 - multichannel fixes to improve reconnect after network failure

 - improved caching of root directory contents (extending benefit of
   directory leases)

 - two DFS fixes

 - three fixes for improved debugging

 - an NTLMSSP fix for mounts t0 older servers

 - new mount parm to allow disabling creating sparse files

 - various cleanup fixes and minor fixes pointed out by coverity

* tag '5.19-rc-smb3-client-fixes-updated' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
  smb3: remove unneeded null check in cifs_readdir
  cifs: fix ntlmssp on old servers
  cifs: cache the dirents for entries in a cached directory
  cifs: avoid parallel session setups on same channel
  cifs: use new enum for ses_status
  cifs: do not use tcpStatus after negotiate completes
  smb3: add mount parm nosparse
  smb3: don't set rc when used and unneeded in query_info_compound
  smb3: check for null tcon
  cifs: fix minor compile warning
  Add various fsctl structs
  Add defines for various newer FSCTLs
  smb3: add trace point for oplock not found
  cifs: return the more nuanced writeback error on close()
  smb3: add trace point for lease not found issue
  cifs: smbd: fix typo in comment
  cifs: set the CREATE_NOT_FILE when opening the directory in use_cached_dir()
  cifs: check for smb1 in open_cached_dir()
  cifs: move definition of cifs_fattr earlier in cifsglob.h
  cifs: print TIDs as hex
  ...
2022-05-27 16:05:57 -07:00
Linus Torvalds aef1ff1592 JFS: One bug fix and some code cleanup
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAmKQ6TQACgkQNqiEXrVA
 jGQJBg/+Pm8HDYVKueUg2ZfGWfyFyYPvFkpthTD/sErTFmqjTYvVrEGVGxnHZPG1
 4NLXsgsNOGh1HHWGO8UMlKCTLW3nEfzn/PZ6lH9AFb0tCE0jdE/K/9iMbLr5rkZM
 CTCnj3xC0/2tS1deX+9KfbOYvhk1sXPoazhHXl3QQ0TyFKSRHeZdnPRWhJsHtrsH
 S30WIaYvuyiMw+0grlV3dPVq+Cj49fRX4k0ipr7JVQoPEUoahCp7h5i6Fxk1PRYZ
 2P2iF9zFzMjRjPrj86pDQNI9GxzOsmKIa9f0n/C97wyI8HDNj39kfNriRPvjbJ/D
 k6j8ReddSxc61368tiOASA9j8bORb7aRFsKQ3kPkRHZi/TF4l62s4jSr2wfvSHvV
 uH3wIfZ49uRHyWwDcuvWguKd3w3Zx3hVahs0SQSZm1j7GxmCGxT9E4BrRNe9oTyl
 Th3c6pZaDImJ8JmewqGz+yBfMGMhBpXaKPQuHaqKrNtFfkNEyp/PKst+V8OdS5+v
 8FQaR6hfJpWyN00LJq87NX5rv0Uq+CI1UaEEw9ks+brY5xoGZkKk/Cmxeh70otyz
 eRVfm6xzwBMZcfEuEQ5wH/BdBtbMKIo6O04q5ity+c75igvIw8H8n+M+v5rOaw/l
 puLOCplWdvVnbHabHeg7y0OyiNx0WagdW8q8ACLMl1TELl/tiAE=
 =KcwK
 -----END PGP SIGNATURE-----

Merge tag 'jfs-5.19' of https://github.com/kleikamp/linux-shaggy

Pull jfs updates from David Kleikamp:
 "One bug fix and some code cleanup"

* tag 'jfs-5.19' of https://github.com/kleikamp/linux-shaggy:
  fs/jfs: Remove dead code
  fs: jfs: fix possible NULL pointer dereference in dbFree()
2022-05-27 15:59:21 -07:00
Linus Torvalds 35cdd8656e libnvdimm for 5.19
- Add support for clearing memory error via pwrite(2) on DAX
 
 - Fix 'security overwrite' support in the presence of media errors
 
 - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSbo+XnGs+rwLz9XGXfioYZHlFsZwUCYpFPcQAKCRDfioYZHlFs
 Z9A3AQCdfoT5sY3OK+I/3oTvJ//6lw2MtXrnXFM046ICKPi9sgD8CzR9mRAHA+vj
 kxOtJEU2bA9naninXGORsDUndiNkwQo=
 =gVIn
 -----END PGP SIGNATURE-----

Merge tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm and DAX updates from Dan Williams:
 "New support for clearing memory errors when a file is in DAX mode,
  alongside with some other fixes and cleanups.

  Previously it was only possible to clear these errors using a truncate
  or hole-punch operation to trigger the filesystem to reallocate the
  block, now, any page aligned write can opportunistically clear errors
  as well.

  This change spans x86/mm, nvdimm, and fs/dax, and has received the
  appropriate sign-offs. Thanks to Jane for her work on this.

  Summary:

   - Add support for clearing memory error via pwrite(2) on DAX

   - Fix 'security overwrite' support in the presence of media errors

   - Miscellaneous cleanups and fixes for nfit_test (nvdimm unit tests)"

* tag 'libnvdimm-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  pmem: implement pmem_recovery_write()
  pmem: refactor pmem_clear_poison()
  dax: add .recovery_write dax_operation
  dax: introduce DAX_RECOVERY_WRITE dax access mode
  mce: fix set_mce_nospec to always unmap the whole page
  x86/mce: relocate set{clear}_mce_nospec() functions
  acpi/nfit: rely on mce->misc to determine poison granularity
  testing: nvdimm: asm/mce.h is not needed in nfit.c
  testing: nvdimm: iomap: make __nfit_test_ioremap a macro
  nvdimm: Allow overwrite in the presence of disabled dimms
  tools/testing/nvdimm: remove unneeded flush_workqueue
2022-05-27 15:49:30 -07:00
Chao Yu 2d1fe8a86b f2fs: fix to tag gcing flag on page during file defragment
In order to garantee migrated data be persisted during checkpoint,
otherwise out-of-order persistency between data and node may cause
data corruption after SPOR.

Signed-off-by: Chao Yu <chao.yu@oppo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-27 13:58:42 -07:00
Yufen Yu 054cb2891b f2fs: replace F2FS_I(inode) and sbi by the local variable
We have define 'fi' at the begin of the functions, just use it,
rather than use F2FS_I(inode) again.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: replace sbi]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-27 13:58:23 -07:00
Linus Torvalds 6f664045c8 Not a lot of material this cycle. Many singleton patches against various
subsystems.   Most notably some maintenance work in ocfs2 and initramfs.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo/6xQAKCRDdBJ7gKXxA
 jkD9AQCPczLBbRWpe1edL+5VHvel9ePoHQmvbHQnufdTh9rB5QEAu0Uilxz4q9cx
 xSZypNhj2n9f8FCYca/ZrZneBsTnAA8=
 =AJEO
 -----END PGP SIGNATURE-----

Merge tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull misc updates from Andrew Morton:
 "The non-MM patch queue for this merge window.

  Not a lot of material this cycle. Many singleton patches against
  various subsystems. Most notably some maintenance work in ocfs2
  and initramfs"

* tag 'mm-nonmm-stable-2022-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (65 commits)
  kcov: update pos before writing pc in trace function
  ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
  ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
  fs/ntfs: remove redundant variable idx
  fat: remove time truncations in vfat_create/vfat_mkdir
  fat: report creation time in statx
  fat: ignore ctime updates, and keep ctime identical to mtime in memory
  fat: split fat_truncate_time() into separate functions
  MAINTAINERS: add Muchun as a memcg reviewer
  proc/sysctl: make protected_* world readable
  ia64: mca: drop redundant spinlock initialization
  tty: fix deadlock caused by calling printk() under tty_port->lock
  relay: remove redundant assignment to pointer buf
  fs/ntfs3: validate BOOT sectors_per_clusters
  lib/string_helpers: fix not adding strarray to device's resource list
  kernel/crash_core.c: remove redundant check of ck_cmdline
  ELF, uapi: fixup ELF_ST_TYPE definition
  ipc/mqueue: use get_tree_nodev() in mqueue_get_tree()
  ipc: update semtimedop() to use hrtimer
  ipc/sem: remove redundant assignments
  ...
2022-05-27 11:22:03 -07:00
David Howells 189b0ddc24 pipe: Fix missing lock in pipe_resize_ring()
pipe_resize_ring() needs to take the pipe->rd_wait.lock spinlock to
prevent post_one_notification() from trying to insert into the ring
whilst the ring is being replaced.

The occupancy check must be done after the lock is taken, and the lock
must be taken after the new ring is allocated.

The bug can lead to an oops looking something like:

 BUG: KASAN: use-after-free in post_one_notification.isra.0+0x62e/0x840
 Read of size 4 at addr ffff88801cc72a70 by task poc/27196
 ...
 Call Trace:
  post_one_notification.isra.0+0x62e/0x840
  __post_watch_notification+0x3b7/0x650
  key_create_or_update+0xb8b/0xd20
  __do_sys_add_key+0x175/0x340
  __x64_sys_add_key+0xbe/0x140
  do_syscall_64+0x5c/0xc0
  entry_SYSCALL_64_after_hwframe+0x44/0xae

Reported by Selim Enes Karaduman @Enesdex working with Trend Micro Zero
Day Initiative.

Fixes: c73be61ced ("pipe: Add general notification queue support")
Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-17291
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-27 10:45:59 -07:00
Steve French 44a48081fc smb3: remove unneeded null check in cifs_readdir
Coverity pointed out an unneeded check.

Addresses-Coverity: 1518030 ("Null pointer dereferences")
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-27 12:05:47 -05:00
Haowen Bai 532aef5912 ubifs: Use NULL instead of using plain integer as pointer
This fixes the following sparse warnings:
fs/ubifs/xattr.c:680:58: warning: Using plain integer as NULL pointer

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 16:24:12 +02:00
Minghao Chi 5bff56edab ubifs: Simplify the return expression of run_gc()
Simplify the return expression.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn>
Reviewed-by: Tudor Ambarus <tudor.ambarus@microchip.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 16:21:20 +02:00
Baokun Li c14adb1cf7 jffs2: fix memory leak in jffs2_do_fill_super
If jffs2_iget() or d_make_root() in jffs2_do_fill_super() returns
an error, we can observe the following kmemleak report:

--------------------------------------------
unreferenced object 0xffff888105a65340 (size 64):
  comm "mount", pid 710, jiffies 4302851558 (age 58.239s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff859c45e5>] kmem_cache_alloc_trace+0x475/0x8a0
    [<ffffffff86160146>] jffs2_sum_init+0x96/0x1a0
    [<ffffffff86140e25>] jffs2_do_mount_fs+0x745/0x2120
    [<ffffffff86149fec>] jffs2_do_fill_super+0x35c/0x810
    [<ffffffff8614aae9>] jffs2_fill_super+0x2b9/0x3b0
    [...]
unreferenced object 0xffff8881bd7f0000 (size 65536):
  comm "mount", pid 710, jiffies 4302851558 (age 58.239s)
  hex dump (first 32 bytes):
    bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
    bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb  ................
  backtrace:
    [<ffffffff858579ba>] kmalloc_order+0xda/0x110
    [<ffffffff85857a11>] kmalloc_order_trace+0x21/0x130
    [<ffffffff859c2ed1>] __kmalloc+0x711/0x8a0
    [<ffffffff86160189>] jffs2_sum_init+0xd9/0x1a0
    [<ffffffff86140e25>] jffs2_do_mount_fs+0x745/0x2120
    [<ffffffff86149fec>] jffs2_do_fill_super+0x35c/0x810
    [<ffffffff8614aae9>] jffs2_fill_super+0x2b9/0x3b0
    [...]
--------------------------------------------

This is because the resources allocated in jffs2_sum_init() are not
released. Call jffs2_sum_exit() to release these resources to solve
the problem.

Fixes: e631ddba58 ("[JFFS2] Add erase block summary support (mount time improvement)")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 16:17:11 +02:00
Haowen Bai 22abf318c3 jffs2: Use kzalloc instead of kmalloc/memset
Use kzalloc rather than duplicating its implementation, which
makes code simple and easy to understand.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
[rw: Fixed printk string]
Signed-off-by: Richard Weinberger <richard@nod.at>
2022-05-27 16:12:55 +02:00
Linus Torvalds 780d8ce716 v5.19 pull request
Small collection of incremental improvement patches:
 
 - Minor code cleanup patches, comment improvements, etc from static tools
 
 - Clean the some of the kernel caps, reducing the historical stealth uAPI
   leftovers
 
 - Bug fixes and minor changes for rdmavt, hns, rxe, irdma
 
 - Remove unimplemented cruft from rxe
 
 - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs layer
 
 - flush_workqueue(system_unbound_wq) removal
 
 - Ensure rxe waits for objects to be unused before allowing the core to
   free them
 
 - Several rc quality bug fixes for hfi1
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRRRCHOFoQz/8F5bUaFwuHvBreFYQUCYo+NxgAKCRCFwuHvBreF
 YbqSAQDJ+QolaATUvOQUPLbuLopUCJLe95VS15Kl3SNXiVUUFAEA8DLL1s6+WShd
 AgypUxGHipx5BAytrn45/WiwuDeEbQ8=
 =jgTl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "Small collection of incremental improvement patches:

   - Minor code cleanup patches, comment improvements, etc from static
     tools

   - Clean the some of the kernel caps, reducing the historical stealth
     uAPI leftovers

   - Bug fixes and minor changes for rdmavt, hns, rxe, irdma

   - Remove unimplemented cruft from rxe

   - Reorganize UMR QP code in mlx5 to avoid going through the IB verbs
     layer

   - flush_workqueue(system_unbound_wq) removal

   - Ensure rxe waits for objects to be unused before allowing the core
     to free them

   - Several rc quality bug fixes for hfi1"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (67 commits)
  RDMA/rtrs-clt: Fix one kernel-doc comment
  RDMA/hfi1: Remove all traces of diagpkt support
  RDMA/hfi1: Consolidate software versions
  RDMA/hfi1: Remove pointless driver version
  RDMA/hfi1: Fix potential integer multiplication overflow errors
  RDMA/hfi1: Prevent panic when SDMA is disabled
  RDMA/hfi1: Prevent use of lock before it is initialized
  RDMA/rxe: Fix an error handling path in rxe_get_mcg()
  IB/core: Fix typo in comment
  RDMA/core: Fix typo in comment
  IB/hf1: Fix typo in comment
  IB/qib: Fix typo in comment
  IB/iser: Fix typo in comment
  RDMA/mlx4: Avoid flush_scheduled_work() usage
  IB/isert: Avoid flush_scheduled_work() usage
  RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()
  RDMA/qedr: Remove unnecessary synchronize_irq() before free_irq()
  RDMA/hns: Use hr_reg_read() instead of remaining roce_get_xxx()
  RDMA/hns: Use hr_reg_xxx() instead of remaining roce_set_xxx()
  RDMA/irdma: Add SW mechanism to generate completions on error
  ...
2022-05-26 21:08:40 -07:00
Linus Torvalds 6d29d7fe4f NFSD 5.19 Release Notes
We introduce "courteous server" in this release. Previously NFSD
 would purge open and lock state for an unresponsive client after
 one lease period (typically 90 seconds). Now, after one lease
 period, another client can open and lock those files and the
 unresponsive client's lease is purged; otherwise if the unrespon-
 sive client's open and lock state is uncontended, the server retains
 that open and lock state for up to 24 hours, allowing the client's
 workload to resume after a lengthy network partition.
 
 A longstanding issue with NFSv4 file creation is also addressed.
 Previously a file creation can fail internally, returning an error
 to the client, but leave the newly created file in place as an
 artifact. The file creation code path has been reorganized so that
 internal failures and race conditions are less likely to result in
 an unwanted file creation.
 
 A fault injector has been added to help exercise paths that are run
 during kernel metadata cache invalidation. These caches contain
 information maintained by user space about exported filesystems.
 Many of our test workloads do not trigger cache invalidation.
 
 There is one patch that is needed to support PREEMPT_RT and a fix
 for an ancient "sleep while spin-locked" splat that seems to have
 become easier to hit since v5.18-rc3.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmKPliAACgkQM2qzM29m
 f5dB3BAAorPa2L8xu5P1Ge1oTNogNSOVRkLPDzEkfEwK07ZM2qvz78eMZGkMziJ/
 strorvBWl3SWBlVtTePgNpJUjgYQ75MRRwaX7Qh2WuHeRKm1JlZm0/NId3+zKgbh
 N40QI20jdswWcNDuhidxVFFWurd09GlcM4z1cu8gZLbfthkiUOjZoPiLkXeNcvhk
 7wC9GiueWxHefYQQDAKh1nQS/L0GG1EkzJdJo7WUVAldZ9qVY9LpmJVMRqrBBbta
 XrFYfpeY1zFFDY4Qolyz5PUJSeQuDj9PctlhoZ6B1hp56PD/6yaqVhYXiPxtlALj
 tITtktfiekULZkgfvfvyzssCv+wkbYiaEBZcSSCauR7dkGOmBmajO+cf7vpsERgE
 fbCU8DWGk78SMeehdCrO+26cV37VP+8c2t2Txq/rG5Eq4ZoCi++Hj5poRboFLqb+
 oom+0Ee0LfcAKXkxH5gWTPTblHo49GzGitPZtRzTgZ9uFnVwvEaJ4+t0ij0J8JpL
 HuVtWrg5/REhqpEvOSwF0sRmkYWLTu7KdueGn/iZ8xUi7GHEue01NsVkClohKJcR
 WOjWrbNCNF/LJaG88MX0z5u7IO7s9bOHphd7PJ92vR+4YsehW3uRhk+rNi2ZBqQz
 hzULfu8BiaicV9fdB/hDcMmKQD6U6due2AVVPtxTf5XY+CHQNRY=
 =phE1
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

Pull nfsd updates from Chuck Lever:
 "We introduce 'courteous server' in this release. Previously NFSD would
  purge open and lock state for an unresponsive client after one lease
  period (typically 90 seconds). Now, after one lease period, another
  client can open and lock those files and the unresponsive client's
  lease is purged; otherwise if the unresponsive client's open and lock
  state is uncontended, the server retains that open and lock state for
  up to 24 hours, allowing the client's workload to resume after a
  lengthy network partition.

  A longstanding issue with NFSv4 file creation is also addressed.
  Previously a file creation can fail internally, returning an error to
  the client, but leave the newly created file in place as an artifact.
  The file creation code path has been reorganized so that internal
  failures and race conditions are less likely to result in an unwanted
  file creation.

  A fault injector has been added to help exercise paths that are run
  during kernel metadata cache invalidation. These caches contain
  information maintained by user space about exported filesystems. Many
  of our test workloads do not trigger cache invalidation.

  There is one patch that is needed to support PREEMPT_RT and a fix for
  an ancient 'sleep while spin-locked' splat that seems to have become
  easier to hit since v5.18-rc3"

* tag 'nfsd-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (36 commits)
  NFSD: nfsd_file_put() can sleep
  NFSD: Add documenting comment for nfsd4_release_lockowner()
  NFSD: Modernize nfsd4_release_lockowner()
  NFSD: Fix possible sleep during nfsd4_release_lockowner()
  nfsd: destroy percpu stats counters after reply cache shutdown
  nfsd: Fix null-ptr-deref in nfsd_fill_super()
  nfsd: Unregister the cld notifier when laundry_wq create failed
  SUNRPC: Use RMW bitops in single-threaded hot paths
  NFSD: Clean up the show_nf_flags() macro
  NFSD: Trace filecache opens
  NFSD: Move documenting comment for nfsd4_process_open2()
  NFSD: Fix whitespace
  NFSD: Remove dprintk call sites from tail of nfsd4_open()
  NFSD: Instantiate a struct file when creating a regular NFSv4 file
  NFSD: Clean up nfsd_open_verified()
  NFSD: Remove do_nfsd_create()
  NFSD: Refactor NFSv4 OPEN(CREATE)
  NFSD: Refactor NFSv3 CREATE
  NFSD: Refactor nfsd_create_setattr()
  NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
  ...
2022-05-26 20:52:24 -07:00
Darrick J. Wong efc2efeba1 xfs: move xfs_attr_use_log_assist usage out of libxfs
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that sets up logged xattr updates out of libxfs and into
xfs_xattr.c so that libxfs no longer has to know about xlog_* functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:34:04 +10:00
Darrick J. Wong d9c61ccb3b xfs: move xfs_attr_use_log_assist out of xfs_log.c
The LARP patchset added an awkward coupling point between libxfs and
what would be libxlog, if the XFS log were actually its own library.
Move the code that enables logged xattr updates out of "lib"xlog and into
xfs_xattr.c so that it no longer has to know about xlog_* functions.

While we're at it, give xfs_xattr.c its own header file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:33:29 +10:00
Darrick J. Wong 202865cc21 xfs: warn about LARP once per mount
Since LARP is an experimental debug-only feature, we should try to warn
about it being in use once per mount, not once per reboot.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:32:07 +10:00
Darrick J. Wong df5660cf63 xfs: implement per-mount warnings for scrub and shrink usage
Currently, we don't have a consistent story around logging when an
EXPERIMENTAL feature gets turned on at runtime -- online fsck and shrink
log a message once per day across all mounts, and the recently merged
LARP mode only ever does it once per insmod cycle or reboot.

Because EXPERIMENTAL tags are supposed to go away eventually, convert
the existing daily warnings into state flags that travel with the mount,
and warn once per mount.  Making this an opstate flag means that we'll
be able to capture the experimental usage in the ftrace output too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:31:34 +10:00
Darrick J. Wong 374037966d xfs: don't log every time we clear the log incompat flags
There's no need to spam the logs every time we clear the log incompat
flags -- if someone is periodically using one of these features, they'll
be cleared every time the log tries to clean itself, which can get
pretty chatty:

$ dmesg | grep -i clear
[ 5363.894711] XFS (sdd): Clearing log incompat feature flags.
[ 5365.157516] XFS (sdd): Clearing log incompat feature flags.
[ 5369.388543] XFS (sdd): Clearing log incompat feature flags.
[ 5371.281246] XFS (sdd): Clearing log incompat feature flags.

These aren't high value messages either -- nothing's gone wrong, and
nobody's trying anything tricky.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:29:51 +10:00
Darrick J. Wong 910bbdf2f4 xfs: convert buf_cancel_table allocation to kmalloc_array
While we're messing around with how recovery allocates and frees the
buffer cancellation table, convert the allocation to use kmalloc_array
instead of the old kmem_alloc APIs, and make it handle a null return,
even though that's not likely.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:27:19 +10:00
Darrick J. Wong 8db074bd84 xfs: don't leak xfs_buf_cancel structures when recovery fails
If log recovery fails, we free the memory used by the buffer
cancellation buckets, but we don't actually traverse each bucket list to
free the individual xfs_buf_cancel objects.  This leads to a memory
leak, as reported by kmemleak in xfs/051:

unreferenced object 0xffff888103629560 (size 32):
  comm "mount", pid 687045, jiffies 4296935916 (age 10.752s)
  hex dump (first 32 bytes):
    08 d3 0a 01 00 00 00 00 08 00 00 00 01 00 00 00  ................
    d0 f5 0b 92 81 88 ff ff 80 64 64 25 81 88 ff ff  .........dd%....
  backtrace:
    [<ffffffffa0317c83>] kmem_alloc+0x73/0x140 [xfs]
    [<ffffffffa03234a9>] xlog_recover_buf_commit_pass1+0x139/0x200 [xfs]
    [<ffffffffa032dc27>] xlog_recover_commit_trans+0x307/0x350 [xfs]
    [<ffffffffa032df15>] xlog_recovery_process_trans+0xa5/0xe0 [xfs]
    [<ffffffffa032e12d>] xlog_recover_process_data+0x8d/0x140 [xfs]
    [<ffffffffa032e49d>] xlog_do_recovery_pass+0x19d/0x740 [xfs]
    [<ffffffffa032f22d>] xlog_do_log_recovery+0x6d/0x150 [xfs]
    [<ffffffffa032f343>] xlog_do_recover+0x33/0x1d0 [xfs]
    [<ffffffffa032faba>] xlog_recover+0xda/0x190 [xfs]
    [<ffffffffa03194bc>] xfs_log_mount+0x14c/0x360 [xfs]
    [<ffffffffa030bfed>] xfs_mountfs+0x50d/0xa60 [xfs]
    [<ffffffffa03124b5>] xfs_fs_fill_super+0x6a5/0x950 [xfs]
    [<ffffffff812b92a5>] get_tree_bdev+0x175/0x280
    [<ffffffff812b7c3a>] vfs_get_tree+0x1a/0x80
    [<ffffffff812e366f>] path_mount+0x6ff/0xaa0
    [<ffffffff812e3b13>] __x64_sys_mount+0x103/0x140

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:26:38 +10:00
Darrick J. Wong 2723234923 xfs: refactor buffer cancellation table allocation
Move the code that allocates and frees the buffer cancellation tables
used by log recovery into the file that actually uses the tables.  This
is a precursor to some cleanups and a memory leak fix.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:26:17 +10:00
Darrick J. Wong a54f78def7 xfs: don't leak btree cursor when insrec fails after a split
The recent patch to improve btree cycle checking caused a regression
when I rebased the in-memory btree branch atop the 5.19 for-next branch,
because in-memory short-pointer btrees do not have AG numbers.  This
produced the following complaint from kmemleak:

unreferenced object 0xffff88803d47dde8 (size 264):
  comm "xfs_io", pid 4889, jiffies 4294906764 (age 24.072s)
  hex dump (first 32 bytes):
    90 4d 0b 0f 80 88 ff ff 00 a0 bd 05 80 88 ff ff  .M..............
    e0 44 3a a0 ff ff ff ff 00 df 08 06 80 88 ff ff  .D:.............
  backtrace:
    [<ffffffffa0388059>] xfbtree_dup_cursor+0x49/0xc0 [xfs]
    [<ffffffffa029887b>] xfs_btree_dup_cursor+0x3b/0x200 [xfs]
    [<ffffffffa029af5d>] __xfs_btree_split+0x6ad/0x820 [xfs]
    [<ffffffffa029b130>] xfs_btree_split+0x60/0x110 [xfs]
    [<ffffffffa029f6da>] xfs_btree_make_block_unfull+0x19a/0x1f0 [xfs]
    [<ffffffffa029fada>] xfs_btree_insrec+0x3aa/0x810 [xfs]
    [<ffffffffa029fff3>] xfs_btree_insert+0xb3/0x240 [xfs]
    [<ffffffffa02cb729>] xfs_rmap_insert+0x99/0x200 [xfs]
    [<ffffffffa02cf142>] xfs_rmap_map_shared+0x192/0x5f0 [xfs]
    [<ffffffffa02cf60b>] xfs_rmap_map_raw+0x6b/0x90 [xfs]
    [<ffffffffa0384a85>] xrep_rmap_stash+0xd5/0x1d0 [xfs]
    [<ffffffffa0384dc0>] xrep_rmap_visit_bmbt+0xa0/0xf0 [xfs]
    [<ffffffffa0384fb6>] xrep_rmap_scan_iext+0x56/0xa0 [xfs]
    [<ffffffffa03850d8>] xrep_rmap_scan_ifork+0xd8/0x160 [xfs]
    [<ffffffffa0385195>] xrep_rmap_scan_inode+0x35/0x80 [xfs]
    [<ffffffffa03852ee>] xrep_rmap_find_rmaps+0x10e/0x270 [xfs]

I noticed that xfs_btree_insrec has a bunch of debug code that return
out of the function immediately, without freeing the "new" btree cursor
that can be returned when _make_block_unfull calls xfs_btree_split.  Fix
the error return in this function to free the btree cursor.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:22:56 +10:00
Darrick J. Wong 86d40f1e49 xfs: purge dquots after inode walk fails during quotacheck
xfs/434 and xfs/436 have been reporting occasional memory leaks of
xfs_dquot objects.  These tests themselves were the messenger, not the
culprit, since they unload the xfs module, which trips the slub
debugging code while tearing down all the xfs slab caches:

=============================================================================
BUG xfs_dquot (Tainted: G        W        ): Objects remaining in xfs_dquot on __kmem_cache_shutdown()
-----------------------------------------------------------------------------

Slab 0xffffea000606de00 objects=30 used=5 fp=0xffff888181b78a78 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff)
CPU: 0 PID: 3953166 Comm: modprobe Tainted: G        W         5.18.0-rc6-djwx #rc6 d5824be9e46a2393677bda868f9b154d917ca6a7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014

Since we don't generally rmmod the xfs module between fstests, this
means that xfs/434 is really just the canary in the coal mine --
something leaked a dquot, but we don't know who.  After days of pounding
on fstests with kmemleak enabled, I finally got it to spit this out:

unreferenced object 0xffff8880465654c0 (size 536):
  comm "u10:4", pid 88, jiffies 4294935810 (age 29.512s)
  hex dump (first 32 bytes):
    60 4a 56 46 80 88 ff ff 58 ea e4 5c 80 88 ff ff  `JVF....X..\....
    00 e0 52 49 80 88 ff ff 01 00 01 00 00 00 00 00  ..RI............
  backtrace:
    [<ffffffffa0740f6c>] xfs_dquot_alloc+0x2c/0x530 [xfs]
    [<ffffffffa07443df>] xfs_qm_dqread+0x6f/0x330 [xfs]
    [<ffffffffa07462a2>] xfs_qm_dqget+0x132/0x4e0 [xfs]
    [<ffffffffa0756bb0>] xfs_qm_quotacheck_dqadjust+0xa0/0x3e0 [xfs]
    [<ffffffffa075724d>] xfs_qm_dqusage_adjust+0x35d/0x4f0 [xfs]
    [<ffffffffa06c9068>] xfs_iwalk_ag_recs+0x348/0x5d0 [xfs]
    [<ffffffffa06c95d3>] xfs_iwalk_run_callbacks+0x273/0x540 [xfs]
    [<ffffffffa06c9e8d>] xfs_iwalk_ag+0x5ed/0x890 [xfs]
    [<ffffffffa06ca22f>] xfs_iwalk_ag_work+0xff/0x170 [xfs]
    [<ffffffffa06d22c9>] xfs_pwork_work+0x79/0x130 [xfs]
    [<ffffffff81170bb2>] process_one_work+0x672/0x1040
    [<ffffffff81171b1b>] worker_thread+0x59b/0xec0
    [<ffffffff8118711e>] kthread+0x29e/0x340
    [<ffffffff810032bf>] ret_from_fork+0x1f/0x30

Now we know that quotacheck is at fault, but even this report was
canaryish -- it was triggered by xfs/494, which doesn't actually mount
any filesystems.  (kmemleak can be a little slow to notice leaks, even
with fstests repeatedly whacking it to look for them.)  Looking at the
*previous* fstest, however, showed that the test run before xfs/494 was
xfs/117.  The tipoff to the problem is in this excerpt from dmesg:

XFS (sda4): Quotacheck needed: Please wait.
XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0xdb/0x7b0 [xfs], inode 0x119 dinode
XFS (sda4): Unmount and run xfs_repair
XFS (sda4): First 128 bytes of corrupted metadata buffer:
00000000: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00  IN..............
00000010: 00 00 00 01 00 00 00 00 00 90 57 54 54 1a 4c 68  ..........WTT.Lh
00000020: 81 f9 7d e1 6d ee 16 00 34 bd 7d e1 6d ee 16 00  ..}.m...4.}.m...
00000030: 34 bd 7d e1 6d ee 16 00 00 00 00 00 00 00 00 00  4.}.m...........
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000050: 00 00 00 02 00 00 00 00 00 00 00 00 96 80 f3 ab  ................
00000060: ff ff ff ff da 57 7b 11 00 00 00 00 00 00 00 03  .....W{.........
00000070: 00 00 00 01 00 00 00 10 00 00 00 00 00 00 00 08  ................
XFS (sda4): Quotacheck: Unsuccessful (Error -117): Disabling quotas.

The dinode verifier decided that the inode was corrupt, which causes
iget to return with EFSCORRUPTED.  Since this happened during
quotacheck, it is obvious that the kernel aborted the inode walk on
account of the corruption error and disabled quotas.  Unfortunately, we
neglect to purge the dquot cache before doing that, which is how the
dquots leaked.

The problems started 10 years ago in commit b84a3a, when the dquot lists
were converted to a radix tree, but the error handling behavior was not
correctly preserved -- in that commit, if the bulkstat failed and
usrquota was enabled, the bulkstat failure code would be overwritten by
the result of flushing all the dquots to disk.  As long as that
succeeds, we'd continue the quota mount as if everything were ok, but
instead we're now operating with a corrupt inode and incorrect quota
usage counts.  I didn't notice this bug in 2019 when I wrote commit
ebd126a, which changed quotacheck to skip the dqflush when the scan
doesn't complete due to inode walk failures.

Introduced-by: b84a3a9675 ("xfs: remove the per-filesystem list of dquots")
Fixes: ebd126a651 ("xfs: convert quotacheck to use the new iwalk functions")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:21:43 +10:00
Dave Chinner 56486f3071 xfs: assert in xfs_btree_del_cursor should take into account error
xfs/538 on a 1kB block filesystem failed with this assert:

XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448

The problem was that an allocation failed unexpectedly in
xfs_bmbt_alloc_block() after roughly 150,000 minlen allocation error
injections, resulting in an EFSCORRUPTED error being returned to
xfs_bmapi_write(). The error occurred on extent-to-btree format
conversion allocating the new root block:

 RIP: 0010:xfs_bmbt_alloc_block+0x177/0x210
 Call Trace:
  <TASK>
  xfs_btree_new_iroot+0xdf/0x520
  xfs_btree_make_block_unfull+0x10d/0x1c0
  xfs_btree_insrec+0x364/0x790
  xfs_btree_insert+0xaa/0x210
  xfs_bmap_add_extent_hole_real+0x1fe/0x9a0
  xfs_bmapi_allocate+0x34c/0x420
  xfs_bmapi_write+0x53c/0x9c0
  xfs_alloc_file_space+0xee/0x320
  xfs_file_fallocate+0x36b/0x450
  vfs_fallocate+0x148/0x340
  __x64_sys_fallocate+0x3c/0x70
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xa

Why the allocation failed at this point is unknown, but is likely
that we ran the transaction out of reserved space and filesystem out
of space with bmbt blocks because of all the minlen allocations
being done causing worst case fragmentation of a large allocation.

Regardless of the cause, we've then called xfs_bmapi_finish() which
calls xfs_btree_del_cursor(cur, error) to tear down the cursor.

So we have a failed operation, error != 0, cur->bc_ino.allocated > 0
and the filesystem is still up. The assert fails to take into
account that allocation can fail with an error and the transaction
teardown will shut the filesystem down if necessary. i.e. the
assert needs to check "|| error != 0" as well, because at this point
shutdown is pending because the current transaction is dirty....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:21:09 +10:00
Dave Chinner 5b55cbc2d7 xfs: don't assert fail on perag references on teardown
Not fatal, the assert is there to catch developer attention. I'm
seeing this occasionally during recoveryloop testing after a
shutdown, and I don't want this to stop an overnight recoveryloop
run as it is currently doing.

Convert the ASSERT to a XFS_IS_CORRUPT() check so it will dump a
corruption report into the log and cause a test failure that way,
but it won't stop the machine dead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:21:04 +10:00
Dave Chinner 5672225e8f xfs: avoid unnecessary runtime sibling pointer endian conversions
Commit dc04db2aa7 has caused a small aim7 regression, showing a
small increase in CPU usage in __xfs_btree_check_sblock() as a
result of the extra checking.

This is likely due to the endian conversion of the sibling poitners
being unconditional instead of relying on the compiler to endian
convert the NULL pointer at compile time and avoiding the runtime
conversion for this common case.

Rework the checks so that endian conversion of the sibling pointers
is only done if they are not null as the original code did.

.... and these need to be "inline" because the compiler completely
fails to inline them automatically like it should be doing.

$ size fs/xfs/libxfs/xfs_btree.o*
   text	   data	    bss	    dec	    hex	filename
  51874	    240	      0	  52114	   cb92 fs/xfs/libxfs/xfs_btree.o.orig
  51562	    240	      0	  51802	   ca5a fs/xfs/libxfs/xfs_btree.o.inline

Just when you think the tools have advanced sufficiently we don't
have to care about stuff like this anymore, along comes a reminder
that *our tools still suck*.

Fixes: dc04db2aa7 ("xfs: detect self referencing btree sibling pointers")
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2022-05-27 10:20:45 +10:00
Linus Torvalds 44d35720c9 sysctl changes for v5.19-rc1
For two kernel releases now kernel/sysctl.c has been being cleaned up
 slowly, since the tables were grossly long, sprinkled with tons of #ifdefs and
 all this caused merge conflicts with one susbystem or another.
 
 This tree was put together to help try to avoid conflicts with these cleanups
 going on different trees at time. So nothing exciting on this pull request,
 just cleanups.
 
 I actually had this sysctl-next tree up since v5.18 but I missed sending a
 pull request for it on time during the last merge window. And so these changes
 have been being soaking up on sysctl-next and so linux-next for a while.
 The last change was merged May 4th.
 
 Most of the compile issues were reported by 0day and fixed.
 
 To help avoid a conflict with bpf folks at Daniel Borkmann's request
 I merged bpf-next/pr/bpf-sysctl into sysctl-next to get the effor which
 moves the BPF sysctls from kernel/sysctl.c to BPF core.
 
 Possible merge conflicts and known resolutions as per linux-next:
 
 bfp:
 https://lkml.kernel.org/r/20220414112812.652190b5@canb.auug.org.au
 
 rcu:
 https://lkml.kernel.org/r/20220420153746.4790d532@canb.auug.org.au
 
 powerpc:
 https://lkml.kernel.org/r/20220520154055.7f964b76@canb.auug.org.au
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmKOq8ASHG1jZ3JvZkBr
 ZXJuZWwub3JnAAoJEM4jHQowkoinDAkQAJVo5YVM9f74UwYp4PQhTpjxJBCjRoZD
 z1u9bp5rMj2ujTC8Fr7VmzKaHrb8+r1C1WvCvZtIzemYNB4lZUrHpVDYfXuXiPRB
 ihPmEjhlPO5PFBx6cVCpI3cu9bEhG00rLc1QXnABx/pXwNPcOTJAGZJVamZvqubk
 chjgZrb7N+adHPfvS55v1+zpwdeKfpp5U3zuu5qlT/nn0GS0HCVzOj5fj4oC4wtJ
 IqfUubo+FX50Ga58yQABWNrjaPD9Crykz5ohVazy3ElQl0hJ4VsK65ct3blqc2vz
 1Bb8kPpWuv6aZ5nr1lCVE8qvF4ZIL33ySvpg5BSdWLQEDrBbSpzvJe9Yn7wgR+eq
 y7fhpO24+zRM82EoDMEvyxX9u1n1RsvoXRtf3ds9BGf63MUxk8a1cgjlU6vuyO2U
 JhDmfM1xzdKvPoY4COOnHzcAiIqzItTqKd09N5y0cahmYstROU8lvp9huhTAHqk1
 SjQMbLIZG7OnX8ZeQcR1EB8sq/IOPZT48ejj0iJmQ8FyMaep71MOQLYyLPAq4lgh
 JHXm8P6QdB57jfJbqAeNSyZoK0qdxOUR/83Zcah7Jjns6vkju1DNatEsaEEI2y2M
 4n7/rkHeZ3TyFHBUX4e9FomKvGLsAalDBRiqsuxLSOPMU8rGrNLAslOAtKwvp90X
 4ht3M2VP098l
 =btwh
 -----END PGP SIGNATURE-----

Merge tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux

Pull sysctl updates from Luis Chamberlain:
 "For two kernel releases now kernel/sysctl.c has been being cleaned up
  slowly, since the tables were grossly long, sprinkled with tons of
  #ifdefs and all this caused merge conflicts with one susbystem or
  another.

  This tree was put together to help try to avoid conflicts with these
  cleanups going on different trees at time. So nothing exciting on this
  pull request, just cleanups.

  Thanks a lot to the Uniontech and Huawei folks for doing some of this
  nasty work"

* tag 'sysctl-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (28 commits)
  sched: Fix build warning without CONFIG_SYSCTL
  reboot: Fix build warning without CONFIG_SYSCTL
  kernel/kexec_core: move kexec_core sysctls into its own file
  sysctl: minor cleanup in new_dir()
  ftrace: fix building with SYSCTL=y but DYNAMIC_FTRACE=n
  fs/proc: Introduce list_for_each_table_entry for proc sysctl
  mm: fix unused variable kernel warning when SYSCTL=n
  latencytop: move sysctl to its own file
  ftrace: fix building with SYSCTL=n but DYNAMIC_FTRACE=y
  ftrace: Fix build warning
  ftrace: move sysctl_ftrace_enabled to ftrace.c
  kernel/do_mount_initrd: move real_root_dev sysctls to its own file
  kernel/delayacct: move delayacct sysctls to its own file
  kernel/acct: move acct sysctls to its own file
  kernel/panic: move panic sysctls to its own file
  kernel/lockdep: move lockdep sysctls to its own file
  mm: move page-writeback sysctls to their own file
  mm: move oom_kill sysctls to their own file
  kernel/reboot: move reboot sysctls to its own file
  sched: Move energy_aware sysctls to topology.c
  ...
2022-05-26 16:57:20 -07:00
Linus Torvalds 98931dd95f Yang Shi has improved the behaviour of khugepaged collapsing of readonly
file-backed transparent hugepages.
 
 Johannes Weiner has arranged for zswap memory use to be tracked and
 managed on a per-cgroup basis.
 
 Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime
 enablement of the recent huge page vmemmap optimization feature.
 
 Baolin Wang contributes a series to fix some issues around hugetlb
 pagetable invalidation.
 
 Zhenwei Pi has fixed some interactions between hwpoisoned pages and
 virtualization.
 
 Tong Tiangen has enabled the use of the presently x86-only
 page_table_check debugging feature on arm64 and riscv.
 
 David Vernet has done some fixup work on the memcg selftests.
 
 Peter Xu has taught userfaultfd to handle write protection faults against
 shmem- and hugetlbfs-backed files.
 
 More DAMON development from SeongJae Park - adding online tuning of the
 feature and support for monitoring of fixed virtual address ranges.  Also
 easier discovery of which monitoring operations are available.
 
 Nadav Amit has done some optimization of TLB flushing during mprotect().
 
 Neil Brown continues to labor away at improving our swap-over-NFS support.
 
 David Hildenbrand has some fixes to anon page COWing versus
 get_user_pages().
 
 Peng Liu fixed some errors in the core hugetlb code.
 
 Joao Martins has reduced the amount of memory consumed by device-dax's
 compound devmaps.
 
 Some cleanups of the arch-specific pagemap code from Anshuman Khandual.
 
 Muchun Song has found and fixed some errors in the TLB flushing of
 transparent hugepages.
 
 Roman Gushchin has done more work on the memcg selftests.
 
 And, of course, many smaller fixes and cleanups.  Notably, the customary
 million cleanup serieses from Miaohe Lin.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo52xQAKCRDdBJ7gKXxA
 jtJFAQD238KoeI9z5SkPMaeBRYSRQmNll85mxs25KapcEgWgGQD9FAb7DJkqsIVk
 PzE+d9hEfirUGdL6cujatwJ6ejYR8Q8=
 =nFe6
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
2022-05-26 12:32:41 -07:00
Chuck Lever 08af54b3e5 NFSD: nfsd_file_put() can sleep
Now that there are no more callers of nfsd_file_put() that might
hold a spin lock, ensure the lockdep infrastructure can catch
newly introduced calls to nfsd_file_put() made while a spinlock
is held.

Link: https://lore.kernel.org/linux-nfs/ece7fd1d-5fb3-5155-54ba-347cfc19bd9a@oracle.com/T/#mf1855552570cf9a9c80d1e49d91438cd9085aada
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
2022-05-26 10:50:51 -04:00
Chuck Lever 043862b09c NFSD: Add documenting comment for nfsd4_release_lockowner()
And return explicit nfserr values that match what is documented in the
new comment / API contract.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-26 10:50:50 -04:00
Chuck Lever bd8fdb6e54 NFSD: Modernize nfsd4_release_lockowner()
Refactor: Use existing helpers that other lock operations use. This
change removes several automatic variables, so re-organize the
variable declarations for readability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2022-05-26 10:50:50 -04:00
Chuck Lever ce3c4ad7f4 NFSD: Fix possible sleep during nfsd4_release_lockowner()
nfsd4_release_lockowner() holds clp->cl_lock when it calls
check_for_locks(). However, check_for_locks() calls nfsd_file_get()
/ nfsd_file_put() to access the backing inode's flc_posix list, and
nfsd_file_put() can sleep if the inode was recently removed.

Let's instead rely on the stateowner's reference count to gate
whether the release is permitted. This should be a reliable
indication of locks-in-use since file lock operations and
->lm_get_owner take appropriate references, which are released
appropriately when file locks are removed.

Reported-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org
2022-05-26 10:50:49 -04:00
Linus Torvalds babf0bb978 xfs: Changes for 5.19-rc1
This update includes:
 - support for printk message indexing.
 - large extent counts to provide support for up to 2^47 data extents and 2^32
   attribute extents, allowing us to scale beyond 4 billion data extents
   to billions of xattrs per inode.
 - conversion of various flags fields to be consistently declared as
   unsigned bit fields.
 - improvements to realtime extent accounting and converts them to per-cpu
   counters to match all the other block and inode accounting.
 - reworks core log formatting code to reduce iterations, have a shorter, cleaner
   fast path and generally be easier to understand and maintain.
 - improvements to rmap btree searches that reduce overhead by up
   to 30% resulting in xfs_scrub runtime reductions of 15%.
 - improvements to reflink that remove the size limitations in remapping operations
   and greatly reduce the size of transaction reservations.
 - reworks the minimum log size calculations to allow us to change transaction
   reservations without changing the minimum supported log size.
 - removal of quota warning support as it has never been used on Linux.
 - intent whiteouts to allow us to cancel intents that are completed entirely
   in memory rather than having use CPU and disk bandwidth formatting and writing
   them into the journal when it is not necessary. This makes rmap, reflink and
   extent freeing slightly more efficient, but provides massive improvements
   for....
 - Logged Attribute Replay feature support. This is a fundamental change to the
   way we modify attributes, laying the foundation for future integration of
   attribute modifications as part of other atomic transactional operations the
   filesystem performs.
 - Lots of cleanups and fixes for the logged attribute replay functionality.
 -----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEmJOoJ8GffZYWSjj/regpR/R1+h0FAmKO2lIUHGRhdmlkQGZy
 b21vcmJpdC5jb20ACgkQregpR/R1+h0cYRAAutdpA5BZzfgpqnRbmjkOzCmhp6xj
 mSB6A8iBvlhtfY8p0IFFSbTT6jnf+EWfnsjy/jopojhhz5vCqYKfhGM6P9KBHxfz
 amxfmWZd3XWcnc8Ay9hcjLIa7QLQr8PXh3zJhjiYm8PvsrtNzsiEKrh6lxG6pe0w
 vQiq062ColCdN5DcuFVtfScsynCrzZCbUWFGm3y27NF00JpLdm8aBO57/ZaSFVdA
 UKKsogoPUNkRIbmf81IjTWTx2f0syNQyjrK+CX0sxGb6nzcoU/dT8qQ5t/U5gPTc
 cGpHE6vyBLdNA6BlnrFBoVAQ/M8n+ixnYy7XytZuTL5Izo80N+Vo+U5d1nLvC+fn
 ZLKAxbtpudqjy2O393Nv0cqEkT/xPUy2x3IvNL1rKXlQmNWt+KFGuiNrE+y2W4WT
 1bfbnmUJi0Knde4MD43iImwwaocXXdtVkED9f68aknZLCihqGEoi1EmU1Sr4+Wbj
 D8lXZe4BZfGVCHoA2sDtgJsATAG5rdBu/Y6lJcEfUSblvwF2Ufh0r9ehieDrnGmq
 asCTuXmIX/AzUQDa7JjgAzo2sgdhI+nOIPWJeKDVHRdpFjq+7xV573Iqa77Brik9
 DNxAMATh5bZc+9paDib8Za55yE7NJO1cM/UJkwwqn3rvbV5hYki0XZvlKZQsJGig
 ur5otF9Sdz+AcmE=
 =yUEM
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.19-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Dave Chinner:
 "This is a big update with lots of new code. The summary below them
  all, so I'll just touch on teh higlights. The two main new features
  are Large Extent Counts and Logged Attribute Replay - these are two
  new foundational features that we are building more complex future
  features on top of.

  For upcoming functionality, we need to be able to store hundreds of
  millions of xattrs per inode. The Large Extent Count feature removes
  the limits that prevent this scale of xattr storage, and while we were
  modifying the on disk extent count format we also increased the number
  of data extents we support per inode from 2^32 to 2^47.

  We also need to be able to modify xattrs as part of larger atomic
  transactions rather than as standalone transactions. The Logged
  Attribute Replay feature introduces the infrastructure that allows us
  to use intents to record the attribute modifications in the journal
  before we start them, hence allowing other atomic transactions to log
  attribute modification intents and then defer the actual modification
  to later. If we then crash, log recovery then guarantees that the
  attribute is replayed in the context of the atomic transaction that
  logged the intent.

  A significant chunk of the commits in this merge are for the base
  attribute replay functionality along with fixes, improvements and
  cleanups related to this new functioanlity. Allison deserves a big
  round of thanks for her ongoing work to get this functionality into
  XFS.

  There are also many other smaller changes and improvements, so overall
  this is one of the bigger XFS merge requests in some time.

  I will be following up next week with another smaller pull request -
  we already have another round of fixes and improvements to the logged
  attribute replay functionality just about ready to go. They'll soak
  and test over the next week, and I'll send a pull request for them
  near the end of the merge window.

  Summary:

   - support for printk message indexing.

   - large extent counts to provide support for up to 2^47 data extents
     and 2^32 attribute extents, allowing us to scale beyond 4 billion
     data extents to billions of xattrs per inode.

   - conversion of various flags fields to be consistently declared as
     unsigned bit fields.

   - improvements to realtime extent accounting and converts them to
     per-cpu counters to match all the other block and inode accounting.

   - reworks core log formatting code to reduce iterations, have a
     shorter, cleaner fast path and generally be easier to understand
     and maintain.

   - improvements to rmap btree searches that reduce overhead by up to
     30% resulting in xfs_scrub runtime reductions of 15%.

   - improvements to reflink that remove the size limitations in
     remapping operations and greatly reduce the size of transaction
     reservations.

   - reworks the minimum log size calculations to allow us to change
     transaction reservations without changing the minimum supported log
     size.

   - removal of quota warning support as it has never been used on
     Linux.

   - intent whiteouts to allow us to cancel intents that are completed
     entirely in memory rather than having use CPU and disk bandwidth
     formatting and writing them into the journal when it is not
     necessary. This makes rmap, reflink and extent freeing slightly
     more efficient, but provides massive improvements for....

   - Logged Attribute Replay feature support. This is a fundamental
     change to the way we modify attributes, laying the foundation for
     future integration of attribute modifications as part of other
     atomic transactional operations the filesystem performs.

   - Lots of cleanups and fixes for the logged attribute replay
     functionality"

* tag 'xfs-5.19-for-linus' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (124 commits)
  xfs: can't use kmem_zalloc() for attribute buffers
  xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify
  xfs: ATTR_REPLACE algorithm with LARP enabled needs rework
  xfs: use XFS_DA_OP flags in deferred attr ops
  xfs: remove xfs_attri_remove_iter
  xfs: switch attr remove to xfs_attri_set_iter
  xfs: introduce attr remove initial states into xfs_attr_set_iter
  xfs: xfs_attr_set_iter() does not need to return EAGAIN
  xfs: clean up final attr removal in xfs_attr_set_iter
  xfs: remote xattr removal in xfs_attr_set_iter() is conditional
  xfs: XFS_DAS_LEAF_REPLACE state only needed if !LARP
  xfs: split remote attr setting out from replace path
  xfs: consolidate leaf/node states in xfs_attr_set_iter
  xfs: kill XFS_DAC_LEAF_ADDNAME_INIT
  xfs: separate out initial attr_set states
  xfs: don't set quota warning values
  xfs: remove warning counters from struct xfs_dquot_res
  xfs: remove quota warning limit from struct xfs_quota_limits
  xfs: rework deferred attribute operation setup
  xfs: make xattri_leaf_bp more useful
  ...
2022-05-25 19:34:40 -07:00
Linus Torvalds e375780b63 \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmKOIC0ACgkQnJ2qBz9k
 QNmmqwf+PlZrxoXoDxxw5LdXnIXj6qwN5p/5mKDmKt7CPU8Vt5Reb8GA3b2OcUj2
 XaqQLOpEVrGW9nVKgKzUIujJtK9Sa4IlHSuwYGN3ZTYnsh0rT7VhIyfVNn2Zngo9
 juDHaGrE+g2c8hz3eUGrnkIeiHy/Ny0QEHLjxaXzYYpx3XInzGSmMS3/4/I8tFyr
 G/g1KasTTeBMR3aVh0pt4TvT/p7E/BJL3fFVrsQyeFBFrxisUennUtmK9ngcU7CH
 Y7hEl8CYMNXfm06ZH6Dt1oX9BzFjU9x18kOYAVhpuhzIA3VViL1iWPbyK/8xl1eZ
 PIRsOdDyVWtlcZdkmmlHc9Bnrj4AFA==
 =e7PC
 -----END PGP SIGNATURE-----

Merge tag 'fsnotify_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull fsnotify updates from Jan Kara:
 "The biggest part of this is support for fsnotify inode marks that
  don't pin inodes in memory but rather get evicted together with the
  inode (they are useful if userspace needs to exclude receipt of events
  from potentially large subtrees using fanotify ignore marks).

  There is also a fix for more consistent handling of events sent to
  parent and a fix of sparse(1) complaints"

* tag 'fsnotify_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fanotify: fix incorrect fmode_t casts
  fsnotify: consistent behavior for parent not watching children
  fsnotify: introduce mark type iterator
  fanotify: enable "evictable" inode marks
  fanotify: use fsnotify group lock helpers
  fanotify: implement "evictable" inode marks
  fanotify: factor out helper fanotify_mark_update_flags()
  fanotify: create helper fanotify_mark_user_flags()
  fsnotify: allow adding an inode mark without pinning inode
  dnotify: use fsnotify group lock helpers
  nfsd: use fsnotify group lock helpers
  audit: use fsnotify group lock helpers
  inotify: use fsnotify group lock helpers
  fsnotify: create helpers for group mark_mutex lock
  fsnotify: make allow_dups a property of the group
  fsnotify: pass flags argument to fsnotify_alloc_group()
  fsnotify: fix wrong lockdep annotations
  inotify: move control flags from mask to mark flags
  inotify: show inotify mask flags in proc fdinfo
2022-05-25 19:29:54 -07:00
Linus Torvalds 8b728edc5b \n
-----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmKOH70ACgkQnJ2qBz9k
 QNkRBwgAxhAQmdNGzsMmE9wlF/Dj9VyGioug9thdUBMfEL6xXr1/Gf0e+fMLFFSr
 h2niti2jcuAuYSMqJNuss1Qdo8rWuGbKvjEyqzDfgJ1VDy6oGD14i1qPbKUv1/LN
 AkH4kClQpsltPYYOq9gkagOwRtHaMPCRc1HreqMGk/rDqCe26vZ4xkp2zr328Dt7
 JVdiHNnLj5d03xt3LWgchtqptdaRHJm8ymvICNcIxV3Ievxs+p21BhA/2m/v22/i
 sG/Kg9g7+bQ5sTI+DQHCjodylm9PciFviTRPc3xOkjXDLQ/Md63RlNoHja6uCrPD
 OxvH/L9rZvJ/yfSrg7ztDxcB1vGT/w==
 =UWYT
 -----END PGP SIGNATURE-----

Merge tag 'fs_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs

Pull writeback and ext2 cleanups from Jan Kara:
 "One small ext2 cleanup and one writeback spelling fix"

* tag 'fs_for_v5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  writeback: fix typo in comment
  fs: ext2: Fix duplicate included linux/dax.h
2022-05-25 19:24:06 -07:00
Linus Torvalds 62e5873ec9 Misc hardening changes for 5.19-rc1
Hi Linus,
 
 Please, pull the following hardening changes that I've been collecting
 in my tree during the last development cycle. All of them have been
 baking in linux-next.
 
 Replace open-coded instances with size_t saturating arithmetic helpers:
 
 - virt: acrn: Prefer array_size and struct_size over open coded arithmetic (Len Baker)
 - afs: Prefer struct_size over open coded arithmetic (Len Baker)
 
 Thanks
 --
 Gustavo
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmKNUvYACgkQRwW0y0cG
 2zGm1g//YZYWBEzqOncd1K4/TBzbMytxE7oQYqUiS4e3V5D1eiT8924BLr7iuWMd
 c64Lv0trjyHCjdqBxGLMz5cv5EZd206xS3bqIQ8gqcKHOK/z7ob/c7YJxdUcXaWK
 oyGY65Arg9XnuRIJvNeiwdJNpPGsSyI8BSRC6gf6NFYke/GNIgv3rxmVBSbH8V7W
 QVBWf3AWR8+ZlXJFHWEfIAMRwFtumAhak3ZpN/4SDWM6cw3PIm1TFEO5Jx/rWaNn
 LZth+91uJU6UeSrhrME/QQ9Q1kgmLUMqzW3U/HpMky7Ugi0ffWIHc83XYDOgrFRJ
 Br5KLL4c5SHMJfPXqHMsbRNG/bI6fJfd2A0vUDIQMdy4kX88MAzTsUzSiGe6gAAc
 a3RQV4qUkBvC53glZm4pb0IGY4j3vO1tf6Hd9BdxH4knpfQzVNTAwHXUxDq8kDBu
 Bir3B60qAktszxdp/QukmfoTRpVPSV732TMgIllVPFtFD/KYCiYFhSILPouh5JWJ
 3TP8ND1hKrV3P9FVirlJxdFLcADu8dyrmwxKdxZVa6ek9xtr0gt111FYcTeZG+eS
 PaJ7N8G+OkvYktv14/+0DBgIQdeHQ/kp6D4eTjQZ6y0k0b7uAI3HRv7fxWh8eyOw
 RmaE8jQCHokUnerouBUIGEwAmyFhN3atogqIHczB4ebBNbEeOGg=
 =22O8
 -----END PGP SIGNATURE-----

Merge tag 'size_t-saturating-helpers-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull misc hardening updates from Gustavo Silva:
 "Replace a few open-coded instances with size_t saturating arithmetic
  helpers"

* tag 'size_t-saturating-helpers-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux:
  virt: acrn: Prefer array_size and struct_size over open coded arithmetic
  afs: Prefer struct_size over open coded arithmetic
2022-05-25 13:56:57 -07:00
Junxiao Bi via Ocfs2-devel 863e0d81b6 ocfs2: dlmfs: fix error handling of user_dlm_destroy_lock
When user_dlm_destroy_lock failed, it didn't clean up the flags it set
before exit.  For USER_LOCK_IN_TEARDOWN, if this function fails because of
lock is still in used, next time when unlink invokes this function, it
will return succeed, and then unlink will remove inode and dentry if lock
is not in used(file closed), but the dlm lock is still linked in dlm lock
resource, then when bast come in, it will trigger a panic due to
user-after-free.  See the following panic call trace.  To fix this,
USER_LOCK_IN_TEARDOWN should be reverted if fail.  And also error should
be returned if USER_LOCK_IN_TEARDOWN is set to let user know that unlink
fail.

For the case of ocfs2_dlm_unlock failure, besides USER_LOCK_IN_TEARDOWN,
USER_LOCK_BUSY is also required to be cleared.  Even though spin lock is
released in between, but USER_LOCK_IN_TEARDOWN is still set, for
USER_LOCK_BUSY, if before every place that waits on this flag,
USER_LOCK_IN_TEARDOWN is checked to bail out, that will make sure no flow
waits on the busy flag set by user_dlm_destroy_lock(), then we can
simplely revert USER_LOCK_BUSY when ocfs2_dlm_unlock fails.  Fix
user_dlm_cluster_lock() which is the only function not following this.

[  941.336392] (python,26174,16):dlmfs_unlink:562 ERROR: unlink
004fb0000060000b5a90b8c847b72e1, error -16 from destroy
[  989.757536] ------------[ cut here ]------------
[  989.757709] kernel BUG at fs/ocfs2/dlmfs/userdlm.c:173!
[  989.757876] invalid opcode: 0000 [#1] SMP
[  989.758027] Modules linked in: ksplice_2zhuk2jr_ib_ipoib_new(O)
ksplice_2zhuk2jr(O) mptctl mptbase xen_netback xen_blkback xen_gntalloc
xen_gntdev xen_evtchn cdc_ether usbnet mii ocfs2 jbd2 rpcsec_gss_krb5
auth_rpcgss nfsv4 nfsv3 nfs_acl nfs fscache lockd grace ocfs2_dlmfs
ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs bnx2fc
fcoe libfcoe libfc scsi_transport_fc sunrpc ipmi_devintf bridge stp llc
rds_rdma rds bonding ib_sdp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad
rdma_cm ib_cm iw_cm falcon_lsm_serviceable(PE) falcon_nf_netcontain(PE)
mlx4_vnic falcon_kal(E) falcon_lsm_pinned_13402(E) mlx4_ib ib_sa ib_mad
ib_core ib_addr xenfs xen_privcmd dm_multipath iTCO_wdt iTCO_vendor_support
pcspkr sb_edac edac_core i2c_i801 lpc_ich mfd_core ipmi_ssif i2c_core ipmi_si
ipmi_msghandler
[  989.760686]  ioatdma sg ext3 jbd mbcache sd_mod ahci libahci ixgbe dca ptp
pps_core vxlan udp_tunnel ip6_udp_tunnel megaraid_sas mlx4_core crc32c_intel
be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi ipv6 cxgb3 mdio
libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi wmi
dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
ksplice_2zhuk2jr_ib_ipoib_old]
[  989.761987] CPU: 10 PID: 19102 Comm: dlm_thread Tainted: P           OE
4.1.12-124.57.1.el6uek.x86_64 #2
[  989.762290] Hardware name: Oracle Corporation ORACLE SERVER
X5-2/ASM,MOTHERBOARD,1U, BIOS 30350100 06/17/2021
[  989.762599] task: ffff880178af6200 ti: ffff88017f7c8000 task.ti:
ffff88017f7c8000
[  989.762848] RIP: e030:[<ffffffffc07d4316>]  [<ffffffffc07d4316>]
__user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs]
[  989.763185] RSP: e02b:ffff88017f7cbcb8  EFLAGS: 00010246
[  989.763353] RAX: 0000000000000000 RBX: ffff880174d48008 RCX:
0000000000000003
[  989.763565] RDX: 0000000000120012 RSI: 0000000000000003 RDI:
ffff880174d48170
[  989.763778] RBP: ffff88017f7cbcc8 R08: ffff88021f4293b0 R09:
0000000000000000
[  989.763991] R10: ffff880179c8c000 R11: 0000000000000003 R12:
ffff880174d48008
[  989.764204] R13: 0000000000000003 R14: ffff880179c8c000 R15:
ffff88021db7a000
[  989.764422] FS:  0000000000000000(0000) GS:ffff880247480000(0000)
knlGS:ffff880247480000
[  989.764685] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  989.764865] CR2: ffff8000007f6800 CR3: 0000000001ae0000 CR4:
0000000000042660
[  989.765081] Stack:
[  989.765167]  0000000000000003 ffff880174d48040 ffff88017f7cbd18
ffffffffc07d455f
[  989.765442]  ffff88017f7cbd88 ffffffff816fb639 ffff88017f7cbd38
ffff8800361b5600
[  989.765717]  ffff88021db7a000 ffff88021f429380 0000000000000003
ffffffffc0453020
[  989.765991] Call Trace:
[  989.766093]  [<ffffffffc07d455f>] user_bast+0x5f/0xf0 [ocfs2_dlmfs]
[  989.766287]  [<ffffffff816fb639>] ? schedule_timeout+0x169/0x2d0
[  989.766475]  [<ffffffffc0453020>] ? o2dlm_lock_ast_wrapper+0x20/0x20
[ocfs2_stack_o2cb]
[  989.766738]  [<ffffffffc045303a>] o2dlm_blocking_ast_wrapper+0x1a/0x20
[ocfs2_stack_o2cb]
[  989.767010]  [<ffffffffc0864ec6>] dlm_do_local_bast+0x46/0xe0 [ocfs2_dlm]
[  989.767217]  [<ffffffffc084f5cc>] ? dlm_lockres_calc_usage+0x4c/0x60
[ocfs2_dlm]
[  989.767466]  [<ffffffffc08501f1>] dlm_thread+0xa31/0x1140 [ocfs2_dlm]
[  989.767662]  [<ffffffff816f78da>] ? __schedule+0x24a/0x810
[  989.767834]  [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
[  989.768006]  [<ffffffff816f78da>] ? __schedule+0x24a/0x810
[  989.768178]  [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
[  989.768349]  [<ffffffff816f78da>] ? __schedule+0x24a/0x810
[  989.768521]  [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
[  989.768693]  [<ffffffff816f78da>] ? __schedule+0x24a/0x810
[  989.768893]  [<ffffffff816f78ce>] ? __schedule+0x23e/0x810
[  989.769067]  [<ffffffff816f78da>] ? __schedule+0x24a/0x810
[  989.769241]  [<ffffffff810ce4d0>] ? wait_woken+0x90/0x90
[  989.769411]  [<ffffffffc084f7c0>] ? dlm_kick_thread+0x80/0x80 [ocfs2_dlm]
[  989.769617]  [<ffffffff810a8bbb>] kthread+0xcb/0xf0
[  989.769774]  [<ffffffff816f78da>] ? __schedule+0x24a/0x810
[  989.769945]  [<ffffffff816f78da>] ? __schedule+0x24a/0x810
[  989.770117]  [<ffffffff810a8af0>] ? kthread_create_on_node+0x180/0x180
[  989.770321]  [<ffffffff816fdaa1>] ret_from_fork+0x61/0x90
[  989.770492]  [<ffffffff810a8af0>] ? kthread_create_on_node+0x180/0x180
[  989.770689] Code: d0 00 00 00 f0 45 7d c0 bf 00 20 00 00 48 89 83 c0 00 00
00 48 89 83 c8 00 00 00 e8 55 c1 8c c0 83 4b 04 10 48 83 c4 08 5b 5d c3 <0f>
0b 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 55 41 54 53 48 83
[  989.771892] RIP  [<ffffffffc07d4316>]
__user_dlm_queue_lockres.part.4+0x76/0x80 [ocfs2_dlmfs]
[  989.772174]  RSP <ffff88017f7cbcb8>
[  989.772704] ---[ end trace ebd1e38cebcc93a8 ]---
[  989.772907] Kernel panic - not syncing: Fatal exception
[  989.773173] Kernel Offset: disabled

Link: https://lkml.kernel.org/r/20220518235224.87100-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 13:05:42 -07:00
Junxiao Bi 0b6d14e3db ocfs2: dlmfs: don't clear USER_LOCK_ATTACHED when destroying lock
The following function is the only place that checks USER_LOCK_ATTACHED. 
This flag is set when lock request is granted through user_ast() and only
the following function will clear it.

Checking of this flag here is to make sure ocfs2_dlm_unlock is not issued
if this lock is never granted.  For example, lock file is created and then
get removed, open file never happens.

Clearing the flag here is not necessary because this is the only function
that checks it, if another flow is executing user_dlm_destroy_lock(), it
will bail out at the beginning because of USER_LOCK_IN_TEARDOWN and never
check USER_LOCK_ATTACHED.  Drop the clear, so we don't need take care of
it for the following error handling patch.

int user_dlm_destroy_lock(struct user_lock_res *lockres)
{
    ...

    status = 0;
    if (!(lockres->l_flags & USER_LOCK_ATTACHED)) {
        spin_unlock(&lockres->l_lock);
        goto bail;
    }

    lockres->l_flags &= ~USER_LOCK_ATTACHED;
    lockres->l_flags |= USER_LOCK_BUSY;
    spin_unlock(&lockres->l_lock);

	status = ocfs2_dlm_unlock(conn, &lockres->l_lksb, DLM_LKF_VALBLK);
    if (status) {
        user_log_dlm_error("ocfs2_dlm_unlock", status, lockres);
        goto bail;
    }
	...
}

V1 discussion with Joseph:
https://lore.kernel.org/all/7b620c53-0c45-da2c-829e-26195cbe7d4e@linux.alibaba.com/T/

Link: https://lkml.kernel.org/r/20220518235224.87100-1-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 13:05:42 -07:00
Linus Torvalds 7e062cda7d Networking changes for 5.19.
Core
 ----
 
  - Support TCPv6 segmentation offload with super-segments larger than
    64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).
 
  - Generalize skb freeing deferral to per-cpu lists, instead of
    per-socket lists.
 
  - Add a netdev statistic for packets dropped due to L2 address
    mismatch (rx_otherhost_dropped).
 
  - Continue work annotating skb drop reasons.
 
  - Accept alternative netdev names (ALT_IFNAME) in more netlink
    requests.
 
  - Add VLAN support for AF_PACKET SOCK_RAW GSO.
 
  - Allow receiving skb mark from the socket as a cmsg.
 
  - Enable memcg accounting for veth queues, sysctl tables and IPv6.
 
 BPF
 ---
 
  - Add libbpf support for User Statically-Defined Tracing (USDTs).
 
  - Speed up symbol resolution for kprobes multi-link attachments.
 
  - Support storing typed pointers to referenced and unreferenced
    objects in BPF maps.
 
  - Add support for BPF link iterator.
 
  - Introduce access to remote CPU map elements in BPF per-cpu map.
 
  - Allow middle-of-the-road settings for the
    kernel.unprivileged_bpf_disabled sysctl.
 
  - Implement basic types of dynamic pointers e.g. to allow for
    dynamically sized ringbuf reservations without extra memory copies.
 
 Protocols
 ---------
 
  - Retire port only listening_hash table, add a second bind table
    hashed by port and address. Avoid linear list walk when binding
    to very popular ports (e.g. 443).
 
  - Add bridge FDB bulk flush filtering support allowing user space
    to remove all FDB entries matching a condition.
 
  - Introduce accept_unsolicited_na sysctl for IPv6 to implement
    router-side changes for RFC9131.
 
  - Support for MPTCP path manager in user space.
 
  - Add MPTCP support for fallback to regular TCP for connections
    that have never connected additional subflows or transmitted
    out-of-sequence data (partial support for RFC8684 fallback).
 
  - Avoid races in MPTCP-level window tracking, stabilize and improve
    throughput.
 
  - Support lockless operation of GRE tunnels with seq numbers enabled.
 
  - WiFi support for host based BSS color collision detection.
 
  - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.
 
  - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).
 
  - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).
 
  - Allow matching on the number of VLAN tags via tc-flower.
 
  - Add tracepoint for tcp_set_ca_state().
 
 Driver API
 ----------
 
  - Improve error reporting from classifier and action offload.
 
  - Add support for listing line cards in switches (devlink).
 
  - Add helpers for reporting page pool statistics with ethtool -S.
 
  - Add support for reading clock cycles when using PTP virtual clocks,
    instead of having the driver convert to time before reporting.
    This makes it possible to report time from different vclocks.
 
  - Support configuring low-latency Tx descriptor push via ethtool.
 
  - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
    - Sunplus SP7021 SoC (sp7021_emac)
    - Add support for Renesas RZ/V2M (in ravb)
    - Add support for MediaTek mt7986 switches (in mtk_eth_soc)
 
  - Ethernet PHYs:
    - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
    - TI DP83TD510 PHY
    - Microchip LAN8742/LAN88xx PHYs
 
  - WiFi:
    - Driver for pureLiFi X, XL, XC devices (plfxlc)
    - Driver for Silicon Labs devices (wfx)
    - Support for WCN6750 (in ath11k)
    - Support Realtek 8852ce devices (in rtw89)
 
  - Mobile:
    - MediaTek T700 modems (Intel 5G 5000 M.2 cards)
 
  - CAN:
   - ctucanfd: add support for CTU CAN FD open-source IP core
     from Czech Technical University in Prague
 
 Drivers
 -------
 
  - Delete a number of old drivers still using virt_to_bus().
 
  - Ethernet NICs:
    - intel: support TSO on tunnels MPLS
    - broadcom: support multi-buffer XDP
    - nfp: support VF rate limiting
    - sfc: use hardware tx timestamps for more than PTP
    - mlx5: multi-port eswitch support
    - hyper-v: add support for XDP_REDIRECT
    - atlantic: XDP support (including multi-buffer)
    - macb: improve real-time perf by deferring Tx processing to NAPI
 
  - High-speed Ethernet switches:
    - mlxsw: implement basic line card information querying
    - prestera: add support for traffic policing on ingress and egress
 
  - Embedded Ethernet switches:
    - lan966x: add support for packet DMA (FDMA)
    - lan966x: add support for PTP programmable pins
    - ti: cpsw_new: enable bc/mc storm prevention
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - Wake-on-WLAN support for QCA6390 and WCN6855
    - device recovery (firmware restart) support
    - support setting Specific Absorption Rate (SAR) for WCN6855
    - read country code from SMBIOS for WCN6855/QCA6390
    - enable keep-alive during WoWLAN suspend
    - implement remain-on-channel support
 
  - MediaTek WiFi (mt76):
    - support Wireless Ethernet Dispatch offloading packet movement
      between the Ethernet switch and WiFi interfaces
    - non-standard VHT MCS10-11 support
    - mt7921 AP mode support
    - mt7921 IPv6 NS offload support
 
  - Ethernet PHYs:
    - micrel: ksz9031/ksz9131: cabletest support
    - lan87xx: SQI support for T1 PHYs
    - lan937x: add interrupt support for link detection
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmKNMPQACgkQMUZtbf5S
 IrsRARAAuDyYs6jFYB3p+xazZdOnbF4iAgVv71+DQGvmsCl6CB9OrsNZMlvE85OL
 Q3gjcRbgjrkN4lhgI8DmiGYbsUJnAvVjFdNjccz1Z/vTLYvuIM0ol54MUp5S+9WY
 StncOJkOGJxxR/Gi5gzVmejPDsysU3Jik+hm/fpIcz8pybXxAsFKU5waY5qfl+/T
 TZepfV0VCfqRDjqcF1qA5+jJZNU8pdodQlZ1+mh8bwu6Jk1ZkWkj6Ov8MWdwQldr
 LnPeK/9hIGzkdJYHZfajxA3t8D0K5CHzSuih2bJ9ry8ZXgVBkXEThew778/R5izW
 uB0YZs9COFlrIP7XHjtRTy/2xHOdYIPlj2nWhVdfuQDX8Crvt4VRN6EZ1rjko1ZJ
 WanfG6WHF8NH5pXBRQbh3kIMKBnYn6OIzuCfCQSqd+niHcxFIM4vRiggeXI5C5TW
 vJgEWfK6X+NfDiFVa3xyCrEmp5ieA/pNecpwd8rVkql+MtFAAw4vfsotLKOJEAru
 J/XL6UE+YuLqIJV9ACZ9x1AFXXAo661jOxBunOo4VXhXVzWS9lYYz5r5ryIkgT/8
 /Fr0zjANJWgfIuNdIBtYfQ4qG+LozGq038VA06RhFUAZ5tF9DzhqJs2Q2AFuWWBC
 ewCePJVqo1j2Ceq2mGonXRt47OEnlePoOxTk9W+cKZb7ZWE+zEo=
 =Wjii
 -----END PGP SIGNATURE-----

Merge tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core
  ----

   - Support TCPv6 segmentation offload with super-segments larger than
     64k bytes using the IPv6 Jumbogram extension header (AKA BIG TCP).

   - Generalize skb freeing deferral to per-cpu lists, instead of
     per-socket lists.

   - Add a netdev statistic for packets dropped due to L2 address
     mismatch (rx_otherhost_dropped).

   - Continue work annotating skb drop reasons.

   - Accept alternative netdev names (ALT_IFNAME) in more netlink
     requests.

   - Add VLAN support for AF_PACKET SOCK_RAW GSO.

   - Allow receiving skb mark from the socket as a cmsg.

   - Enable memcg accounting for veth queues, sysctl tables and IPv6.

  BPF
  ---

   - Add libbpf support for User Statically-Defined Tracing (USDTs).

   - Speed up symbol resolution for kprobes multi-link attachments.

   - Support storing typed pointers to referenced and unreferenced
     objects in BPF maps.

   - Add support for BPF link iterator.

   - Introduce access to remote CPU map elements in BPF per-cpu map.

   - Allow middle-of-the-road settings for the
     kernel.unprivileged_bpf_disabled sysctl.

   - Implement basic types of dynamic pointers e.g. to allow for
     dynamically sized ringbuf reservations without extra memory copies.

  Protocols
  ---------

   - Retire port only listening_hash table, add a second bind table
     hashed by port and address. Avoid linear list walk when binding to
     very popular ports (e.g. 443).

   - Add bridge FDB bulk flush filtering support allowing user space to
     remove all FDB entries matching a condition.

   - Introduce accept_unsolicited_na sysctl for IPv6 to implement
     router-side changes for RFC9131.

   - Support for MPTCP path manager in user space.

   - Add MPTCP support for fallback to regular TCP for connections that
     have never connected additional subflows or transmitted
     out-of-sequence data (partial support for RFC8684 fallback).

   - Avoid races in MPTCP-level window tracking, stabilize and improve
     throughput.

   - Support lockless operation of GRE tunnels with seq numbers enabled.

   - WiFi support for host based BSS color collision detection.

   - Add support for SO_TXTIME/SCM_TXTIME on CAN sockets.

   - Support transmission w/o flow control in CAN ISOTP (ISO 15765-2).

   - Support zero-copy Tx with TLS 1.2 crypto offload (sendfile).

   - Allow matching on the number of VLAN tags via tc-flower.

   - Add tracepoint for tcp_set_ca_state().

  Driver API
  ----------

   - Improve error reporting from classifier and action offload.

   - Add support for listing line cards in switches (devlink).

   - Add helpers for reporting page pool statistics with ethtool -S.

   - Add support for reading clock cycles when using PTP virtual clocks,
     instead of having the driver convert to time before reporting. This
     makes it possible to report time from different vclocks.

   - Support configuring low-latency Tx descriptor push via ethtool.

   - Separate Clause 22 and Clause 45 MDIO accesses more explicitly.

  New hardware / drivers
  ----------------------

   - Ethernet:
      - Marvell's Octeon NIC PCI Endpoint support (octeon_ep)
      - Sunplus SP7021 SoC (sp7021_emac)
      - Add support for Renesas RZ/V2M (in ravb)
      - Add support for MediaTek mt7986 switches (in mtk_eth_soc)

   - Ethernet PHYs:
      - ADIN1100 industrial PHYs (w/ 10BASE-T1L and SQI reporting)
      - TI DP83TD510 PHY
      - Microchip LAN8742/LAN88xx PHYs

   - WiFi:
      - Driver for pureLiFi X, XL, XC devices (plfxlc)
      - Driver for Silicon Labs devices (wfx)
      - Support for WCN6750 (in ath11k)
      - Support Realtek 8852ce devices (in rtw89)

   - Mobile:
      - MediaTek T700 modems (Intel 5G 5000 M.2 cards)

   - CAN:
      - ctucanfd: add support for CTU CAN FD open-source IP core from
        Czech Technical University in Prague

  Drivers
  -------

   - Delete a number of old drivers still using virt_to_bus().

   - Ethernet NICs:
      - intel: support TSO on tunnels MPLS
      - broadcom: support multi-buffer XDP
      - nfp: support VF rate limiting
      - sfc: use hardware tx timestamps for more than PTP
      - mlx5: multi-port eswitch support
      - hyper-v: add support for XDP_REDIRECT
      - atlantic: XDP support (including multi-buffer)
      - macb: improve real-time perf by deferring Tx processing to NAPI

   - High-speed Ethernet switches:
      - mlxsw: implement basic line card information querying
      - prestera: add support for traffic policing on ingress and egress

   - Embedded Ethernet switches:
      - lan966x: add support for packet DMA (FDMA)
      - lan966x: add support for PTP programmable pins
      - ti: cpsw_new: enable bc/mc storm prevention

   - Qualcomm 802.11ax WiFi (ath11k):
      - Wake-on-WLAN support for QCA6390 and WCN6855
      - device recovery (firmware restart) support
      - support setting Specific Absorption Rate (SAR) for WCN6855
      - read country code from SMBIOS for WCN6855/QCA6390
      - enable keep-alive during WoWLAN suspend
      - implement remain-on-channel support

   - MediaTek WiFi (mt76):
      - support Wireless Ethernet Dispatch offloading packet movement
        between the Ethernet switch and WiFi interfaces
      - non-standard VHT MCS10-11 support
      - mt7921 AP mode support
      - mt7921 IPv6 NS offload support

   - Ethernet PHYs:
      - micrel: ksz9031/ksz9131: cabletest support
      - lan87xx: SQI support for T1 PHYs
      - lan937x: add interrupt support for link detection"

* tag 'net-next-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1809 commits)
  ptp: ocp: Add firmware header checks
  ptp: ocp: fix PPS source selector debugfs reporting
  ptp: ocp: add .init function for sma_op vector
  ptp: ocp: vectorize the sma accessor functions
  ptp: ocp: constify selectors
  ptp: ocp: parameterize input/output sma selectors
  ptp: ocp: revise firmware display
  ptp: ocp: add Celestica timecard PCI ids
  ptp: ocp: Remove #ifdefs around PCI IDs
  ptp: ocp: 32-bit fixups for pci start address
  Revert "net/smc: fix listen processing for SMC-Rv2"
  ath6kl: Use cc-disable-warning to disable -Wdangling-pointer
  selftests/bpf: Dynptr tests
  bpf: Add dynptr data slices
  bpf: Add bpf_dynptr_read and bpf_dynptr_write
  bpf: Dynptr support for ring buffers
  bpf: Add bpf_dynptr_from_mem for local dynptrs
  bpf: Add verifier support for dynptrs
  bpf: Suppress 'passing zero to PTR_ERR' warning
  bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack
  ...
2022-05-25 12:22:58 -07:00
Luís Henriques ea16567f11 ceph: fix decoding of client session messages flags
The cephfs kernel client started to show  the message:

 ceph: mds0 session blocklisted

when mounting a filesystem.  This is due to the fact that the session
messages are being incorrectly decoded: the skip needs to take into
account the 'len'.

While there, fixed some whitespaces too.

Cc: stable@vger.kernel.org
Fixes: e1c9788cb3 ("ceph: don't rely on error_string to validate blocklisted session.")
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li 5e56776d52 ceph: switch TASK_INTERRUPTIBLE to TASK_KILLABLE
If the task is placed in the TASK_INTERRUPTIBLE state it will sleep
until either something explicitly wakes it up, or a non-masked signal
is received. Switch to TASK_KILLABLE to avoid the noises.

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Colin Ian King 2ecd0edd13 ceph: remove redundant variable ino
Variable ino is being assigned a value that is never read. The variable
and assignment are redundant, remove it.

Cleans up clang scan build warning:
warning: Although the value stored to 'ino' is used in the enclosing
expression, the value is never actually read from 'ino'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li a74379543d ceph: try to queue a writeback if revoking fails
If the pagecaches writeback just finished and the i_wrbuffer_ref
reaches zero it will try to trigger ceph_check_caps(). But if just
before ceph_check_caps() the i_wrbuffer_ref could be increased
again by mmap/cache write, then the Fwb revoke will fail.

We need to try to queue a writeback in this case instead of
triggering the writeback by BDI's delayed work per 5 seconds.

URL: https://tracker.ceph.com/issues/46904
URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Luís Henriques 55ab552080 ceph: fix statfs for subdir mounts
When doing a mount using as base a directory that has 'max_bytes' quotas
statfs uses that value as the total; if a subdirectory is used instead,
the same 'max_bytes' too in statfs, unless there is another quota set.

Unfortunately, if this subdirectory only has the 'max_files' quota set,
then statfs uses the filesystem total.  Fix this by making sure we only
lookup realms that contain the 'max_bytes' quota.

Cc: Ryan Taylor <rptaylor@uvic.ca>
URL: https://tracker.ceph.com/issues/55090
Signed-off-by: Luís Henriques <lhenriques@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li 825978fd6a ceph: fix possible deadlock when holding Fwb to get inline_data
1, mount with wsync.
2, create a file with O_RDWR, and the request was sent to mds.0:

   ceph_atomic_open()-->
     ceph_mdsc_do_request(openc)
     finish_open(file, dentry, ceph_open)-->
       ceph_open()-->
         ceph_init_file()-->
           ceph_init_file_info()-->
             ceph_uninline_data()-->
             {
               ...
               if (inline_version == 1 || /* initial version, no data */
                   inline_version == CEPH_INLINE_NONE)
                     goto out_unlock;
               ...
             }

The inline_version will be 1, which is the initial version for the
new create file. And here the ci->i_inline_version will keep with 1,
it's buggy.

3, buffer write to the file immediately:

   ceph_write_iter()-->
     ceph_get_caps(file, need=Fw, want=Fb, ...);
     generic_perform_write()-->
       a_ops->write_begin()-->
         ceph_write_begin()-->
           netfs_write_begin()-->
             netfs_begin_read()-->
               netfs_rreq_submit_slice()-->
                 netfs_read_from_server()-->
                   rreq->netfs_ops->issue_read()-->
                     ceph_netfs_issue_read()-->
                     {
                       ...
                       if (ci->i_inline_version != CEPH_INLINE_NONE &&
                           ceph_netfs_issue_op_inline(subreq))
                         return;
                       ...
                     }
     ceph_put_cap_refs(ci, Fwb);

The ceph_netfs_issue_op_inline() will send a getattr(Fsr) request to
mds.1.

4, then the mds.1 will request the rd lock for CInode::filelock from
the auth mds.0, the mds.0 will do the CInode::filelock state transation
from excl --> sync, but it need to revoke the Fxwb caps back from the
clients.

While the kernel client has aleady held the Fwb caps and waiting for
the getattr(Fsr).

It's deadlock!

URL: https://tracker.ceph.com/issues/55377
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li 3459bd0c55 ceph: redirty the page for writepage on failure
When run out of memories we should redirty the page before failing
the writepage. Or we will hit BUG_ON(folio_get_private(folio)) in
ceph_dirty_folio().

URL: https://tracker.ceph.com/issues/55421
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li 5eed80fba6 ceph: try to choose the auth MDS if possible for getattr
If any 'x' caps is issued we can just choose the auth MDS instead
of the random replica MDSes. Because only when the Locker is in
LOCK_EXEC state will the loner client could get the 'x' caps. And
if we send the getattr requests to any replica MDS it must auth pin
and tries to rdlock from the auth MDS, and then the auth MDS need
to do the Locker state transition to LOCK_SYNC. And after that the
lock state will change back.

This cost much when doing the Locker state transition and usually
will need to revoke caps from clients.

URL: https://tracker.ceph.com/issues/55240
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li f7a2d0688a ceph: disable updating the atime since cephfs won't maintain it
Since CephFS makes no attempt to maintain atime, we shouldn't
try to update it in mmap and generic read cases and ignore updating
it in direct and sync read cases.

And even we update it in mmap and generic read cases we will drop
it and won't sync it to MDS. And we are seeing the atime will be
updated and then dropped to the floor again and again.

URL: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/VSJM7T4CS5TDRFF6XFPIYMHP75K73PZ6/
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Acked-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li 1b2ba3c561 ceph: flush the mdlog for filesystem sync
Before waiting for a request's safe reply, we will send the mdlog flush
request to the relevant MDS. And this will also flush the mdlog for all
the other unsafe requests in the same session, so we can record the last
session and no need to flush mdlog again in the next loop. But there
still have cases that it may send the mdlog flush requst twice or more,
but that should be not often.

Rename wait_unsafe_requests() to
flush_mdlog_and_wait_mdsc_unsafe_requests() to make it more
descriptive.

[xiubli: fold in MDS request refcount leak fix from Jeff]

URL: https://tracker.ceph.com/issues/55284
URL: https://tracker.ceph.com/issues/55411
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li ae06706330 ceph: rename unsafe_request_wait()
Rename it to flush_mdlog_and_wait_inode_unsafe_requests() to make
it more descriptive.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:14 +02:00
Xiubo Li 261998c300 ceph: fix statx AT_STATX_DONT_SYNC vs AT_STATX_FORCE_SYNC check
From the posix and the initial statx supporting commit comments,
the AT_STATX_DONT_SYNC is a lightweight stat and the
AT_STATX_FORCE_SYNC is a heaverweight one. And also checked all
the other current usage about these two flags they are all doing
the same, that is only when the AT_STATX_FORCE_SYNC is not set
and the AT_STATX_DONT_SYNC is set will they skip sync retriving
the attributes from storage.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li 68e5ec2ec9 ceph: no need to invalidate the fscache twice
Fixes: 400e1286c0 ("ceph: conversion to new fscache API")
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Jakob Koschel 3ffa9d6f99 ceph: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean.

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Jakob Koschel 57a5df0e86 ceph: use dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li 7ffe4fcea7 ceph: update the dlease for the hashed dentry when removing
The MDS will always refresh the dentry lease when removing the files
or directories. And if the dentry is still hashed, we can update
the dentry lease and no need to do the lookup from the MDS later.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li 546a5d6122 ceph: stop retrying the request when exceeding 256 times
The type of 'r_attempts' in kernel 'ceph_mds_request' is 'int',
while in 'ceph_mds_request_head' the type of 'num_retry' is '__u8'.
So in case the request retries exceeding 256 times, the MDS will
receive a incorrect retry seq.

In this case it's ususally a bug in MDS and continue retrying the
request makes no sense. For now let's limit it to 256. In future
this could be fixed in ceph code, so avoid using the hardcode here.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li 1980b1bf17 ceph: stop forwarding the request when exceeding 256 times
The type of 'num_fwd' in ceph 'MClientRequestForward' is 'int32_t',
while in 'ceph_mds_request_head' the type is '__u8'. So in case
the request bounces between MDSes exceeding 256 times, the client
will get stuck.

In this case it's ususally a bug in MDS and continue bouncing the
request makes no sense.

URL: https://tracker.ceph.com/issues/55130
Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Luís Henriques <lhenriques@suse.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Xiubo Li 6c1dc50284 ceph: remove unused CEPH_MDS_LEASE_RELEASE related code
The ceph_mdsc_lease_release() has been removed by commit 8aa152c778
(ceph: remove ceph_mdsc_lease_release). ceph_mdsc_lease_send_msg will
never be called with CEPH_MDS_LEASE_RELEASE.

Signed-off-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Venky Shankar d7a2dc5230 ceph: allow ceph.dir.rctime xattr to be updatable
`rctime' has been a pain point in cephfs due to its buggy
nature - inconsistent values reported and those sorts.
Fixing rctime is non-trivial needing an overall redesign
of the entire nested statistics infrastructure.

As a workaround, PR

     http://github.com/ceph/ceph/pull/37938

allows this extended attribute to be manually set. This allows
users to "fixup" inconsistent rctime values. While this sounds
messy, its probably the wisest approach allowing users/scripts
to workaround buggy rctime values.

The above PR enables Ceph MDS to allow manually setting
rctime extended attribute with the corresponding user-land
changes. We may as well allow the same to be done via kclient
for parity.

Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Yufen Yu 908ea65416 f2fs: add f2fs_init_write_merge_io function
Almost all other initialization of variables in f2fs_fill_super are
extraced to a single function. Also do it for write_io[], which can
make code more clean.

This patch just refactors the code, theres no functional change.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Chao Yu <chao@kernel.org>
[Jaegeuk Kim: clean up]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2022-05-25 10:48:56 -07:00
Paulo Alcantara de3a9e943d cifs: fix ntlmssp on old servers
Some older servers seem to require the workstation name during ntlmssp
to be at most 15 chars (RFC1001 name length), so truncate it before
sending when using insecure dialects.

Link: https://lore.kernel.org/r/e6837098-15d9-acb6-7e34-1923cf8c6fe1@winds.org
Reported-by: Byron Stanoszek <gandalf@winds.org>
Tested-by: Byron Stanoszek <gandalf@winds.org>
Fixes: 49bd49f983 ("cifs: send workstation name during ntlmssp session setup")
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2022-05-25 07:41:22 -05:00
Jens Axboe 54739cc6b4 io_uring: make prep and issue side of req handlers named consistently
Almost all of them are, the odd ones out are the poll remove and the
files update request. Name them like the others, which is:

io_#cmdname_prep	for request preparation
io_#cmdname		for request issue

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-25 05:37:06 -06:00
Jens Axboe ecddc25d13 io_uring: make timeout prep handlers consistent with other prep handlers
All other opcodes take a {req, sqe} set for prep handling, split out
a timeout prep handler so that timeout and linked timeouts can use
the same one.

Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-25 05:36:54 -06:00