Commit Graph

7621 Commits

Author SHA1 Message Date
Suwan Kim 4e04005256 virtio-blk: support polling I/O
This patch supports polling I/O via virtio-blk driver. Polling
feature is enabled by module parameter "poll_queues" and it sets
dedicated polling queues for virtio-blk. This patch improves the
polling I/O throughput and latency.

The virtio-blk driver doesn't not have a poll function and a poll
queue and it has been operating in interrupt driven method even if
the polling function is called in the upper layer.

virtio-blk polling is implemented upon 'batched completion' of block
layer. virtblk_poll() queues completed request to io_comp_batch->req_list
and later, virtblk_complete_batch() calls unmap function and ends
the requests in batch.

virtio-blk reads the number of poll queues from module parameter
"poll_queues". If VM sets queue parameter as below,
("num-queues=N" [QEMU property], "poll_queues=M" [module parameter])
It allocates N virtqueues to virtio_blk->vqs[N] and it uses [0..(N-M-1)]
as default queues and [(N-M)..(N-1)] as poll queues. Unlike the default
queues, the poll queues have no callback function.

Regarding HW-SW queue mapping, the default queue mapping uses the
existing method that condsiders MSI irq vector. But the poll queue
doesn't have an irq, so it uses the regular blk-mq cpu mapping.

For verifying the improvement, I did Fio polling I/O performance test
with io_uring engine with the options below.
(io_uring, hipri, randread, direct=1, bs=512, iodepth=64 numjobs=N)
I set 4 vcpu and 4 virtio-blk queues - 2 default queues and 2 poll
queues for VM.

As a result, IOPS and average latency improved about 10%.

Test result:

- Fio io_uring poll without virtio-blk poll support
	-- numjobs=1 : IOPS = 339K, avg latency = 188.33us
	-- numjobs=2 : IOPS = 367K, avg latency = 347.33us
	-- numjobs=4 : IOPS = 383K, avg latency = 682.06us

- Fio io_uring poll with virtio-blk poll support
	-- numjobs=1 : IOPS = 385K, avg latency = 165.94us
	-- numjobs=2 : IOPS = 408K, avg latency = 313.28us
	-- numjobs=4 : IOPS = 424K, avg latency = 613.05us

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Suwan Kim <suwan.kim027@gmail.com>
Message-Id: <20220406153207.163134-2-suwan.kim027@gmail.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-05-31 12:44:23 -04:00
Christoph Hellwig e2e5308672 blk-mq: remove the done argument to blk_execute_rq_nowait
Let the caller set it together with the end_io_data instead of passing
a pointless argument.  Note the the target code did in fact already
set it and then just overrode it again by calling blk_execute_rq_nowait.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-28 06:15:27 -06:00
Yu Kuai 1243172d58 nbd: use pr_err to output error message
Instead of using the long printk(KERN_ERR "nbd: ...") to
output error message, defining pr_fmt and using
the short pr_err("") to do that. The replacemen is done
by using the following command:

  sed -i 's/printk(KERN_ERR "nbd: /pr_err("/g' \
		  drivers/block/nbd.c

This patch also rewrap to 80 columns where possible.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-7-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27 20:39:33 -06:00
Zhang Wensheng 858f1bf65d nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
When 'index' is a big numbers, it may become negative which forced
to 'int'. then 'index << part_shift' might overflow to a positive
value that is not greater than '0xfffff', then sysfs might complains
about duplicate creation. Because of this, move the 'index' judgment
to the front will fix it and be better.

Fixes: b0d9111a2d ("nbd: use an idr to keep track of nbd devices")
Fixes: 940c264984 ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-6-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27 20:39:33 -06:00
Yu Kuai 09dadb5985 nbd: fix io hung while disconnecting device
In our tests, "qemu-nbd" triggers a io hung:

INFO: task qemu-nbd:11445 blocked for more than 368 seconds.
      Not tainted 5.18.0-rc3-next-20220422-00003-g2176915513ca #884
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:qemu-nbd        state:D stack:    0 pid:11445 ppid:     1 flags:0x00000000
Call Trace:
 <TASK>
 __schedule+0x480/0x1050
 ? _raw_spin_lock_irqsave+0x3e/0xb0
 schedule+0x9c/0x1b0
 blk_mq_freeze_queue_wait+0x9d/0xf0
 ? ipi_rseq+0x70/0x70
 blk_mq_freeze_queue+0x2b/0x40
 nbd_add_socket+0x6b/0x270 [nbd]
 nbd_ioctl+0x383/0x510 [nbd]
 blkdev_ioctl+0x18e/0x3e0
 __x64_sys_ioctl+0xac/0x120
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7fd8ff706577
RSP: 002b:00007fd8fcdfebf8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000040000000 RCX: 00007fd8ff706577
RDX: 000000000000000d RSI: 000000000000ab00 RDI: 000000000000000f
RBP: 000000000000000f R08: 000000000000fbe8 R09: 000055fe497c62b0
R10: 00000002aff20000 R11: 0000000000000246 R12: 000000000000006d
R13: 0000000000000000 R14: 00007ffe82dc5e70 R15: 00007fd8fcdff9c0

"qemu-ndb -d" will call ioctl 'NBD_DISCONNECT' first, however, following
message was found:

block nbd0: Send disconnect failed -32

Which indicate that something is wrong with the server. Then,
"qemu-nbd -d" will call ioctl 'NBD_CLEAR_SOCK', however ioctl can't clear
requests after commit 2516ab1543fd("nbd: only clear the queue on device
teardown"). And in the meantime, request can't complete through timeout
because nbd_xmit_timeout() will always return 'BLK_EH_RESET_TIMER', which
means such request will never be completed in this situation.

Now that the flag 'NBD_CMD_INFLIGHT' can make sure requests won't
complete multiple times, switch back to call nbd_clear_sock() in
nbd_clear_sock_ioctl(), so that inflight requests can be cleared.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-5-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27 20:39:33 -06:00
Yu Kuai 2895f1831e nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
Otherwise io will hung because request will only be completed if the
cmd has the flag 'NBD_CMD_INFLIGHT'.

Fixes: 07175cb1ba ("nbd: make sure request completion won't concurrent")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-4-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27 20:39:33 -06:00
Yu Kuai c55b2b983b nbd: fix race between nbd_alloc_config() and module removal
When nbd module is being removing, nbd_alloc_config() may be
called concurrently by nbd_genl_connect(), although try_module_get()
will return false, but nbd_alloc_config() doesn't handle it.

The race may lead to the leak of nbd_config and its related
resources (e.g, recv_workq) and oops in nbd_read_stat() due
to the unload of nbd module as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000040
  Oops: 0000 [#1] SMP PTI
  CPU: 5 PID: 13840 Comm: kworker/u17:33 Not tainted 5.14.0+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: knbd16-recv recv_work [nbd]
  RIP: 0010:nbd_read_stat.cold+0x130/0x1a4 [nbd]
  Call Trace:
   recv_work+0x3b/0xb0 [nbd]
   process_one_work+0x1ed/0x390
   worker_thread+0x4a/0x3d0
   kthread+0x12a/0x150
   ret_from_fork+0x22/0x30

Fixing it by checking the return value of try_module_get()
in nbd_alloc_config(). As nbd_alloc_config() may return ERR_PTR(-ENODEV),
assign nbd->config only when nbd_alloc_config() succeeds to ensure
the value of nbd->config is binary (valid or NULL).

Also adding a debug message to check the reference counter
of nbd_config during module removal.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-3-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27 20:39:33 -06:00
Yu Kuai 06c4da89c2 nbd: call genl_unregister_family() first in nbd_cleanup()
Otherwise there may be race between module removal and the handling of
netlink command, which can lead to the oops as shown below:

  BUG: kernel NULL pointer dereference, address: 0000000000000098
  Oops: 0002 [#1] SMP PTI
  CPU: 1 PID: 31299 Comm: nbd-client Tainted: G            E     5.14.0-rc4
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  RIP: 0010:down_write+0x1a/0x50
  Call Trace:
   start_creating+0x89/0x130
   debugfs_create_dir+0x1b/0x130
   nbd_start_device+0x13d/0x390 [nbd]
   nbd_genl_connect+0x42f/0x748 [nbd]
   genl_family_rcv_msg_doit.isra.0+0xec/0x150
   genl_rcv_msg+0xe5/0x1e0
   netlink_rcv_skb+0x55/0x100
   genl_rcv+0x29/0x40
   netlink_unicast+0x1a8/0x250
   netlink_sendmsg+0x21b/0x430
   ____sys_sendmsg+0x2a4/0x2d0
   ___sys_sendmsg+0x81/0xc0
   __sys_sendmsg+0x62/0xb0
   __x64_sys_sendmsg+0x1f/0x30
   do_syscall_64+0x3b/0xc0
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  Modules linked in: nbd(E-)

Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220521073749.3146892-2-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27 20:39:33 -06:00
Christoph Hellwig b9684a71fc block, loop: support partitions without scanning
Historically we did distinguish between a flag that surpressed partition
scanning, and a combinations of the minors variable and another flag if
any partitions were supported.  This was generally confusing and doesn't
make much sense, but some corner case uses of the loop driver actually
do want to support manually added partitions on a device that does not
actively scan for partitions.  To make things worsee the loop driver
also wants to dynamically toggle the scanning for partitions on a live
gendisk, which makes the disk->flags updates non-atomic.

Introduce a new GD_SUPPRESS_PART_SCAN bit in disk->state that disables
just scanning for partitions, and toggle that instead of GENHD_FL_NO_PART
in the loop driver.

Fixes: 1ebe2e5f9d ("block: remove GENHD_FL_EXT_DEVT")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220527055806.1972352-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-27 06:39:00 -06:00
Juergen Gross 49f8b459fc xen: switch gnttab_end_foreign_access() to take a struct page pointer
Instead of a virtual kernel address use a pointer of the associated
struct page as second parameter of gnttab_end_foreign_access().

Most users have that pointer available already and are creating the
virtual address from it, risking problems in case the memory is
located in highmem.

gnttab_end_foreign_access() itself won't need to get the struct page
from the address again.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-05-27 11:05:29 +02:00
Linus Torvalds 98931dd95f Yang Shi has improved the behaviour of khugepaged collapsing of readonly
file-backed transparent hugepages.
 
 Johannes Weiner has arranged for zswap memory use to be tracked and
 managed on a per-cgroup basis.
 
 Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for runtime
 enablement of the recent huge page vmemmap optimization feature.
 
 Baolin Wang contributes a series to fix some issues around hugetlb
 pagetable invalidation.
 
 Zhenwei Pi has fixed some interactions between hwpoisoned pages and
 virtualization.
 
 Tong Tiangen has enabled the use of the presently x86-only
 page_table_check debugging feature on arm64 and riscv.
 
 David Vernet has done some fixup work on the memcg selftests.
 
 Peter Xu has taught userfaultfd to handle write protection faults against
 shmem- and hugetlbfs-backed files.
 
 More DAMON development from SeongJae Park - adding online tuning of the
 feature and support for monitoring of fixed virtual address ranges.  Also
 easier discovery of which monitoring operations are available.
 
 Nadav Amit has done some optimization of TLB flushing during mprotect().
 
 Neil Brown continues to labor away at improving our swap-over-NFS support.
 
 David Hildenbrand has some fixes to anon page COWing versus
 get_user_pages().
 
 Peng Liu fixed some errors in the core hugetlb code.
 
 Joao Martins has reduced the amount of memory consumed by device-dax's
 compound devmaps.
 
 Some cleanups of the arch-specific pagemap code from Anshuman Khandual.
 
 Muchun Song has found and fixed some errors in the TLB flushing of
 transparent hugepages.
 
 Roman Gushchin has done more work on the memcg selftests.
 
 And, of course, many smaller fixes and cleanups.  Notably, the customary
 million cleanup serieses from Miaohe Lin.
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCYo52xQAKCRDdBJ7gKXxA
 jtJFAQD238KoeI9z5SkPMaeBRYSRQmNll85mxs25KapcEgWgGQD9FAb7DJkqsIVk
 PzE+d9hEfirUGdL6cujatwJ6ejYR8Q8=
 =nFe6
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:
 "Almost all of MM here. A few things are still getting finished off,
  reviewed, etc.

   - Yang Shi has improved the behaviour of khugepaged collapsing of
     readonly file-backed transparent hugepages.

   - Johannes Weiner has arranged for zswap memory use to be tracked and
     managed on a per-cgroup basis.

   - Munchun Song adds a /proc knob ("hugetlb_optimize_vmemmap") for
     runtime enablement of the recent huge page vmemmap optimization
     feature.

   - Baolin Wang contributes a series to fix some issues around hugetlb
     pagetable invalidation.

   - Zhenwei Pi has fixed some interactions between hwpoisoned pages and
     virtualization.

   - Tong Tiangen has enabled the use of the presently x86-only
     page_table_check debugging feature on arm64 and riscv.

   - David Vernet has done some fixup work on the memcg selftests.

   - Peter Xu has taught userfaultfd to handle write protection faults
     against shmem- and hugetlbfs-backed files.

   - More DAMON development from SeongJae Park - adding online tuning of
     the feature and support for monitoring of fixed virtual address
     ranges. Also easier discovery of which monitoring operations are
     available.

   - Nadav Amit has done some optimization of TLB flushing during
     mprotect().

   - Neil Brown continues to labor away at improving our swap-over-NFS
     support.

   - David Hildenbrand has some fixes to anon page COWing versus
     get_user_pages().

   - Peng Liu fixed some errors in the core hugetlb code.

   - Joao Martins has reduced the amount of memory consumed by
     device-dax's compound devmaps.

   - Some cleanups of the arch-specific pagemap code from Anshuman
     Khandual.

   - Muchun Song has found and fixed some errors in the TLB flushing of
     transparent hugepages.

   - Roman Gushchin has done more work on the memcg selftests.

  ... and, of course, many smaller fixes and cleanups. Notably, the
  customary million cleanup serieses from Miaohe Lin"

* tag 'mm-stable-2022-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (381 commits)
  mm: kfence: use PAGE_ALIGNED helper
  selftests: vm: add the "settings" file with timeout variable
  selftests: vm: add "test_hmm.sh" to TEST_FILES
  selftests: vm: check numa_available() before operating "merge_across_nodes" in ksm_tests
  selftests: vm: add migration to the .gitignore
  selftests/vm/pkeys: fix typo in comment
  ksm: fix typo in comment
  selftests: vm: add process_mrelease tests
  Revert "mm/vmscan: never demote for memcg reclaim"
  mm/kfence: print disabling or re-enabling message
  include/trace/events/percpu.h: cleanup for "percpu: improve percpu_alloc_percpu event trace"
  include/trace/events/mmflags.h: cleanup for "tracing: incorrect gfp_t conversion"
  mm: fix a potential infinite loop in start_isolate_page_range()
  MAINTAINERS: add Muchun as co-maintainer for HugeTLB
  zram: fix Kconfig dependency warning
  mm/shmem: fix shmem folio swapoff hang
  cgroup: fix an error handling path in alloc_pagecache_max_30M()
  mm: damon: use HPAGE_PMD_SIZE
  tracing: incorrect isolate_mote_t cast in mm_vmscan_lru_isolate
  nodemask.h: fix compilation error with GCC12
  ...
2022-05-26 12:32:41 -07:00
Jakob Koschel 3302ffd44c rbd: replace usage of found with dedicated list iterator variable
To move the list iterator variable into the list_for_each_entry_*()
macro in the future it should be avoided to use the list iterator
variable after the loop body.

To *never* use the list iterator variable after the loop it was
concluded to use a separate iterator variable instead of a
found boolean.

This removes the need to use a found variable and simply checking if
the variable was set, can determine if the break/goto was hit.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2022-05-25 20:45:13 +02:00
Randy Dunlap 6140ae41ef zram: fix Kconfig dependency warning
ZSMALLOC depends on MMU so ZRAM should also depend on MMU since 'select'
does not follow any dependency chains.

Fixes this Kconfig warning:

WARNING: unmet direct dependencies detected for ZSMALLOC
  Depends on [n]: MMU [=n]
  Selected by [y]:
  - ZRAM [=y] && BLK_DEV [=y] && BLOCK [=y] && SYSFS [=y] && (CRYPTO_LZO [=y] || CRYPTO_ZSTD [=m] || CRYPTO_LZ4 [=m] || CRYPTO_LZ4HC [=n] || CRYPTO_842 [=n])

Link: https://lkml.kernel.org/r/20220522204027.22964-1-rdunlap@infradead.org
Fixes: b3fbd58fcb ("mm: Kconfig: simplify zswap configuration")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-25 10:47:47 -07:00
Linus Torvalds d613060475 xen: branch for v5.19-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYosaQAAKCRCAXGG7T9hj
 vil9AP9b4C+f9LTG0kAinjxLPyWE0Mo/iq3gO60MteZ2HyeI+AD/eSzJioJA0vyH
 4pnU/UaGLJSp/B1LitLdjwoWIvwcEws=
 =pDcW
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - decouple the PV interface from kernel internals in the Xen
   scsifront/scsiback pv drivers

 - harden the Xen scsifront PV driver against a malicious backend driver

 - simplify Xen PV frontend driver ring page setup

 - support Xen setups with multiple domains created at boot time to
   tolerate Xenstore coming up late

 - two small cleanup patches

* tag 'for-linus-5.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits)
  xen: add support for initializing xenstore later as HVM domain
  xen: sync xs_wire.h header with upstream xen
  x86: xen: remove STACK_FRAME_NON_STANDARD from xen_cpuid
  xen-blk{back,front}: Update contact points for buffer_squeeze_duration_ms and feature_persistent
  xen/xenbus: eliminate xenbus_grant_ring()
  xen/sndfront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/usbfront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/scsifront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/pcifront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/drmfront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/tpmfront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/netfront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/blkfront: use xenbus_setup_ring() and xenbus_teardown_ring()
  xen/xenbus: add xenbus_setup_ring() service function
  xen: update ring.h
  xen/shbuf: switch xen-front-pgdir-shbuf to use INVALID_GRANT_REF
  xen/dmabuf: switch gntdev-dmabuf to use INVALID_GRANT_REF
  xen/sound: switch xen_snd_front to use INVALID_GRANT_REF
  xen/drm: switch xen_drm_front to use INVALID_GRANT_REF
  xen/usb: switch xen-hcd to use INVALID_GRANT_REF
  ...
2022-05-23 20:49:45 -07:00
Linus Torvalds 5dc921868c for-5.19/drivers-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKrTcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgph/REAC0/7odRfJeTJ1PkJhSKFc7dhyS7rK4du2s
 3+z+H6Yeua2yVIJb0mYYGEJcOUUQ9nD2T9424n3NzDOw88U4y8Vg2YEH+UiJBuj4
 AJoxPNkQdxL7WzmwHmRNLCcOOFhISLqWiCJSr45d+LP1f6aO24Q9lewYWxtNA4TW
 mqb7Ne7e3Z77m9rmsCsZ26bzQHg1EEQ6qgjZM9tqMhOeTqYhmrqfrD9KtG8TIkpK
 N8277E5QcequHf7v6VpKqEOzf3d2kx55JaZdu+oxLPVMED3wJJFwcYF1/xmM7Fgx
 tp7xCjqqUHXwKvJNCFJpnvw+cXu0Ct7cWOIG4ROCvaTD4vBI1KzZLc0gO7pKFW0Y
 hNIlMXr4n8PmonS81tMV4TqmRWxedX/jxuaeJCVNr89PqYU4luPpigJZqv7rlGry
 KZUlktQot22M/7FC2MS6KhgbQKLPrRGTAEyY/JNwBHckCZiduWQFlmKLQ926xQIJ
 6vdjSzHK5MrT/d+yow3bGFxAJWloGJ+L+RsH0b+WikF81+6ic9P3AoStgbVilfKD
 6sbjcju8SShDlQ+W/Ocm0rHC+i/RDKT3QqItXgfhA/1FfMPODQGc/xcZg+AdTswn
 VSnUIkvk9/mTO0StilVfNJDfG1QkSpJ5Ilvs/DnIahZj6IG4QbJvtnVNbmQX6ptz
 AUB4DdGwXg==
 =geQL
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/drivers-2022-05-22' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:
 "Here are the driver updates queued up for 5.19. This contains:

   - NVMe pull requests via Christoph:
       - tighten the PCI presence check (Stefan Roese)
       - fix a potential NULL pointer dereference in an error path (Kyle
         Miller Smith)
       - fix interpretation of the DMRSL field (Tom Yan)
       - relax the data transfer alignment (Keith Busch)
       - verbose error logging improvements (Max Gurtovoy, Chaitanya
         Kulkarni)
       - misc cleanups (Chaitanya Kulkarni, Christoph)
       - set non-mdts limits in nvme_scan_work (Chaitanya Kulkarni)
       - add support for TP4084 - Time-to-Ready Enhancements (Christoph)

   - MD pull request via Song:
       - Improve annotation in raid5 code, by Logan Gunthorpe
       - Support MD_BROKEN flag in raid-1/5/10, by Mariusz Tkaczyk
       - Other small fixes/cleanups

   - null_blk series making the configfs side much saner (Damien)

   - Various minor drbd cleanups and fixes (Haowen, Uladzislau, Jiapeng,
     Arnd, Cai)

   - Avoid using the system workqueue (and hence flushing it) in rnbd
     (Jack)

   - Avoid using the system workqueue (and hence flushing it) in aoe
     (Tetsuo)

   - Series fixing discard_alignment issues in drivers (Christoph)

   - Small series fixing drivers poking at disk->part0 for openers
     information (Christoph)

   - Series fixing deadlocks in loop (Christoph, Tetsuo)

   - Remove loop.h and add SPDX headers (Christoph)

   - Various fixes and cleanups (Julia, Xie, Yu)"

* tag 'for-5.19/drivers-2022-05-22' of git://git.kernel.dk/linux-block: (72 commits)
  mtip32xx: fix typo in comment
  nvme: set non-mdts limits in nvme_scan_work
  nvme: add support for TP4084 - Time-to-Ready Enhancements
  nvme: split the enum used for various register constants
  nbd: Fix hung on disconnect request if socket is closed before
  nvme-fabrics: add a request timeout helper
  nvme-pci: harden drive presence detect in nvme_dev_disable()
  nvme-pci: fix a NULL pointer dereference in nvme_alloc_admin_tags
  nvme: mark internal passthru request RQF_QUIET
  nvme: remove unneeded include from constants file
  nvme: add missing status values to verbose logging
  nvme: set dma alignment to dword
  nvme: fix interpretation of DMRSL
  loop: remove most the top-of-file boilerplate comment from the UAPI header
  loop: remove most the top-of-file boilerplate comment
  loop: add a SPDX header
  loop: remove loop.h
  block: null_blk: Improve device creation with configfs
  block: null_blk: Cleanup messages
  block: null_blk: Cleanup device creation and deletion
  ...
2022-05-23 14:04:14 -07:00
Linus Torvalds 115cd47132 for-5.19/block-2022-05-22
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmKKrUsQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgDjD/44hY9h0JsOLoRH1IvFtuaH6n718JXuqG17
 hHCfmnAUVqj2jT00IUbVlUTd905bCGpfrodBL3PAmPev1zZHOUd/MnJKrSynJ+/s
 NJEMZQaHxLmocNDpJ1sZo7UbAFErsZXB0gVYUO8cH2bFYNu84H1mhRCOReYyqmvQ
 aIAASX5qRB/ciBQCivzAJl2jTdn4WOn5hWi9RLidQB7kSbaXGPmgKAuN88WI4H7A
 zQgAkEl2EEquyMI5tV1uquS7engJaC/4PsenF0S9iTyrhJLjneczJBJZKMLeMR8d
 sOm6sKJdpkrfYDyaA4PIkgmLoEGTtwGpqGHl4iXTyinUAxJoca5tmPvBb3wp66GE
 2Mr7pumxc1yJID2VHbsERXlOAX3aZNCowx2gum2MTRIO8g11Eu3aaVn2kv37MBJ2
 4R2a/cJFl5zj9M8536cG+Yqpy0DDVCCQKUIqEupgEu1dyfpznyWH5BTAHXi1E8td
 nxUin7uXdD0AJkaR0m04McjS/Bcmc1dc6I8xvkdUFYBqYCZWpKOTiEpIBlHg0XJA
 sxdngyz5lSYTGVA4o4QCrdR0Tx1n36A1IYFuQj0wzxBJYZ02jEZuII/A3dd+8hiv
 EY+VeUQeVIXFFuOcY+e0ScPpn7Nr17hAd1en/j2Hcoe4ZE8plqG2QTcnwgflcbis
 iomvJ4yk0Q==
 =0Rw1
 -----END PGP SIGNATURE-----

Merge tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "Here are the core block changes for 5.19. This contains:

   - blk-throttle accounting fix (Laibin)

   - Series removing redundant assignments (Michal)

   - Expose bio cache via the bio_set, so that DM can use it (Mike)

   - Finish off the bio allocation interface cleanups by dealing with
     the weirdest member of the family. bio_kmalloc combines a kmalloc
     for the bio and bio_vecs with a hidden bio_init call and magic
     cleanup semantics (Christoph)

   - Clean up the block layer API so that APIs consumed by file systems
     are (almost) only struct block_device based, so that file systems
     don't have to poke into block layer internals like the
     request_queue (Christoph)

   - Clean up the blk_execute_rq* API (Christoph)

   - Clean up various lose end in the blk-cgroup code to make it easier
     to follow in preparation of reworking the blkcg assignment for bios
     (Christoph)

   - Fix use-after-free issues in BFQ when processes with merged queues
     get moved to different cgroups (Jan)

   - BFQ fixes (Jan)

   - Various fixes and cleanups (Bart, Chengming, Fanjun, Julia, Ming,
     Wolfgang, me)"

* tag 'for-5.19/block-2022-05-22' of git://git.kernel.dk/linux-block: (83 commits)
  blk-mq: fix typo in comment
  bfq: Remove bfq_requeue_request_body()
  bfq: Remove superfluous conversion from RQ_BIC()
  bfq: Allow current waker to defend against a tentative one
  bfq: Relax waker detection for shared queues
  blk-cgroup: delete rcu_read_lock_held() WARN_ON_ONCE()
  blk-throttle: Set BIO_THROTTLED when bio has been throttled
  blk-cgroup: Remove unnecessary rcu_read_lock/unlock()
  blk-cgroup: always terminate io.stat lines
  block, bfq: make bfq_has_work() more accurate
  block, bfq: protect 'bfqd->queued' by 'bfqd->lock'
  block: cleanup the VM accounting in submit_bio
  block: Fix the bio.bi_opf comment
  block: reorder the REQ_ flags
  blk-iocost: combine local_stat and desc_stat to stat
  block: improve the error message from bio_check_eod
  block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone
  block: remove superfluous calls to blkcg_bio_issue_init
  kthread: unexport kthread_blkcg
  blk-cgroup: cleanup blkcg_maybe_throttle_current
  ...
2022-05-23 13:56:39 -07:00
Julia Lawall 537b9f2bf6 mtip32xx: fix typo in comment
Spelling mistake (triple letters) in comment.
Detected with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr>
Link: https://lore.kernel.org/r/20220521111145.81697-28-Julia.Lawall@inria.fr
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-21 06:32:27 -06:00
Johannes Weiner b3fbd58fcb mm: Kconfig: simplify zswap configuration
- CONFIG_ZRAM: Zram is a user-facing feature, whereas zsmalloc is
  not. Don't make the user chase down a technical dependency like
  that, just select it in automatically when zram is requested. The
  CONFIG_CRYPTO dependency is redundant due to more specific deps.

- CONFIG_ZPOOL: This is not a user-facing feature. Hide the symbol and
  have it selected in as needed.

- CONFIG_ZSWAP: Select CRYPTO instead of depend. Common pattern.

- Make the ZSWAP suboptions and their descriptions (compression,
  allocation backend) a bit more straight-forward for the user.

Link: https://lkml.kernel.org/r/20220510152847.230957-5-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-19 14:08:53 -07:00
Juergen Gross 47cbd59833 xen/blkfront: use xenbus_setup_ring() and xenbus_teardown_ring()
Simplify blkfront's ring creation and removal via xenbus_setup_ring()
and xenbus_teardown_ring().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-05-19 14:21:56 +02:00
Juergen Gross 21b539711a xen/blkfront: switch blkfront to use INVALID_GRANT_REF
Instead of using a private macro for an invalid grant reference use
the common one.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
2022-05-19 14:21:39 +02:00
Xie Yongji 491bf8f236 nbd: Fix hung on disconnect request if socket is closed before
When userspace closes the socket before sending a disconnect
request, the following I/O requests will be blocked in
wait_for_reconnect() until dead timeout. This will cause the
following disconnect request also hung on blk_mq_quiesce_queue().
That means we have no way to disconnect a nbd device if there
are some I/O requests waiting for reconnecting until dead timeout.
It's not expected. So let's wake up the thread waiting for
reconnecting directly when a disconnect request is sent.

Reported-by: Xu Jianhai <zero.xu@bytedance.com>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220322080639.142-1-xieyongji@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-16 06:19:35 -06:00
Paul Gortmaker e24ccaaf7e block: remove last remaining traces of IDE documentation
The last traces of the IDE driver went away in commit b7fb14d3ac
("ide: remove the legacy ide driver") but it left behind some traces
of old documentation.

As luck would have it Randy and I would submit similar changes within
a week of each other to address this.  As Randy's commit is in the doc
tree already - this delta is just the stuff my removal contained that
was not in Randy's IDE doc removal.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Phillip Potter <phil@philpotter.co.uk>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: https://lore.kernel.org/all/20220427165917.GE12977@windriver.com
[phil@philpotter.co.uk: removed diffs already added by others]
Signed-off-by: Phillip Potter <phil@philpotter.co.uk>
Link: https://lore.kernel.org/r/20220515205833.944139-5-phil@philpotter.co.uk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-15 18:31:28 -06:00
Alexey Romanov e7be8d1dd9 zram: remove double compression logic
The 2nd trial allocation under per-cpu presumption has been used to
prevent regression of allocation failure.  However, it makes trouble for
maintenance without significant benefit.  The slowpath branch is executed
extremely rarely: getting there is problematic.  Therefore, we delete this
branch.

Since b09ab054b6 ("zram: support BDI_CAP_STABLE_WRITES"), zram has used
QUEUE_FLAG_STABLE_WRITES to prevent buffer change between 1st and 2nd
memory allocations.  Since we remove second trial memory allocation logic,
we could remove the STABLE_WRITES flag because there is no change buffer
to be modified under us.

Link: https://lkml.kernel.org/r/20220505094443.11728-1-avromanov@sberdevices.ru
Signed-off-by: Alexey Romanov <avromanov@sberdevices.ru>
Signed-off-by: Dmitry Rokosov <ddrokosov@sberdevices.ru>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-13 07:20:18 -07:00
Christoph Hellwig eb04bb154b loop: remove most the top-of-file boilerplate comment
Remove the irrelevant changelogs and todo notes and just leave the SPDX
marker and the copyright notice.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220419063303.583106-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-10 06:30:05 -06:00
Christoph Hellwig f21e6e185a loop: add a SPDX header
The copyright statement says:

"Redistribution of this file is permitted under the GNU General Public
 License." and was added by Ted in 1993, at which point GPLv2 only
 was the default Linux license.

Replace it with the usual GPLv2 only SPDX header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220419063303.583106-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-10 06:30:05 -06:00
Christoph Hellwig 754d96798f loop: remove loop.h
Merge loop.h into loop.c as all the content is only used there.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220419063303.583106-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-10 06:30:05 -06:00
NeilBrown a2ad63daa8 VFS: add FMODE_CAN_ODIRECT file flag
Currently various places test if direct IO is possible on a file by
checking for the existence of the direct_IO address space operation.
This is a poor choice, as the direct_IO operation may not be used - it is
only used if the generic_file_*_iter functions are called for direct IO
and some filesystems - particularly NFS - don't do this.

Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the
various places to check this (avoiding pointer dereferences).
do_dentry_open() will set this flag if ->direct_IO is present, so
filesystems do not need to be changed.

NFS *is* changed, to set the flag explicitly and discard the direct_IO
entry in the address_space_operations for files.

Other filesystems which currently use noop_direct_IO could usefully be
changed to set this flag instead.

Link: https://lkml.kernel.org/r/164859778128.29473.15189737957277399416.stgit@noble.brown
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-05-09 18:20:49 -07:00
Willy Tarreau f3b10a3c22 ataflop: use a statically allocated error counters
This is the last driver making use of fd_request->error_count, which is
easy to get wrong as was shown in floppy.c.  We don't need to keep it
there, it can be moved to the atari_floppy_struct instead, so let's do
this.

Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: Minh Yuan <yuanmingbuaa@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-08 10:01:48 -07:00
Willy Tarreau f71f01394f floppy: use a statically allocated error counter
Interrupt handler bad_flp_intr() may cause a UAF on the recently freed
request just to increment the error count.  There's no point keeping
that one in the request anyway, and since the interrupt handler uses a
static pointer to the error which cannot be kept in sync with the
pending request, better make it use a static error counter that's reset
for each new request.  This reset now happens when entering
redo_fd_request() for a new request via set_next_request().

One initial concern about a single error counter was that errors on one
floppy drive could be reported on another one, but this problem is not
real given that the driver uses a single drive at a time, as that
PC-compatible controllers also have this limitation by using shared
signals.  As such the error count is always for the "current" drive.

Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Tested-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-05-08 10:01:48 -07:00
Damien Le Moal 49c3b9266a block: null_blk: Improve device creation with configfs
Currently, the directory name used to create a nullb device through
sysfs is not used as the device name, potentially causing headaches for
users if devices are already created through the modprobe operation
withe the nr_device module parameter not set to 0. E.g. a user can do
"mkdir /sys/kernel/config/nullb/nullb0" to create a nullb device even
though /dev/nullb0 was already created by modprobe. In this case, the
configfs nullb device will be named nullb1, causing confusion for the
user.

Simplify this by using the configfs directory name as the nullb device
name, always, unless another nullb device is already using the same
name. E.g. if modprobe created nullb0, then:

$ mkdir /sys/kernel/config/nullb/nullb0
mkdir: cannot create directory '/sys/kernel/config/nullb/nullb0': File
exists

will be reported to the user.

To implement this, the function null_find_dev_by_name() is added to
check for the existence of a nullb device with the name used for a new
configfs device directory. nullb_group_make_item() uses this new
function to check if the directory name can be used as the disk name.
Finally, null_add_dev() is modified to use the device config item name
as the disk name for a new nullb device created using configfs.
The naming of devices created though modprobe remains unchanged.

Of note is that it is possible for a user to create through configfs a
nullb device with the same name as an existing device. E.g.

$ mkdir /sys/kernel/config/nullb/null

will successfully create the nullb device named "null" but this block
device will however not appear under /dev/ since /dev/null already
exists.

Suggested-by: Joseph Bacik <josef@toxicpanda.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220420005718.3780004-5-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04 05:24:58 -06:00
Damien Le Moal db060f54e0 block: null_blk: Cleanup messages
Use the pr_fmt() macro to prefix all null_blk pr_xxx() messages with
"null_blk:" to clarify which module is printing the messages. Also add
a pr_info() message in null_add_dev() to print the name of a newly
created disk.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220420005718.3780004-4-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04 05:24:58 -06:00
Damien Le Moal b3a0a73e8a block: null_blk: Cleanup device creation and deletion
Introduce the null_create_dev() and null_destroy_dev() helper functions
to respectivel create nullb devices on modprobe and destroy them on
rmmod. The null_destroy_dev() helper avoids duplicated code in the
null_init() and null_exit() functions for deleting devices.

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220420005718.3780004-3-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04 05:24:58 -06:00
Damien Le Moal 525323d25e block: null_blk: Fix code style issues
Fix message grammar and code style issues (brackets and indentation) in
null_init().

Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220420005718.3780004-2-damien.lemoal@opensource.wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04 05:24:58 -06:00
Christoph Hellwig 0000f2f720 xen-blkback: use bdev_discard_alignment
Use bdev_discard_alignment to calculate the correct discard alignment
offset even for partitions instead of just looking at the queue limit.

Also switch to use bdev_discard_granularity to get rid of the last direct
queue reference in xen_blkbk_discard.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-12-hch@lst.de
[axboe: fold in 'q' removal as it's now unused]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-04 05:24:40 -06:00
Christoph Hellwig 18292faa89 rnbd-srv: use bdev_discard_alignment
Use bdev_discard_alignment to calculate the correct discard alignment
offset even for partitions instead of just looking at the queue limit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03 10:38:50 -06:00
Christoph Hellwig 4418bfd8fb loop: remove a spurious clear of discard_alignment
The loop driver never sets a discard_alignment, so it also doens't need
to clear it to zero.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03 10:38:50 -06:00
Christoph Hellwig 62952cc5bc virtio_blk: fix the discard_granularity and discard_alignment queue limits
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.

On the other hand the discard_sector_alignment from the virtio 1.1 looks
similar to what Linux uses as discard granularity (even if not very well
described):

  "discard_sector_alignment can be used by OS when splitting a request
   based on alignment. "

And at least qemu does set it to the discard granularity.

So stop setting the discard_alignment and use the virtio
discard_sector_alignment to set the discard granularity.

Fixes: 1f23816b8e ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03 10:38:50 -06:00
Christoph Hellwig fb749a87f4 null_blk: don't set the discard_alignment queue limit
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by null_blk is mostly
harmless but also useless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03 10:38:50 -06:00
Christoph Hellwig 4a04d517c5 nbd: don't set the discard_alignment queue limit
The discard_alignment queue limit is named a bit misleading means the
offset into the block device at which the discard granularity starts.
Setting it to the discard granularity as done by nbd is mostly harmless
but also useless.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220418045314.360785-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-03 10:38:50 -06:00
Christoph Hellwig bbb1ebe7a9 blk-cgroup: replace bio_blkcg with bio_blkcg_css
All callers of bio_blkcg actually want the CSS, so replace it with an
interface that does return the CSS.  This now allows to move
struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a
public header.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-02 14:06:20 -06:00
Tetsuo Handa 0b8d7622ab aoe: Avoid flush_scheduled_work() usage
Flushing system-wide workqueues is dangerous and will be forbidden.
Replace system_wq with local aoe_wq.

Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/abb37616-eec9-2794-e21e-7c623085d987@I-love.SAKURA.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-05-01 06:41:41 -06:00
Brian Geffon 30226b69f8 zram: add a huge_idle writeback mode
Today it's only possible to write back as a page, idle, or huge.  A user
might want to writeback pages which are huge and idle first as these idle
pages do not require decompression and make a good first pass for
writeback.

Idle writeback specifically has the advantage that a refault is unlikely
given that the page has been swapped for some amount of time without being
refaulted.

Huge writeback has the advantage that you're guaranteed to get the maximum
benefit from a single page writeback, that is, you're reclaiming one full
page of memory.  Pages which are compressed in zram being written back
result in some benefit which is always less than a page size because of
the fact that it was compressed.

The primary use of this is for minimizing refaults in situations where the
device has to be sensitive to storage endurance.  On ChromeOS we have
devices with slow eMMC and repeated writes and refaults can negatively
affect performance and endurance.

Link: https://lkml.kernel.org/r/20220322215821.1196994-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-04-29 14:36:59 -07:00
Willy Tarreau 233087ca06 floppy: disable FDRAWCMD by default
Minh Yuan reported a concurrency use-after-free issue in the floppy code
between raw_cmd_ioctl and seek_interrupt.

[ It turns out this has been around, and that others have reported the
  KASAN splats over the years, but Minh Yuan had a reproducer for it and
  so gets primary credit for reporting it for this fix   - Linus ]

The problem is, this driver tends to break very easily and nowadays,
nobody is expected to use FDRAWCMD anyway since it was used to
manipulate non-standard formats.  The risk of breaking the driver is
higher than the risk presented by this race, and accessing the device
requires privileges anyway.

Let's just add a config option to completely disable this ioctl and
leave it disabled by default.  Distros shouldn't use it, and only those
running on antique hardware might need to enable it.

Link: https://lore.kernel.org/all/000000000000b71cdd05d703f6bf@google.com/
Link: https://lore.kernel.org/lkml/CAKcFiNC=MfYVW-Jt9A3=FPJpTwCD2PL_ULNCpsCVE5s8ZeBQgQ@mail.gmail.com
Link: https://lore.kernel.org/all/CAEAjamu1FRhz6StCe_55XY5s389ZP_xmCF69k987En+1z53=eg@mail.gmail.com
Reported-by: Minh Yuan <yuanmingbuaa@gmail.com>
Reported-by: syzbot+8e8958586909d62b6840@syzkaller.appspotmail.com
Reported-by: cruise k <cruise4k@gmail.com>
Reported-by: Kyungtae Kim <kt0755@gmail.com>
Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
Tested-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-27 09:41:54 -07:00
Yu Kuai 8ba816b23a null-blk: save memory footprint for struct nullb_cmd
Total 16 bytes can be saved in two ways:

1) The field 'bio' will only be used in bio based mode, and the field
   'rq' will only be used in mq mode. Since they won't be used in the
   same time, declare a union for them.
2) The field 'bool fake_timeout' can be placed in the hole after the
   field 'error'.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Link: https://lore.kernel.org/r/20220426022133.3999006-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-25 20:31:20 -06:00
Jack Wang 5ea7c1339e block/rnbd-clt: Avoid flush_workqueue(system_long_wq) usage
Flushing system-wide workqueues is dangerous and will be forbidden.

Replace system_long_wq with local rnbd_clt_wq.

Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp

Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: Santosh Kumar Pradhan <santosh.pradhan@ionos.com>
Link: https://lore.kernel.org/r/20220413123420.66470-1-jinpu.wang@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 09:24:56 -06:00
Christoph Hellwig d292dc8068 loop: don't destroy lo->workqueue in __loop_clr_fd
There is no need to destroy the workqueue when clearing unbinding
a loop device from a backing file.  Not doing so on the other hand
avoid creating a complex lock dependency chain involving the global
system_transition_mutex.

Based on a patch from Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>.

Reported-by: syzbot+6479585dfd4dedd3f7e1@syzkaller.appspotmail.com
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: syzbot+6479585dfd4dedd3f7e1@syzkaller.appspotmail.com
Link: https://lore.kernel.org/r/20220330052917.2566582-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:10 -06:00
Christoph Hellwig a0e286b6a5 loop: remove lo_refcount and avoid lo_mutex in ->open / ->release
lo_refcount counts how many openers a loop device has, but that count
is already provided by the block layer in the bd_openers field of the
whole-disk block_device.  Remove lo_refcount and allow opens to
succeed even on devices beeing deleted - now that ->free_disk is
implemented we can handle that race gracefull and all I/O on it will
just fail. Similarly there is a small race window now where
loop_control_remove does not synchronize the delete vs the remove
due do bd_openers not being under lo_mutex protection, but we can
handle that just as gracefully.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Tetsuo Handa 158eaeba4b loop: avoid loop_validate_mutex/lo_mutex in ->release
Since ->release is called with disk->open_mutex held, and __loop_clr_fd()
 from lo_release() is called via ->release when disk_openers() == 0, we are
guaranteed that "struct file" which will be passed to loop_validate_file()
via fget() cannot be the loop device __loop_clr_fd(lo, true) will clear.
Thus, there is no need to hold loop_validate_mutex from __loop_clr_fd()
if release == true.

When I made commit 3ce6e1f662 ("loop: reintroduce global lock for
safe loop_validate_file() traversal"), I wrote "It is acceptable for
loop_validate_file() to succeed, for actual clear operation has not started
yet.". But now I came to feel why it is acceptable to succeed.

It seems that the loop driver was added in Linux 1.3.68, and

  if (lo->lo_refcnt > 1)
    return -EBUSY;

check in loop_clr_fd() was there from the beginning. The intent of this
check was unclear. But now I think that current

  disk_openers(lo->lo_disk) > 1

form is there for three reasons.

(1) Avoid I/O errors when some process which opens and reads from this
    loop device in response to uevent notification (e.g. systemd-udevd),
    as described in commit a1ecac3b06 ("loop: Make explicit loop
    device destruction lazy"). This opener is short-lived because it is
    likely that the file descriptor used by that process is closed soon.

(2) Avoid I/O errors caused by underlying layer of stacked loop devices
    (i.e. ioctl(some_loop_fd, LOOP_SET_FD, other_loop_fd)) being suddenly
    disappeared. This opener is long-lived because this reference is
    associated with not a file descriptor but lo->lo_backing_file.

(3) Avoid I/O errors caused by underlying layer of mounted loop device
    (i.e. mount(some_loop_device, some_mount_point)) being suddenly
    disappeared. This opener is long-lived because this reference is
    associated with not a file descriptor but mount.

While race in (1) might be acceptable, (2) and (3) should be checked
racelessly. That is, make sure that __loop_clr_fd() will not run if
loop_validate_file() succeeds, by doing refcount check with global lock
held when explicit loop device destruction is requested.

As a result of no longer waiting for lo->lo_mutex after setting Lo_rundown,
we can remove pointless BUG_ON(lo->lo_state != Lo_rundown) check.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220330052917.2566582-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig 498ef5c777 loop: suppress uevents while reconfiguring the device
Currently, udev change event is generated for a loop device before the
device is ready for IO. Due to serialization on lo->lo_mutex in
lo_open() this does not matter because anybody is able to open the
device and do IO only after the configuration is finished. However this
synchronization in lo_open() is going away so make sure userspace
reacting to the change event will see the new device state by generating
the event only when the device is setup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig d2c7f56f8b loop: implement ->free_disk
Ensure that the lo_device which is stored in the gendisk private
data is valid until the gendisk is freed.  Currently the loop driver
uses a lot of effort to make sure a device is not freed when it is
still in use, but to to fix a potential deadlock this will be relaxed
a bit soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig 1fe0b1acb1 loop: only freeze the queue in __loop_clr_fd when needed
->release is only called after all outstanding I/O has completed, so only
freeze the queue when clearing the backing file of a live loop device.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig 46dc967445 loop: don't freeze the queue in lo_release
By the time the final ->release is called there can't be outstanding I/O.
For non-final ->release there is no need for driver action at all.

Thus remove the useless queue freeze.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig 98ded54a33 loop: remove the racy bd_inode->i_mapping->nrpages asserts
Nothing prevents a file system or userspace opener of the block device
from redirtying the page right afte sync_blockdev returned.  Fortunately
data in the page cache during a block device change is mostly harmless
anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig b15ed54694 loop: initialize the worker tracking fields once
There is no need to reinitialize idle_worker_list, worker_tree and timer
every time a loop device is configured.  Just initialize them once at
allocation time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig 2cf429b53c loop: de-duplicate the idle worker freeing code
Use a common helper for both timer based and uncoditional freeing of idle
workers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Tested-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220330052917.2566582-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig dbdc1be325 block: add a disk_openers helper
Add a helper that returns the openers for a given gendisk to avoid having
drivers poke into disk->part0 to get at this information in a somewhat
cumbersome way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig 7a86d6dc14 zram: cleanup zram_remove
Remove the bdev variable and just use the gendisk pointed to by the
zram_device directly.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig d666e20e2e zram: cleanup reset_store
Use a local variable for the gendisk instead of the part0 block_device,
as the gendisk is what this function actually operates on.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Christoph Hellwig 2a852a693f nbd: use the correct block_device in nbd_bdev_reset
The bdev parameter to ->ioctl contains the block device that the ioctl
is called on, which can be the partition.  But the openers check in
nbd_bdev_reset really needs to check use the whole device, so switch to
using that.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220330052917.2566582-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:54:09 -06:00
Ming Lei 5f0614a55e block: change exported IO accounting interface from gendisk to bdev
Export IO accounting interfaces in terms of block_device now that
gendisk has become more internal to block core.

Rename __part_{start,end}_io_acct's first argument from part to bdev.
Rename __part_{start,end}_io_acct to bdev_{start,end}_io_acct and
export them.  Remove disk_{start,end}_io_acct and update caller (zram)
to use bdev_{start,end}_io_acct.

DM can now be updated to use bdev_{start,end}_io_acct.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220418022733.56168-2-snitzer@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-18 06:49:52 -06:00
Haowen Bai 8fd6533ef3 drbd: Return true/false (not 1/0) from bool functions
Return boolean values ("true" or "false") instead of 1 or 0 from bool
functions.  This fixes the following warnings from coccicheck:

./drivers/block/drbd/drbd_req.c:912:9-10: WARNING: return of 0/1 in
function 'remote_due_to_read_balancing' with return type bool

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-8-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:54:24 -06:00
Uladzislau Rezki (Sony) 90c6c29145 drdb: Switch to kvfree_rcu() API
Instead of invoking a synchronize_rcu() to free a pointer
after a grace period we can directly make use of new API
that does the same but in more efficient way.

TO: Jens Axboe <axboe@kernel.dk>
TO: Philipp Reisner <philipp.reisner@linbit.com>
TO: Jason Gunthorpe <jgg@nvidia.com>
TO: drbd-dev@lists.linbit.com
TO: linux-block@vger.kernel.org
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-7-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:54:24 -06:00
Cai Huoqing e6be38a164 drbd: Replace "unsigned" with "unsigned int"
when run checkpath.pl for the first patch, found that
WARNING: Prefer 'unsigned int' to bare use of 'unsigned'.
so fix it. BTW

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-6-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:54:24 -06:00
Cai Huoqing ba6bee98d0 drbd: Make use of PFN_UP helper macro
it's a refactor to make use of PFN_UP helper macro

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-5-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:54:24 -06:00
Jiapeng Chong e1838cf01b block: drbd: drbd_receiver: Remove redundant assignment to err
Variable err is set to '-EIO' but this value is never read as
it is overwritten or not used later on, hence it is a redundant
assignment and can be removed.

Clean up the following clang-analyzer warning:

drivers/block/drbd/drbd_receiver.c:3955:5: warning: Value stored to
'err' is never read [clang-analyzer-deadcode.DeadStores].

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-4-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:54:24 -06:00
Arnd Bergmann 4b28f3b448 drbd: address enum mismatch warnings
gcc -Wextra warns about mixing drbd_state_rv with drbd_ret_code
in a couple of places:

drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_set_role':
drivers/block/drbd/drbd_nl.c:777:14: warning: comparison between 'enum drbd_state_rv' and 'enum drbd_ret_code' [-Wenum-compare]
  777 |  if (retcode != NO_ERROR)
      |              ^~
drivers/block/drbd/drbd_nl.c:784:12: warning: implicit conversion from 'enum drbd_ret_code' to 'enum drbd_state_rv' [-Wenum-conversion]
  784 |    retcode = ERR_MANDATORY_TAG;
      |            ^
drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_attach':
drivers/block/drbd/drbd_nl.c:1965:10: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion]
 1965 |  retcode = rv;  /* FIXME: Type mismatch. */
      |          ^
drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_connect':
drivers/block/drbd/drbd_nl.c:2690:10: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion]
 2690 |  retcode = conn_request_state(connection, NS(conn, C_UNCONNECTED), CS_VERBOSE);
      |          ^
drivers/block/drbd/drbd_nl.c: In function 'drbd_adm_disconnect':
drivers/block/drbd/drbd_nl.c:2803:11: warning: implicit conversion from 'enum drbd_state_rv' to 'enum drbd_ret_code' [-Wenum-conversion]
 2803 |   retcode = rv;  /* FIXME: Type mismatch. */
      |           ^

In each case, both are passed into drbd_adm_finish(), which just takes
a 32-bit integer and is happy with either, presumably intentionally.

Restructure the code to pass either type directly in there in most
cases, avoiding the warnings.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-3-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:54:24 -06:00
Arnd Bergmann 33cb0917bb drbd: fix duplicate array initializer
There are two initializers for P_RETRY_WRITE:

drivers/block/drbd/drbd_main.c:3676:22: warning: initialized field overwritten [-Woverride-init]

Remove the first one since it was already ignored by the compiler
and reorder the list to match the enum definition. As P_ZEROES had
no entry, add that one instead.

Fixes: 036b17eaab ("drbd: Receiving part for the PROTOCOL_UPDATE packet")
Fixes: f31e583aa2 ("drbd: introduce P_ZEROES (REQ_OP_WRITE_ZEROES on the "wire")")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220406190715.1938174-2-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:54:24 -06:00
Christoph Hellwig 44abff2c0b block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD
Secure erase is a very different operation from discard in that it is
a data integrity operation vs hint.  Fully split the limits and helper
infrastructure to make the separation more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nifs2]
Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Acked-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig 7b47ef52d0 block: add a bdev_discard_granularity helper
Abstract away implementation details from file systems by providing a
block_device based helper to retrieve the discard granularity.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig 70200574cc block: remove QUEUE_FLAG_DISCARD
Just use a non-zero max_discard_sectors as an indicator for discard
support, similar to what is done for write zeroes.

The only places where needs special attention is the RAID5 driver,
which must clear discard support for security reasons by default,
even if the default stacking rules would allow for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig cf0fbf894b block: add a bdev_max_discard_sectors helper
Add a helper to query the number of sectors support per each discard bio
based on the block device and use this helper to stop various places from
poking into the request_queue to see if discard is supported and if so how
much.  This mirrors what is done e.g. for write zeroes as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd]
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-24-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig a557e82e5a block: add a bdev_fua helper
Add a helper to check the FUA flag based on the block_device instead of
having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig 08e688fdb8 block: add a bdev_write_cache helper
Add a helper to check the write cache flag based on the block_device
instead of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig 10f0d2a517 block: add a bdev_nonrot helper
Add a helper to check the nonrot flag based on the block_device instead
of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: David Sterba <dsterba@suse.com> [btrfs]
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:59 -06:00
Christoph Hellwig 998e9cbcd6 drbd: cleanup decide_on_discard_support
Sanitize the calling conventions and use a goto label to cleanup the
code flow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-8-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Christoph Hellwig c6f23b1a05 drbd: use bdev_alignment_offset instead of queue_alignment_offset
The bdev version does the right thing for partitions, so use that.

Fixes: 9104d31a75 ("drbd: introduce WRITE_SAME support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Christoph Hellwig 7a38acce22 drbd: use bdev based limit helpers in drbd_send_sizes
Use the bdev based limits helpers where they exist.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Christoph Hellwig 40349d0e16 drbd: remove assign_p_sizes_qlim
Fold each branch into its only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220415045258.199825-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:49:58 -06:00
Christoph Hellwig 852ad96cb0 pktcdvd: stop using bio_reset
Just initialize the bios on-demand.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220406061228.410163-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:30:47 -06:00
Christoph Hellwig 066ff57101 block: turn bio_kmalloc into a simple kmalloc wrapper
Remove the magic autofree semantics and require the callers to explicitly
call bio_init to initialize the bio.

This allows bio_free to catch accidental bio_put calls on bio_init()ed
bios as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Coly Li <colyli@suse.de>
Acked-by: Mike Snitzer <snitzer@kernel.org>
Link: https://lore.kernel.org/r/20220406061228.410163-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-17 19:30:41 -06:00
Ming Lei 3e3876d322 block: null_blk: end timed out poll request
When poll request is timed out, it is removed from the poll list,
but not completed, so the request is leaked, and never get chance
to complete.

Fix the issue by ending it in timeout handler.

Fixes: 0a593fbbc2 ("null_blk: poll queue support")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220413084836.1571995-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-14 10:16:33 -06:00
Christoph Böhmwalder 286901941f drbd: set QUEUE_FLAG_STABLE_WRITES
We want our pages not to change while they are being written.

Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-06 13:07:53 -06:00
Xiaomeng Tong ae4d37b5df drbd: fix an invalid memory access caused by incorrect use of list iterator
The bug is here:
	idr_remove(&connection->peer_devices, vnr);

If the previous for_each_connection() don't exit early (no goto hit
inside the loop), the iterator 'connection' after the loop will be a
bogus pointer to an invalid structure object containing the HEAD
(&resource->connections). As a result, the use of 'connection' above
will lead to a invalid memory access (including a possible invalid free
as idr_remove could call free_layer).

The original intention should have been to remove all peer_devices,
but the following lines have already done the work. So just remove
this line and the unneeded label, to fix this bug.

Cc: stable@vger.kernel.org
Fixes: c06ece6ba6 ("drbd: Turn connection->volumes into connection->peer_devices")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Reviewed-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-06 13:07:52 -06:00
Lv Yunlong aadb22ba2f drbd: Fix five use after free bugs in get_initial_state
In get_initial_state, it calls notify_initial_state_done(skb,..) if
cb->args[5]==1. If genlmsg_put() failed in notify_initial_state_done(),
the skb will be freed by nlmsg_free(skb).
Then get_initial_state will goto out and the freed skb will be used by
return value skb->len, which is a uaf bug.

What's worse, the same problem goes even further: skb can also be
freed in the notify_*_state_change -> notify_*_state calls below.
Thus 4 additional uaf bugs happened.

My patch lets the problem callee functions: notify_initial_state_done
and notify_*_state_change return an error code if errors happen.
So that the error codes could be propagated and the uaf bugs can be avoid.

v2 reports a compilation warning. This v3 fixed this warning and built
successfully in my local environment with no additional warnings.
v2: https://lore.kernel.org/patchwork/patch/1435218/

Fixes: a29728463b ("drbd: Backport the "events2" command")
Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-06 13:07:50 -06:00
Linus Torvalds 6f34f8c3d6 for-5.18/drivers-2022-04-02
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJIixQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpg6WEADDnz9XznQNF6msnq60uwGUY1pvBoavumRU
 AAyBVllUv8THA/q0P9hiF48kA1JrqSttkvIqRQg+1ceK7KUaW5ua7jq72BBecoac
 rsVzagM0QpH3CBAku4z77NXk/Vc2IXCpI72xJzXpvlhRqzywwDX85LzygNecLpRo
 t8jxvI+rwlHJZ5W+Fo1oh/wsXJCJqG1Z3vMUm70hPJnycRjimNSpTX/RWFSo4k0K
 NsyxL0KZAMs5uYTieQLtZIexCaGcHYPiTZdv8uD0onI9ymHTYB43EImT5ZKdlflv
 xMBGpaeGFpgp7FH5sSfPc92Q8MybhbYBhAhZ5V28N8xN1bGtWkBuQXrfvVY9y5Eh
 WDNYfVUBWhTO6QuSnC9h+ce4GerCHycet7cYvaE5NXme6zG28z4Jq3u1ioSXqvGs
 OD4V5Pb6KhDdhwyBeNFP6e7/UIk3DNsd1hC3aegN8WjioH9kXsUj5bQturZXWlgI
 Gn+O6msD7EgDXWZYtYjN3WbKmQCnSTAVkErq9hXxK+uLkOFlCuaKBy/R/P+92qZo
 yhJVYmkLJmaC2UsRRQ5zMe7ZqL64fTTeyU96+7b9qEqsW3qJ6njaPE3AUEQVRRmd
 GFmn/oBacZoxW1cc8CuyK4i21D1Q4SWm56XK1F8Uf4ThjJZA//nXdyI+fo4ftSjT
 ordrF1vLTQ==
 =F9xx
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block

Pull block driver fix from Jens Axboe:
 "Got two reports on nbd spewing warnings on load now, which is a
  regression from a commit that went into your tree yesterday.

  Revert the problematic change for now"

* tag 'for-5.18/drivers-2022-04-02' of git://git.kernel.dk/linux-block:
  Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
2022-04-02 11:03:03 -07:00
Jens Axboe 7198bfc201 Revert "nbd: fix possible overflow on 'first_minor' in nbd_dev_add()"
This reverts commit 6d35d04a9e.

Both Gabriel and Borislav report that this commit casues a regression
with nbd:

sysfs: cannot create duplicate filename '/dev/block/43:0'

Revert it before 5.18-rc1 and we'll investigage this separately in
due time.

Link: https://lore.kernel.org/all/YkiJTnFOt9bTv6A2@zn.tnic/
Reported-by: Gabriel L. Somlo <somlo@cmu.edu>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-02 11:40:23 -06:00
Linus Torvalds 8467b0ed6c for-5.18/drivers-2022-04-01
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmJHUgMQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpgMOD/9J9mO8CQ3THnyJeZb8hy+k7fPHw+P3OrAR
 umOea3ujoMqzJsd/aRenMMAHsr7Phnb5PmljvryOo59nvwxOZ5MIBzSf2H4qJ8U2
 B4jGwESVW4OFNS6Mu+lgYH7XMyDHvqCSVdIhcnqkseoFyndpnTfsu4cCphqajVaP
 gOmXLBSQAetULxMfbqm7ofKk8F7zA80LFbwVs1VWVnCMeLVDccmJJbfn97jDZaJc
 rl8xmcvmarYLOTxDoOSdmfp4ek7QzRKQuKDlfvn1Xi+lDtkKAnYygMHhqQ0WYYc1
 /jJEd3iLCeV0jYfsDpVq6n2KRAGPCtrP0HkujifMGtuL5N2MAn/Aq3Fqoztas1yK
 p3T3SBIBVeznTtOXxl4Fm8tvim3i3rxn2vPdYnm/8uuNxqCQy78gVf5bPlLY+bzT
 4ytrP7AUpyzFj5E+8F33mZd0Vj2AL1kvgjbfEWqyQdXu7zs98UJL3xWicLbrvt/E
 nmdlZjOOBbEgV3vGQD5wTRvlsBJswtl4mHBpYYLzZtmDr8wZxHD3DCUuM1i4xyHG
 qDxNTKME2KfCXA8DIlcPxOAnNeXtwW3J7KTDuPDwC6XW84hFiC2tkM5M8aWqRos+
 GiRczSArhaomaGq8W+vRqgCnPQgFAIt+oyJ9aqen0xHG8qCOTv3N6mMTJk/uBnMI
 qcfpguHp+g==
 =0Axq
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block

Pull block driver fixes from Jens Axboe:
 "Followup block driver updates and fixes for the 5.18-rc1 merge window.
  In detail:

   - NVMe pull request
       - Fix multipath hang when disk goes live over reconnect (Anton
         Eidelman)
       - fix RCU hole that allowed for endless looping in multipath
         round robin (Chris Leech)
       - remove redundant assignment after left shift (Colin Ian King)
       - add quirks for Samsung X5 SSDs (Monish Kumar R)
       - fix the read-only state for zoned namespaces with unsupposed
         features (Pankaj Raghav)
       - use a private workqueue instead of the system workqueue in
         nvmet (Sagi Grimberg)
       - allow duplicate NSIDs for private namespaces (Sungup Moon)
       - expose use_threaded_interrupts read-only in sysfs (Xin Hao)"

   - nbd minor allocation fix (Zhang)

   - drbd fixes and maintainer addition (Lars, Jakob, Christoph)

   - n64cart build fix (Jackie)

   - loop compat ioctl fix (Carlos)

   - misc fixes (Colin, Dongli)"

* tag 'for-5.18/drivers-2022-04-01' of git://git.kernel.dk/linux-block:
  drbd: remove check of list iterator against head past the loop body
  drbd: remove usage of list iterator variable after loop
  nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
  MAINTAINERS: add drbd co-maintainer
  drbd: fix potential silent data corruption
  loop: fix ioctl calls using compat_loop_info
  nvme-multipath: fix hang when disk goes live over reconnect
  nvme: fix RCU hole that allowed for endless looping in multipath round robin
  nvme: allow duplicate NSIDs for private namespaces
  nvmet: remove redundant assignment after left shift
  nvmet: use a private workqueue instead of the system workqueue
  nvme-pci: add quirks for Samsung X5 SSDs
  nvme-pci: expose use_threaded_interrupts read-only in sysfs
  nvme: fix the read-only state for zoned namespaces with unsupposed features
  n64cart: convert bi_disk to bi_bdev->bd_disk fix build
  xen/blkfront: fix comment for need_copy
  xen-blkback: remove redundant assignment to variable i
2022-04-01 16:26:57 -07:00
Jakob Koschel 2651ee5ae4 drbd: remove check of list iterator against head past the loop body
When list_for_each_entry() completes the iteration over the whole list
without breaking the loop, the iterator value will be a bogus pointer
computed based on the head element.

While it is safe to use the pointer to determine if it was computed
based on the head element, either with list_entry_is_head() or
&pos->member == head, using the iterator variable after the loop should
be avoided.

In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to point to the found element [1].

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Link: https://lore.kernel.org/r/20220331220349.885126-2-jakobkoschel@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-31 17:08:15 -06:00
Jakob Koschel 901aeda62e drbd: remove usage of list iterator variable after loop
In preparation to limit the scope of a list iterator to the list
traversal loop, use a dedicated pointer to iterate through the list [1].

Since that variable should not be used past the loop iteration, a
separate variable is used to 'remember the current location within the
loop'.

To either continue iterating from that position or skip the iteration
(if the previous iteration was complete) list_prepare_entry() is used.

Link: https://lore.kernel.org/all/CAHk-=wgRr_D8CB-D9Kg-c=EHreAsk5SqXPwr9Y7k9sA6cWXJ6w@mail.gmail.com/ [1]
Signed-off-by: Jakob Koschel <jakobkoschel@gmail.com>
Link: https://lore.kernel.org/r/20220331220349.885126-1-jakobkoschel@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-31 17:08:04 -06:00
Zhang Wensheng 6d35d04a9e nbd: fix possible overflow on 'first_minor' in nbd_dev_add()
When 'index' is a big numbers, it may become negative which forced
to 'int'. then 'index << part_shift' might overflow to a positive
value that is not greater than '0xfffff', then sysfs might complains
about duplicate creation. Because of this, move the 'index' judgment
to the front will fix it and be better.

Fixes: b0d9111a2d ("nbd: use an idr to keep track of nbd devices")
Fixes: 940c264984 ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()")
Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20220310093224.4002895-1-zhangwensheng5@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-31 08:30:21 -06:00
Lars Ellenberg f4329d1f84 drbd: fix potential silent data corruption
Scenario:
---------

bio chain generated by blk_queue_split().
Some split bio fails and propagates its error status to the "parent" bio.
But then the (last part of the) parent bio itself completes without error.

We would clobber the already recorded error status with BLK_STS_OK,
causing silent data corruption.

Reproducer:
-----------

How to trigger this in the real world within seconds:

DRBD on top of degraded parity raid,
small stripe_cache_size, large read_ahead setting.
Drop page cache (sysctl vm.drop_caches=1, fadvise "DONTNEED",
umount and mount again, "reboot").

Cause significant read ahead.

Large read ahead request is split by blk_queue_split().
Parts of the read ahead that are already in the stripe cache,
or find an available stripe cache to use, can be serviced.
Parts of the read ahead that would need "too much work",
would need to wait for a "stripe_head" to become available,
are rejected immediately.

For larger read ahead requests that are split in many pieces, it is very
likely that some "splits" will be serviced, but then the stripe cache is
exhausted/busy, and the remaining ones will be rejected.

Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
Cc: <stable@vger.kernel.org> # 4.13.x
Link: https://lore.kernel.org/r/20220330185551.3553196-1-christoph.boehmwalder@linbit.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-30 12:58:56 -06:00
Carlos Llamas f941c51eea loop: fix ioctl calls using compat_loop_info
Support for cryptoloop was deleted in commit 47e9624616 ("block:
remove support for cryptoloop and the xor transfer"), making the usage
of loop_info->lo_encrypt_type obsolete. However, this member was also
removed from the compat_loop_info definition and this breaks userspace
ioctl calls for 32-bit binaries and CONFIG_COMPAT=y.

This patch restores the compat_loop_info->lo_encrypt_type member and
marks it obsolete as well as in the uapi header definitions.

Fixes: 47e9624616 ("block: remove support for cryptoloop and the xor transfer")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220329201815.1347500-1-cmllamas@google.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-30 08:54:32 -06:00
Linus Torvalds a701f370b5 xen: branch for v5.18-rc1
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYkF9UwAKCRCAXGG7T9hj
 vsXpAPwKXI4WIQcvnVCdULQfuXpA1TbD5XZuS9OuiN/OxWHbzAEA1VHWTmS+tpZ1
 ptOyoGhAWhTGeplToobDSGz5qTXEPAI=
 =FaKX
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:

 - A bunch of minor cleanups

 - A fix for kexec in Xen dom0 when executed on a high cpu number

 - A fix for resuming after suspend of a Xen guest with assigned PCI
   devices

 - A fix for a crash due to not disabled preemption when resuming as Xen
   dom0

* tag 'for-linus-5.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: fix is_xen_pmu()
  xen: don't hang when resuming PCI device
  arch:x86:xen: Remove unnecessary assignment in xen_apic_read()
  xen/grant-table: remove readonly parameter from functions
  xen/grant-table: remove gnttab_*transfer*() functions
  drivers/xen: use helper macro __ATTR_RW
  x86/xen: Fix kerneldoc warning
  xen: delay xen_hvm_init_time_ops() if kdump is boot on vcpu>=32
  xen: use time_is_before_eq_jiffies() instead of open coding it
2022-03-28 14:32:39 -07:00
Linus Torvalds 6f2689a766 SCSI misc on 20220324
This series consists of the usual driver updates (qla2xxx, pm8001,
 libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
 and bug fixes.  The high blast radius core update is the removal of
 write same, which affects block and several non-SCSI devices.  The
 other big change, which is more local, is the removal of the SCSI
 pointer.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYjzDQyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQMYAQDEWUGV
 6U0+736AHVtOfiMNfiRN79B1HfXVoHvemnPcTwD/UlndwFfy/3GGOtoZmqEpc73J
 Ec1HDuUCE18H1H2QAh0=
 =/Ty9
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (qla2xxx, pm8001,
  libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
  and bug fixes.

  The high blast radius core update is the removal of write same, which
  affects block and several non-SCSI devices. The other big change,
  which is more local, is the removal of the SCSI pointer"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
  scsi: scsi_ioctl: Drop needless assignment in sg_io()
  scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
  scsi: lpfc: Copyright updates for 14.2.0.0 patches
  scsi: lpfc: Update lpfc version to 14.2.0.0
  scsi: lpfc: SLI path split: Refactor BSG paths
  scsi: lpfc: SLI path split: Refactor Abort paths
  scsi: lpfc: SLI path split: Refactor SCSI paths
  scsi: lpfc: SLI path split: Refactor CT paths
  scsi: lpfc: SLI path split: Refactor misc ELS paths
  scsi: lpfc: SLI path split: Refactor VMID paths
  scsi: lpfc: SLI path split: Refactor FDISC paths
  scsi: lpfc: SLI path split: Refactor LS_RJT paths
  scsi: lpfc: SLI path split: Refactor LS_ACC paths
  scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
  scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
  scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
  scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
  scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
  scsi: lpfc: SLI path split: Refactor lpfc_iocbq
  scsi: lpfc: Use kcalloc()
  ...
2022-03-24 19:37:53 -07:00
Linus Torvalds 9030fb0bb9 Folio changes for 5.18
- Rewrite how munlock works to massively reduce the contention
    on i_mmap_rwsem (Hugh Dickins):
    https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
  - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
    https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
  - Convert GUP to use folios and make pincount available for order-1
    pages. (Matthew Wilcox)
  - Convert a few more truncation functions to use folios (Matthew Wilcox)
  - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
  - Convert rmap_walk to use folios (Matthew Wilcox)
  - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
  - Add support for creating large folios in readahead (Matthew Wilcox)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
 gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
 P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
 s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
 Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
 CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
 v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
 =n9Ad
 -----END PGP SIGNATURE-----

Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache

Pull folio updates from Matthew Wilcox:

 - Rewrite how munlock works to massively reduce the contention on
   i_mmap_rwsem (Hugh Dickins):

     https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/

 - Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
   Hellwig):

     https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/

 - Convert GUP to use folios and make pincount available for order-1
   pages. (Matthew Wilcox)

 - Convert a few more truncation functions to use folios (Matthew
   Wilcox)

 - Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
   Wilcox)

 - Convert rmap_walk to use folios (Matthew Wilcox)

 - Convert most of shrink_page_list() to use a folio (Matthew Wilcox)

 - Add support for creating large folios in readahead (Matthew Wilcox)

* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
  mm/damon: minor cleanup for damon_pa_young
  selftests/vm/transhuge-stress: Support file-backed PMD folios
  mm/filemap: Support VM_HUGEPAGE for file mappings
  mm/readahead: Switch to page_cache_ra_order
  mm/readahead: Align file mappings for non-DAX
  mm/readahead: Add large folio readahead
  mm: Support arbitrary THP sizes
  mm: Make large folios depend on THP
  mm: Fix READ_ONLY_THP warning
  mm/filemap: Allow large folios to be added to the page cache
  mm: Turn can_split_huge_page() into can_split_folio()
  mm/vmscan: Convert pageout() to take a folio
  mm/vmscan: Turn page_check_references() into folio_check_references()
  mm/vmscan: Account large folios correctly
  mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
  mm/vmscan: Free non-shmem folios without splitting them
  mm/rmap: Constify the rmap_walk_control argument
  mm/rmap: Convert rmap_walk() to take a folio
  mm: Turn page_anon_vma() into folio_anon_vma()
  mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
  ...
2022-03-22 17:03:12 -07:00
Linus Torvalds 3bf03b9a08 Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton:

 - A few misc subsystems: kthread, scripts, ntfs, ocfs2, block, and vfs

 - Most the MM patches which precede the patches in Willy's tree: kasan,
   pagecache, gup, swap, shmem, memcg, selftests, pagemap, mremap,
   sparsemem, vmalloc, pagealloc, memory-failure, mlock, hugetlb,
   userfaultfd, vmscan, compaction, mempolicy, oom-kill, migration, thp,
   cma, autonuma, psi, ksm, page-poison, madvise, memory-hotplug, rmap,
   zswap, uaccess, ioremap, highmem, cleanups, kfence, hmm, and damon.

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (227 commits)
  mm/damon/sysfs: remove repeat container_of() in damon_sysfs_kdamond_release()
  Docs/ABI/testing: add DAMON sysfs interface ABI document
  Docs/admin-guide/mm/damon/usage: document DAMON sysfs interface
  selftests/damon: add a test for DAMON sysfs interface
  mm/damon/sysfs: support DAMOS stats
  mm/damon/sysfs: support DAMOS watermarks
  mm/damon/sysfs: support schemes prioritization
  mm/damon/sysfs: support DAMOS quotas
  mm/damon/sysfs: support DAMON-based Operation Schemes
  mm/damon/sysfs: support the physical address space monitoring
  mm/damon/sysfs: link DAMON for virtual address spaces monitoring
  mm/damon: implement a minimal stub for sysfs-based DAMON interface
  mm/damon/core: add number of each enum type values
  mm/damon/core: allow non-exclusive DAMON start/stop
  Docs/damon: update outdated term 'regions update interval'
  Docs/vm/damon/design: update DAMON-Idle Page Tracking interference handling
  Docs/vm/damon: call low level monitoring primitives the operations
  mm/damon: remove unnecessary CONFIG_DAMON option
  mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}()
  mm/damon/dbgfs-test: fix is_target_id() change
  ...
2022-03-22 16:11:53 -07:00
NeilBrown b9b1335e64 remove bdi_congested() and wb_congested() and related functions
These functions are no longer useful as no BDIs report congestions any
more.

Removing the test on bdi_write_contested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.

So replace the calls by 'false' and simplify the code - and remove the
functions.

[akpm@linux-foundation.org: fix build]

Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>	[nilfs]
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-22 15:57:01 -07:00
Linus Torvalds 69d1dea852 for-5.18/drivers-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0/QUQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpn8GEACRVxJaJV5qjZfoFAQKoAWJEtquwjeARyB+
 0V8ROWHDWHSacdug9wBytayiS1lz2zmUHJ6YXyts2dn0v6CrK4s8yGzk5G/RgH6+
 6M3GmBKjj+r1DfE8L3OoQWkDR1JFPuFxXTG/uBd7fBY2Excih1Z0D2lpspMleIRf
 w8zBrlWrWH8lZlm6HF3fadjEoiWhOM5F4Ofz3eg/PAQrHuD06z8hjQgMeR0jQVzw
 bWF9jrdNIplxRjNWIwCTsQRM+z5KQhUGwDODJjIwdQtVaKSt9D99ZbeKTudlslQ2
 zrizsCq8P1RjBPcrA45FV6QnT9DIRRGrYzHD63qC6fDae34rbzdSHUwRMP2XSxo8
 +hT1AzGypiBauODTPzHFtTskaQ0KibLznEanChh/ThySmNYcEVAljSx3Z5Vo81J+
 IqJYK2m3RESCFruy9w3U/P7qiXZmqYldPfjxAKq8ucg6x1PU3XRAVm7SI/i4l75D
 Crk1ujj2LJgsyxL6qMrK3XUavl1SJdzWeFSarcCt3m4m11EWWfYzmG8Yn8OE2CEZ
 a2CAyDsRi8CZ3hvkaMwigL4wBJjrrig8vyIgok3VrfCmYlNNqMQqM5Rw7vzjR3v1
 cKewI3rQjkFXEaveIXyGPTI/0Da4cT0DOfn/Mws9MDUXNPlFMNEDUZkPuzMywiTB
 2SWDLTe77g==
 =993h
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:

 - NVMe updates via Christoph:
      - add vectored-io support for user-passthrough (Kanchan Joshi)
      - add verbose error logging (Alan Adamson)
      - support buffered I/O on block devices in nvmet (Chaitanya
        Kulkarni)
      - central discovery controller support (Martin Belanger)
      - fix and extended the globally unique idenfier validation
        (Christoph)
      - move away from the deprecated IDA APIs (Sagi Grimberg)
      - misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
        Chaitanya Kulkarni)
      - add lockdep annotations for in-kernel sockets (Chris Leech)
      - use vmalloc for ANA log buffer (Hannes Reinecke)
      - kerneldoc fixes (Chaitanya Kulkarni)
      - cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph)
      - warn about shared namespaces without multipathing (Christoph)

 - MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul,
   Erik, Dirk)

 - loop cleanups and queue depth configuration (Chaitanya)

 - null_blk cleanups and fixes (Chaitanya)

 - Use descriptive init/exit names in virtio_blk (Randy)

 - Use bvec_kmap_local() in drivers (Christoph)

 - bcache fixes (Mingzhe)

 - xen blk-front persistent grant speedups (Juergen)

 - rnbd fix and cleanup (Gioh)

 - Misc fixes (Christophe, Colin)

* tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits)
  virtio_blk: eliminate anonymous module_init & module_exit
  nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
  nvme: remove nvme_alloc_request and nvme_alloc_request_qid
  nvme: cleanup how disk->disk_name is assigned
  nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
  nvmet: use snprintf() with PAGE_SIZE in configfs
  nvmet: don't fold lines
  nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
  nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
  nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
  nvme-tcp: lockdep: annotate in-kernel sockets
  nvme-tcp: don't fold the line
  nvme-tcp: don't initialize ret variable
  nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
  nvme-multipath: use vmalloc for ANA log buffer
  xen/blkfront: speed up purge_persistent_grants()
  raid5: initialize the stripe_head embeeded bios as needed
  raid5-cache: statically allocate the recovery ra bio
  raid5-cache: fully initialize flush_bio when needed
  raid5-ppl: fully initialize the bio in ppl_new_iounit
  ...
2022-03-21 17:16:01 -07:00
Linus Torvalds 616355cc81 for-5.18/block-2022-03-18
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0+GcQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgprUpD/9aTJEnj7VCw7UouSsg098sdjtoy9ilslU3
 ew47K8CIXHbCB4CDqLnFyvCwAdG1XGgS+fUmFAxvTr29R9SZeS5d+bXL6sZzEo0C
 bwxsJy9MM2QRtMvB+giAt1myXbwB8cG+ketMBWXqwXXRHRzPbbQfMZia7FqWMnfY
 KQanH9IwYHp1oa5U/W6Qcjm4oCnLgBMRwqByzUCtiF3y9qgaLkK+3IgkNwjJQjLA
 DTeUJ/9CgxGQQbzA+LPktbw2xfTqiUfcKq0mWx6Zt4wwNXn1ClqUDUXX6QSM8/5u
 3OimbscSkEPPTIYZbVBPkhFnAlQb4JaJEgOrbXvYKVV2Dh+eZY81XwNeE/E8gdBY
 TnHOTOCjkN/4sR3hIrWazlJzPLdpPA0eOYrhguCraQsX9mcsYNxlJ9otRv/Ve99g
 uqL0RZg3+NoK84fm79FCGy/ZmPQJvJttlBT9CKVwylv/Lky42xWe7AdM3OipKluY
 2nh+zN5Ai7WxZdTKXQFRhCSWfWQ+1qW51tB3dcGW+BooZr/oox47qKQVcHsEWbq1
 RNR45F5a4AuPwYUHF/P36WviLnEuq9AvX7OTTyYOplyVQohKIoDXp9chVzLNzBiZ
 KBR00W6MLKKKN+8foalQWgNyb2i2PH7Ib4xRXvXj/22Vwxg5UmUoBmSDSas9SZUS
 +dMo7CtNgA==
 =DpgP
 -----END PGP SIGNATURE-----

Merge tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - BFQ cleanups and fixes (Yu, Zhang, Yahu, Paolo)

 - blk-rq-qos completion fix (Tejun)

 - blk-cgroup merge fix (Tejun)

 - Add offline error return value to distinguish it from an IO error on
   the device (Song)

 - IO stats fixes (Zhang, Christoph)

 - blkcg refcount fixes (Ming, Yu)

 - Fix for indefinite dispatch loop softlockup (Shin'ichiro)

 - blk-mq hardware queue management improvements (Ming)

 - sbitmap dead code removal (Ming, John)

 - Plugging merge improvements (me)

 - Show blk-crypto capabilities in sysfs (Eric)

 - Multiple delayed queue run improvement (David)

 - Block throttling fixes (Ming)

 - Start deprecating auto module loading based on dev_t (Christoph)

 - bio allocation improvements (Christoph, Chaitanya)

 - Get rid of bio_devname (Christoph)

 - bio clone improvements (Christoph)

 - Block plugging improvements (Christoph)

 - Get rid of genhd.h header (Christoph)

 - Ensure drivers use appropriate flush helpers (Christoph)

 - Refcounting improvements (Christoph)

 - Queue initialization and teardown improvements (Ming, Christoph)

 - Misc fixes/improvements (Barry, Chaitanya, Colin, Dan, Jiapeng,
   Lukas, Nian, Yang, Eric, Chengming)

* tag 'for-5.18/block-2022-03-18' of git://git.kernel.dk/linux-block: (127 commits)
  block: cancel all throttled bios in del_gendisk()
  block: let blkcg_gq grab request queue's refcnt
  block: avoid use-after-free on throttle data
  block: limit request dispatch loop duration
  block/bfq-iosched: Fix spelling mistake "tenative" -> "tentative"
  sr: simplify the local variable initialization in sr_block_open()
  block: don't merge across cgroup boundaries if blkcg is enabled
  block: fix rq-qos breakage from skipping rq_qos_done_bio()
  block: flush plug based on hardware and software queue order
  block: ensure plug merging checks the correct queue at least once
  block: move rq_qos_exit() into disk_release()
  block: do more work in elevator_exit
  block: move blk_exit_queue into disk_release
  block: move q_usage_counter release into blk_queue_release
  block: don't remove hctx debugfs dir from blk_mq_exit_queue
  block: move blkcg initialization/destroy into disk allocation/release handler
  sr: implement ->free_disk to simplify refcounting
  sd: implement ->free_disk to simplify refcounting
  sd: delay calling free_opal_dev
  sd: call sd_zbc_release_disk before releasing the scsi_device reference
  ...
2022-03-21 16:48:55 -07:00
Matthew Wilcox (Oracle) cbcc268bb1 fs: Move many prototypes to pagemap.h
These functions are page cache functionality and don't need to be
declared in fs.h.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
2022-03-21 12:59:02 -04:00
Jackie Liu b2479de38d n64cart: convert bi_disk to bi_bdev->bd_disk fix build
My kernel robot report below:

  drivers/block/n64cart.c: In function ‘n64cart_submit_bio’:
  drivers/block/n64cart.c:91:26: error: ‘struct bio’ has no member named ‘bi_disk’
     91 |  struct device *dev = bio->bi_disk->private_data;
        |                          ^~
    CC      drivers/slimbus/qcom-ctrl.o
    CC      drivers/auxdisplay/hd44780.o
    CC      drivers/watchdog/watchdog_core.o
    CC      drivers/nvme/host/fault_inject.o
    AR      drivers/accessibility/braille/built-in.a
  make[2]: *** [scripts/Makefile.build:288: drivers/block/n64cart.o] Error 1

Fixes: 309dca309f ("block: store a block_device pointer in struct bio");
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220321071216.1549596-1-liu.yun@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-21 06:34:45 -06:00
Dongli Zhang 08719dd917 xen/blkfront: fix comment for need_copy
The 'need_copy' is set when rq_data_dir(req) returns WRITE, in order to
copy the written data to persistent page.

".need_copy = rq_data_dir(req) && info->feature_persistent,"

Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Fixes: c004a6fe0c ('block/xen-blkfront: Make it running on 64KB page granularity')
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220317220930.5698-1-dongli.zhang@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-21 06:31:44 -06:00
Colin Ian King 93b4e74789 xen-blkback: remove redundant assignment to variable i
Variable i is being assigned a value that is never read, it is being
re-assigned later in a for-loop. The assignment is redundant and can
be removed.

Cleans up clang scan build warning:
drivers/block/xen-blkback/blkback.c:934:14: warning: Although the value
stored to 'i' is used in the enclosing expression, the value is never
actually read from 'i' [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220317234646.78158-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-21 06:31:36 -06:00
Jens Axboe ae53aea611 second round of nvme updates for Linux 5.18
- add lockdep annotations for in-kernel sockets (Chris Leech)
  - use vmalloc for ANA log buffer (Hannes Reinecke)
  - kerneldoc fixes (Chaitanya Kulkarni)
  - cleanups (Guoqing Jiang, Chaitanya Kulkarni, me)
  - warn about shared namespaces without multipathing (me)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmIzanILHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYN2YxAAgOMawhITprPhIyjf2sRCDGfYP3qZVp93RCaqr1rK
 fzylo7/BwDXAjGO1tXqCLFGgsktxOvyyCR0Nvx1NBCpZxVAFE+oUEQTMfyGe8yhk
 G1mugZuu1TnD1SY8UXKMz17O71dvBc+wVvxR+M32ZoL3fO+yAM7p9fNSzZOpwgAS
 XiHek18EiEeIRldoRNXl3DWsXVxxC8aTo8ljxaSgzabIPN90D6Rt2BMxhVFRd6DP
 401qTf8e1FQOaQQ+QBVCLHb7gCL/3sjbxhuU5s+QNJD5rxT5a4F5s05doa+5lfzH
 soCnhIJcjJwbCZKvvTP2Vvaspro5/IFNOAybEbrrFamUtrin7CrqaXr8IQJ+vK0m
 vDaBiJUcdv91JQtWnRsoKlG2nAguGPpD4qIeL3fCr6sy+yo9Zmr4bJJyiAUrPFVp
 a7ffVU4MUg6KeofKey7nLUvPa2g9OUxAHRr30/cQeBnCsI6QoGz/wTPd8sB11KrC
 e4RoHDlPzNZGyMEnHYKWn+0SCiYykgYRtxjwkVqdr2/j0PTHScM1u4q3bHDer53j
 hQmzDokgHV1PuHWXTQ7t4058ENYgpk+zg2tq+nr+b2jfElPXUrIpTIOzzfZLhg/T
 sFBUeQ4Q8Q35JWI3jvrRHDNB0+vpQEnQu4pXLdhvGWeGAkFEjO5bBKtqLzuyCWOZ
 nIs=
 =xUr4
 -----END PGP SIGNATURE-----

Merge tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme into for-5.18/drivers

Pull NVMe updates from Christoph:

"Second round of nvme updates for Linux 5.18

 - add lockdep annotations for in-kernel sockets (Chris Leech)
 - use vmalloc for ANA log buffer (Hannes Reinecke)
 - kerneldoc fixes (Chaitanya Kulkarni)
 - cleanups (Guoqing Jiang, Chaitanya Kulkarni, me)
 - warn about shared namespaces without multipathing (me)"

* tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme:
  nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
  nvme: remove nvme_alloc_request and nvme_alloc_request_qid
  nvme: cleanup how disk->disk_name is assigned
  nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
  nvmet: use snprintf() with PAGE_SIZE in configfs
  nvmet: don't fold lines
  nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
  nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
  nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
  nvme-tcp: lockdep: annotate in-kernel sockets
  nvme-tcp: don't fold the line
  nvme-tcp: don't initialize ret variable
  nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
  nvme-multipath: use vmalloc for ANA log buffer
2022-03-17 20:46:22 -06:00
Randy Dunlap bcfe9b6cbb virtio_blk: eliminate anonymous module_init & module_exit
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.

Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.

Example 1: (System.map)
 ffffffff832fc78c t init
 ffffffff832fc79e t init
 ffffffff832fc8f8 t init

Example 2: (initcall_debug log)
 calling  init+0x0/0x12 @ 1
 initcall init+0x0/0x12 returned 0 after 15 usecs
 calling  init+0x0/0x60 @ 1
 initcall init+0x0/0x60 returned 0 after 2 usecs
 calling  init+0x0/0x9a @ 1
 initcall init+0x0/0x9a returned 0 after 74 usecs

Fixes: e467cde238 ("Block driver using virtio.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220316192010.19001-2-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-17 10:11:29 -06:00
Christoph Hellwig ce8d78616a nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
Start warning about exposing a namespace as multiple block devices,
and set a fixed deprecation release.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
2022-03-16 16:48:00 +01:00
Juergen Gross c94b731da2 xen/grant-table: remove readonly parameter from functions
The gnttab_end_foreign_access() family of functions is taking a
"readonly" parameter, which isn't used. Remove it from the function
parameters.

Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220311103429.12845-3-jgross@suse.com
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2022-03-15 20:34:40 -05:00
Juergen Gross 85d9abcd73 xen/blkfront: speed up purge_persistent_grants()
purge_persistent_grants() is scanning the grants list for persistent
grants being no longer in use by the backend. When having found such a
grant, it will be set to "invalid" and pushed to the tail of the list.

Instead of pushing it directly to the end of the list, add it first to
a temporary list, avoiding to scan those entries again in the main
list traversal. After having finished the scan, append the temporary
list to the grant list.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20220311103527.12931-1-jgross@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-11 05:15:33 -07:00
Linus Torvalds b5521fe9a9 xen: XSA-396 security patches for v5.17
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYiXISAAKCRCAXGG7T9hj
 vlw+AP0dZipcd8hTThcXHF72WToepL5bGb3oNgRy7qNuRCrwEgEAj3LkQD3L26h/
 lzqZsiysQ35KP91+g2KSLJqg2RpnFQU=
 =rkK1
 -----END PGP SIGNATURE-----

Merge tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen fixes from Juergen Gross:
 "Several Linux PV device frontends are using the grant table interfaces
  for removing access rights of the backends in ways being subject to
  race conditions, resulting in potential data leaks, data corruption by
  malicious backends, and denial of service triggered by malicious
  backends:

   - blkfront, netfront, scsifront and the gntalloc driver are testing
     whether a grant reference is still in use. If this is not the case,
     they assume that a following removal of the granted access will
     always succeed, which is not true in case the backend has mapped
     the granted page between those two operations.

     As a result the backend can keep access to the memory page of the
     guest no matter how the page will be used after the frontend I/O
     has finished. The xenbus driver has a similar problem, as it
     doesn't check the success of removing the granted access of a
     shared ring buffer.

   - blkfront, netfront, scsifront, usbfront, dmabuf, xenbus, 9p,
     kbdfront, and pvcalls are using a functionality to delay freeing a
     grant reference until it is no longer in use, but the freeing of
     the related data page is not synchronized with dropping the granted
     access.

     As a result the backend can keep access to the memory page even
     after it has been freed and then re-used for a different purpose.

   - netfront will fail a BUG_ON() assertion if it fails to revoke
     access in the rx path.

     This will result in a Denial of Service (DoS) situation of the
     guest which can be triggered by the backend"

* tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
  xen/gnttab: fix gnttab_end_foreign_access() without page specified
  xen/pvcalls: use alloc/free_pages_exact()
  xen/9p: use alloc/free_pages_exact()
  xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done()
  xen: remove gnttab_query_foreign_access()
  xen/gntalloc: don't use gnttab_query_foreign_access()
  xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
  xen/netfront: don't use gnttab_query_foreign_access() for mapped status
  xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
  xen/grant-table: add gnttab_try_end_foreign_access()
  xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
2022-03-09 20:44:17 -08:00
Ming Lei 4f48120874 blk-mq: prepare for implementing hctx table via xarray
It is inevitable to cause use-after-free on q->queue_hw_ctx between
queue_for_each_hw_ctx() and blk_mq_update_nr_hw_queues(). And converting
to xarray can fix the uaf, meantime code gets cleaner.

Prepare for converting q->queue_hctx_ctx into xarray, one thing is that
xa_for_each() can only accept 'unsigned long' as index, so changes type
of hctx index of queue_for_each_hw_ctx() into 'unsigned long'.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220308073219.91173-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-08 17:57:19 -07:00
Ming Lei de0328d3a2 block: mtip32xx: don't touch q->queue_hw_ctx
q->queue_hw_ctx is really one blk-mq internal structure for retrieving
hctx via its index, not supposed to be used by drivers. Meantime drivers
can get the tags structure easily from tagset.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220308073219.91173-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-08 17:57:19 -07:00
Christoph Hellwig 47c426d524 pktcdvd: remove a pointless debug check in pkt_submit_bio
->queuedata is set up in pkt_init_queue, so it can't be NULL here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-07 06:42:33 -07:00
Juergen Gross abf1fd5919 xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.

In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.

For the ring allocation use alloc_pages_exact() in order to avoid
high order pages in case of a multi-page ring.

If a grant wasn't unmapped by the backend without persistent grants
being used, set the device state to "error".

This is CVE-2022-23036 / part of XSA-396.

Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
V2:
- use gnttab_try_end_foreign_access()
V4:
- use alloc_pages_exact() and free_pages_exact()
- set state to error if backend didn't unmap (Roger Pau Monné)
2022-03-07 09:48:54 +01:00
Xie Yongji e030759a1d virtio-blk: Remove BUG_ON() in virtio_queue_rq()
Currently we have a BUG_ON() to make sure the number of sg
list does not exceed queue_max_segments() in virtio_queue_rq().
However, the block layer uses queue_max_discard_segments()
instead of queue_max_segments() to limit the sg list for
discard requests. So the BUG_ON() might be triggered if
virtio-blk device reports a larger value for max discard
segment than queue_max_segments(). To fix it, let's simply
remove the BUG_ON() which has become unnecessary after commit
02746e26c39e("virtio-blk: avoid preallocating big SGL for data").
And the unused vblk->sg_elems can also be removed together.

Fixes: 1f23816b8e ("virtio_blk: add discard and write zeroes support")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20220304100058.116-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06 06:06:50 -05:00
Xie Yongji dacc73ed0b virtio-blk: Don't use MAX_DISCARD_SEGMENTS if max_discard_seg is zero
Currently the value of max_discard_segment will be set to
MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device
set 0 to max_discard_seg in configuration space. It's incorrect
since the device might not be able to handle such large descriptors.
To fix it, let's follow max_segments restrictions in this case.

Fixes: 1f23816b8e ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-03-06 06:06:50 -05:00
Christoph Hellwig 13d4ef0f66 floppy: use memcpy_{to,from}_bvec
Use the helpers instead of open coding them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig 3eddaa60b8 drbd: use bvec_kmap_local in recv_dless_read
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig 472278508d drbd: use bvec_kmap_local in drbd_csum_bio
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig bd3d3203eb zram: use memcpy_from_bvec in zram_bvec_write
Use memcpy_from_bvec instead of open coding the logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:21 -07:00
Christoph Hellwig b3bd0a8a74 zram: use memcpy_to_bvec in zram_bvec_read
Use the proper helper instead of open coding the copy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:20 -07:00
Christoph Hellwig b7ab4611b6 aoe: use bvec_kmap_local in bvcpy
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220303111905.321089-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-03-04 12:29:20 -07:00
Christoph Hellwig dbb4c84d87 scsi: core: Move the result field from struct scsi_request to struct scsi_cmnd
Prepare for removing the scsi_request structure by moving the result field
to struct scsi_cmnd.

Link: https://lore.kernel.org/r/20220224175552.988286-7-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-01 22:21:50 -05:00
Christoph Hellwig ce70fd9a55 scsi: core: Remove the cmd field from struct scsi_request
Now that each scsi_request is backed by a scsi_cmnd, there is no need to
indirect the CDB storage.  Change all submitters of SCSI passthrough
requests to store the CDB information directly in the scsi_cmnd, and while
doing so allocate the full 32 bytes that cover all Linux supported SCSI
hosts instead of requiring dynamic allocation for > 16 byte CDBs.  On
64-bit systems this does not change the size of the scsi_cmnd at all, while
on 32-bit systems it slightly increases it for now, but that increase will
be made up by the removal of the remaining scsi_request fields.

Link: https://lore.kernel.org/r/20220224175552.988286-4-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-03-01 22:21:49 -05:00
Chaitanya Kulkarni df00b1d26c null_blk: null_alloc_page() cleanup
Remove goto labels and use direct returns as error unwinding code only
needs to free t_page variable if we alloc_pages() call fails as having
two labels for one kfree() can be avoided easily.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220222152852.26043-3-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Chaitanya Kulkarni c90b6b50b4 null_blk: remove hardcoded null_alloc_page() param
Only caller of null_alloc_page() is null_insert_page() unconditionally
sets only parameter to GFP_NOIO and that is statically hard-coded in
null_blk. There is no point in having statically hardcoded function
parameter.

Remove the unnecessary parameter gfp_flags and adjust the code, so it
can retain existing behavior null_alloc_page() with GFP_NOIO.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220222152852.26043-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Chaitanya Kulkarni 3d3472f3ed null_blk: remove hardcoded alloc_cmd() parameter
Only caller of alloc_cmd() is null_submit_bio() unconditionally sets
second parameter to true and that is statically hard-coded in null_blk.
There is no point in having statically hardcoded function parameter.

Remove the unnecessary parameter can_wait and adjust the code so it
can retain existing behavior of waiting when we don't get valid
nullb_cmd from __alloc_cmd() in alloc_cmd().

The restructured code avoids multiple return statements, multiple
calls to __alloc_cmd() and resulting a fast path call to
prepare_to_wait() due to removal of first alloc_cmd() call.

Follow the pattern that we have in bio_alloc() to set the structure
members in the structure allocation function in alloc_cmd() and pass
bio to initialize newly allocated cmd->bio member.

Follow the pattern in copy_to_nullb() to use result of one function call
(null_cache_active()) to be used as a parameter to another function call
(null_insert_page()), use result of alloc_cmd() as a first parameter to
the null_handle_cmd() in null_submit_bio() function. This allow us to
remove the local variable cmd on stack in null_submit_bio() that is in
fast path.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220216172945.31124-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Chaitanya Kulkarni ef44c50837 loop: allow user to set the queue depth
Instead of hardcoding queue depth allow user to set the hw queue depth
using module parameter. Set default value to 128 to retain the existing
behavior.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-5-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Chaitanya Kulkarni 9c64e38cc6 loop: remove extra variable in lo_req_flush
The local variable file is used to pass it to the vfs_fsync(). We can
get away with using lo->lo_backing_file instead of storing in a local
variable which is not used anywhere else.

No functional change in this patch.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-4-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Chaitanya Kulkarni 0aab29b854 loop: remove extra variable in lo_fallocate()
The local variable q is used to pass it to the blk_queue_discard(). We
can get away with using lo->lo_queue instead of storing in a local
variable which is not used anywhere else.

No functional change in this patch.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-3-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Chaitanya Kulkarni b27824d31f loop: use sysfs_emit() in the sysfs xxx show()
sprintf does not know the PAGE_SIZE maximum of the temporary buffer
used for outputting sysfs content and it's possible to overrun the
PAGE_SIZE buffer length.

Use a generic sysfs_emit function that knows the size of the
temporary buffer and ensures that no overrun is done for offset
attribute in
loop_attr_[offset|sizelimit|autoclear|partscan|dio]_show() callbacks.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Chaitanya Kulkarni a75110c3b3 null_blk: fix return value from null_add_dev()
The function nullb_device_power_store() returns -ENOMEM when
null_add_dev() fails. null_add_dev() can fail with return value
other than -ENOMEM such as -EINVAL when Zoned Block Device option
is used, see :

nullb_device_power_store()
 null_add_dev()
  null_init_zoned_dev()
	return -EINVAL;

When trying to load the module having -ENOMEM value returned on the
command line creates confusion when pleanty of memory is free on the
machine.

Instead of hardcoding -ENOMEM return the value of null_add_dev()
function.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220215115951.15945-1-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Colin Ian King d9a74051a7 loop: clean up grammar in warning message
The phrase "has still" should be "still has" to clean up the grammar.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220208114656.61629-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Christophe JAILLET 24afc15dbe block/rnbd: Remove a useless mutex
According to lib/idr.c,
   The IDA handles its own locking.  It is safe to call any of the IDA
   functions without synchronisation in your code.

so the 'ida_lock' mutex can just be removed.
It is here only to protect some ida_simple_get()/ida_simple_remove() calls.

While at it, switch to ida_alloc_XXX()/ida_free() instead to
ida_simple_get()/ida_simple_remove().
The latter is deprecated and more verbose.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/7f9eccd8b1fce1bac45ac9b01a78cf72f54c0a61.1644266862.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:49 -07:00
Gioh Kim 030ce8ba97 block/rnbd: client device does not care queue/rotational
On client side, the device is a network device. There is no reason
to set rotational even-if the target device on server is rotational.

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://lore.kernel.org/r/20220114155855.984144-3-haris.iqbal@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:48 -07:00
Gioh Kim 4a09a845c1 block/rnbd-clt: fix CHECK:BRACES warning
This patch fix the "CHECK:BRACES: braces {} should be used on all
arms of this statement" warning from checkpatch

Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://lore.kernel.org/r/20220114155855.984144-2-haris.iqbal@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-27 14:49:48 -07:00
Christoph Hellwig ebd0473763 scsi: rnbd: Remove WRITE_SAME support
REQ_OP_WRITE_SAME was only ever submitted by the legacy Linux zeroing code,
which has switched to use REQ_OP_WRITE_ZEROES long before rnbd was even
merged.

Link: https://lore.kernel.org/r/20220209082828.2629273-4-hch@lst.de
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
Christoph Hellwig a34592ff6b scsi: drbd: Remove WRITE_SAME support
REQ_OP_WRITE_SAME was only ever submitted by the legacy Linux zeroing code,
which has switched to use REQ_OP_WRITE_ZEROES long ago.

Link: https://lore.kernel.org/r/20220209082828.2629273-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-02-22 21:11:08 -05:00
Christoph Hellwig 7a5428dcb7 block: fix surprise removal for drivers calling blk_set_queue_dying
Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9eb8 that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.

Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.

Fixes: 8e141f9eb8 ("block: drain file system I/O on del_gendisk")
Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-17 07:54:03 -07:00
Christoph Hellwig 24b45e6c25 virtio_blk: simplify refcounting
Implement the ->free_disk method to free the virtio_blk structure only
once the last gendisk reference goes away instead of keeping a local
refcount.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20220215094514.3828912-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-16 19:44:24 -07:00
Ming Lei 06582bc86d block: loop:use kstatfs.f_bsize of backing file to set discard granularity
If backing file's filesystem has implemented ->fallocate(), we think the
loop device can support discard, then pass sb->s_blocksize as
discard_granularity. However, some underlying FS, such as overlayfs,
doesn't set sb->s_blocksize, and causes discard_granularity to be set as
zero, then the warning in __blkdev_issue_discard() is triggered.

Christoph suggested to pass kstatfs.f_bsize as discard granularity, and
this way is fine because kstatfs.f_bsize means 'Optimal transfer block
size', which still matches with definition of discard granularity.

So fix the issue by setting discard_granularity as kstatfs.f_bsize if it
is available, otherwise claims discard isn't supported.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220126035830.296465-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 15:10:49 -07:00
Tetsuo Handa bf23747ee0 loop: revert "make autoclear operation asynchronous"
The kernel test robot is reporting that xfstest which does

  umount ext2 on xfs
  umount xfs

sequence started failing, for commit 322c4293ec ("loop: make
autoclear operation asynchronous") removed a guarantee that fput() of
backing file is processed before lo_release() from close() returns to
user mode.

And syzbot is reporting that deferring destroy_workqueue() from
__loop_clr_fd() to a WQ context did not help [1]. Revert that commit.

Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: syzbot <syzbot+831661966588c802aae9@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20220211071554.3424-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-11 05:51:23 -07:00
Christoph Hellwig abfc426d1b block: pass a block_device to bio_clone_fast
Pass a block_device to bio_clone_fast and __bio_clone_fast and give
the functions more suitable names.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04 07:43:18 -07:00
Christoph Hellwig c347a787e3 drbd: set ->bi_bdev in drbd_req_new
Make sure the newly allocated bio has the correct bi_bdev set from the
start.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-04 07:43:17 -07:00
Christoph Hellwig a7c50c9404 block: pass a block_device and opf to bio_reset
Pass the block_device that we plan to use this bio for and the
operation to bio_reset to optimize the assigment.  A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:50:00 -07:00
Christoph Hellwig 49add4966d block: pass a block_device and opf to bio_init
Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment.  A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig 07888c665b block: pass a block_device and opf to bio_alloc
Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment.  NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig 609be10667 block: pass a block_device and opf to bio_alloc_bioset
Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assigment.  NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.

Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig 7d8d0c658d xen-blkback: bio_alloc can't fail if it is allow to sleep
Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig d7b78de2b1 rnbd-srv: remove struct rnbd_dev_blk_io
Only the priv field of rnbd_dev_blk_io is used, so store the value of
that in bio->bi_private directly and remove the entire bio_set overhead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20220124091107.642561-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00
Christoph Hellwig 1fe0640ff9 rnbd-srv: simplify bio mapping in process_rdma
The memory mapped in process_rdma is contiguous, so there is no need
to loop over bio_add_page.  Remove rnbd_bio_map_kern and just open code
the bio allocation and mapping in the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@ionons.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20220124091107.642561-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-02-02 07:49:59 -07:00