This series consists of the usual driver updates (qla2xxx, pm8001,
libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
and bug fixes. The high blast radius core update is the removal of
write same, which affects block and several non-SCSI devices. The
other big change, which is more local, is the removal of the SCSI
pointer.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYjzDQyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQMYAQDEWUGV
6U0+736AHVtOfiMNfiRN79B1HfXVoHvemnPcTwD/UlndwFfy/3GGOtoZmqEpc73J
Ec1HDuUCE18H1H2QAh0=
=/Ty9
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This series consists of the usual driver updates (qla2xxx, pm8001,
libsas, smartpqi, scsi_debug, lpfc, iscsi, mpi3mr) plus minor updates
and bug fixes.
The high blast radius core update is the removal of write same, which
affects block and several non-SCSI devices. The other big change,
which is more local, is the removal of the SCSI pointer"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (281 commits)
scsi: scsi_ioctl: Drop needless assignment in sg_io()
scsi: bsg: Drop needless assignment in scsi_bsg_sg_io_fn()
scsi: lpfc: Copyright updates for 14.2.0.0 patches
scsi: lpfc: Update lpfc version to 14.2.0.0
scsi: lpfc: SLI path split: Refactor BSG paths
scsi: lpfc: SLI path split: Refactor Abort paths
scsi: lpfc: SLI path split: Refactor SCSI paths
scsi: lpfc: SLI path split: Refactor CT paths
scsi: lpfc: SLI path split: Refactor misc ELS paths
scsi: lpfc: SLI path split: Refactor VMID paths
scsi: lpfc: SLI path split: Refactor FDISC paths
scsi: lpfc: SLI path split: Refactor LS_RJT paths
scsi: lpfc: SLI path split: Refactor LS_ACC paths
scsi: lpfc: SLI path split: Refactor the RSCN/SCR/RDF/EDC/FARPR paths
scsi: lpfc: SLI path split: Refactor PLOGI/PRLI/ADISC/LOGO paths
scsi: lpfc: SLI path split: Refactor base ELS paths and the FLOGI path
scsi: lpfc: SLI path split: Introduce lpfc_prep_wqe
scsi: lpfc: SLI path split: Refactor fast and slow paths to native SLI4
scsi: lpfc: SLI path split: Refactor lpfc_iocbq
scsi: lpfc: Use kcalloc()
...
- Rewrite how munlock works to massively reduce the contention
on i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
-----BEGIN PGP SIGNATURE-----
iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4ucgACgkQDpNsjXcp
gj69Wgf6AwqwmO5Tmy+fLScDPqWxmXJofbocae1kyoGHf7Ui91OK4U2j6IpvAr+g
P/vLIK+JAAcTQcrSCjymuEkf4HkGZOR03QQn7maPIEe4eLrZRQDEsmHC1L9gpeJp
s/GMvDWiGE0Tnxu0EOzfVi/yT+qjIl/S8VvqtCoJv1HdzxitZ7+1RDuqImaMC5MM
Qi3uHag78vLmCltLXpIOdpgZhdZexCdL2Y/1npf+b6FVkAJRRNUnA0gRbS7YpoVp
CbxEJcmAl9cpJLuj5i5kIfS9trr+/QcvbUlzRxh4ggC58iqnmF2V09l2MJ7YU3XL
v1O/Elq4lRhXninZFQEm9zjrri7LDQ==
=n9Ad
-----END PGP SIGNATURE-----
Merge tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache
Pull folio updates from Matthew Wilcox:
- Rewrite how munlock works to massively reduce the contention on
i_mmap_rwsem (Hugh Dickins):
https://lore.kernel.org/linux-mm/8e4356d-9622-a7f0-b2c-f116b5f2efea@google.com/
- Sort out the page refcount mess for ZONE_DEVICE pages (Christoph
Hellwig):
https://lore.kernel.org/linux-mm/20220210072828.2930359-1-hch@lst.de/
- Convert GUP to use folios and make pincount available for order-1
pages. (Matthew Wilcox)
- Convert a few more truncation functions to use folios (Matthew
Wilcox)
- Convert page_vma_mapped_walk to use PFNs instead of pages (Matthew
Wilcox)
- Convert rmap_walk to use folios (Matthew Wilcox)
- Convert most of shrink_page_list() to use a folio (Matthew Wilcox)
- Add support for creating large folios in readahead (Matthew Wilcox)
* tag 'folio-5.18c' of git://git.infradead.org/users/willy/pagecache: (114 commits)
mm/damon: minor cleanup for damon_pa_young
selftests/vm/transhuge-stress: Support file-backed PMD folios
mm/filemap: Support VM_HUGEPAGE for file mappings
mm/readahead: Switch to page_cache_ra_order
mm/readahead: Align file mappings for non-DAX
mm/readahead: Add large folio readahead
mm: Support arbitrary THP sizes
mm: Make large folios depend on THP
mm: Fix READ_ONLY_THP warning
mm/filemap: Allow large folios to be added to the page cache
mm: Turn can_split_huge_page() into can_split_folio()
mm/vmscan: Convert pageout() to take a folio
mm/vmscan: Turn page_check_references() into folio_check_references()
mm/vmscan: Account large folios correctly
mm/vmscan: Optimise shrink_page_list for non-PMD-sized folios
mm/vmscan: Free non-shmem folios without splitting them
mm/rmap: Constify the rmap_walk_control argument
mm/rmap: Convert rmap_walk() to take a folio
mm: Turn page_anon_vma() into folio_anon_vma()
mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read()
...
These functions are no longer useful as no BDIs report congestions any
more.
Removing the test on bdi_write_contested() in current_may_throttle()
could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE
is set.
So replace the calls by 'false' and simplify the code - and remove the
functions.
[akpm@linux-foundation.org: fix build]
Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs]
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jeff Layton <jlayton@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmI0/QUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpn8GEACRVxJaJV5qjZfoFAQKoAWJEtquwjeARyB+
0V8ROWHDWHSacdug9wBytayiS1lz2zmUHJ6YXyts2dn0v6CrK4s8yGzk5G/RgH6+
6M3GmBKjj+r1DfE8L3OoQWkDR1JFPuFxXTG/uBd7fBY2Excih1Z0D2lpspMleIRf
w8zBrlWrWH8lZlm6HF3fadjEoiWhOM5F4Ofz3eg/PAQrHuD06z8hjQgMeR0jQVzw
bWF9jrdNIplxRjNWIwCTsQRM+z5KQhUGwDODJjIwdQtVaKSt9D99ZbeKTudlslQ2
zrizsCq8P1RjBPcrA45FV6QnT9DIRRGrYzHD63qC6fDae34rbzdSHUwRMP2XSxo8
+hT1AzGypiBauODTPzHFtTskaQ0KibLznEanChh/ThySmNYcEVAljSx3Z5Vo81J+
IqJYK2m3RESCFruy9w3U/P7qiXZmqYldPfjxAKq8ucg6x1PU3XRAVm7SI/i4l75D
Crk1ujj2LJgsyxL6qMrK3XUavl1SJdzWeFSarcCt3m4m11EWWfYzmG8Yn8OE2CEZ
a2CAyDsRi8CZ3hvkaMwigL4wBJjrrig8vyIgok3VrfCmYlNNqMQqM5Rw7vzjR3v1
cKewI3rQjkFXEaveIXyGPTI/0Da4cT0DOfn/Mws9MDUXNPlFMNEDUZkPuzMywiTB
2SWDLTe77g==
=993h
-----END PGP SIGNATURE-----
Merge tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- NVMe updates via Christoph:
- add vectored-io support for user-passthrough (Kanchan Joshi)
- add verbose error logging (Alan Adamson)
- support buffered I/O on block devices in nvmet (Chaitanya
Kulkarni)
- central discovery controller support (Martin Belanger)
- fix and extended the globally unique idenfier validation
(Christoph)
- move away from the deprecated IDA APIs (Sagi Grimberg)
- misc code cleanup (Keith Busch, Max Gurtovoy, Qinghua Jin,
Chaitanya Kulkarni)
- add lockdep annotations for in-kernel sockets (Chris Leech)
- use vmalloc for ANA log buffer (Hannes Reinecke)
- kerneldoc fixes (Chaitanya Kulkarni)
- cleanups (Guoqing Jiang, Chaitanya Kulkarni, Christoph)
- warn about shared namespaces without multipathing (Christoph)
- MD updates via Song with a set of cleanups (Christoph, Mariusz, Paul,
Erik, Dirk)
- loop cleanups and queue depth configuration (Chaitanya)
- null_blk cleanups and fixes (Chaitanya)
- Use descriptive init/exit names in virtio_blk (Randy)
- Use bvec_kmap_local() in drivers (Christoph)
- bcache fixes (Mingzhe)
- xen blk-front persistent grant speedups (Juergen)
- rnbd fix and cleanup (Gioh)
- Misc fixes (Christophe, Colin)
* tag 'for-5.18/drivers-2022-03-18' of git://git.kernel.dk/linux-block: (76 commits)
virtio_blk: eliminate anonymous module_init & module_exit
nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
nvme: remove nvme_alloc_request and nvme_alloc_request_qid
nvme: cleanup how disk->disk_name is assigned
nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
nvmet: use snprintf() with PAGE_SIZE in configfs
nvmet: don't fold lines
nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
nvme-tcp: lockdep: annotate in-kernel sockets
nvme-tcp: don't fold the line
nvme-tcp: don't initialize ret variable
nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
nvme-multipath: use vmalloc for ANA log buffer
xen/blkfront: speed up purge_persistent_grants()
raid5: initialize the stripe_head embeeded bios as needed
raid5-cache: statically allocate the recovery ra bio
raid5-cache: fully initialize flush_bio when needed
raid5-ppl: fully initialize the bio in ppl_new_iounit
...
These functions are page cache functionality and don't need to be
declared in fs.h.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
My kernel robot report below:
drivers/block/n64cart.c: In function ‘n64cart_submit_bio’:
drivers/block/n64cart.c:91:26: error: ‘struct bio’ has no member named ‘bi_disk’
91 | struct device *dev = bio->bi_disk->private_data;
| ^~
CC drivers/slimbus/qcom-ctrl.o
CC drivers/auxdisplay/hd44780.o
CC drivers/watchdog/watchdog_core.o
CC drivers/nvme/host/fault_inject.o
AR drivers/accessibility/braille/built-in.a
make[2]: *** [scripts/Makefile.build:288: drivers/block/n64cart.o] Error 1
Fixes: 309dca309f ("block: store a block_device pointer in struct bio");
Reported-by: k2ci <kernel-bot@kylinos.cn>
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220321071216.1549596-1-liu.yun@linux.dev
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The 'need_copy' is set when rq_data_dir(req) returns WRITE, in order to
copy the written data to persistent page.
".need_copy = rq_data_dir(req) && info->feature_persistent,"
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Fixes: c004a6fe0c ('block/xen-blkfront: Make it running on 64KB page granularity')
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220317220930.5698-1-dongli.zhang@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Variable i is being assigned a value that is never read, it is being
re-assigned later in a for-loop. The assignment is redundant and can
be removed.
Cleans up clang scan build warning:
drivers/block/xen-blkback/blkback.c:934:14: warning: Although the value
stored to 'i' is used in the enclosing expression, the value is never
actually read from 'i' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220317234646.78158-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
- add lockdep annotations for in-kernel sockets (Chris Leech)
- use vmalloc for ANA log buffer (Hannes Reinecke)
- kerneldoc fixes (Chaitanya Kulkarni)
- cleanups (Guoqing Jiang, Chaitanya Kulkarni, me)
- warn about shared namespaces without multipathing (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmIzanILHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYN2YxAAgOMawhITprPhIyjf2sRCDGfYP3qZVp93RCaqr1rK
fzylo7/BwDXAjGO1tXqCLFGgsktxOvyyCR0Nvx1NBCpZxVAFE+oUEQTMfyGe8yhk
G1mugZuu1TnD1SY8UXKMz17O71dvBc+wVvxR+M32ZoL3fO+yAM7p9fNSzZOpwgAS
XiHek18EiEeIRldoRNXl3DWsXVxxC8aTo8ljxaSgzabIPN90D6Rt2BMxhVFRd6DP
401qTf8e1FQOaQQ+QBVCLHb7gCL/3sjbxhuU5s+QNJD5rxT5a4F5s05doa+5lfzH
soCnhIJcjJwbCZKvvTP2Vvaspro5/IFNOAybEbrrFamUtrin7CrqaXr8IQJ+vK0m
vDaBiJUcdv91JQtWnRsoKlG2nAguGPpD4qIeL3fCr6sy+yo9Zmr4bJJyiAUrPFVp
a7ffVU4MUg6KeofKey7nLUvPa2g9OUxAHRr30/cQeBnCsI6QoGz/wTPd8sB11KrC
e4RoHDlPzNZGyMEnHYKWn+0SCiYykgYRtxjwkVqdr2/j0PTHScM1u4q3bHDer53j
hQmzDokgHV1PuHWXTQ7t4058ENYgpk+zg2tq+nr+b2jfElPXUrIpTIOzzfZLhg/T
sFBUeQ4Q8Q35JWI3jvrRHDNB0+vpQEnQu4pXLdhvGWeGAkFEjO5bBKtqLzuyCWOZ
nIs=
=xUr4
-----END PGP SIGNATURE-----
Merge tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme into for-5.18/drivers
Pull NVMe updates from Christoph:
"Second round of nvme updates for Linux 5.18
- add lockdep annotations for in-kernel sockets (Chris Leech)
- use vmalloc for ANA log buffer (Hannes Reinecke)
- kerneldoc fixes (Chaitanya Kulkarni)
- cleanups (Guoqing Jiang, Chaitanya Kulkarni, me)
- warn about shared namespaces without multipathing (me)"
* tag 'nvme-5.18-2022-03-17' of git://git.infradead.org/nvme:
nvme: warn about shared namespaces without CONFIG_NVME_MULTIPATH
nvme: remove nvme_alloc_request and nvme_alloc_request_qid
nvme: cleanup how disk->disk_name is assigned
nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
nvmet: use snprintf() with PAGE_SIZE in configfs
nvmet: don't fold lines
nvmet-rdma: fix kernel-doc warning for nvmet_rdma_device_removal
nvmet-fc: fix kernel-doc warning for nvmet_fc_unregister_targetport
nvmet-fc: fix kernel-doc warning for nvmet_fc_register_targetport
nvme-tcp: lockdep: annotate in-kernel sockets
nvme-tcp: don't fold the line
nvme-tcp: don't initialize ret variable
nvme-multipath: call bio_io_error in nvme_ns_head_submit_bio
nvme-multipath: use vmalloc for ANA log buffer
Eliminate anonymous module_init() and module_exit(), which can lead to
confusion or ambiguity when reading System.map, crashes/oops/bugs,
or an initcall_debug log.
Give each of these init and exit functions unique driver-specific
names to eliminate the anonymous names.
Example 1: (System.map)
ffffffff832fc78c t init
ffffffff832fc79e t init
ffffffff832fc8f8 t init
Example 2: (initcall_debug log)
calling init+0x0/0x12 @ 1
initcall init+0x0/0x12 returned 0 after 15 usecs
calling init+0x0/0x60 @ 1
initcall init+0x0/0x60 returned 0 after 2 usecs
calling init+0x0/0x9a @ 1
initcall init+0x0/0x9a returned 0 after 74 usecs
Fixes: e467cde238 ("Block driver using virtio.")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220316192010.19001-2-rdunlap@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Start warning about exposing a namespace as multiple block devices,
and set a fixed deprecation release.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
The gnttab_end_foreign_access() family of functions is taking a
"readonly" parameter, which isn't used. Remove it from the function
parameters.
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20220311103429.12845-3-jgross@suse.com
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
purge_persistent_grants() is scanning the grants list for persistent
grants being no longer in use by the backend. When having found such a
grant, it will be set to "invalid" and pushed to the tail of the list.
Instead of pushing it directly to the end of the list, add it first to
a temporary list, avoiding to scan those entries again in the main
list traversal. After having finished the scan, append the temporary
list to the grant list.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20220311103527.12931-1-jgross@suse.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCYiXISAAKCRCAXGG7T9hj
vlw+AP0dZipcd8hTThcXHF72WToepL5bGb3oNgRy7qNuRCrwEgEAj3LkQD3L26h/
lzqZsiysQ35KP91+g2KSLJqg2RpnFQU=
=rkK1
-----END PGP SIGNATURE-----
Merge tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Several Linux PV device frontends are using the grant table interfaces
for removing access rights of the backends in ways being subject to
race conditions, resulting in potential data leaks, data corruption by
malicious backends, and denial of service triggered by malicious
backends:
- blkfront, netfront, scsifront and the gntalloc driver are testing
whether a grant reference is still in use. If this is not the case,
they assume that a following removal of the granted access will
always succeed, which is not true in case the backend has mapped
the granted page between those two operations.
As a result the backend can keep access to the memory page of the
guest no matter how the page will be used after the frontend I/O
has finished. The xenbus driver has a similar problem, as it
doesn't check the success of removing the granted access of a
shared ring buffer.
- blkfront, netfront, scsifront, usbfront, dmabuf, xenbus, 9p,
kbdfront, and pvcalls are using a functionality to delay freeing a
grant reference until it is no longer in use, but the freeing of
the related data page is not synchronized with dropping the granted
access.
As a result the backend can keep access to the memory page even
after it has been freed and then re-used for a different purpose.
- netfront will fail a BUG_ON() assertion if it fails to revoke
access in the rx path.
This will result in a Denial of Service (DoS) situation of the
guest which can be triggered by the backend"
* tag 'xsa396-5.17-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
xen/gnttab: fix gnttab_end_foreign_access() without page specified
xen/pvcalls: use alloc/free_pages_exact()
xen/9p: use alloc/free_pages_exact()
xen/usb: don't use gnttab_end_foreign_access() in xenhcd_gnttab_done()
xen: remove gnttab_query_foreign_access()
xen/gntalloc: don't use gnttab_query_foreign_access()
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
xen/grant-table: add gnttab_try_end_foreign_access()
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
It is inevitable to cause use-after-free on q->queue_hw_ctx between
queue_for_each_hw_ctx() and blk_mq_update_nr_hw_queues(). And converting
to xarray can fix the uaf, meantime code gets cleaner.
Prepare for converting q->queue_hctx_ctx into xarray, one thing is that
xa_for_each() can only accept 'unsigned long' as index, so changes type
of hctx index of queue_for_each_hw_ctx() into 'unsigned long'.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220308073219.91173-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
q->queue_hw_ctx is really one blk-mq internal structure for retrieving
hctx via its index, not supposed to be used by drivers. Meantime drivers
can get the tags structure easily from tagset.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20220308073219.91173-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
->queuedata is set up in pkt_init_queue, so it can't be NULL here.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20220304180105.409765-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It isn't enough to check whether a grant is still being in use by
calling gnttab_query_foreign_access(), as a mapping could be realized
by the other side just after having called that function.
In case the call was done in preparation of revoking a grant it is
better to do so via gnttab_end_foreign_access_ref() and check the
success of that operation instead.
For the ring allocation use alloc_pages_exact() in order to avoid
high order pages in case of a multi-page ring.
If a grant wasn't unmapped by the backend without persistent grants
being used, set the device state to "error".
This is CVE-2022-23036 / part of XSA-396.
Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
---
V2:
- use gnttab_try_end_foreign_access()
V4:
- use alloc_pages_exact() and free_pages_exact()
- set state to error if backend didn't unmap (Roger Pau Monné)
Currently we have a BUG_ON() to make sure the number of sg
list does not exceed queue_max_segments() in virtio_queue_rq().
However, the block layer uses queue_max_discard_segments()
instead of queue_max_segments() to limit the sg list for
discard requests. So the BUG_ON() might be triggered if
virtio-blk device reports a larger value for max discard
segment than queue_max_segments(). To fix it, let's simply
remove the BUG_ON() which has become unnecessary after commit
02746e26c39e("virtio-blk: avoid preallocating big SGL for data").
And the unused vblk->sg_elems can also be removed together.
Fixes: 1f23816b8e ("virtio_blk: add discard and write zeroes support")
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20220304100058.116-2-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Currently the value of max_discard_segment will be set to
MAX_DISCARD_SEGMENTS (256) with no basis in hardware if device
set 0 to max_discard_seg in configuration space. It's incorrect
since the device might not be able to handle such large descriptors.
To fix it, let's follow max_segments restrictions in this case.
Fixes: 1f23816b8e ("virtio_blk: add discard and write zeroes support")
Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20220304100058.116-1-xieyongji@bytedance.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Use the helpers instead of open coding them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use memcpy_from_bvec instead of open coding the logic.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use the proper helper instead of open coding the copy.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/20220303111905.321089-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Using local kmaps slightly reduces the chances to stray writes, and
the bvec interface cleans up the code a little bit.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220303111905.321089-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Prepare for removing the scsi_request structure by moving the result field
to struct scsi_cmnd.
Link: https://lore.kernel.org/r/20220224175552.988286-7-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Now that each scsi_request is backed by a scsi_cmnd, there is no need to
indirect the CDB storage. Change all submitters of SCSI passthrough
requests to store the CDB information directly in the scsi_cmnd, and while
doing so allocate the full 32 bytes that cover all Linux supported SCSI
hosts instead of requiring dynamic allocation for > 16 byte CDBs. On
64-bit systems this does not change the size of the scsi_cmnd at all, while
on 32-bit systems it slightly increases it for now, but that increase will
be made up by the removal of the remaining scsi_request fields.
Link: https://lore.kernel.org/r/20220224175552.988286-4-hch@lst.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove goto labels and use direct returns as error unwinding code only
needs to free t_page variable if we alloc_pages() call fails as having
two labels for one kfree() can be avoided easily.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220222152852.26043-3-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only caller of null_alloc_page() is null_insert_page() unconditionally
sets only parameter to GFP_NOIO and that is statically hard-coded in
null_blk. There is no point in having statically hardcoded function
parameter.
Remove the unnecessary parameter gfp_flags and adjust the code, so it
can retain existing behavior null_alloc_page() with GFP_NOIO.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220222152852.26043-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only caller of alloc_cmd() is null_submit_bio() unconditionally sets
second parameter to true and that is statically hard-coded in null_blk.
There is no point in having statically hardcoded function parameter.
Remove the unnecessary parameter can_wait and adjust the code so it
can retain existing behavior of waiting when we don't get valid
nullb_cmd from __alloc_cmd() in alloc_cmd().
The restructured code avoids multiple return statements, multiple
calls to __alloc_cmd() and resulting a fast path call to
prepare_to_wait() due to removal of first alloc_cmd() call.
Follow the pattern that we have in bio_alloc() to set the structure
members in the structure allocation function in alloc_cmd() and pass
bio to initialize newly allocated cmd->bio member.
Follow the pattern in copy_to_nullb() to use result of one function call
(null_cache_active()) to be used as a parameter to another function call
(null_insert_page()), use result of alloc_cmd() as a first parameter to
the null_handle_cmd() in null_submit_bio() function. This allow us to
remove the local variable cmd on stack in null_submit_bio() that is in
fast path.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Link: https://lore.kernel.org/r/20220216172945.31124-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Instead of hardcoding queue depth allow user to set the hw queue depth
using module parameter. Set default value to 128 to retain the existing
behavior.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-5-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The local variable file is used to pass it to the vfs_fsync(). We can
get away with using lo->lo_backing_file instead of storing in a local
variable which is not used anywhere else.
No functional change in this patch.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-4-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The local variable q is used to pass it to the blk_queue_discard(). We
can get away with using lo->lo_queue instead of storing in a local
variable which is not used anywhere else.
No functional change in this patch.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-3-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
sprintf does not know the PAGE_SIZE maximum of the temporary buffer
used for outputting sysfs content and it's possible to overrun the
PAGE_SIZE buffer length.
Use a generic sysfs_emit function that knows the size of the
temporary buffer and ensures that no overrun is done for offset
attribute in
loop_attr_[offset|sizelimit|autoclear|partscan|dio]_show() callbacks.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Link: https://lore.kernel.org/r/20220215213310.7264-2-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The function nullb_device_power_store() returns -ENOMEM when
null_add_dev() fails. null_add_dev() can fail with return value
other than -ENOMEM such as -EINVAL when Zoned Block Device option
is used, see :
nullb_device_power_store()
null_add_dev()
null_init_zoned_dev()
return -EINVAL;
When trying to load the module having -ENOMEM value returned on the
command line creates confusion when pleanty of memory is free on the
machine.
Instead of hardcoding -ENOMEM return the value of null_add_dev()
function.
Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220215115951.15945-1-kch@nvidia.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
According to lib/idr.c,
The IDA handles its own locking. It is safe to call any of the IDA
functions without synchronisation in your code.
so the 'ida_lock' mutex can just be removed.
It is here only to protect some ida_simple_get()/ida_simple_remove() calls.
While at it, switch to ida_alloc_XXX()/ida_free() instead to
ida_simple_get()/ida_simple_remove().
The latter is deprecated and more verbose.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/7f9eccd8b1fce1bac45ac9b01a78cf72f54c0a61.1644266862.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On client side, the device is a network device. There is no reason
to set rotational even-if the target device on server is rotational.
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://lore.kernel.org/r/20220114155855.984144-3-haris.iqbal@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch fix the "CHECK:BRACES: braces {} should be used on all
arms of this statement" warning from checkpatch
Signed-off-by: Gioh Kim <gi-oh.kim@ionos.com>
Signed-off-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Link: https://lore.kernel.org/r/20220114155855.984144-2-haris.iqbal@ionos.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
REQ_OP_WRITE_SAME was only ever submitted by the legacy Linux zeroing code,
which has switched to use REQ_OP_WRITE_ZEROES long before rnbd was even
merged.
Link: https://lore.kernel.org/r/20220209082828.2629273-4-hch@lst.de
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
REQ_OP_WRITE_SAME was only ever submitted by the legacy Linux zeroing code,
which has switched to use REQ_OP_WRITE_ZEROES long ago.
Link: https://lore.kernel.org/r/20220209082828.2629273-3-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Various block drivers call blk_set_queue_dying to mark a disk as dead due
to surprise removal events, but since commit 8e141f9eb8 that doesn't
work given that the GD_DEAD flag needs to be set to stop I/O.
Replace the driver calls to blk_set_queue_dying with a new (and properly
documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the
only remaining caller.
Fixes: 8e141f9eb8 ("block: drain file system I/O on del_gendisk")
Reported-by: Markus Blöchl <markus.bloechl@ipetronik.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Implement the ->free_disk method to free the virtio_blk structure only
once the last gendisk reference goes away instead of keeping a local
refcount.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20220215094514.3828912-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If backing file's filesystem has implemented ->fallocate(), we think the
loop device can support discard, then pass sb->s_blocksize as
discard_granularity. However, some underlying FS, such as overlayfs,
doesn't set sb->s_blocksize, and causes discard_granularity to be set as
zero, then the warning in __blkdev_issue_discard() is triggered.
Christoph suggested to pass kstatfs.f_bsize as discard granularity, and
this way is fine because kstatfs.f_bsize means 'Optimal transfer block
size', which still matches with definition of discard granularity.
So fix the issue by setting discard_granularity as kstatfs.f_bsize if it
is available, otherwise claims discard isn't supported.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Reported-by: Pei Zhang <pezhang@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220126035830.296465-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The kernel test robot is reporting that xfstest which does
umount ext2 on xfs
umount xfs
sequence started failing, for commit 322c4293ec ("loop: make
autoclear operation asynchronous") removed a guarantee that fput() of
backing file is processed before lo_release() from close() returns to
user mode.
And syzbot is reporting that deferring destroy_workqueue() from
__loop_clr_fd() to a WQ context did not help [1]. Revert that commit.
Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Acked-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: syzbot <syzbot+831661966588c802aae9@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Link: https://lore.kernel.org/r/20220211071554.3424-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass a block_device to bio_clone_fast and __bio_clone_fast and give
the functions more suitable names.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Make sure the newly allocated bio has the correct bi_bdev set from the
start.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20220202160109.108149-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass the block_device that we plan to use this bio for and the
operation to bio_reset to optimize the assigment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-20-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass the block_device that we plan to use this bio for and the
operation to bio_init to optimize the assignment. A NULL block_device
can be passed, both for the passthrough case on a raw request_queue and
to temporarily avoid refactoring some nasty code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass the block_device and operation that we plan to use this bio for to
bio_alloc to optimize the assignment. NULL/0 can be passed, both for the
passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.
Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pass the block_device and operation that we plan to use this bio for to
bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both
for the passthrough case on a raw request_queue and to temporarily avoid
refactoring some nasty code.
Also move the gfp_mask argument after the nr_vecs argument for a much
more logical calling convention matching what most of the kernel does.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-13-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only the priv field of rnbd_dev_blk_io is used, so store the value of
that in bio->bi_private directly and remove the entire bio_set overhead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20220124091107.642561-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The memory mapped in process_rdma is contiguous, so there is no need
to loop over bio_add_page. Remove rnbd_bio_map_kern and just open code
the bio allocation and mapping in the caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jack Wang <jinpu.wang@ionons.com>
Tested-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20220124091107.642561-11-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove handling of NULL returns from sleeping bio_alloc calls given that
those can't fail.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220124091107.642561-10-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no good reason to keep genhd.h separate from the main blkdev.h
header that includes it. So fold the contents of genhd.h into blkdev.h
and remove genhd.h entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQHJBAABCgAzFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmHi+xgVHHl1cnkubm9y
b3ZAZ21haWwuY29tAAoJELFEgP06H77IxdoMAMf3E+L51Ys/4iAiyJQNVoT3aIBC
A8ZVOB9he1OA3o3wBNIRKmICHk+ovnfCWcXTr9fG/Ade2wJz88NAsGPQ1Phywb+s
iGlpySllFN72RT9ZqtJhLEzgoHHOL0CzTW07TN9GJy4gQA2h2G9CTP+OmsQdnVqE
m9Fn3PSlJ5lhzePlKfnln8rGZFgrriJakfEFPC79n/7an4+2Hvkb5rWigo7KQc4Z
9YNqYUcHWZFUgq80adxEb9LlbMXdD+Z/8fCjOrAatuwVkD4RDt6iKD0mFGjHXGL7
MZ9KRS8AfZXawmetk3jjtsV+/QkeS+Deuu7k0FoO0Th2QV7BGSDhsLXAS5By/MOC
nfSyHhnXHzCsBMyVNrJHmNhEZoN29+tRwI84JX9lWcf/OLANcCofnP6f2UIX7tZY
CAZAgVELp+0YQXdybrfzTQ8BT3TinjS/aZtCrYijRendI1GwUXcyl69vdOKqAHuk
5jy8k/xHyp+ZWu6v+PyAAAEGowY++qhL0fmszA==
=RKW4
-----END PGP SIGNATURE-----
Merge tag 'bitmap-5.17-rc1' of git://github.com/norov/linux
Pull bitmap updates from Yury Norov:
- introduce for_each_set_bitrange()
- use find_first_*_bit() instead of find_next_*_bit() where possible
- unify for_each_bit() macros
* tag 'bitmap-5.17-rc1' of git://github.com/norov/linux:
vsprintf: rework bitmap_list_string
lib: bitmap: add performance test for bitmap_print_to_pagebuf
bitmap: unify find_bit operations
mm/percpu: micro-optimize pcpu_is_populated()
Replace for_each_*_bit_from() with for_each_*_bit() where appropriate
find: micro-optimize for_each_{set,clear}_bit()
include/linux: move for_each_bit() macros from bitops.h to find.h
cpumask: replace cpumask_next_* with cpumask_first_* where appropriate
tools: sync tools/bitmap with mother linux
all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate
cpumask: use find_first_and_bit()
lib: add find_first_and_bit()
arch: remove GENERIC_FIND_FIRST_BIT entirely
include: move find.h from asm_generic to linux
bitops: move find_bit_*_le functions from le.h to find.h
bitops: protect find_first_{,zero}_bit properly
by Venky Shankar. It solves some long-standing issues with using
different auth entities and/or mounting different CephFS filesystems
from the same cluster, remounting and also misleading /proc/mounts
contents. The existing syntax of course remains to be maintained.
On top of that, there is a couple of fixes for edge cases in quota
and a new mount option for turning on unbuffered I/O mode globally
instead of on a per-file basis with ioctl(CEPH_IOC_SYNCIO).
-----BEGIN PGP SIGNATURE-----
iQFHBAABCAAxFiEEydHwtzie9C7TfviiSn/eOAIR84sFAmHpP5ATHGlkcnlvbW92
QGdtYWlsLmNvbQAKCRBKf944AhHzi0TgB/480i2lPHgA3ujJNqo5Q6z+W0vtTA2+
Wx+4rAUgIESJVunbFxvecPbzyUXTe7wWFI11TCVHPpf6GyIIDTD+uHd3kKWtLsfL
Zkk1/2PN9Q5Dh29R+N8rP9NaP8tIaTQjyiO3iqmRZlo+k0Z/lYtWUb+fUP05XlVY
ML/ktW543tkKeYwl3SWdW5MqAAOVGDbTt+L51CraDhVoiUac5ptkP+cmDmIqsnGa
ZHVqpwugxgndEIyuBHDLBps+5/LrEaL10xDhGcMtP9hwGYhyNr6Yj+azfGtHWwOi
jdVsdHDiecUBVtGyZ351Y4pCMOmP0uJif6MOUZFXYYSSeUBUhH8UjgEi
=jcte
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"The highlight is the new mount "device" string syntax implemented by
Venky Shankar. It solves some long-standing issues with using
different auth entities and/or mounting different CephFS filesystems
from the same cluster, remounting and also misleading /proc/mounts
contents. The existing syntax of course remains to be maintained.
On top of that, there is a couple of fixes for edge cases in quota and
a new mount option for turning on unbuffered I/O mode globally instead
of on a per-file basis with ioctl(CEPH_IOC_SYNCIO)"
* tag 'ceph-for-5.17-rc1' of git://github.com/ceph/ceph-client:
ceph: move CEPH_SUPER_MAGIC definition to magic.h
ceph: remove redundant Lsx caps check
ceph: add new "nopagecache" option
ceph: don't check for quotas on MDS stray dirs
ceph: drop send metrics debug message
rbd: make const pointer spaces a static const array
ceph: Fix incorrect statfs report for small quota
ceph: mount syntax module parameter
doc: document new CephFS mount device syntax
ceph: record updated mon_addr on remount
ceph: new device mount syntax
libceph: rename parse_fsid() to ceph_parse_fsid() and export
libceph: generalize addr/ip parsing based on delimiter
partial support for < MAX_ORDER - 1 granularity for virtio-mem
driver_override for vdpa
sysfs ABI documentation for vdpa
multiqueue config support for mlx5 vdpa
Misc fixes, cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmHiDHkPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpVT4H/3Veixt3uYPOmuLU2tSx+8X+sFTtik81hyiE
okz5fRJrxxA8SqS76FnmO10FS4hlPOGNk0Z5WVhr0yihwFvPLvpCM/xi2Lmrz9I7
pB0sXOIocEL1xApsxukR9K1Twpb2hfYsflbJYUVlRfhS5G0izKJNZp5I7OPrzd80
vVNNDWKW2iLDlfqsavumI4Kvm4nsFuCHG03jzMtcIa7YTXYV3DORD4ZGFFVUOIQN
t5F74TznwHOeYgJeg7TzjFjfPWmXjLetvx10QX1A1uOvwppWW/QY6My0UafTXNXj
VB3gOwJPf+gxXAXl/4bafq4NzM0xys6cpcPpjvhmU+erY4UuyAU=
=Y1eO
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"virtio,vdpa,qemu_fw_cfg: features, cleanups, and fixes.
- partial support for < MAX_ORDER - 1 granularity for virtio-mem
- driver_override for vdpa
- sysfs ABI documentation for vdpa
- multiqueue config support for mlx5 vdpa
- and misc fixes, cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (42 commits)
vdpa/mlx5: Fix tracking of current number of VQs
vdpa/mlx5: Fix is_index_valid() to refer to features
vdpa: Protect vdpa reset with cf_mutex
vdpa: Avoid taking cf_mutex lock on get status
vdpa/vdpa_sim_net: Report max device capabilities
vdpa: Use BIT_ULL for bit operations
vdpa/vdpa_sim: Configure max supported virtqueues
vdpa/mlx5: Report max device capabilities
vdpa: Support reporting max device capabilities
vdpa/mlx5: Restore cur_num_vqs in case of failure in change_num_qps()
vdpa: Add support for returning device configuration information
vdpa/mlx5: Support configuring max data virtqueue
vdpa/mlx5: Fix config_attr_mask assignment
vdpa: Allow to configure max data virtqueues
vdpa: Read device configuration only if FEATURES_OK
vdpa: Sync calls set/get config/status with cf_mutex
vdpa/mlx5: Distribute RX virtqueues in RQT object
vdpa: Provide interface to read driver features
vdpa: clean up get_config_size ret value handling
virtio_ring: mark ring unused on error
...
If brd_alloc() from brd_probe() is called before brd_alloc() from
brd_init() is called, module loading will fail with -EEXIST error.
To close this race, call __register_blkdev() just before leaving
brd_init().
Then, we can remove brd_devices_mutex mutex, for brd_device list
will no longer be accessed concurrently.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/6b074af7-c165-4fab-b7da-8270a4f6f6cd@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
find_first{,_zero}_bit is a more effective analogue of 'next' version if
start == 0. This patch replaces 'next' with 'first' where things look
trivial.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Here is the large set of char, misc, and other "small" driver subsystem
changes for 5.17-rc1.
Lots of different things are in here for char/misc drivers such as:
- habanalabs driver updates
- mei driver updates
- lkdtm driver updates
- vmw_vmci driver updates
- android binder driver updates
- other small char/misc driver updates
Also smaller driver subsystems have also been updated, including:
- fpga subsystem updates
- iio subsystem updates
- soundwire subsystem updates
- extcon subsystem updates
- gnss subsystem updates
- phy subsystem updates
- coresight subsystem updates
- firmware subsystem updates
- comedi subsystem updates
- mhi subsystem updates
- speakup subsystem updates
- rapidio subsystem updates
- spmi subsystem updates
- virtual driver updates
- counter subsystem updates
Too many individual changes to summarize, the shortlog contains the full
details.
All of these have been in linux-next for a while with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCYeGNAQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymoVgCg1CPjMu8/SDj3Sm3a1UMQJn9jnl8AnjQcEp3z
hMr9mISG4r6g4PvjrJBj
=9May
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc and other driver updates from Greg KH:
"Here is the large set of char, misc, and other "small" driver
subsystem changes for 5.17-rc1.
Lots of different things are in here for char/misc drivers such as:
- habanalabs driver updates
- mei driver updates
- lkdtm driver updates
- vmw_vmci driver updates
- android binder driver updates
- other small char/misc driver updates
Also smaller driver subsystems have also been updated, including:
- fpga subsystem updates
- iio subsystem updates
- soundwire subsystem updates
- extcon subsystem updates
- gnss subsystem updates
- phy subsystem updates
- coresight subsystem updates
- firmware subsystem updates
- comedi subsystem updates
- mhi subsystem updates
- speakup subsystem updates
- rapidio subsystem updates
- spmi subsystem updates
- virtual driver updates
- counter subsystem updates
Too many individual changes to summarize, the shortlog contains the
full details.
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (406 commits)
counter: 104-quad-8: Fix use-after-free by quad8_irq_handler
dt-bindings: mux: Document mux-states property
dt-bindings: ti-serdes-mux: Add defines for J721S2 SoC
counter: remove old and now unused registration API
counter: ti-eqep: Convert to new counter registration
counter: stm32-lptimer-cnt: Convert to new counter registration
counter: stm32-timer-cnt: Convert to new counter registration
counter: microchip-tcb-capture: Convert to new counter registration
counter: ftm-quaddec: Convert to new counter registration
counter: intel-qep: Convert to new counter registration
counter: interrupt-cnt: Convert to new counter registration
counter: 104-quad-8: Convert to new counter registration
counter: Update documentation for new counter registration functions
counter: Provide alternative counter registration functions
counter: stm32-timer-cnt: Convert to counter_priv() wrapper
counter: stm32-lptimer-cnt: Convert to counter_priv() wrapper
counter: ti-eqep: Convert to counter_priv() wrapper
counter: ftm-quaddec: Convert to counter_priv() wrapper
counter: intel-qep: Convert to counter_priv() wrapper
counter: microchip-tcb-capture: Convert to counter_priv() wrapper
...
The variable n is being bit-wise or'd with a value and reassigned
before being returned. The update of n is redundant, replace
the |= operator with | instead. Cleans up clang scan warning:
drivers/block/aoe/aoecmd.c:125:9: warning: Although the value stored
to 'n' is used in the enclosing expression, the value is never
actually read from 'n' [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220113000545.1307091-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The pointer node is being initialized with a value that is never
read, it is being re-assigned the same value a little futher on.
Remove the redundant initialization. Cleans up clang scan warning:
drivers/block/loop.c:823:19: warning: Value stored to 'node' during
its initialization is never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20220113001432.1331871-1-colin.i.king@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Substantially all bug fixes and cleanups:
- Update drivers to use common helpers for GUIDs, pkeys, bitmaps,
memset_startat, and others
- General code cleanups from bots
- Simplify some of the rxe pool code in preparation for a larger rework
- Clean out old stuff from hns, including all support for hip06 devices
- Fix a bug where GID table entries could be missed if the table had holes
in it
- Rename paths and sessions in rtrs for better understandability
- Consolidate the roce source port selection code
- NDR speed support in mlx5
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAmHgct4ACgkQOG33FX4g
mxpFuQ//UqtbxowPeDB9bjJ5OLKZ1fGx0FxWkfBeR1cr0apboBNqdK1WOiz5Q7un
F2xpASNEsOCr6JMMBhHMOvNiMjRSs33GvydyBj5T7LRx/QGie+0AeSzlS314/mJs
NXvOinD21l1YEKIodw4Pfhtdl2QVmEvRpUJnccGyEGUKQ4jpUwVCTfa/tpoMVD5y
MsWqv+xOrhsmDahW2nUSXHhBIazVqYETg4EE8O7J1Lb48F98keVOdVkH5wL4nmKj
gl6oyN9lkw1sWDJBnom7mgd38L2M42mRtQkiFdMdnpj5D5jbLTcGv30GgBfyMPr6
8tI3sXcAJh3Wk3TUu2jEh2F+SjsHKRTqVjGVwQbkvEuhFK2TSHAhGC+gmP6ueZKG
diHKcJVNm6rBX6L/EczYQ7hjOiMzJLlLjhZnr8+2Lqw0X+DzQbN19RUb+XX8iqkP
ITM5LPQHf+7N8Rz2W7jcHk1h3wLv1VcKktErc6mUTHdxxpJv/XEsmLP22kqHgSyx
So6yAlMtMMMZfP6taWkpTzC6KoduFJwWARf3zYoJreeWmL18F4+Tha2th8xnQMi2
cq0UOu1WnVEFwiIzdMa3aCtTDxXQ6UgPVk1E24RaiZTEBp5hO5+Xmn56du7G89Cb
nlZbAudbh3aElbj9ptUsJGSVowGgSLJvvfgFyZz2u+wFBqdJnUk=
=EL3r
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
"Another small cycle. Mostly cleanups and bug fixes, quite a bit
assisted from bots. There are a few new syzkaller splats that haven't
been solved yet but they should get into the rcs in a few weeks, I
think.
Summary:
- Update drivers to use common helpers for GUIDs, pkeys, bitmaps,
memset_startat, and others
- General code cleanups from bots
- Simplify some of the rxe pool code in preparation for a larger
rework
- Clean out old stuff from hns, including all support for hip06
devices
- Fix a bug where GID table entries could be missed if the table had
holes in it
- Rename paths and sessions in rtrs for better understandability
- Consolidate the roce source port selection code
- NDR speed support in mlx5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (83 commits)
RDMA/irdma: Remove the redundant return
RDMA/rxe: Use the standard method to produce udp source port
RDMA/irdma: Make the source udp port vary
RDMA/hns: Replace get_udp_sport with rdma_get_udp_sport
RDMA/core: Calculate UDP source port based on flow label or lqpn/rqpn
IB/qib: Fix typos
RDMA/rtrs-clt: Rename rtrs_clt to rtrs_clt_sess
RDMA/rtrs-srv: Rename rtrs_srv to rtrs_srv_sess
RDMA/rtrs-clt: Rename rtrs_clt_sess to rtrs_clt_path
RDMA/rtrs-srv: Rename rtrs_srv_sess to rtrs_srv_path
RDMA/rtrs: Rename rtrs_sess to rtrs_path
RDMA/hns: Modify the hop num of HIP09 EQ to 1
IB/iser: Align coding style across driver
IB/iser: Remove un-needed casting to/from void pointer
IB/iser: Don't suppress send completions
IB/iser: Rename ib_ret local variable
IB/iser: Fix RNR errors
IB/iser: Remove deprecated pi_guard module param
IB/mlx5: Expose NDR speed through MAD
RDMA/cxgb4: Set queue pair state when being queried
...
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmHbZ+YeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGDs4H/RgC8JOV3Dki1VtO
6OwPxUKKojhVU9LJis7kyG5voB/zE7tK5nI+jC3gYGQUFKWaZ3YY8s3UcV1zvg/b
a44b91boA+dKxEwOq4RZNQ9mU+QWnNoG5+UqBkmB8vewi3QC3T8xEmpWcERLbU7d
KrI2T6i4ksJ9OYSYMEMyrvrpt7nt3n1tDX8b71faXjf1zbLeGo9zT53t6BJ/LknV
AK406Eq/3bg36OZrKFuG7hCJfRE/cSlxF9bxK3sIfMBMQ2YPe1S5+pxl5iBD0nyl
NaHOBYcLTxPAne3YgIvK0zDdsS+EtPSlaVdWfSmNjQhX2vqEixldgdrOCmwp37vd
3gV9D28=
=hrOo
-----END PGP SIGNATURE-----
Merge tag 'v5.16' into rdma.git for-next
To resolve minor conflict in:
drivers/infiniband/hw/mlx5/mlx5_ib.h
By merging both hunks.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Don't populate the const array spaces on the stack but make it static
const and make the pointer an array to remove a dereference. Shrinks
object code a little too. Also clean up intent, currently it is spaces
and should be a tab.
Signed-off-by: Colin Ian King <colin.i.king@googlemail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
... and remove hardcoded function name in ceph_parse_ips().
[ idryomov: delim parameter, drop CEPH_ADDR_PARSE_DEFAULT_DELIM ]
Signed-off-by: Venky Shankar <vshankar@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmHd8EIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpnOKEADGpxp+Vntbm8nZI/PFP5fA2gUTZWgSVB4l
axVTYW21pjSrsrAhGg2FIgBgL0tNkgxQnIPRn50YL8jT3pTkCEcR7kLbhEU7W/Ln
7hrsBgFnsCBoCs38LvzXHZD69jtEtNRk1ijPMLo5iCcHkAyUVKa1glfeMwefuI5/
Rl8SoueRXppvCfwNPptaAKiDsYVN8KCJPvvhlMNoKP5n1iTsNYJ/HVsLqfRnP0oc
CR6eHaYceWGLER8tWtBlG2Qp40+cd/A320thkIlEpEKJPWE/ce5AUp0PYxVJbwjU
qvO1tMYSya7gPiaVWRJcUeAgRFiivM/kTdDrGwiY9hpv/BQG7EAW5D9Xecz/M4UG
BgNLfhe0aR9QssjPxITgyiy9sRpwwpnpoVONTu3slgXVTUVlOq0QT6LOTPR1B9A4
ZjbHVCuI3eyrAOqD4IjYSqjHa6GjFLiKTh8Q0ZB/KJGX1eItLVLVdJfcfV4RkBIf
6RZg9+7/mXaDxU74DZ2tfUhHT0sC5RS+5VFxpkhThVk9qRbVdZGGWAHcVOkMjk9B
L4PCpJeuaR+rzXvCDOCOI5sHraa5F/IRhMaTu5sHj/MIuEpq1fqjaB7tWRvfm6HO
4tepUtb++rS3/zFFQlZCLyjVk2o0p2b0viwPLjvsRqsBp1bVoO9mJIiyp6POmM3G
UjxQS0vEDw==
=k0IZ
-----END PGP SIGNATURE-----
Merge tag 'for-5.17/drivers-2022-01-11' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- mtip32xx pci cleanups (Bjorn)
- mtip32xx conversion to generic power management (Vaibhav)
- rsxx pci powermanagement cleanups (Bjorn)
- Remove the rsxx driver. This hardware never saw much adoption, and
it's been end of lifed for a while. (Christoph)
- MD pull request from Song:
- REQ_NOWAIT support (Vishal Verma)
- raid6 benchmark optimization (Dirk Müller)
- Fix for acct bioset (Xiao Ni)
- Clean up max_queued_requests (Mariusz Tkaczyk)
- PREEMPT_RT optimization (Davidlohr Bueso)
- Use default_groups in kobj_type (Greg Kroah-Hartman)
- Use attribute groups in pktcdvd and rnbd (Greg)
- NVMe pull request from Christoph:
- increment request genctr on completion (Keith Busch, Geliang
Tang)
- add a 'iopolicy' module parameter (Hannes Reinecke)
- print out valid arguments when reading from /dev/nvme-fabrics
(Hannes Reinecke)
- Use struct_group() in drbd (Kees)
- null_blk fixes (Ming)
- Get rid of congestion logic in pktcdvd (Neil)
- Floppy ejection hang fix (Tasos)
- Floppy max user request size fix (Xiongwei)
- Loop locking fix (Tetsuo)
* tag 'for-5.17/drivers-2022-01-11' of git://git.kernel.dk/linux-block: (32 commits)
md: use default_groups in kobj_type
md: Move alloc/free acct bioset in to personality
lib/raid6: Use strict priority ranking for pq gen() benchmarking
lib/raid6: skip benchmark of non-chosen xor_syndrome functions
md: fix spelling of "its"
md: raid456 add nowait support
md: raid10 add nowait support
md: raid1 add nowait support
md: add support for REQ_NOWAIT
md: drop queue limitation for RAID1 and RAID10
md/raid5: play nice with PREEMPT_RT
block/rnbd-clt-sysfs: use default_groups in kobj_type
pktcdvd: convert to use attribute groups
block: null_blk: only set set->nr_maps as 3 if active poll_queues is > 0
nvme: add 'iopolicy' module parameter
nvme: drop unused variable ctrl in nvme_setup_cmd
nvme: increment request genctr on completion
nvme-fabrics: print out valid arguments when reading from /dev/nvme-fabrics
block: remove the rsxx driver
rsxx: Drop PCI legacy power management
...
Structure rtrs_clt is used for sessions. So to avoid confusions rename it
to rtrs_clt_sess.
Transformations are done with the help of following coccinelle script.
@@
@@
struct
- rtrs_clt
+ rtrs_clt_sess
Link: https://lore.kernel.org/r/20220105180708.7774-6-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Structure rtrs_srv is used for sessions so in order to avoid confusions
rename it to rtrs_srv_sess.
All changes were done with the help of following Coccinelle script:
@@
@@
struct
- rtrs_srv
+ rtrs_srv_sess
Link: https://lore.kernel.org/r/20220105180708.7774-5-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rtrs_srv_sess is used for paths and not sessions on the server side. This
creates confusion so let's rename it to rtrs_srv_path. Also, rename
related variables and functions.
Coccinelle is used to do the transformations for most of the occurrences
and remaining ones were handled manually.
Link: https://lore.kernel.org/r/20220105180708.7774-3-jinpu.wang@ionos.com
Signed-off-by: Vaishali Thakkar <vaishali.thakkar@ionos.com>
Signed-off-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the rnbd controller sysfs code to use default_groups field
which has been the preferred way since aa30f47cf6 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.
Cc: "Md. Haris Iqbal" <haris.iqbal@ionos.com>
Cc: Jack Wang <jinpu.wang@ionos.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Link: https://lore.kernel.org/r/20220104162947.1320936-1-gregkh@linuxfoundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There is no need to create kobject children of the pktcdvd device just
to display a subdirectory name. Instead, use a named attribute group
which removes the extra kobjects and also fixes the userspace race where
the device is created yet tools like libudev can not see the attributes
as they think the subdirectories are some other sort of device.
Cc: linux-block@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/r/20220103162408.742003-1-gregkh@linuxfoundation.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
It isn't correct to set set->nr_maps as 3 if g_poll_queues is > 0 since
we can change it via configfs for null_blk device created there, so only
set it as 3 if active poll_queues is > 0.
Fixes divide zero exception reported by Shinichiro.
Fixes: 2bfdbe8b7e ("null_blk: allow zero poll queues")
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Link: https://lore.kernel.org/r/20211224010831.1521805-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
MIPS include files define "PC" so when building the paride driver the
following build warning shows up:
rivers/block/paride/bpck.c:32: warning: "PC" redefined
Fix this by undefining PC before redefining it as is done for other
defines in this driver.
Cc: Tim Waugh <tim@cyberelk.net>
Acked-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20211130084626.3215987-1-gregkh@linuxfoundation.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This driver was for rare and shortlived high end enterprise hardware
and hasn't been maintained since 2014, which also means it never got
converted to use blk-mq.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The Xen blkfront driver is still vulnerable for an attack via excessive
number of events sent by the backend. Fix that by using lateeoi event
channels.
This is part of XSA-391
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Previously we passed a struct pci_dev * to mtip_check_surprise_removal(),
which immediately looked up the driver_data. But all callers already have
the driver_data pointer, so just pass it directly and skip the extra
lookup. No functional change intended.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-3-helgaas@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The .suspend() and .resume() methods are only called after the .probe()
method (mtip_pci_probe()) has set the drvdata and returned success.
Therefore, if we get to mtip_pci_suspend() or mtip_pci_resume(), the
drvdata must be valid. Drop the unnecessary checking.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20211208192449.146076-2-helgaas@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.
Add a struct_group() for the algs so that memset() can correctly reason
about the size.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20211118203712.1288866-1-keescook@chromium.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with disk->open_mutex held.
This circular dependency cannot be broken unless we call __loop_clr_fd()
without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from
lo_release() to a WQ context.
Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1]
Reported-by: syzbot <syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>
Tested-by: syzbot+643e4ce4b6ad1347d372@syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
kernel test robot reports that sparse now triggers a warning on null_blk:
>> drivers/block/null_blk/main.c:1577:55: sparse: sparse: incorrect type in argument 3 (different base types) @@ expected int ioerror @@ got restricted blk_status_t [usertype] error @@
drivers/block/null_blk/main.c:1577:55: sparse: expected int ioerror
drivers/block/null_blk/main.c:1577:55: sparse: got restricted blk_status_t [usertype] error
because blk_mq_add_to_batch() takes an integer instead of a blk_status_t.
Just cast this to an integer to silence it, null_blk is the odd one out
here since the command status is the "right" type. If we change the
function type, then we'll have do that for other callers too (existing and
future ones).
Fixes: 2385ebf38f ("block: null_blk: batched complete poll requests")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The bdi congestion framework isn't widely used and should be
deprecated.
pktdvd makes use of it to track congestion, but this can be done
entirely internally to pktdvd, so it doesn't need to use the framework.
So introduce a "congested" flag. When waiting for bio_queue_size to
drop, set this flag and a var_waitqueue() to wait for it. When
bio_queue_size does drop and this flag is set, clear the flag and call
wake_up_var().
We don't use a wait_var_event macro for the waiting as we need to set
the flag and drop the spinlock before calling schedule() and while that
is possible with __wait_var_event(), result is not easy to read.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Link: https://lore.kernel.org/r/163910843527.9928.857338663717630212@noble.neil.brown.name
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGqefsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpkuDEADL+y8mkJEW6wcvFQSYlHdXXXthVEGmZyWV
OINHE4JO/ysU4P8PTnzOvmXuV2rKhcpV9KF9rdo0vqMLXxQFlyjbUkIf7RP0YUjT
Hz8nmGUQLXCEAWHYvdh9tRV2lgSutkajsxw/mNzwWJiM7sEnQXHifa4sXsQNGpLp
eclOe4ITe/J4Q5CnntlA8u0lM4Y16O5jHrCneoqJxy6v/Wx9JkKI+L+mEfJQCpVN
YxtVjFrE9nznB2eWBtZYlHnRxxZaNzRzY38KygKqhw0MhjCgpR+0izBTewzJyJjv
09k+blskSqBtd1+GatbU0dGevcmQPJ7h9ersXQl00RaQiKBcFH1If5I89U2hkw0F
GzI/N8y/zZIjTuSdVWEbTJCArTo8FxJ3ASc327n2Q2GSItbp9mjJlshLXkK2B+gg
O1U3ZPAOFEsQOW4GIP6f/nPQtu+tHqBFytgpBJ2fegzaSIU03K3crEOOifKOODYm
tGA2yxLymY3ZSVThAWixr+6efQMqD+J6Dv4uUx6sRMtJ0fi7+P6l7YimjQw17z/J
XmEkionAsn6cbVIyJaMhcFop2IED5n18ZgPT0wdU2vfkwrwQsSpk5Uh9FPMg62RA
Fzymb6IHr3+AgR+ZxQACYz7NRrULj2rRxAKpqX1iu6aodeJR3WrJx/9Ci/76Xgfm
HT+f1uH7ww==
=QBwo
-----END PGP SIGNATURE-----
Merge tag 'block-5.16-2021-12-03' of git://git.kernel.dk/linux-block
Pull block fix from Jens Axboe:
"A single fix for repeated printk spam from loop"
* tag 'block-5.16-2021-12-03' of git://git.kernel.dk/linux-block:
loop: Use pr_warn_once() for loop_control_remove() warning
Complete poll requests via blk_mq_add_to_batch() and
blk_mq_end_request_batch(), so that we can cover batched complete
code path by running null_blk test.
Meantime this way shows ~14% IOPS boost on 't/io_uring /dev/nullb0'
in my test.
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203081703.3506020-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the watchdog detects a disk change, it calls cancel_activity(),
which in turn tries to cancel the fd_timer delayed work.
In the above scenario, fd_timer_fn is set to fd_watchdog(), meaning
it is trying to cancel its own work.
This results in a hang as cancel_delayed_work_sync() is waiting for the
watchdog (itself) to return, which never happens.
This can be reproduced relatively consistently by attempting to read a
broken floppy, and ejecting it while IO is being attempted and retried.
To resolve this, this patch calls cancel_delayed_work() instead, which
cancels the work without waiting for the watchdog to return and finish.
Before this regression was introduced, the code in this section used
del_timer(), and not del_timer_sync() to delete the watchdog timer.
Link: https://lore.kernel.org/r/399e486c-6540-db27-76aa-7a271b061f76@tasossah.com
Fixes: 070ad7e793 ("floppy: convert to delayed work and single-thread wq")
Signed-off-by: Tasos Sahanidis <tasos@tasossah.com>
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There isn't any reason to not allow zero poll queues from user
viewpoint.
Also sometimes we need to compare io poll between poll mode and irq
mode, so not allowing poll queues is bad.
Fixes: 15dfc662ef ("null_blk: Fix handling of submit_queues and poll_queues attributes")
Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20211203023935.3424042-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
kernel test robot reported that RCU stall via printk() flooding is
possible [1] when stress testing.
Link: https://lkml.kernel.org/r/20211129073709.GA18483@xsang-OptiPlex-9020 [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with lo->lo_mutex held.
Since all functions where lo->lo_state matters are already checking
lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g.
ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either
ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered
as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex
inside __loop_clr_fd() only when asserting/updating lo->lo_state.
Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid
lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or
ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test,
and convert __loop_clr_fd() into a void function.
Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1]
Reported-by: syzbot <syzbot+63614029dfb79abd4383@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakura.ne.jp
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove the gendisk aregument to blk_execute_rq and blk_execute_rq_nowait
given that it is unused now. Also convert the boolean at_head parameter
to actually use the bool type while touching the prototype.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Just use the disk attached to the request_queue instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
All modern drivers can support extra partitions using the extended
dev_t. In fact except for the ioctl method drivers never even see
partitions in normal operation.
So remove the GENHD_FL_EXT_DEVT and allow extra partitions for all
block devices that do support partitions, and require those that
do not support partitions to explicit disallow them using
GENHD_FL_NO_PART.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This manually reverts commit 27290b469051 ("null_blk: suppress invalid
partition info"). The message in that commit log can't appearch as
the flag is never checked during probing, and there is no good reason
to treat null_blk special in /proc/partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-9-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The GENHD_FL_NO_PART_SCAN controls more than just partitions canning,
so rename it to GENHD_FL_NO_PART.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20211122130625.1136848-7-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
GENHD_FL_CD marks a gendisk as a vaguely CD-ROM like device.
Besides being used internally inside of sunvdc.c an xen-blkfront it
is used by xen-blkback as a hint to claim a device exported to a
guest is a CD-ROM like device. Just check for disk->cdi instead
which is the right indicator for "real" CD-ROM or DVD drivers. This
will miss the paravirtualized guest drivers, but those make little
sense to report anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE is all about the event reporting
mechanism, so move it to the event_flags field.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20211122130625.1136848-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken does
not seem to work, breaking too many guests in the process. We'll need to
do length validation using some other approach.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmGe0sEPHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRp8WEH/imDIq1iduDeAuvFnmrm5eEO9w3wzXCT4NiG
8Pla241FzQ1pEFEAne16KP0+SlLhj7P0oc5FR8vkYvxxuyneDbCzcS2M1kYMOpA1
ry28PuObAnekzE/WXxvC031ozB5Zb/FL54gmw+/1EdAOdMGL0CdQ1aJxREBHRTBo
p4ZHr83GA2D2C/IyKCsgQ8cB9ZrMqImTQQ4vRD89HoFBp+GH2u2Di1iyXEWuOqdI
n1+7M9jjbyW8A+N1bkOicpShS/6UcyJQOOcg8kvUQOV6srVkYhfaiWC/CbOP2g73
8PKK+/K2Htf92s6RdvDUPSKmvqGR/4KPZWPtWThXBYXGgWul0uI=
=q6tO
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin:
"Misc fixes all over the place.
Revert of virtio used length validation series: the approach taken
does not seem to work, breaking too many guests in the process. We'll
need to do length validation using some other approach"
[ This merge also ends up reverting commit f7a36b03a7 ("vsock/virtio:
suppress used length validation"), which came in through the
networking tree in the meantime, and was part of that whole used
length validation series - Linus ]
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa_sim: avoid putting an uninitialized iova_domain
vhost-vdpa: clean irqs before reseting vdpa device
virtio-blk: modify the value type of num in virtio_queue_rq()
vhost/vsock: cleanup removing `len` variable
vhost/vsock: fix incorrect used length reported to the guest
Revert "virtio_ring: validate used buffer length"
Revert "virtio-net: don't let virtio core to validate used length"
Revert "virtio-blk: don't let virtio core to validate used length"
Revert "virtio-scsi: don't let virtio core to validate used buffer length"
If writeback isn't configured, then we get the following warning when
compiling zram:
drivers/block/zram/zram_drv.c:1824:45: warning: unused variable 'zram_wb_devops' [-Wunused-const-variable]
Make sure we only define the block_device_operations if that option is
enabled.
Link: https://lore.kernel.org/lkml/202111261614.gCJMqcyh-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This was found by coccicheck:
./drivers/block/virtio_blk.c, 334, 14-17, WARNING Unsigned expression
compared with zero num < 0
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Link: https://lore.kernel.org/r/20211117063955.160777-1-ye.guojin@zte.com.cn
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: 02746e26c3 ("virtio-blk: avoid preallocating big SGL for data")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
This reverts commit a40392edf1.
Attempts to validate length in the core did not work out.
We'll drop them, so revert the dependent changes in drivers.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmGKqOcQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpr3yEADD9Cx8oNk3KzWV3c3JlIR4JQtvpczS3dho
KkGU0D5fOh1sViXbLBNr6VxypcEIKQoHxDQQ6qid1kOu/B3mCNM1duLsVjyj3Qa0
7nbm2dVUsD/EVDuXedRmMvcfCUx6Z23DbpI182wXtIPaCsEEmsANzHnZNg38OV44
25SYG0QUvb9ViSz1Y1GORu0ttEJNF2GhZfiBpb0WveRnY7eTSL/PnHNDzHsSeFv4
zD0W205g7jKbt0+57kgNElTz7DbdM3p8XVex+aXPlFaHz2qx4ZoJJIsaMv/P8tT5
14b50cB41xnPvlGTvqr1WfZZfJocDNq2rG+fh6N5D1sO86ogWpj7psiiADfa0pb6
ZWoJqhk3BvEUMPQ5N/BJ/8j3FWGIYWtKQf4QcyxrJYpqDwtwbBfMlzKkc7JMPFYk
JAi6uq1uF5SbA4x99G90tK85LvxsbkseyIYXgBJ/GIyW5doIPkD9TPDEzJMCdHOe
laynHS5PMHzuhPLuEDDn9sTVXpZWAMBnoy4j1L4wGmBjiogYWLTSJVobODzCAqHY
1Va2oP6SXfCdVRkCysFbcrdsjJuoIWlMKrdE40tNvkmU0v7sEX0Zd+GLHiaWdIZa
fgxC9fmZtDDOowCp+Iw0VaAqPeeptmyUrof06ZktJleOAscX7kSwbxPdmr1FM0jy
dbnLDyaq/A==
=QaFI
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block
Pull more block driver updates from Jens Axboe:
- Last series adding error handling support for add_disk() in drivers.
After this one, and once the SCSI side has been merged, we can
finally annotate add_disk() as must_check. (Luis)
- bcache fixes (Coly)
- zram fixes (Ming)
- ataflop locking fix (Tetsuo)
- nbd fixes (Ye, Yu)
- MD merge via Song
- Cleanup (Yang)
- sysfs fix (Guoqing)
- Misc fixes (Geert, Wu, luo)
* tag 'for-5.16/drivers-2021-11-09' of git://git.kernel.dk/linux-block: (34 commits)
bcache: Revert "bcache: use bvec_virt"
ataflop: Add missing semicolon to return statement
floppy: address add_disk() error handling on probe
ataflop: address add_disk() error handling on probe
block: update __register_blkdev() probe documentation
ataflop: remove ataflop_probe_lock mutex
mtd/ubi/block: add error handling support for add_disk()
block/sunvdc: add error handling support for add_disk()
z2ram: add error handling support for add_disk()
nvdimm/pmem: use add_disk() error handling
nvdimm/pmem: cleanup the disk if pmem_release_disk() is yet assigned
nvdimm/blk: add error handling support for add_disk()
nvdimm/blk: avoid calling del_gendisk() on early failures
nvdimm/btt: add error handling support for add_disk()
nvdimm/btt: use goto error labels on btt_blk_init()
loop: Remove duplicate assignments
drbd: Fix double free problem in drbd_create_device
nvdimm/btt: do not call del_gendisk() if not needed
bcache: fix use-after-free problem in bcache_device_free()
zram: replace fsync_bdev with sync_blockdev
...
Merge misc updates from Andrew Morton:
"257 patches.
Subsystems affected by this patch series: scripts, ocfs2, vfs, and
mm (slab-generic, slab, slub, kconfig, dax, kasan, debug, pagecache,
gup, swap, memcg, pagemap, mprotect, mremap, iomap, tracing, vmalloc,
pagealloc, memory-failure, hugetlb, userfaultfd, vmscan, tools,
memblock, oom-kill, hugetlbfs, migration, thp, readahead, nommu, ksm,
vmstat, madvise, memory-hotplug, rmap, zsmalloc, highmem, zram,
cleanups, kfence, and damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (257 commits)
mm/damon: remove return value from before_terminate callback
mm/damon: fix a few spelling mistakes in comments and a pr_debug message
mm/damon: simplify stop mechanism
Docs/admin-guide/mm/pagemap: wordsmith page flags descriptions
Docs/admin-guide/mm/damon/start: simplify the content
Docs/admin-guide/mm/damon/start: fix a wrong link
Docs/admin-guide/mm/damon/start: fix wrong example commands
mm/damon/dbgfs: add adaptive_targets list check before enable monitor_on
mm/damon: remove unnecessary variable initialization
Documentation/admin-guide/mm/damon: add a document for DAMON_RECLAIM
mm/damon: introduce DAMON-based Reclamation (DAMON_RECLAIM)
selftests/damon: support watermarks
mm/damon/dbgfs: support watermarks
mm/damon/schemes: activate schemes based on a watermarks mechanism
tools/selftests/damon: update for regions prioritization of schemes
mm/damon/dbgfs: support prioritization weights
mm/damon/vaddr,paddr: support pageout prioritization
mm/damon/schemes: prioritize regions within the quotas
mm/damon/selftests: support schemes quotas
mm/damon/dbgfs: support quotas of schemes
...
This change introduces an aged idle interface to the existing idle sysfs
file for zram.
When CONFIG_ZRAM_MEMORY_TRACKING is enabled the idle file now also
accepts an integer argument. This integer is the age (in seconds) of
pages to mark as idle. The idle file still supports 'all' as it always
has. This new approach allows for much more control over which pages
get marked as idle.
[bgeffon@google.com: use IS_ENABLED and cleanup comment]
Link: https://lkml.kernel.org/r/20210924161128.1508015-1-bgeffon@google.com
[bgeffon@google.com: Sergey's cleanup suggestions]
Link: https://lkml.kernel.org/r/20210929143056.13067-1-bgeffon@google.com
Link: https://lkml.kernel.org/r/20210923130115.1344361-1-bgeffon@google.com
Signed-off-by: Brian Geffon <bgeffon@google.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: Jesse Barnes <jsbarnes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
snprintf() returns the number of bytes it would have printed if there
were space. But it does not count the NUL terminator. So that means
that if "count == copied" then this has already overflowed by one
character.
This bug likely isn't super harmful in real life.
Link: https://lkml.kernel.org/r/20210916130404.GA25094@kili
Fixes: c0265342bf ("zram: introduce zram memory tracking")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The read_from_bdev_async is not called on atomic context. So GFP_NOIO
is available rather than GFP_ATOMIC. If there were reclaimable pages
with GFP_NOIO, we can avoid allocation failure and page fault failure.
Link: https://lkml.kernel.org/r/20210908005241.28062-1-jaewon31.kim@samsung.com
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to cleanup resources on the probe() callback registered
with __register_blkdev(), now that add_disk() error handling is
supported. Address this.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211103230437.1639990-14-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We need to cleanup resources on the probe() callback registered
with __register_blkdev(), now that add_disk() error handling is
supported. Address this.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211103230437.1639990-13-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Commit bf9c0538e4 ("ataflop: use a separate gendisk for each media
format") introduced ataflop_probe_lock mutex, but forgot to unlock the
mutex when atari_floppy_init() (i.e. module loading) succeeded. This will
result in double lock deadlock if ataflop_probe() is called. Also,
unregister_blkdev() must not be called from atari_floppy_init() with
ataflop_probe_lock held when atari_floppy_init() failed, for
ataflop_probe() waits for ataflop_probe_lock with major_names_lock held
(i.e. AB-BA deadlock).
__register_blkdev() needs to be called last in order to avoid calling
ataflop_probe() when atari_floppy_init() is about to fail, for memory for
completing already-started ataflop_probe() safely will be released as soon
as atari_floppy_init() released ataflop_probe_lock mutex.
As with commit 8b52d8be86 ("loop: reorder loop_exit"),
unregister_blkdev() needs to be called first in order to avoid calling
ataflop_alloc_disk() from ataflop_probe() after del_gendisk() from
atari_floppy_exit().
By relocating __register_blkdev() / unregister_blkdev() as explained above,
we can remove ataflop_probe_lock mutex, for probe function and __exit
function are serialized by major_names_lock mutex.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Fixes: bf9c0538e4 ("ataflop: use a separate gendisk for each media format")
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Tested-by: Michael Schmitz <schmitzmic@gmail.com>
Link: https://lore.kernel.org/r/20211103230437.1639990-11-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
We re-use the same free tag call, so we also add a label for
that as well.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211103230437.1639990-9-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling. Only the disk is cleaned up inside
z2ram_register_disk() as the caller deals with the rest.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211103230437.1639990-8-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The assignment and operation there will be overwritten later, so
it should be deleted.
The clang_analyzer complains as follows:
drivers/block/loop.c:2330:2 warning:
Value stored to 'err' is never read
change in v2:
Repair the sending email box
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: luo penghao <luo.penghao@zte.com.cn>
Link: https://lore.kernel.org/r/20211104064546.3074-1-luo.penghao@zte.com.cn
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In drbd_create_device(), the 'out_no_io_page' lable has called
blk_cleanup_disk() when return failed.
So remove the 'out_cleanup_disk' lable to avoid double free the
disk pointer.
Fixes: e92ab4eda5 ("drbd: add error handling support for add_disk()")
Signed-off-by: Wu Bo <wubo40@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/1636013229-26309-1-git-send-email-wubo40@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Hardening work by Jason
vdpa driver for Alibaba ENI
Performance tweaks for virtio blk
virtio rng rework using an internal buffer
mac/mtu programming for mlx5 vdpa
Misc fixes, cleanups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmF/su8PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpo+sIAJjBTvbET+d0KuIMt8YBGsU+NWgHaJxW06hm
GHZzinueZYWaS20MPMYMPfcHUgigWj0kk7F0gMljG5rtDFzR85JkOi7+e5V6RVJW
8SQc7JjA1Krde6EiPJdlv3mVkLz/5VbrGUgjAW9di/O04Xc/jAc9kmJ41wTYr0mL
E5IsUi9QBmgHtKQ+2ofb9QEWuebJclGK+NVd3Kp08BtAQOrdn5UrIzY30/sjkZwE
ul2ll6v/k7T3dCQbHD/wMgfFycPvoZVfxaVj+FWQnwdWkqZwzMRmpKPRes4KJfww
Lfx1qox10mJzH7XJCs7xmOM8f7HgNNWi0gD5FxNlfkiz1AwHYOk=
=DTXY
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
"vhost and virtio fixes and features:
- Hardening work by Jason
- vdpa driver for Alibaba ENI
- Performance tweaks for virtio blk
- virtio rng rework using an internal buffer
- mac/mtu programming for mlx5 vdpa
- Misc fixes, cleanups"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (45 commits)
vdpa/mlx5: Forward only packets with allowed MAC address
vdpa/mlx5: Support configuration of MAC
vdpa/mlx5: Fix clearing of VIRTIO_NET_F_MAC feature bit
vdpa_sim_net: Enable user to set mac address and mtu
vdpa: Enable user to set mac and mtu of vdpa device
vdpa: Use kernel coding style for structure comments
vdpa: Introduce query of device config layout
vdpa: Introduce and use vdpa device get, set config helpers
virtio-scsi: don't let virtio core to validate used buffer length
virtio-blk: don't let virtio core to validate used length
virtio-net: don't let virtio core to validate used length
virtio_ring: validate used buffer length
virtio_blk: correct types for status handling
virtio_blk: allow 0 as num_request_queues
i2c: virtio: Add support for zero-length requests
virtio-blk: fixup coccinelle warnings
virtio_ring: fix typos in vring_desc_extra
virtio-pci: harden INTX interrupts
virtio_pci: harden MSI-X interrupts
virtio_config: introduce a new .enable_cbs method
...
When calling fsync_bdev(), zram driver guarantees that the bdev won't be
opened by anyone, then there can't be one active fs/superblock over the
zram bdev, so replace fsync_bdev with sync_blockdev.
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20211025025426.2815424-5-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
After resetting device in zram_remove(), disksize_store still may come and
allocate resources again before deleting gendisk, fix the race by resetting
zram after del_gendisk() returns. At that time, disksize_store can't come
any more.
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20211025025426.2815424-4-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When the zram module is being unloaded, no one should be using the
zram disks. However even while being unloaded the zram module's
sysfs attributes might be poked at to re-configure zram devices.
This is expected, and kernfs ensures that these operations complete
before device_del() completes.
But reset_store() may set ->claim which will fail zram_remove(), when
this happens, zram_reset_device() is bypassed, and zram->comp can't
be destroyed, so the warning of 'Error: Removing state 63 which has
instances left.' is triggered during unloading module, together with
memory leak and sort of thing.
Fixes the issue by not failing zram_remove() if ->claim is set, and
we actually need to do nothing in case that zram_reset() is running
since del_gendisk() will wait until zram_reset() is done.
Reported-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20211025025426.2815424-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit fcf3d633d8 ("nbd: check sock index in nbd_read_stat()") just
add error message when socket index doesn't match. Since the request
and reply must be transmitted over the same socket, it's ok to error
out in such situation.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20211101092538.1155842-1-yukuai3@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If first_minor is illegal will goto out_free_idr label, this will miss
cleanup disk.
Fixes: b1a811633f ("block: nbd: add sanity check for first_minor")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20211102015237.2309763-4-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If 'part_shift' is not zero, then 'index << part_shift' might
overflow to a value that is not greater than '0xfffff', then sysfs
might complains about duplicate creation.
Fixes: b0d9111a2d ("nbd: use an idr to keep track of nbd devices")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20211102015237.2309763-3-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
commit b1a811633f ("block: nbd: add sanity check for first_minor")
checks that 'first_minor' should not be greater than 0xff, which is
wrong. Whitout the commit, the details that when user pass 0x100000,
it ends up create sysfs dir "/sys/block/43:0" are as follows:
nbd_dev_add
disk->first_minor = index << part_shift
-> default part_shift is 5, first_minor is 0x2000000
device_add_disk
ddev->devt = MKDEV(disk->major, disk->first_minor)
-> (0x2b << 20) | (0x2000000) = 0x2b00000
device_add
device_create_sys_dev_entry
format_dev_t
sprintf(buffer, "%u:%u", MAJOR(dev), MINOR(dev));
-> got 43:0
sysfs_create_link -> /sys/block/43:0
By the way, with the wrong fix, when part_shift is the default value,
only 8 ndb devices can be created since 8 << 5 is greater than 0xff.
Since the max bits for 'first_minor' should be the same as what
MKDEV() does, which is 20. Change the upper bound of 'first_minor'
from 0xff to 0xfffff.
Fixes: b1a811633f ("block: nbd: add sanity check for first_minor")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Link: https://lore.kernel.org/r/20211102015237.2309763-2-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8MOUQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpmeqEACrayLMDMdlb1FduTYw29QAL7XxS375r92T
bwLippmKQIFNi8p5ScHraelV5ixgxse2j68MexlQHpl9aHIn/oL7qHACIMgDP05m
KaSy8Hr2abqr+zz+rLMhkm21zAva6aWjQu7NoEjBE4dC5L4l9p885LaA+jmqQUno
1wvpaEcype8cITJ+sSCb3kD6nZx7y1Lt5zEefUfk6ruMm9x9FwvU6uc4rIHi+Zve
Hwo8yGbTvlU8rGSi9naC/U8pIZ4bqEuTAcV5VHNrWG+b4aA/aFPpSjpIiSBZSXo0
HXa+jmcr6gkejfPeOZkBbRub6Fm9Wq2pDAZskPWFX6zyX0pIV05GjJ2J/ba8rovn
QrcfxaBv8XitKgrjFZeR0ZBqD2iJjPA/Yq5/r1ZmZ0wSHI3W4UuTGhQYEPyDLceH
ZWq/wcfVFek4kAoCxCqy9kWiOujY90WWKQW3yD7b8FPZ0d+/R1Mn+drlYaSKN1Pk
/9/+z1DaLtBWbJ2G+BQ9oUkYmNSapAiYc2YXVss86hmhLX+prFtSj3zECZUvhyAz
b42A2DVsjU+65yT2zdPBXlMrbI91qNnvIXcz5szNdTfHTn9FiLQb4BffMV0FHT3g
vap8N3Rb8UkZ3v4NCVAtlfcGr0kvYHQH+Qgh6oAlXB4NQoKJCVadzpTFPMWjx788
oHBUjA0UTQ==
=4vl/
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block
Pull kiocb->ki_complete() cleanup from Jens Axboe:
"This removes the res2 argument from kiocb->ki_complete().
Only the USB gadget code used it, everybody else passes 0. The USB
guys checked the user gadget code they could find, and everybody just
uses res as expected for the async interface"
* tag 'for-5.16/ki_complete-2021-10-29' of git://git.kernel.dk/linux-block:
fs: get rid of the res2 iocb->ki_complete argument
usb: remove res2 argument from gadget code completions
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8MnsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuBpEACzrzbUfkTQ33bwF60mZQaqbR0ha7TrP/hp
oAqthmf1S2U+7mzXHQ+6MN7p4+TVPa/ITxQZtLTw7U/68+w68tTUZfZHJ5H6tSXu
92OHFDDP4ZeqATRTcJBij/5Si9BiKBHexMqeyVYPw0DWdEukAko9f7Z81GonFbTu
EIdIWivBc76bLiK/X3w7lhLcaNyUv9cKalwjbI4xtwcHtcIYj5d2jIc9PF2I9Xtl
3oqNT4GOSv7s3mW7syB1UEPrzbhVIzCSNbMSviCoK7GA5g8EN5KMEGQQoUJ942Zv
bHMjMpGrXsWebPto9maXycGY/9WsVcpNB7opyQRpyG8yDDZq0AFNJxD/NBMkQo4S
Sfp0fxpVXDRWu7zX0EktwGyOp4YNwfS6pDeAhqhnSl2uPWTsxGZ0kXvlMpR9Rt/t
TjEKZe6lmcC7s42rPVRBRw5HEzEsVovf0z4lyvC4M223CV3c5cuYkAAtCcqLdVWq
JkceHSb7EKu7QY6jf3sBud14HaAj+sub7kffOWhhAxObg3Ytsql61AGzbhandnxT
AtN3n9PBHNGmrSv4MiiuP+Dq5jeT5NspFkf1FvnRcfmZMJtH1VXHKr84JbAy4VHr
5cZoDJzL9Zm1d865f+VWkZeYd3b2kKP8C0dm6tAn4VweT6eb8bu6tgB7wFQwLIFK
aRxz5vQ1AQ==
=dLYJ
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/passthrough-flag-2021-10-29' of git://git.kernel.dk/linux-block
Pull QUEUE_FLAG_SCSI_PASSTHROUGH removal from Jens Axboe:
"This contains a series leading to the removal of the
QUEUE_FLAG_SCSI_PASSTHROUGH queue flag"
* tag 'for-5.16/passthrough-flag-2021-10-29' of git://git.kernel.dk/linux-block:
block: remove blk_{get,put}_request
block: remove QUEUE_FLAG_SCSI_PASSTHROUGH
block: remove the initialize_rq_fn blk_mq_ops method
scsi: add a scsi_alloc_request helper
bsg-lib: initialize the bsg_job in bsg_transport_sg_io_fn
nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands
sd: implement ->get_unique_id
block: add a ->get_unique_id method
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8L70QHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpo9YEAC17yEJ0xwwtUUwZW8avzss4vdcIreFdiZu
gaS+9Oi1bLxj0d2SjaZXJxjT9K+W2LftEsLuQ4oM6VHiLQkcEDbjJdVm3goftTt5
aOvVormDdKbWNcGSbgxA/OcyUT39DH7y17NRVdqYzQSpnrhCod/1tb2ssck0OoYb
VEyBKogMwYeYR55Z3I8yL5pNcEhR8TihZv3rL1iQ7DNpvh5I0I9naSEtGNC84aLP
s4nwRIG+TYll+mg0sfSB29KF7xkoFQO7X7s1rnC/on+gsFEzbJcgkJPDIWeVLnLm
ma8F1i+vJliCGaztyXoleAdg5QDiFmwTQwXRPAk2u8njJhcKi/RwIk2QYMZBZmEJ
bB5EJnlnEaWxjgpCD7JDrtKgIgpbbQHc5QVHRZccsu43UqvDqOZIlvZNYY+h3ivz
jT1zKuKDaTf8YWbfdOJwqm9e+qyR0AFm3rLMdHO58QEh1DBvSLIIdRCNE8wX7nFM
Wx/GmQEkPqNTIZwJOQJMygK+sIuFUDybt3oAH2pjX1zyMx7kTJkrXvj0dhSS/B5u
+gfMs3otWqxQ4P1qfnaUd9mYl8JabV7le2NHzhjdARm4NKFJEtcJe5BJBwiMbo0n
vodqt7aUIAXwMrZXnWZL+w8CobhJBp8I5XHUgng147gDBuCjYQjBQT334auAXxgz
MUCgbjBDqw==
=Vadi
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block
Pull bdev size cleanups from Jens Axboe:
"Clean up the bdev size handling with new bdev_nr_bytes() helper"
* tag 'for-5.16/bdev-size-2021-10-29' of git://git.kernel.dk/linux-block: (34 commits)
partitions/ibm: use bdev_nr_sectors instead of open coding it
partitions/efi: use bdev_nr_bytes instead of open coding it
block/ioctl: use bdev_nr_sectors and bdev_nr_bytes
block: cache inode size in bdev
udf: use sb_bdev_nr_blocks
reiserfs: use sb_bdev_nr_blocks
ntfs: use sb_bdev_nr_blocks
jfs: use sb_bdev_nr_blocks
ext4: use sb_bdev_nr_blocks
block: add a sb_bdev_nr_blocks helper
block: use bdev_nr_bytes instead of open coding it in blkdev_fallocate
squashfs: use bdev_nr_bytes instead of open coding it
reiserfs: use bdev_nr_bytes instead of open coding it
pstore/blk: use bdev_nr_bytes instead of open coding it
ntfs3: use bdev_nr_bytes instead of open coding it
nilfs2: use bdev_nr_bytes instead of open coding it
nfs/blocklayout: use bdev_nr_bytes instead of open coding it
jfs: use bdev_nr_bytes instead of open coding it
hfsplus: use bdev_nr_sectors instead of open coding it
hfs: use bdev_nr_sectors instead of open coding it
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmF8KFsQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgph1ZEACwNuHkAZcIgNzKhzuLP9OjMhv9vV+q254G
/EcM31e+qgRioMd0ihbVsgW76jOwLEmb3ldKGcN+0Wo5+Sv9Im8+wAWYY1REOZO5
ZTUBfAzhEh63/EtqTFiU8U+7dmXqy4z7NaICnhlynjwkd3IT+I561os6kcqwJMMr
G+Q1Cnk9rgCMIoLOCoVThIpjmjyZzF33qJb2VEIkHfkot62iNdpABWaSASF+CCba
z8LfbvLAYz3YLl4thXlLJFU282T5y7gzgSomGvX4F0rMJSbqFbgoNEPxaYw9CvzC
uC6MnYCYdCdvVkWVm1b8I8LYzPd5GrpVOSh3JQGvuA4Ppv2IyJCDSruYGgVUlhao
cVPzuHCqNCfKk0ykYVRZy9oKiBk5wmFeKM/lSHu408y8VNraPNIAEpB6sA9qGr22
AYr8lNh3JDr0g8dtFsDOq+7u3MANW0KQozfzwTPZo6NjzEE1D2jIg39Ljiijo9+Y
3pU8pitIAhsKd2KhW1H6LmtJbF4dX756VKYDXOhzgORU0NZYgvGhBIj9tAdpQR0S
xeae5Kj0/wBGcqR/owf/n1EY/q7rWgNDETnsBhbmzMZyhwH3L6zhT+bfD8YoQCHY
ueyqhyIUe4YBxTrIpICqwDlqaMYAmQ0jRaci+bK9ovVlQ89FQ9o/BE2COPlI/DGX
w+rUmmoX4g==
=HiWU
-----END PGP SIGNATURE-----
Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
- paride driver cleanups (Christoph)
- Remove cryptoloop support (Christoph)
- null_blk poll support (me)
- Now that add_disk() supports proper error handling, add it to various
drivers (Luis)
- Make ataflop actually work again (Michael)
- s390 dasd fixes (Stefan, Heiko)
- nbd fixes (Yu, Ye)
- Remove redundant wq flush in mtip32xx (Christophe)
- NVMe updates
- fix a multipath partition scanning deadlock (Hannes Reinecke)
- generate uevent once a multipath namespace is operational again
(Hannes Reinecke)
- support unique discovery controller NQNs (Hannes Reinecke)
- fix use-after-free when a port is removed (Israel Rukshin)
- clear shadow doorbell memory on resets (Keith Busch)
- use struct_size (Len Baker)
- add error handling support for add_disk (Luis Chamberlain)
- limit the maximal queue size for RDMA controllers (Max Gurtovoy)
- use a few more symbolic names (Max Gurtovoy)
- fix error code in nvme_rdma_setup_ctrl (Max Gurtovoy)
- add support for ->map_queues on FC (Saurav Kashyap)
- support the current discovery subsystem entry (Hannes Reinecke)
- use flex_array_size and struct_size (Len Baker)
- bcache fixes (Christoph, Coly, Chao, Lin, Qing)
- MD updates (Christoph, Guoqing, Xiao)
- Misc fixes (Dan, Ding, Jiapeng, Shin'ichiro, Ye)
* tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block: (117 commits)
null_blk: Fix handling of submit_queues and poll_queues attributes
block: ataflop: Fix warning comparing pointer to 0
bcache: replace snprintf in show functions with sysfs_emit
bcache: move uapi header bcache.h to bcache code directory
nvmet: use flex_array_size and struct_size
nvmet: register discovery subsystem as 'current'
nvmet: switch check for subsystem type
nvme: add new discovery log page entry definitions
block: ataflop: more blk-mq refactoring fixes
block: remove support for cryptoloop and the xor transfer
mtd: add add_disk() error handling
rnbd: add error handling support for add_disk()
um/drivers/ubd_kern: add error handling support for add_disk()
m68k/emu/nfblock: add error handling support for add_disk()
xen-blkfront: add error handling support for add_disk()
bcache: add error handling support for add_disk()
dm: add add_disk() error handling
block: aoe: fixup coccinelle warnings
nvmet: use struct_size over open coded arithmetic
nvme: drop scan_lock and always kick requeue list when removing namespaces
...
We never tries to use used length, so the patch prevents the virtio
core from validating used length.
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027022107.14357-4-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
virtblk_setup_cmd returns blk_status_t in an int, callers then assign it
back to a blk_status_t variable. blk_status_t is either u32 or (more
typically) u8 so it works, but is inelegant and causes sparse warnings.
Pass the status in blk_status_t in a consistent way.
Reported-by: kernel test robot <lkp@intel.com>
Fixes: b2c5221fd074 ("virtio-blk: avoid preallocating big SGL for data")
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
The default value is 0 meaning "no limit". However if 0
is specified on the command line it is instead silently
converted to 1. Further, the value is already validated
at point of use, there's no point in duplicating code
validating the value when it is set.
Simplify the code while making the behaviour more consistent
by using plain module_param.
Fixes: 1a662cf6cb9a ("virtio-blk: add num_request_queues module parameter")
Cc: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
coccicheck complains about the use of snprintf() in sysfs show
functions:
WARNING use scnprintf or sprintf
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Ye Guojin <ye.guojin@zte.com.cn>
Link: https://lore.kernel.org/r/20211021065111.1047824-1-ye.guojin@zte.com.cn
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
There is a spelling mistake in a dev_err error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20211025102240.22801-1-colin.i.king@gmail.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
If an untrusted device neogitates BLK_F_MQ but advertises a zero
num_queues, the driver may end up trying to allocating zero size
buffers where ZERO_SIZE_PTR is returned which may pass the checking
against the NULL. This will lead unexpected results.
Fixing this by failing the probe in this case.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211019070152.8236-2-jasowang@redhat.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Sometimes a user would like to control the amount of request queues to
be created for a block device. For example, for limiting the memory
footprint of virtio-blk devices.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210902204622.54354-1-mgurtovoy@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
No need to pre-allocate a big buffer for the IO SGL anymore. If a device
has lots of deep queues, preallocation for the sg list can consume
substantial amounts of memory. For HW virtio-blk device, nr_hw_queues
can be 64 or 128 and each queue's depth might be 128. This means the
resulting preallocation for the data SGLs is big.
Switch to runtime allocation for SGL for lists longer than 2 entries.
This is the approach used by NVMe drivers so it should be reasonable for
virtio block as well. Runtime SGL allocation has always been the case
for the legacy I/O path so this is nothing new.
The preallocated small SGL depends on SG_CHAIN so if the ARCH doesn't
support SG_CHAIN, use only runtime allocation for the SGL.
Re-organize the setup of the IO request to fit the new sg chain
mechanism.
No performance degradation was seen (fio libaio engine with 16 jobs and
128 iodepth):
IO size IOPs Rand Read (before/after) IOPs Rand Write (before/after)
-------- --------------------------------- ----------------------------------
512B 318K/316K 329K/325K
4KB 323K/321K 353K/349K
16KB 199K/208K 250K/275K
128KB 36K/36.1K 39.2K/41.7K
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Link: https://lore.kernel.org/r/20210901131434.31158-1-mgurtovoy@nvidia.com
Reviewed-by: Feng Li <lifeng1519@gmail.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Arnd Bergmann <arnd@arndb.de> # kconfig fixups
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Link: https://lore.kernel.org/r/20211015235219.2191207-2-mcgrof@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>