OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dan Carpenter	ec0862ac5a	nvme: delete an unnecessary declaration The nvme_put_ctrl() is implemented earlier as an inline function so this declaration isn't required. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-05-27 07:12:37 +02:00
Gustavo A. R. Silva	f1e71d75f0	nvme: replace zero-length array with flexible-array The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit `7649773293` ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-05-27 07:12:37 +02:00
Damien Le Moal	68ab60ca2d	nvme: fix io_opt limit setting Currently, a namespace io_opt queue limit is set by default to the physical sector size of the namespace and to the the write optimal size (NOWS) when the namespace reports optimal IO sizes. This causes problems with block limits stacking in blk_stack_limits() when a namespace block device is combined with an HDD which generally do not report any optimal transfer size (io_opt limit is 0). The code: /* Optimal I/O a multiple of the physical block size? */ if (t->io_opt & (t->physical_block_size - 1)) { t->io_opt = 0; t->misaligned = 1; ret = -1; } in blk_stack_limits() results in an error return for this function when the combined devices have different but compatible physical sector sizes (e.g. 512B sector SSD with 4KB sector disks). Fix this by not setting the optimal IO size queue limit if the namespace does not report an optimal write size value. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bart van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-05-27 07:12:37 +02:00
Wu Bo	84e4c204b6	nvme: disable streams when get stream params failed Disable streams again if getting the stream params fails. Signed-off-by: Wu Bo <wubo40@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-05-27 07:12:37 +02:00
Martin George	614fc1c0d9	nvme-fc: print proper nvme-fc devloss_tmo value The nvme-fc devloss_tmo is computed as the min of either the ctrl_loss_tmo (max_retries * reconnect_delay) or the remote port's devloss_tmo. But what gets printed as the nvme-fc devloss_tmo in nvme_fc_reconnect_or_delete() is always the remote port's devloss_tmo value. So correct this by printing the min value instead. Signed-off-by: Martin George <marting@netapp.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-05-27 07:12:37 +02:00
Weiping Zhang	9c9e76d579	nvme-pci: make sure write/poll_queues less or equal then cpu count Check module parameter write/poll_queues before using it to catch too large values. Reproducer: modprobe -r nvme modprobe nvme write_queues=`nproc` echo $((`nproc`+1)) > /sys/module/nvme/parameters/write_queues echo 1 > /sys/block/nvme0n1/device/reset_controller [ 657.069000] ------------[ cut here ]------------ [ 657.069022] WARNING: CPU: 10 PID: 1163 at kernel/irq/affinity.c:390 irq_create_affinity_masks+0x47c/0x4a0 [ 657.069056] dm_region_hash dm_log dm_mod [ 657.069059] CPU: 10 PID: 1163 Comm: kworker/u193:9 Kdump: loaded Tainted: G W 5.6.0+ #8 [ 657.069060] Hardware name: Inspur SA5212M5/YZMB-00882-104, BIOS 4.0.9 08/27/2019 [ 657.069064] Workqueue: nvme-reset-wq nvme_reset_work [nvme] [ 657.069066] RIP: 0010:irq_create_affinity_masks+0x47c/0x4a0 [ 657.069067] Code: fe ff ff 48 c7 c0 b0 89 14 95 48 89 46 20 e9 e9 fb ff ff 31 c0 e9 90 fc ff ff 0f 0b 48 c7 44 24 08 00 00 00 00 e9 e9 fc ff ff <0f> 0b e9 87 fe ff ff 48 8b 7c 24 28 e8 33 a0 80 00 e9 b6 fc ff ff [ 657.069068] RSP: 0018:ffffb505ce1ffc78 EFLAGS: 00010202 [ 657.069069] RAX: 0000000000000060 RBX: ffff9b97921fe5c0 RCX: 0000000000000000 [ 657.069069] RDX: ffff9b67bad80000 RSI: 00000000ffffffa0 RDI: 0000000000000000 [ 657.069070] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff9b97921fe718 [ 657.069070] R10: ffff9b97921fe710 R11: 0000000000000001 R12: 0000000000000064 [ 657.069070] R13: 0000000000000060 R14: 0000000000000000 R15: 0000000000000001 [ 657.069071] FS: 0000000000000000(0000) GS:ffff9b67c0880000(0000) knlGS:0000000000000000 [ 657.069072] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 657.069072] CR2: 0000559eac6fc238 CR3: 000000057860a002 CR4: 00000000007606e0 [ 657.069073] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 657.069073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 657.069073] PKRU: 55555554 [ 657.069074] Call Trace: [ 657.069080] __pci_enable_msix_range+0x233/0x5a0 [ 657.069085] ? kernfs_put+0xec/0x190 [ 657.069086] pci_alloc_irq_vectors_affinity+0xbb/0x130 [ 657.069089] nvme_reset_work+0x6e6/0xeab [nvme] [ 657.069093] ? __switch_to_asm+0x34/0x70 [ 657.069094] ? __switch_to_asm+0x40/0x70 [ 657.069095] ? nvme_irq_check+0x30/0x30 [nvme] [ 657.069098] process_one_work+0x1a7/0x370 [ 657.069101] worker_thread+0x1c9/0x380 [ 657.069102] ? max_active_store+0x80/0x80 [ 657.069103] kthread+0x112/0x130 [ 657.069104] ? __kthread_parkme+0x70/0x70 [ 657.069105] ret_from_fork+0x35/0x40 [ 657.069106] ---[ end trace f4f06b7d24513d06 ]--- [ 657.077110] nvme nvme0: 95/1/0 default/read/poll queues Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-05-27 07:12:37 +02:00
Sagi Grimberg	5bb052d7aa	nvme-tcp: set MSG_SENDPAGE_NOTLAST with MSG_MORE when we have more to send We can signal the stack that this is not the last page coming and the stack can build a larger tso segment, so go ahead and use it. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-05-27 07:12:36 +02:00
Keith Busch	92decf118f	nvme: define constants for identification values Improve code readability by defining the specification's constants that the driver is using when decoding identification payloads. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Bart van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Christoph Hellwig	7890b9701b	nvme-multipath: stop using ->queuedata nvme-multipath already uses the gendisk private data, not need to also set up the request_queue queuedata and use it in one place only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Sagi Grimberg	db5ad6b7f8	nvme-tcp: try to send request in queue_rq context Today, nvme-tcp automatically schedules a send request to a workqueue context, which is 1 more than we'd need in case the socket buffer is wide open. However, because we have async send activity (as a result of r2t, or write_space callbacks), we need to synchronize sends from possibly multiple contexts (ideally all running on the same cpu though). Thus, we only try to send directly from queue_rq in cases: 1. the send_list is empty 2. we can send it synchronously (i.e. not from the RX path) 3. we run on the same cpu as the queue->io_cpu to avoid contention on the send operation. Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Sagi Grimberg	72e5d757c6	nvme-tcp: avoid scheduling io_work if we are already polling When the user runs polled I/O, we shouldn't have to trigger the workqueue to generate the receive work upon the .data_ready upcall. This prevents a redundant context switch when the application is already polling for completions. Proposed-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Sagi Grimberg	386e5e6e1a	nvme-tcp: use bh_lock in data_ready data_ready may be invoked from send context or from softirq, so need bh locking for that. Fixes: `3f2304f8c6` ("nvme-tcp: add NVMe over TCP host driver") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Weiping Zhang	2a5bcfdd41	nvme-pci: align io queue count with allocted nvme_queue in nvme_probe Since commit `147b27e4bd` ("nvme-pci: allocate device queues storage space at probe"), nvme_alloc_queue does not alloc the nvme queues itself anymore. If the write/poll_queues module parameters are changed at runtime to values larger than the number of allocated queues in nvme_probe, nvme_alloc_queue will access unallocated memory. Add a new nr_allocated_queues member to struct nvme_dev to record how many queues were alloctated in nvme_probe to avoid using more than the allocated queues after a reset following a change to the write/poll_queues module parameters. Also add nr_write_queues and nr_poll_queues members to allow refreshing the number of write and poll queues based on a change to the module parameters when resetting the controller. Fixes: `147b27e4bd` ("nvme-pci: allocate device queues storage space at probe") Signed-off-by: Weiping Zhang <zhangweiping@didiglobal.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> [hch: add nvme_max_io_queues, update the commit message] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Keith Busch	54b2fcee1d	nvme-pci: remove last_sq_tail The nvme driver does not have enough tags to wrap the queue, and blk-mq will no longer call commit_rqs() when there are no new submissions to notify. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Keith Busch	74943d45ee	nvme-pci: remove volatile cqes The completion queue entry is not volatile once the phase is confirmed. Remove the volatile keywords and check the phase using the appropriate READ_ONCE() accessor, allowing the compiler to optimize the remaining completion path. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Keith Busch	b04df85d9a	nvme: flush scan work on passthrough commands If a passthrough command causes the namespace inventory or capabilities to change, flush the scan work that handles these changes so the driver synchronizes with the user command's effects before returning the result to user space. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Christoph Hellwig	6623c5b3df	nvme: clean up error handling in nvme_init_ns_head Use a common label for putting the nshead if needed and only convert nvme status codes for the one case where it actually is needed. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Arnd Bergmann	3add1d93d9	nvme-fc: avoid gcc-10 zero-length-bounds warning When CONFIG_ARCH_NO_SG_CHAIN is set, op->sgl[0] cannot be dereferenced, as gcc-10 now points out: drivers/nvme/host/fc.c: In function 'nvme_fc_init_request': drivers/nvme/host/fc.c:1774:29: warning: array subscript 0 is outside the bounds of an interior zero-length array 'struct scatterlist[0]' [-Wzero-length-bounds] 1774 \| op->op.fcp_req.first_sgl = &op->sgl[0]; \| ^~~~~~~~~~~ drivers/nvme/host/fc.c:98:21: note: while referencing 'sgl' 98 \| struct scatterlist sgl[NVME_INLINE_SG_CNT]; \| ^~~ I don't know if this is a legitimate warning or a false-positive. If this is just a false alarm, the warning is easily suppressed by interpreting the array as a pointer. Fixes: `b1ae1a2389` ("nvme-fc: Avoid preallocating big SGL for data") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:36 -06:00
Keith Busch	31fdad7be1	nvme: consolodate io settings The stream parameters indicating optimal io settings were just getting overwritten later. Rearrange the settings so the streams parameters can be preserved if provided. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	bc1af009a8	nvme: revalidate namespace stream parameters The stream parameters are based on the currently formatted logical block size. Recheck these parameters on namespace revalidation so the registered constraints will be accurate. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	38adf94e16	nvme: consolidate chunk_sectors settings Move the quirked chunk_sectors setting to the same location as noiob so one place registers this setting. And since the noiob value is only used locally, remove the member from struct nvme_ns. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	b2b2de7c5a	nvme: revalidate after verifying identifiers If the namespace identifiers have changed, skip updating the disk information, as that will register parameters from a mismatched namespace. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	b2ce4d9069	nvme-multipath: set bdi capabilities once The queues' backing device info capabilities don't change with each namespace revalidation. Set it only when each path's request_queue is initially added to a multipath queue. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	0c284db7f2	nvme: check namespace head shared property Reject a new shared namespace if a duplicate unshared namespace exists. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	9ad1927a3b	nvme: always search for namespace head Even if a namespace reports it is not capable of sharing, search the subsystem for a matching namespace head. If found, the driver should reject that namespace since it's coming from an invalid configuration. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	ac262508da	nvme: release namespace head reference on error If a namespace identification does not match the subsystem's head for that NSID, release the reference that was taken when the matching head was initially found. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	d567572906	nvme: unlink head after removing last namespace The driver had been unlinking the namespace head from the subsystem's list only after the last reference was released, and outside of the list's subsys->lock protection. There is no reason to track an empty head, so unlink the entry from the subsystem's list when the last namespace using that head is removed and with the mutex lock protecting the list update. The next namespace to attach reusing the previous NSID will allocate a new head rather than find the old head with mismatched identifiers. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Christoph Hellwig	aec459b484	nvme: remove the magic 1024 constant in nvme_scan_ns_list Replace it with a value derived from the identify data and nsid sizes. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Christoph Hellwig	4005f28d25	nvme: avoid an Identify Controller command for each namespace scan The namespace lists are 0-terminated, so we don't really need the NN value execept for the legacy sequential scan. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Christoph Hellwig	4450ba3bbb	nvme: factor out a nvme_ns_remove_by_nsid helper Factor out a piece of deeply indented and logicaly separate code from nvme_scan_ns_list into a new helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Christoph Hellwig	25dcaa9292	nvme: clean up nvme_scan_work Move the check for the supported CNS value into nvme_scan_ns_list, and limit the life time of the identify controller allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Christoph Hellwig	b9a5c3d4c3	nvme: refine the Qemu Identify CNS quirk Add a helper to check if we can use Identify CNS values > 1, and refine the Qemu quirk to not apply to reported versions larger than 1.1, as the Qemu implementation had been fixed by then. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	03f8cebc12	nvme: remove unused parameter nvme_alloc_ns_head() doesn't use the 'struct nvme_id_ns' parameter. Remove it, and update caller accordingly. Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:35 -06:00
Keith Busch	71fb90eb71	nvme: provide num dword helper Various nvme commands use a zeroes based number of dwords field. Create a helper function to convert byte lengths to this format. Signed-off-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:34 -06:00
James Smart	14fd1e98af	nvme-fc: Add Disconnect Association Rcv support The nvme-fc host transport did not support the reception of a FC-NVME LS. Reception is necessary to implement full compliance with FC-NVME-2. Populate the LS receive handler, and specifically the handling of a Disconnect Association LS. The response to the LS, if it matched a controller, must be sent after the aborts for any I/O on any connection have been sent. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:33 -06:00
James Smart	fd5a5f2213	nvme-fc: Update header and host for common definitions for LS handling Given that both host and target now generate and receive LS's create a single table definition for LS names. Each tranport half will have a local version of the table. As Create Association LS is issued by both sides, and received by both sides, create common routines to format the LS and to validate the LS. Convert the host side transport to use the new common Create Association LS formatting routine. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:33 -06:00
James Smart	eb4ee8f125	nvme-fc: convert assoc_active flag to bit op Convert the assoc_active boolean flag to a bitop on the flags field. The bit ops will provide atomicity. To make this change, the flags field was converted to a long type, which also affects the FCCTRL_TERMIO flag. Both FCCTRL_TERMIO and now ASSOC_ACTIVE flags are set/cleared by bit operations. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:33 -06:00
James Smart	f56bf76f79	nvme-fc: Ensure private pointers are NULL if no data Ensure that when allocations are done, and the lldd options indicate no private data is needed, that private pointers will be set to NULL (catches driver error that forgot to set private data size). Slightly reorg the allocations so that private data follows allocations for LS request/response buffers. Ensures better alignments for the buffers as well as the private pointer. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:33 -06:00
James Smart	3b8281b02b	nvmet-fc: Better size LS buffers Current code uses NVME_FC_MAX_LS_BUFFER_SIZE (2KB) when allocating buffers for LS requests and responses. This is considerable overkill for what is actually defined. Rework code to have unions for all possible requests and responses and size based on the unions. Remove NVME_FC_MAX_LS_BUFFER_SIZE. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:33 -06:00
James Smart	ca19bcd086	nvme-fc nvmet-fc: refactor for common LS definitions Routines in the target will want to be used in the host as well. Error definitions should now shared as both sides will process requests and responses to requests. Moved common declarations to new fc.h header kept in the host subdirectory. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:33 -06:00
James Smart	72e6329f86	nvme-fc and nvmet-fc: revise LLDD api for LS reception and LS request The current LLDD api has: nvme-fc: contains api for transport to do LS requests (and aborts of them). However, there is no interface for reception of LS's and sending responses for them. nvmet-fc: contains api for transport to do reception of LS's and sending of responses for them. However, there is no interface for doing LS requests. Revise the api's so that both nvme-fc and nvmet-fc can send LS's, as well as receiving LS's and sending their responses. Change name of the rcv_ls_req struct to better reflect generic use as a context to used to send an ls rsp. Specifically: nvmefc_tgt_ls_req -> nvmefc_ls_rsp nvmefc_tgt_ls_req.nvmet_fc_private -> nvmefc_ls_rsp.nvme_fc_private Change nvmet_fc_rcv_ls_req() calling sequence to provide handle that can be used by transport in later LS request sequences for an association. nvme-fc nvmet_fc nvme_fcloop: Revise to adapt to changed names in api header. Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle. Add stubs for new interfaces: host/fc.c: nvme_fc_rcv_ls_req() target/fc.c: nvmet_fc_invalidate_host() lpfc: Revise to adapt code to changed names in api header. Change calling sequence to nvmet_fc_rcv_ls_req() for hosthandle. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:33 -06:00
Sagi Grimberg	59c7c3caaa	nvme: fix possible hang when ns scanning fails during error recovery When the controller is reconnecting, the host fails I/O and admin commands as the host cannot reach the controller. ns scanning may revalidate namespaces during that period and it is wrong to remove namespaces due to these failures as we may hang (see `205da24343`). One command that may fail is nvme_identify_ns_descs. Since we return success due to having ns identify descriptor list optional, we continue to compare ns identifiers in nvme_revalidate_disk, obviously fail and return -ENODEV to nvme_validate_ns, which will remove the namespace. Exactly what we don't want to happen. Fixes: `22802bf742` ("nvme: Namepace identification descriptor list is optional") Tested-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:07:58 -06:00
Alexey Dobriyan	a8de663916	nvme-pci: fix "slimmer CQ head update" Pre-incrementing ->cq_head can't be done in memory because OOB value can be observed by another context. This devalues space savings compared to original code :-\ $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-32 (-32) Function old new delta nvme_poll_irqdisable 464 456 -8 nvme_poll 455 447 -8 nvme_irq 388 380 -8 nvme_dev_disable 955 947 -8 But the code is minimal now: one read for head, one read for q_depth, one increment, one comparison, single instruction phase bit update and one write for new head. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reported-by: John Garry <john.garry@huawei.com> Tested-by: John Garry <john.garry@huawei.com> Fixes: `e2a366a4b0` ("nvme-pci: slimmer CQ head update") Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:07:58 -06:00
Niklas Cassel	132be62387	nvme: prevent double free in nvme_alloc_ns() error handling When jumping to the out_put_disk label, we will call put_disk(), which will trigger a call to disk_release(), which calls blk_put_queue(). Later in the cleanup code, we do blk_cleanup_queue(), which will also call blk_put_queue(). Putting the queue twice is incorrect, and will generate a KASAN splat. Set the disk->queue pointer to NULL, before calling put_disk(), so that the first call to blk_put_queue() will not free the queue. The second call to blk_put_queue() uses another pointer to the same queue, so this call will still free the queue. Fixes: `85136c0102` ("lightnvm: simplify geometry enumeration") Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-04-27 17:08:06 +02:00
Linus Torvalds	8df2a0a6da	block-5.7-2020-04-10 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl6QhDIQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpsE/EADOQ0xDMOa8EmzRvjuCkiaB9yK2zXiBSAj5 ZBi7ReownfXhCR7nVc8Bv1s2f00PD6CFNURXZmdgyDDrXEd2ojueDoAZNBk59t0e i2CAF2wLAQ5EfuVaxSHVEOrVEmtu+ue+Ix83JNlnGPd7pf9s7uKc/W4iKGpgpxIo 1CpXmWwm5RwjX4z/Qsiaka2lB7QojjImp1n3C+XI5+pp/bJXiftep1lxH5Y3nSWU iR4jO81uxDMxhTEZ9z2cb1HarhctKvnihcb39gQYQ/kYYu7hSZnBPZo5zp5Dyb/t 4tGuDsfXCQCbF0smkusUrcyeT19vh9tOsGkiMzJ/ihm7TMyN4fT23h6DUb/7pAON jnlcB7r5Ezs8jLz9i+mAoq06djd5u54kiuKFog8170sTrtYsncZbyc01wLNAla/V /6KX1sMbPlbXZ+a3l3i7i/gcCBJ7ci6pV3x2elvM9dKHxyqJmwEGMlFVwt4s26ev wS+7+dktLAC73889Zyn8LutA4bWy5FmisSPA4PydSUSOZA+7JjlbILcz15jjwlP2 HzYk+TXsd3yJUQRYX5P0FcDaBUTISr/xeUUB+KT1rLv4Lhtso+S/9cvSc8x5mOa9 989gmqNfFAWoj1nKEIKeRwLjk0b6YA9qMv4jOwwiuobsT55aBxpbP80huNoRVj5L xFIWgBSwzg== =3woC -----END PGP SIGNATURE----- Merge tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "Here's a set of fixes that should go into this merge window. This contains: - NVMe pull request from Christoph with various fixes - Better discard support for loop (Evan) - Only call ->commit_rqs() if we have queued IO (Keith) - blkcg offlining fixes (Tejun) - fix (and fix the fix) for busy partitions" * tag 'block-5.7-2020-04-10' of git://git.kernel.dk/linux-block: block: fix busy device checking in blk_drop_partitions again block: fix busy device checking in blk_drop_partitions nvmet-rdma: fix double free of rdma queue blk-mq: don't commit_rqs() if none were queued nvme-fc: Revert "add module to ops template to allow module references" nvme: fix deadlock caused by ANA update wrong locking nvmet-rdma: fix bonding failover possible NULL deref loop: Better discard support for block devices loop: Report EOPNOTSUPP properly nvmet: fix NULL dereference when removing a referral nvme: inherit stable pages constraint in the mpath stack device blkcg: don't offline parent blkcg first blkcg: rename blkcg->cgwb_refcnt to ->online_pin and always use it nvme-tcp: fix possible crash in recv error flow nvme-tcp: don't poll a non-live queue nvme-tcp: fix possible crash in write_zeroes processing nvmet-fc: fix typo in comment nvme-rdma: Replace comma with a semicolon nvme-fcloop: fix deallocation of working context nvme: fix compat address handling in several ioctls	2020-04-10 10:06:54 -07:00
James Smart	8c5c660529	nvme-fc: Revert "add module to ops template to allow module references" The original patch was to resolve the lldd being able to be unloaded while being used to talk to the boot device of the system. However, the end result of the original patch is that any driver unload while a nvme controller is live via the lldd is now being prohibited. Given the module reference, the module teardown routine can't be called, thus there's no way, other than manual actions to terminate the controllers. Fixes: `863fbae929` ("nvme_fc: add module to ops template to allow module references") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-04-04 09:09:39 +02:00
Sagi Grimberg	657f1975e9	nvme: fix deadlock caused by ANA update wrong locking The deadlock combines 4 flows in parallel: - ns scanning (triggered from reconnect) - request timeout - ANA update (triggered from reconnect) - I/O coming into the mpath device (1) ns scanning triggers disk revalidation -> update disk info -> freeze queue -> but blocked, due to (2) (2) timeout handler reference the g_usage_counter - > but blocks in the transport .timeout() handler, due to (3) (3) the transport timeout handler (indirectly) calls nvme_stop_queue() -> which takes the (down_read) namespaces_rwsem - > but blocks, due to (4) (4) ANA update takes the (down_write) namespaces_rwsem -> calls nvme_mpath_set_live() -> which synchronize the ns_head srcu (see commit `504db087aa`) -> but blocks, due to (5) (5) I/O came into nvme_mpath_make_request -> took srcu_read_lock -> direct_make_request > blk_queue_enter -> but blocked, due to (1) ==> the request queue is under freeze -> deadlock. The fix is making ANA update take a read lock as the namespaces list is not manipulated, it is just the ns and ns->head that are being updated (which is protected with the ns->head lock). Fixes: `0d0b660f21` ("nvme: add ANA support") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <kbusch@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-04-04 09:07:03 +02:00
Linus Torvalds	79f51b7b9c	SCSI misc on 20200402 update changing all our txt files to rst ones. Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc, zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and some other minor updates. The major core update is Hannes moving functions out of the aacraid driver and into the core. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXoYKiyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSasAP4iGwSB Y8tFaZgWadu76+wj5MdqTBoXdhnIuFF0rZG3pQEAiIKdsfQlbSFdm75+gUtx5hG/ GOilX/pJczTRJDCGNis= =g7Sk -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "This series has a huge amount of churn because it pulls in Mauro's doc update changing all our txt files to rst ones. Excluding that, we have the usual driver updates (qla2xxx, ufs, lpfc, zfcp, ibmvfc, pm80xx, aacraid), a treewide update for scnprintf and some other minor updates. The major core change is Hannes moving functions out of the aacraid driver and into the core" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (223 commits) scsi: aic7xxx: aic97xx: Remove FreeBSD-specific code scsi: ufs: Do not rely on prefetched data scsi: dc395x: remove dc395x_bios_param scsi: libiscsi: Fix error count for active session scsi: hpsa: correct race condition in offload enabled scsi: message: fusion: Replace zero-length array with flexible-array member scsi: qedi: Add PCI shutdown handler support scsi: qedi: Add MFW error recovery process scsi: ufs: Enable block layer runtime PM for well-known logical units scsi: ufs-qcom: Override devfreq parameters scsi: ufshcd: Let vendor override devfreq parameters scsi: ufshcd: Update the set frequency to devfreq scsi: ufs: Resume ufs host before accessing ufs device scsi: ufs-mediatek: customize the delay for enabling host scsi: ufs: make HCE polling more compact to improve initialization latency scsi: ufs: allow custom delay prior to host enabling scsi: ufs-mediatek: use common delay function scsi: ufs: introduce common and flexible delay function scsi: ufs: use an enum for host capabilities scsi: ufs: fix uninitialized tx_lanes in ufshcd_disable_tx_lcc() ...	2020-04-02 17:03:53 -07:00
Sagi Grimberg	74e4d20e2f	nvme: inherit stable pages constraint in the mpath stack device If the backing device require stable pages, we need to set it on the stack mpath device as well. This applies to rdma/fc transports when doing data integrity and tcp transport calculating digests. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-04-02 10:51:46 +02:00
Sagi Grimberg	39d06079a5	nvme-tcp: fix possible crash in recv error flow If the target misbehaves and sends us unexpected payload we need to make sure to fail the controller and stop processing the input stream. We clear the rd_enabled flag and stop the io_work, but we may still requeue it if we still have pending sends and then in the next invocation we will process the input stream as the check is only in the .data_ready upcall. To fix this we need to make sure not to self-requeue io_work upon a recv flow error. This fixes the crash: nvme nvme2: receive failed: -22 BUG: unable to handle page fault for address: ffffbeb5816c3b48 nvme_ns_head_make_request: 29 callbacks suppressed block nvme0n5: no usable path - requeuing I/O block nvme0n5: no usable path - requeuing I/O block nvme0n7: no usable path - requeuing I/O block nvme0n7: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n7: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O block nvme0n3: no usable path - requeuing I/O #PF: supervisor read access inkernel mode #PF: error_code(0x0000) - not-present page PGD 1039157067 P4D 1039157067 PUD 103915a067 PMD 102719f067 PTE 0 Oops: 0000 [#1] SMP PTI CPU: 8 PID: 411 Comm: kworker/8:1H Not tainted 5.3.0-40-generic #32~18.04.1-Ubuntu Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0 12/17/2015 Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] RIP: 0010:nvme_tcp_recv_skb+0x2ae/0xb50 [nvme_tcp] RSP: 0018:ffffbeb5806cfd10 EFLAGS: 00010246 RAX: ffffbeb5816c3b48 RBX: 00000000000003d0 RCX: 0000000000000008 RDX: 00000000000003d0 RSI: 0000000000000001 RDI: ffff9a3040684b40 RBP: ffffbeb5806cfd90 R08: 0000000000000000 R09: ffffffff946e6900 R10: ffffbeb5806cfce0 R11: 0000000000000001 R12: 0000000000000000 R13: ffff9a2ff86501c0 R14: 00000000000003d0 R15: ffff9a30b85f2798 FS: 0000000000000000(0000) GS:ffff9a30bf800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffbeb5816c3b48 CR3: 000000088400a006 CR4: 00000000003626e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcp_read_sock+0x8c/0x290 ? __release_sock+0x9d/0xe0 ? nvme_tcp_write_space+0xb0/0xb0 [nvme_tcp] nvme_tcp_io_work+0x4b4/0x830 [nvme_tcp] ? finish_task_switch+0x163/0x270 process_one_work+0x1fd/0x3f0 worker_thread+0x34/0x410 kthread+0x121/0x140 ? process_one_work+0x3f0/0x3f0 ? kthread_park+0xb0/0xb0 ret_from_fork+0x35/0x40 Reported-by: Roy Shterman <roys@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2020-04-01 11:07:13 +02:00

1 2 3 4 5 ...

1479 Commits