OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Jens Axboe	2c30540b38	NVMe: add ->exit_hctx() hook If we do teardown and setup of the queue and block related parts of the driver, then we should clear nvmeq->hctx once we kill the hardware queue. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-20 11:10:03 -07:00
Jens Axboe	e32efbfc35	NVMe: make setup work for devices that don't do INTx The setup/probe part currently relies on INTx being there and working, that's not always the case. For devices that don't advertise INTx, enable a single MSIx vector early on and disable it again before we ask for our full range of queue vecs. Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-19 12:46:17 -07:00
Jens Axboe	1397688953	NVMe: enable IO stats by default Before the blk-mq conversion they were on by default, we should not change behavior there. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-18 08:45:31 -07:00
Jens Axboe	6dcc0cf6cb	NVMe: nvme_submit_async_admin_req() must use atomic rq allocation We are called for async event notification issues, and the nvmeq lock is already held. If we fail the request allocation, we'll just retry next time. Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-18 08:21:18 -07:00
Jens Axboe	9d135bb8c2	NVMe: replace blk_put_request() with blk_mq_free_request() No point in using blk_put_request(), since we know we are blk-mq. This only makes sense in core code where we could be dealing with either legacy or blk-mq drivers. Additionally, use blk_mq_free_hctx_request() for the request completion fast path, where we already know the mapping from request to hardware queue. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-17 10:43:42 -07:00
kbuild test robot	a64e6bb4db	NVMe: __nvme_submit_admin_cmd() can be static drivers/block/nvme-core.c:865:5: sparse: symbol '__nvme_submit_admin_cmd' was not declared. Should it be static? Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-10 09:27:01 -07:00
Dan Carpenter	9f173b3384	NVMe: blk_mq_alloc_request() returns error pointers We recently converted this to blk_mq but the error checks have to be updated to check for IS_ERR() instead of NULL. Fixes: `a4aea5623d` ('NVMe: Convert to blk-mq') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-05 13:47:03 -07:00
Matias Bjørling	a4aea5623d	NVMe: Convert to blk-mq This converts the NVMe driver to a blk-mq request-based driver. The NVMe driver is currently bio-based and implements queue logic within itself. By using blk-mq, a lot of these responsibilities can be moved and simplified. The patch is divided into the following blocks: * Per-command data and cmdid have been moved into the struct request field. The cmdid_data can be retrieved using blk_mq_rq_to_pdu() and id maintenance are now handled by blk-mq through the rq->tag field. * The logic for splitting bio's has been moved into the blk-mq layer. The driver instead notifies the block layer about limited gap support in SG lists. * blk-mq handles timeouts and is reimplemented within nvme_timeout(). This both includes abort handling and command cancelation. * Assignment of nvme queues to CPUs are replaced with the blk-mq version. The current blk-mq strategy is to assign the number of mapped queues and CPUs to provide synergy, while the nvme driver assign as many nvme hw queues as possible. This can be implemented in blk-mq if needed. * NVMe queues are merged with the tags structure of blk-mq. * blk-mq takes care of setup/teardown of nvme queues and guards invalid accesses. Therefore, RCU-usage for nvme queues can be removed. * IO tracing and accounting are handled by blk-mq and therefore removed. * Queue suspension logic is replaced with the logic from the block layer. Contributions in this patch from: Sam Bradshaw <sbradshaw@micron.com> Jens Axboe <axboe@fb.com> Keith Busch <keith.busch@intel.com> Robert Nelson <rlnelson@google.com> Acked-by: Keith Busch <keith.busch@intel.com> Acked-by: Jens Axboe <axboe@fb.com> Updated for new ->queue_rq() prototype. Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:52 -07:00
Keith Busch	9dbbfab7d5	NVMe: Do not over allocate for discard requests Discard requests are often for very large ranges. The discard size is not representative of the data transfer size so we don't need to allocate for such a large prp list. This patch requests allocating only enough for the memory needed for the data transfer and saves a little over 8k of memory per max discard request. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:37 -07:00
Keith Busch	9e60352cf8	NVMe: Do not open disks that are being deleted It is possible the block layer will request to open a block device after the driver deleted it. Subsequent releases will cause a double free, or the disk's private_data is pointing to freed memory. This patch protects the driver's freed disks from being opened and accessed: the nvme namespaces are freed only when the device's refcount is 0, so at that moment there were no active openers and no more should be allowed, and it is safe to clear the disk's private_data that is about to be freed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Henry Chow <henry.chow@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:32 -07:00
Keith Busch	5940c8578f	NVMe: Clear QUEUE_FLAG_STACKABLE The nvme namespace request_queue's flags are initialized to QUEUE_FLAG_DEFAULT, which currently sets QUEUE_FLAG_STACKABLE. The device-mapper indicates this flag means the block driver is requset based, though this driver is bio-based and problems will occur if an nvme namespace is used with a request based dm device. This patch clears the stackable flag. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:18:10 -07:00
Keith Busch	387caa5a2b	NVMe: Fix device probe waiting on kthread If we ever do parallel device probing, we need to wake up all processes waiting for nvme kthread to start, not just one. This is currently serialized so the bug is not reachable today, but fixing this anyway in the hopes we implement parallel or asynchronous probe in the future. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:10 -07:00
Keith Busch	7963e52181	NVMe: Passthrough IOCTL for IO commands The NVME_IOCTL_SUBMIT_IO only works for IO commands with block data transfers and isn't usable for other NVMe commands like flush, data set management, or any sort of vendor unique command. The NVME_IOCTL_ADMIN_CMD, however, can easily be modified to accept arbitrary IO commands in addition to arbitrary admin commands without breaking backward compatibility. This patch just adds a new IOCTL to distinguish if the driver should submit the command on an IO or Admin queue. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	1b9dbf7fe0	NVMe: Add revalidate_disk callback This adds a callback to revalidate the disk and change its block size and capacity if needed. Before, a user would have to remove + rescan an entire device if they changed the logical block size using an NVMe Format or other vendor specific command; now they can just run something that issues the BLKRRPART IOCTL, like # hdparm -z /dev/nvmeXnY This can also be used in response to the 1.2 Spec's Namespace Attribute Change asynchronous event. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	7be50e93fb	NVMe: Fix nvmeq waitqueue entry initialization We need to update the nvme queue's wait_queue_t entry during each initialization since the nvme_thread may be ended and restarted when the device is reset. If a device reset occurs during a large amount of buffered IO, it would take a lot longer to complete the outstanding requests due to the 1 second polling instead of waking up as completions occur. Fixes: `b9afca3efb` Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	b4ff9c8ddb	NVMe: Translate NVMe status to errno This returns a more appropriate error for the "capacity exceeded" status. In case other NVMe statuses have a better errno, this patch adds a convience function to translate an NVMe status code to an errno for IO commands, defaulting to the current -EIO. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	e179729a82	NVMe: Remove duplicate compat SG_IO code We can return -ENOIOCTLCMD and the ioctl will be handled by fs/compat_ioctl.c instead. This removes a lot of duplicate code in the nvme driver. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Keith Busch	a96d4f5c2d	NVMe: Reference count pci device If an nvme device is removed but user space has an open reference, the nvme driver would have been holding an invalid reference to its pci device. You may get a general protection fault on x86 h/w when the driver uses that reference in dma_map_sg(), as is done in nvme_map_user_pages() from the IOCTL interface. This patch fixes the fault by taking a reference on the pci device and holding it even after device removal until all opens on the nvme device are closed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Nilesh Choudhury <nilesh.choudhury@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:09 -07:00
Andreea-Cristina Bernat	062261be4e	nvme: Replace rcu_assign_pointer() with RCU_INIT_POINTER() The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Sam Bradshaw	5905535610	NVMe: Correctly handle IOCTL_SUBMIT_IO when cpus > online queues nvme_submit_io_cmd() uses smp_processor_id() to pick an IO queue index. This patch fixes the case where there are more cpus from which the ioctl call can originate than online queues, which can happen when a device supports or was allocated fewer interrupt vectors than exist cpu cores. Thanks to Keith Busch for the implementation suggestion. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	302c6727e5	NVMe: Fix filesystem sync deadlock on removal This changes the order of deleting the gendisks so it happens after the nvme IO queues are freed. If a device is removed while a filesystem has associated dirty data, the removal will wait on these to complete before proceeding from del_gendisk, which could have caused deadlock before. The implication of this is that an orderly removal of a responsive device won't necessarily wait for dirty data to be written, but we are not guaranteed the device is even going to respond at this point either. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	f435c2825b	NVMe: Call nvme_free_queue directly Rather than relying on call_rcu, this patch directly frees the nvme_queue's memory after ensuring no readers exist. Some arch specific dma_free_coherent implementations may not be called from a call_rcu's soft interrupt context, hence the change. Signed-off-by: Keith Busch <keith.busch@intel.com> Reported-by: Matthew Minter <matthew_minter@xyratex.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Dan McLeran	2484f40780	NVMe: Add shutdown timeout as module parameter. The current implementation hard-codes the shutdown timeout to 2 seconds. Some devices take longer than this to complete a normal shutdown. Changing the shutdown timeout to a module parameter with a default timeout of 5 seconds. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	7c1b245038	NVMe: Skip orderly shutdown on failed devices Rather than skipping shutdown only for devices that have been removed, skip the orderly shutdown on failed devices to avoid the long timeout handling that inevitably happens when deleting queues on such a device. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	a67394790a	NVMe: Whitespace fixes Fixing tabs inadvertently converted to spaces. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:08 -07:00
Keith Busch	c81f49758a	NVMe: Use pci_stop_and_remove_bus_device_locked() Race conditions are theoretically possible between the NVMe PCI device removal and the generic PCI bus rescan and device removal that can be triggered via sysfs. To avoid those race conditions make the NVMe code use pci_stop_and_remove_bus_device_locked(). Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Keith Busch	badc34d415	NVMe: Handling devices incapable of I/O This is a minor refactor for handling devices that are incapable of IO. The driver previously used special error codes to know that IO queues are unavailable, but we have an online queue count now. This also fixes an issue where the driver successfully sets the queue count, but either is unable to allocate an IO queue or the device can't create one for some reason. If the driver can successfully enable the device and get responses to admin commands, the driver will bring up a character device for managment but not create block devices. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Dan McLeran	01079522f9	NVMe: Change nvme_enable_ctrl to set EN and manage CC thru ctrl_config. Change the behavior of nvme_enable_ctrl to set EN. Clear CC.SH for both nvme_enable_ctrl and nvme_disable_ctrl. Remove reading of the CC register and manage the state in dev->ctrl_config. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> [removed an unwanted write to CC] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Keith Busch	1d09062460	NVMe: Mismatched host/device page size support Adds support for devices with max page size smaller than the host's. In the case we encounter such a host/device combination, the driver will split a page into as many PRP entries as necessary for the device's page size capabilities. If the device's reported minimum page size is greater than the host's, the driver will not attempt to enable the device and return an error instead. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Keith Busch	6fccf9383b	NVMe: Async event request Submits NVMe asynchronous event requests, one event up to the controller maximum or number of possible different event types (8), whichever is smaller. Events successfully returned by the controller are logged. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Joe Perches	4d51abf9bc	block: Use dma_zalloc_coherent Use the zeroing function instead of dma_alloc_coherent & memset(,0,) Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-11-04 13:17:07 -07:00
Mike Snitzer	b277da0a8a	block: disable entropy contributions for nonrot devices Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set QUEUE_FLAG_NONROT. Historically, all block devices have automatically made entropy contributions. But as previously stated in commit `e2e1a148` ("block: add sysfs knob for turning off disk entropy contributions"): - On SSD disks, the completion times aren't as random as they are for rotational drives. So it's questionable whether they should contribute to the random pool in the first place. - Calling add_disk_randomness() has a lot of overhead. There are more reliable sources for randomness than non-rotational block devices. From a security perspective it is better to err on the side of caution than to allow entropy contributions from unreliable "random" sources. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2014-10-04 10:55:32 -06:00
Linus Torvalds	b55b390202	Merge git://git.infradead.org/users/willy/linux-nvme Pull NVMe update from Matthew Wilcox: "Mostly bugfixes again for the NVMe driver. I'd like to call out the exported tracepoint in the block layer; I believe Keith has cleared this with Jens. We've had a few reports from people who're really pounding on NVMe devices at scale, hence the timeout changes (and new module parameters), hotplug cpu deadlock, tracepoints, and minor performance tweaks" [ Jens hadn't seen that tracepoint thing, but is ok with it - it will end up going away when mq conversion happens ] * git://git.infradead.org/users/willy/linux-nvme: (22 commits) NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. NVMe: Use Log Page constants in SCSI emulation NVMe: Define Log Page constants NVMe: Fix hot cpu notification dead lock NVMe: Rename io_timeout to nvme_io_timeout NVMe: Use last bytes of f/w rev SCSI Inquiry NVMe: Adhere to request queue block accounting enable/disable NVMe: Fix nvme get/put queue semantics NVMe: Delete NVME_GET_FEAT_TEMP_THRESH NVMe: Make admin timeout a module parameter NVMe: Make iod bio timeout a parameter NVMe: Prevent possible NULL pointer dereference NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD) NVMe: Update data structures for NVMe 1.2 NVMe: Enable BUILD_BUG_ON checks NVMe: Update namespace and controller identify structures to the 1.1a spec NVMe: Flush with data support NVMe: Configure support for block flush NVMe: Add tracepoints NVMe: Protect against badly formatted CQEs ...	2014-06-15 15:58:03 -10:00
Keith Busch	f3db22feb5	NVMe: Fix hot cpu notification dead lock There is a potential dead lock if a cpu event occurs during nvme probe since it registered with hot cpu notification. This fixes the race by having the module register with notification outside of probe rather than have each device register. The actual work is done in a scheduled work queue instead of in the notifier since assigning IO queues has the potential to block if the driver creates additional queues. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-13 10:43:34 -04:00
Matthew Wilcox	bd67608a61	NVMe: Rename io_timeout to nvme_io_timeout It's positively immoral to have a global variable called 'io_timeout'. Keep the module parameter called io_timeout, though. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 23:04:30 -04:00
Sam Bradshaw	b4e75cbf13	NVMe: Adhere to request queue block accounting enable/disable Recently, a new sysfs control "iostats" was added to selectively enable or disable io statistics collection for request queues. This patch hooks that control. IO statistics collection is rather expensive on large, multi-node machines with drives pushing millions of iops. Having the ability to disable collection if not needed can improve throughput significantly. As a data point, on a quad E5-4640, I see more than 50% throughput improvement when io statistics accounting is disabled during heavily multi-threaded small block random read benchmarks where device performance is in the million iops+ range. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:58:54 -04:00
Keith Busch	a51afb5433	NVMe: Fix nvme get/put queue semantics The routines to get and lock nvme queues required the caller to "put" or "unlock" them even if getting one returned NULL. This patch fixes that. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:58:34 -04:00
Keith Busch	9d43cf646e	NVMe: Make admin timeout a module parameter Signed-off-by: Keith Busch <keith.busch@intel.com> [made admin_timeout static] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:57:56 -04:00
Keith Busch	61e4ce086d	NVMe: Make iod bio timeout a parameter This was originally set to 4 times the IO timeout, but that was when the IO timeout was 5 seconds instead of 30. 20 seconds for total time to failure seemed more reasonable than 2 minutes for most, but other users have requested to make this a module parameter instead. Signed-off-by: Keith Busch <keith.busch@intel.com> [renamed the module parameter to retry_time] [made retry_time static] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 22:56:43 -04:00
Santosh Y	6808c5fb7f	NVMe: Prevent possible NULL pointer dereference kmalloc() used by the nvme_alloc_iod() to allocate memory for 'iod' can fail. So check the return value. Signed-off-by: Santosh Y <santosh.sy@samsung.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-06-03 16:43:30 -04:00
Keith Busch	f0d54a5417	NVMe: Implement PCIe reset notification callback Quiesce and shutdown the device prior to reset, then restart the device and resume IO after. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2014-05-27 11:12:29 -06:00
Matthew Wilcox	21bd78bcf4	NVMe: Enable BUILD_BUG_ON checks Since _nvme_check_size() wasn't being called from anywhere, the compiler was optimising it away ... along with all the link-time build failures that would result if any of the structures were the wrong size. Call it from nvme_exit() for no particular reason. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-05-09 22:42:39 -04:00
Keith Busch	53562be74b	NVMe: Flush with data support It is possible a filesystem may send a flush flagged bio with write data. There is no such composite NVMe command, so the driver sends flush and write separately. The device is allowed to execute these commands in any order, so it was possible the driver ends the bio after the write completes, but while the flush is still active. We don't want to let a filesystem believe flush succeeded before it really has; this could cause data corruption on a power loss between these events. To fix, this patch splits the flush and write into chained bios. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-05-05 10:54:02 -04:00
Keith Busch	a7d2ce2832	NVMe: Configure support for block flush This configures an nvme request_queue as flush capable if the device has a volatile write cache present. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-05-05 10:53:53 -04:00
Keith Busch	3291fa57cb	NVMe: Add tracepoints Adding tracepoints for bio_complete and block_split into nvme to help with gathering IO info using blktrace and blkparse. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-05-05 10:41:26 -04:00
Keith Busch	94bbac4052	NVMe: Protect against badly formatted CQEs If a misbehaving device posts a CQE with a command id < depth but for one that was never allocated, the command info will have a callback function set to NULL and we don't want to try invoking that. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-05-05 10:41:25 -04:00
Matthew Wilcox	27e8166c31	NVMe: Improve error messages Help people diagnose what is going wrong at initialisation time by printing out which command has gone wrong and what the device returned. Also fix the error message printed while waiting for reset. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com>	2014-05-05 10:41:25 -04:00
Matthew Wilcox	8757ad65d3	NVMe: Update copyright headers Make the copyright dates accurate and remove the final paragraph that includes the address of the FSF. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-05-05 10:41:25 -04:00
Linus Torvalds	3e8072d48b	Merge git://git.infradead.org/users/willy/linux-nvme Pull NVMe driver updates from Matthew Wilcox: "Various updates to the NVMe driver. The most user-visible change is that drive hotplugging now works and CPU hotplug while an NVMe drive is installed should also work better" * git://git.infradead.org/users/willy/linux-nvme: NVMe: Retry failed commands with non-fatal errors NVMe: Add getgeo to block ops NVMe: Start-stop nvme_thread during device add-remove. NVMe: Make I/O timeout a module parameter NVMe: CPU hot plug notification NVMe: per-cpu io queues NVMe: Replace DEFINE_PCI_DEVICE_TABLE NVMe: Fix divide-by-zero in nvme_trans_io_get_num_cmds NVMe: IOCTL path RCU protect queue access NVMe: RCU protected access to io queues NVMe: Initialize device reference count earlier NVMe: Add CONFIG_PM_SLEEP to suspend/resume functions	2014-04-11 16:45:59 -07:00
Keith Busch	edd10d3328	NVMe: Retry failed commands with non-fatal errors For commands returned with failed status, queue these for resubmission and continue retrying them until success or for a limited amount of time. The final timeout was arbitrarily chosen so requests can't be retried indefinitely. Since these are requeued on the nvmeq that submitted the command, the callbacks have to take an nvmeq instead of an nvme_dev as a parameter so that we can use the locked queue to append the iod to retry later. The nvme_iod conviently can be used to track how long we've been trying to successfully complete an iod request. The nvme_iod also provides the nvme prp dma mappings, so I had to move a few things around so we can keep those mappings. Signed-off-by: Keith Busch <keith.busch@intel.com> [fixed checkpatch issue with long line] Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>	2014-04-10 17:11:59 -04:00

1 2 3

128 Commits