OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Christoph Hellwig	9bdcfb10f2	nvme-pci: consistencly use ctrl->device for logging This is what most of the code already does and gives much more useful prefixes than the device embedded in the pci_dev. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-05-26 09:54:07 +03:00
Linus Torvalds	894e21642d	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A small collection of fixes that should go into this cycle. - a pull request from Christoph for NVMe, which ended up being manually applied to avoid pulling in newer bits in master. Mostly fibre channel fixes from James, but also a few fixes from Jon and Vijay - a pull request from Konrad, with just a single fix for xen-blkback from Gustavo. - a fuseblk bdi fix from Jan, fixing a regression in this series with the dynamic backing devices. - a blktrace fix from Shaohua, replacing sscanf() with kstrtoull(). - a request leak fix for drbd from Lars, fixing a regression in the last series with the kref changes. This will go to stable as well" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: release the sq ref on rdma read errors nvmet-fc: remove target cpu scheduling flag nvme-fc: stop queues on error detection nvme-fc: require target or discovery role for fc-nvme targets nvme-fc: correct port role bits nvme: unmap CMB and remove sysfs file in reset path blktrace: fix integer parse fuseblk: Fix warning in super_setup_bdi_name() block: xen-blkback: add null check to avoid null pointer dereference drbd: fix request leak introduced by locking/atomic, kref: Kill kref_sub()	2017-05-20 16:12:30 -07:00
Jon Derrick	f63572dff1	nvme: unmap CMB and remove sysfs file in reset path CMB doesn't get unmapped until removal while getting remapped on every reset. Add the unmapping and sysfs file removal to the reset path in nvme_pci_disable to match the mapping path in nvme_pci_enable. Fixes: `202021c1a` ("nvme : Add sysfs entry for NVMe CMBs when appropriate") Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Acked-by: Keith Busch <keith.busch@intel.com> Reviewed-By: Stephen Bates <sbates@raithlin.com> Cc: <stable@vger.kernel.org> # 4.9+ Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-20 10:11:34 -06:00
Linus Torvalds	857f864014	pci-v4.12-changes -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5 gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG 0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+ IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3 IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq ags00JtMLitfNPBH3eSl =eE4D -----END PGP SIGNATURE----- Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add framework for supporting PCIe devices in Endpoint mode (Kishon Vijay Abraham I) - use non-postable PCI config space mappings when possible (Lorenzo Pieralisi) - clean up and unify mmap of PCI BARs (David Woodhouse) - export and unify Function Level Reset support (Christoph Hellwig) - avoid FLR for Intel 82579 NICs (Sasha Neftin) - add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig) - short-circuit config access failures for disconnected devices (Keith Busch) - remove D3 sleep delay when possible (Adrian Hunter) - freeze PME scan before suspending devices (Lukas Wunner) - stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava) - disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann) - add arch-specific alignment control to improve device passthrough by avoiding multiple BARs in a page (Yongji Xie) - add sysfs sriov_drivers_autoprobe to control VF driver binding (Bodong Wang) - allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas) - fix crashes when unbinding host controllers that don't support removal (Brian Norris) - add driver for MicroSemi Switchtec management interface (Logan Gunthorpe) - add driver for Faraday Technology FTPCI100 host bridge (Linus Walleij) - add i.MX7D support (Andrey Smirnov) - use generic MSI support for Aardvark (Thomas Petazzoni) - make Rockchip driver modular (Brian Norris) - advertise 128-byte Read Completion Boundary support for Rockchip (Shawn Lin) - advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin) - convert atomic_t to refcount_t in HV driver (Elena Reshetova) - add CPU IRQ affinity in HV driver (K. Y. Srinivasan) - fix PCI bus removal in HV driver (Long Li) - add support for ThunderX2 DMA alias topology (Jayachandran C) - add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki) - add ITE 8893 bridge DMA alias quirk (Jarod Wilson) - restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices (Manish Jaggi) * tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits) PCI: Don't allow unbinding host controllers that aren't prepared ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP MAINTAINERS: Add PCI Endpoint maintainer Documentation: PCI: Add userguide for PCI endpoint test function tools: PCI: Add sample test script to invoke pcitest tools: PCI: Add a userspace tool to test PCI endpoint Documentation: misc-devices: Add Documentation for pci-endpoint-test driver misc: Add host side PCI driver for PCI test function device PCI: Add device IDs for DRA74x and DRA72x dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access PCI: dwc: dra7xx: Workaround for errata id i870 dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode PCI: dwc: dra7xx: Add EP mode support PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently dt-bindings: PCI: Add DT bindings for PCI designware EP mode PCI: dwc: designware: Add EP mode support Documentation: PCI: Add binding documentation for pci-test endpoint function ixgbe: Use pcie_flr() instead of duplicating it IB/hfi1: Use pcie_flr() instead of duplicating it PCI: imx6: Fix spelling mistake: "contol" -> "control" ...	2017-05-08 19:03:25 -07:00
Christoph Hellwig	d6296d39e9	blk-mq: update ->init_request and ->exit_request prototypes Remove the request_idx parameter, which can't be used safely now that we support I/O schedulers with blk-mq. Except for a superflous check in mtip32xx it was unused anyway. Also pass the tag_set instead of just the driver data - this allows drivers to avoid some code duplication in a follow on cleanup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-05-02 07:52:08 -06:00
Linus Torvalds	694752922b	Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ was initially a fork of CFQ, but subsequently changed to implement fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant to be used on desktop type single drives, providing good fairness. From Paolo. - Add Kyber IO scheduler. This is a full multiqueue aware scheduler, using a scalable token based algorithm that throttles IO based on live completion IO stats, similary to blk-wbt. From Omar. - A series from Jan, moving users to separately allocated backing devices. This continues the work of separating backing device life times, solving various problems with hot removal. - A series of updates for lightnvm, mostly from Javier. Includes a 'pblk' target that exposes an open channel SSD as a physical block device. - A series of fixes and improvements for nbd from Josef. - A series from Omar, removing queue sharing between devices on mostly legacy drivers. This helps us clean up other bits, if we know that a queue only has a single device backing. This has been overdue for more than a decade. - Fixes for the blk-stats, and improvements to unify the stats and user windows. This both improves blk-wbt, and enables other users to register a need to receive IO stats for a device. From Omar. - blk-throttle improvements from Shaohua. This provides a scalable framework for implementing scalable priotization - particularly for blk-mq, but applicable to any type of block device. The interface is marked experimental for now. - Bucketized IO stats for IO polling from Stephen Bates. This improves efficiency of polled workloads in the presence of mixed block size IO. - A few fixes for opal, from Scott. - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics. From a variety of folks, mostly Sagi and James Smart. - A series from Bart, improving our exposed info and capabilities from the blk-mq debugfs support. - A series from Christoph, cleaning up how handle WRITE_ZEROES. - A series from Christoph, cleaning up the block layer handling of how we track errors in a request. On top of being a nice cleanup, it also shrinks the size of struct request a bit. - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was never used by platforms, and the latter has outlived it's usefulness. - Various little bug fixes and cleanups from a wide variety of folks. * 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits) block: hide badblocks attribute by default blk-mq: unify hctx delay_work and run_work block: add kblock_mod_delayed_work_on() blk-mq: unify hctx delayed_run_work and run_work nbd: fix use after free on module unload MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler blk-mq-sched: alloate reserved tags out of normal pool mtip32xx: use runtime tag to initialize command header scsi: Implement blk_mq_ops.show_rq() blk-mq: Add blk_mq_ops.show_rq() blk-mq: Show operation, cmd_flags and rq_flags names blk-mq: Make blk_flags_show() callers append a newline character blk-mq: Move the "state" debugfs attribute one level down blk-mq: Unregister debugfs attributes earlier blk-mq: Only unregister hctxs for which registration succeeded blk-mq-debugfs: Rename functions for registering and unregistering the mq directory blk-mq: Let blk_mq_debugfs_register() look up the queue name blk-mq: Register <dev>/queue/mq after having registered <dev>/queue ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset ide-pm: always pass 0 error to __blk_end_request_all ..	2017-05-01 10:39:57 -07:00
Keith Busch	7776db1ccc	nvme/pci: Poll CQ on timeout If an IO timeout occurs, it's helpful to know if the controller did not post a completion or the driver missed an interrupt. While we never expect the latter, this patch will make it possible to tell the difference so we don't have to guess. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>	2017-04-21 16:41:55 +02:00
Helen Koike	f9f38e3338	nvme: improve performance for virtual NVMe devices This change provides a mechanism to reduce the number of MMIO doorbell writes for the NVMe driver. When running in a virtualized environment like QEMU, the cost of an MMIO is quite hefy here. The main idea for the patch is provide the device two memory location locations: 1) to store the doorbell values so they can be lookup without the doorbell MMIO write 2) to store an event index. I believe the doorbell value is obvious, the event index not so much. Similar to the virtio specification, the virtual device can tell the driver (guest OS) not to write MMIO unless you are writing past this value. FYI: doorbell values are written by the nvme driver (guest OS) and the event index is written by the virtual device (host OS). The patch implements a new admin command that will communicate where these two memory locations reside. If the command fails, the nvme driver will work as before without any optimizations. Contributions: Eric Northup <digitaleric@google.com> Frank Swiderski <fes@google.com> Ted Tso <tytso@mit.edu> Keith Busch <keith.busch@intel.com> Just to give an idea on the performance boost with the vendor extension: Running fio [1], a stock NVMe driver I get about 200K read IOPs with my vendor patch I get about 1000K read IOPs. This was running with a null device i.e. the backing device simply returned success on every read IO request. [1] Running on a 4 core machine: fio --time_based --name=benchmark --runtime=30 --filename=/dev/nvme0n1 --nrfiles=1 --ioengine=libaio --iodepth=32 --direct=1 --invalidate=1 --verify=0 --verify_fatal=0 --numjobs=4 --rw=randread --blocksize=4k --randrepeat=false Signed-off-by: Rob Nelson <rlnelson@google.com> [mlin: port for upstream] Signed-off-by: Ming Lin <mlin@kernel.org> [koike: updated for upstream] Signed-off-by: Helen Koike <helen.koike@collabora.co.uk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com>	2017-04-21 16:41:47 +02:00
Keith Busch	81c1cd9835	nvme/pci: Don't set reserved SQ create flags The QPRIO field is only valid if weighted round robin arbitration is used, and this driver doesn't enable that controller configuration option. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2017-04-21 16:41:46 +02:00
Andy Lutomirski	ff5350a86b	nvme: Adjust the Samsung APST quirk I got a couple more reports: the Samsung APST issues appears to affect multiple 950-series devices in Dell XPS 15 9550 and Precision 5510 laptops. Change the quirk: rather than blacklisting the firmware on the first problematic SSD that was reported, disable APST on all 144d:a802 devices if they're installed in the two affected Dell models. While we're at it, disable only the deepest sleep state instead of all of them -- the reporters say that this is sufficient to fix the problem. (I have a device that appears to be entirely identical to one of the affected devices, but I have a different Dell laptop, so it's not the case that all Samsung devices with firmware BXW75D0Q are broken under all circumstances.) Samsung engineers have an affected system, and hopefully they'll give us a better workaround some time soon. In the mean time, this should minimize regressions. See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-20 14:42:09 -06:00
Christoph Hellwig	27fa9bc545	nvme: split nvme status from block req->errors We want our own clearly defined error field for NVMe passthrough commands, and the request errors field is going away in its current form. Just store the status and result field in the nvme_request field from hardirq completion context (using a new helper) and then generate a Linux errno for the block layer only when we actually need it. Because we can't overload the status value with a negative error code for cancelled command we now have a flags filed in struct nvme_request that contains a bit for this condition. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-20 12:16:10 -06:00
Christoph Hellwig	0ff199cb48	nvme/pci: Switch to pci_request_irq() Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Keith Busch <keith.busch@intel.com>	2017-04-18 13:43:29 -05:00
Christoph Hellwig	e850fd16f7	nvme: implement REQ_OP_WRITE_ZEROES But now for the real NVMe Write Zeroes yet, just to get rid of the discard abuse for zeroing. Also rename the quirk flag to be a bit more self-explanatory. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-08 11:25:38 -06:00
Christoph Hellwig	987f699a8f	nvme: move ->retries setup to nvme_setup_cmd ->retries is counting the number of times a command is resubmitted, and be cleared on the first time we see the command. We currently don't do that for non-PCIe command, which is easily fixed by moving the setup to common code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-05 12:05:08 -06:00
Christoph Hellwig	77f02a7acd	nvme: factor request completion code into a common helper This avoids duplicating the logic four times, and it also allows to keep some helpers static in core.c or just opencode them. Note that this loses printing the aborted status on completions in the PCI driver as that uses a data structure not available any more. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-04-04 09:48:23 -06:00
Eric Biggers	f363b089be	blk-mq: constify struct blk_mq_ops Constify all instances of blk_mq_ops, as they are never modified. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-31 08:28:58 -06:00
Keith Busch	302ad8cc09	nvme: Complete all stuck requests If the nvme driver is shutting down its controller, the drievr will not start the queues up again, preventing blk-mq's hot CPU notifier from making forward progress. To fix that, this patch starts a request_queue freeze when the driver resets a controller so no new requests may enter. The driver will wait for frozen after IO queues are restarted to ensure the queue reference can be reinitialized when nvme requests to unfreeze the queues. If the driver is doing a safe shutdown, the driver will wait for the controller to successfully complete all inflight requests so that we don't unnecessarily fail them. Once the controller has been disabled, the queues will be restarted to force remaining entered requests to end in failure so that blk-mq's hot cpu notifier may progress. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-02 08:56:59 -07:00
Shaohua Li	d3af3ecdc6	nvme: allocate nvme_queue in correct node nvme_queue is per-cpu queue (mostly). Allocating it in node where blk-mq will use it. Signed-off-by: Shaohua Li <shli@fb.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-03-02 08:56:04 -07:00
Scott Bauer	e286bcfc59	nvme/pci: re-check security protocol support after reset A device may change capabilities after each reset, e.g. due to a firmware upgrade. We should thus check for Security Send/Receive and OPAL support after each reset. Based on patches from Christoph and Keith. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-23 11:55:43 -07:00
Daniel Roschka	124298bd03	nvme: detect NVMe controller in recent MacBooks Adds support for detection of the NVMe controller found in the following recent MacBooks: - Retina MacBook 2016 (MacBook9,1) - 13" MacBook Pro 2016 without Touch Bar (MacBook13,1) - 13" MacBook Pro 2016 with Touch Bar (MacBook13,2) Signed-off-by: Daniel Roschka <danielroschka@phoenitydawn.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-22 15:17:29 -07:00
Keith Busch	9ef3932e25	nvme/pci: No special case for queue busy on IO This driver previously required we have a special check for IO submitted to nvme IO queues that are temporarily suspended. That is no longer necessary since blk-mq provides a quiesce, so any IO that actually gets submitted to such a queue must be ended since the queue isn't going to start back up. This is fixing a condition where we have fewer IO queues after a controller reset. This may happen if the number of CPU's has changed, or controller firmware update changed the queue count, for example. While it may be possible to complete the IO on a different queue, the block layer does not provide a way to resubmit a request on a different hardware context once the request has entered the queue. We don't want these requests to be stuck indefinitely either, so ending them in error is our only option at the moment. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-22 13:34:00 -07:00
Keith Busch	6db28eda26	nvme/pci: Disable on removal when disconnected If the device is not present, the driver should disable the queues immediately. Prior to this, the driver was relying on the watchdog timer to kill the queues if requests were outstanding to the device, and that just delays removal up to one second. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-22 13:34:00 -07:00
Jens Axboe	818551e2b2	Merge branch 'for-4.11/next' into for-4.11/linus-merge Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-17 14:08:19 -07:00
Jens Axboe	6010720da8	Merge branch 'for-4.11/block' into for-4.11/linus-merge Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-17 14:06:45 -07:00
Scott Bauer	8a9ae52328	nvme: Check for Security send/recv support before issuing commands. We need to verify that the controller supports the security commands before actually trying to issue them. Signed-off-by: Scott Bauer <scott.bauer@intel.com> [hch: moved the check so that we don't call into the OPAL code if not supported] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-17 12:41:49 -07:00
Christoph Hellwig	4f1244c829	block/sed-opal: allocate struct opal_dev dynamically Insted of bloating the containing structure with it all the time this allocates struct opal_dev dynamically. Additionally this allows moving the definition of struct opal_dev into sed-opal.c. For this a new private data field is added to it that is passed to the send/receive callback. After that a lot of internals can be made private as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Scott Bauer <scott.bauer@intel.com> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-17 12:41:47 -07:00
Scott Bauer	a98e58e54f	nvme: Add Support for Opal: Unlock from S3 & Opal Allocation/Ioctls This patch implements the necessary logic to unlock an Opal enabled device coming back from an S3. The patch also implements the SED/Opal allocation necessary to support the opal ioctls. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-02-06 09:44:21 -07:00
Christoph Hellwig	57292b58dd	block: introduce blk_rq_is_passthrough This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-31 14:00:34 -07:00
Keith Busch	7bf7d77862	nvme/pci: Don't mark IOD as aborted if abort wasn't sent This patch sets the aborted flag only if an abort was sent, reducing excessive kernel message spamming for completed IO that wasn't actually aborted. Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-31 08:34:47 -07:00
Jens Axboe	f924ba70c1	Merge branch 'for-4.11/block' into for-4.11/rq-refactor Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-27 15:08:31 -07:00
Jens Axboe	d348499138	blk-mq-sched: allow setting of default IO scheduler Add Kconfig entries to manage what devices get assigned an MQ scheduler, and add a blk-mq flag for drivers to opt out of scheduling. The latter is useful for admin type queues that still allocate a blk-mq queue and tag set, but aren't use for normal IO. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com>	2017-01-17 10:04:31 -07:00
Christoph Hellwig	b131c61d62	nvme: use blk_rq_payload_bytes The new blk_rq_payload_bytes generalizes the payload length hacks that nvme_map_len did before. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2017-01-13 15:17:04 -07:00
Jens Axboe	8e5d31eb02	Merge branch 'nvme-4.10' of git://git.infradead.org/nvme into for-linus Christoph writes: The most significant one is that we've agreed on shared maintaince and a common repository for the PCIe NVMe driver and NVMe over Fabrics. The target code still only has a subset of the maintainers but goes through the same tree as well. Keith, Sagi and me will take turns at collecting patches and sending you pull requests.	2016-12-22 11:54:46 -07:00
Keith Busch	ff13b39ecf	nvme/pci: Delete misleading queue-wrap comment It is not theoretically possible for this driver to wrap twice while processing completions. The driver allocates only 'queue_depth - 1' tags, so there can never be more than that to reap when processing a completion queue. Removing this misleading comment makes it a little less likely people with broken controllers will blame the driver for their spurious interrupts. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2016-12-21 11:33:23 +01:00
Max Gurtovoy	9fa196e7fc	nvme/pci: Fix whitespace problem Convert to tabs and remove unneeded whitespaces. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2016-12-21 11:33:18 +01:00
Stephen Bates	c965809c66	nvme : Use correct scnprintf in cmb show Make sure we are using the correct scnprintf in the sysfs show function for the CMB. Signed-off-by: Stephen Bates <sbates@raithlin.com> Reviewed-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-12-19 08:35:24 -07:00
Andy Lutomirski	d2a6191840	nvme/pci: Log PCI_STATUS when the controller dies When debugging nvme controller crashes, it's nice to know whether the controller died cleanly so that the failure is just reflected in CSTS, it died and put an error in PCI_STATUS, or whether it died so badly that it stopped responding to PCI configuration space reads. I've seen a failure that gives 0xffff in PCI_STATUS on a Samsung "SM951 NVMe SAMSUNG 256GB" with firmware "BXW75D0Q". Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andy Lutomirski <luto@kernel.org> Reviewed-by: Keith Busch <keith.busch@intel.com> Fixed up white space and hunk reject. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-12-13 21:07:08 -07:00
Linus Torvalds	b92e09bb5b	Merge branch 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata updates from Tejun Heo: - Adam added opt-in ATA command priority support. - There are machines which hide multiple nvme devices behind an ahci BAR. Dan Williams proposed a solution to force-switch the mode but deemed too hackishd. People are gonna discuss the proper way to handle the situation in nvme standard meetings. For now, detect and warn about the situation. - Low level driver specific changes. Christoph Hellwig pipes in about the hidden nvme warning: "I wish that was the case. We've pretty much agreed that we'll want to implement it as a virtual PCIe root bridge, similar to Intels other 'innovation' VMD that we work around that way. But Intel management has apparently decided that they don't want to spend more cycles on this now that Lenovo has an optional BIOS that doesn't force this broken mode anymore, and no one outside of Intel has enough information to implement something like this. So for now I guess this warning is it, until Intel reconsideres and spends resources on fixing up the damage their Chipset people caused" * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ahci: warn about remapped NVMe devices ahci-remap.h: add ahci remapping definitions nvme: move NVMe class code to pci_ids.h pata: imx: support controller modes up to PIO4 pata: imx: add support of setting timings for PIO modes pata: imx: set controller PIO mode with .set_piomode callback pata: imx: sort headers out ata: set ncq_prio_enabled iff device has support ata: ATA Command Priority Disabled By Default ata: Enabling ATA Command Priorities block: Add iocontext priority to request ahci: qoriq: added ls1046a platform support	2016-12-13 13:26:24 -08:00
Linus Torvalds	36869cb93d	Merge branch 'for-4.10/block' of git://git.kernel.dk/linux-block Pull block layer updates from Jens Axboe: "This is the main block pull request this series. Contrary to previous release, I've kept the core and driver changes in the same branch. We always ended up having dependencies between the two for obvious reasons, so makes more sense to keep them together. That said, I'll probably try and keep more topical branches going forward, especially for cycles that end up being as busy as this one. The major parts of this pull request is: - Improved support for O_DIRECT on block devices, with a small private implementation instead of using the pig that is fs/direct-io.c. From Christoph. - Request completion tracking in a scalable fashion. This is utilized by two components in this pull, the new hybrid polling and the writeback queue throttling code. - Improved support for polling with O_DIRECT, adding a hybrid mode that combines pure polling with an initial sleep. From me. - Support for automatic throttling of writeback queues on the block side. This uses feedback from the device completion latencies to scale the queue on the block side up or down. From me. - Support from SMR drives in the block layer and for SD. From Hannes and Shaun. - Multi-connection support for nbd. From Josef. - Cleanup of request and bio flags, so we have a clear split between which are bio (or rq) private, and which ones are shared. From Christoph. - A set of patches from Bart, that improve how we handle queue stopping and starting in blk-mq. - Support for WRITE_ZEROES from Chaitanya. - Lightnvm updates from Javier/Matias. - Supoort for FC for the nvme-over-fabrics code. From James Smart. - A bunch of fixes from a whole slew of people, too many to name here" * 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits) blk-stat: fix a few cases of missing batch flushing blk-flush: run the queue when inserting blk-mq flush elevator: make the rqhash helpers exported blk-mq: abstract out blk_mq_dispatch_rq_list() helper blk-mq: add blk_mq_start_stopped_hw_queue() block: improve handling of the magic discard payload blk-wbt: don't throttle discard or write zeroes nbd: use dev_err_ratelimited in io path nbd: reset the setup task for NBD_CLEAR_SOCK nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME nvme-fabrics: Add target support for FC transport nvme-fabrics: Add host support for FC transport nvme-fabrics: Add FC transport LLDD api definitions nvme-fabrics: Add FC transport FC-NVME definitions nvme-fabrics: Add FC transport error codes to nvme.h Add type 0x28 NVME type code to scsi fc headers nvme-fabrics: patch target code in prep for FC transport support nvme-fabrics: set sqe.command_id in core not transports parser: add u64 number parser nvme-rdma: align to generic ib_event logging helper ...	2016-12-13 10:19:16 -08:00
Christoph Hellwig	f9d03f96b9	block: improve handling of the magic discard payload Instead of allocating a single unused biovec for discard requests, send them down without any payload. Instead we allow the driver to add a "special" payload using a biovec embedded into struct request (unioned over other fields never used while in the driver), and overloading the number of segments for this case. This has a couple of advantages: - we don't have to allocate the bio_vec - the amount of special casing for discard requests in the block layer is significantly reduced - using this same scheme for other request types is trivial, which will be important for implementing the new WRITE_ZEROES op on devices where it actually requires a payload (e.g. SCSI) - we can get rid of playing games with the request length, as we'll never touch it and completions will work just fine - it will allow us to support ranged discard operations in the future by merging non-contiguous discard bios into a single request - last but not least it removes a lot of code This patch is the common base for my WIP series for ranges discards and to remove discard_zeroes_data in favor of always using REQ_OP_WRITE_ZEROES, so it would be good to get it in quickly. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-12-09 08:30:51 -07:00
James Smart	721b3917c4	nvme-fabrics: set sqe.command_id in core not transports Currently, core.c sets command_id only on rd/wr commands, leaving it to the transport to set it again to ensure the request had a command id. Move location of set in core so applies to all commands. Remove transport sets. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-12-06 10:17:03 +02:00
Christoph Hellwig	a2e7eefd56	nvme: move NVMe class code to pci_ids.h We'll need to check for it in the AHCI drivers (yes, really) soon. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tejun Heo <tj@kernel.org>	2016-12-05 14:31:23 -05:00
Keith Busch	d48756228e	nvme/pci: Don't free queues on error The nvme_remove function tears down all allocated resources in the correct order, so no need to free queues on error during initialization. This fixes possible use-after-free errors when queues are still associated with a blk-mq hctx. Reported-by: Scott Bauer <scott.bauer@intel.com> Tested-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimbeg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-16 12:39:57 -07:00
Omar Sandoval	bac0000af5	nvme: untangle 0 and BLK_MQ_RQ_QUEUE_OK Let's not depend on any of the BLK_MQ_RQ_QUEUE_* constants having specific values. No functional change. Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-15 12:50:11 -07:00
Christoph Hellwig	7bf58533a0	nvme: don't pass the full CQE to nvme_complete_async_event We only need the status and result fields, and passing them explicitly makes life a lot easier for the Fibre Channel transport which doesn't have a full CQE for the fast path case. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-10 10:06:26 -07:00
Christoph Hellwig	d49187e97e	nvme: introduce struct nvme_request This adds a shared per-request structure for all NVMe I/O. This structure is embedded as the first member in all NVMe transport drivers request private data and allows to implement common functionality between the drivers. The first use is to replace the current abuse of the SCSI command passthrough fields in struct request for the NVMe command passthrough, but it will grow a field more fields to allow implementing things like common abort handlers in the future. The passthrough commands are handled by having a pointer to the SQE (struct nvme_command) in struct nvme_request, and the union of the possible result fields, which had to be turned from an anonymous into a named union for that purpose. This avoids having to pass a reference to a full CQE around and thus makes checking the result a lot more lightweight. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-11-10 10:06:24 -07:00
Christoph Hellwig	e806402130	block: split out request-only flags into a new namespace A lot of the REQ_* flags are only used on struct requests, and only of use to the block layer and a few drivers that dig into struct request internals. This patch adds a new req_flags_t rq_flags field to struct request for them, and thus dramatically shrinks the number of common requests. It also removes the unfortunate situation where we have to fit the fields from the same enum into 32 bits for struct bio and 64 bits for struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-10-28 08:45:17 -06:00
Linus Torvalds	ecd06f2883	Merge branch 'for-linus' of git://git.kernel.dk/linux-block Pull block fixes from Jens Axboe: "A set of fixes that missed the merge window, mostly due to me being away around that time. Nothing major here, a mix of nvme cleanups and fixes, and one fix for the badblocks handling" * 'for-linus' of git://git.kernel.dk/linux-block: nvmet: use symbolic constants for CNS values nvme: use symbolic constants for CNS values nvme.h: add an enum for cns values nvme.h: don't use uuid_be nvme.h: resync with nvme-cli nvme: Add tertiary number to NVME_VS nvme : Add sysfs entry for NVMe CMBs when appropriate nvme: don't schedule multiple resets nvme: Delete created IO queues on reset nvme: Stop probing a removed device badblocks: fix overlapping check for clearing	2016-10-21 10:54:01 -07:00
Gabriel Krisman Bertazi	8ef2074d28	nvme: Add tertiary number to NVME_VS NVMe 1.2.1 specification adds a tertiary element to the version number. This updates the macro and its callers to include the final number and fixup a single place in nvmet where the version was generated manually. Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-10-19 11:36:22 -06:00
Stephen Bates	202021c1a6	nvme : Add sysfs entry for NVMe CMBs when appropriate Add a sysfs attribute that contains salient information about the NVMe Controller Memory Buffer when one is present. For now, just display the information about the CMB available from the control registers. We attach the CMB attribute file to the existing nvme_ctrl sysfs group so it can handle the sysfs teardown. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Stephen Bates <sbates@raithlin.com> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-10-12 11:10:28 -06:00
Keith Busch	c5f6ce97c1	nvme: don't schedule multiple resets The queue_work only fails if the work is pending, but not yet running. If the work is running, the work item would get requeued, triggering a double reset. If the first reset fails for any reason, the second reset triggers: WARN_ON(dev->ctrl.state == NVME_CTRL_RESETTING) Hitting that schedules controller deletion for a second time, which potentially takes a reference on the device that is being deleted. If the reset occurs at the same time as a hot removal event, this causes a double-free. This patch has the reset helper function check if the work is busy prior to queueing, and changes all places that schedule resets to use this function. Since most users don't want to sync with that work, the "flush_work" is moved to the only caller that wants to sync. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg<sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-10-12 09:24:39 -06:00
Keith Busch	7065906096	nvme: Delete created IO queues on reset The driver was decrementing the online_queues prior to attempting to delete those IO queues, so the driver ended up not requesting the controller delete any. This patch saves the online_queues prior to suspending them, and adds that parameter for deleting io queues. Fixes: `c21377f8` ("nvme: Suspend all queues before deletion") Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-10-12 09:22:16 -06:00
Mauricio Faria de Oliveira	2b6b535d91	nvme: use the DMA_ATTR_NO_WARN attribute Use the DMA_ATTR_NO_WARN attribute for the dma_map_sg() call of the nvme driver that returns BLK_MQ_RQ_QUEUE_BUSY (not for BLK_MQ_RQ_QUEUE_ERROR). Link: http://lkml.kernel.org/r/1470092390-25451-4-git-send-email-mauricfo@linux.vnet.ibm.com Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@fb.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-10-11 15:06:32 -07:00
Linus Torvalds	12e3d3cdd9	Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-block Pull blk-mq irq/cpu mapping updates from Jens Axboe: "This is the block-irq topic branch for 4.9-rc. It's mostly from Christoph, and it allows drivers to specify their own mappings, and more importantly, to share the blk-mq mappings with the IRQ affinity mappings. It's a good step towards making this work better out of the box" * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block: blk_mq: linux/blk-mq.h does not include all the headers it depends on blk-mq: kill unused blk_mq_create_mq_map() blk-mq: get rid of the cpumask in struct blk_mq_tags nvme: remove the post_scan callout nvme: switch to use pci_alloc_irq_vectors blk-mq: provide a default queue mapping for PCI device blk-mq: allow the driver to pass in a queue mapping blk-mq: remove ->map_queue blk-mq: only allocate a single mq_map per tag_set blk-mq: don't redistribute hardware queues on a CPU hotplug event	2016-10-09 17:29:33 -07:00
Christoph Hellwig	dca51e7892	nvme: switch to use pci_alloc_irq_vectors Use the new helper to automatically select the right interrupt type, as well as to use the automatic interupt affinity assignment. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-09-15 08:42:03 -06:00
Christoph Hellwig	7d7e0f90b7	blk-mq: remove ->map_queue All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-09-15 08:42:03 -06:00
Wenbo Wang	015282c9eb	nvme/quirk: Add a delay before checking device ready for memblaze device Signed-off-by: Wenbo Wang <wenbo.wang@memblaze.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-09-08 10:17:45 -06:00
Gabriel Krisman Bertazi	82469c59d2	nvme: Don't suspend admin queue that wasn't created This fixes a regression in my previous commit `c21377f836` ("nvme: Suspend all queues before deletion"), which provoked an Oops in the removal path when removing a device that became IO incapable very early at probe (i.e. after a failed EEH recovery). Turns out, if the error occurred very early at the probe path, before even configuring the admin queue, we might try to suspend the uninitialized admin queue, accessing bad memory. Fixes: `c21377f836` ("nvme: Suspend all queues before deletion") Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-09-07 08:53:21 -06:00
Gabriel Krisman Bertazi	c21377f836	nvme: Suspend all queues before deletion When nvme_delete_queue fails in the first pass of the nvme_disable_io_queues() loop, we return early, failing to suspend all of the IO queues. Later, on the nvme_pci_disable path, this causes us to disable MSI without actually having freed all the IRQs, which triggers the BUG_ON in free_msi_irqs(), as show below. This patch refactors nvme_disable_io_queues to suspend all queues before start submitting delete queue commands. This way, we ensure that we have at least returned every IRQ before continuing with the removal path. [ 487.529200] kernel BUG at ../drivers/pci/msi.c:368! cpu 0x46: Vector: 700 (Program Check) at [c0000078c5b83650] pc: c000000000627a50: free_msi_irqs+0x90/0x200 lr: c000000000627a40: free_msi_irqs+0x80/0x200 sp: c0000078c5b838d0 msr: 9000000100029033 current = 0xc0000078c5b40000 paca = 0xc000000002bd7600 softe: 0 irq_happened: 0x01 pid = 1376, comm = kworker/70:1H kernel BUG at ../drivers/pci/msi.c:368! Linux version 4.7.0.mainline+ (root@iod76) (gcc version 5.3.1 20160413 (Ubuntu/IBM 5.3.1-14ubuntu2.1) ) #104 SMP Fri Jul 29 09:20:17 CDT 2016 enter ? for help [c0000078c5b83920] d0000000363b0cd8 nvme_dev_disable+0x208/0x4f0 [nvme] [c0000078c5b83a10] d0000000363b12a4 nvme_timeout+0xe4/0x250 [nvme] [c0000078c5b83ad0] c0000000005690e4 blk_mq_rq_timed_out+0x64/0x110 [c0000078c5b83b40] c00000000056c930 bt_for_each+0x160/0x170 [c0000078c5b83bb0] c00000000056d928 blk_mq_queue_tag_busy_iter+0x78/0x110 [c0000078c5b83c00] c0000000005675d8 blk_mq_timeout_work+0xd8/0x1b0 [c0000078c5b83c50] c0000000000e8cf0 process_one_work+0x1e0/0x590 [c0000078c5b83ce0] c0000000000e9148 worker_thread+0xa8/0x660 [c0000078c5b83d80] c0000000000f2090 kthread+0x110/0x130 [c0000078c5b83e30] c0000000000095f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: linux-nvme@lists.infradead.org Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-11 09:35:57 -06:00
Linus Torvalds	c8d0267efd	PCI changes for the v4.8 merge window: Enumeration Move ecam.h to linux/include/pci-ecam.h (Jayachandran C) Add parent device field to ECAM struct pci_config_window (Jayachandran C) Add generic MCFG table handling (Tomasz Nowicki) Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki) Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki) Resource management Add devm_request_pci_bus_resources() (Bjorn Helgaas) Unify pci_resource_to_user() declarations (Bjorn Helgaas) Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas) Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas) Make PCI I/O space optional on ARM32 (Bjorn Helgaas) Ignore write combining when mapping I/O port space (Bjorn Helgaas) Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas) Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas) Support I/O resources when parsing host bridge resources (Jayachandran C) Add helpers to request/release memory and I/O regions (Johannes Thumshirn) Use pci_(request\|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn) Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5)) Add generic pci_bus_claim_resources() (Lorenzo Pieralisi) Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi) Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya) Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu) PCI device hotplug Allow additional bus numbers for hotplug bridges (Keith Busch) Ignore interrupts during D3cold (Lukas Wunner) Power management Enforce type casting for pci_power_t (Andy Shevchenko) Don't clear d3cold_allowed for PCIe ports (Mika Westerberg) Put PCIe ports into D3 during suspend (Mika Westerberg) Power on bridges before scanning new devices (Mika Westerberg) Runtime resume bridge before rescan (Mika Westerberg) Add runtime PM support for PCIe ports (Mika Westerberg) Remove redundant check of pcie_set_clkpm (Shawn Lin) Virtualization Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra) Add DMA alias quirk for Adaptec 3805 (Alex Williamson) Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake) Add ACS quirk for Solarflare SFC9220 (Edward Cree) MSI Fix PCI_MSI dependencies (Arnd Bergmann) Add pci_msix_desc_addr() helper (Christoph Hellwig) Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig) Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig) Provide sensible IRQ vector alloc/free routines (Christoph Hellwig) Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig) Error Handling Bind DPC to Root Ports as well as Downstream Ports (Keith Busch) Remove DPC tristate module option (Keith Busch) Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg) Generic host bridge driver Select IRQ_DOMAIN (Arnd Bergmann) Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) ACPI host bridge driver Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki) Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki) Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki) Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki) Altera host bridge driver Check link status before retrain link (Ley Foon Tan) Poll for link up status after retraining the link (Ley Foon Tan) Axis ARTPEC-6 host bridge driver Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann) Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel) Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel) Intel VMD host bridge driver Use lock save/restore in interrupt enable path (Jon Derrick) Select device dma ops to override (Keith Busch) Initialize list item in IRQ disable (Keith Busch) Use x86_vector_domain as parent domain (Keith Busch) Separate MSI and MSI-X vector sharing (Keith Busch) Marvell Aardvark host bridge driver Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni) Add Aardvark PCI host controller driver (Thomas Petazzoni) Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni) Microsoft Hyper-V host bridge driver Fix interrupt cleanup path (Cathy Avery) Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov) Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov) NVIDIA Tegra host bridge driver Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren) Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren) Use lower-case hex consistently for register definitions (Thierry Reding) Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding) Stop setting pcibios_min_mem (Thierry Reding) Renesas R-Car host bridge driver Drop gen2 dummy I/O port region (Bjorn Helgaas) TI DRA7xx host bridge driver Fix return value in case of error (Christophe JAILLET) Xilinx AXI host bridge driver Fix return value in case of error (Christophe JAILLET) Miscellaneous Make bus_attr_resource_alignment static (Ben Dooks) Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks) MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven) Make host bridge drivers explicitly non-modular (Paul Gortmaker) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXoNRtAAoJEFmIoMA60/r8LMkP/3kiNh21QFS6RZGOaDft5/Py n14Zo0w51avspxoI3iyDlBd5q/SssMqi+2c6Ko/fh2D2xMxJgmQOjdMDrIGARxGA qEHk/5IoXquY2/GcptmCk3ap66cJ6kTovS4OPrb73m3fPuknFwFwdzExq22XHbnI crPya6xwQxPLc54VpY/TsgW8E+EKZd/3FW9wuzzNHXrXmTILyhBQzQAA0K470GMx wEXU6kc3M/XhRuF1zjV9/O+H/xguwfnbTpZLvd2NAF6uXKZoRytEHHtNnVqu1hoe UPpDS2xq32pMNbGxGqBetCdIbkY/hWOufmckHI7Yu2OfXBYyHBYMG2je1+nMPkOV WiFhhrchGt5KnEMUwXPS4ROqnSZVpZBl1Fd4s10GhUYkoE2HNKJXta398H9FR1jj 4NEVSi4mSX/+CkaoIN3lXYiaf9P0wv4Wppve4Scr30+VnLjJhm7Vw5La7v12oo6x otrJ/g98AkmnbuUdLeWBUS/+TOcdPjZYbw52rqBsbOOjFm51Zcj6D7kf5WcTypQy HzbvygSVabcioWehUG1uudC8pdJmQlUGx1aES/iu+mZEae4cuUFALu6hDBD9IYnZ 5JdwjVzI0UItEwT3rQt3t4xiAqHADQ0NAVNJVCeREdoy/YQpSoTWGXIpyqCZ1yCm aBykjRsxbKQXlhVeIxuc =NVxu -----END PGP SIGNATURE----- Merge tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Highlights: - ARM64 support for ACPI host bridges - new drivers for Axis ARTPEC-6 and Marvell Aardvark - new pci_alloc_irq_vectors() interface for MSI-X, MSI, legacy INTx - pci_resource_to_user() cleanup (more to come) Detailed summary: Enumeration: - Move ecam.h to linux/include/pci-ecam.h (Jayachandran C) - Add parent device field to ECAM struct pci_config_window (Jayachandran C) - Add generic MCFG table handling (Tomasz Nowicki) - Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki) - Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki) Resource management: - Add devm_request_pci_bus_resources() (Bjorn Helgaas) - Unify pci_resource_to_user() declarations (Bjorn Helgaas) - Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas) - Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas) - Make PCI I/O space optional on ARM32 (Bjorn Helgaas) - Ignore write combining when mapping I/O port space (Bjorn Helgaas) - Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas) - Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas) - Support I/O resources when parsing host bridge resources (Jayachandran C) - Add helpers to request/release memory and I/O regions (Johannes Thumshirn) - Use pci_(request\|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn) - Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5)) - Add generic pci_bus_claim_resources() (Lorenzo Pieralisi) - Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) - Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi) - Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya) - Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu) PCI device hotplug: - Allow additional bus numbers for hotplug bridges (Keith Busch) - Ignore interrupts during D3cold (Lukas Wunner) Power management: - Enforce type casting for pci_power_t (Andy Shevchenko) - Don't clear d3cold_allowed for PCIe ports (Mika Westerberg) - Put PCIe ports into D3 during suspend (Mika Westerberg) - Power on bridges before scanning new devices (Mika Westerberg) - Runtime resume bridge before rescan (Mika Westerberg) - Add runtime PM support for PCIe ports (Mika Westerberg) - Remove redundant check of pcie_set_clkpm (Shawn Lin) Virtualization: - Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra) - Add DMA alias quirk for Adaptec 3805 (Alex Williamson) - Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake) - Add ACS quirk for Solarflare SFC9220 (Edward Cree) MSI: - Fix PCI_MSI dependencies (Arnd Bergmann) - Add pci_msix_desc_addr() helper (Christoph Hellwig) - Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig) - Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig) - Provide sensible IRQ vector alloc/free routines (Christoph Hellwig) - Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig) Error Handling: - Bind DPC to Root Ports as well as Downstream Ports (Keith Busch) - Remove DPC tristate module option (Keith Busch) - Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg) Generic host bridge driver: - Select IRQ_DOMAIN (Arnd Bergmann) - Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) ACPI host bridge driver: - Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki) - Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki) - Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki) - Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki) Altera host bridge driver: - Check link status before retrain link (Ley Foon Tan) - Poll for link up status after retraining the link (Ley Foon Tan) Axis ARTPEC-6 host bridge driver: - Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann) - Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel) - Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel) Intel VMD host bridge driver: - Use lock save/restore in interrupt enable path (Jon Derrick) - Select device dma ops to override (Keith Busch) - Initialize list item in IRQ disable (Keith Busch) - Use x86_vector_domain as parent domain (Keith Busch) - Separate MSI and MSI-X vector sharing (Keith Busch) Marvell Aardvark host bridge driver: - Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni) - Add Aardvark PCI host controller driver (Thomas Petazzoni) - Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni) Microsoft Hyper-V host bridge driver: - Fix interrupt cleanup path (Cathy Avery) - Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov) - Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov) NVIDIA Tegra host bridge driver: - Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren) - Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren) - Use lower-case hex consistently for register definitions (Thierry Reding) - Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding) - Stop setting pcibios_min_mem (Thierry Reding) Renesas R-Car host bridge driver: - Drop gen2 dummy I/O port region (Bjorn Helgaas) TI DRA7xx host bridge driver: - Fix return value in case of error (Christophe JAILLET) Xilinx AXI host bridge driver: - Fix return value in case of error (Christophe JAILLET) Miscellaneous: - Make bus_attr_resource_alignment static (Ben Dooks) - Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks) - MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven) - Make host bridge drivers explicitly non-modular (Paul Gortmaker)" * tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (125 commits) PCI: xgene: Make explicitly non-modular PCI: thunder-pem: Make explicitly non-modular PCI: thunder-ecam: Make explicitly non-modular PCI: tegra: Make explicitly non-modular PCI: rcar-gen2: Make explicitly non-modular PCI: rcar: Make explicitly non-modular PCI: mvebu: Make explicitly non-modular PCI: layerscape: Make explicitly non-modular PCI: keystone: Make explicitly non-modular PCI: hisi: Make explicitly non-modular PCI: generic: Make explicitly non-modular PCI: designware-plat: Make it explicitly non-modular PCI: artpec6: Make explicitly non-modular PCI: armada8k: Make explicitly non-modular PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency PCI: Add ACS quirk for Solarflare SFC9220 arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700 PCI: aardvark: Add Aardvark PCI host controller driver dt-bindings: add DT binding for the Aardvark PCIe controller PCI: tegra: Program PADS_REFCLK_CFG* registers with per-SoC values ...	2016-08-02 17:12:29 -04:00
Linus Torvalds	3fc9d69093	Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...	2016-07-26 15:37:51 -07:00
Keith Busch	13880f5b57	nvme/pci: Provide SR-IOV support This registers an sr-iov callback for nvme. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-20 21:29:59 -06:00
Masayoshi Mizuma	2fa843512b	nvme: avoid crashes when node 0 is memoryless node. When CONFIG_NUMA is enabled and node 0 is memoryless, the system crashes because nvme_probe() sets the device->numa_node to 0 by set_dev_node(&pdev->dev, 0), so it tries to allocate memory from node 0. To avoid the crash, we should change the 0 to first_memory_node. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-13 09:17:55 -07:00
Keith Busch	f80ec966c1	nvme: Limit command retries Many controller implementations will return errors to commands that will not succeed, but without the DNR bit set. The driver previously retried these commands an unlimited number of times until the command timeout has exceeded, which takes an unnecessarilly long period of time. This patch limits the number of retries a command can have, defaulting to 5, but is user tunable at load or runtime. The struct request's 'retries' field is used to track the number of retries attempted. This is in contrast with scsi's use of this field, which indicates how many retries are allowed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 16:20:31 -07:00
Guilherme G. Piccoli	54adc01055	nvme/quirk: Add a delay before checking for adapter readiness When disabling the controller, the specification says the register NVME_REG_CC should be written and then driver needs to wait the adapter to be ready, which is checked by reading another register bit (NVME_CSTS_RDY). There's a timeout validation in this checking, so in case this timeout is reached the driver gives up and removes the adapter from the system. After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003) (HGST adapter) end up being removed if we issue a reset_controller, because driver keeps verifying the NVME_REG_CSTS until the timeout is reached. This patch adds a necessary quirk for this adapter, by introducing a delay before nvme_wait_ready(), so the reset procedure is able to be completed. This quirk is needed because just increasing the timeout is not enough in case of this adapter - the driver must wait before start reading NVME_REG_CSTS register on this specific device. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:23:00 -07:00
Christoph Hellwig	eb793e2c92	nvme.h: add NVMe over Fabrics definitions The NVMe over Fabrics specification defines a protocol interface and related extensions to NVMe that enable operation over network protocols. The NVMe over Fabrics specification has an NVMe Transport binding for each NVMe Transport. This patch adds the fabrics related definitions: - fabric specific command set and error codes - transport addressing and binding definitions - fabrics sgl extensions - controller identification fabrics enhancements - discovery log page definition Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-05 11:28:14 -06:00
Ming Lin	1a353d85b0	nvme: add fabrics sysfs attributes - delete_controller: This attribute allows to delete a controller. A driver is not obligated to support it (pci doesn't) so it is created only if the driver supports it. The new fabrics drivers will support it (essentialy a disconnect operation). Usage: echo > /sys/class/nvme/nvme0/delete_controller - subsysnqn: This attribute shows the subsystem nqn of the configured device. If a driver does not implement the get_subsysnqn method, the file will not appear in sysfs. - transport: This attribute shows the transport name. Added a "name" field to struct nvme_ctrl_ops. For loop, cat /sys/class/nvme/nvme0/transport loop For RDMA, cat /sys/class/nvme/nvme0/transport rdma For PCIe, cat /sys/class/nvme/nvme0/transport pcie - address: This attributes shows the controller address. The fabrics drivers that will implement get_address can show the address of the connected controller. example: cat /sys/class/nvme/nvme0/address traddr=192.168.2.2,trsvcid=1023 Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-05 11:28:12 -06:00
Christoph Hellwig	eb71f43557	nvme: Modify and export sync command submission for fabrics NVMe over fabrics will use __nvme_submit_sync_cmd in the the transport and require a few tweaks to it. For that we export it and add a few more paramters: 1. allow passing a queue ID to the block layer For the NVMe over Fabrics connect command we need to able to specify a queue ID that we want to send the command on. Add a qid parameter to the relevant functions to enable this behavior. 2. allow submitting at_head commands In cases where we want to (re)connect to a controller where we have inflight queued commands we want to first connect and only then allow the other queued commands to be kicked. This will prevents failures in controller resets and reconnects. 3. allow passing flags to blk_mq_allocate_request Both for Fabrics connect the the keep-alive feature in NVMe 1.2.1 we want to be able to use reserved requests. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-05 11:28:11 -06:00
Johannes Thumshirn	a1f447b35b	NVMe: Use pci_(request\|release)_mem_regions Now that we do have pci_request_mem_regions() and pci_release_mem_regions() at hand, use it in the NVMe driver. Suggested-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> CC: Keith Busch <keith.busch@intel.com> CC: Jens Axboe <axboe@fb.com>	2016-06-21 17:09:32 -05:00
Christoph Hellwig	f5fa90dc0a	nvme: move the workaround for I/O queue-less controllers from PCIe to core We want to apply this to Fabrics drivers as well, so move it to common code. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-12 07:29:43 -06:00
Johannes Thumshirn	edb50a5403	NVMe: Only release requested regions The NVMe driver only requests the PCIe device's memory regions but releases all possible regions (including eventual I/O regions). This leads to a stale warning entry in dmesg about freeing non existent resources. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-09 14:28:28 -06:00
Ming Lin	c55a2fd4bb	nvme: move nvme_cancel_request() to common code So it can be used by fabrics driver also. Signed-off-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.bsuch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:43:02 -06:00
Ming Lin	e1958e6534	nvme: update and rename nvme_cancel_io to nvme_cancel_request nvme_cancel_io is a bit confusing (given the distinction of io/admin), so rename it to nvme_cancel_request. And update it a bit to pass in struct nvme_ctrl, so it can be used by Fabrics driver also. Signed-off-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Suggested-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.bsuch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-06-07 13:43:02 -06:00
Keith Busch	99466e708d	NVMe: Add device ID's with stripe quirk Adds two Intel controllers that have the "stripe" quirk. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-17 17:14:21 -06:00
Keith Busch	0ff9d4e1a2	NVMe: Short-cut removal on surprise hot-unplug This patch adds a new state that when set has the core automatically kill request queues prior to removing namespaces. If PCI device is not present at the time the nvme driver's remove is called, we can kill all IO queues immediately instead of waiting for the watchdog thread to do that at its polling interval. This improves scenarios where multiple hot plug events occur at the same time since it doesn't block the pci enumeration for as long. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-17 17:14:21 -06:00
Keith Busch	d011fb3164	NVMe: Reduce driver log spamming Reduce error logging when no corrective action is required. Suggessted-by: Chris Petersen <cpetersen@fb.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-17 17:14:21 -06:00
Keith Busch	921920ab32	NVMe: Unbind driver on failure Instead of removing the PCI device from the kernel's topology on controller failure, this patch simply requests unbinding the device from the driver. This avoids concurrently running pci removal with the hot plug event, which has been reported to be problematic when multiple surprise events occur near simultaneously. The other benefit is that we will have PCI config and memory space available to poke around for debugging a failed controller, assuming the device was not physically removed. The down side occurs if the platform and/or kernel do not support any type of surprise hot removal. The device will remain visible through sysfs (and therefore lspci), and some manual work is necessary to get the logical topology corrected. But if your platform and/or kernel don't support surprise removal, you probably shouldn't be doing that anyway. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-17 17:14:21 -06:00
Keith Busch	014a0d609e	NVMe: Delete only created queues Use the online queue count instead of the number of allocated queues. The controller should just return an invalid queue identifier error to the commands if a queue wasn't created. While it's not harmful, it's still not correct. Reported-by: Saar Gross <saar@annapurnalabs.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-17 17:14:21 -06:00
Keith Busch	2800b8e7d9	NVMe: Allocate queues only for online cpus The driver previously requested allocating queues for the total possible number of CPUs so that blk-mq could rebalance these if CPUs were added after initialization. The number of hardware contexts can now be changed at runtime, so we only need to allocate the number of online queues since we can add more later. Suggested-by: Jeff Lien <jeff.lien@hgst.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-17 17:14:21 -06:00
Linus Torvalds	24b9f0cf00	Merge branch 'for-4.7/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "On top of the core pull request, this is the drivers pull request for this merge window. This contains: - Switch drivers to the new write back cache API, and kill off the flush flags. From me. - Kill the discard support for the STEC pci-e flash driver. It's trivially broken, and apparently unmaintained, so it's safer to just remove it. From Jeff Moyer. - A set of lightnvm updates from the usual suspects (Matias/Javier, and Simon), and fixes from Arnd, Jeff Mahoney, Sagi, and Wenwei Tao. - A set of updates for NVMe: - Turn the controller state management into a proper state machine. From Christoph. - Shuffling of code in preparation for NVMe-over-fabrics, also from Christoph. - Cleanup of the command prep part from Ming Lin. - Rewrite of the discard support from Ming Lin. - Deadlock fix for namespace removal from Ming Lin. - Use the now exported blk-mq tag helper for IO termination. From Sagi. - Various little fixes from Christoph, Guilherme, Keith, Ming Lin, Wang Sheng-Hui. - Convert mtip32xx to use the now exported blk-mq tag iter function, from Keith" * 'for-4.7/drivers' of git://git.kernel.dk/linux-block: (74 commits) lightnvm: reserved space calculation incorrect lightnvm: rename nr_pages to nr_ppas on nvm_rq lightnvm: add is_cached entry to struct ppa_addr lightnvm: expose gennvm_mark_blk to targets lightnvm: remove mgt targets on mgt removal lightnvm: pass dma address to hardware rather than pointer lightnvm: do not assume sequential lun alloc. nvme/lightnvm: Log using the ctrl named device lightnvm: rename dma helper functions lightnvm: enable metadata to be sent to device lightnvm: do not free unused metadata on rrpc lightnvm: fix out of bound ppa lun id on bb tbl lightnvm: refactor set_bb_tbl for accepting ppa list lightnvm: move responsibility for bad blk mgmt to target lightnvm: make nvm_set_rqd_ppalist() aware of vblks lightnvm: remove struct factory_blks lightnvm: refactor device ops->get_bb_tbl() lightnvm: introduce nvm_for_each_lun_ppa() macro lightnvm: refactor dev->online_target to global nvm_targets lightnvm: rename nvm_targets to nvm_tgt_type ...	2016-05-17 16:03:32 -07:00
Keith Busch	87c3207781	NVMe: Fix reset/remove race This fixes a scenario where device is present and being reset, but a request to unbind the driver occurs. A previous patch series addressing a device failure removal scenario flushed reset_work after controller disable to unblock reset_work waiting on a completion that wouldn't occur. This isn't safe as-is. The broken scenario can potentially be induced with: modprobe nvme && modprobe -r nvme To fix, the reset work is flushed immediately after setting the controller removing flag, and any subsequent reset will not proceed with controller initialization if the flag is set. The controller status must be polled while active, so the watchdog timer is also left active until the controller is disabled to cleanup requests that may be stuck during namespace removal. [Fixes: `ff23a2a15a`] Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-03 14:00:29 -06:00
Ming Lin	6904242db1	nvme: add helper nvme_cleanup_cmd() This hides command cleanup into nvme.h and fabrics drivers will also use it. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-02 09:11:58 -06:00
Christoph Hellwig	f866fc4282	nvme: move AER handling to common code The transport driver still needs to do the actual submission, but all the higher level code can be shared. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-02 09:09:26 -06:00
Christoph Hellwig	5955be2144	nvme: move namespace scanning to core Move the scan work item and surrounding code to the common code. For now we need a new finish_scan method to allow the PCI driver to set the irq affinity hints, but I have plans in the works to obsolete this as well. Note that this moves the namespace scanning from nvme_wq to the system workqueue, but as we don't rely on namespace scanning to finish from reset or I/O this should be fine. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-02 09:09:24 -06:00
Christoph Hellwig	92911a55d4	nvme: tighten up state check for namespace scanning We only should be scanning namespaces if the controller is live. Currently we call the function just before setting it live, so fix the code up to move the call to nvme_queue_scan to just below the state change. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-02 09:09:23 -06:00
Christoph Hellwig	bb8d261e08	nvme: introduce a controller state machine Replace the adhoc flags in the PCI driver with a state machine in the core code. Based on code from Sagi Grimberg for the Fabrics driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-02 09:09:21 -06:00
Christoph Hellwig	04a934d4c7	nvme: remove the io_incapable method It's unused since "NVMe: Move error handling to failed reset handler". Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-02 09:09:20 -06:00
Keith Busch	3b24774e1f	NVMe: Fix check_flush_dependency warning If the controller fails and is degraded after a reset, we need to kill off all requests queues before removing the inaccessble namespaces. This will prevent del_gendisk from syncing dirty data, which we can't due from a WQ_MEM_RECLAIM work queue. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-05-02 09:03:06 -06:00
Keith Busch	a5229050b6	NVMe: Always use MSI/MSI-x interrupts Multiple users have reported device initialization failure due the driver not receiving legacy PCI interrupts. This is not unique to any particular controller, but has been observed on multiple platforms. There have been no issues reported or observed when with message signaled interrupts, so this patch attempts to use MSI-x during initialization, falling back to MSI. If that fails, legacy would become the default. The setup_io_queues error handling had to change as a result: the admin queue's msix_entry used to be initialized to the legacy IRQ. The case where nr_io_queues is 0 would fail request_irq when setting up the admin queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving the admin queue's msix_entry invalid. Instead, return success immediately. Reported-by: Tim Muhlemmer <muhlemmer@gmail.com> Reported-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-14 14:04:50 -06:00
Guilherme G. Piccoli	c875a7093f	nvme: Avoid reset work on watchdog timer function during error recovery This patch adds a check on nvme_watchdog_timer() function to avoid the call to reset_work() when an error recovery process is ongoing on controller. The check is made by looking at pci_channel_offline() result. If we don't check for this on nvme_watchdog_timer(), error recovery mechanism can't recover well, because reset_work() won't be able to do its job (since we're in the middle of an error) and so the controller is removed from the system before error recovery mechanism can perform slot reset (which would allow the adapter to recover). In this patch we also have split the huge condition expression on nvme_watchdog_timer() by introducing an auxiliary function to help make the code more readable. Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-13 08:15:39 -06:00
Jens Axboe	7e19793096	NVMe: silence warning about unused 'dev' Depending on options, we might not be using dev in nvme_cancel_io(): drivers/nvme/host/pci.c: In function ‘nvme_cancel_io’: drivers/nvme/host/pci.c:970:19: warning: unused variable ‘dev’ [-Wunused-variable] struct nvme_dev *dev = data; ^ So get rid of it, and just cast for the dev_dbg_ratelimited() call. Fixes: `82b4552b91` ("nvme: Use blk-mq helper for IO termination") Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 16:11:11 -06:00
Sagi Grimberg	82b4552b91	nvme: Use blk-mq helper for IO termination blk-mq offers a tagset iterator so let's use that instead of using nvme_clear_queues. Note, we changed nvme_queue_cancel_ios name to nvme_cancel_io as there is no concept of a queue now in this function (we also lost the print). Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 15:07:15 -06:00
Keith Busch	21f033f7c7	NVMe: Skip async events for degraded controllers If the controller is degraded, the driver should stay out of the way so the user can recover the drive. This patch skips driver initiated async event requests when the drive is in this state. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 13:44:00 -06:00
Ming Lin	8093f7ca73	nvme: add helper nvme_setup_cmd() This moves nvme_setup_{flush,discard,rw} calls into a common nvme_setup_cmd() helper. So we can eventually hide all the command setup in the core module and don't even need to update the fabrics drivers for any specific command type. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 13:44:00 -06:00
Ming Lin	03b5929ebb	nvme: rewrite discard support This rewrites nvme_setup_discard() with blk_add_request_payload(). It allocates only the necessary amount(16 bytes) for the payload. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 13:44:00 -06:00
Ming Lin	58b4560275	nvme: add helper nvme_map_len() The helper returns the number of bytes that need to be mapped using PRPs/SGL entries. Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 13:44:00 -06:00
Ming Lin	2e39e0f608	nvme: add missing lock nesting notation When unloading driver, nvme_disable_io_queues() calls nvme_delete_queue() that sends nvme_admin_delete_cq command to admin sq. So when the command completed, the lock acquired by nvme_irq() actually belongs to admin queue. While the lock that nvme_del_cq_end() trying to acquire belongs to io queue. So it will not deadlock. This patch adds lock nesting notation to fix following report. [ 109.840952] ============================================= [ 109.846379] [ INFO: possible recursive locking detected ] [ 109.851806] 4.5.0+ #180 Tainted: G E [ 109.856533] --------------------------------------------- [ 109.861958] swapper/0/0 is trying to acquire lock: [ 109.866771] (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme] [ 109.876535] [ 109.876535] but task is already holding lock: [ 109.882398] (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme] [ 109.891547] [ 109.891547] other info that might help us debug this: [ 109.898107] Possible unsafe locking scenario: [ 109.898107] [ 109.904056] CPU0 [ 109.906515] ---- [ 109.908974] lock(&(&nvmeq->q_lock)->rlock); [ 109.913381] lock(&(&nvmeq->q_lock)->rlock); [ 109.917787] [ 109.917787] * DEADLOCK * [ 109.917787] [ 109.923738] May be due to missing lock nesting notation [ 109.923738] [ 109.930558] 1 lock held by swapper/0/0: [ 109.934413] #0: (&(&nvmeq->q_lock)->rlock){-.....}, at: [<ffffffffc0820c2b>] nvme_irq+0x1b/0x50 [nvme] [ 109.944010] [ 109.944010] stack backtrace: [ 109.948389] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G E 4.5.0+ #180 [ 109.955734] Hardware name: Dell Inc. OptiPlex 7010/0YXT71, BIOS A15 08/12/2013 [ 109.962989] 0000000000000000 ffff88011e203c38 ffffffff81383d9c ffffffff81c13540 [ 109.970478] ffffffff826711d0 ffff88011e203ce8 ffffffff810bb429 0000000000000046 [ 109.977964] 0000000000000046 0000000000000000 0000000000b2e597 ffffffff81f4cb00 [ 109.985453] Call Trace: [ 109.987911] <IRQ> [<ffffffff81383d9c>] dump_stack+0x85/0xc9 [ 109.993711] [<ffffffff810bb429>] __lock_acquire+0x19b9/0x1c60 [ 109.999575] [<ffffffff810b6d1d>] ? trace_hardirqs_off+0xd/0x10 [ 110.005524] [<ffffffff810b386d>] ? complete+0x3d/0x50 [ 110.010688] [<ffffffff810bb760>] lock_acquire+0x90/0xf0 [ 110.016029] [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme] [ 110.022418] [<ffffffff81772afb>] _raw_spin_lock_irqsave+0x4b/0x60 [ 110.028632] [<ffffffffc0820bc6>] ? nvme_del_cq_end+0x26/0x70 [nvme] [ 110.035019] [<ffffffffc0820bc6>] nvme_del_cq_end+0x26/0x70 [nvme] [ 110.041232] [<ffffffff8135b485>] blk_mq_end_request+0x35/0x60 [ 110.047095] [<ffffffffc0821ad8>] nvme_complete_rq+0x68/0x190 [nvme] [ 110.053481] [<ffffffff8135b53f>] __blk_mq_complete_request+0x8f/0x130 [ 110.060043] [<ffffffff8135b611>] blk_mq_complete_request+0x31/0x40 [ 110.066343] [<ffffffffc08209e3>] __nvme_process_cq+0x83/0x240 [nvme] [ 110.072818] [<ffffffffc0820c35>] nvme_irq+0x25/0x50 [nvme] [ 110.078419] [<ffffffff810cdb66>] handle_irq_event_percpu+0x36/0x110 [ 110.084804] [<ffffffff810cdc77>] handle_irq_event+0x37/0x60 [ 110.090491] [<ffffffff810d0ea3>] handle_edge_irq+0x93/0x150 [ 110.096180] [<ffffffff81012306>] handle_irq+0xa6/0x130 [ 110.101431] [<ffffffff81011abe>] do_IRQ+0x5e/0x120 [ 110.106333] [<ffffffff8177384c>] common_interrupt+0x8c/0x8c Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 13:44:00 -06:00
Keith Busch	788e15abbb	NVMe: Always use MSI/MSI-x interrupts Multiple users have reported device initialization failure due the driver not receiving legacy PCI interrupts. This is not unique to any particular controller, but has been observed on multiple platforms. There have been no issues reported or observed when with message signaled interrupts, so this patch attempts to use MSI-x during initialization, falling back to MSI. If that fails, legacy would become the default. The setup_io_queues error handling had to change as a result: the admin queue's msix_entry used to be initialized to the legacy IRQ. The case where nr_io_queues is 0 would fail request_irq when setting up the admin queue's interrupt since re-enabling MSI-x fails with 0 vectors, leaving the admin queue's msix_entry invalid. Instead, return success immediately. Reported-by: Tim Muhlemmer <muhlemmer@gmail.com> Reported-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-12 13:44:00 -06:00
Keith Busch	9bf2b972af	NVMe: Fix reset/remove race This fixes a scenario where device is present and being reset, but a request to unbind the driver occurs. A previous patch series addressing a device failure removal scenario flushed reset_work after controller disable to unblock reset_work waiting on a completion that wouldn't occur. This isn't safe as-is. The broken scenario can potentially be induced with: modprobe nvme && modprobe -r nvme To fix, the reset work is flushed immediately after setting the controller removing flag, and any subsequent reset will not proceed with controller initialization if the flag is set. The controller status must be polled while active, so the watchdog timer is also left active until the controller is disabled to cleanup requests that may be stuck during namespace removal. [Fixes: `ff23a2a15a`] Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-04-11 10:00:04 -06:00
Marta Rybczynska	d783e0bd02	nvme: avoid cqe corruption when update at the same time as read Make sure the CQE phase (validity) is read before the rest of the structure. The phase bit is the highest address and the CQE read will happen on most platforms from lower to upper addresses and will be done by multiple non-atomic loads. If the structure is updated by PCI during the reads from the processor, the processor may get a corrupted copy. The addition of the new nvme_cqe_valid function that verifies the validity bit also allows refactoring of the other CQE read sequences. Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-22 10:27:29 -06:00
Linus Torvalds	237045fc3c	Merge branch 'for-4.6/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "This is the block driver pull request for this merge window. It sits on top of for-4.6/core, that was just sent out. This contains: - A set of fixes for lightnvm. One from Alan, fixing an overflow, and the rest from the usual suspects, Javier and Matias. - A set of fixes for nbd from Markus and Dan, and a fixup from Arnd for correct usage of the signed 64-bit divider. - A set of bug fixes for the Micron mtip32xx, from Asai. - A fix for the brd discard handling from Bart. - Update the maintainers entry for cciss, since that hardware has transferred ownership. - Three bug fixes for bcache from Eric Wheeler. - Set of fixes for xen-blk{back,front} from Jan and Konrad. - Removal of the cpqarray driver. It has been disabled in Kconfig since 2013, and we were initially scheduled to remove it in 3.15. - Various updates and fixes for NVMe, with the most important being: - Removal of the per-device NVMe thread, replacing that with a watchdog timer instead. From Christoph. - Exposing the namespace WWID through sysfs, from Keith. - Set of cleanups from Ming Lin. - Logging the controller device name instead of the underlying PCI device name, from Sagi. - And a bunch of fixes and optimizations from the usual suspects in this area" * 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits) NVMe: Expose ns wwid through single sysfs entry drivers:block: cpqarray clean up brd: Fix discard request processing cpqarray: remove it from the kernel cciss: update MAINTAINERS NVMe: Remove unused sq_head read in completion path bcache: fix cache_set_flush() NULL pointer dereference on OOM bcache: cleaned up error handling around register_cache() bcache: fix race of writeback thread starting before complete initialization NVMe: Create discard zero quirk white list nbd: use correct div_s64 helper mtip32xx: remove unneeded variable in mtip_cmd_timeout() lightnvm: generalize rrpc ppa calculations lightnvm: remove struct nvm_dev->total_blocks lightnvm: rename ->nr_pages to ->nr_sects lightnvm: update closed list outside of intr context xen/blback: Fit the important information of the thread in 17 characters lightnvm: fold get bb tbl when using dual/quad plane mode lightnvm: fix up nonsensical configure overrun checking xen-blkback: advertise indirect segment support earlier ...	2016-03-18 17:13:31 -07:00
Jon Derrick	48c7823f42	NVMe: Remove unused sq_head read in completion path Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-08 15:01:06 -07:00
Keith Busch	08095e7078	NVMe: Create discard zero quirk white list The NVMe specification does not require discarded blocks return zeroes on read, but provides that behavior as a possibility. Some applications more efficiently use an SSD if reads on discarded blocks were deterministically zero, based on the "discard_zeroes_data" queue attribute. There is no specification defined way to determine device behavior on discarded blocks, so the driver always left the queue setting disabled. We can only know behavior based on individual device models, so this patch adds a flag to the NVMe "quirk" list that vendors may set if they know their controller works that way. The patch also sets the new flag for one such known device. Signed-off-by: Keith Busch <keith.busch@intel.com> Suggested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-08 08:32:40 -07:00
Keith Busch	69d9a99c25	NVMe: Move error handling to failed reset handler This moves failed queue handling out of the namespace removal path and into the reset failure path, fixing a hanging condition if the controller fails or link down during del_gendisk. Previously the driver had to see the controller as degraded prior to calling del_gendisk to setup the queues to fail. But, if the controller happened to fail after this, there was no task to end outstanding requests. On failure, all namespace states are set to dead. This has capacity revalidate to 0, and ends all new requests with error status. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 14:42:50 -07:00
Keith Busch	f58944e265	NVMe: Simplify device reset failure A reset failure schedules the device to unbind from the driver through the pci driver's remove. This cleans up all intialization, so there is no need to duplicate the potentially racy cleanup. To help understand why a reset failed, the status is logged with the existing warning message. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 14:42:49 -07:00
Keith Busch	646017a612	NVMe: Fix namespace removal deadlock This patch makes nvme namespace removal lockless. It is up to the caller to ensure no active namespace scanning is occuring. To ensure no scan work occurs, the nvme pci driver adds a removing state to the controller device to avoid queueing scan work during removal. The work is flushed after setting the state, so no new scan work can be queued. The lockless removal allows the driver to cleanup a namespace request_queue if the controller fails during removal. Previously this could deadlock trying to acquire the namespace mutex in order to handle such events. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 14:42:49 -07:00
Keith Busch	b00a726a9f	NVMe: Don't unmap controller registers on reset Unmapping the registers on reset or shutdown is not necessary. Keeping the mapping simplifies reset handling. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-03-03 14:42:49 -07:00
Christoph Hellwig	1cb3cce5eb	nvme: return the whole CQE through the request passthrough interface Both LighNVM and NVMe over Fabrics need to look at more than just the status and result field. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Matias Bj?rling <m@bjorling.me> Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-29 08:47:17 -07:00
Christoph Hellwig	2d55cd5f51	nvme: replace the kthread with a per-device watchdog timer The only work left in the kthread is the periodic health check for each controller. There is no need to run this from process context or keep a thread context around for it, so replace it with a simpler timer. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-29 08:47:16 -07:00
Christoph Hellwig	79f2b358c9	nvme: don't poll the CQ from the kthread There is no reason to do unconditional polling of CQs per the NVMe spec. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-29 08:47:14 -07:00
Christoph Hellwig	9396dec916	nvme: use a work item to submit async event requests Use a dedicated work item to submit async event requests instead of the global kthread. This simplifies the code and reduces the latencies to resubmit a request once an even notification happened. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-29 08:47:13 -07:00
Keith Busch	f8e68a7c9a	NVMe: Rate limit nvme IO warnings We don't need to spam the kernel logs with thousands of IO cancelling messages. We can infer all IO's are being cancelled with fewer, or even none at all. This patch rate limits the message and uses the debug log level as it is mainly used for testing purposes. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-12 08:10:31 -07:00
Keith Busch	ff23a2a15a	NVMe: Poll device while still active during remove A device failure or link down wouldn't have been detected during namespace removal. This patch keeps the device in the list for polling so that the thread may see such failure and initiate a reset. The device is removed from the list after disable, so we can safely flush the reset work as it can't be requeued when disable completes. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-12 08:10:16 -07:00
Keith Busch	ae1fba2001	NVMe: Requeue requests on suspended queues It's possible a request may get to the driver after the nvme queue was disabled. This has the request requeue if that happens. Note the request is still "started" by the driver, but requeuing will clear the start state for timeout handling. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-12 08:10:08 -07:00
Ming Lin	576d55d625	nvme: split pci module out of core module NVMe over Fabrics drivers are going to reuse the core, so splits nvme.ko into 2 modules: nvme-core.ko: the core part nvme.ko: the PCI driver Export symbols from nvme-core.ko. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-10 14:22:38 -07:00
Ming Lin	9f2482b91b	nvme: split dev_list_lock Split dev_list_lock into one in the core and one in the PCI driver. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-10 14:22:36 -07:00
Ming Lin	ba0ba7d3e5	nvme: move timeout variables to core.c These variables are used by PCI driver and will also be used in the forthcoming NVMe over Fabrics drivers. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-10 14:22:34 -07:00
Sagi Grimberg	e439bb12e7	nvme/host: reference the fabric module for each bdev open callout We don't want to be able to unload the fabric driver when we have openened referenced to our namespaces. Thus, for each nvme_open we take a reference on the fabric driver and put it in nvme_release. This behavior is consistent with the scsi model. This resolves the panic when unloading a fabric module with mpath holders. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ian Bakshan <ianb@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-10 14:22:32 -07:00
Sagi Grimberg	1b3c47c182	nvme: Log the ctrl device name instead of the underlying pci device name Having the ctrl name "nvmeX" seems much more friendly than the underlying device name. Also, with other nvme transports such as the soon to come nvme-loop we don't have an underlying device so it doesn't makes sense to make up one. In order to help matching an instance name to a pci function, we add a info print in nvme_probe. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Keith Busch <keith.busch@intel.com> Manually fixed up the hunk in nvme_cancel_queue_ios(). Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-10 08:51:15 -07:00
Keith Busch	949928c1c7	NVMe: Fix possible queue use after freed This notifies blk-mq when the tag set contains a different number of queues prior to freeing unused ones that the request queue points to. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-02-09 12:42:35 -07:00
Linus Torvalds	3e1e21c7bf	Merge branch 'for-4.5/nvme' of git://git.kernel.dk/linux-block Pull NVMe updates from Jens Axboe: "Last branch for this series is the nvme changes. It's in a separate branch to avoid splitting too much between core and NVMe changes, since NVMe is still helping drive some blk-mq changes. That said, not a huge amount of core changes in here. The grunt of the work is the continued split of the code" * 'for-4.5/nvme' of git://git.kernel.dk/linux-block: (67 commits) uapi: update install list after nvme.h rename NVMe: Export NVMe attributes to sysfs group NVMe: Shutdown controller only for power-off NVMe: IO queue deletion re-write NVMe: Remove queue freezing on resets NVMe: Use a retryable error code on reset NVMe: Fix admin queue ring wrap nvme: make SG_IO support optional nvme: fixes for NVME_IOCTL_IO_CMD on the char device nvme: synchronize access to ctrl->namespaces nvme: Move nvme_freeze/unfreeze_queues to nvme core PCI/AER: include header file NVMe: Export namespace attributes to sysfs NVMe: Add pci error handlers block: remove REQ_NO_TIMEOUT flag nvme: merge iod and cmd_info nvme: meta_sg doesn't have to be an array nvme: properly free resources for cancelled command nvme: simplify completion handling nvme: special case AEN requests ...	2016-01-21 19:58:02 -08:00
Linus Torvalds	7c24d9f3b2	Merge branch 'for-4.5/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: "We don't have a lot of core changes this time around, it's mostly in drivers, which will come in a subsequent pull. The cores changes include: - blk-mq - Prep patch from Christoph, changing blk_mq_alloc_request() to take flags instead of just using gfp_t for sleep/nosleep. - Doc patch from me, clarifying the difference between legacy and blk-mq for timer usage. - Fixes from Raghavendra for memory-less numa nodes, and a reuse of CPU masks. - Cleanup from Geliang Tang, using offset_in_page() instead of open coding it. - From Ilya, rename request_queue slab to it reflects what it holds, and a fix for proper use of bdgrab/put. - A real fix for the split across stripe boundaries from Keith. We yanked a broken version of this from 4.4-rc final, this one works. - From Mike Krinkin, emit a trace message when we split. - From Wei Tang, two small cleanups, not explicitly clearing memory that is already cleared" * 'for-4.5/core' of git://git.kernel.dk/linux-block: block: use bd{grab,put}() instead of open-coding block: split bios to max possible length block: add call to split trace point blk-mq: Avoid memoryless numa node encoded in hctx numa_node blk-mq: Reuse hardware context cpumask for tags blk-mq: add a flags parameter to blk_mq_alloc_request Revert "blk-flush: Queue through IO scheduler when flush not required" block: clarify blk_add_timer() use case for blk-mq bio: use offset_in_page macro block: do not initialise statics to 0 or NULL block: do not initialise globals to 0 or NULL block: rename request_queue slab cache	2016-01-19 15:03:34 -08:00
Keith Busch	a5cdb68c2c	NVMe: Shutdown controller only for power-off We don't need to shutdown a controller for a reset. A controller in a shutdown state may take longer to become ready than one that was simply disabled. This patch has the driver shut down a controller only if the device is about to be powered off or being removed. When taking the controller down for a reset reason, the controller will be disabled instead. Function names have been updated in this patch to reflect their changed semantics. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-12 14:47:59 -07:00
Keith Busch	db3cbfff5b	NVMe: IO queue deletion re-write The nvme driver deletes IO queues asynchronously since this operation may potentially take an undesirable amount of time with a large number of queues if done serially. The driver used to manage coordinating asynchronous deletions. This patch simplifies that by leveraging the block layer rather than using kthread workers and chaining more complicated callbacks. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-12 14:47:56 -07:00
Keith Busch	25646264e1	NVMe: Remove queue freezing on resets NVMe submits all commands through the block layer now. This means we can let requests queue at the blk-mq hardware context since there is no path that bypasses this anymore so we don't need to freeze the queues anymore. The driver can simply stop the h/w queues from running during a reset instead. This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen with requeued requests. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-12 13:33:36 -07:00
Keith Busch	1d49c38c48	NVMe: Use a retryable error code on reset A negative status has the "do not retry" bit set, which makes it not retryable. Use a fake status that can potentially be retried on reset. An aborted command's status is overridden by the timeout handler so that it won't be retried, which is necessary to keep initialization from getting into a reset loop. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-12 13:33:35 -07:00
Keith Busch	e3e9d50cd6	NVMe: Fix admin queue ring wrap The tag set queue depth needs to be one less than the h/w queue depth so we don't wrap the circular buffer. This conforms to the specification defined "Full Queue" condition. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-12 13:33:33 -07:00
Sagi Grimberg	363c9aacb6	nvme: Move nvme_freeze/unfreeze_queues to nvme core Nothing pci specific about them and We'll need them exported in other transports too. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-01-12 13:30:11 -07:00
Keith Busch	b5875222de	NVMe: IO ending fixes on surprise removal This patch fixes a lost request discovered during IO + hot removal. The driver's pci removal deletes gendisks prior to shutting down the controller to allow dirty data to sync. Dirty data can not be synced on a surprise removal, though, and would potentially block indefinitely. The driver previously had marked the queue as dying in this scenario to prevent new requests from attempting, however it will still block for requests that already entered the queue. This patch fixes this by quiescing IO first, then aborting the requeued requests before deleting disks. Reported-by: Sujith Pandel <sujith_pandel@dell.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Tested-by: Sujith Pandel <sujith_pandel@dell.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 10:12:04 -07:00
Keith Busch	a0a3408ee6	NVMe: Add pci error handlers Requests enabling pcie aer support. Shuts down the controller on error detected with io frozen state prior to requesting slot reset; resumes controller after reset completes. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 10:09:06 -07:00
Christoph Hellwig	f4800d6d15	nvme: merge iod and cmd_info Merge the two per-request structures in the nvme driver into a single one. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	bf68405705	nvme: meta_sg doesn't have to be an array Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	eee417b069	nvme: properly free resources for cancelled command We need to move freeing of resources to the ->complete handler to ensure they are also freed when we cancel the command. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	aae239e191	nvme: simplify completion handling Now that all commands are executed as block layer requests we can remove the internal completion in the NVMe driver. Note that we can simply call blk_mq_complete_request to abort commands as the block layer will protect against double copletions internally. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	adf68f21c1	nvme: special case AEN requests AEN requests are different from other requests in that they don't time out or can easily be cancelled. Because of that we should not use the blk-mq infrastructure but just special case them in the completion path. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	e7a2a87d59	nvme: switch abort to blk_execute_rq_nowait And remove the now unused nvme_submit_cmd helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	d8f32166a9	nvme: switch delete SQ/CQ to blk_execute_rq_nowait Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	7688faa6dd	nvme: factor out a few helpers from req_completion We'll need them in other places later. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Christoph Hellwig	4680072003	nvme: fix admin queue depth The number in tag_set->queue depth includes the reserved tags. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:34 -07:00
Keith Busch	53029b0441	NVMe: Remove device management handles on remove We don't want to allow new references to open on a device that is removed. This ties the lifetime of these handles to the physical device's presence rather than to the open reference count. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Keith Busch	92f7a1624b	NVMe: Use unbounded work queue for all work Removes all usage of the global work queue so work can't be scheduled on two different work queues, and removes nvme's work queue singlethreadedness so controllers can be driven in parallel. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: keep the dead controller removal on the system workqueue to avoid deadlocks] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Keith Busch	540c801c65	NVMe: Implement namespace list scanning The NVMe 1.1 specification provides an identify mode to return a list of active namespaces. This is more efficient to discover which namespace identifiers are active on a controller, providing potentially significant improvement in scan time for controllers with sparesly populated namespaces. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: add quirk for the broken Qemu Identify implementation. To be relaxed later] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Christoph Hellwig	6bf25d1641	nvme: switch abort_limit to an atomic_t There is no lock to sychronize access to the abort_limit field of struct nvme_ctrl, so switch it to an atomic_t. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Christoph Hellwig	5c8809e650	nvme: remove dead controllers from a work item Compared to the kthread this gives us multiple call prevention for free. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Christoph Hellwig	fd634f4142	nvme: merge probe_work and reset_work If we're using two work queues we're always going to run into races where one item is tearing down what the other one is initializing. So insted merge the two work queues, and let the old probe_work also tear the controller down first if it was alive. Together with the better detection of the probe path using a flag this gives us a properly serialized reset/probe path that also doesn't accidentally trigger when two commands time out and the second one tries to reset the controller while the first reset is still in progress. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Keith Busch	e1569a1618	nvme: do not restart the request timeout if we're resetting the controller Otherwise we're never going to complete a command when it is restarted just after we completed all other outstanding commands in nvme_clear_queue. The controller must be disabled prior to completing a presumed lost command, do this by directly shutting down the controller before queueing the reset work, and return EH_HANDLED from the timeout handler after we shut the controller down. Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: split and rebase] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Christoph Hellwig	846cc05f95	nvme: simplify resets Don't delete the controller from dev_list before queuing a reset, instead just check for it being reset in the polling kthread. This allows to remove the dev_list_lock in various places, and in addition we can simply rely on checking the queue_work return value to see if we could reset a controller. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Christoph Hellwig	297465c873	nvme: add NVME_SC_CANCELLED To properly document how we are using a negative Linux error value to communicate request cancellations inside the driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Christoph Hellwig	31c7c7d2c9	nvme: merge nvme_abort_req and nvme_timeout We want to be able to return bettern error values frmo nvme_timeout, which is significantly easier if the two functions are merged. Also clean up and reduce the printk spew so that we only get one message per abort. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:33 -07:00
Christoph Hellwig	4c9f748f0e	nvme: don't take the I/O queue q_lock in nvme_timeout There is nothing it protects, but it makes lockdep unhappy in many different ways. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:32 -07:00
Keith Busch	77bf25ea70	nvme: protect against simultaneous shutdown invocations Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:32 -07:00
Christoph Hellwig	7385014c07	nvme: only add a controller to dev_list after it's been fully initialized Without this we can easily get bad derferences on nvmeq->d_db when the nvme kthread tries to poll the CQs for controllers that are in half initialized state. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:23 -07:00
Christoph Hellwig	749941f236	nvme: only ignore hardware errors in nvme_create_io_queues Half initialized queues due to kernel error returns or timeout are still a good reason to give up on initializing a controller. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-22 09:38:20 -07:00
Stephan Günther	1f390c1fde	nvme: temporary fix for Apple controller reset Recent patches added basic support for the Apple NVMe controller but still cause resets and data corruption on that particular controller when a specific pattern of read/flush commands occurs. Limiting the queue depth to 2 works around that issue. This patch enforces that limit only for the Apple controller and is considered a temporary fix until we find the root source of that problem. Signed-off-by: Stephan Günther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 13:23:22 -07:00
Christoph Hellwig	9a0be7abb6	nvme: refactor set_queue_count Split out a helper that just issues the Set Features and interprets the result which can go to common code, and document why we are ignoring non-timeout error returns in the PCIe driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:40 -07:00
Christoph Hellwig	f3ca80fc11	nvme: move chardev and sysfs interface to common code For this we need to add a proper controller init routine and a list of all controllers that is in addition to the list of PCIe controllers, which stays in pci.c. Note that we remove the sysfs device when the last reference to a controller is dropped now - the old code would have kept it around longer, which doesn't make much sense. This requires a new ->reset_ctrl operation to implement controleller resets, and a new ->write_reg32 operation that is required to implement subsystem resets. We also now store caches copied of the NVMe compliance version and the flag if a controller is attached to a subsystem or not in the generic controller structure now. Signed-off-by: Christoph Hellwig <hch@lst.de> [Fixes for pr merge] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:40 -07:00
Christoph Hellwig	5bae7f73d3	nvme: move namespace scanning to common code The namespace scanning code has been mostly generic already, we just need to store a pointer to the tagset in the nvme_ctrl structure, and add a method to check if a controller is I/O incapable. The latter will hopefully be replaced by a proper controller state machine soon. Signed-off-by: Christoph Hellwig <hch@lst.de> [Fixed pr conflicts] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:40 -07:00
Christoph Hellwig	ce4541f40a	nvme: move the call to nvme_init_identify earlier We want to record the identify and CAP values even if no I/O queue is available. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:40 -07:00
Christoph Hellwig	7fd8930f26	nvme: add a common helper to read Identify Controller data And add the 64-bit register read operation for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	5fd4ce1b00	nvme: move nvme_{enable,disable,shutdown}_ctrl to common code Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	1b2eb37465	nvme: move remaining CC setup into nvme_enable_ctrl Remove the calculation of all the bits written into the CC register into nvme_enable_ctrl, so that they can be moved into the core NVMe driver in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	106198edb7	nvme: add explicit quirk handling Add an enum for all workarounds not in the spec and identify the affected controllers at probe time. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	1673f1f08c	nvme: move block_device_operations and ns/ctrl freeing to common code This moves the block_device_operations over to common code mostly as-is. The only change is that the ns and ctrl refcounting got some small refcounting to have wrappers around the kref_put operations. A new free_ctrl operation is added to allow the PCI driver to free it's ressources on the final drop. Signed-off-by: Christoph Hellwig <hch@lst.de> [Moved the integrity and pr changes due to merge conflict] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Keith Busch	0b7f1f26f9	nvme: use the block layer for userspace passthrough metadata Use the integrity API to pass through metadata from userspace. For PI enabled devices this means that we now validate the reftag, which seems like an unintentional ommission in the old code. Thanks to Keith Busch for testing and fixes. Signed-off-by: Christoph Hellwig <hch@lst.de> [Skip metadata setup on admin commands] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	4160982e75	nvme: split __nvme_submit_sync_cmd Add a separate nvme_submit_user_cmd for commands that directly DMA to or from userspace. We'll add metadata support to that soon and the common version would become too messy. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	22944e9981	nvme: move nvme_setup_flush and nvme_setup_rw to common code And mark them inline so that we don't slow down the I/O submission path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	15a190f7f5	nvme: move nvme_error_status to common code And mark it inline so that we don't slow down the completion path by having to turn it into a forced out of line call. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	d4f6c3aba5	nvme: factor out a nvme_unmap_data helper This is the counter part to nvme_map_data. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	ba1ca37ea4	nvme: refactor nvme_queue_rq This "backports" the structure I've used for the fabrics driver. It mostly started out as a cleanup so that I could actually understand the code, but I think it also qualifies as a micro-optimization due to the reduced time we hold q_lock and disable interrupts. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	69d2b57174	nvme: simplify nvme_setup_prps calling convention Pass back a true/false value instead of the length which needs a compare with the bytes in the request and drop the pointless gfp_t argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:39 -07:00
Christoph Hellwig	1c63dc6658	nvme: split a new struct nvme_ctrl out of struct nvme_dev The new struct nvme_ctrl will be used by the common NVMe code that sits on top of struct request_queue and the new nvme_ctrl_ops abstraction. It only contains the bare minimum required, which consists of values sampled during controller probe, the admin queue pointer and a second struct device pointer at the moment, but more will follow later. Only values that are not used in the I/O fast path should be moved to struct nvme_ctrl so that drivers can optimize their cache line usage easily. That's also the reason why we have two device pointers as the struct device is used for DMA mapping purposes. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:38 -07:00
Christoph Hellwig	7a67cbea65	nvme: use offset instead of a struct for registers This makes life easier for future non-PCI drivers where access to the registers might be more complicated. Note that Linux drivers are pretty evenly split between the two versions, and in fact the NVMe driver already uses offsets for the doorbells. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> [Fixed CMBSZ offset] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:38 -07:00
Christoph Hellwig	21d34711e1	nvme: split command submission helpers out of pci.c Create a new core.c and start by adding the command submission helpers to it, which are already abstracted away from the actual hardware queues by the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:38 -07:00
Christoph Hellwig	71bd150c71	nvme: move struct nvme_iod to pci.c This structure is specific to the PCIe driver internals and should be moved to pci.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:59:38 -07:00
Christoph Hellwig	6f3b0e8bcf	blk-mq: add a flags parameter to blk_mq_alloc_request We already have the reserved flag, and a nowait flag awkwardly encoded as a gfp_t. Add a real flags argument to make the scheme more extensible and allow for a nicer calling convention. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-12-01 10:53:59 -07:00
Christoph Hellwig	bf508e910b	nvme: add missing unmaps in nvme_queue_rq When we fail various metadata related operations in nvme_queue_rq we need to unmap the data SGL. Cc: stable@vger.kernel.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-24 15:24:05 -07:00
Nishanth Aravamudan	c5c9f25b98	NVMe: default to 4k device page size We received a bug report recently when DDW (64-bit direct DMA on Power) is not enabled for NVMe devices. In that case, we fall back to 32-bit DMA via the IOMMU, which is always done via 4K TCEs (Translation Control Entries). The NVMe device driver, though, assumes that the DMA alignment for the PRP entries will match the device's page size, and that the DMA aligment matches the kernel's page aligment. On Power, the the IOMMU page size, as mentioned above, can be 4K, while the device can have a page size of 8K, while the kernel has a page size of 64K. This eventually trips the BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple of 4K but not 8K (e.g., 0xF000). In this particular case of page sizes, we clearly want to use the IOMMU's page size in the driver. And generally, the NVMe driver in this function should be using the IOMMU's page size for the default device page size, rather than the kernel's page size. There is not currently an API to obtain the IOMMU's page size across all architectures and in the interest of a stop-gap fix to this functional issue, default the NVMe device page size to 4K, with the intent of adding such an API and implementation across all architectures in the next merge window. With the functionally equivalent v3 of this patch, our hardware test exerciser survives when using 32-bit DMA; without the patch, the kernel will BUG within a few minutes. Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-24 15:05:51 -07:00
Keith Busch	604e8c8da8	NVMe: reap completion entries when deleting queue Make sure that there are no unprocesssed entries on a completion queue before deleting it, and check for validity of the CQ door bell before writing completions to it. This fixes problems with doing a sysfs reset of the device while it's handling IO. Tested-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-20 08:38:13 -07:00
Keith Busch	6824c5ef5e	NVMe: Fix possible arithmetic overflow for max segments Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-19 13:02:24 -07:00
Stephan Günther	c74dc7801d	NVMe: add support for Apple NVMe controller Add PCI ID of Apple's NVMe controller. Signed-off-by: Stephan Guenther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-11 09:36:57 -07:00
Stephan Günther	a310acd7a7	NVMe: use split lo_hi_{read,write}q Some controllers may require ordered split transfers even on 64bit machines, e.g. Apple's NVMe controller as found in the MacBook8,1 and MacBookAir7,1 (256/512GB models). This patch enforces ordered split transfers on 64bit platforms, which works around that issue for all controllers. As pointed out by Christoph [1] there should be no performance impact due to that modification. [1] http://lists.infradead.org/pipermail/linux-nvme/2015-November/002965.html Signed-off-by: Stephan Guenther <guenther@tum.de> Signed-off-by: Maurice Leclaire <leclaire@in.tum.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Updated by me to explicitly use lo_hi_read/writeq instead of playing define tricks. Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-11 09:36:56 -07:00
Sathyavathi M	b12363d0a5	NVMe: Increase the max transfer size when mdts is 0 This patch address the issue when IO with 128KB from FIO is split into two parts, 124KB and 4KB, due to max transfer size(127KB). This degrades the device performance. Signed-off-by: Sathyavathi M <sathya.m@samsung.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-11 09:36:56 -07:00
Linus Torvalds	3419b45039	Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block Pull block IO poll support from Jens Axboe: "Various groups have been doing experimentation around IO polling for (really) fast devices. The code has been reviewed and has been sitting on the side for a few releases, but this is now good enough for coordinated benchmarking and further experimentation. Currently O_DIRECT sync read/write are supported. A framework is in the works that allows scalable stats tracking so we can auto-tune this. And we'll add libaio support as well soon. Fow now, it's an opt-in feature for test purposes" * 'for-4.4/io-poll' of git://git.kernel.dk/linux-block: direct-io: be sure to assign dio->bio_bdev for both paths directio: add block polling support NVMe: add blk polling support block: add block polling support blk-mq: return tag/queue combo in the make_request_fn handlers block: change ->make_request_fn() and users to return a queue cookie	2015-11-10 17:23:49 -08:00
Linus Torvalds	ad804a0b2a	Merge branch 'akpm' (patches from Andrew) Merge second patch-bomb from Andrew Morton: - most of the rest of MM - procfs - lib/ updates - printk updates - bitops infrastructure tweaks - checkpatch updates - nilfs2 update - signals - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc, dma-debug, dma-mapping, ... * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (102 commits) ipc,msg: drop dst nil validation in copy_msg include/linux/zutil.h: fix usage example of zlib_adler32() panic: release stale console lock to always get the logbuf printed out dma-debug: check nents in dma_sync_sg* dma-mapping: tidy up dma_parms default handling pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode kexec: use file name as the output message prefix fs, seqfile: always allow oom killer seq_file: reuse string_escape_str() fs/seq_file: use seq_* helpers in seq_hex_dump() coredump: change zap_threads() and zap_process() to use for_each_thread() coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT) signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread() signal: turn dequeue_signal_lock() into kernel_dequeue_signal() signals: kill block_all_signals() and unblock_all_signals() nilfs2: fix gcc uninitialized-variable warnings in powerpc build nilfs2: fix gcc unused-but-set-variable warnings MAINTAINERS: nilfs2: add header file for tracing nilfs2: add tracepoints for analyzing reading and writing metadata files ...	2015-11-07 14:32:45 -08:00
Jens Axboe	a0fa9647a5	NVMe: add blk polling support Add nvme_poll(), which will check a specific completion queue for command completions. Wire that up to the new block layer poll mechanism. Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com>	2015-11-07 10:40:47 -07:00
Mel Gorman	71baba4b92	mm, page_alloc: rename __GFP_WAIT to __GFP_RECLAIM __GFP_WAIT was used to signal that the caller was in atomic context and could not sleep. Now it is possible to distinguish between true atomic context and callers that are not willing to sleep. The latter should clear __GFP_DIRECT_RECLAIM so kswapd will still wake. As clearing __GFP_WAIT behaves differently, there is a risk that people will clear the wrong flags. This patch renames __GFP_WAIT to __GFP_RECLAIM to clearly indicate what it does -- setting it allows all reclaim activity, clearing them prevents it. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Vitaly Wool <vitalywool@gmail.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2015-11-06 17:50:42 -08:00
Linus Torvalds	9cf5c095b6	asm-generic cleanups The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIVAwUAVjzaf2CrR//JCVInAQImmhAA20fZ91sUlnA5skKNPT1phhF6Z7UF2Sx5 nPKcHQD3HA3lT1OKfPBYvCo+loYflvXFLaQThVylVcnE/8ecAEMtft4nnGW2nXvh sZqHIZ8fszTB53cynAZKTjdobD1wu33Rq7XRzg0ugn1mdxFkOzCHW/xDRvWRR5TL rdQjzzgvn2PNlqFfHlh6cZ5ykShM36AIKs3WGA0H0Y/aYsE9GmDOAUp41q1mLXnA 4lKQaIxoeOa+kmlsUB0wEHUecWWWJH4GAP+CtdKzTX9v12bGNhmiKUMCETG78BT3 uL8irSqaViNwSAS9tBxSpqvmVUsa5aCA5M3MYiO+fH9ifd7wbR65g/wq39D3Pc01 KnZ3BNVRW5XSA3c86pr8vbg/HOynUXK8TN0lzt6rEk8bjoPBNEDy5YWzy0t6reVe wX65F+ver8upjOKe9yl2Jsg+5Kcmy79GyYjLUY3TU2mZ+dIdScy/jIWatXe/OTKZ iB4Ctc4MDe9GDECmlPOWf98AXqsBUuKQiWKCN/OPxLtFOeWBvi4IzvFuO8QvnL9p jZcRDmIlIWAcDX/2wMnLjV+Hqi3EeReIrYznxTGnO7HHVInF555GP51vFaG5k+SN smJQAB0/sostmC1OCCqBKq5b6/li95/No7+0v0SUhJJ5o76AR5CcNsnolXesw1fu vTUkB/I66Hk= =dQKG -----END PGP SIGNATURE----- Merge tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic cleanups from Arnd Bergmann: "The asm-generic changes for 4.4 are mostly a series from Christoph Hellwig to clean up various abuses of headers in there. The patch to rename the io-64-nonatomic-.h headers caused some conflicts with new users, so I added a workaround that we can remove in the next merge window. The only other patch is a warning fix from Marek Vasut" * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: asm-generic: temporarily add back asm-generic/io-64-nonatomic.h asm-generic: cmpxchg: avoid warnings from macro-ized cmpxchg() implementations gpio-mxc: stop including <asm-generic/bug> n_tracesink: stop including <asm-generic/bug> n_tracerouter: stop including <asm-generic/bug> mlx5: stop including <asm-generic/kmap_types.h> hifn_795x: stop including <asm-generic/kmap_types.h> drbd: stop including <asm-generic/kmap_types.h> move count_zeroes.h out of asm-generic move io-64-nonatomic.h out of asm-generic	2015-11-06 14:22:15 -08:00
Linus Torvalds	ccf21b69a8	Merge branch 'for-4.4/reservations' of git://git.kernel.dk/linux-block Pull block reservation support from Jens Axboe: "This adds support for persistent reservations, both at the core level, as well as for sd and NVMe" [ Background from the docs: "Persistent Reservations allow restricting access to block devices to specific initiators in a shared storage setup. All implementations are expected to ensure the reservations survive a power loss and cover all connections in a multi path environment" ] * 'for-4.4/reservations' of git://git.kernel.dk/linux-block: NVMe: Precedence error in nvme_pr_clear() nvme: add missing endianess annotations in nvme_pr_command NVMe: Add persistent reservation ops sd: implement the Persistent Reservation API block: add an API for Persistent Reservations block: cleanup blkdev_ioctl	2015-11-04 21:01:27 -08:00
Linus Torvalds	527d1529e3	Merge branch 'for-4.4/integrity' of git://git.kernel.dk/linux-block Pull block integrity updates from Jens Axboe: ""This is the joint work of Dan and Martin, cleaning up and improving the support for block data integrity" * 'for-4.4/integrity' of git://git.kernel.dk/linux-block: block, libnvdimm, nvme: provide a built-in blk_integrity nop profile block: blk_flush_integrity() for bio-based drivers block: move blk_integrity to request_queue block: generic request_queue reference counting nvme: suspend i/o during runtime blk_integrity_unregister md: suspend i/o during runtime blk_integrity_unregister md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown block: Inline blk_integrity in struct gendisk block: Export integrity data interval size in sysfs block: Reduce the size of struct blk_integrity block: Consolidate static integrity profile properties block: Move integrity kobject to struct gendisk	2015-11-04 20:51:48 -08:00
Linus Torvalds	effa04cc5a	Merge branch 'for-4.4/lightnvm' of git://git.kernel.dk/linux-block Pull lightnvm support from Jens Axboe: "This adds support for lightnvm, and adds support to NVMe as well. This is pretty exciting, in that it enables new and interesting use cases for compatible flash devices. There's a LWN writeup about an earlier posting here: https://lwn.net/Articles/641247/ This has been underway for a while, and should be ready for merging at this point" * 'for-4.4/lightnvm' of git://git.kernel.dk/linux-block: nvme: lightnvm: clean up a data type lightnvm: refactor phys addrs type to u64 nvme: LightNVM support rrpc: Round-robin sector target with cost-based gc gennvm: Generic NVM manager lightnvm: Support for Open-Channel SSDs	2015-11-04 20:46:08 -08:00
Linus Torvalds	a9aa31cdc2	Merge branch 'for-4.4/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "Here are the block driver changes for 4.4. This pull request contains: - NVMe: - Refactor and moving of code to prepare for proper target support. From Christoph and Jay. - 32-bit nvme warning fix from Arnd. - Error initialization fix from me. - Proper namespace removal and reference counting support from Keith. - Device resume fix on IO failure, also from Keith. - Dependency fix from Keith, now that nvme isn't under the umbrella of the block anymore. - Target location and maintainers update from Jay. - From Ming Lei, the long awaited DIO/AIO support for loop. - Enable BD-RE writeable opens, from Georgios" * 'for-4.4/drivers' of git://git.kernel.dk/linux-block: (24 commits) Update target repo for nvme patch contributions NVMe: initialize error to '0' nvme: use an integer value to Linux errno values nvme: fix 32-bit build warning NVMe: Add explicit block config dependency nvme: include <linux/types.ĥ> in <linux/nvme.h> nvme: move to a new drivers/nvme/host directory nvme.h: add missing nvme_id_ctrl endianess annotations nvme: move hardware structures out of the uapi version of nvme.h nvme: add a local nvme.h header nvme: properly handle partially initialized queues in nvme_create_io_queues nvme: merge nvme_dev_start, nvme_dev_resume and nvme_async_probe nvme: factor reset code into a common helper nvme: merge nvme_dev_reset into nvme_reset_failed_dev nvme: delete dev from dev_list in nvme_reset NVMe: Simplify device resume on io queue failure NVMe: Namespace removal simplifications NVMe: Reference count open namespaces cdrom: Random writing support for BD-RE media block: loop: support DIO & AIO ...	2015-11-04 20:37:27 -08:00
Dan Carpenter	73fcf4e20e	NVMe: Precedence error in nvme_pr_clear() The original code is equivalent to: u32 cdw10 = (1 \| key) ? 1 << 3 : 0; But we want: u32 cdw10 = 1 \| (key ? 1 << 3 : 0); Fixes: 1d277a637a71: ('NVMe: Add persistent reservation ops') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-11-03 12:59:03 -07:00
Matias Bjørling	ca0640850e	nvme: LightNVM support The first generation of Open-Channel SSDs is based on NVMe. The NVMe driver is extended with support for the LightNVM command set. Detection is made through PCI IDs. Current supported devices are the qemu nvme simulator and CNEX Labs Westlake SSD. The qemu nvme enables support through vendor specific bits in the namespace identification and the CNEX Labs Westlake SSD implements a LightNVM compatible firmware and is detected using the same method as qemu. After detection, vendor specific codes are used to identify the device and enumerate supported features. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Javier González <jg@lightnvm.io> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-29 17:57:29 +09:00
Christoph Hellwig	a6dd1020d8	nvme: add missing endianess annotations in nvme_pr_command Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: ad4fd3610c27 ("NVMe: Add persistent reservation ops") Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-22 09:15:48 -06:00
Keith Busch	1d277a637a	NVMe: Add persistent reservation ops Signed-off-by: Keith Busch <keith.busch@intel.com> [hch: rebased, set PTPL=1] Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-22 09:15:48 -06:00
Dan Williams	4125a09b0a	block, libnvdimm, nvme: provide a built-in blk_integrity nop profile The libnvidmm-btt and nvme drivers use blk_integrity to reserve space for per-sector metadata, but sometimes without protection checksums. This property is generically useful, so teach the block core to internally specify a nop profile if one is not provided at registration time. Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Suggested-by: Christoph Hellwig <hch@lst.de> [hch: kill the local nvme nop profile as well] Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-21 14:43:45 -06:00
Dan Williams	ac6fc48c9f	block: move blk_integrity to request_queue A trace like the following proceeds a crash in bio_integrity_process() when it goes to use an already freed blk_integrity profile. BUG: unable to handle kernel paging request at ffff8800d31b10d8 IP: [<ffff8800d31b10d8>] 0xffff8800d31b10d8 PGD 2f65067 PUD 21fffd067 PMD 80000000d30001e3 Oops: 0011 [#1] SMP Dumping ftrace buffer: --------------------------------- ndctl-2222 2.... 44526245us : disk_release: pmem1s systemd--2223 4.... 44573945us : bio_integrity_endio: pmem1s <...>-409 4.... 44574005us : bio_integrity_process: pmem1s --------------------------------- [..] Call Trace: [<ffffffff8144e0f9>] ? bio_integrity_process+0x159/0x2d0 [<ffffffff8144e4f6>] bio_integrity_verify_fn+0x36/0x60 [<ffffffff810bd2dc>] process_one_work+0x1cc/0x4e0 Given that a request_queue is pinned while i/o is in flight and that a gendisk is allowed to have a shorter lifetime, move blk_integrity to request_queue to satisfy requests arriving after the gendisk has been torn down. Cc: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> [martin: fix the CONFIG_BLK_DEV_INTEGRITY=n case] Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-21 14:43:42 -06:00
Dan Williams	4cfc766e07	nvme: suspend i/o during runtime blk_integrity_unregister Synchronize pending i/o against a change in the integrity profile to avoid the possibility of spurious integrity errors. Cc: Matthew Wilcox <willy@linux.intel.com> Acked-by: Keith Busch <keith.busch@intel.com> [keith: also protect dynamic integrity registration] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-21 14:43:39 -06:00
Dan Williams	9609b9942b	md, dm, scsi, nvme, libnvdimm: drop blk_integrity_unregister() at shutdown Now that the integrity profile is statically allocated there is no work to do when shutting down an integrity enabled block device. Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: James Bottomley <JBottomley@Odin.com> Acked-by: NeilBrown <neilb@suse.com> Acked-by: Keith Busch <keith.busch@intel.com> Acked-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-21 14:43:37 -06:00
Martin K. Petersen	25520d55cd	block: Inline blk_integrity in struct gendisk Up until now the_integrity profile has been dynamically allocated and attached to struct gendisk after the disk has been made active. This causes problems because NVMe devices need to register the profile prior to the partition table being read due to a mandatory metadata buffer requirement. In addition, DM goes through hoops to deal with preallocating, but not initializing integrity profiles. Since the integrity profile is small (4 bytes + a pointer), Christoph suggested moving it to struct gendisk proper. This requires several changes: - Moving the blk_integrity definition to genhd.h. - Inlining blk_integrity in struct gendisk. - Removing the dynamic allocation code. - Adding helper functions which allow gendisk to set up and tear down the integrity sysfs dir when a disk is added/deleted. - Adding a blk_integrity_revalidate() callback for updating the stable pages bdi setting. - The calls that depend on whether a device has an integrity profile or not now key off of the bi->profile pointer. - Simplifying the integrity support routines in DM (Mike Snitzer). Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-21 14:42:42 -06:00
Martin K. Petersen	0f8087ecde	block: Consolidate static integrity profile properties We previously made a complete copy of a device's data integrity profile even though several of the fields inside the blk_integrity struct are pointers to fixed template entries in t10-pi.c. Split the static and per-device portions so that we can reference the template directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Reported-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-21 14:42:38 -06:00
Jens Axboe	ef658fc2a6	NVMe: initialize error to '0' Reported-by: Keith Busch <keith.busch@intel.com> Fixes: `1951feae88` ("nvme: use an integer value to Linux errno values") Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-15 09:49:57 -06:00
Christoph Hellwig	1951feae88	nvme: use an integer value to Linux errno values Use a separate integer variable to hold the signed Linux errno values we pass back to the block layer. Note that for pass through commands those might still be NVMe values, but those fit into the int as well. Fixes: f4829a9b7a61: ("blk-mq: fix racy updates of rq->errors") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-15 09:04:36 -06:00
Arnd Bergmann	3d42e67fe5	nvme: fix 32-bit build warning Compiling the nvme driver on 32-bit warns about a cast from a __u64 variable to a pointer: drivers/block/nvme-core.c: In function 'nvme_submit_io': drivers/block/nvme-core.c:1847:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] (void __user *)io.addr, length, NULL, 0); The cast here is intentional and safe, so we can shut up the gcc warning by adding an intermediate cast to 'uintptr_t'. I had previously submitted a patch to fix this problem in the nvme driver, but it was accepted on the same day that two new warnings got added. For clarification, I also change the third instance of this cast to use uintptr_t instead of unsigned long now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `d29ec8241c` ("nvme: submit internal commands through the block layer") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-12 13:09:15 -06:00
Jay Sternberg	57dacad5f2	nvme: move to a new drivers/nvme/host directory This patch moves the NVMe driver from drivers/block/ to its own new drivers/nvme/host/ directory. This is in preparation of splitting the current monolithic driver up and add support for the upcoming NVMe over Fabrics standard. The drivers/nvme/host/ is chose to leave space for a NVMe target implementation in addition to this host side driver. Signed-off-by: Jay Sternberg <jay.e.sternberg@intel.com> [hch: rebased, renamed core.c to pci.c, slight tweaks] Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2015-10-09 10:40:37 -06:00

... 4 5 6 7 8 ...

455 Commits