OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Sagi Grimberg	1a9460cef5	nvme-tcp: support simple polling Simple polling support via socket busy_poll interface. Although we do not shutdown interrupts but simply hammer the socket poll, we can sometimes find completions faster than the normal interrupt driven RX path. We add per queue nr_cqe counter that resets every time RX path is invoked such that .poll callback can return it to stay consistent with the semantics. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:01 -07:00
Minwoo Im	79fd751d61	nvme: tcp: selects CRYPTO_CRC32C for nvme-tcp The tcp host module is now taking those APIs from crypto ahash: (1) crypto_ahash_final() (2) crypto_ahash_digest() (3) crypto_alloc_ahash() nvme-tcp should depends on CRYPTO_CRC32C. Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Cc: Jens Axboe <axboe@fb.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:01 -07:00
Sagi Grimberg	b5b0504878	nvme: don't pass cap to nvme_disable_ctrl All seem to call it with ctrl->cap so no need to pass it at all. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:00 -07:00
Sagi Grimberg	c0f2f45be2	nvme: move sqsize setting to the core nvme_enable_ctrl reads the cap register right after, so no need to do that locally in the transport driver. Have sqsize setting in nvme_init_identify. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:00 -07:00
Sagi Grimberg	aa22c8e665	nvme-pci: set ctrl sqsize to the device q_depth Align with what the rest of the transports are doing. Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:00 -07:00
Sagi Grimberg	4fba445828	nvme: have nvme_init_identify set ctrl->cap No need to use a stack cap variable. Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:00 -07:00
Potnuri Bharat Teja	10407ec9b4	nvme-tcp: Use protocol specific operations while reading socket Using socket specific read_sock() calls instead of directly calling tcp_read_sock() helps lld module registered handlers if any, to be called from nvme-tcp host. This patch therefore replaces the tcp_read_sock() with socket specific prot_ops. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Acked-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:00 -07:00
Sagi Grimberg	6be182607d	nvme-tcp: cleanup nvme_tcp_recv_pdu Can return directly in the switch statement Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2019-08-29 12:55:00 -07:00
Hans Holmberg	48e5da7255	lightnvm: move metadata mapping to lower level driver Now that blk_rq_map_kern can map both kmem and vmem, move internal metadata mapping down to the lower level driver. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hans Holmberg <hans@owltronix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-08-06 08:20:10 -06:00
Hans Holmberg	98d87f70f4	lightnvm: remove nvm_submit_io_sync_fn Move the redundant sync handling interface and wait for a completion in the lightnvm core instead. Reviewed-by: Javier González <javier@javigon.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Hans Holmberg <hans@owltronix.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-08-06 08:20:09 -06:00
Ming Lei	a87ccce0b5	blk-mq: remove blk_mq_complete_request_sync blk_mq_tagset_wait_completed_request() has been applied for waiting for completed request's fn, so not necessary to use blk_mq_complete_request_sync() any more. Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-08-04 21:41:29 -06:00
Ming Lei	622b8b6893	nvme: wait until all completed request's complete fn is called When aborting in-flight request for recovering controller, we have to make sure that queue's complete function is called on completed request before moving on. Otherwise, for example, the warning of WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be triggered on nvme-rdma. Fix this issue by using blk_mq_tagset_wait_completed_request. Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-08-04 21:41:29 -06:00
Ming Lei	78ca407247	nvme: don't abort completed request in nvme_cancel_request Before aborting in-flight requests, all IO queues and their interrupts have been shutdown. However, request's completion function may not be done yet because it can be scheduled to run via IPI. So don't abort one request if it is marked as completed, otherwise we may abort one normal completed request. Cc: Max Gurtovoy <maxg@mellanox.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-08-04 21:41:29 -06:00
yangerkun	8fe34be14e	Revert "nvme-pci: don't create a read hctx mapping without read queues" This reverts commit `0298d54352`. With this patch, set 'poll_queues > hard queues' will lead to 'nr_read_queues = 0' in nvme_calc_irq_sets. Then poll_queues setting can fail since dev->tagset.nr_maps equals to 2 and nvme_pci_map_queues will not do map for poll queues. Signed-off-by: yangerkun <yangerkun@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-23 17:47:02 +02:00
Marta Rybczynska	66b20ac0a1	nvme: fix multipath crash when ANA is deactivated Fix a crash with multipath activated. It happends when ANA log page is larger than MDTS and because of that ANA is disabled. The driver then tries to access unallocated buffer when connecting to a nvme target. The signature is as follows: [ 300.433586] nvme nvme0: ANA log page size (8208) larger than MDTS (8192). [ 300.435387] nvme nvme0: disabling ANA support. [ 300.437835] nvme nvme0: creating 4 I/O queues. [ 300.459132] nvme nvme0: new ctrl: NQN "nqn.0.0.0", addr 10.91.0.1:8009 [ 300.464609] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 300.466342] #PF error: [normal kernel read fault] [ 300.467385] PGD 0 P4D 0 [ 300.467987] Oops: 0000 [#1] SMP PTI [ 300.468787] CPU: 3 PID: 50 Comm: kworker/u8:1 Not tainted 5.0.20kalray+ #4 [ 300.470264] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 300.471532] Workqueue: nvme-wq nvme_scan_work [nvme_core] [ 300.472724] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core] [ 300.474038] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48 [ 300.477374] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296 [ 300.478334] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000 [ 300.479784] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258 [ 300.481488] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044 [ 300.483203] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0 [ 300.484928] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0 [ 300.486626] FS: 0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000 [ 300.488538] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 300.489907] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0 [ 300.491612] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 300.493303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 300.494991] Call Trace: [ 300.495645] nvme_mpath_add_disk+0x5c/0xb0 [nvme_core] [ 300.496880] nvme_validate_ns+0x2ef/0x550 [nvme_core] [ 300.498105] ? nvme_identify_ctrl.isra.45+0x6a/0xb0 [nvme_core] [ 300.499539] nvme_scan_work+0x2b4/0x370 [nvme_core] [ 300.500717] ? __switch_to_asm+0x35/0x70 [ 300.501663] process_one_work+0x171/0x380 [ 300.502340] worker_thread+0x49/0x3f0 [ 300.503079] kthread+0xf8/0x130 [ 300.503795] ? max_active_store+0x80/0x80 [ 300.504690] ? kthread_bind+0x10/0x10 [ 300.505502] ret_from_fork+0x35/0x40 [ 300.506280] Modules linked in: nvme_tcp nvme_rdma rdma_cm iw_cm ib_cm ib_core nvme_fabrics nvme_core xt_physdev ip6table_raw ip6table_mangle ip6table_filter ip6_tables xt_comment iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_CHECKSUM iptable_mangle iptable_filter veth ebtable_filter ebtable_nat ebtables iptable_raw vxlan ip6_udp_tunnel udp_tunnel sunrpc joydev pcspkr virtio_balloon br_netfilter bridge stp llc ip_tables xfs libcrc32c ata_generic pata_acpi virtio_net virtio_console net_failover virtio_blk failover ata_piix serio_raw libata virtio_pci virtio_ring virtio [ 300.514984] CR2: 0000000000000008 [ 300.515569] ---[ end trace faa2eefad7e7f218 ]--- [ 300.516354] RIP: 0010:nvme_parse_ana_log+0x21/0x140 [nvme_core] [ 300.517330] Code: 45 01 d2 d8 48 98 c3 66 90 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 89 fb 48 83 ec 08 48 8b af 20 0a 00 00 48 89 34 24 <66> 83 7d 08 00 0f 84 c6 00 00 00 44 8b 7d 14 49 89 d5 8b 55 10 48 [ 300.520353] RSP: 0018:ffffa50e80fd7cb8 EFLAGS: 00010296 [ 300.521229] RAX: 0000000000000001 RBX: ffff9130f1872258 RCX: 0000000000000000 [ 300.522399] RDX: ffffffffc06c4c30 RSI: ffff9130edad4280 RDI: ffff9130f1872258 [ 300.523560] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000044 [ 300.524734] R10: 0000000000000220 R11: 0000000000000040 R12: ffff9130f18722c0 [ 300.525915] R13: ffff9130f18722d0 R14: ffff9130edad4280 R15: ffff9130f18722c0 [ 300.527084] FS: 0000000000000000(0000) GS:ffff9130f7b80000(0000) knlGS:0000000000000000 [ 300.528396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 300.529440] CR2: 0000000000000008 CR3: 00000002365e6000 CR4: 00000000000006e0 [ 300.530739] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 300.531989] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 300.533264] Kernel panic - not syncing: Fatal exception [ 300.534338] Kernel Offset: 0x17c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 300.536227] ---[ end Kernel panic - not syncing: Fatal exception ]--- Condition check refactoring from Christoph Hellwig. Signed-off-by: Marta Rybczynska <marta.rybczynska@kalray.eu> Tested-by: Jean-Baptiste Riaux <jbriaux@kalray.eu> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-23 17:46:57 +02:00
Logan Gunthorpe	e654dfd38c	nvme: fix memory leak caused by incorrect subsystem free When freeing the subsystem after finding another match with __nvme_find_get_subsystem(), use put_device() instead of __nvme_release_subsystem() which calls kfree() directly. Per the documentation, put_device() should always be used after device_initialization() is called. Otherwise, leaks like the one below which was detected by kmemleak may occur. Once the call of __nvme_release_subsystem() is removed it no longer makes sense to keep the helper, so fold it back into nvme_release_subsystem(). unreferenced object 0xffff8883d12bfbc0 (size 16): comm "nvme", pid 2635, jiffies 4294933602 (age 739.952s) hex dump (first 16 bytes): 6e 76 6d 65 2d 73 75 62 73 79 73 32 00 88 ff ff nvme-subsys2.... backtrace: [<000000007d8fc208>] __kmalloc_track_caller+0x16d/0x2a0 [<0000000081169e5f>] kvasprintf+0xad/0x130 [<0000000025626f25>] kvasprintf_const+0x47/0x120 [<00000000fa66ad36>] kobject_set_name_vargs+0x44/0x120 [<000000004881f8b3>] dev_set_name+0x98/0xc0 [<000000007124dae3>] nvme_init_identify+0x1995/0x38e0 [<000000009315020a>] nvme_loop_configure_admin_queue+0x4fa/0x5e0 [<000000001a63e766>] nvme_loop_create_ctrl+0x489/0xf80 [<00000000a46ecc23>] nvmf_dev_write+0x1a12/0x2220 [<000000002259b3d5>] __vfs_write+0x66/0x120 [<000000002f6df81e>] vfs_write+0x154/0x490 [<000000007e8cfc19>] ksys_write+0x10a/0x240 [<00000000ff5c7b85>] __x64_sys_write+0x73/0xb0 [<00000000fee6d692>] do_syscall_64+0xaa/0x470 [<00000000997e1ede>] entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: `ab9e00cc72` ("nvme: track subsystems") Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-23 17:46:43 +02:00
Misha Nasledov	08b903b5fd	nvme: ignore subnqn for ADATA SX6000LNP The ADATA SX6000LNP NVMe SSDs have the same subnqn and, due to this, a system with more than one of these SSDs will only have one usable. [ 0.942706] nvme nvme1: ignoring ctrl due to duplicate subnqn (nqn.2018-05.com.example:nvme:nvm-subsystem-OUI00E04C). [ 0.943017] nvme nvme1: Removing after probe failure status: -22 02:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01) 71:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. Device [10ec:5762] (rev 01) There are no firmware updates available from the vendor, unfortunately. Applying the NVME_QUIRK_IGNORE_DEV_SUBNQN quirk for these SSDs resolves the issue, and they all work after this patch: /dev/nvme0n1 2J1120050420 ADATA SX6000LNP [...] /dev/nvme1n1 2J1120050540 ADATA SX6000LNP [...] Signed-off-by: Misha Nasledov <misha@nasledov.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-23 17:46:43 +02:00
Linus Torvalds	9637d51734	for-linus-20190715 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl0s1ZEQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpiCEEACE9H/pXoegTTWIVPVajMlsa19UHIeilk4N GI7oKSiirQEMZnAOmrEzgB4/0zyYQsVypys0gZlYUD3GJVsXDT3zzjNXL5NpVg/O nqwSGWMHBSjWkLbaM40Pb2QLXsYgveptNL+9PtxrgtoYPoT5/+TyrJMFrRfi72EK WFeNDKOu6aJxpJ26JSsckJ0gluKeeEpRoEqsgHGIwaMIGHQf+b+ikk7tel5FAIgA uDwwD+Oxsdgh/ChsXL0d90GkcbcSp6GQ7GybxVmw/tPijx6mpeIY72xY3Zx+t8zF b71UNk6NmCKjOPO/6fiuYKKTYw+KhzlyEKO0j675HKfx2AhchEwKw0irp4yUlydA zxWYmz4U7iRgktJtymv3J4FEQQ3S6d1EnuQkQNX1LwiOsEsfzhkWi+7jy7KFhZoJ AqtYzqnOXvLx92q0vloj06HtK6zo+I/MINldy0+qn9lq0N0VF+dctyztAHLsF7P6 pUtS6i7l1JSFKAmMhC31sIj5TImaehM2e/TWMUPEDZaO96oKCmQwOF1oiloc6vlW h4xWsxP/9zOFcWNyPzy6Vo3JUXWRvFA7K+jV3Hsukw6rVHiNCGVYGSlTv8Roi5b7 I4ggu9R2JOGyku7UIlL50IRxEyjAp11LaO8yHhcCnRB65rmyBuNMQNcfOsfxpZ5Y 1mtSNhm5TQ== =g8xI -----END PGP SIGNATURE----- Merge tag 'for-linus-20190715' of git://git.kernel.dk/linux-block Pull more block updates from Jens Axboe: "A later pull request with some followup items. I had some vacation coming up to the merge window, so certain things items were delayed a bit. This pull request also contains fixes that came in within the last few days of the merge window, which I didn't want to push right before sending you a pull request. This contains: - NVMe pull request, mostly fixes, but also a few minor items on the feature side that were timing constrained (Christoph et al) - Report zones fixes (Damien) - Removal of dead code (Damien) - Turn on cgroup psi memstall (Josef) - block cgroup MAINTAINERS entry (Konstantin) - Flush init fix (Josef) - blk-throttle low iops timing fix (Konstantin) - nbd resize fixes (Mike) - nbd 0 blocksize crash fix (Xiubo) - block integrity error leak fix (Wenwen) - blk-cgroup writeback and priority inheritance fixes (Tejun)" * tag 'for-linus-20190715' of git://git.kernel.dk/linux-block: (42 commits) MAINTAINERS: add entry for block io cgroup null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONED block: Limit zone array allocation size sd_zbc: Fix report zones buffer allocation block: Kill gfp_t argument of blkdev_report_zones() block: Allow mapping of vmalloc-ed buffers block/bio-integrity: fix a memory leak bug nvme: fix NULL deref for fabrics options nbd: add netlink reconfigure resize support nbd: fix crash when the blksize is zero block: Disable write plugging for zoned block devices block: Fix elevator name declaration block: Remove unused definitions nvme: fix regression upon hot device removal and insertion blk-throttle: fix zero wait time for iops throttled group block: Fix potential overflow in blk_report_zones() blkcg: implement REQ_CGROUP_PUNT blkcg, writeback: Implement wbc_blkcg_css() blkcg, writeback: Add wbc->no_cgroup_owner blkcg, writeback: Rename wbc_account_io() to wbc_account_cgroup_owner() ...	2019-07-15 21:20:52 -07:00
Linus Torvalds	2a3c389a0f	5.3 Merge window RDMA pull request A smaller cycle this time. Notably we see another new driver, 'Soft iWarp', and the deletion of an ancient unused driver for nes. - Revise and simplify the signature offload RDMA MR APIs - More progress on hoisting object allocation boiler plate code out of the drivers - Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib, i40iw - Tree wide cleanups: struct_size, put_user_page, xarray, rst doc conversion - Removal of obsolete ib_ucm chardev and nes driver - netlink based discovery of chardevs and autoloading of the modules providing them - Move more of the rdamvt/hfi1 uapi to include/uapi/rdma - New driver 'siw' for software based iWarp running on top of netdev, much like rxe's software RoCE. - mlx5 feature to report events in their raw devx format to userspace - Expose per-object counters through rdma tool - Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core from netdev -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl0ozSwACgkQOG33FX4g mxqncg//Qe2zSnlbd6r3hofsc1WiHSx/CiXtT52BUGipO+cWQUwO7hGFuUHIFCuZ JBg7mc998xkyLIH85a/txd+RwAIApKgHVdd+VlrmybZeYCiERAMFpWg8cHpzrbnw l3Ln9fTtJf/NAhO0ZCGV9DCd01fs9yVQgAv21UnLJMUhp9Pzk/iMhu7C7IiSLKvz t7iFhEqPXNJdoqZ+wtWyc/463YxKUd9XNg9Z1neQdaeZrX4UjgDbY9x/ub3zOvQV jc/IL4GysJ3z8mfx5mAd6sE/jAjhcnJuaGYYATqkxiLZEP+muYwU50CNs951XhJC b/EfRQIcLg9kq/u6CP+CuWlMrRWy3U7yj3/mrbbGhlGq88Yt6FGqUf0aFy6TYMaO RzTG5ZR+0AmsOrR1QU+DbH9CKX5PGZko6E7UCdjROqUlAUOjNwRr99O5mYrZoM9E PdN2vtdWY9COR3Q+7APdhWIA/MdN2vjr3LDsR3H94tru1yi6dB/BPDRcJieozaxn 2T+YrZbV+9/YgrccpPQCilaQdanXKpkmbYkbEzVLPcOEV/lT9odFDt3eK+6duVDL ufu8fs1xapMDHKkcwo5jeNZcoSJymAvHmGfZlo2PPOmh802Ul60bvYKwfheVkhHF Eee5/ovCMs1NLqFiq7Zq5mXO0fR0BHyg9VVjJBZm2JtazyuhoHQ= =iWcG -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "A smaller cycle this time. Notably we see another new driver, 'Soft iWarp', and the deletion of an ancient unused driver for nes. - Revise and simplify the signature offload RDMA MR APIs - More progress on hoisting object allocation boiler plate code out of the drivers - Driver bug fixes and revisions for hns, hfi1, efa, cxgb4, qib, i40iw - Tree wide cleanups: struct_size, put_user_page, xarray, rst doc conversion - Removal of obsolete ib_ucm chardev and nes driver - netlink based discovery of chardevs and autoloading of the modules providing them - Move more of the rdamvt/hfi1 uapi to include/uapi/rdma - New driver 'siw' for software based iWarp running on top of netdev, much like rxe's software RoCE. - mlx5 feature to report events in their raw devx format to userspace - Expose per-object counters through rdma tool - Adaptive interrupt moderation for RDMA (DIM), sharing the DIM core from netdev" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (194 commits) RMDA/siw: Require a 64 bit arch RDMA/siw: Mark expected switch fall-throughs RDMA/core: Fix -Wunused-const-variable warnings rdma/siw: Remove set but not used variable 's' rdma/siw: Add missing dependencies on LIBCRC32C and DMA_VIRT_OPS RDMA/siw: Add missing rtnl_lock around access to ifa rdma/siw: Use proper enumerated type in map_cqe_status RDMA/siw: Remove unnecessary kthread create/destroy printouts IB/rdmavt: Fix variable shadowing issue in rvt_create_cq RDMA/core: Fix race when resolving IP address RDMA/core: Make rdma_counter.h compile stand alone IB/core: Work on the caller socket net namespace in nldev_newlink() RDMA/rxe: Fill in wc byte_len with IB_WC_RECV_RDMA_WITH_IMM RDMA/mlx5: Set RDMA DIM to be enabled by default RDMA/nldev: Added configuration of RDMA dynamic interrupt moderation to netlink RDMA/core: Provide RDMA DIM support for ULPs linux/dim: Implement RDMA adaptive moderation (DIM) IB/mlx5: Report correctly tag matching rendezvous capability docs: infiniband: add it to the driver-api bookset IB/mlx5: Implement VHCA tunnel mechanism in DEVX ...	2019-07-15 20:38:15 -07:00
Linus Torvalds	1f7563f743	SCSI sg on 20190709 This topic branch covers a fundamental change in how our sg lists are allocated to make mq more efficient by reducing the size of the preallocated sg list. This necessitates a large number of driver changes because the previous guarantee that if a driver specified SG_ALL as the size of its scatter list, it would get a non-chained list and didn't need to bother with scatterlist iterators is now broken and every driver must use scatterlist iterators. This was broken out as a separate topic because we need to convert all the drivers before pulling the trigger and unconverted drivers kept being found, necessitating a rebase. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXSTzzCYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZB+AP9I8j/s wWfg0Z3WNuf4D5I3rH4x1J3cQTqPJed+RjwgcQEA1gZvtOTg1ZEn/CYMVnaB92x0 t6MZSchIaFXeqfD+E7U= =cv8o -----END PGP SIGNATURE----- Merge tag 'scsi-sg' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI scatter-gather list updates from James Bottomley: "This topic branch covers a fundamental change in how our sg lists are allocated to make mq more efficient by reducing the size of the preallocated sg list. This necessitates a large number of driver changes because the previous guarantee that if a driver specified SG_ALL as the size of its scatter list, it would get a non-chained list and didn't need to bother with scatterlist iterators is now broken and every driver must use scatterlist iterators. This was broken out as a separate topic because we need to convert all the drivers before pulling the trigger and unconverted drivers kept being found, necessitating a rebase" * tag 'scsi-sg' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (21 commits) scsi: core: don't preallocate small SGL in case of NO_SG_CHAIN scsi: lib/sg_pool.c: clear 'first_chunk' in case of no preallocation scsi: core: avoid preallocating big SGL for data scsi: core: avoid preallocating big SGL for protection information scsi: lib/sg_pool.c: improve APIs for allocating sg pool scsi: esp: use sg helper to iterate over scatterlist scsi: NCR5380: use sg helper to iterate over scatterlist scsi: wd33c93: use sg helper to iterate over scatterlist scsi: ppa: use sg helper to iterate over scatterlist scsi: pcmcia: nsp_cs: use sg helper to iterate over scatterlist scsi: imm: use sg helper to iterate over scatterlist scsi: aha152x: use sg helper to iterate over scatterlist scsi: s390: zfcp_fc: use sg helper to iterate over scatterlist scsi: staging: unisys: visorhba: use sg helper to iterate over scatterlist scsi: usb: image: microtek: use sg helper to iterate over scatterlist scsi: pmcraid: use sg helper to iterate over scatterlist scsi: ipr: use sg helper to iterate over scatterlist scsi: mvumi: use sg helper to iterate over scatterlist scsi: lpfc: use sg helper to iterate over scatterlist scsi: advansys: use sg helper to iterate over scatterlist ...	2019-07-11 15:17:41 -07:00
Minwoo Im	7d30c81b80	nvme: fix NULL deref for fabrics options git://git.infradead.org/nvme.git nvme-5.3 branch now causes the following NULL deref oops. Check the ctrl->opts first before the deref. [ 16.337581] BUG: kernel NULL pointer dereference, address: 0000000000000056 [ 16.338551] #PF: supervisor read access in kernel mode [ 16.338551] #PF: error_code(0x0000) - not-present page [ 16.338551] PGD 0 P4D 0 [ 16.338551] Oops: 0000 [#1] SMP PTI [ 16.338551] CPU: 2 PID: 1035 Comm: kworker/u16:5 Not tainted 5.2.0-rc6+ #1 [ 16.338551] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014 [ 16.338551] Workqueue: nvme-wq nvme_scan_work [nvme_core] [ 16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core] [ 16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf [ 16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283 [ 16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007 [ 16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720 [ 16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840 [ 16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8 [ 16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001 [ 16.338551] FS: 0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000 [ 16.338551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0 [ 16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.338551] Call Trace: [ 16.338551] nvme_scan_work+0x2c0/0x340 [nvme_core] [ 16.338551] ? __switch_to_asm+0x40/0x70 [ 16.338551] ? _raw_spin_unlock_irqrestore+0x18/0x30 [ 16.338551] ? try_to_wake_up+0x408/0x450 [ 16.338551] process_one_work+0x20b/0x3e0 [ 16.338551] worker_thread+0x1f9/0x3d0 [ 16.338551] ? cancel_delayed_work+0xa0/0xa0 [ 16.338551] kthread+0x117/0x120 [ 16.338551] ? kthread_stop+0xf0/0xf0 [ 16.338551] ret_from_fork+0x3a/0x50 [ 16.338551] Modules linked in: nvme nvme_core [ 16.338551] CR2: 0000000000000056 [ 16.338551] ---[ end trace b9bf761a93e62d84 ]--- [ 16.338551] RIP: 0010:nvme_validate_ns+0xc9/0x7e0 [nvme_core] [ 16.338551] Code: c0 49 89 c5 0f 84 00 07 00 00 48 8b 7b 58 e8 be 48 39 c1 48 3d 00 f0 ff ff 49 89 45 18 0f 87 a4 06 00 00 48 8b 93 70 0a 00 00 <80> 7a 56 00 74 0c 48 8b 40 68 83 48 3c 08 49 8b 45 18 48 89 c6 bf [ 16.338551] RSP: 0018:ffffc900024c7d10 EFLAGS: 00010283 [ 16.338551] RAX: ffff888135a30720 RBX: ffff88813a4fd1f8 RCX: 0000000000000007 [ 16.338551] RDX: 0000000000000000 RSI: ffffffff8256dd38 RDI: ffff888135a30720 [ 16.338551] RBP: 0000000000000001 R08: 0000000000000007 R09: ffff88813aa6a840 [ 16.338551] R10: 0000000000000001 R11: 000000000002d060 R12: ffff88813a4fd1f8 [ 16.338551] R13: ffff88813a77f800 R14: ffff88813aa35180 R15: 0000000000000001 [ 16.338551] FS: 0000000000000000(0000) GS:ffff88813ba80000(0000) knlGS:0000000000000000 [ 16.338551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.338551] CR2: 0000000000000056 CR3: 000000000240a002 CR4: 0000000000360ee0 [ 16.338551] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.338551] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: `958f2a0f81` ("nvme-tcp: set the STABLE_WRITES flag when data digests are enabled") Cc: Christoph Hellwig <hch@lst.de> Cc: Keith Busch <kbusch@kernel.org> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2019-07-11 16:02:26 -06:00
Sagi Grimberg	420dc733f9	nvme: fix regression upon hot device removal and insertion When we validate the new controller id, we want to skip controllers that are either deleting or dead. Fix the check to do that and not on the newly added controller. Fixes: `1b1031ca63` ("nvme: validate cntlid during controller initialisation") Reported-by: Jon Derrick <jonathan.derrick@intel.com> Tested-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-10 09:36:16 -07:00
James Smart	4c73cbdff1	nvme-fc: fix module unloads while lports still pending Current code allows the module to be unloaded even if there are pending data structures, such as localports and controllers on the localports, that have yet to hit their reference counting to remove them. Fix by having exit entrypoint explicitly delete every controller, which in turn will remove references on the remoteports and localports causing them to be deleted as well. The exit entrypoint, after initiating the deletes, will wait for the last localport to be deleted before continuing. Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:37:05 -07:00
Mikhail Skorzhinskii	37c1521959	nvme-tcp: don't use sendpage for SLAB pages According to commit `a10674bf24` ("tcp: detecting the misuse of .sendpage for Slab objects") and previous discussion, tcp_sendpage should not be used for pages that is managed by SLAB, as SLAB is not taking page reference counters into consideration. Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:18:09 -07:00
Mikhail Skorzhinskii	958f2a0f81	nvme-tcp: set the STABLE_WRITES flag when data digests are enabled There was a few false alarms sighted on target side about wrong data digest while performing high throughput load to XFS filesystem shared through NVMoF TCP. This flag tells the rest of the kernel to ensure that the data buffer does not change while the write is in flight. It incurs a performance penalty, so only enable it when it is actually needed, i.e. when we are calculating data digests. Although even with this change in place, ext2 users can steel experience false positives, as ext2 is not respecting this flag. This may be apply to vfat as well. Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com> Signed-off-by: Mike Playle <mplayle@solarflare.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:18:09 -07:00
Hannes Reinecke	04e70bd4a0	nvme-multipath: do not select namespaces which are about to be removed nvme_ns_remove() will first set the NVME_NS_REMOVING flag before removing it from the list at the very last step. So to avoid selecting a namespace in nvme_find_path() which is about to be removed check the NVME_NS_REMOVING flag, too, when selecting a new path. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:18:03 -07:00
Hannes Reinecke	2032d07471	nvme-multipath: also check for a disabled path if there is a single sibling When we have a singular list in nvme_round_robin_path() we still need to check its validity. Signed-off-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:17:21 -07:00
Hannes Reinecke	ca7ae5c966	nvme-multipath: factor out a nvme_path_is_disabled helper Factor our a common helper to check if a path has been disabled by something other than the per-namespace ANA state. Signed-off-by: Hannes Reinecke <hare@suse.com> [hch: split from a bigger patch] Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:16:38 -07:00
Bart Van Assche	81adb86334	nvme: set physical block size and optimal I/O size >From the NVMe 1.4 spec: NSFEAT bit 4 if set to 1: indicates that the fields NPWG, NPWA, NPDG, NPDA, and NOWS are defined for this namespace and should be used by the host for I/O optimization; [ ... ] Namespace Preferred Write Granularity (NPWG): This field indicates the smallest recommended write granularity in logical blocks for this namespace. This is a 0's based value. The size indicated should be less than or equal to Maximum Data Transfer Size (MDTS) that is specified in units of minimum memory page size. The value of this field may change if the namespace is reformatted. The size should be a multiple of Namespace Preferred Write Alignment (NPWA). Refer to section 8.25 for how this field is utilized to improve performance and endurance. [ ... ] Each Write, Write Uncorrectable, or Write Zeroes commands should address a multiple of Namespace Preferred Write Granularity (NPWG) (refer to Figure 245) and Stream Write Size (SWS) (refer to Figure 515) logical blocks (as expressed in the NLB field), and the SLBA field of the command should be aligned to Namespace Preferred Write Alignment (NPWA) (refer to Figure 245) for best performance. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:15:37 -07:00
Tom Wu	4c0181bf6c	nvme-trace: add delete completion and submission queue to admin cmds tracer The trace log for 'delete I/O submission queue' and 'delete I/O completion queue' command will look like as below: kworker/u49:1-3438 [003] .... 6693.070865: nvme_setup_cmd: nvme0: qid=0, cmdid=11, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_delete_sq sqid=1) kworker/u49:1-3438 [003] .... 6693.071171: nvme_setup_cmd: nvme0: qid=0, cmdid=8, nsid=0, flags=0x0, meta=0x0, cmd=(nvme_admin_delete_cq cqid=24) Signed-off-by: Tom Wu <tomwu@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Israel Rukshin <israelr@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 14:15:37 -07:00
Colin Ian King	91f6d79853	nvme-trace: fix spelling mistake "spcecific" -> "specific" There are two spelling mistakes in trace_seq_printf messages, fix these. Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 13:44:45 -07:00
Christoph Hellwig	7637de311b	nvme-pci: limit max_hw_sectors based on the DMA max mapping size When running a NVMe device that is attached to a addressing challenged PCIe root port that requires bounce buffering, our request sizes can easily overflow the swiotlb bounce buffer size. Limit the maximum I/O size to the limit exposed by the DMA mapping subsystem. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Atish Patra <Atish.Patra@wdc.com> Tested-by: Atish Patra <Atish.Patra@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2019-07-09 13:44:44 -07:00
Alan Mikhak	bfac8e9f55	nvme-pci: check for NULL return from pci_alloc_p2pmem() Modify nvme_alloc_sq_cmds() to call pci_free_p2pmem() to free the memory it allocated using pci_alloc_p2pmem() in case pci_p2pmem_virt_to_bus() returns null. Makes sure not to call pci_free_p2pmem() if pci_alloc_p2pmem() returned NULL, which can happen if CONFIG_PCI_P2PDMA is not configured. The current implementation is not expected to leak since pci_p2pmem_virt_to_bus() is expected to fail only if pci_alloc_p2pmem() returns null. However, checking the return value of pci_alloc_p2pmem() is more explicit. Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 13:44:44 -07:00
Alan Mikhak	0298d54352	nvme-pci: don't create a read hctx mapping without read queues Only request an IRQ mapping for read queues if at least one read queue is being allocted, as nvme_pci_map_queues() will later on ignore the unnecessary mapping request should nvme_dev_add() request such an IRQ mapping even though no read queues are being allocated. However, nvme_dev_add() can avoid making the request by checking the number of read queues without assuming. This would bring it more in line with nvme_setup_irqs() and nvme_calc_irq_sets(). Signed-off-by: Alan Mikhak <alan.mikhak@sifive.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 13:44:44 -07:00
Christoph Hellwig	4fe06923f5	nvme-pci: don't fall back to a 32-bit DMA mask Since Linux 5.0 drivers can safely set the largest DMA mask supported by the device, and don't need fallbacks to work around the dma mapping implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>	2019-07-09 13:44:44 -07:00
YueHaibing	2177422232	nvme-pci: make nvme_dev_pm_ops static Fix sparse warning: drivers/nvme/host/pci.c:2926:25: warning: symbol 'nvme_dev_pm_ops' was not declared. Should it be static? Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-07-09 13:16:10 -07:00
Jason Gunthorpe	371bb62158	Linux 5.2-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl0Os1seHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGtx4H/j6i482XzcGFKTBm A7mBoQpy+kLtoUov4EtBAR62OuwI8rsahW9di37QKndPoQrczWaKBmr3De6LCdPe v3pl3O6wBbvH5ru+qBPFX9PdNbDvimEChh7LHxmMxNQq3M+AjZAZVJyfpoiFnx35 Fbge+LZaH/k8HMwZmkMr5t9Mpkip715qKg2o9Bua6dkH0AqlcpLlC8d9a+HIVw/z aAsyGSU8jRwhoAOJsE9bJf0acQ/pZSqmFp0rDKqeFTSDMsbDRKLGq/dgv4nW0RiW s7xqsjb/rdcvirRj3rv9+lcTVkOtEqwk0PVdL9WOf7g4iYrb3SOIZh8ZyViaDSeH VTS5zps= =huBY -----END PGP SIGNATURE----- Merge tag 'v5.2-rc6' into rdma.git for-next For dependencies in next patches. Resolve conflicts: - Use uverbs_get_cleared_udata() with new cq allocation flow - Continue to delete nes despite SPDX conflict - Resolve list appends in mlx5_command_str() - Use u16 for vport_rule stuff - Resolve list appends in struct ib_client Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-28 21:18:23 -03:00
Israel Rukshin	5a6781a558	RDMA/core: Add an integrity MR pool support This is a preparation for adding new signature API to the rw-API. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>	2019-06-24 11:49:27 -03:00
Akinobu Mita	f79d5fda4e	nvme: enable to inject errors into admin commands This enables to inject errors into the commands submitted to the admin queue. It is useful to test error handling in the controller initialization. # echo 100 > /sys/kernel/debug/nvme0/fault_inject/probability # echo 1 > /sys/kernel/debug/nvme0/fault_inject/times # echo 10 > /sys/kernel/debug/nvme0/fault_inject/space # nvme reset /dev/nvme0 # dmesg ... nvme nvme0: Could not set queue count (16385) nvme nvme0: IO queues not created Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:15:50 +02:00
Akinobu Mita	a3646451ed	nvme: prepare for fault injection into admin commands Currenlty fault injection support for nvme only enables to inject errors into the commands submitted to I/O queues. In preparation for fault injection into the admin commands, this makes the helper functions independent of struct nvme_ns. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:15:50 +02:00
Minwoo Im	5f965f4fd9	nvme-trace: print result and status in hex format The "result" field is in 64bit to be printed out which means it could be like: nvme_complete_rq: nvme0: qid=0, cmdid=0, res=18446612684158962624, etries=0, flags=0x0, status=0 Switch both the result and status field to be printed in hexadecimal format to be easier to read. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:12:37 +02:00
Minwoo Im	ad795e47cd	nvme-trace: support for fabrics commands in host-side This patch introduces fabrics commands tracing feature from host-side. This patch does not include any changes for the previous host-side tracing, but just add fabrics commands parsing in cmd=() format. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> [hch: fixed some whitespace damage] Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:12:22 +02:00
Minwoo Im	26f2990d85	nvme-trace: move opcode symbol print to nvme.h The following patches are going to provide the target-side trace which might need these kind of macros. It would be great if it can be shared between host and target side both. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:12:19 +02:00
Minwoo Im	7183a46a48	nvme-trace: do not export nvme_trace_disk_name nvme_trace_disk_name() is now already being invoked with the function prototype in trace.h. We don't need to export this symbol at all. The following patches are going to provide target-side trace feature with the exactly same function with this so that this patch removes the EXPORT_SYMBOL() for this function. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:11:56 +02:00
Chaitanya Kulkarni	7c1ce408eb	nvme-pci: clean up nvme_remove_dead_ctrl a bit Remove the status parameter o nvme_remove_dead_ctrl(), which is only used for printing it. We move the print message to the same function where actual error is occurring. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:39 +02:00
Minwoo Im	cee6c269b0	nvme-pci: properly report state change failure in nvme_reset_work If the state change to NVME_CTRL_CONNECTING fails, the dmesg is going to be like: [ 293.689160] nvme nvme0: failed to mark controller CONNECTING [ 293.689160] nvme nvme0: Removing after probe failure status: 0 Even it prints the first line to indicate the situation, the second line is not proper because the status is 0 which means normally success of the previous operation. This patch makes it indicate the proper error value when it fails. [ 25.932367] nvme nvme0: failed to mark controller CONNECTING [ 25.932369] nvme nvme0: Removing after probe failure status: -16 This situation is able to be easily reproduced by: root@target:~# rmmod nvme && modprobe nvme && rmmod nvme Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:39 +02:00
Chaitanya Kulkarni	e71afda493	nvme-pci: set the errno on ctrl state change error This patch removes the confusing assignment of the variable result at the time of declaration and sets the value in error cases next to the places where the actual error is happening. Here we also set the result value to -ENODEV when we fail at the final ctrl state transition in nvme_reset_work(). Without this assignment result will hold 0 from nvme_setup_io_queue() and on failure 0 will be passed to he nvme_remove_dead_ctrl() from final state transition. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Minwoo Im	dad77d6390	nvme-pci: adjust irq max_vector using num_possible_cpus() If the "irq_queues" are greater than num_possible_cpus(), nvme_calc_irq_sets() can have irq set_size for HCTX_TYPE_DEFAULT greater than it can be afforded. 2039 affd->set_size[HCTX_TYPE_DEFAULT] = nrirqs - nr_read_queues; It might cause a WARN() from the irq_build_affinity_masks() like [1]: 220 if (nr_present < numvecs) 221 WARN_ON(nr_present + nr_others < numvecs); This patch prevents it from the WARN() by adjusting the max_vector value from the nvme_setup_irqs(). [1] WARN messages when modprobe nvme write_queues=32 poll_queues=0: root@target:~/nvme# nproc 8 root@target:~/nvme# modprobe nvme write_queues=32 poll_queues=0 [ 17.925326] nvme nvme0: pci function 0000:00:04.0 [ 17.940601] WARNING: CPU: 3 PID: 1030 at kernel/irq/affinity.c:221 irq_create_affinity_masks+0x222/0x330 [ 17.940602] Modules linked in: nvme nvme_core [last unloaded: nvme] [ 17.940605] CPU: 3 PID: 1030 Comm: kworker/u17:4 Tainted: G W 5.1.0+ #156 [ 17.940605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 17.940608] Workqueue: nvme-reset-wq nvme_reset_work [nvme] [ 17.940609] RIP: 0010:irq_create_affinity_masks+0x222/0x330 [ 17.940611] Code: 4c 8d 4c 24 28 4c 8d 44 24 30 e8 c9 fa ff ff 89 44 24 18 e8 c0 38 fa ff 8b 44 24 18 44 8b 54 24 1c 5a 44 01 d0 41 39 c4 76 02 <0f> 0b 48 89 df 44 01 e5 e8 f1 ce 10 00 48 8b 34 24 44 89 f0 44 01 [ 17.940611] RSP: 0018:ffffc90002277c50 EFLAGS: 00010216 [ 17.940612] RAX: 0000000000000008 RBX: ffff88807ca48860 RCX: 0000000000000000 [ 17.940612] RDX: ffff88807bc03800 RSI: 0000000000000020 RDI: 0000000000000000 [ 17.940613] RBP: 0000000000000001 R08: ffffc90002277c78 R09: ffffc90002277c70 [ 17.940613] R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000020 [ 17.940614] R13: 0000000000025d08 R14: 0000000000000001 R15: ffff88807bc03800 [ 17.940614] FS: 0000000000000000(0000) GS:ffff88807db80000(0000) knlGS:0000000000000000 [ 17.940616] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 17.940617] CR2: 00005635e583f790 CR3: 000000000240a000 CR4: 00000000000006e0 [ 17.940617] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 17.940618] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 17.940618] Call Trace: [ 17.940622] __pci_enable_msix_range+0x215/0x540 [ 17.940623] ? kernfs_put+0x117/0x160 [ 17.940625] pci_alloc_irq_vectors_affinity+0x74/0x110 [ 17.940626] nvme_reset_work+0xc30/0x1397 [nvme] [ 17.940628] ? __switch_to_asm+0x34/0x70 [ 17.940628] ? __switch_to_asm+0x40/0x70 [ 17.940629] ? __switch_to_asm+0x34/0x70 [ 17.940630] ? __switch_to_asm+0x40/0x70 [ 17.940630] ? __switch_to_asm+0x34/0x70 [ 17.940631] ? __switch_to_asm+0x40/0x70 [ 17.940632] ? nvme_irq_check+0x30/0x30 [nvme] [ 17.940633] process_one_work+0x20b/0x3e0 [ 17.940634] worker_thread+0x1f9/0x3d0 [ 17.940635] ? cancel_delayed_work+0xa0/0xa0 [ 17.940636] kthread+0x117/0x120 [ 17.940637] ? kthread_stop+0xf0/0xf0 [ 17.940638] ret_from_fork+0x3a/0x50 [ 17.940639] ---[ end trace aca8a131361cd42a ]--- [ 17.942124] nvme nvme0: 7/1/0 default/read/poll queues Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Minwoo Im	483178f38c	nvme-pci: remove queue_count_ops for write_queues and poll_queues queue_count_set() seems like that it has been provided to limit the number of queue entries for write/poll queues. But, the queue_count_set() has been doing nothing but a parameter check even it has num_possible_cpus() which is nop. This patch removes entire queue_count_ops from the write_queues and poll_queues. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00
Minwoo Im	a232ea0ebf	nvme-pci: remove unnecessary zero for static var poll_queues will be zero even without zero initialization here. Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2019-06-21 11:08:38 +02:00

1 2 3 4 5 ...

1260 Commits