OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Christoph Hellwig	6d46964230	block: remove the lock argument to blk_alloc_queue_node With the legacy request path gone there is no real need to override the queue_lock. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-11-15 12:13:35 -07:00
Linus Torvalds	6078e07dcf	libnvdimm for 4.20 * Improve the efficiency and performance of reading nvdimm-namespace labels. Reduce the amount of label data read at driver load time by a few orders of magnitude. Reduce heavyweight call-outs to platform-firmware routines. * Handle media errors located in the 'struct page' array stored on a persistent memory namespace. Let the kernel clear these errors rather than an awkward userspace workaround. * Fix Address Range Scrub (ARS) completion tracking. Correct occasions where the kernel indicates completion of ARS before submission. * Fix asynchronous device registration reference counting. * Add support for reporting an nvdimm dirty-shutdown-count via sysfs. * Fix various small libnvdimm core and uapi issues. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJb0NuwAAoJEB7SkWpmfYgCjUYP/1RK35zXBJSArGE3CUkap/zp exuqUzhisiE3RER13hNvC59AxXB9QuIbzuR5bzWm+Lawuuaozn3iL2oKn3Gy0inl yE3m/1Hx43FTkYdH86K9bpaXtRfymppJiR475jRFin17xWL3UP2JJgYtGwoRfO4p OL1aLcGo04Y1E2h6sVx97DjiwWN5uTaG9jCciZr2w+s5pg1seuEOJcAayp+6D0Tq i2hZvQ6nyxxF2WzGqRk3ABbRySpQ5o4b33I/jjOKEFwYoB8UiZQeLuL2WRr1ztfz jo+aalLJjZTOMgeWPIYSuV+U8vySVUwXpMhfrMGnIRm5BuE9JUlHrYkMLcLZJyei 2qgQ65mDmoViBVx0w5k2nUjP8Ju5lC7fZTaLU60vf+3FZvBbSTtmog2+P0xMLg17 AHebl9slzJPO4r/z4XY+n9Bk/qOz6sfWk07LugfNcMdeZriJKr7BUclZVZDYiPJA /Rtnd8XRu8hS5Kfj7wK2QD5sVklS5VQhho/zzBZHQcQkQBfRo6f6YQ83N/6yoTKD p6nel3uRMX2n8+EPyODYt9j0cF7JupWqlSpRKUORrdSz85gt4D6W578tkJCEOCm0 JOm5HdLlwIhlIcam/w0blLOr+a0sISS4cWR72Vc/lSZHoM8ouQiQC/lplpiAAWwI 7pSmlYEEbZRQCy6ZrlVy =0FtE -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Improve the efficiency and performance of reading nvdimm-namespace labels. Reduce the amount of label data read at driver load time by a few orders of magnitude. Reduce heavyweight call-outs to platform-firmware routines. - Handle media errors located in the 'struct page' array stored on a persistent memory namespace. Let the kernel clear these errors rather than an awkward userspace workaround. - Fix Address Range Scrub (ARS) completion tracking. Correct occasions where the kernel indicates completion of ARS before submission. - Fix asynchronous device registration reference counting. - Add support for reporting an nvdimm dirty-shutdown-count via sysfs. - Fix various small libnvdimm core and uapi issues. * tag 'libnvdimm-for-4.20' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits) acpi, nfit: Further restrict userspace ARS start requests acpi, nfit: Fix Address Range Scrub completion tracking UAPI: ndctl: Remove use of PAGE_SIZE UAPI: ndctl: Fix g++-unsupported initialisation in headers tools/testing/nvdimm: Populate dirty shutdown data acpi, nfit: Collect shutdown status acpi, nfit: Introduce nfit_mem flags libnvdimm, label: Fix sparse warning nvdimm: Use namespace index data to reduce number of label reads needed nvdimm: Split label init out from the logic for getting config data nvdimm: Remove empty if statement nvdimm: Clarify comment in sizeof_namespace_index nvdimm: Sanity check labeloff libnvdimm, dimm: Maximize label transfer size libnvdimm, pmem: Fix badblocks population for 'raw' namespaces libnvdimm, namespace: Drop the repeat assignment for variable dev->parent libnvdimm, region: Fail badblocks listing for inactive regions libnvdimm, pfn: during init, clear errors in the metadata area libnvdimm: Set device node in nd_device_register libnvdimm: Hold reference on parent while scheduling async init ...	2018-10-25 06:31:56 -07:00
Dan Williams	97052c1c31	libnvdimm, label: Fix sparse warning The kbuild robot reports: drivers/nvdimm/label.c:500:32: warning: restricted __le32 degrades to integer ...read 'nslot' into a local u32. Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-12 08:39:41 -07:00
Alexander Duyck	7d47aad457	nvdimm: Use namespace index data to reduce number of label reads needed This patch adds logic that is meant to make use of the namespace index data to reduce the number of reads that are needed to initialize a given namespace. The general idea is that once we have enough data to validate the namespace index we do so and then proceed to fetch only those labels that are not listed as being "free". By doing this I am seeing a total time reduction from about 4-5 seconds to 2-3 seconds for 24 NVDIMM modules each with 128K of label config area. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-12 08:39:31 -07:00
Alexander Duyck	2d657d17f7	nvdimm: Split label init out from the logic for getting config data This patch splits the initialization of the label data into two functions. One for doing the init, and another for reading the actual configuration data. The idea behind this is that by doing this we create a symmetry between the getting and setting of config data in that we have a function for both. In addition it will make it easier for us to identify the bits that are related to init versus the pieces that are a wrapper for reading data from the ACPI interface. So for example by splitting things out like this it becomes much more obvious that we were performing checks that weren't necessarily related to the set/get operations such as relying on ndd->data being present when the set and get ops should not care about a locally cached copy of the label area. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-12 08:39:24 -07:00
Alexander Duyck	19418b0244	nvdimm: Remove empty if statement This patch removes an empty statement from an if expression and promotes the else statement to the if expression with the expression logic reversed. I feel this is more readable as the empty statement can lead to issues if any additional logic was ever added. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-12 08:39:15 -07:00
Alexander Duyck	1cfeb66e8e	nvdimm: Clarify comment in sizeof_namespace_index When working on the label code I found it rather confusing to see several spots that reference a minimum label size of 256 while working with labels that are 128 bytes in size. This patch is meant to provide a clarification on one of the comments that was at the heart of the issue. Specifically for version 1.2 and later of the namespace specification the minimum label size is 256, prior to that the minimum label size was 128. So we should state that as such to avoid confusion. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-12 08:39:10 -07:00
Alexander Duyck	d86d4d63d8	nvdimm: Sanity check labeloff This patch adds validation for the labeloff field in the indexes. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-12 08:38:53 -07:00
Dan Williams	d11cf4a732	libnvdimm, dimm: Maximize label transfer size Use kvzalloc() to bypass the arbitrary PAGE_SIZE limit of label transfer operations. Given the expense of calling into firmware, maximize the amount of label data we transfer per call to be up to the total label space if allowed by the firmware. Instead of limiting based on PAGE_SIZE we can instead simply limit the maximum size based on either the config_size int he case of the get operation, or the length of the write based on the set operation. On a system with 24 NVDIMM modules each with a config_size of 128K and a maximum transfer size of 64K - 4, this patch reduces the init time for the label data from around 24 seconds down to between 4-5 seconds. Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-10 20:54:16 -07:00
Dan Williams	91ed7ac444	libnvdimm, pmem: Fix badblocks population for 'raw' namespaces The driver is only initializing bb_res in the devm_memremap_pages() paths, but the raw namespace case is passing an uninitialized bb_res to nvdimm_badblocks_populate(). Fixes: `e8d5134833` ("memremap: change devm_memremap_pages interface...") Cc: <stable@vger.kernel.org> Cc: Christoph Hellwig <hch@lst.de> Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-09 00:04:56 -07:00
GuangZhe Fu	55781b6693	libnvdimm, namespace: Drop the repeat assignment for variable dev->parent The variable dev-parent is assigned twice with the same &nd_region->dev. I think we could drop the second one. Signed-off-by: GuangZhe Fu <fugz1@lenovo.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-01 21:27:09 -07:00
Dan Williams	5d394eee2c	libnvdimm, region: Fail badblocks listing for inactive regions While experimenting with region driver loading the following backtrace was triggered: INFO: trying to register non-static key. the code is fine but needs lockdep annotation. turning off the locking correctness validator. [..] Call Trace: dump_stack+0x85/0xcb register_lock_class+0x571/0x580 ? __lock_acquire+0x2ba/0x1310 ? kernfs_seq_start+0x2a/0x80 __lock_acquire+0xd4/0x1310 ? dev_attr_show+0x1c/0x50 ? __lock_acquire+0x2ba/0x1310 ? kernfs_seq_start+0x2a/0x80 ? lock_acquire+0x9e/0x1a0 lock_acquire+0x9e/0x1a0 ? dev_attr_show+0x1c/0x50 badblocks_show+0x70/0x190 ? dev_attr_show+0x1c/0x50 dev_attr_show+0x1c/0x50 This results from a missing successful call to devm_init_badblocks() from nd_region_probe(). Block attempts to show badblocks while the region is not enabled. Fixes: `6a6bef9042` ("libnvdimm: add mechanism to publish badblocks...") Cc: <stable@vger.kernel.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-09-28 11:12:34 -07:00
Vishal Verma	48af2f7e52	libnvdimm, pfn: during init, clear errors in the metadata area If there are badblocks present in the 'struct page' area for pfn namespaces, until now, the only way to clear them has been to force the namespace into raw mode, clear the errors, and re-enable the fsdax mode. This is clunky, given that it should be easy enough for the pfn driver to do the same. Add a new helper that uses the most recently available badblocks list to check whether there are any badblocks that lie in the volatile struct page area. If so, before initializing the struct pages, send down targeted writes via nvdimm_write_bytes to write zeroes to the affected blocks, and thus clear errors. Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-09-28 11:06:56 -07:00
Hannes Reinecke	fef912bf86	block: genhd: add 'groups' argument to device_add_disk Update device_add_disk() to take an 'groups' argument so that individual drivers can register a device with additional sysfs attributes. This avoids race condition the driver would otherwise have if these groups were to be created with sysfs_add_groups(). Signed-off-by: Martin Wilck <martin.wilck@suse.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-09-28 08:30:28 -06:00
Alexander Duyck	1a091d16db	libnvdimm: Set device node in nd_device_register This change makes it so that we don't repeatedly overwrite the device node for nvdimm regions. The earliest we can set the node is immediately after calling device init, so I have moved the code there so we can avoid rewriting the node with each uevent. Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-09-26 12:02:32 -07:00
Alexander Duyck	b6eae0f61d	libnvdimm: Hold reference on parent while scheduling async init Unlike asynchronous initialization in the core we have not yet associated the device with the parent, and as such the device doesn't hold a reference to the parent. In order to resolve that we should be holding a reference on the parent until the asynchronous initialization has completed. Cc: <stable@vger.kernel.org> Fixes: `4d88a97aa9` ("libnvdimm: ...base ... infrastructure") Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-09-26 12:02:31 -07:00
Pankaj Gupta	3c5c98d135	libnvdimm: remove duplicate include Removed duplicate include. Signed-off-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-09-26 12:02:31 -07:00
Linus Torvalds	2923b27e54	libnvdimm-for-4.19_dax-memory-failure * memory_failure() gets confused by dev_pagemap backed mappings. The recovery code has specific enabling for several possible page states that needs new enabling to handle poison in dax mappings. Teach memory_failure() about ZONE_DEVICE pages. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9ui8ACgkQYGjFFmlT OEpNRw//XGj9s7sezfJFeol4psJlRUd935yii/gmJRgi/yPf2VxxQG9qyM6SMBUc 75jASfOL6FSsfxHz0kplyWzMDNdrTkNNAD+9rv80FmY7GqWgcas9DaJX7jZ994vI 5SRO7pfvNZcXlo7IhqZippDw3yxkIU9Ufi0YQKaEUm7GFieptvCZ0p9x3VYfdvwM BExrxQe0X1XUF4xErp5P78+WUbKxP47DLcucRDig8Q7dmHELUdyNzo3E1SVoc7m+ 3CmvyTj6XuFQgOZw7ZKun1BJYfx/eD5ZlRJLZbx6wJHRtTXv/Uea8mZ8mJ31ykN9 F7QVd0Pmlyxys8lcXfK+nvpL09QBE0/PhwWKjmZBoU8AdgP/ZvBXLDL/D6YuMTg6 T4wwtPNJorfV4lVD06OliFkVI4qbKbmNsfRq43Ns7PCaLueu4U/eMaSwSH99UMaZ MGbO140XW2RZsHiU9yTRUmZq73AplePEjxtzR8oHmnjo45nPDPy8mucWPlkT9kXA oUFMhgiviK7dOo19H4eaPJGqLmHM93+x5tpYxGqTr0dUOXUadKWxMsTnkID+8Yi7 /kzQWCFvySz3VhiEHGuWkW08GZT6aCcpkREDomnRh4MEnETlZI8bblcuXYOCLs6c nNf1SIMtLdlsl7U1fEX89PNeQQ2y237vEDhFQZftaalPeu/JJV0= =Ftop -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm memory-failure update from Dave Jiang: "As it stands, memory_failure() gets thoroughly confused by dev_pagemap backed mappings. The recovery code has specific enabling for several possible page states and needs new enabling to handle poison in dax mappings. In order to support reliable reverse mapping of user space addresses: 1/ Add new locking in the memory_failure() rmap path to prevent races that would typically be handled by the page lock. 2/ Since dev_pagemap pages are hidden from the page allocator and the "compound page" accounting machinery, add a mechanism to determine the size of the mapping that encompasses a given poisoned pfn. 3/ Given pmem errors can be repaired, change the speculatively accessed poison protection, mce_unmap_kpfn(), to be reversible and otherwise allow ongoing access from the kernel. A side effect of this enabling is that MADV_HWPOISON becomes usable for dax mappings, however the primary motivation is to allow the system to survive userspace consumption of hardware-poison via dax. Specifically the current behavior is: mce: Uncorrected hardware memory error in user-access at af34214200 {1}[Hardware Error]: It has been corrected by h/w and requires no further action mce: [Hardware Error]: Machine check events logged {1}[Hardware Error]: event severity: corrected Memory failure: 0xaf34214: reserved kernel page still referenced by 1 users [..] Memory failure: 0xaf34214: recovery action for reserved kernel page: Failed mce: Memory error not recovered <reboot> ...and with these changes: Injecting memory failure for pfn 0x20cb00 at process virtual address 0x7f763dd00000 Memory failure: 0x20cb00: Killing dax-pmd:5421 due to hardware memory corruption Memory failure: 0x20cb00: recovery action for dax page: Recovered Given all the cross dependencies I propose taking this through nvdimm.git with acks from Naoya, x86/core, x86/RAS, and of course dax folks" * tag 'libnvdimm-for-4.19_dax-memory-failure' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm, pmem: Restore page attributes when clearing errors x86/memory_failure: Introduce {set, clear}_mce_nospec() x86/mm/pat: Prepare {reserve, free}_memtype() for "decoy" addresses mm, memory_failure: Teach memory_failure() about dev_pagemap pages filesystem-dax: Introduce dax_lock_mapping_entry() mm, memory_failure: Collect mapping size in collect_procs() mm, madvise_inject_error: Let memory_failure() optionally take a page reference mm, dev_pagemap: Do not clear ->mapping on final put mm, madvise_inject_error: Disable MADV_SOFT_OFFLINE for ZONE_DEVICE pages filesystem-dax: Set page->index device-dax: Set page->index device-dax: Enable page_mapping() device-dax: Convert to vmf_insert_mixed and vm_fault_t	2018-08-25 18:43:59 -07:00
Linus Torvalds	828bf6e904	libnvdimm-for-4.19_misc Collection of misc libnvdimm patches for 4.19 submission * Adding support to read locked nvdimm capacity. * Change test code to make DSM failure code injection an override. * Add support for calculate maximum contiguous area for namespace. * Add support for queueing a short ARS when there is on going ARS for nvdimm. * Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. * Improve smart injection support for nvdimm emulation testing. * Fix test code that supports for emulating controller temperature. * Fix hang on error before devm_memremap_pages() * Fix a bug that causes user memory corruption when data returned to user for ars_status. * Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9uUIACgkQYGjFFmlT OErL+xAAgWHSGs8w98VtYA9kLDeTYEXutq93wJZQoBu/FMAXuuU3hYmQYnOQU87h KKYmfDkeusaih1R3IX7mzlegnnzSfQ6MraNSV76M43noJHbRTunknCPZH6ebp4fo b/eljvWlZF/idM+7YcsnoFMnHSRj2pjJGXmKQDlKedHD+KMxpmk6zEl2s5Y0zvPU 4U7UQLtk3D5IIpLNsLEmxge32BfvNf5IzoSO1aZp7Eqk0+U5Tq3Sq/Tjmd+J0RKt 6WH5yA6NqXQgBh+ayHsYU8YX62RqnbKQZXqVxD35OH64zJEUefnP1fpt9pmaZ9eL 43BPMkpM09eLAikO2ET3/3c2k6h3h9ttz1sH8t/hiroCtfmxs3XgskY06hxpKjZV EbN+BUmut5Mr+zzYitRr3dbK2aHPVU9IbU7jUw/1Tz23rq3kU5iI7SHHv1b/eWup 1Cr77Z1M6HB8VBhjnJ+R607sbRrnKQUOV7fGzAaIskyUOTWsEvIgTh/6MRiaj9MD 5HXIgc/0y9E+G93s7MsUWwzpB7J6E7EGoybST2SKPtqwtDMPsBNeWRjyA9quBCoN u1s+e+lWHYutqRW0eisDTTlq3nJwPijSx1nnzhJxw9s1EkCXz3f7KRZhyH1C79Co 7wjiuvKQ79e/HI/oXvGmTnv5lbLEpWYyJ3U3KIFfoUqugeyhr0k= =5p2n -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dave Jiang: "Collection of misc libnvdimm patches for 4.19 submission: - Adding support to read locked nvdimm capacity. - Change test code to make DSM failure code injection an override. - Add support for calculate maximum contiguous area for namespace. - Add support for queueing a short ARS when there is on going ARS for nvdimm. - Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. - Improve smart injection support for nvdimm emulation testing. - Fix test code that supports for emulating controller temperature. - Fix hang on error before devm_memremap_pages() - Fix a bug that causes user memory corruption when data returned to user for ars_status. - Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax" * tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm: fix ars_status output length calculation device-dax: avoid hang on error before devm_memremap_pages() tools/testing/nvdimm: improve emulation of smart injection filesystem-dax: Do not request kaddr and pfn when not required md/dm-writecache: Don't request pointer dummy_addr when not required dax/super: Do not request a pointer kaddr when not required tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access() s390, dcssblk: kaddr and pfn can be NULL to ->direct_access() libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access() acpi/nfit: queue issuing of ars when an uc error notification comes in libnvdimm: Export max available extent libnvdimm: Use max contiguous area for namespace size MAINTAINERS: Add Jan Kara for filesystem DAX MAINTAINERS: update Ross Zwisler's email address tools/testing/nvdimm: Fix support for emulating controller temperature tools/testing/nvdimm: Make DSM failure code injection an override acpi, nfit: Prefer _DSM over _LSR for namespace label reads libnvdimm: Introduce locked DIMM capacity support	2018-08-25 18:13:10 -07:00
Dan Williams	c953cc987a	libnvdimm, pmem: Restore page attributes when clearing errors Use clear_mce_nospec() to restore WB mode for the kernel linear mapping of a pmem page that was marked 'HWPoison'. A page with 'HWPoison' set has also been marked UC in PAT (page attribute table) via set_mce_nospec() to prevent speculative retrievals of poison. The 'HWPoison' flag is only cleared when overwriting an entire page. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2018-08-20 09:22:45 -07:00
Vishal Verma	286e877181	libnvdimm: fix ars_status output length calculation Commit `efda1b5d87` ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling") Introduced additional hardening for ambiguity in the ACPI spec for ars_status output sizing. However, it had a couple of cases mixed up. Where it should have been checking for (and returning) "out_field[1] - 4" it was using "out_field[1] - 8" and vice versa. This caused a four byte discrepancy in the buffer size passed on to the command handler, and in some cases, this caused memory corruption like: ./daxdev-errors.sh: line 76: 24104 Aborted (core dumped) ./daxdev-errors $busdev $region malloc(): memory corruption Program received signal SIGABRT, Aborted. [...] #5 0x00007ffff7865a2e in calloc () from /lib64/libc.so.6 #6 0x00007ffff7bc2970 in ndctl_bus_cmd_new_ars_status (ars_cap=ars_cap@entry=0x6153b0) at ars.c:136 #7 0x0000000000401644 in check_ars_status (check=0x7fffffffdeb0, bus=0x604c20) at daxdev-errors.c:144 #8 test_daxdev_clear_error (region_name=<optimized out>, bus_name=<optimized out>) at daxdev-errors.c:332 Cc: <stable@vger.kernel.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Lukasz Dorau <lukasz.dorau@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Fixes: `efda1b5d87` ("acpi, nfit, libnvdimm: fix / harden ars_status output length handling") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-of-by: Dave Jiang <dave.jiang@intel.com>	2018-08-20 08:56:38 -07:00
Jens Axboe	05b9ba4b55	Linux 4.18-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAltU8z0eHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG5X8H/2fJr7m3k242+t76 sitwvx1eoPqTgryW59dRKm9IuXAGA+AjauvHzaz1QxomeQa50JghGWefD0eiJfkA 1AphQ/24EOiAbbVk084dAI/C2p122dE4D5Fy7CrfLnuouyrbFaZI5STbnrRct7sR 9deeYW0GDHO1Uenp4WDCj0baaqJqaevZ+7GG09DnWpya2nQtSkGBjqn6GpYmrfOU mqFuxAX8mEOW6cwK16y/vYtnVjuuMAiZ63/OJ8AQ6d6ArGLwAsdn7f8Fn4I4tEr2 L0d3CRLUyegms4++Dmlu05k64buQu46WlPhjCZc5/Ts4kjrNxBuHejj2/jeSnUSt vJJlibI= =42a5 -----END PGP SIGNATURE----- Merge tag 'v4.18-rc6' into for-4.19/block2 Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>	2018-08-05 19:32:09 -06:00
Huaisheng Ye	46a590cde0	libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access() pmem_direct_access() needs to check the validity of pointers kaddr and pfn for NULL assignment. If anyone equals to NULL, it doesn't need to calculate the value. If pointer equals to NULL, that is to say callers may have no need for kaddr or pfn, so this patch is prepared for allowing them to pass in NULL instead of having to pass in a pointer or local variable that they then just throw away. Signed-off-by: Huaisheng Ye <yehs1@lenovo.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2018-07-30 09:32:38 -07:00
Keith Busch	1e687220ef	libnvdimm: Export max available extent The 'available_size' attribute showing the combined total of all unallocated space isn't always useful to know how large of a namespace a user may be able to allocate if the region is fragmented. This patch will export the largest extent of unallocated space that may be allocated to create a new namespace. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2018-07-25 14:11:43 -07:00
Keith Busch	12e3129e29	libnvdimm: Use max contiguous area for namespace size This patch will find the max contiguous area to determine the largest pmem namespace size that can be created. If the requested size exceeds the largest available, ENOSPC error will be returned. This fixes the allocation underrun error and wrong error return code that have otherwise been observed as the following kernel warning: WARNING: CPU: <CPU> PID: <PID> at drivers/nvdimm/namespace_devs.c:913 size_store Fixes: `a1f3e4d6a0` ("libnvdimm, region: update nd_region_available_dpa() for multi-pmem support") Cc: <stable@vger.kernel.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2018-07-25 14:11:09 -07:00
Michael Callahan	ddcf35d397	block: Add and use op_stat_group() for indexing disk_stat fields. Add and use a new op_stat_group() function for indexing partition stat fields rather than indexing them by rq_data_dir() or bio_data_dir(). This function works similarly to op_is_sync() in that it takes the request::cmd_flags or bio::bi_opf flags and determines which stats should et updated. In addition, the second parameter to generic_start_io_acct() and generic_end_io_acct() is now a REQ_OP rather than simply a read or write bit and it uses op_stat_group() on the parameter to determine the stat group. Note that the partition in_flight counts are not part of the per-cpu statistics and as such are not indexed via this function. It's now indexed by op_is_write(). tj: Refreshed on top of v4.17. Updated to pass around REQ_OP. Signed-off-by: Michael Callahan <michaelcallahan@fb.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Joshua Morris <josh.h.morris@us.ibm.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Matias Bjorling <mb@lightnvm.io> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Alasdair Kergon <agk@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-18 08:44:20 -06:00
Tejun Heo	3f289dcb4b	block: make bdev_ops->rw_page() take a REQ_OP instead of bool `c11f0c0b5b` ("block/mm: make bdev_ops->rw_page() take a bool for read/write") replaced @op with boolean @is_write, which limited the amount of information going into ->rw_page() and more importantly page_endio(), which removed the need to expose block internals to mm. Unfortunately, we want to track discards separately and @is_write isn't enough information. This patch updates bdev_ops->rw_page() to take REQ_OP instead but leaves page_endio() to take bool @is_write. This allows the block part of operations to have enough information while not leaking it to mm. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Mike Christie <mchristi@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2018-07-18 08:44:14 -06:00
Dan Williams	08e6b3c6e3	libnvdimm: Introduce locked DIMM capacity support When a DIMM is locked its namespace label area may not be. Introduce the distinction of locked namespaces to allow namespace enumeration while the capacity is locked. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-07-14 10:27:00 -07:00
Linus Torvalds	4596f55476	* fix one ensures that a variable passed in by reference to acpi_nfit_ctl is always set to a value. An incremental patch is provided due to notice from testing in -next. The rest of the commits did not exhibit issues. * fix two fixes a return path in nsio_rw_bytes() that was not returning "bytes remain" as expected for the function. * fix three addresses an issue where applications polling on scrub-completion for the NVDIMM may falsely wakeup and read the wrong state value and cause hang. * the test unit changed the persistent capability attribute to fix up a broken assumption in the unit test infrastructure wrt the 'write_cache' attribute * An output ratelimit to dev_info is introduced to the dax device check_vma() function since this is easily triggered from userspace. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAltGe3oACgkQYGjFFmlT OErs8Q/9FA7nLUv0PN2fXXWpP2xrALn6tolalwv9Zvff1M+DUqPqCOAiG0+C+wnN qAWvxmTHbdtgfKU7KjC/dj7cAeMRXFVUD5ffNOZifEJ36r7nP1XC9uwPt36bouB8 NMd38oWRlofjnND7NPTDdZAqPY1Lk8fztPKeMGL9ZcCItABABUYyU2InRJEZwdrs leVJfjZlvfp6MO7iC6E0hzDKl9G/5MMyDBgC4ostaRIiSpM0dsUaod5jeNbfKBhZ sGObLP9hNr9CZ4/4XcCChdCzsvlPIF7eWoXGka2pBoXYl7lXhmdVcf54+j4i5Ify zwP9Kcllvo4oRo4dwXpd+RGOMINKpF3PkBXTMAv+KURWS859ptQDku+WseVNRm0C j2kd0WtXHnMgBV3PrgCEp0lfQfUZaQe7ULgCpkI/k+jAmBKD6NN+6wCg3bXHCHlW S1sKlSLuwAp+LdIIlFeW6Sq5+qBAtacFU1YmIPVGKdiIwaK6B115eU6CNmNwT0Vr 84zbcDbBxFn1wr0jasD72zXUM/NnxphMYTjZ0GlSdE9zEa2+0ylfoadmVCDk3qc2 xzBcc7vkSGhswjK0e5LlRNlH4KXT5TkwWciEg5GvowL2ja9bsL2uRDFqxpmkEZkt mdkH4gpW1GGkRQpAwYSuWnWEHqoGQv7R7SSayAhBmXdcTf0Gibs= =SKzT -----END PGP SIGNATURE----- Merge tag 'libnvdimm-fixes-4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dave Jiang: - ensure that a variable passed in by reference to acpi_nfit_ctl is always set to a value. An incremental patch is provided due to notice from testing in -next. The rest of the commits did not exhibit issues. - fix a return path in nsio_rw_bytes() that was not returning "bytes remain" as expected for the function. - address an issue where applications polling on scrub-completion for the NVDIMM may falsely wakeup and read the wrong state value and cause hang. - change the test unit persistent capability attribute to fix up a broken assumption in the unit test infrastructure wrt the 'write_cache' attribute - ratelimit dev_info() in the dax device check_vma() function since this is easily triggered from userspace * tag 'libnvdimm-fixes-4.18-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit: fix unchecked dereference in acpi_nfit_ctl acpi, nfit: Fix scrub idle detection tools/testing/nvdimm: advertise a write cache for nfit_test acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value dev-dax: check_vma: ratelimit dev_info-s libnvdimm, pmem: Fix memcpy_mcsafe() return code handling in nsio_rw_bytes()	2018-07-13 10:54:01 -07:00
Dan Williams	b62cc6fdd7	libnvdimm, pmem: Fix memcpy_mcsafe() return code handling in nsio_rw_bytes() Commit `60622d6822` "x86/asm/memcpy_mcsafe: Return bytes remaining" converted callers of memcpy_mcsafe() to expect a positive 'bytes remaining' value rather than a negative error code. The nsio_rw_bytes() conversion failed to return success. The failure is benign in that nsio_rw_bytes() will end up writing back what it just read. Fixes: `60622d6822` ("x86/asm/memcpy_mcsafe: Return bytes remaining") Cc: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-28 18:21:30 -07:00
Ross Zwisler	4557641b4c	pmem: only set QUEUE_FLAG_DAX for fsdax mode QUEUE_FLAG_DAX is an indication that a given block device supports filesystem DAX and should not be set for PMEM namespaces which are in "raw" mode. These namespaces lack struct page and are prevented from participating in filesystem DAX as of commit `569d0365f5` ("dax: require 'struct page' by default for filesystem dax"). Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Suggested-by: Mike Snitzer <snitzer@redhat.com> Fixes: `569d0365f5` ("dax: require 'struct page' by default for filesystem dax") Cc: stable@vger.kernel.org Acked-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>	2018-06-28 16:05:59 -04:00
Dan Williams	930218affe	Merge branch 'for-4.18/mcsafe' into libnvdimm-for-next	2018-06-08 15:16:44 -07:00
Dan Williams	b56845794e	Merge branch 'for-4.18/dax' into libnvdimm-for-next	2018-06-08 15:16:40 -07:00
Ross Zwisler	546eb0317c	libnvdimm, pmem: Do not flush power-fail protected CPU caches This commit: `5fdf8e5ba5` ("libnvdimm: re-enable deep flush for pmem devices via fsync()") intended to make sure that deep flush was always available even on platforms which support a power-fail protected CPU cache. An unintended side effect of this change was that we also lost the ability to skip flushing CPU caches on those power-fail protected CPU cache. Fix this by skipping the low level cache flushing in dax_flush() if we have CPU caches which are power-fail protected. The user can still override this behavior by manually setting the write_cache state of a namespace. See libndctl's ndctl_namespace_write_cache_is_enabled(), ndctl_namespace_enable_write_cache() and ndctl_namespace_disable_write_cache() functions. Cc: <stable@vger.kernel.org> Fixes: `5fdf8e5ba5` ("libnvdimm: re-enable deep flush for pmem devices via fsync()") Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-06 11:02:32 -07:00
Ross Zwisler	ce7f11a230	libnvdimm, pmem: Unconditionally deep flush on sync Prior to this commit we would only do a "deep flush" (have nvdimm_flush() write to each of the flush hints for a region) in response to an msync/fsync/sync call if the nvdimm_has_cache() returned true at the time we were setting up the request queue. This happens due to the write cache value passed in to blk_queue_write_cache(), which then causes the block layer to send down BIOs with REQ_FUA and REQ_PREFLUSH set. We do have a "write_cache" sysfs entry for namespaces, i.e.: /sys/bus/nd/devices/pfn0.1/block/pmem0/dax/write_cache which can be used to control whether or not the kernel thinks a given namespace has a write cache, but this didn't modify the deep flush behavior that we set up when the driver was initialized. Instead, it only modified whether or not DAX would flush CPU caches via dax_flush() in response to sync calls. Simplify this by making the *sync deep flush always happen, regardless of the write cache setting of a namespace. The DAX CPU cache flushing will still be controlled the write_cache setting of the namespace. Cc: <stable@vger.kernel.org> Fixes: `5fdf8e5ba5` ("libnvdimm: re-enable deep flush for pmem devices via fsync()") Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-06 10:55:53 -07:00
Ross Zwisler	d2d6364dcb	libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH Complete the move from REQ_FLUSH to REQ_PREFLUSH that apparently started way back in v4.8. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-06 10:40:56 -07:00
Dan Williams	d76401ade0	libnvdimm, e820: Register all pmem resources There is currently a mismatch between the resources that will trigger the e820_pmem driver to register/load and the resources that will actually be surfaced as pmem ranges. register_e820_pmem() uses walk_iomem_res_desc() which includes children and siblings. In contrast, e820_pmem_probe() only considers top level resources. For example the following resource tree results in the driver being loaded, but no resources being registered: 398000000000-39bfffffffff : PCI Bus 0000:ae 39be00000000-39bf07ffffff : PCI Bus 0000:af 39be00000000-39beffffffff : 0000:af:00.0 39be10000000-39beffffffff : Persistent Memory (legacy) Fix this up to allow definitions of "legacy" pmem ranges anywhere in system-physical address space. Not that it is a recommended or safe to define a pmem range in PCI space, but it is useful for debug / experimentation, and the restriction on being a top-level resource was arbitrary. Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-02 17:05:43 -07:00
Dan Williams	3f46833df9	libnvdimm: Debug probe times Instrument nvdimm_bus_probe() to emit timestamps for the start and end of libnvdimm device probing. This is useful for identifying sources of libnvdimm sub-system initialization latency. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-02 17:05:43 -07:00
Robert Elliott	254a4cd50b	linvdimm, pmem: Preserve read-only setting for pmem devices The pmem driver does not honor a forced read-only setting for very long: $ blockdev --setro /dev/pmem0 $ blockdev --getro /dev/pmem0 1 followed by various commands like these: $ blockdev --rereadpt /dev/pmem0 or $ mkfs.ext4 /dev/pmem0 results in this in the kernel serial log: nd_pmem namespace0.0: region0 read-write, marking pmem0 read-write with the read-only setting lost: $ blockdev --getro /dev/pmem0 0 That's from bus.c nvdimm_revalidate_disk(), which always applies the setting from nd_region (which is initially based on the ACPI NFIT NVDIMM state flags not_armed bit). In contrast, commit `20bd1d026a` ("scsi: sd: Keep disk read-only when re-reading partition") fixed this issue for SCSI devices to preserve the previous setting if it was set to read-only. This patch modifies bus.c to preserve any previous read-only setting. It also eliminates the kernel serial log print except for cases where read-write is changed to read-only, so it doesn't print read-only to read-only non-changes. Cc: <stable@vger.kernel.org> Fixes: `5813882094` ("libnvdimm, nfit: handle unarmed dimms, mark namespaces read-only") Signed-off-by: Robert Elliott <elliott@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-05-31 21:07:33 -07:00
Dan Williams	6dfdb2b6d8	pmem: Switch to copy_to_iter_mcsafe() Use the machine check safe version of copy_to_iter() for the ->copy_to_iter() operation published by the pmem driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-05-22 23:18:31 -07:00
Dan Williams	b3a9a0c36e	dax: Introduce a ->copy_to_iter dax operation Similar to the ->copy_from_iter() operation, a platform may want to deploy an architecture or device specific routine for handling reads from a dax_device like /dev/pmemX. On x86 this routine will point to a machine check safe version of copy_to_iter(). For now, add the plumbing to device-mapper and the dax core. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Mike Snitzer <snitzer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-05-22 23:18:31 -07:00
Dan Williams	e763848843	mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS In preparation for fixing dax-dma-vs-unmap issues, filesystems need to be able to rely on the fact that they will get wakeups on dev_pagemap page-idle events. Introduce MEMORY_DEVICE_FS_DAX and generic_dax_page_free() as common indicator / infrastructure for dax filesytems to require. With this change there are no users of the MEMORY_DEVICE_HOST designation, so remove it. The HMM sub-system extended dev_pagemap to arrange a callback when a dev_pagemap managed page is freed. Since a dev_pagemap page is free / idle when its reference count is 1 it requires an additional branch to check the page-type at put_page() time. Given put_page() is a hot-path we do not want to incur that check if HMM is not in use, so a static branch is used to avoid that overhead when not necessary. Now, the FS_DAX implementation wants to reuse this mechanism for receiving dev_pagemap ->page_free() callbacks. Rework the HMM-specific static-key into a generic mechanism that either HMM or FS_DAX code paths can enable. For ARCH=um builds, and any other arch that lacks ZONE_DEVICE support, care must be taken to compile out the DEV_PAGEMAP_OPS infrastructure. However, we still need to support FS_DAX in the FS_DAX_LIMITED case implemented by the s390/dcssblk driver. Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Thomas Meyer <thomas@m3y3r.de> Reported-by: Dave Jiang <dave.jiang@intel.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-05-22 06:59:39 -07:00
Dan Williams	60622d6822	x86/asm/memcpy_mcsafe: Return bytes remaining Machine check safe memory copies are currently deployed in the pmem driver whenever reading from persistent memory media, so that -EIO is returned rather than triggering a kernel panic. While this protects most pmem accesses, it is not complete in the filesystem-dax case. When filesystem-dax is enabled reads may bypass the block layer and the driver via dax_iomap_actor() and its usage of copy_to_iter(). In preparation for creating a copy_to_iter() variant that can handle machine checks, teach memcpy_mcsafe() to return the number of bytes remaining rather than -EFAULT when an exception occurs. Co-developed-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: hch@lst.de Cc: linux-fsdevel@vger.kernel.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/152539238119.31796.14318473522414462886.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2018-05-15 08:32:42 +02:00
Dan Williams	f22acf8274	Revert "libnvdimm, of_pmem: workaround OF_NUMA=n build error" With commit `df3f126482` ("libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid()") it is now possible to allow of_pmem to be built as a module as originally implemented. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-19 15:10:56 -07:00
Rob Herring	df3f126482	libnvdimm, of_pmem: use dev_to_node() instead of of_node_to_nid() Remove the direct dependency on of_node_to_nid() by using dev_to_node() instead. Any DT platform device will have its NUMA node id set when the device is created. With this, commit `291717b6fb` ("libnvdimm, of_pmem: workaround OF_NUMA=n build error") can be reverted. Fixes: `7171976089` ("libnvdimm: Add device-tree based driver") Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: linux-nvdimm@lists.01.org Signed-off-by: Rob Herring <robh@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-19 15:07:10 -07:00
Dan Williams	e7c5a571a8	libnvdimm, dimm: handle EACCES failures from label reads The new support for the standard _LSR and _LSW methods neglected to also update the nvdimm_init_config_data() and nvdimm_set_config_data() to return the translated error code from failed commands. This precision is necessary because the locked status that was previously returned on ND_CMD_GET_CONFIG_SIZE commands is now returned on ND_CMD_{GET,SET}_CONFIG_DATA commands. If the kernel misses this indication it can inadvertently fall back to label-less mode when it should otherwise avoid all access to locked regions. Cc: <stable@vger.kernel.org> Fixes: `4b27db7e26` ("acpi, nfit: add support for the _LSI, _LSR, and...") Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-16 08:18:51 -07:00
Linus Torvalds	9f3a0941fb	libnvdimm for 4.17 * A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. * The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. * Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. * The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. * Miscellaneous cleanups and updates to unit test infrastructure. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc 1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL /NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga 9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57 qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ 3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m 9eGJG1H+JspcbtLF9iHn =acZQ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit `d2c997c0f1` ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...	2018-04-10 10:25:57 -07:00
Dan Williams	e13e75b86e	Merge branch 'for-4.17/dax' into libnvdimm-for-next	2018-04-09 10:50:17 -07:00
Dan Williams	1ed41b5696	Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-next	2018-04-09 10:50:08 -07:00
Dan Williams	291717b6fb	libnvdimm, of_pmem: workaround OF_NUMA=n build error Stephen reports that an x86 allmodconfig build fails to build the of_pmem driver due to a missing definition of of_node_to_nid(). That helper is currently only exported in the OF_NUMA=y case. In other cases, ppc and sparc, it is a weak symbol, and outside of those platforms it is a static inline. Until an OF_NUMA=n configuration can reliably support usage of of_node_to_nid() in modules across architectures, mark this driver as 'bool' instead of 'tristate'. Cc: Rob Herring <robh@kernel.org> Cc: Oliver O'Halloran <oohall@gmail.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-09 09:10:22 -07:00

1 2 3 4 5 ...

435 Commits