OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Dan Williams	d3abaf43ba	acpi, nfit: Fix Address Range Scrub completion tracking The Address Range Scrub implementation tried to skip running scrubs against ranges that were already scrubbed by the BIOS. Unfortunately that support also resulted in early scrub completions as evidenced by this debug output from nfit_test: nd_region region9: ARS: range 1 short complete nd_region region3: ARS: range 1 short complete nd_region region4: ARS: range 2 ARS start (0) nd_region region4: ARS: range 2 short complete ...i.e. completions without any indications that the scrub was started. This state of affairs was hard to see in the code due to the proliferation of state bits and mistakenly trying to track done state per-range when the completion is a global property of the bus. So, kill the four ARS state bits (ARS_REQ, ARS_REQ_REDO, ARS_DONE, and ARS_SHORT), and replace them with just 2 request flags ARS_REQ_SHORT and ARS_REQ_LONG. The implementation will still complete and reap the results of BIOS initiated ARS, but it will not attempt to use that information to affect the completion status of scrubbing the ranges from a Linux perspective. Instead, try to synchronously run a short ARS per range at init time and schedule a long scrub in the background. If ARS is busy with an ARS request, schedule both a short and a long scrub for when ARS returns to idle. This logic also satisfies the intent of what ARS_REQ_REDO was trying to achieve. The new rule is that the REQ flag stays set until the next successful ars_start() for that range. With the new policy that the REQ flags are not cleared until the next start, the implementation no longer loses requests as can be seen from the following log: nd_region region3: ARS: range 1 ARS start short (0) nd_region region9: ARS: range 1 ARS start short (0) nd_region region3: ARS: range 1 complete nd_region region4: ARS: range 2 ARS start short (0) nd_region region9: ARS: range 1 complete nd_region region9: ARS: range 1 ARS start long (0) nd_region region4: ARS: range 2 complete nd_region region3: ARS: range 1 ARS start long (0) nd_region region9: ARS: range 1 complete nd_region region3: ARS: range 1 complete nd_region region4: ARS: range 2 ARS start long (0) nd_region region4: ARS: range 2 complete ...note that the nfit_test emulated driver provides 2 buses, that is why some of the range indices are duplicated. Notice that each range now successfully completes a short and long scrub. Cc: <stable@vger.kernel.org> Fixes: `14c73f997a` ("nfit, address-range-scrub: introduce nfit_spa->ars_state") Fixes: `cc3d3458d4` ("acpi/nfit: queue issuing of ars when an uc error...") Reported-by: Jacek Zloch <jacek.zloch@intel.com> Reported-by: Krzysztof Rusocki <krzysztof.rusocki@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-17 13:57:51 -07:00
Dan Williams	f110176633	tools/testing/nvdimm: Populate dirty shutdown data Allow the unit tests to verify the retrieval of the dirty shutdown count via smart commands, and allow the driver-load-time retrieval of the smart health payload to be simulated by nfit_test. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-17 10:47:19 -07:00
Dan Williams	0ead11181f	acpi, nfit: Collect shutdown status Some NVDIMMs, in addition to providing an indication of whether the previous shutdown was clean, also provide a running count of lifetime dirty-shutdown events for the device. In anticipation of this functionality appearing on more devices arrange for the nfit driver to retrieve / cache this data at DIMM discovery time, and export it via sysfs. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-17 10:39:04 -07:00
Dan Williams	6f07f86c49	acpi, nfit: Introduce nfit_mem flags In preparation for adding a flag to indicate whether a DIMM publishes a dirty-shutdown count, convert the existing flags to a bit field. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-10-16 17:57:58 -07:00
Linus Torvalds	828bf6e904	libnvdimm-for-4.19_misc Collection of misc libnvdimm patches for 4.19 submission * Adding support to read locked nvdimm capacity. * Change test code to make DSM failure code injection an override. * Add support for calculate maximum contiguous area for namespace. * Add support for queueing a short ARS when there is on going ARS for nvdimm. * Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. * Improve smart injection support for nvdimm emulation testing. * Fix test code that supports for emulating controller temperature. * Fix hang on error before devm_memremap_pages() * Fix a bug that causes user memory corruption when data returned to user for ars_status. * Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax. -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE5DAy15EJMCV1R6v9YGjFFmlTOEoFAlt9uUIACgkQYGjFFmlT OErL+xAAgWHSGs8w98VtYA9kLDeTYEXutq93wJZQoBu/FMAXuuU3hYmQYnOQU87h KKYmfDkeusaih1R3IX7mzlegnnzSfQ6MraNSV76M43noJHbRTunknCPZH6ebp4fo b/eljvWlZF/idM+7YcsnoFMnHSRj2pjJGXmKQDlKedHD+KMxpmk6zEl2s5Y0zvPU 4U7UQLtk3D5IIpLNsLEmxge32BfvNf5IzoSO1aZp7Eqk0+U5Tq3Sq/Tjmd+J0RKt 6WH5yA6NqXQgBh+ayHsYU8YX62RqnbKQZXqVxD35OH64zJEUefnP1fpt9pmaZ9eL 43BPMkpM09eLAikO2ET3/3c2k6h3h9ttz1sH8t/hiroCtfmxs3XgskY06hxpKjZV EbN+BUmut5Mr+zzYitRr3dbK2aHPVU9IbU7jUw/1Tz23rq3kU5iI7SHHv1b/eWup 1Cr77Z1M6HB8VBhjnJ+R607sbRrnKQUOV7fGzAaIskyUOTWsEvIgTh/6MRiaj9MD 5HXIgc/0y9E+G93s7MsUWwzpB7J6E7EGoybST2SKPtqwtDMPsBNeWRjyA9quBCoN u1s+e+lWHYutqRW0eisDTTlq3nJwPijSx1nnzhJxw9s1EkCXz3f7KRZhyH1C79Co 7wjiuvKQ79e/HI/oXvGmTnv5lbLEpWYyJ3U3KIFfoUqugeyhr0k= =5p2n -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dave Jiang: "Collection of misc libnvdimm patches for 4.19 submission: - Adding support to read locked nvdimm capacity. - Change test code to make DSM failure code injection an override. - Add support for calculate maximum contiguous area for namespace. - Add support for queueing a short ARS when there is on going ARS for nvdimm. - Allow NULL to be passed in to ->direct_access() for kaddr and pfn params. - Improve smart injection support for nvdimm emulation testing. - Fix test code that supports for emulating controller temperature. - Fix hang on error before devm_memremap_pages() - Fix a bug that causes user memory corruption when data returned to user for ars_status. - Maintainer updates for Ross Zwisler emails and adding Jan Kara to fsdax" * tag 'libnvdimm-for-4.19_misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm: fix ars_status output length calculation device-dax: avoid hang on error before devm_memremap_pages() tools/testing/nvdimm: improve emulation of smart injection filesystem-dax: Do not request kaddr and pfn when not required md/dm-writecache: Don't request pointer dummy_addr when not required dax/super: Do not request a pointer kaddr when not required tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access() s390, dcssblk: kaddr and pfn can be NULL to ->direct_access() libnvdimm, pmem: kaddr and pfn can be NULL to ->direct_access() acpi/nfit: queue issuing of ars when an uc error notification comes in libnvdimm: Export max available extent libnvdimm: Use max contiguous area for namespace size MAINTAINERS: Add Jan Kara for filesystem DAX MAINTAINERS: update Ross Zwisler's email address tools/testing/nvdimm: Fix support for emulating controller temperature tools/testing/nvdimm: Make DSM failure code injection an override acpi, nfit: Prefer _DSM over _LSR for namespace label reads libnvdimm: Introduce locked DIMM capacity support	2018-08-25 18:13:10 -07:00
Dave Jiang	cc3d3458d4	acpi/nfit: queue issuing of ars when an uc error notification comes in When the ACPI UC error notifier gets called and ARS_REQ bit is set with the passed in flag, we can receive -EBUSY if ARS_REQ bit is already set for the nfit_spa->ars_state. When that happens, the ARS request is dropped. That can potentially cause us to miss the unreported errors that the on going ARS request does not receive. Add an ARS_REQ_REDO state that will request short ARS upon ARS completion to grab any errors we missed. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>	2018-07-27 15:28:28 -07:00
Dan Williams	099b07a25f	acpi, nfit: Prefer _DSM over _LSR for namespace label reads The _LSR method indicates locked status via error-code-3 returned in the _LSR payload. When any error is returned the payload of _LSR is truncated to a zero-length buffer. The _DSM path in comparison allows system software to retrieve the locked status and namespace label area contents. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-07-14 10:27:00 -07:00
Dave Jiang	ee6581ceba	nfit: fix unchecked dereference in acpi_nfit_ctl Incremental patch to fix the unchecked dereference in acpi_nfit_ctl. Reported by Dan Carpenter: "acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value" from Jun 28, 2018, leads to the following Smatch complaint: drivers/acpi/nfit/core.c:578 acpi_nfit_ctl() warn: variable dereferenced before check 'cmd_rc' (see line 411) drivers/acpi/nfit/core.c 410 411 *cmd_rc = -EINVAL; ^^^^^^^^^^^^^^^^^^ Patch adds unchecked dereference. Fixes: `c1985cefd8` ("acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value") Signed-off-by: Dave Jiang <dave.jiang@intel.com>	2018-07-11 10:25:24 -07:00
Dan Williams	33cc2c9667	acpi, nfit: Fix scrub idle detection The notification of scrub completion happens within the scrub workqueue. That can clearly race someone running scrub_show() and work_busy() before the workqueue has a chance to flush the recently completed work. Add a flag to reliably indicate the idle vs busy state. Without this change applications using poll(2) to wait for scrub-completion may falsely wakeup and read ARS as being busy even though the thread is going idle and then hang indefinitely. Fixes: `bc6ba80858` ("nfit, address-range-scrub: rework and simplify ARS...") Cc: <stable@vger.kernel.org> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Reported-by: Lukasz Dorau <lukasz.dorau@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-07-05 19:33:53 -07:00
Dave Jiang	c1985cefd8	acpi/nfit: fix cmd_rc for acpi_nfit_ctl to always return a value cmd_rc is passed in by reference to the acpi_nfit_ctl() function and the caller expects a value returned. However, when the package is pass through via the ND_CMD_CALL command, cmd_rc is not touched. Make sure cmd_rc is always set. Fixes: `aef2533822` ("libnvdimm, nfit: centralize command status translation") Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-30 10:43:39 -07:00
Kees Cook	a86854d0c5	treewide: devm_kzalloc() -> devm_kcalloc() The devm_kzalloc() function has a 2-factor argument form, devm_kcalloc(). This patch replaces cases of: devm_kzalloc(handle, a * b, gfp) with: devm_kcalloc(handle, a * b, gfp) as well as handling cases of: devm_kzalloc(handle, a * b * c, gfp) with: devm_kzalloc(handle, array3_size(a, b, c), gfp) as it's slightly less ugly than: devm_kcalloc(handle, array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: devm_kzalloc(handle, 4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. Some manual whitespace fixes were needed in this patch, as Coccinelle really liked to write "=devm_kcalloc..." instead of "= devm_kcalloc...". The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ expression HANDLE; type TYPE; expression THING, E; @@ ( devm_kzalloc(HANDLE, - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) \| devm_kzalloc(HANDLE, - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression HANDLE; expression COUNT; typedef u8; typedef __u8; @@ ( devm_kzalloc(HANDLE, - sizeof(u8) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(__u8) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(char) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(unsigned char) * (COUNT) + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(u8) * COUNT + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(__u8) * COUNT + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(char) * COUNT + COUNT , ...) \| devm_kzalloc(HANDLE, - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ expression HANDLE; type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ expression HANDLE; identifier SIZE, COUNT; @@ - devm_kzalloc + devm_kcalloc (HANDLE, - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression HANDLE; expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( devm_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression HANDLE; expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) \| devm_kzalloc(HANDLE, - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ expression HANDLE; identifier STRIDE, SIZE, COUNT; @@ ( devm_kzalloc(HANDLE, - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) \| devm_kzalloc(HANDLE, - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression HANDLE; expression E1, E2, E3; constant C1, C2, C3; @@ ( devm_kzalloc(HANDLE, C1 * C2 * C3, ...) \| devm_kzalloc(HANDLE, - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) \| devm_kzalloc(HANDLE, - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) \| devm_kzalloc(HANDLE, - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) \| devm_kzalloc(HANDLE, - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression HANDLE; expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( devm_kzalloc(HANDLE, sizeof(THING) * C2, ...) \| devm_kzalloc(HANDLE, sizeof(TYPE) * C2, ...) \| devm_kzalloc(HANDLE, C1 * C2 * C3, ...) \| devm_kzalloc(HANDLE, C1 * C2, ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - sizeof(THING) * E2 + E2, sizeof(THING) , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - (E1) * E2 + E1, E2 , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - (E1) * (E2) + E1, E2 , ...) \| - devm_kzalloc + devm_kcalloc (HANDLE, - E1 * E2 + E1, E2 , ...) ) Signed-off-by: Kees Cook <keescook@chromium.org>	2018-06-12 16:19:22 -07:00
Dan Williams	d4dd70915e	acpi, nfit: Remove ecc_unit_size The "Clear Error Unit" may be smaller than the ECC unit size on some devices. For example, poison may be tracked at 64-byte alignment even though the ECC unit is larger. Unless / until the ACPI specification provides a non-ambiguous way to communicate this property do not expose this to userspace. Software that had been using this property must already be prepared for the case where the property is not provided on older kernels, so it is safe to remove this attribute. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-06-03 12:49:15 -07:00
Linus Torvalds	9f3a0941fb	libnvdimm for 4.17 * A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. * The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. * Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. * The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. * Miscellaneous cleanups and updates to unit test infrastructure. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJazDt5AAoJEB7SkWpmfYgCqGMQALLwdPeY87cUK7AvQ2IXj46B lJgeVuHPzyQDbC03AS5uUYnnU3I5lFd7i4y7ZrywNpFs4lsb/bNmbUpQE5xp+Yvc 1MJ/JYDIP5X4misWYm3VJo85N49+VqSRgAQk52PBigwnZ7M6/u4cSptXM9//c9JL /NYbat6IjjY6Tx49Tec6+F3GMZjsFLcuTVkQcREoOyOqVJE4YpP0vhNjEe0vq6vr EsSWiqEI5VFH4PfJwKdKj/64IKB4FGKj2A5cEgjQBxW2vw7tTJnkRkdE3jDUjqtg xYAqGp/Dqs4+bgdYlT817YhiOVrcr5mOHj7TKWQrBPgzKCbcG5eKDmfT8t+3NEga 9kBlgisqIcG72lwZNA7QkEHxq1Omy9yc1hUv9qz2YA0G+J1WE8l1T15k1DOFwV57 qIrLLUypklNZLxvrzNjclempboKc4JCUlj+TdN5E5Y6pRs55UWTXaP7Xf5O7z0vf l/uiiHkc3MPH73YD2PSEGFJ8m8EU0N8xhrcz3M9E2sHgYCnbty1Lw3FH0/GhThVA ya1mMeDdb8A2P7gWCBk1Lqeig+rJKXSey4hKM6D0njOEtMQO1H4tFqGjyfDX1xlJ 3plUR9WBVEYzN5+9xWbwGag/ezGZ+NfcVO2gmy6yXiEph796BxRAZx/18zKRJr0m 9eGJG1H+JspcbtLF9iHn =acZQ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "This cycle was was not something I ever want to repeat as there were several late changes that have only now just settled. Half of the branch up to commit `d2c997c0f1` ("fs, dax: use page->mapping to warn...") have been in -next for several releases. The of_pmem driver and the address range scrub rework were late arrivals, and the dax work was scaled back at the last moment. The of_pmem driver missed a previous merge window due to an oversight. A sense of obligation to rectify that miss is why it is included for 4.17. It has acks from PowerPC folks. Stephen reported a build failure that only occurs when merging it with your latest tree, for now I have fixed that up by disabling modular builds of of_pmem. A test merge with your tree has received a build success report from the 0day robot over 156 configs. An initial version of the ARS rework was submitted before the merge window. It is self contained to libnvdimm, a net code reduction, and passing all unit tests. The filesystem-dax changes are based on the wait_var_event() functionality from tip/sched/core. However, late review feedback showed that those changes regressed truncate performance to a large degree. The branch was rewound to drop the truncate behavior change and now only includes preparation patches and cleanups (with full acks and reviews). The finalization of this dax-dma-vs-trnucate work will need to wait for 4.18. Summary: - A rework of the filesytem-dax implementation provides for detection of unmap operations (truncate / hole punch) colliding with in-progress device-DMA. A fix for these collisions remains a work-in-progress pending resolution of truncate latency and starvation regressions. - The of_pmem driver expands the users of libnvdimm outside of x86 and ACPI to describe an implementation of persistent memory on PowerPC with Open Firmware / Device tree. - Address Range Scrub (ARS) handling is completely rewritten to account for the fact that ARS may run for 100s of seconds and there is no platform defined way to cancel it. ARS will now no longer block namespace initialization. - The NVDIMM Namespace Label implementation is updated to handle label areas as small as 1K, down from 128K. - Miscellaneous cleanups and updates to unit test infrastructure" * tag 'libnvdimm-for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (39 commits) libnvdimm, of_pmem: workaround OF_NUMA=n build error nfit, address-range-scrub: add module option to skip initial ars nfit, address-range-scrub: rework and simplify ARS state machine nfit, address-range-scrub: determine one platform max_ars value powerpc/powernv: Create platform devs for nvdimm buses doc/devicetree: Persistent memory region bindings libnvdimm: Add device-tree based driver libnvdimm: Add of_node to region and bus descriptors libnvdimm, region: quiet region probe libnvdimm, namespace: use a safe lookup for dimm device name libnvdimm, dimm: fix dpa reservation vs uninitialized label area libnvdimm, testing: update the default smart ctrl_temperature libnvdimm, testing: Add emulation for smart injection commands nfit, address-range-scrub: introduce nfit_spa->ars_state libnvdimm: add an api to cast a 'struct nd_region' to its 'struct device' nfit, address-range-scrub: fix scrub in-progress reporting dax, dm: allow device-mapper to operate without dax support dax: introduce CONFIG_DAX_DRIVER fs, dax: use page->mapping to warn if truncate collides with a busy page ext2, dax: introduce ext2_dax_aops ...	2018-04-10 10:25:57 -07:00
Dan Williams	1ed41b5696	Merge branch 'for-4.17/libnvdimm' into libnvdimm-for-next	2018-04-09 10:50:08 -07:00
Dan Williams	bca811a7fd	nfit, address-range-scrub: add module option to skip initial ars After attempting to quickly retrieve known errors the kernel proceeds to kick off a long running ARS. Add a module option to disable this behavior at initialization time, or at new region discovery time. Otherwise, ARS can be started manually regardless of the state of this setting. Co-developed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-07 08:44:55 -07:00
Dan Williams	bc6ba80858	nfit, address-range-scrub: rework and simplify ARS state machine ARS is an operation that can take 10s to 100s of seconds to find media errors that should rarely be present. If the platform crashes due to media errors in persistent memory, the expectation is that the BIOS will report those known errors in a 'short' ARS request. A 'short' ARS request asks platform firmware to return an ARS payload with all known errors, but without issuing a 'long' scrub. At driver init a short request is issued to all PMEM ranges before registering regions. Then, in the background, a long ARS is scheduled for each region. The ARS implementation is simplified to centralize ARS completion work in the ars_complete() helper. The timeout is removed since there is no facility to cancel ARS, and this otherwise arranges for system init to never be blocked waiting for a 'long' ARS. The ars_state flags are used to coordinate ARS requests from driver init, ARS requests from userspace, and ARS requests in response to media error notifications. Given that there is no notification of ARS completion the implementation still needs to poll. It backs off exponentially to a maximum poll period of 30 minutes. Suggested-by: Toshi Kani <toshi.kani@hpe.com> Co-developed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-07 07:55:05 -07:00
Dan Williams	459d0ddb07	nfit, address-range-scrub: determine one platform max_ars value acpi_nfit_query_poison() is awkward in that it requires an nfit_spa argument in order to determine what max_ars value to use. Instead probe for the minimum max_ars across all scrub-capable ranges in the system and drop the nfit_spa argument. This enables a larger rework / simplification of the ARS state machine whereby the status can be retrieved once and then iterated over all address ranges to reap completions. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-07 07:55:05 -07:00
Dan Williams	14c73f997a	nfit, address-range-scrub: introduce nfit_spa->ars_state In preparation for re-working the ARS implementation to better handle short vs long ARS runs, introduce nfit_spa->ars_state. For now this just replaces the nfit_spa->ars_required bit-field/flag, but going forward it will be used to track ARS completion and make short vs long ARS requests. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-05 20:10:55 -07:00
Linus Torvalds	dd972f924d	* Add NVDIMM support to EDAC (Tony Luck) * misc fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAlrDZ6EACgkQEsHwGGHe VUpWWA//RG9WqqdXxPa05TxiRFWMiNSH/HSGJQZsc3K7xm2TEGVdDv5thYsxXCaW kA0oiUJA3zd62HeUIWMeGypm+hLM8T73836aIKcfxMduxQ6DR9nk51IV4RmHsLvT rd4bUsrjCov6spapqS265LOxZVnpKvxZv0NVTj6pIk0pkeghWMZh63vZtrxVkLsS 6dHsdIDBr0fMhcFBVQ67SEDpMIyMAqziPOP9sv6NNElG4Q93RnZr6KYvQdE32Iev Z6lIUU/sjVJEuhgp6FjJrhvjc+FmGJGERsFqo3Z8RT9P26Cj3wq0ov8IY/KSBazG mfJFGbL+O3lppGKjjuAwMGr4R3fZMLQpgLxdYe7VNvOHB9ea7732q9MDGiNkN/u7 z8o+dtvmPWr0Y1ir8te+7LnSQXIAwxMkKP/lMStzzuFUETQDd0RWr7GAKlbrgPOU +DhiQRPBFS7qpslA7BdqV6Ybr3nhDVZciDYWsvNpDMwdem05TxlFwC5l/BzP9NtM FmQ3B0rQ5we6gib6UZWUjW3BH9wUgFUpu/WXKJl1unMHwYNe3CvdM8yD5W7mUylh 6SVYAmkZyuJ1Du40+YMeimroe77XV60IqoUFNEzMSFeB4/GxMG6+u2gUqNuTpHpZ aMmCRj9DxWGxZyDsTc49An0X/JbVyIE/Aq6heMTYScgBvCaD1VE= =K89j -----END PGP SIGNATURE----- Merge tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp Pull EDAC updates from Borislav Petkov: "Noteworthy is the NVDIMM support: - NVDIMM support to EDAC (Tony Luck) - misc fixes" * tag 'edac_for_4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp: EDAC, sb_edac: Remove variable length array usage EDAC, skx_edac: Detect non-volatile DIMMs firmware, DMI: Add function to look up a handle and return DIMM size acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle EDAC: Add new memory type for non-volatile DIMMs EDAC: Drop duplicated array of strings for memory type names EDAC, layerscape: Allow building for LS1021A	2018-04-05 14:21:13 -07:00
Dan Williams	78727137fd	nfit, address-range-scrub: fix scrub in-progress reporting There is a small window whereby ARS scan requests can schedule work that userspace will miss when polling scrub_show. Hold the init_mutex lock over calls to report the status to close this potential escape. Also, make sure that requests to cancel the ARS workqueue are treated as an idle event. Cc: <stable@vger.kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Fixes: `37b137ff8c` ("nfit, libnvdimm: allow an ARS scrub...") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-03 11:51:42 -07:00
Dan Williams	8d0d8ed335	nfit: fix region registration vs block-data-window ranges Commit `1cf03c00e7` "nfit: scrub and register regions in a workqueue" mistakenly attempts to register a region per BLK aperture. There is nothing to register for individual apertures as they belong as a set to a BLK aperture group that are registered with a corresponding DIMM-control-region. Filter them for registration to prevent some needless devm_kzalloc() allocations. Cc: <stable@vger.kernel.org> Fixes: `1cf03c00e7` ("nfit: scrub and register regions in a workqueue") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-04-02 16:49:30 -07:00
Dan Williams	466d1493ea	acpi, nfit: rework NVDIMM leaf method detection Some BIOSen do not handle 0-byte transfer lengths for the _LSR and _LSW (label storage read/write) methods. This causes Linux to fallback to the deprecated _DSM path, or otherwise disable label support. Introduce acpi_nvdimm_has_method() to detect whether a method is available rather than calling the method, require _LSI and _LSR to be paired, and require read support before enabling write support. Cc: <stable@vger.kernel.org> Fixes: `4b27db7e26` ("acpi, nfit: add support for the _LS...") Suggested-by: Erik Schmauss <erik.schmauss@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-03-28 10:44:50 -07:00
Dan Williams	0731de476a	nfit: skip region registration for incomplete control regions Per the ACPI specification the only functional purpose for a DIMM Control Region to be mapped into the system physical address space, from an OSPM perspective, is to support block-apertures. However, there are some BIOSen that publish DIMM Control Region SPA entries for pre-boot environment consumption. Undo the kernel policy of generating disabled 'ndblk' regions when this configuration is detected. Cc: <stable@vger.kernel.org> Fixes: `1f7df6f88b` ("libnvdimm, nfit: regions (block-data-window...)") Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-03-21 21:39:27 -07:00
Dan Williams	fe9a552e71	libnvdimm, nfit: fix persistence domain reporting The persistence domain is a point in the platform where once writes reach that destination the platform claims it will make them persistent relative to power loss. In the ACPI NFIT this is currently communicated as 2 bits in the "NFIT - Platform Capabilities Structure". The bits comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM Durability on Power Loss Capable". Commit `96c3a23905` "libnvdimm: expose platform persistence attr..." shows the persistence domain as flags, but it's really an enumerated hierarchy. Fix this newly introduced user ABI to show the closest available persistence domain before userspace develops dependencies on seeing, or needing to develop code to tolerate, the raw NFIT flags communicated through the libnvdimm-generic region attribute. Fixes: `96c3a23905` ("libnvdimm: expose platform persistence attr...") Reviewed-by: Dave Jiang <dave.jiang@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-03-21 15:12:07 -07:00
Tony Luck	23222f8f8d	acpi, nfit: Add function to look up nvdimm device and provide SMBIOS handle EDAC driver needs to look up attributes of NVDIMMs provided in SMBIOS. Provide a function that looks up an acpi_nfit_memory_map from a device handle (node/socket/mc/channel/dimm) and returns the SMBIOS handle. Also pass back the "flags" so we can see if the NVDIMM is OK. Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Aristeu Rozanski <aris@redhat.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Erik Schmauss <erik.schmauss@intel.com> Cc: Jean Delvare <jdelvare@suse.com> Cc: Len Brown <lenb@kernel.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Cc: Robert Moore <robert.moore@intel.com> Cc: devel@acpica.org Cc: linux-acpi@vger.kernel.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/20180312182430.10335-4-tony.luck@intel.com Signed-off-by: Borislav Petkov <bp@suse.de>	2018-03-14 12:43:50 +01:00
Johannes Thumshirn	b814735f5c	acpi, nfit: remove redundant __func__ in dev_dbg Dynamic debug can be instructed to add the function name to the debug output using the +f switch, so there is no need for the nfit module to do it again. If a user decides to add the +f switch for nfit's dynamic debug this results in double prints of the function name like the following: [ 2391.935383] acpi_nfit_ctl: nfit ACPI0012:00: acpi_nfit_ctl:nmem8 cmd: 10: func: 1 input length: 0 Thus remove the stray __func__ printing. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-03-05 15:58:36 -08:00
Ross Zwisler	ee95f4059a	Merge branch 'for-4.16/nfit' into libnvdimm-for-next	2018-02-03 00:26:26 -07:00
Ross Zwisler	d121f07691	Merge branch 'for-4.16/dax' into libnvdimm-for-next	2018-02-03 00:26:10 -07:00
Toshi Kani	23fbd7c70a	acpi, nfit: fix register dimm error handling A NULL pointer reference kernel bug was observed when acpi_nfit_add_dimm() called in acpi_nfit_register_dimms() failed. This error path does not set nfit_mem->nvdimm, but the 2nd list_for_each_entry() loop in the function assumes it's always set. Add a check to nfit_mem->nvdimm. Cc: <stable@vger.kernel.org> Fixes: `ba9c8dd3c2` ("acpi, nfit: add dimm device notification support") Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2018-02-02 13:49:29 -08:00
Dave Jiang	30e6d7bf29	acpi: nfit: add persistent memory control flag for nd_region Propagate the ADR attribute flag from the NFIT platform capabilities sub-table to nd_region. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>	2018-02-01 15:01:15 -07:00
Dave Jiang	06e8ccdab1	acpi: nfit: Add support for detect platform CPU cache flush on power loss In ACPI 6.2a the platform capability structure has been added to the NFIT tables. That provides software the ability to determine whether a system supports the auto flushing of CPU caches on power loss. If the capability is supported, we do not need to do dax_flush(). Plumbing the path to set the property on per region from the NFIT tables. This patch depends on the ACPI NFIT 6.2a platform capabilities support code in include/acpi/actbl1.h. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>	2018-02-01 15:01:15 -07:00
Dan Williams	adf6895754	acpi, nfit: fix health event notification Integration testing with a BIOS that generates injected health event notifications fails to communicate those events to userspace. The nfit driver neglects to link the ACPI DIMM device with the necessary driver data so acpi_nvdimm_notify() fails this lookup: nfit_mem = dev_get_drvdata(dev); if (nfit_mem && nfit_mem->flags_attr) sysfs_notify_dirent(nfit_mem->flags_attr); Add the necessary linkage when installing the notification handler and clean it up when the nfit driver instance is torn down. Cc: <stable@vger.kernel.org> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Fixes: `ba9c8dd3c2` ("acpi, nfit: add dimm device notification support") Reported-by: Daniel Osawa <daniel.k.osawa@intel.com> Tested-by: Daniel Osawa <daniel.k.osawa@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-12-04 10:14:47 -08:00
Linus Torvalds	a3841f94c7	libnvdimm for 4.15 * Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. * Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. * Add support for the ACPI 6.2 NVDIMM media error injection methods. * Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. * Cleanup physical address information disclosures to be root-only. * Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. * Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: `957ac8c421` dax: fix PMD faults on zero-length files Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> `a39e596baa` xfs: support for synchronous DAX faults Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> `7b565c9f96` xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJaDfvcAAoJEB7SkWpmfYgCk7sP/2qJhBH+VTTdg2osDnhAdAhI co/AGEmsHFlUCMBb/Ek7UnMAmhBYiJU2q4ywPsNFBpusXpMlqNy5Iwo7k4/wQHE/ SJcIM0g4zg0ViFuUhwV+C2T0R5UzFR8JLd9EYWj/YS6aJpurtotm5l4UStaM0Hzo AhxSXJLrBDuqCpbOxbctfiGEmdRL7aRfBEAARTNRKBn/iXxJUcYHlp62rtXQS+t4 I6LC/URCWTNTTMGmzW6TRsgSD9WMfd19xKcGzN3qL6ee0KFccxN4ctFqHA/sFGOh iYLeR0XJUjJxyp+PkWGteXPVZL0Kj3bD/lSTG+Co5bm/ra8a/sh3TSFfgFyoBZD1 EqMN8Ryf80hGp3FabeH2Iw2SviYPZpHSWgjddjxLD0RA6OmpzINc+Wm8eqApjMME sbZDTOijiab4QMQ0XamF4GuDHyQtawv5Y/w2Ehhl1tmiqW+5tKhsKqxkQt+/V3Yt RTVSRe2Pkway66b+cD64IdQ6L2tyonPnmi5IzgkKOhlOEGomy+4/U2Jt2bMbhzq6 ymszKmXp2XI8P06wU8sHrIUeXO5I9qoKn/fZA73Eb8aIzgJe3tBE/5+Ab7RG6HB9 1OVfcMWoXU1gNgNktTs63X1Lsg4aW9kt/K4fPHHcqUcaliEJpJTlAbg9GLF2buoW nQ+0fTRgMRihE3ZA0Fs3 =h2vZ -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - `957ac8c421` ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - `a39e596baa` ("xfs: support for synchronous DAX faults") and `7b565c9f96` ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...	2017-11-17 09:51:57 -08:00
Dan Williams	79ab67ede2	acpi, nfit: add 'Enable Latch System Shutdown Status' command support The NVDIMM_FAMILY_INTEL 'Enable Latch System Shutdown Status' command indicates to the platform that system software has acknowledged the most recent unsafe shutdown status. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-11-15 16:55:41 -08:00
Dan Williams	0e7f074145	acpi, nfit: validate commands against the device type Fix occasions in acpi_nfit_ctl where we check the command type without validating whether we are parsing dimm vs bus level commands. Where the command numbers alias between dimms and bus we can make the wrong assumption just checking the raw command number. For example, with a simple nfit_test mock up of the clear-error command we trigger the following: BUG: unable to handle kernel NULL pointer dereference at 0000000000000094 IP: acpi_nfit_ctl+0x29b/0x930 [nfit] [..] Call Trace: nfit_test_probe+0xb85/0xc09 [nfit_test] platform_drv_probe+0x3b/0xa0 ? platform_drv_probe+0x3b/0xa0 driver_probe_device+0x29c/0x450 ? test_alloc+0x180/0x180 [nfit_test] __driver_attach+0xe3/0xf0 ? driver_probe_device+0x450/0x450 bus_for_each_dev+0x73/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x173/0x270 driver_register+0x60/0xe0 __platform_driver_register+0x36/0x40 nfit_test_init+0x2a1/0x1000 [nfit_test] Fixes: `4b27db7e26` ("acpi, nfit: add support for the _LSI, _LSR, and...") Reported-by: Vishal Verma <vishal.l.verma@intel.com> Tested-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-11-12 15:18:57 -08:00
Dave Jiang	aa9ad44a42	libnvdimm: move poison list functions to a new 'badrange' file nfit_test needs to use the poison list manipulation code as well. Make it more generic and in the process rename poison to badrange, and move all the related helpers to a new file. Signed-off-by: Dave Jiang <dave.jiang@intel.com> [vishal: Add badrange.o to nfit_test's Kbuild] [vishal: add a missed include in bus.c for the new badrange functions] [vishal: rename all instances of 'be' to 'bre'] Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-11-02 10:42:30 -07:00
Greg Kroah-Hartman	b24413180f	License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a /uapi/ one with no licensing information in it, - file was a /uapi/ one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non /uapi/ files that summary was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a /uapi/ path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------\|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the /uapi/ ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------\|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-11-02 11:10:55 +01:00
Dan Williams	11e1427016	acpi, nfit: add support for NVDIMM_FAMILY_INTEL v1.6 DSMs Per v1.6 of the NVDIMM_FAMILY_INTEL command set [1] some of the new commands require rev-id 2. In addition to enabling ND_CMD_CALL for these new function numbers, add a lookup table for revision-ids by family and function number. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface-V1.6.pdf Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-10-30 11:22:32 -07:00
Dan Williams	b9b1504d3c	acpi, nfit: hide unknown commands from nmemX/commands For vendor specific commands that do not have a common kernel translation, hide them from nmemX/commands. For example, the following results from new enabling to probe for support of the new NVDIMM_FAMILY_INTEL DSMs specified in v1.6 of the command specification [1]: # cat /sys/bus/nd/devices/nmem0/commands smart smart_thresh flags get_size get_data set_data effect_size effect_log vendor cmd_call unknown unknown unknown unknown unknown unknown unknown unknown [1]: https://pmem.io/documents/NVDIMM_DSM_Interface-V1.6.pdf Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-10-29 12:13:07 -07:00
Dan Williams	4b27db7e26	acpi, nfit: add support for the _LSI, _LSR, and _LSW label methods ACPI 6.2 adds support for named methods to access the label storage area of an NVDIMM. We prefer these new methods if available and otherwise fallback to the NVDIMM_FAMILY_INTEL _DSMs. The kernel ioctls, ND_IOCTL_{GET,SET}_CONFIG_{SIZE,DATA}, remain generic and the driver translates the 'package' payloads into the NVDIMM_FAMILY_INTEL 'buffer' format to maintain compatibility with existing userspace and keep the output buffer parsing code in the driver common. The output payloads are mostly compatible save for the 'label area locked' status that moves from the 'config_size' (_LSI) command to the 'config_read' (_LSR) command status. Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-10-07 10:03:40 -07:00
Yasunori Goto	b37b3fd33d	acpi nfit: Enable to show what feature is supported via ND_CMD_CALL for nfit_test Though nfit_test need to show what feature is supported via ND_CMD_CALL on device/nfit/dsm_mask, currently there is no way to tell it. This patch makes to enable it. Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-10-07 09:02:28 -07:00
Linus Torvalds	89fd915c40	libnvdimm for 4.14 * Media error handling support in the Block Translation Table (BTT) driver is reworked to address sleeping-while-atomic locking and memory-allocation-context conflicts. * The dax_device lookup overhead for xfs and ext4 is moved out of the iomap hot-path to a mount-time lookup. * A new 'ecc_unit_size' sysfs attribute is added to advertise the read-modify-write boundary property of a persistent memory range. * Preparatory fix-ups for arm and powerpc pmem support are included along with other miscellaneous fixes. -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJZtsAGAAoJEB7SkWpmfYgCrzMP/2vPvZvrFjZn5pAoZjlmTmHM ySceoOC7vwvVXIsSs52FhSjcxEoXo9cklXPwhXOPVtVUFdSDJBUOIUxwIziE6Y+5 sFJ2xT9K+5zKBUiXJwqFQDg52dn//eBNnnnDz+HQrBSzGrbWQhIZY2m19omPzv1I BeN0OCGOdW3cjSo3BCFl1d+KrSl704e7paeKq/TO3GIiAilIXleTVxcefEEodV2K ZvWHpFIhHeyN8dsF8teI952KcCT92CT/IaabxQIwCxX0/8/GFeDc5aqf77qiYWKi uxCeQXdgnaE8EZNWZWGWIWul6eYEkoCNbLeUQ7eJnECq61VxVajJS0NyGa5T9OiM P046Bo2b1b3R0IHxVIyVG0ZCm3YUMAHSn/3uRxPgESJ4bS/VQ3YP5M6MLxDOlc90 IisLilagitkK6h8/fVuVrwciRNQ71XEC34t6k7GCl/1ZnLlLT+i4/jc5NRZnGEZh aXAAGHdteQ+/mSz6p2UISFUekbd6LerwzKRw8ibDvH6pTud8orYR7g2+JoGhgb6Y pyFVE8DhIcqNKAMxBsjiRZ46OQ7qrT+AemdAG3aVv6FaNoe4o5jPLdw2cEtLqtpk +DNm0/lSWxxxozjrvu6EUZj6hk8R5E19XpRzV5QJkcKUXMu7oSrFLdMcC4FeIjl9 K4hXLV3fVBVRMiS0RA6z =5iGY -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm from Dan Williams: "A rework of media error handling in the BTT driver and other updates. It has appeared in a few -next releases and collected some late- breaking build-error and warning fixups as a result. Summary: - Media error handling support in the Block Translation Table (BTT) driver is reworked to address sleeping-while-atomic locking and memory-allocation-context conflicts. - The dax_device lookup overhead for xfs and ext4 is moved out of the iomap hot-path to a mount-time lookup. - A new 'ecc_unit_size' sysfs attribute is added to advertise the read-modify-write boundary property of a persistent memory range. - Preparatory fix-ups for arm and powerpc pmem support are included along with other miscellaneous fixes" * tag 'libnvdimm-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (26 commits) libnvdimm, btt: fix format string warnings libnvdimm, btt: clean up warning and error messages ext4: fix null pointer dereference on sbi libnvdimm, nfit: move the check on nd_reserved2 to the endpoint dax: fix FS_DAX=n BLOCK=y compilation libnvdimm: fix integer overflow static analysis warning libnvdimm, nd_blk: remove mmio_flush_range() libnvdimm, btt: rework error clearing libnvdimm: fix potential deadlock while clearing errors libnvdimm, btt: cache sector_size in arena_info libnvdimm, btt: ensure that flags were also unchanged during a map_read libnvdimm, btt: refactor map entry operations with macros libnvdimm, btt: fix a missed NVDIMM_IO_ATOMIC case in the write path libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute ext4: perform dax_device lookup at mount ext2: perform dax_device lookup at mount xfs: perform dax_device lookup at mount dax: introduce a fs_dax_get_by_bdev() helper libnvdimm, btt: check memory allocation failure libnvdimm, label: fix index block size calculation ...	2017-09-11 13:10:57 -07:00
Meng Xu	9edcad53d6	libnvdimm, nfit: move the check on nd_reserved2 to the endpoint Delay the check of nd_reserved2 to the actual endpoint (acpi_nfit_ctl) that uses it, as a prevention of a potential double-fetch bug. While examining the kernel source code, I found a dangerous operation that could turn into a double-fetch situation (a race condition bug) where the same userspace memory region are fetched twice into kernel with sanity checks after the first fetch while missing checks after the second fetch. In the case of _IOC_NR(ioctl_cmd) == ND_CMD_CALL: 1. The first fetch happens in line 935 copy_from_user(&pkg, p, sizeof(pkg) 2. subsequently `pkg.nd_reserved2` is asserted to be all zeroes (line 984 to 986). 3. The second fetch happens in line 1022 copy_from_user(buf, p, buf_len) 4. Given that `p` can be fully controlled in userspace, an attacker can race condition to override the header part of `p`, say, `((struct nd_cmd_pkg *)p)->nd_reserved2` to arbitrary value (say nine 0xFFFFFFFF for `nd_reserved2`) after the first fetch but before the second fetch. The changed value will be copied to `buf`. 5. There is no checks on the second fetches until the use of it in line 1034: nd_cmd_clear_to_send(nvdimm_bus, nvdimm, cmd, buf) and line 1038: nd_desc->ndctl(nd_desc, nvdimm, cmd, buf, buf_len, &cmd_rc) which means that the assumed relation, `p->nd_reserved2` are all zeroes might not hold after the second fetch. And once the control goes to these functions we lose the context to assert the assumed relation. 6. Based on my manual analysis, `p->nd_reserved2` is not used in function `nd_cmd_clear_to_send` and potential implementations of `nd_desc->ndctl` so there is no working exploit against it right now. However, this could easily turns to an exploitable one if careless developers start to use `p->nd_reserved2` later and assume that they are all zeroes. Move the validation of the nd_reserved2 field to the ->ndctl() implementation where it has a stable buffer to evaluate. Signed-off-by: Meng Xu <mengxu.gatech@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-09-04 11:02:21 -07:00
Robin Murphy	5deb67f77a	libnvdimm, nd_blk: remove mmio_flush_range() mmio_flush_range() suffers from a lack of clearly-defined semantics, and is somewhat ambiguous to port to other architectures where the scope of the writeback implied by "flush" and ordering might matter, but MMIO would tend to imply non-cacheable anyway. Per the rationale in `67a3e8fe90` ("nd_blk: change aperture mapping from WC to WB"), the only existing use is actually to invalidate clean cache lines for ARCH_MEMREMAP_PMEM type mappings without writeback. Since the recent cleanup of the pmem API, that also now happens to be the exact purpose of arch_invalidate_pmem(), which would be a far more well-defined tool for the job. Rather than risk potentially inconsistent implementations of mmio_flush_range() for the sake of one callsite, streamline things by removing it entirely and instead move the ARCH_MEMREMAP_PMEM related definitions up to the libnvdimm level, so they can be shared by NFIT as well. This allows NFIT to be enabled for arm64. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-08-31 15:05:10 -07:00
Dan Williams	a15797f4be	libnvdimm, nfit: export an 'ecc_unit_size' sysfs attribute When the nfit driver initializes it runs an ARS (Address Range Scrub) operation across every pmem range. Part of that process involves determining the ARS capabilities of a given address range. One of the capabilities that is reported is the 'Clear Uncorrectable Error Range Length Unit Size' (see: ACPI 6.2 section 9.20.7.4 Function Index 1 - Query ARS Capabilities). This property is of interest to userspace software as it indicates the boundary at which the NVDIMM may need to perform read-modify-write cycles to maintain ECC blocks. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-08-31 12:53:36 -07:00
Boqun Feng	1c322ac06d	acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse COMPLETION_INITIALIZER_ONSTACK() is supposed to be used as an initializer, in other words, it should only be used in assignment expressions or compound literals. So the usage in drivers/acpi/nfit/core.c: COMPLETION_INITIALIZER_ONSTACK(flush.cmp); ... is inappropriate. Besides, this usage could also break the build for another fix that reduces stack sizes caused by COMPLETION_INITIALIZER_ONSTACK(), because that fix changes COMPLETION_INITIALIZER_ONSTACK() from rvalue to lvalue, and usage as above will report the following error: drivers/acpi/nfit/core.c: In function 'acpi_nfit_flush_probe': include/linux/completion.h:77:3: error: value computed is not used [-Werror=unused-value] (*({ init_completion(&work); &work; })) This patch fixes this by replacing COMPLETION_INITIALIZER_ONSTACK() with init_completion() in acpi_nfit_flush_probe(), which does the same initialization without any other problems. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Len Brown <lenb@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: walken@google.com Cc: willy@infradead.org Link: http://lkml.kernel.org/r/20170824142239.15178-1-boqun.feng@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-08-29 15:14:38 +02:00
Dan Williams	dcb79b1544	nfit: cleanup long de-reference chains in acpi_nfit_init_interleave_set Use a local 'struct acpi_nfit_control_region *' variable to shorten the pointer chasing chains. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-08-07 14:27:57 -07:00
Dan Williams	401c0a19c6	nfit, libnvdimm, region: export 'position' in mapping info It is useful to be able to know the position of a DIMM in an interleave-set. Consider the case where the order of the DIMMs changes causing a namespace to be invalidated because the interleave-set cookie no longer matches. If the before and after state of each DIMM position is known this state debugged by the system owner. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-08-04 17:20:16 -07:00
Prarit Bhargava	7e700d2c59	acpi/nfit: Fix memory corruption/Unregister mce decoder on failure nfit_init() calls nfit_mce_register() on module load. When the module load fails the nfit mce decoder is not unregistered. The module's memory is freed leaving the decoder chain referencing junk. This will cause panics as future registrations will reference the free'd memory. Unregister the nfit mce decoder on module init failure. [v2]: register and then unregister mce handler to avoid losing mce events [v3]: also cleanup nfit workqueue Fixes: `6839a6d96f` ("nfit: do an ARS scrub on hitting a latent media error") Cc: <stable@vger.kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <lenb@kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: "Lee, Chun-Yi" <joeyli.kernel@gmail.com> Cc: Linda Knippers <linda.knippers@hpe.com> Cc: lszubowi@redhat.com Acked-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Prarit Bhargava <prarit@redhat.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-07-17 11:43:58 -07:00
Dan Williams	9d92573fff	Merge branch 'for-4.13/dax' into libnvdimm-for-next	2017-07-03 16:54:58 -07:00
Toshi Kani	807900395e	acpi/nfit: Issue Start ARS to retrieve existing records ACPI 6.2 defines in section 9.20.7.2 that the OSPM may call a Start ARS with Flags Bit [1] set upon receiving the 0x81 notification. Upon receiving the notification, the OSPM may decide to issue a Start ARS with Flags Bit [1] set to prepare for the retrieval of existing records and issue the Query ARS Status function to retrieve the records. Add support to call a Start ARS from acpi_nfit_uc_error_notify() with ND_ARS_RETURN_PREV_DATA set when HW_ERROR_SCRUB_ON is not set. Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-07-02 09:56:37 -07:00
Jerry Hoemann	41f95db7b9	acpi, nfit: Show bus_dsm_mask in sysfs Display bus_dsm_mask in sysfs as /sys/bus/nd/devices/ndbusX/nfit/dsm_mask. Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-07-01 08:49:59 -07:00
Jerry Hoemann	7db5bb33ad	libnvdimm, acpi, nfit: Add bus level dsm mask for pass thru. Add a bus level dsm_mask to nvdimm_bus_descriptor to allow the passthru calling mechanism to specify a different mask from the cmd_mask. Populate bus_dsm_mask and use it to filter dsm calls that user can make through the pass thru interface. Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> [djbw: use command number constants instead of a magic mask value] Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-07-01 08:49:59 -07:00
Jerry Hoemann	37d74841b9	acpi, nfit: Enable DSM pass thru for root functions. Set ND_CMD_CALL in the cmd_mask to enable calling root functions via the pass thru mechanism. Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-07-01 08:49:59 -07:00
Arvind Yadav	5e93746f06	acpi, nfit: constify *_attribute_group File size before: text data bss dec hex filename 20792 1580 994 23366 5b46 drivers/acpi/nfit/core.o File size After adding 'const': text data bss dec hex filename 20968 1388 994 23350 5b36 drivers/acpi/nfit/core.o Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-29 13:50:37 -07:00
Dan Williams	c9e582aa68	libnvdimm, nfit: enable support for volatile ranges Allow volatile nfit ranges to participate in all the same infrastructure provided for persistent memory regions. A resulting resulting namespace device will still be called "pmem", but the parent region type will be "nd_volatile". This is in preparation for disabling the dax ->flush() operation in the pmem driver when it is hosted on a volatile range. Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-27 16:44:13 -07:00
Dan Williams	ca6a4657e5	x86, libnvdimm, pmem: remove global pmem api Now that all callers of the pmem api have been converted to dax helpers that call back to the pmem driver, we can remove include/linux/pmem.h and asm/pmem.h. Cc: <x86@kernel.org> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Oliver O'Halloran <oohall@gmail.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-27 16:29:54 -07:00
Toshi Kani	56b47fe657	acpi/nfit: Add support of NVDIMM memory error notification in ACPI 6.2 ACPI 6.2 defines a new ACPI notification value to NVDIMM Root Device in Table 5-169. 0x81 Unconsumed Uncorrectable Memory Error Detected Used to pro-actively notify OSPM of uncorrectable memory errors detected (for example a memory scrubbing engine that continuously scans the NVDIMMs memory). This is an optional notification. Only locations that were mapped in to SPA by the platform will generate a notification. Add support of this notification value by initiating an ARS scan. This will find new error locations and add their badblocks information. Link: http://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-15 14:39:42 -07:00
Dan Williams	8f2bc2430e	libnvdimm, label: populate 'isetcookie' for blk-aperture namespaces Starting with the v1.2 definition of namespace labels, the isetcookie field is populated and validated for blk-aperture namespaces. This adds some safety against inadvertent copying of namespace labels from one DIMM-device to another. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-15 14:31:40 -07:00
Dan Williams	faec6f8a1c	libnvdimm, label: populate the type_guid property for v1.2 namespaces The type_guid refers to the "Address Range Type GUID" for the region backing a namespace as defined the ACPI NFIT (NVDIMM Firmware Interface Table). This 'type' identifier specifies an access mechanism for the given namespace. This capability replaces the confusing usage of the 'NSLABEL_FLAG_LOCAL' flag to indicate a block-aperture-mode namespace. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-15 14:31:40 -07:00
Dan Williams	c12c48ce86	libnvdimm, label: add v1.2 interleave-set-cookie algorithm The interleave-set-cookie algorithm is extended to incorporate all the same components that are used to generate an nvdimm unique-id. For backwards compatibility we still maintain the old v1.1 definition. Reported-by: Nicholas Moulin <nicholas.w.moulin@intel.com> Reported-by: Kaushik Kanetkar <kaushik.a.kanetkar@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-15 14:31:39 -07:00
Dan Williams	0aed55af88	x86, uaccess: introduce copy_from_iter_flushcache for pmem / cache-bypass operations The pmem driver has a need to transfer data with a persistent memory destination and be able to rely on the fact that the destination writes are not cached. It is sufficient for the writes to be flushed to a cpu-store-buffer (non-temporal / "movnt" in x86 terms), as we expect userspace to call fsync() to ensure data-writes have reached a power-fail-safe zone in the platform. The fsync() triggers a REQ_FUA or REQ_FLUSH to the pmem driver which will turn around and fence previous writes with an "sfence". Implement a __copy_from_user_inatomic_flushcache, memcpy_page_flushcache, and memcpy_flushcache, that guarantee that the destination buffer is not dirty in the cpu cache on completion. The new copy_from_iter_flushcache and sub-routines will be used to replace the "pmem api" (include/linux/pmem.h + arch/x86/include/asm/pmem.h). The availability of copy_from_iter_flushcache() and memcpy_flushcache() are gated by the CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE config symbol, and fallback to copy_from_iter_nocache() and plain memcpy() otherwise. This is meant to satisfy the concern from Linus that if a driver wants to do something beyond the normal nocache semantics it should be something private to that driver [1], and Al's concern that anything uaccess related belongs with the rest of the uaccess code [2]. The first consumer of this interface is a new 'copy_from_iter' dax operation so that pmem can inject cache maintenance operations without imposing this overhead on other dax-capable drivers. [1]: https://lists.01.org/pipermail/linux-nvdimm/2017-January/008364.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2017-April/009942.html Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-06-09 09:09:56 -07:00
Andy Shevchenko	94116f8126	ACPI: Switch to use generic guid_t in acpi_evaluate_dsm() acpi_evaluate_dsm() and friends take a pointer to a raw buffer of 16 bytes. Instead we convert them to use guid_t type. At the same time we convert current users. acpi_str_to_uuid() becomes useless after the conversion and it's safe to get rid of it. Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Borislav Petkov <bp@suse.de> Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Cc: Ben Skeggs <bskeggs@redhat.com> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Joerg Roedel <jroedel@suse.de> Acked-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Yisen Zhuang <yisen.zhuang@huawei.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Acked-by: Mark Brown <broonie@kernel.org> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-07 12:20:49 +02:00
Andy Shevchenko	41c8bdb3ab	acpi, nfit: Switch to use new generic UUID API There are new types and helpers that are supposed to be used in new code. As a preparation to get rid of legacy types and API functions do the conversion here. Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>	2017-06-05 19:42:02 +02:00
Vishal Verma	fc08a4703a	acpi, nfit: Fix the memory error check in nfit_handle_mce() The check for an MCE being a memory error in the NFIT mce handler was bogus. Use the new mce_is_memory_error() helper to detect the error properly. Reported-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170519093915.15413-3-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>	2017-05-21 21:39:59 +02:00
Dan Williams	736163671b	Merge branch 'for-4.12/dax' into libnvdimm-for-next	2017-05-04 23:38:43 -07:00
Dan Williams	9d62ed9651	libnvdimm: handle locked label storage areas Per the latest version of the "NVDIMM DSM Interface Example" [1], the label data retrieval routine can report a "locked" status. In this case all regions associated with that DIMM are disabled until the label area is unlocked. Provide generic libnvdimm enabling for NVDIMMs with label data area locking capabilities. [1]: http://pmem.io/documents/ Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-05-04 15:41:39 -07:00
Dan Williams	8f078b38dd	libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED This is a preparation patch for handling locked nvdimm label regions, a new concept as introduced by the latest DSM document on pmem.io [1]. A future patch will leverage nvdimm_set_locked() at DIMM probe time to flag regions that can not be enabled. There should be no functional difference resulting from this change. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example-V1.3.pdf Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-05-04 14:01:24 -07:00
Dan Williams	7699a6a36b	acpi, nfit: kill ACPI_NFIT_DEBUG Inevitably when one actually needs to debug a DSM issue it's on a distribution kernel that has CONFIG_ACPI_NFIT_DEBUG=n. The config symbol was only there to avoid the compile error due to the missing fallback for print_hex_dump_debug in the CONFIG_DYNAMIC_DEBUG=n case. That was fixed with commit `cdf17449af` "hexdump: do not print debug dumps for !CONFIG_DEBUG", so the config symbol can just be dropped. Cc: Joe Perches <joe@perches.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-28 16:07:00 -07:00
Dan Williams	6abccd1bfe	x86, dax, pmem: remove indirection around memcpy_from_pmem() memcpy_from_pmem() maps directly to memcpy_mcsafe(). The wrapper serves no real benefit aside from affording a more generic function name than the x86-specific 'mcsafe'. However this would not be the first time that x86 terminology leaked into the global namespace. For lack of better name, just use memcpy_mcsafe() directly. This conversion also catches a place where we should have been using plain memcpy, acpi_nfit_blk_single_io(). Cc: <x86@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-25 13:20:46 -07:00
Dan Williams	fbabd829fe	acpi, nfit: fix module unload vs workqueue shutdown race The workqueue may still be running when the devres callbacks start firing to deallocate an acpi_nfit_desc instance. Stop and flush the workqueue before letting any other devres de-allocations proceed. Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-18 10:55:37 -07:00
Dan Williams	9ccaed4bfd	acpi, nfit: limit ->flush_probe() to initialization work The nvdimm probe flushing mechanism gives userspace a sync point where it knows all asynchronous driver probe sequences have completed. However, it need not wait for other asynchronous actions, like on-demand address-range-scrub. Track the init work separately from other work in the workqueue, and only flush the former. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-17 12:34:17 -07:00
Dan Williams	caa603aae0	acpi, nfit: collate health state flags Be tolerant of cases where the BIOS provided NFIT does not consistently set the flags in all NVDIMM Region Mapping structures associated with a given dimm. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-17 12:34:17 -07:00
Dan Williams	1499934dcd	acpi, nfit: support "map failed" dimms Stop requiring dimms be successfully mapped into a system-physical-address range. For provisioning and hardware remediation purposes the kernel should account for failed devices in sysfs. If possible it should still allow management commands to be sent to the device. Reported-by: Toshi Kani <toshi.kani@hpe.com> Tested-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-17 12:34:17 -07:00
Dan Williams	ffab9385b3	acpi, nfit: add support for acpi 6.1 dimm state flags Add support for the ACPI_NFIT_MEM_MAP_FAILED ("map_fail") and ACPI_NFIT_MEM_HEALTH_ENABLED ("smart_notify") health state flags. The "map_fail" flag identifies DIMMs that were not mapped into one or more physical address ranges. The "health_notify" flag indicates whether platform firmware will send notifications when there is new SMART health data to consume. Acked-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-14 13:29:44 -07:00
Dan Williams	3c87f37269	acpi, nfit: fix acpi_get_table leak Calls to acpi_get_table() must be paired with acpi_put_table() to undo the mapping established by acpi_tb_acquire_table(). It turns out this has no effect in practice since the NFIT will already be mapped to support the /sys/firmware/acpi/tables/NFIT attribute in sysfs. Fixes: `6b11d1d677` ("ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users") Cc: Lv Zheng <lv.zheng@intel.com> Reported-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-12 21:56:41 -07:00
Linda Knippers	f2668fa7f6	acpi, nfit: remove unnecessary newline The newline in MODULE_PARM_DESC causes modinfo to print the parameter data type on a separate line, which is different from all the other module parameters and could potentially cause a problem for someone parsing the output of modinfo. Signed-off-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-12 21:56:41 -07:00
Linda Knippers	ba650cfcf9	acpi, nfit: allow specifying a default DSM family Provide the ability to request a default DSM family. If it is not supported, then fall back to the normal discovery order. This is helpful for testing platforms that support multiple DSM families. It will also allow administrators to request the DSM family that their management tools support, which may not be the first one found using the current discovery order. Signed-off-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-12 21:56:41 -07:00
Linda Knippers	095ab4b39f	acpi, nfit: allow override of built-in bitmasks for nvdimm DSMs As it is today, we can't enable or test new NVDIMM management functions provided by new firmware and tools without changing the kernel. We also can't prevent documented DSM functions from being called in the case of buggy firmware. This patch provides a module parameter that overrides the DSM function mask that is built into the kernel. If the "disable_vendor_specific" module parameter is also used we ignore the new parameter. Signed-off-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-04-12 21:56:40 -07:00
Dan Williams	b03b99a329	acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison) While reviewing the -stable patch for commit `86ef58a4e3` "nfit, libnvdimm: fix interleave set cookie calculation" Ben noted: "This is returning an int, thus it's effectively doing a 32-bit comparison and not the 64-bit comparison you say is needed." Update the compare operation to be immune to this integer demotion problem. Cc: <stable@vger.kernel.org> Cc: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Fixes: `86ef58a4e3` ("nfit, libnvdimm: fix interleave set cookie calculation") Reported-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-03-27 21:53:38 -07:00
Linus Torvalds	0b94da8dfc	Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "A fix and regression test case for nvdimm namespace label compatibility. Details: - An "nvdimm namespace label" is metadata on an nvdimm that provisions dimm capacity into a "namespace" that can host a block device / dax-filesytem, or a device-dax character device. A namespace is an object that other operating environment and platform firmware needs to comprehend for capabilities like booting from an nvdimm. The label metadata contains a checksum that Linux was not calculating correctly leading to other environments rejecting the Linux label. These have received a build success notification from the kbuild robot, and a positive test result from Nick who reported the problem" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: nfit, libnvdimm: fix interleave set cookie calculation tools/testing/nvdimm: make iset cookie predictable	2017-03-03 16:48:48 -08:00
Dan Williams	86ef58a4e3	nfit, libnvdimm: fix interleave set cookie calculation The interleave-set cookie is a sum that sanity checks the composition of an interleave set has not changed from when the namespace was initially created. The checksum is calculated by sorting the DIMMs by their location in the interleave-set. The comparison for the sort must be 64-bit wide, not byte-by-byte as performed by memcmp() in the broken case. Fix the implementation to accept correct cookie values in addition to the Linux "memcmp" order cookies, but only allow correct cookies to be generated going forward. It does mean that namespaces created by third-party-tooling, or created by newer kernels with this fix, will not validate on older kernels. However, there are a couple mitigating conditions: 1/ platforms with namespace-label capable NVDIMMs are not widely available. 2/ interleave-sets with a single-dimm are by definition not affected (nothing to sort). This covers the QEMU-KVM NVDIMM emulation case. The cookie stored in the namespace label will be fixed by any write the namespace label, the most straightforward way to achieve this is to write to the "alt_name" attribute of a namespace in sysfs. Cc: <stable@vger.kernel.org> Fixes: `eaf961536e` ("libnvdimm, nfit: add interleave-set state-tracking infrastructure") Reported-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Tested-by: Nicholas Moulin <nicholas.w.moulin@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-03-01 00:49:42 -08:00
Linus Torvalds	60c906bab1	Merge branch 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Ingo Molnar: "The main changes in this cycle were: - Assign notifier chain priorities for all RAS related handlers to make the ordering explicit (Borislav Petkov) - Improve the AMD MCA banks sysfs output (Yazen Ghannam) - Various cleanups and restructuring of the x86 RAS code (Borislav Petkov)" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority x86/ras: Get rid of mce_process_work() EDAC/mce/amd: Dump TSC value EDAC/mce/amd: Unexport amd_decode_mce() x86/ras/amd/inj: Change dependency x86/ras: Flip the TSC-adding logic x86/ras/amd: Make sysfs names of banks more user-friendly x86/ras/therm_throt: Do not log a fake MCE for thermal events x86/ras/inject: Make it depend on X86_LOCAL_APIC=y	2017-02-20 12:47:44 -08:00
Dan Williams	e471486c13	acpi, nfit: fix acpi_nfit_flush_probe() crash We queue an on-stack work item to 'nfit_wq' and wait for it to complete as part of a 'flush_probe' request. However, if the user cancels the wait we need to make sure the item is flushed from the queue otherwise we are leaving an out-of-scope stack address on the work list. BUG: unable to handle kernel paging request at ffffbcb3c72f7cd0 IP: [<ffffffffa9413a7b>] __list_add+0x1b/0xb0 [..] RIP: 0010:[<ffffffffa9413a7b>] [<ffffffffa9413a7b>] __list_add+0x1b/0xb0 RSP: 0018:ffffbcb3c7ba7c00 EFLAGS: 00010046 [..] Call Trace: [<ffffffffa90bb11a>] insert_work+0x3a/0xc0 [<ffffffffa927fdda>] ? seq_open+0x5a/0xa0 [<ffffffffa90bb30a>] __queue_work+0x16a/0x460 [<ffffffffa90bbb08>] queue_work_on+0x38/0x40 [<ffffffffc0cf2685>] acpi_nfit_flush_probe+0x95/0xc0 [nfit] [<ffffffffc0cf25d0>] ? nfit_visible+0x40/0x40 [nfit] [<ffffffffa9571495>] wait_probe_show+0x25/0x60 [<ffffffffa9546b30>] dev_attr_show+0x20/0x50 Fixes: `7ae0fa439f` ("nfit, libnvdimm: async region scrub workqueue") Cc: <stable@vger.kernel.org> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2017-02-03 11:47:36 -08:00
Borislav Petkov	9026cc82b6	x86/ras, EDAC, acpi: Assign MCE notifier handlers a priority Assign all notifiers on the MCE decode chain a priority so that they get called in the correct order. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-10-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2017-01-24 09:14:57 +01:00
Rafael J. Wysocki	c8e008e2a6	Merge branches 'acpica' and 'acpi-scan' * acpica: ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory() ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users ACPICA: Tables: Allow FADT to be customized with virtual address ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel * acpi-scan: ACPI: do not warn if _BQC does not exist	2016-12-22 14:34:24 +01:00
Lv Zheng	6b11d1d677	ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users This patch removes the users of the deprectated APIs: acpi_get_table_with_size() early_acpi_os_unmap_memory() The following APIs should be used instead of: acpi_get_table() acpi_put_table() The deprecated APIs are invented to be a replacement of acpi_get_table() during the early stage so that the early mapped pointer will not be stored in ACPICA core and thus the late stage acpi_get_table() won't return a wrong pointer. The mapping size is returned just because it is required by early_acpi_os_unmap_memory() to unmap the pointer during early stage. But as the mapping size equals to the acpi_table_header.length (see acpi_tb_init_table_descriptor() and acpi_tb_validate_table()), when such a convenient result is returned, driver code will start to use it instead of accessing acpi_table_header to obtain the length. Thus this patch cleans up the drivers by replacing returned table size with acpi_table_header.length, and should be a no-op. Reported-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-12-21 02:36:38 +01:00
Dan Williams	a7de92dac9	tools/testing/nvdimm: unit test acpi_nfit_ctl() A recent flurry of bug discoveries in the nfit driver's DSM marshalling routine has highlighted the fact that we do not have unit test coverage for this routine. Add a self-test of acpi_nfit_ctl() routine before probing the "nfit_test.0" device. This mocks stimulus to acpi_nfit_ctl() and if any of the tests fail "nfit_test.0" will be unavailable causing the rest of the tests to not run / fail. This unit test will also be a place to land reproductions of quirky BIOS behavior discovered in the field and ensure the kernel does not regress against implementations it has seen in practice. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-06 17:42:36 -08:00
Dan Williams	d6eb270c57	acpi, nfit: fix bus vs dimm confusion in xlat_status Given dimms and bus commands share the same command number space we need to be careful that we are translating status in the correct context. Otherwise we can, for example, fail an ND_CMD_GET_CONFIG_SIZE command because max_xfer is zero. It fails because that condition erroneously correlates with the 'cleared == 0' failure of ND_CMD_CLEAR_ERROR. Cc: <stable@vger.kernel.org> Fixes: `aef2533822` ("libnvdimm, nfit: centralize command status translation") Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-06 16:30:37 -08:00
Dan Williams	82aa37cf09	acpi, nfit: validate ars_status output buffer size If an ARS Status command returns truncated output, do not process partial records or otherwise consume non-status fields. Cc: <stable@vger.kernel.org> Fixes: `0caeef63e6` ("libnvdimm: Add a poison list and export badblocks") Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-06 16:30:37 -08:00
Dan Williams	efda1b5d87	acpi, nfit, libnvdimm: fix / harden ars_status output length handling Given ambiguities in the ACPI 6.1 definition of the "Output (Size)" field of the ARS (Address Range Scrub) Status command, a firmware implementation may in practice return 0, 4, or 8 to indicate that there is no output payload to process. The specification states "Size of Output Buffer in bytes, including this field.". However, 'Output Buffer' is also the name of the entire payload, and earlier in the specification it states "Max Query ARS Status Output Buffer Size: Maximum size of buffer (including the Status and Extended Status fields)". Without this fix if the BIOS happens to return 0 it causes memory corruption as evidenced by this result from the acpi_nfit_ctl() unit test. ars_status00000000: 00020000 00000000 ........ BUG: stack guard page was hit at ffffc90001750000 (stack is ffffc9000174c000..ffffc9000174ffff) kernel stack overflow (page fault): 0000 [#1] SMP DEBUG_PAGEALLOC task: ffff8803332d2ec0 task.stack: ffffc9000174c000 RIP: 0010:[<ffffffff814cfe72>] [<ffffffff814cfe72>] __memcpy+0x12/0x20 RSP: 0018:ffffc9000174f9a8 EFLAGS: 00010246 RAX: ffffc9000174fab8 RBX: 0000000000000000 RCX: 000000001fffff56 RDX: 0000000000000000 RSI: ffff8803231f5a08 RDI: ffffc90001750000 RBP: ffffc9000174fa88 R08: ffffc9000174fab0 R09: ffff8803231f54b8 R10: 0000000000000008 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000003 R15: ffff8803231f54a0 FS: 00007f3a611af640(0000) GS:ffff88033ed00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc90001750000 CR3: 0000000325b20000 CR4: 00000000000406e0 Stack: ffffffffa00bc60d 0000000000000008 ffffc90000000001 ffffc9000174faac 0000000000000292 ffffffffa00c24e4 ffffffffa00c2914 0000000000000000 0000000000000000 ffffffff00000003 ffff880331ae8ad0 0000000800000246 Call Trace: [<ffffffffa00bc60d>] ? acpi_nfit_ctl+0x49d/0x750 [nfit] [<ffffffffa01f4fe0>] nfit_test_probe+0x670/0xb1b [nfit_test] Cc: <stable@vger.kernel.org> Fixes: `747ffe11b4` ("libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing") Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-06 16:08:10 -08:00
Vishal Verma	9a901f5495	acpi, nfit: fix extended status translations for ACPI DSMs ACPI DSMs can have an 'extended' status which can be non-zero to convey additional information about the command. In the xlat_status routine, where we translate the command statuses, we were returning an error for a non-zero extended status, even if the primary status indicated success. Return from each command's 'case' once we have verified both its status and extend status are good. Cc: <stable@vger.kernel.org> Fixes: `11294d63ac` ("nfit: fail DSMs that return non-zero status by default") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-12-06 16:08:10 -08:00
Dan Williams	178d6f4be8	Merge branch 'for-4.9/libnvdimm' into libnvdimm-for-next	2016-10-07 16:46:24 -07:00
Dan Williams	44c462eb9e	libnvdimm, region: move region-mapping input-paramters to nd_mapping_desc Before we add more libnvdimm-private fields to nd_mapping make it clear which parameters are input vs libnvdimm internals. Use struct nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make struct nd_mapping private to libnvdimm. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-30 19:13:42 -07:00
Vishal Verma	9ffd6350a1	nfit: don't start a full scrub by default for an MCE Starting a full Address Range Scrub (ARS) on hitting a memory error machine check exception may not always be desirable. Provide a way through sysfs to toggle the behavior between just adding the address (cache line) where the MCE happened to the poison list and doing a full scrub. The former (selective insertion of the address) is done unconditionally. Cc: linux-acpi@vger.kernel.org Cc: Linda Knippers <linda.knippers@hpe.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-30 17:00:10 -07:00
Dan Williams	11294d63ac	nfit: fail DSMs that return non-zero status by default For the DSMs where the kernel knows the format of the output buffer and originates those DSMs from within the kernel, return -EIO for any non-zero status. If the BIOS is indicating a status that we do not know how to handle, fail the DSM. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-21 09:35:19 -07:00
Vishal Verma	2e21807d4b	nfit, mce: Fix SPA matching logic in MCE handler The check for a 'pmem' type SPA in the MCE handler was inverted due to a merge/rebase error. Fixes: `6839a6d` nfit: do an ARS scrub on hitting a latent media error Cc: linux-acpi@vger.kernel.org Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-09 17:34:46 -07:00
Dan Williams	231bf117aa	tools/testing/nvdimm: unit test for acpi_nvdimm_notify() Trigger an nmemX/nfit/flags attribute to fire an event whenever a smart-threshold DSM is received. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-01 18:20:14 -07:00
Dan Williams	ba9c8dd3c2	acpi, nfit: add dimm device notification support Per "ACPI 6.1 Section 9.20.3" NVDIMM devices, children of the ACPI0012 NVDIMM Root device, can receive health event notifications. Given that these devices are precluded from registering a notification handler via acpi_driver.acpi_device_ops (due to no _HID), we use acpi_install_notify_handler() directly. The registered handler, acpi_nvdimm_notify(), triggers a poll(2) event on the nmemX/nfit/flags sysfs attribute when a health event notification is received. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-08-29 14:55:17 -07:00
Dan Williams	c14a868a5a	tools/testing/nvdimm: unit test for acpi_nfit_notify() We have had a couple bugs in this implementation in the past and before we add another ->notify() implementation for nvdimm devices, lets allow this routine to be exercised via nfit_test. Rewrite acpi_nfit_notify() in terms of a generic struct device and acpi_handle parameter, and then implement a mock acpi_evaluate_object() that returns a _FIT payload. Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-08-23 07:49:42 -07:00
Vishal Verma	c09f12186d	acpi, nfit: check for the correct event code in notifications Commit `209851649d` "acpi: nfit: Add support for hot-add" added support for _FIT notifications, but it neglected to verify the notification event code matches the one in the ACPI spec for "NFIT Update". Currently there is only one code in the spec, but once additional codes are added, older kernels (without this fix) will misbehave by assuming all event notifications are for an NFIT Update. Fixes: `209851649d` ("acpi: nfit: Add support for hot-add") Cc: <stable@vger.kernel.org> Cc: <linux-acpi@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-08-23 07:49:08 -07:00
Ross Zwisler	68202c9f0a	libnvdimm, nd_blk: mask off reserved status bits The "NVDIMM Block Window Driver Writer's Guide": http://pmem.io/documents/NVDIMM_DriverWritersGuide-July-2016.pdf ...defines the layout of the block window status register. For the July 2016 version of the spec linked to above, this happens in Figure 4 on page 26. The only bits defined in this spec are bits 31, 5, 4, 2, 1 and 0. The rest of the bits in the status register are reserved, and there is a warning following the diagram that says: Note: The driver cannot assume the value of the RESERVED bits in the status register are zero. These reserved bits need to be masked off, and the driver must avoid checking the state of those bits. This change ensures that for hardware implementations that set these reserved bits in the status register, the driver won't incorrectly fail the block I/Os. Cc: <stable@vger.kernel.org> #v4.2+ Reviewed-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-08-08 09:26:13 -07:00
Dan Williams	0606263f24	Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-next	2016-07-24 08:05:44 -07:00
Vishal Verma	6839a6d96f	nfit: do an ARS scrub on hitting a latent media error When a latent (unknown to 'badblocks') error is encountered, it will trigger a machine check exception. On a system with machine check recovery, this will only SIGBUS the process(es) which had the bad page mapped (as opposed to a kernel panic on platforms without machine check recovery features). In the former case, we want to trigger a full rescan of that nvdimm bus. This will allow any additional, new errors to be captured in the block devices' badblocks lists, and offending operations on them can be trapped early, avoiding machine checks. This is done by registering a callback function with the x86_mce_decoder_chain and calling the new ars_rescan functionality with the address in the mce notificatiion. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tony Luck <tony.luck@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-24 08:04:04 -07:00
Dan Williams	bdf97013ce	nfit: move to nfit/ sub-directory With the arrival of x86-machine-check support the nfit driver will add a (conditionally-compiled) source file. Prepare for this by moving all nfit source to drivers/acpi/nfit/. This is pure code movement, no functional changes. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-24 08:04:04 -07:00

1 2 3 4 5

205 Commits