Rename dasd_raw_build_cp() to dasd_eckd_build_cp_raw() to fit the
scheme.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We already have define_extent() that prepares necessary data for the
Define Extent CCW. The exact same thing is done in prefix_LRE().
Remove the duplicate code and move commands that were only used in
combination with the Prefix command to define_extent(). One of these
commands needs the blocksize to be specified. Add the blksize parameter
to define_extent() to account for that.
In addition, the check_XRC() function can be made more generic. Do this
and remove the Prefix-specific check_XRC_on_prefix() function.
Furthermore, prefix_LRE() uses fill_LRE_data() to prepare Locate Record
Extended data. Rename the function to fit the scheme better and make it
usable outside of the Prefix context by adding the corresponding CCW
command.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 updates from Martin Schwidefsky:
"The bulk of the s390 patches for 4.13. Some new things but mostly bug
fixes and cleanups. Noteworthy changes:
- The SCM block driver is converted to blk-mq
- Switch s390 to 5 level page tables. The virtual address space for a
user space process can now have up to 16EB-4KB.
- Introduce a ELF phdr flag for qemu to avoid the global
vm.alloc_pgste which forces all processes to large page tables
- A couple of PCI improvements to improve error recovery
- Included is the merge of the base support for proper machine checks
for KVM"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (52 commits)
s390/dasd: Fix faulty ENODEV for RO sysfs attribute
s390/pci: recognize name clashes with uids
s390/pci: provide more debug information
s390/pci: fix handling of PEC 306
s390/pci: improve pci hotplug
s390/pci: introduce clp_get_state
s390/pci: improve error handling during fmb (de)registration
s390/pci: improve unreg_ioat error handling
s390/pci: improve error handling during interrupt deregistration
s390/pci: don't cleanup in arch_setup_msi_irqs
KVM: s390: Backup the guest's machine check info
s390/nmi: s390: New low level handling for machine check happening in guest
s390/fpu: export save_fpu_regs for all configs
s390/kvm: avoid global config of vm.alloc_pgste=1
s390: rename struct psw_bits members
s390: rename psw_bits enums
s390/mm: use correct address space when enabling DAT
s390/cio: introduce io_subchannel_type
s390/ipl: revert Load Normal semantics for LPAR CCW-type re-IPL
s390/dumpstack: remove raw stack dump
...
If a device is offline it can still be set to read-only via the bus id
through sysfs. Only the read-only feature flag for the ccw_device is
then set. If the device is online the corresponding block device needs
to be set to read-only as well (via set_disk_ro()).
The check whether there is a device to do so, however, happens after the
feature flag was set. This leads to an unnecessary "no such device"
error in the offline case.
This bug was introduced by commit 7571cb1c8e3cc ("s390/dasd: Make use of
dasd_set_feature() more often"). Fix this by simply returning count if
no device is available.
Fixes: 7571cb1c8e3cc ("s390/dasd: Make use of dasd_set_feature() more often")
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
blk_queue_split() is always called with the last arg being q->bio_split,
where 'q' is the first arg.
Also blk_queue_split() sometimes uses the passed-in 'bs' and sometimes uses
q->bio_split.
This is inconsistent and unnecessary. Remove the last arg and always use
q->bio_split inside blk_queue_split()
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Credit-to: Javier González <jg@lightnvm.io> (Noticed that lightnvm was missed)
Reviewed-by: Javier González <javier@cnexlabs.com>
Tested-by: Javier González <javier@cnexlabs.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The safe offline processing may hang forever because it waits for I/O
which can not be started because of the offline flag that prevents new
I/O from being started.
Allow I/O to be started during safe offline processing because in this
special case we take care that the queues are empty before throwing away
the device.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The safe offline processing needs, as well as the normal offline
processing, to be locked against multiple parallel executions. But it
should be able to be overtaken by a normal offline processing to make sure
that the device does not wait forever for outstanding I/O if the user
wants to.
Unfortunately the parallel processing of safe offline and normal offline
might lead to a race situation where both threads report successful
execution to the CIO layer which in turn tries to deregister the kobject
of the device twice. This leads to a
refcount_t: underflow; use-after-free.
error and the device is not able to be set online again afterwards without
a reboot.
Correct the locking of the safe offline processing by doing the following:
- Use the cdev lock to secure all set and test operations to the
device flags.
- Two safe offline processes are locked against each other using
the DASD_FLAG_SAFE_OFFLINE and DASD_FLAG_SAFE_OFFLINE_RUNNING
device flags.
The differentiation between offline triggered and offline running
is needed since the normal offline attribute is owned by CIO and
we have to pass over control in between.
- The dasd_generic_set_offline process handles the offline
processing. It is locked against parallel execution using the
DASD_FLAG_OFFLINE.
- Only a running safe offline should be able to be overtaken by a
single normal offline. This is ensured by clearing the
DASD_FLAG_SAFE_OFFLINE_RUNNING flag when a normal offline
overtakes. So this can only happen ones.
- The safe offline just aborts in this case doing nothing and
the normal offline processing finishes as usual.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
We have two flags, DASD_FLAG_DEVICE_RO and DASD_FEATURE_READONLY, that
tell us whether a device is read-only. DASD_FLAG_DEVICE_RO is set when a
device is attached as read-only to z/VM and DASD_FEATURE_READONLY is set
when either the corresponding kernel parameter is configured, or the
read-only state is changed via sysfs.
This is valuable information in any case. However, only the feature flag
is being checked at the moment when we display the current state.
Fix this by checking both flags.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Dynamic stack allocations are considered bad. Get rid of this one
occurrence and use kstrdup() instead.
Also, set the return codes so that we have only one exit where we can
call kfree().
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Exploit multiple hardware contexts (queues) that can process
requests in parallel.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Drop the tasklet that was used to complete requests in favor of
block layer helpers that finish the IO on the CPU that initiated
it.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Convert scm_blk to use the blk-mq API. This is just a simple
conversion since we still use a single queue.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Remove CONFIG_SCM_BLOCK_CLUSTER_WRITE and related code. This quirk is
no longer needed on current hardware.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently we use nornal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings. This patch
instead introduces a new blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning. Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return somethings that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.
For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruite to improve it.
blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
* Region media error reporting: A libnvdimm region device is the parent
to one or more namespaces. To date, media errors have been reported via
the "badblocks" attribute attached to pmem block devices for namespaces
in "raw" or "memory" mode. Given that namespaces can be in "device-dax"
or "btt-sector" mode this new interface reports media errors
generically, i.e. independent of namespace modes or state. This
subsequently allows userspace tooling to craft "ACPI 6.1 Section
9.20.7.6 Function Index 4 - Clear Uncorrectable Error" requests and
submit them via the ioctl path for NVDIMM root bus devices.
* Introduce 'struct dax_device' and 'struct dax_operations': Prompted by
a request from Linus and feedback from Christoph this allows for dax
capable drivers to publish their own custom dax operations. This fixes
the broken assumption that all dax operations are related to a
persistent memory device, and makes it easier for other architectures
and platforms to add customized persistent memory support.
* 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
available for storage appliance applications to manually trigger memory
controllers to drain write-pending buffers that would otherwise be
flushed automatically by the platform ADR (asynchronous-DRAM-refresh)
mechanism at a power loss event. Support for "locked" DIMMs is included
to prevent namespaces from surfacing when the namespace label data area
is locked. Finally, fixes for various reported deadlocks and crashes,
also tagged for -stable.
* ACPI / nfit driver updates: General updates of the nfit driver to add
DSM command overrides, ACPI 6.1 health state flags support, DSM payload
debug available by default, and various fixes.
Acknowledgements that came after the branch was pushed:
commmit 565851c972 "device-dax: fix sysfs attribute deadlock"
Tested-by: Yi Zhang <yizhan@redhat.com>
commit 23f4984483 "libnvdimm: rework region badblocks clearing"
Tested-by: Toshi Kani <toshi.kani@hpe.com>
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZDONJAAoJEB7SkWpmfYgC3SsP/2KrLvTUcz646ViuPOgZ2cC4
W6wAx6cvDSt+H52kLnFEsYoFt7WAj20ggPirb/Bc5jkGlvwE0lT9Xtmso9GpVkYT
J9ZJ9pP/4YaAD3II1gmTwaUjYi0FxoOdx3Eb92yuWkO/8ylz4b2Nu3cBpYwyziGQ
nIfEVwDXRLE86u6x0bWuf6TlVuvsbdiAI55CDqDMVQC6xIOLbSez7b8QIHlpiKEb
Mw+xqdQva0esoreZEOXEhWNO+qtfILx8/ceBEGTNMp4e/JjZ2FbrSNplM+9bH5k7
ywqP8lW+mBEw0fmBBkYoVG/xyesiiBb55JLnbi8Ew+7IUxw8a3iV7wftRi62lHcK
zAjsHe4L+MansgtZsCL8wluvIPaktAdtB4xr7l9VNLKRYRUG73jEWU0gcUNryHIL
BkQJ52pUS1PkClyAsWbBBHl1I/CvzVPd21VW0YELmLR4OywKy1c+eKw2bcYgjrb4
59HZSv6S6EoKaQC+2qvVNpePil7cdfg5V2ubH/ki9HoYVyoxDptEWHnvf0NNatIH
Y7mNcOPvhOksJmnKSyHbDjtRur7WoHIlC9D7UjEFkSBWsKPjxJHoidN4SnCMRtjQ
WKQU0seoaKj04b68Bs/Qm9NozVgnsPFIUDZeLMikLFX2Jt7YSPu+Jmi2s4re6WLh
TmJQ3Ly9t3o3/weHSzmn
=Ox0s
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"The bulk of this has been in multiple -next releases. There were a few
late breaking fixes and small features that got added in the last
couple days, but the whole set has received a build success
notification from the kbuild robot.
Change summary:
- Region media error reporting: A libnvdimm region device is the
parent to one or more namespaces. To date, media errors have been
reported via the "badblocks" attribute attached to pmem block
devices for namespaces in "raw" or "memory" mode. Given that
namespaces can be in "device-dax" or "btt-sector" mode this new
interface reports media errors generically, i.e. independent of
namespace modes or state.
This subsequently allows userspace tooling to craft "ACPI 6.1
Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error"
requests and submit them via the ioctl path for NVDIMM root bus
devices.
- Introduce 'struct dax_device' and 'struct dax_operations': Prompted
by a request from Linus and feedback from Christoph this allows for
dax capable drivers to publish their own custom dax operations.
This fixes the broken assumption that all dax operations are
related to a persistent memory device, and makes it easier for
other architectures and platforms to add customized persistent
memory support.
- 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is
available for storage appliance applications to manually trigger
memory controllers to drain write-pending buffers that would
otherwise be flushed automatically by the platform ADR
(asynchronous-DRAM-refresh) mechanism at a power loss event.
Support for "locked" DIMMs is included to prevent namespaces from
surfacing when the namespace label data area is locked. Finally,
fixes for various reported deadlocks and crashes, also tagged for
-stable.
- ACPI / nfit driver updates: General updates of the nfit driver to
add DSM command overrides, ACPI 6.1 health state flags support, DSM
payload debug available by default, and various fixes.
Acknowledgements that came after the branch was pushed:
- commmit 565851c972 "device-dax: fix sysfs attribute deadlock":
Tested-by: Yi Zhang <yizhan@redhat.com>
- commit 23f4984483 "libnvdimm: rework region badblocks clearing"
Tested-by: Toshi Kani <toshi.kani@hpe.com>"
* tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits)
libnvdimm, pfn: fix 'npfns' vs section alignment
libnvdimm: handle locked label storage areas
libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED
brd: fix uninitialized use of brd->dax_dev
block, dax: use correct format string in bdev_dax_supported
device-dax: fix sysfs attribute deadlock
libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking"
libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering
libnvdimm: rework region badblocks clearing
acpi, nfit: kill ACPI_NFIT_DEBUG
libnvdimm: fix clear length of nvdimm_forget_poison()
libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify
libnvdimm, region: sysfs trigger for nvdimm_flush()
libnvdimm: fix phys_addr for nvdimm_clear_poison
x86, dax, pmem: remove indirection around memcpy_from_pmem()
block: remove block_device_operations ->direct_access()
block, dax: convert bdev_dax_supported() to dax_direct_access()
filesystem-dax: convert to dax_direct_access()
Revert "block: use DAX for partition table reads"
ext2, ext4, xfs: retrieve dax_device for iomap operations
...
Now that all the producers and consumers of dax interfaces have been
converted to using dax_operations on a dax_device, remove the block
device direct_access enabling.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Setup a dax_dev to have the same lifetime as the dcssblk block device
and add a ->direct_access() method that is equivalent to
dcssblk_direct_access(). Once fs/dax.c has been converted to use
dax_operations the old dcssblk_direct_access() will be removed.
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
On some z/VM systems the query host access command is not supported for
temp disks, though the corresponding feature code is set.
This does not have any impact beside that the information is not available.
Suppress the full blown command reject error messages to not confuse the
user. The error is still logged in the s390dbf.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Some storage servers might not support the query host access feature.
Check if the corresponding feature code is set.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
trivial fix to spelling mistake in literal string
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Pull s390 updates from Martin Schwidefsky:
- New entropy generation for the pseudo random number generator.
- Early boot printk output via sclp to help debug crashes on boot. This
needs to be enabled with a kernel parameter.
- Add proper no-execute support with a bit in the page table entry.
- Bug fixes and cleanups.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (65 commits)
s390/syscall: fix single stepped system calls
s390/zcrypt: make ap_bus explicitly non-modular
s390/zcrypt: Removed unneeded debug feature directory creation.
s390: add missing "do {} while (0)" loop constructs to multiline macros
s390/mm: add cond_resched call to kernel page table dumper
s390: get rid of MACHINE_HAS_PFMF and MACHINE_HAS_HPAGE
s390/mm: make memory_block_size_bytes available for !MEMORY_HOTPLUG
s390: replace ACCESS_ONCE with READ_ONCE
s390: Audit and remove any remaining unnecessary uses of module.h
s390: mm: Audit and remove any unnecessary uses of module.h
s390: kernel: Audit and remove any unnecessary uses of module.h
s390/kdump: Use "LINUX" ELF note name instead of "CORE"
s390: add no-execute support
s390: report new vector facilities
s390: use correct input data address for setup_randomness
s390/sclp: get rid of common response code handling
s390/sclp: don't add new lines to each printed string
s390/sclp: make early sclp code readable
s390/sclp: disable early sclp code as soon as the base sclp driver is active
s390/sclp: move early printk code to drivers
...
Since commit dd22f551 "block: Change direct_access calling convention",
the device size calculation in dcssblk_direct_access() is off-by-one.
This results in bdev_direct_access() always returning -ENXIO because the
returned value is not page aligned.
Fix this by adding 1 to the dev_sz calculation.
Fixes: dd22f551 ("block: Change direct_access calling convention")
Cc: <stable@vger.kernel.org> # 4.0+
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If safe offline is called for a DASD alias device a null pointer is passed
to fsync_bdev. So check for existence of the blockdevice before calling
fsync_bdev.
Should not be a real world problem since safe offline for an alias device
does not make sense and fsync_bdev can deal with a NULL pointer which it
gets after successful NULL pointer dereferencing on s390.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Check if the device pointer is valid. Just a sanity check since we already
are in the int handler of the device.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Allow 0 as valid input for the path_threshold attribute to deactivate
the IFCC/CCC error handling.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The function dasd_busid() still uses simple_strtoul() to convert a
string to an integer value. This function is obsolete for quite some
time already and should be replaced.
The whole parameter parsing semantic still relies somewhat on the fact,
that simple_strtoul() parses a string containing literals without
complains and just returns the parsed integer value plus the residual
string. kstrtoint(), however, would return -EINVAL in such a case.
Since we want to get rid of simple_strtoul() and now have a nice dasd[]
containing only single elements, we can clean up and simplify a few
things.
Replace simple_strtoul() with kstrtouint(), improve and simplify the
overall parameter parsing by the following:
- instead of residual strings return proper error codes
- remove dasd_parse_next_element() and decide directly what sort of
element is being parsed
- if we parse a device or a range of devices, split that element into
separate bits with a new function
- remove warning about invalid ending as it doesn't apply anymore
- annotate all parsing functions and data that can be freed after
initialisation with __init and __initdata respectively
- clean up bits and pieces while at it
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When the DASD driver is built into the kernel, the entire comma
separated parameter list is stored as one single element in the dasd[]
array, opposed to the module build where each element is stored
separately in dasd[].
There is no point in doing so. Therefore, store each part of the list as
single elements in dasd[] as well when built into the kernel.
Also, create a define for the maximum of 256 parameters.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With this feature, the DASD device driver more robustly handles DASDs
that are attached via multiple channel paths and are subject to
constant Interface-Control-Checks (IFCCs) and Channel-Control-Checks
(CCCs) or loss of High-Performance-FICON (HPF) functionality on one or
more of these paths.
If a channel path does not work correctly, it is removed from normal
operation as long as other channel paths are available. All extended
error recovery states can be queried and reset via user space
interfaces.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Store flags and path_data per channel path.
Implement get/set functions for various path masks.
The patch does not add functional changes.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Jan Hoeppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The function dasd_ro_store() calls set_disk_ro() to set the device in
question read-only. Since set_disk_ro() might sleep, we can't call it
while holding a lock. However, we also can't simply check if the device,
block, and gdp references are valid before we call set_disk_ro() because
an offline processing might have been started in the meanwhile which
will destroy those references.
In order to reliably call set_disk_ro() we have to ensure several
things:
- Still check validity of the mentioned references but additionally
check if offline processing is running and bail out accordingly. Also,
do this while holding the device lock.
- To ensure that the block device is still safe after the lock, increase
the open_count while still holding the device lock.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The reference to a device in question may get lost when the extended
error reporting (EER) attribute is being enabled/disabled while the
device is set offline at the same time. This is due to missing
refcounting and incorrect locking. Fix this by the following:
- In dasd_eer_store() get the device directly and handle the refcount
accordingly.
- Move the lock in dasd_eer_enable() up so we can ensure safe
processing.
- Check if the device is being set offline and return with -EBUSY if so.
- While at it, change the return code from -EPERM to -EMEDIUMTYPE as
suggested by a FIXME, since that is what we're actually checking.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Before we set a device offline, the open_count for the block device is
checked and certain flags are checked and set as well.
However, this is all done without holding any lock. Potentially, if the
open_count was checked but the DASD_FLAG_OFFLINE wasn't set yet, a
different process might want to increase the open_count depending on
whether DASD_FLAG_OFFLINE is set or not in the meanwhile.
This is quite racy and can lead to the loss of the device for that
process and subsequently lead to a panic.
Fix this by checking the open_count and setting the offline flags while
holding the ccwdev lock.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
When setting certain attributes, we actually set the according feature
flag. Do this by using dasd_set_feature() at a few occurrences and
remove duplicate code.
In dasd_set_feature() dasd_find_busid() is used to retrieve the devmap
for the device in question. Combined with the change above, this would
require the device to be set online at least once so that a devmap is
being created. Change that by using dasd_devmap_from_cdev() instead,
which uses dasd_find_busid() first and will create a devmap accordingly
if there is none yet.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
simple_strtoul() has been marked obsolete for quite some time now.
Replace a few last occurrences with kstrtouint().
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
block->request_queue is used many times in dasd_setup_queue. Define a
separate variable to increase readability a bit and to make it better
reusable.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Currently the block queue value max_segments is set to -1L, which
is then implicitly casted to unsigned short in blk_queue_max_segments.
This results in 65535 (64k) max_segments.
Even though the resulting value is correct, setting it implicitly using
-1L is rather confusing. Set the value explicitly using the USHRT_MAX
macro instead.
Reviewed-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Jan Höppner <hoeppner@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
the mdc value can be quite big (like 65535), so we are in undefined
territory when doing the multiplication with the (also signed)
FCX_MAX_DATA_FACTOR as outlined by UBSAN:
UBSAN: Undefined behaviour in drivers/s390/block/dasd_eckd.c:1678:14
signed integer overflow:
65535 * 65536 cannot be represented in type 'int'
CPU: 5 PID: 183 Comm: kworker/u512:1 Not tainted 4.7.0+ #150
Workqueue: events_unbound async_run_entry_fn
000000fb8b59f900 000000fb8b59f990 0000000000000002 0000000000000000
000000fb8b59fa30 000000fb8b59f9a8 000000fb8b59f9a8 000000000011732e
00000000000000a4 0000000000a309e2 0000000000a4c072 000000000000000b
000000fb8b59f9f0 000000fb8b59f990 0000000000000000 0000000000000000
0400000000d83238 000000000011732e 000000fb8b59f990 000000fb8b59f9f0
Call Trace:
([<0000000000117260>] show_trace+0x98/0xa8)
([<00000000001172e0>] show_stack+0x70/0xf0)
([<000000000053ac96>] dump_stack+0x86/0xb8)
([<000000000057f5f8>] ubsan_epilogue+0x28/0x70)
([<000000000057fe9e>] handle_overflow+0xde/0xf0)
([<00000000006c322a>] dasd_eckd_check_characteristics+0x50a/0x550)
([<00000000006b42ca>] dasd_generic_set_online+0xba/0x380)
([<0000000000693d82>] ccw_device_set_online+0x192/0x550)
([<00000000006ac1ae>] dasd_generic_auto_online+0x2e/0x70)
([<0000000000172130>] async_run_entry_fn+0x70/0x270)
([<0000000000165a72>] process_one_work+0x26a/0x638)
([<0000000000165e8a>] worker_thread+0x4a/0x658)
([<000000000016dd9c>] kthread+0x10c/0x110)
([<00000000008963ae>] kernel_thread_starter+0x6/0xc)
([<00000000008963a8>] kernel_thread_starter+0x0/0xc)
As this is a runtime value there is actually no risk of any sane
compiler to detect and (ab)use this undefinedness, but let's make
the multiplication defined by making mdc unsigned.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Trival fix, dev_err messages are missing a \n, so add it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
If the DASD device gets blocked for any reason, e.g. because it is reserved
somewhere, the host_access_count sysfs entry or the host_access_list
debugfs entry may sleep forever. Make it interruptible so that userspace
can use ^C to abort the operation.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A DASD device consists of the device itself and a discipline with a
corresponding private structure. These fields are set up during online
processing right after the device is created and before it is processed by
the state machine and made available for I/O.
During offline processing the discipline pointer and the private data gets
freed within the state machine and without protection of the existing
reference count. This might lead to a kernel panic because a function might
have taken a device reference and accesses the discipline pointer and/or
private data of the device while this is already freed.
Fix by freeing the discipline pointer and the private data after ensuring
that there is no reference to the device left.
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Internal I/O is processed by the _sleep_on_function which might wait for a
device to get operational. During offline processing this will never happen
and therefore the refcount of the device will not drop to zero and the
offline processing blocks as well.
Fix by letting requests fail in the _sleep_on function during offline
processing. No further handling of the requests is necessary since this is
internal I/O and the device is thrown away afterwards.
Reviewed-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
The DASD device driver throws change events for the DASD blockdevice
after the online processing is done so that udev rules can take
actions after it.
The change event was missing for unformatted devices.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On LPAR the read message buffer command should be executed on the path
it was received on otherwise there is a chance that the CUIR assignment
might be faulty and the wrong channel path is set online/offline.
Fix by setting the path mask accordingly.
On z/VM we might not be able to do I/O on this path but there it does
not matter on which path the read message buffer command is executed.
Therefor implement a retry with an open path mask.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
When a device is in a status where CIO has killed all I/O by itself the
interrupt for a clear request may not contain an irb to determine the
clear function. Instead it contains an error pointer -EIO.
This was ignored by the DASD int_handler leading to a hanging device
waiting for a clear interrupt.
Handle -EIO error pointer correctly for requests that are clear pending and
treat the clear as successful.
Signed-off-by: Stefan Haberland <sth@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
1/ Replace pcommit with ADR / directed-flushing:
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement either
ADR, or provide one or more flush addresses per nvdimm. ADR
(Asynchronous DRAM Refresh) flushes data in posted write buffers to the
memory controller on a power-fail event. Flush addresses are defined in
ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure:
"Flush Hint Address Structure". A flush hint is an mmio address that
when written and fenced assures that all previous posted writes
targeting a given dimm have been flushed to media.
2/ On-demand ARS (address range scrub):
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the media
to refresh the bad block list, userspace can also request a re-scrub at
any time.
3/ Support for the Microsoft DSM (device specific method) command format.
4/ Support for EDK2/OVMF virtual disk device memory ranges.
5/ Various fixes and cleanups across the subsystem.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ
NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY
ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h
trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG
KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu
qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP
WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7
41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA
DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl
b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC
6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV
cN3edFVIdxvZeMgM5Ubq
=xCBG
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
- Replace pcommit with ADR / directed-flushing.
The pcommit instruction, which has not shipped on any product, is
deprecated. Instead, the requirement is that platforms implement
either ADR, or provide one or more flush addresses per nvdimm.
ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers
to the memory controller on a power-fail event.
Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware
Interface Table (NFIT) sub-structure: "Flush Hint Address Structure".
A flush hint is an mmio address that when written and fenced assures
that all previous posted writes targeting a given dimm have been
flushed to media.
- On-demand ARS (address range scrub).
Linux uses the results of the ACPI ARS commands to track bad blocks
in pmem devices. When latent errors are detected we re-scrub the
media to refresh the bad block list, userspace can also request a
re-scrub at any time.
- Support for the Microsoft DSM (device specific method) command
format.
- Support for EDK2/OVMF virtual disk device memory ranges.
- Various fixes and cleanups across the subsystem.
* tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits)
libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register"
nfit: do an ARS scrub on hitting a latent media error
nfit: move to nfit/ sub-directory
nfit, libnvdimm: allow an ARS scrub to be triggered on demand
libnvdimm: register nvdimm_bus devices with an nd_bus driver
pmem: clarify a debug print in pmem_clear_poison
x86/insn: remove pcommit
Revert "KVM: x86: add pcommit support"
nfit, tools/testing/nvdimm/: unify shutdown paths
libnvdimm: move ->module to struct nvdimm_bus_descriptor
nfit: cleanup acpi_nfit_init calling convention
nfit: fix _FIT evaluation memory leak + use after free
tools/testing/nvdimm: add manufacturing_{date|location} dimm properties
tools/testing/nvdimm: add virtual ramdisk range
acpi, nfit: treat virtual ramdisk SPA as pmem region
pmem: kill __pmem address space
pmem: kill wmb_pmem()
libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes
fs/dax: remove wmb_pmem()
libnvdimm, pmem: flush posted-write queues on shutdown
...