This patch implements the Namespace Write Protect feature described in
"NVMe TP 4005a Namespace Write Protect". In this version, we implement
No Write Protect and Write Protect states for target ns which can be
toggled by set-features commands from the host side.
For write-protect state transition, we need to flush the ns specified
as a part of command so we also add helpers for carrying out synchronous
flush operations.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: fixed an incorrect endianess conversion, minor cleanups]
Signed-off-by: Christoph Hellwig <hch@lst.de>
In the current implementation, we clear the AEN bit when we get the
"get log page" command if given log page is associated with AEN.
This patch allows optionally retaining the AEN for the ctrl
under consideration when Retain Asynchronous Event (RAE) bit is set
as a part of "get log page" command.
This allows the host to read the Log page and optionally retaining the
AEN associated with this log page when using userspace tools like
nvme-cli.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: also use the new helper in the just merged ANA code]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Allow creating non-default ANA groups (group ID > 1). Groups are created
either by assigning the group ID to a namespace, or by creating a configfs
group object under a specific port. All namespaces assigned to a group
that doesn't have a configfs object for a given port are marked as
inaccessible.
Allow changing the ANA state on a per-port basis by creating an
ana_groups directory under each port, and another directory with an
ana_state file in it. The default ANA group 1 directory is created
automatically for each port.
For all changes in ANA configuration the ANA change AEN is sent. We only
keep a global changecount instead of additional per-group changecounts to
keep the implementation as simple as possible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Add support for Asynchronous Namespace Access as specified in NVMe 1.3
TP 4004.
Just add a default ANA group 1 that is optimized on all ports. This is
(and will remain) the default assignment for any namespace not epxlicitly
assigned to another ANA group. The ANA state can be manually changed
through the configfs interface, including the change state.
Includes fixes and improvements from Hannes Reinecke.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
TP 4004 introduces a new 'Maximum Number of Allocated Namespaces' field
in the Identify controller data to help the host size resources. Put
an upper limit on the supported namespaces to be able to support this
value as supporting 32-bits worth of namespaces would lead to very
large buffers. The limit is completely arbitrary at this point.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Don't use sizeof(uuid_le) where none of the parameters is type of uuid_le.
Since both arguments are u8 [16], use size of destination there.
Moreover, uuid_le is a deprecated type, and nvmet is using uuid_t
already.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The patch enables inline data sizes using up to 4 recv sges, and capping
the size at 16KB or at least 1 page size. So on a 4K page system, up to
16KB is supported, and for a 64K page system 1 page of 64KB is supported.
We avoid > 0 order page allocations for the inline buffers by using
multiple recv sges, one for each page. If the device cannot support
the configured inline data size due to lack of enough recv sges, then
log a warning and reduce the inline size.
Add a new configfs port attribute, called param_inline_data_size,
to allow configuring the size of inline data for a given nvmf port.
The maximum size allowed is still enforced by nvmet-rdma with
NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE).
And the default size, if not specified via configfs, is still PAGE_SIZE.
This preserves the existing behavior, but allows larger inline sizes
for small page systems. If the configured inline data size exceeds
NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is
reduced. If param_inline_data_size is set to 0, then inline data is
disabled for that nvmf port.
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
This patch adds support for Commands Supported and Effects log page
(Log Identifier 05h) for NVMeOF. This also makes it easier to find
which commands are supported, e.g. :-
subnqn : testnqn1
Admin Command Set
ACS2 [Get Log Page ] 00000001
ACS6 [Identify ] 00000001
ACS8 [Abort ] 00000001
ACS9 [Set Features ] 00000001
ACS10 [Get Features ] 00000001
ACS12 [Asynchronous Event Request ] 00000001
ACS24 [Keep Alive ] 00000001
NVM Command Set
IOCS0 [Flush ] 00000001
IOCS1 [Write ] 00000001
IOCS2 [Read ] 00000001
IOCS8 [Write Zeroes ] 00000001
IOCS9 [Dataset Management ] 00000001
This partticular functionality can be used from the host side to examine
the NVMeOF ctrl commands supported.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Free smart-log buffer allocated in the function after use.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Quote from Figure 106 in NVMe 1.3a:
The Identify Namespace data structure is returned to the host for the
namespace specified in the Namespace Identifier (CDW1.NSID) field if it
is an active NSID. If the specified namespace is not an active NSID,
then the controller returns a zero filled data structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@rimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Per section 5.2 of the NVMe 1.3 spec:
"When the controller posts a completion queue entry for an outstanding
Asynchronous Event Request command and thus reports an asynchronous
event, subsequent events of that event type are automatically masked by
the controller until the host clears that event. An event is cleared by
reading the log page associated with that event using the Get Log Page
command (see section 5.14)."
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
AEN configuration via the 'Get Features' and 'Set Features' admin
command is mandatory, so we should be implemeting handling for it.
Signed-off-by: Hannes Reinecke <hare@suse.com>
[hch: use WRITE_ONCE, check for invalid values]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Just keep a per-controller buffer of changed namespaces and copy it out
in the get log page implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
Remove the common code to allocate a buffer and copy it into the SGL.
Instead the two no-op implementations just zero the SGL directly, and
the smart log allocates a buffer on its own. This prepares for the
more elaborate ANA log page.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
This patch adds simple file backed namespace support for NVMeOF target.
The new file io-cmd-file.c is responsible for handling the code for I/O
commands when ns is file backed. Also, we introduce mempools based slow
path using sync I/Os for file backed ns to ensure forward progress under
reclaim.
The old block device based implementation is moved to io-cmd-bdev.c and
use a "nvmet_bdev_" symbol prefix. The enable/disable calls are also
move into the respective files.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
[hch: updated changelog, fixed double req->ns lookup in bdev case]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Remove the duplicate NULL initialization for req->ns. req->ns is always
initialized to NULL in nvmet_req_init(), so there is no need to reset
it later on failures unless we have previously assigned a value to it.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Commit 42de82a8b5 previously attempted to fix this, and it did
correctly pad the MN and FR fields with spaces, but the SN field still
contains 0 bytes. The current code fills out the first 16 bytes with
hex2bin, leaving the last 4 bytes zeroed. Rather than adding a lot of
error-prone math to avoid overwriting SN twice, just set the whole thing
to spaces up front (it's only 20 bytes).
Fixes: 42de82a8b5 ("nvmet: don't report 0-bytes in serial number")
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
the status is either success or some status id and
we don't need a local variable for it.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We already allocated the buffer with kzalloc.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
small typos fixed in admin-cmd.c
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
And fix the Get/Set Log Page implementation to take all 8 bits of the
feature identifier into account.
Signed-off-by: Omri Mann <omri@excelero.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
[hch: used the UUID API, updated changelog]
This changes the earlier patch "nvmet: don't report 0-bytes
in serial number" to use the memcpy_and_pad() helper introduced
in a previous patch.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The merged version of my patch "nvmet: don't report 0-bytes in serial
number" fails to remove two lines which should have been replaced,
so that the space-padded strings are overwritten again with 0-bytes.
Fix it.
Fixes: 42de82a8b5 nvmet: don't report 0-bytes in serial number
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Sagi Grimberg <sagi@grimbeg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The NVME standard mandates that the SN, MN, and FR fields of the Identify
Controller Data Structure be "ASCII strings". That means that they may
not contain 0-bytes, not even string terminators.
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[hch: fixed for the move of the serial field, updated description]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The NVMe specification defines the serial number as:
"Serial Number (SN): Contains the serial number for the NVM subsystem
that is assigned by the vendor as an ASCII string. Refer to section
7.10 for unique identifier requirements. Refer to section 1.5 for ASCII
string requirements"
So move it from the controller to the subsystem, where it belongs.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A NVMe Identify NS command with a CNS value of '3' is expecting a list
of Namespace Identification Descriptor structures to be returned to
the host for the namespace requested in the namespace identify
command.
This Namespace Identification Descriptor structure consists of the
type of the namespace identifier, the length of the identifier and the
actual identifier.
Valid types are NGUID and UUID which we have saved in our nvme_ns
structure if they have been configured via configfs. If no value has
been assigened to one of these we return an "invalid opcode" back to
the host to maintain backward compatibiliy with older implementations
without Namespace Identify Descriptor list support.
Also as the Namespace Identify Descriptor list is the only mandatory
feature change between 1.2.1 and 1.3 we can bump the advertised
version as well.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Use NVME_IDENTIFY_DATA_SIZE define instead of hard coding the magic
4096 value.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.com>
[hch: converted three more users]
Signed-off-by: Christoph Hellwig <hch@lst.de>
We've added a considerable amount of fixes for stalls and issues
with the blk-mq scheduling in the 4.11 series since forking
off the for-4.12/block branch. We need to do improvements on
top of that for 4.12, so pull in the previous fixes to make
our lives easier going forward.
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch introduces helper function for checking controller
status during admin and io command processing which returns u16
status. As to bring consistency on returning status, other
friend functions also now return u16 status instead of int
to match the spec.
As part of the theseerror log prints in also prints qid on
which command error occured.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
In this case entirely harmless as it's all-ones, but still nice to
shut up sparse.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
We don't actually need the full rculist.h header in sched.h anymore,
we will be able to include the smaller rcupdate.h header instead.
But first update code that relied on the implicit header inclusion.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch defines CNS field as 8-bit field and avoids cpu_to/from_le
conversions.
Also initialize nvme_command cns value explicitly to NVME_ID_CNS_NS
for readability (don't rely on the fact that NVME_ID_CNS_NS = 0).
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
trivial fix to spelling mistake in pr_err message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The Set Features implementation for Keep Alive Timer was using the wrong
structure when retrieving the KATO value; it was treating the Set
Features command as a Property Set command.
The NVMe spec defines the Keep Alive Timer feature as having one input
in CDW11 (4 bytes at offset 44 in the command) whereas the code was
reading 8 bytes at offset 48.
Since the Linux NVMe over Fabrics host never sets this feature, this
code has presumably never been tested.
Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull block layer updates from Jens Axboe:
"This is the main block pull request this series. Contrary to previous
release, I've kept the core and driver changes in the same branch. We
always ended up having dependencies between the two for obvious
reasons, so makes more sense to keep them together. That said, I'll
probably try and keep more topical branches going forward, especially
for cycles that end up being as busy as this one.
The major parts of this pull request is:
- Improved support for O_DIRECT on block devices, with a small
private implementation instead of using the pig that is
fs/direct-io.c. From Christoph.
- Request completion tracking in a scalable fashion. This is utilized
by two components in this pull, the new hybrid polling and the
writeback queue throttling code.
- Improved support for polling with O_DIRECT, adding a hybrid mode
that combines pure polling with an initial sleep. From me.
- Support for automatic throttling of writeback queues on the block
side. This uses feedback from the device completion latencies to
scale the queue on the block side up or down. From me.
- Support from SMR drives in the block layer and for SD. From Hannes
and Shaun.
- Multi-connection support for nbd. From Josef.
- Cleanup of request and bio flags, so we have a clear split between
which are bio (or rq) private, and which ones are shared. From
Christoph.
- A set of patches from Bart, that improve how we handle queue
stopping and starting in blk-mq.
- Support for WRITE_ZEROES from Chaitanya.
- Lightnvm updates from Javier/Matias.
- Supoort for FC for the nvme-over-fabrics code. From James Smart.
- A bunch of fixes from a whole slew of people, too many to name
here"
* 'for-4.10/block' of git://git.kernel.dk/linux-block: (182 commits)
blk-stat: fix a few cases of missing batch flushing
blk-flush: run the queue when inserting blk-mq flush
elevator: make the rqhash helpers exported
blk-mq: abstract out blk_mq_dispatch_rq_list() helper
blk-mq: add blk_mq_start_stopped_hw_queue()
block: improve handling of the magic discard payload
blk-wbt: don't throttle discard or write zeroes
nbd: use dev_err_ratelimited in io path
nbd: reset the setup task for NBD_CLEAR_SOCK
nvme-fabrics: Add FC LLDD loopback driver to test FC-NVME
nvme-fabrics: Add target support for FC transport
nvme-fabrics: Add host support for FC transport
nvme-fabrics: Add FC transport LLDD api definitions
nvme-fabrics: Add FC transport FC-NVME definitions
nvme-fabrics: Add FC transport error codes to nvme.h
Add type 0x28 NVME type code to scsi fc headers
nvme-fabrics: patch target code in prep for FC transport support
nvme-fabrics: set sqe.command_id in core not transports
parser: add u64 number parser
nvme-rdma: align to generic ib_event logging helper
...
Add support for handling write zeroes command on target.
Call into __blkdev_issue_zeroout, which the block layer expands into the
best suitable variant of zeroing the LBAs. Allow write zeroes operation
to deallocate the LBAs when calling __blkdev_issue_zeroout.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@hgst.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
Import a few updates to nvme.h from nvme-cli. This mostly includes a few
new fields and error codes, but also a few renames that so far are only
used in user space. Also one field is moved from an array of two le64
values to one of 16 u8 values so that we can more easily access it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch implements the support for smart-log command
(NVM Express 1.2.1-section 5.10.1.2 SMART / Health Information
(Log Identifier 02h)) on the target for NVMe over Fabric.
In current implementation host can retrieve following statistics:-
1. Data Units Read.
2. Data Units Written.
3. Host Read Commands.
4. Host Write Commands.
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
The host is allowed to issue identify as many times
as it wants, we need to stay consistent when reporting
the serial number for a given controller.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
This patch introduces a implementation of NVMe subsystems,
controllers and discovery service which allows to export
NVMe namespaces across fabrics such as Ethernet, FC etc.
The implementation conforms to the NVMe 1.2.1 specification
and interoperates with NVMe over fabrics host implementations.
Configuration works using configfs, and is best performed using
the nvmetcli tool from http://git.infradead.org/users/hch/nvmetcli.git,
which also has a detailed explanation of the required steps in the
README file.
Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>
Signed-off-by: Anthony Knapp <anthony.j.knapp@intel.com>
Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jens Axboe <axboe@fb.com>