LightNVM compatible device drivers does not have a method to expose
LightNVM specific sysfs entries.
To enable LightNVM sysfs entries to be exposed, lightnvm device
drivers require a struct device to attach it to. To allow both the
actual device driver and lightnvm sysfs entries to coexist, the device
driver tracks the lifetime of the nvm_dev structure.
This patch refactors NVMe and null_blk to handle the lifetime of struct
nvm_dev, which eliminates the need for struct gendisk when a lightnvm
compatible device is provided.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
With LightNVM enabled namespaces, the gendisk structure is not exposed
to the user. This prevents LightNVM users from accessing the NVMe device
driver specific sysfs entries, and LightNVM namespace geometry.
Refactor the revalidation process, so that a namespace, instead of a
gendisk, is revalidated. This later allows patches to wire up the
sysfs entries up to a non-gendisk namespace.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
If NO_DMA=y:
drivers/built-in.o: In function `nvme_nvm_dev_dma_free':
lightnvm.c:(.text+0x23df1a): undefined reference to `dma_pool_free'
drivers/built-in.o: In function `nvme_nvm_dev_dma_alloc':
lightnvm.c:(.text+0x23df38): undefined reference to `dma_pool_alloc'
drivers/built-in.o: In function `nvme_nvm_destroy_dma_pool':
lightnvm.c:(.text+0x23df4c): undefined reference to `dma_pool_destroy'
drivers/built-in.o: In function `nvme_nvm_create_dma_pool':
lightnvm.c:(.text+0x23df7e): undefined reference to `dma_pool_create'
and
ERROR: "dma_pool_destroy" [drivers/nvme/host/nvme-core.ko] undefined!
ERROR: "dma_pool_free" [drivers/nvme/host/nvme-core.ko] undefined!
ERROR: "dma_pool_alloc" [drivers/nvme/host/nvme-core.ko] undefined!
ERROR: "dma_pool_create" [drivers/nvme/host/nvme-core.ko] undefined!
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch removes module_init()/module_exit() from driver code by using
module_misc_device() macro. All modules in this patch has a print
statement which is removed when module_misc_device() macro is used.
If undesirable this patch can be dropped entirely, this is the only
purpose of making this as a separate patch.
Signed-off-by: PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
These two are confusing leftover of the old world order, combining
values of the REQ_OP_ and REQ_ namespaces. For callers that don't
special case we mostly just replace bi_rw with bio_data_dir or
op_is_write, except for the few cases where a switch over the REQ_OP_
values makes more sense. Any check for READA is replaced with an
explicit check for REQ_RAHEAD. Also remove the READA alias for
REQ_RAHEAD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The __nvm_submit_ppa() function is not used outside lightnvm core.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The passed by reference ppa list in nvm_set_rqd_list() is updated when
multiple planes are available. In that case, each PPA plane is
incremented when the device side PPA list is created. This prevents the
caller to rely on the PPA list to be unmodified after a call.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The gen_mark_blk_bad function marks the wrong block when a block is on
a different channel. Fix the index calculation, so that it updates the
correct block.
Reported-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The nvm_get_blk() function is called with rlun->lock held. This is ok
when the media manager implementation doesn't go out of its atomic
context. However, if a media manager persists its metadata, and
guarantees that the block is given to the target, this is no longer
a viable approach. Therefore, clean up the flow of rrpc_map_page,
and make sure that nvm_get_blk() is called without any locks acquired.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The [get/put]_blk API enables targets to get ownership of blocks at
runtime. This information is currently not recorded on disk, and the
information is therefore lost on power failure. To restore the
metadata, the [get/put]_blk must persist its metadata. In that case,
we need to control the outer lock, so that we can disable them while
updating the on-disk metadata. Fortunately, the _unlocked versions can
be removed, which allows us to move the lock into the [get/put]_blk
functions.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The ->list, ->open_list, and ->closed_list lists were previously used
for statistics. However, their usage have been removed, and thus these
can safely be removed.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
If a media manager tries to initialize it targets upon media manager
initialization, the media manager will need to know which target types
are available in LightNVM. The lists of which managers and target types
are available shares the same lock.
Therefore, on initialization, the nvm_lock is taken by LightNVM core,
which later leads to a deadlock when target types are enumerated by the
media manager.
Add an exclusive lock for target types to resolve this conflict.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
To enable persistent block management to easily control creation and
removal of targets, we move target management into the media
manager. The LightNVM core continues to maintain which target types are
registered, while the media manager now keeps track of its initialized
targets.
Two new callbacks for the media manager are introduced. create_tgt and
remove_tgt. Note that remove_tgt returns 0 on successfully removing a
target, and returns 1 if the target was not found.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The generic manager should be called the general media manager, and
instead of using the rather long name of "gennvm" in front of each data
structures, use "gen" instead to shorten it. Update the description of
the media manager as well to make the media manager purpose clearer.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The responsibility of the media manager is not to keep track of
open/closed blocks. This is better maintained within a target,
that already manages this information on writes.
Remove the statistics and merge the states NVM_BLK_ST_OPEN and
NVM_BLK_ST_CLOSED.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
A couple of small checkpatch fixups to stop it from complaining.
./drivers/lightnvm/core.c:360: WARNING: line over 80 characters
./drivers/lightnvm/core.c:360: ERROR: trailing statements should be on
next line
./drivers/lightnvm/core.c:503: WARNING: Block comments use a trailing */
on a separate line
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Checkpatch found two incidents where the type was preferred to be
written out in full.
./drivers/lightnvm/rrpc.h:184: WARNING: Prefer 'unsigned int' to bare
use of 'unsigned'
./drivers/lightnvm/rrpc.h:209: WARNING: Prefer 'unsigned int' to bare
use of 'unsigned'
./drivers/lightnvm/rrpc.c:51: WARNING: Prefer 'unsigned int' to bare use
of 'unsigned'
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Mark functions not used by ouside of thier implementing file as static.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Expose media manager mark_blk() to targets, as done for the rest of the
media manager callback functions.
Signed-off-by: Javier González <javier@cnexlabs.com>
Updated description
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Break the loop when rqd is not null to reduce
an unnecessary schedule.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch converts the simple bi_rw use cases in the block,
drivers, mm and fs code to set/get the bio operation using
bio_set_op_attrs/bio_op
These should be simple one or two liner cases, so I just did them
in one patch. The next patches handle the more complicated
cases in a module per patch.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The nvm_dev->max_pages_per_blk variable was removed in favor of the new
nvm->sec_per_blk variable. The ->max_pages_per_blk variable was still
used in rrpc_capacity, reporting the reserved capacity to zero. Replace
with ->sec_per_blk to calculate the reserved area again.
Signed-off-by: Javier González <javier@cnexlabs.com>
Updated patch description. Was "lightnvm: eliminate redundant variable"
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The number of ppas contained on a request is not necessarily the number
of pages that it maps to neither on the target nor on the device side.
In order to avoid confusion, rename nr_pages to nr_ppas since it is what
the variable actually contains.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Targets can update a block state when having a reference to an
in-memory virtual block. In the case that a target does not keep the
block metadata in memory, it does not have a way to update this
structure.
Therefore, expose gennvm_mark_blk() through the media managers
->mark_blk() callback and let targets update the state structure through
this callback.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Targets associated with a device manager are not freed on device
removal. They have to be manually removed before shutdown. Make sure
any outstanding targets are freed upon shutdown.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
When doing GC, rrpc calculates the physical LUN to which the rrpc block
belongs too. This calculation is based on the assumption that LUNs are
assigned sequentially to the LUN list. Use the reference to the LUN
instead. This saves us the calculation and allows us to align LUNs in a
different manner to, for example, take advantage of devide parallelism.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Until now, the dma pool have been exclusively used to allocate the ppa
list being sent to the device. In pblk (upcoming), we use these pools to
allocate metadata too. Thus, we generalize the names of some variables
on the dma helper functions to make the code more readable.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
rrpc does not save any metadata on a given request. Thus, do not attempt
to free the metadata dma region.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The ppa configured for retrieving the bad block table uses the internal
lun id to setup the get bad block ppa. This increases monotonically
with the number luns available. When configuring a ppa, the channel and
lun must be specified separately, leading to an out of bound memory
access in gennvm_block_bb when lun id goes beyond the luns available
within a channel.
Additional, remove out of bound check in gennvm_block_bb(), as it was a
buggy to begin with.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The set_bb_tbl takes struct nvm_rq and only uses its ppa_list and
nr_pages internally. Instead, make these two variables explicit.
This allows a user to call it without initializing a struct nvm_rq
first.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
We move the responsibility of managing the persistent bad block table to
the target. The target may choose to mark a block bad or retry writing
to it. Never the less, it should be the target that makes the decision
and not the media manager.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
A virtual block enables a block to identify multiple physical blocks.
This is useful for metadata where a device media supports multiple
planes. In that case, a block, with multiple planes can be managed
as a single vblk. Reducing the metadata required by one forth.
nvm_set_rqd_ppalist() takes care of expanding a ppa_list with vblks
automatically. However, for some use-cases, where only a single physical
block is required, the ppa_list should not be expanded.
Therefore, add a vblk parameter to nvm_set_rqd_ppalist(), and only
expand the ppa_list if vblk is set.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Now that device ops->get_bb_table no longer uses a callback, the
struct factory_blks can be removed.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The device ops->get_bb_tbl() takes a callback, that allows the caller
to use its own callback function to update its data structures in the
returning function.
This makes it difficult to send parameters to the callback, and usually
is circumvented by small private structures, that both carry the callers
state and any flags needed to fulfill the update.
Refactor ops->get_bb_tbl() to fill a data buffer with the status of the
blocks returned, and let the user call the callback function manually.
That will provide the necessary flags and data structures and simplify
the logic around ops->get_bb_tbl().
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Users that wish to iterate all luns on a device. Must create a
struct ppa_addr and separate iterators for channels and luns. To set the
iterators, two loops are required, one to iterate channels, and another
to iterate luns. This leads to decrease in readability.
Introduce nvm_for_each_lun_ppa, which implements the nested loop and
sets ppa, channel, and lun variable for each loop body, eliminating
the boilerplate code.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
A target name must be unique. However, a per-device registration of
targets is maintained on a dev->online_targets list, with a per-device
search for targets upon registration.
This results in a name collision when two targets, with the same name,
are created on two different targets, where the per-device list is not
shared.
Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The functions nvm_register_target(), nvm_unregister_target() and
associated list refers to a target type that is being registered by a
target type module. Rename nvm_*_targets() to nvm_*_tgt_type(), so that
the intension is clear.
This enables target instances to use the _nvm_*_targets() naming.
Signed-off-by: Simon A. F. Lund <slund@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Since we mainly use soffset in device sector size, we therefore store
this value in rrpc->soffset, instead of the offset in 512byte sector
size. This eliminates the "(ilog2(dev->sec_size) - 9)" calculation on
each I/O.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Updated patch description.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Calculate rrpc total blocks and sectors up front, make sense
to use them. For example, we use rrpc->nr_sects to calculate rrpc
area size, but it makes no sense if we don't initialize it up front,
since it would be zero until we finish rrpc luns init.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
A memory leak occurs if the lower page table is initialized and the
following dev->lun_map fails on allocation.
Rearrange the initialization of lower page table to allow dev->lun_map
to fail gracefully without memory leak.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Move kfree of dev->lun_map to nvm_free()
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The get block table command returns a list of blocks and planes
with their associated state. Users, such as gennvm and sysblk,
manages all planes as a single virtual block.
It was therefore natural to fold the bad block list before it is
returned. However, to allow users, which manages on a per-plane
block level, to also use the interface, the get_bb_tbl interface is
changed to not fold by default and instead let the caller fold if
necessary.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The flash page size (fpg) and size across planes (pfpg) are convenient
to know when allocating buffer sizes. This has previously been a
calculated in various places. Replace with the pre-calculated values.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The nvm_submit_ppa function assumes that users manage all plane
blocks as a single block. Extend the API with nvm_submit_ppa_list
to allow the user to send its own ppa list. If the user submits more
than a single PPA, the user must take care to allocate and free
the corresponding ppa list.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The device ->submit_io() callback might fail to submit I/O to device.
In that case, the nvm_submit_ppa function should not wait for
completion. Instead return the ->submit_io() error.
Reviewed by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This fixes the following warnings:
drivers/lightnvm/sysblk.c:125:9: warning: ‘ret’ may be used
uninitialized in this function
drivers/lightnvm/sysblk.c:275:15: warning: ‘ret’ may be used
uninitialized in this function
In both cases, ret is only set from within a loop that may not be entered.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
An Open-Channel SSD can work on two modes: (i) hybrid mode, where the
L2P table is maintained both by the host and by the device; and (ii)
full host-based, where the L2P table is uniquely maintained by the host.
In the advent of a new target implementing the full host-based mode, do
not assume that the L2P table must be loaded on the generic media
manager; check device properties loaded on the identify command instead.
Signed-off-by: Javier González <javier@cnexlabs.com>
Moved into the following statement.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
When the l2p table is loaded, addresses are checked for the lun they
belong to and luns are reserved accordingly. This assumes that metadata
is being stored in the backend device to recover the previous target
configuration. Since this is not yet implemented, this check collides
with some of the core initialization (e.g., sysblock initialization when
a page is formed by several sectors).
We take this check out and for now rely on that the right target will be
created instead. When metadata is stored to recover a target, this check
will come natural as part of the recovery strategy.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add a bitmap of luns to indicate the status
of luns: inuse/available. When create targets
do the necessary check to avoid allocating luns
that are already allocated.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Freed dev->lun_map if nvm_core_init later failed in the init process.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
We can create more than one target on a lightnvm
device by specifying its begin lun and end lun.
But only specify the physical address area is not
enough, we need to get the corresponding non-
intersection logical address area division from
the backend device's logcial address space.
Otherwise the targets on the device might use
the same logical addresses cause incorrect
information in the device's l2p table.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Pull block driver updates from Jens Axboe:
"This is the block driver pull request for this merge window. It sits
on top of for-4.6/core, that was just sent out.
This contains:
- A set of fixes for lightnvm. One from Alan, fixing an overflow,
and the rest from the usual suspects, Javier and Matias.
- A set of fixes for nbd from Markus and Dan, and a fixup from Arnd
for correct usage of the signed 64-bit divider.
- A set of bug fixes for the Micron mtip32xx, from Asai.
- A fix for the brd discard handling from Bart.
- Update the maintainers entry for cciss, since that hardware has
transferred ownership.
- Three bug fixes for bcache from Eric Wheeler.
- Set of fixes for xen-blk{back,front} from Jan and Konrad.
- Removal of the cpqarray driver. It has been disabled in Kconfig
since 2013, and we were initially scheduled to remove it in 3.15.
- Various updates and fixes for NVMe, with the most important being:
- Removal of the per-device NVMe thread, replacing that with a
watchdog timer instead. From Christoph.
- Exposing the namespace WWID through sysfs, from Keith.
- Set of cleanups from Ming Lin.
- Logging the controller device name instead of the underlying
PCI device name, from Sagi.
- And a bunch of fixes and optimizations from the usual suspects
in this area"
* 'for-4.6/drivers' of git://git.kernel.dk/linux-block: (49 commits)
NVMe: Expose ns wwid through single sysfs entry
drivers:block: cpqarray clean up
brd: Fix discard request processing
cpqarray: remove it from the kernel
cciss: update MAINTAINERS
NVMe: Remove unused sq_head read in completion path
bcache: fix cache_set_flush() NULL pointer dereference on OOM
bcache: cleaned up error handling around register_cache()
bcache: fix race of writeback thread starting before complete initialization
NVMe: Create discard zero quirk white list
nbd: use correct div_s64 helper
mtip32xx: remove unneeded variable in mtip_cmd_timeout()
lightnvm: generalize rrpc ppa calculations
lightnvm: remove struct nvm_dev->total_blocks
lightnvm: rename ->nr_pages to ->nr_sects
lightnvm: update closed list outside of intr context
xen/blback: Fit the important information of the thread in 17 characters
lightnvm: fold get bb tbl when using dual/quad plane mode
lightnvm: fix up nonsensical configure overrun checking
xen-blkback: advertise indirect segment support earlier
...
In rrpc, some calculations assume a certain configuration (e.g., 1 LUN,
1 sector per page). The reason behind this was that LightNVM used a
simple configuration with QEMU to test core features in the beginning.
This patch relaxes these assumptions and generalizes calculation,
allowing multiple luns to be used.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The struct nvm_dev->total_blocks was only used for calculating total
sectors. Remove and instead calculate total sectors from the number of
luns and its sectors.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The struct rrpc->nr_pages can easily be interpreted as the number of
flash pages allocated to rrpc, while it is the nr_sects. Make sure that
this is reflected from the variable name.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
When an I/O finishes, full blocks are moved from the open to the closed
list - a lock is taken to protect the list. This happens at the moment
in the interrupt context, which is not correct.
This patch moves this logic to the block workqueue instead, avoiding
holding a spinlock without interrupt save in an interrupt context.
Signed-off-by: Javier González <javier@cnexlabs.com>
Fixes: ff0e498bfa ("lightnvm: manage open and closed blocks sepa...")
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
When the media manager runs in dual or quad plane mode, lightnvm
abstracts away plane specific commands. This poses a problem for
get bad block table, as it reports bad blocks per plane, making the
table either two or four times bigger than expected. Fold the bad block
list before returning.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Instead of checking a constant 0 actually check the space available. Even
better remember to allow for the header and also check the right amount of
space is needed.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
System block allows the device to initialize with its configured media
manager. The system blocks is written to disk, and read again when media
manager is determined. For this to work, the backend must store the
data. Device drivers, such as null_blk, does not have any backend
storage. This patch allows the media manager to be initialized without a
storage backend.
It also fix incorrect configuration of capabilities in null_blk, as it
does not support get/set bad block interface.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch fixes an error on the calculation of intersecting logical
addresses; it contemplates the case where a new request including
several addresses intersects with a single locked address. This case is
typical when multiple pages are sent in a new request, while GC - which
at the moment sends one address at the time - is running.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add a warning if irqs are disabled when locking a new address in rrpc.
The typical path to a new request does not disable irqs, but this is not
guaranteed in the future.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The bio is not returned if the data page cannot be allocated.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Now that a device can be managed using the system blocks, a method to
reset the device is necessary as well. This patch introduces logic to
reset the device easily to factory state and exposes it through an
ioctl.
The ioctl takes the following flags:
NVM_FACTORY_ERASE_ONLY_USER
By default all blocks, except host-reserved blocks are erased upon
factory reset. Instead of this, only erase host-reserved blocks.
NVM_FACTORY_RESET_HOST_BLKS
Mark host-reserved blocks to be erased and set their type to free.
NVM_FACTORY_RESET_GRWN_BBLKS
Mark "grown bad blocks" to be erased and set their type to free.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Use system block information to register the appropriate media manager.
This enables the LightNVM subsystem to instantiate a media manager
selected by the user, instead of relying on automatic detection by each
media manager loaded in the kernel.
A device must now be initialized before it can proceed to initialize its
media manager. Upon initialization, the configured media manager is
automatically initialized as well.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Based on the previous patch, we now introduce an ioctl to initialize the
device using nvm_init_sysblock and create the necessary system blocks.
The user may specify the media manager that they wish to instantiate on
top. Default from user-space will be "gennvm".
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
An Open-Channel SSD shall be initialized before use. To initialize, we
define an on-disk format, that keeps a small set of metadata to bring up
the media manager on top of the device.
The initial step is introduced to allow a user to format the disks for a
given media manager. During format, a system block is stored on one to
three separate luns on the device. Each lun has the system block
duplicated. During initialization, the system block can be retrieved and
the appropriate media manager can initialized.
The on-disk format currently covers (struct nvm_system_block):
- Magic value "NVMS".
- Monotonic increasing sequence number.
- The physical block erase count.
- Version of the system block format.
- Media manager type.
- Media manager superblock physical address.
The interface provides three functions to manage the system block:
int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *)
int nvm_get_sysblock(struct nvm *dev, struct nvm_sb_info *)
int nvm_update_sysblock(struct nvm *dev, struct nvm_sb_info *)
Each implement a part of the logic to manage the system block. The
initialization creates the first system blocks and mark them on the
device. Get retrieves the latest system block by scanning all pages in
the associated system blocks. The update sysblock writes new metadata
and allocates new block if necessary.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
NAND MLC memories have both lower and upper pages. When programming,
both of these must be written, before data can be read. However,
these lower and upper pages might not placed at even and odd flash
pages, but can be skipped. Therefore each flash memory has its lower
pages defined, which can then be used when programming and to know when
padding are necessary.
This patch implements the lower page definition in the specification,
and exposes it through a simple lookup table at dev->lptbl.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Some flash media has extended capabilities, such as programming SLC
pages on MLC/TLC flash, erase/program suspend, scramble and encryption.
MCCAP is introduced to detect support for these capabilities in the
command set.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
LightNVM targets need to know the state of the flash block when doing
flash optimizations. An example is implementing a write buffer to
respect the flash page size. Currently, block state is not accounted
for; the media manager only differentiates among free, bad and in-use
blocks.
This patch adds the logic in the generic media manager to enable
targets manage blocks into open and close separately, and it implements
such management in rrpc. It also adds a set of flags to describe the
state of the block (open, closed, free, bad).
In order to avoid taking two locks (nvm_lun and rrpc_lun) consecutively,
we introduce lockless get_/put_block primitives so that the open and
close list locks and future common logic is handled within the nvm_lun
lock.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Currently, a rrpc block only points to its nvm_lun. If a user wants to
find the associated rrpc lun, it will have to calculate the index and
look it up manually. By referencing the rrpc lun directly, this step can
be omitted, at the cost of a larger memory footprint.
This is important for upcoming patches that implement write buffering in
rrpc.
Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Internal logic for both core and media managers, does not have a
backing bio for issuing I/Os. Introduce nvm_submit_ppa to allow raw
I/Os to be submitted to the underlying device driver.
The function request the device, ppa, data buffer and its length and
will submit the I/O synchronously to the device. The return value may
therefore be used to detect any errors regarding the issued I/O.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Instead of passing request error into the LightNVM modules, incorporate
it into the nvm_rq.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Sometimes a user want to erase multiple PPAs at the same time. Extend
nvm_erase_ppa to take multiple ppas and number of ppas to be erased.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
There is no need to check whether dev's pages per block is
beyond rrpc support every time we init a lun, we only need
to check it once before enter the lun init loop.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The Westlake controller requires that the PPA list has sectors defined
sequentially. Currently, the PPA list is created with planes first, then
sectors. Change this to sectors first, then planes.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch fix two issues in rrpc_lun_gc
1. prio_list is protected by rrpc_lun's lock not nvm_lun's, so
acquire rlun's lock instead of lun's before operate on the list.
2. we delete block from prio_list before allocating gcb, but gcb
allocation may fail, we end without putting it back to the list,
this makes the block won't get reclaimed in the future. To solve
this issue, delete block after gcb allocation.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
We delete a block from the gc list before reclaim it, so
put it back to the list on its reclaim fail, otherwise
this block will not get reclaimed and be programmable
in the future.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
We should check last io completion status before
starting another one.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
To implement sync I/O support within the LightNVM core, the end_io
functions are refactored to take an end_io function pointer instead of
testing for initialized media manager, followed by calling its end_io
function.
Sync I/O can then be implemented using a callback that signal I/O
completion. This is similar to the logic found in blk_to_execute_io().
By implementing it this way, the underlying device I/Os submission logic
is abstracted away from core, targets, and media managers.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
A device may be driven in single, double or quad plane mode. In that
case, the rqd must have either one, two, or four PPAs set for a single
PPA sent to the device. Refactor this logic into their own
functions to be shared by program/erase/read in the core.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
A device may function in single, dual or quad plane mode. The gennvm
media manager manages this with explicit helpers. They convert a single
ppa to 1, 2 or 4 separate ppas in a ppa list. To aid implementation of
recovery and system blocks, this functionality can be moved directly
into the core.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
When rrpc_write_ppalist_rq and rrpc_read_ppalist_rq succeed, we setup
rq correctly, but nvm_submit_io may afterward fail since it cannot
allocate request or nvme_nvm_command, we return error but forget to
cleanup the previous work.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The mempool allocation might fail. Make sure to return error when it
does, instead of causing a kernel panic.
Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
When initing bad block list in gennvm_block_bb, once we move bad block
from free_list to bb_list, we should maintain both stat info
nr_free_blocks and nr_bad_blocks. So this patch fixes to add missing
operation related to nr_free_blocks.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Put bio when submission fails, since we get it
before submission. And return error when backend
device driver doesn't provide a submit_io method,
thus we can end IO properly.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
dev->nr_luns reports the total number of luns available in a device
while dev->luns_per_chnl is the number of luns per channel.
When multiple channels are available, the offset is calculated from a
channel and lun id into a linear array. As it multiplies with
the total number of luns, we go out of bound when channel id > 0 and
causes the kernel to panic when we read a protected kernel memory area.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The LightNVM module exposes a debug interface when CONFIG_NVM_DEBUG is
set. This interfaces takes a string to configure media managers and
targets. Make sure this interface is only exposed when chosen
deliberately.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
After the gennvm module has been initialized. It might be attached to
one or several devices. In that case, the module is in use. Make sure
that it can not be unloaded.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This patch fixes two issues during media manager registration.
1. The ppa pool can be used at media manager registration. Allocate the
ppa pool before that.
2. If a media manager can't be found, this should not lead to the
device being unallocated. A media manager can be registered later, that
can manage the device. Only warn if a media manager fails
initialization.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
In the case where a request queue is passed to the low lever lightnvm
device drive integration, the device driver might pass its admin
commands through another queue. Instead pass nvm_dev, and let the
low level drive the appropriate queue.
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The spin_unlock is duplicated multiple times. Jump to a single unlock
to improve the code flow.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Put the allocated blocks back to the free list
when the luns configure failed, to make these
blocks useable to others.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
rrpc_get_blk use constant 0 as the input parameter
of nvm_get_blk, this may result in getting gc block
failed unexpectedly.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
To avoid race conditions, traverse dev, media manager,
and target lists and also register, unregister entries
to/from them, should be always under the nvm_lock control.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The get_bb_tbl function takes ppa as a generic address, which is
converted to the ppa device address within the device driver. When
the update_bbtbl callback is called from get_bb_tbl, the device
specific ppa is used, instead of the generic ppa.
Make sure to pass the generic ppa.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
do device max_phys_sect boundary check first, otherwise
we will allocate dma_pools for devices whose max sectors
are beyond lightnvm support and register them.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
If copy_to_user() fails we returned error but we missed releasing
devices.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Add free block, used block, and bad block information to the show debug
interface. This information is used to debug how targets track blocks.
Also, change debug function name to make it more generic.
Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Maintain number of in use blocks, free blocks, and bad blocks in a per
lun basis. This allows the upper layers to get information about the
state of each lun.
Also, account for blocks reserved to the device on the free block count.
nr_free_blocks matches now the actual number of blocks on the free list
when the device is booted.
Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
If either max_phys_sect is out of bound, the nvm_dev structure is not
freed.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The return value should be non-zero under error conditions.
Remove nvme_free(dev) to avoid free dev more than once.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This prevents outstanding IOs to be sent for completion to target after
the target has been removed. The flow is now: stop new IOs > cleanup
queue > remove target.
Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The linear and device specific address modes can be replaced with a
simple offset and bit length conversion that is generic across all
devices.
This both simplifies the specification and removes the special case for
qemu nvme, that previously relied on the linear address mapping.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Both the nvm_register and nvm_init does a kfree(dev) on error. Make sure
to only free it once.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
We register with nvm_devices when there registration can still fail.
Move the final registration at the end of the nvm_register function
to make sure we are fully registered when added to the nvm_devices list.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Only NAND flash with SLC and MLC is supported. Make sure to not try to
initialize TLC memory or other non-volatile memory types.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The specification was changed to reflect a multi-value bad block table.
Instead of bit-based bad block table, the bad block table now allows
eight bad block categories. Currently four are defined:
* Factory bad blocks
* Grown bad blocks
* Device-side reserved blocks
* Host-side reserved blocks
The factory and grown bad blocks are the regular bad blocks. The
reserved blocks are either for internal use or external use. In
particular, the device-side reserved blocks allows the host to
bootstrap from a limited number of flash blocks. Reducing the flash
blocks to scan upon super block initialization.
Support for both get bad block table and set bad block table is added.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
No functional changes in this patch, but it prepares us for returning
a more useful cookie related to the IO that was queued up.
Signed-off-by: Jens Axboe <axboe@fb.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
For cases where CONFIG_LBDAF is not set. The struct ppa_addr exceeds its
type on 32 bit architectures. ppa_addr requires a 64bit integer to hold
the generic ppa format. We therefore refactor it to u64 and
replaces the sector_t usages with u64 for physical addresses.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
This target allows an Open-Channel SSD to be exposed asas a block
device.
It implements a round-robin approach for sector allocation,
together with a greedy cost-based garbage collector.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
The implementation for Open-Channel SSDs is divided into media
management and targets. This patch implements a generic media manager
for open-channel SSDs. After a media manager has been initialized,
single or multiple targets can be instantiated with the media managed as
the backend.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
Open-channel SSDs are devices that share responsibilities with the host
in order to implement and maintain features that typical SSDs keep
strictly in firmware. These include (i) the Flash Translation Layer
(FTL), (ii) bad block management, and (iii) hardware units such as the
flash controller, the interface controller, and large amounts of flash
chips. In this way, Open-channels SSDs exposes direct access to their
physical flash storage, while keeping a subset of the internal features
of SSDs.
LightNVM is a specification that gives support to Open-channel SSDs
LightNVM allows the host to manage data placement, garbage collection,
and parallelism. Device specific responsibilities such as bad block
management, FTL extensions to support atomic IOs, or metadata
persistence are still handled by the device.
The implementation of LightNVM consists of two parts: core and
(multiple) targets. The core implements functionality shared across
targets. This is initialization, teardown and statistics. The targets
implement the interface that exposes physical flash to user-space
applications. Examples of such targets include key-value store,
object-store, as well as traditional block devices, which can be
application-specific.
Contributions in this patch from:
Javier Gonzalez <jg@lightnvm.io>
Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Jesper Madsen <jmad@itu.dk>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>